# Introduction

This repository contains the source code for the paper *Reinforced Anchor Knowledge Graph Generation for News Recommendation Reasoning*.

![framework](./framework.png)



## Dataset description and download

MIND [1] is a large-scale English news recommendation dataset collected from anonymized behavior logs of the Microsoft News website. It contains 1,000,000 users, 161,013 news articles, and 15,777,377 impression logs. Every news article contains rich textual content, including title, abstract, body, category, and entities. Each impression log contains the click events, the non-clicked events, and the user's historical news clicks before that impression.
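To make the structure of an impression log concrete, here is a minimal sketch that parses one line of `behaviors.tsv`. It assumes the tab-separated layout described in the MIND repo (impression ID, user ID, time, click history, impressions), where each impression entry has the form `<news ID>-<label>` with label 1 for clicked and 0 for not clicked. The `Impression` class, `parse_behavior_line`, and the sample line are illustrative only and are not part of this repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Impression:
    impression_id: str
    user_id: str
    time: str
    history: List[str]                 # news IDs the user clicked before this impression
    candidates: List[Tuple[str, int]]  # (news ID, 1 = clicked, 0 = not clicked)


def parse_behavior_line(line: str) -> Impression:
    """Parse one tab-separated line of behaviors.tsv (assumed 5-column layout)."""
    impression_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")
    candidates = []
    for item in impressions.split():
        news_id, label = item.rsplit("-", 1)
        candidates.append((news_id, int(label)))
    # An empty history column yields an empty list.
    return Impression(impression_id, user_id, time, history.split(), candidates)


# Made-up sample line for illustration.
line = "1\tU13740\t11/11/2019 9:05:58 AM\tN55189 N42782\tN55689-1 N35729-0"
imp = parse_behavior_line(line)
clicked = [news_id for news_id, label in imp.candidates if label == 1]
print(imp.user_id, imp.history, clicked)  # U13740 ['N55189', 'N42782'] ['N55689']
```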

For quicker training and evaluation, we sampled the MINDdemo dataset of 5k users from the MINDsmall dataset. MINDdemo has the same file format as MINDsmall and MINDlarge. To run experiments on MINDsmall or MINDlarge, change the download source by setting the `MIND_type` parameter to one of `'large'`, `'small'`, or `'demo'`.
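Since `MIND_type` is a plain string, a typo silently selects nothing useful. The sketch below validates the value and derives archive names. It assumes a `MIND<type>_train.zip` / `MIND<type>_dev.zip` naming pattern; `mind_archive_names` is a hypothetical helper, and `get_mind_data_set` in `utils.util` remains the authoritative source for the real URL and file names.

```python
VALID_MIND_TYPES = ("large", "small", "demo")


def mind_archive_names(mind_type: str) -> tuple:
    """Hypothetical helper: map MIND_type to (train, dev) archive names.

    Assumes the MIND<type>_train.zip / MIND<type>_dev.zip naming pattern.
    """
    if mind_type not in VALID_MIND_TYPES:
        raise ValueError(f"MIND_type must be one of {VALID_MIND_TYPES}, got {mind_type!r}")
    prefix = "MIND" + mind_type
    return f"{prefix}_train.zip", f"{prefix}_dev.zip"


print(mind_archive_names("demo"))  # ('MINDdemo_train.zip', 'MINDdemo_dev.zip')
```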

MINDdemo_train is used for training, and MINDdemo_dev is used for evaluation. Training and evaluation data each consist of a news file and a behaviors file. A more detailed data description is available in the [MIND repo](https://github.com/msnews/msnews.github.io/blob/master/assets/doc/introduction.md).
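The news file is what links articles to the knowledge graph. This minimal sketch parses one line of `news.tsv`, assuming the 8-column tab-separated layout from the MIND repo (news ID, category, subcategory, title, abstract, URL, title entities, abstract entities), where the entity columns are JSON lists whose items carry a `WikidataId` field. `parse_news_line` and the sample line are illustrative, not this repository's loader.

```python
import json
from typing import Dict, List


def parse_news_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated line of news.tsv (assumed 8-column layout)."""
    (news_id, category, subcategory, title, abstract,
     url, title_entities, abstract_entities) = line.rstrip("\n").split("\t")

    def wikidata_ids(raw: str) -> List[str]:
        # Entity columns hold JSON lists; treat an empty column as no entities.
        return [entity["WikidataId"] for entity in json.loads(raw)] if raw else []

    return {
        "news_id": news_id,
        "category": category,
        "subcategory": subcategory,
        "title": title,
        "abstract": abstract,
        "url": url,
        "title_entities": wikidata_ids(title_entities),
        "abstract_entities": wikidata_ids(abstract_entities),
    }


# Made-up sample line built with json.dumps to avoid hand-escaping quotes.
title_ents = json.dumps([{"Label": "Douglas Adams", "Type": "P", "WikidataId": "Q42"}])
line = "\t".join(["N1", "lifestyle", "lifestyleroyals", "A title", "An abstract",
                  "https://example.com/N1", title_ents, "[]"])
news = parse_news_line(line)
print(news["title_entities"], news["abstract_entities"])  # ['Q42'] []
```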

```python
import os
from utils.util import *  # provides get_mind_data_set and download_deeprec_resources

# Options: demo, small, large
MIND_type = 'demo'
data_path = "./data/"

train_news_file = os.path.join(data_path, 'train', 'news.tsv')
train_behaviors_file = os.path.join(data_path, 'train', 'behaviors.tsv')
valid_news_file = os.path.join(data_path, 'valid', 'news.tsv')
valid_behaviors_file = os.path.join(data_path, 'valid', 'behaviors.tsv')
knowledge_graph_file = os.path.join(data_path, 'kg/wikidata-graph', 'wikidata-graph.tsv')
entity_embedding_file = os.path.join(data_path, 'kg/wikidata-graph', 'entity2vecd100.vec')
relation_embedding_file = os.path.join(data_path, 'kg/wikidata-graph', 'relation2vecd100.vec')

mind_url, mind_train_dataset, mind_dev_dataset, _ = get_mind_data_set(MIND_type)

kg_url = "https://kredkg.blob.core.windows.net/wikidatakg/"

# Download each resource only if it is not already present locally.
if not os.path.exists(train_news_file):
    download_deeprec_resources(mind_url, os.path.join(data_path, 'train'), mind_train_dataset)

if not os.path.exists(valid_news_file):
    download_deeprec_resources(mind_url, os.path.join(data_path, 'valid'), mind_dev_dataset)

if not os.path.exists(knowledge_graph_file):
    download_deeprec_resources(kg_url, os.path.join(data_path, 'kg'), "kg.zip")
```
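The download cell fetches pretrained 100-dimensional entity and relation embedding files alongside the knowledge graph. Below is a minimal sketch for reading such a file into a dictionary. It assumes one entry per line, an ID followed by `dim` floats separated by whitespace; that format is an assumption, so check a few lines of the downloaded files first. `load_embeddings` and the toy file are illustrative, not this repository's loader.

```python
import os
import tempfile
from typing import Dict, List


def load_embeddings(path: str, dim: int = 100) -> Dict[str, List[float]]:
    """Read an embedding file into {id: vector}.

    Assumes each line is an ID followed by `dim` whitespace-separated floats.
    Lines with the wrong number of fields are skipped.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()  # splits on tabs and spaces alike
            if len(parts) != dim + 1:
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny made-up file with 3-dimensional vectors, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.vec")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Q1\t0.1\t0.2\t0.3\n")
        f.write("Q2 0.4 0.5 0.6\n")
        f.write("bad line\n")
    toy = load_embeddings(path, dim=3)

print(sorted(toy), toy["Q2"])  # ['Q1', 'Q2'] [0.4, 0.5, 0.6]
```

If the assumed format holds, the notebook's `entity_embedding_file` could be read with `load_embeddings(entity_embedding_file)` using the default `dim=100`.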

## Load config

```python
import sys
import os
sys.path.append('')

import argparse
from parse_config import ConfigParser

parser = argparse.ArgumentParser(description='AnchorKG')

parser.add_argument('-c', '--config', default="./config.json", type=str,
                    help='config file path (default: ./config.json)')
parser.add_argument('-r', '--resume', default=None, type=str,
                    help='path to latest checkpoint (default: None)')
parser.add_argument('-d', '--device', default=None, type=str,
                    help='indices of GPUs to enable (default: all)')

# Inside Jupyter, sys.argv holds the kernel's own arguments (e.g. "-f <connection file>"),
# which argparse rejects with "SystemExit: 2". Drop them so the defaults above are used.
if "ipykernel" in sys.modules:
    sys.argv = sys.argv[:1]

config = ConfigParser.from_args(parser)
```
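The `SystemExit: 2` error comes from argparse reading the notebook kernel's command-line flags. This standard-library sketch shows the behaviour in isolation: passing an explicit argument list to `parse_args` bypasses `sys.argv`, so defaults or chosen overrides are used. It rebuilds the same three options but does not show how `parse_config.ConfigParser` consumes the parsed namespace.

```python
import argparse

parser = argparse.ArgumentParser(description="AnchorKG")
parser.add_argument("-c", "--config", default="./config.json", type=str)
parser.add_argument("-r", "--resume", default=None, type=str)
parser.add_argument("-d", "--device", default=None, type=str)

# An explicit list means argparse never looks at sys.argv.
args = parser.parse_args([])                  # all defaults
print(args.config, args.resume, args.device)  # ./config.json None None

args = parser.parse_args(["-d", "0", "-c", "my_config.json"])
print(args.config, args.device)               # my_config.json 0
```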



## Create hyper-parameters

```python
epochs = 5
batch_size = 64

# Override the values loaded from config.json.
config['trainer']['epochs'] = epochs
config['data_loader']['batch_size'] = batch_size
```
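The cell above overrides two nested entries by hand. When several hyper-parameters need changing, a small dotted-key helper keeps the overrides in one dictionary. `set_by_path` is hypothetical, and the starting values in the toy config are made up; this sketch operates on a plain dict and is not this repository's `ConfigParser` API.

```python
from typing import Any, Dict


def set_by_path(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Hypothetical helper: set config["a"]["b"] = value for dotted_key "a.b".

    Missing intermediate sections are created as empty dicts.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


# Toy config with made-up starting values.
config = {"trainer": {"epochs": 10}, "data_loader": {"batch_size": 32}}
for key, value in {"trainer.epochs": 5, "data_loader.batch_size": 64}.items():
    set_by_path(config, key, value)
print(config)  # {'trainer': {'epochs': 5}, 'data_loader': {'batch_size': 64}}
```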



## Process MIND dataset

```python
process_mind_data(config)
```

```
constructing item2item dataset ...
constructing news features ...
```


## Load data


```python
data = load_data(config)
```

```
constructing train ...
constructing val ...
constructing test ...
constructing doc feature embedding ...
constructing adjacency matrix ...
constructing kg env ...
build neiborhood embedding ...
constructing embedding ...
constructing hit dict ...
fininsh loading data!
```
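The loading log mentions building an adjacency structure over the knowledge graph. As a generic illustration only, not this repository's implementation, this sketch reads `head<TAB>relation<TAB>tail` triples (the assumed layout of `wikidata-graph.tsv`) and keeps at most `max_neighbors` outgoing edges per entity, down-sampling with a fixed seed. `read_triples`, `build_adjacency`, and the sample triples are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def read_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse 'head<TAB>relation<TAB>tail' lines, skipping blank ones."""
    triples = []
    for line in lines:
        if line.strip():
            head, relation, tail = line.rstrip("\n").split("\t")
            triples.append((head, relation, tail))
    return triples


def build_adjacency(triples: List[Triple], max_neighbors: int,
                    seed: int = 0) -> Dict[str, List[Tuple[str, str]]]:
    """Map each head entity to at most `max_neighbors` (relation, tail) pairs.

    Edges are treated as directed (head -> tail). Entities with more edges
    than `max_neighbors` are down-sampled reproducibly with `seed`.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for head, relation, tail in triples:
        neighbors[head].append((relation, tail))
    adjacency = {}
    for head, edges in neighbors.items():
        if len(edges) > max_neighbors:
            edges = rng.sample(edges, max_neighbors)
        adjacency[head] = edges
    return adjacency


triples = read_triples(["Q1\tP31\tQ5\n", "Q1\tP27\tQ30\n", "Q1\tP106\tQ36180\n", "Q2\tP31\tQ5\n"])
adj = build_adjacency(triples, max_neighbors=2)
print(len(adj["Q1"]), adj["Q2"])  # 2 [('P31', 'Q5')]
```

A fixed neighbor count keeps every entity's neighborhood the same size, which makes batching simple; the trade-off is that high-degree entities lose some edges.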


## Train the model

```python
train(data, config)
```

```
anchor graph training
at epoch 1
anchor all loss: 258.7453
embedding all loss: 1976.6974
reasoning all loss: 2116.8164
eval info: auc:0.6390
at epoch 2
anchor all loss: 183.7712
embedding all loss: 1771.4130
reasoning all loss: 2027.1009
eval info: auc:0.6706
at epoch 3
anchor all loss: 161.0648
embedding all loss: 1687.2168
reasoning all loss: 1977.3318
eval info: auc:0.6855
at epoch 4
anchor all loss: 155.2929
embedding all loss: 1636.4263
reasoning all loss: 1939.5623
eval info: auc:0.6903
at epoch 5
anchor all loss: 149.8370
embedding all loss: 1602.0648
reasoning all loss: 1905.3371
eval info: auc:0.6930
NDCG=5.706 |  Recall=10.23 | HR=13.41 | Precision=1.596
```
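The training log reports an `auc` value after each epoch. This sketch shows what ROC AUC measures: the probability that a randomly chosen positive item is scored above a randomly chosen negative one, with ties counted as half. Whether the repository computes AUC over the whole evaluation set or averages it per impression is not stated here, so treat `roc_auc` as an illustration of the metric, not of this codebase's evaluation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P * N) time, fine for small inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```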


## Evaluate the model

```python
test_data = data[4]
testing(test_data, config)
```

```
NDCG=5.706 |  Recall=10.23 | HR=13.41 | Precision=1.596
```
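The test output reports NDCG, Recall, HR (hit ratio), and Precision. The printed values look like percentages averaged over users, but the cutoff `k` and the averaging scheme are not stated here. The sketch below uses the standard binary-relevance definitions for a single ranked list; `topk_metrics` and the sample ranking are illustrative, and the repository's exact definitions may differ.

```python
import math
from typing import Sequence, Set


def topk_metrics(ranked: Sequence[str], relevant: Set[str], k: int) -> dict:
    """Binary-relevance top-k metrics for one ranked list.

    hr:        1 if any relevant item appears in the top k, else 0
    precision: relevant items in the top k / k
    recall:    relevant items in the top k / number of relevant items
    ndcg:      DCG@k / ideal DCG@k, with gain 1 for each relevant item
    """
    hits = [1 if item in relevant else 0 for item in list(ranked)[:k]]
    num_hits = sum(hits)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return {
        "hr": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant) if relevant else 0.0,
        "ndcg": dcg / ideal if ideal > 0 else 0.0,
    }


m = topk_metrics(["n3", "n1", "n7", "n2"], relevant={"n1", "n2"}, k=3)
print(m["hr"], m["recall"])  # 1.0 0.5
```

Averaging these per-user values and multiplying by 100 would give numbers on the same scale as the log, if that is indeed how the repository reports them.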


## Performance on MINDlarge

We report the performance on the MINDlarge dataset for reference. Note that the public dataset does not include the test data and provides only titles and abstracts; in the paper, we used all of the data, including titles, abstracts, and bodies.

...



## Reference

[1] Wu, Fangzhao, et al. "MIND: A Large-scale Dataset for News Recommendation." In *Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics*, 2020.