# DKN: Deep Knowledge-Aware Network for News Recommendation

DKN [1] is a deep learning model that incorporates information from a knowledge graph to improve news recommendation. Specifically, DKN uses a TransX [2] method for knowledge graph representation learning, then applies a CNN framework, named KCNN, to combine entity embeddings with word embeddings and generate a final embedding vector for each news article. Click-through rate (CTR) prediction is made by an attention-based neural scorer.
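To make the KCNN step more concrete, here is a minimal NumPy sketch of the idea: word, entity, and context embeddings aligned per word position are stacked as channels, convolved with filters of several window sizes, and max-pooled over positions. The function name `kcnn_embed`, the random filters, and the shapes are illustrative assumptions; this is not the `recommenders` implementation, and it assumes the entity and context embeddings already have the same dimension as the word embeddings.

```python
import numpy as np

def kcnn_embed(word_emb, entity_emb, context_emb, filters):
    """Hypothetical helper: multi-channel convolution + max-over-time pooling.

    word_emb, entity_emb, context_emb: arrays of shape (n_words, dim),
    aligned so that row i of each array describes the same word position.
    filters: list of arrays of shape (window, dim, 3).
    """
    # Stack the three embedding types as channels: (n_words, dim, 3).
    x = np.stack([word_emb, entity_emb, context_emb], axis=-1)
    n_words = x.shape[0]
    features = []
    for f in filters:
        window = f.shape[0]
        # Slide the filter over word positions and apply ReLU.
        activations = [
            max(float(np.sum(x[i:i + window] * f)), 0.0)
            for i in range(n_words - window + 1)
        ]
        # Max-over-time pooling keeps one feature per filter.
        features.append(max(activations))
    return np.array(features)

rng = np.random.default_rng(0)
n_words, dim = 10, 100  # illustrative sizes, not read from dkn.yaml
word_emb = rng.normal(size=(n_words, dim))
entity_emb = rng.normal(size=(n_words, dim))
context_emb = rng.normal(size=(n_words, dim))
# Four random filters for each window size 1, 2, 3 (assumed values).
filters = [rng.normal(scale=0.01, size=(w, dim, 3))
           for w in (1, 2, 3) for _ in range(4)]

title_vec = kcnn_embed(word_emb, entity_emb, context_emb, filters)
print(title_vec.shape)  # (12,)
```

The resulting vector has one entry per filter, so its length is independent of the title length.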

## Properties of DKN:

  - DKN is a content-based deep model for CTR prediction, rather than a traditional ID-based collaborative filtering model.
  - It makes use of knowledge entities and common sense in news content via joint learning from semantic-level and knowledge-level representations of news articles.
  - DKN uses an attention module to dynamically calculate a user's aggregated historical representation.
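
The attention-based aggregation in the last point can be sketched as follows: each clicked news embedding is paired with the candidate news embedding, a small network scores the pair, the scores are softmax-normalized, and the user representation is the weighted sum of the clicked news embeddings. The single hidden layer, the random weights, and the name `attention_user_embedding` are assumptions made for illustration, not the library's code.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_user_embedding(history, candidate, w1, b1, w2, b2):
    """Hypothetical helper.

    history: (n_clicked, dim) embeddings of the user's clicked news.
    candidate: (dim,) embedding of the candidate news.
    Returns the aggregated user vector and the attention weights.
    """
    n_clicked = history.shape[0]
    # Pair every clicked item with the candidate: (n_clicked, 2 * dim).
    pairs = np.concatenate(
        [history, np.tile(candidate, (n_clicked, 1))], axis=1)
    hidden = np.maximum(pairs @ w1 + b1, 0.0)   # ReLU hidden layer
    scores = (hidden @ w2 + b2).reshape(-1)     # one score per clicked item
    weights = softmax(scores)
    # Weighted sum of the clicked news embeddings.
    return weights @ history, weights

rng = np.random.default_rng(0)
dim, n_clicked, hidden_dim = 8, 5, 16  # illustrative sizes
history = rng.normal(size=(n_clicked, dim))
candidate = rng.normal(size=dim)
w1 = rng.normal(scale=0.1, size=(2 * dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, 1))
b2 = np.zeros(1)

user_vec, attn = attention_user_embedding(history, candidate, w1, b1, w2, b2)
print(attn.round(3), user_vec.shape)
```

Because the weights depend on the candidate, the same click history yields a different user vector for each candidate article.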


In [1]:
import sys
import os
from tempfile import TemporaryDirectory
#import scrapbook as sb
import tensorflow as tf
tf.get_logger().setLevel('ERROR') # only show error messages

from recommenders.models.deeprec.deeprec_utils import download_deeprec_resources, prepare_hparams
from recommenders.models.deeprec.models.dkn import DKN
from recommenders.models.deeprec.io.dkn_iterator import DKNTextIterator

print(f"System version: {sys.version}")
print(f"Tensorflow version: {tf.__version__}")

System version: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) 
[GCC 9.3.0]
Tensorflow version: 2.9.1


### Download data

In [2]:
cwd = os.getcwd()
data_path = os.path.join(cwd, "mind-demo-dkn")

yaml_file = os.path.join(data_path, r'dkn.yaml')
train_file = os.path.join(data_path, r'train_mind_demo.txt')
valid_file = os.path.join(data_path, r'valid_mind_demo.txt')
test_file = os.path.join(data_path, r'test_mind_demo.txt')

news_feature_file = os.path.join(data_path, r'doc_feature.txt')
user_history_file = os.path.join(data_path, r'user_history.txt')

wordEmb_file = os.path.join(data_path, r'word_embeddings_100.npy')
entityEmb_file = os.path.join(data_path, r'TransE_entity2vec_100.npy')
contextEmb_file = os.path.join(data_path, r'TransE_context2vec_100.npy')

if not os.path.exists(yaml_file):
    download_deeprec_resources(r'https://recodatasets.z20.web.core.windows.net/deeprec/', cwd, 'mind-demo-dkn.zip')
    

100%|██████████| 11.3k/11.3k [00:00<00:00, 39.7kKB/s]


### Create hyper-parameters

In [3]:
epochs = 10
history_size = 50
batch_size = 100

In [5]:
hparams = prepare_hparams(yaml_file,
                          news_feature_file=news_feature_file,
                          user_history_file=user_history_file,
                          wordEmb_file=wordEmb_file,
                          entityEmb_file=entityEmb_file,
                          contextEmb_file=contextEmb_file,
                          epochs=epochs,
                          history_size=history_size,
                          batch_size=batch_size)
print(hparams)

HParams object with values {'use_entity': True, 'use_context': True, 'cross_activation': 'identity', 'user_dropout': False, 'dropout': [0.0], 'attention_dropout': 0.0, 'load_saved_model': False, 'fast_CIN_d': 0, 'use_Linear_part': False, 'use_FM_part': False, 'use_CIN_part': False, 'use_DNN_part': False, 'init_method': 'uniform', 'init_value': 0.1, 'embed_l2': 1e-06, 'embed_l1': 0.0, 'layer_l2': 1e-06, 'layer_l1': 0.0, 'cross_l2': 0.0, 'cross_l1': 0.0, 'reg_kg': 0.0, 'learning_rate': 0.0005, 'lr_rs': 1, 'lr_kg': 0.5, 'kg_training_interval': 5, 'max_grad_norm': 2, 'is_clip_norm': 0, 'dtype': 32, 'optimizer': 'adam', 'epochs': 10, 'batch_size': 100, 'enable_BN': True, 'show_step': 10000, 'save_model': False, 'save_epoch': 2, 'write_tfevents': False, 'train_num_ngs': 4, 'need_sample': True, 'embedding_dropout': 0.0, 'EARLY_STOP': 100, 'min_seq_length': 1, 'slots': 5, 'cell': 'SUM', 'doc_size': 10, 'history_size': 50, 'word_size': 12600, 'entity_size': 3987, 'data_format': 'dkn', 'metrics'

In [6]:
model = DKN(hparams, DKNTextIterator)



In [7]:
print(model.run_eval(valid_file))

2022-12-28 11:29:14.920086: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-12-28 11:29:14.921026: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.


{'auc': 0.4978, 'group_auc': 0.5048, 'mean_mrr': 0.1613, 'ndcg@5': 0.148, 'ndcg@10': 0.2138}


In [8]:
model.fit(train_file, valid_file)

at epoch 1
train info: logloss loss:0.688963481682842
eval info: auc:0.5191, group_auc:0.5176, mean_mrr:0.1853, ndcg@10:0.2436, ndcg@5:0.1799
at epoch 1 , train time: 120.9 eval time: 4.5
at epoch 2
train info: logloss loss:0.6209856089899095
eval info: auc:0.5331, group_auc:0.5152, mean_mrr:0.1766, ndcg@10:0.2337, ndcg@5:0.1743
at epoch 2 , train time: 31.2 eval time: 4.5
at epoch 3
train info: logloss loss:0.5871429717389204
eval info: auc:0.5535, group_auc:0.5324, mean_mrr:0.186, ndcg@10:0.2521, ndcg@5:0.1888
at epoch 3 , train time: 31.5 eval time: 4.5
at epoch 4
train info: logloss loss:0.5635184015510446
eval info: auc:0.5834, group_auc:0.548, mean_mrr:0.1848, ndcg@10:0.2554, ndcg@5:0.1836
at epoch 4 , train time: 31.7 eval time: 4.5
at epoch 5
train info: logloss loss:0.545540526761847
eval info: auc:0.5904, group_auc:0.5398, mean_mrr:0.1774, ndcg@10:0.2423, ndcg@5:0.1836
at epoch 5 , train time: 32.0 eval time: 4.6
at epoch 6
train info: logloss loss:0.5273528638280044
eval inf

<recommenders.models.deeprec.models.dkn.DKN at 0x7fa12ad57f10>

## Evaluate the DKN model

Now we can check the performance on the test set:

In [9]:
res = model.run_eval(test_file)
print(res)

{'auc': 0.583, 'group_auc': 0.5641, 'mean_mrr': 0.1926, 'ndcg@5': 0.1893, 'ndcg@10': 0.2547}
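
For readers unfamiliar with the ranking metrics in this dictionary, the sketch below shows one common way to compute MRR and nDCG@k for a single impression (one user's list of candidates with click labels). The function names and toy data are illustrative, and the library's exact implementation may differ in details; the reported values would be averages of such per-impression scores.

```python
import numpy as np

def mrr_score(labels, scores):
    """Mean of the reciprocal ranks of all clicked items in one impression."""
    order = np.argsort(scores)[::-1]          # best score first
    ranked = np.take(labels, order)
    reciprocal = ranked / (np.arange(len(ranked)) + 1)
    return float(reciprocal.sum() / ranked.sum())

def dcg_at_k(labels, scores, k):
    """Discounted cumulative gain over the top-k ranked items."""
    order = np.argsort(scores)[::-1][:k]
    gains = 2.0 ** np.take(labels, order) - 1
    discounts = np.log2(np.arange(len(order)) + 2)
    return float(np.sum(gains / discounts))

def ndcg_at_k(labels, scores, k):
    """DCG normalized by the DCG of the ideal ordering."""
    return dcg_at_k(labels, scores, k) / dcg_at_k(labels, labels, k)

# Toy impression: 5 candidates, 2 of them clicked (made-up numbers).
labels = np.array([0, 1, 0, 0, 1])
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2])

mrr = mrr_score(labels, scores)
ndcg5 = ndcg_at_k(labels, scores, 5)
print(mrr, round(ndcg5, 4))
```

In this toy case the clicked items land at ranks 1 and 4, so the MRR is (1/1 + 1/4) / 2 = 0.625.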
