<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# NPA: Neural Snack Recommendation with Personalized Attention
We apply the NPA model \[1\] to snack recommendation. The key point of our project is that, although we recommend snacks based on their property names ('Authentic', 'Japanese', 'beautifully', 'tasty', 'organic', 'gluten-free', 'GMO-free', 'all-natural', 'artificial-ingredient-free', 'classic', 'trendy', 'healthy'), we can treat those properties as "the title" of each snack. This lets us reuse the pretrained word embeddings that we download with the news model resources.
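
To make the "properties as title" idea concrete, here is a minimal sketch of how a snack's property tags could be encoded as a fixed-length sequence of word indices, the same shape a news title would take. The helper `encode_properties`, the `max_len` parameter, and the toy vocabulary are hypothetical and only illustrate the idea; the real vocabulary would have to match the downloaded embedding file.

```python
# Hypothetical helper (not part of reco_utils): encode property tags like a title.
PROPERTIES = ['Authentic', 'Japanese', 'beautifully', 'tasty', 'organic',
              'gluten-free', 'GMO-free', 'all-natural',
              'artificial-ingredient-free', 'classic', 'trendy', 'healthy']


def encode_properties(props, word_dict, max_len=10):
    """Map property words to indices; 0 stands for padding and unknown words."""
    tokens = []
    for prop in props:
        # Split hyphenated tags so each part can match a vocabulary word.
        tokens.extend(prop.lower().split('-'))
    indices = [word_dict.get(tok, 0) for tok in tokens][:max_len]
    # Right-pad with zeros up to the fixed length.
    return indices + [0] * (max_len - len(indices))


# Toy vocabulary (assumption); the real one must match the embedding matrix.
toy_word_dict = {'japanese': 1, 'tasty': 2, 'organic': 3, 'gluten': 4, 'free': 5}
encoded = encode_properties(['Japanese', 'gluten-free', 'tasty'], toy_word_dict)
print(encoded)  # [1, 4, 5, 2, 0, 0, 0, 0, 0, 0]
```

Words missing from the vocabulary fall back to index 0 here, so properties whose words are absent from the news vocabulary would contribute no signal under this scheme.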

## Global settings and imports

In [30]:
import sys
sys.path.append("../../")
from reco_utils.recommender.deeprec.deeprec_utils import download_deeprec_resources 
from reco_utils.recommender.newsrec.newsrec_utils import prepare_hparams
from reco_utils.recommender.newsrec.models.npa import NPAModel
from reco_utils.recommender.newsrec.io.news_iterator import NewsIterator
import papermill as pm
from tempfile import TemporaryDirectory
import tensorflow as tf
import pandas as pd
import os

print("System version: {}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))

tmpdir = TemporaryDirectory()
localdir = os.path.expanduser('~/notebooks/A3_Data/')  # os.path.join does not expand "~"

System version: 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21) 
[GCC 7.3.0]
Tensorflow version: 1.15.2


## Download and load data

In [36]:
# The YAML config and word embeddings are the same files used by the news model;
# the training and validation sets are fake data that we generated.
data_path = tmpdir.name
yaml_file = os.path.join(data_path, r'npa.yaml')
train_file = os.path.join(localdir, r'snack_npa_train.csv')
valid_file = os.path.join(localdir, r'snack_npa_valid.csv')
wordEmb_file = os.path.join(data_path, r'embedding.npy')
if not os.path.exists(yaml_file):
    download_deeprec_resources(r'https://recodatasets.blob.core.windows.net/newsrec/', data_path, 'npa.zip')
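
The fake training and validation data mentioned above could be produced along the following lines. This is only a sketch under assumptions: the function `make_fake_impressions`, the value ranges (snack ids 1 to 11, three ids per list), and the user count are guesses based on the rows displayed in this notebook, not the script that was actually used.

```python
import random


def make_fake_impressions(n_rows, n_users=1000, n_snacks=12, list_len=3, seed=42):
    """Generate random impression rows with the columns shown in this notebook.

    All ranges are assumptions inferred from the displayed sample rows.
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    rows = []
    for _ in range(n_rows):
        rows.append({
            'ImpressionID': rng.randrange(100),
            'User': rng.randrange(n_users),
            # Snack ids are drawn from 1 .. n_snacks - 1.
            'CandidateNews': [rng.randrange(1, n_snacks) for _ in range(list_len)],
            'ClickedNews': [rng.randrange(1, n_snacks) for _ in range(list_len)],
        })
    return rows


rows = make_fake_impressions(5)
for row in rows:
    print(row)
```

Passing the rows to `pd.DataFrame(rows).to_csv(path)` would also write the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back.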

In [37]:
train_data = pd.read_csv(train_file)

In [38]:
train_data.head()

| Unnamed: 0 | id | ImpressionID | User | CandidateNews | ClickedNews |
|---|---|---|---|---|---|
| 0 | 0 | 62 | 786 | [6, 3, 9] | [2, 5, 10] |
| 1 | 1 | 10 | 196 | [4, 1, 7] | [8, 5, 11] |
| 2 | 2 | 9 | 755 | [4, 8, 10] | [9, 1, 10] |
| 3 | 3 | 79 | 41 | [11, 10, 7] | [7, 11, 4] |
| 4 | 4 | 54 | 331 | [11, 1, 3] | [8, 6, 7] |
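
Because the file is read with `pd.read_csv`, the `CandidateNews` and `ClickedNews` cells arrive as strings such as `"[6, 3, 9]"` rather than Python lists. A minimal sketch of converting them, using a hypothetical helper `parse_id_list` and the first displayed row as sample input:

```python
import ast


def parse_id_list(cell):
    """Convert a cell such as "[6, 3, 9]" into the list [6, 3, 9]."""
    value = ast.literal_eval(cell)  # safe parsing of Python literals only
    if not isinstance(value, list):
        raise ValueError("expected a list literal, got {!r}".format(cell))
    return [int(v) for v in value]


# Sample values copied from the first row shown above.
first_row = {'CandidateNews': "[6, 3, 9]", 'ClickedNews': "[2, 5, 10]"}
candidates = parse_id_list(first_row['CandidateNews'])
clicked = parse_id_list(first_row['ClickedNews'])
print(candidates, clicked)  # [6, 3, 9] [2, 5, 10]
```

With pandas, the same helper can be applied per column, for example `train_data['CandidateNews'].apply(parse_id_list)`, or passed through the `converters` argument of `pd.read_csv`.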


In [39]:
valid_data = pd.read_csv(valid_file)

In [40]:
valid_data.head()

| Unnamed: 0 | id | ImpressionID | User | CandidateNews | ClickedNews |
|---|---|---|---|---|---|
| 0 | 0 | 24 | 190 | [7, 4, 8] | [11, 3, 2] |
| 1 | 1 | 17 | 564 | [5, 1, 2] | [8, 4, 8] |
| 2 | 2 | 77 | 73 | [2, 2, 10] | [7, 5, 10] |
| 3 | 3 | 27 | 349 | [4, 10, 8] | [6, 11, 4] |
| 4 | 4 | 91 | 383 | [4, 9, 4] | [2, 10, 5] |


## Create hyper-parameters

In [3]:
epochs = 5
seed = 42

In [4]:
hparams = prepare_hparams(yaml_file, wordEmb_file=wordEmb_file, epochs=epochs)
print(hparams)

data_format=news,iterator_type=None,wordEmb_file=/tmp/tmp55j7_pnx/embedding.npy,doc_size=10,title_size=None,body_size=None,word_emb_dim=100,word_size=28929,user_num=10338,vert_num=None,subvert_num=None,his_size=50,npratio=4,dropout=0.2,attention_hidden_dim=200,head_num=4,head_dim=100,cnn_activation=relu,dense_activation=None,filter_num=400,window_size=3,vert_emb_dim=100,subvert_emb_dim=100,gru_unit=400,type=ini,user_emb_dim=50,learning_rate=0.0001,loss=cross_entropy_loss,optimizer=adam,epochs=5,batch_size=64,show_step=100000,metrics=['group_auc', 'mean_mrr', 'ndcg@5;10']


In [5]:
iterator = NewsIterator

## Train the NPA model

In [6]:
model = NPAModel(hparams, iterator, seed=seed)

In [7]:
print(model.run_eval(valid_file))


{'group_auc': 0.5102, 'mean_mrr': 0.1631, 'ndcg@5': 0.1525, 'ndcg@10': 0.212}


In [8]:
model.fit(train_file, valid_file)

at epoch 1
train info: logloss loss:1.6397553074116609
eval info: group_auc:0.538, mean_mrr:0.1686, ndcg@10:0.2298, ndcg@5:0.1625
at epoch 1 , train time: 188.6 eval time: 47.3
at epoch 2
train info: logloss loss:1.590539109950163
eval info: group_auc:0.5392, mean_mrr:0.1737, ndcg@10:0.2335, ndcg@5:0.1673
at epoch 2 , train time: 195.4 eval time: 48.9
at epoch 3
train info: logloss loss:1.547255853730805
eval info: group_auc:0.5517, mean_mrr:0.1738, ndcg@10:0.2382, ndcg@5:0.1663
at epoch 3 , train time: 184.6 eval time: 44.6
at epoch 4
train info: logloss loss:1.4370535110940739
eval info: group_auc:0.5578, mean_mrr:0.1745, ndcg@10:0.2414, ndcg@5:0.1679
at epoch 4 , train time: 192.6 eval time: 46.7
at epoch 5
train info: logloss loss:1.3469961764861127
eval info: group_auc:0.5608, mean_mrr:0.1781, ndcg@10:0.2485, ndcg@5:0.1699
at epoch 5 , train time: 154.1 eval time: 37.5
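
To compare epochs without reading the log by eye, the `eval info` lines can be parsed into dictionaries. The sketch below copies the five evaluation lines from the log above; the helper `parse_eval_lines` is hypothetical and assumes the log keeps the `name:value` layout shown here.

```python
import re

# Evaluation lines copied verbatim from the training log above.
LOG = """\
eval info: group_auc:0.538, mean_mrr:0.1686, ndcg@10:0.2298, ndcg@5:0.1625
eval info: group_auc:0.5392, mean_mrr:0.1737, ndcg@10:0.2335, ndcg@5:0.1673
eval info: group_auc:0.5517, mean_mrr:0.1738, ndcg@10:0.2382, ndcg@5:0.1663
eval info: group_auc:0.5578, mean_mrr:0.1745, ndcg@10:0.2414, ndcg@5:0.1679
eval info: group_auc:0.5608, mean_mrr:0.1781, ndcg@10:0.2485, ndcg@5:0.1699
"""

PREFIX = 'eval info:'


def parse_eval_lines(log_text):
    """Return one {metric: value} dict per 'eval info' line, in log order."""
    history = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        # Metric names may contain '@', e.g. 'ndcg@10'.
        pairs = re.findall(r'([\w@]+):([0-9.]+)', line[len(PREFIX):])
        history.append({name: float(value) for name, value in pairs})
    return history


history = parse_eval_lines(LOG)
gain = history[-1]['group_auc'] - history[0]['group_auc']
print("group_auc gain from epoch 1 to epoch 5: {:.4f}".format(gain))
```

On this log, `group_auc` rises at every epoch, while `ndcg@5` dips slightly at epoch 3 before recovering.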


<reco_utils.recommender.newsrec.models.npa.NPAModel at 0x7f217f98c0b8>

In [9]:
res_syn = model.run_eval(valid_file)
print(res_syn)
pm.record("res_syn", res_syn)

{'group_auc': 0.5608, 'mean_mrr': 0.1781, 'ndcg@5': 0.1699, 'ndcg@10': 0.2485}
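
For readers unfamiliar with the ranking metrics reported above, the following sketch computes MRR and nDCG@k for a single impression using their common textbook definitions. It is written in plain Python for illustration and is not the library's own code, which may differ in details such as tie handling. The example labels and scores are made up.

```python
import math


def rank_labels(labels, scores):
    """Reorder the click labels by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr_score(labels, scores):
    """Reciprocal rank averaged over clicked items (assumes at least one click)."""
    ranked = rank_labels(labels, scores)
    rr = sum(label / (rank + 1) for rank, label in enumerate(ranked))
    return rr / sum(labels)


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum((2 ** label - 1) / math.log2(rank + 2)
               for rank, label in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG normalized by the DCG of the ideal ordering (assumes one click or more)."""
    ideal = sorted(labels, reverse=True)
    return dcg_at_k(rank_labels(labels, scores), k) / dcg_at_k(ideal, k)


# Made-up impression: the clicked snack (label 1) is ranked second by the model.
labels = [0, 1, 0, 0]
scores = [0.9, 0.6, 0.3, 0.1]
mrr = mrr_score(labels, scores)
ndcg5 = ndcg_at_k(labels, scores, k=5)
print(mrr)    # 0.5
print(ndcg5)  # 1 / log2(3), roughly 0.631
```

Values near the ones reported above (MRR around 0.18, nDCG@5 around 0.17) indicate that the clicked items are typically ranked well below the top positions.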



## Reference
\[1\] Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. NPA: Neural News Recommendation with Personalized Attention. KDD 2019, ADS track.