#### Notebook configs

In [8]:
import warnings
warnings.filterwarnings('ignore')
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

# Overview 

Many recommendation system methods, like matrix factorisation, are developed with the assumption that it is possible to build and use long-term user profiles to produce recommendations. 

This assumption often does not hold, for several reasons:
1) Identifying and tracking users by unique IDs may not be an option for small to medium-sized companies
2) Browser fingerprints and cookies may be unstable across different environments
3) A user may visit the site with a different intention on each visit. *E.g. a user may visit a seller one day for travel goods and the next for stationery*


### Session-based recommendation system 
With session-based recommendation systems, the recommendation is based solely on the user's interactions within the **current session**, ignoring the user's long-term profile.

The challenge is therefore:<br>
**how can one infer a user's implied interest from the relatively short but complex interaction pattern in a given session?**


### Approach
This notebook demonstrates a *graph-based neural network* approach to the task, implemented with [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/), a graph machine learning library built on PyTorch.
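
To make the graph-based idea concrete, here is a minimal sketch of one common way to turn a session's click sequence into a directed item graph. It is an illustration under assumptions, not necessarily the construction used later in this notebook: the helper name `session_to_graph` and the toy item ids are hypothetical, and duplicate transitions are simply collapsed into one edge.

```python
def session_to_graph(session):
    """Turn one session (an ordered list of item ids) into a small directed graph.

    Returns (nodes, edge_index): `nodes` lists the unique item ids in order of
    first appearance, and `edge_index` is [sources, targets] expressed as
    positions in `nodes`.
    """
    # Map each unique item to a node position, in order of first appearance.
    node_index = {}
    for item in session:
        node_index.setdefault(item, len(node_index))
    nodes = list(node_index)

    # One directed edge per consecutive pair of clicks; repeats are collapsed.
    edges = []
    for src, dst in zip(session, session[1:]):
        edge = (node_index[src], node_index[dst])
        if edge not in edges:
            edges.append(edge)

    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    return nodes, [sources, targets]


# Hypothetical session: the visitor returns to item 20 before moving on to 40.
nodes, edge_index = session_to_graph([10, 20, 30, 20, 40])
print(nodes)       # [10, 20, 30, 40]
print(edge_index)  # [[0, 1, 2, 1], [1, 2, 1, 3]]
```

The `[sources, targets]` layout mirrors the `[2, num_edges]` shape that PyTorch Geometric expects for `edge_index`, so the two lists could be wrapped in a `torch.tensor(..., dtype=torch.long)` and passed to `torch_geometric.data.Data`.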

#### Background on approach
Before diving into the modelling, let's review some of the previous approaches to the task of session-based recommendation systems.

<br>**ML-based**
1) Matrix factorisations <br>
👍
<br>
👎
2) Item-based neighborhood methods <br>
👍
<br>
👎
3) Markov chain methods <br>
👍
<br>
👎

**DL-based**
1) Recurrent neural networks; [GRU4Rec](https://github.com/hidasib/GRU4Rec#:~:text=Notifications-,GRU4Rec%20is%20the%20original%20Theano%20implementation%20of%20the%20algorithm%20in,for%20execution%20on%20the%20GPU.) <br>
👍
<br>
👎
2) Attentive recurrent neural network; [NARM](https://github.com/Wang-Shuo/Neural-Attentive-Session-Based-Recommendation-PyTorch) <br>
👍
<br>
👎

### Dataset
The dataset used in this notebook can be found [here](https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset). It is composed of ecommerce data: web events, item properties (with texts) and category tree information. 

In [29]:
import pandas as pd 
from pathlib import Path 
data_path = Path().cwd() / "data" 
df_e = pd.read_csv(data_path / "events.csv")
df_cats = pd.read_csv(data_path / "category_tree.csv")
df_p = pd.read_csv(data_path / "item_properties_part1.csv")

In [27]:
# Collect the child categories of each parent category
x = df_cats.groupby(by="parentid")['categoryid'].apply(list)
x

parentid
8.0                              [397, 1230, 681, 1225, 70]
9.0                        [570, 1295, 142, 625, 916, 1189]
14.0                [789, 1573, 1117, 1184, 892, 165, 1415]
19.0                                                 [1297]
20.0                              [35, 295, 973, 1153, 928]
                                ...                        
1687.0    [108, 1512, 32, 294, 68, 443, 906, 922, 79, 10...
1691.0                                          [536, 1152]
1692.0                                    [1198, 785, 1661]
1696.0                                    [1059, 773, 1148]
1698.0             [1160, 110, 1678, 1034, 1582, 1502, 760]
Name: categoryid, Length: 362, dtype: object

In [36]:
df_p[df_p.property == "categoryid"]

index,timestamp,itemid,property,value
0,1435460400000,460429,categoryid,1338
140,1432436400000,281245,categoryid,1277
151,1435460400000,35575,categoryid,1059
189,1437274800000,8313,categoryid,1147
197,1437879600000,55102,categoryid,47
...,...,...,...,...
10999880,1432436400000,441523,categoryid,1167
10999917,1433041200000,250848,categoryid,769
10999932,1438484400000,116380,categoryid,1509
10999960,1431226800000,84186,categoryid,209


In [59]:
df_e.groupby(by=["visitorid",'itemid']).aggregate(list)

visitorid,itemid,timestamp,event,transactionid
0,67045,[1442004917175],[view],[nan]
0,285930,[1442004589439],[view],[nan]
0,357564,[1442004759591],[view],[nan]
1,72028,[1439487966444],[view],[nan]
2,216305,"[1438970468920, 1438971463170]","[view, view]","[nan, nan]"
...,...,...,...,...
1407575,121220,[1433972768922],[view],[nan]
1407576,356208,[1433343689991],[view],[nan]
1407577,427784,[1431899284867],[view],[nan]
1407578,188736,[1431825683288],[view],[nan]


In [43]:
df_e

index,timestamp,visitorid,event,itemid,transactionid
0,1433221332117,257597,view,355908,
1,1433224214164,992329,view,248676,
2,1433221999827,111016,view,318965,
3,1433221955914,483717,view,253185,
4,1433221337106,951259,view,367447,
...,...,...,...,...,...
2756096,1438398785939,591435,view,261427,
2756097,1438399813142,762376,view,115946,
2756098,1438397820527,1251746,view,78144,
2756099,1438398530703,1184451,view,283392,
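
The events table carries a `visitorid` and a millisecond `timestamp` but no explicit session id, so sessions have to be derived before any session-based model can be trained. The sketch below shows one possible way to do that with pandas: a new session starts whenever a visitor has been inactive for longer than a threshold. The 30-minute gap, the helper name `add_session_ids` and the toy data are all assumptions made for illustration, not values taken from this notebook or the dataset.

```python
import pandas as pd

# Assumed inactivity threshold (30 minutes) in milliseconds; tune as needed.
SESSION_GAP_MS = 30 * 60 * 1000


def add_session_ids(events, gap_ms=SESSION_GAP_MS):
    """Return a copy of `events` sorted by visitor and time, with a `session_id` column."""
    events = events.sort_values(["visitorid", "timestamp"]).reset_index(drop=True)
    # Time since the same visitor's previous event (NaN for a visitor's first event).
    gap = events.groupby("visitorid")["timestamp"].diff()
    # A session starts at a visitor's first event or after a long pause.
    new_session = gap.isna() | (gap > gap_ms)
    # Running count of session starts gives a global, zero-based session id.
    events["session_id"] = new_session.cumsum() - 1
    return events


# Toy events with the same columns as events.csv (values are made up).
toy = pd.DataFrame({
    "timestamp": [0, 60_000, 4_000_000, 10_000, 20_000],
    "visitorid": [1, 1, 1, 2, 2],
    "event": ["view"] * 5,
    "itemid": [11, 12, 13, 21, 22],
})
sessions = add_session_ids(toy)
# Ordered item sequence per session, ready to be turned into model inputs.
session_items = sessions.groupby("session_id")["itemid"].apply(list)
print(session_items)
```

On the real data the same call would be `add_session_ids(df_e)`. Very short sessions (for example, a single event) carry no transition information, so they may be worth filtering out afterwards.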
