#### Notebook configs

```python
import warnings
warnings.filterwarnings('ignore')

from IPython.core.interactiveshell import InteractiveShell
# Display the value of every bare expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = "all"

from IPython.display import display, HTML
# Widen the notebook container to 90% of the browser window
display(HTML("<style>.container { width:90% !important; }</style>"))
```

# Overview 

Many recommendation system methods, like matrix factorisation, are developed with the assumption that it is possible to build and use long-term user profiles to produce recommendations. 

This assumption does not always hold, for several reasons:
1) Identifying and tracking users by unique IDs may not be an option for small to medium-sized companies.
2) Browser fingerprints and cookies may be unstable across different environments.
3) A user may visit the site with a different intention each time. *E.g. a user may visit a retailer one day for travel goods and the next for stationery.*


### Session-based recommendation system 
In a session-based recommendation system, the recommendation is based solely on the user's interactions within the current **contextual session**, ignoring the user's long-term profile.

The challenge is therefore:<br>
**How can one infer a user's implied interest from the relatively short but complex interaction pattern of a given session?**
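
A common way to make this concrete is to treat it as next-item prediction. A session is an ordered list of item ids, and each prefix of the session becomes a training input whose label is the item that follows it. The sketch below assumes that sessions are plain Python lists of item ids. `session_to_examples` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

def session_to_examples(session: List[int]) -> List[Tuple[List[int], int]]:
    """Split one session into (prefix, next_item) training pairs."""
    # Each prefix of length 1..n-1 is an input; the item right after it is the label
    return [(session[:i], session[i]) for i in range(1, len(session))]

for prefix, target in session_to_examples([10, 20, 30, 40]):
    print(prefix, "->", target)
# [10] -> 20
# [10, 20] -> 30
# [10, 20, 30] -> 40
```

A session with a single click yields no examples, so such sessions are usually filtered out before training.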


### Approach
This notebook demonstrates a *graph neural network* approach to the task, implemented with [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/), a library for graph-based machine learning built on PyTorch.
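
A graph-based model first needs each session turned into a graph. One common construction, used for example by SR-GNN, makes every distinct item in the session a node and adds a directed edge for each consecutive pair of clicks. The sketch below builds the node list and a `[sources, targets]` edge list in plain Python, which matches the `edge_index` layout that PyTorch Geometric expects. `session_to_graph` is an illustrative name. Collapsing repeated transitions into one edge is a simplifying assumption here, since SR-GNN instead weights each edge by its occurrence count divided by the out-degree of its start node.

```python
from typing import Dict, List, Set, Tuple

def session_to_graph(session: List[int]) -> Tuple[List[int], List[List[int]]]:
    """Turn an ordered session of item ids into (nodes, edge_index).

    nodes      -- distinct item ids, in order of first appearance
    edge_index -- [sources, targets] as indices into `nodes`, one directed
                  edge per distinct consecutive transition
    """
    nodes: List[int] = []
    position: Dict[int, int] = {}
    for item in session:
        if item not in position:
            position[item] = len(nodes)
            nodes.append(item)

    seen: Set[Tuple[int, int]] = set()
    sources: List[int] = []
    targets: List[int] = []
    for a, b in zip(session, session[1:]):
        edge = (position[a], position[b])
        if edge not in seen:  # collapse repeated transitions into one edge
            seen.add(edge)
            sources.append(edge[0])
            targets.append(edge[1])
    return nodes, [sources, targets]

# A session that revisits item 20: 10 -> 20 -> 30 -> 20 -> 40
nodes, edge_index = session_to_graph([10, 20, 30, 20, 40])
print(nodes)       # [10, 20, 30, 40]
print(edge_index)  # [[0, 1, 2, 1], [1, 2, 1, 3]]
```

With PyTorch Geometric installed, `edge_index` can be converted with `torch.tensor(edge_index, dtype=torch.long)` and passed to `torch_geometric.data.Data` together with node features, for example item embeddings looked up by `nodes`.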

#### Background on approach
Before diving into the modelling, let's review some of the previous approaches to session-based recommendation.

<br>**ML-based**
1) Matrix factorisations <br>
👍
<br>
👎
2) Item-based neighbourhood methods <br>
👍
<br>
👎
3) Markov chain methods <br>
👍
<br>
👎
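
Two of these baselines are easy to sketch. A first-order Markov chain recommender scores candidates by how often they directly followed the current item. A simple item-based neighbourhood recommender scores candidates by how many training sessions contain both items. Both sketches below work on sessions given as lists of item ids. The class names and the raw-count scoring are simplifying assumptions rather than the exact published formulations, which typically normalise these counts (e.g. cosine similarity for item-kNN).

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List

class MarkovChainRecommender:
    """First-order Markov chain: rank next items by transition counts from the last item."""

    def fit(self, sessions: List[List[int]]) -> "MarkovChainRecommender":
        self.transitions: Dict[int, Counter] = defaultdict(Counter)
        for s in sessions:
            for a, b in zip(s, s[1:]):
                self.transitions[a][b] += 1
        return self

    def recommend(self, session: List[int], k: int = 3) -> List[int]:
        # Under the first-order assumption only the last clicked item matters
        counts = self.transitions.get(session[-1], Counter())
        return [item for item, _ in counts.most_common(k)]

class ItemKNNRecommender:
    """Item-based neighbourhood: items are similar if they co-occur in training sessions."""

    def fit(self, sessions: List[List[int]]) -> "ItemKNNRecommender":
        self.cooc: Dict[int, Counter] = defaultdict(Counter)
        for s in sessions:
            for a, b in combinations(sorted(set(s)), 2):
                self.cooc[a][b] += 1
                self.cooc[b][a] += 1
        return self

    def recommend(self, session: List[int], k: int = 3) -> List[int]:
        # Rank neighbours of the last item, skipping items already in the session
        ranked = self.cooc.get(session[-1], Counter()).most_common()
        return [item for item, _ in ranked if item not in session][:k]

train = [[1, 2, 3], [1, 2], [1, 3], [1, 2, 4], [4, 5]]
print(MarkovChainRecommender().fit(train).recommend([1], k=2))  # [2, 3]
print(ItemKNNRecommender().fit(train).recommend([1], k=2))      # [2, 3]
```

The Markov chain captures order but only looks one step back. The neighbourhood method ignores order entirely. These limitations are part of what motivates the sequence and graph models below.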

**DL-based**
1) Recurrent neural networks; [GRU4Rec](https://github.com/hidasib/GRU4Rec) <br>
👍 
<br>
👎  
2) Attentive recurrent neural network; [NARM](https://github.com/Wang-Shuo/Neural-Attentive-Session-Based-Recommendation-PyTorch) <br>
👍
<br>
👎
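
Both DL-based models share the same skeleton. They embed the clicked items, run a recurrent network over them, and then score every candidate item against the resulting session representation. The NumPy sketch below shows only an untrained forward pass with assumed toy sizes. A single bias-free GRU produces the hidden states. A GRU4Rec-style head scores items from the final state. A NARM-style head also adds an attention-weighted sum of all hidden states. The weight names, dimensions, and random initialisation are illustrative assumptions. The linked implementations add training losses, mini-batching, and other details omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d_emb, d_hid = 50, 16, 32  # assumed toy sizes

def init(*shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(scale=0.1, size=shape)

E = init(n_items, d_emb)                                # item embedding table
W_z, W_r, W_h = (init(d_hid, d_emb) for _ in range(3))  # input-to-hidden weights
U_z, U_r, U_h = (init(d_hid, d_hid) for _ in range(3))  # hidden-to-hidden weights
W_out = init(n_items, d_hid)                            # GRU4Rec-style output layer
A1, A2 = init(d_hid, d_hid), init(d_hid, d_hid)         # NARM-style attention projections
v = init(d_hid)                                         # NARM-style attention vector
B = init(d_emb, 2 * d_hid)                              # NARM-style bilinear decoder

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru(session):
    """Run a bias-free GRU over the session's item embeddings; return every hidden state."""
    h = np.zeros(d_hid)
    states = []
    for item in session:
        x = E[item]
        z = sigmoid(W_z @ x + U_z @ h)             # update gate
        r = sigmoid(W_r @ x + U_r @ h)             # reset gate
        h_cand = np.tanh(W_h @ x + U_h @ (r * h))  # candidate state
        h = (1 - z) * h + z * h_cand
        states.append(h)
    return np.stack(states)                        # shape (len(session), d_hid)

def score_gru(session):
    """GRU4Rec-style: map the final hidden state to one score per item."""
    return W_out @ gru(session)[-1]                # shape (n_items,)

def score_narm(session):
    """NARM-style: combine the final state (global) with attention over all states (local)."""
    H = gru(session)
    h_t = H[-1]
    # alpha_j = v^T sigmoid(A1 h_t + A2 h_j), used directly as weights (no softmax)
    alpha = sigmoid(A1 @ h_t + H @ A2.T) @ v       # shape (len(session),)
    c = np.concatenate([h_t, alpha @ H])           # shape (2 * d_hid,)
    return E @ (B @ c)                             # bilinear score per item, shape (n_items,)

session = [3, 17, 8, 17]                           # item ids must be < n_items here
print(score_gru(session).shape, score_narm(session).shape)  # (50,) (50,)
```

Only the last hidden state feeds the GRU4Rec-style head. The attention term lets the NARM-style head weight earlier clicks that may better reflect the user's main purpose in the session.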
### Dataset
The dataset used in this notebook is the Retailrocket e-commerce dataset, available [here](https://www.kaggle.com/datasets/retailrocket/ecommerce-dataset). It consists of e-commerce data: web events (views, add-to-cart actions and transactions), item properties (including text values) and the category tree.

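```python
import pandas as pd
from pathlib import Path

data_path = Path.cwd() / "data"
# Behaviour data: one row per view / addtocart / transaction event
df_events = pd.read_csv(data_path / "events.csv")
df_events
# Category tree: each category id with its parent category id
df_cats = pd.read_csv(data_path / "category_tree.csv")
df_cats
# Item properties are split across two files; load the first part
df_p = pd.read_csv(data_path / "item_properties_part1.csv")
df_p
```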

|  | timestamp | visitorid | event | itemid | transactionid |
|---|---|---|---|---|---|
| 0 | 1433221332117 | 257597 | view | 355908 |  |
| 1 | 1433224214164 | 992329 | view | 248676 |  |
| 2 | 1433221999827 | 111016 | view | 318965 |  |
| 3 | 1433221955914 | 483717 | view | 253185 |  |
| 4 | 1433221337106 | 951259 | view | 367447 |  |
| ... | ... | ... | ... | ... | ... |
| 2756096 | 1438398785939 | 591435 | view | 261427 |  |
| 2756097 | 1438399813142 | 762376 | view | 115946 |  |
| 2756098 | 1438397820527 | 1251746 | view | 78144 |  |
| 2756099 | 1438398530703 | 1184451 | view | 283392 |  |


|  | categoryid | parentid |
|---|---|---|
| 0 | 1016 | 213.0 |
| 1 | 809 | 169.0 |
| 2 | 570 | 9.0 |
| 3 | 1691 | 885.0 |
| 4 | 536 | 1691.0 |
| ... | ... | ... |
| 1664 | 49 | 1125.0 |
| 1665 | 1112 | 630.0 |
| 1666 | 1336 | 745.0 |
| 1667 | 689 | 207.0 |


|  | timestamp | itemid | property | value |
|---|---|---|---|---|
| 0 | 1435460400000 | 460429 | categoryid | 1338 |
| 1 | 1441508400000 | 206783 | 888 | 1116713 960601 n277.200 |
| 2 | 1439089200000 | 395014 | 400 | n552.000 639502 n720.000 424566 |
| 3 | 1431226800000 | 59481 | 790 | n15360.000 |
| 4 | 1431831600000 | 156781 | 917 | 828513 |
| ... | ... | ... | ... | ... |
| 10999994 | 1439694000000 | 86599 | categoryid | 618 |
| 10999995 | 1435460400000 | 153032 | 1066 | n1020.000 424566 |
| 10999996 | 1440298800000 | 421788 | 888 | 35975 856003 37346 |
| 10999997 | 1437879600000 | 159792 | 400 | n552.000 639502 n720.000 424566 |
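
The events table has no session column, so sessions have to be derived from it. A common heuristic starts a new session whenever a visitor has been inactive for more than 30 minutes. This heuristic is an assumption here and is not prescribed by the dataset. The sketch also assumes `timestamp` is in Unix epoch milliseconds, as the 13-digit values above suggest. The helper `add_session_ids` and the 30-minute default are illustrative choices.

```python
import pandas as pd

def add_session_ids(events: pd.DataFrame, gap_minutes: int = 30) -> pd.DataFrame:
    """Assign a session id to every event; a new session starts after `gap_minutes` of inactivity."""
    out = events.sort_values(["visitorid", "timestamp"]).copy()
    out["time"] = pd.to_datetime(out["timestamp"], unit="ms")
    # Time since the same visitor's previous event (NaT for each visitor's first event)
    delta = out.groupby("visitorid")["time"].diff()
    new_session = delta.isna() | (delta > pd.Timedelta(minutes=gap_minutes))
    # Running count of session starts gives a globally unique (1-based) session id
    out["session_id"] = new_session.cumsum()
    return out

toy = pd.DataFrame({
    "timestamp": [0, 60_000, 3_600_000, 0],  # visitor 1 at 0, 1 and 60 minutes; visitor 2 at 0
    "visitorid": [1, 1, 1, 2],
    "itemid": [10, 20, 30, 40],
})
sessions = add_session_ids(toy)
print(sessions["session_id"].tolist())                                # [1, 1, 2, 3]
print(sessions.groupby("session_id")["itemid"].apply(list).tolist())  # [[10, 20], [30], [40]]
```

Because rows are sorted by time within each visitor, the grouped item lists come out in click order. That is the form the next-item models expect.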


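```python
# Group the category tree by parent: each parent id maps to the list of its direct children
x = df_cats.groupby(by="parentid")["categoryid"].apply(list)
x
```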

```text
parentid
8.0                              [397, 1230, 681, 1225, 70]
9.0                        [570, 1295, 142, 625, 916, 1189]
14.0                [789, 1573, 1117, 1184, 892, 165, 1415]
19.0                                                 [1297]
20.0                              [35, 295, 973, 1153, 928]
                                ...                        
1687.0    [108, 1512, 32, 294, 68, 443, 906, 922, 79, 10...
1691.0                                          [536, 1152]
1692.0                                    [1198, 785, 1661]
1696.0                                    [1059, 773, 1148]
1698.0             [1160, 110, 1678, 1034, 1582, 1502, 760]
Name: categoryid, Length: 362, dtype: object
```
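
The grouped view lists each parent's direct children. For item features such as a top-level category, it is often more useful to walk the tree upwards. The sketch below assumes the `categoryid`/`parentid` columns shown above. It also assumes that root categories have a missing `parentid`, which is presumably why `parentid` is displayed as a float. `category_path` is a hypothetical helper name.

```python
import pandas as pd

def category_path(cats: pd.DataFrame, category_id: int) -> list:
    """Return [category_id, parent, grandparent, ...] up to a root category."""
    # Map child -> parent, skipping rows without a parent (assumed to be roots)
    parent_of = {
        int(child): int(parent)
        for child, parent in zip(cats["categoryid"], cats["parentid"])
        if pd.notna(parent)
    }
    path = [category_id]
    # Walk upwards; the membership check also guards against accidental cycles
    while path[-1] in parent_of and parent_of[path[-1]] not in path:
        path.append(parent_of[path[-1]])
    return path

# Three rows taken from the category tree output above
cats = pd.DataFrame({"categoryid": [1016, 1691, 536], "parentid": [213.0, 885.0, 1691.0]})
print(category_path(cats, 536))  # [536, 1691, 885]
```

On the full tree, `category_path(df_cats, some_id)[-1]` would give the root of a category. In practice, `parent_of` should be built once rather than on every call.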