# Comparison of Running Efficiency between STAN and Fast STAN
 
Fast STAN speeds up inference and relaxes the input requirements of the original STAN through three changes:
1. Multi-process inference. In my experiments, inference in STAN takes far longer than fitting, so parallelizing inference is necessary when running experiments on a large dataset. I therefore dropped the designs that are not process-safe (e.g., the incremental test-data cache used during inference) so that next-item inference can run in multiple processes.
2. Faster data operations. Running inference with pandas methods is significantly faster than with the built-in methods. In my experiments, it is about **5x faster** than the original STAN within a single process.
3. Support for out-of-order input. The previous implementation required the input data to be ordered by timestamp, and some data would be missed if the input was not ordered by session id. I dropped this requirement at the cost of some fitting efficiency (fitting takes **3x longer**). This is acceptable given the time saved during inference, in addition to the gains in flexibility and conciseness.
 
Please note that multi-process inference can only be performed offline. It gives up some of the online properties of the original STAN in exchange for faster inference. If you need online inference, please refer to the original STAN [code](https://github.com/rn5l/session-rec/blob/master/algorithms/knn/stan.py).
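
To make the offline, multi-process idea concrete, here is a minimal sketch of how test sessions could be spread over worker processes. It is not the code used inside `FastSTAN`; the helpers `split_by_session` and `predict_offline` and the `predict_chunk` callback are hypothetical names introduced only for illustration. The one assumption that matters is that every event of a session lands in the same chunk, so no state has to be shared between processes.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def split_by_session(data, n_chunks, session_key="session"):
    """Split `data` into at most `n_chunks` frames without cutting a session in two."""
    session_ids = data[session_key].unique()
    id_chunks = np.array_split(session_ids, n_chunks)
    # Keep only non-empty chunks (there may be fewer sessions than chunks).
    return [data[data[session_key].isin(ids)] for ids in id_chunks if len(ids) > 0]


def predict_offline(predict_chunk, data, n_workers=4, session_key="session"):
    """Apply `predict_chunk` (hypothetical callback) to each session chunk.

    `predict_chunk` must be a picklable, module-level function that takes a
    DataFrame of whole sessions and returns its predictions.
    """
    chunks = split_by_session(data, n_workers, session_key)
    if n_workers == 1:
        # Serial fallback, handy for debugging.
        return [predict_chunk(chunk) for chunk in chunks]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # One task per chunk; results come back in chunk order.
        return list(pool.map(predict_chunk, chunks))
```

Because each worker only ever sees complete sessions, the results can simply be concatenated afterwards. This is also why the approach is offline only: all test sessions have to be available before they can be split.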

In [1]:
import glob

import numpy as np
import pandas as pd

from stan import STAN, FastSTAN

## Loading Training & Test Data
Here we run the comparison on a large-scale recommendation dataset published by OTTO [(source)](https://www.kaggle.com/competitions/otto-recommender-system/data). It is an extremely large dataset: the training set contains 12899779 sessions with 216716096 events, and the test set contains 1671803 sessions with 6928123 events.

In [2]:
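def load_session(path):
    # Collect every parquet shard that matches the glob pattern.
    files = glob.glob(path)
    data = [pd.read_parquet(f, engine='fastparquet') for f in files]
    data = pd.concat(data, axis=0)
    # Order events by session, then by time within each session.
    # The original per-shard index is kept as-is (the former `.reindex()` call was a no-op).
    data = data.sort_values(by=['session', 'ts'], ascending=[True, True])
    # Convert timestamps from milliseconds to seconds.
    data['ts'] = data['ts'] // 1000
    return data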
    
train_data = load_session('./data/otto/train_*_20.parquet')
test_data  = load_session('./data/otto/test_?_5.parquet')
train_data

| | session | aid | ts | type |
|---|---|---|---|---|
| 0 | 0 | 1517085 | 1659304800 | 0 |
| 1 | 0 | 1563459 | 1659304904 | 0 |
| 2 | 0 | 1309446 | 1659367439 | 0 |
| 3 | 0 | 16246 | 1659367719 | 0 |
| 4 | 0 | 1781822 | 1659367871 | 0 |
| ... | ... | ... | ... | ... |
| 5261220 | 12899776 | 1737908 | 1661723987 | 0 |
| 5261221 | 12899777 | 384045 | 1661723976 | 0 |
| 5261222 | 12899777 | 384045 | 1661723986 | 0 |
| 5261223 | 12899778 | 561560 | 1661723983 | 0 |


Set a skip mask for the original `STAN` so that it only makes a prediction on the last event of each session.

In [3]:
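# Start with every event marked as "skip".
skip_mask    = np.ones(len(test_data), dtype=bool)
# Position of the last event of each session (relies on test_data being sorted by session).
not_skip_idx = test_data.groupby('session').size().values.cumsum() - 1
skip_mask[not_skip_idx] = False
test_data['skip'] = skip_mask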
test_data.head(10)

| | session | aid | ts | type | skip |
|---|---|---|---|---|---|
| 0 | 12899779 | 59625 | 1661724000 | 0 | False |
| 1 | 12899780 | 1142000 | 1661724000 | 0 | True |
| 2 | 12899780 | 582732 | 1661724058 | 0 | True |
| 3 | 12899780 | 973453 | 1661724109 | 0 | True |
| 4 | 12899780 | 736515 | 1661724136 | 0 | True |
| 5 | 12899780 | 1142000 | 1661724155 | 0 | False |
| 6 | 12899781 | 141736 | 1661724000 | 0 | True |
| 7 | 12899781 | 199008 | 1661724022 | 0 | True |
| 8 | 12899781 | 57315 | 1661724170 | 0 | True |
| 9 | 12899781 | 194067 | 1661724246 | 0 | True |
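
The cumulative-sum trick used for the skip mask is easier to follow on a toy frame. The sketch below uses made-up data (`toy` is not part of the dataset) and assumes, as in the notebook, that rows are already sorted by session. It also cross-checks the result against `DataFrame.duplicated(keep="last")`, which marks every row except the last one of each session.

```python
import numpy as np
import pandas as pd

# Toy data: three sessions with 1, 3 and 2 events, sorted by session.
toy = pd.DataFrame({"session": [7, 8, 8, 8, 9, 9], "aid": [1, 2, 3, 4, 5, 6]})

sizes = toy.groupby("session").size().values  # events per session: [1, 3, 2]
last_pos = sizes.cumsum() - 1                 # last row of each session: [0, 3, 5]

skip = np.ones(len(toy), dtype=bool)          # skip everything by default
skip[last_pos] = False                        # except the last event of each session

# Alternative that does not depend on group sizes.
skip_alt = toy.duplicated("session", keep="last").values

print(skip.tolist())  # [False, True, True, False, True, False]
```

Both masks agree on sorted input. The `duplicated` variant only looks at row order, so it may be the safer choice when the sort order is not guaranteed.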


## Comparing Running Time
I compare the running time of model fitting and inference. The results show that the original STAN is 3x faster than FastSTAN in model fitting, but FastSTAN is about 5x faster than the original STAN in next-item inference. On a large test set, FastSTAN can therefore significantly reduce the computing cost.

In [4]:
stan = STAN(k=1500, sample_size=2500, lambda_spw=0.905, 
            lambda_snh=100, lambda_inh=0.4525, extend=True,
            session_key='session', item_key='aid', time_key='ts')
stan.fit(train_data)

In [5]:
%%timeit
for i in test_data.loc[test_data.session == 12899780].itertuples(index=False):
    ret = stan.predict_next(i.session, i.aid, train_data.aid.unique(), timestamp=i.ts, skip=i.skip)

10.6 s ± 384 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [6]:
for i in test_data.loc[test_data.session == 12899780].itertuples(index=False):
    ret = stan.predict_next(i.session, i.aid, train_data.aid.unique(), timestamp=i.ts, skip=i.skip)
ret.sort_values(ascending=False)

1142000    244.889338
582732      14.541701
736515      11.778748
973453       6.289222
1502122      1.526870
              ...    
1610597      0.000000
1501996      0.000000
1460857      0.000000
1135857      0.000000
853626       0.000000
Length: 1855603, dtype: float64

In [7]:
fast_stan = FastSTAN(k=1500, sample_size=2500, sampling='recent', remind=True, 
                     lambda_spw=0.905, lambda_snh=100, lambda_inh=0.4525,
                     session_key='session', item_key='aid', time_key='ts')
fast_stan.fit(train_data)

In [8]:
%%timeit
ret = fast_stan.predict_next(test_data.loc[test_data.session == 12899780], reference=train_data.aid.unique())

2.33 s ± 98.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
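
As a quick sanity check on the "about 5x" figure, the single-process speedup for this one session can be computed directly from the two `%%timeit` means reported above. The variable names are only illustrative, and the numbers are copied from the cell outputs.

```python
# Mean wall-clock time per loop, copied from the %%timeit outputs above.
stan_seconds = 10.6        # original STAN, session 12899780
fast_stan_seconds = 2.33   # FastSTAN, same session

speedup = stan_seconds / fast_stan_seconds
print(f"{speedup:.2f}x")   # 4.55x
```

This is a measurement on a single session in a single process, so it should be read as a rough indication, not as a benchmark.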


In [9]:
ret = fast_stan.predict_next(test_data.loc[test_data.session == 12899780], reference=train_data.aid.unique())
ret.sort_values(ascending=False)

1142000    244.889338
582732      14.541701
736515      11.778748
973453       6.289222
1502122      1.526870
              ...    
1610597      0.000000
1501996      0.000000
1460857      0.000000
1135857      0.000000
853626       0.000000
Length: 1855603, dtype: float64
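
The two score listings above look identical, but comparing 1855603 values by eye is not practical. A small helper such as the following could confirm the match programmatically. `same_scores` is a hypothetical name, and the sketch assumes both models return a `pd.Series` of scores with a unique item-id index, as the outputs above suggest.

```python
import numpy as np
import pandas as pd


def same_scores(a, b, atol=1e-9):
    """Return True if two score Series cover the same items with (nearly) equal scores."""
    # Both results must contain exactly the same item ids.
    if len(a) != len(b) or not a.index.sort_values().equals(b.index.sort_values()):
        return False
    # Align b to a's item order before comparing values.
    b = b.reindex(a.index)
    return bool(np.allclose(a.values, b.values, atol=atol))


# Toy check with made-up scores: same items in a different order.
a = pd.Series([3.0, 1.0, 2.0], index=[30, 10, 20])
b = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30])
print(same_scores(a, b))  # True
```

In the notebook this would be called on the `ret` from the `STAN` cell and the `ret` from the `FastSTAN` cell, after storing them under different names.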