# Demo Pipeline

In this notebook we demonstrate how to use streamsight to build a pipeline
for evaluating RecSys algorithms. If you have not already, please read
[demo.ipynb](demo.ipynb) first, as it outlines how the pipeline works under the hood.

We will use Amazon movie data to showcase the pipeline and some of the common
methods that you can call to evaluate your RecSys algorithms.

To start off, we set the value of k to 10. This means that any top-K metric
or algorithm will consider only the top 10 recommendations.

In [1]:
k = 10
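
To make the role of `k` concrete, here is a minimal sketch of precision and recall at K for a single user. It uses the standard textbook definitions and is not taken from streamsight's own implementation; the function name and the item IDs are made up for illustration.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user.

    recommended: ranked list of item IDs, best first.
    relevant: set of item IDs the user actually interacted with.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


k = 10
recommended = list(range(100, 120))  # hypothetical ranked item IDs
relevant = {101, 105, 113, 999}      # hypothetical ground truth

precision, recall = precision_recall_at_k(recommended, relevant, k)
print(precision, recall)  # 0.2 0.5
```

Only items 101 and 105 fall inside the top 10, so item 113 (ranked 14th) does not count as a hit.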

## Load

We will load the dataset of choice and instantiate the setting on which we want
to evaluate the algorithms. We will use the sliding window setting for this
demo to showcase the results of the evaluation.

As in the demo, we can specify a range of parameters to create different
window sizes.

In [2]:
from streamsight.datasets import AmazonMovieDataset
from streamsight.settings import SlidingWindowSetting

dataset = AmazonMovieDataset(use_default_filters=False)
data = dataset.load()
setting_sliding = SlidingWindowSetting(
    background_t=1530000000,
    window_size=60 * 60 * 24 * 30,  # 30 days, in seconds
    n_seq_data=1,
    top_K=k
)
setting_sliding.split(data)


INFO     - streamsight.datasets.base - AmazonMovieDataset is loading dataset...
INFO     - streamsight.datasets.base - AmazonMovieDataset dataset loaded - Took 12.9s


4it [00:01,  3.96it/s]

INFO     - streamsight.settings.sliding_window_setting - Finished split with window size 2592000 seconds. Number of splits: 4 in total.





## Evaluate

The pipeline abstracts away the evaluation of the algorithms. A builder class
is used to create the pipeline. This is the recommended approach, as it makes
the pipeline easy to modify and easy to reproduce.

Algorithms and metrics are added as shown below. Once the builder is set up,
calling the `build` method returns the pipeline.

The pipeline is run via the `run` method. To run it one step at a time
instead, use the `run_step` method.
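
To illustrate why a builder helps with modification and reproduction, here is a toy sketch of the pattern. `ToyPipeline` and `ToyPipelineBuilder` are hypothetical names invented for this example and do not reflect how `EvaluatorPipelineBuilder` is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPipeline:
    """Immutable configuration produced by the builder (hypothetical)."""
    algorithms: tuple
    metrics: tuple

    def run(self):
        # Enumerate every (algorithm, metric) pair that would be evaluated.
        return [(a, m) for a in self.algorithms for m in self.metrics]


class ToyPipelineBuilder:
    """Collects configuration step by step, then validates it in build()."""

    def __init__(self):
        self._algorithms = []
        self._metrics = []

    def add_algorithm(self, name):
        self._algorithms.append(name)

    def add_metric(self, name):
        self._metrics.append(name)

    def build(self):
        if not self._algorithms or not self._metrics:
            raise ValueError("add at least one algorithm and one metric first")
        return ToyPipeline(tuple(self._algorithms), tuple(self._metrics))


toy_builder = ToyPipelineBuilder()
toy_builder.add_algorithm("Popularity")
toy_builder.add_algorithm("ItemKNNStatic")
toy_builder.add_metric("PrecisionK")
toy_builder.add_metric("RecallK")

toy_pipeline = toy_builder.build()
print(toy_pipeline.run())
```

Because the built object is immutable, repeating the same builder calls yields an equal pipeline, and validation happens once in `build()` rather than in the middle of a run.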

In [3]:
from streamsight.evaluators import EvaluatorPipelineBuilder

builder = EvaluatorPipelineBuilder(ignore_unknown_item=True,
                           ignore_unknown_user=True)
builder.add_setting(setting_sliding)
builder.set_metric_K(k)
builder.add_algorithm("ItemKNNStatic", {"K": k})
builder.add_algorithm("ItemKNNRolling", {"K": k})
builder.add_algorithm("ItemKNNIncremental", {"K": k})
builder.add_algorithm("Popularity", {"K": k})

builder.add_metric("PrecisionK")
builder.add_metric("RecallK")
evaluator = builder.build()

evaluator.run()

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 1: Preparing the evaluator...


  0%|          | 0/4 [00:00<?, ?it/s]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 25%|██▌       | 1/4 [00:47<02:23, 47.99s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 50%|█████     | 2/4 [01:21<01:19, 39.52s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 75%|███████▌  | 3/4 [01:42<00:30, 30.91s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


100%|██████████| 4/4 [01:56<00:00, 29.09s/it]


## Metric Results
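The following metrics are calculated for each algorithm at three levels,
selected with the `level` argument of `metric_results`.

At the window level, the metric is computed over the users evaluated in that
particular window.

At the macro level, every window counts equally: in the tables below, the macro
score equals the unweighted mean of the window scores, and `num_window` reports
how many windows were averaged.

At the micro level, every user counts equally: the score is computed over all
users pooled across windows, which in the tables below equals the window scores
weighted by `num_user`.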

In [4]:
evaluator.metric_results(level="macro")



| Algorithm | Metric | macro_score | num_window |
|---|---|---|---|
| ItemKNNIncremental(K=10) | PrecisionK_10 | 0.002122 | 4 |
| ItemKNNIncremental(K=10) | RecallK_10 | 0.014975 | 4 |
| ItemKNNRolling(K=10) | PrecisionK_10 | 0.000961 | 4 |
| ItemKNNRolling(K=10) | RecallK_10 | 0.007483 | 4 |
| ItemKNNStatic(K=10) | PrecisionK_10 | 0.000961 | 4 |
| ItemKNNStatic(K=10) | RecallK_10 | 0.007483 | 4 |
| Popularity(K=10) | PrecisionK_10 | 0.007126 | 4 |
| Popularity(K=10) | RecallK_10 | 0.063441 | 4 |


In [5]:
evaluator.metric_results(level="micro")



| Algorithm | Metric | micro_score | num_user |
|---|---|---|---|
| ItemKNNIncremental(K=10) | PrecisionK_10 | 0.002233 | 7749 |
| ItemKNNIncremental(K=10) | RecallK_10 | 0.016921 | 7749 |
| ItemKNNRolling(K=10) | PrecisionK_10 | 0.001613 | 7749 |
| ItemKNNRolling(K=10) | RecallK_10 | 0.01186 | 7749 |
| ItemKNNStatic(K=10) | PrecisionK_10 | 0.001613 | 7749 |
| ItemKNNStatic(K=10) | RecallK_10 | 0.01186 | 7749 |
| Popularity(K=10) | PrecisionK_10 | 0.005291 | 7749 |
| Popularity(K=10) | RecallK_10 | 0.043027 | 7749 |


In [6]:
evaluator.metric_results(level="window")



| Algorithm | Timestamp | Metric | window_score | num_user |
|---|---|---|---|---|
| ItemKNNStatic(K=10) | t=1530000000 | PrecisionK_10 | 0.002425 | 4289 |
| ItemKNNStatic(K=10) | t=1530000000 | RecallK_10 | 0.017662 | 4289 |
| ItemKNNStatic(K=10) | t=1532592000 | PrecisionK_10 | 0.000513 | 2532 |
| ItemKNNStatic(K=10) | t=1532592000 | RecallK_10 | 0.00322 | 2532 |
| ItemKNNStatic(K=10) | t=1535184000 | PrecisionK_10 | 0.000905 | 884 |
| ItemKNNStatic(K=10) | t=1535184000 | RecallK_10 | 0.00905 | 884 |
| ItemKNNStatic(K=10) | t=1537776000 | PrecisionK_10 | 0.0 | 44 |
| ItemKNNStatic(K=10) | t=1537776000 | RecallK_10 | 0.0 | 44 |
| ItemKNNRolling(K=10) | t=1530000000 | PrecisionK_10 | 0.002425 | 4289 |
| ItemKNNRolling(K=10) | t=1530000000 | RecallK_10 | 0.017662 | 4289 |
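
The relationship between the three levels can be checked by hand. The sketch below takes the four window-level `PrecisionK_10` rows for `ItemKNNStatic(K=10)` from the table above and aggregates them in two ways, assuming the macro score is the unweighted mean over windows and the micro score is the mean weighted by `num_user`. These assumptions are inferred from the reported numbers, not from the streamsight source.

```python
# (window_score, num_user) for ItemKNNStatic(K=10), PrecisionK_10,
# copied from the window-level table above.
windows = [
    (0.002425, 4289),
    (0.000513, 2532),
    (0.000905, 884),
    (0.0, 44),
]

# Macro: each window counts equally.
macro = sum(score for score, _ in windows) / len(windows)

# Micro: each user counts equally, so windows are weighted by user count.
total_users = sum(n for _, n in windows)
micro = sum(score * n for score, n in windows) / total_users

print(f"macro={macro:.6f} micro={micro:.6f} users={total_users}")
# macro=0.000961 micro=0.001613 users=7749
```

Both values agree with the macro and micro tables above to six decimal places. The last window has only 44 users, so it pulls the macro score down more strongly than the micro score.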


To run the pipeline without ignoring unknown items and users, set `ignore_unknown_item`
and `ignore_unknown_user` to `False`. The default value for both parameters is `True`.

Below are the evaluation results for the different algorithms at the micro and macro
levels. Note that more users are evaluated here than in the run above, where unknown
items and users were ignored.

Note that the dataset must be loaded again, because running the evaluator modifies
the data held in the setting object.

In [7]:
from streamsight.datasets import AmazonMovieDataset
from streamsight.settings import SlidingWindowSetting

dataset = AmazonMovieDataset(use_default_filters=False)
data = dataset.load()
setting_sliding = SlidingWindowSetting(
    background_t=1530000000,
    window_size=60 * 60 * 24 * 30,  # 30 days, in seconds
    n_seq_data=1,
    top_K=k
)
setting_sliding.split(data)


from streamsight.evaluators import EvaluatorPipelineBuilder

builder = EvaluatorPipelineBuilder(ignore_unknown_user=False,
                           ignore_unknown_item=False)
builder.add_setting(setting_sliding)
builder.set_metric_K(k)
builder.add_algorithm("ItemKNNIncremental", {"K": k})
builder.add_algorithm("Popularity", {"K": k})

builder.add_metric("PrecisionK")
builder.add_metric("RecallK")
evaluator = builder.build()

evaluator.run()

INFO     - streamsight.datasets.base - AmazonMovieDataset is loading dataset...
INFO     - streamsight.datasets.base - AmazonMovieDataset dataset loaded - Took 12.5s


4it [00:01,  3.80it/s]

INFO     - streamsight.settings.sliding_window_setting - Finished split with window size 2592000 seconds. Number of splits: 4 in total.
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 1: Preparing the evaluator...



  0%|          | 0/4 [00:00<?, ?it/s]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 25%|██▌       | 1/4 [00:48<02:24, 48.31s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 50%|█████     | 2/4 [01:22<01:20, 40.08s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


 75%|███████▌  | 3/4 [01:40<00:29, 29.90s/it]

INFO     - streamsight.evaluators.evaluator_pipeline - Phase 2: Evaluating the algorithms...
INFO     - streamsight.evaluators.evaluator_pipeline - Phase 3: Releasing the data...


100%|██████████| 4/4 [01:52<00:00, 28.25s/it]


In [10]:
evaluator.metric_results(level="micro")



| Algorithm | Metric | micro_score | num_user |
|---|---|---|---|
| ItemKNNIncremental(K=10) | PrecisionK_10 | 0.00103 | 16794 |
| ItemKNNIncremental(K=10) | RecallK_10 | 0.007778 | 16794 |
| Popularity(K=10) | PrecisionK_10 | 0.006389 | 16794 |
| Popularity(K=10) | RecallK_10 | 0.054041 | 16794 |


In [9]:
evaluator.metric_results(level="macro")

| Algorithm | Metric | macro_score | num_window |
|---|---|---|---|
| ItemKNNIncremental(K=10) | PrecisionK_10 | 0.00093 | 4 |
| ItemKNNIncremental(K=10) | RecallK_10 | 0.006662 | 4 |
| Popularity(K=10) | PrecisionK_10 | 0.00844 | 4 |
| Popularity(K=10) | RecallK_10 | 0.076742 | 4 |
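
The increase in `num_user` from 7749 to 16794 can be pictured with a small sketch. It assumes that an "unknown" user is one with no interactions in the data the algorithm was trained on. That reading, the helper name, and the user IDs are illustrative assumptions rather than streamsight's documented behaviour.

```python
def users_to_evaluate(window_users, train_users, ignore_unknown_user):
    """Pick the users that contribute to a window's metric (hypothetical helper)."""
    if ignore_unknown_user:
        # Keep only users the algorithm has already seen.
        return [u for u in window_users if u in train_users]
    return list(window_users)


train_users = {"u1", "u2", "u3"}         # hypothetical users seen during training
window_users = ["u2", "u3", "u4", "u5"]  # hypothetical users active in the window

known_only = users_to_evaluate(window_users, train_users, ignore_unknown_user=True)
everyone = users_to_evaluate(window_users, train_users, ignore_unknown_user=False)
print(len(known_only), len(everyone))  # 2 4
```

Under this assumption, turning the flag off can only keep or increase the number of evaluated users, which is consistent with the larger `num_user` reported above.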
