# Loading and Splitting

The loading step instantiates the dataset class the user wants to evaluate their
algorithm on, and then loads its data.

The splitting step takes a setting, which defines the split type, and uses it to
split the loaded dataset into the parts needed for training and evaluation.
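
A single time point split can be thought of as partitioning interactions by timestamp around a cutoff `t`. Interactions before `t` form the background (training) data, and interactions at or after `t` are held out for evaluation. The toy sketch below shows this in plain Python. The `Interaction` tuple and the `split_at` function are hypothetical and are not part of streamsight. The real settings do more than this, for example producing separate unlabeled and ground truth data.

```python
from typing import List, NamedTuple, Tuple


class Interaction(NamedTuple):
    """Hypothetical record: one user-item interaction with a Unix timestamp."""
    user: int
    item: int
    timestamp: int


def split_at(
    interactions: List[Interaction], t: int
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split interactions into (strictly before t, at or after t)."""
    background = [x for x in interactions if x.timestamp < t]
    future = [x for x in interactions if x.timestamp >= t]
    return background, future


data = [
    Interaction(user=0, item=1, timestamp=100),
    Interaction(user=1, item=2, timestamp=200),
    Interaction(user=0, item=3, timestamp=300),
]
background, future = split_at(data, t=200)
print(len(background), len(future))  # → 1 2
```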

In [None]:
from streamsight.setting import SingleTimePointSetting
from streamsight.datasets import AmazonMusicDataset

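# Load the dataset; users can write their own custom dataset class if needed.
# Yelp or Amazon data could also serve as the base instead, whereas
# timestamp-based cutting on MovieLens might be problematic.
dataset = AmazonMusicDataset()
data = dataset.load()

item_user_based = "user"

# Define a setting that splits the data at a single point in time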
setting = SingleTimePointSetting(
    background_t=1406851200,
    delta_after_t=1398556800,
    n_seq_data=1,
    item_user_based=item_user_based
)
# Once a setting is defined, it can be used to split the data. The
# resulting splits are stored as attributes of the setting object
# (e.g. background_data, unlabeled_data and ground_truth_data).
setting.split(data)
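
The time values passed to the setting are Unix timestamps in seconds. Python's standard `datetime` module can be used to build such a value from a calendar date, or to check what an existing value means. For example, `1406851200` corresponds to 2014-08-01 00:00:00 UTC.

```python
from datetime import datetime, timezone

# Convert a calendar date (UTC) into the Unix timestamp used for background_t
background_t = int(datetime(2014, 8, 1, tzinfo=timezone.utc).timestamp())
print(background_t)  # → 1406851200

# Convert back to double-check a timestamp found in existing code
print(datetime.fromtimestamp(1406851200, tz=timezone.utc).isoformat())
# → 2014-08-01T00:00:00+00:00
```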

In [None]:
from streamsight.setting import SlidingWindowSetting

item_user_based = "user"

setting_window = SlidingWindowSetting(
    background_t=1406851200,
    window_size=60 * 60 * 24 * 600,  # 600 days
    n_seq_data=1,
    item_user_based=item_user_based
)
setting_window.split(data)
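
A sliding window setting can be pictured as cutting the timeline after `background_t` into consecutive windows of `window_size` seconds. Here that is 600 days, i.e. 51,840,000 seconds. The sketch below assigns toy timestamps to windows. The `assign_windows` helper is hypothetical and only illustrates the idea. It is not streamsight's implementation.

```python
from collections import defaultdict
from typing import Dict, List

SECONDS_PER_DAY = 60 * 60 * 24


def assign_windows(
    timestamps: List[int], background_t: int, window_size: int
) -> Dict[int, List[int]]:
    """Group timestamps at or after background_t by window index.

    Window i covers [background_t + i * window_size,
                     background_t + (i + 1) * window_size).
    Timestamps before background_t belong to the background data and are skipped.
    """
    windows: Dict[int, List[int]] = defaultdict(list)
    for ts in timestamps:
        if ts >= background_t:
            windows[(ts - background_t) // window_size].append(ts)
    return dict(windows)


background_t = 1406851200
window_size = SECONDS_PER_DAY * 600  # 600 days = 51,840,000 seconds
timestamps = [
    background_t - 1,                # background data, not in any window
    background_t,                    # first second of window 0
    background_t + window_size - 1,  # last second of window 0
    background_t + window_size,      # first second of window 1
]
print(assign_windows(timestamps, background_t, window_size))
```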

# Training the Algorithm

Training a RecSys algorithm is straightforward. Instantiate the class of the
algorithm of choice, then fit the model on the data provided by the setting. The
setting class exposes several public attributes (such as `background_data`,
`unlabeled_data` and `ground_truth_data`) that the programmer can use.

A simple example is shown below.

In [None]:
############# Single global timeline split #############
from streamsight.algorithms import ItemKNN

algo = ItemKNN(K=10)

# Note that the data must be masked before it is fed to the model. The
# rationale is to fix the set of users and items known to the model, so
# that the evaluation is well defined.
setting.background_data.mask_shape()
# each algorithm has a fit method that takes the training data and fits the model
algo.fit(setting.background_data)

# Mask the unlabeled data to the shape of the background data, then predict
setting.unlabeled_data.mask_shape(setting.background_data.shape)
X_pred = algo.predict(setting.unlabeled_data)
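
To give some intuition for what `fit` and `predict` compute, here is a simplified item-based KNN in NumPy. Item-item cosine similarities are computed from a dense user x item matrix. Only the `k` most similar neighbours of each item are kept. A user's score for an item is then the sum of that item's similarities to the items the user interacted with. This is a sketch under those assumptions. streamsight's `ItemKNN` may differ, for example in normalisation and in how it handles sparse data.

```python
import numpy as np


def item_knn_scores(X: np.ndarray, k: int) -> np.ndarray:
    """Toy item-based KNN scores for a dense user x item interaction matrix X."""
    X = np.asarray(X, dtype=float)
    # Cosine similarity between item columns
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # items without interactions: avoid division by zero
    X_normed = X / norms
    sim = X_normed.T @ X_normed
    np.fill_diagonal(sim, 0.0)  # an item is not its own neighbour
    # Keep only the k largest similarities in each column (the item's neighbourhood)
    if k < sim.shape[0]:
        weakest = np.argsort(sim, axis=0)[:-k, :]
        np.put_along_axis(sim, weakest, 0.0, axis=0)
    # Score = sum of similarities to the items each user has interacted with
    return X @ sim


# 3 users x 3 items; 1 means the user interacted with the item
X = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
])
scores = item_knn_scores(X, k=2)
# User 2 only saw item 0, which co-occurs with item 1 but never with item 2
print(scores[2])
```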

# Evaluation

In [None]:
from streamsight.metrics import PrecisionK

# Mask the ground truth data to match the shape of the prediction data.
# Dropping unknown users and items means that only the users and items
# known to the model are evaluated.
setting.ground_truth_data.mask_shape(setting.background_data.shape,
                                     drop_unknown_user=True,
                                     drop_unknown_item=True)

metric = PrecisionK(10)
metric.calculate(setting.ground_truth_data.binary_values, X_pred)
metric.macro_result
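
The masking and the metric can also be illustrated with plain Python sets. In the sketch below, the ground truth is first restricted to users and items whose ids fall inside the known shape, which mirrors the idea behind `drop_unknown_user` and `drop_unknown_item`. Precision@K is then computed per user as the fraction of the top-K recommended items found in that user's ground truth, and the per-user values are averaged (a macro average). The helper names are hypothetical. streamsight's `PrecisionK` may differ in details such as how users are aggregated.

```python
from typing import Dict, List, Set, Tuple


def drop_unknown(
    ground_truth: Dict[int, Set[int]], shape: Tuple[int, int]
) -> Dict[int, Set[int]]:
    """Keep only users and items whose ids fit inside the known (n_users, n_items) shape."""
    n_users, n_items = shape
    masked = {}
    for user, items in ground_truth.items():
        if user >= n_users:
            continue  # unknown user: never seen during training
        known_items = {i for i in items if i < n_items}  # drop unknown items
        if known_items:  # users left with no known items are dropped as well
            masked[user] = known_items
    return masked


def precision_at_k(
    ground_truth: Dict[int, Set[int]], ranked: Dict[int, List[int]], k: int
) -> float:
    """Precision@k per user in ground_truth, macro-averaged over those users."""
    per_user = [
        len(set(ranked.get(user, [])[:k]) & items) / k
        for user, items in ground_truth.items()
    ]
    return sum(per_user) / len(per_user) if per_user else 0.0


# The model was trained on 2 users and 4 items
truth = {0: {1, 3}, 1: {2, 9}, 5: {0}}  # item 9 and user 5 are unknown
truth = drop_unknown(truth, shape=(2, 4))
ranked = {0: [1, 0, 3], 1: [0, 1, 2]}
print(precision_at_k(truth, ranked, k=2))  # → 0.25
```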

# Evaluation for sliding window setting

Evaluation in the sliding window setting is more involved. Instead of the single
`InteractionMatrix` used by the single time point setting, it works with a list of
`InteractionMatrix` objects, one per window.
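
One way to picture sliding window evaluation is as a loop over windows. For each window, the model recommends items using only data observed before that window. The recommendations are scored against that window's interactions, and the window's data is then added to what the model knows. The sketch below shows this loop with a hypothetical most-popular-items baseline and hand-written windows. It illustrates the idea only and is not how streamsight's `Evaluator` is implemented.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Window = List[Tuple[int, int]]  # (user, item) pairs observed in one window


def evaluate_windows(background: Window, windows: List[Window], k: int) -> List[float]:
    """Per-window precision@k of a popularity baseline updated after each window."""
    counts = Counter(item for _, item in background)
    results = []
    for window in windows:
        # Recommend the k most popular items seen so far (same list for every user)
        top_k = set(item for item, _ in counts.most_common(k))
        truth: Dict[int, Set[int]] = {}
        for user, item in window:
            truth.setdefault(user, set()).add(item)
        precision = [len(top_k & items) / k for items in truth.values()]
        results.append(sum(precision) / len(precision) if precision else 0.0)
        # Release this window's data to the model before moving to the next window
        counts.update(item for _, item in window)
    return results


background = [(0, 1), (1, 1), (1, 2)]
windows = [
    [(0, 3), (2, 3), (1, 3)],  # window 0: item 3 suddenly becomes popular
    [(1, 3), (0, 2)],          # window 1: the baseline now recommends item 3
]
print(evaluate_windows(background, windows, k=1))  # → [0.0, 0.5]
```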

# Pipeline to streamline API usage

The pipeline uses an `Evaluator` class to run the entire process, from
instantiating the model to feeding it data and computing the metrics. To build the
`Evaluator`, first create the builder class as shown below. The programmer then
adds the algorithm (model) of interest, the setting created earlier, and the
metric of interest.

Note that multiple algorithms and metrics can be added to the builder, so that
several algorithms can be evaluated on several metrics in a single run.

In [None]:
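from streamsight.evaluator.evaluator_builder import EvaluatorBuilder

# Create the builder, then register the algorithm, setting and metric
builder = EvaluatorBuilder(item_user_based)
builder.add_algorithm("ItemKNNIncremental", {"K": 1})
builder.add_setting(setting_window)
builder.add_metric("PrecisionK")
evaluator = builder.build()

# Run the full pipeline: training, prediction and metric computation
evaluator.run()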
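
Because several algorithms and metrics can be registered, the evaluator can be thought of as running every (algorithm, metric) pair. The toy builder below is a hypothetical, heavily simplified illustration of that builder pattern: named registration, then `build()`, then running the result. It does not reproduce the internals of `EvaluatorBuilder`, and its method signatures are assumptions made for the example.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Recommender = Callable[[int], List[int]]         # user id -> ranked item ids
Metric = Callable[[List[int], Set[int]], float]  # (ranking, truth) -> score
Results = Dict[Tuple[str, str], float]


class ToyEvaluatorBuilder:
    """Hypothetical builder; not streamsight's EvaluatorBuilder."""

    def __init__(self) -> None:
        self._algorithms: Dict[str, Recommender] = {}
        self._metrics: Dict[str, Metric] = {}

    def add_algorithm(self, name: str, algo: Recommender) -> "ToyEvaluatorBuilder":
        self._algorithms[name] = algo
        return self  # returning self allows chained calls

    def add_metric(self, name: str, metric: Metric) -> "ToyEvaluatorBuilder":
        self._metrics[name] = metric
        return self

    def build(self) -> Callable[[Dict[int, Set[int]]], Results]:
        algorithms, metrics = dict(self._algorithms), dict(self._metrics)

        def run(truth: Dict[int, Set[int]]) -> Results:
            # Evaluate every (algorithm, metric) pair, averaging over users
            results: Results = {}
            pairs = product(algorithms.items(), metrics.items())
            for (a_name, algo), (m_name, metric) in pairs:
                scores = [metric(algo(user), items) for user, items in truth.items()]
                results[(a_name, m_name)] = sum(scores) / len(scores) if scores else 0.0
            return results

        return run


def hit_at_2(ranking: List[int], truth: Set[int]) -> float:
    """1.0 if any of the top-2 items is relevant, else 0.0."""
    return float(bool(set(ranking[:2]) & truth))


run = (
    ToyEvaluatorBuilder()
    .add_algorithm("AlwaysZeroOne", lambda user: [0, 1])
    .add_algorithm("AlwaysTwoThree", lambda user: [2, 3])
    .add_metric("HitAt2", hit_at_2)
    .build()
)
print(run({0: {1}, 1: {3}}))
```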