# Using Polara for custom evaluation scenarios

Polara is designed to automate model prototyping and evaluation as much as possible. To that end,
<div class="alert alert-block alert-info">Polara follows a certain data management workflow, aimed at maintaining a consistent and predictable internal state.</div> 

By default, it implements several conventional evaluation scenarios, fully controlled by a set of configuration parameters. A user does not have to worry about anything beyond setting appropriate values for these parameters (a complete list can be obtained by calling the `get_configuration` method of a `RecommenderData` instance). As a result, the input preference data is automatically pre-processed and converted into a convenient representation with independent access to the training and evaluation parts. 

This default behaviour, however, can be flexibly manipulated to run custom scenarios with externally provided evaluation data. This flexibility is achieved with the help of the special `set_test_data` method implemented in the `RecommenderData` class. This guide demonstrates how to use the configuration parameters in conjunction with this method to cover various customizations.

## Prepare data

We will use Movielens-1M data for experimentation. The data will be divided into several parts:
1. *observations*, used for training, 
2. *holdout*, used for evaluating recommendations against the true preferences,
3. *unseen data*, used for warm-start scenarios, where test users with their preferences are not a part of training.

The last two datasets imitate external data sources that are not part of the initial data model.  
Also note that the *holdout* dataset contains items of both known and unseen (warm-start) users.

In [None]:
from __future__ import print_function
import numpy as np
from polara.datasets.movielens import get_movielens_data

In [None]:
seed = 0
def random_state(seed=seed): # to fix random state in experiments
    return np.random.RandomState(seed=seed)

Downloading the data (alternatively you can provide a path to the local copy of the data as an argument to the function):

In [None]:
data = get_movielens_data()

Sampling 5% of the preferences data to form the *holdout* dataset:

In [None]:
data_model.set_test_data(test_users=test_users, warm_start=False)

Recommendations in that case will have a corresponding shape of `number of test users` x `top-n` (by default top-10).

In [None]:
svd.get_recommendations().shape

In [None]:
print((len(test_users), svd.topk))

As the holdout was not provided, its previous state is cleared from the data model:

In [None]:
print(data_model.test.holdout)

The order of test user ids in the recommendations matrix may not correspond to their order in the `test_users` list. The true order can be obtained via the `index` attribute: the users are sorted in ascending order by their internal index, and this order is used to construct the recommendations matrix.

In [None]:
data_model.index.userid.training.query('old in @test_users')

In [None]:
test_users
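The row ordering described above can be illustrated on toy data. This is a minimal sketch, not Polara's code: it assumes a user index table with `old` (external id) and `new` (internal id) columns, and the ids, the `recs` matrix and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical user index: external ids ("old") and internal ids ("new")
user_index = pd.DataFrame({'old': [12, 33, 57, 105], 'new': [0, 1, 2, 3]})
test_users = [57, 12, 105]

# Rows of the recommendations matrix follow ascending internal index
selected = user_index[user_index.old.isin(test_users)].sort_values('new')
row_order = selected.old.tolist()

# Toy recommendations matrix: one row per test user, top-2 items each
recs = np.arange(6).reshape(3, 2)

# Label rows with external ids, then reorder to match `test_users`
recs_by_user = pd.DataFrame(recs, index=row_order)
ordered = recs_by_user.loc[test_users]
print(row_order)
print(ordered)
```

Labeling the rows with external ids first makes any later reordering explicit and avoids relying on the position of a user in the original list.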

Note that **there's no need to provide the *testset* argument in the case of known users**.
All the information about test users' preferences is assumed to be fully present in the training data and the following function call will intentionally raise an error:  
```python
data_model.set_test_data(testset=some_test_data, warm_start=False)
```
If the testset contains new (unseen) information, you should consider the warm-start scenarios, described below.

## Scenario 3: see recommendations for unseen users without evaluation

Let's form a dataset with new users and their preferences:

In [None]:
unseen_data = data_sampled.query('userid in @unseen_users')
unseen_data.shape

In [None]:
assert unseen_data.userid.nunique() == len(unseen_users)
print(len(unseen_users))

None of these users are present in the training:

In competitions like the [*Netflix Prize*](https://en.wikipedia.org/wiki/Netflix_Prize) you may be provided with a dedicated evaluation dataset (a *probe* set), which contains hidden preference information about *known* users. In Polara's terms, this is a *holdout* set.

You can assign this holdout set to the data model by calling the `set_test_data` method as follows:

In [None]:
data_model.set_test_data(holdout=holdout, warm_start=False)

Mind the `warm_start=False` argument, which tells Polara to work only with known users. If some users from holdout are not a part of the training data, they will be filtered out and the corresponding notification message will be displayed (you can turn it off by setting `data_model.verbose=False`). In this example 1129 users were filtered out, as initially the holdout set contained both known and unknown users.

Note that items not present in the training data are also filtered out. This behavior can be changed by setting `data_model.ensure_consistency=False` (not recommended).

In [None]:
data_model.test.holdout.userid.nunique()
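The effect of this filtering can be reproduced on toy data with plain pandas. The sketch below only illustrates the idea and is not Polara's internal implementation; the data frames and variable names are hypothetical.

```python
import pandas as pd

# Toy training data and an external holdout (hypothetical values)
training = pd.DataFrame({'userid': [1, 1, 2, 3],
                         'movieid': [10, 20, 10, 30]})
external_holdout = pd.DataFrame({'userid': [1, 2, 4, 3],
                                 'movieid': [30, 40, 10, 20],
                                 'rating': [5, 3, 4, 4]})

# Keep only the rows whose user AND item are both known from training
known_user = external_holdout.userid.isin(training.userid)
known_item = external_holdout.movieid.isin(training.movieid)
filtered = external_holdout[known_user & known_item]

# Users that would be reported as filtered out
dropped_users = sorted(set(external_holdout.userid) - set(training.userid))
print(filtered)
print(dropped_users)
```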

The recommendation model can now be evaluated:

In [None]:
svd.switch_positive = 4 # treat ratings below 4 as negative feedback
svd.evaluate()

In [None]:
data_model.test.holdout.query('rating>=4').shape[0] # maximum number of possible true_positive hits

In [None]:
svd.evaluate('relevance')
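To make the role of the positive feedback threshold concrete, here is a minimal sketch of counting true positive hits by hand. It assumes that a hit is a recommended item that appears in the user's holdout with a rating at or above the threshold; the toy data and all names are hypothetical, and Polara's own evaluation code may differ in details.

```python
import numpy as np
import pandas as pd

switch_positive = 4  # ratings below this value count as negative feedback

# Toy holdout in internal (0-based) user indexing
toy_holdout = pd.DataFrame({'userid': [0, 0, 1, 1],
                            'movieid': [3, 5, 2, 7],
                            'rating': [5, 2, 4, 3]})

# Toy top-3 recommendations; row i belongs to user i
recs = np.array([[3, 5, 1],
                 [7, 8, 9]])

# Flatten recommendations into (userid, movieid) pairs
n_users, topk = recs.shape
rec_pairs = pd.DataFrame({'userid': np.repeat(np.arange(n_users), topk),
                          'movieid': recs.ravel()})

# Match recommended pairs against the holdout
matched = rec_pairs.merge(toy_holdout, on=['userid', 'movieid'])
true_positive = int((matched.rating >= switch_positive).sum())

# Upper bound on hits: all positively rated holdout items
max_hits = int((toy_holdout.rating >= switch_positive).sum())
print(true_positive, max_hits)
```

In this toy case three recommended items appear in the holdout, but only one of them has a rating of 4 or higher, so only one counts as a true positive.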

## Scenario 2: see recommendations for selected known users without evaluation

Polara also handles cases where you don't have a probe set and the task is simply to generate recommendations for a list of selected test users. The evaluation in that case is performed externally.

Let's randomly pick a few test users from all known users (i.e. those who are present in the training data):

In [None]:
test_users = random_state().choice(users, size=5, replace=False)
test_users

You can provide this list by setting the `test_users` argument of the `set_test_data` method:

In [None]:
data_model.index.userid.training.old.isin(unseen_users).any()

In order to generate recommendations for these users, we assign the dataset of their preferences as a *testset* (mind the *warm_start* argument value):

In [None]:
data_model.set_test_data(testset=unseen_data, warm_start=True)

As we use an SVD-based model, no modifications are needed to generate recommendations: it uses the same analytical formula in both the standard and the warm-start regime.
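A minimal NumPy sketch of why one formula can serve both regimes, assuming a PureSVD-style model in which scores are obtained by projecting a preference vector onto the latent item space and back. Whether `SVDModel` computes exactly this internally is an assumption here, and the toy matrix and the `svd_scores` helper are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
# Toy rating matrix: 6 known users x 5 items
A = rng.randint(0, 6, size=(6, 5)).astype(float)

rank = 2
# Truncated SVD; rows of Vt are the right singular vectors (item factors)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:rank].T  # shape: items x rank

def svd_scores(preferences, item_factors):
    """Project preferences onto the latent item space and back."""
    return preferences.dot(item_factors).dot(item_factors.T)

# Standard regime: preferences of known users are rows of A
known_scores = svd_scores(A, V)

# Warm-start regime: a new user's preferences go through the same formula
new_user = np.array([[5., 0., 3., 0., 0.]])
warm_scores = svd_scores(new_user, V)
print(known_scores.shape, warm_scores.shape)
```

Only the item factors are needed to score a user, so a user who was absent during training can be handled without rebuilding the model.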

Note that internally the `unseen_data` dataset is transformed: users are reindexed starting from 0 and items are reindexed based on the current item index of the training set.
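The transformation can be imitated with plain pandas. This is an illustrative sketch rather than Polara's implementation: the `old`/`new` column layout of the item index, the toy values, and the choice to number users in ascending order of their external ids are all assumptions.

```python
import pandas as pd

# Hypothetical training item index: external id ("old") -> internal id ("new")
item_index = pd.DataFrame({'old': [10, 20, 30], 'new': [0, 1, 2]})

# Toy preferences of two unseen users (all items are known from training)
toy_unseen = pd.DataFrame({'userid': [7, 7, 9, 9],
                           'movieid': [20, 30, 10, 30],
                           'rating': [5, 3, 4, 2]})

reindexed = toy_unseen.copy()

# Items reuse the internal index of the training set
item_map = item_index.set_index('old')['new']
reindexed['movieid'] = reindexed.movieid.map(item_map)

# Users get a fresh index starting from 0
user_codes, user_ids = pd.factorize(reindexed.userid, sort=True)
reindexed['userid'] = user_codes
print(reindexed)
print(list(user_ids))  # position i holds the external id of internal user i
```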

In [None]:
data_sampled = data.sample(frac=0.95, random_state=random_state()).sort_values('userid')

In [None]:
holdout = data[~data.index.isin(data_sampled.index)]

Make 20% of all users unseen during the training phase:

In [None]:
users, unseen_users = np.split(data_sampled.userid.drop_duplicates().values,
                               [int(0.8*data_sampled.userid.nunique()),])

In [None]:
observations = data_sampled.query('userid in @users')
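The user split above relies on `np.split` with a single cut position, which returns the part before the cut and the part after it. A small sketch on hypothetical toy ids shows the resulting proportions and confirms that the two groups do not overlap:

```python
import numpy as np

# Ten hypothetical user ids, already unique
toy_user_ids = np.arange(1, 11)

# Cut position at 80% of the users
cut = int(0.8 * len(toy_user_ids))
toy_known, toy_unseen = np.split(toy_user_ids, [cut])

print(toy_known)   # first 80% of the ids
print(toy_unseen)  # remaining 20%
# The two groups share no users
print(np.intersect1d(toy_known, toy_unseen).size == 0)
```

Since the ids are split by position, the users that end up unseen are the ones at the end of the array, which in the tutorial data is sorted by `userid`.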

## Scenario 0: building a recommender model without any evaluation

This is the simplest case, which lets you skip the evaluation phase entirely. **This sets an initial configuration for all further evaluation scenarios**.

In [None]:
from polara.recommender.data import RecommenderData
from polara.recommender.models import SVDModel

In [None]:
data_model = RecommenderData(observations, 'userid', 'movieid', 'rating', seed=seed)

We will use the `prepare_training_only` method instead of the general `prepare`:

In [None]:
data_model.prepare_training_only()

This sets all the required configuration parameters and transforms the data accordingly.

Let's check that test data is empty,

In [None]:
data_model.test

and the whole input was used as a training part:

In [None]:
data_model.training.shape

In [None]:
observations.shape

Internally, the data was transformed to have a certain numeric representation, which Polara relies on:

In [None]:
data_model.training.head()

In [None]:
observations.head()

<div class="alert alert-block alert-info">The mapping between external and internal data representations is stored in the `data_model.index` attribute.</div>
The transformation can be disabled by setting the `build_index` attribute to `False` before data processing (not recommended).

You can easily build a recommendation model now:

In [None]:
svd = SVDModel(data_model)
svd.build()

However, the recommendations cannot be generated, as there is no testing data. The following function call will raise an error:
```python
svd.get_recommendations()
```

## Scenario 1: evaluation with pre-specified holdout data for known users

In [None]:
data_model.test.testset.head()

In [None]:
data_model.index.userid.test.head() # test user index mapping, new index starts from 0

In [None]:
data_model.index.itemid.head() # item index mapping

In [None]:
unseen_data.head()

## Scenario 4: evaluate recommendations for unseen users with external holdout data

This is the most complete scenario. We generate recommendations based on the test users' preferences, encoded in the `testset`, and evaluate them against the `holdout`. You should use this setup only when Polara's built-in warm-start evaluation pipeline (turned on by `data_model.warm_start=True`) is not sufficient, e.g. when the preference data is fixed and provided externally.

In [None]:
data_model.set_test_data(testset=unseen_data, holdout=holdout, warm_start=True)

As previously, all unrelated users and items are removed from the datasets and the remaining entities are reindexed.