# Train our first movie recommender

In this tutorial, we build our first recommender system using a simple algorithm called [P3alpha](https://nms.kcl.ac.uk/colin.cooper/papers/recommender-rw.pdf).

We will learn:

- How to represent an implicit feedback dataset as a sparse matrix.
- How to fit `irspack`'s models using the sparse matrix representation.
- How to make a recommendation using our API.

In [1]:
import numpy as np
import scipy.sparse as sps

from irspack.dataset.movielens import MovieLens1MDataManager
from irspack.recommenders import P3alphaRecommender

## Read Movielens 1M dataset

We first load the [Movielens1M](https://grouplens.org/datasets/movielens/1m/) dataset. The first time you load it, you will be asked for permission to download the dataset.

In [None]:
loader = MovieLens1MDataManager(force_download=True)

df = loader.read_interaction()
df.head()

`df` stores the users' watch event history.

Although rating information is available in this case, we will not use that column. What matters to an implicit-feedback-based recommender system is "which user interacted with which item (movie)".

We can also use `loader` to read the dataframe of movie metadata:

In [None]:
movies = loader.read_item_info()
movies.head()

## Represent your data as a sparse matrix

We represent the data as a sparse matrix $X$, whose element $X_{ui}$ is given by

$$
X_{ui} = \begin{cases}
1 & \text{if the user }u\text{ has watched the item (movie) } i \\
0 & \text{otherwise}
\end{cases}
$$

For this purpose, we use the `np.unique` function with `return_inverse=True`.
This returns a tuple that consists of

1. The sorted array of unique user/movie ids appearing in the original user/movie id array
2. An index array that maps each element of the original user/movie id array to its position in array 1.

So if we do

In [None]:
unique_user_ids, user_index = np.unique(df.userId, return_inverse=True)
unique_movie_ids, movie_index = np.unique(df.movieId, return_inverse=True)

then ``unique_user_ids[user_index]`` and ``unique_movie_ids[movie_index]`` are equal to the original arrays:

In [None]:
assert np.all( unique_user_ids[user_index] == df.userId.values )
assert np.all( unique_movie_ids[movie_index] == df.movieId.values )

Thus, we can think of ``user_index`` and ``movie_index`` as representing the row and column positions of non-zero elements, respectively.
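To make the role of the inverse index concrete, here is a minimal sketch on a made-up id array (the values in `raw_ids` are purely illustrative and do not come from the dataset):

```python
import numpy as np

# Hypothetical raw ids, in the order they appear in an interaction log.
raw_ids = np.array([10, 30, 10, 20])

unique_ids, index = np.unique(raw_ids, return_inverse=True)

# unique_ids holds the sorted distinct ids: [10, 20, 30]
# index holds, for each raw id, its position in unique_ids: [0, 2, 0, 1]
print(unique_ids, index)

# Indexing unique_ids with index reconstructs the original array.
assert np.all(unique_ids[index] == raw_ids)
```

Because `index` only takes values from 0 to `len(unique_ids) - 1`, it can be used directly as a row or column position.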

Now $X$ can be constructed as [scipy's sparse csr matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) as follows.

In [None]:
X = sps.csr_matrix(
    (
        np.ones(df.shape[0]), # values of non-zero elements
        (
            user_index, # rows of non-zero elements
            movie_index # cols of non-zero elements
        )
    )
)

X
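One detail worth knowing about this constructor: when the same (row, column) pair appears more than once, scipy sums the corresponding values while converting to CSR. The sketch below uses made-up indices to show this, together with one possible way to binarize the result if a dataset did contain repeated pairs. The names `X_toy` and `X_binary` are illustrative only.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical interactions; the pair (row 2, col 0) appears twice.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 1, 1, 0, 0])

# The shape is inferred from the largest row/column index: (3, 2).
X_toy = sps.csr_matrix((np.ones(len(rows)), (rows, cols)))
print(X_toy.toarray())  # the duplicated pair is stored as 2.0

# One way to force a 0/1 matrix: overwrite every stored value with 1.
X_binary = X_toy.copy()
X_binary.data[:] = 1.0
print(X_binary.toarray())
```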

## Fit the recommender

We fit `P3alphaRecommender` against `X`.

In [None]:
recommender = P3alphaRecommender(X)
recommender.learn()
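For intuition about what is being learned, the following is a simplified dense sketch of the random-walk idea behind P3alpha, written from the description in the paper linked above. It is not `irspack`'s implementation, which may differ in details such as truncation and normalization. The function name `p3alpha_item_weights` and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def p3alpha_item_weights(X, alpha=1.0):
    """Hypothetical helper: item-to-item weights from a two-step random walk."""
    X = np.asarray(X, dtype=float)
    # Degrees, clipped at 1 to avoid dividing by zero for empty rows/columns.
    user_degree = np.maximum(X.sum(axis=1, keepdims=True), 1.0)
    item_degree = np.maximum(X.sum(axis=0, keepdims=True), 1.0)
    # Transition probabilities raised to the power alpha.
    P_ui = (X / user_degree) ** alpha      # users x items: user -> item step
    P_iu = (X.T / item_degree.T) ** alpha  # items x users: item -> user step
    # Weight of walking item -> user -> item.
    return P_iu @ P_ui

X_toy = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
])
W = p3alpha_item_weights(X_toy, alpha=1.0)

# Scores for a user who has interacted with item 0 only.
profile = np.array([[1.0, 0.0, 0.0]])
toy_scores = profile @ W
print(toy_scores)
```

With `alpha=1.0` each row of `W` is a probability distribution over items, so the scores can be read as the chance of landing on each item after walking from the user's watched items.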

## Check the recommender's output

Suppose there is a new user who has just watched "Toy Story". Let us see what would be recommended for this user.

We first represent the user's watch profile as another sparse matrix (which contains a single non-zero element).

In [None]:
movie_id_vs_movie_index = { mid: i for i, mid in enumerate(unique_movie_ids)}

toystory_id = 1
toystory_watcher_matrix = sps.csr_matrix(
    ([1], ([0], [movie_id_vs_movie_index[toystory_id]])),
    shape=(1, len(unique_movie_ids)) # this time the shape parameter is required
)

movies.loc[toystory_id]

Since this user is new (previously unseen) to the recommender, we use the `get_score_cold_user_remove_seen` method.

`remove_seen` means that we mask the scores of the items that the user has already watched (in this case, Toy Story) so that such items will not be recommended again.

As you can see, the score corresponding to "Toy Story" is $-\infty$.

In [None]:
score = recommender.get_score_cold_user_remove_seen(
    toystory_watcher_matrix
)

# Movie id 1 (index 0) is masked (its score is -infinity)
score
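The masking step can be pictured with plain NumPy. This is a sketch of the idea only, using made-up scores and variable names, and it does not claim to reproduce how `irspack` implements it internally:

```python
import numpy as np

# Hypothetical raw scores for one user over four items.
raw_scores = np.array([[0.3, 0.1, 0.5, 0.2]])

# The same user's watch profile: only item 0 has been watched.
watched = np.array([[1, 0, 0, 0]])

# Replace the scores of watched items with -infinity.
masked_scores = raw_scores.copy()
masked_scores[watched > 0] = -np.inf
print(masked_scores)
```

Since $-\infty$ is smaller than any finite score, a masked item can never appear ahead of an unmasked one when the scores are sorted.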

To get the recommendation, we ``argsort`` the score in descending order
and convert each "movie index" (which starts from 0) to a "movie id".

In [None]:
recommended_movie_index = score[0].argsort()[::-1][:10]
recommended_movie_ids = unique_movie_ids[recommended_movie_index]

# Top-10 recommendations
recommended_movie_ids
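The same two steps on a tiny made-up example (the scores and ids below are illustrative, not taken from the dataset):

```python
import numpy as np

# Hypothetical scores for four items; index 1 is masked.
toy_score = np.array([0.2, -np.inf, 0.9, 0.5])
# Hypothetical ids corresponding to item indexes 0..3.
toy_item_ids = np.array([1, 5, 7, 9])

# argsort sorts ascending, so reverse it and keep the first two entries.
top_index = toy_score.argsort()[::-1][:2]
top_ids = toy_item_ids[top_index]
print(top_index, top_ids)
```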

And here are the titles of the recommendations.

In [None]:
movies.reindex(recommended_movie_ids)

The above pattern (mapping item IDs to indexes, creating sparse matrices, and converting the indexes of recommended items back to item IDs) is quite common, so we provide a convenience class that handles the item index/ID mapping:

In [None]:
from irspack.utils.id_mapping import IDMappedRecommender

id_mapped_rec = IDMappedRecommender(
    recommender,
    user_ids=unique_user_ids,
    item_ids=unique_movie_ids
)
id_and_scores = id_mapped_rec.get_recommendation_for_new_user(
    [toystory_id], cutoff = 10
)
movies.reindex(
    [ item_id for item_id, score in id_and_scores ]
)
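To see what such a wrapper has to do, here is a hand-rolled sketch of the ID-to-index round trip. The helper `recommend_by_id`, the stand-in scoring function, and the toy weights are all hypothetical; this is not how `IDMappedRecommender` is implemented, only an illustration of the bookkeeping it saves us from.

```python
import numpy as np

def recommend_by_id(score_fn, item_ids, watched_ids, cutoff=10):
    """Hypothetical helper: IDs in, (ID, score) pairs out."""
    item_ids = np.asarray(item_ids)
    id_to_index = {item_id: i for i, item_id in enumerate(item_ids.tolist())}

    # 1. IDs -> indexes -> one-row watch profile.
    seen = [id_to_index[item_id] for item_id in watched_ids]
    profile = np.zeros((1, len(item_ids)))
    profile[0, seen] = 1.0

    # 2. Score all items, then mask the ones already watched.
    scores = np.asarray(score_fn(profile), dtype=float)[0].copy()
    scores[seen] = -np.inf

    # 3. Top indexes -> IDs.
    top = np.argsort(scores)[::-1][:cutoff]
    return [(item_ids[i].item(), float(scores[i])) for i in top]

# Stand-in scorer: a made-up item-item weight matrix.
toy_weights = np.array([
    [0.0, 0.6, 0.4],
    [0.6, 0.0, 0.1],
    [0.4, 0.1, 0.0],
])
toy_result = recommend_by_id(
    lambda profile: profile @ toy_weights,
    item_ids=[10, 20, 30],
    watched_ids=[10],
    cutoff=2,
)
print(toy_result)
```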

While the above result might make sense, it is not optimal.
To get better results, we have to tune the recommender's hyperparameters
against an accuracy metric measured on a validation set.

In the next tutorial, we will see how to define a hold-out set and a validation score.