# Recommender system example with Collaborative Filtering
This example demonstrates how to use collaborative filtering to implement a movie recommender system.

### Library imports

In [1]:
import numpy as np
import pandas as pd
import cf # module implemented in the repository: https://github.com/daniel-lima-lopez/Collaborative-Filtering-in-Recomender-System
from sklearn.model_selection import train_test_split

### Auxiliary code

In [2]:
# construct a user-rating matrix
def get_Matrix(data, u_ids, m_ids):
    M = np.zeros(shape=(len(u_ids), len(m_ids)), dtype=np.float32)
    for ui in u_ids:
        auxm = data[data['userId']==ui]['movieId'] # movies rated by user ui
        auxm = np.array(auxm)
        auxr = data[data['userId']==ui]['rating'] # rating of movies
        auxr = np.array(auxr)

        auxui = np.where(u_ids==ui)[0][0] # corresponding user index
        for i in range(len(auxm)):
            auxmi = np.where(m_ids==auxm[i])[0][0] # corresponding movie index
            M[auxui, auxmi] = auxr[i]
    return M

# return the movie titles for a list of column indices of the user-rating matrix
def get_movies(inds, m_ids):
    movies_data = pd.read_csv('ml-latest-small/movies.csv')
    aux_ids = m_ids[inds] # map column indices back to movieId values
    outs = []
    for movie_id in aux_ids: # avoid shadowing the built-in id()
        outs.append(movies_data['title'].values[movies_data['movieId']==movie_id][0])
    return outs

## Data import
The dataset used in this example, provided by GroupLens Research, can be found at [ml-latest-small](https://grouplens.org/datasets/movielens/).

The dataset contains roughly 100,000 movie ratings from more than 600 users:

In [3]:
path = 'ml-latest-small'
data = pd.read_csv(f'{path}/ratings.csv')
data.head(6)

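|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
| 5 | 1 | 70 | 3.0 | 964982400 |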


Build the user-rating matrix that the Collaborative Filtering technique operates on. Each row is a user, each column is a movie, and a 0 marks a movie the user has not rated:

In [4]:
# save users and movie ids
user_ids = np.unique(data['userId'])
movie_ids = np.unique(data['movieId'])

# user-rating matrix
M_train = get_Matrix(data, user_ids, movie_ids)
M_train, M_train.shape

(array([[4. , 0. , 4. , ..., 0. , 0. , 0. ],
        [0. , 0. , 0. , ..., 0. , 0. , 0. ],
        [0. , 0. , 0. , ..., 0. , 0. , 0. ],
        ...,
        [2.5, 2. , 2. , ..., 0. , 0. , 0. ],
        [3. , 0. , 0. , ..., 0. , 0. , 0. ],
        [5. , 0. , 0. , ..., 0. , 0. , 0. ]], dtype=float32),
 (610, 9724))
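
The loop in `get_Matrix` performs one `np.where` lookup per rating, which gets slow as the dataset grows. As a minimal sketch of a vectorized alternative, the function below maps every id to its row or column position in a single `np.searchsorted` call. The name `build_rating_matrix` and the toy data are hypothetical and not part of the original notebook. The sketch assumes the id arrays are sorted and unique, which is what `np.unique` returns.

```python
import numpy as np
import pandas as pd

def build_rating_matrix(ratings, u_ids, m_ids):
    """Vectorized sketch of a user-rating matrix builder (hypothetical helper).

    Assumes u_ids and m_ids are sorted, unique arrays (as returned by
    np.unique) that contain every id present in `ratings`.
    """
    # position of each rating's user / movie in the sorted id arrays
    rows = np.searchsorted(u_ids, ratings['userId'].to_numpy())
    cols = np.searchsorted(m_ids, ratings['movieId'].to_numpy())

    M = np.zeros((len(u_ids), len(m_ids)), dtype=np.float32)
    M[rows, cols] = ratings['rating'].to_numpy()  # 0 stays as "not rated"
    return M

# toy data for illustration only
toy = pd.DataFrame({
    'userId':  [1, 1, 2, 5],
    'movieId': [10, 30, 10, 20],
    'rating':  [4.0, 5.0, 3.0, 2.5],
})
toy_users = np.unique(toy['userId'])    # [1, 2, 5]
toy_movies = np.unique(toy['movieId'])  # [10, 20, 30]
toy_M = build_rating_matrix(toy, toy_users, toy_movies)
print(toy_M)
```

On the real data, `build_rating_matrix(data, user_ids, movie_ids)` should produce the same matrix as `get_Matrix`, provided each (user, movie) pair appears only once in `ratings.csv`.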

## Experimentation
Instantiate the Collaborative Filtering algorithm. The `k` parameter sets the number of nearest neighbors (the most similar users) whose ratings are used for each prediction.

The `fit` method performs the calculations needed for predictions.

In [5]:
test = cf.CollabFilt(k=3)
test.fit(M_train)

array([[0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [4.25, 0.  , 4.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       ...,
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 2.  , 2.  , ..., 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , ..., 0.  , 0.  , 0.  ]], dtype=float32)
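
The internals of `cf.CollabFilt` are not shown in this example. To illustrate the general idea behind the `k` parameter, here is a minimal sketch of user-based collaborative filtering. It assumes cosine similarity between users and a similarity-weighted average of the neighbors' ratings; both choices are assumptions, and the `cf` module may work differently. The names `user_similarities` and `recommend` are hypothetical.

```python
import numpy as np

def user_similarities(M, user):
    """Cosine similarity between `user` and every row of M (assumed metric)."""
    norms = np.linalg.norm(M, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    return (M @ M[user]) / (norms * norms[user])

def recommend(M, user, k=3, topk=5):
    """Return the indices of the topk unseen movies for `user`.

    Sketch only: scores are a similarity-weighted average over the k most
    similar users, with unrated movies counted as 0.
    """
    sim = user_similarities(M, user)
    sim[user] = -np.inf                 # a user is not their own neighbor
    neighbors = np.argsort(sim)[-k:]    # k most similar users
    weights = sim[neighbors]

    total = weights.sum()
    if total > 0:
        scores = (weights @ M[neighbors]) / total
    else:
        scores = np.zeros(M.shape[1])

    scores[M[user] > 0] = -np.inf       # never recommend already-rated movies
    return np.argsort(scores)[::-1][:topk]

# toy matrix for illustration only: 4 users x 4 movies, 0 = not rated
toy = np.array([
    [5, 4, 0, 0],
    [5, 4, 3, 0],
    [0, 0, 1, 5],
    [4, 5, 0, 2],
], dtype=np.float64)

print(recommend(toy, user=0, k=2, topk=2))
```

In the toy matrix, users 1 and 3 rate the first two movies much like user 0 does, so they are chosen as neighbors and their ratings determine which unseen movies rank highest for user 0.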

Once the calculations are done, we can perform predictions:

In [6]:
users = [1, 4, 5, 65, 232] # row indices into M_train (positions in user_ids, not userId values)
topk = 5 # number of movies to list per user

for ui in users:
    # top liked movies of ui
    inds = np.argsort(M_train[ui])[-topk:]
    top_ui = get_movies(inds, movie_ids)

    # predictions for ui
    pred_inds = test.predict(ui, topk)
    pred_ui = get_movies(pred_inds, movie_ids)

    print(f'\nTop u{ui} movies:')
    for mi in top_ui:
        print(f' - {mi}')

    print(f'Top u{ui} preds:')
    for mi in pred_ui:
        print(f' - {mi}')


Top u1 movies:
 - Mad Max: Fury Road (2015)
 - Wolf of Wall Street, The (2013)
 - The Jinx: The Life and Deaths of Robert Durst (2015)
 - Step Brothers (2008)
 - Warrior (2011)
Top u1 preds:
 - Citizen Kane (1941)
 - Lock, Stock & Two Smoking Barrels (1998)
 - Adventures of Robin Hood, The (1938)
 - Wolf Man, The (1941)
 - Go (1999)

Top u4 movies:
 - Once Were Warriors (1994)
 - Schindler's List (1993)
 - In the Name of the Father (1993)
 - Snow White and the Seven Dwarfs (1937)
 - Pinocchio (1940)
Top u4 preds:
 - Run Lola Run (Lola rennt) (1998)
 - Dr. Horrible's Sing-Along Blog (2008)
 - Crazy, Stupid, Love. (2011)
 - Avengers, The (2012)
 - Inception (2010)

Top u5 movies:
 - Dolores Claiborne (1995)
 - Tombstone (1993)
 - Shawshank Redemption, The (1994)
 - Fugitive, The (1993)
 - Braveheart (1995)
Top u5 preds:
 - Heavenly Creatures (1994)
 - Top Gun (1986)
 - Once Were Warriors (1994)
 - Sound of Music, The (1965)
 - Mask of Zorro, The (1998)

Top u65 movies:
 - Dark City (199