# 10.13 Recommendation - User-Movie

Summary:
    1. Import libraries and the dataset
    2. Identify total number of users and movies
    3. Split the data into training and testing sets
    4. Populate the train and test matrices with the observed ratings
    5. Create cosine similarity matrices for users and movies
    6. Perform predictions

In [1]:
import pandas as pd
import numpy as np

Read the dataset (the file has no header row), then assign column names.

In [2]:
df = pd.read_csv('Recommend.csv', header=None)
df

Unnamed: 0,0,1,2,3
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596
...,...,...,...,...
99995,880,476,3,880175444
99996,716,204,5,879795543
99997,276,1090,1,874795795
99998,13,225,2,882399156


In [3]:
df.columns = ['user_id', 'movie_id', 'rating', 'timestamp']
df.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


Since we are building a user-movie recommendation model, we need to know the number of distinct users and the number of distinct movies. The cell below counts them and then splits the ratings into training and testing sets:

In [4]:
from sklearn.model_selection import train_test_split
n_users = df.user_id.unique().shape[0]
n_movies = df.movie_id.unique().shape[0]

# split data into train and test data sets
train_data, test_data = train_test_split(df, test_size=0.25)

Create a zero matrix of shape (n_users, n_movies), with one row per user and one column per movie.

In [5]:
train_data_matrix = np.zeros((n_users, n_movies))

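Now that we have created the train and test sets, we populate the train and test matrices so that the entry at row (user_id - 1) and column (movie_id - 1) holds the rating that user gave that movie. The IDs are 1-based, hence the subtraction; user-movie pairs without a rating stay at 0.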

In [6]:
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
train_data_matrix

array([[5., 3., 4., ..., 0., 0., 0.],
       [4., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [5., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 5., 0., ..., 0., 0., 0.]])

We can now see that we have some movie ratings in the training set. We will do the same for the testing set.

In [7]:
test_data_matrix = np.zeros((n_users, n_movies))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
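The loop-based fill used for both matrices can also be written with NumPy integer-array indexing. The following is a minimal sketch, not part of the original notebook: the helper name `build_rating_matrix` and the toy DataFrame are hypothetical, and it assumes, like the cells above, that user and movie IDs are 1-based and contiguous.

```python
import numpy as np
import pandas as pd

def build_rating_matrix(ratings, n_users, n_movies):
    """Return an (n_users, n_movies) array of ratings; unrated pairs stay 0.

    Hypothetical helper: assumes 1-based, contiguous user_id and movie_id.
    """
    matrix = np.zeros((n_users, n_movies))
    rows = ratings['user_id'].to_numpy() - 1    # 1-based IDs to 0-based rows
    cols = ratings['movie_id'].to_numpy() - 1   # 1-based IDs to 0-based columns
    matrix[rows, cols] = ratings['rating'].to_numpy()
    return matrix

# Toy data for illustration only (not taken from Recommend.csv)
toy = pd.DataFrame({'user_id': [1, 2, 3],
                    'movie_id': [1, 3, 2],
                    'rating': [5, 3, 4]})
toy_matrix = build_rating_matrix(toy, n_users=3, n_movies=3)
```

Under the same assumptions, calling `build_rating_matrix(train_data, n_users, n_movies)` should give the same matrix as the `itertuples` loop.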

Now import `pairwise_distances` to compute cosine similarities for users and movies. Cosine similarity measures how alike two non-zero vectors of an inner product space are by taking the cosine of the angle between them. The cosine of 0 is 1, and it is less than 1 for any angle in the interval (0, π] radians.
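To make the definition concrete, here is a small sketch that computes cosine similarity by hand with NumPy. The function name `cosine_similarity_manual` and the three toy vectors are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def cosine_similarity_manual(a, b):
    """Cosine of the angle between two non-zero vectors a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy rating vectors for illustration only
u = np.array([5.0, 3.0, 0.0])
v = np.array([10.0, 6.0, 0.0])  # same direction as u, twice the length
w = np.array([0.0, 0.0, 4.0])   # shares no rated movie with u

sim_uv = cosine_similarity_manual(u, v)  # close to 1: angle is 0
sim_uw = cosine_similarity_manual(u, w)  # 0: vectors are orthogonal
```

Because only the angle matters, a user who rates everything twice as high as another is still treated as maximally similar.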

Now we will make predictions with user-based __collaborative filtering__, working from each user's deviation from their own mean rating.

Calculate the user prediction as mean_user_rating plus the dot product of user_similarity and the ratings difference, divided by the sum of the absolute values of each user's row of user_similarity.
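Written out as a formula, the calculation described above looks as follows. The symbols are introduced here for illustration: $\hat{r}_{u,i}$ is the predicted rating of user $u$ for movie $i$, $\bar{r}_u$ is the entry of `mean_user_rating` for user $u$, $s_{u,v}$ is the entry of `user_similarity` for users $u$ and $v$, and $r_{v,i}$ is the entry of `train_data_matrix`.

$$
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v} s_{u,v}\,\left(r_{v,i} - \bar{r}_v\right)}{\sum_{v} \left|s_{u,v}\right|}
$$

Note that in this notebook $\bar{r}_u$ is the mean over the whole row of the matrix, so unrated movies (stored as 0) pull it down.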

The resulting matrix estimates how each user would rate the movies they have not yet rated, based on their prior rating patterns.

In [9]:
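from sklearn.metrics import pairwise_distances

# NOTE: with metric='cosine', pairwise_distances returns the cosine *distance*
# (1 - cosine similarity); use 1 - pairwise_distances(...) for true similarity.
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
movie_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')

# Mean rating per user (unrated zeros included), kept as a column vector
mean_user_rating = train_data_matrix.mean(axis=1)[:, np.newaxis]
ratings_diff = train_data_matrix - mean_user_rating
user_pred = mean_user_rating + user_similarity.dot(ratings_diff) / np.abs(user_similarity).sum(axis=1)[:, np.newaxis]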

user_pred

array([[ 1.57299028,  0.57310294,  0.47315094, ...,  0.29167488,
         0.29415907,  0.29384676],
       [ 1.31553192,  0.28564128,  0.13875934, ..., -0.06903304,
        -0.06562029, -0.06571825],
       [ 1.33285776,  0.24764935,  0.10955   , ..., -0.10396893,
        -0.10075155, -0.10060996],
       ...,
       [ 1.20047045,  0.21586536,  0.07598084, ..., -0.12432093,
        -0.12119957, -0.12115452],
       [ 1.36066427,  0.31242284,  0.19629496, ..., -0.01022188,
        -0.00713081, -0.00700755],
       [ 1.39365065,  0.36374943,  0.26974654, ...,  0.0834817 ,
         0.08576791,  0.08603372]])

__Now we have generated ratings for all movies!!!__
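The prediction step can also be wrapped in a small function and checked on a toy matrix. This is a sketch under stated assumptions rather than the notebook's own code: the helper names `cosine_similarity_matrix` and `predict_user_based` are hypothetical, the toy ratings are made up, and the similarity is computed directly with NumPy as a true cosine similarity instead of through `pairwise_distances`.

```python
import numpy as np

def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m (hypothetical helper)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave all-zero rows as zeros, avoid 0/0
    unit = m / norms
    return unit @ unit.T

def predict_user_based(ratings, similarity):
    """Mean-centered, similarity-weighted prediction for every user-movie pair."""
    mean_user_rating = ratings.mean(axis=1, keepdims=True)
    ratings_diff = ratings - mean_user_rating
    weights = np.abs(similarity).sum(axis=1, keepdims=True)
    return mean_user_rating + (similarity @ ratings_diff) / weights

# Toy ratings for illustration only: 3 users x 3 movies, 0 means unrated
toy_ratings = np.array([[5.0, 3.0, 0.0],
                        [4.0, 0.0, 2.0],
                        [0.0, 4.0, 5.0]])
toy_similarity = cosine_similarity_matrix(toy_ratings)
toy_pred = predict_user_based(toy_ratings, toy_similarity)
```

One property that is easy to check: because every row of `ratings_diff` sums to zero, each row of `toy_pred` has the same mean as the corresponding row of `toy_ratings`, so the predictions only redistribute a user's average across movies.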