### Here I use a technique called collaborative filtering to make movie recommendations to users.
### Collaborative filtering is based on the idea that the ratings of users similar to me can be used to predict how much I will like a movie, product, or service that those users have experienced and I have not.
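
As a rough illustration of that idea (this is not the method used later in this notebook), the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The `toy_ratings` dictionary, the user and movie names, and both helper functions are made up for the example.

```python
from math import sqrt

# Toy data (hypothetical): user -> {movie: rating}
toy_ratings = {
    "me":    {"A": 5.0, "B": 1.0},
    "alice": {"A": 5.0, "B": 1.0, "C": 4.0},
    "bob":   {"A": 1.0, "B": 5.0, "C": 2.0},
}

def cosine_similarity(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, movie, data):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in data.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(data[user], their)
        num += sim * their[movie]
        den += sim
    return num / den if den else None

estimate = predict_rating("me", "C", toy_ratings)
print(round(estimate, 2))
```

Because "me" agrees with "alice" on both shared movies and disagrees with "bob", the estimate for movie "C" lands closer to alice's rating than to bob's.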

### I use the Surprise library, which provides algorithms such as Singular Value Decomposition (SVD) that are trained to minimise RMSE (Root Mean Square Error) on the known ratings.
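
SVD-style rating models are commonly written as a global mean plus a user bias, an item bias, and a dot product of latent factors, fitted by stochastic gradient descent on the squared error. The sketch below is a minimal pure-Python version of that idea, not Surprise's implementation. The function name `train_mf`, the hyperparameter values, and the toy triples are assumptions chosen for illustration.

```python
import random

def train_mf(triples, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    bu, bi, p, q = {}, {}, {}, {}
    for u, i, _ in triples:
        bu.setdefault(u, 0.0)
        bi.setdefault(i, 0.0)
        p.setdefault(u, [rng.gauss(0, 0.1) for _ in range(n_factors)])
        q.setdefault(i, [rng.gauss(0, 0.1) for _ in range(n_factors)])

    def predict(u, i):
        # estimate = global mean + user bias + item bias + factor dot product
        dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return mu + bu[u] + bi[i] + dot

    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return predict

# Made-up ratings: (userId, movieId, rating)
toy = [(1, "A", 5.0), (1, "B", 1.0), (2, "A", 4.0), (2, "B", 2.0), (3, "A", 5.0)]
toy_mean = sum(r for _, _, r in toy) / len(toy)
model = train_mf(toy)
print(model(1, "A"), toy_mean)
```

After training, the estimate for a known pair such as `(1, "A")` should sit closer to the observed rating than the global mean does, which is the effect that minimising the squared error is meant to have.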

## Importing required packages

In [17]:
# importing necessary packages

# matplotlib inline
import ast

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases

# `evaluate` is no longer available in recent Surprise releases; cross_validate replaces it
from surprise import Reader, Dataset, SVD, KNNBasic
from surprise import accuracy
from surprise.model_selection import cross_validate, train_test_split

## Loading data, cleaning and slicing

In [18]:
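# Reader() defaults to rating_scale=(1, 5); the data also contains 0.5 ratings,
# so Reader(rating_scale=(0.5, 5)) would describe it more accurately.
reader = Reader()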

ratings = pd.read_csv("D:/DataScience/EarlyBirds_DataScience_Test/ratings.csv", 
                      low_memory=False, encoding='latin-1')
ratings = ratings[:500000]
ratings.head()


Unnamed: 0,userId,movieId,rating,timestamp
0,1,81834,5.0,1425942133
1,1,112552,5.0,1425941336
2,1,98809,0.5,1425942640
3,1,99114,4.0,1425941667
4,1,858,5.0,1425941523


In [19]:
# shape the data by dropping unnecessary columns
colsToDrop = ['timestamp']
ratings = ratings.drop(colsToDrop, axis=1)
ratings.head()


Unnamed: 0,userId,movieId,rating
0,1,81834,5.0
1,1,112552,5.0
2,1,98809,0.5
3,1,99114,4.0
4,1,858,5.0


In [20]:
# keep the IDs of the movies that have at least one rating
movies_with_ratings = ratings['movieId'].unique()

## Splitting the data into train and test sets

In [21]:
# split the dataset into train and test sets with the train_test_split function
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
trainset, testset = train_test_split(data, test_size=.25)
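
The split above is random, so the resulting RMSE changes slightly from run to run. Surprise's `train_test_split` accepts a `random_state` argument to make the split repeatable. The pure-Python sketch below only illustrates what a seeded 75/25 split does; the helper `split_ratings` and the toy rows are hypothetical.

```python
import random

def split_ratings(rows, test_size=0.25, seed=None):
    """Shuffle a copy of `rows` and split off a `test_size` fraction as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]

# 40 made-up (userId, movieId, rating) rows
rows = [(u, m, 3.0) for u in range(10) for m in range(4)]

train_a, test_a = split_ratings(rows, seed=42)
train_b, test_b = split_ratings(rows, seed=42)

print(len(train_a), len(test_a))
print(test_a == test_b)  # same seed, same split
```

Fixing the seed makes two runs comparable, which helps when checking whether a change to the model actually improved the score.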

## Fitting the model

In [22]:
# fit the SVD (Singular Value Decomposition) model
algo = SVD()
algo.fit(trainset)

# predict ratings for the test set
pred = algo.test(testset)


## RMSE Score

In [23]:
# compute RMSE (root mean square error); accuracy.rmse prints the score and also returns it
print(accuracy.rmse(pred))


RMSE: 0.8542
0.8542041743928422


### Here we got a root mean square prediction error of 0.8542, which is good enough for our case.
### The RMSE score varies with how the data is sliced (what fraction of the original data is used) and with the random train/test split.
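
For reference, RMSE is the square root of the mean squared difference between the true and the predicted ratings. One possible way to judge whether a score is "good enough" is to compare it with a naive baseline that always predicts the mean rating. The sketch below does this on made-up numbers; none of the values come from the dataset used here.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

true_ratings = [5.0, 3.0, 4.0, 1.0]      # made-up ratings
model_estimates = [4.5, 3.5, 4.0, 2.0]   # made-up model output
mean_rating = sum(true_ratings) / len(true_ratings)

model_rmse = rmse(list(zip(true_ratings, model_estimates)))
baseline_rmse = rmse([(t, mean_rating) for t in true_ratings])
print(model_rmse, baseline_rmse)
```

A model is only useful if its RMSE is clearly below that of the mean-rating baseline computed on the same test set.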

## Final Predictions

In [24]:
# predict the rating for each (user, movie) pair
output_set = pd.read_csv('D:/DataScience/EarlyBirds_DataScience_Test/evaluation_ratings.csv')

predictions = []

for index, row in output_set.iterrows():
    pr = algo.predict(row['userId'], row['movieId'])
    predictions.append(pr.est)  # pr.est is the estimated rating

output_set['predictions'] = predictions
output_set.to_csv('D:/DataScience/EarlyBirds_DataScience_Test/final_evaluation_ratings_results.csv')


### Let's have a look at our prediction results

In [25]:
pred_ratings = pd.read_csv('D:/DataScience/EarlyBirds_DataScience_Test/final_evaluation_ratings_results.csv')
pred_ratings.head()

Unnamed: 0.1,Unnamed: 0,userId,movieId,predictions
0,0,1,110,4.451473
1,1,1,1968,4.355597
2,2,1,4878,4.533701
3,3,1,54503,4.104788
4,4,1,91542,4.167086


In [26]:
# remove the unnamed index columns, as they carry no information,
# and look at the final prediction results
pred_ratings = pred_ratings.drop(pred_ratings.columns[pred_ratings.columns.str.contains('unnamed', case=False)], axis=1)
pred_ratings.head(10)

Unnamed: 0,userId,movieId,predictions
0,1,110,4.451473
1,1,1968,4.355597
2,1,4878,4.533701
3,1,54503,4.104788
4,1,91542,4.167086
5,2,79,3.041228
6,2,141,3.550564
7,2,260,3.598673
8,2,1210,3.472426
9,3,1968,3.428769
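
The `Unnamed: 0` columns appear because `to_csv` writes the DataFrame index by default and `read_csv` then reads it back as an unnamed column. Passing `index=False` when writing avoids the extra column, so nothing needs to be dropped afterwards. The sketch below shows the difference on an in-memory buffer; the DataFrame values are made up.

```python
import io
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame({"userId": [1, 1], "movieId": [110, 1968], "predictions": [4.45, 4.36]})

# Default: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_default = pd.read_csv(with_index).columns.tolist()

# index=False: only the real columns are written.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_clean = pd.read_csv(without_index).columns.tolist()

print(cols_default)
print(cols_clean)
```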


### One useful property of this recommender system is that it does not care what the movie is (or what it contains). It works purely on (userId, movieId) pairs and tries to predict a rating based on how other users have rated that movie.
### Our tastes change: if we like a particular genre at one time, we may not like that kind of movie later on, because we have seen most of them and our minds always try to discover different things.
### Because this recommender concentrates only on ratings, we can easily recommend appropriate movies/products/services to the target client based on their previous ratings. (This is one of the best things about collaborative filtering.)
### This completes the rating predictions for the (user, movie) pairs.
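
Predicted ratings can be turned into recommendations by ranking the movies a user has not rated yet and keeping the highest estimates. The sketch below shows one way to do that. The helper `top_n_for_user`, the lookup table `fake_estimates`, and `fake_predict` are hypothetical stand-ins; with the fitted model from this notebook, `predict` could be something like `lambda u, m: algo.predict(u, m).est`.

```python
def top_n_for_user(user_id, candidate_movies, already_rated, predict, n=3):
    """Return the n unseen movies with the highest predicted rating."""
    unseen = [m for m in candidate_movies if m not in already_rated]
    scored = [(m, predict(user_id, m)) for m in unseen]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Stand-in for a fitted model: a lookup table of made-up estimates.
fake_estimates = {110: 4.4, 1968: 4.3, 4878: 4.5, 858: 4.9, 260: 3.6}

def fake_predict(user_id, movie_id):
    return fake_estimates[movie_id]

recommendations = top_n_for_user(
    1, fake_estimates.keys(), already_rated={858}, predict=fake_predict, n=3
)
print(recommendations)
```

Movie 858 has the highest estimate but is excluded because the user has already rated it.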

## Bonus credit

In [30]:
# recommend to a user a movie whose predicted rating matches the rating
# that user gave to a movie they have already watched
fl = 'D:/DataScience/EarlyBirds_DataScience_Test/movies_metadata.csv'
df2 = pd.read_csv(fl, low_memory=False, encoding='latin-1')
df2.dropna(inplace=True)  # drops every row that has a missing value in any column

# belongs_to_collection holds stringified dicts; parse them and take their 'id' field
movies = [ast.literal_eval(mv) for mv in df2['belongs_to_collection'].values]
movie_ids = [mv['id'] for mv in movies]

movie_ids_with_ratings = list(set(movies_with_ratings) & set(movie_ids))

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list
bonus_rows = []

for mv in movie_ids_with_ratings:
    # first rating found for this movie, together with the user who gave it
    row = ratings[ratings['movieId'] == mv].iloc[0]
    for mv2 in movies_with_ratings:
        pr = algo.predict(row['userId'], mv2)
        # if the predicted rating of mv2 (rounded to one decimal) equals the given rating,
        # recommend mv2 to the user who gave that rating
        if round(pr.est, 1) == row['rating']:
            bonus_rows.append({'userId': row['userId'], 'movieId': row['movieId'],
                               'rating': row['rating'], 'movie2': mv2})
            break

bonus_df = pd.DataFrame(bonus_rows, columns=['userId', 'movieId', 'rating', 'movie2'])
bonus_df.to_csv('D:/DataScience/EarlyBirds_DataScience_Test/Bonus_results.csv')
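
Calling `dropna()` without arguments removes every row that has a missing value in any column, which can discard many movies for reasons unrelated to `belongs_to_collection`. A gentler option might be to parse only that column and skip missing or malformed entries. The sketch below shows the parsing step in plain Python; the helper name `parse_collection_ids` and the sample values are hypothetical.

```python
import ast

def parse_collection_ids(raw_values):
    """Extract the 'id' field from stringified dicts, skipping missing or malformed entries."""
    ids = []
    for raw in raw_values:
        if not isinstance(raw, str):  # e.g. NaN for movies without a collection
            continue
        try:
            parsed = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, dict) and "id" in parsed:
            ids.append(parsed["id"])
    return ids

# Hypothetical column values, for illustration only.
raw_column = [
    "{'id': 10, 'name': 'Example Collection'}",
    float("nan"),
    "not a dict",
    "{'name': 'No id'}",
]
collection_ids = parse_collection_ids(raw_column)
print(collection_ids)
```

With pandas, the same effect on the input could be approximated by `df2.dropna(subset=['belongs_to_collection'])`, which only drops rows where that one column is missing.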




## Results of recommending a movie to a user based on his/her previous ratings

In [31]:
# load the output file and remove the unnamed index columns, as they carry no information,
# then look at the final recommendation results

bonus_df = pd.read_csv('D:/DataScience/EarlyBirds_DataScience_Test/Bonus_results.csv')
bonus_df = bonus_df.drop(bonus_df.columns[bonus_df.columns.str.contains('unnamed', case=False)], axis=1)
bonus_df.head()

Unnamed: 0,userId,movieId,rating,movie2
0,249.0,8581.0,4.0,2918.0
1,1893.0,645.0,1.0,2338.0
2,1608.0,263.0,1.0,2383.0
3,805.0,264.0,2.0,1431.0
4,25.0,10.0,3.0,2581.0


### Here, the movie2 column holds the movie recommended to the corresponding user.

### So this is the recommendation system I built: it predicts ratings for (userId, movieId) pairs and recommends movies to users.