<a href="https://colab.research.google.com/github/AntonyBoza/PROJECTS/blob/master/Movie_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Hello, friends.

Please follow these basic conventions while working:
<br><ul>
<li> Comment what you are doing right before your piece of code.</li>
<li> Add your name in <i>italics</i> at the end of the comment.</li>
<li> Do not delete or change someone else's code.</li>
<li> If you think something should be changed or deleted, leave a comment before the code and raise the issue in the WhatsApp / Slack group.</li>
</ul>

##**Project goal:** 
To build a machine learning system that takes each person's movie-watching history, compares it with the preferences of other, similar users, and makes personalized suggestions to each one.

##**Approach:**
The first step is to find similar users or items. The second step is to predict the ratings of the items that are not yet rated by a user. So, we will need the answers to these questions:

*   How do we determine which users or items are similar to one another?
*   Given that we know which users are similar, how do we determine the rating that a user would give to an item based on the ratings of similar users?
*   How do we measure the accuracy of the ratings we calculate?
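
The first two questions can be illustrated with a small neighbourhood-based sketch. It is not the method used later in this notebook (that is SVD); it only shows one common way to score user similarity (cosine similarity over co-rated movies) and to turn similar users' ratings into a prediction (a similarity-weighted average). The ratings and user names below are made up for illustration.

```python
import math

# Hypothetical toy ratings: user -> {movie: rating}
ratings_by_user = {
    "ana":   {"Toy Story": 5.0, "Jumanji": 3.0, "Heat": 4.0},
    "ben":   {"Toy Story": 4.0, "Jumanji": 3.0, "Heat": 5.0, "Casino": 4.0},
    "carla": {"Toy Story": 1.0, "Jumanji": 5.0, "Casino": 2.0},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie, ratings):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[movie]
        den += sim
    # None means no similar user has rated the movie
    return num / den if den else None

sim_ab = cosine_similarity(ratings_by_user["ana"], ratings_by_user["ben"])
sim_ac = cosine_similarity(ratings_by_user["ana"], ratings_by_user["carla"])
casino_for_ana = predict_rating("ana", "Casino", ratings_by_user)
print(sim_ab, sim_ac, casino_for_ana)
```

Because "ana" is closer to "ben" than to "carla", the predicted rating for "Casino" is pulled toward ben's rating.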

Then we'll use the SVD algorithm, a matrix-factorization approach that was popularized by the Netflix Prize and whose variants are reported to be used by companies such as Amazon and Netflix. We'll also use the Python package Surprise. See http://surprise.readthedocs.io/en/stable/matrix_factorization.html for more information.

Dataset acknowledgment: 
*F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>*

*Antonio*




### Let's first import the necessary packages, including the Python Surprise package for recommender systems. *http://surpriselib.com/* 

*Antonio*

In [1]:
!pip install scikit-surprise



In [3]:
# __future__ imports must be the first statement in the cell (they are no-ops on Python 3)
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
import pandas as pd
import numpy as np
from surprise import SVDpp
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV
from surprise.model_selection import cross_validate

## Load and check the data. *Antonio*

In [29]:
# Load the MovieLens data from two sources: the CSV files in the GitHub repository are used
# to take a look at the data, and Surprise's built-in ml-100k dataset is used for
# the model definition, training and testing.

url1 = 'https://raw.githubusercontent.com/AntonyBoza/PROJECTS/master/.github/workflows/movies.csv'
url2 = 'https://raw.githubusercontent.com/AntonyBoza/PROJECTS/master/.github/workflows/ratings.csv'
movies = pd.read_csv(url1)
ratings = pd.read_csv(url2)
data = Dataset.load_builtin('ml-100k')  # Built-in dataset; Surprise asks to download it on first use


In [5]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [6]:
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [7]:
# Merge the ratings and movies dataframes on movieId. Antonio
movie_data = pd.merge(ratings, movies, on='movieId')

In [8]:
# Check the new dataframe. Antonio
movie_data.head()

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


Let's split the data into Train and Test. *Antonio*

In [9]:
trainset, testset = train_test_split(data, test_size=.2)

In [None]:
print(testset)

### Create the model & Check the algorithm. 
The well-known singular value decomposition (SVD) algorithm used here is a matrix-factorization method: it uses stochastic gradient descent to minimize the regularized squared error between the predicted and actual ratings in the training set. Models of this kind were popularized by the Netflix Prize and are reported to have been used successfully by companies such as Amazon and Netflix. *Antonio*
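
To make the gradient-descent idea concrete, here is a minimal pure-Python sketch of biased matrix factorization trained with SGD. It is a simplified illustration and not Surprise's actual implementation; the function name `train_mf`, its default hyperparameters and the toy ratings are assumptions made for this example.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=300, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on the regularized squared error."""
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                       # user biases
    bi = {i: 0.0 for i in items}                       # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Only valid for users and items seen during training
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return predict

def rmse(pairs):
    """Root-mean-square error over (true, predicted) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))

# Hypothetical (user, movie, rating) triples
toy = [("u1", "m1", 5), ("u1", "m2", 3), ("u1", "m3", 2), ("u2", "m1", 4),
       ("u2", "m3", 2), ("u3", "m2", 4), ("u3", "m3", 1)]
predict = train_mf(toy)
mean_rating = sum(r for _, _, r in toy) / len(toy)
rmse_baseline = rmse([(r, mean_rating) for _, _, r in toy])
rmse_trained = rmse([(r, predict(u, i)) for u, i, r in toy])
print(rmse_baseline, rmse_trained)
```

The training error of the fitted model should fall below that of the mean-only baseline. Error on held-out ratings is what matters in practice, which is why the notebook evaluates on a separate test set.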

In [None]:
# Fit SVD on the training set with the chosen hyperparameters, then evaluate on the test set
algo = SVD(n_factors=160, n_epochs=100, lr_all=0.005, reg_all=0.1)
algo.fit(trainset)
test_pred = algo.test(testset)
print("SVD : Test Set")
accuracy.rmse(test_pred, verbose=True)

In [None]:
print(test_pred)

In [None]:
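# Check the first userId from the results above (uid=417): the model predicted a rating of 4.0 for movieId=384.
# Note: movie_data comes from the GitHub CSV files, so this lookup is only meaningful if its IDs match those of the dataset the model was trained on.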
movie_data[movie_data['userId']==417]
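
A list of predictions can be turned into per-user suggestions by keeping the highest estimated ratings for each user. The sketch below assumes each prediction is a tuple in the same field order as Surprise's `Prediction` objects (uid, iid, true rating, estimated rating, details); the helper name `top_n` and the sample values are hypothetical.

```python
from collections import defaultdict

def top_n(predictions, n=3):
    """Group predictions by user and keep the n highest estimated ratings."""
    by_user = defaultdict(list)
    for pred in predictions:
        uid, iid, est = pred[0], pred[1], pred[3]
        by_user[uid].append((iid, est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }

# Made-up predictions: (uid, iid, true rating, estimated rating, details)
sample_predictions = [
    ("417", "384", 4.0, 4.0, {}),
    ("417", "12", 3.0, 4.6, {}),
    ("417", "50", 5.0, 3.1, {}),
    ("7", "12", 2.0, 2.4, {}),
]
recommendations = top_n(sample_predictions, n=2)
print(recommendations)
```

Since the function only indexes into each prediction, it should also accept the list returned by `algo.test(testset)`. To recommend unseen movies, the predictions would need to cover items the user has not rated yet.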

## Now, let's save the model for further use. We're going to use the joblib module (formerly bundled with scikit-learn).

In [None]:
import joblib  # standalone package; sklearn.externals.joblib was removed from recent scikit-learn releases

In [None]:
# Save the recommender model to a file in the current working directory

joblib_file = "joblib_Rec_Model.pkl"  
joblib.dump(algo, joblib_file)

In [None]:
# Reload the Saved Model using Joblib
joblib_Rec_model = joblib.load(joblib_file)


joblib_Rec_model

## **TO BE DONE**.... *Antonio*

In [None]:
# Evaluate the reloaded model on the test set (Surprise models have no score() method)
reloaded_pred = joblib_Rec_model.test(testset)

# Print the RMSE
accuracy.rmse(reloaded_pred, verbose=True)

# Predict a single rating with the reloaded model (raw ids are strings in ml-100k)
single_pred = joblib_Rec_model.predict('417', '384')

single_pred
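
The metric used earlier in this notebook to score the model is RMSE, so it is a natural choice for scoring the saved model too. Here is a minimal pure-Python sketch of that metric, assuming predictions are tuples in the field order of Surprise's `Prediction` (uid, iid, true rating, estimate). The helper name `rmse` and the sample values are made up for illustration.

```python
import math

def rmse(predictions):
    """Root-mean-square error over (uid, iid, true_rating, estimate, ...) tuples."""
    if not predictions:
        raise ValueError("predictions is empty")
    squared_errors = [(p[2] - p[3]) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up predictions: (uid, iid, true rating, estimated rating)
sample = [
    ("1", "10", 4.0, 3.5),
    ("1", "20", 2.0, 3.0),
    ("2", "10", 5.0, 4.5),
]
score = rmse(sample)
print("Test RMSE: {0:.4f}".format(score))
```

A reloaded model that reproduces the original model's test RMSE is a quick sanity check that saving and loading preserved the fitted parameters.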

---

**# P.S: NIDHI please let's do the cloud deployment!**

---

