<a href="https://colab.research.google.com/github/MatheusRocha0/Recommendation-Engine/blob/main/Recommendation-Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Science Project: Recommendation Engine
 
YouTube, Amazon, Facebook, and Instagram are some of the companies that use recommendation engines. Recommendation is one of the most widely used applications of Data Science. Some years ago, you needed to hire top statisticians and mathematicians to build a good system. Today, with open-source tools, anyone can build their own recommendation system.
 
# Recommendation Engine Types
 
There are three main types of recommender systems:
 
## Collaborative Filtering
 
This method collects and analyzes information about users' behavior, activities, or preferences, and predicts what a user will like based on their similarity to other users. A key advantage of collaborative filtering is that it does not rely on machine-analyzable content, so it can accurately recommend complex items such as movies without requiring an “understanding” of the item itself.
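
To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. The users, ratings, and most titles are invented for illustration, and the helper names `cosine_similarity` and `predict` are hypothetical. It is not the method used later in this notebook, only one simple way to turn "similar users" into a predicted rating.

```python
import math

# Toy ratings, invented for illustration: user -> {movie: rating}
ratings = {
    "ana": {"Toy Story": 5.0, "Heat": 1.0, "Jumanji": 4.0},
    "bob": {"Toy Story": 4.5, "Heat": 1.5, "Jumanji": 4.0, "Casino": 2.0},
    "cai": {"Toy Story": 1.0, "Heat": 5.0, "Casino": 4.5},
}


def cosine_similarity(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted += sim * their_ratings[movie]
        total += sim
    return weighted / total if total else None


# "ana" rates like "bob", so the prediction leans toward bob's rating of Casino
print(predict("ana", "Casino"))
```

Note that nothing in the code looks at what the movies are about; only the rating patterns matter.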
 
## Content-Based Filtering
 
These methods are based on a description of each item and a profile of the user’s preferences. In a content-based recommendation system, keywords describe the items, and a user profile records the types of items that user likes.
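
A minimal sketch of this idea, assuming each item is described by a small set of keywords. The keyword sets, the liked titles, and the helper names (`build_profile`, `jaccard`, `recommend`) are all hypothetical; Jaccard overlap is just one simple way to compare a profile with an item.

```python
# Hypothetical keyword descriptions for a few items
items = {
    "Toy Story": {"animation", "family", "comedy"},
    "Shrek": {"animation", "comedy", "fantasy"},
    "Heat": {"crime", "thriller"},
}


def build_profile(liked_titles):
    """User profile: the union of keywords from the items the user liked."""
    profile = set()
    for title in liked_titles:
        profile |= items[title]
    return profile


def jaccard(a, b):
    """Share of keywords the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked_titles):
    """Rank the items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked_titles)
    candidates = [t for t in items if t not in liked_titles]
    return sorted(candidates, key=lambda t: jaccard(profile, items[t]), reverse=True)


print(recommend(["Toy Story"]))
```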
 
## Hybrid Recommendation Systems
 
Recent research shows that combining collaborative and content-based recommendation can be more effective than either alone. Hybrid approaches can be implemented in several ways: by making content-based and collaborative predictions separately and then combining them; by adding content-based capabilities to a collaborative approach (or vice versa); or by unifying the approaches into one model.
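
The first option, predicting separately and then combining, can be sketched as a weighted average. The function name `hybrid_score` and the default weight of 0.7 are assumptions chosen for illustration; in practice the weight would be tuned on held-out data.

```python
def hybrid_score(collab_rating, content_rating, weight=0.7):
    """Blend two predicted ratings.

    `weight` is the share given to the collaborative prediction;
    the remainder goes to the content-based prediction.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * collab_rating + (1.0 - weight) * content_rating


# Equal weights: the blend is the plain average of the two predictions
print(hybrid_score(4.0, 3.0, weight=0.5))  # 3.5
```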
 
## Scikit Surprise
 
Surprise (the name stands for Simple Python RecommendatIon System Engine) is an easy-to-use Python scikit for recommender systems. It lets anyone build collaborative filtering recommendation engines in a few lines of Python.
 
# About the Project
 
## Fictional Context
 
**Disclaimer: the context I am going to present here is only for illustrative purposes. The CEO and the company exist only in my mind.**
 
All in One Place is a company that offers streaming services, but its users' average watch time is too low for the CEO's liking.
 
So he decided to hire me, a data science consultant, because he had heard from friends that Data Science is helping many companies. I suggested adding a Recommendation Engine to the platform, with the goal of increasing the average watch time.

After he agreed, I was given access to their customer database so I could start working on the project.
 
## Data
 
You can download the data I will be using in the project here: https://bit.ly/3qDOziX

# Libraries (Tools)

In [None]:
%pip install -q scikit-surprise

In [48]:
# Data manipulation
import pandas as pd
 
# Mathematics
import numpy as np
 
# Recommendation System
from surprise import Reader, Dataset, SVDpp, accuracy
from surprise.model_selection import train_test_split, cross_validate
 
# Save the model
import pickle
 
# Make API requests
import requests

## Importing data

In [63]:
movies = pd.read_csv("https://raw.githubusercontent.com/MatheusRocha0/Recommendation_Engine/main/movies.csv")
ratings = pd.read_csv("https://raw.githubusercontent.com/MatheusRocha0/Recommendation_Engine/main/ratings.csv")
 
movies.drop("genres", axis = 1, inplace = True)
ratings.drop("timestamp", axis = 1, inplace = True)
 
data = pd.merge(ratings, movies, on = "movieId")
data.head()

   userId  movieId  rating             title
0       1        1     4.0  Toy Story (1995)
1       5        1     4.0  Toy Story (1995)
2       7        1     4.5  Toy Story (1995)
3      15        1     2.5  Toy Story (1995)
4      17        1     4.5  Toy Story (1995)


# Data Cleaning
 
It is important to verify that the data is clean; if it is not, we need to perform what is called Data Cleaning. In this process we remove outliers, handle missing values, and so on.
 
This is necessary because noise in the data may hurt the model's performance.
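
One quick sanity check, sketched here on made-up values rather than on the notebook's data, is to confirm that every rating falls inside the scale the model expects. The bounds mirror the `rating_scale = (0.5, 5)` used in this notebook; the function name `out_of_range` is hypothetical.

```python
# Same bounds as the rating_scale passed to the Reader in this notebook
RATING_MIN, RATING_MAX = 0.5, 5.0


def out_of_range(ratings):
    """Return the ratings that fall outside the expected scale."""
    return [r for r in ratings if not (RATING_MIN <= r <= RATING_MAX)]


# Made-up values: 0.0 and 5.5 are outside the scale
print(out_of_range([4.0, 0.0, 5.5, 0.5]))  # [0.0, 5.5]
```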

## Missing Values

In [64]:
data.isnull().sum()

userId     0
movieId    0
rating     0
title      0
dtype: int64

There are no missing values in the data.

## Drop Duplicates

In [65]:
data.drop_duplicates(inplace = True)
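
Called without arguments, `drop_duplicates` removes only rows that are identical in every column. A user who rated the same movie twice with different ratings would survive it. The sketch below, on a made-up toy frame, shows one way to detect such pairs; averaging them is an assumption for illustration, not a step the notebook performs.

```python
import pandas as pd

# Toy data, invented for illustration: user 1 rated movie 10 twice
toy = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 10, 10],
    "rating": [4.0, 3.0, 5.0],
})

# The ratings differ, so no row is an exact duplicate and all three remain
exact = toy.drop_duplicates()

# Flag every row whose (userId, movieId) pair occurs more than once
conflicts = toy[toy.duplicated(subset=["userId", "movieId"], keep=False)]

# One possible resolution (an assumption): average the conflicting ratings
resolved = toy.groupby(["userId", "movieId"], as_index=False)["rating"].mean()

print(len(exact), len(conflicts), len(resolved))
```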

## Transforming the dataframe into Surprise Dataset

Surprise does not accept pandas DataFrames directly, so the data must be converted into a Surprise `Dataset`.

In [54]:
reader = Reader(rating_scale = (0.5, 5))
dataset = Dataset.load_from_df(data.drop("title", axis = 1), reader)
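
`Dataset.load_from_df` reads the frame positionally: the three columns must be user ID, item ID, and rating, in that order. Dropping `title` works here because the remaining columns already have that order. A slightly more defensive sketch, shown on a made-up one-row frame, selects the columns explicitly so the order does not depend on how the frame was built.

```python
import pandas as pd

# Toy frame with the columns deliberately out of order
toy = pd.DataFrame({
    "title": ["Toy Story (1995)"],
    "rating": [4.0],
    "movieId": [1],
    "userId": [1],
})

# Select by name so the order is always user, item, rating
ratings_frame = toy[["userId", "movieId", "rating"]]

print(list(ratings_frame.columns))
```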

## Splitting the data into Training and Testing sets

Splitting the data makes it possible to train the model on one part and evaluate it on the other.

In [55]:
train_set, test_set = train_test_split(dataset, test_size = .5)
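
Conceptually, the split shuffles the ratings and sets a fraction of them aside for testing. The plain-Python sketch below illustrates that idea on made-up rows; it is not Surprise's implementation, and `split_ratings` is a hypothetical helper.

```python
import random


def split_ratings(rows, test_size=0.5, seed=1):
    """Shuffle the rows and return (train, test) with `test_size` held out."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# 20 made-up (userId, movieId, rating) rows
rows = [(user, movie, 4.0) for user in range(1, 6) for movie in range(1, 5)]
train, test = split_ratings(rows)
print(len(train), len(test))  # 10 10
```

Surprise's `train_test_split` also accepts a `random_state` argument, which makes the split reproducible between runs.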

# Machine Learning Model
 
The model takes a user ID and an item (movie) ID as input and returns the rating it predicts the user will give that movie. It does so by learning from the rating patterns of other users with similar tastes.
 
There are many algorithms I could have used here, but this one (SVD++) performed best in my tests.
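
For reference, the prediction rule that the Surprise documentation gives for SVD++ is:

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T}\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u} y_j\right)
$$

Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$ and $q_i$ are the user and item latent factor vectors, $I_u$ is the set of items rated by user $u$, and the $y_j$ are item factors that capture implicit feedback (the fact that $u$ rated item $j$ at all, regardless of the value). The `lr_all` and `reg_all` arguments set the learning rate and the regularization strength for all of these parameters at once.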

In [None]:
engine = SVDpp(
    random_state = 1,
    n_epochs = 30,
    lr_all = .01,
    reg_all = .07
)
 
engine.fit(train_set)

## Evaluating the model
 
Now it is time to evaluate the model. In Data Science we use metrics: numeric values that summarize how well the model's predictions match the data.
 
Here I am using the Root Mean Squared Error (RMSE), a regression metric where smaller is better.
 
Keep in mind that the target variable is the rating column, whose values range from 0.5 to 5; this matters when interpreting the result.
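
RMSE is the square root of the average squared difference between actual and predicted ratings. A minimal sketch on made-up numbers (the helper `rmse` is written here for illustration and is separate from `surprise.accuracy.rmse`):

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length lists of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings: the errors are 0.5, 0, 1 and -1
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]
print(rmse(actual, predicted))  # 0.75
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.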

In [58]:
p = engine.test(test_set)
score = accuracy.rmse(p)

RMSE: 0.8795


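An RMSE of 0.8795 means the predictions miss by a little under one star (in the root-mean-square sense) on a 0.5 to 5 scale, which is good for this context.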

## Saving the model to build the API
 
But no model helps the business while it sits on a local machine. We must save the model to a file so that we can then create an API to serve requests.

In [59]:
# The context manager closes the file even if pickling fails
with open('model.pkl', 'wb') as file_obj:
    pickle.dump(engine, file_obj)
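
On the API side, the file is read back with `pickle.load`. The sketch below shows the full round trip with a plain dictionary standing in for the trained engine (an assumption, so the example runs without Surprise installed) and a temporary path instead of `model.pkl`.

```python
import os
import pickle
import tempfile

# Stand-in for the trained engine; any picklable object behaves the same way
model = {"name": "stand-in model", "factors": [0.1, 0.2]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Save the object to disk
with open(path, "wb") as file_obj:
    pickle.dump(model, file_obj)

# Load it back, as the API process would at startup
with open(path, "rb") as file_obj:
    restored = pickle.load(file_obj)

print(restored == model)  # True
```

Unpickling can run arbitrary code, so only load pickle files that come from a source you trust.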

## API Requests
 
Once I had the trained model, I built the API and deployed it to Heroku, a cloud platform that lets people deploy their apps (it offered a free tier at the time).
 
I created a function to make the requests easier.

In [67]:
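def api_request(sample):
    """Send rows of (userId, movieId) to the API and return its predictions."""
    # Serialize the rows as a JSON array of records
    payload = sample.to_json(orient = "records")

    url = "https://api-recommendation-engine.herokuapp.com/"
    headers = {"Content-type": "application/json"}

    r = requests.post(url = url, data = payload, headers = headers)
    r.raise_for_status()  # fail early on an HTTP error

    # The response is a list of records, so it maps directly to a DataFrame
    return pd.DataFrame(r.json())

# Pick one random (userId, movieId) pair and request its predicted rating
sample = data.drop(["rating", "title"], axis = 1).sample()
api_request(sample)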

   userId  movieId  user_rating
0     104      471     3.598539
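
For reference, `to_json(orient = "records")` produces a JSON array with one object per row, so the request body is easy to reproduce by hand. The sketch below builds an equivalent payload with the standard library and parses a response of the shape suggested by the output above. The response text is an assumption inferred from those columns; the actual API is not called.

```python
import json

# Equivalent of sample.to_json(orient="records") for a single row
request_rows = [{"userId": 104, "movieId": 471}]
payload = json.dumps(request_rows)

# Assumed response shape, inferred from the output columns above
response_text = '[{"userId": 104, "movieId": 471, "user_rating": 3.598539}]'
rows = json.loads(response_text)

# With several rows, the highest predicted rating is the top recommendation
best = max(rows, key=lambda row: row["user_rating"])
print(best["movieId"], best["user_rating"])
```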
