# Building Movie Recommender System

### Steps of the Project
 - Import Required Libraries
 - Import data
 - Visualize the data (__Optional__)
 - Create a Recommender Model
     - Prepare dataset (<i>K-fold cross validation for train-test-validation dataset splitting</i>)
 - Apply Recommender Algorithms
     - Popularity Recommender Model
     - Collaborative Filtering Model
     - Item Similarity Filtering Model
 - Get Top-K Recommendations for three models
 - Evaluate your models : RMSE (Root Mean Squared Error)
 - Get Confusion Matrix Results : Precision/Recall metrics
 - Report the results
     - Which model is the best fit for this dataset?
     - What are the top-k recommendations for each model?
     - Evaluation Results : Which model has the best performance for recommending?

### Import Required Libraries

---

Since you will use Python programming language to implement this project, Python ecosystem has many recommender system libraries that you can use. [Turi Create](https://github.com/apple/turicreate) is one of them and highly recommended library that you can easily use for this project.

In [3]:
import pandas as pd
import turicreate as tc

### Visualize Dataset

---

Sometimes to understand the big picture of the dataset, you may want to check some visualizations to decide which algorithm would fit to solve your problem.

In [4]:
ratingsCols = ['userID', 'movieID', 'rating']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=ratingsCols, usecols=range(3))

moviesCols = ['movieID', 'title']
movies = pd.read_csv('ml-100k/u.item', sep='|', names=moviesCols, usecols=range(2))



### Import Dataset

---

Movie Review [dataset](https://grouplens.org/datasets/movielens/100k/) will be used for this project. After you download the dataset, you can import to your project by using [Pandas](http://pandas.pydata.org/) Python Data Analysis library. For more info check [here](https://pandas.pydata.org/pandas-docs/stable/io.html).

In [5]:
analysis = pd.merge(movies, ratings)
analysis.head(10)

Unnamed: 0,movieID,title,userID,rating
0,1,Toy Story (1995),308,4
1,1,Toy Story (1995),287,5
2,1,Toy Story (1995),148,4
3,1,Toy Story (1995),280,4
4,1,Toy Story (1995),66,3
5,1,Toy Story (1995),5,4
6,1,Toy Story (1995),109,4
7,1,Toy Story (1995),181,3
8,1,Toy Story (1995),95,5
9,1,Toy Story (1995),268,3


### Create a Recommender Model

---
A recommender system allows you to build personalized recommendation systems to users. There are many methods to do this. In this project, you will need to use three of these methods which are __popularity-based recommender model__ , __factorization recommenders model__ , and __item similarity model__ . 

[Turi](https://github.com/apple/turicreate) has easy-to-implement recommender models. You can use Turi's recommender models for your dataset. For more information check [here](https://apple.github.io/turicreate/docs/api/turicreate.toolkits.recommender.html#creating-a-recommender)
<br>

The steps of building a recommender system are;
 - Do not forget to convert your dataset to [SFrame](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html#turicreate.SFrame) type. Also split the dataset for training, testing, and validation processes. [Write code here](#Convert-dataset-to-SFrame)
 - Create and Apply each model to your dataset. (In this assignment, you will use three different methods as listed above) [Write code here](#Create-and-Apply-Recommender-Algorithms).
 - Find top k recommendations. (Display top k=5 recommendations and check the scores of them.)[Write code here](#Get-Top-K-Recommendations-for-Three-Models)
 - Evaluate your model. (Check your model with confusion matrix metrics and find how accurate your model is to recommend movies to users.)[Write code here](#Evaluate-your-model)

### Convert dataset to SFrame

In [6]:
ratings = tc.SFrame.read_csv('ml-100k/u1.base', header=False, delimiter='\t', usecols=['X1','X2','X3'])
ratings = ratings.rename({'X1':'movie_id', 'X2':'user_id', 'X3':'rating'})

movies = data = tc.SFrame.read_csv('ml-100k/u.item', header=False, delimiter='|', usecols=['X1', 'X2'])
movies = movies.rename({'X1':'movie_id', 'X2':'title'})
analysis = movies.join(ratings)

ratingsTest = tc.SFrame.read_csv('ml-100k/u1.test', header=False, delimiter='\t', usecols=['X1','X2','X3'])
ratingsTest = ratingsTest.rename({'X1':'movie_id', 'X2':'user_id', 'X3':'rating'})
analysisTest = movies.join(ratingsTest)

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


### Create and Apply Recommender Algorithms

---
Implement three recommender algorithms;
 - __Popularity Recommender Model__,
 - __Factorization Recommenders Model__,
 - __Item Similarity Filtering Model__ .
 
 


In [7]:
ISF = tc.recommender.item_similarity_recommender.create(analysis, user_id='user_id', item_id='movie_id', target='rating')

Factor = tc.recommender.factorization_recommender.create(analysis, user_id='user_id', item_id='movie_id', target='rating')

POP = tc.recommender.popularity_recommender.create(analysis, user_id='user_id', item_id='movie_id', target='rating')






### Get Top-K Recommendations for Three Models

---

Test your recommender models by finding top k=5 movies. Write your outcomes about the result of testing. <i>What are the scores of each recommendations?</i> <i>How are they accurate?</i> <i>Which model test results are the highest one?</i>

In [8]:
IR = ISF.recommend()
FR = Factor.recommend()
PR = POP.recommend()

print "Item Similarity Filtering Model"
print IR.head(5)
print "Factorization Recommenders Model"
print FR.head(5)
print "Popularity Recommender Model"
print PR.head(5)


Item Similarity Filtering Model
+---------+----------+-----------------+------+
| user_id | movie_id |      score      | rank |
+---------+----------+-----------------+------+
|    1    |   551    | 0.0752465760739 |  1   |
|    1    |   474    | 0.0685495279788 |  2   |
|    1    |   666    | 0.0682896051955 |  3   |
|    1    |   774    | 0.0626353159275 |  4   |
|    1    |   846    | 0.0571808460176 |  5   |
+---------+----------+-----------------+------+
[5 rows x 4 columns]

Factorization Recommenders Model
+---------+----------+---------------+------+
| user_id | movie_id |     score     | rank |
+---------+----------+---------------+------+
|    1    |   304    | 4.63659710689 |  1   |
|    1    |   810    | 4.59536202691 |  2   |
|    1    |   565    | 4.43176933072 |  3   |
|    1    |   849    | 4.42326534978 |  4   |
|    1    |   351    | 4.35643856666 |  5   |
+---------+----------+---------------+------+
[5 rows x 4 columns]

Popularity Recommender Model
+---------+-----

### Evaluate your model

#### Step 1: Calculate RMSE Score for Three Models

---

**RMSE : Root Mean Squared Error**

Write outcomes about rmse scores for each recommender model. Compare results in your report.

In [9]:
IE = ISF.evaluate_rmse(analysisTest, target='rating')
FE = Factor.evaluate_rmse(analysisTest, target='rating')
PE = POP.evaluate_rmse(analysisTest, target='rating')

print "Similarity Error"
print IE
print "Factor Error"
print FE
print "Popularity Error"
print PE

Similarity Error
{'rmse_by_user': Columns:
	user_id	int
	count	int
	rmse	float

Rows: 1410

Data:
+---------+-------+----------------+
| user_id | count |      rmse      |
+---------+-------+----------------+
|   118   |   53  | 3.24815511648  |
|   1029  |   2   | 0.930541907482 |
|   435   |   48  | 4.22745878612  |
|   1517  |   2   | 3.11744251663  |
|   537   |   4   | 3.51436717264  |
|   526   |   24  | 3.87446498868  |
|   232   |   23  | 3.30854299714  |
|   310   |   21  | 3.80563802163  |
|    49   |   17  | 3.35636912147  |
|    13   |   50  |  3.5388918241  |
+---------+-------+----------------+
[1410 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'rmse_by_item': Columns:
	movie_id	int
	count	int
	rmse	float

Rows: 459

Data:
+----------+-------+---------------+
| movie_id | count |      rmse     |
+----------+-------+---------------+
|   118    |   29  | 4.72959598454 |
|  

#### Step 2: Report Confusion Matrix Metrics, Precision and Recall

---

Precision and Recall are two metrices to evaluate the performance of recommender model. Compare and report all three recommender models according to their precision and recall scores.

In [10]:
IEPR = ISF.evaluate_precision_recall(analysisTest)
FEPR = Factor.evaluate_precision_recall(analysisTest)
PEPR = POP.evaluate_precision_recall(analysisTest)

print "Similarity Error"
print IEPR
print "Factor Error"
print FEPR
print "Popularity Error"
print PEPR

Similarity Error
{'precision_recall_overall': Columns:
	cutoff	int
	precision	float
	recall	float

Rows: 18

Data:
+--------+-------------------+-------------------+
| cutoff |     precision     |       recall      |
+--------+-------------------+-------------------+
|   1    | 0.000709219858156 | 7.16383695107e-06 |
|   2    |  0.00106382978723 | 0.000137595075233 |
|   3    |  0.00189125295508 | 0.000221673304876 |
|   4    |  0.00177304964539 | 0.000240276171797 |
|   5    |  0.00212765957447 | 0.000295347553613 |
|   6    |  0.00271867612293 | 0.000723083657904 |
|   7    |  0.00293819655522 |  0.00134629569841 |
|   8    |  0.0031914893617  |  0.00214085072577 |
|   9    |  0.00346729708432 |  0.0023720203339  |
|   10   |  0.00418439716312 |  0.00311387849983 |
+--------+-------------------+-------------------+
[18 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns., 'precision_recall_by

### Final Report

---

Summary the whole process. What did you understand? What kind of actions that may increase the accuracy of recommender models? Write some suggestions.

## Submission

---

You need to submit your source code as .py file or .ipynb (ipython notebook). The report of your study should be in .pdf format. Take snapshots of the results that you get in each step of the project and explain your outcomes below of these snap.