# Introduction and Purpose
    
   In this spotlight, I introduce [turicreate](https://github.com/apple/turicreate), a flexible and versatile machine learning framework created by Apple. The framework helps with many ML tasks, such as Image Classification, Drawing Classification, Sound Classification, Object Detection, One Shot Object Detection, Activity Classification, Clustering, and Text Classification. Specifically, in the context of Information Storage and Retrieval, I will walk you through the Recommender System Toolkit offered by this framework. Finally, we will build a recommender system on top of it.

   We will work in two broad stages.
   1. First, we explore the API and look at the various types of recommendation systems and other associated functionality provided by the framework.
   2. Second, we put that understanding into practice by using the framework to create a Recommender System for the popular [MovieLens Dataset](https://grouplens.org/datasets/movielens/20m/).

   Without further ado, let's dive into it! 😀😀

# Installation and Initial Setup

The following code walks through the installation and initial setup, including the GPU configuration.

To install, activate your preferred conda environment and run:

$ (preferred conda env) pip install -U turicreate

Once the package is installed, we can import turicreate and configure it to use all available GPUs. The following code demonstrates the setup and configuration.

In [113]:
import turicreate as tc

"""
There are three configurations to use with turicreate:

1. Using all the GPU's ---- tc.config.set_num_gpus(-1)
2. Using only 1 GPU    ---- tc.config.set_num_gpus(1)
3. Using the CPU       ---- tc.config.set_num_gpus(0)

We will be using all the GPU's configuration in this demonstration.
"""

tc.config.set_num_gpus(-1)

print("turicreate installed and imported")
print("Config: Using all the gpu's with turicreate")

turicreate installed and imported
Config: Using all the gpu's with turicreate


# Exploring the Recommender Library

The Turi Create Recommender Toolkit provides a unified interface to train a variety of recommender models and use them to make recommendations.

Recommender models can be created using turicreate.recommender.create() or loaded from a previously saved model using turicreate.load_model(). The input data must be an SFrame with a column containing user ids, a column containing item ids, and optionally a column containing target values such as movie ratings, etc. When a target is not provided (as is the case in implicit feedback settings), then a collaborative filtering model based on item-item similarity is returned. For more details, please see the documentation for turicreate.recommender.create().

A Recommender Model object can perform key tasks including predict, recommend, evaluate, and save. 

In the next few sub-sections, we will explore the various parts of this toolkit in depth:

1. Creating a Recommender
2. Item Similarity Models
3. Item Content Recommenders
4. Factorization Recommenders
5. Factorization Recommenders for Ranking
6. Popularity-based Recommenders
7. Utilities

## 1. Creating a Recommender

**turicreate.recommender.create** is a unified interface for training recommender models. Based on simple characteristics of the data, a type of model is selected and trained. The trained model can be used to predict ratings and make recommendations. However, to use specific options of a desired model, use the create function of the corresponding model.

Given below are some examples of the usages:

**A. BASIC USAGE**

The following code demonstrates how to create a model given basic user-item observation data (without ratings) and then make recommendations. An ItemSimilarityRecommender is created implicitly by the toolkit.

In [114]:
sframe = tc.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'], 'item_id': ['a', 'b', 'd', 'a', 'c', 'a', 'c', 'd']})

model = tc.recommender.create(sframe)

recommendations = model.recommend()

In [117]:
print("Derived Results using our Model:")

recommendations

Derived Results using our Model:


user_id,item_id,score,rank
0,c,0.3333333333333333,1
1,d,0.5,1
1,b,0.1666666567325592,2
2,b,0.2777777711550395,1
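
To make these scores less opaque, here is a minimal plain-Python sketch of item-item similarity. It assumes Jaccard similarity over the sets of users who interacted with each item, and it scores a candidate by its mean similarity to the items the user already has. On this toy data those assumptions appear to reproduce the scores above, but the helper names (`jaccard`, `recommend`) are hypothetical and this is an illustration, not Turi Create's implementation.

```python
from collections import defaultdict

# Same toy observations as the SFrame above: (user_id, item_id) pairs.
observations = [
    ("0", "a"), ("0", "b"), ("0", "d"),
    ("1", "a"), ("1", "c"),
    ("2", "a"), ("2", "c"), ("2", "d"),
]

users_by_item = defaultdict(set)
items_by_user = defaultdict(set)
for user, item in observations:
    users_by_item[item].add(user)
    items_by_user[user].add(item)


def jaccard(item_x, item_y):
    """Jaccard similarity of the sets of users who interacted with each item."""
    users_x, users_y = users_by_item[item_x], users_by_item[item_y]
    return len(users_x & users_y) / len(users_x | users_y)


def recommend(user):
    """Score each unseen item by its mean similarity to the user's items."""
    seen = items_by_user[user]
    scores = {
        candidate: sum(jaccard(candidate, s) for s in seen) / len(seen)
        for candidate in users_by_item
        if candidate not in seen
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


for user in sorted(items_by_user):
    print(user, recommend(user))
```

For user 1, for example, item d is scored as the mean of its similarity to a (2/3) and to c (1/3), which gives 0.5.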


**B. CREATING A MODEL FOR RATINGS DATA**

In this case we are given the user-item observation data along with the target 'ratings' data. This trains a FactorizationRecommender that can predict target ratings (again the toolkit uses the appropriate algorithm implicitly if nothing is specified). The following code demonstrates this use-case scenario.

In [41]:
sframe2 = tc.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'], 'item_id': ['a', 'b', 'd', 'a', 'c', 'a', 'c', 'd'], 'rating': [1, 4, 3, 5, 4, 2, 4, 3]})

model2 = tc.recommender.create(sframe2, target="rating", ranking = False)

recommendations2 = model2.recommend()

In [53]:
print("Derived Results using our Model:")

recommendations2

Derived Results using our Model:


user_id,item_id,score,rank
0,c,2.573836714029312,1
1,b,6.529737710952759,1
1,d,5.212547659873962,2
2,b,4.837176163680852,1


## 2. Item Similarity Models

**turicreate.recommender.item_similarity_recommender.create** creates a recommender that uses item-item similarities based on users in common. 

One use case, as we saw in the previous section, is input given as plain user-item pairs without target ratings, where the framework implicitly uses an item similarity model.
Some other use cases are given below:

**A. WHEN TARGET RATINGS ARE PROVIDED IN THE DATASET**

When a target is available, we can specify the desired similarity. For example we may choose to use a cosine similarity, and use it to make predictions or recommendations.

In [48]:
sf = tc.SFrame({'user_id': ['0', '0', '0', '1', '1', '2', '2', '2'], 'item_id': ['a', 'b', 'c', 'a', 'b', 'b', 'c', 'd'], 'rating': [1, 3, 2, 5, 4, 1, 4, 3]})

m = tc.item_similarity_recommender.create(sf, target="rating", similarity_type='cosine')

In [52]:
print("Derived Results using our Model:")

m.predict(sf)

m.recommend()

Derived Results using our Model:


user_id,item_id,score,rank
0,d,0.7924009362856547,1
1,c,1.0963225066661837,1
1,d,0.3922322988510132,2
2,a,0.4118128418922424,1
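
As a rough illustration of what a cosine similarity between items means here, the sketch below treats each item as a sparse vector of the ratings it received and compares two such vectors. It covers only the pairwise similarity; how Turi Create combines similarities into the recommendation scores shown above is not reproduced, and the `cosine` helper is a hypothetical name.

```python
import math

# Ratings from the SFrame above, keyed by item and then by user.
ratings = {
    "a": {"0": 1, "1": 5},
    "b": {"0": 3, "1": 4, "2": 1},
    "c": {"0": 2, "2": 4},
    "d": {"2": 3},
}


def cosine(item_x, item_y):
    """Cosine similarity between two items' sparse rating vectors."""
    vx, vy = ratings[item_x], ratings[item_y]
    # Only users who rated both items contribute to the dot product.
    dot = sum(vx[u] * vy[u] for u in vx.keys() & vy.keys())
    norm_x = math.sqrt(sum(r * r for r in vx.values()))
    norm_y = math.sqrt(sum(r * r for r in vy.values()))
    return dot / (norm_x * norm_y)


for x, y in [("a", "b"), ("a", "d"), ("c", "d")]:
    print(x, y, round(cosine(x, y), 4))
```

Items a and d share no users, so their similarity is zero, while a and b are rated by the same two users and end up close together.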


**B. INCORPORATING PRE-DEFINED SIMILAR ITEMS**

For item similarity models, we may provide a user-specified nearest neighbors graph through the keyword argument nearest_items. If provided, these item similarity scores are used for recommendations.

For example, suppose you first create an ItemSimilarityRecommender and call get_similar_items on it. In the code below, the item similarities computed for model m (from the previous example) are stored in nn2 and used to create a new recommender object, m2_nn2. The nearest-items SFrame could also have been built by some other means; either way, we then make recommendations via m2_nn2.recommend().

In [118]:
sf2 = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "b", "d", "a", "d", "a", "b", "d"]})

m2 = tc.item_similarity_recommender.create(sf2)

nn2 = m.get_similar_items()

m2_nn2 = tc.item_similarity_recommender.create(sf, nearest_items=nn2)

In [120]:
print("Derived Results: ")

m2_nn2.predict(sf)

m2_nn2.recommend()

Derived Results: 


user_id,item_id,score,rank
0,d,0.6666666666666666,1
1,c,0.5,1
1,d,0.5,2
2,a,0.6666666666666666,1


## 3. Item Content Recommender

**turicreate.recommender.item_content_recommender.create** creates a content-based recommender model in which the similarity between the items recommended is determined by the content of those items rather than learned from user interaction data.

The similarity score between two items is calculated by first computing the similarity between the item data for each column, then taking a weighted average of the per-column similarities to get the final similarity. The recommendations are generated according to the average similarity of a candidate item to all the items in a user’s set of rated items.

Given below is an example to demonstrate its usage:

In [121]:

item_data = tc.SFrame({"my_item_id" : range(4), "data_1" : [ [1, 0], [1, 0], [0, 1], [0.5, 0.5] ], "data_2" : [ [0, 1], [1, 0], [0, 1], [0.5, 0.5] ] })

mod = tc.recommender.item_content_recommender.create(item_data, "my_item_id")

Applying transform:
Class             : AutoVectorizer

Model Fields
------------
Features          : ['data_1', 'data_2']
Excluded Features : ['my_item_id']

Column  Type   Interpretation  Transforms  Output Type
------  -----  --------------  ----------  -----------
data_1  array  vector          None        array      
data_2  array  vector          None        array      


Defaulting to brute force instead of ball tree because there are multiple distance components.


In [122]:
print("Derived Results 1 from Item Content Recommender: ")

mod.recommend_from_interactions([0])

Derived Results 1 from Item Content Recommender: 


my_item_id,score,rank
3,0.7071067690849304,1
1,0.5,2
2,0.5,3


In [67]:
print("Derived Results 2 from Item Content Recommender: ")


mod.recommend_from_interactions([0, 1])

Derived Results 2 from Item Content Recommender: 


my_item_id,score,rank
3,0.7071067690849304,1
2,0.25,2
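
The scoring rule described above can be sketched in plain Python. This version assumes cosine similarity per column and equal column weights; with those assumptions it appears to match the scores printed above for this toy data. The function names are hypothetical and the sketch is not the library's code.

```python
import math

# Item features copied from item_data above: one vector per column per item.
item_features = {
    0: {"data_1": [1, 0], "data_2": [0, 1]},
    1: {"data_1": [1, 0], "data_2": [1, 0]},
    2: {"data_1": [0, 1], "data_2": [0, 1]},
    3: {"data_1": [0.5, 0.5], "data_2": [0.5, 0.5]},
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def item_similarity(x, y, weights=None):
    """Weighted average of per-column similarities (equal weights by default)."""
    columns = list(item_features[x])
    weights = weights or {c: 1.0 for c in columns}
    total = sum(weights[c] for c in columns)
    weighted = sum(
        weights[c] * cosine(item_features[x][c], item_features[y][c])
        for c in columns
    )
    return weighted / total


def recommend_from_interactions(seen):
    """Rank unseen items by their mean similarity to the items in `seen`."""
    scores = {
        cand: sum(item_similarity(cand, s) for s in seen) / len(seen)
        for cand in item_features
        if cand not in seen
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(recommend_from_interactions([0]))
print(recommend_from_interactions([0, 1]))
```

Item 2 drops to 0.25 in the second call because it is similar to item 0 (0.5) but has nothing in common with item 1 (0.0).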


## 4. Factorization Recommender

**turicreate.recommender.factorization_recommender.create** creates a FactorizationRecommender that learns latent factors for each user and item and uses them to make rating predictions. This includes both standard matrix factorization and factorization machine models (when side data is available for users and/or items).

Given below is an example to demonstrate its usage:

**A. BASIC USAGE**

The following code demonstrates how to create a model given basic user-item observation data (with ratings) and then make rating predictions. This doesn't include any side features.

In [123]:
s_frame = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"], 'rating': [1, 3, 2, 5, 4, 1, 4, 3]})

mod1 = tc.factorization_recommender.create(s_frame, target='rating')

In [124]:
print("Derived Results from Factorization Recommender Model: ")

mod1.predict(s_frame)

Derived Results from Factorization Recommender Model: 


dtype: float
Rows: 8
[1.0495356321334839, 2.9574617743492126, 2.0009122490882874, 4.896780729293823, 4.0393983125686646, 1.0060159862041473, 3.9454184472560883, 3.018059005960822]

In [125]:
mod1.recommend()

user_id,item_id,score,rank
0,d,2.5537068247795105,1
1,c,6.285333156585693,1
1,d,5.621438324451447,2
2,a,2.462188631296158,1
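
To give a feel for what "learning latent factors" involves, here is a small matrix factorization sketch trained with stochastic gradient descent in plain Python. Everything in it is an assumption made for illustration (the number of factors, learning rate, regularization, number of epochs, and the bias terms); Turi Create's solver and defaults differ, so the numbers will not match the output above.

```python
import math
import random

# (user, item, rating) triples from s_frame above.
ratings = [
    ("0", "a", 1), ("0", "b", 3), ("0", "c", 2),
    ("1", "a", 5), ("1", "b", 4),
    ("2", "b", 1), ("2", "c", 4), ("2", "d", 3),
]

# Illustrative hyperparameters, not Turi Create defaults.
NUM_FACTORS, LEARNING_RATE, REGULARIZATION, EPOCHS = 2, 0.05, 0.01, 200
rng = random.Random(0)

global_mean = sum(r for _, _, r in ratings) / len(ratings)
user_bias = {u: 0.0 for u, _, _ in ratings}
item_bias = {i: 0.0 for _, i, _ in ratings}
user_factors = {u: [rng.gauss(0, 0.1) for _ in range(NUM_FACTORS)] for u in user_bias}
item_factors = {i: [rng.gauss(0, 0.1) for _ in range(NUM_FACTORS)] for i in item_bias}


def predict(user, item):
    """Global mean + user bias + item bias + dot product of latent factors."""
    interaction = sum(p * q for p, q in zip(user_factors[user], item_factors[item]))
    return global_mean + user_bias[user] + item_bias[item] + interaction


def rmse():
    squared = sum((predict(u, i) - r) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


initial_rmse = rmse()
for _ in range(EPOCHS):
    for user, item, rating in ratings:
        error = rating - predict(user, item)
        user_bias[user] += LEARNING_RATE * (error - REGULARIZATION * user_bias[user])
        item_bias[item] += LEARNING_RATE * (error - REGULARIZATION * item_bias[item])
        for f in range(NUM_FACTORS):
            p, q = user_factors[user][f], item_factors[item][f]
            user_factors[user][f] += LEARNING_RATE * (error * q - REGULARIZATION * p)
            item_factors[item][f] += LEARNING_RATE * (error * p - REGULARIZATION * q)
final_rmse = rmse()

print(f"training RMSE: {initial_rmse:.3f} -> {final_rmse:.3f}")
print("predicted rating for user 0, item d:", round(predict("0", "d"), 3))
```

Once the factors are learned, a score can be produced for any (user, item) pair, including pairs that never appeared in the training data, which is how unseen items get ranked.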


**B. INCLUDING SIDE FEATURES**

The following code demonstrates how to create a model given basic user-item observation data with ratings as well as side features.

In [76]:
user_info = tc.SFrame({'user_id': ["0", "1", "2"], 'name': ["Alice", "Bob", "Charlie"], 'numeric_feature': [0.1, 12, 22]})

item_info = tc.SFrame({'item_id': ["a", "b", "c", "d"], 'name': ["item1", "item2", "item3", "item4"], 'dict_feature': [{'a' : 23}, {'a' : 13}, {'b' : 1}, {'a' : 23, 'b' : 32}]})

mod2 = tc.factorization_recommender.create(sf, target='rating', user_data=user_info, item_data=item_info)

In [126]:
print("Derived Results from Factorization Recommender Model with the inclusion of Side Features: ")

mod2.recommend()

Derived Results from Factorization Recommender Model with the inclusion of Side Features: 


user_id,item_id,score,rank
0,d,2.012446575362882,1
1,d,3.954316022348414,1
1,c,3.940909242590052,2
2,a,5.884604244074063,1


**C. USING THE ALTERNATING LEAST SQUARES (ALS) SOLVER**

The factorization model can also be solved using alternating least squares (ALS) as a solver option. This solver does not support side columns or other similar features.

In [79]:
mod3 = tc.factorization_recommender.create(sf, target='rating',
                                                            solver = 'als')

In [80]:
print("Derived Results Using the ALS solver ")

mod3.recommend()

Derived Results Using the ALS solver 


user_id,item_id,score,rank
0,d,2.8185904175043106,1
1,c,3.579881429672241,1
1,d,2.8854209780693054,2
2,a,4.527128100395203,1


## 5. Factorization Recommender for Ranking

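**turicreate.recommender.ranking_factorization_recommender.create** creates a RankingFactorizationRecommender that learns latent factors for each user and item and uses them to rank recommended items according to the likelihood of observing those (user, item) pairs. It can be used both with implicit feedback data and with explicit ratings when ranking quality is the goal.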

Given below are some examples to demonstrate its usage:

**A. BASIC USAGE**

When given just user and item pairs, one can create a RankingFactorizationRecommender as follows.

In [127]:
sf = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"],'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"]})

m = tc.recommender.ranking_factorization_recommender.create(sf)

In [128]:
print("Derived Results 1 Using Ranking Factorization Recommender: ")

m.recommend()

Derived Results 1 Using Ranking Factorization Recommender: 


user_id,item_id,score,rank
0,d,0.2796198230228643,1
1,c,0.3098294743621465,1
1,d,0.0487286778232128,2
2,a,0.2747066955493962,1


When a target column is present, one can include this to try and recommend items that are rated highly.

In [129]:
sf = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"], 'rating': [1, 3, 2, 5, 4, 1, 4, 3]})

m = tc.recommender.ranking_factorization_recommender.create(sf, target='rating')

In [130]:
print("Derived Results 2 Using Ranking Factorization Recommender: ")

m.recommend()

Derived Results 2 Using Ranking Factorization Recommender: 


user_id,item_id,score,rank
0,d,1.3748019933700562,1
1,c,3.899396225810051,1
1,d,3.1445026993751526,2
2,a,2.4954473823308945,1


**B. INCLUDING SIDE FEATURES**

For incorporating side features, the following code snippet could be useful:

In [131]:
user_info = tc.SFrame({'user_id': ["0", "1", "2"], 'name': ["Alice", "Bob", "Charlie"], 'numeric_feature': [0.1, 12, 22]})

item_info = tc.SFrame({'item_id': ["a", "b", "c", "d"], 'name': ["item1", "item2", "item3", "item4"],'dict_feature': [{'a' : 23}, {'a' : 13}, {'b' : 1}, {'a' : 23, 'b' : 32}]})

m2 = tc.recommender.ranking_factorization_recommender.create(sf, target='rating', user_data=user_info, item_data=item_info)

In [132]:
print("Derived Results Using Ranking Factorization Recommender with Inclusion of Side Features: ")

m2.recommend()

Derived Results Using Ranking Factorization Recommender with Inclusion of Side Features: 


user_id,item_id,score,rank
0,d,0.4093496439219517,1
1,c,0.5514636122470558,1
1,d,-0.067732938751055,2
2,a,0.0952094020395761,1


**C. CUSTOMIZING RANKING REGULARIZATION**

This creates a model that pushes predicted ratings of unobserved user-item pairs toward 1 or below.

The example below describes its usage in this scenario:

In [133]:
m3 = tc.recommender.ranking_factorization_recommender.create(sf, target='rating', ranking_regularization = 0.1, unobserved_rating_value = 1)

In [100]:
print("Derived Results: ")

m3.recommend()

Derived Results: 


user_id,item_id,score,rank
0,d,1.933268317952752,1
1,c,4.923643589019775,1
1,d,4.364913992583752,2
2,a,2.358806610107422,1


## 6. Popularity Based Recommender

**turicreate.recommender.popularity_recommender.create** creates a model that makes recommendations using item popularity. When no target column is provided, the popularity is determined by the number of observations involving each item. When a target is provided, popularity is computed using the item’s mean target value. When the target column contains ratings, for example, the model computes the mean rating for each item and uses this to rank items for recommendations.

Given below is an example to demonstrate its usage:

In [134]:
sf = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "b", "c", "a", "b", "b", "c", "d"], 'rating': [1, 3, 2, 5, 4, 1, 4, 3]})

m = tc.popularity_recommender.create(sf, target='rating')

In [135]:
print("Derived Results Using Popularity Based Recommender: ")

m.recommend()

Derived Results Using Popularity Based Recommender: 


user_id,item_id,score,rank
0,d,3.0,1
1,c,3.0,1
1,d,3.0,2
2,a,3.0,1
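
The rule described above (rank items by their mean rating and skip items the user has already rated) is simple enough to sketch directly. The plain-Python version below is only an illustration with hypothetical names, but on this toy data it yields the same 3.0 scores as the output above.

```python
from collections import defaultdict

# (user, item, rating) triples from the SFrame above.
ratings = [
    ("0", "a", 1), ("0", "b", 3), ("0", "c", 2),
    ("1", "a", 5), ("1", "b", 4),
    ("2", "b", 1), ("2", "c", 4), ("2", "d", 3),
]

totals = defaultdict(float)
counts = defaultdict(int)
seen = defaultdict(set)
for user, item, rating in ratings:
    totals[item] += rating
    counts[item] += 1
    seen[user].add(item)

# Popularity of an item = mean of the ratings it received.
mean_rating = {item: totals[item] / counts[item] for item in totals}


def recommend(user):
    """Return unseen items ordered by mean rating, highest first."""
    unseen = [item for item in mean_rating if item not in seen[user]]
    return sorted(
        ((item, mean_rating[item]) for item in unseen),
        key=lambda pair: pair[1],
        reverse=True,
    )


print(mean_rating)
for user in sorted(seen):
    print(user, recommend(user))
```

Items a, c, and d all average 3.0, which explains the identical scores. Item b averages about 2.67 but never appears, because every user has already rated it.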


## 7. Utilities and Comparison of Models (Evaluation)

**turicreate.recommender.util.compare_models** compares the prediction or recommendation performance of recommender models on a common test dataset.

Models that are trained to predict ratings are compared separately from models that are trained without target ratings. The ratings prediction models are compared on root-mean-squared error, and the rest are compared on precision-recall.

Given below is an example to demonstrate its usage:

If you have created two ItemSimilarityRecommenders m1 and m2 and have an SFrame test_data, then we may compare the performance of the two models on test data using:

In [136]:
train_data = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "c", "e", "b", "f", "b", "c", "d"]})

test_data = tc.SFrame({'user_id': ["0", "0", "1", "1", "1", "2", "2"], 'item_id': ["b", "d", "a", "c", "e", "a", "e"]})

m1 = tc.item_similarity_recommender.create(train_data)

m2 = tc.item_similarity_recommender.create(train_data, only_top_k=1)

tc.recommender.util.compare_models(test_data, [m1, m2], model_names=["m1", "m2"])

PROGRESS: Evaluate model m1

Precision and recall summary statistics by cutoff
+--------+--------------------+--------------------+
| cutoff |   mean_precision   |    mean_recall     |
+--------+--------------------+--------------------+
|   1    | 0.6666666666666666 | 0.3333333333333333 |
|   2    | 0.8333333333333334 | 0.7777777777777778 |
|   3    | 0.6666666666666666 | 0.8888888888888888 |
|   4    | 0.6944444444444443 |        1.0         |
|   5    | 0.6944444444444443 |        1.0         |
|   6    | 0.6944444444444443 |        1.0         |
|   7    | 0.6944444444444443 |        1.0         |
|   8    | 0.6944444444444443 |        1.0         |
|   9    | 0.6944444444444443 |        1.0         |
|   10   | 0.6944444444444443 |        1.0         |
+--------+--------------------+--------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model m2

Precision and recall summary statistics by cutoff
+--------+--------------------+--------------------+
| cutoff |   mean_precision   |    mean_recall     |
+--------+--------------------+-------------------

[{'precision_recall_by_user': Columns:
  	user_id	str
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 54
  
  Data:
  +---------+--------+--------------------+--------+-------+
  | user_id | cutoff |     precision      | recall | count |
  +---------+--------+--------------------+--------+-------+
  |    0    |   1    |        1.0         |  0.5   |   2   |
  |    0    |   2    |        1.0         |  1.0   |   2   |
  |    0    |   3    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   4    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   5    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   6    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   7    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   8    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   9    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   10   | 0.6666666666666666 |  1.0   |   2   |
  +---------+--------+--------------------+--------+-------+
  [54

The evaluation metric is automatically set to ‘precision_recall’, and the evaluation will be based on recommendations that exclude items seen in the training data.
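
For readers unfamiliar with the metric, here is a small sketch of precision and recall at a cutoff. It divides precision by the number of items actually returned rather than by the cutoff itself; that choice is an assumption, made because it is consistent with the per-user table above, where precision stays at 0.667 for cutoffs larger than the number of recommendable items. The ranked list used here is hypothetical.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Return (precision, recall) for the top-k entries of a ranked list."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


# Hypothetical ranking for user "0": items "b" and "d" are the user's test items,
# and "f" is the only other item the user has not seen in train_data.
ranked = ["b", "d", "f"]
relevant = {"b", "d"}

for k in (1, 2, 3, 10):
    print(k, precision_recall_at_k(ranked, relevant, k))
```

Many references define precision at k with k in the denominator, so it is worth checking which convention a tool uses before comparing numbers across libraries.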

If we want to evaluate on the original training set:

In [106]:
tc.recommender.util.compare_models(train_data, [m1, m2], exclude_known_for_precision_recall=False)



PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+--------------------+---------------------+
| cutoff |   mean_precision   |     mean_recall     |
+--------+--------------------+---------------------+
|   1    |        1.0         | 0.38888888888888884 |
|   2    |        1.0         |  0.7777777777777777 |
|   3    | 0.8888888888888888 |         1.0         |
|   4    | 0.6666666666666666 |         1.0         |
|   5    | 0.5333333333333333 |         1.0         |
|   6    | 0.4444444444444444 |         1.0         |
|   7    | 0.4444444444444444 |         1.0         |
|   8    | 0.4444444444444444 |         1.0         |
|   9    | 0.4444444444444444 |         1.0         |
|   10   | 0.4444444444444444 |         1.0         |
+--------+--------------------+---------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+--------------------+---------------------+
| cutoff 

[{'precision_recall_by_user': Columns:
  	user_id	str
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 54
  
  Data:
  +---------+--------+-----------+--------------------+-------+
  | user_id | cutoff | precision |       recall       | count |
  +---------+--------+-----------+--------------------+-------+
  |    0    |   1    |    1.0    | 0.3333333333333333 |   3   |
  |    0    |   2    |    1.0    | 0.6666666666666666 |   3   |
  |    0    |   3    |    1.0    |        1.0         |   3   |
  |    0    |   4    |    0.75   |        1.0         |   3   |
  |    0    |   5    |    0.6    |        1.0         |   3   |
  |    0    |   6    |    0.5    |        1.0         |   3   |
  |    0    |   7    |    0.5    |        1.0         |   3   |
  |    0    |   8    |    0.5    |        1.0         |   3   |
  |    0    |   9    |    0.5    |        1.0         |   3   |
  |    0    |   10   |    0.5    |        1.0         |   3   |
  +---------+--------+-----

Suppose we have four models, two trained with a target rating column, and the other two trained without a target. By default, the models are put into two different groups with “rmse”, and “precision-recall” as the evaluation metric respectively.
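
For the rating-prediction group, the metric is root-mean-squared error. A minimal sketch of that computation follows; the actual ratings are the ones used in test_data2 in the next cell, while the predicted values are made up purely for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = [3, 5, 4, 4, 3, 5, 2]      # ratings from test_data2
predicted = [4.0] * len(actual)     # hypothetical model that always predicts 4.0

print(round(rmse(actual, predicted), 4))
```

Lower is better: a model that predicted every test rating exactly would score 0.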

In [107]:
train_data2 = tc.SFrame({'user_id': ["0", "0", "0", "1", "1", "2", "2", "2"], 'item_id': ["a", "c", "e", "b", "f", "b", "c", "d"], 'rating': [1, 3, 4, 5, 3, 4, 2, 5]})

test_data2 = tc.SFrame({'user_id': ["0", "0", "1", "1", "1", "2", "2"], 'item_id': ["b", "d", "a", "c", "e", "a", "e"], 'rating': [3, 5, 4, 4, 3, 5, 2]})

m3 = tc.factorization_recommender.create(train_data2, target='rating')

m4 = tc.factorization_recommender.create(train_data2, target='rating')

tc.recommender.util.compare_models(test_data2, [m3, m4])

PROGRESS: Evaluate model M0



Precision and recall summary statistics by cutoff
+--------+--------------------+--------------------+
| cutoff |   mean_precision   |    mean_recall     |
+--------+--------------------+--------------------+
|   1    | 0.6666666666666666 | 0.3333333333333333 |
|   2    | 0.6666666666666666 | 0.611111111111111  |
|   3    | 0.6666666666666666 | 0.8888888888888888 |
|   4    | 0.6944444444444443 |        1.0         |
|   5    | 0.6944444444444443 |        1.0         |
|   6    | 0.6944444444444443 |        1.0         |
|   7    | 0.6944444444444443 |        1.0         |
|   8    | 0.6944444444444443 |        1.0         |
|   9    | 0.6944444444444443 |        1.0         |
|   10   | 0.6944444444444443 |        1.0         |
+--------+--------------------+--------------------+
[10 rows x 3 columns]


Overall RMSE: 2.400589057111826

Per User RMSE (best)
+---------+-------------------+-------+
| user_id |        rmse       | count |
+---------+-------------------+-------+
|    0   

[{'precision_recall_by_user': Columns:
  	user_id	str
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 54
  
  Data:
  +---------+--------+--------------------+--------+-------+
  | user_id | cutoff |     precision      | recall | count |
  +---------+--------+--------------------+--------+-------+
  |    0    |   1    |        1.0         |  0.5   |   2   |
  |    0    |   2    |        1.0         |  1.0   |   2   |
  |    0    |   3    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   4    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   5    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   6    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   7    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   8    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   9    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   10   | 0.6666666666666666 |  1.0   |   2   |
  +---------+--------+--------------------+--------+-------+
  [54

To compare all four models using the same ‘precision_recall’ metric, we can do:

In [137]:
tc.recommender.util.compare_models(test_data2, [m1, m2, m3, m4], metric='precision_recall')

PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+--------------------+--------------------+
| cutoff |   mean_precision   |    mean_recall     |
+--------+--------------------+--------------------+
|   1    | 0.6666666666666666 | 0.3333333333333333 |
|   2    | 0.8333333333333334 | 0.7777777777777778 |
|   3    | 0.6666666666666666 | 0.8888888888888888 |
|   4    | 0.6944444444444443 |        1.0         |
|   5    | 0.6944444444444443 |        1.0         |
|   6    | 0.6944444444444443 |        1.0         |
|   7    | 0.6944444444444443 |        1.0         |
|   8    | 0.6944444444444443 |        1.0         |
|   9    | 0.6944444444444443 |        1.0         |
|   10   | 0.6944444444444443 |        1.0         |
+--------+--------------------+--------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+--------------------+--------------------+
| cutoff |   mean_precis

[{'precision_recall_by_user': Columns:
  	user_id	str
  	cutoff	int
  	precision	float
  	recall	float
  	count	int
  
  Rows: 54
  
  Data:
  +---------+--------+--------------------+--------+-------+
  | user_id | cutoff |     precision      | recall | count |
  +---------+--------+--------------------+--------+-------+
  |    0    |   1    |        1.0         |  0.5   |   2   |
  |    0    |   2    |        1.0         |  1.0   |   2   |
  |    0    |   3    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   4    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   5    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   6    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   7    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   8    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   9    | 0.6666666666666666 |  1.0   |   2   |
  |    0    |   10   | 0.6666666666666666 |  1.0   |   2   |
  +---------+--------+--------------------+--------+-------+
  [54

### ⭐️ NOTE:

    Turi Create can be used to build a recommender system for three types of data: explicit, implicit, and item content data. The three scenarios are described below, along with the appropriate models to choose in each.
    
### Scenario 1: Working With Explicit Data:
    If our data is explicit, i.e., the observations include an actual rating given by the user, then the model we wish to use depends on whether we want to predict the rating a user would give a particular item, or if we want the model to recommend items that it believes the user would rate highly.
    If we have ratings data and care about accurately predicting the rating a user would give a specific item, then it is recommended to use the factorization_recommender.
    If we care about ranking performance, instead of simply predicting the rating accurately, we can choose ItemSimilarityRecommender or RankingFactorizationRecommender.
    
### Scenario 2: Working With Implicit Data:
    The goal of a recommender system built with implicit data is to recommend items that are similar to the collection of items a user has interacted with. "Similar" in this case is determined by other users' interactions: if most users with behavior similar to a given user also interacted with an item the given user had not, that item would likely appear in the given user's recommendations.
    We can use ItemSimilarityRecommender which computes the similarity between each pair of items and recommends items to each user that are closest to items they have already used or liked.
    The ranking_factorization_recommender is also great for implicit data.
    
###  Scenario 3: Working With Item Content Data:
    We can use the ItemContentRecommender in this case. It builds a model similar to the item similarity model, but uses similarities between item content to actually build the model. In this model, the similarity score between two items is calculated by first computing the similarity between the item data for each column, then taking a weighted average of the per-column similarities to get the final similarity.
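
The guidance in the three scenarios can be condensed into a small lookup. The function below is a hypothetical helper, not part of Turi Create; it only returns the module names mentioned above, and giving item content precedence over the other inputs is a design choice made for this sketch.

```python
def suggest_recommenders(has_ratings, goal="ranking", has_item_content=False):
    """Suggest Turi Create recommender modules for a dataset.

    has_ratings      -- True for explicit data (a rating column is present)
    goal             -- "rating" to predict ratings, "ranking" to rank items
    has_item_content -- True when recommendations should use item content
    """
    if goal not in ("rating", "ranking"):
        raise ValueError("goal must be 'rating' or 'ranking'")
    if has_item_content:
        return ["item_content_recommender"]
    if has_ratings and goal == "rating":
        return ["factorization_recommender"]
    # Explicit data with a ranking goal and implicit data share suggestions.
    return ["item_similarity_recommender", "ranking_factorization_recommender"]


print(suggest_recommenders(has_ratings=True, goal="rating"))
print(suggest_recommenders(has_ratings=False))
print(suggest_recommenders(has_ratings=False, has_item_content=True))
```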

So far we have looked at all the relevant and essential functionalities which could help us in building powerful and customized recommendation systems catered to our needs.

***Let's now put this theory into practice by building a recommender system on the popular [MovieLens 20M Dataset](https://grouplens.org/datasets/movielens/20m/).***

# -------------------------------------------------------------------------------------------------------------

# Putting Theory Into Practice: Building a Recommender System 

#### STEP 1:

    Download the zip file of the dataset from https://grouplens.org/datasets/movielens/20m/ and unzip it in the directory of this notebook file.

#### STEP 2:

    Read the data from ratings.csv (into actions) and movies.csv (into items), and inspect it to get better insight and to prepare it for our model later.

In [8]:
import turicreate as tc

actions = tc.SFrame.read_csv('ml-20m/ratings.csv')
actions.head()

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,int,float,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
1,32,3.5,1112484819
1,47,3.5,1112484727
1,50,3.5,1112484580
1,112,3.5,1094785740
1,151,4.0,1094785734
1,223,4.0,1112485573
1,253,4.0,1112484940
1,260,4.0,1112484826


In [9]:
actions.groupby('userId', [tc.aggregate.COUNT]).sort("Count", ascending = False)

userId,Count
118205,9254
8405,7515
82418,5646
121535,5520
125794,5491
74142,5447
34576,5356
131904,5330
83090,5169
59477,4988


In [10]:
items = tc.SFrame.read_csv('ml-20m/movies.csv')
items.head()

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Child ren|Comedy|Fantasy ...
2,Jumanji (1995),Adventure|Children|Fantas y ...
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995) ...,Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
8,Tom and Huck (1995),Adventure|Children
9,Sudden Death (1995),Action
10,GoldenEye (1995),Action|Adventure|Thriller


#### STEP 3: 

     Create a random split of the data to produce a training set and a validation set that can be used to evaluate the model.

In [11]:
data_train, data_validation = tc.recommender.util.random_split_by_user(actions, 'userId', 'movieId')
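
Conceptually, a per-user split holds out part of the observations for a sample of users and leaves everything else in the training set. The sketch below illustrates that idea in plain Python. The parameter names and default values mirror my understanding of random_split_by_user and should be treated as assumptions; the function itself is a hypothetical stand-in, not the library's implementation.

```python
import random


def split_by_user(observations, max_num_users=1000, item_test_proportion=0.2, seed=0):
    """Hold out a share of observations for a random sample of users.

    observations -- list of (user_id, item_id) tuples
    Returns (train, validation) lists of the same tuples.
    """
    rng = random.Random(seed)
    users = sorted({user for user, _ in observations})
    sampled = set(rng.sample(users, min(max_num_users, len(users))))

    train, validation = [], []
    for user, item in observations:
        # Only sampled users contribute to the validation set.
        if user in sampled and rng.random() < item_test_proportion:
            validation.append((user, item))
        else:
            train.append((user, item))
    return train, validation


# Toy data: 5 users with 20 items each; hold out items for at most 2 users.
toy = [(u, i) for u in range(5) for i in range(20)]
train, validation = split_by_user(toy, max_num_users=2)
print(len(train), len(validation))
```

A consequence of this design is that the validation set may cover only a subset of users, so evaluation metrics describe that sample rather than every user in the dataset.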

#### STEP 4:

    Now create a model for our data. I will use the create method provided by Turi Create, which automatically chooses an appropriate model based on our dataset. Other recommenders can also be specified explicitly if required.

In [12]:
model = tc.recommender.create(data_train, 'userId', 'movieId')

Now that we have the model, we can make recommendations.

#### STEP 5:

    First we look at making recommendations for all users. By default, calling model.recommend() without any arguments returns the top 10 recommendations for every user seen during model creation. Items that a user interacted with in the training data are excluded automatically, so all generated recommendations are for items the user has not already seen.

In [13]:
results = model.recommend()

In [14]:
print(results)

+--------+---------+---------------------+------+
| userId | movieId |        score        | rank |
+--------+---------+---------------------+------+
|   1    |   1682  | 0.09008133377347674 |  1   |
|   1    |   2115  |  0.0870423402105059 |  2   |
|   1    |   3578  | 0.08415550231933594 |  3   |
|   1    |   1265  | 0.08409027917044504 |  4   |
|   1    |   1580  | 0.08351736920220511 |  5   |
|   1    |   2571  | 0.08207650354930333 |  6   |
|   1    |   1527  | 0.08185571432113647 |  7   |
|   1    |   1270  | 0.07933410712650844 |  8   |
|   1    |   3793  | 0.07469871214457921 |  9   |
|   1    |   5349  | 0.07203888688768659 |  10  |
+--------+---------+---------------------+------+
[1384930 rows x 4 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
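
The behaviour described above (score the items, drop the ones a user has already seen, keep the top k) can be sketched in plain Python. This is a rough illustration that assumes an item-similarity style model with Jaccard similarity; I am not claiming it is what Turi Create computes internally. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_top_k(interactions, user, k=10):
    """Rank unseen items for `user` by mean Jaccard similarity to seen items."""
    users_by_item = defaultdict(set)
    items_by_user = defaultdict(set)
    for u, item in interactions:
        users_by_item[item].add(u)
        items_by_user[u].add(item)

    seen = items_by_user[user]
    scores = {}
    for candidate in users_by_item:
        if candidate in seen:
            continue  # never recommend something the user already has
        sims = [jaccard(users_by_item[candidate], users_by_item[s]) for s in seen]
        scores[candidate] = sum(sims) / len(sims) if sims else 0.0

    # Highest score first; ties are broken by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, score, rank)
            for rank, (item, score) in enumerate(ranked[:k], start=1)]


# Toy (user, item) pairs, made up for the example.
toy = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"), (3, "B"), (3, "D")]
toy_recs = recommend_top_k(toy, user=1, k=2)
for item, score, rank in toy_recs:
    print(rank, item, round(score, 3))
```

User 1 has already seen items A and B, so only C and D are candidates. C ranks first because it shares more users with the items user 1 has seen.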


#### STEP 6:

    Now we will make recommendations for a given user specified with a user id.

In [26]:
given_user_id = 8

In [27]:
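# Print the movie details for every rating above 4.0 (treated here as a "favorite").
def print_favorite_movie(row):
  if row["rating"] > 4.0:
    print(items[row["movieId"]==items["movieId"]])

# Print the movie details for any row that carries a movieId.
def print_movie(row):
  print(items[row["movieId"]==items["movieId"]])


# All ratings given by the chosen user, highest rated first.
test_user_actions = actions[actions["userId"]==given_user_id].sort("rating", ascending = False)

print(test_user_actions.head())

# apply() is used only for its printing side effect; it returns an SArray of None values.
test_user_actions.apply(print_favorite_movie)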

+--------+---------+--------+-----------+
| userId | movieId | rating | timestamp |
+--------+---------+--------+-----------+
|   8    |   527   |  5.0   | 833973298 |
|   8    |   349   |  5.0   | 833973175 |
|   8    |   356   |  5.0   | 833982259 |
|   8    |   296   |  5.0   | 833973081 |
|   8    |   288   |  5.0   | 833981870 |
|   8    |   434   |  5.0   | 833981804 |
|   8    |   266   |  5.0   | 833981963 |
|   8    |   457   |  5.0   | 833981896 |
|   8    |   553   |  5.0   | 833982013 |
|   8    |   589   |  5.0   | 833982668 |
+--------+---------+--------+-----------+
[10 rows x 4 columns]

+---------+-------------------------+-----------+
| movieId |          title          |   genres  |
+---------+-------------------------+-----------+
|   527   | Schindler's List (1993) | Drama|War |
+---------+-------------------------+-----------+
[? rows x 3 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use sf.materialize() to force m

dtype: float
Rows: 70
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]
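
The cell above lists what the chosen user already rated highly. The recommendations themselves would come from passing a list of user ids to the model, i.e. `model.recommend(users=[given_user_id])`. The rows that come back carry only a movieId, so a title lookup makes them readable. Below is a minimal plain-Python sketch of that lookup; `attach_titles` is a hypothetical helper, and the recommendation rows and scores are made up to mimic the shape printed in STEP 5.

```python
def attach_titles(recommendations, titles_by_id):
    """Add a human-readable title to each recommendation row."""
    return [
        {**rec, "title": titles_by_id.get(rec["movieId"], "<unknown title>")}
        for rec in recommendations
    ]


# Hypothetical rows in the shape returned by recommend(); scores are invented.
toy_recommendations = [
    {"userId": 8, "movieId": 6, "score": 0.12, "rank": 1},
    {"userId": 8, "movieId": 10, "score": 0.11, "rank": 2},
]

# Titles copied from the head of movies.csv shown earlier.
toy_titles = {6: "Heat (1995)", 10: "GoldenEye (1995)"}

titled = attach_titles(toy_recommendations, toy_titles)
for row in titled:
    print(row["rank"], row["title"], row["score"])
```

With SFrames, the same effect should be achievable by joining the recommendations with `items` on the `movieId` column using `SFrame.join`.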

#### STEP 7:

    We can also save the model so that it can be loaded later to make more predictions. It is saved under the provided name, relative to the current working directory.

In [29]:
model.save("My_Recommender.model")

In [31]:
## The following code can be used to load the model again.
## model = tc.load_model("My_Recommender.model")

## For export to use in Core ML
## model.export_coreml('Recommender.mlmodel')

That's it. This completes the example: we have created our recommendation system using turicreate.

# Conclusion

A recommender system allows you to provide personalized recommendations to users. With this toolkit, we can create a model from past interaction data and use that model to make recommendations. It has a simple, easy-to-use API that lets users with even a little machine learning background leverage its power. I felt it really simplifies building custom machine learning models. In a nutshell it is:
1. Easy-to-use: Focus on tasks instead of algorithms
2. Visual: Built-in, streaming visualizations to explore your data
3. Flexible: Supports text, images, audio, video and sensor data
4. Fast and Scalable: Work with large datasets on a single machine
5. Ready To Deploy: Export models to Core ML for use in iOS, macOS, watchOS, and tvOS apps
6. Continuously Evolving and improving

I liked it and would encourage everyone to try this powerful framework in their projects where it fits 😉

# References

1. GitHub wiki of the [turicreate GitHub repo](https://github.com/apple/turicreate) for the user guide and documentation.
2. Some pointers from [Recommender System Wikipedia](https://en.wikipedia.org/wiki/Recommender_system).
3. [Analytics Vidhya Blog](https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-recommendation-engine-python/) - guide to making a recommendation system in Python.