<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Matrix Factorization: Recommender Systems
              
</p>
</div>


<p>NYC Data Science Bootcamp Nov 2023</p>
<p>Phase 4</p>
<br>
<br>

<div align = "right">
<img src="Images/flatiron-school-logo.png" align = "right" width="200"/>
</div>
    
    

### Recommender systems in your day-to-day

#### Automated content curation and personalized suggestions

<div align = "center">

<img src="Images/goodreads_logo.jpg" align = "center" width="300"/>

</div>
Based on explicit star ratings of books:


<div align = "center">
<img src="Images/Goodreads_starred.png" align = "center" width="700"/>
</div>

Yielding recommendations:

<div align = "center">
<img src="Images/goodreads.png" align = "center" width="700"/>
</div>

### The Recommendation Problem (in general)

*Users*

- List of users, preferences and/or other info (in some form or another)


*Items/Entities*

- collection of items that users have interacted with or might interact with
- potentially, information about these items

*User-Item Interactions*
- **Explicit**: rating items directly
    - like = 1/dislike = 0
    - 1-5 stars
 
<div align = "center">
<img src="Images/star_rating.webp" align = "center" width="450"/>
</div>

**Will focus on explicit rating schemes.**

<div align = "center">
<img src="Images/gladiator_up.jpeg" align = "center" width="600"/>
</div>

<center><i>Likes/dislikes can sometimes be a life or death matter</i></center>

*User-Item Interactions*
- **Implicit**: monitor user activity and *deduce* item relevance
    - did a user watch a movie to the end?
    - did a user click on a product page? 


Interaction: watched for more than a minute.
- Interaction = 1
- No interaction = 0

| User | Intro to Hollow Earth | Optimizing your garden tomato yield | Egypt: The Middle Kingdom | George Motz: The Burger Scholar| Reexamining Darwin|
| --- | --- | --- | --- | --- | --- |
| Praveen | 1 | 0 | 1 | 0 | 1 |

<center> YouTube Videos: Past Month Click History </center>


<center>YouTube keeps recommending:</center>
<div align = "center">
<img src="Images/aliens.jpg" align = "center" width="600"/>
</div>

<center><i>Dear lord...what have I been clicking on?</i></center>

We focus on recommendations using explicit ratings here:
- implicit rating schemes are usually modifications of explicit rating algorithms.

**Goal** 

Given user interaction with items:


<div align = "center">
<img src="Images/user_movies_explicit.png" align = "center" width="600"/>
</div>


**Predict user engagement or ratings on items the user has not yet interacted with.**

<div align = "center">
<img src="Images/user_movies_unknown.png" align = "center" width="600"/>
</div>

Rank these predictions and keep the top $K$ items with the highest predicted engagement/rating for the user.
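Once every unrated item has a predicted rating, picking the top $K$ is a masked sort. A minimal NumPy sketch, assuming one user's predicted ratings sit in an array `pred_row` and a boolean array `rated_mask` marks the items that user already rated (these names and the helper `top_k_unrated` are hypothetical):

```python
import numpy as np

def top_k_unrated(pred_row, rated_mask, k):
    """Indices of the k highest-scoring items among those the user has not rated."""
    candidates = np.flatnonzero(~rated_mask)          # items the user has not rated
    order = np.argsort(pred_row[candidates])[::-1]    # highest predicted rating first
    return candidates[order[:k]]

pred = np.array([4.1, 3.2, 4.8, 2.5, 3.9])            # hypothetical predicted ratings, one user
rated = np.array([False, False, True, False, False])  # item 2 was already rated
print(top_k_unrated(pred, rated, k=2))                # [0 4]
```

Item 2 has the highest predicted score but is excluded because the user already rated it.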

**Matrix representation**
- Ratings matrix $R$ 
- rows: users
- columns: items



<div align = "center">
<img src="Images/R_ij.png" align = "center" width="600"/>
</div>


- To impute missing entries, we use what we understand about users and items.
- Users: 
    - Bob similar to Alice. 
    - Dissimilar to Carol.
- Items: 
    - Batman, X-Men, and Star Wars: sci-fi
    - The Notebook, Bridget Jones: romance



<div align = "center">
<img src="Images/R_ij.png" align = "center" width="600"/>
</div>

There are two principles we just implicitly used:

- **Collaborative filtering**:
    - Similar users should behave similarly across movies.
    - Similar movies should be rated similarly across users.

- **Principle of redundancy (multicollinearity)**:
    - many user rows highly related to each other (i.e. users with similar preferences)
    - many item columns highly related to each other (i.e. movies with similar characteristics)

<div align = "center">
<img src="Images/R_ij.png" align = "center" width="600"/>
</div>

- **Principle of redundancy (multicollinearity)**:
    - e.g. Alice, Bob
    - e.g. X-Men, Batman

Multicollinearity:

- think PCA and dimensionality reduction
- our ratings are likely generated by far fewer underlying factors than there are users or items.
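One way to see the "few underlying factors" idea: a ratings matrix generated by only $d$ factors has rank at most $d$, no matter how many users or items it has. A small NumPy check, reusing the toy user/item factor values from the concrete example below (the variable names `U`, `V`, `R` are just for illustration). Real rating matrices are noisy, so at best they are only *approximately* low rank.

```python
import numpy as np

# Toy factors: 4 users x d=2 (rows) and d=2 x 4 items (columns)
U = np.array([[1.2, 0.8], [1.4, 0.9], [1.5, 1.0], [2.0, 1.3]])
V = np.array([[1.5, 1.2, 1.0, 0.8], [1.7, 0.6, 1.1, 0.4]])

R = U @ V                         # a 4x4 "ratings" matrix generated by only 2 factors
print(R.shape)                    # (4, 4)
print(np.linalg.matrix_rank(R))   # 2: all 4 user rows live in a 2-dimensional subspace
```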

In our simple case: $d = 2$
- Item space: how strongly a movie is sci-fi or romantic comedy.
- User space: how much does a user like sci-fi or romantic comedy?



Then the rating user $i$ gives movie $j$ might look like this:

$$ r_{ij} = u_{i, romcom}v_{romcom, j} + u_{i,scifi}v_{scifi, j} $$

- $v_j$ is the 2D feature vector for movie $j$.
- $u_i$ is the 2D preference vector for user $i$.


In other words, the sum above is a dot product:

$$ r_{ij} = u_{i, romcom}v_{romcom, j} + u_{i,scifi}v_{scifi, j} = u_i^Tv_j $$

Rating $r_{ij}$ as dot product between:
- user vector representing preferences for features.
- item vector representing weights of these features.
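A tiny numeric check that the feature-by-feature sum and the dot product are the same thing. The vectors are made-up values, with features ordered as (romcom, scifi):

```python
import numpy as np

# Hypothetical d = 2 vectors, features ordered as (romcom, scifi)
u_i = np.array([0.5, 2.0])    # user i: mild romcom fan, strong scifi fan
v_j = np.array([0.25, 1.5])   # movie j: mostly scifi

r_sum = u_i[0] * v_j[0] + u_i[1] * v_j[1]  # explicit sum over features
r_dot = u_i @ v_j                          # same thing as a dot product
print(r_sum, r_dot)                        # 3.125 3.125
```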

Doing this across all users $i$ and items $j$, we build up two matrices:

<div align = "center">
<img src="Images/matrixfactor.png" align = "center" width="500"/>
</div>


Assuming a low feature dimension $d$ is sufficient to capture the user-item rating space:

- factorization of the ratings matrix: $R = UV$.
- $U$: row $i$ is user $i$'s preference vector $u_i$ over the $d$ features.
- $V$: column $j$ is item $j$'s weight vector $v_j$ over the $d$ features.


<div align = "center">
<img src="Images/matrixfactor.png" align = "center" width="500"/>
</div>

**The task at hand**

In general, $R$ is sparse.
- most entries are NaNs 
- each user only interacts with a small subset of items

Our goal is to impute these NaNs!

Using matrix factorization:
- assuming $d$-dimensional feature space
- use observed values to learn $U$ and $V$
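The set of observed (user, item) pairs, written $\Omega^{obs}$ in the loss further down, is just the non-NaN positions of $R$. A pandas/NumPy sketch on the same toy matrix as `user_item_r` in the concrete example below (`omega_obs` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Toy ratings matrix: rows are users, columns are items, NaN = not observed
R = pd.DataFrame([[3, 2, np.nan, np.nan],
                  [np.nan, 2, 0, 1.5],
                  [4, np.nan, 3, np.nan],
                  [5, 3, np.nan, 2]])

mask = R.notna().to_numpy()      # True where a rating was observed
rows, cols = np.nonzero(mask)    # positions of observed entries, row by row
omega_obs = [(int(i), int(j), float(R.iat[i, j])) for i, j in zip(rows, cols)]

print(f"observed: {mask.sum()} of {mask.size} entries")  # observed: 10 of 16 entries
print(omega_obs[:3])  # [(0, 0, 3.0), (0, 1, 2.0), (1, 1, 2.0)]
```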

<div align = "center">
<img src="Images/matrixfactor.png" align = "center" width="500"/>
</div>

$$ u_i^Tv_j \approx  r^{obs}_{ij}$$ for observed $r_{ij}$ in $R$.


In [1]:
import pandas as pd
import numpy as np

user_item_r = pd.DataFrame([[3, 2, np.nan, np.nan], [np.nan, 2, 0, 1.5], [4, np.nan, 3, np.nan], [5, 3, np.nan, 2]])

user = pd.DataFrame(
    [[1.2, 0.8],[1.4, 0.9], 
     [1.5, 1.0], [2, 1.3]],
    index = ['A', 'B', 'C', 'D'])

items = pd.DataFrame([[1.5, 1.2, 1.0, 0.8], 
         [1.7, 0.6, 1.1, 0.4]],
                     columns = ['W', 'X', 'Y', 'Z'])

In learning $U$ and $V$, something nice happens.
- A concrete example:


In [2]:
user_item_r

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 3.0 | 2.0 | NaN | NaN |
| 1 | NaN | 2.0 | 0.0 | 1.5 |
| 2 | 4.0 | NaN | 3.0 | NaN |
| 3 | 5.0 | 3.0 | NaN | 2.0 |


Matrix factorization might learn $U$, $V$ to approximate observed values:

In [3]:
user

| | 0 | 1 |
|---|---|---|
| A | 1.2 | 0.8 |
| B | 1.4 | 0.9 |
| C | 1.5 | 1.0 |
| D | 2.0 | 1.3 |


In [4]:
items

| | W | X | Y | Z |
|---|---|---|---|---|
| 0 | 1.5 | 1.2 | 1.0 | 0.8 |
| 1 | 1.7 | 0.6 | 1.1 | 0.4 |

Comparing factorization and observed values:

In [5]:
user@items

| | W | X | Y | Z |
|---|---|---|---|---|
| A | 3.16 | 1.92 | 2.08 | 1.28 |
| B | 3.63 | 2.22 | 2.39 | 1.48 |
| C | 3.95 | 2.4 | 2.6 | 1.6 |
| D | 5.21 | 3.18 | 3.43 | 2.12 |


In [6]:
user_item_r

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 3.0 | 2.0 | NaN | NaN |
| 1 | NaN | 2.0 | 0.0 | 1.5 |
| 2 | 4.0 | NaN | 3.0 | NaN |
| 3 | 5.0 | 3.0 | NaN | 2.0 |


What do you notice?

A good matrix factorization learns, from observed data:
- a $d$-dimensional representation of users
- a $d$-dimensional representation of items

These representations can be used to *guess* what a particular user might rate a given movie:
- even when the user has not watched/rated the movie

**This is our ultimate goal.**

### Matrix Factorization fitting and optimization:

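Minimize squared error loss with regularization, summed over the set $\Omega^{obs}$ of observed (user, item) pairs:

$$ L = \sum_{(i,j) \in \Omega^{obs}} \left[ (u_i^Tv_j - r_{ij})^2 + \lambda \|u_i\|^2 + \lambda \|v_j\|^2 \right] $$

- The squared-error term pushes $u_i^Tv_j$ close to the observed $r_{ij}$.
- The $\lambda$ terms penalize large vectors, which helps keep the model from overfitting the observed ratings.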
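The loss can be computed directly from $U$ and $V$ by looping over the observed ratings only. A sketch, assuming $U$ is stored as an `(n_users, d)` array (row $i$ is $u_i$), $V$ as a `(d, n_items)` array (column $j$ is $v_j$), and observed ratings as `(i, j, r_ij)` triples; `mf_loss` is a hypothetical helper name:

```python
import numpy as np

def mf_loss(U, V, omega_obs, lam):
    """Regularized squared-error loss, summed over observed (i, j, r_ij) only."""
    loss = 0.0
    for i, j, r in omega_obs:
        err = U[i] @ V[:, j] - r                                  # u_i^T v_j - r_ij
        loss += err**2 + lam * (U[i] @ U[i]) + lam * (V[:, j] @ V[:, j])
    return loss

U = np.array([[1.0, 0.0]])     # one user, d = 2
V = np.array([[2.0], [1.0]])   # one item, d = 2
print(mf_loss(U, V, [(0, 0, 1.0)], lam=0.5))  # 4.0 = 1 (error) + 0.5 * 1 + 0.5 * 5
```

Unobserved cells of $R$ never enter the loop, which is exactly why the NaNs do not need to be filled before fitting.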

<div align = "center">
<img src="Images/matrixfactor.png" align = "center" width="500"/>
</div>

We don't know $U$ or $V$.

### The algorithms will help us

Step 1: Randomly initialize $U$, $V$:
- $U = U_0$
- $V = V_0$

Algorithm implementations:
- Alternating Least Squares
- Simon Funk's "SVD"
- Yehuda Koren's SVD++

The last two use a variant of **gradient descent** (stochastic gradient descent):

- pick $u_i$, $v_j$ from $U$, $V$ corresponding to an observed rating $r_{ij}$
- take gradients of that rating's term in the loss,

$$ L_{ij} = (u_i^Tv_j - r_{ij})^2 + \lambda \|u_i\|^2 + \lambda \|v_j\|^2 $$

with respect to the specific row $u_i$ of $U$ as well as the specific column $v_j$ of $V$.

A little gradient action (dropping a constant factor of 2, which is absorbed into the learning rate $\alpha$):
$$ \nabla_{u_i}L_{ij} = (v_jv_j^T + \lambda I) u_i - r_{ij}v_j $$
$$ \nabla_{v_j}L_{ij} = (u_iu_i^T + \lambda I) v_j - r_{ij}u_i $$

A little update action:
$$ u_i \rightarrow u_i - \alpha \nabla_{u_i}L_{ij} $$
$$ v_j \rightarrow v_j - \alpha \nabla_{v_j}L_{ij} $$

**Then:** choose new $u_i$, $v_j$ corresponding to another $r^{obs}_{ij}$:
- take gradients
- update $u_i$, $v_j$
- cycle through the observed ratings until convergence
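Putting the steps together gives a bare-bones SGD matrix factorization. This is a teaching sketch, not Surprise's implementation: for example, Surprise's `SVD` also learns user and item bias terms by default, and its initialization differs. The constant factor of 2 from the gradient is folded into `alpha`, and the hyperparameter values (`d`, `lam`, `alpha`, `n_epochs`) are arbitrary choices for the toy matrix `user_item_r` from the concrete example above:

```python
import numpy as np

def sgd_mf(omega_obs, n_users, n_items, d=2, lam=0.02, alpha=0.01, n_epochs=2000, seed=0):
    """Bare-bones SGD matrix factorization without bias terms.

    omega_obs: list of observed (i, j, r_ij) triples.
    Returns U (n_users x d, rows u_i) and V (d x n_items, columns v_j).
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, d))  # U_0: small random user vectors
    V = rng.normal(scale=0.1, size=(d, n_items))  # V_0: small random item vectors
    for _ in range(n_epochs):
        for idx in rng.permutation(len(omega_obs)):  # visit observed ratings in random order
            i, j, r = omega_obs[idx]
            u_i, v_j = U[i].copy(), V[:, j].copy()   # use the old values in both updates
            err = u_i @ v_j - r                       # prediction error on this rating
            # err * v_j + lam * u_i equals (v_j v_j^T + lam I) u_i - r v_j
            U[i] -= alpha * (err * v_j + lam * u_i)
            # err * u_i + lam * v_j equals (u_i u_i^T + lam I) v_j - r u_i
            V[:, j] -= alpha * (err * u_i + lam * v_j)
    return U, V

# Observed entries of the toy matrix user_item_r, as (user, item, rating)
omega = [(0, 0, 3.0), (0, 1, 2.0), (1, 1, 2.0), (1, 2, 0.0), (1, 3, 1.5),
         (2, 0, 4.0), (2, 2, 3.0), (3, 0, 5.0), (3, 1, 3.0), (3, 3, 2.0)]

U, V = sgd_mf(omega, n_users=4, n_items=4)
R_hat = U @ V  # every cell filled in, including ones never observed
train_rmse = np.sqrt(np.mean([(R_hat[i, j] - r) ** 2 for i, j, r in omega]))
print(np.round(R_hat, 2))
print(f"training RMSE: {train_rmse:.3f}")
```

The cells of `R_hat` at positions that were never observed are the imputed ratings. With real data you would judge the fit on held-out ratings, not on the training RMSE printed here.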

**What is the result of all this?**

As we converge:

- User ($u_i$), item ($v_j$) representations in lower $d$-dimensional **embedding** space:
    - $d$ features: user preferences for corresponding item characteristics 
    - these feature representations found from the observed rating data
    - we don't know what these are a priori (they may not be interpretable)



- Estimate what users will rate items they haven't interacted with (yet!) 

### Do this using a high-level library

- Surprise: a scikit-learn-style package
- different collaborative filtering techniques (baselines, neighborhood methods, matrix factorization)
- special data loaders (due to data sparsity)

**Let's load in a real dataset and use the package to see how it's typically done.**

<div align = "center">
<img src="Images/surprise_logo.jpg" align = "center" width="500"/>
</div>

In [7]:
# import necessary libraries and submodules
from surprise import Dataset, Reader
from surprise import SVD # implementation of Funk's SVD (gradient descent-based matrix factorization)
from surprise import accuracy # metric
from surprise.model_selection import train_test_split, GridSearchCV #train/test splits, crossval

#### The data

- Build a movie recommendation system
- Collaborative filtering using ratings (and user/movie ids) only


<div align = "center">
<img src="Images/IMDB.webp" align = "center" width="700"/>
</div>
<center>IMDB: Some highly rated movies. </center>

In [8]:
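cols =  ['user', 'item', 'rating', 'timestamp']
# peek at the first 5 rows with pandas' read_csv; '::' is a multi-character
# separator, which requires the python parsing engine
ratings_view = pd.read_csv('data/ratings.dat',
                delimiter = '::',
                engine = 'python',
                nrows = 5, 
                names = cols) 
ratings_view.head()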

| | user | item | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


We also have a lookup table of names (and genres) for each movie:

In [9]:
cols_movie = ['item_id', 'movie_name', 'genre']
movie_metadata = pd.read_csv('data/movies.dat', 
                             delimiter = '::',
                             engine = 'python',
                            encoding = 'latin-1',
                            names = cols_movie) 
movie_metadata = movie_metadata.set_index('item_id')

In [10]:
movie_metadata.head()

| item_id | movie_name | genre |
|---|---|---|
| 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 5 | Father of the Bride Part II (1995) | Comedy |


#### Getting the data ready for Surprise

Surprise has a Reader object:
- parser for data directly from text file
- fed into Dataset loader

We will now instantiate a Reader object:

In [11]:
reader = Reader(line_format='user item rating timestamp', sep='::')
reader # will be used to parse the ratings textfile

<surprise.reader.Reader at 0x1402b6ca888>
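Conceptually, `line_format='user item rating timestamp'` with `sep='::'` tells the Reader to split each line into those four fields. A plain-Python sketch of that parsing (a hypothetical helper, not Surprise's internal code); ids stay strings, matching the string ids such as `'10'` used with Surprise further down:

```python
def parse_rating_line(line, sep="::"):
    """Split one ratings.dat line into (user, item, rating, timestamp)."""
    user, item, rating, timestamp = line.strip().split(sep)
    return user, item, float(rating), timestamp  # only the rating becomes numeric

print(parse_rating_line("1::1193::5::978300760\n"))
# ('1', '1193', 5.0, '978300760')
```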

Dataset loader:
- creates a dataset object
- from which train/test splits of the ratings can be built

In [12]:
file_path = 'data/ratings.dat'
data_gen = Dataset.load_from_file(file_path, reader=reader)
data_gen

<surprise.dataset.DatasetAutoFolds at 0x1402b6e1708>

We also should do a train/test split of the data:

- 80% of observed rating data used for training 
- 20% of observed rating data used for evaluation 


In [13]:
trainset, testset = train_test_split(data_gen, test_size=.2, random_state = 42)
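The split is over individual *ratings*, not over users, so the same user typically appears in both the training and the test set. A plain-Python sketch of a random 80/20 split of rating triples (hypothetical helper; Surprise's `train_test_split` returns a `Trainset` object plus a list of test tuples rather than two plain lists):

```python
import random

def random_split(ratings, test_size=0.2, seed=42):
    """Shuffle rating triples and hold out a fraction of them for testing."""
    ratings = list(ratings)
    random.Random(seed).shuffle(ratings)        # reproducible shuffle
    n_test = round(len(ratings) * test_size)
    return ratings[n_test:], ratings[:n_test]   # (train, test)

# 10 toy ratings from 2 users (ids are strings, as in Surprise)
triples = [(str(u), str(i), 4.0) for u in (1, 2) for i in range(5)]
train, test = random_split(triples)
print(len(train), len(test))  # 8 2
```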

#### Modeling

- Use Surprise's implementation of Funk's SVD
- $d$ = 100 (`n_factors`: size of each user/item vector representation)
- .fit() method

In [15]:
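# instantiate SVD and fit the trainset
# reg_all: regularization strength (lambda), lr_all: learning rate (alpha),
# n_factors: embedding dimension d, n_epochs: SGD passes over the training ratings
svd = SVD(reg_all = .05, lr_all = 0.0025, n_factors = 100, n_epochs = 30)
svd.fit(trainset)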

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x14047979448>

#### Predicting on testset
- Test set is data for which we have the true rating values
- we can compare the estimated ratings (`est`) with the true values (`r_ui`) and compute an error metric such as mean absolute error (MAE) or root mean squared error (RMSE)
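Both metrics are simple averages over the test ratings. A plain-Python sketch (hypothetical helper names `mae` and `rmse`; Surprise itself provides `accuracy.mae` and `accuracy.rmse`). The six `(r_ui, est)` pairs are the first test-set predictions printed further down, rounded to two decimals:

```python
import math

def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))

pairs = [(1.0, 2.10), (3.0, 2.59), (4.0, 4.00), (3.0, 3.30), (3.0, 2.30), (3.0, 2.72)]
print(round(mae(pairs), 3), round(rmse(pairs), 3))
```

RMSE squares each error before averaging, so a few large misses (like the first pair, off by 1.1 stars) weigh more heavily than they do in MAE.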

In [16]:
predictions = svd.test(testset)

In [17]:
print(accuracy.mae(predictions))

MAE:  0.6972
0.6971890207108381


In [18]:
predictions[0:6]

[Prediction(uid='1841', iid='3717', r_ui=1.0, est=2.098318559172525, details={'was_impossible': False}),
 Prediction(uid='3715', iid='880', r_ui=3.0, est=2.586345979332663, details={'was_impossible': False}),
 Prediction(uid='2002', iid='3072', r_ui=4.0, est=3.997121537350777, details={'was_impossible': False}),
 Prediction(uid='3332', iid='2734', r_ui=3.0, est=3.297773008698587, details={'was_impossible': False}),
 Prediction(uid='3576', iid='631', r_ui=3.0, est=2.297093812912193, details={'was_impossible': False}),
 Prediction(uid='2092', iid='3247', r_ui=3.0, est=2.7188282291781194, details={'was_impossible': False})]

Not too shabby. 

A metric better suited to ranking than MAE or RMSE:
- Fraction of concordant pairs (FCP)
- we care about capturing each user's item rankings, not the exact rating values.

For user $u$:
$$ n^u_c = \Big|\{(l,m) | (\hat{r}_{ul} > \hat{r}_{um}) \& ( r_{ul} > r_{um}) \}\Big|$$

The number of concordant pairs for user $u$ is the number of item pairs $(l, m)$ for which the algorithm correctly predicts that $u$ prefers $l$ over $m$.


Total number of concordant pairs: $$n_c = \sum_u n^u_c$$

$$ FCP = \frac{n_c}{n_c+n_d} $$

where $n_d$ is the total number of discordant pairs: item pairs whose predicted order is the reverse of the true order.
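A direct translation of the definition into Python, counting each unordered item pair once per user and summing over users (hypothetical helper name `fcp`). It skips pairs tied in either the true or the predicted rating; Surprise's own `accuracy.fcp` may handle ties and per-user aggregation differently, so its numbers need not match exactly:

```python
from collections import defaultdict
from itertools import combinations

def fcp(predictions):
    """Fraction of concordant pairs, n_c / (n_c + n_d).

    predictions: iterable of (uid, iid, r_true, r_est) tuples.
    """
    by_user = defaultdict(list)
    for uid, _, r_true, r_est in predictions:
        by_user[uid].append((r_true, r_est))

    n_c = n_d = 0
    for pairs in by_user.values():
        for (r_l, e_l), (r_m, e_m) in combinations(pairs, 2):
            if r_l == r_m or e_l == e_m:
                continue                    # ties carry no ordering information here
            if (r_l > r_m) == (e_l > e_m):
                n_c += 1                    # predicted order matches true order
            else:
                n_d += 1                    # predicted order is reversed
    return n_c / (n_c + n_d) if n_c + n_d else float("nan")

preds = [("a", "x", 5, 4.5), ("a", "y", 3, 3.0), ("a", "z", 1, 3.5),
         ("b", "x", 4, 3.9)]               # user b has a single rating: no pairs
print(round(fcp(preds), 3))                # 0.667
```

To run it on Surprise output, pass `((p.uid, p.iid, p.r_ui, p.est) for p in predictions)`.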

Calculating for our test set:

In [19]:
accuracy.fcp(predictions)

FCP:  0.7408


0.7407552260421906

We are getting pairwise item rankings right for each user ~74% of the time.

- Compare the top-ranked predictions among test-set items a user has already rated
- versus the top-ranked predictions among items the user has not rated.

In [85]:
from collections import defaultdict

# given prediction for a set of users, get the top n ranked for each user 

def get_top_n(predictions, n=10):

    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est, true_r))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

In [284]:
top_n_preds_test = get_top_n(predictions, 10)

In [288]:
top_n_preds_test['10']

[('318', 4.948907894042801, 4.0),
 ('2762', 4.822975005160314, 5.0),
 ('2501', 4.753599685250887, 5.0),
 ('1234', 4.705753937023928, 4.0),
 ('356', 4.699462316208538, 5.0),
 ('953', 4.692757825062432, 5.0),
 ('1307', 4.63471336958094, 5.0),
 ('914', 4.579969798690182, 5.0),
 ('3114', 4.5739535655048735, 4.0),
 ('1', 4.555294787813869, 5.0)]

Convert this to a list of movie names in user_id 10's top 10:

In [291]:
movie_id_list = np.array(
    list(zip(*top_n_preds_test['10']))[0], dtype = 'int')
movie_id_list

array([ 318, 2762, 2501, 1234,  356,  953, 1307,  914, 3114,    1])

In [292]:
movie_metadata.loc[movie_id_list]['movie_name']

item_id
318     Shawshank Redemption, The (1994)
2762             Sixth Sense, The (1999)
2501                  October Sky (1999)
1234                   Sting, The (1973)
356                  Forrest Gump (1994)
953         It's a Wonderful Life (1946)
1307      When Harry Met Sally... (1989)
914                  My Fair Lady (1964)
3114                  Toy Story 2 (1999)
1                       Toy Story (1995)
Name: movie_name, dtype: object

We can also estimate a user's rating for a single item:

In [293]:
movie_metadata.loc[1022, 'movie_name']

'Cinderella (1950)'

In [295]:
user = '10'
item = '1022'
svd.predict(user, item)

Prediction(uid='10', iid='1022', r_ui=None, est=4.241959718166177, details={'was_impossible': False})

#### Getting ranked ratings on items not rated by user in trainset:
- need to construct the **anti-testset** from the training set: every (user, item) pair with no rating in the trainset
- it may contain pairs that are in the test set as well as pairs that were never rated at all
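What the anti-testset contains can be sketched in a few lines of plain Python: for every user, every item in the training data that the user has not rated, paired with a fill value. Surprise's `build_anti_testset()` uses the trainset's global mean rating as its default fill value (the `fill_val` column below); the helper and toy data here are hypothetical:

```python
def build_anti_testset_sketch(ratings, fill):
    """All (user, item, fill) triples for known items each user has NOT rated.

    ratings: iterable of (user, item, rating) triples (the training data).
    """
    ratings = list(ratings)
    all_items = sorted({item for _, item, _ in ratings})
    rated = {}
    for user, item, _ in ratings:
        rated.setdefault(user, set()).add(item)
    return [(user, item, fill)
            for user, items in rated.items()
            for item in all_items if item not in items]

train = [("u1", "A", 4.0), ("u1", "B", 2.0), ("u2", "B", 5.0), ("u2", "C", 3.0)]
global_mean = sum(r for _, _, r in train) / len(train)  # 3.5
result = build_anti_testset_sketch(train, fill=global_mean)
print(result)  # [('u1', 'C', 3.5), ('u2', 'A', 3.5)]
```

The anti-testset grows roughly as (number of users) x (number of items), which is why it is built once and then filtered per user.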


In [159]:
# each entry is (user, item, fill value); the fill value defaults to the trainset's global mean rating
anti_test = trainset.build_anti_testset() # creates user-item pairs with no observed rating in the trainset

In [166]:
cnames = ['uid', 'iid', 'fill_val']
anti_test_df = pd.DataFrame(anti_test, columns = cnames)
anti_test_df.head()

| | uid | iid | fill_val |
|---|---|---|---|
| 0 | 5412 | 904 | 3.58149 |
| 1 | 5412 | 3717 | 3.58149 |
| 2 | 5412 | 3697 | 3.58149 |
| 3 | 5412 | 2011 | 3.58149 |
| 4 | 5412 | 173 | 3.58149 |


Get unrated elements for user id 10:

In [296]:
# .copy() avoids pandas' SettingWithCopyWarning when the 'est' column is added below
user_unrated = anti_test_df.loc[anti_test_df['uid'] == '10',
                                ['uid', 'iid']].copy()
user_unrated.head()

| | uid | iid |
|---|---|---|
| 433678 | 10 | 2683 |
| 433679 | 10 | 3717 |
| 433680 | 10 | 3697 |
| 433681 | 10 | 173 |
| 433682 | 10 | 1829 |


In [297]:
user_unrated.shape

(3351, 2)

In [298]:
# look up columns by label (positional x[0] on a labeled row is deprecated in recent pandas)
user_unrated['est'] = user_unrated.apply(lambda row: svd.predict(row['uid'], row['iid']).est, axis = 1)

In [299]:
user_unrated = user_unrated.sort_values(by = 'est', ascending = False)

Some are in our test set but others are not:

In [301]:
movie_metadata.loc[user_unrated['iid'].astype('int')][0:15] 

| item_id | movie_name | genre |
|---|---|---|
| 2905 | Sanjuro (1962) | Action\|Adventure |
| 318 | Shawshank Redemption, The (1994) | Drama |
| 2762 | Sixth Sense, The (1999) | Thriller |
| 3147 | Green Mile, The (1999) | Drama\|Thriller |
| 3801 | Anatomy of a Murder (1959) | Drama\|Mystery |
| 668 | Pather Panchali (1955) | Drama |
| 3030 | Yojimbo (1961) | Comedy\|Drama\|Western |
| 3022 | General, The (1927) | Comedy |
| 2501 | October Sky (1999) | Drama |
| 2357 | Central Station (Central do Brasil) (1998) | Drama |


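We can also get a user's vector representation from `svd.pu` (item vectors are in `svd.qi`):
- the model discovered a latent-space embedding of each user that encodes their preferences
- `svd.pu` is indexed by Surprise's *inner* user id, not the raw id from the file (`trainset.to_inner_uid('10')` converts a raw id)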

In [145]:
print(svd.pu[1000])

[ 0.0191942   0.01403982  0.04360765  0.07674086  0.16940854 -0.01396835
 -0.01398386  0.02060609  0.039887    0.02694982  0.05970694 -0.02724103
 -0.01435352 -0.01920879  0.05474796  0.03333212 -0.02467037 -0.03979847
 -0.02005583  0.00521604  0.24959288  0.02327175 -0.08398503 -0.10462099
 -0.0747534   0.11651636  0.2889927   0.13040034 -0.00909112  0.26219057
 -0.07512899 -0.10116927 -0.04853132  0.0571843  -0.00094407  0.10978824
 -0.12926303  0.01972353  0.08269089  0.00478198  0.03098744 -0.07777348
 -0.09603433 -0.16697493 -0.14623217  0.10459132  0.00865979 -0.03684253
  0.07240603  0.03385711  0.07082168  0.04500256  0.14466287  0.05183009
 -0.13263368 -0.0236935  -0.00240698 -0.06513064  0.15596166 -0.04968411
  0.03721671 -0.10648383  0.13691733 -0.19507691  0.09175964  0.18586295
  0.1554053   0.07776647  0.07721736 -0.06791734 -0.14788199 -0.13860837
 -0.09065557 -0.0586371  -0.01849488  0.06229392  0.04138993 -0.10290081
 -0.12440059  0.11790614  0.10688126  0.02957123  0

In [146]:
print(len(svd.pu[5]))

100


To get better results:
- really need to tune and cross-validate
- there are a bunch of hyperparameters for SVD
    - embedding dimension $d$
    - regularization strength $\lambda$
    - learning rates, etc



In [309]:
param_grid = { "reg_all": [0.005, 0.025, 0.05, 0.25, 0.5], "n_factors": [50, 100, 250, 500]}
gs = GridSearchCV(SVD, param_grid, measures=["mae", "fcp"], cv=3) # wraps svd class

In [310]:
gs.fit(data_gen) # GridSearchCV takes the full Dataset from the loader and builds its own CV folds
# note: this means there is no separate holdout set here

To get the results of cross-validation:
- the GridSearchCV `cv_results` attribute.

In [325]:
pd.DataFrame(gs.cv_results)

| | split0_test_mae | split1_test_mae | split2_test_mae | mean_test_mae | std_test_mae | rank_test_mae | split0_test_fcp | split1_test_fcp | split2_test_fcp | mean_test_fcp | std_test_fcp | rank_test_fcp | mean_fit_time | std_fit_time | mean_test_time | std_test_time | params | param_reg_all | param_n_factors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.705788 | 0.706633 | 0.705739 | 0.706053 | 0.000411 | 9 | 0.732369 | 0.731771 | 0.733216 | 0.732452 | 0.000593 | 12 | 43.083112 | 0.257159 | 3.997527 | 0.014276 | {'reg_all': 0.005, 'n_factors': 50} | 0.005 | 50 |
| 1 | 0.717137 | 0.717743 | 0.717973 | 0.717618 | 0.000353 | 10 | 0.723274 | 0.722865 | 0.723027 | 0.723055 | 0.000168 | 3 | 66.799678 | 0.405062 | 28.832525 | 35.034613 | {'reg_all': 0.005, 'n_factors': 100} | 0.005 | 100 |
| 2 | 0.73443 | 0.735049 | 0.735715 | 0.735065 | 0.000524 | 15 | 0.711642 | 0.710078 | 0.710635 | 0.710785 | 0.000648 | 2 | 90.373631 | 0.752638 | 2.853297 | 0.128621 | {'reg_all': 0.005, 'n_factors': 250} | 0.005 | 250 |
| 3 | 0.746046 | 0.747614 | 0.746903 | 0.746855 | 0.000641 | 16 | 0.703479 | 0.702247 | 0.701977 | 0.702568 | 0.000654 | 1 | 512.110797 | 391.578967 | 3.041929 | 0.011113 | {'reg_all': 0.005, 'n_factors': 500} | 0.005 | 500 |
| 4 | 0.692021 | 0.693136 | 0.694096 | 0.693084 | 0.000848 | 3 | 0.747899 | 0.747501 | 0.746231 | 0.74721 | 0.000711 | 16 | 28.206495 | 0.557262 | 2.754596 | 0.136845 | {'reg_all': 0.025, 'n_factors': 50} | 0.025 | 50 |
| 5 | 0.692417 | 0.693147 | 0.693643 | 0.693069 | 0.000504 | 2 | 0.7477 | 0.747353 | 0.747566 | 0.74754 | 0.000143 | 17 | 43.478236 | 0.323973 | 2.729806 | 0.183871 | {'reg_all': 0.025, 'n_factors': 100} | 0.025 | 100 |
| 6 | 0.696622 | 0.697942 | 0.697194 | 0.697253 | 0.00054 | 6 | 0.745706 | 0.744646 | 0.744202 | 0.744851 | 0.000631 | 14 | 88.636994 | 0.446301 | 2.837224 | 0.149433 | {'reg_all': 0.025, 'n_factors': 250} | 0.025 | 250 |
| 7 | 0.702281 | 0.702889 | 0.702254 | 0.702474 | 0.000293 | 8 | 0.741561 | 0.742153 | 0.741625 | 0.74178 | 0.000265 | 13 | 232.397039 | 10.327545 | 16.862615 | 19.609104 | {'reg_all': 0.025, 'n_factors': 500} | 0.025 | 500 |
| 8 | 0.69732 | 0.698776 | 0.697883 | 0.697993 | 0.0006 | 7 | 0.748111 | 0.746777 | 0.746477 | 0.747122 | 0.00071 | 15 | 28.227761 | 0.20221 | 2.818417 | 0.024541 | {'reg_all': 0.05, 'n_factors': 50} | 0.05 | 50 |
| 9 | 0.694711 | 0.695815 | 0.695802 | 0.695443 | 0.000517 | 5 | 0.751157 | 0.749445 | 0.74924 | 0.749947 | 0.00086 | 18 | 43.034383 | 0.421487 | 2.593631 | 0.111949 | {'reg_all': 0.05, 'n_factors': 100} | 0.05 | 100 |


In [317]:
best_fcp_est = gs.best_estimator['fcp']
print(gs.best_params['fcp'])
gs.best_score['fcp']

{'reg_all': 0.05, 'n_factors': 500}


0.7533786480857493

Now we make a trainset out of the entire dataset:
- build_full_trainset()
- method of Surprise Dataset object

In [321]:
full_train = data_gen.build_full_trainset()

Then fit best estimator on full train set:

In [322]:
best_fcp_est.fit(full_train)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1404ce81588>

And now we use this model for serving:
- .predict()
- etc.

Collaborative filtering via matrix factorization:
- learns embeddings (latent features) via an unsupervised decomposition technique
- does well when you have a LOT of data (e.g. big-data scale with distributed computing)
- performs poorly for users or items with too few observed ratings (**the cold start problem**)


- The best-performing recommenders are often hybrids, using:
    - useful features of users and movies
    - embeddings found via ratings from collaborative filtering