## Collaborative Filtering

Collaborative filtering is a method of making automatic predictions (i.e., filtering) about the interests of a user by collecting preferences or taste information from many users in aggregate (i.e., collaborating). A collaborative filtering model is typically used to:

- Predict a numerical value expressing the predicted score of an item for a user. The predicted value should be on the same scale that all users use for rating (e.g., a number of stars, or a rating between 0 and 5).

- Recommend a list of the Top-N items that the active user will like most, based on the highest predicted ratings among the items they have not yet seen.

There are two main approaches to collaborative filtering that we will learn about:

* Memory-based, also known as neighborhood-based
* Model-based

## Memory-Based / Neighborhood-Based Collaborative Filtering

Remember that the key idea behind collaborative filtering is that similar users share similar interests and that users tend to like items that are similar to one another. With neighborhood-based collaborative filtering methods, you're attempting to quantify how similar users and items are to one another and to get the top-N recommendations based on that similarity metric.

Let's look at the explicit utility matrix we saw in a previous lesson. Below it, we'll also look at a version of the matrix with _implicit_ data, which assumes that we do not have ratings; we only know whether or not someone has watched each movie.

*Explicit Ratings*:

|        | Toy Story | Cinderella | Little Mermaid | Lion King |
|--------|-----------|------------|----------------|-----------|
| Matt   |           | 2          |                | 5         |
| Lore   | 2         |            | 4              |           |
| Mike   |           | 5          | 3              | 2         |
| Forest | 5         |            | 1              |           |
| Taylor | 1         | 5          |                | 2         |

*Implicit Ratings*

|        | Toy Story | Cinderella | Little Mermaid | Lion King |
|--------|-----------|------------|----------------|-----------|
| Matt   |           | 1          |                | 1         |
| Lore   | 1         |            | 1              |           |
| Mike   |           | 1          | 1              | 1         |
| Forest | 1         |            | 1              |           |
| Taylor | 1         | 1          |                | 1         |

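As a minimal sketch (assuming pandas and NumPy are available), the explicit matrix above can be stored as a DataFrame with `NaN` for missing ratings, and the implicit matrix derived from it by marking every non-missing cell as 1. The implicit table above leaves the zeros blank; the DataFrame stores them as 0.

```python
import numpy as np
import pandas as pd

# Explicit ratings from the table above; NaN marks a movie the user has not rated.
explicit = pd.DataFrame(
    {
        "Toy Story": [np.nan, 2, np.nan, 5, 1],
        "Cinderella": [2, np.nan, 5, np.nan, 5],
        "Little Mermaid": [np.nan, 4, 3, 1, np.nan],
        "Lion King": [5, np.nan, 2, np.nan, 2],
    },
    index=["Matt", "Lore", "Mike", "Forest", "Taylor"],
)

# Implicit version: 1 if the user rated/watched the movie, 0 otherwise.
implicit = explicit.notna().astype(int)
print(implicit)
```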
When dealing with utility matrices, there are two different ways to determine similarity:

* Item-based: measure the similarity between the items that the target user rates or interacts with and other items
* User-based: measure the similarity between the target user and other users

Before we dive into the differences between these two methods, let's look at what these similarity metrics are, and how they are related to the final score prediction.

### Similarity Metrics:

**Pearson Correlation**: A commonly used method for computing similarity. It ranges from -1 to 1 and represents the linear correlation between two vectors. A correlation of 0 represents no linear relationship, -1 represents perfect negative correlation and +1 represents perfect positive correlation. This similarity metric only takes into account the items that are rated by both individuals. Pearson correlation is useful because it takes into account each user's average rating (the $\mu$ terms in the formula), so it corrects for users who systematically rate everything higher or lower than others.

### $$ \text{pearson correlation}(u,v) = \frac{\sum_{i \in I_{uv}}{(r_{ui}- \mu_{u})*(r_{vi}- \mu_{v})}}{\sqrt{\sum_{i \in I_{uv} }{(r_{ui}-\mu_{u})^{2}  }}  * \sqrt{\sum_{i \in I_{uv} }{(r_{vi}-\mu_{v})^{2}  }}} $$


**Cosine Similarity**: Determines how two vectors are related by measuring the cosine of the angle between them. The value also ranges from -1 to 1, with -1 meaning that the two vectors point in opposite directions, 0 meaning that they are perpendicular to one another, and 1 meaning that they point in the same direction. (When all ratings are positive, as on a 1-5 scale, the value stays between 0 and 1.) Here is the formula in the context of user similarity:

### $$ \text{cosine similarity}(u,v) = \frac{\sum_{i \in I_{uv}}{r_{ui}*r_{vi}}}{\sqrt{\sum_{i \in I_{uv}}{r_{ui}^{2}}} * \sqrt{\sum_{i \in I_{uv}}{r_{vi}^{2}}}} $$

where $u$ is a user, $v$ is another user being compared to $u$, $i$ is an item, $r_{ui}$ is user $u$'s rating of item $i$, $\mu_{u}$ is user $u$'s mean rating, and $I_{uv}$ is the set of items rated by both $u$ and $v$.

**Jaccard Similarity**: Takes into account the number of preferences that two users have in common. Importantly, it does not take the actual values of the ratings into account, only whether or not the users have rated the same items. In other words, all explicit ratings are effectively turned into 1s when using the Jaccard similarity metric. Here $I_{u}$ is the set of items user $u$ has rated:


### $$ \text{Jaccard Similarity}(u,v) = \frac{|I_{u} \cap I_{v}|}{|I_{u} \cup I_{v}|}$$
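The three metrics can be sketched in plain Python on the explicit ratings above. This is an illustrative implementation, not a library's: each user's ratings are a dict, the Pearson means are computed over the co-rated items only (using each user's overall mean is another common convention), and a zero denominator is treated as zero similarity. Lore and Forest make a useful comparison because they watched exactly the same two movies but rated them in opposite ways.

```python
import math

# Explicit ratings from the utility matrix above; unrated movies are simply absent.
ratings = {
    "Matt": {"Cinderella": 2, "Lion King": 5},
    "Lore": {"Toy Story": 2, "Little Mermaid": 4},
    "Mike": {"Cinderella": 5, "Little Mermaid": 3, "Lion King": 2},
    "Forest": {"Toy Story": 5, "Little Mermaid": 1},
    "Taylor": {"Toy Story": 1, "Cinderella": 5, "Lion King": 2},
}


def co_rated(u, v):
    """I_uv: the items rated by both users."""
    return sorted(set(u) & set(v))


def pearson(u, v):
    items = co_rated(u, v)
    if not items:
        return 0.0
    # Means taken over the co-rated items only (one common convention).
    mu_u = sum(u[i] for i in items) / len(items)
    mu_v = sum(v[i] for i in items) / len(items)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in items)
    ss_u = sum((u[i] - mu_u) ** 2 for i in items)
    ss_v = sum((v[i] - mu_v) ** 2 for i in items)
    den = math.sqrt(ss_u * ss_v)
    return num / den if den else 0.0  # zero variance: treat as no similarity


def cosine(u, v):
    items = co_rated(u, v)
    num = sum(u[i] * v[i] for i in items)
    den = math.sqrt(sum(u[i] ** 2 for i in items) * sum(v[i] ** 2 for i in items))
    return num / den if den else 0.0


def jaccard(u, v):
    """|I_u ∩ I_v| / |I_u ∪ I_v|, ignoring the rating values."""
    union = set(u) | set(v)
    return len(set(u) & set(v)) / len(union) if union else 0.0


lore, forest = ratings["Lore"], ratings["Forest"]
print(pearson(lore, forest))  # -1.0: opposite opinions on both shared movies
print(cosine(lore, forest))   # about 0.61
print(jaccard(lore, forest))  # 1.0: exactly the same movies watched
```

The metrics disagree sharply here: Jaccard sees identical viewing histories, while Pearson sees perfectly opposite tastes.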

### Calculating a Predicted Rating

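Once these similarities have been calculated, a rating is predicted essentially as a weighted average of the ratings of the $k$ most similar neighbors. For example, in user-based filtering, the predicted rating of user $u_i$ for item $j$ can be calculated as

$$ \hat{r}_{ij} = \frac{\sum_{n \in N_k(i,j)}{\text{sim}(u_i,u_n)\, r_{nj}}}{\sum_{n \in N_k(i,j)}{|\text{sim}(u_i,u_n)|}} $$

where $N_k(i,j)$ is the set of the $k$ users most similar to $u_i$ who have rated item $j$. Dividing by the sum of the similarities (rather than by the number of ratings) is what makes this a weighted average, and keeps the prediction on the original rating scale.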

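A small sketch of this weighted average, using the Jaccard similarity between users on the explicit ratings (any of the metrics above could be swapped in). `predict` is a hypothetical helper name; it returns `None` when no neighbor has a nonzero similarity.

```python
# Explicit ratings from the utility matrix above.
ratings = {
    "Matt": {"Cinderella": 2, "Lion King": 5},
    "Lore": {"Toy Story": 2, "Little Mermaid": 4},
    "Mike": {"Cinderella": 5, "Little Mermaid": 3, "Lion King": 2},
    "Forest": {"Toy Story": 5, "Little Mermaid": 1},
    "Taylor": {"Toy Story": 1, "Cinderella": 5, "Lion King": 2},
}


def jaccard(a, b):
    """Jaccard similarity between two users' sets of rated items."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def predict(user, item, k=3):
    """Similarity-weighted average of the ratings that the k most similar users gave `item`."""
    candidates = [
        (jaccard(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    total_sim = sum(abs(sim) for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no neighbor with a nonzero similarity
    return sum(sim * rating for sim, rating in neighbors) / total_sim


print(predict("Taylor", "Little Mermaid"))  # 2.75
```

For Taylor and Little Mermaid, the neighbors who rated the movie are Mike (similarity 2/4, rating 3), Lore (1/4, rating 4) and Forest (1/4, rating 1), so the prediction is $(0.5 \cdot 3 + 0.25 \cdot 4 + 0.25 \cdot 1) / 1 = 2.75$.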


#### Item-Item Filtering
Item-item filtering takes the vector of each item's ratings from every user and compares it to the vector of every other item. The items most similar to those a customer has liked can then be recommended. This is similar to content-based recommendation, except that we are not looking at any actual characteristics of the items; we are merely looking at who has interacted with an item and comparing that to who has interacted with other items. Let's build a table from the implicit ratings, with the Jaccard index as the similarity metric. To start off, let's compare Toy Story and Cinderella. Toy Story was watched by Lore, Forest and Taylor, and Cinderella by Matt, Mike and Taylor, so the union of the two audiences has 5 people and the intersection has 1 (only Taylor watched both), giving 1/5. The rest of the similarities are filled in below.



|                | Toy Story | Cinderella | Little Mermaid | Lion King |
|----------------|-----------|------------|----------------|-----------|
| Toy Story      |           | 1/5        | 2/4            | 1/5       |
| Cinderella     | 1/5       |            | 1/5            | 1         |
| Little Mermaid | 2/4       | 1/5        |                | 1/5       |
| Lion King      | 1/5       | 1          | 1/5            |           |

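The item-item table can be reproduced with a short sketch: store the audience of each movie as a set and take the Jaccard index of every pair. `Fraction` keeps the values exact, though it reduces them, so 2/4 prints as 1/2. The hypothetical helper `most_similar_items` then ranks the movies most similar to one a customer liked.

```python
from fractions import Fraction
from itertools import combinations

# Who watched each movie, read off the implicit ratings table.
watched_by = {
    "Toy Story": {"Lore", "Forest", "Taylor"},
    "Cinderella": {"Matt", "Mike", "Taylor"},
    "Little Mermaid": {"Lore", "Mike", "Forest"},
    "Lion King": {"Matt", "Mike", "Taylor"},
}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| as an exact fraction."""
    return Fraction(len(a & b), len(a | b))


# Similarity for every pair of movies (the table is symmetric, so each pair once).
item_sim = {
    (x, y): jaccard(watched_by[x], watched_by[y])
    for x, y in combinations(watched_by, 2)
}
for (x, y), sim in item_sim.items():
    print(f"{x} vs {y}: {sim}")


def most_similar_items(item):
    """Other movies ranked by their similarity to `item`, highest first."""
    scores = [(sim, y if x == item else x) for (x, y), sim in item_sim.items() if item in (x, y)]
    return sorted(scores, reverse=True)


print(most_similar_items("Cinderella")[0])
```

For a customer who watched Cinderella, Lion King comes out on top, since the two movies have exactly the same audience.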
#### User-User Filtering
The other method of collaborative filtering is to look at how similar customers are to one another. Once we've determined how similar customers are, we can recommend to each customer the items liked by the other customers most similar to them. As above, here is a similarity table for each pair of users, made by taking their Jaccard similarity to one another. The process of calculating the Jaccard index is the same as before, except that we now compare the sets of movies each pair of users has watched.



|        | Matt | Lore | Mike | Forest | Taylor |
|--------|------|------|------|--------|--------|
| Matt   |      | 0    | 2/3  | 0      | 2/3    |
| Lore   | 0    |      | 1/4  | 2/2    | 1/4    |
| Mike   | 2/3  | 1/4  |      | 1/4    | 2/4    |
| Forest | 0    | 2/2  | 1/4  |        | 1/4    |
| Taylor | 2/3  | 1/4  | 2/4  | 1/4    |        |

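The same idea for users, sketched in plain Python: compute each user's Jaccard similarity to the others, keep the `k` nearest neighbors, and score each movie the target user has not watched by summing the similarities of the neighbors who watched it. The helper names and the summed-similarity scoring rule are illustrative choices, not a fixed algorithm.

```python
from fractions import Fraction

# Movies each user watched (implicit ratings from the table above).
watched = {
    "Matt": {"Cinderella", "Lion King"},
    "Lore": {"Toy Story", "Little Mermaid"},
    "Mike": {"Cinderella", "Little Mermaid", "Lion King"},
    "Forest": {"Toy Story", "Little Mermaid"},
    "Taylor": {"Toy Story", "Cinderella", "Lion King"},
}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| as an exact fraction."""
    return Fraction(len(a & b), len(a | b))


def neighbors(user, k=2):
    """The k users most similar to `user`, as (similarity, name) pairs."""
    sims = [(jaccard(watched[user], watched[other]), other) for other in watched if other != user]
    return sorted(sims, reverse=True)[:k]


def recommend(user, k=2):
    """Score unseen movies by summing the similarities of the neighbors who watched them."""
    scores = {}
    for sim, other in neighbors(user, k):
        for movie in watched[other] - watched[user]:
            scores[movie] = scores.get(movie, 0) + sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(neighbors("Matt"))
print(recommend("Matt"))
```

Matt's two nearest neighbors are Mike and Taylor (2/3 each); each has watched one movie Matt hasn't, so Toy Story and Little Mermaid tie at 2/3.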


## Model Based Collaborative Filtering

Matrix Factorization models are based on the concept of the __Latent Variable Model__. 

### Latent Variable Model 

Latent variable models try to explain complex relationships between several variables by way of simple relationships between variables and underlying "latent" variables. If this sounds similar to the ideas we established in Dimensionality Reduction and PCA, that's because it is; only the exact implementation differs.


With latent variable models, we have some number of observable variables (the features from our dataset) and a collection of unobservable latent variables. These latent variables should be capable of explaining the relationships of the observable variables to one another, such that the observable variables are conditionally independent given the latent variables.


Matrix factorization is a widely used approach to the high sparsity of recommender-system data: not every user buys every product or service, so the utility matrix remains highly sparse. (If people had already rated every item, it would be unnecessary to recommend them anything!) In model-based recommendation, techniques like __Latent Semantic Indexing (LSI)__ and the dimensionality reduction method __Singular Value Decomposition (SVD)__ are typically used to deal with this sparsity. The utility matrices at the start of this lesson are small examples of a sparse matrix (8 of their 20 cells are empty), and sparsity like this can lead to the problems highlighted earlier in the PCA section.

Let's look at how a recommendation problem can be translated into a matrix decomposition. The idea behind such models is that a user's preferences can be determined by a small number of hidden factors, which we call embeddings.

### Embeddings:
Embeddings are __low dimensional hidden factors__ for items and users. 

For example, say we have 5-dimensional embeddings (i.e., D or n_factors = 5) for both items and users. The 5 is chosen arbitrarily; as we saw with PCA and dimensionality reduction, this could be any number.

For user-X and movie-A, we can say that the 5 numbers in movie-A's embedding might represent 5 different characteristics of the movie, e.g.:

- How political movie-A is
- How recent the movie is
- How many special effects movie-A has
- How dialogue-driven the movie is
- How linear the movie's narrative is

In a similar way, the 5 numbers in user-X's embedding might represent:
- How much user-X likes sci-fi movies
- How much user-X likes recent movies, and so on.

A higher dot product between user-X's embedding and movie-A's embedding means that movie-A is a better recommendation for user-X.

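To make the dot-product scoring concrete, here is a sketch with made-up 5-dimensional embeddings. All numbers are hypothetical, and the user's factors are assumed to line up one-to-one with the movie factors listed above; in a real model the factors are learned from the data and usually have no ready-made interpretation.

```python
import numpy as np

# Hypothetical embeddings. Factor order (assumed): political, recent,
# special effects, dialogue-driven, linear narrative.
user_x = np.array([0.1, 0.9, 0.8, 0.2, 0.5])
movies = {
    "movie-A": np.array([0.0, 0.8, 0.9, 0.1, 0.6]),
    "movie-B": np.array([0.9, 0.1, 0.0, 0.9, 0.2]),
}

# Score each movie by the dot product of its embedding with the user's embedding.
scores = {name: float(user_x @ emb) for name, emb in movies.items()}
best = max(scores, key=scores.get)
print(scores)
print("Recommend:", best)
```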
Now let's look at one way to factor the matrix: Singular Value Decomposition (SVD).

### Basic Recommendation System Using Collaborative Filtering
    # Load the dataset: an edge list of book pairs (ASIN ids) with a similarity weight
    import pandas as pd
    import networkx as nx

    df = pd.read_csv('books_data.edgelist', names=['source', 'target', 'weight'], delimiter=' ')
    df.head()
    G = nx.Graph()  # empty graph; not used by the recommendation loop below

    # Load the metadata associated with each of the books being reviewed (stored in 'books_meta.txt')
    meta = pd.read_csv('books_meta.txt', sep='\t')
    meta.head()

    # Select a small subset of books to generate recommendations for
    GOT = meta[meta.Title.str.contains('Thrones', na=False)]
    GOT

    # Generate recommendations for the selected books with collaborative filtering:
    # for each book, take the 10 heaviest edges touching it and recommend the book at the other end
    rec_dict = {}
    id_name_dict = dict(zip(meta.ASIN, meta.Title))
    for row in GOT.index:
        book_id = GOT.ASIN[row]
        book_name = id_name_dict[book_id]
        most_similar = df[(df.source == book_id)
                          | (df.target == book_id)
                          ].sort_values(by='weight', ascending=False).head(10).copy()
        most_similar['source_name'] = most_similar['source'].map(id_name_dict)
        most_similar['target_name'] = most_similar['target'].map(id_name_dict)
        recommendations = []
        for idx in most_similar.index:
            # The recommended book is whichever end of the edge is not the query book
            if most_similar.source[idx] == book_id:
                recommendations.append((most_similar.target_name[idx], most_similar.weight[idx]))
            else:
                recommendations.append((most_similar.source_name[idx], most_similar.weight[idx]))
        rec_dict[book_name] = recommendations
        print("Recommendations for:", book_name)
        for r in recommendations:
            print(r)
        print('\n\n')

### Singular Value Decomposition (SVD) and Recommendations

With SVD, we turn the recommendation problem into an __optimization__ problem that deals with how well we predict the ratings of items for a given user. One common metric for this optimization is __Root Mean Square Error (RMSE)__: a lower RMSE indicates better performance, and vice versa. RMSE is minimized on the known entries in the utility matrix. SVD has the useful property that, for a fully observed matrix, keeping only the top $k$ singular values gives the minimal reconstruction Sum of Squared Errors (SSE) of any rank-$k$ approximation, which is also why it is commonly used in dimensionality reduction. Because a utility matrix is mostly empty, the "SVD" algorithms used in recommenders (such as the one in Surprise) instead fit the factors to the known entries only. Below is the objective, summed over the known entries of $A$:

$$\min_{U,\Sigma,V}\sum_{(i,j):\,A_{ij}\ \text{known}}\left(A_{ij} - [U\Sigma V^{T}]_{ij}\right)^2$$


RMSE and SSE are monotonically related: with $n$ known entries, $\text{RMSE} = \sqrt{\text{SSE}/n}$, so the lower the SSE, the lower the RMSE. Because SVD minimizes SSE, it also minimizes RMSE, which makes it a great tool for this optimization problem. To predict a user's rating for an unseen item, we multiply $U$, $\Sigma$ and $V^{T}$ and read off the entry for that user and item.

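A minimal sketch of this objective, assuming NumPy: it factors the explicit ratings matrix from the start of the lesson into user factors `P` and item factors `Q` (with $\Sigma$ absorbed into them) by stochastic gradient descent on the known entries only, with a small L2 penalty. This is the "Funk SVD" style of matrix factorization rather than classical SVD, and unlike Surprise's `SVD` (which by default also learns user and item bias terms) it has no biases; `factorize`, `rmse_on_known` and all hyperparameter values are hypothetical choices for illustration.

```python
import numpy as np

# Explicit ratings from the lesson; NaN marks an unknown entry.
# Rows: Matt, Lore, Mike, Forest, Taylor. Columns: Toy Story, Cinderella, Little Mermaid, Lion King.
R = np.array([
    [np.nan, 2, np.nan, 5],
    [2, np.nan, 4, np.nan],
    [np.nan, 5, 3, 2],
    [5, np.nan, 1, np.nan],
    [1, 5, np.nan, 2],
], dtype=float)


def factorize(R, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ≈ P @ Q.T by SGD on the known entries only (Sigma absorbed into P and Q)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    known = list(zip(*np.nonzero(~np.isnan(R))))  # (user, item) pairs that have a rating
    for _ in range(epochs):
        for u, i in known:
            err = R[u, i] - P[u] @ Q[i]
            p_u, q_i = P[u].copy(), Q[i].copy()
            # Gradient step on err^2 + reg * (|p_u|^2 + |q_i|^2); the factor 2 is folded into lr
            P[u] += lr * (err * q_i - reg * p_u)
            Q[i] += lr * (err * p_u - reg * q_i)
    return P, Q


def rmse_on_known(R, P, Q):
    """RMSE over the known entries, i.e. sqrt(SSE / number of known entries)."""
    mask = ~np.isnan(R)
    errors = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(errors ** 2)))


P, Q = factorize(R)
print(round(rmse_on_known(R, P, Q), 3))
print(np.round(P @ Q.T, 2))  # every cell filled in, including predictions for the unknown ones
```

The cells of `P @ Q.T` that were blank in `R` are the model's predicted ratings for movies each user hasn't rated.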
### SVD in Python
SciPy has a straightforward implementation of truncated SVD that spares us from working through all the steps of SVD by hand. We can use the svds() function to decompose a matrix as shown below, and we will use csc_matrix() to create a sparse matrix object.

    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import svds

    # Create a sparse matrix 
    A = csc_matrix([[1, 0, 0], [5, 0, 2], [0, 1, 0], [0, 0, 3],[4,0,9]], dtype=float)

    # Apply SVD
    u, s, vt = svds(A, k=2)  # k is the number of singular values ("stretching factors") to compute

    print ('A:\n', A.toarray())
    print ('=')
    print ('\nU:\n', u)
    print ('\nΣ:\n', s)
    print ('\nV.T:\n', vt)

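Continuing the example, a sketch (assuming NumPy and SciPy, as above) of how the factors returned by `svds` can be multiplied back together into a rank-2 approximation of `A`, which is the same multiplication used to predict unseen entries. The product $U \,\text{diag}(s)\, V^T$ does not depend on the order in which `svds` returns the singular values.

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

# Same sparse matrix as in the example above.
A = csc_matrix([[1, 0, 0], [5, 0, 2], [0, 1, 0], [0, 0, 3], [4, 0, 9]], dtype=float)
u, s, vt = svds(A, k=2)

# Multiply the factors back together: U @ diag(s) @ V^T is the rank-2 approximation of A.
A_hat = u @ np.diag(s) @ vt
print(np.round(A_hat, 2))

# The Frobenius norm of the reconstruction error equals the square root of the sum of
# the squared singular values that were dropped (here, just one).
error = np.linalg.norm(A.toarray() - A_hat)
print(round(float(error), 4))
```

In this particular matrix the middle column is orthogonal to the other two and its single entry gives the smallest singular value (1), so the rank-2 approximation reproduces every entry except that lone 1, and the error equals exactly the discarded singular value.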
## Surprise
Surprise is a Python library for building and evaluating recommendation engines. **Surprise can make predictions on ratings, but it does not recommend items to users. See below for recommendation code.**
Documentation: https://surprise.readthedocs.io/en/stable/index.html

    import surprise

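#### Converting df to a Surprise-ready format

    from surprise import Reader, Dataset
    # new_df must have three columns in this order: user id, item id, rating
    # Reader() assumes rating_scale=(1, 5); pass Reader(rating_scale=(low, high)) if yours differs
    reader = Reader()
    data = Dataset.load_from_df(new_df, reader)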

#### Train-Test

    # split the individual ratings into train and test sets
    from surprise.model_selection import train_test_split
    trainset, testset = train_test_split(data, test_size=0.2)
    
Notice that there is no X_train or y_train here. Our only features are the ratings given by other users and to other items, so we need to keep everything together. What happens in this train-test split is that Surprise randomly assigns individual ratings $r_{ij}$ (for users $u_{i}$ and items $i_{j}$) to the two sets: 80% of the ratings go to the train set and 20% to the test set. Let's investigate `trainset` and `testset` further.

    print('Type trainset :', type(trainset), '\n')  # a Surprise Trainset object
    print('Type testset :', type(testset))  # a plain Python list of (user, item, rating) tuples
    
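To make the split described above concrete, here is a plain-Python sketch of the idea (not Surprise's implementation): the unit that gets shuffled and held out is an individual (user, item, rating) triple, not a row of features. The data are the 12 explicit ratings from the start of the lesson, and `split_ratings` is a hypothetical helper.

```python
import random

# The 12 explicit ratings from the lesson's utility matrix, as (user, item, rating) triples.
ratings = [
    ("Matt", "Cinderella", 2), ("Matt", "Lion King", 5),
    ("Lore", "Toy Story", 2), ("Lore", "Little Mermaid", 4),
    ("Mike", "Cinderella", 5), ("Mike", "Little Mermaid", 3), ("Mike", "Lion King", 2),
    ("Forest", "Toy Story", 5), ("Forest", "Little Mermaid", 1),
    ("Taylor", "Toy Story", 1), ("Taylor", "Cinderella", 5), ("Taylor", "Lion King", 2),
]


def split_ratings(triples, test_size=0.2, seed=42):
    """Shuffle individual ratings and hold out a fraction of them for testing."""
    shuffled = triples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_ratings(ratings)
print(len(train), len(test))  # 10 2
# A held-out rating's user and item usually still appear elsewhere in train,
# which is what lets a model predict the held-out value.
```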
### Memory-Based Methods (Neighborhood-Based)
    # import packages
    from surprise.prediction_algorithms import knns
    from surprise.similarities import cosine, msd, pearson
    from surprise import accuracy

    # Retrieve the number of users and items: an item-item model stores an n_items x n_items
    # similarity matrix and a user-user model an n_users x n_users one, so the smaller count is cheaper
    print('Number of users: ', trainset.n_users, '\n')
    print('Number of items: ', trainset.n_items, '\n')
    
##### cosine similarity model with KNN basic
    sim_cos = {'name': 'cosine', 'user_based': False}  # user_based=False: item-item similarities
    # instantiate KNNBasic and fit
    basic = knns.KNNBasic(sim_options=sim_cos)
    basic.fit(trainset)
    basic.sim  # the item-item similarity matrix computed during fit
    # get accuracy scores
    predictions = basic.test(testset)  # a list of Prediction objects, one per test rating
    print(accuracy.rmse(predictions))  # rmse() prints "RMSE: x.xxxx" and returns the unrounded value
    
##### pearson similarity model with KNN basic
    sim_pearson = {'name':'pearson','user_based':False}
    basic_pearson = knns.KNNBasic(sim_options=sim_pearson)
    basic_pearson.fit(trainset)
    predictions = basic_pearson.test(testset)
    print(accuracy.rmse(predictions))
    
##### pearson similarity model with KNNWithMeans
    sim_pearson = {'name':'pearson','user_based':False}
    knn_means = knns.KNNWithMeans(sim_options=sim_pearson)
    knn_means.fit(trainset)
    predictions = knn_means.test(testset)
    print(accuracy.rmse(predictions))
  
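`KNNWithMeans` differs from `KNNBasic` by working with deviations from each user's (or, in the item-based case, each item's) mean rating. Surprise documents its user-based prediction as $\hat{r}_{ui} = \mu_u + \frac{\sum_{v \in N^k_i(u)} \text{sim}(u, v) \cdot (r_{vi} - \mu_v)}{\sum_{v \in N^k_i(u)} \text{sim}(u, v)}$, where $N^k_i(u)$ is the $k$ nearest neighbors of $u$ that rated $i$; with `user_based: False`, as above, users and items swap roles. Below is a plain-Python sketch of that formula on the explicit ratings, using Jaccard similarity for simplicity; `predict_with_means` is a hypothetical helper, not Surprise's code.

```python
# Explicit ratings from the utility matrix above.
ratings = {
    "Matt": {"Cinderella": 2, "Lion King": 5},
    "Lore": {"Toy Story": 2, "Little Mermaid": 4},
    "Mike": {"Cinderella": 5, "Little Mermaid": 3, "Lion King": 2},
    "Forest": {"Toy Story": 5, "Little Mermaid": 1},
    "Taylor": {"Toy Story": 1, "Cinderella": 5, "Lion King": 2},
}


def mean_rating(user):
    """mu_u: the user's average over all of their ratings."""
    values = list(ratings[user].values())
    return sum(values) / len(values)


def jaccard(a, b):
    """Jaccard similarity between two users' sets of rated items."""
    items_a, items_b = set(ratings[a]), set(ratings[b])
    return len(items_a & items_b) / len(items_a | items_b)


def predict_with_means(user, item, k=3):
    """mu_u plus the similarity-weighted average of the neighbors' deviations from their means."""
    neighbors = sorted(
        ((jaccard(user, other), other) for other in ratings
         if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    sim_total = sum(sim for sim, _ in neighbors)
    if sim_total == 0:
        return mean_rating(user)  # no usable neighbors: fall back to the user's mean
    offset = sum(sim * (ratings[other][item] - mean_rating(other)) for sim, other in neighbors)
    return mean_rating(user) + offset / sim_total


print(predict_with_means("Taylor", "Little Mermaid"))
```

For Taylor and Little Mermaid this gives about 2.25, lower than a plain weighted average of the same neighbors' ratings ($0.5 \cdot 3 + 0.25 \cdot 4 + 0.25 \cdot 1 = 2.75$), because the neighbors' own averages (about 3.17, similarity-weighted) are higher than Taylor's 8/3, so their ratings are shifted down to Taylor's scale.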
##### pearson similarity model with KNN baseline
    
    sim_pearson = {'name':'pearson','user_based':False}
    knn_baseline = knns.KNNBaseline(sim_options=sim_pearson)
    knn_baseline.fit(trainset)
    predictions = knn_baseline.test(testset)
    print(accuracy.rmse(predictions))
    
### Model Based methods (Matrix Factorization)

    #import packages
    from surprise.prediction_algorithms import SVD
    from surprise.model_selection import GridSearchCV
    #perform Gridsearch for optimal parameters
    param_grid = {'n_factors':[20,100],'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
                   'reg_all': [0.4, 0.6]}
    gs_model = GridSearchCV(SVD,param_grid=param_grid,n_jobs = -1,joblib_verbose=5)
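    gs_model.fit(data)
    # best RMSE found and the parameters that achieved it
    print(gs_model.best_score['rmse'])
    print(gs_model.best_params['rmse'])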
    
    svd = SVD(n_factors=100,n_epochs=10,lr_all=0.005,reg_all=0.4)
    svd.fit(trainset)
    predictions = svd.test(testset)
    print(accuracy.rmse(predictions))
    
### Rating Prediction

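    # raw ids must have the same type as in the DataFrame (strings here)
    user_34_prediction = svd.predict('34', '25')  # prediction for user '34' and item '25'
    user_34_prediction  # a Prediction namedtuple (uid, iid, r_ui, est, details); est is the estimate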
    
### Obtaining Ratings from Specific Users

    def movie_rater(movie_df,num, genre=None):
        userID = 1000  # id for the new user; should not clash with an existing user id
        rating_list = []
        while num > 0:
            if genre:
                movie = movie_df[movie_df['genres'].str.contains(genre)].sample(1)
            else:
                movie = movie_df.sample(1)
            print(movie)
            rating = input('How do you rate this movie on a scale of 1-5, press n if you have not seen :\n')
            if rating == 'n':
                continue
            else:
                # input() returns a string, so convert the rating to a number
                rating_one_movie = {'userId': userID, 'movieId': movie['movieId'].values[0], 'rating': float(rating)}
                rating_list.append(rating_one_movie) 
                num -= 1
        return rating_list
        
    user_rating = movie_rater(df_movies,4,'Comedy')
    
### Making Predictions with New Ratings

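    # add the new ratings to the original ratings DataFrame
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    import pandas as pd
    new_ratings_df = pd.concat([new_df, pd.DataFrame(user_rating)], ignore_index=True)
    new_data = Dataset.load_from_df(new_ratings_df, reader)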
    
    # train a model using the new combined DataFrame
    svd_ = SVD(n_factors= 50, reg_all=0.05)
    svd_.fit(new_data.build_full_trainset())
    
    # make predictions for the user
    list_of_movies = []
    for m_id in new_df['movieId'].unique():
        list_of_movies.append( (m_id,svd_.predict(1000,m_id)[3]))
        
    # order the predictions from highest to lowest rated
    ranked_movies = sorted(list_of_movies,key=lambda x:x[1],reverse=True)
    
    # print the top n recommendations
    def recommended_movies(user_ratings, movie_title_df, n):
        for idx, rec in enumerate(user_ratings[:n]):
            # look up the title for this movieId
            title = movie_title_df.loc[movie_title_df['movieId'] == int(rec[0]), 'title'].values[0]
            print('Recommendation # ', idx + 1, ': ', title, '\n')

    recommended_movies(ranked_movies,df_movies,5)
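The loop above ranks every movie, including the ones user 1000 has just rated. Since the goal stated at the start of the lesson is to recommend the top-N items the user has not yet seen, here is a small sketch of that filtering step; `top_n` and the sample numbers are hypothetical. With the variables above, `already_rated` could be built as `{r['movieId'] for r in user_rating}` and `predictions` would be `list_of_movies`.

```python
def top_n(predictions, already_rated, n=5):
    """Return the n highest-scoring (item, estimate) pairs the user has not rated yet."""
    unseen = [(item, est) for item, est in predictions if item not in already_rated]
    return sorted(unseen, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical predicted ratings for one user, as (movieId, estimate) pairs.
preds = [(1, 4.1), (2, 3.2), (3, 4.8), (4, 2.5), (5, 4.4)]
rated = {3}  # movieIds the user already rated (hypothetical)
print(top_n(preds, rated, n=2))  # [(5, 4.4), (1, 4.1)]
```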