# Recommender Systems

Useful links:

[Recommendation Systems — A walk through](https://medium.com/@chaitanyarb619/recommendation-systems-a-walk-trough-33587fecc195)

[MovieLens 20M Dataset](https://www.kaggle.com/grouplens/movielens-20m-dataset)

[Sample code](https://www.kaggle.com/fazilbtopal/popular-recommender-system-algorithms/data?select=rating.csv)

[Recommender Systems in Python 101 (best)](https://www.kaggle.com/gspmoreira/recommender-systems-in-python-101)

[YouTube: Artificial Intelligence - All in One](https://www.youtube.com/watch?v=h9gpufJFF-0&t=175s&ab_channel=ArtificialIntelligence-AllinOne)

## 1. Popularity-based

**Popularity** algorithms are based on trends: they rank items by how popular they are overall.
- They recommend the same items to all users.


**IMDB weighted average formula:**
\\[ \text{Weighted Rating (WR)} = \Big(\frac{vR}{v+m}\Big)+\Big(\frac{mC}{v+m}\Big) \\]
- v is the number of votes for the movie;
- m is the minimum number of votes required to be listed in the chart;
- R is the average rating of the movie;
- C is the mean vote across the whole report, i.e. across all movies.
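
As a minimal sketch of the formula above, the function below computes the weighted rating directly from `v`, `R`, `m`, and `C`. The two example movies and the values chosen for `m` and `C` are made-up numbers for illustration only.

```python
def weighted_rating(v, R, m, C):
    """IMDB weighted rating: WR = v/(v+m) * R + m/(v+m) * C."""
    return (v * R) / (v + m) + (m * C) / (v + m)


# Hypothetical values: chart threshold m = 100 votes, global mean vote C = 6.0.
few_votes = weighted_rating(v=10, R=9.0, m=100, C=6.0)       # high average, few votes
many_votes = weighted_rating(v=10000, R=8.0, m=100, C=6.0)   # lower average, many votes

# With few votes the score is pulled towards C; with many votes it stays near R.
print(round(few_votes, 2), round(many_votes, 2))
```

The movie with only 10 votes ends up ranked below the one with 10,000 votes even though its raw average is higher, which is the point of the weighting.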

**GraphLab** has a built-in popularity-based recommender.

## 2. Content-based

**Content-based Filtering:** Recommends items based on the metadata or characteristics of the items previously rated by the user.

### 2.1 Creating a TF-IDF Vectorizer
**TF-IDF Algorithm:** Weights a keyword in a document, assigning it importance based on how often it appears in that document and how rare it is across the whole corpus.
 - TF-IDF score: \\[ TF \times IDF \\]
 
**TF (Term Frequency):** How often a term appears in a document, normalized by the document length.
  - \\[ TF(t) = \frac{\text{Number of times term t appears in a document}}{\text{Total number of terms in the document}}\\]

**IDF (Inverse Document Frequency):** Measure of how significant that term is in the whole corpus.
   - \\[ IDF(t) = \log_e\Big(\frac{\text{Total number of documents}}{\text{Number of documents with term t in it}}\Big) \\]
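
The two formulas above can be sketched in plain Python as follows. The three-document corpus is invented for illustration, and `idf` assumes the term occurs in at least one document (otherwise the division fails).

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by the document length."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus_tokens):
    """Inverse document frequency with the natural log, as in the formula above."""
    n_docs_with_term = sum(1 for doc in corpus_tokens if term in doc)
    return math.log(len(corpus_tokens) / n_docs_with_term)


# Hypothetical item descriptions, already split into tokens.
corpus = [
    "space adventure with robots".split(),
    "romantic comedy in paris".split(),
    "space romance with aliens".split(),
]

# "space" appears in two documents, "robots" in only one,
# so "robots" is the more distinctive keyword for the first document.
score_space = tf("space", corpus[0]) * idf("space", corpus)
score_robots = tf("robots", corpus[0]) * idf("robots", corpus)
print(score_space, score_robots)
```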

![TF-IDF](https://drive.google.com/uc?export=view&id=1b3N33-jUPB7Sud_EVvLxaJo6IHisCcv_)

**scikit-learn** has a pre-built TF-IDF vectorizer. 


**TF-IDF Matrix:** contains each word and its TF-IDF score with regard to each document (each item, in this case).
  - It represents every item in terms of its description.
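
A minimal sketch of building such a matrix with scikit-learn's `TfidfVectorizer` is shown below. The descriptions are made up, and the `stop_words="english"` setting is just one reasonable choice. Note that scikit-learn applies a smoothed IDF and L2 row normalization by default, so its scores will not match the plain formulas above exactly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical item descriptions (one document per item).
descriptions = [
    "A space adventure with friendly robots",
    "A romantic comedy set in Paris",
    "A space romance with curious aliens",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(descriptions)  # sparse matrix: items x terms

print(tfidf_matrix.shape)               # (number of items, vocabulary size)
print(sorted(vectorizer.vocabulary_))   # the terms that became columns
```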

#### 2.1.1 Creating User-Item Matrix

The following branches are obtained based on the movies' contents!

![recom1](https://drive.google.com/uc?export=view&id=1n0b_CmzMe_Oexax6Ax8qpYDerbkR-hWC)

![recom2](https://drive.google.com/uc?export=view&id=1_3LHrRuXhvH14gzbrp2e3eCgigOFp6Lr)

![recom3](https://drive.google.com/uc?export=view&id=1xXP0tZXg1OFiqgujl1gAIn7QxaaTJnDJ)

![recom4](https://drive.google.com/uc?export=view&id=12IBYNmXnaTTJgnN2jUNTfTN4IPSRJiEq)

![recom5](https://drive.google.com/uc?export=view&id=1X98sRuVDfsHp2rmHuPyjHFNE545G27_s)

After rounding the numbers:

![recom6](https://drive.google.com/uc?export=view&id=1IcOjrf2yp0ZE85eFsfmxvq86SHkpwMqe)


### 2.2 Calculating Cosine Similarity

- Calculate the cosine similarity of each item with every other item in the dataset.
- Rank the other items by their similarity to the given item.

![cos-similarity](https://drive.google.com/uc?export=view&id=1FLBsgfUaDJKsVUz7LiZ0NKfSNUDDFoUO)

- The result is a similarity score matrix that stores the pairwise similarity scores of all the items.
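
The pairwise similarity matrix described above can be sketched with NumPy as follows. The three item vectors are hypothetical stand-ins for rows of a TF-IDF matrix.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against all-zero rows
    X_unit = X / norms               # scale every row to unit length
    return X_unit @ X_unit.T         # dot products of unit vectors = cosines


# Hypothetical item feature vectors (one row per item).
item_vectors = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 2.0],
])

sim = cosine_similarity_matrix(item_vectors)
ranked_for_item_0 = np.argsort(-sim[0])   # item indices, most similar first
print(sim.round(3))
print(ranked_for_item_0)
```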

### 2.3 Making a Recommendation

- Given an item ID and the number of recommendations needed, return the list of the most similar items.
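
A minimal sketch of this lookup, assuming a precomputed similarity matrix: the function name `recommend`, the item IDs, and the similarity values below are all hypothetical.

```python
import numpy as np


def recommend(item_id, sim, item_ids, n=2):
    """Return the n items most similar to item_id, excluding the item itself."""
    idx = item_ids.index(item_id)
    order = np.argsort(-sim[idx])                      # most similar first
    return [item_ids[j] for j in order if j != idx][:n]


# Hypothetical item IDs and a hand-written pairwise similarity matrix.
item_ids = ["m1", "m2", "m3", "m4"]
sim = np.array([
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.6],
    [0.4, 0.2, 0.6, 1.0],
])

print(recommend("m1", sim, item_ids, n=2))  # ['m2', 'm4']
```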


### 2.4 Pros and Cons:
**Pros:**
- User independence
- No cold start for new items
- Transparency

**Cons:**
- Limited content analysis
- Over-specialization: Recommendations stay close to what the user has already rated, so new or unexpected tastes are not captured.
  
## 3. Collaborative Filtering
 
**Collaborative Filtering:** Based on similar users.
- Relies only on the behavior of users and is independent of the items' characteristics (not content).

**Steps:**
- Finding similar users (KNN algorithms)
- Predicting the rating a user would give an item, based on similar users
- Measuring the accuracy of the predicted ratings
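
The three steps can be sketched as below. This is a simplified illustration, not a production approach: the rating matrix is invented, `0` is assumed to mean "not rated", similarity is plain cosine over the raw rating rows, and accuracy is measured with RMSE on two held-out ratings.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def predict(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest users who rated the item."""
    candidates = [u for u in range(len(ratings))
                  if u != user and ratings[u, item] > 0]
    sims = [(cosine(ratings[user], ratings[u]), u) for u in candidates]
    top = sorted(sims, reverse=True)[:k]               # step 1: nearest neighbours
    weight = sum(s for s, _ in top)
    if weight == 0:
        return float(ratings[ratings > 0].mean())      # fall back to the global mean
    return sum(s * ratings[u, item] for s, u in top) / weight   # step 2: prediction


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Step 3: hide known ratings one at a time, predict them, and compare.
held_out = [(0, 0), (2, 2)]
truth, preds = [], []
for user, item in held_out:
    masked = ratings.copy()
    masked[user, item] = 0
    truth.append(ratings[user, item])
    preds.append(predict(masked, user, item))

print(preds, rmse(truth, preds))
```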
 




### 3.1 Memory-based
 
**User-based:** Recommend products to a user that similar users have liked.
- Uses Pearson correlation or cosine similarity.
- Users' preferences can change over time.
- Scaling is difficult.
 
**Item-based:** Instead of measuring the similarity between users, item-based CF recommends items based on their similarity to the items that the target user has rated.
- Uses Pearson correlation or cosine similarity.
- Item-based CF is more static: item similarities change less over time than user preferences.
- Suitable when there are more users than items (e.g., Amazon).
- Scalability is better but still an issue.
- Sparsity is less of a problem than with user-based CF but can still be an issue.
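
A minimal sketch of item-based prediction, assuming cosine similarity between the item columns of a small made-up rating matrix where `0` means "not rated". The function names are hypothetical.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def item_similarity(ratings):
    """Cosine similarity between item columns."""
    items = ratings.T
    norms = np.linalg.norm(items, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = items / norms
    return unit @ unit.T


def predict_item_based(ratings, sim, user, item):
    """Average of the user's own ratings, weighted by similarity to the target item."""
    rated = np.flatnonzero(ratings[user])      # items this user has rated
    rated = rated[rated != item]
    weights = sim[item, rated]
    if weights.sum() == 0:
        return float(ratings[ratings > 0].mean())   # fall back to the global mean
    return float(weights @ ratings[user, rated] / weights.sum())


sim = item_similarity(ratings)
prediction = predict_item_based(ratings, sim, user=0, item=2)
print(round(prediction, 3))
```

Because the item-item matrix depends only on the rating columns, it can be computed once and reused for every user, which is one reason this variant tends to scale better than the user-based one.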
 
 
### 3.2 Model-based

This approach addresses the scalability and sparsity issues of memory-based CF.

It involves a step that reduces or compresses the large but sparse user-item matrix and leverages a latent factor model to capture the similarity between users and items.

**Matrix factorization:** breaking down a large matrix into a product of smaller ones.

Algorithms for Matrix Factorization:
- **SVD** (used in the Netflix Prize competition): turns the recommendation problem into an optimization problem; a common metric is Root Mean Square Error (RMSE). SVD reduces the dimension of the utility matrix by extracting its latent factors. Essentially, we map each user and each item into a latent space of dimension r.

![dimensionality-reduction](https://drive.google.com/uc?export=view&id=1kSo_CTYfS5Xds6RQ_Y3dXi_p00amAYoq)

- **PCA**
- **NMF**
- **Autoencoders** (in case you want to use neural networks)
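
To make the latent-factor idea concrete, here is a small sketch of matrix factorization trained with stochastic gradient descent on the observed ratings only (the style often called Funk SVD). It is not a full SVD implementation, and the rating matrix, learning rate, regularization strength, and number of epochs are illustrative assumptions.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def factorize(ratings, r=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P @ Q.T fits observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, r))   # users in the latent space
    Q = rng.normal(scale=0.1, size=(n_items, r))   # items in the latent space
    observed = np.argwhere(ratings > 0)
    for _ in range(epochs):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse_observed(ratings, P, Q):
    """RMSE over the observed (non-zero) ratings only."""
    mask = ratings > 0
    pred = P @ Q.T
    return float(np.sqrt(np.mean((ratings[mask] - pred[mask]) ** 2)))


P0, Q0 = factorize(ratings, epochs=0)   # untrained factors, for comparison
P, Q = factorize(ratings)
print(rmse_observed(ratings, P0, Q0), rmse_observed(ratings, P, Q))
print((P @ Q.T).round(2))               # dense matrix, including predictions for unrated cells
```

The product `P @ Q.T` fills in every cell of the matrix, so the entries that were `0` in the input can be read as predicted ratings.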

![recom7](https://drive.google.com/uc?export=view&id=1ZXH5D1MsD7tQPqW58A3TAhG5SWUK5iiR)

![recom8](https://drive.google.com/uc?export=view&id=1UmL4MDUlzvX9JFsU3D8uo_7oZvXuPBX-)

![recom9](https://drive.google.com/uc?export=view&id=1xjy5S9rvSOL3rNdMlRi_-hp_UU-52-ex)

## 4. Metrics

- [An Exhaustive List of Methods to Evaluate Recommender Systems](https://towardsdatascience.com/an-exhaustive-list-of-methods-to-evaluate-recommender-systems-a70c05e121de)

- [Evaluation Metrics for Recommender Systems](https://towardsdatascience.com/evaluation-metrics-for-recommender-systems-df56c6611093)

## References

**Data Sets:** [Datasets for Recommender Systems](https://github.com/caserec/Datasets-for-Recommender-Systems)

1- **Libraries:**
  - [Surprise Library](https://github.com/NicolasHug/Surprise)
  - [Turi](https://github.com/apple/turicreate/)
  - [GraphLab Installation](https://thatsclaire.medium.com/graphlab-installation-with-conda-python-2-7-8e4fed74b569)

2- **Websites:**
  - [Graphlab Library 1](https://www.analyticsvidhya.com/blog/2016/06/quick-guide-build-recommendation-engine-python/)
  - [Graphlab Library 2](https://medium.com/@kiran6390/machine-learning-and-its-applications-dcee37f0fbfe)
  - [Popularity based Algorithm](https://medium.com/the-owl/recommender-systems-f62ad843f70c)
  - [TF-IDF Vectorizer](https://medium.com/@cmukesh8688/tf-idf-vectorizer-scikit-learn-dbc0244a911a)
  - [Content based Recommender](https://medium.com/@bindhubalu/content-based-recommender-system-4db1b3de03e7)
  - [Collaborative Filtering](https://realpython.com/build-recommendation-engine-collaborative-filtering/#steps-involved-in-collaborative-filtering)
  - [Using SVD model for Netflix Dataset](https://towardsdatascience.com/machine-learning-for-building-recommender-system-in-python-9e4922dd7e97)



3- **Courses:**
  - [ML Foundations: Case Study Approach (Recommender & Image case study)](https://www.coursera.org/learn/ml-foundations)