**Note:** The following are notes taken from Lillian Pierson's LinkedIn Learning Course of the same name.
# Building a Recommendation System with Python ML & AI

### Collaborative Filtering Systems
**Collaborative filtering systems** recommend items based on crowdsourced information about users' preferences for items. There are two approaches to collaborative filtering:
- User-based
- Item-based

* **Item-based systems:** also known as **item-to-item systems**. They generate recommendations based on similarity between items with respect to user ratings of those items. In this course, we'll use Pearson correlation as the driver for an item-based recommender.

* **User-based systems:** recommend items based on similarity between users: "Customers who are similar to you liked product X, Y, or Z."

### Content-based Recommenders
- **Content-based recommenders** recommend items based on their features and how similar those features are to the features of other items in a dataset. For example, Pandora uses content-based filtering to make its music recommendations.

## Popularity-based recommendation systems
* Popularity-based recommendation systems are the simplest type of recommender system
* These offer a very primitive form of collaborative filtering, where items are recommended to users based on how popular those items are among other users
* Based on **number of reviews** rather than quality or score of reviews
* Popularity-based recommenders:
    - rely on purchase history
    - are often used by online news sites like Bloomberg or NYT (which rely on website user activity data)
    - **cannot produce personalized results** (because they don't take individual user data into account)
    - offer a very simple form of collaborative filtering
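
As a rough illustration of ranking by review count, here is a minimal sketch with pandas. The `ratings` table, its column names, and its values are made up for this example and are not from the course data set.

```python
import pandas as pd

# Hypothetical ratings data: one row per (user, place, rating) review.
ratings = pd.DataFrame({
    "user_id":  [1, 2, 3, 4, 1, 2, 3],
    "place_id": ["A", "A", "A", "A", "B", "B", "C"],
    "rating":   [2, 1, 2, 1, 5, 5, 4],
})

# Popularity = number of reviews per item; the scores themselves are ignored.
popularity = (
    ratings.groupby("place_id")["rating"]
    .count()
    .sort_values(ascending=False)
)

# Every user receives the same top-N list, so nothing here is personalized.
top_items = popularity.head(2).index.tolist()
print(top_items)
```

In this toy data, place "A" ranks first because it has the most reviews, even though its scores are the lowest, which is exactly the limitation noted in the list above.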
    
## Correlation-based Recommendations
- Use Pearson's *r* correlation to recommend an item that is most similar to the item a user has already chosen.
- These offer a very basic form of collaborative filtering, because items are recommended based on similarities in their user reviews. In this sense they do take **user preferences** into account
- **Item-based similarity:** How *correlated* are two items based on user ratings?
- In these systems, you use Pearson's *r* to recommend the item most similar to one a user has already chosen; in other words, an item whose user ratings correlate with the ratings of the item the user chose.

#### Pearson correlation coefficient (*r*)
- **r = 1** $\rightarrow$ perfect positive *linear* relationship
- **r = 0** $\rightarrow$ not linearly correlated
- **r = -1** $\rightarrow$ perfect negative *linear* relationship
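
To make the three reference values concrete, the sketch below computes *r* from its textbook definition with NumPy. The helper name `pearson_r` and the sample vectors are illustrative assumptions, not part of the course material.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()  # center both vectors on their means
    y_c = y - y.mean()
    return float((x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum()))

a = [1, 2, 3, 4, 5]
r_up = pearson_r(a, [2, 4, 6, 8, 10])    # y rises linearly with x
r_down = pearson_r(a, [10, 8, 6, 4, 2])  # y falls linearly with x
r_none = pearson_r(a, [2, 1, 3, 1, 2])   # no linear trend in y
print(r_up, r_down, r_none)
```

Values between these extremes indicate weaker linear relationships; the closer |r| is to 1, the stronger the relationship.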

### Item-based Similarity
- Recommend an item based on how well it correlates with other items with respect to user ratings
- "If users A, B, and D all gave good reviews to a printer, and users A and B also gave a high rating to a camera, user D would also likely give a high rating to the camera."
- Example: `similar_to_tortas = places_crosstab.corrwith(Tortas_ratings)`
- Note that when `Tortas_ratings` is a single column (a pandas Series), the above call returns a Series with one correlation coefficient per column of `places_crosstab`, not a matrix
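
The `corrwith` call above can be tried end to end on a small table. This is a minimal sketch under stated assumptions: the `ratings` data, the place names other than Tortas, and the lowercase variable `tortas_ratings` are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format ratings: four users each rate three places.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "place":   ["Tortas", "Tacos", "Sushi"] * 4,
    "rating":  [5, 4, 1, 4, 5, 2, 1, 2, 5, 2, 1, 4],
})

# Utility (user-item) matrix: users as rows, places as columns.
places_crosstab = ratings.pivot_table(values="rating", index="user_id", columns="place")

# Correlate every place's rating column with the Tortas column.
tortas_ratings = places_crosstab["Tortas"]
similar_to_tortas = places_crosstab.corrwith(tortas_ratings)

# Drop the item itself and rank the rest from most to least similar.
ranked = similar_to_tortas.drop("Tortas").sort_values(ascending=False)
print(ranked)
```

With real data, places that only a handful of users have rated can produce misleadingly high correlations, so it is common to filter on a minimum rating count before ranking.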


# 2. Machine Learning Recommendation Systems

### Classification-based collaborative filtering
- These recommenders could be powered by:
    - Naive Bayes classification
    - Logistic regression
    - etc.


- **Classification-based collaborative filtering recommenders are able to make personalized recommendations.**
- They provide personalization by accepting:
    - User and item attribute data
    - Purchase history data
    - Other contextual data
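
One way to picture a classification-based recommender is a logistic regression that predicts whether a given user will buy an item from that user's attributes. The sketch below uses scikit-learn; the feature names, the training rows, and the labels are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data, one row per past customer:
# [is_student, owns_related_item, visited_site_last_week]
X_train = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
])
# 1 = the customer bought the item, 0 = did not (made-up labels).
y_train = np.array([0, 0, 1, 1, 0, 1, 0, 1])

model = LogisticRegression()
model.fit(X_train, y_train)

# Score two new users; the second column of predict_proba is P(purchase).
new_users = np.array([[0, 1, 1], [1, 0, 0]])
purchase_proba = model.predict_proba(new_users)[:, 1]

for features, p in zip(new_users, purchase_proba):
    print(features, round(float(p), 2))
```

Because each user is scored from their own attributes, the output differs per user; items could then be recommended to whoever has the highest predicted purchase probability.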
    
    
### Model-based Collaborative Filtering
- With model-based collaborative filtering systems, you build a recommender model from user ratings, and then make recommendations based on that model
- In this example, we use:
    - Truncated Singular Value Decomposition (SVD)
    - Utility matrices (also known as user-item matrices)
    
## Singular Value Decomposition (SVD)
- **SVD is a linear algebra method that can decompose a utility matrix into three compressed matrices.**
- SVD is useful for building a model-based recommender because you can use these compressed matrices to make recommendations without having to refer back to the complete data set
- With SVD, you uncover **latent variables**: inferred, nonobservable variables that are present within, and affect the behavior of a data set

<img src='data/rs1.png' width="800" height="400" align="center"/>
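
The three-matrix decomposition can be inspected directly with NumPy's `np.linalg.svd`. This is a minimal sketch on a made-up 5x4 utility matrix; the values and the choice of `k = 2` latent variables are assumptions for illustration only.

```python
import numpy as np

# Hypothetical utility matrix: 5 users (rows) x 4 items (columns), 0 = not rated.
utility = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
    [5, 5, 0, 0],
], dtype=float)

# Decompose into U (users x factors), s (singular values), Vt (factors x items).
U, s, Vt = np.linalg.svd(utility, full_matrices=False)

# Multiplying all three back together recovers the original matrix.
reconstructed = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a compressed approximation.
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
error = np.linalg.norm(utility - approx)
print(s.round(2), round(float(error), 2))
```

The columns kept in the truncated factors play the role of the latent variables described above: a few inferred dimensions that summarize most of the rating behavior.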

# Anatomy of Truncated SVD
- Sklearn's truncated SVD method returns a single compressed version of the matrix upon which it's called
- Compression happens along the dataset columns
- In the example in the exercises, since we want to recommend movies, we want to preserve the movie names as uncompressed rows 

<img src='data/rs2.png' width="800" height="400" align="center"/>

- We want to use the similarities between users to decide which movies to recommend, so we use Truncated SVD to compress all of the user ratings down to just 12 latent variables. These variables capture most of the information that was previously stored in the 943 user columns. They represent a generalized view of users' tastes and preferences
- The first thing we must do is transpose our matrix so that movies are represented by rows

<img src='data/rs3.png' width="800" height="400" align="center"/>

- We use the Pearson *r* correlation coefficient to find out how similar each movie is to other movies on the basis of generalized user tastes
- We'll take our example (Star Wars 1977), pull out its row of correlation coefficients, and use it to see how well its compressed ratings correlate with those of every other movie in the dataset
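
The steps above (transpose, compress to 12 latent variables, correlate, look up one movie) can be sketched as follows. The random ratings, the matrix size, the placeholder movie titles, and the `random_state` value are assumptions for illustration; they stand in for the 943-user data set used in the exercises.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
n_users, n_movies = 50, 20
movie_names = [f"Movie {i}" for i in range(n_movies)]  # hypothetical titles

# Hypothetical utility matrix: users as rows, movies as columns, ratings 0-5.
utility = pd.DataFrame(
    rng.integers(0, 6, size=(n_users, n_movies)), columns=movie_names
)

# Transpose so each movie is a row and each user is a column.
X = utility.T

# Compress the user columns down to 12 latent variables per movie.
svd = TruncatedSVD(n_components=12, random_state=17)
resultant = svd.fit_transform(X)  # shape: (n_movies, 12)

# Pearson correlation between every pair of movie rows.
corr = np.corrcoef(resultant)  # shape: (n_movies, n_movies)

# Look up one movie's row and rank the other movies by correlation.
target = "Movie 0"
similarity = pd.Series(corr[movie_names.index(target)], index=movie_names)
similarity = similarity.drop(target).sort_values(ascending=False)
print(similarity.head(3))
```

Since the ratings here are random, the ranking itself is meaningless; the point is only the shape of each intermediate result.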

## Content-Based Recommender
- Recommends an item based on its features and how similar they are to the features of other items in the data set
- These types of systems are not collaborative filtering systems, because user preferences and attitudes do not weigh into the evaluation; only the features of the items do
- A common way to build one is the nearest neighbor algorithm (an unsupervised method)
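
A nearest neighbor lookup on item features might look like the sketch below, which uses scikit-learn's `NearestNeighbors`. The item names, the three feature columns, and the query values are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical item features: [horsepower, miles per gallon, weight in tons]
item_names = ["car_a", "car_b", "car_c", "car_d"]
features = np.array([
    [110, 21.0, 2.6],
    [93, 22.8, 2.3],
    [245, 14.3, 3.6],
    [62, 24.4, 3.2],
])

# Unsupervised: the model only stores the feature vectors, there are no labels.
nn = NearestNeighbors(n_neighbors=1).fit(features)

# Describe the desired item by its features and find the closest match.
query = np.array([[100, 22.0, 2.5]])
distances, indices = nn.kneighbors(query)
recommended = item_names[indices[0][0]]
print(recommended)
```

The features are left unscaled here, so horsepower dominates the Euclidean distance; in practice you would likely standardize the columns first.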

# Model Evaluation
- How relevant were the recommendations?
- Precision = (The number of items that I liked that were also recommended to me) / (The number of items that were recommended)
- Recall = (The number of items that I liked that were also recommended to me) / (The number of items that I liked)

<img src='data/rs4.png' width="800" height="400" align="center"/>
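
The two ratios above translate directly into a few lines of Python. The function name `precision_recall` and the example item lists are illustrative assumptions.

```python
def precision_recall(recommended, liked):
    """Return (precision, recall) for one user's recommendations."""
    recommended, liked = set(recommended), set(liked)
    hits = recommended & liked  # items that were both recommended and liked
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(liked) if liked else 0.0
    return precision, recall

recommended = ["A", "B", "C", "D"]  # what the system suggested
liked = ["B", "D", "E"]             # what the user actually liked
precision, recall = precision_recall(recommended, liked)
print(precision, recall)
```

Here two of the four recommendations were liked (precision 0.5), and two of the three liked items were recommended (recall of about 0.67).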