# Recommendation Engines

The goals of building a recommendation engine include:

* **Similar item recommendations**: surfacing items that are similar to an item you specify.
* **Personalized rankings**: producing a list of recommended items that is re-ranked for a specific user.
* **New item recommendations**: offering the right recommendations when new items are added to your catalog. This is one of the most challenging problems in building relevant recommendations.

## How does a recommendation engine work?

At a high level, there are two simple starting points:

* Recommend the items that are most popular among all users.
* Divide users into segments based on their preferences (user features) and recommend items based on the segment each user belongs to.
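
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: the interaction log, the user ids, and the segment labels are all made up, and in practice the segments would be derived from user features.

```python
from collections import Counter, defaultdict

# Hypothetical interaction log: (user_id, item_id) pairs.
interactions = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "C"),
    ("u3", "A"), ("u3", "B"),
    ("u4", "C"), ("u4", "D"),
]

# Hypothetical segment assignment (assumed to come from user features).
segments = {"u1": "action", "u2": "drama", "u3": "action", "u4": "drama"}


def most_popular(pairs, n=2):
    """Return the n items with the most interactions."""
    counts = Counter(item for _, item in pairs)
    # Sort by count (descending), then by item id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def most_popular_by_segment(pairs, user_segments, n=2):
    """Return the top-n items within each user segment."""
    by_segment = defaultdict(list)
    for user, item in pairs:
        by_segment[user_segments[user]].append((user, item))
    return {seg: most_popular(seg_pairs, n) for seg, seg_pairs in by_segment.items()}


global_top = most_popular(interactions)
segment_top = most_popular_by_segment(interactions, segments)
print(global_top)
print(segment_top)
```

Every user gets the same `global_top` list, while `segment_top` already differs between the two hypothetical segments.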

### Content based filtering

This algorithm recommends products which are similar to the ones that a user has liked in the past.

#### But what does **"similar"** mean in the case of movies, music, books, etc.?

First, we need to save all the information related to each user in vector form (the **profile vector**). This vector contains the past behavior of the user, for example the movies they liked or disliked and the ratings they gave.

All the information related to items is stored in another vector called the **item vector**. For example, the item vector for a movie contains details such as genre, cast, and director.

Once we have collected the data about users and items in vectors, we can apply vector operations to them, including calculating the distance between them.
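
As a rough sketch of what these vectors might look like, the example below encodes each movie as a one-hot genre vector and builds a profile vector as the rating-weighted average of the movies a user has rated. The genre list, the catalog, the ratings, and the weighting scheme are all assumptions chosen for illustration; real systems typically use many more features.

```python
# Hypothetical genre vocabulary; each item vector has one slot per genre.
GENRES = ["action", "comedy", "drama"]

# Hypothetical catalog: movie title -> genres.
catalog = {
    "Movie A": ["action"],
    "Movie B": ["action", "comedy"],
    "Movie C": ["drama"],
}


def item_vector(genres):
    """One-hot encode a movie's genres against the GENRES vocabulary."""
    return [1.0 if g in genres else 0.0 for g in GENRES]


item_vectors = {title: item_vector(g) for title, g in catalog.items()}

# Hypothetical ratings given by one user on a 1-5 scale.
ratings = {"Movie A": 5, "Movie B": 3}


def profile_vector(user_ratings, vectors):
    """Rating-weighted average of the item vectors the user has rated."""
    total = sum(user_ratings.values())
    profile = [0.0] * len(GENRES)
    for title, rating in user_ratings.items():
        for i, value in enumerate(vectors[title]):
            profile[i] += rating * value / total
    return profile


profile = profile_vector(ratings, item_vectors)
print(profile)
```

Because the profile lives in the same feature space as the item vectors, the two can be compared directly with a vector similarity measure.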

One common way to measure similarity between vectors is **cosine similarity**. It measures the cosine of the angle between two **non-zero** vectors of an inner product space, so it is concerned with orientation rather than magnitude.

![image](./img/cosine-similarity-1007790.jpeg) 


Based on the cosine value, which ranges from -1 to 1, the items are arranged in descending order, and the top-n items are recommended.
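
The ranking step can be sketched as follows, using only the standard library. The profile vector and the candidate item vectors are hypothetical (three genre slots: action, comedy, drama), and the function names are chosen for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_n(profile, item_vectors, n=2):
    """Rank items by similarity to the profile (descending) and keep n."""
    scored = [(cosine_similarity(profile, vec), title)
              for title, vec in item_vectors.items()]
    # Highest similarity first; ties broken by title for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:n]]


# Hypothetical user profile: mostly action, a little comedy, no drama.
profile = [1.0, 0.375, 0.0]

# Hypothetical candidate movies the user has not seen yet.
candidates = {
    "Movie D": [1.0, 0.0, 0.0],  # action
    "Movie E": [0.0, 1.0, 0.0],  # comedy
    "Movie F": [0.0, 0.0, 1.0],  # drama
    "Movie G": [1.0, 1.0, 0.0],  # action + comedy
}

recommended = top_n(profile, candidates)
print(recommended)
```

Note that scaling a vector does not change its score: `[2.0, 0.0, 0.0]` and `[1.0, 0.0, 0.0]` point in the same direction, so they are equally similar to the profile.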



One challenge with this approach relates to the "**non-zero** vectors" condition: an item can only score well along features that are already non-zero in the user's profile vector. In other words, this algorithm is limited to recommending items of the same type as those the user has already bought or liked. If a user has watched or liked only action movies in the past, the system will recommend only action movies. It's a very narrow way of building an engine.


To improve on this type of system, we need an algorithm that recommends items based not just on the content but on the behavior of users as well.

------

### Collaborative filtering

The collaborative filtering algorithm uses "user behavior" for recommending items. This is one of the most commonly used algorithms in the industry, as it does not depend on any additional information about the users or items beyond their interactions. There are different types of collaborative filtering techniques:


* User-User collaborative filtering: This algorithm first finds the similarity score between users. Based on this similarity score, it then picks out the most similar users and recommends products which these similar users have liked or bought previously.

![image](https://miro.medium.com/max/720/0*o0zVW2O6Rv-LI5Mu.png) 
source: https://miro.medium.com/max/720/0*o0zVW2O6Rv-LI5Mu.png


* Item-Item collaborative filtering: This algorithm computes the similarity between each pair of items. Based on that, it recommends items that are similar to the ones the user has liked in the past.
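
Both variants can be sketched on a tiny, made-up ratings table. This is a simplified illustration under several assumptions: missing ratings are treated as 0, similarity is plain cosine similarity, user-user recommendations come from the single most similar user, and the names and data are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 5, "B": 5, "C": 4},
    "carol": {"C": 5, "D": 4},
}
ITEMS = ["A", "B", "C", "D"]
USERS = sorted(ratings)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_vector(user):
    # One slot per item; unrated items are treated as 0 (a simplification).
    return [ratings[user].get(item, 0) for item in ITEMS]


def item_vector(item):
    # One slot per user; missing ratings are treated as 0 (a simplification).
    return [ratings[user].get(item, 0) for user in USERS]


def user_user_recommend(target, n=1):
    """Recommend unseen items rated by the most similar other user."""
    others = [u for u in USERS if u != target]
    neighbor = max(others, key=lambda u: cosine(user_vector(target), user_vector(u)))
    unseen = {i: r for i, r in ratings[neighbor].items() if i not in ratings[target]}
    return sorted(unseen, key=lambda i: (-unseen[i], i))[:n]


def item_item_recommend(target, n=1):
    """Score unseen items by their similarity to items the user rated."""
    seen = ratings[target]
    scores = {}
    for candidate in ITEMS:
        if candidate in seen:
            continue
        # Weight each item-item similarity by the user's own rating.
        scores[candidate] = sum(
            cosine(item_vector(candidate), item_vector(item)) * rating
            for item, rating in seen.items()
        )
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


print(user_user_recommend("alice"))
print(item_item_recommend("alice"))
```

In this toy data both variants suggest item `C` to `alice`, but they get there differently: user-user goes through her nearest neighbor `bob`, while item-item compares the rating columns of the items themselves.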


In-class example: https://docs.google.com/spreadsheets/d/1KutmAm87SIXvYkfsdK0FJFuLmSbjS1LY74S2mfInr8s/edit#gid=0

---
Some practical considerations:

* This algorithm is quite time consuming, as it involves calculating the similarity between every pair of users or items and then calculating a prediction from those similarity scores.
   * One way of handling this problem is to select only a few users/items, instead of all of them, to make predictions.
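
One common way to "select only a few" is to keep the k most similar users who have rated the item and take a similarity-weighted average of their ratings. The sketch below assumes that choice; the ratings, the helper names, and the decision to compute similarity over shared items only are all illustrative.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 5, "B": 3, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
    "u4": {"A": 4, "B": 3, "D": 5},
}


def similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def top_neighbors(user, item, k):
    """The k users most similar to `user` among those who rated `item`."""
    candidates = [u for u in ratings if u != user and item in ratings[u]]
    candidates.sort(key=lambda u: (-similarity(ratings[user], ratings[u]), u))
    return candidates[:k]


def predict(user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings."""
    neighbors = top_neighbors(user, item, k)
    weights = [similarity(ratings[user], ratings[u]) for u in neighbors]
    if not neighbors or sum(weights) == 0:
        return None  # nobody comparable has rated this item
    weighted = sum(w * ratings[u][item] for w, u in zip(weights, neighbors))
    return weighted / sum(weights)


print(top_neighbors("u1", "D", k=2))
print(predict("u1", "D", k=2))
```

Limiting the prediction to k neighbors reduces the work done per prediction and also keeps dissimilar users (here `u3`) from pulling the estimate toward their ratings.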
    
What happens if a new user or a new item is added to the dataset? This is called a **cold start**. One possible solution is to recommend the best-selling products, i.e. the products that are in high demand. Another is to recommend the products that would bring the maximum profit to the business.
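
A simple way to wire in the best-seller fallback is to check whether the user has any history before calling the personalized recommender. In the sketch below, the purchase data is made up and `personalized` is a hypothetical placeholder for whatever recommender is used for known users.

```python
from collections import Counter

# Hypothetical purchase history: user -> list of purchased items.
purchases = {
    "u1": ["A", "B"],
    "u2": ["A", "C"],
    "u3": ["A", "B"],
}


def best_sellers(history, n=2):
    """The n most purchased items across all users."""
    counts = Counter(item for items in history.values() for item in items)
    # Most purchased first; ties broken by item id for determinism.
    return sorted(counts, key=lambda item: (-counts[item], item))[:n]


def recommend(user_id, history, personalized, n=2):
    """Fall back to best sellers when the user has no history yet."""
    if not history.get(user_id):
        return best_sellers(history, n)
    return personalized(user_id, n)


def placeholder_personalized(user_id, n):
    # Hypothetical stand-in for a real personalized recommender.
    return ["X", "Y"][:n]


print(recommend("new_user", purchases, placeholder_personalized))
print(recommend("u1", purchases, placeholder_personalized))
```

The profit-based alternative would follow the same shape, with the fallback ranking items by a margin figure instead of by purchase count.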
 
 
Data about users and items can be collected explicitly or implicitly. Explicit data is information that users provide intentionally, such as movie ratings. Implicit data is information that is not provided intentionally but is gathered from available data streams such as search history, clicks, and order history.
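
To make the distinction concrete, the sketch below keeps explicit ratings as they are and turns an implicit event log into a per-user, per-item score. The event names and their weights are assumptions picked for illustration; there is no standard weighting, and real systems tune this to their own data.

```python
from collections import defaultdict

# Explicit data: ratings users entered on purpose (hypothetical).
explicit_ratings = {("u1", "A"): 5, ("u2", "B"): 2}

# Implicit data: events gathered from activity logs (hypothetical).
implicit_events = [
    ("u1", "B", "click"),
    ("u1", "B", "purchase"),
    ("u2", "A", "search"),
    ("u2", "A", "click"),
    ("u2", "A", "click"),
]

# Assumed weights: stronger signals of interest count for more.
EVENT_WEIGHTS = {"search": 0.5, "click": 1.0, "purchase": 3.0}


def implicit_scores(events):
    """Sum the event weights for each (user, item) pair."""
    scores = defaultdict(float)
    for user, item, event in events:
        # Unknown event types contribute nothing.
        scores[(user, item)] += EVENT_WEIGHTS.get(event, 0.0)
    return dict(scores)


scores = implicit_scores(implicit_events)
print(scores)
```

Either kind of score can then fill the user-item table that the collaborative filtering techniques above operate on.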