In [1]:
import pandas as pd
import numpy as np

# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Recommendation Engines

Week 11 | Day 2

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Explain what a recommendation engine is
- Explain the math behind recommendation engines
- Explain the types of recommendation engines and their pros and cons

### LESSON GUIDE

- What is a recommendation engine?
- Who uses a recommendation engine?
- The data for recommendation engines
- Collaborative filtering
- Content-based filtering

## What is a recommendation engine?

At its most basic: A system designed to match users to things that they will like.

- The "things" can be products, brands, media, or even other people. 
- Ideally, they should be things the user doesn't know about. 
- **The goal is to rank all the possible things that are available to the user and to only present the top items**

## Why bother?

- Between 1/4 and 1/3 of consumer choices at Amazon are driven by personalized recommendations
- Netflix says its recommendation engine reduces churn, saving the company in excess of $1 billion a year

## Who uses recommendation systems?

<img src="http://i.imgur.com/zOJt5mR.png">

![](http://res.cloudinary.com/goodsearch/image/upload/v1410895418/hi_resolution_merchant_logos/target_coupons.gif)

![](https://cdn1.vox-cdn.com/thumbor/lazP2aCcxVUI5RnbcxWpmjr7MU0=/cdn0.vox-cdn.com/uploads/chorus_asset/file/4109214/Discover_Weekly_Snapshot.0.png)

![](https://pmcvariety.files.wordpress.com/2015/09/pandora-logo.jpg?w=670&h=377&crop=1)

![](http://techlogitic.com/wp-content/uploads/2015/11/rs_560x415-140917143530-1024.Tinder-Logo.ms_.091714_copy.jpg)

![](https://pbs.twimg.com/profile_images/744949842720391168/wuzyVTTX.jpg)

<img src="https://www.facebook.com/images/fb_icon_325x325.png">

## The data for recommendations


To predict what someone will like, we first need data about their preferences.

## The two types of data


<img src="http://i.imgur.com/bf5YGWd.png">

## Explicit data vs Implicit data

Explicit
- Explicitly given/pro-actively acquired
- Clear signals
- Cost associated with acquisition (time/cognitive)
- Limited and sparse data because of this


Implicit
- Provided/collected passively (digital exhaust)
- Signals can be difficult to interpret
- Enormous quantities

## Quiz: Implicit or Explicit?

<img src="http://i.imgur.com/y1XMIeO.png">

<img src="http://i.imgur.com/mrIpGsL.png">

<img src="http://i.imgur.com/ICf7mWp.png">

<img src="http://i.imgur.com/RDfXXRI.png">

<img src="http://i.imgur.com/maP4PBv.png">

### If you have the data, you can build it....

But how?

### We have essentially two options:
- Based upon similar people
- Based upon similar characteristics of the item

- The first is called **Collaborative Filtering**
- The second is called **Content-based Filtering**

## Collaborative Filtering

We'll first look at user-to-user filtering. The idea behind this method is finding your taste **doppelgänger**. This is the person who is most similar to you based upon the ratings both of you have given to a mix of products.

<img src="http://i.imgur.com/fOn04Tj.png">

<img src="http://i.imgur.com/R0dwkgp.png">

<img src="http://i.imgur.com/B1ASjVi.png">

## So, let's see how we construct it

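We begin with what's called a utility matrix. This is a user-by-product matrix in which each cell holds that user's rating of that product, or is empty if the user has not rated it.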
<img src="http://i.imgur.com/Ce838dV.png">
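
As a rough sketch, a utility matrix can be built from a long-format ratings log with a pandas pivot. The users, products, and ratings below are made up for illustration and are not the values in the figure.

```python
import pandas as pd

# Hypothetical long-format ratings log: one row per (user, product, rating).
ratings = pd.DataFrame({
    'user':    ['A', 'A', 'B', 'B', 'C'],
    'product': ['chips', 'beer', 'chips', 'water', 'beer'],
    'rating':  [4, 5, 3, 4, 2],
})

# Rows are users, columns are products; unrated pairs become NaN.
utility = ratings.pivot(index='user', columns='product', values='rating')
print(utility)
```

Most cells of a real utility matrix are empty, which is why the handling of missing values matters so much in the steps that follow.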

## Check: If we want to find the most similar users, what do we need?

If we want to find the users most similar to user A, we need a similarity metric.

One we can use is cosine similarity. Cosine similarity uses the cosine of the angle between two vectors to produce a scalar that represents how closely related the vectors are. If the angle is 0 degrees (the vectors point in exactly the same direction), the cosine is 1 and they are perfectly similar. If the angle is 90 degrees (the vectors are orthogonal), the cosine is 0 and they are unrelated. If the angle is 180 degrees (they point in opposite directions), the cosine is -1 and they are perfectly dissimilar.
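
To make the definition concrete, here is a minimal NumPy sketch of cosine similarity: the dot product of the two vectors divided by the product of their lengths. The helper name `cosine_sim` is ours, chosen for illustration; the lesson itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_sim([1, 2], [2, 4]))   # same direction: approximately 1
print(cosine_sim([1, 0], [0, 1]))   # orthogonal: 0
print(cosine_sim([1, 0], [-1, 0]))  # opposite direction: -1
```

Note that the length of the vectors does not matter, only their direction: `[1, 2]` and `[2, 4]` are treated as identical.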

## Let's calculate it

With that, let's calculate the cosine similarity of A against all other users. We'll start with B. We have a sparse matrix so let's just fill in 0 for the missing values.

### A vs B
```python
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(np.array([4,0,5,3,5,0,0]).reshape(1,-1),\
np.array([0,4,0,4,0,5,0]).reshape(1,-1))
```
This gives us a cosine similarity of **.1835**.

This is a low similarity, which makes sense since they have only one rated product in common.

Let's run it for user A and C now.

### A vs C
```python
cosine_similarity(np.array([4,0,5,3,5,0,0]).reshape(1,-1),\
np.array([2,0,2,0,1,0,0]).reshape(1,-1))
```

This gives us a cosine similarity of **.8852**.

#### This indicates these users are very similar. But are they?

## The problem with zero

By imputing 0 for the missing values, we have treated every missing rating as a strongly negative one, and thus created agreement where there is none. We should instead represent a missing rating with a neutral value. We can do this by **mean centering** each user's ratings at zero. Let's see how that works.

We add up all the ratings for user A and then divide by the number of ratings. In this case that is 17/4, or 4.25. We then subtract 4.25 from each of A's ratings, leaving the missing values at 0. We then do the same for all other users, each with their own mean. <br><br>That gives us the following table:

<img src="http://i.imgur.com/QuM7xsa.png">
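
The centering step can be sketched in a few lines of NumPy. The helper name `mean_center` is an illustrative choice; the input is user A's ratings from the calculation above, with `NaN` marking the unrated products.

```python
import numpy as np

def mean_center(ratings):
    """Subtract the user's mean from each rated item; unrated (NaN) items become 0."""
    r = np.asarray(ratings, dtype=float)
    centered = r - np.nanmean(r)    # the mean ignores the missing ratings
    return np.nan_to_num(centered)  # NaN -> 0, i.e. a neutral value

user_a = [4, np.nan, 5, 3, 5, np.nan, np.nan]
print(mean_center(user_a))  # -0.25, 0, 0.75, -1.25, 0.75, 0, 0
```

After centering, 0 means "no opinion" rather than "worst possible rating", and negative values mean "below this user's average".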


### A vs B
```python
cosine_similarity(np.array([-.25,0,.75,-1.25,.75,0,0])\
.reshape(1,-1),\
np.array([0,-.33,0,-.33,0,.66,0])\
.reshape(1,-1))
```

The new figure for this is: **.3077**


### A vs C
```python
cosine_similarity(np.array([-.25,0,.75,-1.25,.75,0,0])\
.reshape(1,-1),\
np.array([.33,0,.33,0,-.66,0,0])\
.reshape(1,-1))
```
The new figure for this is: **-0.246**

## So what happened?

A and B became more similar, and A and C moved further apart, which is what we'd hope to see. Centering has another benefit: easy and hard raters are put on the same basis.
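
A small illustration of the easy-versus-hard-rater point, using two made-up raters who rank three products the same way but use different parts of the scale:

```python
import numpy as np

# Hypothetical raters with the same taste pattern: one generous, one harsh.
easy = np.array([5.0, 4.0, 5.0])
hard = np.array([3.0, 2.0, 3.0])

# Subtract each rater's own mean.
easy_c = easy - easy.mean()
hard_c = hard - hard.mean()

# After centering, both vectors are (up to rounding) the same,
# so their cosine similarity is approximately 1.
sim = np.dot(easy_c, hard_c) / (np.linalg.norm(easy_c) * np.linalg.norm(hard_c))
print(easy_c, hard_c, sim)
```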

## Exercise: Find the similarity between X and Y and X and Z for the following.

| User | Snarky's Potato Chips | SoSo Smooth Lotion | Duffly Beer | BetterTap Water | XXLargeLivin' Football Jersey | Snowy Cotton Balls | Disposos Diapers |
|:-:|---|---|---|---|---|---|---|
| X | | 4 | | 3 | | 4 | |
| Y | | 3.5 | | 2.5 | | 4 | 4 |
| Z | | 4 | | 3.5 | | 4.5 | 4.5 |

In [9]:
from sklearn.metrics.pairwise import cosine_similarity

# x to y with all columns (mean-centered; Y's mean is 3.5, so its centered ratings are 0, -1, .5, .5)
cosine_similarity(np.array([0,  .33,    0, -.67,   0, .33,   0,   0]).reshape(1,-1),
                  np.array([0,    0,    0,   -1,   0, .50,   0, .50]).reshape(1,-1))

array([[ 0.83497913]])

In [10]:
# x to z with all columns
cosine_similarity(np.array([0,  .33,    0, -.67,   0, .33,   0,   0]).reshape(1,-1),
                  np.array([0, -.13,    0, -.63,   0, .38,   0, .38]).reshape(1,-1))

array([[ 0.73727235]])

In [11]:
# x to y abridged (only the columns that at least one of the three users rated)
cosine_similarity(np.array([  .33,     -.67,    .33,      0]).reshape(1,-1),
                  np.array([    0,       -1,    .50,    .50]).reshape(1,-1))

array([[ 0.83497913]])

In [8]:
# x to z abridged
cosine_similarity(np.array([  .33,     -.67,    .33,      0]).reshape(1,-1),
                  np.array([ -.13,     -.63,    .38,    .38]).reshape(1,-1))

array([[ 0.73727235]])

## But how do we predict the rating of an item for a user?

| User | Snarky's Potato Chips | SoSo Smooth Lotion | Duffly Beer | BetterTap Water | XXLargeLivin' Football Jersey | Snowy Cotton Balls | Disposos Diapers |
|:-:|---|---|---|---|---|---|---|
| X | | 4 | | 3 | | 4 | ? |
| Y | | 3.5 | | 2.5 | | 4 | 4 |
| Z | | 4 | | 3.5 | | 4.5 | 4.5 |

Next, we'll find the expected rating of User X for Disposos Diapers using the similarity-weighted ratings of the k closest users. We only have two other users here, Y and Z, so k is 2; normally you would choose k.

We do this by multiplying each neighbor's rating by that neighbor's similarity to X, summing the results, and dividing by the sum of the similarities.

For k of 2:<br>
**(1st closest cosine sim x their product rating + 2nd closest cosine sim x their product rating) / (sum of 1st and 2nd's cosine sims)**

(.42447212 * (4) + .46571861 * (4.5)) / (.42447212 + .46571861) = 4.26
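
The same weighted average can be written as a small helper. This is only a sketch: the function name `predict_rating` is illustrative, and the similarities and ratings are the ones used in the calculation above.

```python
def predict_rating(similarities, ratings):
    """Similarity-weighted average of the neighbors' ratings."""
    weighted_sum = sum(s * r for s, r in zip(similarities, ratings))
    return weighted_sum / sum(similarities)

# Similarities and ratings taken from the worked calculation above.
print(round(predict_rating([.42447212, .46571861], [4, 4.5]), 2))  # 4.26
```

Because the weights are normalized by their sum, the prediction always falls between the lowest and highest neighbor rating as long as all similarities are positive.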

#### Check: What might be some problems with user-to-user filtering?

- A lot of people like Westworld
- Some of the people that like Westworld like Norwegian death metal
- I like Westworld
- Perhaps I just haven't given Norwegian death metal the chance it deserves

In practice, there is a type of collaborative filtering that performs much better than user-to-user filtering: item-to-item filtering.

## Item-to-item filtering

Let's take a look at an example ratings table.

<img src="http://i.imgur.com/JoBHXcG.png">

Just as in user-to-user filtering, we need to center our values by row.

## Exercise: Center the values by row and find the cosine similarity for each row vs. row 5 (S5).

The nearest songs should have been S1 and S3. To calculate the rating for our target song, S5, for U3, using a k of 2, we have the following equation:

(.98 * (4) + .72 * (5)) / (.98 + .72) = 4.42

This is the similarity of our closest song, S1, times User 3's rating of S1, plus the similarity of song S3 times User 3's rating of S3. The sum is then divided by the total similarity.

Therefore, based on this item-to-item collaborative filtering, we expect U3 to rate S5 very highly, at about 4.42.
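
A minimal sketch of the whole item-to-item calculation. It assumes the ratings table is the same one given as the utility matrix in the independent exercise of this lesson (songs on the rows, users on the columns). The function name `predict_item_based` and its `k` parameter are illustrative and not part of any library.

```python
import numpy as np
import pandas as pd

# Songs on the rows, users on the columns (NaN = not rated).
ratings = pd.DataFrame(
    {'U1': [2, np.nan, 1, np.nan, 3],
     'U2': [np.nan, 3, np.nan, 4, np.nan],
     'U3': [4, np.nan, 5, 4, np.nan],
     'U4': [np.nan, 3, np.nan, 4, np.nan],
     'U5': [5, np.nan, 4, np.nan, 5]},
    index=['S1', 'S2', 'S3', 'S4', 'S5'])

def predict_item_based(ratings, user, song, k=2):
    # Center each song's ratings on that song's mean, then treat missing as 0.
    centered = ratings.sub(ratings.mean(axis=1), axis=0).fillna(0)
    # Cosine similarity of every song to the target song.
    norms = np.sqrt((centered ** 2).sum(axis=1))
    sims = centered.dot(centered.loc[song]) / (norms * norms[song])
    # Keep only other songs this user has rated, then take the k most similar.
    rated = ratings[user].dropna().index.drop(song, errors='ignore')
    top = sims[rated].dropna().sort_values(ascending=False).head(k)
    # Similarity-weighted average of the user's ratings for those songs.
    return (top * ratings.loc[top.index, user]).sum() / top.sum()

print(round(predict_item_based(ratings, 'U3', 'S5'), 2))
```

With these ratings the two nearest songs that U3 has rated are S1 and S3, and the prediction rounds to 4.42. Songs whose centered ratings are all zero (S2 and S4 here) have no defined cosine similarity and are skipped; a fuller implementation would also need to decide how to treat negative similarities.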

## Content-based Filtering

Finally, there is another method called content-based filtering. In content-based filtering, the items are broken down into "feature baskets". These are the characteristics that represent the item. The idea is that if you like the features of song X, then finding a song that has similar characteristics will tell us that you're likely to like it as well.


The quintessential example of this is Pandora with its Music Genome Project. Each song is rated on ~450 characteristics by a trained musicologist.

## Example

Content-based filtering begins by mapping each item into a feature space. Both users and items are represented by vectors in this space. Item vectors measure the degree to which the item is described by each feature, and user vectors measure a user's preferences for each feature. Ratings are generated by taking dot products of user and item vectors.

<img src="http://i.imgur.com/NzHksKK.png">
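
A toy version of the dot-product scoring described above. The three features, the two songs, and the user vector are all invented for illustration and do not come from the figure.

```python
import numpy as np

# Hypothetical feature space: [acoustic, fast tempo, vocals].
items = {
    'song_x': np.array([0.9, 0.1, 0.8]),
    'song_y': np.array([0.1, 0.9, 0.2]),
}
# Hypothetical user: likes acoustic songs with vocals, dislikes fast tempo.
user = np.array([1.0, -0.5, 0.5])

# Score each item by the dot product of the user and item vectors.
scores = {name: float(np.dot(user, vec)) for name, vec in items.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here `song_x` scores higher because its strongest features line up with the features this user prefers. No other users' ratings are needed.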

## Independent Exercise:

Write a function that takes in a utility matrix with songs along the index and users along the columns, as seen above in the item-to-item filtering example. The function should accept a target user and a target song (as strings) and return a predicted rating for that pair.

Use the following as your utility matrix:

In [63]:
df = pd.DataFrame({'U1': [2, None, 1, None, 3],
                   'U2': [None, 3, None, 4, None],
                   'U3': [4, None, 5, 4, None],
                   'U4': [None, 3, None, 4, None],
                   'U5': [5, None, 4, None, 5]})
df.index = ['S1', 'S2', 'S3', 'S4', 'S5']
df.info()
print(df.isnull().sum())
df

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, S1 to S5
Data columns (total 5 columns):
U1    3 non-null float64
U2    2 non-null float64
U3    3 non-null float64
U4    2 non-null float64
U5    3 non-null float64
dtypes: float64(5)
memory usage: 240.0+ bytes
U1    2
U2    3
U3    2
U4    3
U5    2
dtype: int64


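| | U1 | U2 | U3 | U4 | U5 |
|:-:|---|---|---|---|---|
| S1 | 2.0 | NaN | 4.0 | NaN | 5.0 |
| S2 | NaN | 3.0 | NaN | 3.0 | NaN |
| S3 | 1.0 | NaN | 5.0 | NaN | 4.0 |
| S4 | NaN | 4.0 | 4.0 | 4.0 | NaN |
| S5 | 3.0 | NaN | NaN | NaN | 5.0 |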


In [64]:
df.fillna(0, inplace=True)

In [67]:
df['usum'] = df.sum(axis=1)



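In [None]:
# Count how many ratings each song has (assumed intent of this unfinished cell)
lenc = []

for _, row in df.drop('usum', axis=1).iterrows():
    lenc.append(int((row != 0).sum()))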


In [68]:
df

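| | U1 | U2 | U3 | U4 | U5 | usum |
|:-:|---|---|---|---|---|---|
| S1 | 2.0 | 0.0 | 4.0 | 0.0 | 5.0 | 11.0 |
| S2 | 0.0 | 3.0 | 0.0 | 3.0 | 0.0 | 6.0 |
| S3 | 1.0 | 0.0 | 5.0 | 0.0 | 4.0 | 10.0 |
| S4 | 0.0 | 4.0 | 4.0 | 4.0 | 0.0 | 12.0 |
| S5 | 3.0 | 0.0 | 0.0 | 0.0 | 5.0 | 8.0 |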


## Conclusion

We have looked at the major types of recommender systems in this lesson. Let's quickly wrap up by looking at the pros and cons of each.

Collaborative Filtering:

Pros:
- No need to hand-craft features

Cons:
- Needs a large existing set of ratings (cold-start problem)
- Sparsity occurs when the number of items far exceeds what a person could purchase

Content-based Filtering:

Pros:
- No need for a large number of users

Cons:
- Lacks serendipity
- May be difficult to generate the right features
- Hard to create cross-content recommendations (different feature spaces)

In fact, the best solution, and the one most likely in use in any large-scale production system, is a combination of both of these. This is known as a **hybrid system**. By combining the two systems, you can get the best of both worlds.

## Additional Resources

- [Wharton Study of Recommender Systems](http://knowledge.wharton.upenn.edu/article/recommended-for-you-how-well-does-personalized-marketing-work/)
- [Netflix Recommendations](https://www.rtinsights.com/netflix-recommendations-machine-learning-algorithms/)
- [Netflix Paper](http://dl.acm.org/citation.cfm?id=2843948)