<h1>Recommender systems</h1>

<ol>
    <li>Similarity measures</li>
    <li>User-based filtering</li>
    <li>Content-based filtering</li>
</ol>

<table>
    <tr>
        <td><img src="Media/netflix_recommender.png"/></td>
        <td><img src="Media/ebay_products.png"/></td>        
    </tr>
</table>


<h2>1. Similarity measures</h2>

How can we measure similarity between two records?

<p style="color:blue">First, a record is a set of features, which can be represented as a vector.</p>

<h3>1.1. Cosine Similarity</h3><br/>
Two vectors are more similar the smaller the angle between them, that is, the closer they are to pointing in the same direction.<br/><br/>
$$
\text{sim}(\vec{u},\vec{v}) = \cos(\theta) = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\,\|\vec{v}\|}
$$

$$
-1 \leq \cos(\theta) \leq 1
$$

<img src="Media/cos_sim_prev.png" width="500px"/>
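
The formula can also be evaluated directly with NumPy. The sketch below is only an illustration: the helper name `cosine_sim` and the example vectors are assumptions, not part of the original notebook. The three calls should come out at approximately 1, 0 and -1, the two ends and the midpoint of the range given above.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between two non-zero vectors u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Illustrative vectors (assumed for this example)
same_direction = cosine_sim([1, 2], [2, 4])    # parallel vectors
orthogonal = cosine_sim([1, 0], [0, 3])        # vectors at a right angle
opposite = cosine_sim([1, 2], [-1, -2])        # vectors pointing in opposite directions
print(same_direction, orthogonal, opposite)
```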

In [65]:
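import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Three user profiles described by the same features
profile_1 = {'age':12, 'height':1.2, 'weight':30}
profile_2 = {'age':13, 'height':1.3, 'weight':25}
profile_3 = {'age':20, 'height':1.7, 'weight':60}

# Turn each profile into a feature vector
p_1 = np.array(list(profile_1.values()))
p_2 = np.array(list(profile_2.values()))
p_3 = np.array(list(profile_3.values()))

# Per-feature mean and standard deviation across all profiles
p = np.array([p_1,p_2,p_3])
mean_p = p.mean(axis=0)
std_p = p.std(axis=0)

# Standardize each feature so that no feature dominates because of its scale
p_1 = (p_1-mean_p)/std_p
p_2 = (p_2-mean_p)/std_p
p_3 = (p_3-mean_p)/std_p

# cosine_similarity expects 2D arrays of shape (n_samples, n_features)
p_1 = p_1.reshape(1,-1)
p_2 = p_2.reshape(1,-1)
p_3 = p_3.reshape(1,-1)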

In [66]:
sim_p1_p2 = cosine_similarity(p_1,p_2)
print('Similarity profile 1 and 2:',sim_p1_p2)

sim_p1_p3 = cosine_similarity(p_1,p_3)
print('Similarity profile 1 and 3:',sim_p1_p3)

Similarity profile 1 and 2: [[0.8885701]]
Similarity profile 1 and 3: [[-0.97688099]]


<h2>2. Recommender systems</h2><br/>

<b style="color:blue">Case-study: Movie Recommender system</b>

Which movie would you recommend to <b>Kevin Hart</b>?
<img src="Media/kevin_hart.jpeg"/>


Recommender systems can be built using at least two philosophies:
<ul>
    <li><b>User-based Collaborative filtering</b></li>
    <li><b>Content-based filtering</b></li>
</ul>

<b style="color:blue">User-based Collaborative filtering: "Recommend the top movies of the people most like him."</b>

<table>
 <tr>
     <td><img src="Media/mr_beans.jpeg"></td>
     <td><img src="Media/chris_rock.jpeg"></td>
     <td><img src="Media/indian_comedian.jpeg"></td>
     <td><img src="Media/the_mask.jpeg"></td>
     <td><img src="Media/mr_bones.jpeg"></td>
 </tr>
</table>

What do we need? 
<b>Requirements: Facts about users</b>

<b style="color:blue">Content-based filtering: "Recommend the movies most similar to those he has watched and loved."</b>

<table>
 <tr>
     <td><img src="Media/home_alone.jpeg"></td>
     <td><img src="Media/junior_ter.jpeg"></td>
     <td><img src="Media/big_mama.jpeg"></td>
     <td><img src="Media/the_mask_movie.jpeg"></td>
 </tr>
</table>

What do we need? 
<b>Requirements: Facts about movies</b>

<h3>2.1. User-based filtering</h3><br/>

User-based filtering needs a utility matrix that contains <b>facts about users</b> and <b>the ratings they gave to movies</b>.

<img src="Media/utility_matrix.png"/>

<b>Problem statement:</b> <span style="color:blue"> (1) What top N movies do I recommend to user p?</span> <span style="color:blue"> (2) What rating will <b>user p</b> give to <b>movie q</b> if similar users <b>have rated movie q?</b></span>

<img src="Media/utility_matrix_2.png" width="80%"/>

Let's look at the solution steps.

<b>Step 1:</b> Compute the <b style="color:blue">similarity between users based on profile features</b> and select the k users most similar to user p

<img src="Media/similarity_matrix.png"/>
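
Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the `profiles` array holds made-up, already standardized user features, and `top_k_similar` is a hypothetical helper name, not something defined elsewhere in this notebook. Cosine similarity is used, as in section 1.1.

```python
import numpy as np

# Hypothetical standardized profile features: one row per user (assumed data)
profiles = np.array([
    [-0.8, -0.9, -0.5],   # user 0, playing the role of user p
    [-0.6, -0.5, -0.9],   # user 1
    [ 1.4,  1.4,  1.4],   # user 2
    [-0.7, -1.0, -0.4],   # user 3
])

def top_k_similar(profiles, p, k):
    """Return the indices and similarities of the k users most similar to user p."""
    # Scale every row to unit length so that a dot product equals the cosine similarity
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    sims = unit @ unit[p]      # similarity of every user to user p
    sims[p] = -np.inf          # user p must not be their own neighbour
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

neighbours, neighbour_sims = top_k_similar(profiles, p=0, k=2)
print(neighbours, neighbour_sims)
```

With this toy data, users 3 and 1 are selected, in that order, while user 2 points in nearly the opposite direction and is left out.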

<b>Step 2a:</b> <b style="color:blue"> Rank the k most similar users to p. For each user collect their top-rated items.  Find the top N most popular items across k similar users </b> <b style="color:red">[This is a recommender system already]</b>
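
One possible reading of Step 2a is sketched below: take each neighbour's top-rated movies and count how often each movie appears. The ratings, the movie names and the `per_user` cut-off are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical ratings (movie -> rating) given by the k users most similar to user p
neighbour_ratings = {
    "user_1": {"movie_a": 4.5, "movie_b": 4.0, "movie_c": 2.0},
    "user_2": {"movie_b": 5.0, "movie_a": 4.2, "movie_d": 3.9},
    "user_3": {"movie_b": 4.8, "movie_c": 4.1, "movie_a": 1.5},
}

def top_n_popular(neighbour_ratings, n, per_user=2):
    """Return the n movies that appear most often among each neighbour's top `per_user` movies."""
    counts = Counter()
    for ratings in neighbour_ratings.values():
        # The neighbour's highest-rated movies
        favourites = sorted(ratings, key=ratings.get, reverse=True)[:per_user]
        counts.update(favourites)
    return [movie for movie, _ in counts.most_common(n)]

recommended = top_n_popular(neighbour_ratings, n=2)
print(recommended)
```

In practice one would probably also drop the movies user p has already watched before returning the list.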

<b>Step 2b:</b> Compute the rating of movie q using <b style="color:blue">the weighted average of the ratings of most similar users</b>

$$
\text{rating}_p(q) = \frac{\sum_{i=1}^{k}\text{sim}_{i,p}\,\text{rating}_{i,q}}{\sum_{i=1}^{k}\text{sim}_{i,p}}
$$

In [None]:
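# user_p has not rated movie_q yet; the rating is predicted from two similar users.
# Each entry: (similarity to user_p, rating given to movie_q)
neighbours = {
    'user_1': (0.6, 0.3),
    'user_2': (0.8, 4.1),
}

weighted_sum = sum(sim * rating for sim, rating in neighbours.values())
total_sim = sum(sim for sim, _ in neighbours.values())

# rating_p(q) = (0.6*0.3 + 0.8*4.1) / (0.6 + 0.8) = 3.46 / 1.4 ≈ 2.47
rating_p_q = weighted_sum / total_sim
print('Predicted rating of movie_q for user_p:', round(rating_p_q, 2))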
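
The same weighted average can be wrapped in a small function that skips neighbours who have not rated the movie. This is a sketch, not a definitive implementation: `predict_rating` is a hypothetical helper, a missing rating is represented by `None`, and the third neighbour is made up. It also assumes the similarities are positive, since negative weights would make the average hard to interpret.

```python
def predict_rating(sims, ratings):
    """Similarity-weighted average of the neighbours' ratings for one movie.

    sims    : similarity of each neighbour to user p
    ratings : each neighbour's rating of movie q, or None if they have not rated it
    """
    # Keep only the neighbours who actually rated the movie
    rated = [(s, r) for s, r in zip(sims, ratings) if r is not None]
    total_sim = sum(s for s, _ in rated)
    if total_sim == 0:
        return None  # no usable neighbour, so no prediction
    return sum(s * r for s, r in rated) / total_sim

# Same numbers as the worked example, plus a neighbour without a rating
prediction = predict_rating([0.6, 0.8, 0.9], [0.3, 4.1, None])
print(round(prediction, 2))
```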

<h3>2.2. Content-based filtering</h3><br/>
Content-based filtering requires facts about movies to provide recommendations.

<img src="Media/cb_db.png"/>

<b>Step 1:</b> Compute the <b style="color:blue">Similarity Matrix between movies</b>

<img src="Media/movie_similarity_matrix.png" width="80%"/>
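
A movie similarity matrix can be built from movie feature vectors in a few lines. In this sketch the features are assumed to be simple genre flags, and the three movies are made up for illustration. Cosine similarity is used, as in section 1.1.

```python
import numpy as np

# Hypothetical movie features, e.g. genre flags [comedy, action, family]
movie_features = np.array([
    [1.0, 0.0, 1.0],   # movie 0
    [1.0, 0.0, 0.0],   # movie 1
    [0.0, 1.0, 0.0],   # movie 2
])

# Scale each row to unit length; the dot products are then cosine similarities
unit = movie_features / np.linalg.norm(movie_features, axis=1, keepdims=True)
similarity = unit @ unit.T   # similarity[i, j] = cosine similarity of movies i and j
print(similarity.round(2))
```

The matrix is symmetric with ones on the diagonal, so only the entries above the diagonal carry new information.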

<b>Step 2a:</b> <b style="color:blue"> Rank the movies most similar to the movie one has watched and loved</b> <b style="color:red">[This is a recommender system already]</b>

<img src="Media/rank_movies.png" width="80%"/>
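
Ranking then amounts to sorting one row of the similarity matrix. The sketch below assumes that row is available as a dictionary. The similarity values, the movie names and the helper `rank_similar` are all hypothetical.

```python
# Hypothetical similarities between a movie the user loved and the other movies
sim_to_liked = {"movie_a": 0.91, "movie_b": 0.15, "movie_c": 0.78, "movie_d": 0.40}
already_watched = {"movie_c"}

def rank_similar(sim_to_liked, already_watched, n):
    """Return the n unwatched movies most similar to the liked movie."""
    candidates = {m: s for m, s in sim_to_liked.items() if m not in already_watched}
    return sorted(candidates, key=candidates.get, reverse=True)[:n]

recommendations = rank_similar(sim_to_liked, already_watched, n=2)
print(recommendations)
```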

<b>Step 2b:</b> <b style="color:blue">To predict the rating of an unrated movie q when one has rated other movies $j,k,l$, use the similarity-weighted average of those ratings</b>

$$
\text{rating}_p(q) = \frac{\text{sim}(q,j)\,\text{rating}(j) + \text{sim}(q,k)\,\text{rating}(k) + \text{sim}(q,l)\,\text{rating}(l)}{\text{sim}(q,j)+\text{sim}(q,k)+\text{sim}(q,l)}
$$
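
As a worked instance of this formula, take purely illustrative numbers (assumed, not drawn from any dataset): similarities $\text{sim}(q,j)=0.9$, $\text{sim}(q,k)=0.5$, $\text{sim}(q,l)=0.1$ and ratings $\text{rating}(j)=4$, $\text{rating}(k)=2$, $\text{rating}(l)=5$.

$$
\text{rating}_p(q) = \frac{0.9 \cdot 4 + 0.5 \cdot 2 + 0.1 \cdot 5}{0.9 + 0.5 + 0.1} = \frac{5.1}{1.5} = 3.4
$$

The prediction stays close to the rating of movie $j$, the movie most similar to $q$, while the high rating of the barely similar movie $l$ has little influence.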

<h2>Conclusion</h2>

<ul>
    <li>User-based collaborative filtering</li>
    <li>Content-based filtering</li>
</ul>