## Recommender systems

Recommender systems personalize much of the web: they suggest what to buy, where to eat, or even who to befriend. People's tastes vary, but they generally follow patterns: people tend to like things similar to what they already like, and their tastes tend to resemble those of the people close to them. Recommender systems try to capture these patterns to predict what else you might like.

### Types
- Content-Based (Similarity between items)
- Collaborative Filtering (Similarity between users' behaviors)
    - Model-Based Collaborative filtering (SVD)
    - Memory-Based Collaborative Filtering (cosine similarity)
        - user-user filtering
        - item-item filtering
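
The list above names SVD as the model-based approach. As a rough illustration of the idea, the sketch below builds a rank-k approximation of a tiny rating matrix with `np.linalg.svd`. It assumes Python 3 and NumPy; the toy matrix and the helper name `low_rank_approximation` are made up for this example and are not part of the notebook.

```python
import numpy as np

# Hypothetical toy user-item matrix (rows: users, columns: items); 0 = not rated.
ratings = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])

def low_rank_approximation(matrix, k):
    """Return the rank-k approximation of `matrix` from a truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

approx = low_rank_approximation(ratings, k=2)
print(np.round(approx, 2))
```

The entries of `approx` at positions that were 0 in `ratings` can be read as model-based predictions; how well they work depends on the choice of `k` and on how missing ratings are handled.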
 
### Data
- [MovieLens 100K Dataset](https://grouplens.org/datasets/movielens/100k/)
- 100k movie ratings
- 943 users
- 1682 movies

In [1]:
import numpy as np
import pandas as pd
import tools as t


In [2]:
# Read the tab-separated ratings file; it has no header row, so column names are supplied
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('u.data', sep='\t', names=header)

In [3]:
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items))

Number of users = 943 | Number of movies = 1682
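
With 100k ratings spread over 943 users and 1682 movies, most user-movie pairs have no rating. A quick way to see this is to compute the density of the rating matrix. The snippet below is a small sketch that hard-codes the dataset sizes listed above rather than reading the file.

```python
# Dataset sizes as listed in the Data section above.
n_ratings = 100000
n_users = 943
n_items = 1682

# Fraction of user-movie pairs that actually carry a rating.
density = n_ratings / float(n_users * n_items)
print('Density: {:.2%}'.format(density))
```

Only about 6% of the possible entries are filled, which is why the user-item matrix built later is mostly zeros.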


In [4]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df, test_size=0.25)

In [5]:
train_data.describe()


| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| count | 75000.0 | 75000.0 | 75000.0 | 75000.0 |
| mean | 462.185627 | 425.872053 | 3.530227 | 883537100.0 |
| std | 266.8593 | 331.233126 | 1.123428 | 5349875.0 |
| min | 1.0 | 1.0 | 1.0 | 874724700.0 |
| 25% | 254.0 | 175.0 | 3.0 | 879448100.0 |
| 50% | 446.0 | 321.0 | 4.0 | 882829500.0 |
| 75% | 682.0 | 633.0 | 4.0 | 888267300.0 |
| max | 943.0 | 1682.0 | 5.0 | 893286600.0 |


In [6]:
train_data

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 42060 | 106 | 739 | 3 | 881453290 |
| 71410 | 923 | 245 | 3 | 880387199 |
| 15848 | 409 | 680 | 1 | 881105677 |
| 30034 | 553 | 511 | 5 | 879948869 |
| 53751 | 523 | 384 | 3 | 883703495 |
| 63941 | 887 | 200 | 1 | 881380883 |
| 20020 | 13 | 547 | 1 | 882397011 |
| 36258 | 450 | 549 | 3 | 882377358 |
| 43472 | 629 | 50 | 5 | 880117395 |
| 83203 | 936 | 405 | 2 | 886833053 |


### Create a user-item rating matrix

<img src="user-item.png">

In [7]:
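def user_item_rating(data):
    """Build a dense (n_users x n_items) rating matrix; 0 means "not rated"."""
    data = np.array(data)
    matrix = np.zeros((n_users, n_items))
    # IDs in the dataset start at 1, so shift them to 0-based row/column indices
    matrix[data[:,0]-1,data[:,1]-1] = data[:,2]
    return matrix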
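
To make the indexing step concrete, here is the same fancy-indexing assignment applied to four made-up ratings for 3 users and 3 items. The toy data is hypothetical; only the indexing pattern mirrors the function above.

```python
import numpy as np

# Hypothetical ratings; columns: user_id, item_id, rating (IDs are 1-based).
toy = np.array([
    [1, 1, 5],
    [1, 3, 2],
    [2, 2, 4],
    [3, 1, 3],
])

matrix = np.zeros((3, 3))
# Row index = user_id - 1, column index = item_id - 1.
matrix[toy[:, 0] - 1, toy[:, 1] - 1] = toy[:, 2]
print(matrix)
# [[5. 0. 2.]
#  [0. 4. 0.]
#  [3. 0. 0.]]
```

Each (user, item) pair lands in its own cell, and every pair without a rating keeps the initial 0.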

In [8]:
train_data_matrix = user_item_rating(train_data)
test_data_matrix = user_item_rating(test_data)

print(train_data_matrix.shape, test_data_matrix.shape)
print("Train Matrix ", train_data_matrix[:10])
print()
print("Test Matrix ", test_data_matrix[:10])

(943, 1682) (943, 1682)
Train Matrix  [[5. 3. 4. ... 0. 0. 0.]
 [4. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

Test Matrix  [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [4. 0. 0. ... 0. 0. 0.]]


### Calculate Cosine Similarity
<img src="user_sim.gif">
<img src="item_sim.gif">
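
For reference, the standard cosine similarity between two rating vectors can be written as follows. This assumes the images above show the usual definition; here $x_a$ and $x_b$ are two rows of the rating matrix (two users, or two items when the matrix is transposed) and $m$ runs over the columns.

$$
s(a, b) = \frac{x_a \cdot x_b}{\lVert x_a \rVert \, \lVert x_b \rVert}
        = \frac{\sum_m x_{a,m} \, x_{b,m}}{\sqrt{\sum_m x_{a,m}^2} \, \sqrt{\sum_m x_{b,m}^2}}
$$

Because ratings are non-negative, the value always lies between 0 and 1.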

#### Hint: look at `sklearn.metrics.pairwise_distances` (with `metric='cosine'` it returns the cosine distance, i.e. 1 - similarity)

In [9]:
def cosine_similarity(data):
    """Cosine similarity between every pair of rows of `data`."""
    # your code here
    dot_product = data.dot(data.T)
    # Euclidean norm of each row, as a column vector
    a = np.linalg.norm(data, axis=1).reshape(-1, 1)
    # Outer product: entry (i, j) is ||row_i|| * ||row_j||
    magnitude_product = a * a.T
    similarity_matrix = dot_product / magnitude_product
    # Rows with no ratings give 0/0 = NaN; map those entries to 0
    return np.nan_to_num(similarity_matrix)
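
The function above divides by zero for rows without any ratings and then cleans up the resulting NaN values. One possible alternative, sketched here under the assumption that a similarity of 0 is the desired result for empty rows, avoids the invalid division altogether. The name `cosine_similarity_safe` is made up for this example.

```python
import numpy as np

def cosine_similarity_safe(data):
    """Row-wise cosine similarity; rows with zero norm get similarity 0."""
    norms = np.linalg.norm(data, axis=1)
    # Replace zero norms by 1 so the division is defined. The dot products
    # involving an all-zero row are already 0, so the result stays 0.
    safe_norms = np.where(norms == 0, 1.0, norms)
    return data.dot(data.T) / np.outer(safe_norms, safe_norms)

example = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0],
                    [0.0, 0.0, 0.0]])
sim = cosine_similarity_safe(example)
print(np.round(sim, 4))
```

This variant produces no runtime warning, at the cost of one extra array for the adjusted norms.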

In [1]:
import numpy as np
# Step-by-step check of the cosine computation on a small matrix
a = np.array([[1,2,3],[4,5,6],[3,2,1]])
b = a.T
print("a", a)
print("b", b)
# Dot product between every pair of rows
dot_product = a.dot(b)
print("dot_product", dot_product)
# Euclidean norm of each row
a = np.linalg.norm(a.T,axis=0)
print("A", a)
a = a.reshape((a.shape[0],1))
b = a.T
print("a", a)
# Product of norms for every pair of rows
magnitude_product = a*b
print("Magnitude Product", magnitude_product)

cosine = dot_product/magnitude_product

a [[1 2 3]
 [4 5 6]
 [3 2 1]]
b [[1 4 3]
 [2 5 2]
 [3 6 1]]
dot_product [[14 32 10]
 [32 77 28]
 [10 28 14]]
A [3.74165739 8.77496439 3.74165739]
a [[3.74165739]
 [8.77496439]
 [3.74165739]]
Magnitude Product [[14.         32.83291032 14.        ]
 [32.83291032 77.         32.83291032]
 [14.         32.83291032 14.        ]]


In [11]:
user_similarity = cosine_similarity(train_data_matrix)
item_similarity = cosine_similarity(train_data_matrix.T)



In [12]:
from sklearn.metrics import pairwise_distances
# pairwise_distances returns the cosine *distance*, so similarity = 1 - distance
user_similarity2 = pairwise_distances(train_data_matrix, metric='cosine')
item_similarity2 = pairwise_distances(train_data_matrix.T, metric='cosine')
print(user_similarity.shape, item_similarity.shape)
print('Your Function: ', user_similarity[0][1])
print('pairwise_distances: ', 1 - user_similarity2[0][1])
print('Your Function: ', item_similarity[0][1])
print('pairwise_distances: ', 1 - item_similarity2[0][1])

(943, 943) (1682, 1682)
Your Function:  0.14038979311672609
pairwise_distances:  0.14038979311672606
Your Function:  0.3151289102924038
pairwise_distances:  0.31512891029240375


### Predictions
- user-user filtering
- item-item filtering

<img src="user_predict.gif">
<img src="item_predict.gif">
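
Written out, the similarity-weighted averages computed by the `predict` function in this notebook are the following. The images above may show a different variant (for example one that subtracts each user's mean rating); the notation here is an assumption, with $r_{u,i}$ the rating of user $u$ for item $i$ (0 if unrated) and $s(\cdot,\cdot)$ the cosine similarity.

User-based:

$$
\hat{r}_{u,i} = \frac{\sum_{v} s(u, v) \, r_{v,i}}{\sum_{v} s(u, v)}
$$

Item-based:

$$
\hat{r}_{u,i} = \frac{\sum_{j} s(i, j) \, r_{u,j}}{\sum_{j} s(i, j)}
$$

In both cases the sums run over all users $v$ or all items $j$, including those with no rating.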

In [13]:
print(train_data_matrix.shape)
print(item_similarity.shape)
print(user_similarity.shape)

(943, 1682)
(1682, 1682)
(943, 943)


In [14]:
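def predict(ratings, similarity, k=5, type='item'):
    # Similarity-weighted average of the ratings.
    # Note: k is accepted but not used; all neighbours contribute.
    if type == 'user':
        # Weight every user's ratings by similarity, then normalise per user
        ratings = (similarity.dot(ratings).T / np.sum(similarity, axis=0)).T
    elif type == 'item':
        # Weight every item's ratings by similarity, then normalise per item
        ratings = similarity.dot(ratings.T).T / np.sum(similarity, axis=0)
    return ratings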
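
A tiny worked example may help to see what these weighted averages do. The sketch below applies the same arithmetic to a made-up 2 x 2 rating matrix with hypothetical similarity values; it does not call the notebook's function, so that it runs on its own.

```python
import numpy as np

# Hypothetical data: 2 users x 2 items, 0 = not rated.
ratings = np.array([[5.0, 0.0],
                    [1.0, 4.0]])
item_sim = np.array([[1.0, 0.5],
                     [0.5, 1.0]])   # made-up item-item similarities
user_sim = np.array([[1.0, 0.2],
                     [0.2, 1.0]])   # made-up user-user similarities

# Item-based: for symmetric item_sim this equals
# item_sim.dot(ratings.T).T / item_sim.sum(axis=0)
item_pred = ratings.dot(item_sim) / item_sim.sum(axis=0)

# User-based: weight the other users' ratings, normalise per user (row).
user_pred = user_sim.dot(ratings) / user_sim.sum(axis=1, keepdims=True)

print(np.round(item_pred, 3))
print(np.round(user_pred, 3))
```

User 0 rated item 0 with 5, yet the item-based prediction for that cell is about 3.33, because the unrated item 1 enters the average as a 0. This is likely why the predictions printed further down sit well below the 1 to 5 rating scale.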

In [15]:
item_prediction = predict(train_data_matrix, item_similarity, type='item')
user_prediction = predict(train_data_matrix, user_similarity, type='user')

  


In [16]:
print(item_prediction[0])
print(user_prediction[0])

[0.90122445 0.82096078 0.87104433 ... 0.12848525 0.68175589 1.1271946 ]
[1.74706018e+00 5.76859595e-01 3.44273267e-01 ... 6.76830335e-04
 5.62210752e-03 7.32916282e-03]
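
The test matrix built earlier has not been used so far. One common way to compare predictions against it is the root-mean-square error over the entries that were actually rated. The helper below is a sketch of that idea, not part of the notebook; the name `rmse` and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def rmse(prediction, ground_truth):
    """Root-mean-square error over the rated (non-zero) entries only."""
    mask = ground_truth != 0
    diff = prediction[mask] - ground_truth[mask]
    return np.sqrt(np.mean(diff ** 2))

# Hypothetical example: two rated entries, errors of -1 and +2.
truth = np.array([[4.0, 0.0],
                  [0.0, 2.0]])
pred = np.array([[3.0, 1.0],
                 [5.0, 4.0]])
print(rmse(pred, truth))
```

With the notebook's variables this would be called as `rmse(item_prediction, test_data_matrix)` or `rmse(user_prediction, test_data_matrix)`.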
