## `Collaborative Filtering`

Collaborative filtering is an umbrella term for recommendation techniques that use the ratings and behaviour of many users to produce personalized recommendations.

These techniques are based on the idea "apes together strong!" xD. Put simply, recommendations rest on the premise that people largely keep their tastes over time, so if they agreed with somebody in the past, they will likely agree with them in the future as well.

One type of collaborative filtering is `Neighbourhood based filtering`. There are 2 techniques of this type.

1. User based filtering - find similar users and recommend items liked by them.
2. Item based filtering - find items that are rated similarly to the ones a user liked and recommend those.

To do these tasks we use a data structure (a matrix) called the `ratings matrix`. This matrix holds each user's rating for each item, and our goal is to predict the missing values so that we can recommend the items with high predicted ratings to users.
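As a rough illustration of the user based variant, here is a minimal sketch. The toy `ratings` dictionary, the helper names, and the `min_rating` threshold are all assumptions made up for this example; it uses plain cosine similarity over co-rated items and only looks at the single nearest user.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}; absent items are unrated.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item4": 2},
    "bob":   {"item1": 4, "item2": 3, "item3": 4, "item4": 1},
    "carol": {"item1": 1, "item2": 5, "item3": 2},
}

def cosine_on_common_items(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend_from_nearest_user(target, ratings, min_rating=4):
    """Recommend items the most similar user liked and the target has not rated."""
    others = [u for u in ratings if u != target]
    nearest = max(
        others,
        key=lambda u: cosine_on_common_items(ratings[target], ratings[u]),
    )
    items = sorted(
        item for item, r in ratings[nearest].items()
        if r >= min_rating and item not in ratings[target]
    )
    return nearest, items

nearest, items = recommend_from_nearest_user("alice", ratings)
print(nearest, items)  # bob ['item3']
```

A real system would usually combine several neighbours and weight them by similarity rather than trusting one user.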


<center><image src="./images/Recommendation pipeline.jpg" width="500px" /></center>


In practice, one of the main concerns in recommendation engines is similarity computation. Since this is time-consuming when the number of items is high, it is essential to precalculate the similarities beforehand. Apparently, Amazon used such an offline similarity calculation for their recommender systems.

<pre style="color:yellow;">
    For each item in product catalog, I1
        For each customer C who purchased I1
            For each item I2 purchased by customer C
                Record that a customer purchased I1 and I2
        For each item I2
            Compute the similarity between I1 and I2
</pre>
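The pseudocode above could be sketched in Python roughly as follows. The `purchases` log and the function name are hypothetical, and cosine similarity over binary purchase vectors is only one possible choice of similarity measure, assumed here for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase log: customer -> set of purchased items.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "desk"},
    "c3": {"book", "desk"},
    "c4": {"lamp"},
}

def item_similarities(purchases):
    # Invert the log: item -> set of customers who bought it.
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    # For each item I1, record how many customers also bought each I2.
    co_counts = defaultdict(lambda: defaultdict(int))
    for i1, customers in buyers.items():
        for customer in customers:
            for i2 in purchases[customer]:
                if i2 != i1:
                    co_counts[i1][i2] += 1

    # Cosine similarity between the binary purchase vectors of I1 and I2.
    return {
        i1: {
            i2: n / sqrt(len(buyers[i1]) * len(buyers[i2]))
            for i2, n in row.items()
        }
        for i1, row in co_counts.items()
    }

similar = item_similarities(purchases)
print(round(similar["book"]["lamp"], 3))  # 0.667
```

Because the loop only visits items that were actually bought together, pairs with no common customer are never touched, which is what makes the offline pass affordable on a sparse catalog.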

The following is a simple implementation that shows the process of item-item recommendation.

In [75]:
import pandas as pd
import numpy as np


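# Each inner list holds one item's ratings; transposing gives one row per user.
# None marks a rating the user has not given.
rating_matrix = pd.DataFrame(
    data=np.array([[5, 4, 5, 3, 3, 2], [3, 3, 2, 5, 3, 3], [None, 4, 5, 3, 3, 2],
                   [2, None, 2, None, 2, 3], [2, 3, 1, 1, 4, 5], [2, 3, 1, 1, 5, 5]]).T,
    columns=['item1', 'item2', 'item3', 'item4', 'item5', 'item6'],
    index=['user1', 'user2', 'user3', 'user4', 'user5', 'user6'])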

print(rating_matrix)

      item1 item2 item3 item4 item5 item6
user1     5     3  None     2     2     2
user2     4     3     4  None     3     3
user3     5     2     5     2     1     1
user4     3     5     3  None     1     1
user5     3     3     3     2     4     5
user6     2     3     2     3     5     5


In [76]:
pd.set_option('display.precision', 2)

In [77]:
# %%timeit -n 100
def get_adjusted_ratings(rating_matrix):
    """Subtract each user's mean rating (row mean) from that user's ratings."""
    return rating_matrix.sub(rating_matrix.mean(axis=1), axis=0)

get_adjusted_ratings(rating_matrix)

       item1  item2  item3  item4  item5  item6
user1   2.20   0.20    NaN  -0.80  -0.80  -0.80
user2   0.60  -0.40   0.60    NaN  -0.40  -0.40
user3   2.33  -0.67   2.33  -0.67  -1.67  -1.67
user4   0.40   2.40   0.40    NaN  -1.60  -1.60
user5  -0.33  -0.33  -0.33  -1.33   0.67   1.67
user6  -1.33  -0.33  -1.33  -0.33   1.67   1.67
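
For reference, the similarity computed from these mean-adjusted ratings can be written out as the usual adjusted cosine similarity. The symbols are notation assumed here for illustration, not names from the code: $r_{u,i}$ is user $u$'s rating of item $i$, $\bar{r}_u$ is that user's mean rating, and $U_{ij}$ is the set of users who rated both items $i$ and $j$.

$$
\mathrm{sim}(i, j) =
\frac{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_u)\,(r_{u,j} - \bar{r}_u)}
     {\sqrt{\sum_{u \in U_{ij}} (r_{u,i} - \bar{r}_u)^2}\;\sqrt{\sum_{u \in U_{ij}} (r_{u,j} - \bar{r}_u)^2}}
$$

The terms $(r_{u,i} - \bar{r}_u)$ are exactly the entries of the adjusted matrix above, and restricting the sums to $U_{ij}$ corresponds to skipping the missing entries.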


In [145]:
from math import sqrt

def adjusted_cosine_similarity(rating1, rating2):
    """Cosine similarity of two mean-adjusted item columns (pandas Series)."""
    tot_mul_sum = 0
    r1_sqr = 0
    r2_sqr = 0

    for i in range(len(rating1)):
        # Skip users who have not rated both items.
        if pd.isnull(rating1.iloc[i]) or pd.isnull(rating2.iloc[i]):
            continue

        # Intermediate values are rounded to 2 decimals, so results can differ
        # slightly from an unrounded computation.
        i1 = round(rating1.iloc[i], 2)
        i2 = round(rating2.iloc[i], 2)
        tot_mul_sum += i1 * i2
        r1_sqr += i1 ** 2
        r2_sqr += i2 ** 2

    return tot_mul_sum / (round(sqrt(r1_sqr), 2) * round(sqrt(r2_sqr), 2))
    

In [147]:
# %%timeit -n 100
def get_similarity_matrix(rating_matrix):
    """Return the item-item adjusted cosine similarity matrix."""
    adjusted_ratings = get_adjusted_ratings(rating_matrix)
    n_items = adjusted_ratings.shape[1]  # items are the columns, not the rows
    similarities = np.ones((n_items, n_items))

    for i in range(n_items):
        for j in range(n_items):
            similarities[i, j] = adjusted_cosine_similarity(
                adjusted_ratings.iloc[:, i], adjusted_ratings.iloc[:, j])

    return pd.DataFrame(similarities,
                        index=rating_matrix.columns,
                        columns=rating_matrix.columns)

get_similarity_matrix(rating_matrix)

       item1  item2  item3  item4  item5  item6
item1   1.00   0.02   1.00  -0.41  -0.82  -0.76
item2   0.02   1.00  -0.04   0.58  -0.44  -0.43
item3   1.00  -0.04   1.00  -0.17  -0.87  -0.81
item4  -0.41   0.58  -0.17   1.00   0.07  -0.20
item5  -0.82  -0.44  -0.87   0.07   1.00   0.96
item6  -0.76  -0.43  -0.81  -0.20   0.96   1.00
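
With the similarities in hand, a missing rating can be estimated from the items the user has already rated. The sketch below is one common way to do it, assumed here rather than taken from the notebook: a similarity-weighted average over the `k` most similar items with positive similarity. The function name `predict_rating` and the parameter `k` are hypothetical; the numbers are copied from the tables above (user4's ratings and the item4 similarity row).

```python
def predict_rating(user_ratings, item_similarities, k=2):
    """Similarity-weighted average over the k most similar items the user rated."""
    # Keep only items the user has rated and that are positively similar.
    neighbours = [
        (sim, user_ratings[item])
        for item, sim in item_similarities.items()
        if sim > 0 and user_ratings.get(item) is not None
    ]
    # Use the k most similar of those.
    neighbours = sorted(neighbours, reverse=True)[:k]
    if not neighbours:
        return None  # no usable neighbours, so no prediction
    weighted_sum = sum(sim * rating for sim, rating in neighbours)
    return weighted_sum / sum(sim for sim, _ in neighbours)

# user4 has not rated item4; predict it from the item4 similarity row.
user4 = {"item1": 3, "item2": 5, "item3": 3, "item4": None, "item5": 1, "item6": 1}
item4_sims = {"item1": -0.41, "item2": 0.58, "item3": -0.17, "item5": 0.07, "item6": -0.2}

print(round(predict_rating(user4, item4_sims), 2))  # 4.57
```

Only item2 and item5 are positively similar to item4, and item2 dominates the weights, so the prediction lands close to user4's rating of 5 for item2.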


In [127]:
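# Hand check of the item1-item2 numerator (sum of products of adjusted ratings).
# Note: the last two terms use -0.3 where the adjusted-ratings table gives -0.33.
(2.2*0.2)+(0.6*-0.4)+(2.33*-0.67)+(0.4*2.4)+(-0.33*-0.3)+(-1.33*-0.3)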

0.09689999999999999