**CSC 466: Knowledge Discovery in Data**

**Individual Test**

**Thursday, December 2, 2021**

**Section 03: Task 1**

**Your Name : put your name here!**

**Cal Poly Email: put your email here**


**Your Assignment**:

For this assignment, you will implement a study of two different Collaborative Filtering methods.

You will implement User-based Adjusted Weighted Sum Recommendations with Cosine similarity and Pearson Correlation similarity as options. Your goal is to check which similarity score yields better predictions on our dataset.

You need to complete four tasks:

1. Implement the cosine similarity function

2. Implement the Pearson correlation similarity function

3. Implement the User-based Adjusted Weighted Sum Recommendation computation (that uses a similarity metric as a parameter)

4. Develop the full study for comparing MAE (Mean Absolute Error) values for each of the methods.

Look for cells with **Your Task: Step X** text for specific instructions for each step.


In [1]:
## Imports

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn
%matplotlib inline

**Data**

We use a toy dataset recording the scores that 10 different people gave to 10 different items. For the sake of simplicity, this dataset is **dense**, i.e., it has no missing values.

The scores are in the range [-5, 5], where -5 is the most negative score, 5 is the most positive, and 0 is neutral. All scores in the dataset are integers, so we are working with an 11-point scale.


In [2]:
## the data for this assignment

## DO NOT MODIFY THIS VARIABLE!!!!

data = np.array([[-4, -2, 3, 4, 0, -1, 0, -5, 4, 3],
                 [-3, -2, 4, 3, -1, 0, 0, -3, 2, 4],
                 [-5, -3, 3, 5,  1, 0, -1,-1, 1, 3],
                 [ 0,  1, 4, 5, 2, 1,  2, 0,  4, 4],
                 [0,   2, 2, 4,  2, 0, 3, -5, -2,0],
                 [1,   4, 4, 3,  0, 4, 4, 2,  2, -1],
                 [2,   3, 5, 5, -1, 3, 4, 2,  3,  0],
                 [5,   5, 5, 5, -5, 5, 5, 0,  0, -5],
                 [4,   3, 4, 4, -2, 3, 0, -1, -1, -4],
                 [3,  -3, 5, 5, -5, 5, 0, 0,  -5, 0]    
])

### to add some color and give you the ability to produce meaningful output, here is the list of users 
### user in row 1 is "Alice", in row 2 is "Bob" and so on

users = ["Alice", "Bob", "Christie", "David", "Emily", "Frank", "Gina", "Ignacio", "Jenny", "Mark"]

### we use fruit as items here. Column 1 is "Apples", column 2 is "Bananas", etc...

items = ["Apples", "Bananas", "Cherries", "Dragonfruit", "Figs", "Grapes", "Kiwis", "Pears", "Raspberries", "Starfruit"]


## the toyData array may be useful for debugging purposes. Feel free to modify it as you see fit

toyData  = np.array([[5,5,0,-5],
                     [5,5,0,-5],
                     [0,0,5,0]       
])


**Your Task: Step 1: Implement Cosine similarity function**

Implement the function

    cosSimilarity(v,u,i)
  
where
    
        v,u are vectors of the same length
        i is a number in the range 0,..,len(v)-1

This function outputs the cosine similarity between vectors **v** and **u**, after coordinate **i** is excluded from both vectors (please note this last modification!)

If you need it, you can assume that both **v** and **u** are NumPy arrays.

Recall the formula for cosine similarity:

$$cos(x,y) = \frac{x\cdot y}{|x|_2 |y|_2} = \frac{\sum_{i=1}^d x_iy_i}{\sqrt{\sum_{i=1}^d x_i^2}\sqrt{\sum_{i=1}^d y_i^2}}$$


In [3]:
def cosSimilarity(v,u,i):
    ## exclude coordinate i from both vectors
    v=np.delete(v,i)
    u=np.delete(u,i)
    ## dot product divided by the product of the Euclidean norms
    return np.dot(u,v)/(np.linalg.norm(u)*np.linalg.norm(v))

In [4]:
## debug and test cosSimilarity here
test = np.array([1,2,3,4])
# test2 = np.array([1,2,3,4])

test2 = np.array([-1, -2, -3, -4])
cosSimilarity(test, test2, 2)


-1.0
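
As a further sanity check, the sketch below runs the same computation on the rows of `toyData`, where the expected answers can be worked out by hand: identical rows should give a similarity of about 1, and rows with no overlapping nonzero coordinates should give 0. The helper name `cos_sim_excluding` and the choice to return 0.0 for an all-zero vector are assumptions made for this illustration; they are not part of the assignment.

```python
import numpy as np

def cos_sim_excluding(v, u, i):
    """Cosine similarity of v and u after dropping coordinate i (hypothetical helper)."""
    v = np.delete(np.asarray(v, dtype=float), i)
    u = np.delete(np.asarray(u, dtype=float), i)
    denom = np.linalg.norm(v) * np.linalg.norm(u)
    # Assumption: report 0.0 when either vector is all zeros, rather than dividing by zero.
    return float(np.dot(v, u) / denom) if denom > 0 else 0.0

# Same values as toyData in the data cell.
toy = np.array([[5, 5, 0, -5],
                [5, 5, 0, -5],
                [0, 0, 5, 0]])

same = cos_sim_excluding(toy[0], toy[1], 3)        # identical rows
orthogonal = cos_sim_excluding(toy[0], toy[2], 3)  # no shared nonzero coordinates
degenerate = cos_sim_excluding(toy[0], toy[2], 2)  # third row becomes all zeros
print(same, orthogonal, degenerate)
```

The third case is worth noting: once coordinate 2 is removed, the last row of `toyData` is the zero vector, so the plain formula would divide by zero.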

**Your Task: Step 2: Implement Pearson Correlation function**

Implement the function

    pearsonSimilarity(v,u,i)
  
where
    
        v,u are vectors of the same length
        i is a number in the range 0,..,len(v)-1

This function outputs the Pearson Correlation coefficient between vectors **v** and **u**, after coordinate **i** is excluded from both vectors (please note this last modification!)

If you need it, you can assume that both **v** and **u** are NumPy arrays.

Recall the formula for the Pearson Correlation Coefficient:

$$pearson(x,y) = \frac{(x-\bar{x})\cdot (y-\bar{y})}{|x-\bar{x}|_2 |y-\bar{y}|_2} = 
                 \frac{\sum_{i=1}^d (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^d (x_i-\bar{x})^2}\sqrt{\sum_{i=1}^d (y_i-\bar{y})^2}}
$$

Here, 
 $$\bar{x} = \frac{1}{d}\sum_{i=1}^{d}x_i$$
 and
 $$\bar{y} = \frac{1}{d}\sum_{i=1}^{d}y_i$$
i.e., $\bar{x}$ and $\bar{y}$ are the mean coordinate values of vectors $x$ and $y$, respectively.


In [5]:
def pearsonSimilarity(v,u,i):
    ## exclude coordinate i from both vectors
    v=np.delete(v,i)
    u=np.delete(u,i)
    ## mean of each vector, used to center it
    vbar = np.mean(v)
    ubar = np.mean(u)
    ## compute Pearson Correlation Coefficient
    num = np.sum((u-ubar)*(v-vbar))
    return num/(np.linalg.norm(u-ubar)*np.linalg.norm(v-vbar))


In [6]:
## debug and test pearsonSimilarity() here
test1=test
pearsonSimilarity(test, test1, 2)

1.0000000000000002
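
One way to gain more confidence in a Pearson implementation is to compare it against NumPy's `np.corrcoef` on the same vectors with the same coordinate removed. The sketch below does this for the first two rows of the dataset (Alice and Bob). The helper name `pearson_excluding` and its zero-variance fallback are assumptions made for this illustration.

```python
import numpy as np

def pearson_excluding(v, u, i):
    """Pearson correlation of v and u after dropping coordinate i (hypothetical helper)."""
    v = np.delete(np.asarray(v, dtype=float), i)
    u = np.delete(np.asarray(u, dtype=float), i)
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    vc, uc = v - v.mean(), u - u.mean()
    denom = np.linalg.norm(vc) * np.linalg.norm(uc)
    # Assumption: report 0.0 when either vector is constant (zero variance).
    return float(np.dot(vc, uc) / denom) if denom > 0 else 0.0

# First two rows of the dataset (Alice and Bob).
alice = np.array([-4, -2, 3, 4, 0, -1, 0, -5, 4, 3])
bob = np.array([-3, -2, 4, 3, -1, 0, 0, -3, 2, 4])

ours = pearson_excluding(alice, bob, 0)
reference = np.corrcoef(np.delete(alice, 0), np.delete(bob, 0))[0, 1]
print(ours, reference)
```

The two printed values should agree up to floating-point rounding.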

**Your Task: Step 3: Implement Adjusted Weighted User Based Recommendations**
        
Implement the function

     awUserBasedRecommendation(data, user, item, simFunction)
    
where  

    data is the dataset
    (user, item) is the user, item pair for which the prediction needs to be generated
    simFunction is the name of the function to be used for similarity computation
    
We have two options for the similarity function: *cosSimilarity* and *pearsonSimilarity*

**You can look up the specific formulas in our Collaborative Filtering handout.**

The function shall return a single value: the predicted rating.
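
The handout is not reproduced in this notebook. For orientation only, a commonly cited form of the user-based adjusted weighted sum is shown below; the handout may define the normalizer $k$ differently, so treat this as an assumption rather than the required formula. Here $r_{u,i}$ is the rating user $u$ gave item $i$, $\bar{r}_u$ is the mean rating of user $u$, and the sums run over all users $u'$ other than $u$.

$$\hat{r}_{u,i} = \bar{r}_u + k \sum_{u' \neq u} sim(u,u')\,(r_{u',i} - \bar{r}_{u'}), \qquad k = \frac{1}{\sum_{u' \neq u} |sim(u,u')|}$$

Because item $i$ is the rating being predicted, the similarity $sim(u,u')$ is computed with coordinate $i$ excluded, which is what the third argument of the similarity functions provides.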
    
    

In [7]:
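## returns adjusted weighted, user based recommendation prediction for (user, item) pair
def awUserBasedRecommendation(data, user, item, simFunction):
    ## ratings of all other users, and the target user's own ratings
    restData = np.delete(data,user,axis=0)
    udata= data[user]
    ## similarity of each other user to the target user, with the target item excluded
    sims = np.apply_along_axis(lambda x: simFunction(x, udata, item), 1, restData)
    ## normalizer: note this is the raw sum of similarities, not the sum of absolute values
    k = 1/np.sum(sims)
    ## each other user's rating of the item, relative to that user's mean rating
    residuals = restData[:,item] - np.mean(restData,axis=1)
    ubar = np.mean(udata)
    return ubar + k*np.sum(sims*residuals)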
        

In [8]:
## test and debug awUserBasedRecommendation
awUserBasedRecommendation(data, 0, 0, cosSimilarity)

-2.215989257987782
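
The cell above normalizes by the raw sum of the similarities. Since both cosine and Pearson similarities can be negative, that sum can come close to zero and inflate the prediction. A minimal sketch of a variant that normalizes by the sum of absolute similarities is shown below. It assumes that this is the intended normalizer, which should be checked against the handout; the names `cosine_excluding` and `aw_user_based_abs` are hypothetical.

```python
import numpy as np

def cosine_excluding(v, u, i):
    """Cosine similarity after dropping coordinate i (hypothetical helper)."""
    v = np.delete(v, i).astype(float)
    u = np.delete(u, i).astype(float)
    denom = np.linalg.norm(v) * np.linalg.norm(u)
    return np.dot(v, u) / denom if denom > 0 else 0.0

def aw_user_based_abs(data, user, item, sim):
    """Adjusted weighted sum with k = 1 / sum(|sim|) (assumed normalizer)."""
    data = np.asarray(data, dtype=float)
    others = np.delete(data, user, axis=0)   # every user except the target
    target = data[user]
    sims = np.array([sim(row, target, item) for row in others])
    norm = np.sum(np.abs(sims))
    if norm == 0:
        # Assumption: with no usable neighbors, fall back to the user's mean rating.
        return target.mean()
    # Means are taken over all ratings in the row, as in the cell above.
    residuals = others[:, item] - others.mean(axis=1)
    return target.mean() + np.sum(sims * residuals) / norm

# Same values as toyData: rows 0 and 1 are identical, row 2 is unrelated.
toy = np.array([[5, 5, 0, -5],
                [5, 5, 0, -5],
                [0, 0, 5, 0]])
pred = aw_user_based_abs(toy, 0, 3, cosine_excluding)
print(pred)
```

On `toyData`, the only neighbor with nonzero similarity to row 0 is its identical twin, so the prediction for item 3 should land on that twin's rating of about -5.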

**Your Task: Step 4: Conduct the Comparative Study**

In this study, perform the following tasks:

1. For each pair (user, item) in the dataset, compute both the cosine-similarity based recommendation and the Pearson correlation based recommendation

2. Compute the errors for each recommendation

3. For each pair (user, item) output the following information:

    user, item, score, Cosine-predicted score, Error of cosine-predicted score, Pearson-predicted score, Error of Pearson-predicted score

4. Compute the MAE (mean absolute error) for each of the two methods. Report both MAEs (one per line, with some text labeling) after you print out all individual predictions.
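
As a reminder of how MAE is computed, here is a minimal sketch on four ratings. The `actual` values are Alice's first four scores from the dataset; the `predicted` values are made up for illustration and are not the output of either method.

```python
import numpy as np

actual = np.array([-4, -2, 3, 4])             # Alice's first four scores
predicted = np.array([-2.0, -1.5, 2.5, 3.0])  # hypothetical predictions

errors = actual - predicted                   # signed error for each prediction
mae = np.mean(np.abs(errors))                 # mean of the absolute errors
print(f"MAE: {mae}")
```

The signed errors are useful for the per-pair report, while the absolute values are what the MAE averages, so positive and negative errors do not cancel.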


In [9]:
## predict every single rating, output, compute errors, compute MAE for each method.
pearson = []
cosine = []
for i in range(len(data)):
    for j in range(len(data[0])):
        cospred=awUserBasedRecommendation(data, i, j, cosSimilarity)
        ppred=awUserBasedRecommendation(data, i, j, pearsonSimilarity)
        coserr = data[i,j] - cospred
        perr = data[i,j] - ppred
        print(f"user: {i}, item: {j}, cosine Pred: {cospred}, cosine err: {coserr}, pearson Pred: {ppred}, pearson err: {perr}")
        pearson.append(perr)
        cosine.append(coserr)
print(f"cosine MAE: {np.mean(np.abs(cosine))}")
print(f"pearson MAE: {np.mean(np.abs(pearson))}")

user: 0, item: 0, cosine Pred: -2.215989257987782, cosine err: -1.7840107420122182, pearson Pred: -2.5793018717678904, pearson err: -1.4206981282321096
user: 0, item: 1, cosine Pred: -1.2763039824754252, cosine err: -0.7236960175245748, pearson Pred: -1.4540535484107864, pearson err: -0.5459464515892136
user: 0, item: 2, cosine Pred: 2.656161235439824, cosine err: 0.3438387645601759, pearson Pred: 2.540596773588467, pearson err: 0.4594032264115331
user: 0, item: 3, cosine Pred: 3.777878739529769, cosine err: 0.22212126047023117, pearson Pred: 3.5176592347264903, pearson err: 0.48234076527350966
user: 0, item: 4, cosine Pred: 0.12355077636590898, cosine err: -0.12355077636590898, pearson Pred: 0.28840096225564504, pearson err: -0.28840096225564504
user: 0, item: 5, cosine Pred: -0.2642884108908648, cosine err: -0.7357115891091353, pearson Pred: -0.46002853661361404, pearson err: -0.5399714633863859
user: 0, item: 6, cosine Pred: 0.058799356661828756, cosine err: -0.058799356661828756, p

**Congratulations!** You are done.

Download the notebook (right-click on the file name and select "Download" in the pop-up menu) and submit it using the

        handin dekhtyar 466-test01 <file> 
        
 command.