_The main focus of this assignment is building recommendation systems from both a theoretical and a practical perspective._

## Problem 1: Implementing Recommendation Systems

The goal of this task is to predict recommendation scores for products from user reviews. The data has users as rows and products as columns, where P refers to a product and U refers to a user. Each cell holds that user's review score (recommendation) for that product.

|   | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | 
|----|----|----|----|----|----|----|----|----|----|-----| 
| U1 | 3  | 7  | 4  | 9  | 9  | 7  | 6  | 7  | 8  | 8   | 
| U2 | 7  | 5  | 5  | 3  | 8  | 8  | 7  | 4  | 9  | 5   | 
| U3 | 7  | 5  | 5  | 0  | 8  | 4  | 8  | 6  | 7  | 9   | 
| U4 | 5  | 6  | 8  | 5  | 9  | 8  | 5  | 7  | 10 | 7   | 
| U5 | 5  | 8  | 8  | 8  | 10 | 9  | 7  | 4  | 9  | 8   | 
| U6 | 7  | 7  | 8  | 4  | 7  | 8  | 6  | 7  | 7  | 8   | 


Consider the following test set of users. The missing values are the products that the corresponding users have not bought. Given this dataset, determine which products U7, U8 and U9 should buy. Show the recommendation scores for the top 3 products.

|   | P1  | P2  | P3  | P4  | P5  | P6  | P7  | P8  | P9  | P10 | 
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----| 
| U7 |  ?  | 6   | 9   |   ? |   ? | 6   |   ? | 9   |   ? |   ? | 
| U8 | 7   |   ? | 9   |   ? | 4   |  ?  | 9   |  ?  | 7   |   ? | 
| U9 |   ? | 6   |   ? | 9   |   ? | 7   |   ? | 8   |   ? | 4   | 
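For readers who do not have the CSV files at hand, the two tables above can be typed in directly. This is a minimal sketch: the names `train`, `test`, and `products` are chosen here for illustration, and the `?` cells are encoded as `NaN`.

```python
import numpy as np
import pandas as pd

products = [f"P{i}" for i in range(1, 11)]

# Training users U1..U6, copied from the first table.
train = pd.DataFrame(
    [
        [3, 7, 4, 9, 9, 7, 6, 7, 8, 8],
        [7, 5, 5, 3, 8, 8, 7, 4, 9, 5],
        [7, 5, 5, 0, 8, 4, 8, 6, 7, 9],
        [5, 6, 8, 5, 9, 8, 5, 7, 10, 7],
        [5, 8, 8, 8, 10, 9, 7, 4, 9, 8],
        [7, 7, 8, 4, 7, 8, 6, 7, 7, 8],
    ],
    index=[f"U{i}" for i in range(1, 7)],
    columns=products,
)

# Test users U7..U9; NaN marks a product the user has not bought.
nan = np.nan
test = pd.DataFrame(
    [
        [nan, 6, 9, nan, nan, 6, nan, 9, nan, nan],
        [7, nan, 9, nan, 4, nan, 9, nan, 7, nan],
        [nan, 6, nan, 9, nan, 7, nan, 8, nan, 4],
    ],
    index=["U7", "U8", "U9"],
    columns=products,
)

# Number of products each test user still has to be scored on.
missing_per_user = test.isna().sum(axis=1)
```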


In [50]:
# Load the Relevant libraries
import sklearn as sk
import pandas as pd
import numpy as np

train = pd.read_csv("ratingsDataTrain.csv",index_col=[0])
test = pd.read_csv("ratingsDataTest.csv",index_col=[0])
test = test.rename(index=dict(zip(['U1','U2','U3'],['U7','U8','U9'])))
og_test = test.copy()

In [51]:
nearest_neighbor = {}
# for each user in our test set, find their nearest neighbor
for u in test.index:
    # append the user to the training set
    # (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    j = pd.concat([train, test.loc[[u]]])
    # find the user-by-user Pearson correlation matrix; pandas skips missing
    # values pairwise, so u is compared only on the products they have rated
    corr_mat = j.T.corr()
    # drop the user's perfect correlation with themselves, then keep the
    # most correlated training user as the nearest neighbor
    nearest_neighbor[u] = corr_mat.loc[u].drop(u).idxmax()
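
To make the correlation step concrete, here is a hand-rolled check for U7 alone. It is a sketch under the assumption that similarity is the Pearson correlation over the products U7 has actually rated (P2, P3, P6, P8); the helper `pearson` and the dictionary names are illustrative, not part of the notebook.

```python
import numpy as np

# U7's scores on the four products they have rated: P2, P3, P6, P8.
u7 = np.array([6, 9, 6, 9], dtype=float)

# Training users' scores on those same four products, copied from the table.
train_scores = {
    "U1": [7, 4, 7, 7],
    "U2": [5, 5, 8, 4],
    "U3": [5, 5, 4, 6],
    "U4": [6, 8, 8, 7],
    "U5": [8, 8, 9, 4],
    "U6": [7, 8, 8, 7],
}


def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    xd = x - x.mean()
    yd = y - y.mean()
    return float((xd @ yd) / np.sqrt((xd @ xd) * (yd @ yd)))


similarity = {
    user: pearson(u7, np.array(scores, dtype=float))
    for user, scores in train_scores.items()
}

# The nearest neighbor is the training user with the highest correlation.
best_match = max(similarity, key=similarity.get)
```

With only four co-rated products the correlations are coarse, so the choice of neighbor is sensitive to a single rating.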

In [52]:
test_users = ['U7','U8','U9']

# for each of our users in the test set, fill their gaps with the review from their most comparable user
# replace their missing data in the test set to fill out the test set
# returns a df with no missing data
for u in test_users:
    best_comp = nearest_neighbor.get(u)
    test.loc[u] = test.loc[u].combine_first(train.loc[best_comp])

In [53]:
test

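|    | P1  | P2  | P3  | P4  | P5  | P6  | P7  | P8  | P9  | P10 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| U7 | 7.0 | 6.0 | 9.0 | 0.0 | 8.0 | 6.0 | 8.0 | 9.0 | 7.0 | 9.0 |
| U8 | 7.0 | 7.0 | 9.0 | 4.0 | 4.0 | 8.0 | 9.0 | 7.0 | 7.0 | 8.0 |
| U9 | 3.0 | 6.0 | 4.0 | 9.0 | 9.0 | 7.0 | 6.0 | 8.0 | 8.0 | 4.0 |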


In [54]:
og_test

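|    | P1  | P2  | P3  | P4  | P5  | P6  | P7  | P8  | P9  | P10 |
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| U7 | NaN | 6.0 | 9.0 | NaN | NaN | 6.0 | NaN | 9.0 | NaN | NaN |
| U8 | 7.0 | NaN | 9.0 | NaN | 4.0 | NaN | 9.0 | NaN | 7.0 | NaN |
| U9 | NaN | 6.0 | NaN | 9.0 | NaN | 7.0 | NaN | 8.0 | NaN | 4.0 |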


#### The highest rated products that our test users have not bought yet:

| User | Suggested Purchase | Anticipated Review Score |
|----|:---:|:-:|
| U7 | P10 | 9 |
| U8 | P6 or P10 | 8 |
| U9 | P5 | 9 |
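
The task asks for the top 3 products per user, which can be read off the filled table by keeping only the cells that were missing in the original test set. The sketch below retypes both tables so that it runs on its own; the names `original`, `filled`, and `top3` are illustrative, and each score is simply the nearest neighbor's rating.

```python
import numpy as np
import pandas as pd

products = [f"P{i}" for i in range(1, 11)]
users = ["U7", "U8", "U9"]
nan = np.nan

# Test set before filling (NaN = not bought yet).
original = pd.DataFrame(
    [
        [nan, 6, 9, nan, nan, 6, nan, 9, nan, nan],
        [7, nan, 9, nan, 4, nan, 9, nan, 7, nan],
        [nan, 6, nan, 9, nan, 7, nan, 8, nan, 4],
    ],
    index=users,
    columns=products,
)

# Test set after filling the gaps from each user's nearest neighbor.
filled = pd.DataFrame(
    [
        [7, 6, 9, 0, 8, 6, 8, 9, 7, 9],
        [7, 7, 9, 4, 4, 8, 9, 7, 7, 8],
        [3, 6, 4, 9, 9, 7, 6, 8, 8, 4],
    ],
    index=users,
    columns=products,
    dtype=float,
)

top3 = {}
for user in users:
    # Only products the user has not bought are candidates.
    not_bought = original.loc[user].isna()
    candidates = filled.loc[user][not_bought]
    # Keep the three highest predicted scores for this user.
    top3[user] = candidates.nlargest(3)
```

Ties (for example the two 8s for U7 and U8) are not broken in any principled way here; a weighted average over several neighbors would give finer-grained scores.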

## Problem 2: Social Recommendation Systems

Using the FilmTrust dataset, which contains historical likes (ratings) information and social network connections, create recommendation systems. Pick any three pairs of randomly selected users and 2 randomly selected movies, and make recommendations both with and without the social network information.
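
One simple way to contrast the two settings is sketched below. Everything in it is an assumption made for illustration: the toy rows, the column names (`user`, `item`, `rating`, `truster`, `trustee`), and the two hypothetical helpers. The actual FilmTrust files may use a different schema, and averaging is only one of many possible prediction rules.

```python
import pandas as pd

# Hypothetical toy data; not taken from the FilmTrust files.
ratings = pd.DataFrame(
    {
        "user": [1, 1, 2, 2, 3, 3, 4],
        "item": [10, 20, 10, 20, 10, 20, 10],
        "rating": [4.0, 2.0, 3.0, 4.0, 1.0, 3.5, 2.0],
    }
)
# Each row means "truster trusts trustee" (assumed edge direction).
trust = pd.DataFrame({"truster": [5, 5, 6], "trustee": [1, 2, 3]})


def predict_without_social(item):
    """Baseline: mean rating of the item over all users who rated it."""
    return ratings.loc[ratings["item"] == item, "rating"].mean()


def predict_with_social(user, item):
    """Mean rating of the item over the users that `user` trusts.

    Falls back to the non-social baseline when no trusted user
    has rated the item.
    """
    friends = trust.loc[trust["truster"] == user, "trustee"]
    mask = (ratings["item"] == item) & ratings["user"].isin(friends)
    friend_ratings = ratings.loc[mask, "rating"]
    if friend_ratings.empty:
        return predict_without_social(item)
    return friend_ratings.mean()


baseline_score = predict_without_social(10)  # same for every user
social_score_u5 = predict_with_social(5, 10)  # uses users 1 and 2 only
social_score_u6 = predict_with_social(6, 10)  # uses user 3 only
```

In this toy example the baseline gives every user the same score for a movie, while the social variant gives users 5 and 6 different scores because they trust different people.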


In [None]:
# Load the Relevant libraries
import sklearn as sk

# URL for the Social Network Data (UW Repository)
social_network_url = "FilmTrustSocialNetwork.csv"

# URL for the Ratings Data (UW Repository)
# (separate variable so the social network path is not overwritten)
ratings_url = "FilmTrustRatings.csv"