### Pair Problem

You are given a (user_movie_likes.csv) with two columns: UserID and MovieID. Each row lists a user and a movie they liked.

1) Write a function that will take a MovieID and will return three movies that are similar to it (based on the similiarities in user likes).

2) Write a function that will take a UserID and recommend three movies based on what they have liked.

This an open-ended problem. Come up with a simple metric and just code it up. Come see me if you have any Qs!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('user_movie_likes.csv', names=['UserID', 'MovieID'], dtype='int')

In [3]:
df.head()

Unnamed: 0,UserID,MovieID
0,412,512
1,458,770
2,185,37
3,137,701
4,190,870


In [4]:
# Create ratings matrix pivot table
ratings_matrix = df.pivot_table(index='UserID', columns='MovieID', aggfunc=len).fillna(0).astype(int)

In [5]:
ratings_matrix.head()

MovieID,2,5,6,9,10,12,13,15,16,17,...,987,988,989,990,992,993,994,995,996,997
UserID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,1,0,0,0,0,...,0,0,1,1,0,1,0,1,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
def recommend_movies(ratings_matrix, selected_movie, num_rec=3):
    
    # Remove selected movie from movies list
    movie_list = ratings_matrix.columns[ratings_matrix.columns != selected_movie]
    # Compare selected movie to all other movies
    difference = []
    for movie in movie_list:
        # Calculate simple ratings difference metric (pseudo levenshtein distance approach)
        difference.append(np.sum(np.abs(ratings_matrix.loc[:, selected_movie] - ratings_matrix.loc[:, movie])))
    # Reverse sort and selct desired number of similar movies
    idx = np.argsort(difference)[::-1][:num_rec]

    return list(movie_list[idx])

In [7]:
list(recommend_movies(ratings_matrix, 2))

[674, 769, 346]

In [8]:
for movie in df.MovieID[:10]:
    print('Movie: %3d,' % movie, 'Most Similar Movies:', recommend_movies(ratings_matrix, movie, 3))

Movie: 512, Most Similar Movies: [346, 674, 769]
Movie: 770, Most Similar Movies: [769, 674, 346]
Movie:  37, Most Similar Movies: [769, 674, 346]
Movie: 701, Most Similar Movies: [674, 346, 769]
Movie: 870, Most Similar Movies: [346, 769, 674]
Movie: 324, Most Similar Movies: [346, 674, 769]
Movie: 409, Most Similar Movies: [769, 674, 346]
Movie: 432, Most Similar Movies: [769, 674, 346]
Movie:  80, Most Similar Movies: [674, 346, 769]
Movie: 652, Most Similar Movies: [674, 346, 769]
