<a href="https://colab.research.google.com/github/cyprianoeduardo/MAB760_Computational_Linear_Algebra/blob/master/Recomendador_de_filmes_usando_SVD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Movie recommender using SVD

Importing the libraries:

In [0]:
import numpy as np
import pandas as pd

Loading the `ratings.dat` and `movies.dat` files from the [MovieLens 1M dataset](https://grouplens.org/datasets/movielens/1m/):

In [0]:
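# Fields are separated by '::'; a multi-character separator needs the Python engine.
data = pd.read_csv('/content/ratings.dat',
    names=['user_id', 'movie_id', 'rating', 'time'],
    engine='python', sep='::')
# Some titles in movies.dat contain Latin-1 characters; without an explicit
# encoding, recent pandas versions may raise a UnicodeDecodeError.
movie_data = pd.read_csv('/content/movies.dat',
    names=['movie_id', 'title', 'genre'],
    engine='python', sep='::', encoding='latin-1')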
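
The `::` layout can be tried out without the dataset files. The sketch below parses two rating lines (the first two rows shown by `data.head()`) from an in-memory buffer; the names `sample` and `toy` are illustrative only.

```python
import io
import pandas as pd

# Two lines in the "::"-separated layout of ratings.dat.
sample = io.StringIO("1::1193::5::978300760\n1::661::3::978302109\n")

# A separator longer than one character is interpreted as a regular
# expression, which only the (slower) Python parsing engine supports.
toy = pd.read_csv(sample, sep='::', engine='python',
                  names=['user_id', 'movie_id', 'rating', 'time'])
print(toy)
```

Passing `names=` also tells pandas that the file has no header row, which is the case for the MovieLens `.dat` files.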

In [46]:
data.head()

| | user_id | movie_id | rating | time |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


In [41]:
movie_data.head()

| | movie_id | title | genre |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


Building the ratings matrix (f × u), with movies as rows and users as columns:

In [0]:
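# np.zeros, not np.ndarray: np.ndarray returns uninitialized memory,
# so unrated cells would hold arbitrary values instead of 0.
ratings_mat = np.zeros(
    shape=(np.max(data.movie_id.values), np.max(data.user_id.values)),
    dtype=np.uint8)
# Ids start at 1, so rating i is stored at (movie_id[i] - 1, user_id[i] - 1).
ratings_mat[data.movie_id.values-1, data.user_id.values-1] = data.rating.values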
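
The indexed assignment that fills the matrix is easier to follow on a tiny case. The sketch below uses three made-up ratings (the arrays `movie_ids`, `user_ids`, and `ratings` are hypothetical stand-ins for the DataFrame columns) and a zero-initialized matrix.

```python
import numpy as np

# Hypothetical toy data: three ratings as parallel arrays, ids starting at 1.
movie_ids = np.array([1, 3, 2])
user_ids = np.array([1, 1, 2])
ratings = np.array([5, 3, 4])

# One row per movie id, one column per user id; 0 means "not rated".
toy_mat = np.zeros((movie_ids.max(), user_ids.max()), dtype=np.uint8)

# Integer-array indexing pairs the i-th row index with the i-th column index,
# so ratings[i] is written at position (movie_ids[i] - 1, user_ids[i] - 1).
toy_mat[movie_ids - 1, user_ids - 1] = ratings
print(toy_mat)
# [[5 0]
#  [0 4]
#  [3 0]]
```

The assignment is a vectorized form of a loop over the rating rows, which matters with about one million ratings.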

In [51]:
print(ratings_mat)
print(data.movie_id.values)
print(data.user_id.values)
print(data.rating.values)

[[  5   0 111 ... 127   0   3]
 [112 189 111 ... 127   0   0]
 [ 88  36  72 ... 127   0   0]
 ...
 [ 48  72  42 ... 127   0   0]
 [ 88  36  72 ... 127   0   0]
 [200 228  43 ... 127   0   0]]
[1193  661  914 ...  562 1096 1097]
[   1    1    1 ... 6040 6040 6040]
[5 3 3 ... 5 4 4]

The values above 5 in the printed matrix (111, 189, 200, ...) are not ratings: this output was recorded with the matrix allocated through `np.ndarray`, which leaves memory uninitialized. With a zero-initialized matrix (`np.zeros`), every unrated cell prints as 0.


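Normalizing the matrix by subtracting each movie's mean rating (the row mean) from its row:

In [0]:
# keepdims=True keeps the means as a column vector so they broadcast across each row.
normalised_mat = ratings_mat - np.mean(ratings_mat, axis=1, keepdims=True)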
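
A small example makes the centering step concrete. This is a sketch on a made-up 2 × 3 matrix (`toy_mat` is hypothetical). As in the notebook, the zeros that stand for missing ratings are included in each mean.

```python
import numpy as np

# Hypothetical ratings: 2 movies (rows) x 3 users (columns), 0 = not rated.
toy_mat = np.array([[5, 0, 4],
                    [1, 2, 3]], dtype=np.uint8)

# Mean of each row, kept as a (2, 1) column so it broadcasts over the columns.
row_means = toy_mat.mean(axis=1, keepdims=True)

# uint8 minus float64 is promoted to float64, so negative values are safe.
toy_norm = toy_mat - row_means
print(toy_norm)
# [[ 2. -3.  1.]
#  [-1.  0.  1.]]
```

After this step every row sums to zero, which the SVD-based analysis relies on.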

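Computing the SVD:

In [0]:
# Rows of A are users and columns are movies (the transpose of the ratings matrix).
A = normalised_mat.T / np.sqrt(ratings_mat.shape[0] - 1)
# np.linalg.svd returns the right singular vectors already transposed: V holds V^T.
U, S, V = np.linalg.svd(A)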
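
The sketch below checks, on a small random matrix, two properties of `np.linalg.svd` that the rest of the notebook depends on. The matrix `X` is a hypothetical stand-in for `A`, and `full_matrices=False` (the reduced SVD) is an assumption made here to keep the factors small; the notebook itself uses the default full SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # hypothetical stand-in for A: 6 users x 4 movies

# Reduced SVD: U is (6, 4), S is (4,), Vh is (4, 4).
# The third return value is V already transposed (rows are singular vectors).
U, S, Vh = np.linalg.svd(X, full_matrices=False)

# The three factors reproduce X.
reconstruction_ok = np.allclose(X, U @ np.diag(S) @ Vh)

# Squared singular values equal the eigenvalues of X^T X
# (eigvalsh returns them in ascending order, hence the reversal).
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]
eigen_ok = np.allclose(eigvals, S**2)

print(reconstruction_ok, eigen_ok)
```

Because the third factor comes back transposed, taking `V.T[:, :k]` as done later selects the first `k` right singular vectors as columns, giving one `k`-dimensional row per movie.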

Computing the cosine similarity between movies, sorting from most to least similar, and returning the indices of the `top_n` closest ones:

In [0]:
def top_cosine_similarity(data, movie_id, top_n=10):
    index = movie_id - 1  # Movie ids start from 1
    movie_row = data[index, :]
    # Euclidean norm of every row
    magnitude = np.sqrt(np.einsum('ij, ij -> i', data, data))
    similarity = np.dot(movie_row, data.T) / (magnitude[index] * magnitude)
    sort_indexes = np.argsort(-similarity)  # descending similarity
    return sort_indexes[:top_n]

# Helper function to print the top N similar movies
def print_similar_movies(movie_data, movie_id, top_indexes):
    print('Recommendations for {0}: \n'.format(
        movie_data[movie_data.movie_id == movie_id].title.values[0]))
    for similar_id in top_indexes + 1:  # back to 1-based movie ids
        print(movie_data[movie_data.movie_id == similar_id].title.values[0])
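
To see what the ranking does, the sketch below applies the same cosine-similarity formula to four made-up 2-D vectors (`emb` is hypothetical, not taken from the dataset), using `np.linalg.norm` for the row norms.

```python
import numpy as np

# Hypothetical embeddings, one item per row.
emb = np.array([[1.0, 0.0],    # query item
                [1.0, 1.0],    # 45 degrees from the query
                [0.0, 1.0],    # orthogonal to the query
                [-1.0, 0.0]])  # opposite direction

norms = np.linalg.norm(emb, axis=1)

# Cosine similarity of every row with row 0: dot product over product of norms.
sims = emb @ emb[0] / (norms * norms[0])

# Negating before argsort orders the indices from most to least similar.
order = np.argsort(-sims)
print(order)  # [0 1 2 3]
```

The query item always has similarity 1 with itself, which is why the queried movie appears first in its own recommendation list.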

Choosing the `k` principal components used to represent the movies, a `movie_id` to find recommendations for, and the number `top_n` of results to print:

In [61]:
k = 50
movie_id = 1 # Grab an id from movies.dat
top_n = 10

sliced = V.T[:, :k] # representative data
indexes = top_cosine_similarity(sliced, movie_id, top_n)
print_similar_movies(movie_data, movie_id, indexes)

Recommendations for Toy Story (1995): 

Toy Story (1995)
Aladdin (1992)
Willy Wonka and the Chocolate Factory (1971)
Toy Story 2 (1999)
Lion King, The (1994)
Babe (1995)
Princess Bride, The (1987)
Sword in the Stone, The (1963)
Bug's Life, A (1998)
Nightmare Before Christmas, The (1993)
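
The notebook fixes `k = 50` without justifying it. One common heuristic, shown here only as a sketch and not as the method used above, is to pick the smallest `k` whose squared singular values account for a chosen share of the total. The values in `S_toy` and the 90% threshold are assumptions for illustration.

```python
import numpy as np

# Hypothetical singular values, sorted in descending order as svd returns them.
S_toy = np.array([10.0, 5.0, 2.0, 1.0])

# Cumulative share of the total squared singular values.
explained = np.cumsum(S_toy**2) / np.sum(S_toy**2)

# Smallest k whose cumulative share reaches 90%.
k_90 = int(np.searchsorted(explained, 0.90) + 1)
print(k_90)  # 2
```

With the real data, the same computation could be run on the `S` returned by `np.linalg.svd(A)`.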


# References

1. QUEK, Alyssa. Simple Movie Recommender Using SVD | Alyssa. Available at: <https://alyssaq.github.io/2015/20150426-simple-movie-recommender-using-svd/>. Accessed: 8 May 2019.