# Movie Recommendation System

To build a recommender that suggests movies to users based on their past ratings and on the similarity between users or items, follow the steps below. This walkthrough implements the user-based variant and uses the `u.data`, `u.item`, and `u.user` files from the MovieLens 100k dataset.

## Step 1: Load the Data
First, load the data from the `u.data`, `u.item`, and `u.user` files:

In [21]:
import pandas as pd

# Load the data (u.data is tab-separated; u.item and u.user are pipe-separated)
ratings = pd.read_csv('u.data', sep='\t', names=['user_id', 'item_id', 'rating', 'timestamp'])
movies = pd.read_csv('u.item', sep='|', names=['item_id', 'title'], usecols=[0, 1], encoding='latin-1')
users = pd.read_csv('u.user', sep='|', names=['user_id', 'age', 'gender', 'occupation', 'zip_code'])

# Show the first rows of each DataFrame
ratings.head(), movies.head(), users.head()

(   user_id  item_id  rating  timestamp
 0      196      242       3  881250949
 1      186      302       3  891717742
 2       22      377       1  878887116
 3      244       51       2  880606923
 4      166      346       1  886397596,
    item_id              title
 0        1   Toy Story (1995)
 1        2   GoldenEye (1995)
 2        3  Four Rooms (1995)
 3        4  Get Shorty (1995)
 4        5     Copycat (1995),
    user_id  age gender  occupation zip_code
 0        1   24      M  technician    85711
 1        2   53      F       other    94043
 2        3   23      M      writer    32067
 3        4   24      M  technician    43537
 4        5   33      F       other    15213)
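
Before modelling, it can help to check how sparse the ratings are, since sparsity affects every later step. The sketch below is not part of the original notebook: `rating_density` is a hypothetical helper, and the toy DataFrame stands in for `u.data` with made-up values.

```python
import pandas as pd

def rating_density(ratings: pd.DataFrame) -> float:
    """Fraction of (user, item) pairs that actually have a rating."""
    n_users = ratings['user_id'].nunique()
    n_items = ratings['item_id'].nunique()
    return len(ratings) / (n_users * n_items)

# Toy stand-in for u.data (hypothetical values): 3 users, 3 items, 4 ratings
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'item_id': [10, 20, 10, 30],
    'rating':  [5, 3, 4, 2],
})
density = rating_density(toy_ratings)
print(f'Density: {density:.3f}')
```

MovieLens 100k contains 100,000 ratings from 943 users on 1,682 movies, so the same calculation on the full `ratings` DataFrame should give roughly 6.3%.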

## Step 2: Split the Data
Split the data into training and test sets:

In [22]:
from sklearn.model_selection import train_test_split

# Split the data: 80% for training, 20% for testing
train_data, test_data = train_test_split(ratings, test_size=0.2, random_state=42)

# Build the user-item matrices; (user, item) pairs without a rating are filled with 0
train_matrix = train_data.pivot(index='user_id', columns='item_id', values='rating').fillna(0)
test_matrix = test_data.pivot(index='user_id', columns='item_id', values='rating').fillna(0)
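
To make the pivot step concrete, here is a minimal sketch on a toy DataFrame (the values are made up, not taken from MovieLens). It shows that the 0 entries are placeholders for "not rated" rather than real ratings, so any count or average should mask them out.

```python
import pandas as pd

# Toy ratings in the same long format as u.data (hypothetical values)
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'item_id': [10, 20, 10, 30],
    'rating':  [5, 3, 4, 2],
})

# One row per user, one column per item; missing pairs become 0
toy_matrix = toy.pivot(index='user_id', columns='item_id', values='rating').fillna(0)
print(toy_matrix)

# Count only real ratings, not the zero placeholders
n_rated = int((toy_matrix > 0).sum().sum())
print(f'Rated cells: {n_rated} of {toy_matrix.size}')
```

Note that `pivot` raises a `ValueError` if the same (user_id, item_id) pair appears more than once; `pivot_table` with an aggregation function is the usual alternative in that case.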

## Step 3: Compute the Similarity
Compute the similarity between users:

In [23]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity between users (each row of train_matrix is one user)
user_similarity = cosine_similarity(train_matrix)
user_similarity_df = pd.DataFrame(user_similarity, index=train_matrix.index, columns=train_matrix.index)
user_similarity_df.head()

| user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 | 943 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.136196 | 0.030424 | 0.026203 | 0.284613 | 0.331412 | 0.319056 | 0.274139 | 0.083486 | 0.281396 | ... | 0.277459 | 0.084849 | 0.205849 | 0.144161 | 0.133679 | 0.092367 | 0.216948 | 0.084181 | 0.104599 | 0.329288 |
| 2 | 0.136196 | 1.0 | 0.114644 | 0.16822 | 0.093128 | 0.162165 | 0.095848 | 0.09136 | 0.149476 | 0.125701 | ... | 0.149359 | 0.268977 | 0.320095 | 0.323347 | 0.241012 | 0.152655 | 0.230951 | 0.117484 | 0.166632 | 0.096719 |
| 3 | 0.030424 | 0.114644 | 1.0 | 0.346894 | 0.0 | 0.085071 | 0.032829 | 0.053875 | 0.060177 | 0.052552 | ... | 0.021713 | 0.017707 | 0.154299 | 0.049358 | 0.107604 | 0.019022 | 0.101207 | 0.021959 | 0.127179 | 0.013805 |
| 4 | 0.026203 | 0.16822 | 0.346894 | 1.0 | 0.011848 | 0.051287 | 0.075209 | 0.1421 | 0.060465 | 0.035202 | ... | 0.034908 | 0.04448 | 0.087428 | 0.118082 | 0.100612 | 0.0 | 0.151086 | 0.110324 | 0.112342 | 0.032367 |
| 5 | 0.284613 | 0.093128 | 0.0 | 0.011848 | 1.0 | 0.168527 | 0.298438 | 0.18529 | 0.039737 | 0.166013 | ... | 0.276012 | 0.103529 | 0.085547 | 0.072429 | 0.104445 | 0.049198 | 0.204472 | 0.148028 | 0.099978 | 0.247527 |
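
For intuition about what each cell of this matrix means, the following sketch computes the cosine similarity of two rating vectors by hand with NumPy. The vectors are made-up toy values, not rows from `train_matrix`.

```python
import numpy as np

# Two toy rating vectors over the same four items; 0 means "not rated"
u = np.array([5.0, 3.0, 0.0, 1.0])
v = np.array([4.0, 0.0, 0.0, 1.0])

# cos(u, v) = (u . v) / (||u|| * ||v||)
cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(round(cosine, 4))
```

Because ratings are non-negative, every similarity lies between 0 and 1, and a user compared with themselves scores exactly 1 (the diagonal above). The zero placeholders still enter the vector norms, so two users with few co-rated movies tend to get a low score even when they agree on the movies they share.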


## Step 4: Predict the Ratings
Create a function that predicts the rating a user would give to an item:

In [24]:
def predict_rating(user_id, item_id, train_matrix, user_similarity_df):
    # Similarity between this user and every user in the training matrix
    user_similarities = user_similarity_df.loc[user_id]

    # Ratings that all users gave to this item (0 means "not rated")
    item_ratings = train_matrix[item_id]

    # Predicted rating: similarity-weighted average over the users who rated the item.
    # Unrated entries are 0, so they add nothing to the numerator.
    weighted_sum = (user_similarities * item_ratings).sum()
    similarity_sum = user_similarities[item_ratings > 0].sum()

    if similarity_sum == 0:
        return 0  # Avoid division by zero when no similar user rated the item

    predicted_rating = weighted_sum / similarity_sum
    return predicted_rating

# Keep only test rows whose user_id and item_id both appear in train_matrix;
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
test_data = test_data[test_data['user_id'].isin(train_matrix.index) & test_data['item_id'].isin(train_matrix.columns)].copy()

# Predict the ratings for the test set
test_data['predicted_rating'] = test_data.apply(lambda row: predict_rating(row['user_id'], row['item_id'], train_matrix, user_similarity_df), axis=1)
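
The weighted average inside `predict_rating` can be checked by hand on a tiny case. The sketch below uses made-up similarities and ratings for three hypothetical neighbors labelled `a`, `b`, and `c`; it mirrors the arithmetic of the function rather than calling it.

```python
import pandas as pd

# Similarity of the target user to three neighbors (hypothetical values)
similarities = pd.Series({'a': 0.9, 'b': 0.5, 'c': 0.2})
# Their ratings for one item; 0 means neighbor 'b' has not rated it
item_ratings = pd.Series({'a': 4, 'b': 0, 'c': 2})

rated = item_ratings > 0
weighted_sum = (similarities * item_ratings).sum()   # 0.9*4 + 0.5*0 + 0.2*2
similarity_sum = similarities[rated].sum()           # 0.9 + 0.2 (only actual raters)
prediction = weighted_sum / similarity_sum
print(round(prediction, 3))
```

Since the weights are non-negative, the prediction always falls between the lowest and highest rating given by the neighbors who rated the item. In particular, when only one neighbor with non-zero similarity has rated it, the prediction equals that neighbor's rating, however weak the similarity.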

## Step 5: Evaluate the Model
Compute the RMSE (root mean squared error) on the test set to evaluate the model:

In [25]:
import numpy as np
from sklearn.metrics import mean_squared_error

# Compute the RMSE between actual and predicted ratings
rmse = np.sqrt(mean_squared_error(test_data['rating'], test_data['predicted_rating']))
print(f'RMSE: {rmse}')

RMSE: 1.0145531777915653
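
An RMSE figure is easier to judge next to a trivial baseline. As a sketch on made-up numbers (not the notebook's data), the block below computes RMSE directly from its definition and compares it with a predictor that always outputs the mean rating. For brevity the toy baseline uses the mean of `actual`; in a real evaluation that mean should come from the training set.

```python
import numpy as np

# Hypothetical actual and predicted ratings for four test rows
actual = np.array([4, 3, 5, 2])
predicted = np.array([3.5, 3.0, 4.0, 3.0])

# RMSE = sqrt(mean((actual - predicted)^2))
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# Baseline: always predict the mean rating
baseline = np.full(len(actual), actual.mean())
baseline_rmse = np.sqrt(np.mean((actual - baseline) ** 2))
print(f'model: {rmse:.3f}, baseline: {baseline_rmse:.3f}')
```

If the model's RMSE is not clearly below the baseline's, the similarity weighting is adding little over a constant guess.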


## Step 6: Recommend Movies
Create a function that recommends movies:

In [27]:
def recomendar(user_id, train_matrix, user_similarity_df, movies, n_recommendations=5):
    # All items present in the training matrix
    all_items = train_matrix.columns

    # Predict this user's rating for every item
    predictions = [predict_rating(user_id, item_id, train_matrix, user_similarity_df) for item_id in all_items]

    # Collect the predictions in a DataFrame
    predictions_df = pd.DataFrame({'item_id': all_items, 'predicted_rating': predictions})

    # Sort by predicted rating, highest first
    predictions_df = predictions_df.sort_values(by='predicted_rating', ascending=False)

    # Keep the top N recommended items
    top_recommendations = predictions_df.head(n_recommendations)

    # Attach the movie titles
    top_recommendations = top_recommendations.merge(movies, on='item_id')

    return top_recommendations

# Example recommendation for a specific user
user_id = 1
recommendations = recomendar(user_id, train_matrix, user_similarity_df, movies)
print(f'Recommendations for user {user_id}:')
print(recommendations)

Recommendations for user 1:
   item_id  predicted_rating                                 title
0     1189               5.0                    Prefontaine (1997)
1     1500               5.0             Santa with Muscles (1996)
2     1612               5.0               Leading Man, The (1996)
3     1467               5.0  Saint of Fort Washington, The (1993)
4      850               5.0           Perfect Candidate, A (1996)
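
All five predictions above are exactly 5.0, which is what the weighted average produces when every neighbor who rated a movie gave it 5, and that is most likely to happen for movies with very few raters. `recomendar` also scores movies the user has already rated. The sketch below shows one possible variant that skips already-rated items and requires a minimum number of raters. It is not the notebook's method: `recommend_unseen` and `min_raters` are hypothetical names, and the toy matrix holds made-up values.

```python
import numpy as np
import pandas as pd

def recommend_unseen(user_id, train_matrix, user_similarity_df, min_raters=2, n=5):
    """Hypothetical variant: skip items the user rated and items with few raters."""
    sims = user_similarity_df.loc[user_id].drop(user_id)   # exclude the user themself
    others = train_matrix.drop(index=user_id)
    rated_mask = others > 0
    # Similarity-weighted sum of neighbor ratings, and sum of weights over actual raters
    numer = others.mul(sims, axis=0).sum(axis=0)
    denom = rated_mask.astype(float).mul(sims, axis=0).sum(axis=0)
    preds = numer / denom.where(denom > 0)                  # NaN when no usable neighbor
    # Keep items with enough raters that the user has not rated yet
    enough = rated_mask.sum(axis=0) >= min_raters
    unseen = train_matrix.loc[user_id] == 0
    return preds[enough & unseen].dropna().sort_values(ascending=False).head(n)

# Toy user-item matrix (hypothetical values); item 40 has a single 5-star rating
toy_matrix = pd.DataFrame(
    [[5, 0, 0, 0], [4, 5, 3, 0], [5, 4, 0, 5]],
    index=pd.Index([1, 2, 3], name='user_id'),
    columns=pd.Index([10, 20, 30, 40], name='item_id'),
    dtype=float,
)
# Cosine similarity between users, computed with NumPy
unit = toy_matrix.div(np.linalg.norm(toy_matrix.to_numpy(), axis=1), axis=0)
toy_sim = unit @ unit.T

recs = recommend_unseen(1, toy_matrix, toy_sim)
print(recs)
```

In the toy data, item 40 would rank first without the `min_raters` filter because its only rating is a 5; with the filter, only item 20, rated by two neighbors, is recommended. The threshold is a judgment call and would need tuning on the real data.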


## Conclusion
This is a basic example of building a recommender based on user similarity with the MovieLens 100k dataset. You can improve the model by tuning its parameters (for example, restricting the prediction to the most similar users), trying other similarity measures, and incorporating more information from the datasets, such as the user attributes in `u.user`.
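
As one illustration of a different similarity measure, the sketch below computes a mean-centered cosine, a Pearson-style measure that subtracts each user's average rating before comparing users. This is a swapped-in technique, not something the notebook uses: `centered_cosine` is a hypothetical helper and the toy matrix holds made-up values.

```python
import numpy as np
import pandas as pd

def centered_cosine(matrix: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity after subtracting each user's mean over rated items."""
    rated = matrix > 0
    user_means = matrix.where(rated).mean(axis=1)             # mean of real ratings only
    centered = matrix.where(rated).sub(user_means, axis=0).fillna(0)
    norms = np.sqrt((centered ** 2).sum(axis=1))
    # Users with zero norm (no ratings, or all ratings equal) get similarity 0
    unit = centered.div(norms.where(norms > 0), axis=0).fillna(0)
    return unit @ unit.T

# Toy matrix: users 1 and 2 have similar tastes, user 3 the opposite (hypothetical values)
toy_matrix = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 0, 2], [1, 0, 5, 4]],
    index=pd.Index([1, 2, 3], name='user_id'),
    columns=pd.Index([10, 20, 30, 40], name='item_id'),
    dtype=float,
)
sim = centered_cosine(toy_matrix)
print(sim.round(3))
```

Unlike plain cosine on raw ratings, this measure can be negative for users with opposing tastes. The denominator in `predict_rating` assumes non-negative weights, so it would need adjusting (for instance, summing absolute similarities or dropping negative ones) before plugging this matrix in.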