# <h1 align="center"><font color="red">Simple Example of Recommender System in Python</font></h1>

<font color="yellow">Data Scientist: Dr. Eddy Giusepe Chirinos Isidro</font>

# <font color="green">Background</font>

<font color="orange">The [MovieLens](https://grouplens.org/datasets/movielens/100k/) datasets were collected by the `GroupLens Research Project` at the University of Minnesota.

This dataset consists of:

* 100,000 ratings (1-5) from 943 users on 1,682 movies.

* Each user has rated at least 20 movies.

* Simple demographic information for the users (age, gender, occupation, ZIP code).

The data was collected through the MovieLens website (`movielens.umn.edu`) over a seven-month period, from September 19, 1997 to April 22, 1998. The data has been cleaned: users who had fewer than 20 ratings or lacked complete demographic information were removed from the dataset.</font>
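
The figures above imply that the user-by-movie rating grid is mostly empty, which matters later when missing ratings are filled with zeros. A quick back-of-the-envelope check, using only the counts quoted in the dataset description (the variable names are illustrative):

```python
# Counts quoted in the dataset description.
n_ratings = 100_000
n_users = 943
n_movies = 1_682

# Fraction of the user x movie grid that actually holds a rating.
density = n_ratings / (n_users * n_movies)

# Average number of ratings contributed by each user.
avg_ratings_per_user = n_ratings / n_users

print(f"Density of the rating matrix: {density:.2%}")
print(f"Average ratings per user: {avg_ratings_per_user:.1f}")
```

Only about 6% of all user-movie pairs carry a rating, so roughly 94% of the matrix built later consists of filled-in zeros.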

# <font color="green">Loading the Libraries and the Dataset</font>

In [1]:
# Import the libraries:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier


# Load the user rating data (tab-separated, no header row):
ratings = pd.read_csv('./ml-100k/u.data', sep='\t', names=['UserID', 'MovieID', 'Rating', 'Timestamp'])
ratings.head()

|   | UserID | MovieID | Rating | Timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |
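
The `Timestamp` column holds Unix time, that is, seconds since 1970-01-01 UTC. As a minimal sketch, the two rows below (copied from the preview above into a stand-alone frame named `sample`, a name chosen here for illustration) can be converted to readable dates with `pd.to_datetime`:

```python
import pandas as pd

# Two rows copied from the ratings preview.
sample = pd.DataFrame({
    "UserID": [196, 186],
    "MovieID": [242, 302],
    "Rating": [3, 3],
    "Timestamp": [881250949, 891717742],
})

# Interpret the integers as seconds since 1970-01-01 UTC.
sample["RatedAt"] = pd.to_datetime(sample["Timestamp"], unit="s")

print(sample[["UserID", "MovieID", "RatedAt"]])
```

Both dates should fall inside the collection window stated in the dataset description, which is a cheap sanity check on the load step.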


In [2]:
ratings.shape

(100000, 4)

In [3]:
# Load the movie information (pipe-separated; the last 19 columns are genre flags):
movies_cols = ['MovieID', 'Title', 'ReleaseDate', 'VideoReleaseDate', 'IMDbURL', 'Unknown', 'Action', 'Adventure', 'Animation', 'Childrens',
               'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'FilmNoir', 'Horror', 'Musical', 'Mystery', 'Romance', 'SciFi', 'Thriller', 'War', 'Western']

movies = pd.read_csv('./ml-100k/u.item', sep='|', names = movies_cols, encoding='latin-1')
movies.head()

|   | MovieID | Title | ReleaseDate | VideoReleaseDate | IMDbURL | Unknown | Action | Adventure | Animation | Childrens | ... | Fantasy | FilmNoir | Horror | Musical | Mystery | Romance | SciFi | Thriller | War | Western |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Toy%20Story%2... | 0 | 0 | 0 | 1 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?GoldenEye%20(... | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Four%20Rooms%... | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Get%20Shorty%... | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 |  | http://us.imdb.com/M/title-exact?Copycat%20(1995) | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |


In [4]:
movies.shape

(1682, 24)

In [5]:
# Load the user information:
users = pd.read_csv('./ml-100k/u.user', sep='|', names = ['UserID', 'Age', 'Gender', 'Occupation', 'ZipCode'])
users.head()

|   | UserID | Age | Gender | Occupation | ZipCode |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |


In [6]:
users.shape

(943, 5)

In [7]:
# Data preparation:
# Reshape to one row per user and one column per movie; user-movie pairs without a rating become 0.
user_ratings = ratings.pivot(index='UserID', columns='MovieID', values='Rating').fillna(0)
user_ratings

| UserID (rows) / MovieID (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 1673 | 1674 | 1675 | 1676 | 1677 | 1678 | 1679 | 1680 | 1681 | 1682 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.0 | 3.0 | 4.0 | 3.0 | 3.0 | 5.0 | 4.0 | 1.0 | 5.0 | 3.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 939 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 940 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 4.0 | 5.0 | 3.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 941 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 942 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
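
To see what the pivot and `fillna(0)` step does, here is a small sketch on made-up data (the frame `toy` and its values are hypothetical, not taken from MovieLens). It shows that a user-movie pair with no rating is `NaN` after the pivot and becomes `0.0` after filling, so a zero in the matrix means "not rated" rather than "rated zero":

```python
import pandas as pd

# Hypothetical toy ratings: three users, three movies, some pairs missing.
toy = pd.DataFrame({
    "UserID":  [1, 1, 2, 3, 3],
    "MovieID": [10, 20, 10, 20, 30],
    "Rating":  [5, 3, 4, 2, 1],
})

# Long format -> wide format: one row per user, one column per movie.
wide = toy.pivot(index="UserID", columns="MovieID", values="Rating")
filled = wide.fillna(0)

print(wide)    # unrated pairs appear as NaN
print(filled)  # the same pairs appear as 0.0

# Share of user-movie pairs that carry no rating.
missing_share = wide.isna().to_numpy().mean()
print(f"Unrated share: {missing_share:.0%}")
```

Note that `pivot` raises an error if the same user-movie pair appears twice; in that case `pivot_table` with an aggregation function would be one alternative.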


# <font color="green">Instantiating the Supervised Learning Algorithm</font>

In [8]:
# Build the model:
# Each user's ID serves as its own class label, so the classifier acts here only as a nearest-neighbour index over users.
knn_model = KNeighborsClassifier(n_neighbors=5, metric='cosine', n_jobs=8)  # Cosine distance as the dissimilarity measure

# Fit the model:
knn_model.fit(user_ratings.values, user_ratings.index)
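
In scikit-learn, `metric='cosine'` means cosine distance, which is one minus the cosine similarity of the two vectors. The following worked example uses two made-up rating vectors (`u`, `v`, and the helper `cosine_distance` are hypothetical names introduced for illustration) to show how the value is obtained:

```python
import numpy as np

# Hypothetical rating vectors over four movies (0 = not rated).
u = np.array([5.0, 3.0, 0.0, 1.0])
v = np.array([4.0, 0.0, 0.0, 1.0])


def cosine_distance(a, b):
    """Return 1 - (a . b) / (|a| * |b|)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Users with similar taste give a small distance.
print(f"distance(u, v)  = {cosine_distance(u, v):.4f}")

# Scaling a vector does not change its direction, so the distance stays near zero.
print(f"distance(u, 2u) = {cosine_distance(u, 2 * u):.4f}")
```

Because the measure depends only on direction, a user who rates everything one point higher than another is still considered fairly close, which is one reason cosine distance is a common choice for rating vectors.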

# <font color="green">Recommendation for a User by `ID`</font>

In [9]:
# Find the nearest neighbours of a specific user:
target_user_id = 1  # ID of the target user

user_ratings_target = user_ratings.loc[target_user_id].values.reshape(1, -1)  # Target user's ratings as a 1 x n_movies row
# Row positions (in user_ratings) of the 10 most similar users; the target user is normally among them, at distance 0.
recommended_movies_indices = knn_model.kneighbors(user_ratings_target, n_neighbors=10, return_distance=False)
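
The array returned by `kneighbors` contains row positions in the fitted matrix, which here correspond to users. As a minimal sketch on made-up data (`toy_ratings`, `nn`, and `neighbour_ids` are hypothetical names, and `NearestNeighbors` is swapped in for the classifier because no labels are needed for a neighbour query), the positions can be translated back to user IDs like this:

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy matrix: rows are users (index = UserID), columns are movies.
toy_ratings = pd.DataFrame(
    [[5, 4, 0, 0],
     [4, 5, 0, 1],
     [0, 0, 5, 4],
     [5, 5, 1, 0]],
    index=pd.Index([101, 102, 103, 104], name="UserID"),
    columns=pd.Index([1, 2, 3, 4], name="MovieID"),
    dtype=float,
)

nn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(toy_ratings.values)

target_id = 101
query = toy_ratings.loc[[target_id]].values  # shape (1, n_movies)
distances, positions = nn.kneighbors(query)

# Positions index the rows of the fitted matrix, so map them to UserIDs and
# drop the target user, who normally appears as their own nearest neighbour.
neighbour_ids = [int(uid) for uid in toy_ratings.index[positions[0]] if uid != target_id]
print(neighbour_ids)
```

The same mapping, `user_ratings.index[positions]`, would apply to the full rating matrix.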


# <font color="green">Printing the Recommendations for That User</font>

In [12]:
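# Print the recommendations:
print("Top 10 recommendations for user", target_user_id, ":")
print("")
# Caution: kneighbors returns row positions of the most similar users in
# user_ratings, not movie indices. The loop below reads each position as a
# column position, so the titles it prints do not reflect what the
# neighbouring users rated.
for movie_index in recommended_movies_indices[0]:
    movie_id = user_ratings.columns[movie_index]
    movie_title = movies[movies['MovieID'] == movie_id]['Title'].values[0]
    print(movie_title)

Top 10 recommendations for user 1 :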

Toy Story (1995)
Lost in Space (1998)
My Fellow Americans (1996)
Chasing Amy (1997)
True Romance (1993)
Butch Cassidy and the Sundance Kid (1969)
Free Willy 3: The Rescue (1997)
Threesome (1994)
Day the Earth Stood Still, The (1951)
Ulee's Gold (1997)
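
Since the nearest-neighbour query identifies similar users rather than movies, one common way to obtain movie recommendations is to score each movie the target user has not rated by the average rating its neighbours gave it. The sketch below illustrates that idea on made-up data; `toy_ratings`, `toy_titles`, and `neighbour_ids` are hypothetical, and the neighbour list is assumed to come from a prior nearest-neighbour query. It is one possible approach, not necessarily the method the notebook intended.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are users, columns are movies, 0 = not rated.
toy_ratings = pd.DataFrame(
    [[5, 4, 0, 0, 0],
     [4, 5, 0, 1, 5],
     [5, 5, 1, 0, 4]],
    index=pd.Index([101, 102, 104], name="UserID"),
    columns=pd.Index([1, 2, 3, 4, 5], name="MovieID"),
    dtype=float,
)
toy_titles = pd.Series({1: "Movie A", 2: "Movie B", 3: "Movie C", 4: "Movie D", 5: "Movie E"})

target_id = 101
neighbour_ids = [104, 102]  # assumed result of a nearest-neighbour query

# Treat 0 as "no rating" so it does not drag the averages down.
neighbour_ratings = toy_ratings.loc[neighbour_ids].replace(0, np.nan)
mean_scores = neighbour_ratings.mean(axis=0)

# Keep only movies the target user has not rated, best score first.
unseen_mask = (toy_ratings.loc[target_id] == 0).to_numpy()
unseen = toy_ratings.columns[unseen_mask]
ranking = mean_scores.loc[unseen].dropna().sort_values(ascending=False)

for movie_id, score in ranking.head(10).items():
    print(f"{toy_titles.loc[movie_id]}: {score:.2f}")
```

On the real data, `toy_ratings` would be replaced by `user_ratings`, and titles would be looked up in the `movies` frame by `MovieID`. Weighting each neighbour's rating by similarity is a natural refinement.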
