<picture>
  <!--Image for the dark theme-->
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Banners/Dark_Titulo5_Modelagem_KNN.png?raw=true">
  
  <!--Image for the light theme-->
  <source media="(prefers-color-scheme: light)" srcset="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Banners/Titulo5.5_Modelagem_KNN.png?raw=true">

  <!--Default image (used when neither the dark nor the light theme is detected)-->
  <img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Banners/Titulo5.5_Modelagem_KNN.png?raw=true">
</picture>

# **Introduction**
KNN (k-nearest neighbors) is a very simple classification algorithm. It memorizes a training set and predicts “the class of a new data point based on the class of the neighbors nearest to that point, assuming that nearby points are more likely to belong to the same category” (FONTANA, 2020).
<br><br>
This means that, unlike many classification algorithms, KNN does not build an explicit model during training. It simply stores the training data verbatim.
<br><br>
When a new test point is presented, KNN measures how close it is to the training points, selects the k nearest neighbors, and assigns a label to the new point based on them.
<br><br>
To make this prediction, it computes the distance between pairs of points. KNN can use several distance measures, such as the Euclidean and Manhattan distances.
<br><br>
The algorithm does not fit any parameters during training, but choosing k wisely is very important. A very small k lets noisy data or outliers influence the labeling, while a very large k makes the algorithm ignore small but important patterns. This problem is known as the **bias-variance tradeoff** (LANTZ, 2019).
<br><br>
In the figure below, we want to find the category of the gray point. With k=1, the algorithm classifies it as green. With k=3, we get a more suitable classification, because the point is labeled orange.
<br>
[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_Introducao_v1.png?raw=true">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_Introducao_v1.png?raw=true)<br>
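The effect of k can be reproduced on a toy data set. The sketch below is only an illustration using the Python standard library: the coordinates, the `training` list, and the `knn_classify` helper are hypothetical and were chosen so that the nearest point is green while two of the three nearest points are orange.

```python
from collections import Counter
from math import dist

# Hypothetical training points: (feature_1, feature_2) -> label
training = [
    ((1.5, 1.5), "green"),
    ((3.0, 3.0), "orange"),
    ((3.5, 2.5), "orange"),
    ((6.0, 6.0), "green"),
]

def knn_classify(point, k):
    """Return the most common label among the k training points closest to `point`."""
    by_distance = sorted(training, key=lambda item: dist(point, item[0]))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

gray_point = (2.0, 2.0)
label_k1 = knn_classify(gray_point, k=1)  # only the single closest point votes
label_k3 = knn_classify(gray_point, k=3)  # the three closest points vote

print(label_k1, label_k3)
```

With these made-up points, `k=1` follows the single closest neighbor, while `k=3` follows the majority of the three closest neighbors.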

# **Example**
In KNN, a new point is classified according to the mode (the most frequent class) of its “K” nearest neighbors. <br>
In the following example, a user watched movie “x” (the new data point), and we want to decide whether to recommend movie “A” or movie “B”. The independent variables are the number of ratings each movie received and the user's rating, on a scale from 0 to 5.
<br>

[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_1.3.png?raw=true" width="67%">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_1.3.png?raw=true)<br>
In **fig(1)**, we set “K” to 3, so we consider the three points closest to “x”. Movie “A” is the most frequent among them (it is the mode). Therefore, the recommendation for someone who watched movie “x” is to watch movie “A”.
<br>

In **fig(2)**, “K” is 5. Movie “B” now becomes the mode, so the recommendation changes: someone who watched movie “x” is recommended movie “B”.

To find the “K” nearest neighbors, we need to measure the distance between points. There are several ways to do this, and one of them is the Euclidean distance. It works as follows:
- First, we draw a right triangle and compute the difference between the two points along each of the n dimensions.
- Next, we square these differences and add them up.
- Finally, we take the square root of this sum.

The result is the Euclidean distance, which is simply the hypotenuse of the right triangle we drew. <br>
In the following example, we compute the Euclidean distance between the points for movie “A” and movie “x”. There are **two dimensions**, because there are two independent variables.

<br>

[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_2.1.png?raw=true" width="60%">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_2.1.png?raw=true)

The Euclidean distance can also be computed when there are **more than two dimensions**, that is, more than two independent variables. The formula is:


[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_3.png?raw=true" width="55%">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/KNN_3.png?raw=true)
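The three steps (differences, squares and sum, square root) translate directly into code. The sketch below is a minimal illustration, not the code used in this project: the `euclidean_distance` helper and the coordinates are hypothetical, and the same function handles any number of dimensions.

```python
from math import dist, sqrt

def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Step 1 and 2: difference along each dimension, squared
    squared_differences = [(a - b) ** 2 for a, b in zip(p, q)]
    # Step 3: square root of the sum
    return sqrt(sum(squared_differences))

# Hypothetical 2-D points: (number of ratings received, user rating)
film_a = (4.0, 5.0)
film_x = (1.0, 1.0)
d_2d = euclidean_distance(film_a, film_x)  # legs 3 and 4 of a right triangle

# The same function with three independent variables
d_3d = euclidean_distance((1.0, 2.0, 3.0), (3.0, 5.0, 9.0))

# math.dist from the standard library computes the same quantity
assert abs(d_3d - dist((1.0, 2.0, 3.0), (3.0, 5.0, 9.0))) < 1e-12
```

In practice the variables usually have different scales (a count of ratings versus a 0 to 5 score), so the larger-scale variable tends to dominate the distance unless the data is rescaled first.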


<br><br>
**A process similar to the one described in this example is carried out in this stage.**

# **Choosing *K***
Although there is no specific formula or shortcut for determining K, some approaches suggest reasonable values. One of them is to set K to the square root of the number of items in the training set (LANTZ, 2019).
<br><br>
In this case, the number of items is 24,326 (movies), whose square root is approximately 155.97. The model trained below uses K = 155.

In [36]:
np.sqrt(24326)

np.float64(155.96794542469294)
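Because `n_neighbors` must be an integer, the square root has to be truncated or rounded. The sketch below, using only the standard library, shows both options for the 24,326 items mentioned above; the variable names are illustrative.

```python
import math

n_items = 24326  # number of movies (columns) reported in this notebook

k_float = math.sqrt(n_items)   # about 155.97
k_floor = math.isqrt(n_items)  # integer square root, rounds down
k_round = round(k_float)       # nearest integer

print(k_float, k_floor, k_round)
```

Rounding down gives 155, the value passed to `n_neighbors` later in this notebook, while rounding to the nearest integer would give 156. The heuristic is only a starting point, so either value is a reasonable first candidate.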

# **Import libraries**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.neighbors import NearestNeighbors

# KNeighborsClassifier and KNeighborsRegressor are the classification and
#  regression models based on the K-Nearest Neighbors (KNN) algorithm in scikit-learn.
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# confusion_matrix, accuracy_score, precision_score,
#   recall_score, f1_score: scikit-learn functions
#   for computing performance metrics.
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingClassifier
import time
from tqdm import tqdm
from sklearn.base import clone


In [2]:
# Used in Model 2
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

In [3]:
import joblib

# **Load files**

<br>

#### Tables for Model 1: movies

In [34]:
# Training data
knn_filmes_treino = pd.read_pickle("C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/3.Datasets_Transformação/3.3_Datasets_Transformação_parte_3/knn_filmes_treino.pickle", compression='gzip')

In [35]:
# Test data
knn_filmes_teste = pd.read_pickle("C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/3.Datasets_Transformação/3.3_Datasets_Transformação_parte_3/knn_filmes_teste.pickle", compression='gzip')

In [7]:
# Inspect the training table for Model 1: movies
knn_filmes_treino

title,(2019),"""Great Performances"" Cats (1998)",#Alive (2020),#Female Pleasure (2018),#Iamhere (2020),#UNFIT: The Psychology of Donald Trump (2019),$ (Dollars) (1971),$5 a Day (2008),$9.99 (2008),$ellebrity (Sellebrity) (2012),...,Üvegtigris (2001),Τέλειοι Ξένοι (2016),Χούλιγκανς: Κάτω τα χέρια απ' τα νιάτα! (1983),Делай - раз! (1989),Каменная башка (2008),Карусель (1970),Он вам не Димон (2017),Пес Барбос и необычный кросс (1961),Я худею (2018),…And the Fifth Horseman Is Fear (1965)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
49,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
119,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
134,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
330651,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330661,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330811,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330949,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
# Inspect the test table for Model 1: movies
knn_filmes_teste

title,#Alive (2020),$ (Dollars) (1971),'71 (2014),'83 (2021),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),"'burbs, The (1989)",'night Mother (1986),...,"tick, tick...BOOM! (2021)",xXx (2002),xXx: Return of Xander Cage (2017),xXx: State of the Union (2005),¡Three Amigos! (1986),¿Quién mató a Bambi? (2013),À nous la liberté (Freedom for Us) (1931),Ánimas (2018),Épouse-moi mon pote (2017),"Ужас, который всегда с тобой (2007)"
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
128,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
172,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
465,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
598,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
919,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
330236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330321,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# **Model 1**
To run Model 1, we used `KNeighborsClassifier()` and `KNeighborsRegressor()`. For `KNeighborsClassifier()`, we created a cutoff: ratings of 3.5 or lower were replaced by 0 (the user did not like the movie) and ratings above 3.5 were replaced by 1 (the user liked the movie). <br>

However, the model took a long time to run and the result was not satisfactory. We therefore focused on `KNeighborsRegressor()` and applied the cutoff after running the model.
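The cutoff can also be applied with a vectorised comparison instead of a per-cell function. The sketch below is an illustration on a hypothetical three-user table, not the notebook's own cell; note that in recent pandas releases `DataFrame.applymap` is deprecated in favour of `DataFrame.map`, which is one reason to prefer the comparison form.

```python
import pandas as pd

corte = 3.5  # cutoff used in this notebook

# Hypothetical user x movie ratings; 0.0 means "not rated"
ratings = pd.DataFrame(
    {"Movie A": [5.0, 3.5, 0.0], "Movie B": [2.0, 4.0, 4.5]},
    index=pd.Index([1, 2, 3], name="userId"),
)

# True where the rating is strictly above the cutoff, then cast to 0/1
liked = (ratings > corte).astype(int)
print(liked)
```

With this rule a rating equal to 3.5 maps to 0, and unrated movies (stored as 0.0) are indistinguishable from disliked ones.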

## Prepare the Data

### Cutoff Point

In [5]:
# Define a cutoff
#corte = 3.5  # cutoff value

# Convert to binary labels
#knn_filmes_treino1 = knn_filmes_treino.applymap(lambda x: 1 if x > corte else 0)



In [4]:
#knn_filmes_treino1

In [6]:
# Define a cutoff
#corte = 3.5  # cutoff value

# Convert to binary labels
#knn_filmes_teste1 = knn_filmes_teste.applymap(lambda x: 1 if x > corte else 0)



###  Training data: Separate Input (X) and Output (y) Variables
We use `reshape(-1, 1)` to change the shape of the array: it creates an array with 1 column and as many rows as needed, resulting in a 2D array of shape (n, 1). <br> Source: [Understanding the Differences Between Numpy Reshape(-1, 1) and Reshape(1, -1)](https://saturncloud.io/blog/understanding-the-differences-between-numpy-reshape1-1-and-reshape1-1/#2) 
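A short illustration of the two reshape forms on a hypothetical array of three user ids. The `-1` tells NumPy to infer that dimension from the array length.

```python
import numpy as np

user_ids = np.array([5, 15, 49])  # hypothetical 1-D array of user ids

as_column = user_ids.reshape(-1, 1)  # n rows, 1 column: one sample per row
as_row = user_ids.reshape(1, -1)     # 1 row, n columns: a single sample

print(as_column.shape, as_row.shape)
```

scikit-learn estimators expect `X` with shape `(n_samples, n_features)`, which is why the column form is the one used when a single variable is passed as input.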

In [None]:
# Using KNeighborsRegressor() ----------
# X_train holds the user ids (the index of the training DataFrame) converted to a 2D array.
X_train_filmes = knn_filmes_treino.index.values.reshape(-1, 1)
# y_train holds the ratings of the movies each user watched in the training DataFrame
y_train_filmes = knn_filmes_treino.values

# Using KNeighborsClassifier() ----------
# X_train holds the user ids (the index of the training DataFrame) converted to a 2D array.
#X_train_filmes1 = knn_filmes_treino1.index.values.reshape(-1, 1)
# y_train holds the ratings of the movies each user watched in the training DataFrame
#y_train_filmes1 = knn_filmes_treino1.values

In [10]:
X_train_filmes

array([[     5],
       [    15],
       [    49],
       ...,
       [330811],
       [330949],
       [330963]])

In [11]:
y_train_filmes 

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

###  Test data: Separate Input (X) and Output (y) Variables

In [29]:
# Using KNeighborsRegressor() ----------
# Separate the input (X) and output (y) variables for the test set
X_test_filmes = knn_filmes_teste.index.values.reshape(-1, 1)
y_test_filmes = knn_filmes_teste.values

# Using KNeighborsClassifier() ----------
# Separate the input (X) and output (y) variables for the test set
#X_test_filmes1 = knn_filmes_teste1.index.values.reshape(-1, 1)
#y_test_filmes1 = knn_filmes_teste1.values

In [13]:
X_test_filmes

array([[   128],
       [   172],
       [   465],
       ...,
       [330496],
       [330667],
       [330948]])

In [14]:
y_test_filmes

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

<br>

## Train Model 1
We tested different values of `n_neighbors` with `KNeighborsRegressor` and `KNeighborsClassifier`. 

In [30]:
# Using KNeighborsRegressor() ----------
# Train the KNN model on the training data (X_train and y_train)
modelo_filmes = KNeighborsRegressor(n_neighbors=155)
modelo_filmes.fit(X_train_filmes, y_train_filmes)


# Using KNeighborsClassifier()-------
#modelo_filmes1 = KNeighborsClassifier(n_neighbors=47)
#modelo_filmes1.fit(X_train_filmes1, y_train_filmes1)


Because `KNeighborsClassifier()` took a long time to finish, we save the trained model and its predictions.

In [10]:
# SAVE THE MODEL
# Using KNeighborsClassifier()-------

# Save the trained model (k=9)
#joblib.dump(modelo_filmes1,'C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_9/modelo_filmes1_Classificacao_k_9.joblib')

# Save the trained model (k=47)
#joblib.dump(modelo_filmes1,'C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_47/modelo_filmes1_Classificacao_k_47.joblib')

['C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_47/modelo_filmes1_Classificacao_k_47.joblib']

##  Predict Recommendations for the Test Set
We now make predictions for the test data (`X_test_filmes`) using the trained KNN model.

In [None]:
# Using KNeighborsRegressor() ----------
# Use the trained model to make predictions (y_pred) for the test data (X_test).
y_pred_filmes = modelo_filmes.predict(X_test_filmes)

# Using KNeighborsClassifier()-------
# Use the trained model to make predictions (y_pred) for the test data (X_test).
#y_pred_filmes1 = modelo_filmes1.predict(X_test_filmes1)
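To make the prediction step concrete: with its default uniform weights, `KNeighborsRegressor` predicts, for each output column, the mean of that column over the k nearest training rows. The sketch below reproduces this by hand with NumPy on hypothetical data; `X_train`, `y_train`, and `knn_regress` here are illustrative names, not the notebook's variables.

```python
import numpy as np

# Hypothetical training data: one input feature per user, two rating columns as targets
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([[5.0, 0.0], [4.0, 0.0], [0.0, 3.0], [0.0, 1.0]])

def knn_regress(x_query, k):
    """Average the target rows of the k training points closest to x_query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]  # positions of the k smallest distances
    return y_train[nearest].mean(axis=0)

prediction = knn_regress(np.array([2.2]), k=3)
print(prediction)
```

Since the prediction is an average, a movie gets a non-zero score for a test user as soon as at least one of that user's k neighbors rated it.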

In [12]:
# SAVE THE TABLE
# Using KNeighborsClassifier()-------

# Save the model predictions (k=9)
#joblib.dump(y_pred_filmes1, 'C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_9/ y_pred_filmes1_Classificacao_k_9.joblib')

# Save the model predictions (k=47)
#joblib.dump(y_pred_filmes1, 'C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_47/ y_pred_filmes1_Classificacao_k_47.joblib')

['C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_47/ y_pred_filmes1_Classificacao_k_47.joblib']

In [17]:
y_pred_filmes 

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

## Convert Predictions to a DataFrame
To make the recommendations easier to inspect, we turn the predictions into a DataFrame in which 0 means not recommended. 

In [18]:
# Check the dimensions (rows, columns)
knn_filmes_treino.shape

(7943, 24326)

In [19]:
# Check the dimensions (rows, columns)
y_pred_filmes.shape

(1986, 24326)

In [20]:
# Check the dimensions (rows, columns)
knn_filmes_teste.shape

(1986, 15496)

### Predictions as a DataFrame

In [21]:
# Convert the predictions (y_pred) into a DataFrame (pred_df)
#  with the same columns as the training DataFrame and the index of the test DataFrame.
#
# Using KNeighborsRegressor() ----------
# Show the movie recommendations for the users in the test set
df_pred_filmes = pd.DataFrame(y_pred_filmes, columns=knn_filmes_treino.columns, index=knn_filmes_teste.index)
print("Movie recommendations for the test set:")
df_pred_filmes 

# Using KNeighborsClassifier()-------
#df_pred_filmes1 = pd.DataFrame(y_pred_filmes1, columns=knn_filmes_treino1.columns, index=knn_filmes_teste1.index)
#print("Movie recommendations for the test set:")
#df_pred_filmes1



Movie recommendations for the test set:


title,(2019),"""Great Performances"" Cats (1998)",#Alive (2020),#Female Pleasure (2018),#Iamhere (2020),#UNFIT: The Psychology of Donald Trump (2019),$ (Dollars) (1971),$5 a Day (2008),$9.99 (2008),$ellebrity (Sellebrity) (2012),...,Üvegtigris (2001),Τέλειοι Ξένοι (2016),Χούλιγκανς: Κάτω τα χέρια απ' τα νιάτα! (1983),Делай - раз! (1989),Каменная башка (2008),Карусель (1970),Он вам не Димон (2017),Пес Барбос и необычный кросс (1961),Я худею (2018),…And the Fifth Horseman Is Fear (1965)
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
128,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
172,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
465,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
598,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
919,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
330236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330321,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
330667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [15]:
# SAVE THE TABLE
# Using KNeighborsClassifier()-------

# Save the predictions converted to a DataFrame (k=9)
#df_pred_filmes1.to_pickle('C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_9/ df_pred_filmes1_Classificacao_k_9.pickle', compression='gzip')

# Save the predictions converted to a DataFrame (k=47)
#df_pred_filmes1.to_pickle('C:/0.Projetos/5.Sistema_de_Recomendacao_MovieLens_2/Datasets/5.Modelagem_KNN/KNN_Classificacao_k_47/ df_pred_filmes1_Classificacao_k_47.pickle', compression='gzip')

In [22]:
# Count the non-zero values in the whole table
total_valores_nao_zero = (df_pred_filmes != 0).sum().sum()
print(f"Total non-zero values in the table: {total_valores_nao_zero}")

Total non-zero values in the table: 4972042


<br>

#### How many times each movie was recommended
To find out, we look at the non-zero predictions.

In [23]:
# Count the non-zero values in each column
count_sem_zero = (df_pred_filmes != 0).sum()

# Compute the proportion of these values relative to the total number of rows
proporcao = count_sem_zero / len(df_pred_filmes)

# Build a summary DataFrame
summary_df = pd.DataFrame({
    'Valores_Diferentes_de_zero': count_sem_zero,
    'Proporcao': proporcao
})

<br>

##### The 10 most recommended movies

In [26]:
print("The 10 most recommended movies:")
summary_df.sort_values(by='Valores_Diferentes_de_zero', ascending=False).head(10)

The 10 most recommended movies:


| title | Valores_Diferentes_de_zero | Proporcao |
|---|---|---|
| Lord of the Rings: The Two Towers, The (2002) | 1986 | 1.0 |
| Princess Bride, The (1987) | 1986 | 1.0 |
| Monty Python and the Holy Grail (1975) | 1986 | 1.0 |
| Monsters, Inc. (2001) | 1986 | 1.0 |
| Dark Knight, The (2008) | 1986 | 1.0 |
| American History X (1998) | 1986 | 1.0 |
| Shining, The (1980) | 1986 | 1.0 |
| Up (2009) | 1986 | 1.0 |
| Toy Story (1995) | 1986 | 1.0 |
| Seven (a.k.a. Se7en) (1995) | 1986 | 1.0 |


<br>

#####  The 10 least recommended movies

In [27]:
print("The 10 least recommended movies:")
summary_df.sort_values(by='Valores_Diferentes_de_zero', ascending=True).head(10)

The 10 least recommended movies:


| title | Valores_Diferentes_de_zero | Proporcao |
|---|---|---|
| Ong-Bak 3: The Final Battle (Ong Bak 3) (2010) | 3 | 0.001511 |
| Rumble in the Air-Conditioned Auditorium: O'Reilly vs. Stewart 2012, The (2012) | 3 | 0.001511 |
| The Hire: Chosen (2001) | 3 | 0.001511 |
| Jack and Diane (2012) | 3 | 0.001511 |
| The Hire: Hostage (2002) | 3 | 0.001511 |
| Lovers of Hate (2010) | 3 | 0.001511 |
| Do-Deca-Pentathlon, The (2012) | 3 | 0.001511 |
| Into the Storm (2009) | 3 | 0.001511 |
| Winning Time: Reggie Miller vs. The New York Knicks (2010) | 3 | 0.001511 |
| Never Goin' Back (2018) | 3 | 0.001511 |


<br>

## Recommendation list
Create a DataFrame with 2 columns: userId and a list with the recommendations.

In [28]:
n_recomendacoes = 5

# Function to find the top N recommended movies for a user
def top_recomendacoes(row, n) -> list:
    ''' Find the best recommendations for each userId.
    Args:
      - row = DataFrame row.
              Each row holds the predicted scores of every movie for one userId.
      - n = number of movies to recommend

    Return:
      - A list with the labels of the "n" highest values in the row,
        that is, the "n" recommended movies.
    '''
    # Select the "n" highest values in the row and return their labels.
    # NOTE: .index gives the movie titles instead of the values;
    #       .tolist() turns them into a list of titles.
    return row.nlargest(n).index.tolist()
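A quick way to see what the function returns is to apply the same `nlargest` chain to a single hypothetical row of predicted scores. The titles and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical predicted scores for one user (one row of the prediction table)
row = pd.Series({"Movie A": 0.4, "Movie B": 2.1, "Movie C": 0.0, "Movie D": 1.3})

# Keep the 2 highest scores and return their labels (movie titles)
top_2 = row.nlargest(2).index.tolist()
print(top_2)
```

The result is ordered from the highest score down, so the first element is the strongest recommendation.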

In [29]:
# Using KNeighborsRegressor() ----------
# Apply the function to each row of the predictions DataFrame
recomendacoes = df_pred_filmes.apply(top_recomendacoes, n=n_recomendacoes, axis=1) 

# NOTE: apply(..., axis=1) -> apply to each row
#       n = number of recommendations

# Using KNeighborsClassifier()-------
# Apply the function to each row of the predictions DataFrame
#recomendacoes1 = df_pred_filmes1.apply(top_recomendacoes, n=n_recomendacoes, axis=1) 

In [30]:
recomendacoes

userId
128       [Shawshank Redemption, The (1994), Forrest Gum...
172       [Shawshank Redemption, The (1994), Forrest Gum...
465       [Shawshank Redemption, The (1994), Forrest Gum...
598       [Shawshank Redemption, The (1994), Forrest Gum...
919       [Shawshank Redemption, The (1994), Forrest Gum...
                                ...                        
330236    [Shawshank Redemption, The (1994), Lord of the...
330321    [Shawshank Redemption, The (1994), Lord of the...
330496    [Shawshank Redemption, The (1994), Lord of the...
330667    [Shawshank Redemption, The (1994), Lord of the...
330948    [Shawshank Redemption, The (1994), Lord of the...
Length: 1986, dtype: object

In [31]:
# Using KNeighborsRegressor() ----------
# Create a DataFrame to store the recommendations
df_recomendacoes = pd.DataFrame(recomendacoes.tolist(), index=recomendacoes.index, columns=[f"recomendação_{i+1}" for i in range(n_recomendacoes)])

# Turn userId into a column
df_recomendacoes = df_recomendacoes.reset_index()
df_recomendacoes

# Using KNeighborsClassifier()-------
# Create a DataFrame to store the recommendations
#df_recomendacoes1 = pd.DataFrame(recomendacoes1.tolist(), index=recomendacoes1.index, columns=[f"recomendação_{i+1}" for i in range(n_recomendacoes)])

# Turn userId into a column
#df_recomendacoes1 = df_recomendacoes1.reset_index()

Unnamed: 0,userId,recomendação_1,recomendação_2,recomendação_3,recomendação_4,recomendação_5
0,128,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999)
1,172,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999)
2,465,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999)
3,598,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999)
4,919,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999)
...,...,...,...,...,...,...
1981,330236,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,..."
1982,330321,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,..."
1983,330496,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,..."
1984,330667,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,..."


In [32]:
df_recomendacoes.isna().sum()

userId            0
recomendação_1    0
recomendação_2    0
recomendação_3    0
recomendação_4    0
recomendação_5    0
dtype: int64

<br>

## Check whether the userId watched a recommendation
We create a function called `assistiu_ou_nao`. It checks whether users watched any of the recommendations and returns 1 (watched), 0 (did not watch), or 2 (the movie is not in the test set).
<br>

> ##### ⚠ 📌 The option of returning 2 exists because the split did not require the training and test data to contain the same movies.
> ##### **The split kept all ratings of a given user together**. For example, if "userId"=1 is in the training set, all ratings made by that user are also in the training set. 
> ##### As a result, the test data contains fewer movies than the training data.    

In [39]:
def assistiu_ou_nao(row, df_teste):
    # Get the recommendations for the current user
    recomendacoes = row.drop('userId').values
    
    # Check whether a recommended movie is in df_teste and was watched (rating different from 0)
    # NOTE: every branch below returns, so only the first recommendation is inspected.
    for filme in recomendacoes:
        if filme in df_teste.columns and df_teste.loc[row['userId'], filme] != 0:
            return 1
        elif filme not in df_teste.columns:
            return 2  # The movie was recommended but is not in the test data
        else:
            return 0  # The movie is in the test data but the user did not watch it
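As written, `assistiu_ou_nao` returns inside the first loop iteration, so the flag reflects only the first recommended movie. The sketch below is one possible variant that inspects every recommendation. The name `watched_any`, the toy tables, and the choice to return 2 only when none of the recommended movies exists in the test data are assumptions made for illustration; the counts reported in this notebook were produced with the original function, not with this variant.

```python
import pandas as pd

def watched_any(row, df_teste):
    """Return 1 if the user rated any recommended movie, 2 if none of the
    recommended movies exists in df_teste, and 0 otherwise."""
    recommended = row.drop("userId").values
    # Keep only the recommendations that are columns of the test table
    known = [movie for movie in recommended if movie in df_teste.columns]
    if not known:
        return 2
    user_ratings = df_teste.loc[row["userId"], known]
    return int((user_ratings != 0).any())

# Hypothetical test ratings (0.0 means "not watched")
df_teste = pd.DataFrame(
    {"Movie A": [0.0, 0.0], "Movie B": [4.0, 0.0]},
    index=pd.Index([10, 20], name="userId"),
)

# Hypothetical recommendations; "Movie Y" and "Movie Z" are not in df_teste
recs = pd.DataFrame({
    "userId": [10, 20, 30],
    "recomendação_1": ["Movie A", "Movie A", "Movie Y"],
    "recomendação_2": ["Movie B", "Movie Z", "Movie Z"],
})

flags = recs.apply(watched_any, df_teste=df_teste, axis=1).tolist()
print(flags)
```

User 10 is flagged 1 because of the second recommendation, which a check limited to the first recommendation would miss.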
      

In [40]:
# Using KNeighborsRegressor() ----------
# Apply the function to each row of the recommendations DataFrame to create the dummy column
df_recomendacoes['assistiu_recomendacao'] = df_recomendacoes.apply(assistiu_ou_nao, df_teste=knn_filmes_teste, axis=1)

# Show the final DataFrame with the dummy column
df_recomendacoes

# Using KNeighborsClassifier()-------
#df_recomendacoes1['assistiu_recomendacao'] = df_recomendacoes1.apply(assistiu_ou_nao, df_teste=knn_filmes_teste1, axis=1)
#df_recomendacoes1


Unnamed: 0,userId,recomendação_1,recomendação_2,recomendação_3,recomendação_4,recomendação_5,assistiu_recomendacao
0,128,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999),0
1,172,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999),0
2,465,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999),0
3,598,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999),0
4,919,"Shawshank Redemption, The (1994)",Forrest Gump (1994),Inception (2010),"Lord of the Rings: The Return of the King, The...",Fight Club (1999),0
...,...,...,...,...,...,...,...
1981,330236,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,...",0
1982,330321,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,...",1
1983,330496,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,...",0
1984,330667,"Shawshank Redemption, The (1994)","Lord of the Rings: The Two Towers, The (2002)","Lord of the Rings: The Return of the King, The...",Fight Club (1999),"Lord of the Rings: The Fellowship of the Ring,...",1


## **Results of Model 1**

In the previous step, we generated the recommendations and created a column (`assistiu_recomendacao`) to compare them with the movies watched by each `userId` in the test data. Each row of this column holds 0, 1, or 2.
- **0**: none of the recommended movies was watched by the user;
- **1**: at least one of the recommended movies was watched by the user;
- **2**: the recommended movie is not in the test data. The reason is explained in the previous section. 


We tested models with different values of `K` using both `KNeighborsRegressor()` and `KNeighborsClassifier()`. At the end, we counted the 0s, 1s, and 2s in the `assistiu_recomendacao` column and compared the results.

In [40]:
# Using KNeighborsRegressor() ----------
# Result when K=47
df_recomendacoes['assistiu_recomendacao'].value_counts()

assistiu_recomendacao
0    1420
1     566
Name: count, dtype: int64

The table below shows the results for each `K` using `KNeighborsRegressor()`. As it shows, `K=47` gave the best result, with 566 of the 1,986 test users watching at least one recommended movie.

[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/Resultado_KNN_Regressor_v1.0.png?raw=true" width="100%">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/Resultado_KNN_Regressor_v1.0.png?raw=true)

The table below shows the results for each `K` using `KNeighborsClassifier()`. Because these models took a long time to run, only two values of `K` were tested.

[<img src="https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/Resultado_KNN_Classifier_v1.1.png?raw=true" width="40%">](https://github.com/CatarinaAguiar3/Projeto_Sistema_de_Recomendacao_MovieLens/blob/main/Imagens/Modelagem/Resultado_KNN_Classifier_v1.1.png?raw=true)

# **Model 2: Movies**
Model 2 takes a different approach. First, we select the columns (movies) that appear in both the training and the test data. Next, we fit the **unsupervised** nearest-neighbors model (`NearestNeighbors`) on the training data. Finally, we query the fitted model with its `kneighbors` method to find the ***k*** nearest neighbors of each row of the test data.

## Prepare the data

### Select the columns

In [8]:
# Ensure both DataFrames have the same columns (movies)
common_columns = knn_filmes_treino.columns.intersection(knn_filmes_teste.columns)
knn_filmes_treino = knn_filmes_treino[common_columns]
knn_filmes_teste = knn_filmes_teste[common_columns]

### Convert the matrices to sparse matrices
This avoids wasting memory, since most entries of the user-by-movie matrix are zero.

In [37]:
from scipy.sparse import csr_matrix

# Function that converts a dense matrix into a sparse (CSR) matrix
def create_sparse_matrix(matrix):
    return csr_matrix(matrix)

In [38]:
# Prepare the data
features_matrix_treino = create_sparse_matrix(knn_filmes_treino.values)
features_matrix_teste = create_sparse_matrix(knn_filmes_teste.values)

In [39]:
features_matrix_treino 

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 820478 stored elements and shape (7943, 24326)>
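
The shape and the number of stored elements in the output above give a rough idea of the saving. The sketch below is a back-of-the-envelope estimate only; it assumes 8-byte `float64` values and 4-byte integer indices for the CSR arrays, which may differ from the actual memory layout.

```python
# Figures taken from the sparse-matrix output above
n_rows, n_cols = 7943, 24326
stored = 820478

# Fraction of cells that actually hold a value
density = stored / (n_rows * n_cols)

# Dense float64 array: 8 bytes per cell
dense_bytes = n_rows * n_cols * 8

# CSR (assumed layout): float64 data + int32 column indices + int32 row pointers
csr_bytes = stored * 8 + stored * 4 + (n_rows + 1) * 4

print(f"density: {density:.2%}")
print(f"dense:   {dense_bytes / 1e6:.0f} MB")
print(f"CSR:     {csr_bytes / 1e6:.1f} MB")
```

Under these assumptions, fewer than 0.5% of the cells are filled, and the dense matrix would take roughly 1.5 GB against about 10 MB in CSR form.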

### Train `NearestNeighbors`
Fit `NearestNeighbors()` on the training data. This is an **unsupervised** approach.

In [9]:
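from sklearn.neighbors import NearestNeighbors

# Function that creates and fits the NearestNeighbors model
# Note: scikit-learn falls back to brute-force search when fitted on sparse input,
# so algorithm='ball_tree' is overridden for the CSR matrix used here
def train_model(features_matrix, n_neighbors):
    model = NearestNeighbors(n_neighbors=n_neighbors, metric='euclidean', algorithm='ball_tree')
    model.fit(features_matrix)
    return model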


In [42]:
# Create and fit the model
model = train_model(features_matrix_treino, n_neighbors=155)
model



### Recommendations

#### Recommendations for a single userId

In [43]:
# Function that returns recommendations for a specific user
def get_recommendations(model, knn_filmes_treino, features_matrix_teste, query_index, n_neighbors):
    # Query one extra neighbor (+1) because the nearest one (position 0) is skipped below
    distances, indices = model.kneighbors(features_matrix_teste[query_index], n_neighbors=n_neighbors+1)
    # Note: `indices` holds row positions of the training matrix; here they are used as column (movie) positions
    recommendations = [knn_filmes_treino.columns[indices.flatten()[i+1]] for i in range(n_neighbors)]
    return recommendations
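
Because the model is fitted on a user-by-movie matrix, the indices returned by `kneighbors` identify training rows (users). One common way to turn neighboring users into movie recommendations is to add up the neighbors' ratings and rank the movies by that score. The sketch below illustrates the idea with a brute-force search in plain Python; it is not the notebook's method, and `recommend_from_neighbors` and the toy ratings are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_from_neighbors(train_rows, movie_titles, query_row, n_neighbors=2, top_n=2):
    # Rank training users (rows) by distance to the query user
    order = sorted(range(len(train_rows)), key=lambda i: euclidean(train_rows[i], query_row))
    neighbors = order[:n_neighbors]

    # Score each movie (column) by the sum of the neighbors' ratings
    scores = [sum(train_rows[i][j] for i in neighbors) for j in range(len(movie_titles))]

    # Keep movies rated by at least one neighbor, best score first
    ranked = sorted((j for j in range(len(movie_titles)) if scores[j] > 0),
                    key=lambda j: scores[j], reverse=True)
    return [movie_titles[j] for j in ranked[:top_n]]


# Toy user-by-movie ratings (hypothetical); 0 means "not rated"
movie_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
train_rows = [
    [5, 4, 0, 0],
    [4, 3, 3, 0],
    [0, 0, 5, 4],
]
query_row = [5, 0, 0, 0]

recs = recommend_from_neighbors(train_rows, movie_titles, query_row)
print(recs)
```

Whether to drop movies the query user has already rated depends on how the recommendations are evaluated, so the sketch leaves them in.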


#### Recommendations for all userId values

In [10]:
# Main function: recommend movies for every user in the test data
def recommend_movies(knn_filmes_treino, knn_filmes_teste, model, features_matrix_teste, n_neighbors=150):
    # Initialize a list to store the recommendations
    recommendations_list = []

    # For each user in the test data, get recommendations using kneighbors
    for query_index in range(features_matrix_teste.shape[0]):
        recommendations = get_recommendations(model, knn_filmes_treino, features_matrix_teste, query_index, n_neighbors)
        recommendations_list.append({
            'userId': knn_filmes_teste.index[query_index],
            'Recommendation_1': recommendations[0],
            'Recommendation_2': recommendations[1],
            'Recommendation_3': recommendations[2],
            'Recommendation_4': recommendations[3],
            'Recommendation_5': recommendations[4]
        })

    # Convert the list of recommendations into a DataFrame
    recommendations_df = pd.DataFrame(recommendations_list)
    return recommendations_df

In [13]:
# Make sure the index is 'userId' and get the recommendations
recommendations = recommend_movies(knn_filmes_treino, knn_filmes_teste, model, features_matrix_teste, n_neighbors=150)



In [12]:
recommendations

Unnamed: 0,userId,Recommendation_1,Recommendation_2,Recommendation_3,Recommendation_4,Recommendation_5
0,128,Martian Child (2007),Death Note 2: The Last Name (2006),Beyond the Lights (2014),Jacknife (1989),Arthur (2011)
1,172,Jacob's Ladder (1990),Krampus (2015),Asterix vs. Caesar (Astérix et la surprise de ...,"Burning, The (1981)",Killer Elite (2011)
2,465,G-Force (2009),Batman: The Killing Joke (2016),"Conjuring, The (2013)",Mulan (2009),God Grew Tired of Us (2006)
3,598,Henry: Portrait of a Serial Killer (1986),"Crow: Salvation, The (2000)",Kika (1993),Belladonna of Sadness (1973),Da 5 Bloods (2020)
4,919,Dhoom (2004),Jimmy Carr: Stand Up (2005),I Am Sam (2001),Monos (2019),Garfield: The Movie (2004)
...,...,...,...,...,...,...
1981,330236,Death Note 2: The Last Name (2006),Jacknife (1989),Arthur (2011),Beyond the Lights (2014),Ex Machina (2015)
1982,330321,Enough Said (2013),"Lost City, The (2005)",In Country (1989),Descendants (2015),Ice Age: The Great Egg-Scapade (2016)
1983,330496,Harmontown (2014),Lady in a Cage (1964),Love & Friendship (2016),End of Watch (2012),Kill Bill: Vol. 2 (2004)
1984,330667,End of Days (1999),Happiest Season (2020),Farinelli: il castrato (1994),"Emperor's New Clothes, The (2001)",Hotel Mumbai (2019)


### Create the `assistiu_recomendacao` column
This is a dummy column in which:
- **1** means the user watched at least one recommendation;
- **0** means the user watched none of the recommendations.

In [20]:
def assistiu_ou_nao(row, df_teste):
    # Get the recommendations for the current user
    recomendacoes = row.drop('userId').values
    
    # Check whether any recommended movie is in df_teste and was watched (rating other than 0)
    for filme in recomendacoes:
        if filme in df_teste.columns and df_teste.loc[row['userId'], filme] != 0:
            return 1
    return 0  # No recommended movie was watched

In [21]:
# Apply the function to each row of the recommendations DataFrame to create the dummy column
recommendations['assistiu_recomendacao'] = recommendations.apply(assistiu_ou_nao, df_teste=knn_filmes_teste, axis=1)

# Show the final DataFrame with the dummy column
recommendations

Unnamed: 0,userId,Recommendation_1,Recommendation_2,Recommendation_3,Recommendation_4,Recommendation_5,assistiu_recomendacao
0,128,Martian Child (2007),Death Note 2: The Last Name (2006),Beyond the Lights (2014),Jacknife (1989),Arthur (2011),0
1,172,Jacob's Ladder (1990),Krampus (2015),Asterix vs. Caesar (Astérix et la surprise de ...,"Burning, The (1981)",Killer Elite (2011),0
2,465,G-Force (2009),Batman: The Killing Joke (2016),"Conjuring, The (2013)",Mulan (2009),God Grew Tired of Us (2006),0
3,598,Henry: Portrait of a Serial Killer (1986),"Crow: Salvation, The (2000)",Kika (1993),Belladonna of Sadness (1973),Da 5 Bloods (2020),0
4,919,Dhoom (2004),Jimmy Carr: Stand Up (2005),I Am Sam (2001),Monos (2019),Garfield: The Movie (2004),0
...,...,...,...,...,...,...,...
1981,330236,Death Note 2: The Last Name (2006),Jacknife (1989),Arthur (2011),Beyond the Lights (2014),Ex Machina (2015),0
1982,330321,Enough Said (2013),"Lost City, The (2005)",In Country (1989),Descendants (2015),Ice Age: The Great Egg-Scapade (2016),0
1983,330496,Harmontown (2014),Lady in a Cage (1964),Love & Friendship (2016),End of Watch (2012),Kill Bill: Vol. 2 (2004),0
1984,330667,End of Days (1999),Happiest Season (2020),Farinelli: il castrato (1994),"Emperor's New Clothes, The (2001)",Hotel Mumbai (2019),0


In [22]:
# n_neighbors=150
recommendations['assistiu_recomendacao'].value_counts()

assistiu_recomendacao
0    1903
1      83
Name: count, dtype: int64
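
The counts above can be turned into a hit rate (the share of test users who watched at least one recommended movie), which makes the two models easier to compare. The sketch uses only the counts shown in this document; the helper `hit_rate` is a hypothetical name.

```python
def hit_rate(counts):
    """Share of users labelled 1 among users labelled 0 or 1."""
    evaluated = counts.get(0, 0) + counts.get(1, 0)
    return counts.get(1, 0) / evaluated if evaluated else 0.0


# value_counts() outputs reported in this document
model_1_k47 = {0: 1420, 1: 566}   # Model 1, regressor with K=47
model_2 = {0: 1903, 1: 83}        # Model 2, n_neighbors=150

rate_1 = hit_rate(model_1_k47)
rate_2 = hit_rate(model_2)
print(f"Model 1 (K=47): {rate_1:.1%}")
print(f"Model 2:        {rate_2:.1%}")
```

With these counts, Model 1 reaches a hit rate of about 28.5% and Model 2 about 4.2%.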

# **References**

FONTANA, Éliton. **Introdução aos Algoritmos de Aprendizagem Supervisionada**. 1st ed. Paraná, 2020. Available at: https://fontana.paginas.ufsc.br/files/2018/03/apostila_ML_pt2.pdf <br><br>
LANTZ, B. **Machine Learning with R: Expert techniques for predictive modeling**. Birmingham, England: Packt Publishing, 2019.