# Introduction

Recommender systems are tools that filter and process information, returning results that are relevant to the context in which they are applied. They use specific algorithms to offer personalized suggestions, making the user's decisions easier. In this study, we seek to understand the main algorithms used in recommender systems and to evaluate their performance by applying them to the Book-Crossing Dataset, a collection of data on books, ratings, and user interactions.

From this analysis, we intend to identify the most effective methods for producing accurate recommendations in this context.

# Materials and Methods

## Dependencies

The libraries required to run our test cases.

In [1]:
%pip install kagglehub
%pip install pandas
%pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [13]:
import kagglehub
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import pairwise_distances
import os

## Dataset

We chose the Book-Crossing Dataset, which links users, books, and ratings and therefore allows reading patterns and preferences to be analyzed. We will apply recommendation algorithms to it in order to suggest personalized books based on previous interactions.

### Importing the dataset

In [None]:
path = kagglehub.dataset_download("somnambwl/bookcrossing-dataset")

books = pd.read_csv(path+"/Books.csv", sep=";", header=0)
users = pd.read_csv(path+"/Users.csv", sep=";", low_memory=False, header=0)
ratings = pd.read_csv(path+"/Ratings.csv", sep=";", low_memory=False, header=0)

### Viewing the imported data

In [4]:
books.head()

|   | ISBN | Title | Author | Year | Publisher |
|---|---|---|---|---|---|
| 0 | 195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |
| 2 | 60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial |
| 3 | 374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux |
| 4 | 393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company |


In [5]:
users.head()

|   | User-ID | Age |
|---|---|---|
| 0 | 1 |  |
| 1 | 2 | 18.0 |
| 2 | 3 |  |
| 3 | 4 | 17.0 |
| 4 | 5 |  |


In [6]:
ratings.head()

|   | User-ID | ISBN | Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


### Filtering the ratings into a single dataset

In [9]:
min_user_ratings = 5  # Keep users who rated at least 5 books
min_book_ratings = 10  # Keep books with at least 10 ratings

user_counts = ratings['User-ID'].value_counts()
book_counts = ratings['ISBN'].value_counts()

filtered_ratings = ratings[
    ratings['User-ID'].isin(user_counts[user_counts >= min_user_ratings].index) &
    ratings['ISBN'].isin(book_counts[book_counts >= min_book_ratings].index)
]
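
To make the effect of the two thresholds concrete, the sketch below repeats the same kind of filter on a handful of made-up ratings in plain Python. The toy data, the `toy_` names, and the lowered thresholds are assumptions for illustration only. It also shows a side effect of counting before filtering: a user who passes the threshold on the full data can be left with fewer ratings once the less-rated books are dropped.

```python
from collections import Counter

# Hypothetical toy ratings: (user_id, isbn, rating)
toy_ratings = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "C", 0),
    ("u3", "A", 2),
]
toy_min_user_ratings = 2  # lowered for the toy data
toy_min_book_ratings = 2

# Count ratings per user and per book on the unfiltered data
toy_user_counts = Counter(user for user, _, _ in toy_ratings)
toy_book_counts = Counter(isbn for _, isbn, _ in toy_ratings)

# Keep a row only if both its user and its book are active enough
toy_filtered = [
    (user, isbn, rating)
    for user, isbn, rating in toy_ratings
    if toy_user_counts[user] >= toy_min_user_ratings
    and toy_book_counts[isbn] >= toy_min_book_ratings
]

# Ratings left per user after the filter
toy_remaining = Counter(user for user, _, _ in toy_filtered)
print(toy_filtered)
print(dict(toy_remaining))
```

Here `u1` and `u2` each had two ratings before the filter but keep only one afterwards, because books `B` and `C` fall below the book threshold. If strict minimums are needed, one option is to repeat the filter until the counts stop changing.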

### Building the interaction matrix

In [11]:
interaction_matrix = filtered_ratings.pivot(index='User-ID', columns='ISBN', values='Rating')
interaction_matrix = interaction_matrix.fillna(0)
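
As a rough illustration of what the pivot followed by the zero fill produces, the sketch below builds the same kind of user-by-book table from made-up triples in plain Python. The data and the `toy_` names are hypothetical. Note that the ratings preview contains ratings of 0, so after the fill a stored 0 and a missing rating become indistinguishable.

```python
# Hypothetical toy ratings: (user_id, isbn, rating).
# User "10001" gave book "C" an actual rating of 0.
toy_ratings = [("100004", "A", 5), ("10001", "A", 4), ("10001", "C", 0)]

# User IDs are kept as strings here, so they sort lexicographically
toy_users = sorted({user for user, _, _ in toy_ratings})
toy_isbns = sorted({isbn for _, isbn, _ in toy_ratings})

# One row per user, one column per book, 0.0 where no rating exists
toy_matrix = {user: {isbn: 0.0 for isbn in toy_isbns} for user in toy_users}
for user, isbn, rating in toy_ratings:
    toy_matrix[user][isbn] = float(rating)

for user in toy_users:
    print(user, toy_matrix[user])
```

In the result, user `100004` (who never rated book `C`) and user `10001` (who rated it 0) hold the same value in column `C`.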

In [18]:
# display the values:
interaction_matrix.head()

ISBN,000000000,0002005018,0002251760,0002255081,0002257203,0002259001,0002259834,0002558122,0006172768,0006374921,...,9724119378,9726101794,9726106141,9726116902,9727591965,9727722458,9770390107900,9871138016,9871138148,B00009EF82
User-ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
100004,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100010,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100029,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## KNN

### Computing the Euclidean distance

In [14]:
distance_matrix = pairwise_distances(interaction_matrix, metric='euclidean')
distance_df = pd.DataFrame(distance_matrix, index=interaction_matrix.index, columns=interaction_matrix.index)
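
For intuition, the Euclidean distance between two users is the square root of the summed squared differences between their rating vectors. The sketch below computes it for three made-up users with the standard library's `math.dist`; the vectors and the `toy_` names are assumptions for illustration, not values from the dataset.

```python
import math

# Hypothetical rating vectors, one entry per book
toy_vectors = {
    "u1": [5.0, 0.0, 3.0],
    "u2": [4.0, 0.0, 0.0],
    "u3": [0.0, 0.0, 0.0],
}

# Pairwise Euclidean distances between every pair of users
toy_distances = {
    a: {b: math.dist(va, vb) for b, vb in toy_vectors.items()}
    for a, va in toy_vectors.items()
}

for user, row in toy_distances.items():
    print(user, {other: round(d, 3) for other, d in row.items()})
```

The result is a square, symmetric table with zeros on the diagonal, so its size grows with the square of the number of users.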

### Finding the k nearest neighbors

In [None]:
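def get_nearest_neighbors(user_id, k):
    # Distances from this user to every other user. The user is dropped by
    # label because, with ties at distance 0, it may not sort first.
    distances = distance_df.loc[user_id].drop(user_id)
    neighbors = distances.nsmallest(k)
    return neighbors.index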
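
Neighbor selection depends on the query user being excluded from its own neighbor list. The sketch below, on hypothetical distances, contrasts two ways of doing that: skipping the first sorted entry by position, and removing the user by label. The names and values are made up for illustration. The two differ as soon as another user has an identical rating vector and therefore also sits at distance 0.

```python
# Hypothetical distances from user "u2" to every user, including itself.
# "u1" has an identical rating vector, so it also sits at distance 0.
toy_row = {"u1": 0.0, "u2": 0.0, "u3": 4.0}
toy_k = 1

# sorted() is stable, so tied users keep their dictionary order
toy_ranked = sorted(toy_row, key=toy_row.get)

# Positional approach: skip the first entry, assuming it is the user itself
by_position = toy_ranked[1:toy_k + 1]

# Label approach: remove the user explicitly, then take the k closest
by_label = [user for user in toy_ranked if user != "u2"][:toy_k]

print(by_position)
print(by_label)
```

With the tie, the positional slice returns `u2` as its own neighbor, while the label-based version returns `u1`.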

### Generating the recommendations

In [None]:
def recommend_books(user_id, k, n_recommendations):
    if user_id not in interaction_matrix.index:
        raise ValueError(f"User {user_id} not found in the interaction matrix.")

    # Ratings given by the k nearest neighbors
    neighbors = get_nearest_neighbors(user_id, k)
    neighbor_ratings = interaction_matrix.loc[neighbors]

    # Predicted rating = mean over all k neighbors (unrated books count as 0)
    mean_ratings = neighbor_ratings.mean(axis=0)

    # Keep only books the user has no rating for (0 in the matrix)
    unread_books = interaction_matrix.loc[user_id] == 0
    recommendations = mean_ratings[unread_books].sort_values(ascending=False).head(n_recommendations)

    # Attach titles; books missing from `books` are dropped by the inner merge
    recommendations = recommendations.reset_index().merge(
        books, left_on='ISBN', right_on='ISBN'
    )

    return recommendations[['Title', 'ISBN', 0]].rename(columns={0: 'Predicted-Rating'})
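
The scoring step can be traced by hand on a tiny example. The sketch below uses made-up ratings and `toy_` names (all hypothetical) to average the neighbors' ratings, zeros included, and rank the books the target user has not rated. It also shows why a recommendation list can contain entries with a predicted rating of 0.0. The second half is a variation for comparison, not the method used in this notebook.

```python
# Hypothetical ratings; 0 means "no rating", as in the interaction matrix
toy_target = {"A": 0, "B": 7, "C": 0}
toy_neighbors = [
    {"A": 10, "B": 5, "C": 0},
    {"A": 0, "B": 9, "C": 0},
]

# Mean over all neighbors, zeros included
toy_mean_all = {
    isbn: sum(n[isbn] for n in toy_neighbors) / len(toy_neighbors)
    for isbn in toy_target
}

# Rank only the books the target user has not rated
toy_unread = [isbn for isbn, rating in toy_target.items() if rating == 0]
toy_top = sorted(toy_unread, key=lambda isbn: toy_mean_all[isbn], reverse=True)[:2]
print([(isbn, toy_mean_all[isbn]) for isbn in toy_top])

# Variation (an assumption, not the notebook's method): average only the
# nonzero ratings and skip books that no neighbor rated
def toy_mean_nonzero(isbn):
    rated = [n[isbn] for n in toy_neighbors if n[isbn] > 0]
    return sum(rated) / len(rated) if rated else None

toy_variant = {isbn: toy_mean_nonzero(isbn) for isbn in toy_unread}
toy_variant = {isbn: s for isbn, s in toy_variant.items() if s is not None}
print(toy_variant)
```

In the first ranking, book `A` scores 5.0 even though its only rating is 10, and book `C` is still returned with a score of 0.0. The variation scores `A` at 10.0 and leaves `C` out.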

### Testing the algorithm

In [27]:
recommended_books = recommend_books("100004", 5, 5)
print(recommended_books)

|   | Title | ISBN | Predicted-Rating |
|---|---|---|---|
| 0 | Harry Potter and the Prisoner of Azkaban (Book 3) | 0439136369 | 2.8 |
| 1 | Atlantis Found | 0425177173 | 1.8 |
| 2 | Whirlwind (The X-Files) | 0061054151 | 1.6 |
| 3 | Julie and Romeo | 0609606727 | 0.0 |
| 4 | Angel Falls | 0609605925 | 0.0 |
