Data
--
* Books.csv - all information about the books
* Users.csv - information about the users
* Ratings.csv - the ratings that users gave to books

Importing libraries
--

In [1]:
import pandas as pd
from plotly.graph_objects import *
import numpy as np

from surprise import Dataset
from surprise import Reader
from surprise import KNNBasic, KNNWithMeans
from surprise.model_selection import cross_validate, train_test_split, GridSearchCV

Reading the data
===


In [2]:
df = pd.read_csv('datasets/book-crossing/users-ratings.csv')

In [3]:
df.head()


   User-ID  Age        ISBN  Rating
0      243  NaN  0060915544      10
1      243  NaN  0060977493       7
2      243  NaN  0156006529       0
3      243  NaN  0316096199       0
4      243  NaN  0316601950       9


In [4]:
df.isnull().sum()/len(df)

User-ID    0.000000
Age        0.233315
ISBN       0.000000
Rating     0.000000
dtype: float64

Testing K-Nearest Neighbours
===

Let's start with the simplest algorithm, KNNBasic.

In [5]:
reader = Reader(rating_scale=(1, 10))  # explicit ratings run from 1 to 10

In [6]:
# Book-Crossing stores implicit interactions as rating 0; keep only explicit 1-10 ratings
df = df[df['Rating'] != 0]

In [7]:
data = Dataset.load_from_df(df[['User-ID','ISBN', 'Rating']],
                           reader)

In [8]:
sim_options = {"name": "cosine",
               "user_based": False}
algo = KNNBasic(sim_options=sim_options)
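With `"name": "cosine"` and `"user_based": False`, similarities are computed between books rather than between users. The sketch below shows the cosine formula restricted to the users who rated both books, which is how surprise defines its cosine similarity; the dict-based input format and the `cosine_sim` helper are illustrative assumptions, not part of the library.

```python
import math


def cosine_sim(ratings_i, ratings_j, min_support=1):
    """Cosine similarity of two items over the users who rated both.

    ratings_i, ratings_j: hypothetical dicts mapping user id -> rating.
    Pairs with fewer than `min_support` common users get similarity 0.
    """
    common = ratings_i.keys() & ratings_j.keys()
    if not common or len(common) < min_support:
        return 0.0
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = math.sqrt(sum(ratings_i[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings_j[u] ** 2 for u in common))
    return dot / (norm_i * norm_j)


book_a = {1: 8, 2: 6, 3: 10}
book_b = {1: 4, 2: 3, 4: 9}
print(cosine_sim(book_a, book_b))  # users 1 and 2 are common → 1.0
```

The result is 1.0 even though both common users rated `book_a` twice as high as `book_b`: cosine only compares the direction of the rating vectors, so with strictly positive 1-10 ratings it tends to come out high.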

In [9]:
trainingSet = data.build_full_trainset()
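Internally, surprise replaces raw user and item ids with consecutive integer "inner" ids, and `predict()` takes raw ids and looks them up, so a raw id must match the stored one exactly, including its type. The sketch below is a simplified, hypothetical analogue of that bookkeeping (`build_inner_ids` is not a surprise function, and the triples are toy data); it shows why the ISBN is passed as the string `'0060915544'` rather than the integer `60915544`.

```python
def build_inner_ids(ratings):
    """Map raw ids to consecutive inner ids in order of first appearance
    and group ratings by inner item id.

    ratings: iterable of (raw_user_id, raw_item_id, rating) triples.
    """
    raw2inner_uid, raw2inner_iid = {}, {}
    item_ratings = {}  # inner item id -> list of (inner user id, rating)
    for raw_uid, raw_iid, rating in ratings:
        # setdefault keeps an existing inner id or assigns the next free one
        uid = raw2inner_uid.setdefault(raw_uid, len(raw2inner_uid))
        iid = raw2inner_iid.setdefault(raw_iid, len(raw2inner_iid))
        item_ratings.setdefault(iid, []).append((uid, rating))
    return raw2inner_uid, raw2inner_iid, item_ratings


toy_triples = [(243, "0060915544", 10), (243, "0060977493", 7), (254, "0060915544", 9)]
users, items, by_item = build_inner_ids(toy_triples)
print(items)  # {'0060915544': 0, '0060977493': 1}
print("0060915544" in items, 60915544 in items)  # True False
```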

In [10]:
algo.fit(trainingSet)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x11ba61350>

In [11]:
prediction = algo.predict(243,'0060915544')

In [12]:
prediction.est

7.794547301085934

In [13]:
prediction

Prediction(uid=243, iid='0060915544', r_ui=None, est=7.794547301085934, details={'actual_k': 14, 'was_impossible': False})
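For item-based KNNBasic, the estimate is a similarity-weighted average of the user's own ratings of the `k` items most similar to the target book (surprise's defaults are `k=40`, `min_k=1`), and only positively similar neighbours contribute; `actual_k` in `details` is how many neighbours were used, 14 for the prediction above. A plain-Python sketch of that logic, assuming dict inputs; the local `PredictionImpossible` class only stands in for surprise's exception of the same name.

```python
import heapq


class PredictionImpossible(Exception):
    """Local stand-in for the 'not enough neighbours' case (illustrative)."""


def knn_basic_estimate(user_ratings, sims_to_target, k=40, min_k=1):
    """Item-based KNNBasic-style estimate for one (user, target item) pair.

    user_ratings: dict item -> rating the user gave.
    sims_to_target: dict item -> similarity of that item to the target item.
    Returns (estimate, actual_k).
    """
    neighbours = [(sims_to_target.get(j, 0.0), r) for j, r in user_ratings.items()]
    # Keep the k items most similar to the target among those the user rated.
    top_k = heapq.nlargest(k, neighbours, key=lambda t: t[0])
    sum_sim = sum_ratings = 0.0
    actual_k = 0
    for sim, r in top_k:
        if sim > 0:  # only positively similar neighbours are used
            sum_sim += sim
            sum_ratings += sim * r
            actual_k += 1
    if actual_k < min_k:
        raise PredictionImpossible("not enough neighbours")
    return sum_ratings / sum_sim, actual_k


est, actual_k = knn_basic_estimate({"a": 8, "b": 6, "c": 2}, {"a": 0.9, "b": 0.6, "c": 0.0})
print(round(est, 2), actual_k)  # 7.2 2
```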

In [14]:
train, test = train_test_split(data, test_size=.2)
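`train_test_split` shuffles the ratings and holds out 20% of them for testing. Since no `random_state` is passed, each run draws a different split, so the hit rate computed on it can vary slightly between runs. A plain-Python sketch of the same idea; the `split_ratings` name, the toy data and rounding the test size to the nearest integer are this sketch's own choices.

```python
import random


def split_ratings(ratings, test_size=0.2, seed=None):
    """Randomly split rating triples into (train, test).

    A fixed `seed` makes the split reproducible; seed=None gives a new split each call.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)  # number of held-out ratings
    return shuffled[n_test:], shuffled[:n_test]


toy_ratings = [(u, f"item{u % 3}", 1 + u % 10) for u in range(10)]
toy_train, toy_test = split_ratings(toy_ratings, test_size=0.2, seed=42)
print(len(toy_train), len(toy_test))  # 8 2
```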

In [15]:
algo.fit(train)

Computing the cosine similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x11ba61350>

In [16]:
predictions = algo.test(test)

In [17]:
like_threshold = 7


# Share of test ratings where both the estimate and the true rating reach the threshold
hits = sum(1 for p in predictions if p.est >= like_threshold and p.r_ui >= like_threshold)
hit_rate = hits / len(predictions) if predictions else 0

print(f"Hit Rate: {hit_rate:.3f}")

Hit Rate: 0.734
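The value printed as "Hit Rate" is the share of all test ratings for which both the estimate and the true rating are at least `like_threshold`, i.e. true positives divided by the number of test ratings. It is not the top-N hit rate often reported for recommenders (the share of users whose held-out item appears in their top-N list). Splitting the pairs into the four threshold outcomes makes this explicit; `threshold_counts` is a hypothetical helper and the pairs are toy data.

```python
def threshold_counts(pairs, threshold=7):
    """Count (estimate, true rating) pairs by 'liked' status on both sides."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for est, true in pairs:
        predicted_like = est >= threshold
        actually_like = true >= threshold
        if predicted_like and actually_like:
            counts["tp"] += 1
        elif predicted_like:
            counts["fp"] += 1
        elif actually_like:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


toy_pairs = [(8.1, 9), (7.5, 5), (6.0, 8), (4.2, 3)]
c = threshold_counts(toy_pairs)
share_tp = c["tp"] / len(toy_pairs)  # what the notebook prints as "Hit Rate"
# share of predicted likes that were actually liked
precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
print(c, share_tp, precision)  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1} 0.25 0.5
```

With the notebook's predictions, the input would be `[(p.est, p.r_ui) for p in predictions]`.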


In [18]:
cross_validate(algo, data, measures=['RMSE'], cv=3, verbose=True)

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE of algorithm KNNBasic on 3 split(s).

                  Fold 1  Fold 2  Fold 3  Mean    Std     
RMSE (testset)    1.7134  1.7043  1.7104  1.7094  0.0038  
Fit time          0.09    0.09    0.10    0.09    0.01    
Test time         0.37    0.31    0.37    0.35    0.03    


{'test_rmse': array([1.71342056, 1.70428535, 1.71035789]),
 'fit_time': (0.08925986289978027, 0.08563518524169922, 0.10153079032897949),
 'test_time': (0.3707890510559082, 0.3090188503265381, 0.3729400634765625)}
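RMSE is the square root of the mean squared difference between true and predicted ratings, so a mean of about 1.71 on the 1-10 scale means errors of the order of 1.7 rating points, with large misses weighted more heavily than in MAE (the second measure used in the grid search). A minimal sketch of both metrics on `(r_ui, est)` pairs; they follow the standard definitions used by surprise's `accuracy.rmse` and `accuracy.mae`, but the helpers here are illustrative.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, estimate) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimate) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)


toy_rmse_pairs = [(10, 8), (7, 7), (4, 5)]
print(round(rmse(toy_rmse_pairs), 4), mae(toy_rmse_pairs))  # 1.291 1.0
```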

In [19]:
sim_options = {
    "name": ["msd", "cosine"],
    "min_support": [3, 4, 5],
    "user_based": [False, True],
}

param_grid = {"sim_options": sim_options}

gs = GridSearchCV(KNNWithMeans, param_grid, measures=["rmse", "mae"], cv=3)
gs.fit(data)

Computing the msd similarity matrix...
Done computing similarity matrix.

In [20]:
print(gs.best_score["rmse"])
print(gs.best_params["rmse"])

1.7407050921795904
{'sim_options': {'name': 'msd', 'min_support': 5, 'user_based': False}}
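The best configuration uses the `msd` similarity. In surprise, MSD similarity is 1 / (msd + 1), where msd is the mean squared difference between two items' ratings over their common users, and `min_support` forces the similarity to 0 when fewer users than the threshold rated both items, so `min_support: 5` ignores book pairs co-rated by fewer than five users. The grid has 2 × 3 × 2 = 12 combinations, each fitted once per fold (36 fits with `cv=3`), which explains the long, repetitive log above. A plain-Python sketch of the similarity, assuming dict inputs:

```python
def msd_sim(ratings_i, ratings_j, min_support=1):
    """MSD-based similarity of two items over their common users.

    ratings_i, ratings_j: hypothetical dicts mapping user id -> rating.
    Returns 0 when fewer than `min_support` users rated both items.
    """
    common = ratings_i.keys() & ratings_j.keys()
    if not common or len(common) < min_support:
        return 0.0
    msd = sum((ratings_i[u] - ratings_j[u]) ** 2 for u in common) / len(common)
    return 1.0 / (msd + 1.0)  # +1 keeps the value finite when msd == 0


book_a = {1: 8, 2: 6, 3: 10}
book_b = {1: 4, 2: 3, 4: 9}
print(round(msd_sim(book_a, book_b), 4))       # 0.0741
print(msd_sim(book_a, book_b, min_support=5))  # 0.0
```

For these two books the common users' ratings are proportional (8, 6 against 4, 3), so cosine similarity would be exactly 1.0, while MSD gives about 0.074 because it also penalises the gap in rating level.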


In [24]:
gs.best_estimator['rmse'].predict

<bound method AlgoBase.predict of <surprise.prediction_algorithms.knns.KNNWithMeans object at 0x11dbd7110>>
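Evaluating `gs.best_estimator['rmse'].predict` without calling it only displays the bound method. In surprise, `best_estimator` holds a new `KNNWithMeans` instance built from the best parameters, which is not trained unless `GridSearchCV` was created with `refit=True`; it would typically be fitted first, e.g. with `.fit(data.build_full_trainset())`, and then called as `.predict(uid, iid)`. With the selected item-based settings, KNNWithMeans starts from the target book's mean rating and adds a similarity-weighted average of the user's deviations from each neighbour book's mean. The plain-Python sketch below illustrates that formula under assumed dict inputs; as in surprise's KNNWithMeans, too few neighbours make it fall back to the item mean instead of failing.

```python
import heapq


def knn_with_means_estimate(target_mean, user_ratings, sims_to_target, item_means,
                            k=40, min_k=1):
    """Item-based KNNWithMeans-style estimate for one (user, target item) pair.

    target_mean: mean rating of the target item.
    user_ratings: dict item -> rating the user gave.
    sims_to_target: dict item -> similarity of that item to the target item.
    item_means: dict item -> mean rating of that item.
    Returns (estimate, actual_k).
    """
    neighbours = [(sims_to_target.get(j, 0.0), r, item_means[j])
                  for j, r in user_ratings.items()]
    top_k = heapq.nlargest(k, neighbours, key=lambda t: t[0])
    sum_sim = sum_dev = 0.0
    actual_k = 0
    for sim, r, mean_j in top_k:
        if sim > 0:
            sum_sim += sim
            sum_dev += sim * (r - mean_j)  # deviation from that book's mean
            actual_k += 1
    if actual_k < min_k or sum_sim == 0:
        return target_mean, actual_k  # too few neighbours: fall back to the mean
    return target_mean + sum_dev / sum_sim, actual_k


est, actual_k = knn_with_means_estimate(
    7.0, {"a": 9, "b": 5}, {"a": 0.8, "b": 0.2}, {"a": 8.0, "b": 6.0})
print(round(est, 2), actual_k)  # 7.6 2
```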