# Predictions notebook

The goal of this notebook is to use existing weights for the recommendation system.

In [1]:
from pickle5 import pickle
import pandas as pd

r = pd.read_csv('data/ratings.csv')
r.tail()

Unnamed: 0,user_id,book_id,rating
5976474,49925,510,5
5976475,49925,528,4
5976476,49925,722,4
5976477,49925,949,5
5976478,49925,1023,4


First we need to import the test data and the model itself.

In [2]:
pkl_filename = "SVDmodel.pkl"
# Load from file
with open(pkl_filename, 'rb') as file:
    pickle_model = pickle.load(file)

print("model was successfully loaded")

model was successfully loaded


In [3]:
from surprise.model_selection import train_test_split
from surprise import Reader, Dataset
from random import sample

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(r[['user_id', 'book_id', 'rating']], reader)
trainset, testset = train_test_split(data, test_size=0.15)

In [4]:
test_example = sample(testset, 1)
print(test_example)

[(36007, 953, 5.0)]


In [5]:
results = pickle_model.test(test_example)
print(results)

[Prediction(uid=36007, iid=953, r_ui=5.0, est=4.39463301491794, details={'was_impossible': False})]


## Predictions functions

These functions help determine the value of a book rating. They are designed to work with the Surprise module, which is optimized for recommendation systems.

Although we already have the dataframe, it still has to be fed to Surprise, which is very particular about how it receives data: the columns must come in the order user id, item id (here the book's id), then rating.

In [6]:
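def get_Iu(uid):
    """Return the number of items rated by the given user in the trainset."""
    try:
        return len(trainset.ur[trainset.to_inner_uid(uid)])
    except ValueError:  # user was not part of the trainset
        return 0

def get_Ui(iid):
    """Return the number of users who rated the given item in the trainset."""
    try:
        return len(trainset.ir[trainset.to_inner_iid(iid)])
    except ValueError:  # item was not part of the trainset
        return 0

# One row per prediction: 'rui' is the true rating, 'est' the predicted one.
df = pd.DataFrame(results, columns=['uid', 'iid', 'rui', 'est', 'details'])
df['Iu'] = df.uid.apply(get_Iu)
df['Ui'] = df.iid.apply(get_Ui)
df['err'] = abs(df.est - df.rui)

# Predictions with the smallest and the largest absolute error.
best_predictions = df.sort_values(by='err')[:10]
worst_predictions = df.sort_values(by='err')[-10:]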

After creating the dataframe we will use while processing the data, we can create the sorting algorithm.
Here, it is based on the rating value the algorithm predicts for any given user.

The next function builds, for each user id, a list of the books (item ids) the user might like, then sorts it by predicted rating so that the best results come first.

In [7]:
from collections import defaultdict

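def get_top_n(predictions, n=100):
    """Return the n highest-rated predictions for each user.

    Maps each uid to a list of (iid, est) tuples sorted by est, descending.
    """
    # Group the (item, estimated rating) pairs by user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Sort each user's predictions and keep the n best.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n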
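
As a quick check of the idea, the sketch below applies the same group-then-rank logic to a few made-up predictions. It is a standalone variant rather than the notebook's function: `top_n_per_user`, the toy data, and the stand-in `Prediction` tuple (which mirrors the field order printed above) are assumptions for illustration, and `heapq.nlargest` is swapped in for the in-place sort.

```python
from collections import defaultdict, namedtuple
import heapq

# Stand-in with the same field order as the Prediction objects printed above,
# so the sketch runs without Surprise installed (an assumption for illustration).
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])


def top_n_per_user(predictions, n=10):
    """Map each user id to its n best (item id, estimated rating) pairs."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the pairs ordered from highest to lowest estimate.
    return {
        uid: heapq.nlargest(n, pairs, key=lambda pair: pair[1])
        for uid, pairs in by_user.items()
    }


# Made-up predictions: user 1 has three candidate books, user 2 has one.
toy = [
    Prediction(1, 10, 4.0, 3.9, {}),
    Prediction(1, 11, 5.0, 4.7, {}),
    Prediction(1, 12, 3.0, 2.5, {}),
    Prediction(2, 10, 2.0, 3.1, {}),
]
toy_top = top_n_per_user(toy, n=2)
print(toy_top)
```

A user with fewer than `n` predictions simply gets a shorter list, which is why some users in the output further down have fewer than 10 books.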

In [8]:
predictions = pickle_model.test(testset)

In [12]:
top_n = get_top_n(predictions, n=10)

Once the functions have been called (here on the testset extracted from the original data), the already sorted values can be printed out.

In [15]:
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

326, 727, 3634, 2534, 111]
20441 [3758, 7312, 6260, 3927, 5108]
30622 [155, 21, 4921, 2998, 5331, 742, 4507, 219, 3673, 2086]
6371 [297, 13, 1179, 113, 1042, 63, 71, 4034, 611, 3195]
48971 [4804, 1990, 313, 12, 3312, 7044, 149, 811]
2332 [11, 5363, 195, 2598, 921, 185, 1185, 3561, 697, 1381]
20585 [25, 27, 5990, 1180, 76, 130]
13374 [24, 135, 168, 7, 188, 953, 1698, 2751, 4025, 97]
36318 [157, 219, 464, 84, 3929, 2730, 239, 2784]
38096 [25, 24, 9319, 581, 7562, 867, 2254, 363]
50287 [2299, 1924, 70, 1964, 3968, 361, 3664, 2861, 6507, 804]
5242 [8126, 4451, 1965, 74, 1347, 822, 4426, 6746, 5876, 9633]
42035 [2767, 1651, 1505, 2508, 535, 2983, 7448, 423, 1366, 1062]
9970 [24, 135, 1356, 3457, 1666, 258, 1120, 1041, 125, 476]
39766 [1, 993, 33, 1087, 754, 186, 3568, 9, 2307, 88]
11113 [746, 1422, 458, 859, 4029, 3771, 1366, 2494]
22086 [17, 912, 107, 4748, 676, 6075, 3025, 7488]
52075 [2460, 1321, 5520, 1, 2894, 3712, 6, 5291, 4088, 7531]
23978 [311, 497, 1377, 113, 3236, 361, 5656, 3391,

## todo

For each user, the result is an already sorted list of up to n books they might like (n=10 in the call above, 100 by default), but this is not the end of the project. The list now needs to pass through another function that removes books the user already owns or has read, leaving only unread books as recommendations.
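
One possible shape for that filter is sketched below. It is only an assumption about how the step might look: `drop_seen`, `seen_pairs`, and the toy data are hypothetical names and values, not part of the notebook.

```python
def drop_seen(top_n, seen_pairs):
    """Remove from each user's list the books that user has already seen.

    top_n:      dict mapping uid -> list of (iid, est) tuples
    seen_pairs: iterable of (uid, iid) pairs the user owns or has read
    """
    seen = {}
    for uid, iid in seen_pairs:
        seen.setdefault(uid, set()).add(iid)
    # The order of the remaining recommendations is preserved.
    return {
        uid: [(iid, est) for iid, est in recs if iid not in seen.get(uid, set())]
        for uid, recs in top_n.items()
    }


# Made-up data: user 1 has already read book 10, user 2 has read book 99.
toy_top_n = {1: [(11, 4.7), (10, 3.9)], 2: [(10, 3.1)]}
toy_seen = [(1, 10), (2, 99)]
unread = drop_seen(toy_top_n, toy_seen)
print(unread)
```

In the notebook, `seen_pairs` could presumably be built with `zip(r.user_id, r.book_id)`. Because the testset itself contains only books the users have already rated, the filter is most useful on predictions for unrated books, for example those obtained from Surprise's `Trainset.build_anti_testset()`.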

Another function is in the works to recommend a book to a user based on tags or, more exactly, a genre.
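
A minimal sketch of what such a genre filter could look like follows. It assumes a mapping from book id to a set of genre labels, here called `book_genres`; that mapping, the function name `keep_genre`, and the toy values are all hypothetical, since the notebook does not load any tag data yet.

```python
def keep_genre(recs, book_genres, genre):
    """Keep only the recommended books tagged with the requested genre.

    recs:        list of (iid, est) tuples, already sorted by est
    book_genres: dict mapping iid -> set of genre labels (hypothetical input)
    """
    # Books without any genre entry are dropped.
    return [(iid, est) for iid, est in recs if genre in book_genres.get(iid, ())]


# Made-up recommendations and genre labels.
toy_recs = [(11, 4.7), (10, 3.9), (12, 3.5)]
toy_genres = {10: {"fantasy"}, 11: {"romance", "classics"}, 12: {"fantasy", "classics"}}
fantasy_recs = keep_genre(toy_recs, toy_genres, "fantasy")
print(fantasy_recs)
```

Filtering after ranking keeps the predicted-rating order intact, so the result can be shown to the user directly.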