***Notebook that creates the recommendation lists used in the project***

In [1]:
import numpy as np
import pandas as pd
import os
import gzip
import json
import re
import pathlib

BASE_DIR = pathlib.Path().resolve()

Read the required CSV files. For this notebook to work, the source data files (goodreads_books.json.gz, goodreads_reviews_dedup.json.gz), available at https://mengtingwan.github.io/data/goodreads.html#datasets, have to be placed in the base folder of the project. The cell below loads df_books.csv and df_users.csv from that same folder. Alternatively, adjust the paths.

In [2]:
book_df = pd.read_csv(os.path.join(BASE_DIR, 'df_books.csv'))
user_df = pd.read_csv(os.path.join(BASE_DIR, 'df_users.csv'))

In [3]:
# Number of rows in which the user rating is 0
count_zero_rows = len(user_df[user_df['user_rating'] == 0])
count_zero_rows

551885


Check whether any value is missing in the columns that are relevant for building the recommender models

In [4]:
user_df[['book_id', 'user_id', 'user_rating']].isnull().values.any()

False

Convert publication_year and num_pages in the book dataframe to the nullable integer dtype Int64

In [5]:
book_df = book_df.astype({'publication_year': 'Int64', 'num_pages': 'Int64'})

Creating a dataframe for the recommender systems with the columns book_id, user_id, and user_rating, keeping only users with at least 15 ratings <br>
The columns are then renamed to item, user, and rating, as required by the lenskit specification

In [6]:
user_counts = user_df['user_id'].value_counts()
users_with_enough_ratings = user_counts[user_counts >= 15].index
filtered_df = user_df[user_df['user_id'].isin(users_with_enough_ratings)]

In [7]:
model_df = filtered_df[['book_id', 'user_id', 'user_rating']]
model_df = model_df.rename(columns={'book_id': 'item', 'user_id': 'user', 'user_rating': 'rating'})
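The filtering and renaming steps above can be tried on a small table first. The following is a minimal sketch on hypothetical toy data: the column names match the notebook, but the rows and the lowered threshold `MIN_RATINGS` are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical toy ratings using the notebook's column names.
toy_df = pd.DataFrame({
    'user_id': ['a', 'a', 'a', 'b', 'c', 'c'],
    'book_id': [1, 2, 3, 1, 2, 3],
    'user_rating': [5, 4, 3, 2, 4, 5],
})

# The notebook uses 15; lowered here so the toy data shows an effect.
MIN_RATINGS = 2

# Count ratings per user and keep only users at or above the threshold.
counts = toy_df['user_id'].value_counts()
active_users = counts[counts >= MIN_RATINGS].index
toy_filtered = toy_df[toy_df['user_id'].isin(active_users)]

# Select the three modelling columns and rename them to item/user/rating.
toy_model_df = (
    toy_filtered[['book_id', 'user_id', 'user_rating']]
    .rename(columns={'book_id': 'item', 'user_id': 'user', 'user_rating': 'rating'})
)
print(toy_model_df)
```

User b has only one rating and is dropped, so five rows remain.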

In [8]:
book_df = book_df[book_df['title_without_series'].notna()]

Import the lenskit modules

In [9]:
from lenskit.algorithms import Recommender, item_knn, user_knn, als
from lenskit import crossfold as xf
from lenskit import topn, util

Configuring the three algorithms

In [10]:
nnbrs = 20
min_nbrs = 1
min_sim = 0.1
feedback = 'explicit'
center = True

algo_ii = item_knn.ItemItem(nnbrs=nnbrs, min_nbrs=min_nbrs, min_sim=min_sim, feedback=feedback, center=center)

In [11]:
nnbrs = 30
min_nbrs = 1
min_sim = 0.1
feedback = 'explicit'
center = True

algo_uu = user_knn.UserUser(nnbrs=nnbrs, min_nbrs=min_nbrs, min_sim=min_sim, feedback=feedback, center=center)
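To give a feel for what center and min_sim control in the two neighborhood models, here is a rough sketch of a mean-centered cosine similarity between two rating vectors, followed by a threshold check. It is an illustration under simplifying assumptions (dense vectors, no missing ratings, hypothetical values and helper name), not lenskit's actual implementation.

```python
import numpy as np

# Hypothetical ratings given to two items by the same four users.
item_a = np.array([5.0, 4.0, 1.0, 2.0])
item_b = np.array([4.0, 5.0, 2.0, 1.0])

def centered_cosine(x, y):
    """Cosine similarity after subtracting each vector's mean (hypothetical helper)."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    return float(x_c @ y_c / (np.linalg.norm(x_c) * np.linalg.norm(y_c)))

sim = centered_cosine(item_a, item_b)

# A neighbor is only considered if its similarity reaches the threshold.
MIN_SIM = 0.1  # same value as min_sim in the cells above
keeps_neighbor = sim >= MIN_SIM
print(round(sim, 3), keeps_neighbor)
```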

In [12]:
features = 50
iterations = 20
reg = 0.1
damping = 5

algo_als = als.BiasedMF(features=features, iterations=iterations, reg=reg, damping=damping)

Copying the algorithms and adapting them for the later recommendation task

In [13]:
fit_algo_ii = Recommender.adapt(algo_ii)
fit_algo_uu = Recommender.adapt(algo_uu)
fit_algo_als = Recommender.adapt(algo_als)

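Fitting the algorithms on the training portions of a 5-fold user-based split, in which 20% of each test user's ratings are held out per fold. Each call to fit retrains the model, so after the loop the models reflect the last fold. The Item-Item fit is commented out in this run; it has to be enabled (or the model fitted in an earlier run) for the Item-Item recommendations further down to work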

In [27]:
for i, tp in enumerate(xf.partition_users(model_df, 5, xf.SampleFrac(0.2))):
    train_split = tp.train.copy()
    
    #fit_algo_ii.fit(train_split)
    #print('Finished round {} of fitting the Item-Item model'.format(i+1))
    fit_algo_uu.fit(train_split)
    print("Finished round {} of fitting the User-User model".format(i+1))
    fit_algo_als.fit(train_split)
    print("Finished round {} of fitting the ALS model".format(i+1))

Finished round 1 of fitting the User-User model
Finished round 1 of fitting the ALS model
Finished round 2 of fitting the User-User model
Finished round 2 of fitting the ALS model
Finished round 3 of fitting the User-User model
Finished round 3 of fitting the ALS model
Finished round 4 of fitting the User-User model
Finished round 4 of fitting the ALS model
Finished round 5 of fitting the User-User model
Finished round 5 of fitting the ALS model
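The splitting scheme used in the loop can be approximated in plain pandas to see which rows end up in each training set. This is a simplified sketch of the idea behind partition_users with SampleFrac(0.2), not lenskit's code; the function name `sketch_partition_users`, the seed handling, and the toy data are assumptions.

```python
import numpy as np
import pandas as pd

def sketch_partition_users(ratings, partitions, frac, seed=0):
    """Yield (train, test) pairs, holding out `frac` of each test user's rows.

    Assumes `ratings` has a 'user' column and a unique index.
    """
    rng = np.random.default_rng(seed)
    shuffled_users = rng.permutation(ratings['user'].unique())
    for test_users in np.array_split(shuffled_users, partitions):
        # Only rows of this fold's test users are eligible for the test set.
        eligible = ratings[ratings['user'].isin(test_users)]
        test = eligible.groupby('user').sample(frac=frac, random_state=seed)
        # Everything that was not held out is training data.
        train = ratings.drop(test.index)
        yield train, test

# Hypothetical toy data: 10 users with 5 ratings each.
toy = pd.DataFrame({
    'user': np.repeat(np.arange(10), 5),
    'item': np.tile(np.arange(5), 10),
    'rating': 4.0,
})

folds = list(sketch_partition_users(toy, partitions=5, frac=0.2))
for i, (train, test) in enumerate(folds, start=1):
    print(f"fold {i}: {len(train)} train rows, {len(test)} test rows")
```

Every user appears as a test user in exactly one fold, and each training set still contains most of that user's ratings.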


In [15]:
from pandas import Series

Creating a new user who was not part of the fitting process, and generating top-10 recommendations for this user with each algorithm. The user id -1 is a placeholder, and the ratings are passed in explicitly

In [29]:
user_ratings = {
    18490: 4,    # Frankenstein
    29579: 4,    # Foundation
    333867: 4,   # The Stars My Destination
    95558: 3,    # Solaris
    234225: 5,   # Dune
    16690: 5,    # The Moon is a Harsh Mistress
    77566: 5,    # Hyperion
    7677: 5,     # Jurassic Park
    5470: 4,     # 1984
    5129: 3,     # Brave New World
    4981: 4,     # Slaughterhouse-Five
    2767052: 5,  # The Hunger Games
    830: 4,      # Snow Crash
    7613: 3,     # Animal Farm
    227463: 4,   # A Clockwork Orange
}

recs_ii = fit_algo_ii.recommend(user=-1, n=10, ratings=Series(user_ratings))
recs_uu = fit_algo_uu.recommend(user=-1, n=10, ratings=Series(user_ratings))
recs_als = fit_algo_als.recommend(user=-1, n=10, ratings=Series(user_ratings))
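The recommendation frames contain item ids rather than titles. One way to make them readable is to join them with the book dataframe, sketched below on hypothetical stand-in data. It assumes the recommendation frame has the columns item and score and that the book table has book_id and title_without_series, as used earlier in the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for a recommendation frame and the book dataframe.
toy_recs = pd.DataFrame({'item': [30, 10], 'score': [4.8, 4.5]})
toy_books = pd.DataFrame({
    'book_id': [10, 20, 30],
    'title_without_series': ['Book A', 'Book B', 'Book C'],
})

# A left merge keeps the recommendation order and attaches each title.
titled = toy_recs.merge(
    toy_books[['book_id', 'title_without_series']],
    left_on='item', right_on='book_id', how='left',
).drop(columns='book_id')
print(titled)
```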

Saving the recommendations to csv files

In [31]:
recs_uu.to_csv('recs_uu.csv', encoding='utf-8')
recs_ii.to_csv('recs_ii.csv', encoding='utf-8')
recs_als.to_csv('recs_als.csv', encoding='utf-8')
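Because to_csv writes the dataframe index as the first column by default, the saved files are best read back with index_col=0. A minimal round-trip sketch, using an in-memory buffer and hypothetical data in place of the real files:

```python
import io
import pandas as pd

# Hypothetical recommendation frame.
toy_recs = pd.DataFrame({'item': [30, 10], 'score': [4.8, 4.5]})

# Write to an in-memory buffer instead of a file; the index is included by default.
buffer = io.StringIO()
toy_recs.to_csv(buffer)
buffer.seek(0)

# Treat the first column as the index again when reading.
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```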