# Book Recommender Prototype

Welcome to the **Book Recommender System** project! This repository contains the implementation of a collaborative filtering-based recommendation system for a fictional bookstore, **Books’R’Us**.

The system uses the **Surprise** library to predict how a user would rate a book, based on past ratings. It is trained on user rating data with the **KNNBasic** algorithm, a basic neighborhood-based collaborative filter, and the predicted ratings drive personalized book suggestions.

## Features
- Collaborative filtering using user-item interactions.
- Predictions for user-book ratings.
- A small pipeline built on pandas and Surprise that can be adapted for integration into e-commerce platforms.

## Goal
This project aims to demonstrate the fundamentals of building a recommender system that can enhance user engagement and improve sales by providing tailored recommendations.


## Library Imports

In [25]:
import pandas as pd
from surprise import Reader



## Dataset Import

In [26]:
book_ratings = pd.read_csv('20241208_rating_data.csv', sep=";")
print(book_ratings.head())

                            user_id   book_id  rating
0  d089c9b670c0b0b339353aebbace46a1   7686667       3
1  6dcb2c16e12a41ae0c6c38e9d46f3292  18073066       5
2  244e0ce681148a7586d7746676093ce9  13610986       5
3  73fcc25ff29f8b73b3a7578aec846394  27274343       1
4  f8880e158a163388a990b64fec7df300  11614718       4


## 1. Print Dataset Size and Examine Column Data Types


In [27]:
# Check dataset information
print(book_ratings.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3500 entries, 0 to 3499
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  3500 non-null   object
 1   book_id  3500 non-null   int64 
 2   rating   3500 non-null   int64 
dtypes: int64(2), object(1)
memory usage: 82.2+ KB
None
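
Beyond column types, it can help to check how many distinct users and books the ratings cover and whether any user-book pair appears more than once, since a neighborhood-based model depends on overlapping ratings. The following is a minimal sketch on a hypothetical toy frame (`toy_ratings` and its values are made up for illustration, not taken from the real file):

```python
import pandas as pd

# Hypothetical toy data with the same three columns as book_ratings
toy_ratings = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "book_id": [101, 102, 101, 103],
    "rating": [5, 3, 4, 2],
})

# Count distinct users and books
n_users = toy_ratings["user_id"].nunique()
n_books = toy_ratings["book_id"].nunique()

# Share of the user x book grid that actually has a rating
density = len(toy_ratings) / (n_users * n_books)

# Number of repeated (user, book) pairs
n_duplicates = int(toy_ratings.duplicated(subset=["user_id", "book_id"]).sum())

print(f"{n_users} users, {n_books} books, density {density:.2f}, {n_duplicates} duplicates")
```

The same calls can be run on `book_ratings` directly.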


## 2. Distribution of Ratings


In [28]:
# Get the sorted distribution of ratings
rating_counts_sorted = book_ratings['rating'].value_counts().sort_index()
print(rating_counts_sorted)

rating
0     120
1     125
2     269
3     707
4    1278
5    1001
Name: count, dtype: int64


## 3. Filter Ratings That Are Out of Range


In [29]:
# Keep only ratings in the range of 1 to 5 inclusive
book_ratings_filtered = book_ratings[(book_ratings['rating'] >= 1) & (book_ratings['rating'] <= 5)]

# Get the sorted distribution of filtered ratings
rating_filtered_counts_sorted = book_ratings_filtered['rating'].value_counts().sort_index()
print(rating_filtered_counts_sorted)

rating
1     125
2     269
3     707
4    1278
5    1001
Name: count, dtype: int64
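
In the data above, the filter removes the 120 rows with a rating of 0, leaving 3380 of the original 3500 rows. An equivalent way to write the same range check is `Series.between`, which is inclusive on both ends by default. A minimal sketch on hypothetical toy data (the frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical toy ratings, including two out-of-range values (0 and 6)
toy_ratings = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5"],
    "book_id": [101, 102, 103, 104, 105],
    "rating": [0, 1, 3, 5, 6],
})

# between() includes both endpoints by default, so 1 and 5 are kept
in_range = toy_ratings["rating"].between(1, 5)
toy_filtered = toy_ratings[in_range]

# Report how many rows the filter removed
n_dropped = len(toy_ratings) - len(toy_filtered)
print(f"Dropped {n_dropped} of {len(toy_ratings)} rows")
```

Reporting the number of dropped rows makes it easy to spot a filter that removes more than expected.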


## 4. Prepare Data for Surprise: Build a Surprise Reader Object

In [30]:
from surprise import Reader

# Define the rating scale (1 to 5 inclusive)
reader = Reader(rating_scale=(1, 5))
print(reader)

<surprise.reader.Reader object at 0x13d173aa0>


## 5. Load `book_ratings` into a Surprise Dataset


In [31]:
from surprise import Dataset

# Load the filtered dataset into Surprise format
data = Dataset.load_from_df(book_ratings_filtered[['user_id', 'book_id', 'rating']], reader)

# Confirm the dataset is ready
print("Dataset loaded into Surprise format.")

Dataset loaded into Surprise format.


## 6. Create an 80:20 Train-Test Split and Set the Random State to 1


In [32]:
from surprise.model_selection import train_test_split

# Split the data into training (80%) and testing (20%) sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=1)
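
Fixing `random_state` makes the shuffle, and therefore the split, reproducible across runs. The idea can be illustrated with a plain-Python sketch. This is not Surprise's implementation; `shuffled_split` and the toy rows are hypothetical names and data used only for illustration:

```python
import random

def shuffled_split(rows, test_size=0.2, seed=1):
    """Shuffle a copy of rows with a fixed seed, then cut off the test share."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # First n_test shuffled rows form the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]

# Ten hypothetical (user_id, book_id, rating) rows
toy_rows = [(f"u{i}", 100 + i, 1 + i % 5) for i in range(10)]

train_a, test_a = shuffled_split(toy_rows)
train_b, test_b = shuffled_split(toy_rows)

print(len(train_a), len(test_a))  # sizes of the two parts
print(test_a == test_b)           # same seed, same split
```

Every row ends up in exactly one of the two parts, and repeating the call with the same seed yields the same test rows.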


## 7. Train a Collaborative Filter Using KNNBasic


In [33]:
from surprise import KNNBasic

# Initialize and train the KNNBasic algorithm
algo = KNNBasic()
algo.fit(trainset)
print("Model trained successfully using KNNBasic.")


Computing the msd similarity matrix...
Done computing similarity matrix.
Model trained successfully using KNNBasic.
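
The log line refers to mean squared difference (MSD) similarity, the default measure in Surprise, which scores two users as `1 / (msd + 1)` over the books both have rated. KNNBasic then estimates a rating as a similarity-weighted average of the neighbors' ratings. The following is a simplified plain-Python sketch of that idea, not the library's code; the function names and toy ratings are hypothetical:

```python
def msd_similarity(ratings_u, ratings_v):
    """Return 1 / (MSD + 1) over books both users rated, or 0.0 if none."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    msd = sum((ratings_u[b] - ratings_v[b]) ** 2 for b in common) / len(common)
    return 1.0 / (msd + 1.0)

def knn_estimate(target, book, all_ratings, k=40):
    """Similarity-weighted mean of the k most similar users' ratings for book."""
    neighbors = [
        (msd_similarity(all_ratings[target], ratings), ratings[book])
        for user, ratings in all_ratings.items()
        if user != target and book in ratings
    ]
    top_k = sorted(neighbors, reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in top_k)
    if weight_sum == 0:
        return None  # no usable neighbors
    return sum(sim * rating for sim, rating in top_k) / weight_sum

# Hypothetical toy ratings: user -> {book: rating}
toy = {
    "alice": {"b1": 5, "b2": 3},
    "bob": {"b1": 5, "b2": 3, "b3": 4},
    "carol": {"b1": 1, "b2": 5, "b3": 2},
}

sim_bob = msd_similarity(toy["alice"], toy["bob"])      # identical tastes
sim_carol = msd_similarity(toy["alice"], toy["carol"])  # very different tastes
estimate = knn_estimate("alice", "b3", toy)
print(f"{sim_bob:.3f} {sim_carol:.3f} {estimate:.3f}")
```

Bob agrees with Alice on every shared book, so his rating of `b3` dominates the estimate, while Carol's contributes very little.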


## 8. Evaluate the Recommender System


In [34]:
from surprise import accuracy

# Make predictions on the test set and calculate RMSE
# (verbose=False stops accuracy.rmse from printing the value a second time)
predictions = algo.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")


RMSE: 1.0427
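
RMSE is the square root of the mean squared difference between true and predicted ratings, so a value of about 1.04 means predictions are typically off by roughly one star on the 1 to 5 scale. A minimal sketch of the calculation on hypothetical (true, estimated) pairs; `manual_rmse` is an illustrative helper, not part of Surprise:

```python
import math

def manual_rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings
toy_pairs = [(5, 4.0), (3, 4.0), (4, 4.0), (1, 3.0)]

toy_rmse = manual_rmse(toy_pairs)
print(f"RMSE: {toy_rmse:.4f}")
```

Because the errors are squared before averaging, the single two-star miss in the toy data contributes more than the two one-star misses combined.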


## 9. Make a Prediction for a Specific User and Book


In [35]:
# Predict the rating for the given user and book
user_id = '8842281e1d1347389f2ab93d60773d4d'
book_id = 18007564

prediction = algo.predict(uid=user_id, iid=book_id)

# Display the predicted rating
print(f"Predicted rating for user {user_id} and book {book_id}: {prediction.est:.2f}")


Predicted rating for user 8842281e1d1347389f2ab93d60773d4d and book 18007564: 3.81
