# 📚 Book Recommendation System

**Project Owner**: Suyog Desai  
**Affiliation**: Arizona State University  
**Project Description**:  
This notebook demonstrates a collaborative filtering approach to build a simple book recommendation engine. It uses user-book rating data, constructs a sparse matrix, and generates recommendations using user similarity.

---


##  Step 1: Import Libraries

We begin by importing the necessary libraries for data handling, sparse matrix manipulation, and similarity computation.

In [2]:
import pandas as pd
import numpy as np
from scipy.sparse import coo_matrix
from sklearn.metrics.pairwise import cosine_similarity

##  Step 2: Load and Preprocess Ratings Data

We load the book ratings dataset, sort it by user and ISBN, and use `pd.factorize` to map the raw user IDs and ISBNs to contiguous zero-based integer codes. These codes serve as the row and column indices of the sparse user-book matrix.

In [4]:
# Load ratings data
ratings_path = '../Source/Ratings.csv'
df = pd.read_csv(ratings_path, delimiter=';')

# Sort for consistency
df = df.sort_values(by=['User-ID', 'ISBN'])

# Factorize user and book IDs for matrix representation
df['User_ID'] = pd.factorize(df['User-ID'])[0]
df['Book_ID'] = pd.factorize(df['ISBN'])[0]

# Create sparse matrix components
row = df['User_ID']
col = df['Book_ID']
data = df['Rating']

# Construct sparse matrix (COO format)
sparse_matrix = coo_matrix((data, (row, col)))
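
To make the factorize step concrete, here is a minimal sketch on a hypothetical four-row ratings table. The user IDs, ISBNs, and ratings below are made up for illustration; only the column names mirror the ones used above.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical toy ratings, using the same column names as the cell above
toy = pd.DataFrame({
    'User-ID': [276725, 276726, 276726, 276729],
    'ISBN':    ['034545104X', '0155061224', '034545104X', '052165615X'],
    'Rating':  [5, 8, 3, 6],
})

# factorize returns (codes, uniques); codes are contiguous integers from 0,
# assigned in order of first appearance
user_codes, user_ids = pd.factorize(toy['User-ID'])
book_codes, isbns = pd.factorize(toy['ISBN'])

toy_matrix = coo_matrix((toy['Rating'].to_numpy(), (user_codes, book_codes)))

print(toy_matrix.shape)      # (3, 3): 3 distinct users, 3 distinct books
print(toy_matrix.toarray())
```

Each distinct user becomes one row and each distinct ISBN one column, regardless of how large or sparse the raw identifiers are.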

##  Step 3: Save Sparse Matrix in LibSVM Format

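We write the sparse user-book rating matrix to disk in LibSVM format: one line per user, starting with a dummy label of `0` and followed by `index:rating` pairs with 1-based book indices. Only nonzero ratings are written, so a rating of exactly 0 is treated the same as a missing rating.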

In [6]:
output_libsvm_path = '../Output/user_book_sparse_matrix.libsvm'

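# Convert once to CSR so each row can be read without building a dense array
csr = sparse_matrix.tocsr()
csr.eliminate_zeros()  # Only nonzero ratings are written
csr.sort_indices()     # Write book indices in ascending order

with open(output_libsvm_path, 'w') as f:
    for i in range(csr.shape[0]):
        start, end = csr.indptr[i], csr.indptr[i + 1]
        # LibSVM line: dummy label 0, then 1-based index:value pairs
        pairs = [f'{j + 1}:{v}' for j, v in zip(csr.indices[start:end], csr.data[start:end])]
        f.write(' '.join(['0'] + pairs) + '\n')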

print(f"LibSVM file successfully created at: {output_libsvm_path}")

LibSVM file successfully created at: ../Output/user_book_sparse_matrix.libsvm
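
As an alternative to the hand-written loop, scikit-learn ships `dump_svmlight_file` and `load_svmlight_file` for the same format. The sketch below round-trips a hypothetical 3x3 matrix through an in-memory buffer; the matrix values are invented, and whether these helpers suit the full dataset is an assumption that has not been checked in this notebook.

```python
import io
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical 3-user x 3-book matrix; user 1 has no ratings
X = csr_matrix(np.array([[5.0, 0.0, 3.0],
                         [0.0, 0.0, 0.0],
                         [0.0, 4.0, 0.0]]))
y = np.zeros(X.shape[0])  # dummy labels, as in the loop above

buf = io.BytesIO()
dump_svmlight_file(X, y, buf, zero_based=False)  # 1-based feature indices
print(buf.getvalue().decode())

# Passing n_features keeps the column count even if trailing columns are empty
buf.seek(0)
X_back, y_back = load_svmlight_file(buf, n_features=X.shape[1], zero_based=False)
print(X_back.shape)
```

Passing `zero_based=False` on both sides keeps the 1-based indexing used by the file written above.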


##  Step 4: Load Book Metadata

We extract relevant book metadata (ISBN and title) to link recommended book IDs back to their human-readable titles later.

In [8]:
metadata_path = '../Source/Books.csv'
meta = pd.read_csv(metadata_path, delimiter=';')
meta = meta[['ISBN', 'Title']].dropna().reset_index(drop=True)

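# Map matrix column indices back to ISBNs. The indices come from
# pd.factorize on the ratings data (Step 2), not from row order in Books.csv.
id_to_isbn = dict(zip(df['Book_ID'], df['ISBN']))
isbn_to_title = dict(zip(meta['ISBN'], meta['Title']))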
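
Because the matrix columns are the integer codes produced by `pd.factorize` on the ratings data, the second value that `pd.factorize` returns gives the inverse mapping directly. A minimal sketch with a hypothetical ISBN column (`isbn_col` and `code_to_isbn` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Hypothetical ISBN column from a ratings table (one ISBN repeats)
isbn_col = pd.Series(['052165615X', '034545104X', '052165615X', '0155061224'])

codes, uniques = pd.factorize(isbn_col)

# uniques[i] is the original label that was assigned code i
code_to_isbn = dict(enumerate(uniques))

print(codes.tolist())   # [0, 1, 0, 2]
print(code_to_isbn[1])  # 034545104X
```

A book's code depends on the order in which ISBNs first appear in the ratings data, so it generally does not match the row position of that book in a separate metadata file.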

##  Step 5: Load Sparse Matrix from LibSVM

We implement a function to reconstruct the sparse matrix from the LibSVM file saved earlier.

In [10]:
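def load_matrix(filepath, shape=None):
    # shape is optional; without it, dimensions are inferred from the largest indices seen
    data, rows, cols = [], [], []
    with open(filepath, 'r') as f:
        for i, line in enumerate(f):
            # Skip the leading label, then parse 1-based index:value pairs
            for item in line.strip().split()[1:]:
                col_id, val = item.split(':')
                rows.append(i)
                cols.append(int(col_id) - 1)
                data.append(float(val))
    return coo_matrix((data, (rows, cols)), shape=shape)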

user_book_matrix = load_matrix(output_libsvm_path).tocsr()
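
One caveat when rebuilding a matrix from coordinate lists: if no `shape` is given, `coo_matrix` infers the dimensions from the largest row and column index present. Users or books at the highest indices that have no nonzero ratings would therefore be dropped on reload. A small sketch of the difference, with a made-up target shape of 3 users by 4 books:

```python
from scipy.sparse import coo_matrix

# One stored entry at (0, 0); suppose the real matrix had 3 users and 4 books
data, rows, cols = [5.0], [0], [0]

inferred = coo_matrix((data, (rows, cols)))
explicit = coo_matrix((data, (rows, cols)), shape=(3, 4))

print(inferred.shape)  # (1, 1)
print(explicit.shape)  # (3, 4)
```

If the original dimensions are known, for example from the matrix built in Step 2, passing them explicitly is one way to keep user and book indices aligned across the save and load steps.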

##  Step 6: User Similarity and Recommendation Functions

These functions find similar users by cosine similarity over their rating vectors and score candidate books by similarity-weighted ratings.

- `top_k_users`: Returns the indices and similarity scores of the K users most similar to a target user.
- `recommend`: Scores the books rated by those similar users, excludes books the target user has already rated, and returns the top N.
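
Written out, the scoring that the code in this step computes can be summarized as follows. The symbols are notation introduced here for illustration: $r_u$ is the rating vector of user $u$, $r_{vb}$ is the rating user $v$ gave book $b$, and $N_k(u)$ is the set of the $k$ most similar users.

$$
\operatorname{sim}(u, v) = \frac{r_u \cdot r_v}{\lVert r_u \rVert \, \lVert r_v \rVert},
\qquad
\operatorname{score}(u, b) = \frac{\sum_{v \in N_k(u),\; r_{vb} > 0} \operatorname{sim}(u, v)\, r_{vb}}{\sum_{v \in N_k(u)} \operatorname{sim}(u, v)}
$$

The denominator sums over all $k$ neighbors, not only those who rated book $b$, so a book rated by few neighbors receives a lower score than one rated equally highly by many.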

In [12]:
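def top_k_users(uid, mat, k=10):
    sim = cosine_similarity(mat[uid], mat)[0]
    order = np.argsort(-sim)
    # Drop the target user explicitly: with ties or an all-zero row,
    # the target is not guaranteed to sort first
    top_users = order[order != uid][:k]
    return top_users, sim[top_users]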

def recommend(uid, mat, k=10, n=5):
    top_users, sim_scores = top_k_users(uid, mat, k)
    total_sim = sim_scores.sum()
    if total_sim <= 0:
        return []  # No similar users, so nothing to recommend

    # Candidate books: rated by any similar user, not yet rated by the target
    books = {b for user in top_users for b in mat[user].nonzero()[1]}
    books -= set(mat[uid].nonzero()[1])

    # Similarity-weighted rating, normalized by the total similarity of all
    # k neighbors (unrated books contribute 0 to the numerator)
    scores = {
        b: sum(s * mat[u, b] for u, s in zip(top_users, sim_scores)) / total_sim
        for b in books
    }
    return sorted(scores.items(), key=lambda x: -x[1])[:n]
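
To see the scoring on numbers small enough to check by hand, here is a self-contained sketch on a hypothetical 3-user, 4-book matrix. It follows the same idea as the functions above (cosine similarity, then a similarity-weighted rating divided by the total similarity of the neighbors), but it is a standalone illustration that does not call them.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows are users, columns are books
toy = csr_matrix(np.array([
    [5.0, 4.0, 0.0, 0.0],   # user 0 (target)
    [5.0, 4.0, 3.0, 0.0],   # user 1: same taste, also rated book 2
    [4.0, 0.0, 0.0, 2.0],   # user 2: partial overlap, also rated book 3
]))

uid, k = 0, 2
sim = cosine_similarity(toy[uid], toy)[0]
sim[uid] = -1.0                      # mask the target user explicitly
neighbors = np.argsort(-sim)[:k]     # most similar users first
weights = sim[neighbors]

# Score books the target has not rated by similarity-weighted rating
seen = set(toy[uid].nonzero()[1].tolist())
scores = {}
for b in range(toy.shape[1]):
    if b in seen:
        continue
    num = sum(w * toy[v, b] for v, w in zip(neighbors, weights))
    scores[b] = num / weights.sum()

print(neighbors.tolist())
print(scores)
```

User 1 is the closer neighbor, so book 2 outranks book 3. Both scores fall below the raw ratings because each book was rated by only one of the two neighbors while the denominator counts both.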

##  Step 7: Generate Recommendations and Save to CSV

We apply the recommendation function for each user and save the resulting book recommendations (along with scores and titles) to a CSV file.

In [14]:
def make_recs(mat, meta, output_csv, k=10, n=5):
    recs = []
    for uid in range(mat.shape[0]):
        rec_books = recommend(uid, mat, k, n)
        for book, score in rec_books:
            isbn = id_to_isbn.get(book, "Unknown")
            title = isbn_to_title.get(isbn, "Unknown")
            recs.append({
                'User_ID': uid + 1,
                'Book_ID': book + 1,
                'Title': title,
                'Score': score
            })

    # Set columns in the constructor so an empty result still writes a header row
    pd.DataFrame(recs, columns=['User_ID', 'Book_ID', 'Title', 'Score']).to_csv(output_csv, index=False)
    print(f"Recommendations saved to {output_csv}")

##  Step 8: Run Recommendation Pipeline

This final step runs the pipeline on the matrix reloaded in Step 5 and writes the recommendations for every user to a CSV file.

In [16]:
recommendation_output_path = '../Output/Book-recommendations.csv'
make_recs(user_book_matrix, meta, recommendation_output_path)

Recommendations saved to ../Output/Book-recommendations.csv
