#### Module 11: Association Rules Mining and Recommendation Systems

#### Case Study 1

Domain – Retail

Focus – Optimize book rentals

Business challenge/requirement

BookRent is the largest online and offline book rental chain in India. The company charges a fixed monthly fee plus a rental fee per book, so it makes more money when users rent more books. As an ML expert, you have to build a recommendation engine so that each user gets book recommendations based on the behavior of similar users. This will ensure that users rent books that match their taste. The company is still unprofitable and is looking to improve both revenue and profit.
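
To make "recommendations based on the behavior of similar users" concrete, here is a minimal pure-Python sketch of user-based collaborative filtering with cosine similarity. The users, books, and scores in `toy_ratings` are invented for illustration, and `cosine` and `recommend` are hypothetical helper names, not part of the case-study data or of any library.

```python
from math import sqrt

# Toy ratings: user -> {isbn: rating}. All values are invented for illustration.
toy_ratings = {
    "u1": {"A": 9, "B": 8},
    "u2": {"A": 9, "B": 7, "C": 10},
    "u3": {"C": 6, "D": 9},
}


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts (a missing book counts as 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, k=1):
    """Suggest books rated by the k most similar users that `user` has not rated."""
    others = [u for u in toy_ratings if u != user]
    neighbours = sorted(
        others,
        key=lambda u: cosine(toy_ratings[user], toy_ratings[u]),
        reverse=True,
    )[:k]
    seen = toy_ratings[user]
    return sorted({b for u in neighbours for b in toy_ratings[u] if b not in seen})


print(recommend("u1"))  # ['C']
```

Here u1 and u2 agree on books A and B, so u2 is u1's nearest neighbour and u2's remaining book C becomes the suggestion. u3 shares no books with u1, so their similarity is 0.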


Key issues

As of now, many users return their books and do not take out a new rental. The right recommendations will entice users to rent more books.

Considerations

NONE

Data volume

- Approx. 1M records in BX-Book-Ratings.csv, plus 2 more files. Only 10K records will be used.

Fields in Data

• user_id: Unique Id of the User

• isbn: the International Standard Book Number, a unique commercial book identifier

• rating: the rating given by the user

Business benefits

An increase in both the top line and the bottom line, since more rentals per user mean more revenue and more profit.

In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Step 1: Load datasets
# sep=None makes the Python engine sniff the delimiter, so the files load
# whether they are ';'- or ','-separated. A wrong sep collapses the header
# into a single column, and later column lookups then raise KeyError.
def load_csv(path):
    df = pd.read_csv(path, sep=None, engine="python", encoding="latin-1", on_bad_lines="skip")
    # Step 2: Normalize column names (lowercase, replace spaces/hyphens with underscores)
    df.columns = df.columns.str.strip().str.lower().str.replace(r"[ -]", "_", regex=True)
    return df

ratings = load_csv("BX-Book-Ratings.csv")
books = load_csv("BX-Books.csv")

# The field list calls the column 'rating'; a raw header such as 'Book-Rating'
# normalizes to 'book_rating'. Accept both.
ratings = ratings.rename(columns={"book_rating": "rating"})

# Fail early with a readable message if the expected columns are missing
for name, df, required in [("ratings", ratings, {"user_id", "isbn", "rating"}),
                           ("books", books, {"isbn", "book_title"})]:
    missing = required - set(df.columns)
    if missing:
        raise KeyError(f"{name} is missing {sorted(missing)}; found {list(df.columns)}")

# Step 3: Use only 10k ratings (or all of them if the file is smaller)
ratings = ratings.sample(n=min(10_000, len(ratings)), random_state=42)

# Step 4: Merge ratings with book titles (one title per ISBN)
books_small = books[["isbn", "book_title"]].drop_duplicates(subset="isbn")
ratings = ratings.merge(books_small, on="isbn", how="left")
titles = books_small.set_index("isbn")["book_title"]

# Step 5: Create user-item matrix (rows: users, columns: ISBNs)
user_item = ratings.pivot_table(index="user_id", columns="isbn", values="rating")
user_item_filled = user_item.fillna(0)

# Step 6: Compute cosine similarity between users
similarity = cosine_similarity(user_item_filled)
similarity_df = pd.DataFrame(similarity, index=user_item_filled.index, columns=user_item_filled.index)

# Step 7: Recommend books for a user
def recommend_books(user_id, top_n=5, n_neighbors=5):
    # Find the most similar users, excluding the target user explicitly
    similar_users = similarity_df[user_id].drop(user_id).nlargest(n_neighbors).index

    # Average ratings from similar users; books none of them rated are dropped
    sim_user_ratings = user_item.loc[similar_users].mean().dropna()

    # Exclude books already rated by the target user
    rated_books = user_item.loc[user_id].dropna().index
    recommendations = sim_user_ratings.drop(rated_books, errors="ignore").nlargest(top_n)

    # Map ISBNs to titles, falling back to the ISBN when no title is known
    return [titles.get(isbn, isbn) for isbn in recommendations.index]

# Step 8: Demo for a sample user
sample_user = user_item.index[0]
print(f"Top 5 recommendations for user {sample_user}:")
print(recommend_books(sample_user))
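
A KeyError such as "None of [Index(['isbn', 'book_title'], dtype='object')] are in the [columns]" at the column-selection step often means the header was not split into the expected columns, for example because the file uses a different delimiter than the one passed to read_csv. A minimal sketch for checking this with only the standard library follows. `inspect_header` is a hypothetical helper, not part of pandas, and the candidate delimiters and the latin-1 encoding are assumptions.

```python
import csv


def inspect_header(path, encoding="latin-1"):
    """Return (delimiter, column names) guessed from the first line of a CSV file."""
    with open(path, newline="", encoding=encoding) as f:
        first_line = f.readline()
    # Restrict the sniffer to a few common delimiters (an assumption)
    delimiter = csv.Sniffer().sniff(first_line, delimiters=",;\t|").delimiter
    columns = next(csv.reader([first_line], delimiter=delimiter))
    return delimiter, [c.strip() for c in columns]


# Example (file name taken from the cell above):
# print(inspect_header("BX-Books.csv"))
```

If the reported delimiter differs from the `sep` argument, or the column names do not normalize to isbn and book_title, the `read_csv` call or the selected column names need to be adjusted to match the file.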