# **Recommendation System**

## **Collaborative Filtering**

## **Objective**
The goal is to develop a **book recommendation system** that suggests books to users based on their past interactions.  
We will use **collaborative filtering**, which finds patterns in user-book interactions to provide personalized recommendations.

## **Criteria for Users and Books**
To ensure meaningful recommendations, we apply the following filtering criteria:
1. **User Selection**  
   - Consider **only users who have rated at least 200 books**.  
   - This ensures that recommendations are based on users with enough reading history.
   
2. **Book Selection**  
   - Include **only books that have received at least 50 ratings**.  
   - This ensures that the books considered are popular enough to have reliable ratings.
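
The two criteria can be sketched on a toy table. This is a minimal illustration, not the notebook's exact code: the function name `filter_interactions` and the tiny thresholds are hypothetical, chosen so the effect is visible on six rows.

```python
import pandas as pd

def filter_interactions(df, min_user_ratings, min_book_ratings):
    """Keep active users first, then sufficiently rated books among them."""
    # Per-row count of how many ratings that row's user has given.
    user_counts = df.groupby("User-ID")["Book-Rating"].transform("count")
    active = df.loc[user_counts >= min_user_ratings]
    # Per-row count of how many ratings that row's book has received,
    # computed only over the already-filtered (active) users.
    book_counts = active.groupby("Book-Title")["Book-Rating"].transform("count")
    return active.loc[book_counts >= min_book_ratings]

toy = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 3],
    "Book-Title":  ["A", "B", "C", "A", "B", "A"],
    "Book-Rating": [8, 0, 5, 7, 9, 10],
})

# Hypothetical thresholds: users with >= 2 ratings, books with >= 2 ratings.
result = filter_interactions(toy, min_user_ratings=2, min_book_ratings=2)
print(result)
```

The order matters: the book threshold is applied to ratings from the retained users only, so a book that is popular overall can still be dropped if few active users rated it.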

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from scipy import stats

warnings.filterwarnings('ignore')
%matplotlib inline

In [2]:
data = pd.read_csv("../artifacts/cleaned_data.csv",encoding='ISO-8859-1')

In [3]:
data

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Age,City,State,Country,Age Group
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002.0,Ballantine Books,35.0,tyler,texas,usa,Young Adults
1,276726,0155061224,5,Rites of Passage,Judith Rae,2001.0,Heinle,35.0,seattle,washington,usa,Young Adults
2,276727,0446520802,0,The Notebook,Nicholas Sparks,1996.0,Warner Books,16.0,h,new south wales,australia,Teens
3,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999.0,Cambridge University Press,16.0,rijeka,,croatia,Teens
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge ...,Sue Leather,2001.0,Cambridge University Press,16.0,rijeka,,croatia,Teens
...,...,...,...,...,...,...,...,...,...,...,...,...
1031131,276704,0876044011,0,Edgar Cayce on the Akashic Records: The Book o...,Kevin J. Todeschi,1998.0,A.R.E. Press (Association of Research &amp; Enlig,35.0,cedar park,texas,usa,Young Adults
1031132,276704,1563526298,9,Get Clark Smart : The Ultimate Guide for the S...,Clark Howard,2000.0,Longstreet Press,36.0,cedar park,texas,usa,Middle-aged
1031133,276706,0679447156,0,Eight Weeks to Optimum Health: A Proven Progra...,Andrew Weil,1997.0,Alfred A. Knopf,18.0,quebec,quebec,canada,Teens
1031134,276709,0515107662,10,The Sherbrooke Bride (Bride Trilogy (Paperback)),Catherine Coulter,1996.0,Jove Books,38.0,mannington,west virginia,usa,Middle-aged


### **Counting ratings per user**

In [13]:
data.groupby('User-ID')['Book-Rating'].count().reset_index()


,User-ID,Book-Rating
0,2,1
1,8,17
2,9,3
3,10,1
4,12,1
...,...,...
92101,278846,1
92102,278849,4
92103,278851,23
92104,278852,1


- Only about 92k users (92,105 rows) appear with at least one rating.
- The majority of users in the full user base have not rated any books.
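
One caveat worth checking: `count()` tallies every non-null `Book-Rating`, including rows rated `0`. If `0` marks an implicit interaction rather than a score (an assumption about this dataset, not something the notebook states), the number of explicit ratings per user can be lower than the count above. A small sketch on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "User-ID":     [2, 8, 8, 8, 9],
    "Book-Rating": [0, 5, 0, 7, 0],
})

per_user = toy.groupby("User-ID")["Book-Rating"].agg(
    all_rows="count",                       # every non-null rating, zeros included
    nonzero=lambda s: int((s > 0).sum()),   # only ratings above 0
).reset_index()
print(per_user)
```

Comparing the two columns on the real data would show how much of each user's history consists of zero ratings.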

### **Filtering the data to users who have rated at least 200 books**

In [21]:
users_with_200_ratings_data = data.loc[data.groupby('User-ID')['Book-Rating'].transform('count') >= 200]


In [22]:
users_with_200_ratings_data

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Age,City,State,Country,Age Group
1150,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994.0,John Wiley &amp; Sons Inc,48.0,gilbert,arizona,usa,Middle-aged
1151,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995.0,John Wiley &amp; Sons,48.0,gilbert,arizona,usa,Middle-aged
1152,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974.0,Thomson Learning,48.0,gilbert,arizona,usa,Middle-aged
1153,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982.0,Henry Holt &amp; Co,48.0,gilbert,arizona,usa,Middle-aged
1154,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002.0,Avon Books,48.0,gilbert,arizona,usa,Middle-aged
...,...,...,...,...,...,...,...,...,...,...,...,...
1029357,275970,1931868123,0,There's a Porcupine in My Outhouse: Misadventu...,Mike Tougias,2002.0,Capital Books (VA),46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029358,275970,3411086211,10,Die Biene.,Sybil GrÃ?ÃÂ¤fin SchÃ?ÃÂ¶nfeldt,1993.0,"Bibliographisches Institut, Mannheim",46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029359,275970,3829021860,0,The Penis Book,Joseph Cohen,1999.0,Konemann,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029360,275970,4770019572,0,Musashi,Eiji Yoshikawa,1995.0,Kodansha International (JPN),46.0,pittsburgh,pennsylvania,usa,Middle-aged


- Users with **200+** ratings have rated **50%** of the books.
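
A share like this can be measured directly from the filtered frame. The sketch below uses made-up rows and a hypothetical threshold of 3; it shows why `transform('count')` works as a row mask and how to compute the fraction of rows and of distinct titles that the heavy raters cover.

```python
import pandas as pd

toy = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 3, 3],
    "Book-Title":  ["A", "B", "C", "A", "D", "A"],
    "Book-Rating": [8, 0, 5, 7, 9, 10],
})

# transform("count") returns one value per original row (the size of that
# row's user group), so the result can be used directly as a boolean mask.
ratings_per_user = toy.groupby("User-ID")["Book-Rating"].transform("count")
heavy = toy.loc[ratings_per_user >= 3]   # hypothetical threshold for the toy data

row_share = len(heavy) / len(toy)
title_share = heavy["Book-Title"].nunique() / toy["Book-Title"].nunique()
print(f"rows kept: {row_share:.0%}, titles covered: {title_share:.0%}")
```

The two shares answer different questions, so it helps to state which one a percentage refers to.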

---

### **Filter this data to books with 50+ ratings**

In [23]:
users_with_200_ratings_data.groupby('Book-Title')['Book-Rating'].count().reset_index()

,Book-Title,Book-Rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
156132,Ã?Ã?ber das Fernsehen.,2
156133,Ã?Ã?ber die Pflicht zum Ungehorsam gegen den...,3
156134,Ã?Ã?lpiraten.,1
156135,Ã?Ã?stlich der Berge.,1


In [25]:
final_filtered_data = users_with_200_ratings_data.loc[users_with_200_ratings_data.groupby('Book-Title')['Book-Rating'].transform('count') >= 50]

In [26]:
final_filtered_data

,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Age,City,State,Country,Age Group
1150,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994.0,John Wiley &amp; Sons Inc,48.0,gilbert,arizona,usa,Middle-aged
1163,277427,0060930535,0,The Poisonwood Bible: A Novel,Barbara Kingsolver,1999.0,Perennial,48.0,gilbert,arizona,usa,Middle-aged
1165,277427,0060934417,0,Bel Canto: A Novel,Ann Patchett,2002.0,Perennial,48.0,gilbert,arizona,usa,Middle-aged
1168,277427,0061009059,9,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995.0,HarperTorch,48.0,gilbert,arizona,usa,Middle-aged
1174,277427,006440188X,0,The Secret Garden,Frances Hodgson Burnett,1998.0,HarperTrophy,48.0,gilbert,arizona,usa,Middle-aged
...,...,...,...,...,...,...,...,...,...,...,...,...
1029196,275970,1400031354,0,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002.0,Anchor,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029197,275970,1400031362,0,Morality for Beautiful Girls (No.1 Ladies Dete...,Alexander McCall Smith,2002.0,Anchor,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029270,275970,1573229725,0,Fingersmith,Sarah Waters,2002.0,Riverhead Books,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029309,275970,1586210661,9,Me Talk Pretty One Day,David Sedaris,2001.0,Time Warner Audio Major,46.0,pittsburgh,pennsylvania,usa,Middle-aged


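- Only about **58k** rating rows remain after keeping books with at least 50 ratings from the top users (**>= 200 ratings**); these rows span 707 distinct titles, which matches the `(707, 707)` similarity matrix computed later.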

---

# **Creating the User-Book Interaction Matrix for Recommendations**

## **Why Do We Need This Matrix?**
To implement **collaborative filtering**, we require a structured representation of user-book interactions.  
A **pivot table** helps us transform raw data into a **User-Book interaction matrix**, where:
- **Rows represent books (Book-Title).**
- **Columns represent users (User-ID).**
- **Values represent ratings given by users to books.**

This matrix allows us to analyze **user behavior patterns** and find similarities between users or books.
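
A toy version of this layout, with made-up ratings, shows what the pivot produces. One detail to be aware of: `pivot_table` aggregates duplicate (book, user) pairs with the mean by default, which can happen if a user rated several editions of the same title (a plausible case here, though not confirmed by the notebook).

```python
import pandas as pd

toy = pd.DataFrame({
    "User-ID":     [10, 10, 20, 20, 20],
    "Book-Title":  ["A", "B", "A", "A", "C"],
    "Book-Rating": [8, 6, 4, 10, 9],
})

# Rows are books, columns are users. User 20's two ratings of book "A"
# (4 and 10) collapse to their mean, 7.0; unrated pairs become NaN.
pt = toy.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating")
print(pt)
```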

## **Why is This Approach Effective?**
- The matrix enables **pattern recognition** in user behavior.
- It allows us to compute **similarities between users or books**.
- It forms the foundation for **personalized book recommendations**.

🚀 **This matrix is the backbone of our collaborative filtering-based recommendation system!**

### **Pivot Table**

In [29]:
user_book_pt = final_filtered_data.pivot_table(index='Book-Title', columns='User-ID', values='Book-Rating')

In [30]:
user_book_pt

User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
Book-Title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,,,,,,,,,,...,10.0,,,,,,0.0,,,
1st to Die: A Novel,,,,,,,,,,9.0,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,0.0,...,,,,,,0.0,,,0.0,
4 Blondes,,,,,,,,0.0,,,...,,,,,,,,,,
A Bend in the Road,0.0,,7.0,,,,,,,,...,,0.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,,0.0,...,,9.0,,,,,0.0,,,
You Belong To Me,,,,,,,,,0.0,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,0.0,,,...,,,,,,,0.0,,,
Zoya,,,,,,,,,,,...,,0.0,,,,,,,,


- This is a sparse matrix: most cells are `NaN` because most users have not rated most books.
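
The degree of sparsity can be quantified before filling the gaps. The sketch below uses a small made-up matrix; it also illustrates a side effect of zero-filling that is worth keeping in mind, namely that a missing rating and an actual rating of `0` become indistinguishable.

```python
import numpy as np
import pandas as pd

pt = pd.DataFrame(
    [[9.0, np.nan, np.nan],
     [np.nan, np.nan, 7.0],
     [np.nan, 0.0, np.nan]],
    index=["Book A", "Book B", "Book C"],
    columns=[254, 2276, 2766],
)

# Fraction of (book, user) cells with no rating at all.
sparsity = pt.isna().to_numpy().mean()
print(f"{sparsity:.1%} of cells are empty")

# After zero-filling, the unrated cell (Book A, user 2276) and the
# real 0 rating (Book C, user 2276) hold the same value.
filled = pt.fillna(0)
```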

In [31]:
user_book_pt.fillna(value=0, inplace= True)

In [32]:
user_book_pt

User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
Book-Title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A Bend in the Road,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### **Cosine Similarity**

In [33]:
from sklearn.metrics.pairwise import cosine_similarity

In [34]:
def calculate_cosine_similarity(matrix):
    """
    Computes the cosine similarity between rows of a given matrix.

    Parameters:
    matrix (pd.DataFrame): The user-book interaction matrix.

    Returns:
    np.ndarray: A square matrix containing cosine similarity scores.
    """
    return cosine_similarity(matrix)

similarity_scores = calculate_cosine_similarity(matrix= user_book_pt)

In [37]:
similarity_scores

array([[1.        , 0.0999137 , 0.01189468, ..., 0.11799012, 0.07158663,
        0.04205081],
       [0.0999137 , 1.        , 0.2364573 , ..., 0.07446129, 0.16773875,
        0.14263397],
       [0.01189468, 0.2364573 , 1.        , ..., 0.04558758, 0.04938579,
        0.10796119],
       ...,
       [0.11799012, 0.07446129, 0.04558758, ..., 1.        , 0.07085128,
        0.0196177 ],
       [0.07158663, 0.16773875, 0.04938579, ..., 0.07085128, 1.        ,
        0.10602962],
       [0.04205081, 0.14263397, 0.10796119, ..., 0.0196177 , 0.10602962,
        1.        ]], shape=(707, 707))

In [36]:
similarity_scores.shape

(707, 707)
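
The square shape follows from comparing every book row with every other book row. For two rating vectors `a` and `b`, cosine similarity is `dot(a, b) / (||a|| * ||b||)`. The NumPy sketch below computes this by hand on made-up ratings; the helper name `cosine_similarity_rows` is hypothetical, and it is meant as an illustration of the formula, not a replacement for the scikit-learn call.

```python
import numpy as np

def cosine_similarity_rows(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    m = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows; in this sketch such
    # rows get similarity 0 with everything, including themselves.
    norms[norms == 0] = 1.0
    unit = m / norms
    return unit @ unit.T

ratings = np.array([
    [9.0, 0.0, 10.0],   # book 0
    [9.0, 0.0, 10.0],   # book 1: same ratings as book 0
    [0.0, 7.0, 0.0],    # book 2: rated only by a different user
])
sim = cosine_similarity_rows(ratings)
print(sim.shape)
```

Books rated the same way by the same users score close to 1, while books with no raters in common score 0. The matrix is symmetric, with 1.0 on the diagonal for any book that has at least one non-zero rating.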

In [40]:
def get_top_recommendations(book_title, similarity_matrix, book_titles, top_n=5):
    """
    Retrieves the top N book recommendations based on cosine similarity.

    Parameters:
    book_title (str): The title of the book for which recommendations are needed.
    similarity_matrix (np.ndarray): Precomputed cosine similarity matrix.
    book_titles (pd.Index): Index containing book titles corresponding to the matrix rows.
    top_n (int, optional): Number of recommendations to return. Default is 5.

    Returns:
    list: A list of top N recommended book titles.
    """
    if book_title not in book_titles:
        return ["Book not found in dataset"]

    book_idx = book_titles.get_loc(book_title)
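    # Pair each row index with its similarity to the query book, excluding the
    # query book explicitly instead of assuming it always sorts first.
    scored = [(i, score) for i, score in enumerate(similarity_matrix[book_idx]) if i != book_idx]
    scored.sort(key=lambda x: x[1], reverse=True)

    top_recommendations = [book_titles[i] for i, _ in scored[:top_n]]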
    return top_recommendations

In [45]:
get_top_recommendations(book_title='Harry Potter and the Prisoner of Azkaban (Book 3)', similarity_matrix=similarity_scores, book_titles=user_book_pt.index)

['Harry Potter and the Goblet of Fire (Book 4)',
 'Harry Potter and the Chamber of Secrets (Book 2)',
 'Harry Potter and the Order of the Phoenix (Book 5)',
 "Harry Potter and the Sorcerer's Stone (Book 1)",
 "Harry Potter and the Sorcerer's Stone (Harry Potter (Paperback))"]
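
When inspecting results like these, it can help to see the similarity score next to each title. The variant below is a sketch under assumptions: the function name `top_similar_with_scores` and the toy matrix are hypothetical, and it uses `np.argsort` rather than the list sort used above.

```python
import numpy as np
import pandas as pd

def top_similar_with_scores(book_title, similarity_matrix, book_titles, top_n=5):
    """Return (title, score) pairs for the top_n most similar other books."""
    idx = book_titles.get_loc(book_title)
    scores = np.asarray(similarity_matrix[idx], dtype=float)
    # Sort indices by descending score and skip the query book itself.
    order = [i for i in np.argsort(-scores) if i != idx][:top_n]
    return [(book_titles[i], float(scores[i])) for i in order]

titles = pd.Index(["Book A", "Book B", "Book C"])
sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
])
recs = top_similar_with_scores("Book A", sim, titles, top_n=2)
print(recs)
```

Low scores at the tail of the list are a hint that the recommendation rests on very few shared raters.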