# **Book Recommendation System**

## **Objective**
The goal is to develop a **book recommendation system** that suggests books to users based on their past interactions.  
We will implement multiple recommendation techniques, combining **collaborative filtering, content-based filtering, and hybrid models** to provide personalized recommendations.

## **Criteria for Users and Books**
To ensure meaningful recommendations, we apply the following filtering criteria:

1. **Users Selection**  
   - Consider **only users who have rated at least 200 books**.  
   - This ensures that recommendations are based on users with enough reading history.
   
2. **Books Selection**  
   - Include **only books that have received at least 50 ratings**.  
   - This ensures that the books considered are popular enough to have reliable ratings.
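
The two criteria above can be sketched on a toy frame. This is a minimal illustration, not the notebook's actual run: the data is made up, and the thresholds `MIN_USER_RATINGS` and `MIN_BOOK_RATINGS` are hypothetical names, lowered from 200 and 50 to 2 and 2 so the effect is visible on six rows.

```python
import pandas as pd

# Made-up ratings, one row per (user, book) rating.
toy = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 3],
    "Book-Title":  ["A", "B", "C", "A", "B", "C"],
    "Book-Rating": [8, 0, 5, 7, 9, 10],
})

MIN_USER_RATINGS = 2  # stands in for 200
MIN_BOOK_RATINGS = 2  # stands in for 50

# transform('count') broadcasts each group's size back onto its rows,
# so the comparison yields a boolean mask aligned with the frame.
user_counts = toy.groupby("User-ID")["Book-Rating"].transform("count")
active = toy[user_counts >= MIN_USER_RATINGS]

# The book filter is applied to the already-filtered users.
book_counts = active.groupby("Book-Title")["Book-Rating"].transform("count")
popular = active[book_counts >= MIN_BOOK_RATINGS]

print(popular)
```

Note that the book threshold is evaluated after the user filter, so a book must be rated often enough by the retained users, not by all users.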

---

## **Recommendation Models Used**

### **1️⃣ Collaborative Filtering (CF)**
Collaborative filtering recommends books based on user-book interaction patterns. We will use three approaches:

- **User-Based CF**: Finds similar users and recommends books that they have rated highly.
- **Item-Based CF**: Recommends books similar to those the user has rated highly.
- **Matrix Factorization (SVD, ALS)**:  
  - **SVD (Singular Value Decomposition)**: Copes with data sparsity by learning a small number of latent factors that capture hidden patterns in user-book interactions.  
  - **ALS (Alternating Least Squares)**: A matrix factorization technique optimized for sparse datasets.
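
To make the idea of latent factors concrete, here is a minimal sketch using plain truncated SVD from NumPy on a small, made-up rating matrix. It is an assumption-laden illustration: Surprise's `SVD` class is trained by stochastic gradient descent on observed ratings only, whereas this example factorizes a fully filled matrix in which 0 is treated as an ordinary value.

```python
import numpy as np

# Made-up 4 users x 4 books matrix with two obvious taste groups.
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Full decomposition, then keep only the k strongest factors.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The reconstruction error equals the energy of the discarded factors.
error = np.linalg.norm(R - R_hat)
print(np.round(R_hat, 2))
print("Frobenius error:", round(float(error), 4))
```

Each user and each book is described by `k` numbers, and a predicted rating is the dot product of the two descriptions.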

---

### **2️⃣ Content-Based Filtering (CBF)**
This model recommends books based on their **metadata (title, author, genre, etc.)** rather than user interactions.  
- **TF-IDF (Term Frequency-Inverse Document Frequency)**: Extracts important words from book titles and descriptions to compute similarity.
- **Cosine Similarity**: Measures how similar two books are based on their descriptions.
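
A minimal sketch of this pipeline on three made-up titles, assuming only titles are available as text. Because TF-IDF matches exact tokens, "Secrets" and "Secret" do not count as shared vocabulary here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = [
    "Harry Potter and the Chamber of Secrets",
    "Harry Potter and the Goblet of Fire",
    "The Secret Garden",
]

# Each title becomes a TF-IDF weighted bag-of-words vector.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(titles)

# Pairwise cosine similarity between all title vectors.
sim = cosine_similarity(X)
print(sim.round(2))
```

The two Harry Potter titles share tokens and score above zero, while the third title shares none with the first.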

---

### **3️⃣ Hybrid Model**
A combination of **collaborative filtering and content-based filtering** to leverage both user preferences and book attributes for better recommendations.
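
One common way to combine the two signals is a weighted average of normalized scores. The sketch below is an illustration under assumptions: `blend_scores` and `alpha` are hypothetical names, the scores are made up, and this is not necessarily how the hybrid function in this notebook merges its inputs.

```python
import pandas as pd

def blend_scores(cf_scores, content_scores, alpha=0.5):
    """Weighted average of two score Series indexed by book title (hypothetical helper)."""
    def min_max(scores):
        # Rescale to [0, 1] so the two sources are comparable.
        span = scores.max() - scores.min()
        if span == 0:
            return scores * 0.0
        return (scores - scores.min()) / span

    weighted_cf = min_max(cf_scores) * alpha
    weighted_content = min_max(content_scores) * (1 - alpha)
    # A book missing from one source contributes 0 from that source.
    combined = weighted_cf.add(weighted_content, fill_value=0)
    return combined.sort_values(ascending=False)

cf = pd.Series({"A": 9.0, "B": 7.0, "C": 5.0})        # e.g. predicted ratings
content = pd.Series({"B": 0.9, "C": 0.1, "D": 0.5})   # e.g. cosine similarities
ranked = blend_scores(cf, content, alpha=0.5)
print(ranked)
```

Raising `alpha` shifts the ranking toward collaborative filtering; lowering it favors content similarity.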

---

### **4️⃣ Cosine Similarity-Based Model**
Instead of relying on matrix factorization, we use **pure cosine similarity** to recommend books:
- **User-User Cosine Similarity**: Finds users with similar rating vectors and recommends books they liked.
- **Item-Item Cosine Similarity**: Recommends books similar to the ones a user has rated highly.
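
Both variants come from the same rating matrix; the only difference is which axis is compared. A minimal sketch on made-up data, with users as rows and books as columns:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.DataFrame(
    [[5, 4, 0], [4, 5, 0], [0, 0, 5]],
    index=["user_a", "user_b", "user_c"],
    columns=["Book 1", "Book 2", "Book 3"],
)

# Rows are users, so this compares users with each other.
user_sim = pd.DataFrame(
    cosine_similarity(ratings), index=ratings.index, columns=ratings.index
)

# Transposing makes books the rows, so this compares books with each other.
item_sim = pd.DataFrame(
    cosine_similarity(ratings.T), index=ratings.columns, columns=ratings.columns
)

print(user_sim.round(3))
print(item_sim.round(3))
```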

---

## **Final Output**
The system will generate book recommendations for users using:
✅ **User-based CF, Item-based CF**  
✅ **SVD & ALS (Matrix Factorization)**  
✅ **Content-Based Filtering**  
✅ **Hybrid Filtering**  
✅ **Cosine Similarity-Based Recommendation**  

This ensures a **robust and diverse recommendation system** that provides personalized book suggestions. 📚✨


In [None]:
import pandas as pd
import numpy as np
from surprise import Dataset, Reader, SVD, KNNBasic
from surprise.model_selection import train_test_split
from implicit.als import AlternatingLeastSquares
import scipy.sparse as sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load dataset
df = pd.read_csv("book_ratings.csv")  # Update with actual file path

# Selecting relevant columns
df = df[['User-ID', 'ISBN', 'Book-Rating', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher']]

# Fill missing values
df.fillna('', inplace=True)

# Convert ISBN to categorical codes
df['ISBN'] = df['ISBN'].astype('category').cat.codes

# Prepare data for collaborative filtering
reader = Reader(rating_scale=(df['Book-Rating'].min(), df['Book-Rating'].max()))
data = Dataset.load_from_df(df[['User-ID', 'ISBN', 'Book-Rating']], reader)
trainset, testset = train_test_split(data, test_size=0.2)

# -------------------------
# 1️⃣ User-Based & Item-Based Collaborative Filtering
# -------------------------
def train_knn_models():
    """Train User-Based and Item-Based k-NN Collaborative Filtering models."""
    user_knn = KNNBasic(sim_options={'user_based': True})
    item_knn = KNNBasic(sim_options={'user_based': False})
    user_knn.fit(trainset)
    item_knn.fit(trainset)
    return user_knn, item_knn

user_knn, item_knn = train_knn_models()

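def user_based_recommend(user_id, n=5):
    """Recommend books based on User-Based Collaborative Filtering."""
    book_ids = df['ISBN'].unique()
    predictions = np.array([user_knn.predict(user_id, book_id).est for book_id in book_ids])
    # argsort positions refer to book_ids, not to rows of df, so map codes back to titles
    top_ids = book_ids[np.argsort(predictions)[::-1][:n]]
    titles = df.drop_duplicates('ISBN').set_index('ISBN').loc[top_ids, ['Book-Title']]
    return titles.reset_index(drop=True)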

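def item_based_recommend(book_id, n=5):
    """Recommend books based on Item-Based Collaborative Filtering."""
    # item_knn.sim is indexed by Surprise inner ids, not by df row positions;
    # to_inner_iid raises ValueError if the book is not in the training split
    inner_id = trainset.to_inner_iid(book_id)
    order = np.argsort(item_knn.sim[inner_id])[::-1]
    similar_inner = [i for i in order if i != inner_id][:n]
    raw_ids = [trainset.to_raw_iid(i) for i in similar_inner]
    titles = df.drop_duplicates('ISBN').set_index('ISBN').loc[raw_ids, ['Book-Title']]
    return titles.reset_index(drop=True)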

# -------------------------
# 2️⃣ Matrix Factorization (SVD)
# -------------------------
svd = SVD()
svd.fit(trainset)

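def svd_recommend(user_id, n=5):
    """Recommend books based on SVD Matrix Factorization."""
    book_ids = df['ISBN'].unique()
    predictions = np.array([svd.predict(user_id, book_id).est for book_id in book_ids])
    # argsort positions refer to book_ids, not to rows of df, so map codes back to titles
    top_ids = book_ids[np.argsort(predictions)[::-1][:n]]
    titles = df.drop_duplicates('ISBN').set_index('ISBN').loc[top_ids, ['Book-Title']]
    return titles.reset_index(drop=True)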

# -------------------------
# 3️⃣ Alternating Least Squares (ALS)
# -------------------------
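# User-item matrix (rows = users, columns = book codes), stored as float for implicit
ratings_matrix = sparse.csr_matrix(
    (df['Book-Rating'].astype(np.float32), (df['User-ID'], df['ISBN']))
)

als_model = AlternatingLeastSquares(factors=10, regularization=0.1)
als_model.fit(ratings_matrix)

def als_recommend(user_id, n=5):
    """Recommend books based on ALS Matrix Factorization."""
    # Assumes implicit >= 0.5, where recommend() takes the user's own row and
    # returns a pair of arrays (item ids, scores) rather than a list of tuples
    book_ids, _scores = als_model.recommend(user_id, ratings_matrix[user_id], N=n)
    return df[df['ISBN'].isin(book_ids)][['Book-Title']].drop_duplicates()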

# -------------------------
# 4️⃣ Content-Based Filtering (TF-IDF + Cosine Similarity)
# -------------------------
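# Vectorize each distinct title once; df has one row per rating, so a full
# row-by-row similarity matrix would be far too large to hold in memory
books = df.drop_duplicates('Book-Title').reset_index(drop=True)
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(books['Book-Title'])

def content_recommend(book_title, n=5):
    """Recommend books based on Content-Based Filtering (TF-IDF + Cosine Similarity)."""
    book_index = books[books['Book-Title'] == book_title].index[0]
    # Compare only the query title against all titles
    scores = cosine_similarity(tfidf_matrix[book_index], tfidf_matrix).ravel()
    similar_indices = np.argsort(scores)[::-1][1:n+1]
    return books.iloc[similar_indices][['Book-Title']]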

# -------------------------
# 5️⃣ Hybrid Recommendation System
# -------------------------
def hybrid_recommend(user_id, book_title, n=5):
    """Generate recommendations using a hybrid approach combining CF and Content-Based Filtering."""
    user_recs = user_based_recommend(user_id, n)
    item_recs = item_based_recommend(df[df['Book-Title'] == book_title]['ISBN'].values[0], n)
    svd_recs = svd_recommend(user_id, n)
    als_recs = als_recommend(user_id, n)
    content_recs = content_recommend(book_title, n)

    hybrid_recs = pd.concat([user_recs, item_recs, svd_recs, als_recs, content_recs]).drop_duplicates().reset_index(drop=True)
    return hybrid_recs

# -------------------------
# Example Usage
# -------------------------
if __name__ == "__main__":
    user_id = 123  # Example user ID
    book_title = "The Da Vinci Code"  # Example book title

    print("\n📚 User-Based Collaborative Filtering Recommendations:")
    print(user_based_recommend(user_id))

    print("\n📖 Item-Based Collaborative Filtering Recommendations:")
    print(item_based_recommend(1))  # Replace with a valid ISBN code

    print("\n🔢 SVD-Based Recommendations:")
    print(svd_recommend(user_id))

    print("\n📊 ALS-Based Recommendations:")
    print(als_recommend(user_id))

    print("\n📜 Content-Based Recommendations:")
    print(content_recommend(book_title))

    print("\n🔄 Hybrid Recommendations:")
    print(hybrid_recommend(user_id, book_title))


In [2]:
import pandas as pd
import pickle
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline

In [3]:
data = pd.read_csv("../artifacts/cleaned_data.csv",encoding='ISO-8859-1')

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1031136 entries, 0 to 1031135
Data columns (total 12 columns):
 #   Column               Non-Null Count    Dtype  
---  ------               --------------    -----  
 0   User-ID              1031136 non-null  int64  
 1   ISBN                 1031136 non-null  object 
 2   Book-Rating          1031136 non-null  int64  
 3   Book-Title           1031136 non-null  object 
 4   Book-Author          1031134 non-null  object 
 5   Year-Of-Publication  1031136 non-null  float64
 6   Publisher            1031134 non-null  object 
 7   Image-URL-M          1031136 non-null  object 
 8   Age                  1031136 non-null  float64
 9   City                 1013763 non-null  object 
 10  State                987803 non-null   object 
 11  Country              992503 non-null   object 
dtypes: float64(2), int64(2), object(8)
memory usage: 94.4+ MB


In [3]:
data

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,Age,City,State,Country,Age Group
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002.0,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,35.0,tyler,texas,usa,Young Adults
1,276726,0155061224,5,Rites of Passage,Judith Rae,2001.0,Heinle,http://images.amazon.com/images/P/0155061224.0...,35.0,seattle,washington,usa,Young Adults
2,276727,0446520802,0,The Notebook,Nicholas Sparks,1996.0,Warner Books,http://images.amazon.com/images/P/0446520802.0...,16.0,h,new south wales,australia,Teens
3,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999.0,Cambridge University Press,http://images.amazon.com/images/P/052165615X.0...,16.0,rijeka,,croatia,Teens
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge ...,Sue Leather,2001.0,Cambridge University Press,http://images.amazon.com/images/P/0521795028.0...,16.0,rijeka,,croatia,Teens
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1031131,276704,0876044011,0,Edgar Cayce on the Akashic Records: The Book o...,Kevin J. Todeschi,1998.0,A.R.E. Press (Association of Research &amp; Enlig,http://images.amazon.com/images/P/0876044011.0...,35.0,cedar park,texas,usa,Young Adults
1031132,276704,1563526298,9,Get Clark Smart : The Ultimate Guide for the S...,Clark Howard,2000.0,Longstreet Press,http://images.amazon.com/images/P/1563526298.0...,36.0,cedar park,texas,usa,Middle-aged
1031133,276706,0679447156,0,Eight Weeks to Optimum Health: A Proven Progra...,Andrew Weil,1997.0,Alfred A. Knopf,http://images.amazon.com/images/P/0679447156.0...,18.0,quebec,quebec,canada,Teens
1031134,276709,0515107662,10,The Sherbrooke Bride (Bride Trilogy (Paperback)),Catherine Coulter,1996.0,Jove Books,http://images.amazon.com/images/P/0515107662.0...,38.0,mannington,west virginia,usa,Middle-aged


### **Counting book ratings per user**

In [4]:
data.groupby('User-ID')['Book-Rating'].count().reset_index()


Unnamed: 0,User-ID,Book-Rating
0,2,1
1,8,17
2,9,3
3,10,1
4,12,1
...,...,...
92101,278846,1
92102,278849,4
92103,278851,23
92104,278852,1


- Only about 92k users (92,105 distinct `User-ID` values) have rated at least one book.
- The majority of users have not rated any books.

### **Filtering the data to users who have rated at least 200 books**

In [None]:
users_with_200_ratings_data = data.loc[data.groupby('User-ID')['Book-Rating'].transform('count') >= 200]

In [6]:
users_with_200_ratings_data

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,Age,City,State,Country,Age Group
1150,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994.0,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,48.0,gilbert,arizona,usa,Middle-aged
1151,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995.0,John Wiley &amp; Sons,http://images.amazon.com/images/P/0026217457.0...,48.0,gilbert,arizona,usa,Middle-aged
1152,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974.0,Thomson Learning,http://images.amazon.com/images/P/003008685X.0...,48.0,gilbert,arizona,usa,Middle-aged
1153,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982.0,Henry Holt &amp; Co,http://images.amazon.com/images/P/0030615321.0...,48.0,gilbert,arizona,usa,Middle-aged
1154,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002.0,Avon Books,http://images.amazon.com/images/P/0060002050.0...,48.0,gilbert,arizona,usa,Middle-aged
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1029357,275970,1931868123,0,There's a Porcupine in My Outhouse: Misadventu...,Mike Tougias,2002.0,Capital Books (VA),http://images.amazon.com/images/P/1931868123.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029358,275970,3411086211,10,Die Biene.,Sybil GrÃ?ÃÂ¤fin SchÃ?ÃÂ¶nfeldt,1993.0,"Bibliographisches Institut, Mannheim",http://images.amazon.com/images/P/3411086211.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029359,275970,3829021860,0,The Penis Book,Joseph Cohen,1999.0,Konemann,http://images.amazon.com/images/P/3829021860.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029360,275970,4770019572,0,Musashi,Eiji Yoshikawa,1995.0,Kodansha International (JPN),http://images.amazon.com/images/P/4770019572.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged


- Users with **200+** ratings have rated about **50%** of the books.

---

### **Filtering this data further to books with 50+ ratings**

In [7]:
users_with_200_ratings_data.groupby('Book-Title')['Book-Rating'].count().reset_index()

Unnamed: 0,Book-Title,Book-Rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
156132,Ã?Ã?ber das Fernsehen.,2
156133,Ã?Ã?ber die Pflicht zum Ungehorsam gegen den...,3
156134,Ã?Ã?lpiraten.,1
156135,Ã?Ã?stlich der Berge.,1


In [8]:
final_filtered_data = users_with_200_ratings_data.loc[users_with_200_ratings_data.groupby('Book-Title')['Book-Rating'].transform('count') >= 50]

In [9]:
final_filtered_data

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,Age,City,State,Country,Age Group
1150,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994.0,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,48.0,gilbert,arizona,usa,Middle-aged
1163,277427,0060930535,0,The Poisonwood Bible: A Novel,Barbara Kingsolver,1999.0,Perennial,http://images.amazon.com/images/P/0060930535.0...,48.0,gilbert,arizona,usa,Middle-aged
1165,277427,0060934417,0,Bel Canto: A Novel,Ann Patchett,2002.0,Perennial,http://images.amazon.com/images/P/0060934417.0...,48.0,gilbert,arizona,usa,Middle-aged
1168,277427,0061009059,9,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995.0,HarperTorch,http://images.amazon.com/images/P/0061009059.0...,48.0,gilbert,arizona,usa,Middle-aged
1174,277427,006440188X,0,The Secret Garden,Frances Hodgson Burnett,1998.0,HarperTrophy,http://images.amazon.com/images/P/006440188X.0...,48.0,gilbert,arizona,usa,Middle-aged
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1029196,275970,1400031354,0,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002.0,Anchor,http://images.amazon.com/images/P/1400031354.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029197,275970,1400031362,0,Morality for Beautiful Girls (No.1 Ladies Dete...,Alexander McCall Smith,2002.0,Anchor,http://images.amazon.com/images/P/1400031362.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029270,275970,1573229725,0,Fingersmith,Sarah Waters,2002.0,Riverhead Books,http://images.amazon.com/images/P/1573229725.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029309,275970,1586210661,9,Me Talk Pretty One Day,David Sedaris,2001.0,Time Warner Audio Major,http://images.amazon.com/images/P/1586210661.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged


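- Only about **58k** rows remain after keeping books with at least 50 ratings from the top users (**>=200 ratings**); they cover 707 distinct titles, as the shape of the similarity matrix computed later confirms.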

---

### **Saving this as a pickle file**

In [10]:
def save_dataframe_to_pickle(df, file_name):
    """
    Saves a DataFrame to a pickle file using pickle.dump().
    
    Parameters:
    df (pd.DataFrame): The DataFrame to be saved.
    file_name (str): The name of the pickle file to store the DataFrame.
    """
    with open(file_name, 'wb') as file:
        pickle.dump(df, file)
    print(f"DataFrame saved as {file_name}")
    
save_dataframe_to_pickle(df=final_filtered_data,file_name='../artifacts/final_filtered_data.pkl')

DataFrame saved as ../artifacts/final_filtered_data.pkl


# **Creating the User-Book Interaction Matrix for Recommendations**

## **Why Do We Need This Matrix?**
To implement **collaborative filtering**, we require a structured representation of user-book interactions.  
A **pivot table** helps us transform raw data into a **User-Book interaction matrix**, where:
- **Rows represent books (Book-Title).**
- **Columns represent users (User-ID).**
- **Values represent ratings given by users to books.**

This matrix allows us to analyze **user behavior patterns** and find similarities between users or books.

## **Why is This Approach Effective?**
- The matrix enables **pattern recognition** in user behavior.
- It allows us to compute **similarities between users or books**.
- It forms the foundation for **personalized book recommendations**.

🚀 **This matrix is the backbone of our collaborative filtering-based recommendation system!**
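
A minimal sketch of the transformation on made-up ratings, showing the orientation described above (books as rows, users as columns). Missing combinations come out as `NaN`, and `pivot_table` averages the values if the same user and title pair occurs more than once.

```python
import pandas as pd

toy = pd.DataFrame({
    "User-ID":     [1, 1, 2, 3],
    "Book-Title":  ["A", "B", "A", "B"],
    "Book-Rating": [8, 6, 10, 7],
})

# Long format (one row per rating) -> wide format (one cell per book/user pair).
pt = toy.pivot_table(index="Book-Title", columns="User-ID", values="Book-Rating")
print(pt)

# Similarity functions cannot handle NaN, so unrated cells are set to 0.
filled = pt.fillna(0)
print(filled)
```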

### **Pivot Table**

In [11]:
final_filtered_data

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-M,Age,City,State,Country,Age Group
1150,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994.0,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,48.0,gilbert,arizona,usa,Middle-aged
1163,277427,0060930535,0,The Poisonwood Bible: A Novel,Barbara Kingsolver,1999.0,Perennial,http://images.amazon.com/images/P/0060930535.0...,48.0,gilbert,arizona,usa,Middle-aged
1165,277427,0060934417,0,Bel Canto: A Novel,Ann Patchett,2002.0,Perennial,http://images.amazon.com/images/P/0060934417.0...,48.0,gilbert,arizona,usa,Middle-aged
1168,277427,0061009059,9,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995.0,HarperTorch,http://images.amazon.com/images/P/0061009059.0...,48.0,gilbert,arizona,usa,Middle-aged
1174,277427,006440188X,0,The Secret Garden,Frances Hodgson Burnett,1998.0,HarperTrophy,http://images.amazon.com/images/P/006440188X.0...,48.0,gilbert,arizona,usa,Middle-aged
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1029196,275970,1400031354,0,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002.0,Anchor,http://images.amazon.com/images/P/1400031354.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029197,275970,1400031362,0,Morality for Beautiful Girls (No.1 Ladies Dete...,Alexander McCall Smith,2002.0,Anchor,http://images.amazon.com/images/P/1400031362.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029270,275970,1573229725,0,Fingersmith,Sarah Waters,2002.0,Riverhead Books,http://images.amazon.com/images/P/1573229725.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged
1029309,275970,1586210661,9,Me Talk Pretty One Day,David Sedaris,2001.0,Time Warner Audio Major,http://images.amazon.com/images/P/1586210661.0...,46.0,pittsburgh,pennsylvania,usa,Middle-aged


In [12]:
user_book_pt =  final_filtered_data.pivot_table(index= 'Book-Title', columns= 'User-ID', values= 'Book-Rating')

In [13]:
user_book_pt

User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
Book-Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1984,9.0,,,,,,,,,,...,10.0,,,,,,0.0,,,
1st to Die: A Novel,,,,,,,,,,9.0,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,0.0,...,,,,,,0.0,,,0.0,
4 Blondes,,,,,,,,0.0,,,...,,,,,,,,,,
A Bend in the Road,0.0,,7.0,,,,,,,,...,,0.0,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,,0.0,...,,9.0,,,,,0.0,,,
You Belong To Me,,,,,,,,,0.0,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,0.0,,,...,,,,,,,0.0,,,
Zoya,,,,,,,,,,,...,,0.0,,,,,,,,


- This is a sparse matrix: most cells are `NaN` because each user has rated only a small fraction of the books.
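
The degree of sparsity can be quantified as the share of cells that hold a rating. The helper below is a hypothetical sketch on a made-up pivot table; it has to run before missing values are replaced, because afterwards unrated cells can no longer be told apart from ratings of 0.

```python
import numpy as np
import pandas as pd

def matrix_density(pivot):
    """Share of cells that hold an actual rating (hypothetical helper)."""
    return float(pivot.notna().to_numpy().mean())

toy_pt = pd.DataFrame(
    [[9.0, np.nan, np.nan], [np.nan, 10.0, 0.0]],
    index=["1984", "2nd Chance"],
    columns=[254, 2276, 2766],
)

density = matrix_density(toy_pt)
print(f"density: {density:.2f}, sparsity: {1 - density:.2f}")
```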

In [14]:
user_book_pt.fillna(value=0, inplace= True)

In [15]:
user_book_pt

User-ID,254,2276,2766,2977,3363,4017,4385,6251,6323,6543,...,271705,273979,274004,274061,274301,274308,275970,277427,277639,278418
Book-Title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A Bend in the Road,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### **Saving this as a pickle file**

In [16]:
def save_dataframe_to_pickle(df, file_name):
    """
    Saves a DataFrame to a pickle file using pickle.dump().
    
    Parameters:
    df (pd.DataFrame): The DataFrame to be saved.
    file_name (str): The name of the pickle file to store the DataFrame.
    """
    with open(file_name, 'wb') as file:
        pickle.dump(df, file)
    print(f"DataFrame saved as {file_name}")
    
save_dataframe_to_pickle(df=user_book_pt,file_name='../artifacts/user_book_pt.pkl')

DataFrame saved as ../artifacts/user_book_pt.pkl


### **Cosine Similarity**

In [17]:
from sklearn.metrics.pairwise import cosine_similarity

In [18]:
def calculate_cosine_similarity(matrix):
    """
    Computes the cosine similarity between rows of a given matrix.

    Parameters:
    matrix (pd.DataFrame): The user-book interaction matrix.

    Returns:
    np.ndarray: A square matrix containing cosine similarity scores.
    """
    return cosine_similarity(matrix)

similarity_scores = calculate_cosine_similarity(matrix= user_book_pt)

In [19]:
similarity_scores

array([[1.        , 0.0999137 , 0.01189468, ..., 0.11799012, 0.07158663,
        0.04205081],
       [0.0999137 , 1.        , 0.2364573 , ..., 0.07446129, 0.16773875,
        0.14263397],
       [0.01189468, 0.2364573 , 1.        , ..., 0.04558758, 0.04938579,
        0.10796119],
       ...,
       [0.11799012, 0.07446129, 0.04558758, ..., 1.        , 0.07085128,
        0.0196177 ],
       [0.07158663, 0.16773875, 0.04938579, ..., 0.07085128, 1.        ,
        0.10602962],
       [0.04205081, 0.14263397, 0.10796119, ..., 0.0196177 , 0.10602962,
        1.        ]], shape=(707, 707))

In [20]:
similarity_scores.shape

(707, 707)

### **Saving this as a pickle file**

In [21]:
def save_dataframe_to_pickle(df, file_name):
    """
    Saves a DataFrame to a pickle file using pickle.dump().
    
    Parameters:
    df (pd.DataFrame): The DataFrame to be saved.
    file_name (str): The name of the pickle file to store the DataFrame.
    """
    with open(file_name, 'wb') as file:
        pickle.dump(df, file)
    print(f"DataFrame saved as {file_name}")
    
save_dataframe_to_pickle(df=similarity_scores,file_name='../artifacts/similarity_scores.pkl')

DataFrame saved as ../artifacts/similarity_scores.pkl
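
Reading a saved artifact back is the mirror image of saving it. A minimal sketch, assuming a hypothetical helper named `load_pickle`; it round-trips a small array through a temporary directory so it does not touch the real `../artifacts` files.

```python
import os
import pickle
import tempfile

import numpy as np

def load_pickle(file_name):
    """Load any pickled object, e.g. a DataFrame or a NumPy array (hypothetical helper)."""
    with open(file_name, "rb") as file:
        return pickle.load(file)

scores = np.eye(3)  # stand-in for the real similarity matrix

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "similarity_scores.pkl")
    with open(path, "wb") as file:
        pickle.dump(scores, file)
    restored = load_pickle(path)

print(restored.shape)
```

Pickle files can execute arbitrary code when loaded, so only load artifacts that come from a trusted source.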


In [22]:
def get_top_recommendations(book_title, similarity_matrix, book_titles, book_dataset, top_n=5):
    """
    Retrieves the top N book recommendations based on cosine similarity, along with Title, Author, and Image URL.

    Parameters:
    book_title (str): The title of the book for which recommendations are needed.
    similarity_matrix (np.ndarray): Precomputed cosine similarity matrix.
    book_titles (pd.Index): Index containing book titles corresponding to the matrix rows.
    book_dataset (pd.DataFrame): Dataset containing book information (Title, Author, Image URL).
    top_n (int, optional): Number of recommendations to return. Default is 5.

    Returns:
    list: A list of dictionaries containing Title, Author, and Image URL of the top N recommended books.
    """
    if book_title not in book_titles:
        return [{"message": "Book not found in dataset"}]

    book_idx = book_titles.get_loc(book_title)
    similarity_scores = list(enumerate(similarity_matrix[book_idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    top_recommendations = []
    
    # Collect top N books' Title, Author, and Image URL
    for i, _ in similarity_scores[1:top_n+1]:
        book_info = book_dataset[book_dataset['Book-Title'] == book_titles[i]].iloc[0]
        top_recommendations.append({
            "Title": book_info["Book-Title"],
            "Author": book_info["Book-Author"],
            "Image URL": book_info["Image-URL-M"]
        })
    
    return top_recommendations

In [23]:
get_top_recommendations(book_title= 'Harry Potter and the Prisoner of Azkaban (Book 3)' , similarity_matrix= similarity_scores, book_titles= user_book_pt.index,book_dataset=final_filtered_data)

[{'Title': 'Harry Potter and the Goblet of Fire (Book 4)',
  'Author': 'J. K. Rowling',
  'Image URL': 'http://images.amazon.com/images/P/0439139597.01.MZZZZZZZ.jpg'},
 {'Title': 'Harry Potter and the Chamber of Secrets (Book 2)',
  'Author': 'J. K. Rowling',
  'Image URL': 'http://images.amazon.com/images/P/0439064872.01.MZZZZZZZ.jpg'},
 {'Title': 'Harry Potter and the Order of the Phoenix (Book 5)',
  'Author': 'J. K. Rowling',
  'Image URL': 'http://images.amazon.com/images/P/043935806X.01.MZZZZZZZ.jpg'},
 {'Title': "Harry Potter and the Sorcerer's Stone (Book 1)",
  'Author': 'J. K. Rowling',
  'Image URL': 'http://images.amazon.com/images/P/043936213X.01.MZZZZZZZ.jpg'},
 {'Title': "Harry Potter and the Sorcerer's Stone (Harry Potter (Paperback))",
  'Author': 'J. K. Rowling',
  'Image URL': 'http://images.amazon.com/images/P/059035342X.01.MZZZZZZZ.jpg'}]