<a href="https://colab.research.google.com/github/Punukollu-Meghana/Book-Recommendation-System/blob/main/BookRecommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**numpy (as np)** for numerical computations

**pandas (as pd)** for data manipulation and analysis

**sklearn** for machine learning algorithms:

**KMeans** for clustering

**neighbors** for nearest neighbors algorithm

**train_test_split** for splitting data into training and testing sets

**MinMaxScaler** for scaling data

In [1]:
#importing libraries
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import neighbors
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

This cell loads the data from a CSV file named 'books.csv' into a Pandas DataFrame **df**. The **on_bad_lines='skip'** parameter tells Pandas to skip malformed lines (for example, rows with too many fields) instead of raising an error.

In [22]:
#loading data
df = pd.read_csv('/content/books.csv', on_bad_lines='skip')
df.head(30)

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling/Mary GrandPré,4.57,0439785960,9780439785969,eng,652,2095690,27591,9/16/2006,Scholastic Inc.
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling/Mary GrandPré,4.49,0439358078,9780439358071,eng,870,2153167,29221,9/1/2004,Scholastic Inc.
2,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.42,0439554896,9780439554893,eng,352,6333,244,11/1/2003,Scholastic
3,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling/Mary GrandPré,4.56,043965548X,9780439655484,eng,435,2339585,36325,5/1/2004,Scholastic Inc.
4,8,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling/Mary GrandPré,4.78,0439682584,9780439682589,eng,2690,41428,164,9/13/2004,Scholastic
5,9,"Unauthorized Harry Potter Book Seven News: ""Ha...",W. Frederick Zimmerman,3.74,0976540606,9780976540601,en-US,152,19,1,4/26/2005,Nimble Books
6,10,Harry Potter Collection (Harry Potter #1-6),J.K. Rowling,4.73,0439827604,9780439827607,eng,3342,28242,808,9/12/2005,Scholastic
7,12,The Ultimate Hitchhiker's Guide: Five Complete...,Douglas Adams,4.38,0517226952,9780517226957,eng,815,3628,254,11/1/2005,Gramercy Books
8,13,The Ultimate Hitchhiker's Guide to the Galaxy ...,Douglas Adams,4.38,0345453743,9780345453747,eng,815,249558,4080,4/30/2002,Del Rey Books
9,14,The Hitchhiker's Guide to the Galaxy (Hitchhik...,Douglas Adams,4.22,1400052920,9781400052929,eng,215,4930,460,8/3/2004,Crown
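
The effect of **on_bad_lines='skip'** can be seen on a small in-memory CSV. This is only a sketch: the three-column layout and the sample rows below are made up for illustration and are not taken from books.csv.

```python
import io

import pandas as pd

# Made-up sample: the second data row has one field too many.
csv_text = (
    "bookID,title,average_rating\n"
    "1,Book A,4.5\n"
    "2,Book B,extra field,3.9\n"
    "3,Book C,4.1\n"
)

# The malformed row is dropped instead of raising a parser error.
demo = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(demo)
```

Only the rows with bookID 1 and 3 survive, so it is worth comparing the row count of the loaded DataFrame against the line count of the file to see how much data was skipped.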


The **df.head()** call above displays the first rows of the data. The **df.describe()** method provides summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns.

In [5]:
df.describe()

Unnamed: 0,bookID,average_rating,isbn13,num_pages,ratings_count,text_reviews_count
count,11123.0,11123.0,11123.0,11123.0,11123.0,11123.0
mean,21310.856963,3.934075,9759880000000.0,336.405556,17942.85,542.048099
std,13094.727252,0.350485,442975800000.0,241.152626,112499.2,2576.619589
min,1.0,0.0,8987060000.0,0.0,0.0,0.0
25%,10277.5,3.77,9780345000000.0,192.0,104.0,9.0
50%,20287.0,3.96,9780582000000.0,299.0,745.0,47.0
75%,32104.5,4.14,9780872000000.0,416.0,5000.5,238.0
max,45641.0,5.0,9790008000000.0,6576.0,4597666.0,94265.0


In [6]:
#data pre-processing
df2 = df.copy()

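This step creates a new column **rating_between** in df2 based on the **average_rating** column. It sorts each rating into one of five bins: 0 to 1 (both ends included), then 1 to 2, 2 to 3, 3 to 4, and 4 to 5, each excluding its lower bound and including its upper bound.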

In [7]:
df2.loc[ (df2['average_rating'] >= 0) & (df2['average_rating'] <= 1), 'rating_between'] = "between 0 and 1"
df2.loc[ (df2['average_rating'] > 1) & (df2['average_rating'] <= 2), 'rating_between'] = "between 1 and 2"
df2.loc[ (df2['average_rating'] > 2) & (df2['average_rating'] <= 3), 'rating_between'] = "between 2 and 3"
df2.loc[ (df2['average_rating'] > 3) & (df2['average_rating'] <= 4), 'rating_between'] = "between 3 and 4"
df2.loc[ (df2['average_rating'] > 4) & (df2['average_rating'] <= 5), 'rating_between'] = "between 4 and 5"
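
The same binning can also be written with **pd.cut**, shown here as an alternative sketch rather than what the notebook does. The sample ratings are made up, and **include_lowest=True** is what keeps a rating of exactly 0 in the first bin.

```python
import pandas as pd

# Made-up ratings that hit the bin edges.
ratings = pd.Series([0.0, 1.0, 1.5, 3.0, 3.96, 4.57, 5.0])

labels = [
    "between 0 and 1",
    "between 1 and 2",
    "between 2 and 3",
    "between 3 and 4",
    "between 4 and 5",
]

# Bins are closed on the right; include_lowest=True also admits 0 itself.
binned = pd.cut(ratings, bins=[0, 1, 2, 3, 4, 5], labels=labels, include_lowest=True)
print(binned.tolist())
```

One difference to keep in mind is that **pd.cut** returns a categorical column rather than plain strings.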

This section creates two new DataFrames, **rating** and **lang**, using the **get_dummies** function from Pandas. **get_dummies** *converts a categorical variable into numerical indicator variables*: it creates one binary column per category, and each row is marked only in the column for its own category.

In [8]:
#feature engineering
rating=pd.get_dummies(df2['rating_between'])
lang=pd.get_dummies(df2['language_code'])
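
A small sketch of what **get_dummies** produces, using a made-up list of language codes rather than the real column:

```python
import pandas as pd

# Made-up language codes for illustration.
codes = pd.Series(["eng", "en-US", "eng", "fre"])

# One indicator column per distinct code; each row is marked in exactly one column.
dummies = pd.get_dummies(codes)
print(dummies)
```

The indicator columns may be boolean or integer depending on the Pandas version, but in both cases each row has exactly one marked column.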

The **features** DataFrame is created by concatenating **rating**, **lang**, and the **average_rating** and **ratings_count** columns side by side, that is, along the column axis (**axis=1**).

In [9]:
features = pd.concat([rating,
                      lang,
                      df2['average_rating'],
                      df2['ratings_count']
], axis=1)
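
Concatenation with **axis=1** lines rows up by index label, not by position. The sketch below uses made-up frames to show the normal case and what happens when the indexes do not match.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})                       # index 0, 1
right = pd.Series([0.5, 0.7], name="average_rating")     # index 0, 1

# Matching indexes: the columns sit side by side, row for row.
combined = pd.concat([left, right], axis=1)

# Shifted index: rows no longer line up, so gaps are filled with NaN.
shifted = pd.Series([0.5, 0.7], index=[1, 2], name="average_rating")
misaligned = pd.concat([left, shifted], axis=1)

print(combined)
print(misaligned)
```

In the notebook all pieces come from **df2**, so they share one index and line up row for row.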

In [10]:
# scaling the data
scaler=MinMaxScaler()

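This section creates a **MinMaxScaler** object and uses it to scale the features DataFrame. The **fit_transform** method rescales each column independently to a common range, *0 to 1* by default, by subtracting the column minimum and dividing by the column's range. It returns a NumPy array, so **features** is no longer a DataFrame after this step.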

In [11]:
features=scaler.fit_transform(features)

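This section creates a **NearestNeighbors** object and uses it to find the nearest neighbors of each data point in the scaled **features** array. The **n_neighbors** parameter specifies the number of nearest neighbors to find, and the **algorithm** parameter specifies the search structure used to find them (here a ball tree).

The **fit** method builds the index from **features**, and the **kneighbors** method returns, for each data point, the distances to its nearest neighbors (**dist**) and their row positions (**idlist**). Because the query points are the same points the model was fitted on, each book normally appears as its own nearest neighbor, so **n_neighbors=6** yields the book itself plus five others.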

In [12]:
#nearest neighbours algorithm
model=neighbors.NearestNeighbors(n_neighbors=6, algorithm='ball_tree')
model.fit(features)
dist, idlist = model.kneighbors(features)
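
The shape of the **kneighbors** output is easier to see on a tiny example. The four 2-D points below are made up; the call pattern is the same as in the cell above, with **n_neighbors=2** to keep the output short.

```python
import numpy as np
from sklearn import neighbors

# Four made-up points forming two tight pairs.
points = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [1.0, 1.0],
    [0.9, 1.0],
])

toy_model = neighbors.NearestNeighbors(n_neighbors=2, algorithm="ball_tree")
toy_model.fit(points)
toy_dist, toy_idlist = toy_model.kneighbors(points)

# Row i lists the positions of the neighbors of point i, nearest first.
print(toy_idlist)
print(toy_dist)
```

The first column of the result is each point itself at distance 0, and the second column is its partner in the pair.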

The **df2['title'] == book_title** part creates a boolean mask that selects only the rows where the title column exactly matches **book_title**. The **.index** attribute returns the index labels of the selected rows.

If no row matches, the function prints a message and returns an empty list. If the book was found, **id_b = id_b[0]** extracts the first index value from **id_b**. This is needed because **id_b** is a pandas Index object that may hold several matches, and a single index value is required.

The loop iterates over the indices of the nearest neighbors for the book with index **id_b**. For each neighbor, it appends the title of the book to the **book_list_name** list.

Finally, the function returns the list of recommended books.



In [13]:
#book recommendation function
def get_book_recommendations(book_title):
    id_b = df2[df2['title'] == book_title].index
    if id_b.empty:
        print(f"Book '{book_title}' not found in the dataset.")
        return []
    else:
        id_b = id_b[0]
        book_list_name = []
        for newid in idlist[id_b]:
            book_list_name.append(df2.loc[newid].title)
        return book_list_name
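
The function above returns the queried book as part of its own recommendations, and it uses index labels as row positions, which works while the DataFrame keeps its default 0-to-n index. The following is a hedged variant, not the notebook's method: it looks rows up by position and leaves out the queried book. The helper name **recommend_excluding_self** and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn import neighbors

# Made-up toy data standing in for df2 and the scaled features.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "average_rating": [4.5, 4.4, 3.0, 3.1],
})
toy_features = toy[["average_rating"]].to_numpy()

toy_nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm="ball_tree")
toy_nn.fit(toy_features)
_, toy_neighbor_ids = toy_nn.kneighbors(toy_features)


def recommend_excluding_self(book_title, frame, neighbor_ids):
    """Return neighbor titles for book_title, without the book itself."""
    # Row positions (not index labels) of exact title matches.
    matches = np.flatnonzero(frame["title"].to_numpy() == book_title)
    if matches.size == 0:
        return []
    pos = matches[0]
    return [frame["title"].iloc[i] for i in neighbor_ids[pos] if i != pos]


print(recommend_excluding_self("A", toy, toy_neighbor_ids))
print(recommend_excluding_self("Z", toy, toy_neighbor_ids))
```

With this variant, asking for six neighbors yields five recommendations, since one slot is taken by the queried book.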

The **main** function first prompts the user to enter a book title and stores the input in the **book_title** variable.

It then calls the **get_book_recommendations** function with **book_title** as an argument and stores the result in the **recommendations** variable.

If there are recommendations, it prints a header message and then iterates over the recommendations list using the **enumerate** function, which returns both the index and the value of each item in the list.

In [14]:
def main():
    book_title = input("Enter the book title for recommendation: ")
    recommendations = get_book_recommendations(book_title)
    if recommendations:
        print("Book recommendations:")
        for i, book in enumerate(recommendations):
            print(f"{i+1}. {book}")
    else:
        print("No recommendations found.")

In summary, this function prompts the user to enter a book title, gets the book recommendations using the **get_book_recommendations** function, and prints the recommendations if there are any. If there are no recommendations, it prints a message indicating that no recommendations were found.

In [25]:
if __name__ == "__main__":
    main()


Enter the book title for recommendation: Dreamland
Book recommendations:
1. Dreamland
2. The Maltese Falcon
3. Tara Road
4. The Moonstone
5. Notes from a Small Island
6. Pygmalion


Let's say the user wants to get book recommendations for the following books:

1. "Harry Potter and the Half-Blood Prince (Harry Potter #6)"
2. "To Kill a Mockingbird"
3. "The Lord of the Rings"
4. "Pride and Prejudice"
5. "The Hunger Games"
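
Several titles can be queried in one pass with a small helper. The sketch below is self-contained, so it uses a hypothetical **stand_in_recommender** with placeholder results in place of **get_book_recommendations**; the helper name **recommend_for_titles** is also made up.

```python
def recommend_for_titles(titles, recommender):
    """Call `recommender` once per title and collect the results by title."""
    return {title: recommender(title) for title in titles}


def stand_in_recommender(book_title):
    """Hypothetical placeholder for get_book_recommendations.

    It knows a single title so the example runs without the dataset.
    """
    known = {"The Hunger Games": ["Placeholder Book A", "Placeholder Book B"]}
    return known.get(book_title, [])


titles = [
    "Harry Potter and the Half-Blood Prince (Harry Potter #6)",
    "To Kill a Mockingbird",
    "The Lord of the Rings",
    "Pride and Prejudice",
    "The Hunger Games",
]

results = recommend_for_titles(titles, stand_in_recommender)
for title, recs in results.items():
    print(f"{title}: {recs if recs else 'not found'}")
```

In the notebook, **get_book_recommendations** would be passed as the second argument. It matches titles exactly, so a title has to be spelled as it appears in the **title** column, including any series suffix; otherwise an empty list comes back.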