## **BOOK RECOMMENDER**
Dataset: https://www.kaggle.com/datasets/dylanjcastillo/7k-books-with-metadata/data?select=books.csv

---



In [1]:
import pandas as pd

In [2]:
df=pd.read_csv('/content/books.csv')
df

| | isbn13 | isbn10 | title | subtitle | authors | categories | thumbnail | description | published_year | average_rating | num_pages | ratings_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9780002005883 | 0002005883 | Gilead | | Marilynne Robinson | Fiction | http://books.google.com/books/content?id=KQZCP... | A NOVEL THAT READERS and critics have been eag... | 2004.0 | 3.85 | 247.0 | 361.0 |
| 1 | 9780002261982 | 0002261987 | Spider's Web | A Novel | Charles Osborne;Agatha Christie | Detective and mystery stories | http://books.google.com/books/content?id=gA5GP... | A new 'Christie for Christmas' -- a full-lengt... | 2000.0 | 3.83 | 241.0 | 5164.0 |
| 2 | 9780006163831 | 0006163831 | The One Tree | | Stephen R. Donaldson | American fiction | http://books.google.com/books/content?id=OmQaw... | Volume Two of Stephen Donaldson's acclaimed se... | 1982.0 | 3.97 | 479.0 | 172.0 |
| 3 | 9780006178736 | 0006178731 | Rage of angels | | Sidney Sheldon | Fiction | http://books.google.com/books/content?id=FKo2T... | A memorable, mesmerizing heroine Jennifer -- b... | 1993.0 | 3.93 | 512.0 | 29532.0 |
| 4 | 9780006280897 | 0006280897 | The Four Loves | | Clive Staples Lewis | Christian life | http://books.google.com/books/content?id=XhQ5X... | Lewis' work on the nature of love divides love... | 2002.0 | 4.15 | 170.0 | 33684.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6805 | 9788185300535 | 8185300534 | I Am that | Talks with Sri Nisargadatta Maharaj | Sri Nisargadatta Maharaj;Sudhakar S. Dikshit | Philosophy | http://books.google.com/books/content?id=Fv_JP... | This collection of the timeless teachings of o... | 1999.0 | 4.51 | 531.0 | 104.0 |
| 6806 | 9788185944609 | 8185944601 | Secrets Of The Heart | | Khalil Gibran | Mysticism | http://books.google.com/books/content?id=XcrVp... | | 1993.0 | 4.08 | 74.0 | 324.0 |
| 6807 | 9788445074879 | 8445074873 | Fahrenheit 451 | | Ray Bradbury | Book burning | | | 2004.0 | 3.98 | 186.0 | 5733.0 |
| 6808 | 9789027712059 | 9027712050 | The Berlin Phenomenology | | Georg Wilhelm Friedrich Hegel | History | http://books.google.com/books/content?id=Vy7Sk... | Since the three volume edition ofHegel's Philo... | 1981.0 | 0.00 | 210.0 | 0.0 |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6810 entries, 0 to 6809
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   isbn13          6810 non-null   int64  
 1   isbn10          6810 non-null   object 
 2   title           6810 non-null   object 
 3   subtitle        2381 non-null   object 
 4   authors         6738 non-null   object 
 5   categories      6711 non-null   object 
 6   thumbnail       6481 non-null   object 
 7   description     6548 non-null   object 
 8   published_year  6804 non-null   float64
 9   average_rating  6767 non-null   float64
 10  num_pages       6767 non-null   float64
 11  ratings_count   6767 non-null   float64
dtypes: float64(4), int64(1), object(7)
memory usage: 638.6+ KB


In [6]:
df.describe()

| | isbn13 | published_year | average_rating | num_pages | ratings_count |
|---|---|---|---|---|---|
| count | 6810.0 | 6804.0 | 6767.0 | 6767.0 | 6767.0 |
| mean | 9780677000000.0 | 1998.630364 | 3.933284 | 348.181026 | 21069.1 |
| std | 606891100.0 | 10.484257 | 0.331352 | 242.376783 | 137620.7 |
| min | 9780002000000.0 | 1853.0 | 0.0 | 0.0 | 0.0 |
| 25% | 9780330000000.0 | 1996.0 | 3.77 | 208.0 | 159.0 |
| 50% | 9780553000000.0 | 2002.0 | 3.96 | 304.0 | 1018.0 |
| 75% | 9780810000000.0 | 2005.0 | 4.13 | 420.0 | 5992.5 |
| max | 9789042000000.0 | 2019.0 | 5.0 | 3342.0 | 5629932.0 |


In [7]:
df.isnull().sum()

isbn13               0
isbn10               0
title                0
subtitle          4429
authors             72
categories          99
thumbnail          329
description        262
published_year       6
average_rating      43
num_pages           43
ratings_count       43
dtype: int64

- Recommendations are based mainly on each book's description. The `description` column has 262 missing values, so they are filled with empty strings.

In [8]:
df['description']=df['description'].fillna('')
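Filling the gaps with `''` matters because scikit-learn's text vectorizers expect every document to be a string and raise an error on `NaN`. The sketch below uses a hypothetical three-document corpus (not the real dataset) to show what an empty description becomes: a TF-IDF row with no non-zero entries, whose cosine similarity to every document, itself included, is 0. Such books can still be looked up, but they carry no signal for ranking.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus: the last entry mimics a description filled with ''
toy_docs = [
    "a detective solves a murder in london",
    "a young detective investigates a murder",
    "",
]

toy_vec = TfidfVectorizer(stop_words="english")
toy_matrix = toy_vec.fit_transform(toy_docs)

# The empty document contains no terms, so its sparse row stores no values
print(toy_matrix[2].nnz)  # 0

# Its cosine similarity to every document (including itself) is therefore 0
toy_sim = cosine_similarity(toy_matrix, toy_matrix)
print(toy_sim[2])  # [0. 0. 0.]
```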

- Use `TfidfVectorizer`, with English stop words removed, to convert the descriptions into a TF-IDF matrix.

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words="english")

tfidf_matrix = tfidf.fit_transform(df['description'])


- Retrieve the feature names (the vocabulary terms) from the fitted vectorizer.

In [10]:
feature_names = tfidf.get_feature_names_out()

- Calculate the cosine similarity between every pair of book descriptions using the TF-IDF matrix.

In [12]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
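Because `TfidfVectorizer` L2-normalises each row by default, the cosine similarity of two rows reduces to their dot product, so `sklearn.metrics.pairwise.linear_kernel` should give the same matrix while skipping the extra normalisation. A small check on hypothetical toy descriptions is sketched below. Either way the result is a dense array: for 6810 books it is 6810 × 6810 float64 values, roughly 371 MB (6810² × 8 bytes), so computing only the row needed for a query is worth considering if memory becomes tight.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy descriptions
toy_docs = [
    "a murder mystery with a clever detective",
    "a detective story set during the war",
    "a romance set during the war",
]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_docs)

# Rows already have unit L2 norm, so the plain dot product equals cosine similarity
cos = cosine_similarity(toy_matrix, toy_matrix)
dot = linear_kernel(toy_matrix, toy_matrix)
print(np.allclose(cos, dot))  # True

# Every non-empty description has similarity 1 with itself
print(np.allclose(np.diag(cos), 1.0))  # True
```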


- Create a Series with the book titles as its index and their row positions as values, then remove duplicate titles, keeping only the last occurrence.

In [13]:
indices = pd.Series(df.index, index=df['title'])

In [14]:
indices = indices[~indices.index.duplicated(keep='last')]
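The de-duplication step matters because a title that occurs more than once (the output at the end of this notebook lists two rows titled *Hard Eight*) would otherwise map to several row positions, and `cosine_sim[book_index]` would then return a 2-D block instead of a single row of scores. A toy illustration with hypothetical titles:

```python
import pandas as pd

# Hypothetical toy titles; "Hard Eight" appears twice
toy_titles = pd.Series(["Gilead", "Hard Eight", "Taxi Driver", "Hard Eight"])
toy_indices = pd.Series(toy_titles.index, index=toy_titles)

# Before de-duplication, a repeated title maps to several positions
print(toy_indices["Hard Eight"].tolist())  # [1, 3]

# keep="last" flags every occurrence except the last one as a duplicate
toy_indices = toy_indices[~toy_indices.index.duplicated(keep="last")]
print(toy_indices["Hard Eight"])  # 3
```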

**Function to Recommend Books Based on Similarity Scores**

In [18]:
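def recommend_books(book_title, num_recommendations=10):
    # Look up the row position of the requested title
    if book_title not in indices:
        raise KeyError(f"Title not found: {book_title!r}")
    book_index = indices[book_title]

    # Similarity of the requested book to every book in the dataset
    similarity_scores = pd.DataFrame(cosine_sim[book_index], columns=["score"])

    # Drop the query book itself (instead of assuming it sorts first), then keep the top scores
    recommended_indices = (
        similarity_scores.drop(index=book_index)
        .sort_values("score", ascending=False)
        .head(num_recommendations)
        .index
    )

    # df has a default RangeIndex, so these labels are also row positions
    return df['title'].iloc[recommended_indices]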

- Example usage with *Crime and Punishment*:

In [19]:
crime_and_punishment_recommendations = recommend_books("Crime and Punishment")
print(crime_and_punishment_recommendations)


845                  The Brothers Karamazov
4501                             Hard Eight
6129                   Notebooks, 1935-1951
6331       The Man who Watched Trains Go by
1295                            Euripides I
3603                            Taxi Driver
2766            The Cricket in Times Square
3053    Paradise Lost and Paradise Regained
1625                             Hard Eight
948                    A Tale of Two Cities
Name: title, dtype: object
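The output above contains *Hard Eight* twice (rows 4501 and 1625), because two rows in the dataset share that title. One possible refinement, sketched here as a hypothetical helper `recommend_unique_titles` rather than part of the original notebook, walks the rows from most to least similar and skips any title it has already returned, including the query title itself.

```python
import numpy as np
import pandas as pd


def recommend_unique_titles(book_title, titles, indices, sim, n=10):
    """Hypothetical variant of recommend_books that never repeats a title."""
    book_index = indices[book_title]
    # Row positions ordered from most to least similar
    order = np.argsort(sim[book_index])[::-1]
    picked = []
    seen = {book_title}  # exclude the query title and any other copies of it
    for pos in order:
        if len(picked) >= n:
            break
        title = titles.iloc[pos]
        if title not in seen:
            seen.add(title)
            picked.append(pos)
    return titles.iloc[picked]


# Toy check with hypothetical data: "B" occurs twice but is returned once
toy_titles = pd.Series(["A", "B", "C", "B"])
toy_indices = pd.Series(toy_titles.index, index=toy_titles)
toy_indices = toy_indices[~toy_indices.index.duplicated(keep="last")]
toy_sim = np.array([
    [1.0, 0.9, 0.5, 0.8],
    [0.9, 1.0, 0.1, 0.2],
    [0.5, 0.1, 1.0, 0.3],
    [0.8, 0.2, 0.3, 1.0],
])
print(recommend_unique_titles("A", toy_titles, toy_indices, toy_sim).tolist())  # ['B', 'C']
```

In the notebook it could be called as `recommend_unique_titles("Crime and Punishment", df['title'], indices, cosine_sim)`.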
