# BOOK RECOMMENDATION SYSTEM

## What is a Recommendation System and Why is it Important?

![Image](books.jpg)



A recommendation system is a software algorithm or system that suggests items, products, or content to users based on their preferences, behavior, or similarities to other users. These systems are designed to provide personalized and relevant recommendations, aiming to enhance user experience, engagement, and satisfaction.

Recommendation systems are crucial in various domains, including e-commerce, entertainment, social media, and content streaming platforms.

We are going to build a book recommendation system that suggests a book to read based on what other readers have read and on which books are rated the highest.

We will use two types of methods for our recommendation system, collaborative filtering and content-based filtering. This notebook uses content-based filtering.

## Content Based Filtering

This filtering method uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback. In our book recommender system, the item features can be the **Title**, the **Ratings**, the **Genre**, etc.
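
As a rough illustration of how item features can drive similarity, the sketch below builds toy TF-IDF vectors from a few titles and compares them with cosine similarity. It uses only the standard library; the helper names `tfidf_vectors` and `cosine` and the toy titles are assumptions made for this example, and the weighting is a simplified stand-in rather than the exact scikit-learn computation used later in the notebook.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (toy TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency: rarer terms get larger weights.
        vectors.append(
            {t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
             for t, count in tf.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

toy_titles = ["the notebook", "the devils notebook", "best friends forever"]
vecs = tfidf_vectors(toy_titles)
sim_close = cosine(vecs[0], vecs[1])  # titles that share words
sim_far = cosine(vecs[0], vecs[2])    # titles with no words in common
print(sim_close, sim_far)
```

Titles that share words score above zero, while titles with no words in common score exactly zero, which is the property the search function later relies on.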

In [22]:
import pandas as pd
import numpy as np
import pickle
import re
import matplotlib.pyplot as plt
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [2]:
import gzip
with gzip.open(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\goodreads_books.json.gz") as f:
    line = f.readline()

In [3]:
data = json.loads(line)


In [4]:
data

{'isbn': '0312853122',
 'text_reviews_count': '1',
 'series': [],
 'country_code': 'US',
 'language_code': '',
 'popular_shelves': [{'count': '3', 'name': 'to-read'},
  {'count': '1', 'name': 'p'},
  {'count': '1', 'name': 'collection'},
  {'count': '1', 'name': 'w-c-fields'},
  {'count': '1', 'name': 'biography'}],
 'asin': '',
 'is_ebook': 'false',
 'average_rating': '4.00',
 'kindle_asin': '',
 'similar_books': [],
 'description': '',
 'format': 'Paperback',
 'link': 'https://www.goodreads.com/book/show/5333265-w-c-fields',
 'authors': [{'author_id': '604031', 'role': ''}],
 'publisher': "St. Martin's Press",
 'num_pages': '256',
 'publication_day': '1',
 'isbn13': '9780312853129',
 'publication_month': '9',
 'edition_information': '',
 'publication_year': '1984',
 'url': 'https://www.goodreads.com/book/show/5333265-w-c-fields',
 'image_url': 'https://images.gr-assets.com/books/1310220028m/5333265.jpg',
 'book_id': '5333265',
 'ratings_count': '3',
 'work_id': '5400751',
 'title': '

In [5]:
def parse_fields(line):
    data = json.loads(line)
    return {
        "book_id" : data["book_id"],
        "title" : data["title_without_series"],
        "ratings" : data["ratings_count"],
        "url" : data["url"],
        "cover_image" : data["image_url"]
        }

In [6]:
book_titles = []
with gzip.open(r"C:\Users\PC\OneDrive\Documents\Movie Recommendation System\goodreads_books.json.gz" ,"r") as f:
    while True:
        line = f.readline()
        if not line:
            break
        fields = parse_fields(line)
        try :
            ratings = int(fields["ratings"])
        except ValueError:
            continue
        if ratings > 15:
            book_titles.append(fields)
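
The loop above reads the compressed file one line at a time so that the full dataset never has to sit in memory as raw JSON. The same idea can be sketched as a generator, shown here against a tiny in-memory gzip sample. The function name `iter_popular_books` and the three sample records are hypothetical and exist only for this illustration.

```python
import gzip
import io
import json

def iter_popular_books(fileobj, min_ratings=15):
    """Yield selected fields for each book with more than `min_ratings` ratings."""
    for line in fileobj:  # iterating a file object yields one line at a time
        record = json.loads(line)
        try:
            ratings = int(record["ratings_count"])
        except ValueError:
            continue  # skip records whose count is empty or non-numeric
        if ratings > min_ratings:
            yield {
                "book_id": record["book_id"],
                "title": record["title_without_series"],
                "ratings": ratings,
            }

# Hypothetical records, compressed in memory to mimic a JSON-lines .gz file.
sample = [
    {"book_id": "1", "title_without_series": "Kept", "ratings_count": "140"},
    {"book_id": "2", "title_without_series": "Too few", "ratings_count": "3"},
    {"book_id": "3", "title_without_series": "Bad count", "ratings_count": ""},
]
raw = "\n".join(json.dumps(r) for r in sample).encode("utf-8")
buffer = io.BytesIO(gzip.compress(raw))

with gzip.open(buffer, "rb") as f:
    popular = list(iter_popular_books(f))
print(popular)
```

Only the first record survives: the second has too few ratings and the third has a count that cannot be parsed as an integer.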


Convert the parsed records to a pandas DataFrame

In [7]:
titles = pd.DataFrame.from_dict(book_titles)

Data Cleaning

In [8]:
# Turn ratings column dtype to numeric 
titles["ratings"] = pd.to_numeric(titles['ratings'])

In [9]:
# Clean Title column
titles["title_clean"] = titles["title"].replace(r"[^\w\s]", '', regex=True)

In [10]:
# Convert titles to lowercase
titles["title_clean"] = titles["title_clean"].str.lower()

In [11]:
# Collapse runs of whitespace into a single space
titles["title_clean"] = titles["title_clean"].str.replace(r"\s+", " ", regex=True)

In [12]:
# Remove any rows with an empty title
titles = titles[titles["title_clean"].str.len()>0]
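
The cleaning steps above can be checked on a handful of strings without loading the dataset. The sketch below applies the same three transformations with the standard `re` module; `clean_title` is a hypothetical helper name, and the third example string is made up to show a title that becomes empty.

```python
import re

def clean_title(title):
    """Mirror the DataFrame cleaning: drop punctuation, lowercase, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", title)
    lowered = no_punct.lower()
    return re.sub(r"\s+", " ", lowered)

examples = ["The Devil's Notebook", "Ondine (Ondine Quartet, #0.5)", "!!!"]
cleaned = [clean_title(t) for t in examples]
# Titles made only of punctuation become empty and are filtered out.
kept = [c for c in cleaned if len(c) > 0]
print(cleaned)
print(kept)
```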

In [13]:
titles 

| | book_id | title | ratings | url | cover_image | title_clean |
|---|---|---|---|---|---|---|
| 0 | 7327624 | The Unschooled Wizard (Sun Wolf and Starhawk, ... | 140 | https://www.goodreads.com/book/show/7327624-th... | https://images.gr-assets.com/books/1304100136m... | the unschooled wizard sun wolf and starhawk 12 |
| 1 | 6066819 | Best Friends Forever | 51184 | https://www.goodreads.com/book/show/6066819-be... | https://s.gr-assets.com/assets/nophoto/book/11... | best friends forever |
| 2 | 287141 | The Aeneid for Boys and Girls | 46 | https://www.goodreads.com/book/show/287141.The... | https://s.gr-assets.com/assets/nophoto/book/11... | the aeneid for boys and girls |
| 3 | 6066812 | All's Fairy in Love and War (Avalon: Web of Ma... | 98 | https://www.goodreads.com/book/show/6066812-al... | https://images.gr-assets.com/books/1316637798m... | alls fairy in love and war avalon web of magic 8 |
| 4 | 287149 | The Devil's Notebook | 986 | https://www.goodreads.com/book/show/287149.The... | https://images.gr-assets.com/books/1328768789m... | the devils notebook |
| ... | ... | ... | ... | ... | ... | ... |
| 1308952 | 17805813 | Ondine (Ondine Quartet, #0.5) | 327 | https://www.goodreads.com/book/show/17805813-o... | https://images.gr-assets.com/books/1379766592m... | ondine ondine quartet 05 |
| 1308953 | 331839 | Jacqueline Kennedy Onassis: Friend of the Arts | 18 | https://www.goodreads.com/book/show/331839.Jac... | https://s.gr-assets.com/assets/nophoto/book/11... | jacqueline kennedy onassis friend of the arts |
| 1308954 | 2685097 | The Spaniard's Blackmailed Bride | 112 | https://www.goodreads.com/book/show/2685097-th... | https://s.gr-assets.com/assets/nophoto/book/11... | the spaniards blackmailed bride |
| 1308955 | 2342551 | The Children's Classic Poetry Collection | 36 | https://www.goodreads.com/book/show/2342551.Th... | https://s.gr-assets.com/assets/nophoto/book/11... | the childrens classic poetry collection |


In [14]:
titles.to_json("book_titles.json")



Creating Search Engine

In [16]:
vectorizer = TfidfVectorizer()

In [17]:
tfidf = vectorizer.fit_transform(titles["title_clean"])

In [18]:
def show_book_image(val):
    return '<img src="{}" width="100">'.format(val)

In [19]:
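def search(query, vectorizer):
    # Clean the query the same way as title_clean: drop punctuation, lowercase
    processed = re.sub(r"[^\w\s]", '', query.lower())
    query_vect = vectorizer.transform([processed])
    # Cosine similarity between the query and every title (uses the global tfidf matrix)
    similarity = cosine_similarity(query_vect, tfidf).flatten()
    # Indices of the 10 most similar titles (unordered), then rank those by ratings
    indices = np.argpartition(similarity, -10)[-10:]
    results = titles.iloc[indices]
    results = results.sort_values("ratings", ascending=False)
    return results.head(5).style.format({'cover_image': show_book_image})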
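
The search function works in two stages: it first selects the titles most similar to the query, then re-ranks that short list by number of ratings. The sketch below reproduces that logic on plain lists. It swaps `np.argpartition` for the standard library's `heapq.nlargest` so it can run on its own; `top_matches` and the sample scores are hypothetical.

```python
import heapq

def top_matches(similarity, ratings, candidates=10, final=5):
    """Pick the most similar items, then order them by ratings (highest first)."""
    # Stage 1: indices of the `candidates` highest similarity scores.
    idx = heapq.nlargest(candidates, range(len(similarity)),
                         key=similarity.__getitem__)
    # Stage 2: re-rank the shortlisted indices by ratings.
    idx.sort(key=lambda i: ratings[i], reverse=True)
    return idx[:final]

similarity = [0.9, 0.1, 0.8, 0.0, 0.7]
ratings = [10, 5000, 300, 9999, 20]
result = top_matches(similarity, ratings, candidates=3, final=2)
print(result)
```

Items 1 and 3 have the most ratings but never reach the shortlist because their similarity is too low. Conversely, the closest match (item 0) is dropped from the final result because it has few ratings, which is consistent with the output below, where "The Devil's Notebook" appears last.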

In [20]:
search("The Devils Notebook" ,vectorizer)

| | book_id | title | ratings | url | cover_image | title_clean |
|---|---|---|---|---|---|---|
| 374879 | 15931 | The Notebook (The Notebook, #1) | 1064723 | https://www.goodreads.com/book/show/15931.The_Notebook | | the notebook the notebook 1 |
| 544207 | 336903 | The Notebook | 2243 | https://www.goodreads.com/book/show/336903.The_Notebook | | the notebook |
| 749489 | 1250158 | The Notebook | 1273 | https://www.goodreads.com/book/show/1250158.The_Notebook | | the notebook |
| 1258536 | 3472 | The Notebook | 1270 | https://www.goodreads.com/book/show/3472.The_Notebook | | the notebook |
| 4 | 287149 | The Devil's Notebook | 986 | https://www.goodreads.com/book/show/287149.The_Devil_s_Notebook | | the devils notebook |
