Importing Libraries:

import pandas as pd: Imports the pandas library under the alias pd. pandas is used for data manipulation and analysis, mainly through its DataFrame structure.

import numpy as np: Imports the NumPy library under the alias np. NumPy provides numerical operations on arrays.

from transformers import BertTokenizer, BertModel: Imports BertTokenizer and BertModel from the transformers library. Transformers is a Hugging Face library that provides pretrained models, including BERT (Bidirectional Encoder Representations from Transformers), for natural language processing tasks.

import torch: Imports the PyTorch library, which is used for tensor computations and deep learning.

from sklearn.metrics.pairwise import cosine_similarity: Imports the cosine_similarity function from scikit-learn's metrics.pairwise module. This function computes the pairwise cosine similarity between the rows of its input matrices. Cosine similarity is the cosine of the angle between two vectors and is often used in text similarity tasks.

import requests: Imports the requests library, which is used for making HTTP requests in Python.

import json: Imports the json module from the Python standard library, which is used for working with JSON (JavaScript Object Notation) data. The cells below do not call it directly, because response.json() already parses the API response.

In [25]:
import pandas as pd
import numpy as np
from transformers import BertTokenizer, BertModel
import torch
from sklearn.metrics.pairwise import cosine_similarity
import requests
import json


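The fetch_book_data function (defined in cell In [28] below) queries the Google Books API for volumes that match a search query. The max_results parameter sets how many results a single request returns, and start_index sets the offset of the first result, so repeated calls with increasing offsets page through additional books. The cell that follows first defines the helper extract_book_info.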
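The pagination described above can be sketched as a small helper that produces the offset and page size for each request. The name page_params is hypothetical and not part of the notebook; the default page size of 40 assumes the per-request maximum that the Google Books API documents for maxResults.

```python
def page_params(total, page_size=40):
    """Yield (start_index, max_results) pairs that together cover `total` results."""
    if total < 0 or page_size < 1:
        raise ValueError("total must be >= 0 and page_size must be >= 1")
    for start in range(0, total, page_size):
        # The last page may be shorter than page_size
        yield start, min(page_size, total - start)


pages = list(page_params(100))
print(pages)  # [(0, 40), (40, 40), (80, 20)]
```

Each pair could then be passed to fetch_book_data as start_index and max_results, and the returned lists concatenated.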

In [27]:
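def extract_book_info(books):
    """Convert a list of book dicts (as returned by fetch_book_data) into a DataFrame."""

    def as_text(value):
        # Lists (authors, categories) are joined; a plain string passes through unchanged,
        # so placeholder defaults such as 'No Authors' are not split into single characters.
        if isinstance(value, str):
            return value
        return ', '.join(value) if value else 'N/A'

    book_data = []
    for book in books:
        title = book.get('title', 'N/A')
        summary = book.get('summary', 'N/A')
        authors = as_text(book.get('authors'))
        published_date = book.get('publishedDate', 'N/A')  # fetch_book_data does not set this key, so it is 'N/A'
        categories = as_text(book.get('categories'))
        book_data.append([title, summary, authors, published_date, categories])
    return pd.DataFrame(book_data, columns=['Title', 'Summary', 'Authors', 'Published Date', 'Categories'])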


In [28]:
import requests

def fetch_book_data(query, max_results=10, start_index=0, api_key=None):
    url = "https://www.googleapis.com/books/v1/volumes"
    params = {
        'q': query,
        'maxResults': max_results,
        'startIndex': start_index,
        'key': api_key
    }
    try:
        response = requests.get(url, params=params)
        response.raise_for_status()  # Raise an exception for HTTP errors
        data = response.json()
        books = []

        # Extract relevant book data
        for item in data.get('items', []):
            volume_info = item.get('volumeInfo', {})
            book = {
                'title': volume_info.get('title', 'No Title'),
                'authors': volume_info.get('authors', 'No Authors'),
                'summary': volume_info.get('description', 'No Description'),
                'categories': volume_info.get('categories', 'No Categories'),
            }
            books.append(book)
        return books
    except requests.exceptions.RequestException as e:
        print(f"An error occurred: {e}")
        return []

# Fetch book data
api_key = 'YOUR_API_KEY'  # Replace with your own Google Books API key; do not publish a real key
books = fetch_book_data("Harry Potter", api_key=api_key)
print(books)  # Print books to verify the data

# Convert list of dictionaries to DataFrame
df_books = extract_book_info(books)
print(df_books.head())  # Print DataFrame to verify the content






                                      Title  \
0         Harry Potter and the Cursed Child   
1  Harry Potter and the Prisoner of Azkaban   
2   Harry Potter and the Chamber of Secrets   
3      Harry Potter and the Deathly Hallows   
4    Harry Potter and the Half-Blood Prince   

                                             Summary  \
0  As an overworked employee of the Ministry of M...   
1  'Welcome to the Knight Bus, emergency transpor...   
2  'There is a plot, Harry Potter. A plot to make...   
3  "The final adventure in J.K. Rowling's phenome...   
4  There it was, hanging in the sky above the sch...   

                                    Authors Published Date        Categories  
0  J. K. Rowling, Jack Thorne, John Tiffany            N/A           Fiction  
1                              J.K. Rowling            N/A           Fiction  
2                              J.K. Rowling            N/A           Fiction  
3                             J. K. Rowling            N/A  Juve

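The extract_book_info function (cell In [27]) takes the list of book dictionaries returned by fetch_book_data and converts it into a pandas DataFrame with the columns Title, Summary, Authors, Published Date, and Categories. Published Date is N/A for every row in the output above because fetch_book_data does not copy publishedDate from the API response into the book dictionaries.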
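As an illustration of the dictionary shape that extract_book_info consumes, the sketch below flattens one entry of the API's items list and also carries publishedDate through. The helper name parse_volume and the sample item are hypothetical; only the volumeInfo field names come from the notebook's own code.

```python
def parse_volume(item):
    """Flatten one Google Books `items` entry into a flat book dict."""
    info = item.get('volumeInfo', {})
    return {
        'title': info.get('title', 'No Title'),
        'authors': info.get('authors', []),        # list default, safe to join later
        'summary': info.get('description', 'No Description'),
        'categories': info.get('categories', []),  # list default, safe to join later
        'publishedDate': info.get('publishedDate', 'N/A'),
    }


# Hand-written sample for illustration, not real API output
sample_item = {
    'volumeInfo': {
        'title': 'Example Book',
        'authors': ['A. Author'],
        'publishedDate': '2001',
    }
}
book = parse_volume(sample_item)
print(book['publishedDate'])  # 2001
print(book['categories'])     # []
```

With a dictionary like this, the Published Date column would be populated whenever the API supplies the field.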

In [30]:
# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')


In [31]:
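# Define function to encode summaries using BERT
def encode_summaries(summaries):
    """Return one mean-pooled BERT vector per summary as a 2-D NumPy array."""
    model.eval()  # inference mode: disables dropout
    encoded_summaries = []
    for summary in summaries:
        inputs = tokenizer(summary, return_tensors='pt', truncation=True, padding=True, max_length=512)
        with torch.no_grad():  # gradients are not needed for inference
            outputs = model(**inputs)
        # Average the token embeddings to get a single vector per summary
        encoded_summaries.append(outputs.last_hidden_state.mean(dim=1).numpy())
    return np.vstack(encoded_summaries)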
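encode_summaries averages all token vectors of a summary. Because each summary is tokenized on its own, no padding tokens are added and a plain mean is enough. If several summaries were encoded in one batch, padded positions would have to be excluded from the average. The sketch below shows that masked mean on plain Python lists standing in for tensors; the function name masked_mean and the toy values are assumptions for illustration only.

```python
def masked_mean(token_vectors, attention_mask):
    """Average only the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # last row stands for a padding token
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

Without the mask, the padding row would pull the average toward zero.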


In [32]:
# Encode book summaries
summaries = df_books['Summary'].tolist()
encoded_summaries = encode_summaries(summaries)


In [33]:
# Calculate cosine similarity
similarity_matrix = cosine_similarity(encoded_summaries)
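The cosine similarity of two vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand for small vectors to show what each entry of similarity_matrix means. The function names are hypothetical, and returning 0.0 for a zero-length vector is a convention chosen here, not a statement about scikit-learn.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows):
    """Square matrix whose entry [i][j] is the cosine similarity of rows i and j."""
    return [[cosine(u, v) for v in rows] for u in rows]


matrix = pairwise_cosine([[1, 0], [0, 1], [1, 1]])
print(round(matrix[0][1], 4))  # 0.0
print(round(matrix[0][2], 4))  # 0.7071
```

Row i of the matrix holds the similarity of book i to every book, including itself on the diagonal.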


In [34]:
# Define function to get book recommendations
def get_recommendations(title, similarity_matrix, df_books, top_n=5):
    book_idx = df_books[df_books['Title'] == title].index[0]
    similarity_scores = list(enumerate(similarity_matrix[book_idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    top_books_indices = [i[0] for i in similarity_scores[1:top_n+1]]
    return df_books.iloc[top_books_indices]
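get_recommendations drops the first entry of the sorted list on the assumption that it is the selected book itself. If another book ties with it (for example, two entries with an identical summary), the selected book may not be first. A minimal sketch of a ranking that excludes the selected book by index instead; the function name top_similar and the toy scores are assumptions for illustration.

```python
def top_similar(scores, query_idx, top_n=5):
    """Return the indices of the top_n highest scores, skipping query_idx itself."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != query_idx]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in candidates[:top_n]]


scores = [1.0, 0.2, 1.0, 0.7]  # indices 0 and 2 tie at the top
print(top_similar(scores, query_idx=0, top_n=2))  # [2, 3]
print(top_similar(scores, query_idx=2, top_n=2))  # [0, 3]
```

In the second call, slicing off the first sorted entry would have removed index 0 and kept the selected book in its own recommendations.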


In [35]:
# Display recommendations
selected_title = df_books['Title'][0]  # For example, using the first book in the list
recommended_books = get_recommendations(selected_title, similarity_matrix, df_books)
print(f"Books similar to '{selected_title}':\n", recommended_books)


Books similar to 'Harry Potter and the Cursed Child':
                                       Title  \
4    Harry Potter and the Half-Blood Prince   
2   Harry Potter and the Chamber of Secrets   
1  Harry Potter and the Prisoner of Azkaban   
8                Harry Potter and the Other   
7               Harry Potter: A Pop-Up Book   

                                             Summary  \
4  There it was, hanging in the sky above the sch...   
2  'There is a plot, Harry Potter. A plot to make...   
1  'Welcome to the Knight Bus, emergency transpor...   
8  Contributions by Christina M. Chica, Kathryn C...   
7  The first ever illustrated Harry Potter pop-up...   

                                     Authors Published Date  \
4                               J.K. Rowling            N/A   
2                               J.K. Rowling            N/A   
1                               J.K. Rowling            N/A   
8  Sarah Park Dahlen, Ebony Elizabeth Thomas            N/A   
7         

In [36]:
# Add simulated user ratings (random integers from 1 to 5, a placeholder for real rating data)
user_ratings = {title: rating for title, rating in zip(df_books['Title'], np.random.randint(1, 6, len(df_books)))}
df_books['User Ratings'] = df_books['Title'].apply(lambda x: user_ratings.get(x, 0))


In [37]:
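# Incorporate user ratings into the recommendation
def get_recommendations_with_ratings(title, similarity_matrix, df_books, top_n=5):
    book_idx = df_books[df_books['Title'] == title].index[0]
    similarity_scores = list(enumerate(similarity_matrix[book_idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    top_books_indices = [i[0] for i in similarity_scores[1:top_n+1]]
    # Copy the slice so that adding columns does not trigger SettingWithCopyWarning
    top_books = df_books.iloc[top_books_indices].copy()
    # Similarity to the selected book, taken from its row of the matrix
    top_books['Similarity'] = similarity_matrix[book_idx][top_books_indices]
    # Equal-weight blend; ratings span 1-5 while similarity is at most 1, so ratings dominate
    top_books['Weighted Score'] = top_books['User Ratings'] * 0.5 + top_books['Similarity'] * 0.5
    top_books = top_books.sort_values(by='Weighted Score', ascending=False)
    return top_books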
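The weighted score adds a rating on a 1 to 5 scale to a similarity that is at most 1, so the rating term carries most of the weight. One possible adjustment, sketched here as an assumption rather than the notebook's method, is to rescale the rating to the 0 to 1 range before blending. The function name weighted_score and its parameters are hypothetical.

```python
def weighted_score(rating, similarity, rating_weight=0.5, max_rating=5):
    """Blend a rating in 1..max_rating with a similarity score on a common 0..1 scale."""
    if not 0.0 <= rating_weight <= 1.0:
        raise ValueError("rating_weight must be between 0 and 1")
    normalized_rating = rating / max_rating
    return rating_weight * normalized_rating + (1 - rating_weight) * similarity


print(round(weighted_score(5, 0.8), 2))  # 0.9
print(round(weighted_score(1, 0.8), 2))  # 0.5
```

With both terms on the same scale, rating_weight directly expresses how much the rating should count relative to textual similarity.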


In [38]:
# Display recommendations with user ratings
selected_title = df_books['Title'][0]  # For example, using the first book in the list
df_books['Similarity'] = similarity_matrix[0]
recommended_books_with_ratings = get_recommendations_with_ratings(selected_title, similarity_matrix, df_books)
print(f"Books similar to '{selected_title}' with user ratings:\n", recommended_books_with_ratings)


Books similar to 'Harry Potter and the Cursed Child' with user ratings:
                                       Title  \
2   Harry Potter and the Chamber of Secrets   
1  Harry Potter and the Prisoner of Azkaban   
8                Harry Potter and the Other   
4    Harry Potter and the Half-Blood Prince   
7               Harry Potter: A Pop-Up Book   

                                             Summary  \
2  'There is a plot, Harry Potter. A plot to make...   
1  'Welcome to the Knight Bus, emergency transpor...   
8  Contributions by Christina M. Chica, Kathryn C...   
4  There it was, hanging in the sky above the sch...   
7  The first ever illustrated Harry Potter pop-up...   

                                     Authors Published Date  \
2                               J.K. Rowling            N/A   
1                               J.K. Rowling            N/A   
8  Sarah Park Dahlen, Ebony Elizabeth Thomas            N/A   
4                               J.K. Rowling           

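pandas emits the following SettingWithCopyWarning when the 'Weighted Score' column is added to a slice of df_books that has not been copied with .copy():

A value is trying to be set on a copy of a slice from a DataFrame.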
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  top_books['Weighted Score'] = top_books['User Ratings'] * 0.5 + top_books['Similarity'] * 0.5
