# 4.0 — Sentiment Analysis for the Book Recommender Project  
<img src="./df8ab45d-e611-447d-b7f0-c962fd19343c (1).png" width="350" height="300">  

In this notebook, we will explore **sentiment analysis** as part of our **semantic book recommender system**.  
The goal is to **analyze the emotional tone of book descriptions or reviews**, which can help improve recommendations by understanding **user preferences and book sentiments**.

###  What is Sentiment Analysis?

- **Sentiment analysis** is a **natural language processing (NLP) task** that determines the emotional tone of a piece of text.  
- Typically, it classifies text as:
  - **Positive**
  - **Negative**
  - **Neutral**  
- Advanced models can also detect more nuanced emotions, such as **joy, anger, sadness, or excitement**.

###  Why Use Sentiment Analysis in a Book Recommender?

- Helps **enhance user experience** by recommending books that match the user’s preferred emotional tone.  
- Allows filtering books by sentiment, e.g., recommending **uplifting books** or avoiding **negative-tone books**.  
- Supports **semantic search** by adding an **emotional dimension** to purely semantic similarity.  
- Can be used to **analyze user reviews** and adjust recommendations based on feedback.

###  Dataset

- We will use the **cleaned books dataset** and optionally include **user reviews or descriptions**.  
- Each text chunk will be analyzed for sentiment, which can later be integrated into the **recommendation scoring**.

###  What We Will Do in This Notebook

- Use **Hugging Face pipelines** to perform sentiment analysis on book descriptions or reviews.  
- Assign a **sentiment score or label** to each book.  
- Explore how sentiment can enhance the **semantic search and recommendation system**.  


In [1]:
import pandas as pd

books = pd.read_csv("books_with_categories.csv")

In [None]:
# Import the pipeline function from the Hugging Face Transformers library

# Initialize a text classification pipeline
# - Task: "text-classification" (we want to classify emotions in text)
# - Model: "j-hartmann/emotion-english-distilroberta-base" (pre-trained for emotion detection)
# - top_k=None: return all emotion labels instead of just the top ones
# - device=-1: run on CPU (-1), set to 0 to use GPU if available

# Use the classifier to predict emotions in the input text
# Example input: "I love this!"

# Print the classification result


In [5]:
from transformers import pipeline
classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base",
                      top_k = None,
                      device = -1)
classifier("I love this!")

Device set to use cpu


[[{'label': 'joy', 'score': 0.9771687984466553},
  {'label': 'surprise', 'score': 0.008528684265911579},
  {'label': 'neutral', 'score': 0.005764586851000786},
  {'label': 'anger', 'score': 0.004419783595949411},
  {'label': 'sadness', 'score': 0.002092392183840275},
  {'label': 'disgust', 'score': 0.0016119900392368436},
  {'label': 'fear', 'score': 0.0004138521908316761}]]

In [None]:
# Access the first element of the "description" column in the books DataFrame
# - "books": the DataFrame containing information about books
# - "description": the column with text descriptions of each book
# - [0]: selects the first row (index 0) from that column
# Example output: the text description of the first book in the dataset


In [6]:
books["description"][0]

'A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives. John Ames is a preacher, the son of a preacher and the grandson (both maternal and paternal) of preachers. It’s 1956 in Gilead, Iowa, towards the end of the Reverend Ames’s life, and he is absorbed in recording his family’s story, a legacy for the young son he will never see grow up. Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist. He is troubled, too, by his prodigal namesake, Jack (John Ames) Boughton, his best friend’s lost son who returns to Gilead searching for forgiveness and redemption. Told in John Ames’s joyous, rambling voice that finds beauty, humour and truth in the smallest of life’s details, Gilead is a song of celebration and acceptance of the best and the worst the world ha

In [None]:
# Use the previously initialized text-classification pipeline to predict emotions
# - Input: the description of the first book from the books DataFrame
#   (books["description"][0] retrieves the text)
# - The classifier analyzes the text and returns the predicted emotions with their scores
# - Example output: a list of emotions with associated probabilities for the first book description


In [7]:
classifier(books["description"][0])

[[{'label': 'fear', 'score': 0.6548399925231934},
  {'label': 'neutral', 'score': 0.1698525995016098},
  {'label': 'sadness', 'score': 0.11640939861536026},
  {'label': 'surprise', 'score': 0.02070068009197712},
  {'label': 'disgust', 'score': 0.019100721925497055},
  {'label': 'joy', 'score': 0.015161462128162384},
  {'label': 'anger', 'score': 0.003935154061764479}]]

In [None]:
# Use the text-classification pipeline to predict emotions for multiple sentences
# - books["description"][0]: retrieves the first book's description
# - .split("."): splits the description into a list of sentences using the period as a separator
# - classifier(...): analyzes each sentence individually and returns predicted emotions with their scores
# - Example output: a list of emotion predictions for each sentence in the first book description


In [8]:
classifier(books["description"][0].split("."))

[[{'label': 'surprise', 'score': 0.729602038860321},
  {'label': 'neutral', 'score': 0.14038607478141785},
  {'label': 'fear', 'score': 0.06816229224205017},
  {'label': 'joy', 'score': 0.04794258251786232},
  {'label': 'anger', 'score': 0.009156374260783195},
  {'label': 'disgust', 'score': 0.002628477755934},
  {'label': 'sadness', 'score': 0.0021221644710749388}],
 [{'label': 'neutral', 'score': 0.44937077164649963},
  {'label': 'disgust', 'score': 0.27359139919281006},
  {'label': 'joy', 'score': 0.10908306390047073},
  {'label': 'sadness', 'score': 0.09362738579511642},
  {'label': 'anger', 'score': 0.040478240698575974},
  {'label': 'surprise', 'score': 0.02697017788887024},
  {'label': 'fear', 'score': 0.0068790484219789505}],
 [{'label': 'neutral', 'score': 0.6462157964706421},
  {'label': 'sadness', 'score': 0.24273352324962616},
  {'label': 'disgust', 'score': 0.04342266544699669},
  {'label': 'surprise', 'score': 0.028300534933805466},
  {'label': 'joy', 'score': 0.014211485

In [None]:
# Split the first book's description into individual sentences
# - Retrieve the text of the first book's description
# - Split the text at each period to create a list of sentences

# Predict emotions for each sentence
# - Use the text-classification pipeline to analyze each sentence individually
# - Obtain a list of predictions, where each element contains the emotions and their associated scores for one sentence


In [9]:
sentences = books["description"][0].split(".")
predictions = classifier(sentences)

In [10]:
sentences[0]

'A NOVEL THAT READERS and critics have been eagerly anticipating for over a decade, Gilead is an astonishingly imagined story of remarkable lives'

In [11]:
predictions[0]

[{'label': 'surprise', 'score': 0.729602038860321},
 {'label': 'neutral', 'score': 0.14038607478141785},
 {'label': 'fear', 'score': 0.06816229224205017},
 {'label': 'joy', 'score': 0.04794258251786232},
 {'label': 'anger', 'score': 0.009156374260783195},
 {'label': 'disgust', 'score': 0.002628477755934},
 {'label': 'sadness', 'score': 0.0021221644710749388}]

In [12]:
sentences[3]

' Haunted by his grandfather’s presence, John tells of the rift between his grandfather and his father: the elder, an angry visionary who fought for the abolitionist cause, and his son, an ardent pacifist'

In [13]:
predictions[3]

[{'label': 'fear', 'score': 0.9281681180000305},
 {'label': 'anger', 'score': 0.03219093754887581},
 {'label': 'neutral', 'score': 0.01280868798494339},
 {'label': 'sadness', 'score': 0.008756876923143864},
 {'label': 'surprise', 'score': 0.008597906678915024},
 {'label': 'disgust', 'score': 0.008431830443441868},
 {'label': 'joy', 'score': 0.0010455821175128222}]

In [14]:
predictions

[[{'label': 'surprise', 'score': 0.729602038860321},
  {'label': 'neutral', 'score': 0.14038607478141785},
  {'label': 'fear', 'score': 0.06816229224205017},
  {'label': 'joy', 'score': 0.04794258251786232},
  {'label': 'anger', 'score': 0.009156374260783195},
  {'label': 'disgust', 'score': 0.002628477755934},
  {'label': 'sadness', 'score': 0.0021221644710749388}],
 [{'label': 'neutral', 'score': 0.44937077164649963},
  {'label': 'disgust', 'score': 0.27359139919281006},
  {'label': 'joy', 'score': 0.10908306390047073},
  {'label': 'sadness', 'score': 0.09362738579511642},
  {'label': 'anger', 'score': 0.040478240698575974},
  {'label': 'surprise', 'score': 0.02697017788887024},
  {'label': 'fear', 'score': 0.0068790484219789505}],
 [{'label': 'neutral', 'score': 0.6462157964706421},
  {'label': 'sadness', 'score': 0.24273352324962616},
  {'label': 'disgust', 'score': 0.04342266544699669},
  {'label': 'surprise', 'score': 0.028300534933805466},
  {'label': 'joy', 'score': 0.014211485

In [15]:
sorted(predictions[0], key=lambda x: x["label"])

[{'label': 'anger', 'score': 0.009156374260783195},
 {'label': 'disgust', 'score': 0.002628477755934},
 {'label': 'fear', 'score': 0.06816229224205017},
 {'label': 'joy', 'score': 0.04794258251786232},
 {'label': 'neutral', 'score': 0.14038607478141785},
 {'label': 'sadness', 'score': 0.0021221644710749388},
 {'label': 'surprise', 'score': 0.729602038860321}]

In [None]:
# Import NumPy for numerical operations

# Define the list of emotion labels used by the classifier
# - anger, disgust, fear, joy, sadness, surprise, neutral

# Initialize an empty list to store book identifiers (e.g., ISBNs)

# Initialize a dictionary to store emotion scores for each label
# - Each key is an emotion label
# - Each value is an empty list to store scores for multiple books

# Define a function to calculate the maximum score for each emotion across all sentences of a book

# Inside the function:
# - Initialize a temporary dictionary to store scores for each emotion
# - Iterate over the predictions for each sentence
#   - Sort the prediction list by label to match the order in emotion_labels
#   - For each emotion, append the corresponding score from the sorted predictions
# - Return a dictionary containing the maximum score for each emotion


In [16]:
import numpy as np

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

def calculate_max_emotion_scores(predictions):
    per_emotion_scores = {label: [] for label in emotion_labels}
    for prediction in predictions:
        sorted_predictions = sorted(prediction, key=lambda x: x["label"])
        for index, label in enumerate(emotion_labels):
            per_emotion_scores[label].append(sorted_predictions[index]["score"])
    return {label: np.max(scores) for label, scores in per_emotion_scores.items()}

In [None]:
# Loop over the first 10 books in the dataset

# For each book:
# - Append the book's ISBN-13 to the isbn list
# - Split the book's description into individual sentences using periods as separators
# - Use the text-classification pipeline to predict emotions for each sentence
# - Calculate the maximum emotion score for each label across all sentences

# For each emotion label:
# - Append the maximum score to the corresponding list in the emotion_scores dictionary


In [17]:
for i in range(10):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

In [18]:
emotion_scores

{'anger': [np.float64(0.06413355469703674),
  np.float64(0.6126194000244141),
  np.float64(0.06413355469703674),
  np.float64(0.35148441791534424),
  np.float64(0.08141238987445831),
  np.float64(0.23222465813159943),
  np.float64(0.5381842255592346),
  np.float64(0.06413355469703674),
  np.float64(0.3006700873374939),
  np.float64(0.06413355469703674)],
 'disgust': [np.float64(0.27359139919281006),
  np.float64(0.3482847809791565),
  np.float64(0.10400658845901489),
  np.float64(0.1507224589586258),
  np.float64(0.18449527025222778),
  np.float64(0.7271749377250671),
  np.float64(0.155854731798172),
  np.float64(0.10400658845901489),
  np.float64(0.279481440782547),
  np.float64(0.17792704701423645)],
 'fear': [np.float64(0.9281681180000305),
  np.float64(0.9425276517868042),
  np.float64(0.9723208546638489),
  np.float64(0.36070606112480164),
  np.float64(0.09504339843988419),
  np.float64(0.05136274918913841),
  np.float64(0.7474274635314941),
  np.float64(0.40449756383895874),
  np

In [None]:
# Import tqdm to display a progress bar during loops

# Define the list of emotion labels
# - anger, disgust, fear, joy, sadness, surprise, neutral

# Initialize an empty list to store book identifiers (ISBN-13)

# Initialize a dictionary to store emotion scores for each label
# - Each key is an emotion label
# - Each value is an empty list to store scores for multiple books

# Loop over all books in the dataset with a progress bar using tqdm

# For each book:
# - Append the book's ISBN-13 to the isbn list
# - Split the book's description into individual sentences
# - Use the text-classification pipeline to predict emotions for each sentence
# - Calculate the maximum emotion score for each label across all sentences

# For each emotion label:
# - Append the maximum score to the corresponding list in the emotion_scores dictionary


In [19]:
from tqdm import tqdm

emotion_labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
isbn = []
emotion_scores = {label: [] for label in emotion_labels}

for i in tqdm(range(len(books))):
    isbn.append(books["isbn13"][i])
    sentences = books["description"][i].split(".")
    predictions = classifier(sentences)
    max_scores = calculate_max_emotion_scores(predictions)
    for label in emotion_labels:
        emotion_scores[label].append(max_scores[label])

100%|██████████| 5197/5197 [36:56<00:00,  2.34it/s]  


In [20]:
emotions_df = pd.DataFrame(emotion_scores)
emotions_df["isbn13"] = isbn

In [21]:
emotions_df

Unnamed: 0,anger,disgust,fear,joy,sadness,surprise,neutral,isbn13
0,0.064134,0.273591,0.928168,0.932798,0.646216,0.967158,0.729602,9780002005883
1,0.612619,0.348285,0.942528,0.704422,0.887940,0.111690,0.252546,9780002261982
2,0.064134,0.104007,0.972321,0.767238,0.549477,0.111690,0.078765,9780006178736
3,0.351484,0.150722,0.360706,0.251881,0.732685,0.111690,0.078765,9780006280897
4,0.081412,0.184495,0.095043,0.040564,0.884390,0.475880,0.078765,9780006280934
...,...,...,...,...,...,...,...,...
5192,0.148208,0.030643,0.919165,0.255172,0.853721,0.980877,0.030656,9788172235222
5193,0.064134,0.114383,0.051363,0.400263,0.883198,0.111690,0.227765,9788173031014
5194,0.009997,0.009929,0.339218,0.947779,0.375754,0.066685,0.057625,9788179921623
5195,0.064134,0.104007,0.459268,0.759456,0.951104,0.368111,0.078765,9788185300535


In [22]:
books = pd.merge(books, emotions_df, on = "isbn13")

In [23]:
books

Unnamed: 0,isbn13,isbn10,title,authors,categories,thumbnail,description,published_year,average_rating,num_pages,...,title_and_subtitle,tagged_description,simple_categories,anger,disgust,fear,joy,sadness,surprise,neutral
0,9780002005883,0002005883,Gilead,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,...,Gilead,9780002005883 A NOVEL THAT READERS and critics...,Fiction,0.064134,0.273591,0.928168,0.932798,0.646216,0.967158,0.729602
1,9780002261982,0002261987,Spider's Web,Charles Osborne;Agatha Christie,Detective and mystery stories,http://books.google.com/books/content?id=gA5GP...,A new 'Christie for Christmas' -- a full-lengt...,2000.0,3.83,241.0,...,Spider's Web: A Novel,9780002261982 A new 'Christie for Christmas' -...,Fiction,0.612619,0.348285,0.942528,0.704422,0.887940,0.111690,0.252546
2,9780006178736,0006178731,Rage of angels,Sidney Sheldon,Fiction,http://books.google.com/books/content?id=FKo2T...,"A memorable, mesmerizing heroine Jennifer -- b...",1993.0,3.93,512.0,...,Rage of angels,"9780006178736 A memorable, mesmerizing heroine...",Fiction,0.064134,0.104007,0.972321,0.767238,0.549477,0.111690,0.078765
3,9780006280897,0006280897,The Four Loves,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=XhQ5X...,Lewis' work on the nature of love divides love...,2002.0,4.15,170.0,...,The Four Loves,9780006280897 Lewis' work on the nature of lov...,Nonfiction,0.351484,0.150722,0.360706,0.251881,0.732685,0.111690,0.078765
4,9780006280934,0006280935,The Problem of Pain,Clive Staples Lewis,Christian life,http://books.google.com/books/content?id=Kk-uV...,"""In The Problem of Pain, C.S. Lewis, one of th...",2002.0,4.09,176.0,...,The Problem of Pain,"9780006280934 ""In The Problem of Pain, C.S. Le...",Nonfiction,0.081412,0.184495,0.095043,0.040564,0.884390,0.475880,0.078765
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5192,9788172235222,8172235224,Mistaken Identity,Nayantara Sahgal,Indic fiction (English),http://books.google.com/books/content?id=q-tKP...,On A Train Journey Home To North India After L...,2003.0,2.93,324.0,...,Mistaken Identity,9788172235222 On A Train Journey Home To North...,Fiction,0.148208,0.030643,0.919165,0.255172,0.853721,0.980877,0.030656
5193,9788173031014,8173031010,Journey to the East,Hermann Hesse,Adventure stories,http://books.google.com/books/content?id=rq6JP...,This book tells the tale of a man who goes on ...,2002.0,3.70,175.0,...,Journey to the East,9788173031014 This book tells the tale of a ma...,Nonfiction,0.064134,0.114383,0.051363,0.400263,0.883198,0.111690,0.227765
5194,9788179921623,817992162X,The Monk Who Sold His Ferrari: A Fable About F...,Robin Sharma,Health & Fitness,http://books.google.com/books/content?id=c_7mf...,"Wisdom to Create a Life of Passion, Purpose, a...",2003.0,3.82,198.0,...,The Monk Who Sold His Ferrari: A Fable About F...,9788179921623 Wisdom to Create a Life of Passi...,Fiction,0.009997,0.009929,0.339218,0.947779,0.375754,0.066685,0.057625
5195,9788185300535,8185300534,I Am that,Sri Nisargadatta Maharaj;Sudhakar S. Dikshit,Philosophy,http://books.google.com/books/content?id=Fv_JP...,This collection of the timeless teachings of o...,1999.0,4.51,531.0,...,I Am that: Talks with Sri Nisargadatta Maharaj,9788185300535 This collection of the timeless ...,Nonfiction,0.064134,0.104007,0.459268,0.759456,0.951104,0.368111,0.078765


In [24]:
books.to_csv("books_with_emotions.csv", index = False)