# <h1 align="center"><font color="red">spaCy Sentiment Analysis</font></h1>

<font color="pink">Senior Data Scientist: Dr. Eddy Giusepe Chirinos Isidro</font>

Study links:

* [spaCy](https://github.com/AI-Republic-PH/AIR_AI_Engineering_Course_2024/blob/main/Day1/Activity1_SpacySentimentAnalysis.ipynb)

# <font color="green">Setup</font>

In [None]:
%pip install spacy spacytextblob # https://spacy.io/universe/project/spacy-textblob
!python -m spacy download en_core_web_sm

In [1]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
import pandas as pd
import string
from tqdm import tqdm

# Apply tqdm to pandas for progress tracking
tqdm.pandas()

In [2]:
# Load the spaCy language model and add the spacytextblob component for sentiment analysis
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

<spacytextblob.spacytextblob.SpacyTextBlob at 0x74977dff4a00>

In [3]:
# Preprocess a review with spaCy: drop stop words and non-alphabetic tokens, then lemmatize
def preprocess_text(text):
    # Run the text through the spaCy pipeline. spaCy does not lowercase the text itself:
    # lemmas of common words typically come out lowercased, but proper nouns keep their case.
    doc = nlp(text)

    # Use a list comprehension to filter and lemmatize the tokens
    tokens = [token.lemma_ for token in doc if not token.is_stop and token.is_alpha]

    # Join the tokens back into a single string
    return " ".join(tokens)

In [4]:
# Function to classify the sentiment score as Positive, Negative, or Neutral
def classify_sentiment(score):
    if score > 0:
        return "positive"
    elif score < 0:
        return "negative"
    else:
        return "neutral"

In [5]:
# Load the IMDB dataset from GitHub
url = "https://github.com/angelaaaateng/AIR_AI_Engineering_Course_2024/raw/refs/heads/main/Datasets/IMDB_Dataset.csv"
df = pd.read_csv(url)

In [6]:
# Randomly sample 1000 entries, since processing the full dataset takes a long time
df_sampled = df.sample(n=1000, random_state=42)

In [7]:
df_sampled.head(5).style

Unnamed: 0,review,sentiment
33553,"I really liked this Summerslam due to the look of the arena, the curtains and just the look overall was interesting to me for some reason. Anyways, this could have been one of the best Summerslam's ever if the WWF didn't have Lex Luger in the main event against Yokozuna, now for it's time it was ok to have a huge fat man vs a strong man but I'm glad times have changed. It was a terrible main event just like every match Luger is in is terrible. Other matches on the card were Razor Ramon vs Ted Dibiase, Steiner Brothers vs Heavenly Bodies, Shawn Michaels vs Curt Hening, this was the event where Shawn named his big monster of a body guard Diesel, IRS vs 1-2-3 Kid, Bret Hart first takes on Doink then takes on Jerry Lawler and stuff with the Harts and Lawler was always very interesting, then Ludvig Borga destroyed Marty Jannetty, Undertaker took on Giant Gonzalez in another terrible match, The Smoking Gunns and Tatanka took on Bam Bam Bigelow and the Headshrinkers, and Yokozuna defended the world title against Lex Luger this match was boring and it has a terrible ending. However it deserves 8/10",positive
9427,"Not many television shows appeal to quite as many different kinds of fans like Farscape does...I know youngsters and 30/40+ years old;fans both Male and Female in as many different countries as you can think of that just adore this T.V miniseries. It has elements that can be found in almost every other show on T.V, character driven drama that could be from an Australian soap opera; yet in the same episode it has science fact & fiction that would give even the hardiest ""Trekkie"" a run for his money in the brainbender stakes! Wormhole theory, Time Travel in true equational form...Magnificent. It embraces cultures from all over the map as the possibilities are endless having multiple stars and therefore thousands of planets to choose from. With such a broad scope; it would be expected that nothing would be able to keep up the illusion for long, but here is where ""Farscape"" really comes into it's own element...It succeeds where all others have failed, especially the likes of Star Trek (a universe with practically zero Kaos element!) They ran out of ideas pretty quickly + kept rehashing them! Over the course of 4 seasons they manage to keep the audience's attention using good continuity and constant character evolution with multiple threads to every episode with unique personal touches to camera that are specific to certain character groups within the whole. This structure allows for an extremely large area of subject matter as loyalties are forged and broken in many ways on many many issues. I happened to see the pilot (Premiere) in passing and just had to keep tuning in after that to see if Crichton would ever ""Get the girl"", after seeing them all on television I was delighted to see them available on DVD & I have to admit that it was the only thing that kept me sane whilst I had to do a 12 hour night shift and developed chronic insomnia...Farscape was the only thing to get me through those extremely long nights... 
Do yourself a favour; Watch the pilot and see what I mean... Farscape Comet",positive
199,"The film quickly gets to a major chase scene with ever increasing destruction. The first really bad thing is the guy hijacking Steven Seagal would have been beaten to pulp by Seagal's driving, but that probably would have ended the whole premise for the movie. It seems like they decided to make all kinds of changes in the movie plot, so just plan to enjoy the action, and do not expect a coherent plot. Turn any sense of logic you may have, it will reduce your chance of getting a headache. I does give me some hope that Steven Seagal is trying to move back towards the type of characters he portrayed in his more popular movies.",negative
12447,"Jane Austen would definitely approve of this one! Gwyneth Paltrow does an awesome job capturing the attitude of Emma. She is funny without being excessively silly, yet elegant. She puts on a very convincing British accent (not being British myself, maybe I'm not the best judge, but she fooled me...she was also excellent in ""Sliding Doors""...I sometimes forget she's American ~!). Also brilliant are Jeremy Northam and Sophie Thompson and Phyllida Law (Emma Thompson's sister and mother) as the Bates women. They nearly steal the show...and Ms. Law doesn't even have any lines! Highly recommended.",positive
39489,"Expectations were somewhat high for me when I went to see this movie, after all I thought Steve Carell could do no wrong coming off of great movies like Anchorman, The 40 Year-Old Virgin, and Little Miss Sunshine. Boy, was I wrong. I'll start with what is right with this movie: at certain points Steve Carell is allowed to be Steve Carell. There are a handful of moments in the film that made me laugh, and it's due almost entirely to him being given the wiggle-room to do his thing. He's an undoubtedly talented individual, and it's a shame that he signed on to what turned out to be, in my opinion, a total train-wreck. With that out of the way, I'll discuss what went horrifyingly wrong. The film begins with Dan Burns, a widower with three girls who is being considered for a nationally syndicated advice column. He prepares his girls for a family reunion, where his extended relatives gather for some time with each other. The family is high atop the list of things that make this an awful movie. No family behaves like this. It's almost as if they've been transported from Pleasantville or Leave it to Beaver. They are a caricature of what we think a family is when we're 7. It reaches the point where they become obnoxious and simply frustrating. Touch football, crossword puzzle competitions, family bowling, and talent shows ARE NOT HOW ACTUAL PEOPLE BEHAVE. It's almost sickening. Another big flaw is the woman Carell is supposed to be falling for. Observing her in her first scene with Steve Carell is like watching a stroke victim trying to be rehabilitated. What I imagine is supposed to be unique and original in this woman comes off as mildly retarded. It makes me think that this movie is taking place on another planet. I left the theater wondering what I just saw. After thinking further, I don't think it was much.",negative


In [8]:
print(df.size)
print(df_sampled.size)

100000
2000


In [9]:
print(df.shape)
print(df_sampled.shape)



(50000, 2)
(1000, 2)
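
The `size` and `shape` outputs above agree with each other, because `DataFrame.size` counts cells, that is, rows times columns. A quick arithmetic check in plain Python, with the shapes copied from the output above (the helper name `cell_count` is only illustrative):

```python
# Shapes copied from the notebook output above
full_shape = (50000, 2)
sampled_shape = (1000, 2)


def cell_count(shape):
    """Return rows * columns, which is what DataFrame.size reports."""
    rows, cols = shape
    return rows * cols


print(cell_count(full_shape))     # 100000
print(cell_count(sampled_shape))  # 2000
```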


A sentiment score greater than 0 is labeled `"positive"`.

A sentiment score less than 0 is labeled `"negative"`.

A sentiment score of exactly 0 is labeled `"neutral"`.
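
As a small worked example of these three rules, the sketch below repeats the `classify_sentiment` function defined earlier in this notebook and applies it to a few scores. Three of the scores are taken from the notebook's own output; the `0.0` entry is an assumed value added only to exercise the neutral branch.

```python
def classify_sentiment(score):
    """Map a polarity score to a label using the sign of the score."""
    if score > 0:
        return "positive"
    elif score < 0:
        return "negative"
    else:
        return "neutral"


# -0.074074, 0.143395 and 0.03221 appear in the notebook output;
# 0.0 is an assumed value used to show the neutral case.
scores = [-0.074074, 0.143395, 0.03221, 0.0]
labels = [classify_sentiment(s) for s in scores]

for score, label in zip(scores, labels):
    print(f"{score:+.6f} -> {label}")
```

Note that a score barely above zero, such as 0.03221, still counts as `"positive"`, because only the sign of the score matters.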


In [10]:
# Preprocess the reviews with a progress bar.
# progress_apply is the tqdm version of the pandas apply method that shows a progress bar while it runs.
# Without a progress bar: df['cleaned_review'] = df['review'].apply(preprocess_text)
df_sampled['cleaned_review'] = df_sampled['review'].progress_apply(preprocess_text)
df_sampled.head()


100%|██████████| 1000/1000 [00:20<00:00, 48.99it/s]


| | review | sentiment | cleaned_review |
|---|---|---|---|
| 33553 | I really liked this Summerslam due to the look... | positive | like Summerslam look arena curtain look overal... |
| 9427 | Not many television shows appeal to quite as m... | positive | television show appeal different kind fan like... |
| 199 | The film quickly gets to a major chase scene w... | negative | film quickly get major chase scene increase de... |
| 12447 | Jane Austen would definitely approve of this o... | positive | Jane Austen definitely approve Paltrow awesome... |
| 39489 | Expectations were somewhat high for me when I ... | negative | expectation somewhat high go movie think Steve... |


In [11]:
df_sampled.head(2).style

Unnamed: 0,review,sentiment,cleaned_review
33553,"I really liked this Summerslam due to the look of the arena, the curtains and just the look overall was interesting to me for some reason. Anyways, this could have been one of the best Summerslam's ever if the WWF didn't have Lex Luger in the main event against Yokozuna, now for it's time it was ok to have a huge fat man vs a strong man but I'm glad times have changed. It was a terrible main event just like every match Luger is in is terrible. Other matches on the card were Razor Ramon vs Ted Dibiase, Steiner Brothers vs Heavenly Bodies, Shawn Michaels vs Curt Hening, this was the event where Shawn named his big monster of a body guard Diesel, IRS vs 1-2-3 Kid, Bret Hart first takes on Doink then takes on Jerry Lawler and stuff with the Harts and Lawler was always very interesting, then Ludvig Borga destroyed Marty Jannetty, Undertaker took on Giant Gonzalez in another terrible match, The Smoking Gunns and Tatanka took on Bam Bam Bigelow and the Headshrinkers, and Yokozuna defended the world title against Lex Luger this match was boring and it has a terrible ending. However it deserves 8/10",positive,like Summerslam look arena curtain look overall interesting reason anyways good Summerslam WWF Lex Luger main event Yokozuna time ok huge fat man vs strong man glad time change terrible main event like match Luger terrible match card Razor Ramon vs Ted Dibiase Steiner Brothers vs Heavenly Bodies Shawn Michaels vs Curt Hening event Shawn name big monster body guard Diesel IRS vs kid Bret Hart take Doink take Jerry Lawler stuff Harts Lawler interesting Ludvig Borga destroy Marty Jannetty Undertaker take Giant Gonzalez terrible match Smoking Gunns Tatanka take Bam Bam Bigelow Headshrinkers Yokozuna defend world title Lex Luger match boring terrible ending deserve
9427,"Not many television shows appeal to quite as many different kinds of fans like Farscape does...I know youngsters and 30/40+ years old;fans both Male and Female in as many different countries as you can think of that just adore this T.V miniseries. It has elements that can be found in almost every other show on T.V, character driven drama that could be from an Australian soap opera; yet in the same episode it has science fact & fiction that would give even the hardiest ""Trekkie"" a run for his money in the brainbender stakes! Wormhole theory, Time Travel in true equational form...Magnificent. It embraces cultures from all over the map as the possibilities are endless having multiple stars and therefore thousands of planets to choose from. With such a broad scope; it would be expected that nothing would be able to keep up the illusion for long, but here is where ""Farscape"" really comes into it's own element...It succeeds where all others have failed, especially the likes of Star Trek (a universe with practically zero Kaos element!) They ran out of ideas pretty quickly + kept rehashing them! Over the course of 4 seasons they manage to keep the audience's attention using good continuity and constant character evolution with multiple threads to every episode with unique personal touches to camera that are specific to certain character groups within the whole. This structure allows for an extremely large area of subject matter as loyalties are forged and broken in many ways on many many issues. I happened to see the pilot (Premiere) in passing and just had to keep tuning in after that to see if Crichton would ever ""Get the girl"", after seeing them all on television I was delighted to see them available on DVD & I have to admit that it was the only thing that kept me sane whilst I had to do a 12 hour night shift and developed chronic insomnia...Farscape was the only thing to get me through those extremely long nights... 
Do yourself a favour; Watch the pilot and see what I mean... Farscape Comet",positive,television show appeal different kind fan like Farscape know youngster year Male Female different country think adore miniserie element find character drive drama australian soap opera episode science fact fiction hardy Trekkie run money brainbender stake wormhole theory Time Travel true equational form magnificent embrace culture map possibility endless have multiple star thousand planet choose broad scope expect able illusion long Farscape come element succeed fail especially like Star Trek universe practically zero Kaos element run idea pretty quickly keep rehash course season manage audience attention good continuity constant character evolution multiple thread episode unique personal touch camera specific certain character group structure allow extremely large area subject matter loyalty forge break way issue happen pilot Premiere pass tune Crichton girl see television delighted available DVD admit thing keep sane whilst hour night shift develop chronic insomnia farscape thing extremely long night favour watch pilot mean Comet


In [15]:
# Perform sentiment analysis on the preprocessed reviews
df_sampled['sentiment_score'] = df_sampled['cleaned_review'].progress_apply(lambda review: nlp(review)._.blob.polarity)
df_sampled.head()

100%|██████████| 1000/1000 [00:09<00:00, 108.64it/s]


| | review | sentiment | cleaned_review | sentiment_score |
|---|---|---|---|---|
| 33553 | I really liked this Summerslam due to the look... | positive | like Summerslam look arena curtain look overal... | -0.074074 |
| 9427 | Not many television shows appeal to quite as m... | positive | television show appeal different kind fan like... | 0.143395 |
| 199 | The film quickly gets to a major chase scene w... | negative | film quickly get major chase scene increase de... | 0.236979 |
| 12447 | Jane Austen would definitely approve of this o... | positive | Jane Austen definitely approve Paltrow awesome... | 0.342308 |
| 39489 | Expectations were somewhat high for me when I ... | negative | expectation somewhat high go movie think Steve... | 0.03221 |


In [16]:
# Classify the sentiment based on the score
df_sampled['sentiment_label'] = df_sampled['sentiment_score'].progress_apply(classify_sentiment)
df_sampled.head()

100%|██████████| 1000/1000 [00:00<00:00, 1358259.07it/s]


| | review | sentiment | cleaned_review | sentiment_score | sentiment_label |
|---|---|---|---|---|---|
| 33553 | I really liked this Summerslam due to the look... | positive | like Summerslam look arena curtain look overal... | -0.074074 | negative |
| 9427 | Not many television shows appeal to quite as m... | positive | television show appeal different kind fan like... | 0.143395 | positive |
| 199 | The film quickly gets to a major chase scene w... | negative | film quickly get major chase scene increase de... | 0.236979 | positive |
| 12447 | Jane Austen would definitely approve of this o... | positive | Jane Austen definitely approve Paltrow awesome... | 0.342308 | positive |
| 39489 | Expectations were somewhat high for me when I ... | negative | expectation somewhat high go movie think Steve... | 0.03221 | positive |


In [17]:
# Display the results
df_sampled[['review', 'cleaned_review', 'sentiment_score', 'sentiment_label']].head()

| | review | cleaned_review | sentiment_score | sentiment_label |
|---|---|---|---|---|
| 33553 | I really liked this Summerslam due to the look... | like Summerslam look arena curtain look overal... | -0.074074 | negative |
| 9427 | Not many television shows appeal to quite as m... | television show appeal different kind fan like... | 0.143395 | positive |
| 199 | The film quickly gets to a major chase scene w... | film quickly get major chase scene increase de... | 0.236979 | positive |
| 12447 | Jane Austen would definitely approve of this o... | Jane Austen definitely approve Paltrow awesome... | 0.342308 | positive |
| 39489 | Expectations were somewhat high for me when I ... | expectation somewhat high go movie think Steve... | 0.03221 | positive |
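
One way to read these results is to compare the predicted `sentiment_label` with the dataset's own `sentiment` column. The sketch below does this in plain Python for the five displayed rows only, with both lists copied by hand from the outputs above. It is an illustration of the comparison, not an evaluation of the full 1000-review sample.

```python
# Labels copied from the five rows shown in the notebook output
# (index order: 33553, 9427, 199, 12447, 39489).
true_labels = ["positive", "positive", "negative", "positive", "negative"]
predicted_labels = ["negative", "positive", "positive", "positive", "positive"]

# Count the rows where the predicted label equals the dataset label
matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = matches / len(true_labels)

print(f"{matches}/{len(true_labels)} correct, accuracy = {accuracy:.2f}")
```

On these five rows the two labels agree twice. Both negative reviews receive a small positive score and are therefore labeled `"positive"`.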
