<a href="https://colab.research.google.com/github/EbtisamElgerghani/coding_tasks/blob/main/sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#!python -m textblob.download_corpora

In [None]:
# Import necessary libraries
import pandas as pd
import spacy
import en_core_web_sm
import en_core_web_md
from spacytextblob.spacytextblob import SpacyTextBlob

In [None]:
# Load the spaCy English model for sentiment and similarity analysis
sm_nlp = spacy.load('en_core_web_sm')
md_nlp = spacy.load('en_core_web_md')

Load the spaCy English models into **sm_nlp** and **md_nlp**:
- **en_core_web_sm**: a small English pipeline trained on written web text
- **en_core_web_md**: a medium-sized English pipeline trained on written web text, which also includes word vectors

In [None]:
# Version...
spacy.__version__

'3.7.4'

In [None]:
# Get the names of the active pipeline components
#sm_nlp.components
#sm_nlp.pipeline
#sm_nlp.component_names
sm_nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

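The active components of the spaCy pipeline are ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']. The model also ships a 'senter' component, but it is disabled by default, so `pipe_names` does not list it.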

#### Using spacy textblob: Sentiment Analysis.

In [None]:
# Add SpacyTextBlob as a pipeline component for sentiment analysis features.
sm_nlp.add_pipe('spacytextblob', last=True)

<spacytextblob.spacytextblob.SpacyTextBlob at 0x21903227f50>

After adding SpacyTextBlob as a pipeline component, running **sm_nlp.pipe_names** again confirms that it was appended at the end:

**sm_nlp.pipe_names**
- The active pipeline components are now ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'spacytextblob']

In [None]:
# Load the dataset into a pandas DataFrame
dataframe = pd.read_csv('amazon_product_reviews.csv', low_memory=False)

In [None]:
# Display information about the dataset and the 'reviews.text' column
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34660 entries, 0 to 34659
Data columns (total 21 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    34660 non-null  object 
 1   name                  27900 non-null  object 
 2   asins                 34658 non-null  object 
 3   brand                 34660 non-null  object 
 4   categories            34660 non-null  object 
 5   keys                  34660 non-null  object 
 6   manufacturer          34660 non-null  object 
 7   reviews.date          34621 non-null  object 
 8   reviews.dateAdded     24039 non-null  object 
 9   reviews.dateSeen      34660 non-null  object 
 10  reviews.didPurchase   1 non-null      object 
 11  reviews.doRecommend   34066 non-null  object 
 12  reviews.id            1 non-null      float64
 13  reviews.numHelpful    34131 non-null  float64
 14  reviews.rating        34627 non-null  float64
 15  reviews.sourceURLs 

We will work with our target variable, 'reviews.text', which is the 17th column (index 16): '16  reviews.text  34659 non-null  object'.

In [None]:
reviews_data = dataframe['reviews.text']
print(reviews_data.head())
print(reviews_data.shape)

0    This product so far has not disappointed. My c...
1    great for beginner or experienced person. Boug...
2    Inexpensive tablet for him to use and learn on...
3    I've had my Fire HD 8 two weeks now and I love...
4    I bought this for my grand daughter when she c...
Name: reviews.text, dtype: object
(34660,)


In [None]:
# Explore the data.
reviews_data.isnull().sum()

1

In [None]:
# Drop rows where the review text is missing and print the new total.
# As shown above, only one review is missing.
clean_data = dataframe.dropna(subset=['reviews.text'])
reviews_data = clean_data['reviews.text']
print(reviews_data.shape)

(34659,)


#### To inspect the results more closely, I will add 4 extra cells for testing.
Before starting the main code, I will run a sample test on one review from the dataset, reviews.text[0].

In [None]:
mytext = reviews_data[0]
docs = sm_nlp(mytext)
# check sentiment polarity.
docs._.polarity

0.325

In [None]:
docs._.subjectivity

0.7833333333333333

In [None]:
docs._.assessments

[(['far'], 0.1, 1.0, None),
 (['not', 'disappointed'], 0.375, 0.75, None),
 (['love'], 0.5, 0.6, None)]

In [None]:
for token in docs:
    print(token.text,token.pos_,token.tag_)

This DET DT
product NOUN NN
so ADV RB
far ADV RB
has AUX VBZ
not PART RB
disappointed VERB VBN
. PUNCT .
My PRON PRP$
children NOUN NNS
love VERB VBP
to PART TO
use VERB VB
it PRON PRP
and CCONJ CC
I PRON PRP
like VERB VBP
the DET DT
ability NOUN NN
to PART TO
monitor VERB VB
control NOUN NN
what PRON WP
content NOUN NN
they PRON PRP
see VERB VBP
with ADP IN
ease NOUN NN
. PUNCT .


### Reasoning:
- The code above prints the POS and TAG of each word in one review, to help interpret the polarity and subjectivity results.
- The assessments output includes (['not', 'disappointed'], 0.375, 0.75, None): taken together, the words 'not' and 'disappointed' yield a mildly positive polarity and a fairly high subjectivity, which illustrates **semantic ambiguity**.
- Handling the negation changes the review's total polarity from negative (-0.05) to mildly positive (0.325, the mean of the three assessment polarities 0.1, 0.375, and 0.5). The value 0.375 is the polarity of the negated phrase alone.
- The following sentence in the same review yields (['love'], 0.5, 0.6, None), which shows a positive polarity and moderate subjectivity.
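
The overall scores reported earlier for this review are consistent with a simple average of the per-phrase assessments. The sketch below reproduces that arithmetic in plain Python. The assessment values are copied from the output above; the idea that 'disappointed' on its own would score -0.75 (so that negation multiplies it by -0.5) is an assumption inferred from the numbers, not something the notebook output shows directly.

```python
# Assessments copied from the docs._.assessments output above:
# (words, polarity, subjectivity)
assessments = [
    (["far"], 0.1, 1.0),
    (["not", "disappointed"], 0.375, 0.75),
    (["love"], 0.5, 0.6),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Overall scores as the average over all assessed phrases.
overall_polarity = mean([p for _, p, _ in assessments])
overall_subjectivity = mean([s for _, _, s in assessments])

# Assumption: without negation handling, 'disappointed' alone would
# score -0.75 (since 0.375 == -0.75 * -0.5).
polarity_without_negation = mean([0.1, -0.75, 0.5])

print(round(overall_polarity, 3))
print(round(overall_subjectivity, 3))
print(round(polarity_without_negation, 3))
```

The first two printed values match the polarity (0.325) and subjectivity (0.783...) returned for the whole review, and the third matches the -0.05 mentioned above.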

Next, we create helper functions so that the same code is not repeated each time we compare two product reviews.

Preprocessing: we still need to remove stop words and convert the text to lowercase.

In [None]:
# Takes a review as text, tokenizes it, removes stop words, and lowercases the remaining tokens.
# Note: punctuation tokens are kept, because only token.is_stop is checked.
def prep_text(text):
    token_text = sm_nlp(text)
    cleaned_stopwords = [str(token.text).lower().strip() for token in token_text if not token.is_stop]
    # Join the remaining tokens back into a single string and return it.
    return ' '.join(cleaned_stopwords)

The remaining tokens are joined back into a single string, which is returned as the cleaned text.

In [None]:
# Performs polarity and sentiment analysis.
# Takes a product review (raw or cleaned) as input.
def Sentiment_A(product_review):
    doc = sm_nlp(product_review)
    polarity = doc._.blob.polarity
    sentiment = doc._.blob.sentiment  # named tuple: (polarity, subjectivity)
    # Return the polarity and the full sentiment tuple.
    return polarity, sentiment

Returns the polarity and the sentiment tuple (polarity, subjectivity)

In [None]:
# Performs a similarity comparison.
# Takes two reviews and compares them using the md model (the medium-sized English model).
def compare(review1, review2):
    doc1 = md_nlp(review1)
    doc2 = md_nlp(review2)
    similarity_S = doc1.similarity(doc2)
    # Return the similarity score.
    return similarity_S

Returns the similarity score

In [None]:
# Comparing the similarity of 2 reviews.
review1 = dataframe['reviews.text'][0]
review2 = dataframe['reviews.text'][1]
similarity_value = compare(review1, review2)
print(f"Review 1: {review1}")
print(f"Review 2: {review2}")
print(f"\nSimilarity Score of Two Reviews: {similarity_value}\n")

Review 1: This product so far has not disappointed. My children love to use it and I like the ability to monitor control what content they see with ease.
Review 2: great for beginner or experienced person. Bought as a gift and she loves it

Similarity Score of Two Reviews: 0.8094779286807917



The two reviews have a high similarity score: 0.81 is close to 1, the maximum.
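
For pipelines that include word vectors, such as en_core_web_md, spaCy's `Doc.similarity` defaults to the cosine similarity of the two documents' averaged word vectors. A minimal sketch of that calculation in plain Python follows. The 3-dimensional vectors are made up for illustration (real model vectors have hundreds of dimensions), and `average_vector` and `cosine_similarity` are hypothetical helper names, not spaCy functions.

```python
import math


def average_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here when a vector is all zeros
    return dot / (norm_a * norm_b)


# Made-up word vectors for the tokens of two tiny "documents".
doc_a_word_vectors = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
doc_b_word_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

doc_a = average_vector(doc_a_word_vectors)
doc_b = average_vector(doc_b_word_vectors)

similarity = cosine_similarity(doc_a, doc_b)
print(f"Similarity: {similarity:.3f}")
```

Because the score compares averaged vectors, it reflects overall topical closeness rather than whether the two reviews express the same opinion.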

In [None]:
# Testing sentiment analysis on a sample of product reviews
sample_reviews = [dataframe['reviews.text'][34656], dataframe['reviews.text'][34658]]
for review in sample_reviews:
    polar, senti = Sentiment_A(review)  # Sentiment of the raw review.
    review_cleaned = prep_text(review)  # Call prep_text to clean the review.
    sentiment_result = Sentiment_A(review_cleaned)  # Sentiment of the cleaned review.
    # Label the review by the polarity of the raw text.
    if polar == 0:
        print('We have neutral feelings')
    elif polar > 0.0:
        print('We have positive feelings')
    else:
        print('We have negative feelings')
    print(f"Review: {review}")
    print(f"Sample Review Sentiment: {sentiment_result}")
    print(f"Polarity: {polar}, Sentiment: {senti}")
    print("\n")



We have neutral feelings
Review: Amazon should include this charger with the Kindle. The fact that they're charging us extra for something that should be included is a sign of cheapness. Plus, you can use any micro-USB phone charger instead of this to charge your Kindle. Save your money.
Sample Review Sentiment: (0.0, Sentiment(polarity=0.0, subjectivity=0.1))
Polarity: 0.0, Sentiment: Sentiment(polarity=0.0, subjectivity=0.1)


We have negative feelings
Review: I was surprised to find it did not come with any type of charging cords so I had to purchase one and then found my Sprint HTC 3D charger is faster. I would not purchase again- 1st item I've ever not liked I've purchased from Amazon
Sample Review Sentiment: (0.35, Sentiment(polarity=0.35, subjectivity=0.8500000000000001))
Polarity: -0.09999999999999999, Sentiment: Sentiment(polarity=-0.09999999999999999, subjectivity=0.8500000000000001)




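#### Here are the polarity and subjectivity results before and after cleaning.
The model successfully interprets many reviews in the dataset: for example, the raw text of reviews.text[34656] is scored as neutral and that of reviews.text[34658] as negative. Cleaning can change the result, however. After stop-word removal, the second review's polarity flips from about -0.1 to 0.35, most likely because negations such as 'not' are stop words and are removed.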
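
In the output above, the second review scores about -0.1 on the raw text but 0.35 on the cleaned text. One likely cause is that spaCy's default stop-word list contains negations such as 'not', so a phrase like "would not purchase" loses its negation during cleaning. The sketch below illustrates the effect in plain Python with a tiny hand-picked stop-word set. `STOP_WORDS`, `NEGATIONS`, and `remove_stop_words` are illustrative names, not part of spaCy, and keeping negations is only one possible workaround.

```python
# Tiny illustrative stop-word set; spaCy's real list is much longer.
STOP_WORDS = {"i", "it", "did", "not", "with", "any", "of", "would", "again"}

# Negation words we may want to keep because they flip the sentiment.
NEGATIONS = {"not", "no", "never"}


def remove_stop_words(text, keep_negations=False):
    """Lowercase the text and drop stop words, optionally keeping negations."""
    kept = []
    for word in text.lower().split():
        is_protected = keep_negations and word in NEGATIONS
        if word in STOP_WORDS and not is_protected:
            continue
        kept.append(word)
    return " ".join(kept)


sentence = "I would not purchase again"
print(remove_stop_words(sentence))                       # negation is dropped
print(remove_stop_words(sentence, keep_negations=True))  # negation is kept
```

With the negation dropped, only "purchase" survives, so a lexicon-based scorer no longer sees that the reviewer would not buy the item again.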

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Apply text preprocessing and sentiment analysis to the last 4 rows of clean_data
clean_data.loc[:, 'reviews_cleaned'] = clean_data['reviews.text'].tail(4).apply(prep_text) # clean the last 4 reviews; all other rows get NaN
clean_data.loc[:, 'sentiment'] = clean_data['reviews_cleaned'].tail(4).apply(Sentiment_A)


print(clean_data[['reviews_cleaned', 'sentiment']].tail(4))


                                         reviews_cleaned  \
34656  amazon include charger kindle . fact charging ...   
34657  love kindle fire disappointed kindle power fas...   
34658  surprised find come type charging cords purcha...   
34659  spite fact good things amazon anthing gotten ....   

                                               sentiment  
34656                                  (0.0, (0.0, 0.1))  
34657  (0.10952380952380951, (0.10952380952380951, 0....  
34658                 (0.35, (0.35, 0.8500000000000001))  
34659  (0.47714285714285715, (0.47714285714285715, 0....  
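
Each entry in the sentiment column is a nested tuple, because Sentiment_A returns the polarity together with the full (polarity, subjectivity) pair. The sketch below shows one way such results could be flattened into separate fields and given a label. It uses two values copied from the output above; `flatten_sentiment` and `label_polarity` are hypothetical helpers, and the namedtuple is a stand-in for TextBlob's Sentiment type so the example runs without any libraries installed.

```python
from collections import namedtuple

# Stand-in for TextBlob's Sentiment named tuple (an assumption for this sketch).
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def label_polarity(polarity):
    """Map a polarity score to the labels used in the notebook."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def flatten_sentiment(result):
    """Turn a (polarity, Sentiment) pair into a flat dictionary."""
    polarity, sentiment = result
    return {
        "polarity": polarity,
        "subjectivity": sentiment.subjectivity,
        "label": label_polarity(polarity),
    }


# Values copied from the cleaned-review output above, keyed by row index.
results = {
    34656: (0.0, Sentiment(0.0, 0.1)),
    34658: (0.35, Sentiment(0.35, 0.8500000000000001)),
}

for index, result in results.items():
    print(index, flatten_sentiment(result))
```

Flat fields like these are easier to store as separate DataFrame columns than a single column of nested tuples.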
