# **SENTIMENT ANALYSIS / CLASSWORK-4**

**SENTIMENT ANALYSIS**

Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotion expressed in a piece of text. The goal is to identify whether the expressed sentiment is positive, negative, or neutral.
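Once a text has been reduced to a single numeric score, as the lexicon-based method later in this notebook does, the positive/negative/neutral decision can be as simple as comparing that score with zero. The sketch below assumes that zero threshold; the helper name `label_from_score` is hypothetical and is not used elsewhere in the notebook.

```python
def label_from_score(score: int) -> str:
    """Map a numeric sentiment score to a label (hypothetical helper, zero threshold assumed)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for s in (3, -2, 0):
    print(s, label_from_score(s))
```

A different threshold (for example, treating scores between -1 and 1 as neutral) is an equally valid design choice; zero is simply the most direct reading of "more positive words than negative ones".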

In [8]:
import pandas as pd


In [9]:
# This dataset contains arts, crafts and sewing product reviews and reviewer details
# (the 'asin' column is an Amazon product identifier)
df = pd.read_csv('Arts_Crafts_and_sewing_5.csv')
df.sample(10)

| | overall | verified | reviewerID | asin | reviewText | summary |
|---|---|---|---|---|---|---|
| 373889 | 5 | True | A2JHEJA03Q94NB | B00WHEG588 | These last when sewing about as long as the $5... | If I get 5 or 10 uses I am happy. |
| 300531 | 4 | True | A1Z7DE0VL2HYJ3 | B00DNUGVZW | Love that it keeps the thinlet dies in place w... | Four Stars |
| 222952 | 5 | True | A34CVHUIHATZGQ | B005025KQC | These are gorgeous. Perfect for gift | Great gift idea |
| 101376 | 5 | True | AJ91AOCHHUOPO | B000YQMPK8 | Liked this large size tub for large jobs. | Five Stars |
| 386038 | 3 | True | A3VRL3BLPI9J7H | B0144ARY5K | The thickness of each piece is inconsistent an... | Might be good for small craftings and |
| 146410 | 5 | False | ARLLD7FYWR6T8 | B001EL68F0 | Excellant | Five Stars |
| 331892 | 5 | False | ALE0L3ZNTK0YQ | B00JAPPSO4 | love them | Five Stars |
| 366810 | 3 | True | A2JFYKXHSSIT4C | B00TNMOAZW | The 3 star rating is due to the color of the r... | Gold is more of a yellow... |
| 71420 | 5 | True | A1G5P9I2G0028X | B0013NVA7K | excellent, thank you!!! | Five Stars |
| 483536 | 5 | True | AIYD5NRBZSDP4 | B00UY0YNQ8 | Made excellent scrubbers | Five Stars |


**IMPORT LIBRARIES**

In [10]:
from sklearn import preprocessing
import nltk

**DOWNLOADING OPINION LEXICON**

In [11]:
nltk.download('opinion_lexicon')

[nltk_data] Downloading package opinion_lexicon to /root/nltk_data...
[nltk_data]   Unzipping corpora/opinion_lexicon.zip.


True

**IMPORTING COMPONENTS FROM NLTK**

In [12]:
from nltk.corpus import opinion_lexicon
from nltk.tokenize import word_tokenize

**PRINTING OPINION LEXICON INFORMATION**

In [13]:
# Print the total number of words in the opinion lexicon
print('Total number of words in opinion lexicon', len(opinion_lexicon.words()))
# Display the first 10 positive and the first 10 negative words from the opinion lexicon
print('Examples of positive words in opinion lexicon',
      opinion_lexicon.positive()[:10])
print('Examples of negative words in opinion lexicon',
      opinion_lexicon.negative()[:10])

Total number of words in opinion lexicon 6789
Examples of positive words in opinion lexicon ['a+', 'abound', 'abounds', 'abundance', 'abundant', 'accessable', 'accessible', 'acclaim', 'acclaimed', 'acclamation']
Examples of negative words in opinion lexicon ['2-faced', '2-faces', 'abnormal', 'abolish', 'abominable', 'abominably', 'abominate', 'abomination', 'abort', 'aborted']


**DOWNLOADING THE 'PUNKT' TOKENIZER**

In [14]:
# 'punkt' is a pre-trained, unsupervised model that splits text into sentences;
# word_tokenize relies on it before splitting each sentence into words
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True
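Tokenizing before the dictionary lookup matters because punctuation stuck to a word prevents a match: "terrible!" is not a lexicon entry, but "terrible" is. The sketch below uses a simple regular expression as a rough, hypothetical stand-in for `word_tokenize` so that it runs without NLTK data; it is not how `word_tokenize` works internally (for example, `word_tokenize` splits "I'm" into "I" and "'m", while this regex keeps "i'm" together).

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Rough, hypothetical stand-in for nltk.word_tokenize (needs no punkt data)."""
    # Lowercase, then keep runs of letters, digits, '+', '-' and apostrophes;
    # spaces, commas, '!' and '.' act as separators and are dropped.
    return re.findall(r"[a-z0-9'+-]+", text.lower())

review = "Great book, but the index is terrible!"
print(review.lower().split())   # naive split: punctuation stays attached ('book,', 'terrible!')
print(simple_tokenize(review))  # punctuation removed, so 'terrible' can match the lexicon
```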

**RENAMING THE REVIEW TEXT COLUMN**

In [15]:
# Rename 'reviewText' to 'text' so the review column is shorter to reference below
df.rename(columns={"reviewText": "text"}, inplace=True)

**INITIALIZING SCORES**

In [16]:
# Set the positive and negative scores to be assigned to words in the word_dict dictionary
pos_score = 1
neg_score = -1

**CREATING WORD DICTIONARY**

In [17]:
# Add the positive words to the dictionary:
# iterate through the positive words in the opinion lexicon and give each one a positive score
word_dict = {}
for word in opinion_lexicon.positive():
    word_dict[word] = pos_score

# Add the negative words to the dictionary:
# iterate through the negative words and give each one a negative score
for word in opinion_lexicon.negative():
    word_dict[word] = neg_score
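Because the negative words are added after the positive ones, any word that appeared in both lists would end up with the negative score. The sketch below shows that overwrite with small hypothetical word lists (the word "envious" is placed in both on purpose; whether the real opinion lexicon contains such overlaps is not checked here), plus a compact `dict.fromkeys` version that produces the same dictionary.

```python
POS_SCORE, NEG_SCORE = 1, -1

def build_word_dict(positive, negative):
    """Build a word -> score lookup; later assignments overwrite earlier ones."""
    word_dict = {}
    for word in positive:
        word_dict[word] = POS_SCORE
    for word in negative:
        word_dict[word] = NEG_SCORE  # overwrites the score if the word was also positive
    return word_dict

# Hypothetical toy lists, not taken from the opinion lexicon
toy_positive = ["good", "great", "envious"]
toy_negative = ["bad", "terrible", "envious"]
print(build_word_dict(toy_positive, toy_negative))

# Compact equivalent: in dict unpacking, the later mapping (negative) wins on shared keys
compact = {**dict.fromkeys(toy_positive, POS_SCORE), **dict.fromkeys(toy_negative, NEG_SCORE)}
print(compact == build_word_dict(toy_positive, toy_negative))
```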

**BING_LIU_SCORE**

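The bing_liu_score function assigns a sentiment score to a piece of text using the predefined dictionary (word_dict). It lowercases and tokenizes the text, adds +1 for every positive word and -1 for every negative word it finds, and ignores all other words. A result above zero therefore means more positive than negative lexicon words, a result below zero the opposite, and zero means they balance out or none were found.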

In [18]:
# Sentiment analysis function using a simple bag-of-words approach
def bing_liu_score(text):
    sentiment_score = 0
    bag_of_words = word_tokenize(text.lower())  #word_tokenize splits the text into individual words.
    for word in bag_of_words:
        if word in word_dict:
            sentiment_score += word_dict[word]
    return sentiment_score
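To see the counting on a few short sentences without downloading the lexicon, the sketch below applies the same logic as bing_liu_score to a tiny hypothetical dictionary, with a regex tokenizer standing in for word_tokenize (both are assumptions for illustration). The last sentence shows a known limitation of a plain bag-of-words score: negation is ignored, so "not bad" still counts as negative.

```python
import re

# Hypothetical toy lexicon standing in for the word_dict built from opinion_lexicon
toy_word_dict = {"great": 1, "love": 1, "easy": 1, "terrible": -1, "bad": -1}

def toy_bing_liu_score(text: str, lexicon: dict[str, int]) -> int:
    """Same summing logic as bing_liu_score, but with a regex tokenizer instead of word_tokenize."""
    tokens = re.findall(r"[a-z0-9'+-]+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)

print(toy_bing_liu_score("Great book but the index is terrible.", toy_word_dict))  # +1 - 1 = 0
print(toy_bing_liu_score("Love it, very easy to read!", toy_word_dict))           # +1 + 1 = 2
print(toy_bing_liu_score("Not bad at all", toy_word_dict))                        # 'not' ignored: -1
```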

**APPLYING BING_LIU_SCORE() TO THE TEXT COLUMN**

In [19]:
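# Replace missing values in the 'text' column with the string 'no review'
# (assigning the result back avoids chained inplace fillna, which newer pandas versions warn about)
df['text'] = df['text'].fillna('no review')
# Score every review with the lexicon-based function
df['Bing_Liu_Score'] = df['text'].apply(bing_liu_score)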
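The fillna step is needed because bing_liu_score calls text.lower(), and pandas stores a missing review as a float NaN, which has no lower() method. The sketch below reproduces that failure and a single-value guard using only the standard library; safe_text is a hypothetical helper, not part of pandas.

```python
import math

def safe_text(value, placeholder: str = "no review") -> str:
    """Hypothetical helper mirroring fillna('no review') for one cell value."""
    # A missing review arrives as float NaN (or None); both are replaced by the placeholder
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return placeholder
    return str(value)

try:
    float("nan").lower()
except AttributeError as err:
    print("NaN cannot be lowercased:", err)

print(safe_text(float("nan")))
print(safe_text("Nice quality"))
```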

**FIRST 10 ROWS**

In [20]:
# Preview the sentiment analysis results for the first 10 rows
df[['overall', 'text', 'Bing_Liu_Score']].head(10)

| | overall | text | Bing_Liu_Score |
|---|---|---|---|
| 0 | 4 | Contains some interesting stitches. | 1 |
| 1 | 5 | I'm a fairly experienced knitter of the one-co... | 22 |
| 2 | 4 | Great book but the index is terrible. Had to w... | 0 |
| 3 | 5 | I purchased the Kindle edition which is incred... | 4 |
| 4 | 5 | Very well laid out and very easy to read.\n\nT... | 5 |
| 5 | 5 | Beginning her career as a freelance knitter, M... | 15 |
| 6 | 5 | This is a terrific stitch handbook (and I have... | 9 |
| 7 | 4 | The book needs to be coil bound. The content i... | 1 |
| 8 | 5 | I really am enjoying this book! I like the siz... | 12 |
| 9 | 5 | Just received this book and looked over it cov... | 6 |


**AVERAGE BING LIU SENTIMENT SCORES GROUPED BY OVERALL RATING**

In [21]:
# Calculate the mean Bing Liu sentiment score for each overall (star) rating
df.groupby('overall').agg({'Bing_Liu_Score': 'mean'})

| overall | Bing_Liu_Score |
|---|---|
| 1 | -0.255049 |
| 2 | 0.566098 |
| 3 | 1.158796 |
| 4 | 2.027999 |
| 5 | 2.129986 |
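The grouped mean in the table above is a per-rating sum of scores divided by a per-rating count; the table shows it rising from about -0.26 at 1 star to about 2.13 at 5 stars. The sketch below spells out that computation in plain Python on hypothetical (overall, Bing_Liu_Score) pairs that are made up for illustration and do not come from the dataset; mean_score_by_rating is a hypothetical name.

```python
from collections import defaultdict

def mean_score_by_rating(pairs):
    """Average sentiment score per star rating, like df.groupby('overall').agg({'Bing_Liu_Score': 'mean'})."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rating, score in pairs:
        totals[rating] += score
        counts[rating] += 1
    return {rating: totals[rating] / counts[rating] for rating in sorted(totals)}

# Hypothetical (overall, Bing_Liu_Score) pairs, not taken from the dataset
toy_rows = [(1, -2), (1, 1), (3, 0), (3, 2), (5, 4), (5, 6)]
print(mean_score_by_rating(toy_rows))  # {1: -0.5, 3: 1.0, 5: 5.0}
```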
