# Sentiment Analysis for Customer Reviews Challenge

## Challenge:
Develop a robust Sentiment Analysis classifier for XYZ customer reviews, automating the categorization into positive, negative, or neutral sentiments. Utilize Natural Language Processing (NLP) techniques, exploring different sentiment analysis methods.

## Problem Statement:
XYZ organization, a global online retail giant, accumulates a vast number of customer reviews daily. Extracting sentiments from these reviews offers insights into customer satisfaction, product quality, and market trends. The challenge is to create an effective sentiment analysis model that accurately classifies XYZ customer reviews.
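The three-way categorization described above can be sketched as a thresholding rule on a single polarity score. The snippet below is a minimal illustration, not the required solution: the function name `classify_compound` and the `threshold` parameter are hypothetical, and the cutoff of 0.05 follows a commonly used convention for VADER's `compound` score, which is assumed to lie in [-1, 1].

```python
def classify_compound(compound, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    The 0.05 cutoff is an assumed convention, not a value taken from
    this challenge; tune it for your own data.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Illustrative scores only
for score in (0.9441, -0.5664, 0.0):
    print(score, classify_compound(score))
```

A wider neutral band (a larger `threshold`) trades fewer false positives and negatives for more reviews labelled neutral.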

### Important Instructions:

1. Make sure this ipynb file that you have cloned is in the __Project__ folder on the Desktop. The Dataset is also available in the same folder.
2. Ensure that all the cells in the notebook can be executed without any errors.
3. Once the challenge has been completed, save the SentimentAnalysis.ipynb notebook in the __*Project*__ folder on the Desktop. If the file is not present in that folder, auto-evaluation will fail.
4. Print the evaluation metrics of the model. 
5. Before you submit the challenge for evaluation, make sure the accuracy score of the model you created has been assigned for evaluation.
6. Assign the Accuracy score obtained for the model created in this challenge to the specified variable in the predefined function *submit_accuracy_score*. The solution is to be written between the comments `# code starts here` and `# code ends here`
7. Please do not make any changes to the variable names and the function name *submit_accuracy_score* as this will be used for automated evaluation of the challenge. Any modification in these names will result in unexpected behaviour.
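Instruction 4 asks for evaluation metrics, and accuracy alone can hide poor performance on a minority class. As a rough sketch, per-class precision and recall can be computed from two label lists as shown below. The helper `per_class_metrics` and the toy labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels=('positive', 'negative', 'neutral')):
    """Return precision, recall and support for each label."""
    pair_counts = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        true_pos = pair_counts[(label, label)]
        predicted = sum(n for (t, p), n in pair_counts.items() if p == label)
        actual = sum(n for (t, p), n in pair_counts.items() if t == label)
        report[label] = {
            'precision': true_pos / predicted if predicted else 0.0,
            'recall': true_pos / actual if actual else 0.0,
            'support': actual,
        }
    return report


# Made-up labels, not results from the review data
y_true = ['positive', 'negative', 'positive', 'negative', 'neutral']
y_pred = ['positive', 'negative', 'positive', 'neutral', 'neutral']

for label, stats in per_class_metrics(y_true, y_pred).items():
    print(label, stats)
```

In the notebook itself, `sklearn.metrics.classification_report(y_true, y_pred)` produces a comparable summary without hand-written code.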

### --------------------------------------- CHALLENGE CODE STARTS HERE --------------------------------------------

In [191]:
!pip install openpyxl
import nltk
import pandas as pd
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.model_selection import train_test_split
# Imported under its own name because submit_accuracy_score calls accuracy_score(...)
from sklearn.metrics import accuracy_score


Defaulting to user installation because normal site-packages is not writeable


In [164]:
df = pd.read_csv('/home/labuser/Desktop/Project/Reviews.csv')
df.head(5)

| | Id | ProductId | UserId | ProfileName | HelpfulnessNumerator | HelpfulnessDenominator | Score | Time | Summary | Text |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | B001E4KFG0 | A3SGXH7AUHU8GW | delmartian | 1 | 1 | 5 | 1303862400 | Good Quality Dog Food | I have bought several of the Vitality canned d... |
| 1 | 2 | B00813GRG4 | A1D87F6ZCVE5NK | dll pa | 0 | 0 | 1 | 1346976000 | Not as Advertised | Product arrived labeled as Jumbo Salted Peanut... |
| 2 | 3 | B000LQOCH0 | ABXLMWJIXXAIN | Natalia Corres "Natalia Corres" | 1 | 1 | 4 | 1219017600 | "Delight" says it all | This is a confection that has been around a fe... |
| 3 | 4 | B000UA0QIQ | A395BORC6FGVXV | Karl | 3 | 3 | 2 | 1307923200 | Cough Medicine | If you are looking for the secret ingredient i... |
| 4 | 5 | B006K2ZZ7K | A1UQRSCLF8GW1T | Michael D. Bigham "M. Wassir" | 0 | 0 | 5 | 1350777600 | Great taffy | Great taffy at a great price. There was a wid... |


In [165]:
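df1 = df[['Text']].copy()
df2 = df[['Summary']].copy()
# Note: `count` is referenced without parentheses, so each print shows the
# bound method's repr (which embeds the frame) rather than per-column counts.
print(df1.count)
print(df2.count)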

<bound method DataFrame.count of                                                      Text
0       I have bought several of the Vitality canned d...
1       Product arrived labeled as Jumbo Salted Peanut...
2       This is a confection that has been around a fe...
3       If you are looking for the secret ingredient i...
4       Great taffy at a great price.  There was a wid...
...                                                   ...
568449  Great for sesame chicken..this is a good if no...
568450  I'm disappointed with the flavor. The chocolat...
568451  These stars are small, so you can give 10-15 o...
568452  These are the BEST treats for training and rew...
568453  I am very satisfied ,product is as advertised,...

[568454 rows x 1 columns]>
<bound method DataFrame.count of                                    Summary
0                    Good Quality Dog Food
1                        Not as Advertised
2                    "Delight" says it all
3                           Cough Medi

In [166]:
def normalize_score(score, min_val, max_val):
    # Min-max scaling: maps score from [min_val, max_val] onto [0, 1]
    return (score - min_val) / (max_val - min_val)
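To see what `normalize_score` does to sentiment scores, here is a small worked check. It assumes the usual VADER ranges, with `compound` in [-1, 1] and `neg`/`neu`/`pos` in [0, 1]. The function is repeated so the snippet runs on its own.

```python
def normalize_score(score, min_val, max_val):
    """Min-max scale `score` from [min_val, max_val] onto [0, 1]."""
    return (score - min_val) / (max_val - min_val)


# compound lies in [-1, 1], so it spreads over the full [0, 1] range
compound_examples = [normalize_score(c, -1.0, 1.0) for c in (-1.0, 0.0, 1.0)]

# neg/neu/pos lie in [0, 1]; with the same bounds they land in [0.5, 1]
proportion_examples = [normalize_score(p, -1.0, 1.0) for p in (0.0, 0.5, 1.0)]

print(compound_examples)    # [0.0, 0.5, 1.0]
print(proportion_examples)  # [0.5, 0.75, 1.0]
```

With bounds of -1 and 1, a neutral compound score sits at 0.5 after scaling, and the proportion-style scores can never drop below 0.5.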

In [167]:
df1 = df1.rename(columns={'Text':'Sentences'})

df2 = df2.rename(columns={'Summary':'Sentences'})
df2

| | Sentences |
|---|---|
| 0 | Good Quality Dog Food |
| 1 | Not as Advertised |
| 2 | "Delight" says it all |
| 3 | Cough Medicine |
| 4 | Great taffy |
| ... | ... |
| 568449 | Will not do without |
| 568450 | disappointed |
| 568451 | Perfect for our maltipoo |
| 568452 | Favorite Training and reward treat |


In [168]:

sid = SentimentIntensityAnalyzer()
# Initialise the score columns as floats so the normalized values fit the dtype
df1['sent_neg'] = 0.0
df1['sent_neu'] = 0.0
df1['sent_pos'] = 0.0
df1['sent_compound'] = 0.0
# Score every review text with VADER and store each score rescaled with bounds (-1, 1)
for i in range(len(df1)):
    sentence = df1['Sentences'].iloc[i]
    ss = sid.polarity_scores(sentence)
    df1.iloc[i, 1] = normalize_score(ss['neg'], -1.0, 1.0)
    df1.iloc[i, 2] = normalize_score(ss['neu'], -1.0, 1.0)
    df1.iloc[i, 3] = normalize_score(ss['pos'], -1.0, 1.0)
    df1.iloc[i, 4] = normalize_score(ss['compound'], -1.0, 1.0)
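Writing one cell at a time with `iloc` is slow on several hundred thousand rows. One possible alternative, sketched below, collects the scaled scores into a list of dictionaries first and builds the columns in a single step afterwards. The helper `score_sentences` and the stand-in `fake_scorer` are hypothetical; the stand-in only exists so the sketch runs without NLTK or the VADER lexicon.

```python
def normalize_score(score, min_val, max_val):
    """Min-max scale `score` from [min_val, max_val] onto [0, 1]."""
    return (score - min_val) / (max_val - min_val)


def score_sentences(sentences, scorer):
    """Apply `scorer` to each sentence and return one dict of scaled scores per row.

    `scorer` is any callable returning a dict with the keys
    'neg', 'neu', 'pos' and 'compound'.
    """
    rows = []
    for sentence in sentences:
        ss = scorer(str(sentence))
        rows.append({
            'sent_neg': normalize_score(ss['neg'], -1.0, 1.0),
            'sent_neu': normalize_score(ss['neu'], -1.0, 1.0),
            'sent_pos': normalize_score(ss['pos'], -1.0, 1.0),
            'sent_compound': normalize_score(ss['compound'], -1.0, 1.0),
        })
    return rows


def fake_scorer(text):
    """Stand-in scorer (assumption): treats every sentence as fully neutral."""
    return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


rows = score_sentences(['Great taffy', 'Not as Advertised'], fake_scorer)
print(rows[0])
```

In the notebook, one would presumably pass `sid.polarity_scores` as the scorer and turn the result into columns with something like `pd.DataFrame(rows, index=df1.index)`.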


In [171]:
sid = SentimentIntensityAnalyzer()
# Initialise the score columns as floats so the normalized values fit the dtype
df2['sent_neg'] = 0.0
df2['sent_neu'] = 0.0
df2['sent_pos'] = 0.0
df2['sent_compound'] = 0.0
# Score every review summary with VADER; iterate over df2, not df1
for i in range(len(df2)):
    # str() guards against non-string entries such as missing summaries
    sentence = str(df2['Sentences'].iloc[i])
    ss = sid.polarity_scores(sentence)
    df2.iloc[i, 1] = normalize_score(ss['neg'], -1.0, 1.0)
    df2.iloc[i, 2] = normalize_score(ss['neu'], -1.0, 1.0)
    df2.iloc[i, 3] = normalize_score(ss['pos'], -1.0, 1.0)
    df2.iloc[i, 4] = normalize_score(ss['compound'], -1.0, 1.0)


In [201]:
# Attach the star rating from the original frame as the ground-truth column
df1['Score'] = df['Score']
df1

| | Sentences | sent_neg | sent_neu | sent_pos | sent_compound | Score |
|---|---|---|---|---|---|---|
| 0 | I have bought several of the Vitality canned d... | 0.5000 | 0.8475 | 0.6525 | 0.97205 | 5 |
| 1 | Product arrived labeled as Jumbo Salted Peanut... | 0.5690 | 0.9310 | 0.5000 | 0.21680 | 1 |
| 2 | This is a confection that has been around a fe... | 0.5455 | 0.8770 | 0.5775 | 0.91325 | 4 |
| 3 | If you are looking for the secret ingredient i... | 0.5000 | 1.0000 | 0.5000 | 0.50000 | 2 |
| 4 | Great taffy at a great price. There was a wid... | 0.5000 | 0.7760 | 0.7240 | 0.97340 | 5 |
| ... | ... | ... | ... | ... | ... | ... |
| 568449 | Great for sesame chicken..this is a good if no... | 0.5360 | 0.8000 | 0.6635 | 0.92945 | 5 |
| 568450 | I'm disappointed with the flavor. The chocolat... | 0.5950 | 0.8485 | 0.5570 | 0.25760 | 2 |
| 568451 | These stars are small, so you can give 10-15 o... | 0.5185 | 0.9420 | 0.5390 | 0.71760 | 5 |
| 568452 | These are the BEST treats for training and rew... | 0.5205 | 0.7530 | 0.7260 | 0.98585 | 5 |


In [188]:
df1.isnull().sum()

Sentences        0
sent_neg         0
sent_neu         0
sent_pos         0
sent_compound    0
Score            0
dtype: int64

### --------------------------------------- CHALLENGE CODE ENDS HERE --------------------------------------------

### NOTE:
1. Assign the Accuracy score obtained for the model created in this challenge to the specified variable in the predefined function *submit_accuracy_score* below. The solution is to be written between the comments `# code starts here` and `# code ends here`
2. Please do not make any changes to the variable names and the function name *submit_accuracy_score* as this will be used for automated evaluation of the challenge. Any modification in these names will result in unexpected behaviour.

In [218]:


def submit_accuracy_score(df1) -> float:
    accuracy = 0.0
    # code starts here
    predicted_sentiments = df1['sent_compound']
    present_score = df1['Score']

    # sent_compound is scaled to [0, 1], so 0.5 is the neutral midpoint
    predicted_sentiments = ['positive' if x > 0.5 else ('negative' if x < 0.5 else 'neutral') for x in predicted_sentiments]
    # Caution: Score holds star ratings of 1 or more, so this 0.5 cutoff labels every review 'positive'
    present_score = ['positive' if x > 0.5 else ('negative' if x < 0.5 else 'neutral') for x in present_score]
    accuracy = accuracy_score(present_score, predicted_sentiments)
    # code ends here
    return accuracy


a = submit_accuracy_score(df1)

print("Accuracy for sentiment analysis with Text: ", a)

Accuracy for sentiment analysis with Text:  0.8779514261488177
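Because `Score` is a star rating rather than a value in [0, 1], a cutoff of 0.5 cannot separate the ground-truth classes. The sketch below shows one possible mapping, under the assumption that ratings above 3 count as positive, below 3 as negative, and exactly 3 as neutral. The helper names are hypothetical, and the five sample rows are the first five rows of the `df1` preview shown earlier, so the printed value describes those rows only, not the full dataset.

```python
def label_from_stars(stars):
    """Assumed mapping: 4-5 stars positive, 1-2 stars negative, 3 stars neutral."""
    if stars > 3:
        return 'positive'
    if stars < 3:
        return 'negative'
    return 'neutral'


def label_from_scaled_compound(value, midpoint=0.5):
    """Label a compound score that has been rescaled to [0, 1]."""
    if value > midpoint:
        return 'positive'
    if value < midpoint:
        return 'negative'
    return 'neutral'


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# First five rows of the df1 preview (Score and sent_compound columns)
stars = [5, 1, 4, 2, 5]
scaled_compound = [0.97205, 0.21680, 0.91325, 0.50000, 0.97340]

y_true = [label_from_stars(s) for s in stars]
y_pred = [label_from_scaled_compound(c) for c in scaled_compound]

print(accuracy(y_true, y_pred))  # 0.8 on these five rows
```

With a mapping like this, the fourth row (2 stars, scaled compound of exactly 0.5) counts as a miss, which the 0.5 cutoff on `Score` would not reveal.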
