# VaderSentiment Analysis Pipeline

#### Valence Aware Dictionary and sEntiment Reasoner (VADER) is a lexicon- and rule-based sentiment analysis tool designed for social media and short text, developed by C.J. Hutto and Eric Gilbert (2014). Its lexicon is a dictionary that maps words to their sentiment scores.
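
To make the idea of a lexicon concrete, here is a minimal sketch of a word-to-score lookup. The dictionary `toy_lexicon`, its entries, and the helper `word_valences` are hypothetical and exist only for illustration; the real VADER lexicon is much larger, and the tool also applies rules (for example for negation and intensifiers) that this sketch leaves out.

```python
# Illustrative entries only: these words and scores are made up for the
# example and are not copied from the real VADER lexicon file.
toy_lexicon = {"good": 1.9, "cozy": 1.5, "terrible": -2.1}


def word_valences(sentence, lexicon):
    """Return (word, score) pairs for the words that appear in the lexicon."""
    words = sentence.lower().split()
    return [(word, lexicon[word]) for word in words if word in lexicon]


pairs = word_valences("The pizza was really good .", toy_lexicon)
print(pairs)
```

Words missing from the lexicon simply contribute no score, which is why a sentence with no known sentiment words ends up neutral.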

#### This script is part of a sentiment analysis evaluation pipeline. The goal is to load the input sentences, normalize them, and apply the lexicon as a sentiment classifier.

#### It reads a CSV file of sentences to be classified by VADER. The result is a new dataframe and CSV file containing the compound sentiment scores and class labels generated by the VADER lexicon-based tool.



#### Import libraries

In [None]:
import io
import pprint
import re
import string

import nltk
import pandas as pd
from nltk.corpus import stopwords, wordnet
from nltk.text import Text
from nltk.tokenize import sent_tokenize, word_tokenize

In [None]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/ericacarneiro/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

#### Load Dataset

In [None]:
df = pd.read_csv(
    "desafio_DS/dataset_valid.csv",
    sep=None,            # let pandas detect the separator
    engine="python",     # separator detection requires the Python engine
    quoting=3,           # csv.QUOTE_NONE: treat quote characters as ordinary text
    on_bad_lines="warn"  # warn about and skip malformed lines instead of raising
)
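
With `sep=None` and the Python engine, pandas relies on Python's built-in `csv.Sniffer` to guess the separator. The sketch below shows that detection step on its own with the standard library. The sample text, the `;` separator, and the `id` column name are assumptions made for the example; the separator of the real file is not shown in this notebook.

```python
import csv
import io

# Hypothetical sample: two sentences from the dataset, joined with an
# assumed ';' separator and an assumed 'id' header.
sample = (
    "id;input\n"
    "19784;The pizza was really good .\n"
    "19792;The service was ok .\n"
)

# Restrict the candidates so that spaces inside sentences are never
# mistaken for the separator.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(rows[1])
```

If detection picks the wrong character on a real file, passing an explicit `sep` to `pd.read_csv` is the more predictable choice.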

In [None]:
df.head()

   Unnamed: 0                                              input
0       19784                        The pizza was really good .
1       19788  Knowledge of the chef and the waitress are bel...
2       19792                               The service was ok .
3       19796  I 'm happy to have Nosh in the neighborhood an...
4       19800                    Indoor was very cozy and cute .

### Data Pre-Processing

#### Pre-process and clean the dataset by lowercasing the text and replacing digits and selected punctuation with spaces

In [None]:
df_valid = df.rename(columns={"input": "text"})


In [None]:
corpus = df_valid["text"]

In [None]:
clean_corpus = corpus.str.lower()

In [None]:
clean_corpus = clean_corpus.str.replace(r'[0-9]|,|\.|/|\$|\||-|\+|:|•', ' ', regex=True)
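
The effect of the two cleaning steps is easier to see on a single string. The sketch below applies the same lowercase step and the same regular expression to one sentence. The helper name `clean_text` and the sample sentence are hypothetical, and the final whitespace collapse is an extra step that the notebook itself does not perform.

```python
import re

# Same pattern as in the cell above: digits and selected punctuation.
PATTERN = re.compile(r'[0-9]|,|\.|/|\$|\||-|\+|:|•')


def clean_text(text):
    """Lowercase the text and replace digits and listed punctuation with spaces."""
    lowered = text.lower()
    replaced = PATTERN.sub(' ', lowered)
    # Extra step (not in the notebook): collapse the runs of spaces left behind.
    return ' '.join(replaced.split())


cleaned = clean_text("The pizza was $12.50 , really GOOD .")
print(cleaned)
```

Note that VADER treats capitalization as an intensity cue, so lowercasing everything before scoring may flatten some emphasis; whether that matters depends on the data.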



In [None]:
print(clean_corpus)

0                            the pizza was really good  
1      knowledge of the chef and the waitress are bel...
2                                   the service was ok  
3      i 'm happy to have nosh in the neighborhood an...
4                        indoor was very cozy and cute  
                             ...                        
194    we started with lox and mussels ( the best ive...
195    the food here does a great service to the name...
196    although the tables may be closely situated   ...
197           the staff is also attentive and friendly  
198    and they have these home made potato chips at ...
Name: text, Length: 199, dtype: object


In [None]:
type(clean_corpus)

pandas.core.series.Series

## VADER Installation

In [None]:
%pip install vaderSentiment

Note: you may need to restart the kernel to use updated packages.


In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

#### The function below takes a single sentence as a string (here, a user's restaurant review) and returns its VADER scores together with an overall sentiment label

In [None]:
# Create the SentimentIntensityAnalyzer once and reuse it for every sentence.
# This object contains a pre-built sentiment lexicon
# and rules to analyze the text.
sid_obj = SentimentIntensityAnalyzer()


def sentiment_vader(sentence):

    # Getting sentiment scores by analyzing the polarity of the sentence,
    # returning a dictionary with the keys 'neg', 'neu', 'pos', and 'compound'.
    # 'neg' indicates negative sentiment;
    # 'neu' indicates neutral sentiment;
    # 'pos' indicates positive sentiment;
    # 'compound' is a normalized aggregate score between -1 and +1.

    sentiment_dict = sid_obj.polarity_scores(sentence)
    negative = sentiment_dict['neg']
    neutral = sentiment_dict['neu']
    positive = sentiment_dict['pos']
    compound = sentiment_dict['compound']

    # Classifying overall sentiment from the compound score

    if compound >= 0.05:
        overall_sentiment = "Positive"
    elif compound <= -0.05:
        overall_sentiment = "Negative"
    else:
        overall_sentiment = "Neutral"

    # Returning values

    return negative, neutral, positive, compound, overall_sentiment
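
The compound score and the labelling rule can be sketched without the library. The normalization below follows the formula used in the vaderSentiment source as commonly documented, score / sqrt(score² + alpha) with alpha = 15; treat that constant as an assumption to check against the installed version. The function names `normalize` and `label_from_compound` are hypothetical, and the ±0.05 thresholds are the ones used in `sentiment_vader` above.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def label_from_compound(compound, threshold=0.05):
    """Map a compound score to a class label using symmetric thresholds."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


for raw_sum in (-3.0, 0.0, 2.0):
    compound = normalize(raw_sum)
    print(raw_sum, round(compound, 4), label_from_compound(compound))
```

Keeping the thresholds as a parameter makes it easy to test how sensitive the class distribution is to the width of the neutral band.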

In [None]:
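# Note: this passes the whole Series in a single call, so the result is
# one tuple for the corpus as a whole rather than one tuple per review.
sent = sentiment_vader(clean_corpus)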

In [None]:
print(sent)

(0.045, 0.731, 0.223, 1.0, 'Positive')


In [None]:
sent =  clean_corpus.apply(sentiment_vader)

In [None]:
print(sent)

0      (0.0, 0.556, 0.444, 0.4927, Positive)
1              (0.0, 1.0, 0.0, 0.0, Neutral)
2       (0.0, 0.577, 0.423, 0.296, Positive)
3        (0.0, 0.66, 0.34, 0.7713, Positive)
4      (0.0, 0.605, 0.395, 0.5046, Positive)
                       ...                  
194    (0.0, 0.833, 0.167, 0.6369, Positive)
195    (0.0, 0.773, 0.227, 0.6249, Positive)
196            (0.0, 1.0, 0.0, 0.0, Neutral)
197    (0.0, 0.652, 0.348, 0.4939, Positive)
198    (0.0, 0.824, 0.176, 0.6468, Positive)
Name: text, Length: 199, dtype: object


In [None]:
type(sent)

pandas.core.series.Series
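
Since `sent` is a Series of 5-element tuples, one option is to expand it into named columns instead of keeping the tuples packed. The sketch below does this on a three-row stand-in built from the values printed above. The names `sent_demo`, `score_columns`, and `scores` are hypothetical, and this is an alternative layout rather than what the notebook does next.

```python
import pandas as pd

# Hypothetical miniature of `sent`: values copied from the first three rows above.
sent_demo = pd.Series([
    (0.0, 0.556, 0.444, 0.4927, "Positive"),
    (0.0, 1.0, 0.0, 0.0, "Neutral"),
    (0.0, 0.577, 0.423, 0.296, "Positive"),
])

score_columns = ["neg", "neu", "pos", "compound", "sentiment"]

# One column per tuple position, keeping the original index for later joins.
scores = pd.DataFrame(sent_demo.tolist(), columns=score_columns, index=sent_demo.index)
print(scores)
```

With separate columns, the label and the compound score survive a CSV round trip as ordinary values and do not need to be parsed back out of a string.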

In [None]:
# Concatenating the sentiment results with the cleaned text
sent_vader = pd.concat([sent, clean_corpus], axis=1)

sent_vader

| | text | text.1 |
|---|---|---|
| 0 | (0.0, 0.556, 0.444, 0.4927, Positive) | the pizza was really good |
| 1 | (0.0, 1.0, 0.0, 0.0, Neutral) | knowledge of the chef and the waitress are bel... |
| 2 | (0.0, 0.577, 0.423, 0.296, Positive) | the service was ok |
| 3 | (0.0, 0.66, 0.34, 0.7713, Positive) | i 'm happy to have nosh in the neighborhood an... |
| 4 | (0.0, 0.605, 0.395, 0.5046, Positive) | indoor was very cozy and cute |
| ... | ... | ... |
| 194 | (0.0, 0.833, 0.167, 0.6369, Positive) | we started with lox and mussels ( the best ive... |
| 195 | (0.0, 0.773, 0.227, 0.6249, Positive) | the food here does a great service to the name... |
| 196 | (0.0, 1.0, 0.0, 0.0, Neutral) | although the tables may be closely situated ... |
| 197 | (0.0, 0.652, 0.348, 0.4939, Positive) | the staff is also attentive and friendly |


In [None]:
# Saving the results to CSV
sent_vader.to_csv("vader_emo.csv")
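
Because the result column holds tuples, `to_csv` writes each one as its string representation, and reading the file back yields plain strings rather than tuples. A regular expression can pull the label out of such a string; as an alternative, the standard library's `ast.literal_eval` can rebuild the whole tuple. The sketch below uses one string in the format this round trip produces; the variable names are hypothetical.

```python
import ast

# String form of one result after the CSV round trip.
raw = "(0.0, 0.556, 0.444, 0.4927, 'Positive')"

# literal_eval only evaluates Python literals, so it is safe for this kind of input.
parsed = ast.literal_eval(raw)
neg, neu, pos, compound, label = parsed

print(compound, label.lower())
```

This recovers the numeric scores as floats as well as the label, which helps if the compound score is needed for later evaluation.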

In [None]:
# Load the CSV file
df_vader = pd.read_csv("vader_emo.csv")

# Extracting only the sentiment class (last value of the tuple) 
df_vader["sentiment"] = df_vader["text"].str.extract(r"'([^']+)'")

# Lowercasing the labels so they can be compared with other sentiment classification methods
df_vader["sentiment"] = df_vader["sentiment"].str.lower().str.strip()

# Saving as CSV
df_vader.to_csv("vader_emo_com_sentiment.csv", index=False)

# Visualizing the distribution of the results
print(df_vader["sentiment"].value_counts())

sentiment
positive    133
neutral      42
negative     24
Name: count, dtype: int64
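
Raw counts are easier to compare across methods when expressed as shares of the total. The sketch below converts the counts printed above into proportions using only the standard library; the variable names are hypothetical.

```python
from collections import Counter

# Counts copied from the value_counts() output above.
counts = Counter({"positive": 133, "neutral": 42, "negative": 24})

total = sum(counts.values())
shares = {label: round(n / total, 3) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

In pandas, `df_vader["sentiment"].value_counts(normalize=True)` gives the same proportions directly.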


In [None]:
df_vader.head

<bound method NDFrame.head of      Unnamed: 0                                     text  \
0             0  (0.0, 0.556, 0.444, 0.4927, 'Positive')   
1             1          (0.0, 1.0, 0.0, 0.0, 'Neutral')   
2             2   (0.0, 0.577, 0.423, 0.296, 'Positive')   
3             3    (0.0, 0.66, 0.34, 0.7713, 'Positive')   
4             4  (0.0, 0.605, 0.395, 0.5046, 'Positive')   
..          ...                                      ...   
194         194  (0.0, 0.833, 0.167, 0.6369, 'Positive')   
195         195  (0.0, 0.773, 0.227, 0.6249, 'Positive')   
196         196          (0.0, 1.0, 0.0, 0.0, 'Neutral')   
197         197  (0.0, 0.652, 0.348, 0.4939, 'Positive')   
198         198  (0.0, 0.824, 0.176, 0.6468, 'Positive')   

                                                text.1 sentiment  
0                          the pizza was really good    positive  
1    knowledge of the chef and the waitress are bel...   neutral  
2                                 the service wa