
The objective of this project is to label a dataset for sentiment analysis, following the instructions at https://thecleverprogrammer.com/2021/11/24/add-labels-to-a-dataset-for-sentiment-analysis/.

In this project, we use the VADER sentiment analyzer from NLTK to calculate a compound score for each review.

#01. Import the libraries, download the VADER lexicon, and load the data

In [None]:
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the lexicon that VADER uses to score words
nltk.download("vader_lexicon")

# Load the reviews dataset
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/reviews%20data.csv")

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


##01.a Examine the data

In [None]:
data

Unnamed: 0,Review
0,nice hotel expensive parking got good deal sta...
1,ok nothing special charge diamond member hilto...
2,nice rooms not 4* experience hotel monaco seat...
3,"unique, great stay, wonderful time hotel monac..."
4,"great stay great stay, went seahawk game aweso..."
...,...
20486,"best kept secret 3rd time staying charm, not 5..."
20487,great location price view hotel great quick pl...
20488,"ok just looks nice modern outside, desk staff ..."
20489,hotel theft ruined vacation hotel opened sept ...


In [None]:
data = data.dropna()  # drop rows that contain missing values
print(data.head())

                                              Review
0  nice hotel expensive parking got good deal sta...
1  ok nothing special charge diamond member hilto...
2  nice rooms not 4* experience hotel monaco seat...
3  unique, great stay, wonderful time hotel monac...
4  great stay great stay, went seahawk game aweso...


#02. Use SentimentIntensityAnalyzer 

In [None]:
sentiments = SentimentIntensityAnalyzer()
# Score each review once and reuse the result for all four columns
scores = [sentiments.polarity_scores(review) for review in data["Review"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data["Compound"] = [s["compound"] for s in scores]


##02.a Check the data output

In [None]:
data

Unnamed: 0,Review,Positive,Negative,Neutral,Compound
0,nice hotel expensive parking got good deal sta...,0.285,0.072,0.643,0.9747
1,ok nothing special charge diamond member hilto...,0.189,0.110,0.701,0.9787
2,nice rooms not 4* experience hotel monaco seat...,0.219,0.081,0.700,0.9889
3,"unique, great stay, wonderful time hotel monac...",0.385,0.060,0.555,0.9912
4,"great stay great stay, went seahawk game aweso...",0.221,0.135,0.643,0.9797
...,...,...,...,...,...
20486,"best kept secret 3rd time staying charm, not 5...",0.272,0.063,0.665,0.9834
20487,great location price view hotel great quick pl...,0.430,0.000,0.570,0.9753
20488,"ok just looks nice modern outside, desk staff ...",0.145,0.131,0.724,0.2629
20489,hotel theft ruined vacation hotel opened sept ...,0.179,0.150,0.671,0.9867
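
Most of the compound scores above sit close to 1. One likely reason is the way VADER squashes the summed word valences into the range [-1, 1]. The sketch below mirrors that normalization, x / sqrt(x² + α) with α = 15, as far as we know it from the VADER implementation. The function name `normalize_compound` and the sample sums are introduced here for illustration and are not part of NLTK's public API.

```python
import math

def normalize_compound(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1] (VADER-style sketch)."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, score))

# Hypothetical valence sums: from a neutral text to a long, strongly positive one
for total in (0.0, 2.0, 10.0, 30.0):
    print(total, round(normalize_compound(total), 4))
```

A long review accumulates many sentiment-bearing words, so its valence sum is large and the compound score saturates near 1 or -1.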


#03. Derive the Sentiment label from the compound score

In [None]:
score = data["Compound"].values
sentiment = []
for i in score:
    if i >= 0.05 :
        sentiment.append('Positive')
    elif i <= -0.05 :
        sentiment.append('Negative')
    else:
        sentiment.append('Neutral')
data["Sentiment"] = sentiment
data

Unnamed: 0,Review,Positive,Negative,Neutral,Compound,Sentiment
0,nice hotel expensive parking got good deal sta...,0.285,0.072,0.643,0.9747,Positive
1,ok nothing special charge diamond member hilto...,0.189,0.110,0.701,0.9787,Positive
2,nice rooms not 4* experience hotel monaco seat...,0.219,0.081,0.700,0.9889,Positive
3,"unique, great stay, wonderful time hotel monac...",0.385,0.060,0.555,0.9912,Positive
4,"great stay great stay, went seahawk game aweso...",0.221,0.135,0.643,0.9797,Positive
...,...,...,...,...,...,...
20486,"best kept secret 3rd time staying charm, not 5...",0.272,0.063,0.665,0.9834,Positive
20487,great location price view hotel great quick pl...,0.430,0.000,0.570,0.9753,Positive
20488,"ok just looks nice modern outside, desk staff ...",0.145,0.131,0.724,0.2629,Positive
20489,hotel theft ruined vacation hotel opened sept ...,0.179,0.150,0.671,0.9867,Positive
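
The thresholding rule in the loop above can also be wrapped in a small function, which makes it easier to reuse and to test on edge cases. This is a minimal sketch: `label_from_compound` and its `threshold` parameter are names introduced here for illustration, and the 0.05 cutoff is the one used in the loop.

```python
def label_from_compound(score, threshold=0.05):
    """Map a VADER compound score to a sentiment label."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"

# Two compound scores from the table above, plus two made-up edge cases
examples = [0.9747, 0.2629, 0.0, -0.4]
print([label_from_compound(s) for s in examples])
```

With pandas, the same function could be applied to the whole column as `data["Compound"].apply(label_from_compound)`.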


##03.a Count the reviews in each sentiment class

In [None]:
print(data["Sentiment"].value_counts())

Positive    18831
Negative     1569
Neutral        91
Name: Sentiment, dtype: int64
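
The counts show that the labels are heavily skewed toward the positive class. As a rough sketch, the share of each class can be computed directly from the numbers printed above. `class_shares` is a helper name introduced here for illustration.

```python
# Counts copied from the value_counts() output above
counts = {"Positive": 18831, "Negative": 1569, "Neutral": 91}

def class_shares(counts):
    """Return each class's fraction of the total number of reviews."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

for label, share in class_shares(counts).items():
    print(f"{label:<9}{share:.1%}")
```

With pandas, `data["Sentiment"].value_counts(normalize=True)` should give the same fractions.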


#Conclusion 

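We can use the VADER compound score to label an unlabeled dataset: a score of at least 0.05 is labeled Positive, a score of at most -0.05 is labeled Negative, and anything in between is labeled Neutral. Here this produced 18,831 positive, 1,569 negative, and 91 neutral reviews. Labeling of this kind is a crucial step in creating training data for a sentiment model.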