## Adding Labels to Unlabeled Datasets

- The data we receive often contains errors and is sometimes not labelled, so a data scientist must invest a great deal of time in preparing a dataset for any given data science task.
- Before a dataset can be used to train or evaluate a model, it is crucial to add labels to it.
- **Sentiment analysis is one of those problems where labelling a dataset is crucial.**
- In this scenario, the data consists of user reviews or comments, and labelling it is necessary to get it ready for sentiment analysis. This notebook walks through how to add labels to an unlabeled dataset.

**Sentiment analysis** is the branch of natural language processing that identifies whether a text expresses positive, negative, or neutral sentiment.
- Businesses use sentiment analysis to examine how consumers perceive their goods and services.
- Based on the positive and negative feedback, they can promote their goods and services more effectively and improve their quality.

The **VADER sentiment model** and the **Naïve Bayes algorithm** are two widely used approaches for sentiment analysis tasks.
- If you are working on a sentiment analysis task where your **dataset does not have sentiment labels, you can use the VADER sentiment model**.
- If your **dataset is labelled and your task is to train a classification model to classify the sentiment of a text in real-time, you may prefer the Naïve Bayes algorithm.**
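
As a rough illustration of the second case, the sketch below trains a Naïve Bayes classifier on a handful of labelled reviews. It assumes scikit-learn is installed. The four training reviews, their labels, and the names `train_texts`, `train_labels`, and `model` are made up for illustration and are not taken from the Review dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples (illustrative only, not from Review.csv)
train_texts = [
    "great stay",
    "wonderful room",
    "terrible service",
    "awful dirty room",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify new text as it arrives
predictions = model.predict(["wonderful stay", "terrible dirty service"])
print(list(predictions))
```

A real classifier would need far more training data than this. The point is only that a labelled dataset can be fed to such a pipeline once the labels exist.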

In [1]:
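import warnings
# Silence FutureWarning messages to keep the notebook output readable
warnings.simplefilter(action='ignore', category=FutureWarning)
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# VADER needs its lexicon downloaded before first use
nltk.download("vader_lexicon")
import pandas as pd
# Load the reviews and drop rows with missing values
data = pd.read_csv("C:/Users/asus/OneDrive/Desktop/ML_Datasets/project/More_Projects/Review.csv")
data = data.dropna()
data.head()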

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\asus\AppData\Roaming\nltk_data...


Unnamed: 0,Review
0,nice hotel expensive parking got good deal sta...
1,ok nothing special charge diamond member hilto...
2,nice rooms not 4* experience hotel monaco seat...
3,"unique, great stay, wonderful time hotel monac..."
4,"great stay great stay, went seahawk game aweso..."


This dataset contains only one column of text, Review, so the next task is to add labels to it. I will start by adding four new columns, Positive, Negative, Neutral, and Compound, which hold the sentiment scores calculated for the text in each row:

In [2]:
sentiments = SentimentIntensityAnalyzer()
# Score each review once and reuse the result for all four columns
scores = [sentiments.polarity_scores(text) for text in data["Review"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data["Compound"] = [s["compound"] for s in scores]
data.head()

Unnamed: 0,Review,Positive,Negative,Neutral,Compound
0,nice hotel expensive parking got good deal sta...,0.285,0.072,0.643,0.9747
1,ok nothing special charge diamond member hilto...,0.189,0.11,0.701,0.9787
2,nice rooms not 4* experience hotel monaco seat...,0.219,0.081,0.7,0.9889
3,"unique, great stay, wonderful time hotel monac...",0.385,0.06,0.555,0.9912
4,"great stay great stay, went seahawk game aweso...",0.221,0.135,0.643,0.9797


Now the next task is to add labels by categorizing these scores.
- Following the thresholds commonly used with VADER, a compound score of 0.05 or more is categorized as Positive, a compound score of -0.05 or less is categorized as Negative, and anything in between is Neutral.
- With this rule, I will add a new column to the dataset that holds the sentiment label for each review:

In [3]:
# Map each compound score to a label using the 0.05 and -0.05 thresholds
score = data["Compound"].values
sentiment = []
for i in score:
    if i >= 0.05:
        sentiment.append('Positive')
    elif i <= -0.05:
        sentiment.append('Negative')
    else:
        sentiment.append('Neutral')
data["Sentiment"] = sentiment
data.head()

Unnamed: 0,Review,Positive,Negative,Neutral,Compound,Sentiment
0,nice hotel expensive parking got good deal sta...,0.285,0.072,0.643,0.9747,Positive
1,ok nothing special charge diamond member hilto...,0.189,0.11,0.701,0.9787,Positive
2,nice rooms not 4* experience hotel monaco seat...,0.219,0.081,0.7,0.9889,Positive
3,"unique, great stay, wonderful time hotel monac...",0.385,0.06,0.555,0.9912,Positive
4,"great stay great stay, went seahawk game aweso...",0.221,0.135,0.643,0.9797,Positive


In [4]:
print(data["Sentiment"].value_counts())

Sentiment
Positive    18831
Negative     1569
Neutral        91
Name: count, dtype: int64
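
The counts above show that the labels are heavily skewed towards Positive. As a quick check, the sketch below converts the printed counts into percentage shares. The dictionary simply copies the numbers from the output, and the names `counts` and `shares` are illustrative.

```python
# Label counts copied from the value_counts() output above
counts = {"Positive": 18831, "Negative": 1569, "Neutral": 91}

total = sum(counts.values())

# Share of each label as a percentage of all reviews
shares = {label: 100 * n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

On the DataFrame itself, `data["Sentiment"].value_counts(normalize=True)` gives the same proportions as fractions.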


- The Review column was the only text column in the original dataset. We added four columns containing the sentiment scores and, finally, a new column containing labels derived from those scores. If you only want the text and label columns, you can drop all the other columns and save your dataset.
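
A minimal sketch of that last step, assuming pandas and a DataFrame shaped like the one built above. The two-row `data` frame here is a made-up stand-in so the snippet runs on its own, and the file name `labelled_reviews.csv` mentioned in the comment is hypothetical.

```python
import pandas as pd

# Made-up stand-in for the scored DataFrame (values are illustrative only)
data = pd.DataFrame({
    "Review": ["great stay", "terrible service"],
    "Positive": [0.8, 0.0],
    "Negative": [0.0, 0.7],
    "Neutral": [0.2, 0.3],
    "Compound": [0.6, -0.5],
    "Sentiment": ["Positive", "Negative"],
})

# Keep only the text and its label
labelled = data[["Review", "Sentiment"]]

# to_csv() with no path returns the CSV text; pass a file name such as
# "labelled_reviews.csv" (hypothetical) to write it to disk instead
csv_text = labelled.to_csv(index=False)
print(csv_text)
```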

**This is how you can add labels to an unlabeled dataset for sentiment analysis using the Python programming language.**

## Real-Time Sentiment Analysis

The primary goal of **sentiment analysis** is to examine user reviews of a specific good or service in order to better inform potential buyers about its quality.
- Every time Apple launches a new iPhone, for instance, many people share their thoughts about it. Some like it and some don't, but ultimately all of these viewpoints help us decide whether or not to purchase the new iPhone.

**What if you want to analyze people’s feelings in real-time?** That is, ask a user about your product and understand how they feel about it right away.

- To analyze feelings in real-time, we need to request input from the user and then analyze the sentiment of the text they enter. For this real-time sentiment analysis task, I will use the **NLTK library in Python**, which is a very useful tool for natural language processing tasks.

In [5]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk
nltk.download('vader_lexicon')
user_input = input("Please Rate Our Services >>: ")
sid = SentimentIntensityAnalyzer()
score = sid.polarity_scores(user_input)
score

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\asus\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Please Rate Our Services >>: its great


{'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'compound': 0.6249}

The sentiment score is a dictionary with the keys ‘neg’, ‘neu’, ‘pos’, and ‘compound’. In the output above, the ‘pos’ and ‘neu’ values say that 80.4% of the text is rated positive and 19.6% neutral, while ‘compound’ is a single overall score between -1 and 1.
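
To make the structure concrete, the sketch below takes the dictionary printed above and reads it back: the three proportions add up to 1, and the compound value is a separate overall score. The variable names are illustrative.

```python
# Scores copied from the output above for the input "its great"
score = {'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'compound': 0.6249}

# 'neg', 'neu' and 'pos' are proportions of the text and sum to 1
proportions = {key: score[key] for key in ("neg", "neu", "pos")}
total = sum(proportions.values())

# The key with the largest proportion is the dominant category
dominant = max(proportions, key=proportions.get)

for key, value in proportions.items():
    print(f"{key}: {value:.1%}")
print("dominant:", dominant, "| compound:", score["compound"])
```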

Here is the complete Python code for real-time sentiment analysis:

In [6]:
user_input = input("Please Rate Our Services >>: ")
sid = SentimentIntensityAnalyzer()
score = sid.polarity_scores(user_input)
# Any nonzero negative score is reported as Negative, everything else as Positive
if score["neg"] != 0:
    print("Negative")
else:
    print("Positive")

Please Rate Our Services >>: its really good
Positive


Now we see Positive or Negative as the output instead of the raw sentiment scores.
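
One caveat with the rule in the last cell: any input with a nonzero ‘neg’ score is reported as Negative, even when the text is positive overall. A possible alternative, sketched below, reuses the compound thresholds from the labelling section. The function names `label_by_neg` and `label_by_compound` are hypothetical, and the example scores are copied from the first row of the labelled hotel reviews.

```python
def label_by_neg(score):
    """Rule from the cell above: any negative content means Negative."""
    return "Negative" if score["neg"] != 0 else "Positive"


def label_by_compound(score, threshold=0.05):
    """Alternative rule: classify on the overall compound score."""
    if score["compound"] >= threshold:
        return "Positive"
    if score["compound"] <= -threshold:
        return "Negative"
    return "Neutral"


# Scores of the first hotel review in the labelled table
hotel_review = {"neg": 0.072, "neu": 0.643, "pos": 0.285, "compound": 0.9747}

print(label_by_neg(hotel_review))       # Negative
print(label_by_compound(hotel_review))  # Positive
```

With NLTK, the dictionary returned by `sid.polarity_scores(user_input)` can be passed to either function directly.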