## Sentiment Analysis Using VADER

[PlayerUnknown's Battlegrounds (PUBG)](https://en.wikipedia.org/wiki/PUBG:_Battlegrounds) is a popular multiplayer battle royale game developed by PUBG Corporation, a subsidiary of [Krafton](https://www.krafton.com/). It was first released in 2017 and is available on various platforms, including PC, consoles (Xbox One), and mobile devices.
<br>
<br>
In this notebook, I use a PUBG Mobile review dataset from India (available on [Kaggle](https://www.kaggle.com/datasets/manishgupta72/battlegrounds-mobile-india-krafton-inc/data)) and conduct sentiment analysis using [VADER (Valence Aware Dictionary and sEntiment Reasoner)](https://vadersentiment.readthedocs.io/en/latest/).

The dataset consists of $4$ columns:
- $\textbf{Name:}$ Name of the user who posted the review
- $\textbf{Upload date:}$ Date on which the review was posted
- $\textbf{Reviews:}$ The text of the review
- $\textbf{Num. of people found this review helpful:}$ Number of people who found the review helpful

$\underline{\textbf{Objective:}}$

The objective of this study is to find out whether there is a correlation between a review's sentiment score and the number of helpful votes it receives.

In [37]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np 
import pandas as pd

df = pd.read_csv("/kaggle/input/battlegrounds-mobile-india-krafton-inc/Battlegrounds Mobile India DataSet.csv")

df.head()

Unnamed: 0,Name,Upload date,Reviews,Num. of people found this review helpful
0,Saurabh shukla,09-08-2023,The new dragon ball update is really an amazin...,471 people found this review helpful
1,VIl EMERALD SHOURYA MISHRA 0270,09-08-2023,"This game is super , I like this game very muc...",251 people found this review helpful
2,Rana Chhatrapalsinh,10-08-2023,I want to give an advice that no matter how go...,412 people found this review helpful
3,Naresh Pal,10-08-2023,A must-play for all mobile gamers! BattleGroun...,198 people found this review helpful
4,Dilip Rathod,11-08-2023,♥️💞I have no issues with the game graphics and...,"1,07,093 people found this review helpful"


In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1039 entries, 0 to 1038
Data columns (total 4 columns):
 #   Column                                    Non-Null Count  Dtype 
---  ------                                    --------------  ----- 
 0   Name                                      1039 non-null   object
 1   Upload date                               1039 non-null   object
 2   Reviews                                   1039 non-null   object
 3   Num. of people found this review helpful  1032 non-null   object
dtypes: object(4)
memory usage: 32.6+ KB


In [39]:
df.isnull().any()

Name                                        False
Upload date                                 False
Reviews                                     False
Num. of people found this review helpful     True
dtype: bool

In [40]:
df.shape

(1039, 4)

##### Note: 
Calling df.info() and df.shape shows that the dataset has dimensions ($1039 \times 4$) and that the $4^{th}$ column contains $7$ null values.

Hence, in the $4^{th}$ column, I am going to fill the null values with $0$ and convert the strings to integers (e.g. "$471$ people found this review helpful" becomes $471$).

Also, I am going to drop the Name column and convert Upload date from string to datetime before going further into the analysis.
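
As a quick sanity check of the conversions described above, here is a minimal standard-library sketch. The helper names `parse_helpful_count` and `parse_upload_date` are hypothetical and used only for illustration; the sample strings are taken from the preview of the dataset shown earlier. Unlike a bare `int(...)` conversion, this sketch assumes that any value with no digits at all (including a missing value) should map to $0$.

```python
import re
from datetime import datetime

def parse_helpful_count(text: object) -> int:
    """Keep only the digits of the helpful-votes string; no digits means 0."""
    digits = re.sub(r"\D", "", str(text))  # drops commas, spaces, and words
    return int(digits) if digits else 0

def parse_upload_date(text: str) -> datetime:
    """Parse a DD-MM-YYYY string into a datetime."""
    return datetime.strptime(text, "%d-%m-%Y")

print(parse_helpful_count("471 people found this review helpful"))       # 471
print(parse_helpful_count("1,07,093 people found this review helpful"))  # 107093
print(parse_helpful_count(float("nan")))                                 # 0
print(parse_upload_date("09-08-2023").date())                            # 2023-08-09
```

Note that the Indian digit grouping in "1,07,093" is handled simply because every non-digit character is removed before the conversion.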


In [41]:
import re
from datetime import datetime

# INPUT: pd.DataFrame
# OUTPUT: pd.DataFrame
# Change df['Num. of people found this review helpful'] from str type to int
#        replace Null with 0
def change_str_to_int(df: pd.DataFrame) -> pd.DataFrame:
    # Replace NaN with '0' first,
    # Remove non-numeric characters using Regular Expression
    # Convert to integer
    df['Num. of people found this review helpful'] = (
        df['Num. of people found this review helpful']
        .fillna('0')
        .apply(lambda x: re.sub(r'[^\d]', '', str(x)))
        .astype(int)
    )
    return df

def drop_and_change(df: pd.DataFrame) -> pd.DataFrame:
    # Change "DD-MM-YYYY" strings to datetime.
    # Note: the key 'Upload date ' includes a trailing space,
    # matching the column name as it is loaded from the CSV.
    df['Upload date '] = (
        df['Upload date '].apply(lambda x: datetime.strptime(x, "%d-%m-%Y"))
    )
    # Drop the "Name" (first) column
    df = df.drop(columns=['Name'])
    return df

df = change_str_to_int(df)
df = drop_and_change(df)

df.head()

Unnamed: 0,Upload date,Reviews,Num. of people found this review helpful
0,2023-08-09,The new dragon ball update is really an amazin...,471
1,2023-08-09,"This game is super , I like this game very muc...",251
2,2023-08-10,I want to give an advice that no matter how go...,412
3,2023-08-10,A must-play for all mobile gamers! BattleGroun...,198
4,2023-08-11,♥️💞I have no issues with the game graphics and...,107093

In [42]:
df.isnull().any()

Upload date                                 False
Reviews                                     False
Num. of people found this review helpful    False
dtype: bool

In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1039 entries, 0 to 1038
Data columns (total 3 columns):
 #   Column                                    Non-Null Count  Dtype         
---  ------                                    --------------  -----         
 0   Upload date                               1039 non-null   datetime64[ns]
 1   Reviews                                   1039 non-null   object        
 2   Num. of people found this review helpful  1039 non-null   int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 24.5+ KB


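Next, we create five columns: "Negative", "Neutral", and "Positive" for the VADER sentiment proportions, "Overall Score" for the VADER compound score, and "Overall", which is reserved for a text label.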
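
To make these scores easier to interpret, the sketch below illustrates two properties. First, the negative, neutral, and positive values are proportions of the text, so they should sum to roughly $1$ (the triples below are copied from the first rows of the scored output later in this notebook). Second, the compound score lies between $-1$ and $1$ because the summed word valences are normalized; the function here follows the normalization as I understand it from the VADER reference implementation, assuming its default constant of $15$. It is an illustrative sketch, not the library's own code.

```python
import math

def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the interval (-1, 1).

    alpha=15 is assumed here to match VADER's default constant.
    """
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

# (neg, neu, pos) triples copied from the scored output in this notebook
rows = [(0.100, 0.837, 0.063), (0.050, 0.747, 0.203), (0.011, 0.831, 0.158)]
for neg, neu, pos in rows:
    print(round(neg + neu + pos, 3))  # each sum is approximately 1

# Larger raw sums approach, but never reach, -1 or 1
for raw in (-4.0, 0.0, 4.0, 40.0):
    print(raw, round(normalize(raw), 4))
```

A practical consequence is that the compound score saturates: a long, strongly worded review and a very long one can both end up close to $1$.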

In [44]:
# Initialize the score columns and the (still empty) "Overall" label column
df["Negative"] = 0
df["Neutral"] = 0
df["Positive"] = 0
df["Overall Score"] = 0
df["Overall"] = ""

df.head()

Unnamed: 0,Upload date,Reviews,Num. of people found this review helpful,Negative,Neutral,Positive,Overall Score,Overall
0,2023-08-09,The new dragon ball update is really an amazin...,471,0,0,0,0,
1,2023-08-09,"This game is super , I like this game very muc...",251,0,0,0,0,
2,2023-08-10,I want to give an advice that no matter how go...,412,0,0,0,0,
3,2023-08-10,A must-play for all mobile gamers! BattleGroun...,198,0,0,0,0,
4,2023-08-11,♥️💞I have no issues with the game graphics and...,107093,0,0,0,0,


In [49]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def assign_sentiment_scores(df: pd.DataFrame) -> pd.DataFrame:
    sid_obj = SentimentIntensityAnalyzer()

    # Score each review once and reuse the resulting dictionary
    def score_review(review: str) -> pd.Series:
        scores = sid_obj.polarity_scores(review)
        return pd.Series({
            "Negative": scores['neg'],
            "Neutral": scores['neu'],
            "Positive": scores['pos'],
            "Overall Score": scores['compound'],
        })

    df[["Negative", "Neutral", "Positive", "Overall Score"]] = (
        df["Reviews"].apply(score_review)
    )
    return df


assign_sentiment_scores(df)


Unnamed: 0,Upload date,Reviews,Num. of people found this review helpful,Negative,Neutral,Positive,Overall Score,Overall
0,2023-08-09,The new dragon ball update is really an amazin...,471,0.100,0.837,0.063,-0.3206,
1,2023-08-09,"This game is super , I like this game very muc...",251,0.050,0.747,0.203,0.8704,
2,2023-08-10,I want to give an advice that no matter how go...,412,0.011,0.831,0.158,0.9316,
3,2023-08-10,A must-play for all mobile gamers! BattleGroun...,198,0.090,0.600,0.310,0.9508,
4,2023-08-11,♥️💞I have no issues with the game graphics and...,107093,0.102,0.719,0.178,0.8907,
...,...,...,...,...,...,...,...,...
1034,2024-01-19,You are gonna love it for sure. Everything is ...,8,0.052,0.602,0.345,0.9393,
1035,2024-01-19,There is lots of bugs I'm not able to see enem...,4,0.127,0.873,0.000,-0.5423,
1036,2024-01-20,BGMI Is One Of My Favourite Game. I Am Playing...,7,0.085,0.695,0.220,0.8296,
1037,2024-01-20,Yeah good game . But it lags on some time . Kr...,0,0.246,0.590,0.164,-0.7650,
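
With the scores in place, the stated objective (a correlation between sentiment and helpful votes) can be sketched as follows. This is a minimal, standard-library illustration and not a result of the study: it uses only the first five rows shown above, which is far too few to draw any conclusion. I assume a rank-based (Spearman) correlation here because the helpful-vote counts are heavily skewed (one review has $107093$ votes while the others have a few hundred). The helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# First five rows of the scored output above
compound = [-0.3206, 0.8704, 0.9316, 0.9508, 0.8907]
helpful = [471, 251, 412, 198, 107093]
print(round(spearman(compound, helpful), 3))  # -0.5 on these five rows only
```

On the full DataFrame, the same quantity should be obtainable with `df[["Overall Score", "Num. of people found this review helpful"]].corr(method="spearman")`.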


In [None]:


def sentiment_scores(sentence):
    sid_obj = SentimentIntensityAnalyzer()
    # polarity_scores returns a dictionary with the
    # 'neg', 'neu', 'pos', and 'compound' scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    print("Overall sentiment dictionary is : ", sentiment_dict)
    print("sentence was rated as ", sentiment_dict['neg']*100, "% Negative")
    print("sentence was rated as ", sentiment_dict['neu']*100, "% Neutral")
    print("sentence was rated as ", sentiment_dict['pos']*100, "% Positive")

    print("Sentence Overall Rated As", end=" ")

    # Classify the sentence as positive, negative, or neutral
    # based on the compound score
    if sentiment_dict['compound'] >= 0.05:
        print("Positive")

    elif sentiment_dict['compound'] <= -0.05:
        print("Negative")

    else:
        print("Neutral")
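
The same thresholds can also be expressed as a function that returns the label instead of printing it, which is one way the empty "Overall" column could be filled. This is a sketch under that assumption; `label_from_compound` is a hypothetical helper name, and the sample scores are taken from the scored output above.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a label using the +/- 0.05 cut-offs."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

for score in (-0.3206, 0.8704, 0.0):
    print(score, label_from_compound(score))
# -0.3206 Negative
# 0.8704 Positive
# 0.0 Neutral
```

In the notebook, applying it would look like `df["Overall"] = df["Overall Score"].apply(label_from_compound)`.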