# Reddit Crypto Comments - Sentiment Classification

## VADER Sentiment Analyzer from NLTK
Reference - https://www.nltk.org/_modules/nltk/sentiment/vader.html

### Background

- VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based model for text sentiment analysis that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion.
- VADER can score the sentiment of any given text without prior training, unlike machine learning models, which must first be fit on labeled data.
- Its limitation is that, as a rule-based sentiment classifier, it only handles the words and patterns that its lexicon and rules explicitly cover.

## Exploring Baseline Approaches

- Let's see what sentiment results VADER yields.
- For each text it outputs four scores:
    - Negative (`neg`), Neutral (`neu`), Positive (`pos`), Compound (`compound`)
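The compound score is the one most often used as a single summary value. As far as I understand the reference implementation linked above, it is obtained by summing the valence of the words in the text and squashing that sum into the range [-1, 1] with a normalization constant `alpha` (15 by default). The sketch below reproduces only that normalization step; the function name and the sample inputs are illustrative, and the lexicon lookup and rule adjustments that produce the raw sum are not shown.

```python
import math


def normalize_compound(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1].

    Mirrors the normalization used for VADER's compound score,
    assuming the default alpha of 15.
    """
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp defensively so the result never leaves [-1, 1].
    return max(-1.0, min(1.0, norm))


# Illustrative raw sums (not taken from the dataset).
for raw in (0.0, 1.5, 4.0, -4.0, 20.0):
    print(raw, round(normalize_compound(raw), 4))
```

Because of the square root in the denominator, the score saturates: a long text with many positive words approaches 1 but never exceeds it, which is why very long comments can show compound values near ±1.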

## General Imports

In [6]:
import pandas as pd

In [3]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [4]:
import nltk
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\darsh\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [5]:
analyzer = SentimentIntensityAnalyzer()

## Loading Preprocessed Data

In [7]:
data_df = pd.read_csv("../data/processed_dataset.csv")
data_df

Unnamed: 0,worker_id,sentiment,processed_text
0,XYNN2Y4VCF3G,Positive,bought 2200 ico 050 coin hold sold 3 months ag...
1,DR6XNZMT9KRH,Positive,harmony algorand cardano solana vechain gonna ...
2,9FCQGMYD4A42,Negative,honestly reading post responses conclude crypt...
3,QEZAEMV2WF9D,Positive,bear market money continue dca assets believe
4,Z7J7W3XCP4XC,Negative,funny people think bitcoins risk comparable st...
...,...,...,...
547,FM9FYY27VTVR,Positive,lightning network capability millions btc tran...
548,K94W3ATCZX2V,Positive,means daily volume means daily active users me...
549,W7APN9XYDRV6,Negative,exchanges irresponsible list tokens luna shoul...
550,MTCJYTYKKHZJ,Positive,massive ecosystem pretty applications shitload...


## Baseline - Calculate Sentiment Score Using VADER

In [8]:
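# Score every comment; polarity_scores returns a dict with the keys
# 'neg', 'neu', 'pos' and 'compound'.
sentiment = []
for sentence in data_df["processed_text"].values:
    polarity_score = analyzer.polarity_scores(sentence)
    sentiment.append(polarity_score)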

In [9]:
sentiment[:5]

[{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
 {'neg': 0.0, 'neu': 0.803, 'pos': 0.197, 'compound': 0.4019},
 {'neg': 0.218, 'neu': 0.567, 'pos': 0.215, 'compound': -0.0222},
 {'neg': 0.0, 'neu': 0.779, 'pos': 0.221, 'compound': 0.1779},
 {'neg': 0.18, 'neu': 0.67, 'pos': 0.149, 'compound': 0.1027}]

In [10]:
sentiment_score_df = pd.DataFrame(sentiment)
sentiment_score_df

Unnamed: 0,neg,neu,pos,compound
0,0.000,1.000,0.000,0.0000
1,0.000,0.803,0.197,0.4019
2,0.218,0.567,0.215,-0.0222
3,0.000,0.779,0.221,0.1779
4,0.180,0.670,0.149,0.1027
...,...,...,...,...
547,0.000,1.000,0.000,0.0000
548,0.000,0.636,0.364,0.7430
549,0.531,0.469,0.000,-0.7783
550,0.043,0.726,0.231,0.9548


In [11]:
data_df["vader_neg"] = sentiment_score_df["neg"]
data_df["vader_neu"] = sentiment_score_df["neu"]
data_df["vader_pos"] = sentiment_score_df["pos"]
data_df["vader_compound"] = sentiment_score_df["compound"]
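The four column assignments can also be written as a single join. The sketch below is one possible alternative, not the approach used in this notebook: it builds a two-row toy frame from values shown in the outputs above (the `toy_` names are hypothetical, chosen so that nothing in the notebook is overwritten) and relies on both frames sharing the same default integer index.

```python
import pandas as pd

# Two rows copied from the outputs above, used only for illustration.
toy_df = pd.DataFrame(
    {
        "worker_id": ["XYNN2Y4VCF3G", "DR6XNZMT9KRH"],
        "sentiment": ["Positive", "Positive"],
    }
)
toy_scores = pd.DataFrame(
    [
        {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
        {"neg": 0.0, "neu": 0.803, "pos": 0.197, "compound": 0.4019},
    ]
)

# add_prefix renames neg -> vader_neg, etc.; join aligns rows on the index.
toy_combined = toy_df.join(toy_scores.add_prefix("vader_"))
print(toy_combined)
```

Like the column-by-column assignment, this aligns rows by index, so it assumes the score frame was built in the same row order as the data frame.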

In [12]:
data_df

Unnamed: 0,worker_id,sentiment,processed_text,vader_neg,vader_neu,vader_pos,vader_compound
0,XYNN2Y4VCF3G,Positive,bought 2200 ico 050 coin hold sold 3 months ag...,0.000,1.000,0.000,0.0000
1,DR6XNZMT9KRH,Positive,harmony algorand cardano solana vechain gonna ...,0.000,0.803,0.197,0.4019
2,9FCQGMYD4A42,Negative,honestly reading post responses conclude crypt...,0.218,0.567,0.215,-0.0222
3,QEZAEMV2WF9D,Positive,bear market money continue dca assets believe,0.000,0.779,0.221,0.1779
4,Z7J7W3XCP4XC,Negative,funny people think bitcoins risk comparable st...,0.180,0.670,0.149,0.1027
...,...,...,...,...,...,...,...
547,FM9FYY27VTVR,Positive,lightning network capability millions btc tran...,0.000,1.000,0.000,0.0000
548,K94W3ATCZX2V,Positive,means daily volume means daily active users me...,0.000,0.636,0.364,0.7430
549,W7APN9XYDRV6,Negative,exchanges irresponsible list tokens luna shoul...,0.531,0.469,0.000,-0.7783
550,MTCJYTYKKHZJ,Positive,massive ecosystem pretty applications shitload...,0.043,0.726,0.231,0.9548


In [14]:
# Character count of each processed comment
data_df["processed_text_len"] = data_df["processed_text"].apply(len)

In [15]:
col_order = ["worker_id", "processed_text", "processed_text_len", "vader_neg", "vader_neu", "vader_pos", "vader_compound", "sentiment"]

In [16]:
data_df = data_df[col_order]
data_df

Unnamed: 0,worker_id,processed_text,processed_text_len,vader_neg,vader_neu,vader_pos,vader_compound,sentiment
0,XYNN2Y4VCF3G,bought 2200 ico 050 coin hold sold 3 months ag...,71,0.000,1.000,0.000,0.0000,Positive
1,DR6XNZMT9KRH,harmony algorand cardano solana vechain gonna ...,80,0.000,0.803,0.197,0.4019,Positive
2,9FCQGMYD4A42,honestly reading post responses conclude crypt...,171,0.218,0.567,0.215,-0.0222,Negative
3,QEZAEMV2WF9D,bear market money continue dca assets believe,45,0.000,0.779,0.221,0.1779,Positive
4,Z7J7W3XCP4XC,funny people think bitcoins risk comparable st...,105,0.180,0.670,0.149,0.1027,Negative
...,...,...,...,...,...,...,...,...
547,FM9FYY27VTVR,lightning network capability millions btc tran...,72,0.000,1.000,0.000,0.0000,Positive
548,K94W3ATCZX2V,means daily volume means daily active users me...,80,0.000,0.636,0.364,0.7430,Positive
549,W7APN9XYDRV6,exchanges irresponsible list tokens luna shoul...,62,0.531,0.469,0.000,-0.7783,Negative
550,MTCJYTYKKHZJ,massive ecosystem pretty applications shitload...,693,0.043,0.726,0.231,0.9548,Positive
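To use these scores as a baseline classifier, the compound value has to be turned into a label. A minimal sketch follows, assuming the commonly cited cutoffs of ±0.05 (a convention, not something this notebook defines) and the hypothetical helper name `compound_to_label`. It is applied to the first five rows shown above and compared against their `sentiment` labels.

```python
def compound_to_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a label (threshold is an assumption)."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# (vader_compound, sentiment) pairs copied from the first five rows above.
rows = [
    (0.0000, "Positive"),
    (0.4019, "Positive"),
    (-0.0222, "Negative"),
    (0.1779, "Positive"),
    (0.1027, "Negative"),
]

predicted = [compound_to_label(compound) for compound, _ in rows]
agreement = sum(p == gold for p, (_, gold) in zip(predicted, rows)) / len(rows)
print(predicted)
print(agreement)
```

On the full frame the same helper could be applied with `data_df["vader_compound"].apply(compound_to_label)`. The visible rows carry only Positive and Negative labels, so any comment that lands in the Neutral band counts as a miss unless the threshold is set to 0 or neutral predictions are handled separately.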


## Storing Results on Disk

In [17]:
data_df.to_csv("../data/preprocessed_with_vader_score.csv", index=False)