# Researching Libraries
I picked my two libraries based on [this Medium article](https://medium.com/@nile.bits/best-python-sentiment-analysis-libraries-unleashing-the-power-of-text-analysis-ad13c272e5d4): _VADER_ (from NLTK), because it is widely used, and _Transformers_ (from Hugging Face), because it offers more sophisticated models.

# VADER (NLTK)

I followed [this article](https://medium.com/@skillcate/sentiment-analysis-using-nltk-vader-98f67f2e6130) to set up VADER.

VADER returns a compound score between -1 and 1, which is conventionally mapped to a label as follows:
1. compound <= -0.05 (Negative)
2. -0.05 < compound < 0.05 (Neutral)
3. compound >= 0.05 (Positive)
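
The thresholds above can be sketched as a small helper. The function name `label_from_compound` and the `cutoff` parameter are illustrative and not part of NLTK; scores of exactly ±0.05 are assigned to Positive/Negative here, following the usual VADER convention.

```python
def label_from_compound(compound: float, cutoff: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= cutoff:
        return "Positive"
    if compound <= -cutoff:
        return "Negative"
    return "Neutral"


# 0.656 is the compound score from the cheesecake demonstration in this notebook
print(label_from_compound(0.656))  # prints "Positive"
```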

In [None]:
# Library Imports
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [None]:
# Download the VADER lexicon (only needed once per environment)
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [None]:
# Demonstration
text = "Layers' cheesecake is so yummy"
SentimentIntensityAnalyzer().polarity_scores(text)['compound']

0.656

# Transformers (Hugging Face)
The [Hugging Face blog post](https://huggingface.co/blog/sentiment-analysis-python) had enough documentation to get started.

In [None]:
# Library Imports
from transformers import pipeline

In [None]:
# Pipeline creation. The pipeline handles tokenization and inference for any new text, such as user input.
# The default model is DistilBERT base uncased fine-tuned on SST-2, which predicts only {Positive, Negative}.
# I am using distilbert-base-uncased-emotion instead because it predicts a wider range of emotions.
transformer_pipe = pipeline("text-classification", model="bhadresh-savani/distilbert-base-uncased-emotion")

Device set to use cpu


In [None]:
data = ["I love you", "I hate you", "I think Ahsan needs to buy me food!"]
transformer_pipe(data)

[{'label': 'love', 'score': 0.9608993530273438},
 {'label': 'anger', 'score': 0.826848566532135},
 {'label': 'anger', 'score': 0.5047048926353455}]
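
To compare these emotion labels with VADER's positive/negative/neutral output, one option is to collapse them into polarities. The sketch below assumes the model's six labels are sadness, joy, love, anger, fear, and surprise, and uses a hypothetical mapping; treating `surprise` as neutral is a judgment call, not something the model defines.

```python
# Hypothetical mapping from emotion labels to a coarse polarity.
EMOTION_TO_POLARITY = {
    "joy": "Positive",
    "love": "Positive",
    "anger": "Negative",
    "fear": "Negative",
    "sadness": "Negative",
    "surprise": "Neutral",  # assumption: surprise can go either way
}


def to_polarity(predictions):
    """Attach a 'polarity' key to each {'label', 'score'} dict from the pipeline."""
    return [
        {**p, "polarity": EMOTION_TO_POLARITY.get(p["label"], "Neutral")}
        for p in predictions
    ]


# Predictions copied from the output above
predictions = [
    {"label": "love", "score": 0.9608993530273438},
    {"label": "anger", "score": 0.826848566532135},
    {"label": "anger", "score": 0.5047048926353455},
]
print([p["polarity"] for p in to_polarity(predictions)])
# prints ['Positive', 'Negative', 'Negative']
```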

In [None]:
# Single text
single_data = "@apple your phones suck"

transformer_pipe(single_data)

[{'label': 'anger', 'score': 0.9753281474113464}]

In [None]:
# The pipeline still returns a label for noisy text, but with low confidence (about 0.45 here).
data = ["#T34His is a b@ad review :("]
transformer_pipe(data)

[{'label': 'anger', 'score': 0.4474793076515198}]
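
Since the score for the noisy review is below 0.5, it may be worth flagging low-confidence predictions rather than trusting the label outright. A minimal sketch, where the helper name `flag_uncertain` and the 0.6 threshold are arbitrary assumptions that would need tuning on real data:

```python
def flag_uncertain(predictions, threshold=0.6):
    """Replace the label with 'uncertain' when the score falls below the threshold."""
    return [
        p if p["score"] >= threshold else {"label": "uncertain", "score": p["score"]}
        for p in predictions
    ]


# Predictions copied from the outputs in this notebook
noisy = [{"label": "anger", "score": 0.4474793076515198}]
clean = [{"label": "anger", "score": 0.9753281474113464}]

print(flag_uncertain(noisy))
print(flag_uncertain(clean))
```

With this threshold the noisy review is marked `uncertain`, while the "@apple your phones suck" prediction keeps its `anger` label.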