# Sentiment Analysis Models

In this notebook, we analyze tweets about Bitcoin.
We want to see whether the general sentiment about Bitcoin (measured on tweets found by searching for the ticker) anticipates the trend of the currency's price: positive sentiment followed by a rising price, and negative sentiment followed by a falling price.




To check this, we compare three models for text sentiment analysis and discuss the results later in this notebook:
1.   **VADER** Sentiment Analysis (https://github.com/cjhutto/vaderSentiment)
> VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.<br><br>
About the scoring:<br>
The `polarity_scores` method returns a dictionary of the following form:
```
{'pos': 0.303, 'compound': 0.3832, 'neu': 0.697, 'neg': 0.0}
```
The **compound** score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.<br>
NOTE: The compound score is the one most commonly used for sentiment analysis by most researchers, including the authors.

2.   **TextBlob** (https://textblob.readthedocs.io/en/dev)
> TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, **sentiment analysis**, classification, translation, and more.<br><br>
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The **polarity** score is a float within the range [-1.0, 1.0], and this is the value we extract from each tweet. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

3.   The **BERT** language model (covered in class and implemented in exercise 4)
> BERT stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

---

Run on Colab:

In [1]:
!pip install Cython
!pip install whatthelang
!pip install tensorflow
!pip3 install nest_asyncio
!pip install tweet-preprocessor
!pip install twint
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
!pip install --upgrade ipykernel # important
!pip install chart-studio # visualization
!pip install wordcloud
!pip install yfinance
!pip install vaderSentiment
!pip install transformers 
!pip install plotly==5.8
!pip install textblob seaborn nltk
!pip install pyyaml==5.4.1


---

Use GPU for faster runtime

In [1]:
import tensorflow as tf

# The `%tensorflow_version 2.x` line magic exists only on older Colab runtimes
# and raises a UsageError elsewhere, so it is omitted; TensorFlow 2.x is assumed.
device_name = tf.test.gpu_device_name()

if device_name != '/device:GPU:0':
    raise SystemError('GPU device not found')

print('Found GPU at: {}'.format(device_name))


In [2]:

import pandas as pd
import numpy as np
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import seaborn as sns


import warnings
warnings.filterwarnings("ignore")
sns.set(font_scale=1.5)
sns.set_style("whitegrid")
import plotly.graph_objects as go
import plotly.express as px

---

In [3]:
EPSILON = 0.1
TICKER = "BTC"
SEARCH_WORD = f"#{TICKER}"

DATE_SINCE = "2019-05-25"
DATE_UNTIL = "2019-06-24" 
file_name = "BTC_2019-05-25_2019-06-25__tweets_data_from_file_after_preprocessing.csv"

---

In [4]:
tweet_data = pd.read_csv("./datasets/" + file_name)

In [14]:
tweet_data

Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,index,Unnamed: 0,date,tweet,username,link,retweets,text
0,246275,333135,354812,7242079,2019-05-29 14:55:00+00,"$btg / $btc: +9% value, +267% volume at #Binan...",cryptocoinradar,,1.0,"$btg / $btc: +9% value, +267% volume at #Binan..."
1,986744,1292458,1476476,8374478,2019-06-14 18:24:23+00,Cryptocurrency | Cryptocurrency Jewelry | Mens...,ArtHarmony_shop,,0.0,Cryptocurrency | Cryptocurrency Jewelry | Mens...
2,1135815,1494023,1820000,8915720,2019-06-17 23:49:06+00,@GainsPainsCapit @DavidBCollum @mark_dow The s...,paranoidbull,,0.0,@GainsPainsCapit @DavidBCollum @mark_dow The s...
3,714433,938030,1004903,7900185,2019-06-08 03:01:57+00,"Bitcoin price analysis: 8 june, bitcoin is bul...",ClubInvestlife,,0.0,"Bitcoin price analysis: 8 june, bitcoin is bul..."
4,575337,757193,809732,7703075,2019-06-04 22:54:29+00,@DRomATX Always better to get a bargain price ...,ivanba12,,0.0,@DRomATX Always better to get a bargain price ...
...,...,...,...,...,...,...,...,...,...,...
39995,445242,589370,630044,7520827,2019-06-02 08:37:00+00,⏰ LIQUIDATION on BTC-PERPETUAL ☠️️\n\n Bought ...,DeribitRekt,,0.0,LIQUIDATION on BTC-PERPETUAL Bought $130 of #B...
39996,940300,1232112,1411308,8308969,2019-06-13 17:36:13+00,@Lihams22 @BHoarder1 @CalvinAyre @oudekaas3 bt...,missourapete,,0.0,@Lihams22 @BHoarder1 @CalvinAyre @oudekaas3 bt...
39997,1125309,1480630,1686351,8782003,2019-06-17 19:25:06+00,"Sr Platform Engineer - Veear ( San Jose, Unite...",WorkInRobotics,,0.0,"Sr Platform Engineer - Veear ( San Jose, Unite..."
39998,319225,426415,455123,7343696,2019-05-30 18:26:29+00,Be part of the future App Store https://t.co/D...,Kupi83921604,,0.0,Be part of the future App Store @dapp_com Join...


## VADER

In [15]:
analyzer = SentimentIntensityAnalyzer()

In [16]:
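# Add slang and emoji-derived keywords from Reddit/Twitter to the VADER lexicon.
# Keys are lowercase with no surrounding spaces, because VADER lowercases each
# token before looking it up. Multi-word keys are kept from the original list,
# but VADER looks up single tokens, so they may never match.
new_words = {
    'rocket': 1.0,
    'banana': 1.0,
    'full moon': 1.0,
    'waxing gibbous moon': 1.0,
    'crescent moon': 1.0,
    'to the moon': 1.0,
    'stonk': 1.0,
    'gorilla': 1.0,
    'gang': 1.0,
    'bitcoin': 1.0,
    'gme': 1.0,
    'hedge fund': -1.0,
    'crypto': 1.0,
    'squeeze': 1.0,
    'apestrongtogether': 1.0,
    'apes': 1.0,
    'ape': 1.0,
    'repos': 1.0,
    'darkpoolabuse': -1.0,
    'dark pool': -1.0,
    'dark moass': 1.0,
}
analyzer.lexicon.update(new_words)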

In [17]:
%%time
tweet_data['vader'] = tweet_data['tweet'].apply(lambda x: analyzer.polarity_scores(x)['compound']) # https://github.com/cjhutto/vaderSentiment


CPU times: user 8.5 s, sys: 79.9 ms, total: 8.58 s
Wall time: 9.63 s
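
The `vader` column stores the compound score, which is a raw valence sum squashed into the range [-1, 1]. The sketch below shows that normalization step in isolation. It assumes the form `x / sqrt(x^2 + alpha)` with `alpha = 15`, which is our reading of the vaderSentiment source; the function name `normalize_valence` and the sample sums are illustrative only.

```python
import math


def normalize_valence(raw_sum, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1].

    Assumed form: x / sqrt(x^2 + alpha). The value alpha=15 is taken from
    our reading of the vaderSentiment source, not from this notebook.
    """
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp to guard against floating-point overshoot at the extremes.
    return max(-1.0, min(1.0, score))


# Illustrative raw sums (not taken from the dataset).
for raw in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(raw, round(normalize_valence(raw), 4))
```

Because the curve flattens as the raw sum grows, a tweet with many emotionally loaded words approaches, but does not exceed, -1 or +1.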


## TextBlob

In [18]:
%%time
tweet_data['textblob'] = tweet_data['text'].apply(lambda x: TextBlob(x).polarity)

CPU times: user 15.2 s, sys: 105 ms, total: 15.3 s
Wall time: 16.3 s


In [19]:
tweet_data

Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,index,Unnamed: 0,date,tweet,username,link,retweets,text,vader,textblob
0,246275,333135,354812,7242079,2019-05-29 14:55:00+00,"$btg / $btc: +9% value, +267% volume at #Binan...",cryptocoinradar,,1.0,"$btg / $btc: +9% value, +267% volume at #Binan...",0.3400,0.000000
1,986744,1292458,1476476,8374478,2019-06-14 18:24:23+00,Cryptocurrency | Cryptocurrency Jewelry | Mens...,ArtHarmony_shop,,0.0,Cryptocurrency | Cryptocurrency Jewelry | Mens...,0.0000,0.000000
2,1135815,1494023,1820000,8915720,2019-06-17 23:49:06+00,@GainsPainsCapit @DavidBCollum @mark_dow The s...,paranoidbull,,0.0,@GainsPainsCapit @DavidBCollum @mark_dow The s...,-0.1779,0.025000
3,714433,938030,1004903,7900185,2019-06-08 03:01:57+00,"Bitcoin price analysis: 8 june, bitcoin is bul...",ClubInvestlife,,0.0,"Bitcoin price analysis: 8 june, bitcoin is bul...",0.0000,0.000000
4,575337,757193,809732,7703075,2019-06-04 22:54:29+00,@DRomATX Always better to get a bargain price ...,ivanba12,,0.0,@DRomATX Always better to get a bargain price ...,0.6693,0.175000
...,...,...,...,...,...,...,...,...,...,...,...,...
39995,445242,589370,630044,7520827,2019-06-02 08:37:00+00,⏰ LIQUIDATION on BTC-PERPETUAL ☠️️\n\n Bought ...,DeribitRekt,,0.0,LIQUIDATION on BTC-PERPETUAL Bought $130 of #B...,-0.3400,0.000000
39996,940300,1232112,1411308,8308969,2019-06-13 17:36:13+00,@Lihams22 @BHoarder1 @CalvinAyre @oudekaas3 bt...,missourapete,,0.0,@Lihams22 @BHoarder1 @CalvinAyre @oudekaas3 bt...,0.0000,0.000000
39997,1125309,1480630,1686351,8782003,2019-06-17 19:25:06+00,"Sr Platform Engineer - Veear ( San Jose, Unite...",WorkInRobotics,,0.0,"Sr Platform Engineer - Veear ( San Jose, Unite...",0.5859,0.000000
39998,319225,426415,455123,7343696,2019-05-30 18:26:29+00,Be part of the future App Store https://t.co/D...,Kupi83921604,,0.0,Be part of the future App Store @dapp_com Join...,0.8074,0.133333
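
One rough way to compare the two continuous scores is to map each to a negative / neutral / positive label and count how often the models agree. The sketch below does this for the first five rows shown above. The helper names and the neutral band of 0.1 are assumptions made for illustration; the notebook does not state which threshold, if any, it uses.

```python
def to_label(score, neutral_band=0.1):
    """Map a score in [-1, 1] to -1, 0, or 1 (neutral_band is a hypothetical threshold)."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


def agreement_rate(scores_a, scores_b, neutral_band=0.1):
    """Fraction of items on which both models produce the same label."""
    pairs = list(zip(scores_a, scores_b))
    matches = sum(
        to_label(a, neutral_band) == to_label(b, neutral_band) for a, b in pairs
    )
    return matches / len(pairs)


# First five rows of the table above.
vader_scores = [0.3400, 0.0000, -0.1779, 0.0000, 0.6693]
textblob_scores = [0.000000, 0.000000, 0.025000, 0.000000, 0.175000]

rate = agreement_rate(vader_scores, textblob_scores)
print(f"agreement on 5 sample rows: {rate:.2f}")
```

On these five rows the models agree on three, which hints that VADER assigns non-neutral scores more readily than TextBlob; five rows are far too few to conclude anything about the full dataset.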


## BERT

In [None]:
from transformers import BertTokenizer, TFBertForSequenceClassification
from transformers import InputExample, InputFeatures

# Create a new model instance
bert_model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

In [20]:
# Restore the weights from the saved checkpoint
bert_model.load_weights('./models/model_checkpoints/bert_model')

2022-07-29 10:06:03.762107: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
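%%time
import tensorflow as tf

def Bert_v(x):
    """Classify one tweet and return -1 or 1."""
    # Tokenize a single text; inputs longer than 128 tokens are truncated.
    tf_batch = tokenizer(str(x), max_length=128, padding=True, truncation=True, return_tensors='tf')
    tf_outputs = bert_model(tf_batch)
    # tf_outputs[0] holds the logits; softmax turns them into class probabilities.
    tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
    labels = [-1, 1]  # class index 0 -> -1, class index 1 -> 1
    label = tf.argmax(tf_predictions, axis=1).numpy()
    return labels[label[0]]

# np.vectorize calls Bert_v once per tweet, so this is a plain loop, not batched inference.
tweet_data['bert'] = np.vectorize(Bert_v)(tweet_data['text'])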



In [26]:
tweet_data.head()

Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,index,Unnamed: 0,date,tweet,username,link,retweets,text,vader,textblob,bert
0,246275,333135,354812,7242079,2019-05-29 14:55:00+00,"$btg / $btc: +9% value, +267% volume at #Binan...",cryptocoinradar,,1.0,"$btg / $btc: +9% value, +267% volume at #Binan...",0.34,0.0,-1
1,986744,1292458,1476476,8374478,2019-06-14 18:24:23+00,Cryptocurrency | Cryptocurrency Jewelry | Mens...,ArtHarmony_shop,,0.0,Cryptocurrency | Cryptocurrency Jewelry | Mens...,0.0,0.0,-1
2,1135815,1494023,1820000,8915720,2019-06-17 23:49:06+00,@GainsPainsCapit @DavidBCollum @mark_dow The s...,paranoidbull,,0.0,@GainsPainsCapit @DavidBCollum @mark_dow The s...,-0.1779,0.025,-1
3,714433,938030,1004903,7900185,2019-06-08 03:01:57+00,"Bitcoin price analysis: 8 june, bitcoin is bul...",ClubInvestlife,,0.0,"Bitcoin price analysis: 8 june, bitcoin is bul...",0.0,0.0,-1
4,575337,757193,809732,7703075,2019-06-04 22:54:29+00,@DRomATX Always better to get a bargain price ...,ivanba12,,0.0,@DRomATX Always better to get a bargain price ...,0.6693,0.175,1
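
The `bert` column contains only -1 or 1 because the classifier's two logits are reduced to the index of the larger class probability. The sketch below reproduces that step without TensorFlow so the arithmetic is visible. The logit values are made up for illustration, and `logits_to_label` is a hypothetical helper, not part of the notebook.

```python
import math

LABELS = [-1, 1]  # class index 0 -> -1, class index 1 -> 1, as in Bert_v


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits):
    """Return (label, probability of that label) for one example."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for a single tweet.
label, confidence = logits_to_label([-0.3, 1.2])
print(label, round(confidence, 3))
```

Softmax preserves the ordering of the logits, so the label would be the same if argmax were applied to the logits directly. Keeping the probability is still useful if a confidence threshold or a neutral class is wanted later.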


In [27]:
tweets_sentiment_df = tweet_data[["date", "text", "textblob", "vader", "bert"]]
tweets_sentiment_df

Unnamed: 0,date,text,textblob,vader,bert
0,2019-05-29 14:55:00+00,"$btg / $btc: +9% value, +267% volume at #Binan...",0.000000,0.3400,-1
1,2019-06-14 18:24:23+00,Cryptocurrency | Cryptocurrency Jewelry | Mens...,0.000000,0.0000,-1
2,2019-06-17 23:49:06+00,@GainsPainsCapit @DavidBCollum @mark_dow The s...,0.025000,-0.1779,-1
3,2019-06-08 03:01:57+00,"Bitcoin price analysis: 8 june, bitcoin is bul...",0.000000,0.0000,-1
4,2019-06-04 22:54:29+00,@DRomATX Always better to get a bargain price ...,0.175000,0.6693,1
...,...,...,...,...,...
39995,2019-06-02 08:37:00+00,LIQUIDATION on BTC-PERPETUAL Bought $130 of #B...,0.000000,-0.3400,1
39996,2019-06-13 17:36:13+00,@Lihams22 @BHoarder1 @CalvinAyre @oudekaas3 bt...,0.000000,0.0000,1
39997,2019-06-17 19:25:06+00,"Sr Platform Engineer - Veear ( San Jose, Unite...",0.000000,0.5859,-1
39998,2019-05-30 18:26:29+00,Be part of the future App Store @dapp_com Join...,0.133333,0.8074,-1


In [29]:
tweets_sentiment_df.to_csv("./datasets/senitiments_scores.csv")
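
To relate the saved scores to price movement, the per-tweet scores first need to be aggregated per day and then compared with the next day's price change. The following is a minimal sketch of that idea using only the standard library. The function names, the toy scores, and the toy closing prices are all hypothetical (they are not real Bitcoin data), and the later notebooks may aggregate differently.

```python
from collections import defaultdict


def daily_mean(rows):
    """rows: iterable of (timestamp_string, score). Returns {YYYY-MM-DD: mean score}."""
    buckets = defaultdict(list)
    for timestamp, score in rows:
        # Timestamps look like '2019-05-29 14:55:00+00'; the first 10 chars are the date.
        buckets[timestamp[:10]].append(score)
    return {day: sum(values) / len(values) for day, values in sorted(buckets.items())}


def direction_hit_rate(sentiment_by_day, close_by_day):
    """Share of days whose sentiment sign matches the sign of the next price change.

    Assumes the shared days are consecutive; days with zero sentiment or
    zero price change are skipped.
    """
    days = sorted(set(sentiment_by_day) & set(close_by_day))
    hits = total = 0
    for today, tomorrow in zip(days, days[1:]):
        sentiment = sentiment_by_day[today]
        change = close_by_day[tomorrow] - close_by_day[today]
        if sentiment == 0 or change == 0:
            continue
        total += 1
        hits += (sentiment > 0) == (change > 0)
    return hits / total if total else None


# Toy inputs for illustration only.
toy_rows = [
    ("2019-06-01 08:00:00+00", 0.5),
    ("2019-06-01 20:00:00+00", 0.1),
    ("2019-06-02 09:30:00+00", -0.4),
    ("2019-06-03 10:00:00+00", 0.2),
]
toy_closes = {"2019-06-01": 100.0, "2019-06-02": 110.0, "2019-06-03": 115.0}

toy_daily = daily_mean(toy_rows)
toy_hit_rate = direction_hit_rate(toy_daily, toy_closes)
print(toy_daily)
print(toy_hit_rate)
```

A hit rate near 0.5 would suggest sentiment carries little directional information, while a rate well above 0.5 over many days would be consistent with the hypothesis stated at the top of the notebook.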