# **Team 1**
# **Balsam Hindi**
# **Lynn Nyazika**
# **Course:** AI 574 - Natural Language Processing (FALL I, 2022)


Cryptocurrency markets are notoriously difficult to predict. Even experienced investors struggle to anticipate market behavior, and the volatile, complex nature of cryptocurrencies makes them hard to model. However, we believe that deep learning models such as convolutional neural networks (CNNs) can potentially provide more accurate predictions. In our project, we customize models built on pre-trained CNN and Transformer architectures to predict the sentiment of text and relate that sentiment to the behavior of cryptocurrency markets.

The objective of our project is to design a model that uses AI to predict the behavior of cryptocurrency markets, specifically Bitcoin, based on crowd sentiment on social media, specifically Twitter. Our suite of tools is trained on a dataset of past social media data and market data. Once trained, it can automatically detect market trends and make predictions accordingly. Ideally, our model will improve on previous results and provide accurate predictions of Bitcoin market behavior.

Keywords: Bitcoin, market, Twitter, social media


# **Data Collection**


# **Bitcoin Tweets**
# **https://www.kaggle.com/datasets/kaushiksuresh147/bitcoin-tweets**
# **Used to train a sentiment analysis model on Bitcoin-specific Twitter colloquialisms.**
# **Twitter Sentiment Dataset**
# **https://www.kaggle.com/datasets/saurabhshahane/twitter-sentiment-dataset**
# **Used to train a sentiment analysis model on more general Twitter colloquialisms.**
# **Sarcasm on Reddit**
# **https://www.kaggle.com/datasets/danofer/sarcasm**
# **Used to train a sarcasm detection model for social media text.**
# **Bitcoin Historical Dataset**
# **https://www.kaggle.com/datasets/prasoonkottarathil/btcinusd**
# **Used to compare Bitcoin's daily performance against tweet sentiment.**

# **Required packages**

# **!pip install scikit-learn**
# **!pip install pandas**
# **!pip install tensorflow**
# **!pip install numpy**
# **!pip install bert-tensorflow**
# **!pip install ipywidgets**
# **!pip install IPython**
# **!pip install spacy**

# **Data Preprocessing**

In [None]:
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
from datetime import datetime

In [3]:
import numpy as np
# bert.run_classifier, bert.optimization and bert.tokenization (from bert-tensorflow)
# are not used anywhere below, and bert-tensorflow targets TensorFlow 1.x, so they are
# not imported here. train_test_split was already imported in the previous cell.

In [4]:
bitcoin_df = pd.read_csv("BTC_tweets_daily_example.csv")

In [5]:
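# Drop neutral tweets (sent_score == 0) so the task becomes binary classification.
bitcoin_df.drop(bitcoin_df.loc[bitcoin_df['sent_score']==0].index, inplace=True)

In [6]:
# Map the remaining scores to binary labels: -1 (negative) -> 0, 1 (positive) -> 1.
bitcoin_df["sent_score"] = np.where(bitcoin_df["sent_score"] == -1, 0, 1)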

In [7]:
bitcoin_df.head()

Unnamed: 0.1,Unnamed: 0,Date,Tweet,Screen_name,Source,Link,Sentiment,sent_score,New_Sentiment_Score,New_Sentiment_State
2,2,Fri Mar 23 00:40:35 +0000 2018,RT @tippereconomy: Another use case for #block...,hojachotopur,"[u'blockchain', u'Tipper', u'TipperEconomy']","<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive'],1,0.136364,1.0
3,3,Fri Mar 23 00:40:36 +0000 2018,free coins https://t.co/DiuoePJdap,denies_distro,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive'],1,0.4,1.0
4,4,Fri Mar 23 00:40:36 +0000 2018,RT @payvxofficial: WE are happy to announce th...,aditzgraha,[],"<a href=""http://twitter.com/download/android"" ...",['positive'],1,0.468182,1.0
5,5,Fri Mar 23 00:40:36 +0000 2018,Copy successful traders automatically with Bit...,VictorS61164810,[],"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",['positive'],1,0.75,1.0
6,6,Fri Mar 23 00:40:37 +0000 2018,RT @bethereumteam: We're revealing our surpris...,ClarkKalel4,"[u'surprise', u'presents', u'crypto', u'bitcoin']","<a href=""http://twitter.com/download/android"" ...",['positive'],1,0.2,1.0


In [8]:
sentimentclean_df = bitcoin_df.dropna(subset=['Tweet'])

In [9]:
training_sentiment, testing_sentiment = train_test_split(sentimentclean_df, test_size=0.2, random_state=25)

In [10]:
posts_df = pd.read_csv("train-balanced-sarcasm.csv")

In [11]:
posts_df.head()

Unnamed: 0,label,comment,author,subreddit,score,ups,downs,date,created_utc,parent_comment
0,0,NC and NH.,Trumpbart,politics,2,-1,-1,2016-10,2016-10-16 23:55:23,"Yeah, I get that argument. At this point, I'd ..."
1,0,You do know west teams play against west teams...,Shbshb906,nba,-4,-1,-1,2016-11,2016-11-01 00:24:10,The blazers and Mavericks (The wests 5 and 6 s...
2,0,"They were underdogs earlier today, but since G...",Creepeth,nfl,3,3,0,2016-09,2016-09-22 21:45:37,They're favored to win.
3,0,"This meme isn't funny none of the ""new york ni...",icebrotha,BlackPeopleTwitter,-8,-1,-1,2016-10,2016-10-18 21:03:47,deadass don't kill my buzz
4,0,I could use one of those tools.,cush2push,MaddenUltimateTeam,6,-1,-1,2016-12,2016-12-30 17:00:13,Yep can confirm I saw the tool they use for th...


In [242]:
posts_df.describe()

Unnamed: 0,label,score,ups,downs
count,1010826.0,1010826.0,1010826.0,1010826.0
mean,0.5,6.885676,5.498885,-0.1458629
std,0.5,48.34288,41.27297,0.3529689
min,0.0,-507.0,-507.0,-1.0
25%,0.0,1.0,0.0,0.0
50%,0.5,2.0,1.0,0.0
75%,1.0,4.0,3.0,0.0
max,1.0,9070.0,5163.0,0.0


In [12]:
na_df = posts_df.dropna(subset=['comment'])

In [13]:
split_df = np.array_split(na_df, 3)

In [14]:
split_portion = split_df[0]

In [None]:
split_portion.describe()

Unnamed: 0,label,score,ups,downs
count,336925.0,336925.0,336925.0,336925.0
mean,0.452448,7.740215,3.58405,-0.436907
std,0.497734,57.813158,37.911448,0.496004
min,0.0,-507.0,-507.0,-1.0
25%,0.0,1.0,-1.0,-1.0
50%,0.0,2.0,1.0,0.0
75%,1.0,5.0,2.0,0.0
max,1.0,9070.0,4609.0,0.0
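`np.array_split` cuts `na_df` into three nearly equal, contiguous blocks in file order; when the row count is not divisible by 3, the first `n % 3` blocks receive one extra row. The plain-Python sketch below reproduces that sizing rule (the helper names are hypothetical, not part of NumPy). Because `split_df[0]` is simply the first third of the file rather than a random sample, its label mean (about 0.452) differs from the full dataset's 0.5; shuffling first, for example with `na_df.sample(frac=1, random_state=25)`, is one way to obtain a more representative subset.

```python
def array_split_sizes(n_rows, n_parts):
    """Part sizes that np.array_split produces for n_rows rows split into n_parts.

    The first n_rows % n_parts parts receive one extra row each.
    """
    base, extra = divmod(n_rows, n_parts)
    return [base + 1 if i < extra else base for i in range(n_parts)]


def contiguous_slices(n_rows, n_parts):
    """(start, stop) positions of each part; parts are taken in file order."""
    bounds, start = [], 0
    for size in array_split_sizes(n_rows, n_parts):
        bounds.append((start, start + size))
        start += size
    return bounds


print(array_split_sizes(10, 3))  # [4, 3, 3]
print(contiguous_slices(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```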


In [16]:
import json
from collections import defaultdict

def format_jsonl(dataframe, targetfile):
    with open(targetfile, 'w', encoding='UTF8') as f:
        # Build a list of {"text", "cats"} examples, then write them as one JSON document.
        json_list = []
        for index, row in dataframe.iterrows():
            jsonl_object = {}
            jsonl_object["text"] = row["comment"]
            other_dict = {1: 0, 0: 1}
            jsonl_object["cats"] = {"sarcasm": row["label"], "other": other_dict[row["label"]]}
            json_list.append(jsonl_object)
        json_output = {"data": json_list}
        json.dump(json_output, f, ensure_ascii=False, indent=4)

In [17]:
def format_json_sentiment(dataframe, targetfile):
    with open(targetfile, 'w', encoding='UTF8') as f:
        # Build a list of {"text", "cats"} examples, then write them as one JSON document.
        json_list = []
        for index, row in dataframe.iterrows():
            jsonl_object = {}
            jsonl_object["text"] = row["Tweet"]
            other_dict = {1: 0, 0: 1}
            jsonl_object["cats"] = {"sentiment_positive": row["sent_score"], "sentiment_negative": other_dict[row["sent_score"]]}
            json_list.append(jsonl_object)
        json_output = {"data": json_list}
        json.dump(json_output, f, ensure_ascii=False, indent=4)

In [18]:
def format_json_tweets(dataframe, targetfile):
    with open(targetfile, 'w', encoding='UTF8') as f:
        # Build a list of {"text", "cats"} examples, then write them as one JSON document.
        json_list = []
        for index, row in dataframe.iterrows():
            jsonl_object = {}
            jsonl_object["text"] = row["text"]
            other_dict = {1: 0, 0: 1}
            jsonl_object["cats"] = {"sentiment_positive": row["sentiment"], "sentiment_negative": other_dict[row["sentiment"]]}
            json_list.append(jsonl_object)
        json_output = {"data": json_list}
        json.dump(json_output, f, ensure_ascii=False, indent=4)

In [19]:
training_data, testing_data = train_test_split(split_portion, test_size=0.2, random_state=25)

In [20]:
format_jsonl(training_data, "assets/data.train.jsonl")

In [21]:
format_jsonl(testing_data, "assets/data.valid.jsonl")

In [22]:
format_json_sentiment(training_sentiment,  "assets/data_sentiment.train.jsonl")

In [23]:
format_json_sentiment(testing_sentiment,  "assets/data_sentiment.test.jsonl")
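The three `format_*` functions differ only in which text column, label column and category names they use. A minimal sketch of a shared helper, assuming 0/1 integer labels; `textcat_record` and `write_textcat_json` are hypothetical names. It writes the same layout as the cells above: despite the `.jsonl` extension, each file holds a single indented JSON document with a top-level `"data"` list, not one JSON object per line.

```python
import json


def textcat_record(text, label, positive_cat, negative_cat):
    """One example for a two-category, mutually exclusive text classifier."""
    label = int(label)  # also accepts NumPy integer labels
    if label not in (0, 1):
        raise ValueError(f"expected a 0/1 label, got {label!r}")
    return {"text": text, "cats": {positive_cat: label, negative_cat: 1 - label}}


def write_textcat_json(rows, targetfile, positive_cat, negative_cat):
    """Write (text, label) pairs as one JSON document: {"data": [record, ...]}."""
    records = [textcat_record(text, label, positive_cat, negative_cat) for text, label in rows]
    with open(targetfile, "w", encoding="utf-8") as f:
        json.dump({"data": records}, f, ensure_ascii=False, indent=4)
    return len(records)
```

For example, `write_textcat_json(zip(training_data["comment"], training_data["label"]), "assets/data.train.jsonl", "sarcasm", "other")` should produce the same content as `format_jsonl(training_data, "assets/data.train.jsonl")`.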

In [24]:
import en_textcat_sarcasm

nlp_sarcasm = en_textcat_sarcasm.load()
text_sarcasm = ["gee I sure am glad I invested in bitcoin", "elon musk will take us to the moon", "going to invest about 30 dollars into bitcoin today", "bitcoin was a mistake", "I am so happy right now", "Wish I didn't invest as much as I did", "Yeah I don't think bitcoin is doing so well right now", "Bitcoin is doing great now", "I hate everything about cryptocurrency", "I hate lasagna"]
docs = list(nlp_sarcasm.pipe(text_sarcasm))
result = []
for doc in docs:
    print(doc.text)
    print(doc.cats)

gee I sure am glad I invested in bitcoin
{'sarcasm': 0.8144199848175049, 'other': 0.1855800449848175}
elon musk will take us to the moon
{'sarcasm': 0.6246351003646851, 'other': 0.37536492943763733}
going to invest about 30 dollars into bitcoin today
{'sarcasm': 0.3419283330440521, 'other': 0.6580716371536255}
bitcoin was a mistake
{'sarcasm': 0.5626269578933716, 'other': 0.43737301230430603}
I am so happy right now
{'sarcasm': 0.61134934425354, 'other': 0.3886506259441376}
Wish I didn't invest as much as I did
{'sarcasm': 0.2973064184188843, 'other': 0.7026935815811157}
Yeah I don't think bitcoin is doing so well right now
{'sarcasm': 0.7461886405944824, 'other': 0.2538113296031952}
Bitcoin is doing great now
{'sarcasm': 0.526121973991394, 'other': 0.47387802600860596}
I hate everything about cryptocurrency
{'sarcasm': 0.4415939152240753, 'other': 0.5584060549736023}
I hate lasagna
{'sarcasm': 0.49569565057754517, 'other': 0.5043043494224548}
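`doc.cats` holds a score per category rather than a hard label, and several sentences above sit close to 0.5 ("I hate lasagna" scores 0.496 sarcasm vs 0.504 other). A small sketch for turning the scores into a label while flagging low-margin predictions; the function names are hypothetical, and the 0.2 margin is an arbitrary assumption rather than a value used elsewhere in this notebook.

```python
def top_category(cats):
    """Return the highest-scoring category name and its score from a doc.cats dict."""
    label = max(cats, key=cats.get)
    return label, cats[label]


def is_confident(cats, margin=0.2):
    """True when the top two scores differ by at least `margin` (hypothetical cutoff)."""
    scores = sorted(cats.values(), reverse=True)
    return scores[0] - scores[1] >= margin


example = {'sarcasm': 0.8144199848175049, 'other': 0.1855800449848175}
print(top_category(example))  # ('sarcasm', 0.8144199848175049)
print(is_confident({'sarcasm': 0.49569565057754517, 'other': 0.5043043494224548}))  # False
```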


In [25]:
tweets_df = pd.read_csv("Tweets.csv")

In [26]:
tweets_df = tweets_df.dropna(subset=['text'])

In [27]:
tweets_df.drop(tweets_df.loc[tweets_df['sentiment']=="neutral"].index, inplace=True)

In [28]:
tweets_df["sentiment"] = np.where(tweets_df["sentiment"] == "negative", 0, 1)
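`np.where(tweets_df["sentiment"] == "negative", 0, 1)` sends every label other than the exact string `"negative"` to 1, so any row that slips past the drop filter (for instance because the filter value does not exactly match the label text) is silently counted as positive. A stricter sketch, assuming the label set is `positive`/`negative`/`neutral` as the filter implies; `binarize_sentiment` is a hypothetical helper that raises on anything unexpected.

```python
def binarize_sentiment(labels, drop="neutral", negative="negative", positive="positive"):
    """Map string sentiment labels to 0/1, discarding `drop` and rejecting anything else.

    Returns (kept_positions, binary_labels) so the caller can select the matching rows.
    """
    kept, binary = [], []
    for i, label in enumerate(labels):
        if label == drop:
            continue
        if label == negative:
            binary.append(0)
        elif label == positive:
            binary.append(1)
        else:
            raise ValueError(f"unexpected label {label!r} at position {i}")
        kept.append(i)
    return kept, binary


print(binarize_sentiment(["positive", "neutral", "negative"]))  # ([0, 2], [1, 0])
```

When `labels` is `tweets_df["sentiment"].tolist()`, the returned positions can be passed to `tweets_df.iloc[...]` to keep the matching rows.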

In [267]:
tweets_df.loc[9887]

textID                                      8a24c189a8
text             Is leaving Utah today  Super Sad Face
selected_text                           Super Sad Face
sentiment                                            0
Name: 9887, dtype: object

In [30]:
training_tweets, testing_tweets = train_test_split(tweets_df, test_size=0.2, random_state=25)

In [31]:
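# Note: this call and the next write to the same asset paths as the sarcasm split above
# (assets/data.train.jsonl and assets/data.valid.jsonl), overwriting those files.
format_json_tweets(training_tweets, "assets/data.train.jsonl")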

In [32]:
format_json_tweets(testing_tweets, "assets/data.valid.jsonl")

# **Model Fitting and Evaluation**

In [33]:
import en_textcat_sentiment
nlp_sentiment = en_textcat_sentiment.load()
text = ["gee I sure am glad I invested in bitcoin", "elon musk will take us to the moon", "going to invest about 30 dollars into bitcoin today", "bitcoin was a mistake", "I am so happy right now", "Wish I didn't invest as much as I did", "Yeah I don't think bitcoin is doing so well right now", "Bitcoin is doing great now", "I hate everything about cryptocurrency", "I hate lasagna"]
docs = list(nlp_sentiment.pipe(text))
result = []
for doc in docs:
    print(doc.text)
    print(doc.cats)

gee I sure am glad I invested in bitcoin
{'sentiment_positive': 0.4833342134952545, 'sentiment_negative': 0.5166658163070679}
elon musk will take us to the moon
{'sentiment_positive': 0.9579459428787231, 'sentiment_negative': 0.04205407202243805}
going to invest about 30 dollars into bitcoin today
{'sentiment_positive': 0.515830934047699, 'sentiment_negative': 0.484169065952301}
bitcoin was a mistake
{'sentiment_positive': 0.10607410967350006, 'sentiment_negative': 0.8939259052276611}
I am so happy right now
{'sentiment_positive': 0.7525826096534729, 'sentiment_negative': 0.2474173605442047}
Wish I didn't invest as much as I did
{'sentiment_positive': 0.6062438488006592, 'sentiment_negative': 0.39375612139701843}
Yeah I don't think bitcoin is doing so well right now
{'sentiment_positive': 0.3251758813858032, 'sentiment_negative': 0.674824059009552}
Bitcoin is doing great now
{'sentiment_positive': 0.9576895833015442, 'sentiment_negative': 0.04231038689613342}
I hate everything about cr

In [34]:
import en_textcat_tweet_sentiment
nlp_tweet_sentiment = en_textcat_tweet_sentiment.load()
text = ["gee I sure am glad I invested in bitcoin", "elon musk will take us to the moon", "going to invest about 30 dollars into bitcoin today", "bitcoin was a mistake", "I am so happy right now", "Wish I didn't invest as much as I did", "Yeah I don't think bitcoin is doing so well right now", "Bitcoin is doing great now", "I hate everything about cryptocurrency", "I hate lasagna"]
docs = list(nlp_tweet_sentiment.pipe(text))
result = []
for doc in docs:
    print(doc.text)
    print(doc.cats)

gee I sure am glad I invested in bitcoin
{'sentiment_positive': 0.9251460433006287, 'sentiment_negative': 0.07485399395227432}
elon musk will take us to the moon
{'sentiment_positive': 0.8429345488548279, 'sentiment_negative': 0.1570655107498169}
going to invest about 30 dollars into bitcoin today
{'sentiment_positive': 0.959004282951355, 'sentiment_negative': 0.040995724499225616}
bitcoin was a mistake
{'sentiment_positive': 0.4258057177066803, 'sentiment_negative': 0.5741942524909973}
I am so happy right now
{'sentiment_positive': 0.9386050701141357, 'sentiment_negative': 0.06139498949050903}
Wish I didn't invest as much as I did
{'sentiment_positive': 0.717323899269104, 'sentiment_negative': 0.2826760709285736}
Yeah I don't think bitcoin is doing so well right now
{'sentiment_positive': 0.5311000347137451, 'sentiment_negative': 0.4688999652862549}
Bitcoin is doing great now
{'sentiment_positive': 0.9507790803909302, 'sentiment_negative': 0.04922090843319893}
I hate everything about 

In [195]:
def sarc_weight(sarc_score):
    """Map a sarcasm probability to the amount by which sentiment is shifted."""
    if 0.5 < sarc_score <= 0.65:
        return 0.05
    elif 0.65 < sarc_score <= 0.8:
        return 0.10
    elif sarc_score > 0.8:
        return 0.15
    # Scores of 0.5 or below are treated as not sarcastic.
    return 0

# Example input: {'sentiment_positive': 0.4258057177066803, 'sentiment_negative': 0.5741942524909973}
def sarc_sent_adjuster(sentiment_object, sarcasm_weight):
    """Move a confident sentiment prediction toward the opposite class by sarcasm_weight.

    Predictions with sentiment_positive between 0.4 and 0.6 are left unchanged.
    """
    new_sent_positive = sentiment_object["sentiment_positive"]
    new_sent_negative = sentiment_object["sentiment_negative"]
    if sentiment_object["sentiment_positive"] > 0.6:
        # Likely-sarcastic "positive" text is read as less positive.
        new_sent_positive -= sarcasm_weight
        new_sent_negative += sarcasm_weight
    elif sentiment_object["sentiment_positive"] < 0.4 and sentiment_object["sentiment_negative"] > 0.25:
        # Likely-sarcastic "negative" text is read as less negative.
        new_sent_positive += sarcasm_weight
        new_sent_negative -= sarcasm_weight
    return {"sentiment_positive": new_sent_positive, "sentiment_negative": new_sent_negative}
    

In [218]:
def sentiment_sarcasm_weight(text):
    """Score each text with the Bitcoin sentiment model, then adjust it for sarcasm."""
    docs_sarc = list(nlp_sarcasm.pipe(text))
    docs_sentiment = list(nlp_sentiment.pipe(text))
    text_object_list = []
    for sarcasm, sentiment in zip(docs_sarc, docs_sentiment):
        sarc_strength = sarc_weight(sarcasm.cats["sarcasm"])
        text_object_list.append(sarc_sent_adjuster(sentiment.cats, sarc_strength))
    return text_object_list

sarc_sent_tweet = ["gee I sure am glad I invested in bitcoin", "elon musk will take us to the moon", "going to invest about 30 dollars into bitcoin today", "bitcoin was a mistake", "I am so happy right now", "Wish I didn't invest as much as I did", "Yeah I don't think bitcoin is doing so well right now", "Bitcoin is doing great now", "I hate everything about cryptocurrency", "I hate lasagna"]
sentiment_sarcasm_weight(sarc_sent_tweet)

[{'sentiment_positive': 0.4833342134952545,
  'sentiment_negative': 0.5166658163070679},
 {'sentiment_positive': 0.9079459428787231,
  'sentiment_negative': 0.09205407202243805},
 {'sentiment_positive': 0.515830934047699,
  'sentiment_negative': 0.484169065952301},
 {'sentiment_positive': 0.15607410967350005,
  'sentiment_negative': 0.8439259052276611},
 {'sentiment_positive': 0.7025826096534729,
  'sentiment_negative': 0.2974173605442047},
 {'sentiment_positive': 0.6062438488006592,
  'sentiment_negative': 0.39375612139701843},
 {'sentiment_positive': 0.4251758813858032,
  'sentiment_negative': 0.574824059009552},
 {'sentiment_positive': 0.9076895833015441,
  'sentiment_negative': 0.09231038689613343},
 {'sentiment_positive': 0.1056583970785141,
  'sentiment_negative': 0.8943416476249695},
 {'sentiment_positive': 0.00011662672477541491,
  'sentiment_negative': 0.9998834133148193}]
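The outputs above can be traced by hand: "elon musk will take us to the moon" has a sarcasm score of 0.625 (weight 0.05) and a positive score of 0.958, so it drops to 0.908, while "gee I sure am glad I invested in bitcoin" (sarcasm 0.814, weight 0.15) is unchanged because its positive score of 0.483 lies between 0.4 and 0.6. The sketch below expresses the same banding as a lookup table with `bisect`, treating each band's upper edge as inclusive; the constant and function names are hypothetical. It omits the `sentiment_negative > 0.25` check, which always holds for mutually exclusive scores that sum to 1 whenever `sentiment_positive < 0.4`.

```python
from bisect import bisect_left

# Band edges and weights: scores in (0.5, 0.65] -> 0.05, (0.65, 0.8] -> 0.10,
# above 0.8 -> 0.15, and 0.5 or below -> 0.
SARCASM_EDGES = [0.5, 0.65, 0.8]
SARCASM_WEIGHTS = [0.0, 0.05, 0.10, 0.15]


def sarcasm_weight(score):
    """bisect_left counts the edges strictly below `score`, which indexes the weight."""
    return SARCASM_WEIGHTS[bisect_left(SARCASM_EDGES, score)]


def adjust_sentiment(cats, weight, high=0.6, low=0.4):
    """Shift a confident sentiment prediction toward the opposite class by `weight`."""
    pos, neg = cats["sentiment_positive"], cats["sentiment_negative"]
    if pos > high:
        pos, neg = pos - weight, neg + weight
    elif pos < low:
        pos, neg = pos + weight, neg - weight
    return {"sentiment_positive": pos, "sentiment_negative": neg}


elon = {"sentiment_positive": 0.9579459428787231, "sentiment_negative": 0.04205407202243805}
print(adjust_sentiment(elon, sarcasm_weight(0.6246351003646851)))
```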

In [219]:
bitcoin_performance_df = pd.read_csv("BTC-Daily.csv")

In [220]:
bitcoin_performance_df.head()

Unnamed: 0,unix,date,symbol,open,high,low,close,Volume BTC,Volume USD
0,1646092800,2022-03-01 00:00:00,BTC/USD,43221.71,43626.49,43185.48,43185.48,49.006289,2116360.0
1,1646006400,2022-02-28 00:00:00,BTC/USD,37717.1,44256.08,37468.99,43178.98,3160.61807,136472300.0
2,1645920000,2022-02-27 00:00:00,BTC/USD,39146.66,39886.92,37015.74,37712.68,1701.817043,64180080.0
3,1645833600,2022-02-26 00:00:00,BTC/USD,39242.64,40330.99,38600.0,39146.66,912.724087,35730100.0
4,1645747200,2022-02-25 00:00:00,BTC/USD,38360.93,39727.97,38027.61,39231.64,2202.851827,86421490.0


In [221]:
def overall_performance(open_val, closed_val):
    """Label a day by comparing its closing price with its opening price."""
    if closed_val > open_val:
        growth = "positive"
    elif closed_val < open_val:
        growth = "negative"
    else:
        growth = "neutral"
    return growth

In [222]:
bitcoin_performance_df['growth'] = np.where((bitcoin_performance_df["open"] > bitcoin_performance_df["close"]), "negative", "positive")

In [223]:
bitcoin_performance_df.head()

Unnamed: 0,unix,date,symbol,open,high,low,close,Volume BTC,Volume USD,growth
0,1646092800,2022-03-01 00:00:00,BTC/USD,43221.71,43626.49,43185.48,43185.48,49.006289,2116360.0,negative
1,1646006400,2022-02-28 00:00:00,BTC/USD,37717.1,44256.08,37468.99,43178.98,3160.61807,136472300.0,positive
2,1645920000,2022-02-27 00:00:00,BTC/USD,39146.66,39886.92,37015.74,37712.68,1701.817043,64180080.0,negative
3,1645833600,2022-02-26 00:00:00,BTC/USD,39242.64,40330.99,38600.0,39146.66,912.724087,35730100.0,negative
4,1645747200,2022-02-25 00:00:00,BTC/USD,38360.93,39727.97,38027.61,39231.64,2202.851827,86421490.0,positive
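The `np.where` call labels any day whose close equals its open as `positive`, whereas `overall_performance` would call it neutral. A per-row sketch that keeps the three-way rule and adds an optional `tolerance` (a hypothetical parameter, in USD) so that nearly flat days are not forced into a direction:

```python
def label_growth(open_price, close_price, tolerance=0.0):
    """Three-way daily label; a move of at most `tolerance` counts as neutral."""
    change = close_price - open_price
    if change > tolerance:
        return "positive"
    if change < -tolerance:
        return "negative"
    return "neutral"


rows = [(43221.71, 43185.48), (37717.1, 43178.98), (38360.93, 39231.64)]
print([label_growth(o, c) for o, c in rows])  # ['negative', 'positive', 'positive']
```

In pandas, the same rule can be vectorized with `np.select` over the `close - open` column.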


In [224]:
bitcoin_general_tweets_1 = pd.read_csv("file_01.csv")                            

In [225]:
bitcoin_general_tweets_1.head()

Unnamed: 0,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
0,PI makes #Crypto mining EASY and FREE!,,,,,,,,,,,,
1,"WATCH the Vi… https://t.co/fkbs1chQwy""","['CryptoCurrency', 'Crypto']",Twitter Web App,False,,,,,,,,,
2,Mikcoin,,"Technical Analyst | Trader\n\nNo certainty, on...",2020-11-26 23:45:46,168.0,42.0,270.0,False,2021-02-14 04:47:24,#BTC #Bitcoin #Ethereum #ETH #Crypto #cryptotr...,"['BTC', 'Bitcoin', 'Ethereum', 'ETH', 'Crypto'...",Twitter Web App,False
3,fedayy,,убиениц,2020-11-04 16:16:03,1.0,9.0,5.0,False,2021-02-14 04:47:22,🤝 Follow me on @betfury_io. Let's hunt for Bit...,,Twitter Web App,False
4,Unocoin,India,Unocoin is India's first & safest #bitcoin pla...,2011-09-24 18:32:27,33492.0,554.0,4901.0,True,2021-02-14 04:47:19,"Wishing you all the lovely couples and ,Why no...",,Buffer,False


In [226]:
bitcoin_general_tweets_1['date'] = pd.to_datetime(bitcoin_general_tweets_1['date']).dt.normalize()

In [227]:
bitcoin_general_tweets_1.head()

Unnamed: 0,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
0,PI makes #Crypto mining EASY and FREE!,,,,,,,,NaT,,,,
1,"WATCH the Vi… https://t.co/fkbs1chQwy""","['CryptoCurrency', 'Crypto']",Twitter Web App,False,,,,,NaT,,,,
2,Mikcoin,,"Technical Analyst | Trader\n\nNo certainty, on...",2020-11-26 23:45:46,168.0,42.0,270.0,False,2021-02-14,#BTC #Bitcoin #Ethereum #ETH #Crypto #cryptotr...,"['BTC', 'Bitcoin', 'Ethereum', 'ETH', 'Crypto'...",Twitter Web App,False
3,fedayy,,убиениц,2020-11-04 16:16:03,1.0,9.0,5.0,False,2021-02-14,🤝 Follow me on @betfury_io. Let's hunt for Bit...,,Twitter Web App,False
4,Unocoin,India,Unocoin is India's first & safest #bitcoin pla...,2011-09-24 18:32:27,33492.0,554.0,4901.0,True,2021-02-14,"Wishing you all the lovely couples and ,Why no...",,Buffer,False


In [228]:
bitcoin_performance_df['date'] = pd.to_datetime(bitcoin_performance_df['date']).dt.normalize()

In [229]:
bitcoin_performance_df.head()

Unnamed: 0,unix,date,symbol,open,high,low,close,Volume BTC,Volume USD,growth
0,1646092800,2022-03-01,BTC/USD,43221.71,43626.49,43185.48,43185.48,49.006289,2116360.0,negative
1,1646006400,2022-02-28,BTC/USD,37717.1,44256.08,37468.99,43178.98,3160.61807,136472300.0,positive
2,1645920000,2022-02-27,BTC/USD,39146.66,39886.92,37015.74,37712.68,1701.817043,64180080.0,negative
3,1645833600,2022-02-26,BTC/USD,39242.64,40330.99,38600.0,39146.66,912.724087,35730100.0,negative
4,1645747200,2022-02-25,BTC/USD,38360.93,39727.97,38027.61,39231.64,2202.851827,86421490.0,positive
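Normalizing both `date` columns to midnight gives tweets and daily candles a shared calendar-day key. This only aligns correctly if both sources use the same time zone: the `unix` value 1646092800 in BTC-Daily corresponds to 2022-03-01 00:00 UTC, and the tweet timestamps are assumed here to be UTC as well. A stdlib sketch of both conversions (helper names are hypothetical):

```python
from datetime import datetime, timezone


def day_key(timestamp):
    """Calendar day of a 'YYYY-MM-DD HH:MM:SS' string (the part dt.normalize() preserves)."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date()


def candle_day(unix_seconds):
    """Calendar day, in UTC, of a BTC-Daily 'unix' value."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).date()


print(day_key("2021-02-14 04:47:24"))  # 2021-02-14
print(candle_day(1646092800))          # 2022-03-01
```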


In [230]:
bitcoin_general_tweets_1 = bitcoin_general_tweets_1.sort_values(by='date',ascending=False)

In [231]:
bitcoin_general_tweets_1.head()

Unnamed: 0,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
20329,ElastOS Insights,web 3.0,A genuine believer in ELASTOS the team and the...,2017-08-23 16:55:06,1240.0,1033.0,32028.0,False,2021-04-12,@nikichain @ElastosInfo #ELA \nResearch this o...,"['ELA', 'btc']",Twitter for Android,False
20081,U.N.C.L.E B.O.B,"City of London, London",WEB: https://t.co/5yGnGf0iEr YOUTUBE: https://...,2013-03-22 10:15:02,21106.0,1949.0,6640.0,False,2021-04-12,💥 The #Flippening of 2021 will be $BNB overtak...,['Flippening'],Twitter for iPhone,False
20170,EmilyNews,,Fair and Fast updates about HYIPs from https:/...,2017-01-16 10:58:53,436.0,1.0,6.0,False,2021-04-12,TODAY PAYING HYIPS - 12/04/2021 ON EN MONITOR!...,"['EmilyNews', 'invest', 'HYIPs', 'bitcoin', 'c...",IFTTT,False
20169,Expeditors,South Africa,We Expedite Fast..\n\n#PoliceClearanceCertific...,2020-03-03 07:37:02,1002.0,1449.0,7448.0,False,2021-04-12,Earn Bitcoin browsing with CryptoTab.\n\n*Earn...,,"Cheap Bots, Done Quick!",False
20168,Crypto Kiwi,Milky Way,"Crypto, DeFi, FinTech & Blockchain Enthusiast....",2020-11-10 12:07:47,81.0,532.0,1669.0,False,2021-04-12,"$BTC bears tryna fork around with you, don´t b...",,Twitter Web App,False


In [232]:
bitcoin_filtered = bitcoin_general_tweets_1.groupby(['date']).apply(lambda x: x.nlargest(20,['user_followers'])).reset_index(drop=True)

In [233]:
bitcoin_filtered.head()

Unnamed: 0,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
0,CNBC-TV18,,Follow business news with India's Leading Busi...,2012-07-10 08:32:06,873938.0,207.0,1108.0,True,2021-02-13,“A currency is never supposed to be more volat...,['Bitcoin'],SocialPilot.co,False
1,Bitcoin News,World Wide,Official Twitter account for https://t.co/Y3Sm...,2015-07-09 08:58:22,732944.0,935.0,10116.0,True,2021-02-13,"""US cardholders of the Bitpay Prepaid Masterca...",,Zapier.com,False
2,Bitcoin News,World Wide,Official Twitter account for https://t.co/Y3Sm...,2015-07-09 08:58:22,732940.0,935.0,10116.0,True,2021-02-13,“As mayor of NYC ... I could invest in making ...,"['BTC', 'cryptocurrencies']",Zapier.com,False
3,Cointelegraph,"New York, New York","The leader in Bitcoin, Ethereum & blockchain n...",2013-11-21 12:57:08,637292.0,564.0,4923.0,False,2021-02-13,#Bitcoin price turned bearish after 2017 launc...,"['Bitcoin', 'CME', 'BTC', 'Ether']",dlvr.it,False
4,Cointelegraph,"New York, New York","The leader in Bitcoin, Ethereum & blockchain n...",2013-11-21 12:57:08,637289.0,564.0,4923.0,False,2021-02-13,The amount of #Bitcoin held by companies surpa...,"['Bitcoin', 'Tesla']",dlvr.it,False
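Keeping the 20 tweets from the most-followed accounts each day weights the daily sentiment toward high-reach accounts, and the same account can fill several slots (Bitcoin News and Cointelegraph each appear twice in the first five rows above). A plain-Python sketch of the per-day top-N selection, with a hypothetical `one_per_user` option that keeps only each account's highest-follower tweet; it assumes each tweet is a dict with `date`, `user_name` and `user_followers` keys.

```python
from collections import defaultdict


def top_n_per_day(tweets, n=20, one_per_user=False):
    """Group tweets by date and keep the n with the most followers on each day."""
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["date"]].append(tweet)

    result = {}
    for day in sorted(by_day):
        # sorted() is stable, so ties keep their original order (like nlargest's keep="first").
        rows = sorted(by_day[day], key=lambda r: r["user_followers"], reverse=True)
        if one_per_user:
            seen, unique = set(), []
            for row in rows:
                if row["user_name"] not in seen:
                    seen.add(row["user_name"])
                    unique.append(row)
            rows = unique
        result[day] = rows[:n]
    return result
```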


In [234]:
date_array = bitcoin_filtered["date"].unique()

In [235]:
date_df = bitcoin_filtered[(bitcoin_filtered['date'] == "2021-02-13")]

In [236]:
from statistics import mean
def create_average_sentiment(sent_object_array):
    sentiment_positive_list = [li['sentiment_positive'] for li in sent_object_array]
    sentiment_negative_list = [li['sentiment_negative'] for li in sent_object_array]
    pos_mean = mean(sentiment_positive_list)
    neg_mean = mean(sentiment_negative_list)
    return {"average_sentiment_positive": pos_mean, "average_sentiment_negative": neg_mean}

In [237]:
date_text_dict = {}
date_array = bitcoin_filtered["date"].unique()
for date in date_array:
    text_list = []
    date_df = bitcoin_filtered[(bitcoin_filtered['date'] == date)]
    for index, row in date_df.iterrows():
        text_list.append(row["text"])
    sentiment_list = sentiment_sarcasm_weight(text_list)
    sentiment_average = create_average_sentiment(sentiment_list)
    date_text_dict[date] = sentiment_average

In [238]:
print(date_text_dict)

{numpy.datetime64('2021-02-13T00:00:00.000000000'): {'average_sentiment_positive': 0.6900296741724015, 'average_sentiment_negative': 0.3099703282781411}, numpy.datetime64('2021-02-14T00:00:00.000000000'): {'average_sentiment_positive': 0.6375640403912984, 'average_sentiment_negative': 0.36243595905136317}, numpy.datetime64('2021-02-18T00:00:00.000000000'): {'average_sentiment_positive': 0.8128245558589697, 'average_sentiment_negative': 0.1871754493273329}, numpy.datetime64('2021-02-19T00:00:00.000000000'): {'average_sentiment_positive': 0.7429431289434433, 'average_sentiment_negative': 0.2570568687049672}, numpy.datetime64('2021-02-22T00:00:00.000000000'): {'average_sentiment_positive': 0.6030407606996596, 'average_sentiment_negative': 0.3969592321757227}, numpy.datetime64('2021-02-28T00:00:00.000000000'): {'average_sentiment_positive': 0.7276850666478276, 'average_sentiment_negative': 0.27231492795050144}, numpy.datetime64('2021-03-11T00:00:00.000000000'): {'average_sentiment_positive

In [239]:
bitcoin_performance_df["growth"].loc[bitcoin_performance_df['date'] == "2021-04-10"]

325    positive
Name: growth, dtype: object

In [240]:
from IPython.display import display
from ipywidgets import Dropdown

def dropdown_eventhandler(change):
    bitcoin_performance = bitcoin_performance_df["growth"].loc[bitcoin_performance_df['date'] == change.new].iloc[0]
    print(f"Bitcoin Performance on {change.new} was {bitcoin_performance}")
    print(f"The general sentiment distribution for tweets posted on this date was {date_text_dict[change.new]}")

option_list = date_array
dropdown = Dropdown(description="Date:", options=option_list)
dropdown.observe(dropdown_eventhandler, names='value')
display(dropdown)


Dropdown(description='Date:', options=(numpy.datetime64('2021-02-13T00:00:00.000000000'), numpy.datetime64('20…
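The dropdown compares sentiment and price direction one date at a time. To summarize all dates with a single number, the sketch below (hypothetical `direction_agreement`) counts how often the daily average sentiment points the same way as the price move. Two caveats: every daily average printed above exceeds 0.5, so a 0.5 threshold would likely predict "positive" on most days, and the agreement rate should be compared against the share of positive days; and both mappings must use the same key type (for example `pd.Timestamp`), since `date_text_dict` is keyed by `numpy.datetime64`.

```python
def direction_agreement(daily_sentiment, daily_growth, threshold=0.5):
    """Share of dates where the sentiment direction matches the price direction.

    daily_sentiment: {date: {"average_sentiment_positive": float, ...}} (like date_text_dict)
    daily_growth:    {date: "positive" | "negative" | "neutral"}
    Dates missing from daily_growth, and neutral price days, are skipped.
    Returns (agreement_rate, number_of_dates_compared).
    """
    matches = total = 0
    for day, sentiment in daily_sentiment.items():
        growth = daily_growth.get(day)
        if growth not in ("positive", "negative"):
            continue
        positive = sentiment["average_sentiment_positive"] > threshold
        predicted = "positive" if positive else "negative"
        matches += predicted == growth
        total += 1
    return (matches / total if total else float("nan")), total
```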

# **Model Evaluation**

In [274]:
print("Sarcasm Model:")
with open('training/sarcasm_detection/metrics.json', 'r') as f:
    print(f.read())

Sarcasm Model:
{
  "token_acc":1.0,
  "token_p":1.0,
  "token_r":1.0,
  "token_f":1.0,
  "cats_score":0.7055487223,
  "cats_score_desc":"macro F",
  "cats_micro_p":0.7106863042,
  "cats_micro_r":0.7106863042,
  "cats_micro_f":0.7106863042,
  "cats_macro_p":0.7088121331,
  "cats_macro_r":0.7044947799,
  "cats_macro_f":0.7055487223,
  "cats_macro_auc":0.7743222427,
  "cats_f_per_type":{
    "sarcasm":{
      "p":0.6978444958,
      "r":0.6381331175,
      "f":0.6666544097
    },
    "other":{
      "p":0.7197797705,
      "r":0.7708564423,
      "f":0.7444430349
    }
  },
  "cats_auc_per_type":{
    "sarcasm":0.7743222442,
    "other":0.7743222411
  },
  "speed":45369.5310427387
}


In [276]:
print("General Tweet Sentiment Model:")
with open('training/data_tweet_sentiment/metrics.json', 'r') as f:
    print(f.read())

General Tweet Sentiment Model:
{
  "token_acc":1.0,
  "token_p":1.0,
  "token_r":1.0,
  "token_f":1.0,
  "cats_score":0.7762814183,
  "cats_score_desc":"macro F",
  "cats_micro_p":0.827510917,
  "cats_micro_r":0.827510917,
  "cats_micro_f":0.827510917,
  "cats_macro_p":0.7862927092,
  "cats_macro_r":0.7682413337,
  "cats_macro_f":0.7762814183,
  "cats_macro_auc":0.8687412126,
  "cats_f_per_type":{
    "sentiment_positive":{
      "p":0.8679564692,
      "r":0.8992733651,
      "f":0.8833374354
    },
    "sentiment_negative":{
      "p":0.7046289493,
      "r":0.6372093023,
      "f":0.6692254013
    }
  },
  "cats_auc_per_type":{
    "sentiment_positive":0.8687412126,
    "sentiment_negative":0.8687412126
  },
  "speed":70055.4734588945
}


In [278]:
print("Bitcoin Tweet Sentiment Model:")
with open('training/data_sentiment_bitcoin/metrics_sentiment.json', 'r') as f:
    print(f.read())

Bitcoin Tweet Sentiment Model:
{
  "token_acc":1.0,
  "token_p":1.0,
  "token_r":1.0,
  "token_f":1.0,
  "cats_score":0.8861686054,
  "cats_score_desc":"macro F",
  "cats_micro_p":0.9270558749,
  "cats_micro_r":0.9270558749,
  "cats_micro_f":0.9270558749,
  "cats_macro_p":0.9007206997,
  "cats_macro_r":0.873543785,
  "cats_macro_f":0.8861686054,
  "cats_macro_auc":0.9568656397,
  "cats_f_per_type":{
    "sentiment_positive":{
      "p":0.9433287042,
      "r":0.9657152231,
      "f":0.9543907052
    },
    "sentiment_negative":{
      "p":0.8581126952,
      "r":0.781372347,
      "f":0.8179465056
    }
  },
  "cats_auc_per_type":{
    "sentiment_positive":0.9568656482,
    "sentiment_negative":0.9568656313
  },
  "speed":75116.048971079
}
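The `cats_score` reported as "macro F" is the unweighted mean of the per-category F scores, each the harmonic mean of that category's precision and recall. The sketch below recomputes it from the `p` and `r` values printed above (function names are hypothetical). It makes the per-class gap explicit: in every model one class trails the other (sarcasm 0.667 vs other 0.744; general-tweet negative 0.669 vs positive 0.883; Bitcoin negative 0.818 vs positive 0.954). The identical micro P, R and F values are consistent with mutually exclusive categories, where each example contributes exactly one predicted and one gold label.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f(per_type):
    """Unweighted mean of the per-category F scores."""
    scores = [f1(v["p"], v["r"]) for v in per_type.values()]
    return sum(scores) / len(scores)


sarcasm_per_type = {
    "sarcasm": {"p": 0.6978444958, "r": 0.6381331175},
    "other": {"p": 0.7197797705, "r": 0.7708564423},
}
print(round(macro_f(sarcasm_per_type), 6))  # 0.705549
```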


# **Improvements**
# **We can improve on this with a finer-grained hour-by-hour analysis or a more thorough day-by-day analysis.**
# **We can also focus on a more objective evaluation of our model's performance**
# **and on refining the sarcasm modifier.**

# **Credits**
# **This code is based on https://towardsdatascience.com/sarcasm-text-classification-using-spacy-in-python-7cd39074f32e**