# Plotting the price of assets against Elon Musk and Tesla Twitter activity 

## Why?

Over the last few months my news feed has been a constant wash of articles about Blockchain, stock markets (GME) and Elon Musk. The first and the last of these topics have been newsworthy regularly over the last few years, meanwhile the GME shortsqueeze appeared overnight.

There's no doubt that these topics are interconnected. Homebound US citizens with stimulus checks and too much time seem to be the main participants in these three hot topics.

Whilst this has been going on I have found myself increasingly interested in data science, more specifically data visualisation, I was introduced to the field by a colleague and sites like ChartR (https://www.chartr.co/) have gone a long way in furthering that interest. Stories told through a simple vis are compelling. A picture (or chart) speak a thousand words.

To attempt to get into the field I have been taking an extended course on data science provided by IBM. I found this course good, but lacking applied projects..so here goes an attempt at applying some of what I have known.


## Why this post?

This isn't really written to garner an audience - more for me to catalogue my progress over what I hope will be a long while.

## The project

The plan was to look at the Bitcoin, Dogecoin and GME prices along side Twitter activity to see if I can see any trends. Whilst creating this notebook however I thought it could be much more useful if you could choose your asset and the individual and look at the two together. Therefore the intention is to produce graphs of the asset prices with some form of overlay related to the tweets, with user specified inputs for:

- the asset ticker
- twitter handle
- number of tweets
- date range for asset prices.

Again whilst creating this I thought about the sentiment of the tweet, i.e. was it positive or negative? Once I knew this I could somehow represent that on the graph. My original plan had been to place the actual tweet text on the graph but this became very messy very quickly. Scroll right to the end for the graphs and what I ended up with..

It should be noted this is not a rigerous quantitative approach - more a high level 'lets see what I can achieve in a few days and check what I've learned' approach. If you're after the former then have a look at this: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3778844

In [59]:
#To start import pandas to enable easier data manipulation, Plotly for eventual visualisation, yfinance to obtain our data

import pandas as pd

import plotly.express as px

import yfinance as yf

import datetime as dt

! pip install textblob

from textblob import TextBlob





In [60]:
# Download the required data from yahoo finance using the appropriate tickers 
# Using start Dec 2020 to current day
# Take daily prices on the assumption tweets are around that frequent



#Start by creating variables for the start date (user input) and the end date being today's date - to do that the datetime package is useful

today_date = dt.date.today()

today_date = today_date.strftime("%Y-%m-%d")

start_date = "2020-11-01"


#BTC
BTC_Data = yf.download(tickers='BTC-USD', start = start_date, end = today_date, interval = '1d')

#It's worth noting not reseting the index caused a major headache as the 'date' column header was misaligned without it.
BTC_Data.reset_index(inplace=True)


BTC_Data_df = pd.DataFrame(BTC_Data)

BTC_Data_df['Date'] = pd.to_datetime(BTC_Data_df['Date']).dt.date


BTC_Data_df.tail(5)


[*********************100%***********************]  1 of 1 completed


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
139,2021-03-20,58332.261719,60031.285156,58213.296875,58313.644531,58313.644531,50361731222
140,2021-03-21,58309.914062,58767.898438,56005.617188,57523.421875,57523.421875,51943414539
141,2021-03-22,57517.890625,58471.480469,54288.15625,54529.144531,54529.144531,56521454974
142,2021-03-23,54511.660156,55985.441406,53470.695312,54738.945312,54738.945312,56435023914
143,2021-03-24,54663.683594,57255.886719,52598.59375,52994.324219,52994.324219,62918037504


### Quick note on obtaining tweets

How would we obtain the tweet information? My initial thought process was:

- Scrape the tweets using the Twitter API 
- Create a table of the tweets with datestamp and text
- Filter by keyword e.g. BTC, Bitcoin etc.
- Plot the tweets using date stamp over the price graph 

this would likely involve appending prices to the table so that we can plot an effective 'x and y' coordinate of date and price and then use the text as a data label - that's the theory at least...

In scraping the tweets I came across Tweepy, which is a convinient pre-made package that enables you to scrape tweets without too much prior knowledge. I ended up using it and an article from Martin Beck: https://towardsdatascience.com/how-to-scrape-tweets-from-twitter-59287e20f0f1 to achieve what I wanted. I won't claim I spent too much time examining what was going on in the next cell but I cared more that it worked, and really that's the beauty of the ds community - open knowledge sharing. It's what makes it approachable to a complete novice like myself. The cell below is far better explained by the article above so I won't go into much detail.



In [61]:
# Import what is required
import tweepy
import time

# Credentials to enable access to the Twitter API

consumer_key = "qSlXdQBHiP3bd45cCOs9Sq0FO"
consumer_secret = "I3W8CBFDkhgwCFp6VExQj3FhTHEkcuMXMAlgrj88MS7WaEQxP3"
access_token = "604056369-pGIpZHnepCzzs4yXdF9clWV10UxLy1mX7kEhW4ir"
access_token_secret = "0pHtZQgE2QHc4tP9JyU5VCKuTjjHdh38Qxzpz8MUlWZex"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)

#Here the code defines a function that we can call later on to get the data we need

tweets = []

def username_tweets_to_csv(username,count):
    try:      
        # Creation of query method using parameters
        tweets = tweepy.Cursor(api.user_timeline,id=username).items(count)

        # Pulling information from tweets iterable object
        tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets]

        # Creation of dataframe from tweets list
        # Add or remove columns as you remove tweet information
        tweets_df = pd.DataFrame(tweets_list,columns=['Datetime', 'Tweet Id', 'Text'])

        # Converting dataframe to CSV 
        tweets_df.to_csv('{}-tweets.csv'.format(username), sep=',', index = False)

    except BaseException as e:
          print('failed on_status,',str(e))
          time.sleep(3)
        
# Input username to scrape tweets and name csv file
# Max recent tweets pulls x amount of most recent tweets from that user
username = 'Jack'
count = 250

# Calling function to turn username's past X amount of tweets into a CSV file
username_tweets_to_csv(username, count)

In [62]:
#import the created CSV files and create a pandas data frame

csv_path = '/Users/jamesdawson/Desktop/Data analysis /Python/Jack-tweets.csv'

df = pd.read_csv(csv_path)

df.head(5)

Unnamed: 0,Datetime,Tweet Id,Text
0,2021-03-22 18:53:23,1374071729467707394,"Sent to @GiveDirectly Africa fund 🌍Thank you, ..."
1,2021-03-21 03:57:26,1373483866095243267,RT @moneyball: The original goal of 3-4 engine...
2,2021-03-20 23:03:44,1373409955823415300,RT @sqcrypto: We turned 2 today. Jack’s our dad.
3,2021-03-20 14:07:46,1373275072392278018,50/50
4,2021-03-18 21:04:22,1372655138981224450,@EnesKanter @Twitter 🙏🏼


In [63]:
#Filter out anything that we don't feel is relevant from the tweets
#We need to create three data frames here so we can plot each respective one against a different plot of price

df['Datetime'] = pd.to_datetime(df['Datetime']).dt.date


In [64]:
df['sentiment'] = df['Text'].apply(lambda tweet: TextBlob(tweet).sentiment)


df

Unnamed: 0,Datetime,Tweet Id,Text,sentiment
0,2021-03-22,1374071729467707394,"Sent to @GiveDirectly Africa fund 🌍Thank you, ...","(0.0, 0.0)"
1,2021-03-21,1373483866095243267,RT @moneyball: The original goal of 3-4 engine...,"(0.375, 0.75)"
2,2021-03-20,1373409955823415300,RT @sqcrypto: We turned 2 today. Jack’s our dad.,"(0.0, 0.0)"
3,2021-03-20,1373275072392278018,50/50,"(0.0, 0.0)"
4,2021-03-18,1372655138981224450,@EnesKanter @Twitter 🙏🏼,"(0.0, 0.0)"
...,...,...,...,...
245,2020-11-17,1328721286474788865,Thank you members of the Judiciary Committee f...,"(0.0, 0.0)"
246,2020-11-16,1328426021041754112,"Welcome, Mudge! https://t.co/hl9HiRjGtg","(1.0, 0.9)"
247,2020-11-15,1327770733401894913,No. https://t.co/X6EWJ73sRx,"(0.0, 0.0)"
248,2020-11-14,1327723833080438784,RT @jerrybrito: 1/ Patent trolls are a massive...,"(-0.06666666666666667, 0.6333333333333333)"


In [65]:
#append the price to the line based on datetime - How easy this would be if I was using index match in excel..


df['BTC_price'] = df.Datetime.map(BTC_Data_df.set_index('Date')['Open'].to_dict())

df

Unnamed: 0,Datetime,Tweet Id,Text,sentiment,BTC_price
0,2021-03-22,1374071729467707394,"Sent to @GiveDirectly Africa fund 🌍Thank you, ...","(0.0, 0.0)",57517.890625
1,2021-03-21,1373483866095243267,RT @moneyball: The original goal of 3-4 engine...,"(0.375, 0.75)",58309.914062
2,2021-03-20,1373409955823415300,RT @sqcrypto: We turned 2 today. Jack’s our dad.,"(0.0, 0.0)",58332.261719
3,2021-03-20,1373275072392278018,50/50,"(0.0, 0.0)",58332.261719
4,2021-03-18,1372655138981224450,@EnesKanter @Twitter 🙏🏼,"(0.0, 0.0)",58893.078125
...,...,...,...,...,...
245,2020-11-17,1328721286474788865,Thank you members of the Judiciary Committee f...,"(0.0, 0.0)",16685.691406
246,2020-11-16,1328426021041754112,"Welcome, Mudge! https://t.co/hl9HiRjGtg","(1.0, 0.9)",15955.577148
247,2020-11-15,1327770733401894913,No. https://t.co/X6EWJ73sRx,"(0.0, 0.0)",16068.139648
248,2020-11-14,1327723833080438784,RT @jerrybrito: 1/ Patent trolls are a massive...,"(-0.06666666666666667, 0.6333333333333333)",16317.808594


In [66]:
sentiment_series = df['sentiment'].tolist()

df1 = pd.DataFrame(sentiment_series, columns=['polarity','subjectivity'], index=df.index)

result = pd.concat([df, df1], axis=1)

decimals = 2    
result['polarity'] = result['polarity'].apply(lambda x: round(x, decimals))

result


Unnamed: 0,Datetime,Tweet Id,Text,sentiment,BTC_price,polarity,subjectivity
0,2021-03-22,1374071729467707394,"Sent to @GiveDirectly Africa fund 🌍Thank you, ...","(0.0, 0.0)",57517.890625,0.00,0.000000
1,2021-03-21,1373483866095243267,RT @moneyball: The original goal of 3-4 engine...,"(0.375, 0.75)",58309.914062,0.38,0.750000
2,2021-03-20,1373409955823415300,RT @sqcrypto: We turned 2 today. Jack’s our dad.,"(0.0, 0.0)",58332.261719,0.00,0.000000
3,2021-03-20,1373275072392278018,50/50,"(0.0, 0.0)",58332.261719,0.00,0.000000
4,2021-03-18,1372655138981224450,@EnesKanter @Twitter 🙏🏼,"(0.0, 0.0)",58893.078125,0.00,0.000000
...,...,...,...,...,...,...,...
245,2020-11-17,1328721286474788865,Thank you members of the Judiciary Committee f...,"(0.0, 0.0)",16685.691406,0.00,0.000000
246,2020-11-16,1328426021041754112,"Welcome, Mudge! https://t.co/hl9HiRjGtg","(1.0, 0.9)",15955.577148,1.00,0.900000
247,2020-11-15,1327770733401894913,No. https://t.co/X6EWJ73sRx,"(0.0, 0.0)",16068.139648,0.00,0.000000
248,2020-11-14,1327723833080438784,RT @jerrybrito: 1/ Patent trolls are a massive...,"(-0.06666666666666667, 0.6333333333333333)",16317.808594,-0.07,0.633333


In [67]:
df_BTC = result[result['Text'].str.contains("bitcoin | crypto")]

df_BTC


Unnamed: 0,Datetime,Tweet Id,Text,sentiment,BTC_price,polarity,subjectivity
9,2021-03-18,1372590033803284480,beautiful work #bitcoin https://t.co/BErHZ2wUts,"(0.85, 1.0)",58893.078125,0.85,1.0
11,2021-03-17,1372226044946755586,#bitcoin first https://t.co/vd5fBRSkkn,"(0.25, 0.3333333333333333)",56825.828125,0.25,0.333333
18,2021-03-15,1371549932000313351,"RT @signalapp: As a nonprofit organization, we...","(0.0, 0.0)",59267.429688,0.0,0.0
25,2021-03-12,1370452409957384198,RT @hello_bitcoin: Many critics think bitcoin ...,"(0.1875, 0.4)",57821.21875,0.19,0.4
58,2021-02-20,1362946674553856004,RT @jerrybrito: It’s an honor to help in this ...,"(0.4, 1.0)",55887.335938,0.4,1.0
65,2021-02-10,1359546136785129473,RT @moneyball: Great overview of what the bitc...,"(1.0, 0.75)",46469.761719,1.0,0.75
73,2021-02-05,1357588751757627393,Running #bitcoin https://t.co/W51ga3yrKb,"(0.0, 0.0)",36931.546875,0.0,0.0
87,2021-01-27,1354491856650817537,RT @FrancisSuarez: The City of Miami believes ...,"(0.0, 0.0)",32564.029297,0.0,0.0
133,2021-01-04,1346188219231047686,"RT @Square: 2 weeks ago, FinCEN (a bureau of t...","(-0.4, 0.9)",32810.949219,-0.4,0.9
134,2021-01-04,1346188092210724864,Our comments on FinCen’s rule proposal on #bit...,"(0.0, 0.0)",32810.949219,0.0,0.0


### Comment on data collection

Though it would be nice to press run and we get everything we need pulled down without too much manual entry, at the end of the day you need consider whether it's worth it to automate some things. There's certain bits of news that I have in the back of my mind that aren't tweets but I do feel would compliment the analysis - to do this you can create manual annotations and place them onto the graph by eye.

In [68]:

fig_BTC = px.line(BTC_Data_df, x="Date", y="Open")

fig2_BTC = px.scatter(df_BTC, x="Datetime", y="BTC_price", text = "polarity", hover_data=['Text'])

fig_BTC.add_trace(fig2_BTC.data[0])

#Custome annotation code if required 
fig_BTC.add_annotation(x= "2021-01-29 00:00:00.00000", y=34111,
            text="Elon Musk changes his bio to #bitcoin",
            showarrow=True,
            arrowhead=1)


fig_BTC.show()






