# Analysis of Financial Markets based on President Trump's Tweets

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Data Preprocessing

### Importing Data from Kaggle

In [1]:
!pip install -q kaggle

In [2]:
# Upload kaggle.json file containing your API key
from google.colab import files
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"alexbakr","key":"<redacted>"}'}

In [3]:
!mkdir -p ~/.kaggle

In [4]:
!cp kaggle.json ~/.kaggle/

In [6]:
! chmod 600 ~/.kaggle/kaggle.json
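
The shell steps above (create `~/.kaggle`, copy `kaggle.json` into it, restrict it to mode 600) can also be sketched in plain Python. This is only an illustration: the helper name `install_kaggle_credentials` and its `config_dir` parameter are hypothetical and are not part of the Kaggle package.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src, config_dir=None):
    """Copy a kaggle.json file into config_dir and make it owner-only.

    Hypothetical helper: mirrors `mkdir -p`, `cp` and `chmod 600`.
    """
    # Default to ~/.kaggle, the directory used by the shell commands above
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`

    dest = config_dir / "kaggle.json"
    shutil.copyfile(src, dest)                     # like `cp`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)    # like `chmod 600`
    return dest
```

Calling `install_kaggle_credentials("kaggle.json")` after the upload step would have the same effect as the three shell cells, and it does not fail if the directory already exists.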

In [5]:
# Check to see if the API is working correctly 
# ! kaggle datasets list

In [7]:
! kaggle datasets download -d austinreese/trump-tweets

Downloading trump-tweets.zip to /content
 73% 5.00M/6.88M [00:00<00:00, 17.4MB/s]
100% 6.88M/6.88M [00:00<00:00, 22.9MB/s]


### Data Cleaning

In [8]:
import pandas as pd
import numpy as np
from zipfile import ZipFile

In [9]:
# Read the CSV straight out of the downloaded archive
with ZipFile("/content/trump-tweets.zip") as data:
    trump_tweets = pd.read_csv(data.open('trumptweets.csv'))
trump_tweets.head()

Unnamed: 0,id,link,content,date,retweets,favorites,mentions,hashtags,geo
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,2009-05-04 20:54:25,500,868,,,
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,2009-05-05 03:00:10,33,273,,,
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,2009-05-08 15:38:08,12,18,,,
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post: Celebrity Apprentice Finale and...,2009-05-08 22:40:15,11,24,,,
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,"""My persona will never be that of a wallflower...",2009-05-12 16:07:28,1399,1965,,,


In [10]:
trump_tweets.shape

(41122, 9)

The dataframe has 41,122 rows and 9 columns. Three of these columns (mentions, hashtags, and geo) show NaN in the first five rows.
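
Because `head()` only shows five rows, it cannot confirm that a column is empty throughout. A minimal sketch of how the share of missing values per column could be checked is given here. The `sample` frame is made-up toy data standing in for `trump_tweets`, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for trump_tweets (assumed data, for illustration only)
sample = pd.DataFrame({
    "content": ["a", "b", "c", "d"],
    "mentions": [np.nan, np.nan, "@x", np.nan],
    "geo": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values in each column (0.0 = none, 1.0 = all)
missing_share = sample.isna().mean()

# Columns that are missing in every row
fully_empty = missing_share[missing_share == 1.0].index.tolist()
print(missing_share)
print(fully_empty)
```

Running the same two lines on `trump_tweets` would show whether all three columns are entirely NaN or only mostly NaN before they are dropped.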

In [11]:
trump_tweets.describe()

Unnamed: 0,id,retweets,favorites,geo
count,41122.0,41122.0,41122.0,0.0
mean,6.088909e+17,5455.590657,22356.899105,
std,3.027946e+17,10130.076661,41501.859711,
min,1698309000.0,0.0,0.0,
25%,3.549428e+17,25.0,28.0,
50%,5.609149e+17,291.0,247.0,
75%,7.941218e+17,8778.0,32970.75,
max,1.219077e+18,309892.0,857678.0,


In [12]:
#Remove NaN columns
trump_tweets = trump_tweets.drop(labels=['mentions', 'hashtags', 'geo'], axis='columns')
trump_tweets

Unnamed: 0,id,link,content,date,retweets,favorites
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,2009-05-04 20:54:25,500,868
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,2009-05-05 03:00:10,33,273
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,2009-05-08 15:38:08,12,18
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post: Celebrity Apprentice Finale and...,2009-05-08 22:40:15,11,24
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,"""My persona will never be that of a wallflower...",2009-05-12 16:07:28,1399,1965
...,...,...,...,...,...,...
41117,1218962544372670467,https://twitter.com/realDonaldTrump/status/121...,I have never seen the Republican Party as Stro...,2020-01-19 19:24:52,32620,213817
41118,1219004689716412416,https://twitter.com/realDonaldTrump/status/121...,Now Mini Mike Bloomberg is critical of Jack Wi...,2020-01-19 22:12:20,36239,149571
41119,1219053709428248576,https://twitter.com/realDonaldTrump/status/121...,I was thrilled to be back in the Great State o...,2020-01-20 01:27:07,16588,66944
41120,1219066007731310593,https://twitter.com/realDonaldTrump/status/121...,"“In the House, the President got less due proc...",2020-01-20 02:16:00,20599,81921


In [13]:
trump_tweets.dtypes

id            int64
link         object
content      object
date         object
retweets      int64
favorites     int64
dtype: object

In [14]:
# The date column is an object when it should be datetime.
# The format string must match the data, e.g. '2009-05-04 20:54:25'.
trump_tweets['date'] = pd.to_datetime(trump_tweets['date'], format='%Y-%m-%d %H:%M:%S')

In [15]:
# Split the date column into two separate columns
trump_tweets['Time'] = trump_tweets['date'].dt.time
trump_tweets['Date'] = trump_tweets['date'].dt.date
trump_tweets = trump_tweets.drop(labels='date', axis='columns')
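
The parse-then-split step can be tried on a small example first. The sketch below uses two timestamps that appear in the outputs above and assumes every value follows the same `YYYY-MM-DD HH:MM:SS` layout. The `sample` frame is a toy stand-in, not the full dataset.

```python
import pandas as pd

# Two timestamps in the same layout as the 'date' column
sample = pd.DataFrame({"date": ["2009-05-04 20:54:25", "2020-01-20 02:16:00"]})

# Parse with an explicit format that matches the strings
parsed = pd.to_datetime(sample["date"], format="%Y-%m-%d %H:%M:%S")

# The .dt accessor exposes the calendar date and clock time separately
sample["Date"] = parsed.dt.date
sample["Time"] = parsed.dt.time
sample = sample.drop(columns="date")
print(sample)
```

The vectorised `.dt` accessor gives the same result as applying `x.date()` and `x.time()` row by row, without a Python-level loop.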

In [17]:
trump_tweets.head()

Unnamed: 0,id,link,content,retweets,favorites,Time,Date
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,500,868,20:54:25,2009-05-04
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,33,273,03:00:10,2009-05-05
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,12,18,15:38:08,2009-05-08
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post: Celebrity Apprentice Finale and...,11,24,22:40:15,2009-05-08
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,"""My persona will never be that of a wallflower...",1399,1965,16:07:28,2009-05-12


In [18]:
trump_tweets['content']

0        Be sure to tune in and watch Donald Trump on L...
1        Donald Trump will be appearing on The View tom...
2        Donald Trump reads Top Ten Financial Tips on L...
3        New Blog Post: Celebrity Apprentice Finale and...
4        "My persona will never be that of a wallflower...
                               ...                        
41117    I have never seen the Republican Party as Stro...
41118    Now Mini Mike Bloomberg is critical of Jack Wi...
41119    I was thrilled to be back in the Great State o...
41120    “In the House, the President got less due proc...
41121    A great show! Check it out tonight at 9pm. @ F...
Name: content, Length: 41122, dtype: object

In [20]:
# Remove punctuation from content column (regex=True is required in pandas >= 2.0)
trump_tweets['content'] = trump_tweets['content'].str.replace(r'[^\w\s]', '', regex=True)
trump_tweets['content']

0        Be sure to tune in and watch Donald Trump on L...
1        Donald Trump will be appearing on The View tom...
2        Donald Trump reads Top Ten Financial Tips on L...
3        New Blog Post Celebrity Apprentice Finale and ...
4        My persona will never be that of a wallflower ...
                               ...                        
41117    I have never seen the Republican Party as Stro...
41118    Now Mini Mike Bloomberg is critical of Jack Wi...
41119    I was thrilled to be back in the Great State o...
41120    In the House the President got less due proces...
41121    A great show Check it out tonight at 9pm  FoxN...
Name: content, Length: 41122, dtype: object
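
The pattern `[^\w\s]` removes every character that is neither a word character nor whitespace. A small sketch on made-up strings (not taken from the dataset) shows the effect, including two side effects: underscores survive because `\w` matches them, and removing a standalone symbol can leave a double space, as in the last row above. The whitespace-collapsing step is an optional extra and is not part of the original notebook.

```python
import pandas as pd

# Made-up example strings, for illustration only
sample = pd.Series([
    "Hello, world! #test @user",
    "Tonight at 9pm. @ SomeChannel",
    "snake_case stays!",
])

# Drop everything that is not a word character or whitespace
cleaned = sample.str.replace(r"[^\w\s]", "", regex=True)

# Optional: collapse runs of whitespace left behind by removed symbols
collapsed = cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

print(cleaned.tolist())
print(collapsed.tolist())
```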

In [21]:
trump_tweets.head()

Unnamed: 0,id,link,content,retweets,favorites,Time,Date
0,1698308935,https://twitter.com/realDonaldTrump/status/169...,Be sure to tune in and watch Donald Trump on L...,500,868,20:54:25,2009-05-04
1,1701461182,https://twitter.com/realDonaldTrump/status/170...,Donald Trump will be appearing on The View tom...,33,273,03:00:10,2009-05-05
2,1737479987,https://twitter.com/realDonaldTrump/status/173...,Donald Trump reads Top Ten Financial Tips on L...,12,18,15:38:08,2009-05-08
3,1741160716,https://twitter.com/realDonaldTrump/status/174...,New Blog Post Celebrity Apprentice Finale and ...,11,24,22:40:15,2009-05-08
4,1773561338,https://twitter.com/realDonaldTrump/status/177...,My persona will never be that of a wallflower ...,1399,1965,16:07:28,2009-05-12


## Exploratory Analysis

Now that the data is cleaned up, let's perform some exploratory analysis.

In [26]:
trump_tweets['Date'].isnull().sum()

0