# Bitcoin Price Change Prediction Utilizing Twitter Sentiment Data and Volume

Notebook Navigation:
1. Bitcoin Price
2. Filter English Tweets without Duplicates
3. Tweets Pre-processing
4. Creating Vader and Textblob Features
5. Creating Twitter Volume Feature
6. Final Twitter data
7. Combine Twitter and Crypto Price Data
8. Creating Lag Feature
9. Modeling
    - Linear Regression
    - XGBoost
    - Random Forest

### Background
Cryptocurrency is one of the most volatile instruments, and as with any stock, future prices are very hard to predict, especially given the lack of a regulatory system comparable to that of the stock market. There are a few ways to approach crypto price prediction: technical analysis, fundamental analysis and sentiment analysis.

The last of these remains abstract because it deals with traders' emotions, which can trigger panic selling or buying sprees based on expectations and perceptions. Twitter has been a platform for this discussion, and previous studies suggest that the social media activity of prominent individuals can indeed sway public opinion. Tweets from such individuals have shifted the price several times, for example when Elon Musk tweeted in May 2021 that Tesla would stop accepting Bitcoin payments, which caused the price to plummet by roughly 10%. Therefore, **we would like to provide a tool that helps retail investors shorten the time spent on sentiment analysis**: rather than keeping up with tweets for hours a day, they can spend that time building an edge on the technical or fundamental side.

### Problem Statement
Create a model that helps retail investors **make better-informed decisions** based on Twitter sentiment and volume, along with basic technical analysis.


In [2]:
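# importing libraries
import datetime as dt
from datetime import datetime  # used later for Unix timestamp conversion

import pandas as pd
import pandas_datareader.data as pdr
import yfinance as yfin

# route pandas_datareader's Yahoo Finance calls through yfinance
# (pdr_override() may be unavailable in newer yfinance releases)
yfin.pdr_override()
crypto = 'BTC'
against_currency = 'USD'

start = dt.datetime(2016, 1, 1)
end = dt.datetime.now()

# build the ticker ("BTC-USD") from the two symbols defined above
df = pdr.get_data_yahoo(f"{crypto}-{against_currency}", start, end)
df.head()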

[*********************100%***********************]  1 of 1 completed


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2016-01-01 | 430.721008 | 436.246002 | 427.515015 | 434.334015 | 434.334015 | 36278900 |
| 2016-01-02 | 434.622009 | 436.062012 | 431.869995 | 433.437988 | 433.437988 | 30096600 |
| 2016-01-03 | 433.578003 | 433.743011 | 424.705994 | 430.010986 | 430.010986 | 39633800 |
| 2016-01-04 | 430.061005 | 434.516998 | 429.084015 | 433.091003 | 433.091003 | 38477500 |
| 2016-01-05 | 433.069 | 434.182007 | 429.675995 | 431.959991 | 431.959991 | 34522600 |


### Datasets Overview
Bitcoin historical data is obtained from [Kaggle](https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data?resource=download); price data can also be pulled from Yahoo Finance through pandas-datareader, as in the cell above. The Kaggle file has the following columns:
- Timestamp: Start time of time window (60s) in Unix Time
    - Unix Time: the number of seconds that have elapsed since the Unix epoch (00:00:00 UTC (Coordinated Universal Time) on 1 January 1970.)
- Open: Open price at start of time window
- High: High price within time window
- Low: Low price within time window
- Close: Close price at the end of time window
- Volume_(BTC): Volume of BTC transacted in this window
- Volume_(Currency): Volume of corresponding currency transacted in this window
- Weighted_Price: Volume Weighted Average Price (VWAP)
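
To make the column definitions concrete, here is a minimal sketch that converts Unix time to a datetime and recomputes the VWAP as `Volume_(Currency) / Volume_(BTC)`. The three rows are copied from the first minutes of 2016-01-01 shown later in this notebook; the `Timestamp` values are an assumption (they presume those minutes are in UTC), and the names `sample`, `time_utc` and `vwap` are illustrative only.

```python
import pandas as pd

# Three one-minute windows; Timestamp values assume the rows are 00:00-00:02 UTC.
sample = pd.DataFrame({
    "Timestamp": [1451606400, 1451606460, 1451606520],
    "Volume_(BTC)": [1.159208, 0.12028, 0.02356],
    "Volume_(Currency)": [490.946953, 50.907951, 9.972241],
    "Weighted_Price": [423.51933, 423.245349, 423.27],
})

# Unix time counts seconds since 1970-01-01 00:00:00 UTC.
sample["time_utc"] = pd.to_datetime(sample["Timestamp"], unit="s")

# VWAP for a window: currency traded divided by BTC traded.
sample["vwap"] = sample["Volume_(Currency)"] / sample["Volume_(BTC)"]

print(sample[["time_utc", "Weighted_Price", "vwap"]])
```

The recomputed `vwap` should agree with `Weighted_Price` up to rounding in the stored values.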

In [5]:
btc = pd.read_csv('../data/bitcoin_data.csv')
# convert the Unix timestamp (seconds since the epoch, UTC) to datetime
btc['timestamp'] = pd.to_datetime(btc['Timestamp'], unit='s')
# keep 1 Jan 2016 through 31 Dec 2017; the exclusive upper bound keeps all of 31 Dec
btc = btc[(btc['timestamp'] >= '2016-01-01') & (btc['timestamp'] < '2018-01-01')]
btc = btc.drop(columns='Timestamp')
btc.head()  # data is still by minute

| | Open | High | Low | Close | Volume_(BTC) | Volume_(Currency) | Weighted_Price | timestamp |
|---|---|---|---|---|---|---|---|---|
| 2097856 | 423.52 | 423.52 | 423.51 | 423.51 | 1.159208 | 490.946953 | 423.51933 | 2016-01-01 00:00:00 |
| 2097857 | 423.25 | 423.25 | 423.24 | 423.24 | 0.12028 | 50.907951 | 423.245349 | 2016-01-01 00:01:00 |
| 2097858 | 423.27 | 423.27 | 423.27 | 423.27 | 0.02356 | 9.972241 | 423.27 | 2016-01-01 00:02:00 |
| 2097859 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-01-01 00:03:00 |
| 2097860 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2016-01-01 00:04:00 |
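
Since the data is still per minute and some minutes have no trades, it will eventually need to be aggregated before it can be aligned with daily features. The sketch below shows one possible way to roll minute bars up to daily OHLC with `resample`; it runs on synthetic data, and the names `minute` and `daily` are hypothetical rather than taken from this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic minute bars for two days (not real prices).
idx = pd.date_range("2016-01-01", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
close = 420 + rng.normal(0, 0.5, len(idx)).cumsum()
minute = pd.DataFrame(
    {"Open": close, "High": close + 0.1, "Low": close - 0.1,
     "Close": close, "Volume_(BTC)": 1.0},
    index=idx,
)
minute.iloc[3:5] = np.nan  # mimic minutes with no trades

# Daily bar: first open, highest high, lowest low, last close, summed volume.
# first/last/max/min/sum all skip NaN rows.
daily = minute.resample("D").agg({
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume_(BTC)": "sum",
})
print(daily)
```

With the real `btc` frame, the same call would first need `timestamp` set as the index (for example via `btc.set_index('timestamp')`).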


In [7]:
btc.to_csv('../data/1-bitcoin_data.csv')