The project is written in Python using the following libraries: numpy, pandas, seaborn, matplotlib, sklearn, statsmodels, and itertools.
Please refer to research-proposal-and-discussion for the research rationale and background literature, to presentation for a quick summary of the project, and to the rest of the files for the actual code and analysis.
- Unsupervised
  - K-means (tuned based on TSS)
  - K-means with differencing: detrending the series to reduce autocorrelation
- Supervised
  - ARIMA (tuned based on AIC)
  - Single Exponential Smoothing (SES) with Seasonal-Trend decomposition using Loess (STL)
  - Holt-Winters method
  - ARIMA with CV (tuned based on AIC)
The complete Bitcoin price history from 2009 onward is scraped, together with the following exogenous variables:
- Gold
- Crude Oil
- S&P 500
- Vanguard Financials Index Fund ETF
- Vanguard Information Technology Index Fund ETF
- NVIDIA
The exogenous variables were chosen based on the background literature search and current market analysis.
Libraries used: numpy, pandas, statsmodels, seaborn, matplotlib
EDA and preprocessing include renaming columns, converting data types, handling missing values, and checking unique values.
Visualizations are produced first without any transformation and then with a log transformation, because the scale of Bitcoin prices changes drastically over time.
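
As a minimal sketch of why the log transformation helps, the snippet below applies a natural log to a handful of made-up prices. The values and the names `prices` and `log_prices` are hypothetical and are not taken from the project data.

```python
import numpy as np
import pandas as pd

# Hypothetical prices spanning several orders of magnitude (not real data).
prices = pd.Series([0.05, 1.0, 100.0, 1_000.0, 20_000.0], name="close")

# The natural log compresses the scale so early and late values are both visible.
log_prices = np.log(prices)

# Ratio between the largest and smallest raw value.
raw_ratio = prices.max() / prices.min()

# After the transform the same gap is a difference, since log turns ratios into differences.
log_spread = log_prices.max() - log_prices.min()
```

On a log axis, equal vertical distances correspond to equal percentage changes, which is usually what matters when comparing early and late price movements.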
The graphs include:
- line plots
- box plots
- violin plots
- bar plots
- lag plots
- autocorrelation plots
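
The quantity behind the lag and autocorrelation plots can also be checked numerically. The sketch below uses a synthetic random walk in place of the price series (an assumption, not the project data) and compares the lag-1 autocorrelation before and after first differencing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic random walk standing in for a price series (not real data).
walk = pd.Series(rng.normal(size=500).cumsum())

# Lag-1 autocorrelation of the raw series: close to 1 for a wandering series.
raw_acf1 = walk.autocorr(lag=1)

# First differencing removes the stochastic trend, so the value drops toward 0.
diff_acf1 = walk.diff().dropna().autocorr(lag=1)
```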
Time series decomposition plots are also included.
Observed and seasonally adjusted trends are compared.
Libraries used: sklearn
- Clustering (with tuning)
  - K-means
  - K-means with differencing: detrending the series to reduce autocorrelation.
The total within-cluster sum of squares (TSS) was used to choose the optimal number of clusters.
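
The tuning step can be sketched as follows. This is an illustration on a synthetic random walk, and the feature layout (one differenced value per observation) is an assumption rather than the project's actual setup. In sklearn, the total within-cluster sum of squares is exposed as `inertia_` on a fitted `KMeans` model.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic random walk standing in for the price series (not real data).
series = rng.normal(size=300).cumsum()

# First differencing removes the trend before clustering.
diffed = np.diff(series)

# Assumed feature layout: a single column holding the differenced values.
X = diffed.reshape(-1, 1)

inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the total within-cluster sum of squared distances.
    inertias[k] = model.inertia_
```

Plotting `inertias` against `k` and looking for the point where the curve flattens (the "elbow") is one common way to pick the number of clusters.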
Libraries used: sklearn, statsmodels, itertools
- ARIMA
- Single Exponential Smoothing (SES) with Seasonal-Trend decomposition using Loess (STL)
- Holt-Winters method
- ARIMA with CV
Akaike Information Criterion (AIC) was used to choose the best tuning parameters for ARIMA models.
Root Mean Square Error (RMSE) was used to choose the best performing model.
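
For reference, RMSE is the square root of the mean squared difference between actual and predicted values. The sketch below compares two made-up forecasts; `rmse`, `model_a`, and `model_b` are hypothetical names and the numbers are not project results.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Toy hold-out values and forecasts (not real data).
actual = [1.0, 2.0, 3.0, 4.0]
forecasts = {
    "model_a": [1.0, 2.0, 3.0, 5.0],
    "model_b": [2.0, 3.0, 4.0, 5.0],
}

scores = {name: rmse(actual, pred) for name, pred in forecasts.items()}

# The model with the lowest RMSE is selected.
best_model = min(scores, key=scores.get)
```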