
anastasia-sosnovskikh/predict-bitcoin-prices



Predict the Closing Price of Bitcoin in the Next 6 Months

The project is written in Python using the following libraries: numpy, pandas, seaborn, matplotlib, sklearn (scikit-learn), statsmodels, and itertools (from the Python standard library).

Time Series Forecasting

See research-proposal-and-discussion for the research rationale and background literature, presentation for a quick summary of the project, and the remaining files for the code and analysis.

Quick Glance at the Methods:
  • Unsupervised
    • K-means (tuned based on TSS)
    • K-means with differencing: detrending the series to reduce autocorrelation
  • Supervised
    • ARIMA (tuned based on AIC)
    • Single Exponential Smoothing (SES) with Seasonal and Trend decomposition using Loess (STL)
    • Holt-Winters method
    • ARIMA with cross-validation (CV), tuned based on AIC

0. Get the Data

Scrape the complete Bitcoin price history from 2009, along with the following exogenous variables:

  • Gold
  • Crude Oil
  • S&P 500
  • Vanguard Financials Index Fund ETF
  • Vanguard Information Technology Index Fund ETF
  • NVIDIA

The exogenous variables were chosen based on the background literature search and an analysis of current market conditions.
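Bitcoin trades every calendar day, while gold, crude oil, the S&P 500, the two Vanguard ETFs, and NVIDIA trade only on exchange business days, so the series have to be aligned before modeling. A minimal sketch of one way to do this with pandas, assuming hypothetical placeholder data and forward-filling of weekend gaps; the column names and the `align_exogenous` helper are illustrative, not taken from the project:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder data: Bitcoin trades every calendar day, while the
# exogenous assets (here a stand-in S&P 500 column) trade only on business days.
btc = pd.DataFrame(
    {"btc_close": np.linspace(100.0, 113.0, 14)},
    index=pd.date_range("2023-01-01", periods=14, freq="D"),
)
exog = pd.DataFrame(
    {"sp500_close": np.linspace(50.0, 59.0, 10)},
    index=pd.bdate_range("2023-01-01", periods=10),
)

def align_exogenous(target: pd.DataFrame, exog: pd.DataFrame) -> pd.DataFrame:
    """Left-join exogenous prices onto the daily Bitcoin index.

    Weekend and holiday gaps are forward-filled with the last available close
    (an assumption; the project may handle gaps differently).
    """
    merged = target.join(exog, how="left")
    merged[exog.columns] = merged[exog.columns].ffill()
    return merged

df = align_exogenous(btc, exog)
print(df.head(8))
```

In this toy example, Jan 1, 2023 (a Sunday before the first market close) stays missing, and the weekend of Jan 7-8 inherits Friday's close.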

1. Exploratory Data Analysis and Preprocessing

Libraries used: numpy, pandas, statsmodels, seaborn, matplotlib

EDA and preprocessing include renaming columns, converting data types, and checking missing and unique values.
Visualizations are produced both on the raw data and on log-transformed data, because Bitcoin prices span drastically different scales over time.
The graphs include:

  • line plots
  • box plots
  • violin plots
  • bar plots
  • lag plots
  • autocorrelation plots
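The log transformation and the lag/autocorrelation diagnostics can be sketched with pandas. This assumes a hypothetical simulated price path in place of the scraped data; `pandas.plotting.lag_plot` and `pandas.plotting.autocorrelation_plot` draw the corresponding plots.

```python
import numpy as np
import pandas as pd

# Hypothetical simulated price path (geometric random walk), standing in for Bitcoin closes.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=1000, freq="D")
close = pd.Series(200 * np.exp(np.cumsum(rng.normal(0.002, 0.03, len(dates)))),
                  index=dates, name="close")

# The log transform compresses a price range spanning orders of magnitude.
log_close = np.log(close)

# Lag-k autocorrelation: correlation of the series with itself shifted by k days.
acf_log = {k: log_close.autocorr(lag=k) for k in (1, 7, 30)}

# A lag plot is a scatter of (y_{t-1}, y_t); these are the plotted pairs.
lag_pairs = pd.DataFrame({"y_lag1": log_close.shift(1), "y": log_close}).dropna()

print(acf_log)
```

For a trending price series, the lag-1 autocorrelation is close to 1, which is what the lag and autocorrelation plots make visible.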

Time series decomposition plots are also produced, and the observed and seasonally adjusted trends are compared.

2. Unsupervised Learning

Libraries used: sklearn

Models
  • Clustering (with tuning)
    • K-means
    • K-means with differencing: detrending the series to reduce autocorrelation.

The total within-cluster sum of squares (TSS) was used to choose the optimal number of clusters.
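Computing the TSS curve, with and without differencing, might look like the sketch below. It assumes each observation is clustered on a single feature (the log price or its first difference); the project's actual feature set is not described here, and the simulated series is hypothetical. scikit-learn exposes the within-cluster sum of squares as `KMeans.inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical log-price series (random walk with drift).
rng = np.random.default_rng(2)
log_price = 5.0 + np.cumsum(rng.normal(0.001, 0.03, 500))

# First difference removes the trend: d_t = log p_t - log p_{t-1}.
diffed = np.diff(log_price)

def tss_curve(values, k_max=8):
    """Return {k: total within-cluster sum of squares} for k = 1..k_max."""
    X = np.asarray(values).reshape(-1, 1)  # sklearn expects (n_samples, n_features)
    return {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, k_max + 1)}

tss_levels = tss_curve(log_price)
tss_diffed = tss_curve(diffed)
# A common heuristic picks k at the "elbow", where extra clusters stop reducing TSS sharply.
```

With k = 1 the TSS is simply the sum of squared deviations from the mean, so each curve starts there and falls as k grows.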

3. Supervised Learning

Libraries used: sklearn, statsmodels, itertools

Methods:
  • ARIMA
  • Single Exponential Smoothing (SES) with Seasonal and Trend decomposition using Loess (STL)
  • Holt-Winters method
  • ARIMA with cross-validation (CV)

The Akaike Information Criterion (AIC) was used to choose the best tuning parameters for the ARIMA models.

The root mean square error (RMSE) was used to select the best-performing model.