# Predicting Bitcoin Volume

## Goals:

- Investigate the drivers of daily Bitcoin trading volume in market data from January 2018 to December 2022.

- Build a machine learning time-series model that accurately predicts daily Bitcoin volume.

# Import

In [1]:
import warnings
warnings.filterwarnings("ignore")


import requests

import pandas as pd
import numpy as np
from scipy import stats
from math import sqrt
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from statsmodels.tsa.api import Holt

from datetime import datetime

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit 


import wrangle as w
import explore as e
import model as m


# Acquire

- Data acquired from messari.io


- The raw data contained 1768 rows and 5 columns before cleaning.


- Each row represents one day between 01/01/2018 and 12/12/2022.


- Each column is a daily price or volume feature for Bitcoin.

In [None]:
# Acquiring and cleaning data
btc = w.get_crypto_price('btc', '2018-01-01', '2022-12-12')

btc = w.clean_data(btc)

In [None]:
btc.head(3)

In [None]:
btc.info()

# Prepare

- Checked for nulls; nulls were found and filled with the value from the nearest date.


- Checked for missing dates; filled 40 missing dates with data from the nearest date.


- Extended the date range from 12-12-2022 through 12-31-2022 and filled the added days from the nearest date, so that 2022 is a complete year for modeling.


- Renamed columns for readability.


- Split the data by year into train (2018 through 2020), validate (2021), and test (2022) sets.


- All observations were kept; none were removed.


- The cleaned and filled data has 1825 rows.
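The cleaning code in `wrangle.py` is not shown here. As a minimal sketch of the "fill from the nearest date" steps above, assuming the data has a sorted `DatetimeIndex`, the hypothetical helper below puts each column on a complete daily calendar and lets every gap (a missing date or a null) borrow the value from the closest date that has data. It is an illustration, not the author's implementation.

```python
import pandas as pd


def fill_to_daily(df: pd.DataFrame, end: str) -> pd.DataFrame:
    """Hypothetical helper: complete a daily calendar ending at `end` and
    fill every gap (missing dates and nulls) from the nearest dated value."""
    df = df.sort_index()  # reindex(method=...) needs a monotonic index
    full_index = pd.date_range(df.index.min(), pd.Timestamp(end), freq="D")
    filled = {
        # Drop this column's nulls so they are treated like missing dates,
        # then copy each gap's value from the closest date that has data.
        # Dates after the last observation all take the last observed value.
        col: df[col].dropna().reindex(full_index, method="nearest")
        for col in df.columns
    }
    return pd.DataFrame(filled, index=full_index)
```

Calling something like `fill_to_daily(btc, "2022-12-31")` would cover both the 40 missing dates and the extension to the end of 2022 in one pass.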

| Feature | Definition |
| :- | :- |
| btc_open | Decimal value, opening price of Bitcoin for the day. |
| btc_close | Decimal value, closing price of Bitcoin for the day. |
| btc_high | Decimal value, highest price of Bitcoin for the day. |
| btc_low | Decimal value, lowest price of Bitcoin for the day. |
| btc_volume | Decimal value, trading volume of Bitcoin for the day (target variable). |

In [None]:
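# split into train, validate, and test by calendar year
# .loc is required for partial-string row selection on a DatetimeIndex in pandas >= 2.0
train = btc.loc[:'2020']
validate = btc.loc['2021']
test = btc.loc['2022']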

# Explore


- Exploration uses only the train data, so that nothing from validate or test influences modeling decisions.

## Questions:

- Does volume have seasonality over time?


- Do opening price and volume have a relationship?


- Do closing price and volume have a relationship?


- Do the daily high price and volume have a relationship?


- Do the daily low price and volume have a relationship?

## Looking into seasonality

In [None]:
# get plot of Average Volume of BitCoin per month
e.get_avg_vol_monthly()

- Monthly averages show at most weak seasonality, appearing only around September through November.
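`explore.get_avg_vol_monthly()` is not shown, so its exact aggregation is unknown. A minimal sketch of one way to compute the numbers behind a monthly seasonality plot, assuming a `DatetimeIndex` and a `btc_volume` column, is to pool all years and average volume by calendar month; the helper name is hypothetical.

```python
import pandas as pd


def avg_volume_by_month(df: pd.DataFrame, col: str = "btc_volume") -> pd.Series:
    """Hypothetical helper: mean volume for each calendar month (1-12),
    pooled across every year in df. Expects a DatetimeIndex."""
    return df[col].groupby(df.index.month).mean()
```

A repeating peak in the same months, such as the September through November bump noted above, is what seasonality would look like in this series; `avg_volume_by_month(train).plot(kind="bar")` would draw it.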

In [None]:
# get plot of Volume for each day
e.get_vol_by_date()

- Daily volume shows little to no seasonality across the three years of training data.

## Do open, close, highest or lowest price share a relationship with volume?

In [None]:
e.plot_price_vol()

- The plots show no clear relationship between volume and the open, close, high, or low price.
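The conclusion above rests on plots. To put a number on it, one option (swapped in here, not the author's method) is a Spearman rank correlation between volume and each price column; rank correlation does not assume a linear relationship, which suits heavy-tailed volume data. The column names follow the feature table; the helper name is hypothetical.

```python
import pandas as pd


def price_volume_correlations(df: pd.DataFrame, target: str = "btc_volume") -> pd.Series:
    """Hypothetical helper: Spearman rank correlation between the target
    column and every other column in df."""
    corr = df.corr(method="spearman")
    return corr[target].drop(target)
```

Values near 0 would support "no relationship", while values near +1 or -1 indicate a strong monotonic link. If a p-value is wanted, `scipy.stats.spearmanr` returns one alongside the correlation.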

## Exploration Summary

- No clear relationship appeared between volume and the price features in the train data.


- Since the price features do not appear to drive volume, the models below forecast volume from its own history.

# Modeling

- Root mean squared error (RMSE) is the metric used to compare models.


- Models are fit on train data, then predict on validate data.


- The model with the lowest validate RMSE is considered the strongest and moves on to the test data.
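`model.evaluate` is not shown. RMSE is the square root of the mean squared difference between actual and predicted values, in the same units as volume. A minimal sketch of such a function, assuming both dataframes hold the target in a `btc_volume`-style column, is below; the name `evaluate_rmse` is hypothetical.

```python
import numpy as np
import pandas as pd


def evaluate_rmse(validate: pd.DataFrame, yhat_df: pd.DataFrame, col: str) -> float:
    """Hypothetical stand-in for m.evaluate: RMSE of predicted vs actual `col`."""
    actual = validate[col].to_numpy(dtype=float)
    # Align predictions to the validate dates before comparing.
    predicted = yhat_df[col].reindex(validate.index).to_numpy(dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

For example, predictions that are each off by 2 give an RMSE of exactly 2.0. Because errors are squared before averaging, a few days with large volume spikes weigh heavily in this score.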

In [None]:
# Append one model's RMSE to the evaluation dataframe for comparison.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
def update_eval_df(model_type, eval_df, col, validate, yhat_df):
    rmse = m.evaluate(col, validate, yhat_df)
    row = pd.DataFrame({'model_type': [model_type], 'target_var': [col], 'rmse': [rmse]})
    return pd.concat([eval_df, row], ignore_index=True)

In [None]:
# Create the empty dataframe for evaluation
eval_df = pd.DataFrame(columns=['model_type', 'target_var', 'rmse'])

# Initialize volume, yhat_df, and period for modeling
volume = 0

yhat_df = pd.DataFrame({'btc_volume': [volume]}, 
                       index = validate.index)

period = 0

# setting btc_volume to variable to use in function arguments
col = 'btc_volume'

## Last Observed (our Baseline)

In [None]:
# get last_observed model and update eval_df
volume, yhat_df = m.get_btc_last_observed(train, validate, volume, yhat_df)

eval_df = update_eval_df('last_observed', eval_df, col, validate, yhat_df)

#eval_df

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)
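`model.get_btc_last_observed` is not shown. A minimal sketch of a last-observed-value baseline, assuming a `btc_volume` column, predicts the final train value for every validate date. The helper is hypothetical and returns only the forecast dataframe, unlike the notebook's `(volume, yhat_df)` pair.

```python
import pandas as pd


def last_observed_forecast(train: pd.DataFrame, validate: pd.DataFrame,
                           col: str = "btc_volume") -> pd.DataFrame:
    """Hypothetical sketch: flat forecast equal to the last train value."""
    last_value = train[col].iloc[-1]
    # A scalar plus an explicit index repeats the value for every validate date.
    return pd.DataFrame({col: last_value}, index=validate.index)
```

Any model worth keeping should beat this baseline's RMSE on validate.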

## Simple Average

In [None]:
volume, yhat_df = m.get_btc_simple_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('simple_average', eval_df, col, validate, yhat_df)

#eval_df

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)
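`model.get_btc_simple_average` is not shown. A minimal sketch of a simple-average forecast, assuming a `btc_volume` column, predicts the mean of the whole train period for every validate date; the helper name is hypothetical.

```python
import pandas as pd


def simple_average_forecast(train: pd.DataFrame, validate: pd.DataFrame,
                            col: str = "btc_volume") -> pd.DataFrame:
    """Hypothetical sketch: flat forecast equal to the mean of all train values."""
    mean_value = train[col].mean()
    return pd.DataFrame({col: mean_value}, index=validate.index)
```

Since every train day counts equally, this forecast lags behind any recent shift in volume level.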

## Moving Average

### 30 Day Period

In [None]:
volume, yhat_df = m.get_btc_30d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('30d_moving_average', eval_df, col, validate, yhat_df)

#eval_df

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)
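The `get_btc_*d_average` functions are not shown. A minimal sketch of a moving-average forecast, assuming a `btc_volume` column, averages the last `window` days of train and uses that value for every validate date. One hypothetical helper with a `window` parameter covers the 7, 14, 21, 28, 30, and 120 day variants in this section.

```python
import pandas as pd


def moving_average_forecast(train: pd.DataFrame, validate: pd.DataFrame,
                            window: int, col: str = "btc_volume") -> pd.DataFrame:
    """Hypothetical sketch: flat forecast equal to the mean of the last `window` train days."""
    # rolling(window).mean() averages each trailing window; its last entry
    # is the average of the final `window` observations in train.
    last_window_mean = train[col].rolling(window).mean().iloc[-1]
    return pd.DataFrame({col: last_window_mean}, index=validate.index)
```

Shorter windows track the most recent volume level more closely, while longer windows smooth out short spikes; comparing RMSE across windows is how this notebook picks between them.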

### 7 Day Period

In [None]:
volume, yhat_df = m.get_btc_7d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('7d_moving_average', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)

### 14 Day Period

In [None]:
volume, yhat_df = m.get_btc_14d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('14d_moving_average', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)

### 21 Day Period

In [None]:
volume, yhat_df = m.get_btc_21d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('21d_moving_average', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)

### 28 Day Period

In [None]:
volume, yhat_df = m.get_btc_28d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('28d_moving_average', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)

### 120 Day Period

In [None]:
volume, yhat_df = m.get_btc_120d_average(train, validate, volume, yhat_df)

eval_df = update_eval_df('120d_moving_average', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)

## Previous Cycle

In [None]:
yhat_df = m.get_btc_previous_cycle(train, validate)

eval_df = update_eval_df('previous_cycle', eval_df, col, validate, yhat_df)

In [None]:
m.plot_and_eval(col, train, validate, yhat_df)
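`model.get_btc_previous_cycle` is not shown, so the exact method is unknown. A minimal sketch of one common "previous cycle" approach (a seasonal-naive forecast), assuming a `btc_volume` column and at least as many train rows as validate rows, replays the last `len(validate)` days of train onto the validate dates. The helper name is hypothetical.

```python
import pandas as pd


def previous_cycle_forecast(train: pd.DataFrame, validate: pd.DataFrame,
                            col: str = "btc_volume") -> pd.DataFrame:
    """Hypothetical sketch: seasonal-naive forecast that reuses the last
    len(validate) train values, in order, for the validate dates."""
    last_cycle = train[col].iloc[-len(validate):].to_numpy()
    return pd.DataFrame({col: last_cycle}, index=validate.index)
```

With 2020 (366 days) as the last train year and 2021 (365 days) as validate, this reuses 2020-01-02 through 2020-12-31. A common variant also adds the average year-over-year change to the replayed values.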

## Comparing Models

In [None]:
# show evaluation dataframe
eval_df

- Comparing RMSE across models, the 21-day moving average scored lowest on validate, so it moves on to the test data.
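The selection rule from the Modeling section (lowest validate RMSE wins) can be applied directly to `eval_df`. A minimal sketch, assuming the `model_type` and `rmse` columns created above; the helper name is hypothetical.

```python
import pandas as pd


def best_model(eval_df: pd.DataFrame) -> str:
    """Hypothetical helper: return the model_type with the lowest RMSE."""
    # astype(float) guards against an object-dtype column built by repeated concatenation.
    rmse = eval_df["rmse"].astype(float)
    return eval_df.loc[rmse.idxmin(), "model_type"]
```

Sorting with `eval_df.sort_values("rmse")` gives the same answer along with the full ranking.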

# Model on Test

In [None]:
volume, yhat_df = m.get_test_btc_21d_average(train, test, volume, yhat_df)

eval_df = update_eval_df('test_21d_moving_average', eval_df, col, test, yhat_df)

#eval_df

In [None]:
m.plot_and_eval_test(col, train, validate, test, yhat_df)