# Bitcoin Price Analysis and Forecasting: Volatility Insights, Time Series Modeling, and Visualization

# Overview

Bitcoin, the pioneering cryptocurrency, has ignited global interest due to its intriguing price fluctuations and potential impact on the financial landscape. This project is designed to provide a comprehensive exploration of Bitcoin's price behavior, encompassing historical trends, volatility patterns, and future price predictions.

The foundation of this analysis rests on historical Bitcoin price data collected from the CryptoCompare API. The dataset comprises hourly Bitcoin price data ending at July 23, 2023, 23:00:00 and going back two years. It includes hourly opening, closing, high, and low prices, along with the corresponding trading volumes in Bitcoin and US Dollars. This dataset serves as the basis for the analysis and forecasting that follow.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item">
    <ul class="toc-item"><li><span><a href="#Exploratory-Data-Analysis-(EDA)" data-toc-modified-id="Exploratory-Data-Analysis-(EDA)-2"><span class="toc-item-num">1&nbsp;&nbsp;</span>Exploratory Data Analysis (EDA)</a></span><ul class="toc-item"><li><span><a href="#Finding-maximum-and-minimum-values-of-each-column-with-the-date-and-time-that-they-happened." data-toc-modified-id="Finding-maximum-and-minimum-values-of-each-column-with-the-date-and-time-that-they-happened.-2.1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Finding maximum and minimum values of each column with the date and time that they happened.</a></span></li><li><span><a href="#Highest-and-lowest-trading-volumes-from-Bitcoin" data-toc-modified-id="Highest-and-lowest-trading-volumes-from-Bitcoin-1.1"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Highest and lowest trading volumes from Bitcoin</a></span></li><li><span><a href="#The-correlation-between-the-trading-volume-and-the-Bitcoin-closing-price" data-toc-modified-id="The-correlation-between-the-trading-volume-and-the-Bitcoin-closing-price-2.1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>The correlation between the trading volume and the Bitcoin closing price</a></span></li><li><span><a href="#Calculating-and-Visualizing-Returns" data-toc-modified-id="Calculating-and-Visualizing-Returns-2.1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Calculating and Visualizing Returns</a></span></li><li><span><a href="#Rolling-Statistics" data-toc-modified-id="Rolling-Statistics-2.1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Rolling Statistics</a></span></li><li><span><a href="#Seasonal-Decomposition" data-toc-modified-id="Seasonal-Decomposition-2.1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Seasonal Decomposition</a></span></li></ul></li><li><span><a href="#Volatility-Analysis" data-toc-modified-id="Volatility-Analysis-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Volatility Analysis</a></span><ul 
class="toc-item"><li><span><a href="#Daily-Volatility" data-toc-modified-id="Daily-Volatility-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Daily Volatility</a></span></li><li><span><a href="#Historical-Volatility" data-toc-modified-id="Historical-Volatility-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Historical Volatility</a></span></li><li><span><a href="#Volatility-Indicators:" data-toc-modified-id="Volatility-Indicators:-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Volatility Indicators:</a></span><ul class="toc-item"><li><span><a href="#Bollinger-Bands" data-toc-modified-id="Bollinger-Bands-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Bollinger Bands</a></span></li><li><span><a href="#Identidying-overbought-and-potential-price-correction-datetimes" data-toc-modified-id="Identidying-overbought-and-potential-price-correction-datetimes-2.2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>Identidying overbought and potential price correction datetimes</a></span></li><li><span><a href="#Uptrend-and-downtrend" data-toc-modified-id="Uptrend-and-downtrend-2.3.3"><span class="toc-item-num">2.3.3&nbsp;&nbsp;</span>Uptrend and downtrend</a></span></li><li><span><a href="#Average-True-Range-(ATR):" data-toc-modified-id="Average-True-Range-(ATR):-2.3.4"><span class="toc-item-num">2.3.4&nbsp;&nbsp;</span>Average True Range (ATR):</a></span></li></ul></li></ul></li><li><span><a href="#Time-Series-Modeling-and-Forecasting" data-toc-modified-id="Time-Series-Modeling-and-Forecasting-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Time-Series Modeling and Forecasting</a></span><ul class="toc-item"><li><span><a href="#ARIMA-(AutoRegressive-Integrated-Moving-Average)-Model" data-toc-modified-id="ARIMA-(AutoRegressive-Integrated-Moving-Average)-Model-2.3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>ARIMA (AutoRegressive Integrated Moving Average) Model</a></span><ul class="toc-item"><li><span><a href="#ACF-and-PACF-Plots" 
data-toc-modified-id="ACF-and-PACF-Plots-3.1.1"><span class="toc-item-num">3.1.1&nbsp;&nbsp;</span>ACF and PACF Plots</a></span></li></ul></li><li><span><a href="#SARIMA-(Seasonal-ARIMA)-Model" data-toc-modified-id="SARIMA-(Seasonal-ARIMA)-Model-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>SARIMA (Seasonal ARIMA) Model</a></span></li><li><span><a href="#Facebook's-Prophet" data-toc-modified-id="Facebook's-Prophet-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Facebook's Prophet</a></span></li><li><span><a href="#Model-Performance-Comparisons" data-toc-modified-id="Model-Performance-Comparisons-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Model Performance Comparisons</a></span></li></ul></li></ul></div>

In [None]:
import time
import pandas as pd
import datetime
pd.options.mode.chained_assignment = None

In [None]:
# import requests
## API_KEY = key removed for privacy
# CRYPTO_SYMBOL = 'BTC'  # Bitcoin symbol
# CURRENCY = 'USD'       # Currency to convert prices into
# LIMIT = 2000           # Maximum limit per API call

# # Calculate the end timestamp and the timestamp two years earlier (in seconds)

# today_timestamp = int(pd.to_datetime('2023-07-23 23:00:00').timestamp())
# two_years_ago_timestamp = int(pd.to_datetime('2021-07-23 23:00:00').timestamp())

# # Create an empty list to store the data points
# data_points = []

# # Number of data points to fetch for 2 years
# num_data_points = 2*365 * 24  # 2 years * 365 days * 24 hours

# # Keep fetching data in batches until we get the required number of data points
# while len(data_points) < num_data_points:
#     # Calculate the number of data points to fetch in this batch
#     remaining_data_points = num_data_points - len(data_points)
#     batch_limit = min(remaining_data_points, LIMIT)
    
#     # Make the API call
#     url = f'https://min-api.cryptocompare.com/data/v2/histohour?fsym={CRYPTO_SYMBOL}&tsym={CURRENCY}&limit={batch_limit}&toTs={today_timestamp}&api_key={API_KEY}'
#     response = requests.get(url)

#     if response.status_code == 200:
#         batch_data = response.json()['Data']['Data']
#         data_points.extend(batch_data)
#         # Update 'today_timestamp' for the next batch
#         today_timestamp -= (batch_limit * 3600)  # 1 hour = 3600 seconds
#     else:
#         print(f'Error: Unable to retrieve data. Status code: {response.status_code}')
#         break

# # Create a DataFrame from the list of data points
# df = pd.DataFrame(data_points)
# df['time'] = pd.to_datetime(df['time'], unit='s')  # Convert timestamps to datetime format
# df.set_index('time', inplace=True)

# df.sort_index(inplace=True)

# df.to_csv('G:/Documents/Projects/Bitcoin/data/bitcoin_data.csv')

In [None]:
bitcoin_data = pd.read_csv('G:/Documents/Projects/Bitcoin/data/bitcoin_data.csv', index_col = 'time')

In [None]:
bitcoin_data.shape

In [None]:
bitcoin_data

In [None]:
bitcoin_data.info()

In [None]:
bitcoin_data.drop(columns = ['conversionSymbol', 'conversionType'], inplace = True)

In [None]:
# Convert the index (the 'time' column read from the CSV) to pandas datetimes
bitcoin_data.index = pd.to_datetime(bitcoin_data.index)

In [None]:
# Drop rows whose timestamp appears more than once, keeping the first occurrence
bitcoin_data = bitcoin_data[~bitcoin_data.index.duplicated(keep='first')]

## Exploratory Data Analysis (EDA)
Exploratory Data Analysis is crucial for understanding the characteristics of your data and identifying patterns or anomalies. Here are some EDA steps you can perform:

- Visualize Historical Prices: Plot the historical Bitcoin prices over time using line plots or candlestick charts. Observe any trends, seasonality, or notable events.

- Calculate and Visualize Returns: Compute the percentage returns from the price data and plot them. Analyze the distribution of returns and look for patterns.

- Rolling Statistics: Compute rolling statistics, such as moving averages and rolling standard deviations, to observe trends and volatility changes.

- Seasonal Decomposition: Use seasonal decomposition techniques (e.g., seasonal decomposition of time series, or STL) to separate the data into trend, seasonality, and residual components.

- Autocorrelation and Partial Autocorrelation: Analyze autocorrelation and partial autocorrelation plots to identify potential autoregressive (AR) and moving average (MA) components for time-series modeling.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import mplfinance as mpf
import numpy as np
import plotly.graph_objects as go

### Finding maximum and minimum values of each column with the date and time that they happened.

In [None]:
# Set the display format for floating-point numbers
pd.options.display.float_format = '{:.2f}'.format

# Create a new DataFrame to show the maximum value and its index beside it
max_values_df = pd.DataFrame({
    'Max Value': bitcoin_data.max(),
    'Date of Max Value': bitcoin_data.idxmax()
})

max_values_df

Let's look at the market in the week leading up to the all-time high on 2021-11-10 14:00:00.

In [None]:
highest_time = pd.to_datetime('2021-11-10 14:00:00')

# Calculate the datetime one week before 'highest_time'
week_before_highest = highest_time - pd.Timedelta(weeks=1)

# Select the rows within that week by slicing the DatetimeIndex
# (slicing tolerates missing hours, whereas .loc with an explicit list raises KeyError)
data_week_before_highest = bitcoin_data.loc[week_before_highest:highest_time]

plt.figure(figsize=(9, 6))
sns.lineplot(x=data_week_before_highest.index, y=data_week_before_highest.high, color='blue')
plt.title('High Values for the Week Before all-time high')
plt.xlabel('Datetime')
plt.ylabel('High Value')
plt.grid(True)
plt.tight_layout()
plt.show()


Why did Bitcoin rise in November 2021?

Bitcoin (BTC) again reached an all-time high in 2021, exceeding 65,000 USD in November 2021. That particular price rise has been linked to the launch of a Bitcoin ETF in the United States, while other rallies in 2021 were attributed to events involving Tesla and Coinbase, respectively.

A Bitcoin ETF (Exchange-Traded Fund) is a type of investment fund that tracks the price of Bitcoin (BTC) and aims to replicate its performance. ETFs are similar to mutual funds but are traded on stock exchanges like individual stocks. This means investors can buy and sell shares of a Bitcoin ETF throughout the trading day, just like any other stock.

The primary objective of a Bitcoin ETF is to provide investors with exposure to the price movements of Bitcoin without having to own the actual cryptocurrency. Instead of directly buying and holding Bitcoin, investors can buy shares of the ETF, which represent ownership of a portfolio of Bitcoin or Bitcoin futures contracts held by the ETF issuer.

In [None]:
# Create a new DataFrame to show the minimum value and its index beside it
min_values_df = pd.DataFrame({
    'Min Value': bitcoin_data.min(),
    'Date of Min Value': bitcoin_data.idxmin()
})

min_values_df

The slump in November 2022 was triggered by the collapse of FTX, which handled around \$1 billion in transactions each day. Its collapse had a knock-on effect on other crypto exchanges. Earlier, in June 2022, Bitcoin had dropped below \$20,000 for the first time since 2020.

On November 11, 2022, FTX announced via Twitter that Bankman-Fried had resigned as CEO, that John J. Ray III would succeed him, and that the company had filed for bankruptcy.

In [None]:
# Create an interactive line plot with zooming
fig = go.Figure()
fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', line=dict(color='dodgerblue')))
fig.update_layout(title='Hourly Bitcoin Closing Price',
                  xaxis_title='Date',
                  yaxis_title='Price')
fig.show()

Because there are so many data points, I will resample the data to weekly frequency before plotting candlestick charts, aggregating each week into OHLC (open, high, low, close) values and summing the volumes.

In [None]:
bitcoin_data_weekly = bitcoin_data.resample('W').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volumefrom': 'sum', 'volumeto': 'sum'})

# Candlestick chart
mc = mpf.make_marketcolors(up='g', down='r', wick='inherit', volume='inherit')
s = mpf.make_mpf_style(marketcolors=mc)
fig_size = (12, 8)  # Adjust width and height as needed
mpf.plot(bitcoin_data_weekly, type='candle', style=s, title='Weekly Bitcoin Prices (Candlestick Chart)', figsize=fig_size)

### Highest and lowest trading volumes from Bitcoin

- Volume From: "Volume From" represents the total trading volume of the base cryptocurrency (in this case, Bitcoin) in a specific trading pair. It indicates the total amount of the base cryptocurrency that has been traded during the given time period.

- Volume To: "Volume To" represents the total trading volume of the quote currency (in this case, USD) in a specific trading pair. It indicates the total amount of the quote currency that has been traded during the given time period.

For example, let's say you have a trading pair BTC/USD, where Bitcoin (BTC) is the base currency, and the US Dollar (USD) is the quote currency. If the reported values for the trading pair are:

- Volume From (BTC): 100 BTC
- Volume To (USD): 2,000,000 USD

This means that during the specified time period, 100 Bitcoin (BTC) were traded against the US Dollar (USD), and the total value of those trades was 2,000,000 USD.
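
As a quick check on how the two volume columns relate, dividing `volumeto` by `volumefrom` gives the implied average trade price for the period. The sketch below uses the illustrative numbers from the example above rather than real data, and the helper name `implied_average_price` is hypothetical.

```python
import pandas as pd

def implied_average_price(volume_to: pd.Series, volume_from: pd.Series) -> pd.Series:
    """Average quote-currency price paid per unit of base currency in each period."""
    # Replace zero base volume with NaN so periods without trades do not divide by zero
    return volume_to / volume_from.where(volume_from != 0)

# Illustrative numbers from the BTC/USD example, not real market data
example = pd.DataFrame({"volumefrom": [100.0], "volumeto": [2_000_000.0]})
example["avg_price"] = implied_average_price(example["volumeto"], example["volumefrom"])
print(example)
```

With these numbers the implied average price is 20,000 USD per BTC.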

The trading volume is an essential metric in cryptocurrency markets, as it provides insights into the liquidity and trading activity of a specific cryptocurrency pair. High trading volumes are often associated with more liquid markets, which can be beneficial for traders and investors.

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_data.index, bitcoin_data['volumefrom'], label='Volume From', color='blue')

plt.title('Bitcoin Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume from Bitcoin')
plt.grid(True)
plt.show()

In [None]:
print("Highest trading volume from Bitcoin was {vol} and happened on {date}."
      .format(vol=bitcoin_data.volumefrom.max(),date=bitcoin_data.volumefrom.idxmax()))

print("Lowest trading volume from Bitcoin was {vol} and happened on {date}."
      .format(vol=bitcoin_data.volumefrom.min(),date=bitcoin_data.volumefrom.idxmin()))

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_data.index, bitcoin_data['volumeto'], label='Volume To', color='orange')

plt.title('Bitcoin Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume To USD')
plt.grid(True)
plt.show()

In [None]:
print("Highest trading volume to USD was {vol} and happened on {date}."
      .format(vol=bitcoin_data.volumeto.max(),date=bitcoin_data.volumeto.idxmax()))

print("Lowest trading volume to USD was {vol} and happened on {date}."
      .format(vol=bitcoin_data.volumeto.min(),date=bitcoin_data.volumeto.idxmin()))

### The correlation between the trading volume and the Bitcoin closing price

In [None]:
import plotly.express as px

correlation = bitcoin_data['volumefrom'].corr(bitcoin_data['close']) 

fig = px.scatter(bitcoin_data, x='volumefrom', y='close', opacity=0.5,
                 title='Correlation between Trading Volume and Bitcoin Closing Price',
                 labels={'volumefrom': 'Trading Volume', 'close': 'Closing Price'})
fig.update_layout(showlegend=False)
fig.show()

print("Correlation between Trading Volume and Closing Price:", correlation)


The negative correlation coefficient means that hours with higher trading volume tend to coincide with lower closing prices. This is an association, not evidence that volume drives the price down.

We see that the points are clustered to the left of the plot, meaning most hours have comparatively low trading volume. Together with the negative correlation, this may indicate:

- Higher Trading Activity at Lower Prices: The hours with unusually high trading volume tend to occur when the Bitcoin price is relatively lower. This could mean that more traders are actively buying and selling Bitcoin when its price is in a specific range.


- Key Price Levels: The clustering might indicate that there are certain key price levels or support/resistance levels where traders tend to engage in more buying and selling activities, leading to higher trading volumes. These levels could be significant for traders in their decision-making process.


- Price Stability: The clustering might also reflect periods of price stability or consolidation, where the price is trading within a narrow range. During such periods, trading activity may be more pronounced as traders try to capitalize on potential price movements.
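
Because the volume distribution is heavily skewed, a rank-based (Spearman) correlation may be a useful complement to the Pearson coefficient computed above. The sketch below is one possible way to compare the two, using small synthetic numbers rather than the notebook's data. The helper name `pearson_and_spearman` is hypothetical, and Spearman is obtained here by correlating ranks.

```python
import pandas as pd

def pearson_and_spearman(x: pd.Series, y: pd.Series) -> tuple[float, float]:
    """Return (Pearson, Spearman) correlation between two aligned series."""
    pearson = x.corr(y)
    # Spearman correlation is the Pearson correlation of the ranks
    spearman = x.rank().corr(y.rank())
    return pearson, spearman

# Synthetic data: one extreme volume value and a price that falls as volume rises
volume = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
price = 1.0 / volume

pearson, spearman = pearson_and_spearman(volume, price)
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

On the real data, the same helper could be called with `bitcoin_data['volumefrom']` and `bitcoin_data['close']`.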

### Calculating and Visualizing Returns

In [None]:
# Calculate the percentage returns from the 'close' prices
bitcoin_data['Returns'] = bitcoin_data['close'].pct_change() * 100
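
For reference, `pct_change()` computes simple returns, $(P_t / P_{t-1}) - 1$. Log returns, $\ln(P_t / P_{t-1})$, are a common alternative because they add up across periods. The sketch below contrasts the two on a tiny made-up price series, expressed as fractions rather than percentages.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only
close = pd.Series([100.0, 110.0, 99.0])

simple_returns = close.pct_change()   # (P_t / P_{t-1}) - 1
log_returns = np.log(close).diff()    # ln(P_t / P_{t-1})

# Log returns sum across periods; simple returns compound instead
total_log = log_returns.sum()
total_simple = (1 + simple_returns.dropna()).prod() - 1

print("total simple return:", total_simple)
print("total log return:", total_log, "equals log(1 + total simple):", np.log1p(total_simple))
```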

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Returns'], mode='lines', line=dict(color='olivedrab')))

fig.update_layout(title='Bitcoin Percentage Returns',
                  xaxis_title='Date',
                  yaxis_title='Percentage Returns')

fig.show()

In [None]:
print("Maximum Percentage Return:", bitcoin_data['Returns'].max())
print("Date of Maximum Percentage Return:", bitcoin_data['Returns'].idxmax())

In [None]:
print("Minimum Percentage Return:", bitcoin_data['Returns'].min())
print("Date of Minimum Percentage Return:", bitcoin_data['Returns'].idxmin())

### Rolling Statistics

In [None]:
# Compute the weekly rolling mean and standard deviation
# Data is hourly, so a 7-day rolling window must span 7 * 24 rows
bitcoin_data['Weekly Rolling Mean'] = bitcoin_data['close'].rolling(window=7*24).mean()
bitcoin_data['Weekly Rolling Std'] = bitcoin_data['close'].rolling(window=7*24).std()
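
A count-based window of `7*24` rows covers exactly seven days only when no hours are missing from the index. As a hedged alternative, pandas also accepts a time-based window such as `'7D'` on a `DatetimeIndex`. The sketch below uses synthetic hourly data to show that the two agree when the index is complete.

```python
import numpy as np
import pandas as pd

# Synthetic, gap-free hourly series for illustration only
idx = pd.date_range("2023-01-01", periods=400, freq=pd.Timedelta(hours=1))
prices = pd.Series(np.arange(400, dtype=float), index=idx)

# Count-based window: the last 168 rows, whatever time span they cover
count_based = prices.rolling(window=7 * 24).mean()

# Time-based window: all rows in the last 7 days; min_periods keeps the
# warm-up period as NaN so the two results are directly comparable
time_based = prices.rolling("7D", min_periods=7 * 24).mean()

print(np.allclose(count_based, time_based, equal_nan=True))
```

If hours were missing, the time-based version would still span seven calendar days, though `min_periods` would then need to be relaxed to avoid NaN results.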

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', line=dict(color='dodgerblue'),
                         name='Bitcoin Price'))

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', line=dict(color='orange'),
                         name='Weekly Rolling Mean'))

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Std'], mode='lines', line=dict(color='lightgray'),
                         name='Weekly Rolling Std', fill='tonexty'))

fig.update_layout(title='Bitcoin Price with Weekly Rolling Mean and Std',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis=dict(showgrid=True),
                  yaxis=dict(showgrid=True),
                  showlegend=True,
                  xaxis_rangeslider_visible=True)

fig.show()


In [None]:
print("Maximum Weekly Rolling Std:", bitcoin_data['Weekly Rolling Std'].max())
print("Date of Maximum Weekly Rolling Std:", bitcoin_data['Weekly Rolling Std'].idxmax())

### Seasonal Decomposition

In [None]:
import statsmodels.api as sm

# Perform seasonal decomposition
decomposition = sm.tsa.seasonal_decompose(bitcoin_data['close'], model='additive', period=7*24)
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid


In [None]:
# Create line plots for trend, seasonality, and residuals
plt.figure(figsize=(10, 8))
plt.subplot(4, 1, 1)
plt.plot(bitcoin_data.index, bitcoin_data['close'], label='Original')
plt.title('Original Bitcoin Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.grid(True)

plt.subplot(4, 1, 2)
plt.plot(bitcoin_data.index, trend, label='Trend', color='orange')
plt.title('Trend Component')
plt.xlabel('Date')
plt.ylabel('Trend')
plt.grid(True)

plt.subplot(4, 1, 3)
plt.plot(bitcoin_data.index, seasonal, label='Seasonal', color='green')
plt.title('Seasonal Component')
plt.xlabel('Date')
plt.ylabel('Seasonal')
plt.grid(True)

plt.subplot(4, 1, 4)
plt.plot(bitcoin_data.index, residual, label='Residual', color='red')
plt.title('Residual Component')
plt.xlabel('Date')
plt.ylabel('Residual')
plt.grid(True)

plt.tight_layout()
plt.show()

The seasonal component is overplotted; to see the seasonality we can either change the `period` of the decomposition, or plot a zoomable figure.

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=seasonal, mode='lines', line=dict(color='green'),
                         name='Seasonal'))

fig.update_layout(title='Zoomable Seasonality Plot',
                  xaxis_title='Date',
                  yaxis_title='Value',
                  xaxis_rangeslider_visible=True)

fig.show()

At first glance, the Trend and Residual components look almost identical to the Weekly Rolling Mean and Std.

Let's plot each and compare.

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=trend, mode='lines', line=dict(color='dodgerblue'),
                         name='Trend'))

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', line=dict(color='orange'),
                         name='Weekly Rolling Mean'))

fig.update_layout(title='Trend and Weekly Rolling Mean',
                  xaxis_title='Date',
                  yaxis_title='Value',
                  xaxis=dict(showgrid=True),
                  yaxis=dict(showgrid=True),
                  showlegend=True,
                  xaxis_rangeslider_visible=True)

fig.show()


The trend component represents the underlying long-term movement of the data, while the rolling average is a way to smooth out short-term fluctuations and highlight the general trend.

The trend component and the weekly rolling average are very similar largely by construction: `seasonal_decompose` estimates the trend with a centered moving average whose length matches `period` (here 7*24 hours), while `rolling(window=7*24).mean()` is a trailing average over the same number of hours. The two curves are therefore close to the same series offset by about half a window (3.5 days).

The seasonal decomposition method we used here is the "additive" model, where the observed data is considered as the sum of the trend, seasonal, and residual components. In this model, the trend is a smoothed version of the series, which is why it resembles the rolling average.
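
One reason the two curves look alike is that a centered moving average and a trailing moving average of the same length are the same numbers, offset in time. The sketch below illustrates this with pandas alone on a synthetic random walk. It assumes, as described in the statsmodels documentation, that `seasonal_decompose` obtains its trend from a centered moving-average filter. An odd window is used here so that the centered window is exactly symmetric.

```python
import numpy as np
import pandas as pd

# Synthetic hourly random walk standing in for the closing price
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=1000, freq=pd.Timedelta(hours=1))
close = pd.Series(30_000 + rng.normal(0, 50, size=1000).cumsum(), index=idx)

window = 7 * 24 + 1   # odd length so the centered window is symmetric
half = window // 2

trailing = close.rolling(window).mean()               # label at the window's right edge
centered = close.rolling(window, center=True).mean()  # label at the window's midpoint

# Shifting the trailing mean back by half a window reproduces the centered mean
realigned = trailing.shift(-half)
matches = np.allclose(centered.dropna(), realigned.dropna())
print("centered mean equals shifted trailing mean:", matches)
```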

Since standard deviation can only take non-negative values, we take the absolute value of the residual to make the comparison easier.

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=np.abs(residual), mode='lines', line=dict(color='dodgerblue'),
                         name='Absolute Residual'))

fig.add_trace(go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Std'], mode='lines', line=dict(color='yellow'),
                         name='Weekly Rolling Std'))

fig.update_layout(title='Residual and Weekly Rolling Std',
                  xaxis_title='Date',
                  yaxis_title='Value',
                  xaxis=dict(showgrid=True),
                  yaxis=dict(showgrid=True),
                  showlegend=True,
                  xaxis_rangeslider_visible=True)

fig.show()


The residual component represents the part of the data that cannot be explained by the trend and seasonality. It is essentially the leftover variation in the data after the trend and seasonality have been removed. On the other hand, the rolling standard deviation is a measure of how much the data deviates from its average value over a rolling window of 7 days.

If the seasonal decomposition captures the trend and seasonality well, the residual component should contain mostly random noise and irregular fluctuations. The magnitude of the residual is then comparable to the rolling standard deviation, since both describe how far the data strays from a local average.

Both the residual component and the rolling standard deviation provide insights into the volatility or variability of the data. A close similarity between the two may indicate that the data has relatively stable variability around the trend and seasonality.

## Volatility Analysis

Volatility analysis examines how much variation there is in consecutive price changes over time. Typical steps include:

- Daily Volatility: Calculate the daily price changes and standard deviation to measure daily volatility.

- Rolling Volatility: Compute rolling (moving) standard deviations to observe how volatility changes over time.

- Volatility Clustering: Analyze periods of high and low volatility, known as volatility clustering, using visualizations or clustering techniques.

- Volatility Forecasting: Use GARCH (Generalized Autoregressive Conditional Heteroskedasticity) or EGARCH (Exponential GARCH) models to forecast future volatility.



### Daily Volatility
Daily volatility measures the price changes of an asset on a daily basis. It can be calculated as the absolute daily returns or the standard deviation of daily returns. Using returns to measure volatility is a common practice in financial analysis, and it can provide valuable insights into the volatility of an asset's price movements. Volatility refers to the degree of variation of an asset's price over time, and returns are a key component in calculating and assessing this volatility.

Volatility Clustering refers to the tendency for periods of high volatility to be followed by more periods of high volatility and vice versa. This insight can help us understand how volatility changes over time and potentially predict periods of increased market activity.
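
One simple way to check for volatility clustering is to compare the lag-1 autocorrelation of returns with that of absolute returns: clustering shows up as a clearly positive value for the latter. The sketch below uses a deliberately artificial series (a calm regime followed by a turbulent one) to show the idea. It is not derived from the Bitcoin data.

```python
import numpy as np
import pandas as pd

# Artificial returns: alternating signs, small magnitude first, large magnitude later
signs = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
magnitudes = np.r_[np.full(50, 1.0), np.full(50, 5.0)]
returns = pd.Series(signs * magnitudes)

# The direction of returns flips every step, so raw returns are negatively autocorrelated
raw_autocorr = returns.autocorr(lag=1)

# The size of returns persists within each regime, so |returns| are positively autocorrelated
abs_autocorr = returns.abs().autocorr(lag=1)

print(f"lag-1 autocorrelation of returns:   {raw_autocorr:.2f}")
print(f"lag-1 autocorrelation of |returns|: {abs_autocorr:.2f}")
```

The same two calls could be applied to `daily_returns` once it has been computed.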

By analyzing the relationship between volatility and market trends we see that high volatility coincides with major market events, such as price spikes or crashes. By examining volatility patterns alongside price movements, we can gain insights into how external factors impact the market.

In [None]:
# Resample the data to daily intervals and calculate the daily closing prices
bitcoin_close_daily = bitcoin_data['close'].resample('D').last()

# Calculate the percentage daily returns
daily_returns = bitcoin_close_daily.pct_change() * 100

In [None]:
# Resample the data to weekly intervals and take the last close of each week,
# mirroring the daily calculation above
bitcoin_close_weekly = bitcoin_data['close'].resample('W').last()

# Calculate the percentage weekly returns
weekly_returns = bitcoin_close_weekly.pct_change() * 100

In [None]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=daily_returns, mode='lines', line=dict(color='dodgerblue'), name = 'Daily Returns'))
fig.add_trace(go.Scatter(x=bitcoin_close_weekly.index, y=weekly_returns, mode='lines', line=dict(color='blue'), name = 'Weekly Returns'))

fig.update_layout(title='Bitcoin Daily and Weekly Percentage Returns',
                  xaxis_title='Date',
                  yaxis_title='Daily/Weekly Percentage Returns')

fig.show()

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

# Create the first trace (Daily Returns) using the primary y-axis
fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=daily_returns, mode='lines', line=dict(color='dodgerblue'), name='Daily Returns'))

# Create the second trace (Daily Closing Prices) using the secondary y-axis
fig.add_trace(go.Scatter(x=bitcoin_close_daily.index, y=bitcoin_close_daily, mode='lines', line=dict(color='orange'), name='Daily Closing Prices', yaxis='y2'))

# Set up the layout with two y-axes
fig.update_layout(title='Bitcoin Daily Returns and Closing Prices',
                  xaxis_title='Date',
                  yaxis_title='Daily Returns',
                  yaxis2=dict(title='Daily Closing Prices', overlaying='y', side='right'),
                  legend=dict(x=0, y=1, bgcolor='rgba(255, 255, 255, 0.5)'),
                  )

fig.show()


As can be seen in this plot, spikes and crashes of the market coincide with periods of high volatility.

### Historical Volatility
Historical volatility measures the past price fluctuations of an asset over a specific period. It is typically computed as the standard deviation of the asset's returns. Higher historical volatility implies greater price variability in the past.
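
When a rolling standard deviation is annualized with the square-root-of-time rule, the scaling factor depends on how often the returns are sampled: 252 is the usual choice for daily returns on exchange-traded assets, whereas hourly returns on an asset that trades around the clock would use 24 * 365 periods per year. The sketch below is a minimal illustration on synthetic returns. The helper name `annualized_volatility` and the constant `HOURS_PER_YEAR` are hypothetical.

```python
import numpy as np
import pandas as pd

def annualized_volatility(returns: pd.Series, window: int, periods_per_year: int) -> pd.Series:
    """Rolling standard deviation of returns, scaled to a one-year horizon."""
    return returns.rolling(window).std() * np.sqrt(periods_per_year)

# Synthetic hourly percentage returns for 30 days, for illustration only
rng = np.random.default_rng(42)
hourly_returns = pd.Series(rng.normal(0, 0.5, size=24 * 30))

HOURS_PER_YEAR = 24 * 365  # assumption: trading continues every hour of the year
vol = annualized_volatility(hourly_returns, window=7 * 24, periods_per_year=HOURS_PER_YEAR)
print(vol.dropna().describe())
```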

In [None]:
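# Annualize the hourly volatility: Bitcoin trades around the clock, so a year contains
# 24 * 365 hourly periods (252 applies to daily returns of exchange-traded assets)
historical_volatility = bitcoin_data['Returns'].rolling(window=7*24).std() * ((24 * 365) ** 0.5)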

In [None]:
# Assuming you have already calculated historical volatility as 'historical_volatility'

fig, ax1 = plt.subplots(figsize=(10, 6))

# Plot Bitcoin prices on the primary y-axis (left side)
ax1.plot(bitcoin_data.index, bitcoin_data['close'], label='Bitcoin Price', color='blue')
ax1.set_xlabel('Date')
ax1.set_ylabel('Bitcoin Price', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
ax1.grid(True)

# Create a secondary y-axis for historical volatility (right side)
ax2 = ax1.twinx()
ax2.plot(bitcoin_data.index, historical_volatility, label='Historical Volatility', color='orange')
ax2.set_ylabel('Historical Volatility', color='orange')
ax2.tick_params(axis='y', labelcolor='orange')

# Add a legend that combines both lines from both y-axes
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')

plt.title('Bitcoin Price and Historical Volatility')
plt.grid(True)
plt.tight_layout()
plt.show()


### Volatility Indicators:
There are various volatility indicators that can help identify trends or changes in volatility. Some popular volatility indicators include the Bollinger Bands, Average True Range (ATR), and the Volatility Index (VIX).

#### Bollinger Bands
Bollinger Bands are a popular technical indicator that helps traders and analysts understand price volatility and potential trading signals. They were developed by John Bollinger in the 1980s and consist of three lines plotted on a price chart:

Middle Band: The middle band is a simple moving average (SMA) of the asset's price over a specified period. The most commonly used period is 20 days, but you can adjust it based on your analysis objectives.

Upper Band: The upper band is derived by adding a specified number of standard deviations (usually 2) to the middle band. The standard deviation is a measure of the asset's price volatility. The upper band represents a zone where prices are relatively high.

Lower Band: The lower band is derived by subtracting a specified number of standard deviations (usually 2) from the middle band. The lower band represents a zone where prices are relatively low.

Bollinger Bands can be used for:

- Volatility Assessment: Bollinger Bands provide a visual representation of market volatility. When the bands are wide, it indicates higher volatility, and when they are narrow, it indicates lower volatility.

- Overbought and Oversold Levels: Traders often look for potential buying opportunities when the price touches or crosses the lower band, as it suggests that the asset may be oversold. Similarly, potential selling opportunities are sought when the price touches or crosses the upper band, as it suggests that the asset may be overbought.

- Price Breakouts: Bollinger Bands can help identify potential breakouts. A breakout occurs when the price moves outside the bands. Traders may interpret a breakout as a signal to enter or exit positions.
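
The three bands described above can be sketched as a small helper. This is an illustrative version on a synthetic price series, using the conventional 20-period window and 2 standard deviations. The function name `bollinger_bands` is hypothetical, and pandas' default sample standard deviation is assumed.

```python
import numpy as np
import pandas as pd

def bollinger_bands(price: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Return the middle (SMA), upper, and lower Bollinger Bands for a price series."""
    middle = price.rolling(window).mean()
    std = price.rolling(window).std()
    return pd.DataFrame({
        "middle": middle,
        "upper": middle + num_std * std,
        "lower": middle - num_std * std,
    })

# Synthetic random-walk price for illustration only
rng = np.random.default_rng(1)
price = pd.Series(100 + rng.normal(0, 1, size=200).cumsum())
bands = bollinger_bands(price)

# Flag periods where the price is at or beyond a band
overbought = price >= bands["upper"]
oversold = price <= bands["lower"]
print(bands.tail())
print("overbought periods:", int(overbought.sum()), "oversold periods:", int(oversold.sum()))
```

The first `window - 1` rows are NaN because the rolling window is not yet full, so they are never flagged.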

In [None]:
# For example, using Bollinger Bands to identify volatility bands around the moving average
upper_band = bitcoin_data['Weekly Rolling Mean'] + 2 * bitcoin_data['Weekly Rolling Std']
lower_band = bitcoin_data['Weekly Rolling Mean'] - 2 * bitcoin_data['Weekly Rolling Std']

#### Identifying overbought and potential price correction datetimes

In [None]:
# Identify overbought datetimes using the upper Bollinger Band
overbought_datetimes = bitcoin_data[bitcoin_data.close >= upper_band].index

# Set the number of periods to consider for potential corrections
num_periods_after_overbought = 5

# Initialize an empty list to store potential correction datetimes
correction_datetimes = []

# Look for price reversals after overbought datetimes
for overbought_datetime in overbought_datetimes:
    # Get the index of the overbought datetime in the DataFrame
    overbought_index = bitcoin_data.index.get_loc(overbought_datetime)
    
    # Check if the price falls below the 'Weekly Rolling Mean' within the specified number of periods
    for i in range(1, num_periods_after_overbought + 1):
        next_index = overbought_index + i
        if next_index < len(bitcoin_data):
            if bitcoin_data['close'].iloc[next_index] < bitcoin_data['Weekly Rolling Mean'].iloc[next_index]:
                correction_datetimes.append(bitcoin_data.index[next_index])
                break

In [None]:
# Plot Bitcoin prices
trace_price = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', name='Bitcoin Price', line=dict(color='blue', width=2))

# Plot the Weekly Simple Moving Average (SMA) as the middle band
trace_sma = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['Weekly Rolling Mean'], mode='lines', name='Weekly SMA', line=dict(color='orange', width=2))

trace_upper_band = go.Scatter(x=bitcoin_data.index, y=upper_band, name = 'Upper Band', 
                              fill='tonexty', fillcolor='rgba(128, 128, 128, 0.2)', line=dict(color='aliceblue'))

trace_lower_band = go.Scatter(x=bitcoin_data.index, y=lower_band, name = 'Lower Band', 
                              fill='tonexty', fillcolor='rgba(128, 128, 128, 0.2)', line=dict(color='honeydew'))

# Create a trace for potential correction datetimes
trace_correction_datetimes = go.Scatter(x=correction_datetimes, y=bitcoin_data.loc[correction_datetimes, 'close'], mode='markers', name='Potential Correction Datetimes', marker=dict(color='red', size=10, symbol='circle'))

# Combine all traces into a data list
data = [trace_price, trace_sma, trace_upper_band, trace_lower_band, trace_correction_datetimes]

# Create layout
layout = go.Layout(title='Bitcoin Price with Bollinger Bands and Potential Correction Datetimes',
                   xaxis=dict(title='Date'),
                   yaxis=dict(title=dict(text='Bitcoin Price', font=dict(color='blue')), side='left'),
                   yaxis2=dict(title=dict(text='Weekly Rolling Mean', font=dict(color='orange')), overlaying='y', side='right'),
                   showlegend=True,
                   legend=dict(x=0, y=1, traceorder='normal'))

# Create the figure
fig = go.Figure(data=data, layout=layout)

# Show the plot
fig.show()

#### Uptrend and downtrend

Identifying whether the price is in an uptrend or a downtrend is useful in several ways:

- Trading Signals: Uptrends and downtrends can serve as trading signals for traders and investors. For example, when the price is in an uptrend, it might be a signal to buy or hold the asset, while a downtrend could indicate a potential selling opportunity.

- Risk Management: Understanding the trend direction can help with risk management. Traders might reduce their exposure to the asset during downtrends to minimize potential losses, while increasing exposure during uptrends to take advantage of potential gains.

- Strategy Development: Uptrends and downtrends can be used as components in developing trading strategies. For instance, you could create a trend-following strategy that buys when an uptrend is confirmed and sells or shorts when a downtrend is confirmed.

- Volatility Analysis: Analyzing trends can also provide insights into the market's volatility. Volatile markets may exhibit rapid and frequent changes in trend direction, while less volatile markets might have more stable and sustained trends.

- Market Sentiment: Trends can reflect market sentiment and help gauge the overall bullish or bearish sentiment among traders and investors.

- Pattern Recognition: By identifying trends, you can also look for patterns like higher highs and higher lows in uptrends and lower highs and lower lows in downtrends. Recognizing patterns can provide additional information for making trading decisions.
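
As a rough sketch of how trend labels and changes in trend direction could be computed, the example below compares each price with its rolling mean and counts direct flips between uptrend and downtrend. The helper name `label_trend`, the window of 3, and the toy series are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def label_trend(close: pd.Series, window: int) -> pd.Series:
    """Label each point 1 (above its rolling mean), -1 (below), or 0 (equal or undefined)."""
    rolling_mean = close.rolling(window=window).mean()
    # np.sign gives 1, -1 or 0; points without a full window are NaN and are set to 0
    return np.sign(close - rolling_mean).fillna(0).astype(int)


# Deterministic toy series: rises for 10 steps, then falls for 10 steps
close = pd.Series(
    list(range(100, 110)) + list(range(110, 100, -1)),
    index=pd.date_range("2023-01-01", periods=20, freq="D"),
    dtype=float,
)

trend = label_trend(close, window=3)

# A direct flip between uptrend (1) and downtrend (-1) changes the label by 2
flips = int((trend.diff().abs() == 2).sum())

print(trend.tolist())
print("Direct trend flips:", flips)
```

A series that flips often relative to its length is showing the rapid changes in trend direction described above for volatile markets.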

In [None]:
# Resample the data to weekly intervals and calculate the weekly mean
bitcoin_data_weekly = bitcoin_data.resample('W').mean()

# Initialize a boolean mask for uptrends and downtrends
is_uptrend = bitcoin_data_weekly['close'] > bitcoin_data_weekly['Weekly Rolling Mean']
is_downtrend = bitcoin_data_weekly['close'] < bitcoin_data_weekly['Weekly Rolling Mean']

# Use the boolean masks to label the trends (1 for uptrend, -1 for downtrend, and 0 for neutral)
bitcoin_data_weekly['Trend'] = 0  # Initialize the 'Trend' column with 0 (neutral)
bitcoin_data_weekly.loc[is_uptrend, 'Trend'] = 1
bitcoin_data_weekly.loc[is_downtrend, 'Trend'] = -1

In [None]:
bitcoin_data['3D Rolling Mean'] = bitcoin_data['close'].rolling(window=3*24).mean()
bitcoin_data['3D Rolling Std'] = bitcoin_data['close'].rolling(window=3*24).std()

In [None]:
# Resample the data to 3-day intervals and calculate the 3-day mean
bitcoin_data_3day = bitcoin_data.resample('3D').mean()

# Initialize a boolean mask for uptrends and downtrends
is_uptrend = bitcoin_data_3day['close'] > bitcoin_data_3day['3D Rolling Mean']
is_downtrend = bitcoin_data_3day['close'] < bitcoin_data_3day['3D Rolling Mean']

# Create the Bitcoin price line trace containing all data
price_trace = go.Scatter(x=bitcoin_data.index, y=bitcoin_data['close'], mode='lines', name='Bitcoin Price')

# Create the uptrend and downtrend traces using scatter plots based on 3-day resampled data
uptrend_trace = go.Scatter(x=bitcoin_data_3day.index[is_uptrend], y=bitcoin_data_3day['close'][is_uptrend],
                           mode='markers', name='Uptrend', marker=dict(color='green', symbol='triangle-up'))
downtrend_trace = go.Scatter(x=bitcoin_data_3day.index[is_downtrend], y=bitcoin_data_3day['close'][is_downtrend],
                             mode='markers', name='Downtrend', marker=dict(color='red', symbol='triangle-down'))

# Combine the traces
data = [price_trace, uptrend_trace, downtrend_trace]

# Create the layout
layout = go.Layout(title='Bitcoin Price with Uptrends and Downtrends (3-Day Resampling)',
                   xaxis=dict(title='Date'),
                   yaxis=dict(title='Bitcoin Price (USD)'),
                   showlegend=True,
                   )

# Create the figure and plot
fig = go.Figure(data=data, layout=layout)
fig.show()


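#### Average True Range (ATR)
ATR is a technical indicator used to measure market volatility. It was introduced by J. Welles Wilder Jr. in his book "New Concepts in Technical Trading Systems." ATR is the average of the true range over a specific period, where the true range is the largest of three values: the current high minus the current low, the absolute difference between the current high and the previous close, and the absolute difference between the current low and the previous close. Including the previous close lets the measure account for gaps between consecutive periods.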

ATR can provide valuable insights into the volatility of an asset, helping traders and investors make informed decisions. Higher ATR values indicate higher volatility, while lower values indicate lower volatility.
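
For reference, the true range and its average can be written as follows. The symbols are introduced here for illustration: $H_t$, $L_t$, and $C_t$ denote the high, low, and close of period $t$, and $n$ is the averaging window. The second formula is the simple moving-average variant; Wilder's original formulation instead smooths recursively as $\mathrm{ATR}_t = \bigl((n-1)\,\mathrm{ATR}_{t-1} + \mathrm{TR}_t\bigr)/n$.

$$
\mathrm{TR}_t = \max\bigl(H_t - L_t,\; |H_t - C_{t-1}|,\; |L_t - C_{t-1}|\bigr),
\qquad
\mathrm{ATR}_t = \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{TR}_{t-i}
$$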

In [None]:
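# Calculate Average True Range (ATR) for Bitcoin data
# True range = max(high - low, |high - previous close|, |low - previous close|);
# ATR here is its simple 14-period rolling mean (14 hours on the hourly data)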
high_low_range = bitcoin_data['high'] - bitcoin_data['low']
high_close_range = abs(bitcoin_data['high'] - bitcoin_data['close'].shift())
low_close_range = abs(bitcoin_data['low'] - bitcoin_data['close'].shift())
true_range = pd.DataFrame({'HL Range': high_low_range, 'HC Range': high_close_range, 'LC Range': low_close_range})
ATR = true_range.max(axis=1).rolling(window=14).mean()

In [None]:
# Plot Bitcoin prices with Average True Range (ATR) on a secondary y-axis
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plot Bitcoin prices on the primary y-axis (left side)
ax1.plot(bitcoin_data.index, bitcoin_data['close'], label='Bitcoin Price', color='blue')
ax1.set_xlabel('Date')
ax1.set_ylabel('Bitcoin Price', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
ax1.grid(True)

# Create a secondary y-axis for ATR (right side)
ax2 = ax1.twinx()
ax2.plot(bitcoin_data.index, ATR, label='Average True Range (ATR)', color='purple')
ax2.set_ylabel('Average True Range (ATR)', color='purple')
ax2.tick_params(axis='y', labelcolor='purple')

# Add a legend that combines both lines from both y-axes
lines_1, labels_1 = ax1.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
ax1.legend(lines_1 + lines_2, labels_1 + labels_2, loc='upper left')

plt.title('Bitcoin Price and Average True Range (ATR)')
plt.grid(True)
plt.tight_layout()
plt.show()


## Time-Series Modeling and Forecasting
We'll consider the ARIMA (AutoRegressive Integrated Moving Average) model, SARIMA (Seasonal ARIMA) model, and Facebook Prophet for forecasting. The goal is to train the models on a portion of the data and validate the performance on unseen data.
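
Because the observations are ordered in time, the split into training and validation data should keep that order rather than shuffle. A minimal sketch of such a chronological split is shown below; the helper name `chronological_split` and the toy series are assumptions for illustration.

```python
import pandas as pd


def chronological_split(series: pd.Series, train_fraction: float = 0.8):
    """Split a time-ordered series into train and validation parts without shuffling."""
    split_point = int(len(series) * train_fraction)
    return series.iloc[:split_point], series.iloc[split_point:]


# Toy daily series, used only for illustration
series = pd.Series(range(10), index=pd.date_range("2023-01-01", periods=10, freq="D"))
train_part, validation_part = chronological_split(series, train_fraction=0.8)

print(len(train_part), len(validation_part))                 # 8 2
print(train_part.index.max() < validation_part.index.min())  # True
```

Every validation timestamp lies after the last training timestamp, so the model is never evaluated on data that precedes what it was trained on.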

### ARIMA (AutoRegressive Integrated Moving Average) Model

- ACF: The AutoCorrelation Function measures the correlation between a time series and its lagged values. It helps to identify the level of autocorrelation in a time series at different lagged time points. The ACF plot shows the correlation coefficient at various lags. An ACF that decays very slowly suggests a non-stationary time series, whereas the ACF of a stationary series drops toward zero quickly.

- PACF: The Partial AutoCorrelation Function measures the correlation between a time series and its lagged values, removing the effect of intermediate lags. It helps to identify the direct relationship between a time series and its lagged values. The PACF plot helps determine the order of the AR (AutoRegressive) component in an ARIMA model.

Using ACF and PACF plots, we can identify the order of the ARIMA model (p, d, q) as follows:

- AR(p): The order of the AutoRegressive component (p) can be determined by looking at the PACF plot. The PACF plot will show significant spikes at lag points that indicate the direct relationship between the time series and its lagged values. The order of the AR component is usually the highest lag value with a significant spike before it starts to drop off.

- I(d): The order of Integration (d) represents the number of differencing operations required to make the time series stationary. The ACF plot gives a first indication: if the ACF drops to near zero within a few lags, the series is likely already stationary and d = 0. If the ACF decays slowly, the series is likely non-stationary, and d is the minimum number of differencing operations required to make it stationary.

- MA(q): The order of the Moving Average component (q) can be determined by looking at the ACF plot. For a pure MA(q) process, the ACF shows significant spikes up to lag q and then cuts off, because each value depends only on the last q forecast errors. The order of the MA component is usually the last lag with a significant spike before the ACF drops off.

By analyzing the ACF and PACF plots, we can determine the appropriate values for p, d, and q, which form the order of the ARIMA model to best capture the underlying patterns and autocorrelation in the time series data.
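
To make the link between ACF decay and stationarity concrete, the sketch below computes the sample ACF by hand for a synthetic random walk and for its first difference. The helper `sample_acf` and the toy data are assumptions for illustration; the notebook itself relies on `plot_acf` from statsmodels.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


rng = np.random.default_rng(42)
random_walk = rng.normal(size=2000).cumsum()   # non-stationary toy series
differenced = np.diff(random_walk)             # first difference (d = 1)

acf_level = sample_acf(random_walk, max_lag=5)
acf_diff = sample_acf(differenced, max_lag=5)

print("ACF of the level series:      ", np.round(acf_level, 3))
print("ACF of the differenced series:", np.round(acf_diff, 3))
```

The level series should stay close to 1 at every lag, while the differenced series should be near 0 from lag 1 onward, which is the pattern that points to d = 1.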

#### ACF and PACF Plots

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [None]:
# Plot the ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Plot the ACF plot
plot_acf(bitcoin_close_daily, lags=50, ax=ax1)

# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_close_daily, lags=50, ax=ax2, method='ywm')

plt.show()


In [None]:
# Plot the ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Plot the ACF plot
plot_acf(bitcoin_data_weekly['close'], lags=50, ax=ax1)

# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_data_weekly['close'], lags=50, ax=ax2, method='ywm')

plt.show()


As can be seen in the plots, the ACF never reaches 0 within 50 lags when we resample the hourly data to daily, but reaches 0 and becomes negative when we resample to weekly. Much of this difference comes from the time span covered: 50 daily lags span about seven weeks, whereas 50 weekly lags span almost a year, roughly half of the two-year sample.

Over a few weeks, daily Bitcoin prices tend to remain close to their recent levels, so the daily series is strongly autocorrelated at every plotted lag. This slow decay is typical of a trending, non-stationary series and does not by itself indicate seasonality.

On the other hand, when we resample the data to weekly intervals, we aggregate the hourly data over each week, which smooths out short-term fluctuations. At lags of many months the price level has often changed substantially, so the ACF can reach 0 and become negative. This may reflect longer-term swings or periodic patterns in the price, although with only about two years of data such patterns should be interpreted with caution.

This behavior highlights the importance of understanding the inherent patterns and seasonality in the time series data before conducting ACF analysis. The choice of the resampling frequency can have a significant impact on the ACF results and the insights derived from them.

We are interested in capturing and analyzing seasonality, so the ACF and PACF plots of the resampled weekly data could provide valuable insights into the underlying periodic patterns and autocorrelation structure.

In [None]:
# Calculate first-order differencing (removes the trend so the series is closer to stationary)
bitcoin_data['Differenced'] = bitcoin_data['close'].diff()

In [None]:
# Plot the ACF and PACF plots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))

# Plot the ACF plot (drop the NaN that differencing leaves in the first row)
plot_acf(bitcoin_data['Differenced'].dropna(), lags=50, ax=ax1)

# Plot the PACF plot using the 'ywm' method
plot_pacf(bitcoin_data['Differenced'].dropna(), lags=50, ax=ax2, method='ywm')

plt.show()


1. Autoregressive (AR) Component (p):

- Look at the PACF plot and find the last significant spike before it drops to zero. The lag at which this spike occurs can give you an idea of the order of the AR component (p).
- If the PACF plot shows a significant spike at lag 1 and cuts off afterward, while the ACF decays gradually, you might consider an AR(p) model with p=1.

2. Moving Average (MA) Component (q):

- Examine the ACF plot and find the last significant spike before it drops to zero. The lag at which this spike occurs can provide an indication of the order of the MA component (q).

- If the ACF plot shows a significant spike at lag 1 and cuts off afterward, while the PACF decays gradually, you might consider an MA(q) model with q=1.

3. Differencing (d):

- Look for the number of times you need to difference the data to make it stationary. This is the value of d.

- If the differenced data shows a fairly stable mean and variance over time, d=1 may be sufficient. However, if it is still non-stationary, you may need to try d=2 or higher.

In [None]:
from sklearn.model_selection import train_test_split
from statsmodels.tsa.arima.model import ARIMA

In [None]:
bitcoin_daily = bitcoin_data['close'].asfreq('D')

# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False, random_state=42)

# Fit the ARIMA model
p, d, q = 28, 1, 1
# p, d, q = 29, 0, 1
arima_model = ARIMA(train, order=(p, d, q))
results_arima = arima_model.fit()

# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1
predictions_arima = results_arima.predict(start=start_index, end=end_index, dynamic=False)

# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_arima.index, predictions_arima, label='Predicted Prices', color='red')
plt.title('ARIMA Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()


In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae_arima = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima)
rmse_arima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima))
print("ARIMA MAE:", mae_arima)
print("ARIMA RMSE:", rmse_arima)

In [None]:
import warnings

# Suppress all warnings
warnings.filterwarnings('ignore')

In [None]:
bitcoin_daily = bitcoin_data['close'].asfreq('D')

# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False)

# Define the parameter grid for p, d, and q
param_grid = {
    'p': [27, 28, 29],
    'd': [0, 1, 2],
    'q': [0, 1, 2],
}

best_mae = float('inf')
best_params = None

# Iterate through the parameter grid
for p in param_grid['p']:
    for d in param_grid['d']:
        for q in param_grid['q']:
            try:
                # Fit the ARIMA model
                arima_model = ARIMA(train, order=(p, d, q))
                results = arima_model.fit()
                
                # Make predictions on the validation set
                start_index = len(train)
                end_index = len(bitcoin_daily) - 1
                predictions = results.predict(start=start_index, end=end_index, dynamic=False)
                
                # Calculate MAE
                mae = mean_absolute_error(test, predictions)
                
                # Check if this combination of parameters gives a better MAE
                if mae < best_mae:
                    best_mae = mae
                    best_params = (p, d, q)
                
            except Exception:
                # Skip parameter combinations that fail to fit
                continue

print("Best MAE:", best_mae)
print("Best Parameters (p, d, q):", best_params)


In [None]:
# Fit the ARIMA model with the best parameters
best_arima_model = ARIMA(train, order=best_params)
best_results = best_arima_model.fit()

# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1
predictions_arima_tuned = best_results.predict(start=start_index, end=end_index, dynamic=False)

mae_arima_tuned = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima_tuned)
rmse_arima_tuned = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima_tuned))
print("ARIMA MAE Tuned:", mae_arima_tuned)
print("ARIMA RMSE Tuned:", rmse_arima_tuned)

In [None]:
# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_arima_tuned.index, predictions_arima_tuned, label='Predicted Prices', color='red')
plt.title('ARIMA Model With Hyperparameter Tuning: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()

### SARIMA (Seasonal ARIMA) Model
SARIMA (Seasonal Autoregressive Integrated Moving Average) is an extension of the ARIMA (Autoregressive Integrated Moving Average) model that incorporates seasonality. It is a powerful time-series forecasting method that can handle data with both trend and seasonality.

Seasonal Component: SARIMA introduces a seasonal component that captures repeating patterns in the data at fixed intervals. This is suitable for data with seasonality, such as monthly, quarterly, or yearly patterns.

Autoregressive (AR) Component: The autoregressive component captures the relationship between the current value of the series and its past values. It involves regressing the series against its own lagged values.

Integrated (I) Component: The integrated component refers to differencing the series to make it stationary. Stationarity is important for time-series models as it helps stabilize the mean and variance over time.

Moving Average (MA) Component: The moving average component models the relationship between the current value of the series and its past forecast errors (lags of the error term).

Parameters of SARIMA:

The SARIMA model is defined by two sets of parameters:

p, d, q: These parameters correspond to the autoregressive order (p), differencing order (d), and moving average order (q) of the non-seasonal part of the model.

P, D, Q, s: These parameters correspond to the autoregressive order (P), differencing order (D), moving average order (Q), and the length of the seasonal period (s) for the seasonal part of the model.
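
To illustrate what the seasonal differencing order D and the period s do, the sketch below applies seasonal differencing with s = 7 to a toy daily series that has a linear trend and a repeating weekly pattern. The series and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = 7  # assumed seasonal period: one week of daily observations

# Toy daily series: linear trend (slope 2 per day) plus a repeating weekly pattern
index = pd.date_range("2023-01-02", periods=28, freq="D")
weekly_pattern = np.tile([0, 5, 3, -2, 4, -6, -4], 4)
series = pd.Series(2.0 * np.arange(28) + weekly_pattern, index=index)

# Seasonal differencing (D = 1): subtract the value observed s steps earlier
seasonal_diff = series.diff(s).dropna()

# Non-seasonal differencing on top (d = 1) removes what is left of the trend
both_diff = seasonal_diff.diff().dropna()

print(seasonal_diff.unique())  # [14.]
print(both_diff.unique())      # [0.]
```

Seasonal differencing cancels the weekly pattern and leaves only the trend's contribution over seven days (2 × 7 = 14); the additional first difference then removes that constant.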


In [None]:
bitcoin_daily = bitcoin_data['close'].asfreq('D')

# Split the data into training and validation sets using train_test_split
train_size = 0.8
train, test = train_test_split(bitcoin_daily, train_size=train_size, shuffle=False, random_state=42)

# Define the order of the SARIMA model (p, d, q, P, D, Q, s)
order = best_params  # Non-seasonal components (p, d, q)
seasonal_order = (0, 1, 1, 7)  # Seasonal components (P, D, Q, s)

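# Fit the SARIMA model
# Note: the model is fit on the full series (bitcoin_daily), not only on train, so the
# validation-period predictions below are in-sample one-step-ahead predictions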
sarima_model = sm.tsa.SARIMAX(bitcoin_daily, order=order, seasonal_order=seasonal_order)
sarima_fit = sarima_model.fit()

# Make predictions on the validation set
start_index = len(train)
end_index = len(bitcoin_daily) - 1

predictions_sarima = sarima_fit.predict(start=start_index, end=end_index, dynamic=False)

# Plot the actual vs. predicted Bitcoin prices
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_sarima.index, predictions_sarima, label='Predicted Prices', color='red')
plt.title('SARIMA Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()

Checking for overfitting by making predictions on training data and calculating the MAE.

In [None]:
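# In-sample one-step-ahead predictions for the training period (position 0 is skipped)
tp = sarima_fit.predict(start=1, end=len(train)-1, dynamic=False)

# Compare against the matching observations, train[1:], so both cover positions 1..len(train)-1
train_mae = mean_absolute_error(train[1:], tp)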

print("MAE on training data:", train_mae)

In [None]:
print("MAE on test data:", mean_absolute_error(bitcoin_daily[start_index:], predictions_sarima))

The MAE on the training data is not unusually small compared with the MAE on the test data, so there is no clear sign of overfitting.

In [None]:
# Set the number of periods you want to forecast
forecast_periods = len(test)  # Or the number of periods you want to forecast

# Create a DataFrame with future dates for forecasting
last_date = bitcoin_daily.index[-1]
future_dates = pd.date_range(start=last_date, periods=forecast_periods + 1, freq='D')
future_dates = future_dates[1:]  # Exclude the last observed date, which is already in the data

# Generate forecasts for future dates
forecast_steps = len(future_dates)  # Number of steps to forecast
forecast = sarima_fit.forecast(steps=forecast_steps)

plt.figure(figsize=(12, 6))
plt.plot(bitcoin_daily.index, bitcoin_daily, label='Actual Prices', color='blue')
plt.plot(predictions_sarima.index, predictions_sarima, label='Predictions', color='red')
plt.plot(forecast.index, forecast, label='Forecasts', color='green')
plt.title('SARIMA Model: Actual, Predictions, and Forecasts')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()

### Facebook's Prophet
Prophet is a time series forecasting model developed by Facebook's Core Data Science team. It is designed to handle time series data with strong seasonal patterns, multiple seasonalities, and holiday effects. The model decomposes the time series into several components, including trend, seasonality, and holiday effects, and fits them together as an additive model whose components are summed to produce the forecast.
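
In its default additive form, this decomposition can be written as follows. The symbols follow the usual presentation of the model and are introduced here for illustration: $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the holiday effects, and $\varepsilon_t$ the error term.

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$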

In [None]:
from prophet import Prophet

bitcoin_prophet = bitcoin_data[['close']].reset_index()
bitcoin_prophet.rename(columns={'time': 'ds', 'close': 'y'}, inplace=True)

train_size = 0.8
train_prophet, test_prophet = train_test_split(bitcoin_prophet, train_size=train_size, shuffle=False, random_state=42)

model = Prophet()
model.fit(train_prophet)

# Set the number of periods you want to forecast 
forecast_periods = 6*30*24

# Create a DataFrame with future dates for forecasting
future_dates = model.make_future_dataframe(periods=forecast_periods, freq='H')

# Make predictions for the future dates
forecast = model.predict(future_dates)

fig2 = model.plot_components(forecast)

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(bitcoin_prophet['ds'], bitcoin_prophet['y'], label='Actual Prices', color='blue')
plt.plot(forecast['ds'], forecast['yhat'], label='Predicted Prices', color='red')
plt.title('Prophet Model: Actual vs. Predicted Bitcoin Prices')
plt.xlabel('Date')
plt.ylabel('Bitcoin Price (USD)')
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Extract the predictions for the train set
train_predictions = forecast[forecast['ds'].isin(train_prophet['ds'])]

# Calculate the mean absolute error (MAE)
mae_train = mean_absolute_error(train_prophet['y'], train_predictions['yhat'])

# Calculate the root mean squared error (RMSE)
rmse_train = np.sqrt(mean_squared_error(train_prophet['y'], train_predictions['yhat']))

print("Mean Absolute Error (MAE):", mae_train)
print("Root Mean Squared Error (RMSE):", rmse_train)

## Model Performance Comparisons

In [None]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

start_index = len(train)
end_index = len(bitcoin_daily) - 1

mae_arima = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima)
rmse_arima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima))
print("ARIMA MAE:", mae_arima)
print("ARIMA RMSE:", rmse_arima)
                     
mae_arima_tuned = mean_absolute_error(bitcoin_daily[start_index:], predictions_arima_tuned)
rmse_arima_tuned = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_arima_tuned))
print("ARIMA MAE Tuned:", mae_arima_tuned)
print("ARIMA RMSE Tuned:", rmse_arima_tuned)
                     
mae_sarima = mean_absolute_error(bitcoin_daily[start_index:], predictions_sarima)
rmse_sarima = np.sqrt(mean_squared_error(bitcoin_daily[start_index:], predictions_sarima))
print("SARIMA MAE:", mae_sarima)
print("SARIMA RMSE:", rmse_sarima)

predictions_prophet = forecast[forecast['ds'].isin(test_prophet['ds'])]
mae_prophet = mean_absolute_error(test_prophet['y'], predictions_prophet['yhat'])
rmse_prophet = np.sqrt(mean_squared_error(test_prophet['y'], predictions_prophet['yhat']))

print("Prophet MAE:", mae_prophet)
print("Prophet RMSE:", rmse_prophet)