# Bitcoin Price Prediction Using Deep Learning Techniques

## Introduction

This notebook presents a comprehensive analysis of Bitcoin price prediction using various deep learning techniques. The project explores different approaches to financial time series forecasting, comparing traditional methods with advanced deep learning architectures.

### Project Objectives

1. Analyze and characterize Bitcoin price time series data
2. Implement and compare multiple data transformation techniques
3. Develop and evaluate deep learning models (MLPs and CNNs) for price prediction
4. Assess the impact of stationarity and fractional differencing on prediction performance
5. Explore image-based representations (GAF) for time series forecasting

### Background and Significance

Financial time series prediction remains one of the most challenging problems in quantitative finance. Traditional statistical methods often struggle with the non-linear, non-stationary nature of financial data. Deep learning approaches have shown promising results in recent years, offering new ways to capture complex patterns in price movements.

Bitcoin, as the leading cryptocurrency, presents a particularly interesting case study due to its high volatility, relatively short history, and the influence of various market factors. This project aims to contribute to the growing body of research on applying deep learning to cryptocurrency price prediction.

## Environment Setup

First, we'll import the necessary libraries and configure the environment for our analysis.

In [None]:
# Import Libraries
import yfinance as yf
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import scipy.stats as stats
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import shapiro

# Deep Learning Libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.callbacks import EarlyStopping
import keras_tuner as kt

# Image Transformation for Time Series
from pyts.image import GramianAngularField

# Visualization Settings
import matplotlib as mpl
mpl.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['axes.grid'] = True
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12

# Suppress Warnings
import warnings
warnings.filterwarnings("ignore")

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configuration Parameters
PERFORM_HPO = False  # Set to True to perform hyperparameter optimization
WINDOW_SIZE = 30     # Number of past days to use for prediction
TRAIN_SPLIT = 0.8    # Proportion of data to use for training

print("Environment setup complete.")

## Data Collection

We'll collect Bitcoin price data with the `yfinance` library, which retrieves historical quotes from Yahoo Finance. This gives us daily open, high, low, close, and volume data for our analysis and modeling.

In [None]:
def fetch_crypto_data(ticker, start_date, end_date):
    """
    Fetch cryptocurrency data from Yahoo Finance.
    
    Parameters:
    -----------
    ticker : str
        The ticker symbol for the cryptocurrency (e.g., 'BTC-USD')
    start_date : str
        Start date in 'YYYY-MM-DD' format
    end_date : str
        End date in 'YYYY-MM-DD' format
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame containing the historical price data
    """
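    try:
        data = yf.download(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(f"Error fetching data: {e}")
        return None

    # yf.download usually signals failure with an empty frame rather than
    # an exception, so check for that explicitly.
    if data is None or data.empty:
        print(f"No data returned for {ticker} from {start_date} to {end_date}")
        return None

    # Some yfinance versions return MultiIndex columns (field, ticker) even
    # for a single ticker; flatten them so data['Close'] is a Series.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)

    print(f"Successfully downloaded {ticker} data from {start_date} to {end_date}")
    return data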

# Define the time period (yfinance treats the end date as exclusive)
start_date = "2020-01-01"
end_date = "2023-12-31"

# Download Bitcoin data
btc_data = fetch_crypto_data("BTC-USD", start_date, end_date)

# Display the first few rows of the data
if btc_data is not None:
    print("\nData Preview:")
    display(btc_data.head())
    
    print("\nData Information:")
    btc_data.info()  # info() prints its report directly and returns None
    
    print(f"\nTotal number of observations: {len(btc_data)}")

## Initial Data Exploration

Let's explore the Bitcoin price data to understand its characteristics and patterns.

In [None]:
def plot_price_history(data, title="Bitcoin Price History"):
    """
    Plot the price history of the asset.
    
    Parameters:
    -----------
    data : pandas.DataFrame
        DataFrame containing the price data
    title : str
        Title for the plot
    """
    plt.figure(figsize=(14, 7))
    
    # Plot closing price
    plt.plot(data.index, data['Close'], label='Close Price', color='#1f77b4', linewidth=2)
    
    # Add volume as area plot on secondary axis
    ax2 = plt.gca().twinx()
    ax2.fill_between(data.index, 0, data['Volume'], alpha=0.2, color='gray', label='Volume')
    ax2.set_ylabel('Volume', fontsize=14)
    ax2.tick_params(axis='y', labelsize=12)
    
    # Formatting
    plt.grid(True, alpha=0.3)
    plt.title(title, fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Date', fontsize=14)
    plt.ylabel('Price (USD)', fontsize=14)
    
    # Add legend
    lines, labels = plt.gca().get_legend_handles_labels()
    lines2, labels2 = ax2.get_legend_handles_labels()
    plt.legend(lines + lines2, labels + labels2, loc='upper left')
    
    plt.tight_layout()
    plt.show()

def calculate_summary_statistics(data):
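    """
    Calculate summary statistics for prices and daily returns.

    Note: this adds a 'Daily_Return' column (in percent) to `data` in place;
    plot_distributions relies on that column.

    Parameters:
    -----------
    data : pandas.DataFrame
        DataFrame containing the price data with a 'Close' column

    Returns:
    --------
    pandas.DataFrame
        Table of formatted summary statistics
    """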
    # Calculate daily returns
    data['Daily_Return'] = data['Close'].pct_change() * 100
    
    # Summary statistics
    stats_df = pd.DataFrame({
        'Statistic': ['Mean', 'Median', 'Standard Deviation', 'Minimum', 'Maximum', 
                      'Skewness', 'Excess Kurtosis', 'First Quartile', 'Third Quartile'],
        'Price (USD)': [data['Close'].mean(), data['Close'].median(), data['Close'].std(),
                        data['Close'].min(), data['Close'].max(), data['Close'].skew(),
                        data['Close'].kurtosis(), data['Close'].quantile(0.25), data['Close'].quantile(0.75)],
        'Daily Returns (%)': [data['Daily_Return'].mean(), data['Daily_Return'].median(), data['Daily_Return'].std(),
                            data['Daily_Return'].min(), data['Daily_Return'].max(), data['Daily_Return'].skew(),
                            data['Daily_Return'].kurtosis(), data['Daily_Return'].quantile(0.25), data['Daily_Return'].quantile(0.75)]
    })
    
    # Format the numbers
    stats_df['Price (USD)'] = stats_df['Price (USD)'].map(lambda x: f"{x:,.2f}")
    stats_df['Daily Returns (%)'] = stats_df['Daily Returns (%)'].map(lambda x: f"{x:.4f}")
    
    return stats_df

# Plot the price history
if btc_data is not None:
    plot_price_history(btc_data)
    
    # Calculate and display summary statistics
    stats_df = calculate_summary_statistics(btc_data)
    display(stats_df)

## Distribution Analysis

Let's analyze the distribution of Bitcoin prices and returns to better understand the data characteristics.
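Skewness and kurtosis are the two shape statistics that matter most here. The sketch below is a minimal illustration on synthetic data, not on the Bitcoin series: it computes both by hand and compares a normal sample with a heavy-tailed one. The helper name `skew_and_excess_kurtosis` and the Student-t stand-in are assumptions made for this example. Note that pandas' `kurtosis()` also reports excess kurtosis (a normal distribution scores about 0), although it applies a bias correction that this simple moment-based version omits.

```python
import numpy as np

def skew_and_excess_kurtosis(x):
    """Moment-based (biased) sample skewness and excess kurtosis."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # standardize with the population std (ddof=0)
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=100_000)
heavy_sample = rng.standard_t(df=5, size=100_000)  # heavy-tailed stand-in for returns

_, k_normal = skew_and_excess_kurtosis(normal_sample)
_, k_heavy = skew_and_excess_kurtosis(heavy_sample)
print(f"excess kurtosis, normal sample:    {k_normal:.3f}")
print(f"excess kurtosis, Student-t sample: {k_heavy:.3f}")
```

An excess kurtosis well above zero means extreme moves occur more often than a normal distribution would predict, which is the pattern to look for in the returns histogram and Q-Q plot.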

In [None]:
def plot_distributions(data):
    """
    Plot the distributions of prices and returns.
    
    Parameters:
    -----------
    data : pandas.DataFrame
        DataFrame containing the price data with 'Close' and 'Daily_Return' columns
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Price distribution
    sns.histplot(data['Close'], kde=True, ax=axes[0, 0], color='#1f77b4')
    axes[0, 0].set_title('Distribution of Bitcoin Prices', fontsize=14, fontweight='bold')
    axes[0, 0].set_xlabel('Price (USD)', fontsize=12)
    axes[0, 0].set_ylabel('Frequency', fontsize=12)
    
    # Price Q-Q plot
    stats.probplot(data['Close'], dist="norm", plot=axes[0, 1])
    axes[0, 1].set_title('Q-Q Plot of Bitcoin Prices', fontsize=14, fontweight='bold')
    
    # Returns distribution
    sns.histplot(data['Daily_Return'].dropna(), kde=True, ax=axes[1, 0], color='#ff7f0e')
    axes[1, 0].set_title('Distribution of Daily Returns', fontsize=14, fontweight='bold')
    axes[1, 0].set_xlabel('Daily Return (%)', fontsize=12)
    axes[1, 0].set_ylabel('Frequency', fontsize=12)
    
    # Returns Q-Q plot
    stats.probplot(data['Daily_Return'].dropna(), dist="norm", plot=axes[1, 1])
    axes[1, 1].set_title('Q-Q Plot of Daily Returns', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    # Perform normality tests
    price_shapiro = shapiro(data['Close'])
    returns_shapiro = shapiro(data['Daily_Return'].dropna())
    
    print("Normality Tests (Shapiro-Wilk):")
    print(f"Price: W={price_shapiro[0]:.4f}, p-value={price_shapiro[1]:.8f}")
    print(f"Daily Returns: W={returns_shapiro[0]:.4f}, p-value={returns_shapiro[1]:.8f}")
    print("\nInterpretation:")
    print("- p-value < 0.05: reject normality; the data deviate significantly from a normal distribution.")
    print("- p-value >= 0.05: normality cannot be rejected at the 5% level (this does not prove the data are normal).")

# Plot distributions
if btc_data is not None:
    plot_distributions(btc_data)

## Time Series Stationarity Analysis

Stationarity is a crucial property for time series analysis. A (weakly) stationary series has a mean, variance, and autocorrelation structure that do not change over time, so patterns learned from the past are more likely to hold in the future. Let's analyze the stationarity of our Bitcoin price data.
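To make the definition concrete, here is a small simulation sketch on synthetic data only, using NumPy. It contrasts white noise, which is stationary, with a random walk, which has a unit root. The function name `variance_ratio` and the chosen time steps are assumptions for illustration. Across many simulated paths, the variance of white noise stays flat over time, while the variance of a random walk grows in proportion to the number of steps taken.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 400

noise = rng.normal(size=(n_paths, n_steps))  # white noise: stationary
walks = noise.cumsum(axis=1)                 # random walk: non-stationary (unit root)

def variance_ratio(paths, early=99, late=399):
    """Variance across paths at a late time step divided by an early one."""
    return float(paths[:, late].var() / paths[:, early].var())

print(f"white noise variance ratio: {variance_ratio(noise):.2f}")
print(f"random walk variance ratio: {variance_ratio(walks):.2f}")
```

In theory the first ratio is 1 and the second is 4 (400 steps versus 100), so the simulated values should land close to those. Price levels tend to behave more like the random walk, which is why the test below is run before any modeling.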

In [None]:
def test_stationarity(time_series, title="", max_lags=40):
    """
    Test and visualize the stationarity of a time series.
    
    Parameters:
    -----------
    time_series : pandas.Series
        The time series to test
    title : str
        Title for the plots
    max_lags : int
        Maximum number of lags for ACF and PACF plots
    """
    # Create figure with subplots
    fig, axes = plt.subplots(3, 1, figsize=(14, 15))
    
    # Plot the time series
    axes[0].plot(time_series.index, time_series.values, color='#1f77b4', linewidth=1.5)
    axes[0].set_title(f"{title} Time Series", fontsize=14, fontweight='bold')
    axes[0].set_xlabel('Date', fontsize=12)
    axes[0].set_ylabel('Value', fontsize=12)
    axes[0].grid(True, alpha=0.3)
    
    # Plot ACF
    plot_acf(time_series.dropna(), lags=max_lags, ax=axes[1], alpha=0.05)
    axes[1].set_title("Autocorrelation Function (ACF)", fontsize=14, fontweight='bold')
    axes[1].grid(True, alpha=0.3)
    
    # Plot PACF
    plot_pacf(time_series.dropna(), lags=max_lags, ax=axes[2], alpha=0.05)
    axes[2].set_title("Partial Autocorrelation Function (PACF)", fontsize=14, fontweight='bold')
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Perform Augmented Dickey-Fuller test
    adf_result = adfuller(time_series.dropna())
    
    print(f"Augmented Dickey-Fuller Test Results for {title}:")
    print(f"ADF Statistic: {adf_result[0]:.4f}")
    print(f"p-value: {adf_result[1]:.8f}")
    print("Critical Values:")
    for key, value in adf_result[4].items():
        print(f"\t{key}: {value:.4f}")
    
    print("\nInterpretation:")
    if adf_result[1] <= 0.05:
        print("- The unit-root null hypothesis is rejected at the 5% level; the series appears stationary.")
    else:
        print("- The unit-root null hypothesis cannot be rejected at the 5% level; the series is treated as non-stationary.")
    print("- A stationary time series has constant statistical properties over time.")
    print("- Non-stationary data often requires transformation before modeling.")

# Test stationarity of the closing price
if btc_data is not None:
    test_stationarity(btc_data['Close'], title="Bitcoin Price")

## Data Transformation: Log Returns

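Since the raw price series is likely non-stationary, we'll transform it into log returns, the difference between consecutive log prices. This is a common approach in financial time series analysis because log returns add up across time and typically fluctuate around a stable mean.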
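As a quick check of what this transformation does, the following sketch uses a handful of made-up prices (not real Bitcoin data) to show two properties: the log return equals the log of one plus the simple return, and log returns sum to the log of the total price ratio.

```python
import numpy as np

prices = np.array([100.0, 110.0, 99.0, 105.0])  # made-up prices, illustration only

log_returns = np.diff(np.log(prices))            # r_t = ln(P_t) - ln(P_{t-1})
simple_returns = prices[1:] / prices[:-1] - 1.0  # R_t = P_t / P_{t-1} - 1

# Log returns are additive: their sum is the log of the overall price ratio.
total_from_logs = np.exp(log_returns.sum()) - 1.0
total_direct = prices[-1] / prices[0] - 1.0

print(f"total return from summed log returns: {total_from_logs:.4f}")
print(f"total return computed directly:       {total_direct:.4f}")
```

For small moves the two return definitions are nearly identical; they diverge for large daily swings, which are common in Bitcoin.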

In [None]:
def calculate_log_returns(data):
    """
    Calculate log returns from price data.
    
    Parameters:
    -----------
    data : pandas.DataFrame
        DataFrame containing the price data with a 'Close' column
        
    Returns:
    --------
    pandas.Series
        Series containing the log returns
    """
    log_returns = np.log(data['Close'] / data['Close'].shift(1))
    return log_returns

# Calculate log returns
if btc_data is not None:
    btc_data['Log_Returns'] = calculate_log_returns(btc_data)
    
    # Plot log returns
    plt.figure(figsize=(14, 7))
    plt.plot(btc_data.index, btc_data['Log_Returns'], color='#ff7f0e', linewidth=1)
    plt.title('Bitcoin Log Returns', fontsize=16, fontweight='bold', pad=20)
    plt.xlabel('Date', fontsize=14)
    plt.ylabel('Log Returns', fontsize=14)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    # Test stationarity of log returns
    test_stationarity(btc_data['Log_Returns'].dropna(), title="Log Returns")

## Summary of Initial Analysis

In this first part of our analysis, we've:

1. Set up our environment with all necessary libraries
2. Collected Bitcoin price data from Yahoo Finance
3. Explored the basic characteristics of the price data
4. Analyzed the distribution of prices and returns
5. Tested the stationarity of the raw price series
6. Transformed the data using log returns and tested its stationarity

In the next parts, we'll:
- Implement fractional differencing for improved stationarity while preserving long-term memory
- Prepare data for deep learning models
- Build and train MLP models on both raw and transformed data
- Create GAF image representations and build CNN models
- Evaluate and compare the performance of different approaches
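
The data preparation step listed above could take roughly the following shape. This is a sketch under stated assumptions rather than the code used in the later parts: `make_windows` is a hypothetical helper, the input is a placeholder array instead of the real return series, and the values 30 and 0.8 mirror `WINDOW_SIZE` and `TRAIN_SPLIT` from the environment setup. The split is chronological, with no shuffling, so the test set always lies after the training set in time.

```python
import numpy as np

def make_windows(series, window_size):
    """Hypothetical helper: sliding windows (X) and next-step targets (y)."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window_size
    X = np.stack([series[i:i + window_size] for i in range(n_samples)])
    y = series[window_size:]  # the value right after each window
    return X, y

series = np.arange(100, dtype=float)         # placeholder for a return series
X, y = make_windows(series, window_size=30)  # mirrors WINDOW_SIZE

split = int(len(X) * 0.8)                    # mirrors TRAIN_SPLIT
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print("train:", X_train.shape, y_train.shape)
print("test: ", X_test.shape, y_test.shape)
```

Any scaling should be fitted on the training portion only and then applied to the test portion, otherwise information from the future leaks into training.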