# Bitcoin Historical Data Visualization and Analysis with Plotly and Dash

## Project Overview
This notebook aims to explore and visualize Bitcoin historical data to uncover trends and insights in Bitcoin's price movements and trading volume over time. We will use the Bitcoin data available from the [Kaggle Bitcoin Historical Data](https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data) dataset, and leverage **Plotly** and **Dash** for creating interactive visualizations and dashboards.

### Note on File Size and Rendering
Due to its file size, this notebook may not render fully on some platforms (GitHub *cough*). To view the graphs and interact with the data, please download it and run it locally in Jupyter for full functionality with Dash and Plotly.

### Dataset Columns:
- **Timestamp**: The time of the data point (Unix timestamp).
- **Open**: Opening price of Bitcoin for the given time period.
- **High**: The highest price of Bitcoin during the given time period.
- **Low**: The lowest price of Bitcoin during the given time period.
- **Close**: Closing price of Bitcoin for the given time period.
- **Volume**: The trading volume of Bitcoin in the given time period.

## Goals of the Analysis

### 1. **Data Preprocessing, Cleaning, and Exploratory Data Analysis (EDA)**
   - **Load the dataset**: Import the dataset into a Pandas DataFrame.
   - **Handle missing values**: Identify and address any missing data points.
   - **Timestamp conversion**: Convert the Unix timestamps into a human-readable datetime format for easier analysis.
   - **Summary Statistics**: Calculate summary statistics like mean, median, and standard deviation for each numeric column.

### 2. **Interactive Visualization and Dashboard with Plotly & Dash**
   - **Data Distribution**: Visualize the distribution of Bitcoin prices (Open, High, Low, Close) and Volume over the years 2014-2018.
   - **Candlestick Chart**: Display Bitcoin price movements (Open, High, Low, Close).
   - **Dash App for Exploration**: Create an interactive dashboard with dropdowns, sliders, and filters.
   - **Dynamic Updates**: Enable real-time interaction with visual components.

## Tools and Libraries
We will be using the following libraries to read, clean, analyze, and visualize the data:
- **Pandas**: For data manipulation and analysis.
- **Plotly**: For creating interactive visualizations such as line charts, candlestick charts, and bar charts.
- **Dash**: For building interactive dashboards that allow users to explore and interact with the data.
- **NumPy**: For numerical operations.

## Expected Outcomes
By the end of this notebook, we aim to:
1. Gain a deeper understanding of Bitcoin's price movements over time.
2. Identify periods of high volatility, bull markets, and bear markets.
3. Build interactive visualizations that allow us to explore Bitcoin's historical data in a meaningful way.
4. Create a user-friendly dashboard for users to interact with and analyze the data.
5. Provide insights into the relationship between Bitcoin price and trading volume.
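
Outcomes 2 and 5 could be approached along the following lines. This is only a rough sketch on synthetic daily data: the 30-day window, the 90th-percentile threshold, and all variable names are assumptions chosen for illustration, not values taken from the dataset or from the analysis in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes and volumes standing in for the real data (made-up values)
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2017-01-01", periods=120, freq="D")
close = pd.Series(1000 * np.exp(np.cumsum(rng.normal(0, 0.03, size=120))), index=dates)
volume = pd.Series(rng.uniform(100, 1000, size=120), index=dates)

# Daily log returns (the first value is NaN because there is no previous day)
log_returns = np.log(close / close.shift(1))

# Rolling volatility: standard deviation of log returns over a 30-day window (assumed window)
rolling_vol = log_returns.rolling(window=30).std()

# Flag days whose rolling volatility exceeds the 90th percentile (assumed threshold)
high_vol_days = rolling_vol[rolling_vol > rolling_vol.quantile(0.90)]

# Correlation between the size of daily price moves and trading volume
price_volume_corr = log_returns.abs().corr(volume)

print(f"High-volatility days flagged: {len(high_vol_days)}")
print(f"Correlation of |returns| with volume: {price_volume_corr:.3f}")
```

On the real dataset, the same steps would likely be applied after aggregating the one-minute rows to daily values.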

We will start by loading the dataset, performing initial exploration, and setting up the interactive visuals with Plotly and Dash.

---

## **(1) Data Preprocessing, Cleaning, and Exploratory Data Analysis (EDA)**

The cell below imports the required libraries and loads the dataset so that we can start working with the data.

---

In [None]:
# Importing required modules
import pandas as pd
import numpy as np
import dash # If Dash is not installed yet, run '!pip install dash' in a notebook cell first
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.express as px
import plotly.graph_objects as go
import os # Reach the directory structure

# Load the dataset into the initial DataFrame:
df = pd.read_csv("btcusd_1-min_data.csv")

# Inspect the column names, data types, and non-null counts:
df.info()

#### Handle missing or null values that could skew analysis before proceeding

In [None]:
# Check for missing values in each column (print so the result shows even though it is not the cell's last expression)
print(df.isnull().sum())

# Remove rows with any missing values; copy() makes df_cleaned an independent DataFrame
df_cleaned = df.dropna().copy()

# Verify that no missing values remain
df_cleaned.isnull().sum()
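
Before dropping rows, it can help to see how much data would be lost. The sketch below runs the same check on a tiny synthetic frame that reuses the dataset's column names; the values, and names such as `sample` and `missing_share`, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame with the dataset's column names (values are made up)
sample = pd.DataFrame({
    "Timestamp": [1325376000, 1325376060, 1325376120, 1325376180],
    "Open": [4.58, np.nan, 4.58, 4.60],
    "High": [4.58, np.nan, 4.59, 4.61],
    "Low": [4.58, np.nan, 4.57, 4.60],
    "Close": [4.58, np.nan, 4.58, 4.61],
    "Volume": [0.0, np.nan, 1.5, 2.0],
})

# Number of missing values per column
missing_per_column = sample.isnull().sum()

# Share of rows that contain at least one missing value
missing_share = sample.isnull().any(axis=1).mean()

# Drop incomplete rows and count how many were removed
sample_cleaned = sample.dropna().reset_index(drop=True)
rows_dropped = len(sample) - len(sample_cleaned)

print(missing_per_column)
print(f"Rows with missing values: {missing_share:.0%} ({rows_dropped} dropped)")
```

If the share of incomplete rows turned out to be large, dropping them might not be the best choice, and that would be worth noting before moving on.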

#### Convert Unix timestamps to human-readable formats for easier analysis with Pandas

In [None]:
# Convert the 'Timestamp' column from Unix time in seconds (the unit noted on the Kaggle page) to datetime
# Note: assign() returns a new DataFrame, so the column is replaced outright and takes the datetime dtype without a SettingWithCopyWarning
df_cleaned = df_cleaned.assign(Timestamp=pd.to_datetime(df_cleaned["Timestamp"], unit="s"))

# Display the first few rows to verify the conversion
df_cleaned.head()
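
The `unit="s"` argument matters because pandas treats bare integers as nanoseconds by default. The short check below uses hand-picked epoch values (not rows from the dataset) to show the difference.

```python
import pandas as pd

# Hand-picked Unix timestamps in seconds (illustrative values, not dataset rows)
raw_seconds = pd.Series([1325376000, 1325376060, 1514764800], name="Timestamp")

# Correct: interpret the integers as seconds since 1970-01-01
converted = pd.to_datetime(raw_seconds, unit="s")

# Incorrect: without unit, the integers are read as nanoseconds
misread = pd.to_datetime(raw_seconds)

print(converted.dt.year.tolist())  # [2012, 2012, 2018]
print(misread.dt.year.tolist())    # [1970, 1970, 1970]
```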

### Filter on a later year to confirm that the data varies
The rows shown above all contain exactly the same values and are spaced one minute apart. To confirm that we're working with legitimate data, the cell below filters on the year 2017, where the values change from row to row. This makes sense: Bitcoin was relatively inactive during 2012, but by 2017 it had gained market activity with real trading volume and price fluctuations.

In [None]:
# Filter for data from the year 2017
df_2017 = df_cleaned[df_cleaned["Timestamp"].dt.year == 2017]

# Display the first few rows to verify the filtering
print(df_2017.head())

Check the data types and the summary statistics for each numeric column to confirm that the data is well-formed. Reminder: **we're using the `df_cleaned` data frame**.

In [None]:
# Check data types of all columns
print(df_cleaned.dtypes) # float64 = numeric columns, datetime64 = timestamp
print("-"*100)
# Get summary statistics for the numeric columns (mean, standard deviation, min, max, and quartiles)
print("\nSummary Statistics for Bitcoin Historical Data:")
df_cleaned.describe()
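
`describe()` reports the median only as the `50%` row. To list the mean, median, and standard deviation named in the goals side by side, `agg` can be used. The frame below is a small made-up example; on the notebook's data, the same call would presumably be applied to the numeric columns of `df_cleaned`.

```python
import pandas as pd

# Small made-up sample with two of the dataset's numeric columns
prices = pd.DataFrame({
    "Close": [100.0, 102.0, 98.0, 110.0, 90.0],
    "Volume": [1.0, 2.0, 3.0, 4.0, 10.0],
})

# One row per statistic, one column per numeric field
summary = prices.agg(["mean", "median", "std"])
print(summary)
```

Comparing mean and median in this way also gives a quick hint about skew: in the sample, a single large volume value pulls the mean of `Volume` above its median.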

## (2) Data Visualization with Plotly & Building Dashboards with Dash

### Visualize the distribution of Bitcoin prices (Open, High, Low, Close) and Volume over the years 2014-2018 with Plotly

In [None]:
# Filter data for 2014-2018 and create a copy to avoid warnings
df_filtered = df_cleaned[(df_cleaned["Timestamp"].dt.year >= 2014) & (df_cleaned["Timestamp"].dt.year <= 2018)].copy()

# Add a 'Year' column for easier grouping
df_filtered["Year"] = df_filtered["Timestamp"].dt.year

# Price Distribution - Histogram for Close prices
fig_price_dist = px.histogram(df_filtered, x="Close", color=df_filtered["Year"].astype(str), nbins=100, title="Bitcoin Price Distribution (2014-2018)")
fig_price_dist.show()

# Closing Price Over Time - Line Chart
fig_price_trend = px.line(df_filtered, x="Timestamp", y="Close", color=df_filtered["Year"].astype(str), title="Bitcoin Closing Price Trends (2014-2018)")
fig_price_trend.show()

# Volume Trends Over Time - Line Chart
fig_volume_trend = px.line(df_filtered, x="Timestamp", y="Volume", color=df_filtered["Year"].astype(str), title="Bitcoin Trading Volume Trends (2014-2018)")
fig_volume_trend.show()
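
Five years of one-minute rows amount to a few million points, which can make Plotly figures slow to render. One option, not used in the cells above, is to aggregate to daily bars before plotting. The sketch below shows the aggregation on synthetic minute data; `minute_bars` and `daily_bars` are illustrative names and the values are made up.

```python
import numpy as np
import pandas as pd

# Two days of synthetic one-minute bars (made-up values)
minutes = pd.date_range("2017-01-01", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(seed=1)
close = 1000 + np.cumsum(rng.normal(0, 1, size=len(minutes)))
minute_bars = pd.DataFrame({
    "Timestamp": minutes,
    "Open": close,
    "High": close + 1,
    "Low": close - 1,
    "Close": close,
    "Volume": rng.uniform(0, 5, size=len(minutes)),
})

# Aggregate to one bar per day: first open, highest high, lowest low, last close, summed volume
daily_bars = (
    minute_bars.set_index("Timestamp")
    .resample("D")
    .agg({"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"})
    .reset_index()
)

print(daily_bars)
```

The resulting frame has the same column names, so it could be passed to `px.line` or `go.Candlestick` in place of the minute-level data.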


### Creating a Candlestick Chart with Plotly, Building a Basic Dash App Layout (with a date range picker), and Implementing Callbacks for Interactivity
This code creates a Dash web app to visualize Bitcoin price data (2014-2018) with a candlestick chart. It includes a date range picker to filter the data dynamically, and the chart updates based on the selected range. The app runs within a Jupyter notebook to display interactive visualizations.

In [None]:
# Initialize Dash App
app = dash.Dash(__name__)

# Layout
app.layout = html.Div([
    html.H1("Bitcoin Historical Data Visualization", style={"textAlign": "center"}),

    # Date range picker to select the date range
    html.Label("Select Date Range:"),
    dcc.DatePickerRange(
        id="date-picker",
        start_date=df_filtered["Timestamp"].min().date(),  # Use min date from the filtered data
        end_date=df_filtered["Timestamp"].max().date(),    # Use max date from the filtered data
        display_format="YYYY-MM-DD"
    ),

    # Candlestick chart
    dcc.Graph(id="candlestick-chart"),

    # Interval component that re-triggers the callback every 60 seconds (only useful if the underlying data changes)
    dcc.Interval(
        id="interval-component",
        interval=60000,  # 60 seconds
        n_intervals=0
    )
])

# Callback to update the candlestick chart (Input and Output are imported from dash.dependencies above)
@app.callback(
    Output("candlestick-chart", "figure"),
    [
        Input("date-picker", "start_date"),
        Input("date-picker", "end_date"),
        Input("interval-component", "n_intervals"),
    ],
)
def update_candlestick_chart(start_date, end_date, _):
    # Ensure the date range is in the correct format
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    
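    # Filter the data based on the selected date range from the date picker.
    # The end date parses to midnight, so compare against the start of the next day to include the whole final day.
    filtered_df = df_filtered[
        (df_filtered["Timestamp"] >= start_date)
        & (df_filtered["Timestamp"] < end_date + pd.Timedelta(days=1))
    ]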
    
    # Ensure there is data to plot
    if filtered_df.empty:
        return go.Figure()  # Return an empty figure if no data matches the selected range

    # Create the candlestick chart
    fig = go.Figure(data=[
        go.Candlestick(
            x=filtered_df["Timestamp"],
            open=filtered_df["Open"],
            high=filtered_df["High"],
            low=filtered_df["Low"],
            close=filtered_df["Close"]
        )
    ])
    
    # Update the layout of the chart
    fig.update_layout(
        title="Bitcoin Price Movements (OHLC)",
        xaxis_title="Date",
        yaxis_title="Price (USD)",
        xaxis_rangeslider_visible=False
    )
    
    return fig

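# Run the app (this will display the Dash app in the notebook)
if __name__ == "__main__":
    # app.run is the current entry point; recent Dash releases no longer accept app.run_server
    app.run(debug=True, use_reloader=False)  # Disable the reloader so the server runs properly in a Jupyter notebook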
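
The date filtering inside the callback can be tested without starting a Dash server. One detail worth checking is the end date: a date parsed with `pd.to_datetime` resolves to midnight, so an inclusive filter has to extend to the end of that day. The helper below is a hypothetical sketch (`filter_by_date_range` is not part of the notebook) and assumes the picker supplies ISO date strings.

```python
import pandas as pd

def filter_by_date_range(frame, start_date, end_date, column="Timestamp"):
    """Return rows whose timestamp falls between start_date and end_date, both days included."""
    start = pd.to_datetime(start_date)
    # Move to the start of the day after end_date so the whole final day is kept
    end_exclusive = pd.to_datetime(end_date).normalize() + pd.Timedelta(days=1)
    mask = (frame[column] >= start) & (frame[column] < end_exclusive)
    return frame.loc[mask]

# Made-up rows spread over three days
bars = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2017-01-01 00:00", "2017-01-02 12:00", "2017-01-02 23:59", "2017-01-03 00:00",
    ]),
    "Close": [1.0, 2.0, 3.0, 4.0],
})

selected = filter_by_date_range(bars, "2017-01-01", "2017-01-02")
print(selected["Close"].tolist())  # [1.0, 2.0, 3.0]
```

Keeping this logic in a plain function means the callback body only has to build the figure, and the edge cases can be checked with ordinary assertions.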
