# 01 - Data Exploration (BTC-USD)

This notebook runs a quick exploratory data analysis (EDA) of daily **BTC-USD** price data downloaded from Yahoo Finance via `yfinance`.

**Goal:** understand price dynamics, the distribution of returns, and volatility regimes.

> Educational portfolio project. Not financial advice.

## 0) Setup

In [None]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

print("Imports OK")


## 1) Download BTC-USD

In [None]:
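TICKER = "BTC-USD"
START = "2012-01-01"  # Yahoo's BTC-USD history typically begins in Sept 2014; earlier dates return no rows
END = None  # None -> up to the latest available date

df = yf.download(TICKER, start=START, end=END, progress=False)

# Recent yfinance versions can return MultiIndex columns (field, ticker) even for
# a single ticker; flatten to plain field names so df["Close"] is a Series.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)

df = df.dropna().copy()
df.head()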


## 2) Close Price

In [None]:
plt.figure()
plt.plot(df.index, df["Close"])
plt.title("BTC-USD Close Price")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.show()


## 3) Returns (log)

In [None]:
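df_eda = df.copy()

# Daily log return: r_t = ln(P_t) - ln(P_{t-1})
df_eda["log_return"] = np.log(df_eda["Close"]).diff()

# The first row has no previous close, so its return is NaN; drop it.
df_eda = df_eda.dropna()

df_eda[["Close", "log_return"]].head()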
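
Log returns are used here instead of simple percentage returns partly because they add up over time. The sketch below checks that arithmetic on a handful of made-up prices (the `closes` list is hypothetical, not BTC data) using only the standard library, so it runs without the download above.

```python
import math

# Hypothetical closing prices, for illustration only (not real BTC data).
closes = [100.0, 110.0, 99.0, 120.0]

# Log return: r_t = ln(P_t) - ln(P_{t-1})
log_returns = [math.log(b) - math.log(a) for a, b in zip(closes, closes[1:])]

# Simple return: R_t = P_t / P_{t-1} - 1
simple_returns = [b / a - 1.0 for a, b in zip(closes, closes[1:])]

# Log returns are additive: their sum equals ln(P_last / P_first).
total_log_return = sum(log_returns)
print("Sum of log returns :", total_log_return)
print("ln(P_last/P_first) :", math.log(closes[-1] / closes[0]))

# exp(r_t) - 1 converts a log return back to the simple return.
recovered = [math.exp(r) - 1.0 for r in log_returns]
```

For small moves the two kinds of return are nearly equal; they diverge on large moves, which BTC has plenty of.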


In [None]:
plt.figure()
plt.plot(df_eda.index, df_eda["log_return"])
plt.title("BTC-USD Daily Log Returns")
plt.xlabel("Date")
plt.ylabel("Log return")
plt.show()


## 4) Returns distribution (heavy tails)

In [None]:
plt.figure()
plt.hist(df_eda["log_return"], bins=200)
plt.title("Distribution of Daily Log Returns (BTC-USD)")
plt.xlabel("Log return")
plt.ylabel("Frequency")
plt.show()

print("Mean:", float(df_eda["log_return"].mean()))
print("Std :", float(df_eda["log_return"].std()))
print("Min :", float(df_eda["log_return"].min()))
print("Max :", float(df_eda["log_return"].max()))
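
Mean, standard deviation, and extremes do not directly measure how heavy the tails are. One common summary is excess kurtosis, which is 0 for a normal distribution and positive for fat tails. The sketch below is a minimal illustration on synthetic data, not BTC returns: the helper `excess_kurtosis` and both samples are assumptions made for the example.

```python
import random


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-ins for a return series (NOT BTC data).
normal_sample = [rng.gauss(0.0, 0.03) for _ in range(20_000)]

# Mixing calm days (std 0.02) with occasional turbulent days (std 0.09)
# produces fat tails even though each component is normal.
mixture_sample = [
    rng.gauss(0.0, 0.09 if rng.random() < 0.1 else 0.02) for _ in range(20_000)
]

print("Normal sample  :", round(excess_kurtosis(normal_sample), 2))
print("Mixture sample :", round(excess_kurtosis(mixture_sample), 2))
```

On the notebook's data, `df_eda["log_return"].kurt()` gives a closely related, bias-corrected estimate using the same convention (normal = 0).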


## 5) Volatility regimes (rolling volatility)

In [None]:
window = 30  # days
# Rolling sample std of daily log returns; the first window - 1 rows are NaN.
df_eda["roll_vol_30d"] = df_eda["log_return"].rolling(window).std()

plt.figure()
plt.plot(df_eda.index, df_eda["roll_vol_30d"])
plt.title("Rolling Volatility (30-day std of log returns)")
plt.xlabel("Date")
plt.ylabel("Daily volatility (std of log returns)")
plt.show()
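
The rolling figure above is a daily volatility. To compare it with annualized numbers quoted elsewhere, a common convention is to scale by the square root of the number of periods per year. The sketch below shows that arithmetic on hypothetical returns with a standard-library stand-in for the rolling std. The `rolling_std` helper, the sample values, and the choice of 365 periods are all assumptions, and the square-root rule itself assumes returns are roughly independent from day to day.

```python
import math
import statistics


def rolling_std(values, window):
    """Sample std over each trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # mirrors the NaN warm-up of pandas rolling()
        else:
            out.append(statistics.stdev(values[i + 1 - window : i + 1]))
    return out


# Hypothetical daily log returns, for illustration only.
returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.04, 0.02]

daily_vol = rolling_std(returns, window=3)

# Assumption: BTC trades every calendar day, so scale by sqrt(365);
# equity conventions typically use sqrt(252) instead.
PERIODS_PER_YEAR = 365
annualized_vol = [
    None if v is None else v * math.sqrt(PERIODS_PER_YEAR) for v in daily_vol
]

print(daily_vol)
print(annualized_vol)
```

With the notebook's data the equivalent would be `df_eda["roll_vol_30d"] * np.sqrt(365)`, under the same assumptions.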


## 6) Notes
- BTC shows pronounced bull and bear regimes in the close-price series.
- Volatility clustering and fat-tailed returns make it hard for a model fit on one period to generalize to another.
- This motivates multi-window evaluation and walk-forward testing for RL agents.
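
As a rough illustration of the walk-forward idea mentioned above, the sketch below generates chronological train/test index windows. `walk_forward_splits` and its parameters are hypothetical names chosen for this example, not part of the project or of any library.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_range, test_range) index ranges in chronological order.

    Hypothetical helper for illustration. Each test window starts right
    after its training window, so no future data leaks into training.
    """
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


# Example: 10 observations, train on 4, test on the next 2, slide by 2.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), "->", list(test))
```

Evaluating an agent on each test window separately, rather than on one pooled hold-out set, shows how performance varies across the regimes noted above.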
