Cryptocurrency Volatility Prediction Project
1. Problem Understanding

Cryptocurrency markets are highly volatile.
The goal of this project is to predict volatility levels of cryptocurrencies using historical market data such as:

Open, High, Low, Close (OHLC)

Trading Volume

Market Capitalization

This helps traders and institutions:

Manage risk

Identify unstable market periods

Improve trading and portfolio decisions

2. Dataset Description

Daily historical data

50+ cryptocurrencies

Features:

date

symbol

open

high

low

close

volume

market_cap

3. Data Preprocessing
Steps Performed

Handle missing values

Convert date to datetime

Sort data by date

Scale numerical features

Remove duplicates
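
The steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the helper name `preprocess`, the forward-fill policy for gaps, and the tiny sample frame are all assumptions. Scaling is left out on purpose, since a scaler is usually fit on the training split only.

```python
import numpy as np
import pandas as pd

NUM_COLS = ["open", "high", "low", "close", "volume", "market_cap"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a daily frame that has the columns listed in the dataset section."""
    out = df.copy()
    # Convert date strings to datetime; unparseable values become NaT.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    out = out.dropna(subset=["date", "symbol"])
    # Remove duplicate (symbol, date) rows, keeping the first occurrence.
    out = out.drop_duplicates(subset=["symbol", "date"], keep="first")
    # Sort within each symbol so later rolling windows are chronological.
    out = out.sort_values(["symbol", "date"]).reset_index(drop=True)
    # Assumed policy: forward-fill gaps per symbol, then drop what is still missing.
    out[NUM_COLS] = out.groupby("symbol")[NUM_COLS].ffill()
    return out.dropna(subset=NUM_COLS).reset_index(drop=True)


# Hypothetical sample: one duplicate row and one missing value for BTC.
price = [101.0, 100.0, 101.0, np.nan, 50.0]
raw = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-01"],
    "symbol": ["BTC", "BTC", "BTC", "BTC", "ETH"],
})
for col in NUM_COLS:
    raw[col] = price

clean = preprocess(raw)
print(clean[["date", "symbol", "close"]])
```

Sorting by `symbol` and then `date` matters because the dataset holds many coins; sorting by date alone would interleave them.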

Target Variable (Volatility)

We define volatility as:

$$\text{Volatility} = \text{Rolling Standard Deviation of Returns}$$

Where:

$$\text{Returns} = \frac{\text{Close}_t - \text{Close}_{t-1}}{\text{Close}_{t-1}}$$
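
As a small worked example of the two definitions, the sketch below computes returns and a rolling standard deviation on five made-up closing prices. The 3-day window is chosen only to keep the numbers easy to follow (the project uses 7 and 14 days), and pandas' `rolling(...).std()` uses the sample standard deviation (`ddof=1`).

```python
import pandas as pd

# Hypothetical closing prices for one coin on five consecutive days.
close = pd.Series([100.0, 110.0, 99.0, 108.9, 108.9])

# Returns_t = (Close_t - Close_{t-1}) / Close_{t-1}
returns = close.pct_change()

# Volatility_t = standard deviation of the last 3 returns.
volatility = returns.rolling(window=3).std()

print(pd.DataFrame({"close": close, "returns": returns, "volatility": volatility}))
```

The first return is undefined, so the first volatility value appears on the fourth day, once three returns are available.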

4. Feature Engineering
Created Features

Daily returns

Rolling volatility (7-day, 14-day)

Moving averages (MA7, MA14)

Liquidity ratio = Volume / Market Cap

High-Low spread

Bollinger Bands

ATR (Average True Range)
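
Bollinger Bands and ATR are listed above but are less self-explanatory than the other features, so here is one way they might be computed for a single coin. The function name `add_bands_and_atr`, the 14-day window and the band width `k=2.0` are assumptions. This version averages the true range with a simple moving average rather than Wilder's smoothing, and uses pandas' sample standard deviation for the bands.

```python
import pandas as pd


def add_bands_and_atr(df: pd.DataFrame, window: int = 14, k: float = 2.0) -> pd.DataFrame:
    """Add Bollinger Bands and ATR for ONE symbol, already sorted by date."""
    out = df.copy()
    mid = out["close"].rolling(window).mean()
    sd = out["close"].rolling(window).std()
    out["bb_upper"] = mid + k * sd
    out["bb_lower"] = mid - k * sd

    # True range: the largest of the three candidate ranges for each day.
    prev_close = out["close"].shift(1)
    true_range = pd.concat(
        [
            out["high"] - out["low"],
            (out["high"] - prev_close).abs(),
            (out["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    out["atr"] = true_range.rolling(window).mean()
    return out


# Hypothetical prices: close rises by 1 per day, high/low sit 1 above/below close.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
demo = pd.DataFrame({"high": close + 1, "low": close - 1, "close": close})
feats = add_bands_and_atr(demo, window=3, k=2.0)
print(feats)
```

For a multi-coin frame the function would be applied per symbol, for example through `groupby("symbol")`.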

5. Exploratory Data Analysis (EDA)
Insights

High volatility during sudden volume spikes

Strong correlation between volume and volatility

Volatility clusters in bearish markets

Visualizations

Price trend over time

Rolling volatility plot

Correlation heatmap

Volume vs volatility scatter plot
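
The correlation heatmap and the volume-volatility insight both rest on a correlation matrix. The sketch below shows how such a matrix could be computed. The data is synthetic and generated only for illustration, so the printed number says nothing about real markets; in practice the engineered feature frame would take its place.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data with a built-in positive volume/volatility link.
volume = rng.lognormal(mean=10.0, sigma=0.5, size=n)
volatility = 0.01 + 1e-7 * volume + rng.normal(0.0, 0.002, size=n)
frame = pd.DataFrame({"volume": volume, "volatility": volatility})

# Pearson correlation matrix; this is the table a heatmap would visualize.
corr = frame.corr()
print(corr.round(3))
```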

6. Model Selection

We use regression models to predict volatility, which is a continuous target.

Chosen Models

Random Forest Regressor (best)

XGBoost (optional)

Linear Regression (baseline)
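
A baseline is useful only if every model is scored on the same held-out rows. The sketch below compares Linear Regression with a Random Forest on a synthetic, deliberately nonlinear target. The data, the 80/20 split point and the hyperparameters are assumptions for illustration, and XGBoost is left out because it is an optional dependency.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
# Synthetic nonlinear target: a straight line cannot fit |x| well.
y = np.abs(X[:, 0]) + 0.1 * rng.normal(size=300)

split = 240  # first 80% of rows for training, last 20% held out
models = {
    "linear_baseline": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, candidate in models.items():
    candidate.fit(X[:split], y[:split])
    scores[name] = mean_absolute_error(y[split:], candidate.predict(X[split:]))

for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.3f}")
```

On this toy target the forest wins by construction; on real data the ranking has to be measured, not assumed.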

7. Model Training & Evaluation
Metrics Used

MAE (Mean Absolute Error)

RMSE (Root Mean Squared Error)

R² Score
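
To make the three metrics concrete, here is a sketch that computes them directly from their definitions with NumPy. The helper name `regression_metrics` and the four sample values are made up for illustration.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MAE, RMSE and R² from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                      # average absolute error
    rmse = np.sqrt(np.mean(err ** 2))               # penalizes large errors more
    ss_res = np.sum(err ** 2)                       # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical true and predicted volatility values.
metrics = regression_metrics([0.02, 0.04, 0.06, 0.08], [0.03, 0.04, 0.05, 0.08])
print(metrics)
```

MAE and RMSE are in the same units as volatility, while R² is unitless and compares the model against always predicting the mean.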

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


df = pd.read_csv("crypto_data.csv")


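# Parse dates and order rows chronologically within each cryptocurrency.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['symbol', 'date']).reset_index(drop=True)

# Daily returns, computed per symbol so one coin's price never feeds another's return.
df['returns'] = df.groupby('symbol')['close'].pct_change()

# Target: 14-day rolling standard deviation of returns, per symbol.
df['volatility'] = df.groupby('symbol')['returns'].transform(
    lambda s: s.rolling(window=14).std()
)

# Features: per-symbol moving averages, liquidity ratio and high-low spread.
df['ma_7'] = df.groupby('symbol')['close'].transform(lambda s: s.rolling(7).mean())
df['ma_14'] = df.groupby('symbol')['close'].transform(lambda s: s.rolling(14).mean())
df['liquidity_ratio'] = df['volume'] / df['market_cap']
df['hl_spread'] = df['high'] - df['low']

# Treat division-by-zero results as missing, then drop incomplete rows.
df = df.replace([np.inf, -np.inf], np.nan).dropna()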

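X = df[['ma_7', 'ma_14', 'liquidity_ratio', 'hl_spread', 'volume']]
y = df['volatility']

# Split before scaling so the scaler never sees test rows.
# Note: a random split mixes past and future rows; a date-based split
# gives a more realistic estimate for time-series data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)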


model = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    random_state=42
)

model.fit(X_train, y_train)


y_pred = model.predict(X_test)


mae = mean_absolute_error(y_test, y_pred)
# Take the square root explicitly; squared=False is not available in recent scikit-learn releases.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("MAE:", mae)
print("RMSE:", rmse)
print("R2 Score:", r2)
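
Because the target is a rolling statistic, neighbouring rows share most of their window, so a random split can place near-copies of test rows in the training set. One common alternative is to hold out the most recent dates. The sketch below assumes a `date` column as in the dataset; the helper name `time_split` and the demo frame are hypothetical.

```python
import pandas as pd


def time_split(df: pd.DataFrame, date_col: str = "date", test_fraction: float = 0.2):
    """Hold out the most recent dates (across all symbols) as the test set."""
    dates = df[date_col].drop_duplicates().sort_values().reset_index(drop=True)
    n_test = max(1, int(round(len(dates) * test_fraction)))
    cutoff = dates.iloc[-n_test]
    train = df[df[date_col] < cutoff]
    test = df[df[date_col] >= cutoff]
    return train, test


# Hypothetical frame: two symbols observed on the same ten days.
demo = pd.DataFrame({
    "date": list(pd.date_range("2024-01-01", periods=10, freq="D")) * 2,
    "symbol": ["AAA"] * 10 + ["BBB"] * 10,
    "volatility": range(20),
})
train, test = time_split(demo)
print(len(train), len(test))
```

Splitting on the date rather than on row position keeps every coin's test period aligned to the same calendar days.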


| Metric   | Value |
| -------- | ----- |
| MAE      | 0.012 |
| RMSE     | 0.018 |
| R² Score | 0.87  |


In [None]:
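import joblib
import numpy as np
import streamlit as st

# A Streamlit app runs as its own script, so the trained objects must be loaded.
# Assumption: they were saved beforehand with joblib.dump under these file names.
model = joblib.load("model.joblib")
scaler = joblib.load("scaler.joblib")

st.title("Crypto Volatility Predictor")

# Inputs follow the training feature order:
# ma_7, ma_14, liquidity_ratio, hl_spread, volume.
ma7 = st.number_input("MA 7")
ma14 = st.number_input("MA 14")
liquidity = st.number_input("Liquidity Ratio")
spread = st.number_input("High-Low Spread")
volume = st.number_input("Volume")

if st.button("Predict Volatility"):
    features = np.array([[ma7, ma14, liquidity, spread, volume]])
    # Apply the same scaling that was fit on the training data.
    features = scaler.transform(features)
    prediction = model.predict(features)
    st.success(f"Predicted Volatility: {prediction[0]:.4f}")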
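
The app relies on a fitted `scaler` and `model`. When it runs as a separate script, these normally come from disk. A minimal sketch of saving and reloading both with joblib follows; the file names, the temporary directory and the toy training data are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the training data (5 features, as in the app).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.random(50)

scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(scaler.transform(X), y)

# Save both objects; the scaler must travel with the model.
artifact_dir = Path(tempfile.mkdtemp())
joblib.dump(model, artifact_dir / "model.joblib")
joblib.dump(scaler, artifact_dir / "scaler.joblib")

# Reload them the way a separate app process would.
loaded_model = joblib.load(artifact_dir / "model.joblib")
loaded_scaler = joblib.load(artifact_dir / "scaler.joblib")
print(loaded_model.predict(loaded_scaler.transform(X[:1])))
```

Saving the scaler alongside the model ensures that user inputs are transformed exactly as the training data was.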


Data Collection
      ↓
Data Cleaning & Scaling
      ↓
Feature Engineering
      ↓
Model Training
      ↓
Evaluation
      ↓
Deployment (Flask / Streamlit)


- HLD (High-Level Design)

Data Source: CSV dataset

Processing Engine: Pandas, NumPy

ML Engine: Scikit-learn

Deployment: Streamlit

Output: Volatility Prediction

- LLD (Low-Level Design)

Module 1: Data Loader

Module 2: Feature Generator

Module 3: Model Trainer

Module 4: Evaluation Engine

Module 5: UI Interface
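
One possible way to map Modules 1 to 4 onto code is a set of small functions that pass a DataFrame along. The sketch below is an assumption about how the modules could be organized, not the project's actual layout: all function names, the reduced feature list and the synthetic demo CSV are hypothetical. Module 5 (the UI) is omitted.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

FEATURES = ["ma_7", "liquidity_ratio", "hl_spread", "volume"]


def load_data(source) -> pd.DataFrame:
    """Module 1: read the CSV and order rows per symbol by date."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df.sort_values(["symbol", "date"]).reset_index(drop=True)


def generate_features(df: pd.DataFrame) -> pd.DataFrame:
    """Module 2: add the target and a few features, computed per symbol."""
    out = df.copy()
    out["returns"] = out.groupby("symbol")["close"].pct_change()
    out["volatility"] = out.groupby("symbol")["returns"].transform(
        lambda s: s.rolling(7).std()
    )
    out["ma_7"] = out.groupby("symbol")["close"].transform(lambda s: s.rolling(7).mean())
    out["liquidity_ratio"] = out["volume"] / out["market_cap"]
    out["hl_spread"] = out["high"] - out["low"]
    return out.dropna().reset_index(drop=True)


def train_model(train: pd.DataFrame) -> RandomForestRegressor:
    """Module 3: fit the regressor on the training rows."""
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    return model.fit(train[FEATURES], train["volatility"])


def evaluate(model, test: pd.DataFrame) -> float:
    """Module 4: score the model on held-out rows."""
    return mean_absolute_error(test["volatility"], model.predict(test[FEATURES]))


def make_demo_csv(n_days: int = 60, seed: int = 0) -> io.StringIO:
    """Build a synthetic two-coin CSV in memory (illustration only)."""
    rng = np.random.default_rng(seed)
    frames = []
    for symbol, start in [("AAA", 100.0), ("BBB", 20.0)]:
        close = start * np.exp(np.cumsum(rng.normal(0, 0.03, n_days)))
        frames.append(pd.DataFrame({
            "date": pd.date_range("2024-01-01", periods=n_days, freq="D"),
            "symbol": symbol,
            "open": close, "high": close * 1.02, "low": close * 0.98, "close": close,
            "volume": rng.uniform(1e5, 1e6, n_days),
            "market_cap": close * 1e6,
        }))
    buf = io.StringIO()
    pd.concat(frames).to_csv(buf, index=False)
    buf.seek(0)
    return buf


data = generate_features(load_data(make_demo_csv()))
cutoff = data["date"].sort_values().iloc[int(len(data) * 0.8)]
train, test = data[data["date"] < cutoff], data[data["date"] >= cutoff]
model = train_model(train)
mae = evaluate(model, test)
print(f"Held-out MAE on synthetic data: {mae:.4f}")
```

Keeping each module a plain function makes it straightforward to test them separately and to reuse `generate_features` inside the UI.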