# Cryptocurrency Volatility Prediction
# Problem Statement

Cryptocurrency markets are highly volatile, which makes risk management and informed decision-making difficult for traders and investors.
Volatility measures the degree of price variation over time, and sudden spikes in volatility can lead to major financial losses.

The objective of this project is to build a machine learning model that predicts cryptocurrency volatility levels using historical market data such as OHLC prices, trading volume, and market capitalization.
The model aims to identify periods of high volatility to help market participants proactively manage risks.

# Dataset Description

# Cryptocurrency Historical Prices Dataset

| Feature    | Description                            |
| ---------- | -------------------------------------- |
| date       | Trading date                           |
| symbol     | Cryptocurrency symbol (BTC, ETH, etc.) |
| open       | Opening price                          |
| high       | Highest price                          |
| low        | Lowest price                           |
| close      | Closing price                          |
| volume     | Daily trading volume                   |
| market_cap | Market capitalization                  |
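The notebook uses `df` without showing how it is loaded. Below is a minimal loader sketch, assuming the data arrives as a single CSV with the columns in the table above; the function name `load_prices` is hypothetical. It mirrors the `load_csv()` / `parse_dates()` steps listed for `data_loader.py` in the LLD.

```python
import pandas as pd

# Columns listed in the dataset table
REQUIRED = ['date', 'symbol', 'open', 'high', 'low', 'close', 'volume', 'market_cap']


def load_prices(path):
    """Load the historical prices CSV, parse dates, and order rows per coin in time."""
    df = pd.read_csv(path, parse_dates=['date'])
    missing = set(REQUIRED) - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    # Sorting by symbol, then date, keeps each coin's history contiguous and chronological,
    # which the per-coin shift and rolling operations later in the notebook rely on
    return df.sort_values(['symbol', 'date']).reset_index(drop=True)
```

Usage: `df = load_prices('crypto_prices.csv')`, where the file name is a placeholder for wherever the dataset is stored.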


# Project Architecture (Pipeline)

```text
Data Collection
      ↓
Data Cleaning & Preprocessing
      ↓
EDA (Visualization & Statistics)
      ↓
Feature Engineering
      ↓
Model Training
      ↓
Model Evaluation
      ↓
Hyperparameter Tuning
      ↓
Deployment (Streamlit / Flask)
```

# Data Preprocessing
# Handling Missing Values

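```python
df = df.sort_values(['symbol', 'date']).reset_index(drop=True)  # chronological within each coin
cols = [c for c in df.columns if c != 'symbol']
df[cols] = df.groupby('symbol')[cols].ffill()  # fill gaps from the same coin's earlier rows only
df = df.dropna()  # drop rows still missing, e.g. at the start of a coin's history
```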
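To see why forward-filling should be grouped by `symbol`, consider a toy frame (illustrative values only) in which one coin's rows directly follow another's:

```python
import numpy as np
import pandas as pd

# Toy data: ETH's first close is missing and its rows come right after BTC's
toy = pd.DataFrame({
    'symbol': ['BTC', 'BTC', 'ETH', 'ETH'],
    'close': [100.0, 101.0, np.nan, 52.0],
})

naive = toy['close'].ffill()                      # copies BTC's 101.0 into ETH's first row
grouped = toy.groupby('symbol')['close'].ffill()  # leaves ETH's gap as NaN

print(naive.tolist())    # [100.0, 101.0, 101.0, 52.0]
print(grouped.tolist())  # [100.0, 101.0, nan, 52.0]
```

The remaining NaN is then removed by `dropna()`, which is safer than silently inheriting another coin's price.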

# Feature Scaling

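Standardization is not needed for the tree-based models (Random Forest, XGBoost), since their splits depend only on the ordering of feature values. For scale-sensitive models such as Linear Regression or an LSTM, fit the scaler after feature engineering and the train-test split, on the training rows only. Scaling the raw prices at this stage would make some `close` values negative, which breaks the log-return calculation in the next section, and fitting on all rows would leak test-set statistics into training.

```python
from sklearn.preprocessing import StandardScaler

# Create the scaler here, but fit it later on the training features only
# (scaler.fit_transform(X_train)) and apply it to the test features with scaler.transform(X_test)
scaler = StandardScaler()
num_cols = ['open','high','low','close','volume','market_cap']
```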
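A minimal sketch of leak-free scaling for the Linear Regression baseline, on synthetic data standing in for the six raw features (the names `X_demo`, `y_demo`, `X_tr`, `X_te`, and so on are illustrative). Wrapping `StandardScaler` and `LinearRegression` in a scikit-learn pipeline means the scaler's mean and standard deviation are learned only from the rows passed to `fit`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for six raw features; rows are assumed to be in time order
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=100.0, scale=20.0, size=(500, 6))
y_demo = X_demo @ np.array([0.01, 0.0, -0.02, 0.03, 0.0, 0.005]) + rng.normal(scale=0.1, size=500)

# Chronological split: first 80% of rows for training, last 20% for testing
split = int(len(X_demo) * 0.8)
X_tr, X_te = X_demo[:split], X_demo[split:]
y_tr, y_te = y_demo[:split], y_demo[split:]

# fit() learns the scaling statistics from X_tr only; score() and predict()
# reuse those statistics on X_te instead of refitting them
baseline = make_pipeline(StandardScaler(), LinearRegression())
baseline.fit(X_tr, y_tr)
print("Test R^2:", baseline.score(X_te, y_te))

fitted_scaler = baseline.named_steps['standardscaler']
```

The same pattern applies to the real `X_train` / `X_test` once they exist; `fitted_scaler.mean_` equals the per-column training means, not the means over the whole dataset.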

# Feature Engineering
**Target variable: volatility**, the 14-day rolling standard deviation of daily log returns, computed separately for each coin.

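```python
import numpy as np

# Daily log return ln(close_t / close_{t-1}), computed within each coin so that
# one symbol's first row is never compared with another symbol's last row
df['log_return'] = np.log(df['close']).groupby(df['symbol']).diff()
# Target: 14-day rolling standard deviation of the log returns, again per coin
df['volatility'] = df.groupby('symbol')['log_return'].transform(lambda s: s.rolling(window=14).std())
```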
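For reference, the cell above computes, for each coin, the following quantities, where $C_t$ is the closing price on day $t$ and $N = 14$. pandas' `rolling().std()` uses the sample ($N-1$) denominator by default:

```latex
r_t = \ln\left(\frac{C_t}{C_{t-1}}\right)

\sigma_t = \sqrt{\frac{1}{N-1}\sum_{i=0}^{N-1}\left(r_{t-i} - \bar{r}_t\right)^2},
\qquad
\bar{r}_t = \frac{1}{N}\sum_{i=0}^{N-1} r_{t-i},
\qquad N = 14
```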

# Additional Engineered Features

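| Feature         | Formula                                                                                       |
| --------------- | --------------------------------------------------------------------------------------------- |
| Rolling Mean    | 14-day mean of `close` (`ma_14`)                                                              |
| Rolling Std     | 14-day standard deviation of `close` (`std_14`), a price-level measure distinct from the return-based target |
| Liquidity Ratio | `volume / market_cap` (`liquidity`)                                                           |
| High-Low Spread | `high − low` (`hl_spread`)                                                                    |
| ATR             | Rolling average of the true range, max(high − low, \|high − prev close\|, \|low − prev close\|) |
| Bollinger Bands | MA ± 2σ, where MA and σ are the rolling mean and standard deviation of `close`                |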


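```python
# Rolling statistics are computed per coin so a window never spans two symbols
close_by_coin = df.groupby('symbol')['close']
df['ma_14'] = close_by_coin.transform(lambda s: s.rolling(14).mean())
df['std_14'] = close_by_coin.transform(lambda s: s.rolling(14).std())
df['liquidity'] = df['volume'] / df['market_cap'].replace(0, np.nan)  # avoid division by zero
df['hl_spread'] = df['high'] - df['low']
```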
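The table lists ATR and Bollinger Bands, but the cell above does not compute them. The following is a minimal sketch, assuming a 14-day default window for both and using a simple rolling mean for ATR (Wilder's original definition uses a smoothed, exponential-style average instead). The function name `add_atr_bollinger` and the column names it creates are hypothetical. It returns a copy, so `df` is unchanged unless you assign the result. If these columns are added to `df` before the train-test split, they become model inputs, and the deployment app would then need to supply them as well.

```python
import pandas as pd


def add_atr_bollinger(df, window=14, n_std=2.0):
    """Return a copy of df with ATR and Bollinger Band columns computed per coin.

    Assumes columns 'symbol', 'high', 'low', 'close', with rows in date order
    within each symbol.
    """
    out = df.copy()
    by_coin = out.groupby('symbol')['close']
    prev_close = by_coin.shift(1)
    ma = by_coin.transform(lambda s: s.rolling(window).mean())
    sd = by_coin.transform(lambda s: s.rolling(window).std())

    # True range: the largest of today's range and the gaps from yesterday's close
    # (max skips the NaN gaps on each coin's first row, leaving high - low)
    true_range = pd.concat([
        out['high'] - out['low'],
        (out['high'] - prev_close).abs(),
        (out['low'] - prev_close).abs(),
    ], axis=1).max(axis=1)

    # Simple rolling mean of the true range, per coin
    out[f'atr_{window}'] = true_range.groupby(out['symbol']).transform(
        lambda s: s.rolling(window).mean())
    # Bollinger Bands: rolling mean of close plus/minus n_std rolling standard deviations
    out['bb_upper'] = ma + n_std * sd
    out['bb_lower'] = ma - n_std * sd
    return out
```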

# Exploratory Data Analysis (EDA)
Summary Statistics

```python
df.describe()
```

# Correlation Heatmap

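```python
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
# numeric_only=True skips 'date' and 'symbol'; recent pandas versions raise on non-numeric columns
sns.heatmap(df.corr(numeric_only=True), cmap='coolwarm')
plt.show()
```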

# Volatility Trend

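```python
# One line per coin: plotting all symbols as a single series would zigzag between them
for sym, grp in df.groupby('symbol'):
    plt.plot(grp['date'], grp['volatility'], label=sym)
plt.title("Cryptocurrency Volatility Over Time")
plt.legend()
plt.show()
```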

# Model Selection

| Model             | Reason                                                 |
| ----------------- | ------------------------------------------------------ |
| Linear Regression | Baseline                                               |
| Random Forest     | Handles non-linearity                                  |
| XGBoost           | Main model; gradient boosting, often strong on tabular features |
| LSTM              | Time-series deep learning (future scope)               |
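The following sketch compares candidate models on the same chronological split. It uses synthetic data with a non-linear target to illustrate the "handles non-linearity" row; the data and the names `X_demo` and `candidates` are illustrative only. `XGBRegressor` follows the same `fit` / `predict` interface, so it can be added to the `candidates` dict, while an LSTM needs sequence-shaped inputs and does not fit this loop.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic data with a non-linear target (absolute-value and squared terms)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(1000, 4))
y_demo = np.abs(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)

split = 800  # first 800 rows train, last 200 test, mimicking a chronological split
candidates = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=0),
}
mae_by_model = {}
for name, est in candidates.items():
    est.fit(X_demo[:split], y_demo[:split])
    mae_by_model[name] = mean_absolute_error(y_demo[split:], est.predict(X_demo[split:]))
    print(f"{name}: test MAE = {mae_by_model[name]:.3f}")
```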


# Train-Test Split (Time-Series Safe)

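```python
# Drop warm-up rows where 14-day windows are undefined (and any inf from division)
data = df.replace([np.inf, -np.inf], np.nan).dropna().sort_values(['date', 'symbol'])

# Hold out the last 20% of trading dates so every test row is later than every training row
# (a row-based cut on data sorted by symbol would instead split between coins)
dates = data['date'].drop_duplicates().sort_values()
cutoff = dates.iloc[int(len(dates) * 0.8)]
train, test = data[data['date'] < cutoff], data[data['date'] >= cutoff]

drop_cols = ['volatility', 'date', 'symbol']
X_train, y_train = train.drop(columns=drop_cols), train['volatility']
X_test, y_test = test.drop(columns=drop_cols), test['volatility']
```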
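In the setup above, the target `volatility` on day t is computed from the 14 log returns ending on day t, and that day's `log_return` is also a feature, so the model partly reconstructs the current value rather than forecasting it. If the goal is to anticipate high-volatility periods, as the problem statement describes, one option is to predict volatility a few days ahead by shifting the target within each coin. The following is a minimal sketch; the helper name `add_forward_target` and the `volatility_next_<h>` column name are hypothetical.

```python
import pandas as pd


def add_forward_target(df, horizon=1, col='volatility'):
    """Return a copy of df with the per-coin value of `col` from `horizon` rows ahead.

    Assumes rows are sorted by date within each symbol. The last `horizon` rows of
    each coin get NaN targets and should be dropped before training.
    """
    out = df.copy()
    out[f'{col}_next_{horizon}'] = out.groupby('symbol')[col].shift(-horizon)
    return out
```

To use it, compute `data = add_forward_target(df)` before the split, use `volatility_next_1` as `y`, and drop that column (together with `date` and `symbol`) from `X`. The current `volatility` can stay as a feature because it is known on day t.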

# Model Training

```python
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=6
)

model.fit(X_train, y_train)
```


# Model Evaluation
Metrics used:

* RMSE (root mean squared error)
* MAE (mean absolute error)
* R² score (coefficient of determination)
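For reference, the standard definitions, with $y_i$ the observed volatility, $\hat{y}_i$ the prediction, $\bar{y}$ the mean of the observed values, and $n$ test rows. RMSE and MAE are in the same units as the target (standard deviation of daily log returns), while R² is unitless:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|

R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```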








```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

pred = model.predict(X_test)

# RMSE as the square root of MSE; the squared=False option was deprecated in
# scikit-learn 1.4 and removed in later releases
rmse = np.sqrt(mean_squared_error(y_test, pred))
mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)

print("RMSE:", rmse)
print("MAE:", mae)
print("R2:", r2)
```


**Sample Results**

```text
RMSE: 0.0023
MAE: 0.0017
R²: 0.89
```

# Hyperparameter Tuning

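```python
import joblib
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

params = {
    'max_depth':[3,5,7],
    'learning_rate':[0.01,0.05,0.1],
    'n_estimators':[100,200]
}

# TimeSeriesSplit validates each fold on rows that come after its training rows, assuming
# X_train is in chronological order (plain k-fold would train on the future)
grid = GridSearchCV(model, params, cv=TimeSeriesSplit(n_splits=3),
                    scoring='neg_root_mean_squared_error')
grid.fit(X_train, y_train)

# Save the best model (refit on all of X_train) for the Streamlit app
joblib.dump(grid.best_estimator_, 'volatility_model.pkl')
```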


# Model Deployment (Streamlit)

`app.py`:

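```python
import numpy as np
import pandas as pd
import streamlit as st
import joblib

# Model saved at the end of hyperparameter tuning
model = joblib.load('volatility_model.pkl')

# Must match X_train.columns (names and order) from training; update if features change
FEATURES = ['open', 'high', 'low', 'close', 'volume', 'market_cap',
            'log_return', 'ma_14', 'std_14', 'liquidity', 'hl_spread']

st.title("Crypto Volatility Predictor")

open_p = st.number_input("Open Price", min_value=0.0)
high = st.number_input("High Price", min_value=0.0)
low = st.number_input("Low Price", min_value=0.0)
close = st.number_input("Close Price", min_value=0.0)
volume = st.number_input("Volume", min_value=0.0)
market_cap = st.number_input("Market Cap", min_value=0.0)
# Engineered inputs the model was trained on, which one day of prices cannot supply
prev_close = st.number_input("Previous Day Close", min_value=0.0)
ma_14 = st.number_input("14-day Mean Close", min_value=0.0)
std_14 = st.number_input("14-day Std of Close", min_value=0.0)

if st.button("Predict Volatility"):
    if close <= 0 or prev_close <= 0 or market_cap <= 0:
        st.error("Close, previous close, and market cap must be positive.")
    else:
        row = pd.DataFrame([[open_p, high, low, close, volume, market_cap,
                             np.log(close / prev_close), ma_14, std_14,
                             volume / market_cap, high - low]], columns=FEATURES)
        st.success(f"Predicted Volatility: {model.predict(row)[0]:.4f}")
```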

Run:

```shell
streamlit run app.py
```

# High-Level Design (HLD)

**HLD Diagram (Block View)**

```text
+---------------------+
| Cryptocurrency Data |
| (OHLC, Volume, MC)  |
+----------+----------+
           |
           v
+---------------------+
| Data Preprocessing  |
| - Missing values    |
| - Scaling           |
+----------+----------+
           |
           v
+---------------------+
| Feature Engineering |
| - Rolling Volatility|
| - MA, ATR, BB       |
+----------+----------+
           |
           v
+---------------------+
| ML Model Training   |
| (XGBoost / RF)      |
+----------+----------+
           |
           v
+---------------------+
| Model Evaluation    |
| RMSE, MAE, R²       |
+----------+----------+
           |
           v
+---------------------+
| Deployment Layer    |
| Streamlit / Flask   |
+---------------------+
```


# Low-Level Design (LLD)

```text
+-----------------------------+
|         data_loader.py      |
| - load_csv()                |
| - parse_dates()             |
+-------------+---------------+
              |
              v
+-----------------------------+
|     preprocessing.py        |
| - handle_missing_values()   |
| - normalize_features()      |
+-------------+---------------+
              |
              v
+-----------------------------+
|    feature_engineering.py   |
| - compute_log_returns()     |
| - rolling_volatility()      |
| - liquidity_ratio()         |
+-------------+---------------+
              |
              v
+-----------------------------+
|         model.py            |
| - train_model()             |
| - save_model()              |
+-------------+---------------+
              |
              v
+-----------------------------+
|       evaluation.py         |
| - RMSE()                    |
| - MAE()                     |
| - R2_score()                |
+-------------+---------------+
              |
              v
+-----------------------------+
|       deployment.py         |
| - load_model()              |
| - predict_volatility()      |
+-----------------------------+
```
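The LLD names the functions in `feature_engineering.py` but not their signatures. The following sketch of that module assumes that each function takes and returns pandas objects; these signatures are assumptions, and `build_features` is a hypothetical helper that does not appear in the diagram. The calculations match the per-coin versions used in the Feature Engineering section.

```python
"""feature_engineering.py: sketch of the functions named in the LLD (signatures assumed)."""
import numpy as np
import pandas as pd


def compute_log_returns(df: pd.DataFrame) -> pd.Series:
    """Daily log return of 'close', computed separately for each 'symbol'."""
    return np.log(df['close']).groupby(df['symbol']).diff()


def rolling_volatility(returns: pd.Series, symbols: pd.Series, window: int = 14) -> pd.Series:
    """Rolling sample standard deviation of returns over `window` rows, per symbol."""
    return returns.groupby(symbols).transform(lambda s: s.rolling(window).std())


def liquidity_ratio(df: pd.DataFrame) -> pd.Series:
    """Volume divided by market cap; a zero market cap yields NaN instead of inf."""
    return df['volume'] / df['market_cap'].replace(0, np.nan)


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sort per coin in time, then add the three features."""
    out = df.sort_values(['symbol', 'date'])
    out['log_return'] = compute_log_returns(out)
    out['volatility'] = rolling_volatility(out['log_return'], out['symbol'])
    out['liquidity'] = liquidity_ratio(out)
    return out
```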


# Pipeline Architecture Diagram

```text
Raw Data
   ↓
Data Cleaning
   ↓
EDA
   ↓
Feature Engineering
   ↓
Train-Test Split
   ↓
Model Training
   ↓
Hyperparameter Tuning
   ↓
Evaluation
   ↓
Saved Model
   ↓
Prediction Interface
```


# Deployment Architecture Diagram

```text
+-------------+
|   User UI   |
| (Browser)   |
+------+------+
       |
       v
+------------------------+
| Streamlit App          |
| Input Features         |
+-----------+------------+
            |
            v
+------------------------+
| Trained ML Model       |
| (volatility_model.pkl) |
+-----------+------------+
            |
            v
+------------------------+
| Volatility Output      |
+------------------------+
```


# Tools & Technologies Used

* Python
* Pandas, NumPy
* scikit-learn
* XGBoost
* Matplotlib, Seaborn
* Streamlit
* joblib (model persistence)

# Project Outcomes

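* Built a model that predicts cryptocurrency volatility (the 14-day rolling standard deviation of daily log returns) from historical market data
* Identified market features associated with volatility through exploratory and correlation analysis
* Provided a deployable, interactive prediction app built with Streamlit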
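The notebook does not show how influential features were ranked. One common approach is to read a fitted model's `feature_importances_`; both scikit-learn's `RandomForestRegressor` and `XGBRegressor` expose this attribute. The sketch below uses synthetic data in which, by construction, `hl_spread` matters most; the data and the names `X_demo` and `rf_demo` are illustrative only. Impurity-based importances can favor correlated or high-cardinality features, so `sklearn.inspection.permutation_importance` on the test set is a useful cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features: by construction, only 'hl_spread' and 'liquidity' drive the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(600, 4)),
                      columns=['hl_spread', 'liquidity', 'ma_14', 'volume'])
y_demo = 2.0 * X_demo['hl_spread'] + 0.5 * X_demo['liquidity'] + rng.normal(scale=0.1, size=600)

rf_demo = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; larger values mean splits on that feature reduced error more
importances = pd.Series(rf_demo.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```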

# Limitations & Future Scope
# Limitations

* Model trained on historical data only
* Market news and sentiment not included

# Future Enhancements

* Deep learning models (LSTM)
* Real-time data integration
* Cloud deployment
* Sentiment analysis from news and social media

# Conclusion

The project demonstrates how machine learning techniques can be applied to predict cryptocurrency volatility from historical market data.
The resulting system can help traders and investors understand market risk and make more informed decisions.