# High-Level Design (HLD) Document

**Project Title**: Cryptocurrency Liquidity Prediction for Market Stability

**Objective**: To build a machine learning model that predicts the liquidity of cryptocurrencies using historical price, volume, and market cap data.

**System Overview**: The system will ingest historical cryptocurrency data, perform exploratory data analysis (EDA), train a machine learning model to predict the volume-to-market-cap ratio (as a liquidity indicator), and evaluate the model's performance.

**Components**:
1. Data Ingestion: CSV files from CoinGecko.
2. Data Preprocessing: Cleaning, missing-value handling, and data type validation.
3. EDA: Visualization of relationships and correlations.
4. Feature Engineering: Creation of the liquidity ratio (24h volume divided by market cap).
5. Model Training: RandomForestRegressor.
6. Model Evaluation: MAE, RMSE, R².
7. Artifacts: Cleaned dataset, trained model, Jupyter notebooks.
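
The liquidity indicator named in the components above can be written out explicitly. The symbols $V_i$, $M_i$, and $L_i$ are notation introduced here for illustration only; the column names in the description come from the training code in this document.

```latex
L_i = \frac{V_i}{M_i}
\qquad\text{e.g.}\qquad
V_i = 200,\; M_i = 1000 \;\Rightarrow\; L_i = \frac{200}{1000} = 0.2
```

Here $V_i$ is the 24-hour trading volume of coin $i$ (column `24h_volume`), $M_i$ is its market cap (column `mkt_cap`), and $L_i$ is the value stored as `volume_mktcap_ratio`. The numbers in the worked example are hypothetical. Both inputs to the ratio also appear in the model's feature list, so under this definition the target is an exact function of two of the features.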

**Technology Stack**:

* Python
* Jupyter Notebook
* pandas, scikit-learn, seaborn, matplotlib

**Users**:

* Data Scientists
* Financial Analysts

**Deliverables**:

* Cleaned data file
* EDA notebook
* Model training notebook
* Trained model (pickle format)
* Documentation (HLD, LLD, reports)

# Low-Level Design (LLD) Document

**Modules & Functions**:
1. load_data()
    * Reads and merges CSV files into a single DataFrame.
2. clean_data(df)
    * Fills missing values in numeric columns with the column mean.
    * Ensures consistent data types across columns.
3. feature_engineering(df)
    * Adds volume-to-market-cap ratio as liquidity feature.
4. perform_eda(df)
    * Generates correlation heatmap.
    * Describes distribution of features.
5. train_model(X, y)
    * Splits data into training and test sets (80/20).
    * Trains a RandomForestRegressor (100 trees).
    * Returns the trained model and evaluation metrics.
6. evaluate_model(model, X_test, y_test)
    * Computes MAE, RMSE, R².
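
The first three modules above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the `sources` parameter on `load_data` is hypothetical (the design lists `load_data()` with no arguments), "merging" is interpreted as stacking rows, the toy CSV contents are invented, and treating a zero market cap as missing is only one possible choice. The column names `24h_volume`, `mkt_cap`, and `volume_mktcap_ratio` follow the training code in this document.

```python
import io

import numpy as np
import pandas as pd


def load_data(sources):
    """Read each CSV source (path or file-like object) and stack the rows."""
    frames = [pd.read_csv(src) for src in sources]
    return pd.concat(frames, ignore_index=True)


def clean_data(df):
    """Fill missing numeric values with the column mean; other columns are untouched."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out


def feature_engineering(df):
    """Add the volume-to-market-cap ratio as the liquidity feature."""
    out = df.copy()
    # Assumption: a zero market cap is treated as undefined instead of dividing by zero.
    out["volume_mktcap_ratio"] = out["24h_volume"] / out["mkt_cap"].replace(0, np.nan)
    return out


# Invented toy data standing in for the two daily CSV files.
day1 = io.StringIO("coin,price,24h_volume,mkt_cap\nAAA,10.0,200.0,1000.0\nBBB,2.0,,400.0\n")
day2 = io.StringIO("coin,price,24h_volume,mkt_cap\nAAA,11.0,300.0,1200.0\n")

df = feature_engineering(clean_data(load_data([day1, day2])))
print(df[["coin", "24h_volume", "mkt_cap", "volume_mktcap_ratio"]])
```

In the toy data, the missing volume for `BBB` is filled with the mean of the other two rows before the ratio is computed, which shows why the order of cleaning and feature engineering matters.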

**Data Flow**:
CSV files -> DataFrame -> Cleaned Data -> EDA/Feature Engineering -> ML Model -> Predictions & Evaluation

**File Structure**:
* coin_gecko_2022-03-16.csv
* coin_gecko_2022-03-17.csv
* cleaned_crypto_data.csv
* crypto_eda.ipynb
* crypto_model.ipynb
* crypto_liquidity_model.pkl
* documentation (PDF/DOCX export)

**Security & Integrity**:
* Validations for missing/invalid data
* Consistent data types across stages
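
The two checks above might look something like the sketch below. The function name `validate_frame`, the rule that volume and market cap must be non-negative, and the sample frames are assumptions made for illustration; only the column names are taken from the training code in this document.

```python
import pandas as pd

# Columns the training code expects to be numeric.
REQUIRED_NUMERIC = ["price", "1h", "24h", "7d", "24h_volume", "mkt_cap"]


def validate_frame(df):
    """Return a list of human-readable problems; an empty list means the frame passed."""
    problems = []
    for col in REQUIRED_NUMERIC:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"non-numeric dtype in {col}: {df[col].dtype}")
            continue
        n_missing = int(df[col].isna().sum())
        if n_missing:
            problems.append(f"{n_missing} missing value(s) in {col}")
    # Assumption: volume and market cap can never be negative.
    for col in ("24h_volume", "mkt_cap"):
        if col in df.columns and pd.api.types.is_numeric_dtype(df[col]):
            if (df[col] < 0).any():
                problems.append(f"negative value(s) in {col}")
    return problems


# Invented sample frames: one clean, one with three deliberate defects.
good = pd.DataFrame({
    "price": [1.0, 2.0], "1h": [0.01, -0.02], "24h": [0.03, 0.01],
    "7d": [0.10, -0.05], "24h_volume": [100.0, 50.0], "mkt_cap": [1000.0, 500.0],
})
bad = good.copy()
bad["price"] = ["1.0", "n/a"]          # wrong dtype
bad["7d"] = [float("nan"), -0.05]      # missing value
bad["mkt_cap"] = [-5.0, 500.0]         # invalid value

good_problems = validate_frame(good)
bad_problems = validate_frame(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising on the first one lets the same check run at each pipeline stage and report everything that is wrong in one pass.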

**Assumptions**:
* Volume and market cap data are correctly scaled.
* Time-related features are not essential for this model.

**Limitations**:
* Small dataset (two daily snapshots: 2022-03-16 and 2022-03-17)
* Not time-series based
* No external economic indicators considered

In [None]:
from nbformat import v4, write  # v4 builds notebook cells; write saves the notebook
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import joblib
import nbformat

# Load the dataset
df = pd.read_csv('cleaned_crypto_data.csv')

# Features and target
features = ['price', '1h', '24h', '7d', '24h_volume', 'mkt_cap']
target = 'volume_mktcap_ratio'

X = df[features]
y = df[target]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions and evaluation
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# Save model
model_path = "/mnt/data/crypto_liquidity_model.pkl"
joblib.dump(model, model_path)

nb_model = nbformat.v4.new_notebook()
nb_model.cells = [
    v4.new_markdown_cell("# Model Training - Predicting Liquidity (volume to market cap ratio)"),
    v4.new_code_cell("import pandas as pd\nfrom sklearn.model_selection import train_test_split\n"
                     "from sklearn.ensemble import RandomForestRegressor\nfrom sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n"
                     "import joblib\n\n"
                     "df = pd.read_csv('cleaned_crypto_data.csv')\n\n"
                     "features = ['price', '1h', '24h', '7d', '24h_volume', 'mkt_cap']\n"
                     "target = 'volume_mktcap_ratio'\n"
                     "X = df[features]\ny = df[target]\n"
                     "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n"
                     "model = RandomForestRegressor(n_estimators=100, random_state=42)\nmodel.fit(X_train, y_train)\n"
                     "y_pred = model.predict(X_test)\n\n"
                     "mae = mean_absolute_error(y_test, y_pred)\n"
                     "rmse = mean_squared_error(y_test, y_pred) ** 0.5\n"
                     "r2 = r2_score(y_test, y_pred)\n\n"
                     "print(f'MAE: {mae:.4f}\\nRMSE: {rmse:.4f}\\nR²: {r2:.4f}')\n\n"
                     "joblib.dump(model, 'crypto_liquidity_model.pkl')"),
]

notebook_model_path = "/mnt/data/crypto_model.ipynb"
with open(notebook_model_path, 'w', encoding='utf-8') as f:
    write(nb_model, f)

notebook_model_path, model_path
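
The three metrics computed in the cell above can be cross-checked with plain NumPy. The sketch below is an illustration only: the helper name `regression_metrics` and the sample values are hypothetical, and the formulas are the standard definitions of MAE, RMSE, and R², not code taken from the project.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² from their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot                       # undefined if y_true is constant
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical liquidity ratios and predictions.
metrics = regression_metrics([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.37])
print(metrics)
```

Because MAE and RMSE are in the same units as the target, they should be read against the typical size of `volume_mktcap_ratio` rather than as absolute scores.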
