csdeepak/ML_mini_project

Sediment Particle Size Prediction (sedpred-project)

UE23CS352A — Machine Learning Mini Project

Team Members:

  • C S Deepak
  • Dareddy Devesh Reddy

Project Duration: Sept 29 – Oct 13, 2025
Faculty: D Uma

Project Overview

This project replicates the research paper:

“Machine Learning for Predicting Sediment Particle Size Distributions” by Galen Egan, Stanford University

We apply machine learning models to predict two quantities from water-quality and hydrodynamic features such as salinity, temperature, and velocity:

  • Median sediment particle diameter (d₅₀)
  • Particle size distribution variance (σ²)

Objectives

  • Implement Random Forest and SVR models to estimate d₅₀ and σ²
  • Analyze feature importance
  • Compare model performance using R² scores
  • Visualize prediction accuracy and model stability (OOB curve)

Dataset

Since the original dataset link is no longer available, we generated a synthetic dataset (~1648 samples) that simulates real marine sediment data.

| Feature | Description |
|---|---|
| S | Salinity (ppt) |
| ub | Near-bottom wave velocity (m/s) |
| np | Particle refractive index |
| T | Water temperature (°C) |
| a676/a650 | Organic peak ratio |
| a450/a676 | Inorganic peak ratio |
| chl_a | Chlorophyll-a concentration |
| u | Mean tidal velocity |

Targets:

  • d₅₀: Median particle diameter
  • σ²: Particle size distribution variance
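As a rough illustration of how a synthetic table with these columns could be produced, here is a minimal sketch. The value ranges, the target formulas, the column names `d50` and `sigma2`, and the helper `make_synthetic` are all hypothetical placeholders chosen for the example; the project's actual generator is the one in 01_EDA.ipynb and may differ substantially.

```python
import numpy as np
import pandas as pd

# Hypothetical value ranges, chosen only for illustration; they are not
# taken from the project or from the original paper.
FEATURE_RANGES = {
    "S": (0.0, 35.0),          # salinity (ppt)
    "ub": (0.0, 0.5),          # near-bottom wave velocity (m/s)
    "np": (1.05, 1.20),        # particle refractive index
    "T": (10.0, 25.0),         # water temperature (°C)
    "a676/a650": (0.8, 1.2),   # organic peak ratio
    "a450/a676": (1.0, 3.0),   # inorganic peak ratio
    "chl_a": (0.0, 20.0),      # chlorophyll-a concentration
    "u": (0.0, 1.0),           # mean tidal velocity
}


def make_synthetic(n_samples=1648, seed=0):
    """Draw uniform features and build two noisy placeholder targets."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        name: rng.uniform(lo, hi, n_samples)
        for name, (lo, hi) in FEATURE_RANGES.items()
    })
    # Made-up relationships plus Gaussian noise; not the paper's physics.
    noise = rng.normal(0.0, 1.0, (n_samples, 2))
    df["d50"] = 50 + 40 * df["ub"] - 0.5 * df["S"] + 2 * noise[:, 0]
    df["sigma2"] = 1.0 + 0.05 * df["T"] + 0.1 * df["chl_a"] + 0.1 * noise[:, 1]
    return df


df = make_synthetic()
```

Fixing the seed keeps the generated table reproducible, which matters when the R² scores are later compared across models.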

⚙️ Model Pipeline

1️⃣ Data Preprocessing

01_EDA.ipynb

  • Generated synthetic dataset → data/synthetic_data.csv
  • Cleaned the data and standardized the features with StandardScaler
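The scaling step can be sketched as follows. This is a minimal example on a placeholder feature matrix, and it assumes a train/test split that the notebook may or may not use; the point it illustrates is fitting the scaler on training rows only and reusing those statistics on held-out rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix (200 rows, 8 features); stands in for the
# real columns loaded from the synthetic dataset.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 8))

# Assumed 75/25 split; the project's actual split is not stated.
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training rows
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Standardization mainly matters for SVR, whose kernel is sensitive to feature scale; tree-based models such as Random Forest are largely unaffected by it.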

2️⃣ Modeling

02_models.ipynb

  • RandomForestRegressor (n_estimators=32, oob_score=True)
  • Support Vector Regressor (SVR) (C=2048, epsilon=4)
  • R² score for each target variable
  • Out-of-Bag (OOB) score curve
  • Feature importance analysis
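The modeling steps listed above could look roughly like the sketch below for a single target. The hyperparameters are the ones stated in this README; the placeholder data, the 75/25 split, `random_state`, and wrapping SVR in a scaling pipeline are assumptions made for the example, not a description of 02_models.ipynb.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data: 8 features and one target standing in for d50 or sigma^2.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 8))
y = 100 * X[:, 0] + 50 * X[:, 1] ** 2 + rng.normal(0.0, 2.0, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Random Forest with the README's settings; oob_score=True also gives an
# out-of-bag R^2 estimate without needing a separate validation set.
rf = RandomForestRegressor(n_estimators=32, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# SVR with the README's C and epsilon; features are standardized first.
svr = make_pipeline(StandardScaler(), SVR(C=2048, epsilon=4))
svr.fit(X_train, y_train)

scores = {
    "Random Forest": r2_score(y_test, rf.predict(X_test)),
    "SVR": r2_score(y_test, svr.predict(X_test)),
}
print(scores, "OOB R^2:", rf.oob_score_)
```

Because SVR predicts a single output, one model per target (d₅₀ and σ²) is needed; the same loop would simply be repeated with the other target column.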

Results

| Model | R² (d₅₀) | R² (σ²) |
|---|---|---|
| Random Forest | 0.84047 | 0.914699 |
| SVR | 0.826608 | 0.928971 |
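For reference, and assuming the scores are the standard coefficient of determination (as computed, for example, by scikit-learn's `r2_score`), R² for one target is defined as:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

Here $y_i$ is the observed value, $\hat{y}_i$ the model prediction, and $\bar{y}$ the mean of the observed values. A score of 1 means perfect prediction, and 0 means the model does no better than always predicting the mean.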

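Conclusion:
Both models perform comparably. Random Forest scores slightly higher on d₅₀ (0.840 vs. 0.827), while SVR scores slightly higher on σ² (0.929 vs. 0.915). Random Forest additionally provides feature importance estimates, which makes its predictions easier to interpret.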


Visual Outputs

All plots are saved in the figures archive (figures.zip):

  • feature_importance_d50.png
  • oob_score_curve
  • predicted vs actual
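The data behind the OOB score curve and the feature importance plot can be computed along the lines of the sketch below. It uses scikit-learn's `warm_start` option to grow one forest incrementally and record the out-of-bag R² at each size; the placeholder data, the tree counts, and the variable names are assumptions for illustration, and the plotting itself is left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["S", "ub", "np", "T", "a676/a650", "a450/a676", "chl_a", "u"]

# Placeholder data in which "ub" (column 1) dominates the target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, len(FEATURES)))
y = 100 * X[:, 1] + 30 * X[:, 0] + rng.normal(0.0, 2.0, 400)

# warm_start=True keeps already-fitted trees and only adds new ones on each
# fit call, so the curve is built without retraining from scratch.
rf = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)

tree_counts = list(range(16, 33, 4))  # 16, 20, 24, 28, 32 trees
oob_curve = []
for n in tree_counts:
    rf.set_params(n_estimators=n)
    rf.fit(X, y)
    oob_curve.append(rf.oob_score_)  # out-of-bag R^2 at this forest size

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(FEATURES, rf.feature_importances_), key=lambda p: p[1], reverse=True
)
```

With very few trees, some samples may never be out-of-bag and scikit-learn emits a warning; starting the curve at a moderate tree count keeps that rare. Plotting `tree_counts` against `oob_curve` gives the stability curve, and `ranking` can be fed to a bar chart.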

Repository Structure

sedpred-project/
├─ data.zip              # raw and processed datasets (not pushed to repo if large)
├─ notebooks/
│   ├─ 01_EDA.ipynb      # exploratory data analysis
│   └─ 02_models.ipynb   # baseline models (RF, SVR)
├─ figures.zip           # generated plots & figures
├─ Final_result .zip
├─ requirements.txt      # dependencies
├─ .gitignore
├─ README.md             # project documentation
└─ one_page_writeup.pdf  # final report (to be added)



Requirements

Install dependencies:

```bash
pip install -r requirements.txt
```
