This project analyzes and predicts acid rain behavior (specifically pH levels) across different regions of India using air quality and chemical ion concentration data. The solution includes data cleaning, feature engineering, exploratory analysis, machine learning modeling, and clustering to identify pollution hotspots and their correlation with acidic precipitation.
- Name:
acid_rain_pan_india_cleaned.csv - Size: ~30,000+ records
- Features Include:
- SO2, NOx, PM2.5, RSPM, SPM
- pH levels (target variable)
- Sulfate, Nitrate, Ammonium, Calcium, Chloride
- TDS, Conductivity
- Location, State, Season
- Predict the pH level of rainfall using pollutant concentrations.
- Identify which cities are most affected by acid rain precursors (SO2, NOx).
- Use clustering to detect geographical zones at high risk.
- Build ML models to quantify environmental damage potential.
- Python 3.x
- pandas, numpy, seaborn, matplotlib
- scikit-learn (Linear Regression, Random Forest, KMeans)
- Heatmaps for feature correlation
- Seasonal boxplots of pH levels
- Top 10 cities by SOβ pollution
- Scatter plots with clustering on SOβ vs pH
- Used for baseline prediction of pH
- RΒ² Score: ~0.70
- Captures nonlinear interactions between pollutants and pH
- RΒ² Score: ~0.88 (Best Performing)
- Clustered cities based on pollution and pH risk
- 4 risk clusters visually separated on SOβ vs pH plots
- Mean Absolute Error (MAE): ~0.18
- Mean Squared Error (MSE): ~0.06
- RΒ² Score (Accuracy):
- Linear Regression: 70%
- Random Forest: 88%
βββ Project.ipynb
βββ acid_rain_pan_india_cleaned.csv
βββ project.py
βββ README.md