
This project predicts Benzene (C6H6) concentration using the UCI Air Quality dataset. It involves cleaning the data, training Linear Regression, Random Forest (RF), and Gradient Boosting (GB) models, and evaluating their performance. The RF and GB models both reached R² = 0.998, and feature-importance analysis pointed to potential data leakage from a key feature.


anandmsak/Environmental-Monitoring-Project_Week-2


Week 2 – AICTE Internship Milestone (GitHub Repo)

This repository tracks Week 2 work for the AICTE internship on the AI/ML track.

You'll upload this repo link to the LMS "Week 2" submission.

Project Scaffold

The project structure remains the same, but now includes a models/ directory for your saved machine learning model.

.
├── notebooks/          # Jupyter notebooks (Week2_Environmental_Monitoring.ipynb)
├── src/                # Python source code (.py)
├── models/             # Saved models (e.g., rf_model.pkl)
├── reports/
│   └── figures/        # Charts saved from notebook
├── data/               # Local data (kept out of Git)
├── requirements.txt    # Python deps (AI/ML track)
├── LICENSE             # MIT License
├── .gitignore          # Ignore junk / large data
└── README.md           # You are here

Week 2 Scope (Suggested)

AI/ML: Clean and prepare the dataset, train multiple advanced models (e.g., Random Forest, Gradient Boosting), evaluate their performance with metrics (RMSE, R², MAE), visualize the results (e.g., actual vs. predicted, feature importance), and save the best model file.

Power BI: Refine the data model, add DAX calculated columns/measures, implement drill-through or advanced filtering, and publish the report to the Power BI service.
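For the AI/ML scope above, the train-and-evaluate loop could look roughly like the following sketch. It runs on synthetic stand-in data rather than the UCI dataset, and the hyperparameters, split ratio, and variable names are illustrative assumptions, not the notebook's actual settings. RMSE is computed as the square root of the mean squared error.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the UCI dataset): a partly nonlinear target with noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(500, 3))
y = 2.0 * X[:, 0] + 3.0 * np.sin(X[:, 1]) + rng.normal(0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed model settings, chosen only to keep the sketch fast and reproducible.
models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "R2": float(r2_score(y_test, pred)),
    }

# One row per model, so the three can be compared side by side.
for name, m in results.items():
    print(f"{name:>18}: RMSE={m['RMSE']:.3f}  MAE={m['MAE']:.3f}  R2={m['R2']:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; the numbers printed here describe the synthetic data only and say nothing about the Benzene results.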

How to Run (AI/ML)

Create a virtual environment and install dependencies:

    pip install -r requirements.txt

Put the raw AirQualityUCI.csv file into the data/ directory.
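Once the file is in place, loading it might look like the sketch below. It assumes the CSV as commonly distributed by UCI: semicolon-separated, with decimal commas, trailing empty columns, and -200 used as the missing-value marker. The helper name `load_air_quality` is hypothetical, and the tiny in-memory sample only mimics that layout so the snippet runs without the real file.

```python
import io

import numpy as np
import pandas as pd


def load_air_quality(source):
    """Read an AirQualityUCI-style CSV and mark missing readings as NaN."""
    # Assumed file layout: ';' as field separator and ',' as decimal mark.
    df = pd.read_csv(source, sep=";", decimal=",")
    df = df.dropna(axis=1, how="all")  # drop empty trailing columns
    df = df.dropna(axis=0, how="all")  # drop fully empty rows
    return df.replace(-200, np.nan)    # -200 is assumed to flag a missing value


# Tiny made-up sample in the same layout; with the real data you would call
# load_air_quality("data/AirQualityUCI.csv") instead.
sample = io.StringIO(
    "Date;Time;CO(GT);C6H6(GT);T;;\n"
    "10/03/2004;18.00.00;2,6;11,9;13,6;;\n"
    "10/03/2004;19.00.00;-200;9,4;13,3;;\n"
    ";;;;;;\n"
)
df = load_air_quality(sample)
print(df)
```

Whether rows with missing readings are then dropped or imputed is a separate choice made in the notebook; this sketch only makes the gaps visible as NaN.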

Use notebooks/Week2_Environmental_Monitoring.ipynb for the complete workflow from data cleaning to model evaluation.

The notebook will automatically save charts to reports/figures/ and the trained model to models/.
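The model-saving step can be sketched with joblib as follows. This is a minimal illustration on synthetic data, not the notebook's code: the file name rf_model.pkl follows the example given in this README, while the model settings are assumptions. A temporary directory stands in for the repository root so the snippet leaves no files behind.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (NOT the UCI dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))
y = X @ np.array([1.0, 2.0, 0.0, -1.0])

model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# In the repository the target would be models/ under the project root;
# a temporary directory is used here to keep the sketch side-effect free.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "rf_model.pkl"
    model_path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The reloaded model should reproduce the original predictions exactly.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print("Round trip OK:", same_predictions)
```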

Improvisations (for LMS comment box)

In the LMS Week 2 submission, add a brief note about your specific accomplishments, for example:

"Compared three regression models (Linear, RF, GB) and evaluated them using RMSE and R²."

"Analyzed feature importances from the Random Forest model and identified PT08.S2(NMHC) as a key predictor, noting potential data leakage."

"Created visualizations for model performance and data distribution, saving the best model using joblib."
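One way to probe the kind of leakage mentioned in the second note is to read the Random Forest feature importances and then retrain without the dominant feature. The sketch below does this on synthetic data: the `sensor` column is deliberately constructed as a near-deterministic function of the target and merely plays the role that PT08.S2(NMHC) is suspected of playing. All names and numbers are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the UCI dataset).
rng = np.random.default_rng(7)
n = 600
target = rng.uniform(0.5, 40.0, size=n)
# "sensor" is almost a deterministic function of the target (the simulated leak).
sensor = 400.0 + 150.0 * np.sqrt(target) + rng.normal(0, 2.0, size=n)
temperature = rng.normal(18.0, 8.0, size=n)                    # unrelated to target
humidity = 50.0 - 0.3 * target + rng.normal(0, 10.0, size=n)   # weakly related

feature_names = ["sensor", "temperature", "humidity"]
X = np.column_stack([sensor, temperature, humidity])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, target, test_size=0.25, random_state=7
)


def fit_and_score(columns):
    """Train on the chosen feature columns and return (model, test R²)."""
    model = RandomForestRegressor(n_estimators=100, random_state=7)
    model.fit(X_tr[:, columns], y_tr)
    return model, r2_score(y_te, model.predict(X_te[:, columns]))


full_model, r2_full = fit_and_score([0, 1, 2])
_, r2_without = fit_and_score([1, 2])  # same setup, suspect feature removed

importances = dict(zip(feature_names, full_model.feature_importances_))
top_feature = max(importances, key=importances.get)

print("Top feature:", top_feature)
print(f"R² with all features: {r2_full:.3f}")
print(f"R² without '{top_feature}': {r2_without:.3f}")
```

A single feature carrying almost all of the importance, combined with a sharp drop in R² once it is removed, is a signal worth investigating. It does not prove leakage on its own; that depends on whether the feature would really be available at prediction time.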

Checklist (Before Submitting GitHub Link to LMS)

[ ] Repo is Public and named Week 2.

[ ] README.md is updated with your project details, steps, and outcomes.

[ ] At least one notebook is in notebooks/ showing the full training and evaluation process.

[ ] The best-performing model is saved in the models/ directory (e.g., rf_model.pkl).

[ ] All visualization outputs (.png files) are saved in reports/figures/.

[ ] Commit message is clear, e.g., feat(week2): model training and evaluation.

Meta

Owner: Anandha Krishnan P (Intern)

Date: 2025-09-06

Project: Environmental Monitoring & Pollution Control (AI/ML)

This repository contains the Week 2 notebook Week2_Environmental_Monitoring.ipynb, which focuses on training and evaluating machine learning models to predict Benzene (C6H6(GT)) concentration from sensor data. The key outcomes include a performance comparison of Linear Regression, Random Forest, and Gradient Boosting models, along with visualizations of feature importance and prediction accuracy.
