Air Pollution in Madrid

Data Science project

King Faisal University

Project Description

This project builds a model to predict air quality in Madrid from about 170k rows of hourly data recorded between 2001 and 2022.

Dataset

The dataset was downloaded from Kaggle and can be found here

Project Structure

├── data
│   ├── MadridPolution2001-2022.csv
│   └── MadridPolution2001-2022_cleaned.csv
│
├── notebooks
│   ├── 1 - EDA - Air pollution.ipynb
│   ├── 2 - Data preprocessing - Air pollution.ipynb
│   └── 3 - Modelling implementation & assessment Air pollution.ipynb
│
└── README.md

Project Steps

Exploratory Data Analysis
1. Found that the data was not clean and needed cleaning
2. Removed the outliers
3. Computed the correlation between the features
4. Examined the distribution of the features
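
As a rough illustration of the EDA steps above, here is a minimal sketch on synthetic data. The column names (`NO2`, `CO`, `O3`), the helper `remove_outliers_iqr`, and the choice of the 1.5 x IQR rule are all assumptions made for the example; the notebook may use different columns and a different outlier criterion.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the hourly readings; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 1_000
no2 = rng.normal(40.0, 10.0, n)
df = pd.DataFrame({
    "NO2": no2,
    "CO": 0.01 * no2 + rng.normal(0.0, 0.05, n),  # loosely tied to NO2
    "O3": rng.normal(50.0, 15.0, n),
})
df.loc[:4, "NO2"] = 500.0  # inject a few obvious outliers


def remove_outliers_iqr(frame, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


cleaned = remove_outliers_iqr(df, "NO2")
corr = cleaned.corr()         # pairwise Pearson correlation between features
summary = cleaned.describe()  # numeric summary of each feature's distribution
print(corr.round(2))
print(summary.round(2))
```

The IQR rule is only one common option; a z-score threshold or domain-specific limits would slot into the same place.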
Data Preprocessing
1. Filled the missing values
2. Grouped the data by year and month
3. Computed the mean for each year
4. Split the data into training and test sets
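
The preprocessing steps above could look roughly like the following sketch. It runs on a synthetic hourly series, and several choices are assumptions rather than the notebook's actual method: the `NO2` column name, time-based interpolation as the fill strategy, and a chronological 80/20 split.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering two years; "NO2" is a hypothetical column name.
rng = np.random.default_rng(1)
n_hours = 24 * 730
timestamps = pd.Timestamp("2001-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
df = pd.DataFrame({"NO2": rng.normal(40.0, 10.0, n_hours)}, index=timestamps)
df.iloc[::50, 0] = np.nan  # knock out some readings

# 1. Fill gaps by time-based interpolation, then back/forward-fill the edges.
filled = df.interpolate(method="time").bfill().ffill()

# 2. Group by year and month, taking the mean of each group.
monthly = filled.groupby([filled.index.year, filled.index.month]).mean()
monthly.index.names = ["year", "month"]

# 3. Mean of each year.
yearly = filled.groupby(filled.index.year).mean()

# 4. Chronological split: the last 20% of months are held out for testing.
split = int(len(monthly) * 0.8)
train, test = monthly.iloc[:split], monthly.iloc[split:]
print(len(train), len(test))
```

A chronological split avoids training on months that come after the test period; a random split is the other common choice and may be what the notebook uses.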
Modelling implementation & assessment
1. Built 3 models
  1. Logistic Regression
  2. Random Forest
  3. XGBoost
2. Evaluated the models
  1. accuracy
  2. precision
  3. recall
  4. f1-score
3. Chose the best model: XGBoost
  1. accuracy: 0.77
  2. precision: 0.77
  3. recall: 0.77
  4. f1-score: 0.77
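
The comparison described above can be sketched as follows. This is not the notebook's code: the features and the three-level label are synthetic, the hyperparameters are arbitrary, weighted averaging of the metrics is an assumption, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in when `xgboost` is not installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier  # optional dependency
    boosted = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    # Stand-in so the sketch still runs without xgboost installed.
    boosted = GradientBoostingClassifier(random_state=0)

# Synthetic features and a hypothetical three-level air-quality label.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-0.5, 0.5])  # classes 0, 1, 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "XGBoost": boosted,
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, pred, average="weighted", zero_division=0),
        "f1": f1_score(y_test, pred, average="weighted"),
    }

# Pick the model with the highest weighted F1 on the held-out set.
best = max(results, key=lambda name: results[name]["f1"])
for name, scores in results.items():
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
print("best:", best)
```

With weighted averaging, recall always equals accuracy, so near-identical values across the four metrics are not unusual.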

Project Results

The best model is XGBoost, with an accuracy, precision, recall, and f1-score of 0.77 each.

Turki-Moha/DS-project
