Turki-Moha/DS-project

Air Pollution in Madrid

Data Science project

King Faisal University

Project Description

This project builds a model to predict air quality in Madrid from about 170k rows of hourly data recorded between 2001 and 2022.

Dataset

The dataset is downloaded from Kaggle and can be found here

Project Structure

├── data
│   ├── MadridPolution2001-2022.csv
│   └── MadridPolution2001-2022_cleaned.csv
│
├── notebooks
│   ├── 1 - EDA - Air pollution.ipynb
│   ├── 2 - Data preprocessing - Air pollution.ipynb
│   └── 3 - Modelling implementation & assessment Air pollution.ipynb
│
└── README.md

Project Steps

  1. Exploratory Data Analysis
    1. Found that the data is not clean and needs to be cleaned
    2. Remove the outliers
    3. Find the correlation between the features
    4. Find the distribution of the features
  2. Data Preprocessing
    1. Fill the missing values
    2. Group the data by year and month
    3. Get the mean of each year
    4. Split the data into train and test sets
  3. Modelling implementation & assessment
    1. Build 3 models
      1. Logistic Regression
      2. Random Forest
      3. XGBoost
    2. Evaluate the models
      1. accuracy
      2. precision
      3. recall
      4. f1-score
    3. Choose the best model
      1. XGBoost
      2. accuracy: 0.77
      3. precision: 0.77
      4. recall: 0.77
      5. f1-score: 0.77
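
The cleaning and preprocessing steps above could look roughly like the following sketch. It is not the code from the notebooks. It runs on synthetic hourly data, and the column name `NO2`, the IQR outlier rule, time-based interpolation, and the 80/20 chronological split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows outside [Q1 - k*IQR, Q3 + k*IQR]; rows with NaN are kept for filling."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[in_range | df[column].isna()]


def monthly_means(df: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps by time interpolation, then average per (year, month)."""
    filled = df.interpolate(method="time").ffill().bfill()
    grouped = filled.groupby([filled.index.year, filled.index.month]).mean()
    grouped.index.names = ["year", "month"]
    return grouped


# Synthetic stand-in for the hourly data: 90 days starting 2001-01-01.
rng = np.random.default_rng(0)
index = pd.date_range("2001-01-01", periods=24 * 90, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame({"NO2": rng.normal(40.0, 10.0, len(index))}, index=index)  # "NO2" is a hypothetical column
raw.iloc[100:110, 0] = np.nan   # simulate a gap in the sensor record
raw.iloc[500, 0] = 1000.0       # simulate an implausible spike

cleaned = remove_outliers_iqr(raw, "NO2")
monthly = monthly_means(cleaned)

# Chronological split: earlier months for training, later months for testing.
split = int(len(monthly) * 0.8)
train, test = monthly.iloc[:split], monthly.iloc[split:]
print(monthly)
```

Splitting by time rather than at random keeps later observations out of the training set, which is one common choice for time-ordered data; the README does not say which split the notebooks use.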

Project Results

The best model is XGBoost, with an accuracy of 0.77.
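
For reference, the four reported metrics can be computed from true and predicted labels as in the sketch below. It assumes a binary target and uses made-up labels; the README does not state how many classes the model predicts or how the per-class scores were averaged, so this illustrates the definitions only and does not reproduce the 0.77 figures.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = classification_scores(y_true, y_pred)
print(scores)
```

In this toy example the numbers of false positives and false negatives are equal, so all four metrics coincide, which shows one way a model can report identical values across them.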