aryan7781/Air-Quality-Analysis

This repository contains a data analysis of a weather dataset, carried out in a Jupyter Notebook.

The weather dataset contains the following columns :- year, month, day, hour, PM2.5 (particulate matter), temperature, pressure, rain, wind_direction, wind_speed.

Particulate matter :- PM stands for particulate matter (also called particle pollution): the term for a mixture of solid particles and liquid droplets found in the air. Some particles, such as dust, dirt, soot, or smoke, are large or dark enough to be seen with the naked eye.

Temperature :- Temperature is the degree of hotness or coldness of a body or environment; it can be measured using a thermometer.

Pressure :- Atmospheric or air pressure is the force per unit of area exerted on the Earth’s surface by the weight of the air above the surface.

Rain :- The condensed moisture of the atmosphere falling visibly in separate drops.

Wind direction :- Wind direction is defined as the direction the wind is coming from. If you stand so that the wind is blowing directly into your face, the direction you are facing names the wind. For general purposes, the wind direction is reported to eight compass points: N, NE, E, SE, S, SW, W, NW.
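
As a small illustration of how compass labels relate to bearings, the sketch below converts a label to degrees clockwise from north. It assumes the standard 16-point abbreviations (N, NNE, NE, ...), which may not match the labels actually used in the dataset; `COMPASS_16` and `to_degrees` are hypothetical names introduced here.

```python
# Standard 16-point compass, listed clockwise starting at north.
# Assumption: the dataset uses these abbreviations for wind_direction.
COMPASS_16 = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]


def to_degrees(label):
    """Return the bearing in degrees for a 16-point compass label."""
    # Each step around a 16-point compass is 360 / 16 = 22.5 degrees.
    return COMPASS_16.index(label) * 22.5


print(to_degrees("E"))    # 90.0
print(to_degrees("NNW"))  # 337.5
```

The eight general-purpose points listed above are every second entry of this list.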

Wind speed :- In meteorology, wind speed, or wind flow speed, is a fundamental atmospheric quantity caused by air moving from high to low pressure, usually due to changes in temperature.

Exploratory Data Analysis (EDA) is an approach to analyzing data that summarizes its main characteristics and helps us better understand the data set. It also allows us to quickly interpret the data and adjust different variables to see their effect. The three main steps of an EDA are :-

  • Extracting/Downloading the data from an authorized source.
  • Cleaning and processing the data
  • Performing data visualization on the cleaned data set.

We are going to analyze the Weather data set.

Our main aim is to perform data cleaning, data normalizing, testing the hypothesis, and deriving appropriate insights.

1. Importing the necessary libraries :

  • pandas, NumPy and scikit-learn

2. Reading the dataset :

  • There are 31527 samples in the dataset.
  • All columns contain a few NaN values.
  • Wind direction is a categorical variable with 16 categories representing various directions.
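
The checks above could be reproduced roughly as follows. This is a minimal sketch, not the notebook's actual code: the three-row `SAMPLE_CSV` is made-up stand-in data using the column names listed earlier, and `summarize` is a hypothetical helper. With the real file, `pd.read_csv` would be given its path instead.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for the real dataset (assumption).
SAMPLE_CSV = """year,month,day,hour,PM2.5,temperature,pressure,rain,wind_direction,wind_speed
2013,3,1,0,4.0,-0.7,1023.0,0.0,NNW,4.4
2013,3,1,1,,-1.1,1023.2,0.0,N,
2013,3,1,2,7.0,,1023.5,0.0,NNW,5.6
"""


def summarize(df):
    """Collect the sample count, NaN counts per column, and wind categories."""
    return {
        "n_samples": len(df),
        "nan_per_column": df.isna().sum().to_dict(),
        "wind_categories": sorted(df["wind_direction"].dropna().unique()),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary["n_samples"])
print(summary["nan_per_column"])
print(summary["wind_categories"])
```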

3. Cleaning and Processing the Dataset :

  • Converted all numeric data from the object data type to float/int.
  • Filled all NaN cells with suitable values.
  • Converted the given datetime data to a standard format.
  • Exported the cleaned data as Clean_Data.csv.
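
The cleaning steps above might look something like the sketch below. Several choices here are assumptions rather than the notebook's confirmed method: the README does not say which "suitable values" were used, so the median (numeric columns) and the mode (wind direction) are picked for illustration, and the datetime is assembled from the year, month, day and hour columns. `RAW_CSV`, `NUMERIC_COLS` and `clean` are hypothetical names.

```python
import io

import pandas as pd

# Made-up raw sample with a non-numeric entry and missing cells (assumption).
RAW_CSV = """year,month,day,hour,PM2.5,temperature,pressure,rain,wind_direction,wind_speed
2013,3,1,0,4.0,-0.7,1023.0,0.0,NNW,4.4
2013,3,1,1,bad,-1.1,1023.2,0.0,,2.0
2013,3,1,2,8.0,,1023.5,0.0,NNW,5.6
"""

NUMERIC_COLS = ["PM2.5", "temperature", "pressure", "rain", "wind_speed"]


def clean(df):
    """Return a cleaned copy: numeric dtypes, no NaN values, one datetime column."""
    out = df.copy()
    for col in NUMERIC_COLS:
        # Unparseable entries such as "bad" become NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Assumption: fill numeric gaps with the column median.
        out[col] = out[col].fillna(out[col].median())
    # Assumption: fill missing wind directions with the most frequent label.
    most_common = out["wind_direction"].mode().iloc[0]
    out["wind_direction"] = out["wind_direction"].fillna(most_common)
    # Assemble a single timestamp from the separate date and time columns.
    out["datetime"] = pd.to_datetime(out[["year", "month", "day", "hour"]])
    return out


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean(raw)

# With no path, to_csv returns the CSV text; passing "Clean_Data.csv"
# as the first argument would write the file instead.
csv_text = cleaned.to_csv(index=False)
print(cleaned.dtypes)
```

Whether the median or another fill value is appropriate depends on the column; for a skewed quantity such as rain, a different choice may suit the data better.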


You can access our model using this link - https://air-quality-prediction-gfg.herokuapp.com/

Contributors-