# Data Analysis Project: PRSA_Data_20130301-20170228
- **Name:** Wildan Fadhil Nazaruddin
- **Email:** wildanfadhil76@gmial.com
- **Dicoding ID:** 

## Air Quality Data

Airborne particulate matter has a significant impact on public health, and year-over-year analysis of air quality data helps guide informed decisions on how to reduce it. This project analyzes weather conditions and air quality recorded at 12 monitoring stations in Beijing, China, between March 2013 and February 2017, to better understand environmental trends and their implications. The station datasets are examined to identify patterns and provide insights that support more effective policy-making for air quality and public health. The findings may also inform broader environmental and climate strategies.

### 1.1 Classes

- PRSA_Data_Aotizhongxin: Data from the Aotizhongxin station.
- PRSA_Data_Changping: Data from the Changping station.
- PRSA_Data_Dingling: Data from the Dingling station.
- PRSA_Data_Dongsi: Data from the Dongsi station.
- PRSA_Data_Guanyuan: Data from the Guanyuan station.
- PRSA_Data_Gucheng: Data from the Gucheng station.
- PRSA_Data_Huairou: Data from the Huairou station.
- PRSA_Data_Nongzhanguan: Data from the Nongzhanguan station.
- PRSA_Data_Shunyi: Data from the Shunyi station.
- PRSA_Data_Tiantan: Data from the Tiantan station.
- PRSA_Data_Wanliu: Data from the Wanliu station.
- PRSA_Data_Wanshouxigong: Data from the Wanshouxigong station.

### 1.2 Methodology

1. **Data Collection and Cleaning:** First, we consolidate the datasets from the 12 monitoring stations. Cleaning involves handling missing values, correcting inconsistencies, and making sure all datasets follow the same structure.
2. **Descriptive Statistics:** Mean, median, standard deviation, and interquartile range (IQR) summarize the key characteristics of the particulate data (PM2.5, PM10) at each station. Histograms, box plots, and time series plots show the distribution and spread of the data.
3. **Correlation Analysis:** A Pearson or Spearman correlation analysis identifies the relationship between pollutants and weather conditions, showing how temperature, dew point, or wind speed relate to particulate levels.
4. **Trend Analysis:** Trend analysis shows how air quality changes over time (seasonally or annually) and across stations. Time series decomposition splits the data into trend, seasonal, and residual components, giving a clearer view of the underlying patterns.
5. **Geospatial Analysis:** Plotting the data on maps shows how particulate levels are distributed geographically and how air quality varies between station locations.
6. **Hypothesis Testing:** Statistical tests (such as t-tests or ANOVA) compare air quality between stations or time periods to determine whether observed differences are statistically significant.
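
The descriptive, correlation, and trend steps above can be sketched in a few lines of pandas. This is only an illustration on synthetic data: the values, the `sample` frame, and the assumed negative link between wind speed and PM2.5 are made up for the example and are not results from the PRSA dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for one station (illustrative values only).
rng = np.random.default_rng(0)
index = pd.date_range("2013-03-01", periods=24 * 90, freq="h")
wspm = rng.gamma(2.0, 1.0, len(index))
temp = rng.normal(10, 5, len(index))
# Assumption for the example: stronger wind lowers PM2.5.
pm25 = np.clip(80 - 10 * wspm + rng.normal(0, 5, len(index)), 0, None)
sample = pd.DataFrame({"PM2.5": pm25, "TEMP": temp, "WSPM": wspm}, index=index)

# Descriptive statistics: count, mean, std, quartiles, plus the IQR.
summary = sample["PM2.5"].describe()
iqr = summary["75%"] - summary["25%"]

# Pearson correlation between the pollutant and the weather variables.
corr = sample.corr()

# Simple trend view: monthly mean of PM2.5 ("MS" = month-start bins).
monthly_mean = sample["PM2.5"].resample("MS").mean()

print(summary)
print("IQR:", iqr)
print(corr)
print(monthly_mean)
```

On the real data the same calls would run on each station's DataFrame once a datetime index has been built.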

### 1.3 Deployment

Data set

## 2 Defining the Business Questions

- Question 1:
  - What are the primary trends in air quality levels (PM2.5, PM10) across the 12 monitoring stations over the observed period (2013-2017)?
- Question 2:
  - How do weather conditions (e.g., temperature, dew point, wind speed) correlate with particulate matter concentrations (PM2.5, PM10) at each station?
- Question 3:
  - Which stations show the highest and lowest levels of particulate matter, and what factors contribute to these differences?
- Question 4:
  - How do seasonal variations (e.g., winter vs. summer) affect air quality across the stations, and what are the contributing factors?
- Question 5:
  - What actionable insights can be drawn from this analysis to inform policy decisions aimed at improving air quality and reducing public health risks?


## 3 Import All Packages/Libraries Used

In [11]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

## 4 Data Wrangling

### 4.1 Gathering Data

#### 4.1.1 Load data 

##### 4.1.1.1 Setting the Target Directory

In [23]:
# Define the current working directory
current_dir = os.getcwd()

In [25]:
# List of the available CSV file names
csv_files = [
    "PRSA_Data_Aotizhongxin_20130301-20170228.csv",
    "PRSA_Data_Changping_20130301-20170228.csv",
    "PRSA_Data_Dingling_20130301-20170228.csv",
    "PRSA_Data_Dongsi_20130301-20170228.csv",
    "PRSA_Data_Guanyuan_20130301-20170228.csv",
    "PRSA_Data_Gucheng_20130301-20170228.csv",
    "PRSA_Data_Huairou_20130301-20170228.csv",
    "PRSA_Data_Nongzhanguan_20130301-20170228.csv",
    "PRSA_Data_Shunyi_20130301-20170228.csv",
    "PRSA_Data_Tiantan_20130301-20170228.csv",
    "PRSA_Data_Wanliu_20130301-20170228.csv",
    "PRSA_Data_Wanshouxigong_20130301-20170228.csv"
]


In [26]:
# Read each CSV file and store it in a DataFrame
dataframes = {}

In [27]:
for csv_file in csv_files:
    # Extract the station name (e.g. Aotizhongxin, Changping) from the file name
    location = csv_file.split('_')[2]

    # Build the full path to the file inside the "data" folder
    file_path = os.path.join(current_dir, "data", csv_file)

    # Read the CSV into a DataFrame
    df = pd.read_csv(file_path)

    # Store the DataFrame in the dictionary, keyed by station name
    dataframes[location] = df

In [44]:
aotizhongxin_df = dataframes['Aotizhongxin']
changping_df = dataframes['Changping']
dingling_df = dataframes['Dingling']
dongsi_df = dataframes['Dongsi']
guanyuan_df = dataframes['Guanyuan']
gucheng_df = dataframes['Gucheng']
huairou_df = dataframes['Huairou']
Nongzhanguan_df = dataframes['Nongzhanguan']
shunyi_df = dataframes['Shunyi']
tiantian_df = dataframes['Tiantan']
wanliu_df = dataframes['Wanliu']
Wanshouxigong_df = dataframes['Wanshouxigong']
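
As an alternative to twelve separate variables, the dictionary can be consolidated into one frame, which matches the "consolidate the datasets" step in the methodology. The sketch below uses two tiny stand-in frames instead of the real CSV files, and the name `all_stations_df` is a hypothetical choice for this example, not something defined elsewhere in the notebook.

```python
import pandas as pd

# Tiny stand-ins for the per-station DataFrames held in `dataframes`.
dataframes = {
    "Aotizhongxin": pd.DataFrame(
        {"PM2.5": [4.0, 8.0], "station": ["Aotizhongxin"] * 2}
    ),
    "Changping": pd.DataFrame(
        {"PM2.5": [3.0, 3.0], "station": ["Changping"] * 2}
    ),
}

# Stack all stations into one long DataFrame; the existing `station`
# column keeps track of where each row came from.
all_stations_df = pd.concat(list(dataframes.values()), ignore_index=True)

# With a single frame, per-station summaries become one groupby call.
station_means = all_stations_df.groupby("station")["PM2.5"].mean()
print(station_means)
```

Because every CSV already carries a `station` column, no extra key is needed when concatenating.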


##### 4.1.1.2 Viewing Data Info

###### 4.1.1.2.1 Aotizhongxin 

In [33]:
aotizhongxin_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,4.0,7.0,300.0,77.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Aotizhongxin
1,2,2013,3,1,1,8.0,8.0,4.0,7.0,300.0,77.0,-1.1,1023.2,-18.2,0.0,N,4.7,Aotizhongxin
2,3,2013,3,1,2,7.0,7.0,5.0,10.0,300.0,73.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Aotizhongxin
3,4,2013,3,1,3,6.0,6.0,11.0,11.0,300.0,72.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Aotizhongxin
4,5,2013,3,1,4,3.0,3.0,12.0,12.0,300.0,72.0,-2.0,1025.2,-19.5,0.0,N,2.0,Aotizhongxin


###### 4.1.1.2.2 Changping

In [34]:
changping_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,3.0,6.0,13.0,7.0,300.0,85.0,-2.3,1020.8,-19.7,0.0,E,0.5,Changping
1,2,2013,3,1,1,3.0,3.0,6.0,6.0,300.0,85.0,-2.5,1021.3,-19.0,0.0,ENE,0.7,Changping
2,3,2013,3,1,2,3.0,3.0,22.0,13.0,400.0,74.0,-3.0,1021.3,-19.9,0.0,ENE,0.2,Changping
3,4,2013,3,1,3,3.0,6.0,12.0,8.0,300.0,81.0,-3.6,1021.8,-19.1,0.0,NNE,1.0,Changping
4,5,2013,3,1,4,3.0,3.0,14.0,8.0,300.0,81.0,-3.5,1022.3,-19.4,0.0,N,2.1,Changping


In [35]:
dingling_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,3.0,,200.0,82.0,-2.3,1020.8,-19.7,0.0,E,0.5,Dingling
1,2,2013,3,1,1,7.0,7.0,3.0,,200.0,80.0,-2.5,1021.3,-19.0,0.0,ENE,0.7,Dingling
2,3,2013,3,1,2,5.0,5.0,3.0,2.0,200.0,79.0,-3.0,1021.3,-19.9,0.0,ENE,0.2,Dingling
3,4,2013,3,1,3,6.0,6.0,3.0,,200.0,79.0,-3.6,1021.8,-19.1,0.0,NNE,1.0,Dingling
4,5,2013,3,1,4,5.0,5.0,3.0,,200.0,81.0,-3.5,1022.3,-19.4,0.0,N,2.1,Dingling


In [36]:
dongsi_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,9.0,9.0,3.0,17.0,300.0,89.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Dongsi
1,2,2013,3,1,1,4.0,4.0,3.0,16.0,300.0,88.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Dongsi
2,3,2013,3,1,2,7.0,7.0,,17.0,300.0,60.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Dongsi
3,4,2013,3,1,3,3.0,3.0,5.0,18.0,,,-1.4,1026.2,-25.5,0.0,N,4.9,Dongsi
4,5,2013,3,1,4,3.0,3.0,7.0,,200.0,84.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Dongsi


In [37]:
guanyuan_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,14.0,20.0,300.0,69.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Guanyuan
1,2,2013,3,1,1,4.0,4.0,13.0,17.0,300.0,72.0,-1.1,1023.2,-18.2,0.0,N,4.7,Guanyuan
2,3,2013,3,1,2,3.0,3.0,10.0,19.0,300.0,69.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Guanyuan
3,4,2013,3,1,3,3.0,6.0,7.0,24.0,400.0,62.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Guanyuan
4,5,2013,3,1,4,3.0,6.0,5.0,14.0,400.0,71.0,-2.0,1025.2,-19.5,0.0,N,2.0,Guanyuan


In [39]:
huairou_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,7.0,7.0,3.0,2.0,100.0,91.0,-2.3,1020.3,-20.7,0.0,WNW,3.1,Huairou
1,2,2013,3,1,1,4.0,4.0,3.0,,100.0,92.0,-2.7,1020.8,-20.5,0.0,NNW,1.5,Huairou
2,3,2013,3,1,2,4.0,4.0,,,100.0,91.0,-3.2,1020.6,-21.4,0.0,NW,1.8,Huairou
3,4,2013,3,1,3,3.0,3.0,3.0,2.0,,,-3.3,1021.3,-23.7,0.0,NNW,2.4,Huairou
4,5,2013,3,1,4,3.0,3.0,7.0,,300.0,86.0,-4.1,1022.1,-22.7,0.0,NNW,2.2,Huairou


In [40]:
Nongzhanguan_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,5.0,14.0,4.0,12.0,200.0,85.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Nongzhanguan
1,2,2013,3,1,1,8.0,12.0,6.0,14.0,200.0,84.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Nongzhanguan
2,3,2013,3,1,2,3.0,6.0,5.0,14.0,200.0,83.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Nongzhanguan
3,4,2013,3,1,3,5.0,5.0,5.0,14.0,200.0,84.0,-1.4,1026.2,-25.5,0.0,N,4.9,Nongzhanguan
4,5,2013,3,1,4,5.0,5.0,6.0,21.0,200.0,77.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Nongzhanguan


In [41]:
shunyi_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,3.0,6.0,3.0,8.0,300.0,44.0,-0.9,1025.8,-20.5,0.0,NW,9.3,Shunyi
1,2,2013,3,1,1,12.0,12.0,3.0,7.0,300.0,47.0,-1.1,1026.1,-21.3,0.0,NW,9.4,Shunyi
2,3,2013,3,1,2,14.0,14.0,,7.0,200.0,22.0,-1.7,1026.2,-23.0,0.0,NW,8.6,Shunyi
3,4,2013,3,1,3,12.0,12.0,3.0,5.0,,,-2.1,1027.3,-23.3,0.0,NW,6.6,Shunyi
4,5,2013,3,1,4,12.0,12.0,3.0,,200.0,11.0,-2.4,1027.7,-22.9,0.0,NW,4.5,Shunyi


In [42]:
tiantian_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,6.0,6.0,4.0,8.0,300.0,81.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Tiantan
1,2,2013,3,1,1,6.0,29.0,5.0,9.0,300.0,80.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Tiantan
2,3,2013,3,1,2,6.0,6.0,4.0,12.0,300.0,75.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Tiantan
3,4,2013,3,1,3,6.0,6.0,4.0,12.0,300.0,74.0,-1.4,1026.2,-25.5,0.0,N,4.9,Tiantan
4,5,2013,3,1,4,5.0,5.0,7.0,15.0,400.0,70.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Tiantan


In [43]:
wanliu_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,8.0,8.0,6.0,28.0,400.0,52.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Wanliu
1,2,2013,3,1,1,9.0,9.0,6.0,28.0,400.0,50.0,-1.1,1023.2,-18.2,0.0,N,4.7,Wanliu
2,3,2013,3,1,2,3.0,6.0,,19.0,400.0,55.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Wanliu
3,4,2013,3,1,3,11.0,30.0,8.0,14.0,,,-1.4,1024.5,-19.4,0.0,NW,3.1,Wanliu
4,5,2013,3,1,4,3.0,13.0,9.0,,300.0,54.0,-2.0,1025.2,-19.5,0.0,N,2.0,Wanliu


In [45]:
Wanshouxigong_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,9.0,9.0,6.0,17.0,200.0,62.0,0.3,1021.9,-19.0,0.0,WNW,2.0,Wanshouxigong
1,2,2013,3,1,1,11.0,11.0,7.0,14.0,200.0,66.0,-0.1,1022.4,-19.3,0.0,WNW,4.4,Wanshouxigong
2,3,2013,3,1,2,8.0,8.0,,16.0,200.0,59.0,-0.6,1022.6,-19.7,0.0,WNW,4.7,Wanshouxigong
3,4,2013,3,1,3,8.0,8.0,3.0,16.0,,,-0.7,1023.5,-20.9,0.0,NW,2.6,Wanshouxigong
4,5,2013,3,1,4,8.0,8.0,3.0,,300.0,36.0,-0.9,1024.1,-21.7,0.0,WNW,2.5,Wanshouxigong


## 5 Assessing Data

In [47]:
aotizhongxin_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35064 entries, 0 to 35063
Data columns (total 18 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   No       35064 non-null  int64  
 1   year     35064 non-null  int64  
 2   month    35064 non-null  int64  
 3   day      35064 non-null  int64  
 4   hour     35064 non-null  int64  
 5   PM2.5    34139 non-null  float64
 6   PM10     34346 non-null  float64
 7   SO2      34129 non-null  float64
 8   NO2      34041 non-null  float64
 9   CO       33288 non-null  float64
 10  O3       33345 non-null  float64
 11  TEMP     35044 non-null  float64
 12  PRES     35044 non-null  float64
 13  DEWP     35044 non-null  float64
 14  RAIN     35044 non-null  float64
 15  wd       34983 non-null  object 
 16  WSPM     35050 non-null  float64
 17  station  35064 non-null  object 
dtypes: float64(11), int64(5), object(2)
memory usage: 4.8+ MB


In [48]:
changping_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35064 entries, 0 to 35063
Data columns (total 18 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   No       35064 non-null  int64  
 1   year     35064 non-null  int64  
 2   month    35064 non-null  int64  
 3   day      35064 non-null  int64  
 4   hour     35064 non-null  int64  
 5   PM2.5    34290 non-null  float64
 6   PM10     34482 non-null  float64
 7   SO2      34436 non-null  float64
 8   NO2      34397 non-null  float64
 9   CO       33543 non-null  float64
 10  O3       34460 non-null  float64
 11  TEMP     35011 non-null  float64
 12  PRES     35014 non-null  float64
 13  DEWP     35011 non-null  float64
 14  RAIN     35013 non-null  float64
 15  wd       34924 non-null  object 
 16  WSPM     35021 non-null  float64
 17  station  35064 non-null  object 
dtypes: float64(11), int64(5), object(2)
memory usage: 4.8+ MB
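
The `info()` output above shows gaps in the pollutant columns (for example, 34139 of 35064 PM2.5 values are non-null at Aotizhongxin, so 925 are missing). Rather than calling `info()` station by station, the missing counts could be collected into one table. The following is a minimal sketch on made-up stand-in frames; the station keys `StationA` and `StationB` and the names `missing_counts` and `missing_share` are hypothetical.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the per-station DataFrames (illustrative values only).
dataframes = {
    "StationA": pd.DataFrame(
        {"PM2.5": [4.0, np.nan, 7.0], "NO2": [7.0, 7.0, np.nan]}
    ),
    "StationB": pd.DataFrame(
        {"PM2.5": [3.0, 3.0, 3.0], "NO2": [np.nan, np.nan, 13.0]}
    ),
}

# One row per station, one column per variable: number of missing values.
missing_counts = pd.DataFrame(
    {name: df.isna().sum() for name, df in dataframes.items()}
).T

# Same layout, expressed as the fraction of rows that are missing.
missing_share = pd.DataFrame(
    {name: df.isna().mean() for name, df in dataframes.items()}
).T

print(missing_counts)
print(missing_share.round(3))
```

Applied to the real `dataframes` dictionary, this would give a 12-row overview that helps decide which columns need imputation.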


## 6 Cleaning Data

### 6.1 Aotizhongxin

#### 6.2 Converting Data Types
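
One likely conversion here, offered as an assumption rather than the notebook's confirmed approach, is to combine the integer `year`, `month`, `day`, and `hour` columns into a single datetime column. The sketch uses the first two Aotizhongxin rows shown earlier; the `datetime` column name is a hypothetical choice.

```python
import pandas as pd

# First two rows of the Aotizhongxin preview, reduced to the relevant columns.
sample = pd.DataFrame(
    {
        "year": [2013, 2013],
        "month": [3, 3],
        "day": [1, 1],
        "hour": [0, 1],
        "PM2.5": [4.0, 8.0],
    }
)

# pd.to_datetime can assemble timestamps from columns named
# year, month, day, and hour.
sample["datetime"] = pd.to_datetime(sample[["year", "month", "day", "hour"]])

# Using the timestamp as the index enables resampling and time-based slicing.
sample = sample.set_index("datetime")
print(sample.index)
```

A datetime index of this kind is what the trend and seasonal steps in the methodology would rely on.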

## 7 Exploratory Data Analysis (EDA)

### 7.1 Explore

## 8 Visualization & Explanatory Analysis

### 8.1 Question 1:

### 8.2 Question 2:

## 9 Further Analysis (Optional)

## 10 Conclusion