<a href="https://colab.research.google.com/github/Wlnfadhil/Analisa-Data-Air-Quality-Control/blob/coca-coba-code/submission/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Analysis Project: PRSA_Data_20130301-20170228
- **Name:** Wildan Fadhil Nazaruddin
- **Email:** wildanfadhil76@gmial.com
- **Dicoding ID:**

## Air Quality Data

Airborne particulate matter has a significant impact on public health, and year-over-year data analysis is crucial for informed decision-making, including efforts to mitigate the effects of global warming. This project analyzes weather conditions and air quality recorded at monitoring stations in China to better understand environmental trends and their implications. The dataset covers 12 monitoring stations, which are examined to identify patterns and provide insights for more effective policy-making on air quality and public health. The analysis can also inform strategies for mitigating ongoing climate change.

### 1.1 Classes

- PRSA_Data_Aotizhongxin: Data collected from the Aotizhongxin station.
- PRSA_Data_Changping: Data from the Changping station.
- PRSA_Data_Dingling: Data from the Dingling station.
- PRSA_Data_Dongsi: Data from the Dongsi station.
- PRSA_Data_Guanyuan: Data from the Guanyuan station.
- PRSA_Data_Gucheng: Data from the Gucheng station.
- PRSA_Data_Huairou: Data from the Huairou station.
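- PRSA_Data_Nongzhanguan: Data from the Nongzhanguan station.
- PRSA_Data_Shunyi: Data from the Shunyi station.
- PRSA_Data_Tiantan: Data from the Tiantan station.
- PRSA_Data_Wanliu: Data from the Wanliu station.
- PRSA_Data_Wanshouxigong: Data from the Wanshouxigong station.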

### 1.2 Methodology

1. **Data Collection and Cleaning:** First, we will consolidate the datasets from the 12 stations. Data cleaning will involve handling missing values, correcting inconsistencies, and standardizing all datasets.
2. **Descriptive Statistics:** Descriptive statistics such as the mean, median, standard deviation, and interquartile range (IQR) will summarize the key characteristics of the particulate data (PM2.5, PM10) across stations. Histograms, box plots, and time series plots will show the distribution and spread of the data.
3. **Correlation Analysis:** To identify relationships between pollutants and weather conditions, we will run a Pearson or Spearman correlation analysis. This shows how temperature, humidity, or wind speed relate to particulate levels in the air.
4. **Trend Analysis:** Trend analysis will show how air quality changes over time (seasonally or annually) and across stations. We will use time series decomposition to split the data into trend, seasonal, and residual components, giving a clearer view of the underlying patterns.
5. **Geospatial Analysis:** By plotting data on maps, we will explore the geographical distribution of particulate matter across stations, using spatial visualization tools to observe how air quality varies between locations.
6. **Hypothesis Testing:** Statistical hypothesis tests (such as t-tests or ANOVA) will compare air quality between stations or time periods to determine whether observed differences are statistically significant.
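
As a rough illustration of the correlation step, the sketch below contrasts Pearson (linear) and Spearman (rank-based) coefficients. The `toy` DataFrame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy readings; the numbers are invented for illustration only.
toy = pd.DataFrame({
    "PM2.5": [10.0, 20.0, 40.0, 80.0, 160.0],
    "WSPM": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Pearson measures linear association between the two columns.
pearson = toy.corr(method="pearson").loc["PM2.5", "WSPM"]

# Spearman measures monotonic association by correlating the ranks.
spearman = toy.corr(method="spearman").loc["PM2.5", "WSPM"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Because the toy relationship is monotonic but not linear, the Spearman coefficient reaches -1 while the Pearson coefficient stays above it. Spearman can therefore be the safer choice when a relationship is expected to be curved.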

### 1.3 Deployment

Data set

```
! git clone https://github.com/Wlnfadhil/Analisa-Data-Air-Quality-Control.git
```



## 1 Code Engine

In [None]:
! git clone https://github.com/Wlnfadhil/Analisa-Data-Air-Quality-Control.git

Cloning into 'Analisa-Data-Air-Quality-Control'...
remote: Enumerating objects: 15582, done.
remote: Counting objects: 100% (1706/1706), done.
remote: Compressing objects: 100% (1627/1627), done.
remote: Total 15582 (delta 74), reused 1681 (delta 63), pack-reused 13876 (from 1)
Receiving objects: 100% (15582/15582), 102.19 MiB | 7.59 MiB/s, done.
Resolving deltas: 100% (1161/1161), done.
Updating files: 100% (15062/15062), done.


## 2 Defining the Business Questions

- Question 1:
  - What are the primary trends in air quality levels (PM2.5, PM10) across the 12 stations over the observed period (2013-2017)?
- Question 2:
  - How do weather conditions (e.g., temperature, humidity, wind speed) correlate with particulate matter concentrations (PM2.5, PM10) at each station?
- Question 3:
  - Which stations show the highest and lowest levels of particulate matter, and what factors contribute to these differences?
- Question 4:
  - How do seasonal variations (e.g., winter vs. summer) affect air quality across the stations, and what are the contributing factors?
- Question 5:
  - What actionable insights can be drawn from this analysis to inform policy decisions aimed at improving air quality and reducing public health risks?


## 3 Import All Packages/Libraries Used

In [None]:
!pip install pandasql

Collecting pandasql
  Downloading pandasql-0.7.3.tar.gz (26 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pandasql
  Building wheel for pandasql (setup.py) ... done
  Created wheel for pandasql: filename=pandasql-0.7.3-py3-none-any.whl size=26772 sha256=2ab50674541d47e08ff3cbf91457c3e14c1e69cbc063e6e6c7cd8b2dbb5007aa
  Stored in directory: /root/.cache/pip/wheels/e9/bc/3a/8434bdcccf5779e72894a9b24fecbdcaf97940607eaf4bcdf9
Successfully built pandasql
Installing collected packages: pandasql
Successfully installed pandasql-0.7.3


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from IPython.display import Markdown, display
from pandasql import sqldf
import pandasql as psql
import warnings
warnings.filterwarnings('ignore')


## 4 Data Wrangling

### 4.1 Gathering Data

#### 4.1.1 Load data

##### 4.1.1.1 Setting the Target Directory

In [None]:
# Define the current directory
current_dir = os.getcwd()

In [None]:
# List of the available CSV file names
csv_files = [
    "PRSA_Data_Aotizhongxin_20130301-20170228.csv",
    "PRSA_Data_Changping_20130301-20170228.csv",
    "PRSA_Data_Dingling_20130301-20170228.csv",
    "PRSA_Data_Dongsi_20130301-20170228.csv",
    "PRSA_Data_Guanyuan_20130301-20170228.csv",
    "PRSA_Data_Gucheng_20130301-20170228.csv",
    "PRSA_Data_Huairou_20130301-20170228.csv",
    "PRSA_Data_Nongzhanguan_20130301-20170228.csv",
    "PRSA_Data_Shunyi_20130301-20170228.csv",
    "PRSA_Data_Tiantan_20130301-20170228.csv",
    "PRSA_Data_Wanliu_20130301-20170228.csv",
    "PRSA_Data_Wanshouxigong_20130301-20170228.csv"
]


In [None]:
# Read each CSV file and store it in a DataFrame
dataframes = {}

In [None]:
for csv_file in csv_files:
    # Get the location name (e.g., Aotizhongxin, Changping) from the file name
    location = csv_file.split('_')[2]

    # Build the full path for each file
    file_path = os.path.join(current_dir, "./Analisa-Data-Air-Quality-Control/submission/data", csv_file)

    # Read the CSV into a DataFrame
    df = pd.read_csv(file_path)

    # Store the DataFrame in the dictionary, keyed by location name
    dataframes[location] = df
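
The consolidation step mentioned in the methodology could look roughly like the sketch below. It parses the station name from a file name and stacks the per-station frames with `pd.concat`. The helper `station_from_filename` and the tiny in-memory frames are hypothetical stand-ins for the real CSV contents.

```python
from pathlib import Path

import pandas as pd


def station_from_filename(csv_file: str) -> str:
    """Return the station part of a name like PRSA_Data_<Station>_<dates>.csv."""
    return Path(csv_file).stem.split("_")[2]


names = [
    "PRSA_Data_Aotizhongxin_20130301-20170228.csv",
    "PRSA_Data_Wanshouxigong_20130301-20170228.csv",
]
stations = [station_from_filename(name) for name in names]

# Hypothetical one-row frames standing in for the real per-station data.
frames = {s: pd.DataFrame({"station": [s], "PM2.5": [4.0]}) for s in stations}

# Stack every station into one long DataFrame with a fresh integer index.
combined = pd.concat(frames.values(), ignore_index=True)
print(combined)
```

A single long frame keyed by the `station` column tends to make cross-station grouping simpler than twelve separate variables.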

In [None]:
aotizhongxin_df = dataframes['Aotizhongxin']

changping_df = dataframes['Changping']

dingling_df = dataframes['Dingling']

dongsi_df = dataframes['Dongsi']

guanyuan_df = dataframes['Guanyuan']

gucheng_df = dataframes['Gucheng']

huairou_df = dataframes ['Huairou']

Nongzhanguan_df = dataframes['Nongzhanguan']

shunyi_df = dataframes['Shunyi']

tiantian_df = dataframes['Tiantan']

wanliu_df = dataframes['Wanliu']

Wanshouxigong_df = dataframes['Wanshouxigong']


##### 4.1.1.2 Inspecting the Data

###### 4.1.1.2.1 Aotizhongxin

In [None]:
aotizhongxin_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,4.0,7.0,300.0,77.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Aotizhongxin
1,2,2013,3,1,1,8.0,8.0,4.0,7.0,300.0,77.0,-1.1,1023.2,-18.2,0.0,N,4.7,Aotizhongxin
2,3,2013,3,1,2,7.0,7.0,5.0,10.0,300.0,73.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Aotizhongxin
3,4,2013,3,1,3,6.0,6.0,11.0,11.0,300.0,72.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Aotizhongxin
4,5,2013,3,1,4,3.0,3.0,12.0,12.0,300.0,72.0,-2.0,1025.2,-19.5,0.0,N,2.0,Aotizhongxin


In [None]:
aotizhongxin_df.describe()

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,WSPM
count,35064.0,35064.0,35064.0,35064.0,35064.0,34139.0,34346.0,34129.0,34041.0,33288.0,33345.0,35044.0,35044.0,35044.0,35044.0,35050.0
mean,17532.5,2014.66256,6.52293,15.729637,11.5,82.773611,110.060391,17.375901,59.305833,1262.945145,56.353358,13.584607,1011.84692,3.123062,0.067421,1.708496
std,10122.249256,1.177213,3.448752,8.800218,6.922285,82.135694,95.223005,22.823017,37.1162,1221.436236,57.916327,11.399097,10.404047,13.688896,0.910056,1.204071
min,1.0,2013.0,1.0,1.0,0.0,3.0,2.0,0.2856,2.0,100.0,0.2142,-16.8,985.9,-35.3,0.0,0.0
25%,8766.75,2014.0,4.0,8.0,5.75,22.0,38.0,3.0,30.0,500.0,8.0,3.1,1003.3,-8.1,0.0,0.9
50%,17532.5,2015.0,7.0,16.0,11.5,58.0,87.0,9.0,53.0,900.0,42.0,14.5,1011.4,3.8,0.0,1.4
75%,26298.25,2016.0,10.0,23.0,17.25,114.0,155.0,21.0,82.0,1500.0,82.0,23.3,1020.1,15.6,0.0,2.2
max,35064.0,2017.0,12.0,31.0,23.0,898.0,984.0,341.0,290.0,10000.0,423.0,40.5,1042.0,28.5,72.5,11.2


###### 4.1.1.2.2 Changping

In [None]:
changping_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,3.0,6.0,13.0,7.0,300.0,85.0,-2.3,1020.8,-19.7,0.0,E,0.5,Changping
1,2,2013,3,1,1,3.0,3.0,6.0,6.0,300.0,85.0,-2.5,1021.3,-19.0,0.0,ENE,0.7,Changping
2,3,2013,3,1,2,3.0,3.0,22.0,13.0,400.0,74.0,-3.0,1021.3,-19.9,0.0,ENE,0.2,Changping
3,4,2013,3,1,3,3.0,6.0,12.0,8.0,300.0,81.0,-3.6,1021.8,-19.1,0.0,NNE,1.0,Changping
4,5,2013,3,1,4,3.0,3.0,14.0,8.0,300.0,81.0,-3.5,1022.3,-19.4,0.0,N,2.1,Changping


In [None]:
dingling_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,3.0,,200.0,82.0,-2.3,1020.8,-19.7,0.0,E,0.5,Dingling
1,2,2013,3,1,1,7.0,7.0,3.0,,200.0,80.0,-2.5,1021.3,-19.0,0.0,ENE,0.7,Dingling
2,3,2013,3,1,2,5.0,5.0,3.0,2.0,200.0,79.0,-3.0,1021.3,-19.9,0.0,ENE,0.2,Dingling
3,4,2013,3,1,3,6.0,6.0,3.0,,200.0,79.0,-3.6,1021.8,-19.1,0.0,NNE,1.0,Dingling
4,5,2013,3,1,4,5.0,5.0,3.0,,200.0,81.0,-3.5,1022.3,-19.4,0.0,N,2.1,Dingling


In [None]:
dongsi_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,9.0,9.0,3.0,17.0,300.0,89.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Dongsi
1,2,2013,3,1,1,4.0,4.0,3.0,16.0,300.0,88.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Dongsi
2,3,2013,3,1,2,7.0,7.0,,17.0,300.0,60.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Dongsi
3,4,2013,3,1,3,3.0,3.0,5.0,18.0,,,-1.4,1026.2,-25.5,0.0,N,4.9,Dongsi
4,5,2013,3,1,4,3.0,3.0,7.0,,200.0,84.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Dongsi


In [None]:
guanyuan_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,14.0,20.0,300.0,69.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Guanyuan
1,2,2013,3,1,1,4.0,4.0,13.0,17.0,300.0,72.0,-1.1,1023.2,-18.2,0.0,N,4.7,Guanyuan
2,3,2013,3,1,2,3.0,3.0,10.0,19.0,300.0,69.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Guanyuan
3,4,2013,3,1,3,3.0,6.0,7.0,24.0,400.0,62.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Guanyuan
4,5,2013,3,1,4,3.0,6.0,5.0,14.0,400.0,71.0,-2.0,1025.2,-19.5,0.0,N,2.0,Guanyuan



In [None]:
huairou_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,7.0,7.0,3.0,2.0,100.0,91.0,-2.3,1020.3,-20.7,0.0,WNW,3.1,Huairou
1,2,2013,3,1,1,4.0,4.0,3.0,,100.0,92.0,-2.7,1020.8,-20.5,0.0,NNW,1.5,Huairou
2,3,2013,3,1,2,4.0,4.0,,,100.0,91.0,-3.2,1020.6,-21.4,0.0,NW,1.8,Huairou
3,4,2013,3,1,3,3.0,3.0,3.0,2.0,,,-3.3,1021.3,-23.7,0.0,NNW,2.4,Huairou
4,5,2013,3,1,4,3.0,3.0,7.0,,300.0,86.0,-4.1,1022.1,-22.7,0.0,NNW,2.2,Huairou


In [None]:
Nongzhanguan_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,5.0,14.0,4.0,12.0,200.0,85.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Nongzhanguan
1,2,2013,3,1,1,8.0,12.0,6.0,14.0,200.0,84.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Nongzhanguan
2,3,2013,3,1,2,3.0,6.0,5.0,14.0,200.0,83.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Nongzhanguan
3,4,2013,3,1,3,5.0,5.0,5.0,14.0,200.0,84.0,-1.4,1026.2,-25.5,0.0,N,4.9,Nongzhanguan
4,5,2013,3,1,4,5.0,5.0,6.0,21.0,200.0,77.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Nongzhanguan


In [None]:
shunyi_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,3.0,6.0,3.0,8.0,300.0,44.0,-0.9,1025.8,-20.5,0.0,NW,9.3,Shunyi
1,2,2013,3,1,1,12.0,12.0,3.0,7.0,300.0,47.0,-1.1,1026.1,-21.3,0.0,NW,9.4,Shunyi
2,3,2013,3,1,2,14.0,14.0,,7.0,200.0,22.0,-1.7,1026.2,-23.0,0.0,NW,8.6,Shunyi
3,4,2013,3,1,3,12.0,12.0,3.0,5.0,,,-2.1,1027.3,-23.3,0.0,NW,6.6,Shunyi
4,5,2013,3,1,4,12.0,12.0,3.0,,200.0,11.0,-2.4,1027.7,-22.9,0.0,NW,4.5,Shunyi


In [None]:
tiantian_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,6.0,6.0,4.0,8.0,300.0,81.0,-0.5,1024.5,-21.4,0.0,NNW,5.7,Tiantan
1,2,2013,3,1,1,6.0,29.0,5.0,9.0,300.0,80.0,-0.7,1025.1,-22.1,0.0,NW,3.9,Tiantan
2,3,2013,3,1,2,6.0,6.0,4.0,12.0,300.0,75.0,-1.2,1025.3,-24.6,0.0,NNW,5.3,Tiantan
3,4,2013,3,1,3,6.0,6.0,4.0,12.0,300.0,74.0,-1.4,1026.2,-25.5,0.0,N,4.9,Tiantan
4,5,2013,3,1,4,5.0,5.0,7.0,15.0,400.0,70.0,-1.9,1027.1,-24.5,0.0,NNW,3.2,Tiantan


In [None]:
wanliu_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,8.0,8.0,6.0,28.0,400.0,52.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Wanliu
1,2,2013,3,1,1,9.0,9.0,6.0,28.0,400.0,50.0,-1.1,1023.2,-18.2,0.0,N,4.7,Wanliu
2,3,2013,3,1,2,3.0,6.0,,19.0,400.0,55.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Wanliu
3,4,2013,3,1,3,11.0,30.0,8.0,14.0,,,-1.4,1024.5,-19.4,0.0,NW,3.1,Wanliu
4,5,2013,3,1,4,3.0,13.0,9.0,,300.0,54.0,-2.0,1025.2,-19.5,0.0,N,2.0,Wanliu


In [None]:
Wanshouxigong_df.head(5)

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,9.0,9.0,6.0,17.0,200.0,62.0,0.3,1021.9,-19.0,0.0,WNW,2.0,Wanshouxigong
1,2,2013,3,1,1,11.0,11.0,7.0,14.0,200.0,66.0,-0.1,1022.4,-19.3,0.0,WNW,4.4,Wanshouxigong
2,3,2013,3,1,2,8.0,8.0,,16.0,200.0,59.0,-0.6,1022.6,-19.7,0.0,WNW,4.7,Wanshouxigong
3,4,2013,3,1,3,8.0,8.0,3.0,16.0,,,-0.7,1023.5,-20.9,0.0,NW,2.6,Wanshouxigong
4,5,2013,3,1,4,8.0,8.0,3.0,,300.0,36.0,-0.9,1024.1,-21.7,0.0,WNW,2.5,Wanshouxigong


## 5 Assessing Data

### 5.1 Aotizhongxin

In [None]:
aotizhongxin_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35064 entries, 0 to 35063
Data columns (total 18 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   No       35064 non-null  int64  
 1   year     35064 non-null  int64  
 2   month    35064 non-null  int64  
 3   day      35064 non-null  int64  
 4   hour     35064 non-null  int64  
 5   PM2.5    34139 non-null  float64
 6   PM10     34346 non-null  float64
 7   SO2      34129 non-null  float64
 8   NO2      34041 non-null  float64
 9   CO       33288 non-null  float64
 10  O3       33345 non-null  float64
 11  TEMP     35044 non-null  float64
 12  PRES     35044 non-null  float64
 13  DEWP     35044 non-null  float64
 14  RAIN     35044 non-null  float64
 15  wd       34983 non-null  object 
 16  WSPM     35050 non-null  float64
 17  station  35064 non-null  object 
dtypes: float64(11), int64(5), object(2)
memory usage: 4.8+ MB


In [None]:
aotizhongxin_df.isnull().sum()

No            0
year          0
month         0
day           0
hour          0
PM2.5       925
PM10        718
SO2         935
NO2        1023
CO         1776
O3         1719
TEMP         20
PRES         20
DEWP         20
RAIN         20
wd           81
WSPM         14
station       0
dtype: int64

In [None]:
changping_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35064 entries, 0 to 35063
Data columns (total 18 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   No       35064 non-null  int64  
 1   year     35064 non-null  int64  
 2   month    35064 non-null  int64  
 3   day      35064 non-null  int64  
 4   hour     35064 non-null  int64  
 5   PM2.5    34290 non-null  float64
 6   PM10     34482 non-null  float64
 7   SO2      34436 non-null  float64
 8   NO2      34397 non-null  float64
 9   CO       33543 non-null  float64
 10  O3       34460 non-null  float64
 11  TEMP     35011 non-null  float64
 12  PRES     35014 non-null  float64
 13  DEWP     35011 non-null  float64
 14  RAIN     35013 non-null  float64
 15  wd       34924 non-null  object 
 16  WSPM     35021 non-null  float64
 17  station  35064 non-null  object 
dtypes: float64(11), int64(5), object(2)
memory usage: 4.8+ MB


In [None]:
aotizhongxin_df.duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
35059    False
35060    False
35061    False
35062    False
35063    False
Length: 35064, dtype: bool
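
The boolean Series above is hard to read at a glance. A minimal sketch of how the duplicates could be counted and dropped follows, using a made-up `toy` frame rather than the real data.

```python
import pandas as pd

# Hypothetical frame with one fully repeated row (the second and third rows).
toy = pd.DataFrame({
    "No": [1, 2, 2, 3],
    "PM2.5": [4.0, 8.0, 8.0, 7.0],
})

# duplicated() marks every repeat of an earlier row; summing counts them.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduplicated = toy.drop_duplicates()

print(n_duplicates)
print(len(deduplicated))
```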

In [None]:

def cek_nilai_tidak_akurat(df):
    print("Checking for inaccurate values:")

    # Define the range of values considered accurate for each column
    rentang_akurat = {
        'PM2.5': (0, 1000),
        'PM10': (0, 1500),
        'SO2': (0, 2000),
        'NO2': (0, 2000),
        # NOTE: describe() reports CO between 100 and 10000 for this station,
        # so the (0, 200) bound flags most CO readings.
        'CO': (0, 200),
        'O3': (0, 500),
        'TEMP': (-50, 50),
        'PRES': (800, 1100),
        'DEWP': (-50, 50),
        'RAIN': (0, 500),
        'WSPM': (0, 100)
    }

    hasil_tidak_akurat = {}

    for kolom, (batas_bawah, batas_atas) in rentang_akurat.items():
        if kolom in df.columns:
            nilai_tidak_akurat = df[(df[kolom] < batas_bawah) | (df[kolom] > batas_atas)]
            if not nilai_tidak_akurat.empty:
                hasil_tidak_akurat[kolom] = nilai_tidak_akurat[[kolom]]
                print(f"\nInaccurate values found in column {kolom}:")
                print(nilai_tidak_akurat[[kolom]])

    if not hasil_tidak_akurat:
        print("No inaccurate values found within the specified ranges.")

    return hasil_tidak_akurat

# Run the function
hasil_tidak_akurat = cek_nilai_tidak_akurat(aotizhongxin_df)

Checking for inaccurate values:

Inaccurate values found in column CO:
          CO
0      300.0
1      300.0
2      300.0
3      300.0
4      300.0
...      ...
35059  400.0
35060  500.0
35061  700.0
35062  700.0
35063  600.0

[31776 rows x 1 columns]
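
The same kind of range check can be expressed with `Series.between`, as in the sketch below. The bounds are illustrative assumptions, not validated limits. In particular, the CO bound is widened only so that it spans the 100 to 10000 range reported by `describe()`.

```python
import pandas as pd

# Assumed plausibility bounds (hypothetical, for illustration only).
bounds = {"CO": (0, 10000), "TEMP": (-50, 50)}

# Made-up readings, including one missing CO value.
toy = pd.DataFrame({
    "CO": [300.0, 10000.0, 12000.0, None],
    "TEMP": [-0.7, 55.0, 20.0, 13.0],
})

out_of_range = {}
for col, (low, high) in bounds.items():
    values = toy[col]
    # between() is inclusive on both ends; missing values are excluded
    # explicitly so they are not counted as out of range.
    mask = values.notna() & ~values.between(low, high)
    out_of_range[col] = int(mask.sum())

print(out_of_range)
```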


In [None]:

def cek_nilai_tidak_konsisten(df):
    print("Checking for inconsistent values:")

    hasil_tidak_konsisten = {}

    # 1. Check for inconsistency between PM2.5 and PM10
    if 'PM2.5' in df.columns and 'PM10' in df.columns:
        pm_tidak_konsisten = df[df['PM2.5'] > df['PM10']]
        if not pm_tidak_konsisten.empty:
            hasil_tidak_konsisten['PM2.5 > PM10'] = pm_tidak_konsisten[['PM2.5', 'PM10']]
            print("\nFound PM2.5 values greater than PM10:")
            print(pm_tidak_konsisten[['PM2.5', 'PM10']])

    # 2. Check for inconsistency in wind direction
    #    (NaN is not in the valid list, so missing directions are reported too)
    if 'wd' in df.columns:
        arah_angin_valid = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE',
                            'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW']
        arah_angin_tidak_valid = df[~df['wd'].isin(arah_angin_valid)]
        if not arah_angin_tidak_valid.empty:
            hasil_tidak_konsisten['Arah Angin Tidak Valid'] = arah_angin_tidak_valid[['wd']]
            print("\nFound invalid wind directions:")
            print(arah_angin_tidak_valid[['wd']])

    # 3. Check for inconsistency between temperature and dew point
    if 'TEMP' in df.columns and 'DEWP' in df.columns:
        suhu_tidak_konsisten = df[df['DEWP'] > df['TEMP']]
        if not suhu_tidak_konsisten.empty:
            hasil_tidak_konsisten['DEWP > TEMP'] = suhu_tidak_konsisten[['TEMP', 'DEWP']]
            print("\nFound dew points higher than the temperature:")
            print(suhu_tidak_konsisten[['TEMP', 'DEWP']])

    # 4. Check for inconsistency in wind speed
    if 'WSPM' in df.columns:
        kecepatan_angin_negatif = df[df['WSPM'] < 0]
        if not kecepatan_angin_negatif.empty:
            hasil_tidak_konsisten['Kecepatan Angin Negatif'] = kecepatan_angin_negatif[['WSPM']]
            print("\nFound negative wind speeds:")
            print(kecepatan_angin_negatif[['WSPM']])

    if not hasil_tidak_konsisten:
        print("No inconsistent values found for the criteria checked.")

    return hasil_tidak_konsisten

# Run the function
hasil_tidak_konsisten = cek_nilai_tidak_konsisten(aotizhongxin_df)

Checking for inconsistent values:

Found PM2.5 values greater than PM10:
       PM2.5   PM10
273    116.0   90.0
274    123.0   94.0
275    130.0  104.0
276    132.0   87.0
277    129.0   84.0
...      ...    ...
34011   93.0   90.0
34013   97.0   95.0
34025  122.0  113.0
34026  128.0  122.0
34049  155.0  151.0

[1183 rows x 2 columns]

Found invalid wind directions:
        wd
6388   NaN
11718  NaN
13412  NaN
16748  NaN
17263  NaN
...    ...
34314  NaN
34334  NaN
34560  NaN
34638  NaN
34746  NaN

[81 rows x 1 columns]
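
All 81 rows listed above are `NaN`, which suggests they are missing rather than malformed. A small sketch of how the two cases could be separated is shown below. The `wd` Series and the `"XYZ"` label are invented for illustration.

```python
import pandas as pd

# The 16 compass points accepted as valid wind directions.
valid = {"N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
         "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"}

# Hypothetical readings: one missing value and one malformed label.
wd = pd.Series(["N", "NNW", None, "XYZ", "SW"])

# Missing values are handled by imputation later, so count them separately.
is_missing = wd.isna()

# A value is invalid only if it is present and not a known compass point.
is_invalid = wd.notna() & ~wd.isin(valid)

print(int(is_missing.sum()), int(is_invalid.sum()))
```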


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def cek_outlier(df):
    print("Checking for outliers using the IQR method:")

    # Select the numeric columns
    kolom_numerik = df.select_dtypes(include=[np.number]).columns

    hasil_outlier = {}

    for kolom in kolom_numerik:
        Q1 = df[kolom].quantile(0.25)
        Q3 = df[kolom].quantile(0.75)
        IQR = Q3 - Q1

        # Values beyond 1.5 * IQR from the quartiles count as outliers
        batas_bawah = Q1 - 1.5 * IQR
        batas_atas = Q3 + 1.5 * IQR

        outlier = df[(df[kolom] < batas_bawah) | (df[kolom] > batas_atas)]

        if not outlier.empty:
            hasil_outlier[kolom] = outlier[[kolom]]
            print(f"\nOutliers found in column {kolom}:")
            print(f"Number of outliers: {len(outlier)}")
            print(f"Percentage of outliers: {(len(outlier) / len(df)) * 100:.2f}%")
            print(f"Outlier value range: {outlier[kolom].min()} to {outlier[kolom].max()}")


    if not hasil_outlier:
        print("No outliers found using the IQR method.")

    return hasil_outlier

# Run the function
hasil_outlier = cek_outlier(aotizhongxin_df)

Checking for outliers using the IQR method:

Outliers found in column PM2.5:
Number of outliers: 1624
Percentage of outliers: 4.63%
Outlier value range: 253.0 to 898.0

Outliers found in column PM10:
Number of outliers: 1080
Percentage of outliers: 3.08%
Outlier value range: 331.0 to 984.0

Outliers found in column SO2:
Number of outliers: 3054
Percentage of outliers: 8.71%
Outlier value range: 49.0 to 341.0

Outliers found in column NO2:
Number of outliers: 472
Percentage of outliers: 1.35%
Outlier value range: 161.0 to 290.0

Outliers found in column CO:
Number of outliers: 2607
Percentage of outliers: 7.43%
Outlier value range: 3100.0 to 10000.0

Outliers found in column O3:
Number of outliers: 1310
Percentage of outliers: 3.74%
Outlier value range: 193.851 to 423.0

Outliers found in column RAIN:
Number of outliers: 1380
Percentage of outliers: 3.94%
Outlier value range: 0.1 to 72.5

Outliers found in column WSPM:
Number of outliers: 1742
Percentage of outliers: 4.97%
Outlier value range: 4.2 to 11.2
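
To make the IQR rule concrete, the short calculation below reproduces the PM2.5 fences by hand from the quartiles shown in the earlier `describe()` output (25% = 22.0, 75% = 114.0). It is only a worked example of the arithmetic.

```python
# PM2.5 quartiles taken from the describe() output for Aotizhongxin.
q1, q3 = 22.0, 114.0

iqr = q3 - q1             # interquartile range
lower = q1 - 1.5 * iqr    # lower fence
upper = q3 + 1.5 * iqr    # upper fence

print(lower, upper)  # -116.0 252.0
```

The upper fence of 252.0 agrees with the reported outlier range starting at 253.0. The lower fence is negative, so no PM2.5 reading can fall below it.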

In [None]:


# Build the markdown table
table = """
| Dataset      | Data types | Missing values | Duplicate data | Inaccurate values |
|--------------|------------|----------------|----------------|-------------------|
| aotizhongxin | Type issues:<br>1. The hour, day, month, and year columns should be combined into a datetime column.<br>2. The wd column should be categorical.<br>3. The station column should be categorical. | Missing values found:<br>1. 925 missing values in PM2.5. | There are 11 duplicate rows. | The range check flags values in the CO column. |
"""




In [None]:
display(Markdown(table))



| Dataset      | Data types | Missing values | Duplicate data | Inaccurate values |
|--------------|------------|----------------|----------------|-------------------|
| aotizhongxin | Type issues:<br>1. The hour, day, month, and year columns should be combined into a datetime column.<br>2. The wd column should be categorical.<br>3. The station column should be categorical. | Missing values found:<br>1. 925 missing values in PM2.5. | There are 11 duplicate rows. | The range check flags values in the CO column. |


## 6 Cleaning Data

### 6.1 Aotizhongxin

#### 6.1.2 Changing Data Types

In [None]:
# Combine the year, month, day, and hour columns into a single datetime column
aotizhongxin_df['datetime'] = pd.to_datetime(aotizhongxin_df[['year', 'month', 'day', 'hour']])

In [None]:
aotizhongxin_df.set_index('datetime', inplace=True)
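
A minimal sketch of what these two cells do follows. It uses a made-up two-row frame: `pd.to_datetime` assembles timestamps from columns named `year`, `month`, `day`, and `hour`, and `set_index` then produces a `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical two-row frame with the same date-part columns as the dataset.
toy = pd.DataFrame({
    "year": [2013, 2013],
    "month": [3, 3],
    "day": [1, 1],
    "hour": [0, 1],
    "PM2.5": [4.0, 8.0],
})

# to_datetime assembles one timestamp per row from the named columns.
toy["datetime"] = pd.to_datetime(toy[["year", "month", "day", "hour"]])

# A DatetimeIndex enables time-aware operations such as interpolate(method="time").
toy = toy.set_index("datetime")

print(toy.index)
```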

In [None]:
aotizhongxin_df['wd'] = aotizhongxin_df['wd'].astype('category')

In [None]:
aotizhongxin_df['station'] = aotizhongxin_df['station'].astype('category')

#### 6.1.3 Filling Missing Values

In [None]:
pollutants = ['PM2.5', 'PM10', 'SO2', 'NO2', 'CO', 'O3']
aotizhongxin_df[pollutants] = aotizhongxin_df[pollutants].interpolate(method='time')

In [None]:
meteorological = ['TEMP', 'PRES', 'DEWP', 'RAIN']
aotizhongxin_df[meteorological] = aotizhongxin_df[meteorological].interpolate(method='linear')

In [None]:
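# Forward-fill: carry the last observed wind direction into the gaps
aotizhongxin_df['wd'] = aotizhongxin_df['wd'].ffill()

In [None]:
# Forward-fill the few missing wind-speed readings the same way
aotizhongxin_df['WSPM'] = aotizhongxin_df['WSPM'].ffill()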
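
The difference between the filling strategies used in this section can be seen on a tiny, irregularly spaced series. The sketch below is illustrative only, and its timestamps and values are invented. It shows that `method="time"` weights by the time gap, `method="linear"` treats rows as equally spaced, and `ffill()` simply repeats the last observation.

```python
import numpy as np
import pandas as pd

# Hypothetical readings at 00:00, 01:00, and 04:00 with one gap at 01:00.
idx = pd.to_datetime(["2013-03-01 00:00", "2013-03-01 01:00", "2013-03-01 04:00"])
pm25 = pd.Series([0.0, np.nan, 8.0], index=idx)

# Time-based: 01:00 is one quarter of the way from 00:00 to 04:00.
by_time = pm25.interpolate(method="time")

# Position-based: the gap is treated as the midpoint between its neighbours.
by_position = pm25.interpolate(method="linear")

# Categorical values such as wind direction cannot be interpolated,
# so the previous observation is carried forward instead.
wd = pd.Series(["N", None, "NW"], index=idx)
wd_filled = wd.ffill()

print(by_time.iloc[1], by_position.iloc[1], wd_filled.tolist())
```

For hourly data without gaps in the index, both interpolation methods give the same result. They differ only where timestamps are unevenly spaced.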

In [None]:
aotizhongxin_df.isna().sum()

Unnamed: 0,0
No,0
year,0
month,0
day,0
hour,0
PM2.5,0
PM10,0
SO2,0
NO2,0
CO,0


In [None]:
aotizhongxin_df.head(5)

datetime,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
2013-03-01 00:00:00,1,2013,3,1,0,4.0,4.0,4.0,7.0,300.0,77.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Aotizhongxin
2013-03-01 01:00:00,2,2013,3,1,1,8.0,8.0,4.0,7.0,300.0,77.0,-1.1,1023.2,-18.2,0.0,N,4.7,Aotizhongxin
2013-03-01 02:00:00,3,2013,3,1,2,7.0,7.0,5.0,10.0,300.0,73.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Aotizhongxin
2013-03-01 03:00:00,4,2013,3,1,3,6.0,6.0,11.0,11.0,300.0,72.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Aotizhongxin
2013-03-01 04:00:00,5,2013,3,1,4,3.0,3.0,12.0,12.0,300.0,72.0,-2.0,1025.2,-19.5,0.0,N,2.0,Aotizhongxin


### 6.2 Changping

#### Changing Data Types

In [None]:
changping_df['datetime'] = pd.to_datetime(changping_df[['year', 'month', 'day', 'hour']])

In [None]:
changping_df.set_index('datetime', inplace=True)

In [None]:
changping_df['wd'] = changping_df['wd'].astype('category')
changping_df['station'] = changping_df['station'].astype('category')
changping_df['PRES'] = changping_df['PRES'].astype('float64')
changping_df['DEWP'] = changping_df['DEWP'].astype('float64')

In [None]:
pollutants = ['PM2.5', 'PM10', 'SO2', 'NO2', 'CO', 'O3']
changping_df[pollutants] = changping_df[pollutants].interpolate(method='time')

In [None]:
meteorological = ['TEMP', 'PRES', 'DEWP', 'RAIN']
changping_df[meteorological] = changping_df[meteorological].interpolate(method='linear')

## 7 Exploratory Data Analysis (EDA)

### 7.1 Explore

#### 7.1.1 Aotizhongxin

##### 7.1.1.1 Viewing the Correlations Within Each DataFrame

In [None]:
aotizhongxin_df.corr(numeric_only=True)
# The numeric_only parameter is set to True to only include numerical columns in the correlation calculation

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,WSPM
No,1.0,0.9695331,0.04318051,0.01776442,0.0006838683,-0.02672,-0.082777,-0.25001,-0.087975,0.066792,0.064187,-0.108054,0.237334,-0.133366,0.002161,0.095806
year,0.969533,1.0,-0.2020099,-0.005569082,1.164482e-15,-0.029873,-0.074631,-0.186096,-0.112742,0.051737,0.090127,-0.13765,0.233167,-0.197555,-0.001095,0.133673
month,0.043181,-0.2020099,1.0,0.01052232,4.72477e-16,0.014398,-0.029275,-0.242023,0.107467,0.056978,-0.110452,0.131137,-0.006452,0.273825,0.013523,-0.161306
day,0.017764,-0.005569082,0.01052232,1.0,-4.4895550000000006e-17,0.004067,0.030897,-0.01399,0.006609,-0.018661,0.002287,0.014246,0.022745,0.023386,-0.002517,-0.01662
hour,0.000684,1.164482e-15,4.72477e-16,-4.4895550000000006e-17,1.0,-0.01047,0.021359,0.001791,-0.044821,-0.044939,0.296022,0.141348,-0.037706,-0.013198,0.011522,0.155468
PM2.5,-0.02672,-0.02987343,0.01439807,0.004066935,-0.01046964,1.0,0.875198,0.479025,0.682795,0.786052,-0.160271,-0.122505,-0.008796,0.123277,-0.01378,-0.27586
PM10,-0.082777,-0.0746311,-0.02927498,0.03089673,0.02135908,0.875198,1.0,0.469399,0.65004,0.682026,-0.141969,-0.109321,-0.035391,0.061443,-0.027816,-0.179645
SO2,-0.25001,-0.1860963,-0.2420228,-0.01399027,0.001790773,0.479025,0.469399,1.0,0.430005,0.523269,-0.206802,-0.352274,0.205117,-0.284395,-0.041565,-0.112352
NO2,-0.087975,-0.1127423,0.1074667,0.006609379,-0.0448211,0.682795,0.65004,0.430005,1.0,0.687243,-0.495797,-0.232562,0.074193,0.072417,-0.039261,-0.487299
CO,0.066792,0.05173683,0.05697822,-0.01866092,-0.04493931,0.786052,0.682026,0.523269,0.687243,1.0,-0.3206,-0.359192,0.206537,-0.096834,-0.016204,-0.27561


##### 7.1.1.2 Analysis of PM2.5 and PM10 Pollutants

1.   Pollutants per day
2.   Pollutants per month
3.   Pollutants per week
4.   Pollutants per year

In [None]:
# menghitung polutan per hari dari tahun 2013 bulan 03

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 3 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_03 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_03['avg_PM25'] = partikulasi_polusi_harian_2013_03['avg_PM25'].round()
partikulasi_polusi_harian_2013_03['avg_PM10'] = partikulasi_polusi_harian_2013_03['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_03.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,3,1,7.0,11.0
1,2013,3,2,31.0,42.0
2,2013,3,3,77.0,121.0
3,2013,3,4,23.0,45.0
4,2013,3,5,149.0,184.0


In [None]:
# Compute daily PM2.5 and PM10 averages for April 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 4 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_04 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_04['avg_PM25'] = partikulasi_polusi_harian_2013_04['avg_PM25'].round()
partikulasi_polusi_harian_2013_04['avg_PM10'] = partikulasi_polusi_harian_2013_04['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_04.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,4,1,95.0,113.0
1,2013,4,2,100.0,141.0
2,2013,4,3,124.0,205.0
3,2013,4,4,67.0,111.0
4,2013,4,5,55.0,50.0


In [None]:
# Compute daily PM2.5 and PM10 averages for May 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 5 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_05 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_05['avg_PM25'] = partikulasi_polusi_harian_2013_05['avg_PM25'].round()
partikulasi_polusi_harian_2013_05['avg_PM10'] = partikulasi_polusi_harian_2013_05['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_05.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,5,1,35.0,91.0
1,2013,5,2,81.0,127.0
2,2013,5,3,85.0,76.0
3,2013,5,4,102.0,121.0
4,2013,5,5,197.0,242.0


In [None]:
# Compute daily PM2.5 and PM10 averages for June 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 6 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_06 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_06['avg_PM25'] = partikulasi_polusi_harian_2013_06['avg_PM25'].round()
partikulasi_polusi_harian_2013_06['avg_PM10'] = partikulasi_polusi_harian_2013_06['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_06.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,6,1,70.0,123.0
1,2013,6,2,154.0,184.0
2,2013,6,3,118.0,156.0
3,2013,6,4,123.0,167.0
4,2013,6,5,123.0,118.0


In [None]:
# Compute daily PM2.5 and PM10 averages for July 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 7 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_07 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_07['avg_PM25'] = partikulasi_polusi_harian_2013_07['avg_PM25'].round()
partikulasi_polusi_harian_2013_07['avg_PM10'] = partikulasi_polusi_harian_2013_07['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_07.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,7,1,175.0,147.0
1,2013,7,2,14.0,28.0
2,2013,7,3,27.0,80.0
3,2013,7,4,22.0,64.0
4,2013,7,5,20.0,48.0


In [None]:
# Compute daily PM2.5 and PM10 averages for August 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 8 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_08 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_08['avg_PM25'] = partikulasi_polusi_harian_2013_08['avg_PM25'].round()
partikulasi_polusi_harian_2013_08['avg_PM10'] = partikulasi_polusi_harian_2013_08['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_08.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,8,1,43.0,38.0
1,2013,8,2,70.0,74.0
2,2013,8,3,26.0,52.0
3,2013,8,4,78.0,93.0
4,2013,8,5,45.0,48.0


In [None]:
# Compute daily PM2.5 and PM10 averages for September 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 9 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_09 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_09['avg_PM25'] = partikulasi_polusi_harian_2013_09['avg_PM25'].round()
partikulasi_polusi_harian_2013_09['avg_PM10'] = partikulasi_polusi_harian_2013_09['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_09.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,9,1,42.0,91.0
1,2013,9,2,48.0,76.0
2,2013,9,3,94.0,132.0
3,2013,9,4,70.0,66.0
4,2013,9,5,20.0,24.0


In [None]:
# Compute daily PM2.5 and PM10 averages for October 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 10 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_10 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_10['avg_PM25'] = partikulasi_polusi_harian_2013_10['avg_PM25'].round()
partikulasi_polusi_harian_2013_10['avg_PM10'] = partikulasi_polusi_harian_2013_10['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_10.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,10,1,76.0,86.0
1,2013,10,2,16.0,30.0
2,2013,10,3,57.0,92.0
3,2013,10,4,139.0,171.0
4,2013,10,5,281.0,300.0


In [None]:
# Compute daily PM2.5 and PM10 averages for November 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 11 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_11 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_11['avg_PM25'] = partikulasi_polusi_harian_2013_11['avg_PM25'].round()
partikulasi_polusi_harian_2013_11['avg_PM10'] = partikulasi_polusi_harian_2013_11['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_11.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,11,1,184.0,201.0
1,2013,11,2,228.0,232.0
2,2013,11,3,50.0,54.0
3,2013,11,4,37.0,61.0
4,2013,11,5,166.0,222.0


In [None]:
# Compute daily PM2.5 and PM10 averages for December 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 12 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_12 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_12['avg_PM25'] = partikulasi_polusi_harian_2013_12['avg_PM25'].round()
partikulasi_polusi_harian_2013_12['avg_PM10'] = partikulasi_polusi_harian_2013_12['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_12.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10
0,2013,12,1,66.0,96.0
1,2013,12,2,109.0,134.0
2,2013,12,3,74.0,103.0
3,2013,12,4,86.0,108.0
4,2013,12,5,38.0,57.0
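
The per-month cells above repeat one pattern with only the month changed, which makes copy-paste slips easy. A minimal sketch of the same aggregation as a reusable function follows. `daily_means` and `sample_df` are hypothetical names, and the data is made up (it only mirrors the columns of `aotizhongxin_df`).

```python
import numpy as np
import pandas as pd

# Made-up hourly data from 2013-03-01 to 2013-12-31 (placeholder values, not measurements).
stamps = pd.Timestamp("2013-03-01") + pd.to_timedelta(np.arange(24 * 306), unit="h")
sample_df = pd.DataFrame({
    "year": stamps.year,
    "month": stamps.month,
    "day": stamps.day,
    "PM2.5": stamps.month * 10.0,
    "PM10": stamps.month * 15.0,
})


def daily_means(df, year, month):
    """Rounded daily mean PM2.5 and PM10 for one calendar month."""
    # Filtering on year and month keeps every day, whatever the month length.
    subset = df[(df["year"] == year) & (df["month"] == month)]
    return (
        subset.groupby(["year", "month", "day"])
        .agg(avg_PM25=("PM2.5", "mean"), avg_PM10=("PM10", "mean"))
        .round()
        .reset_index()
    )


# One table per month, keyed by month number, instead of ten separate variables.
daily_2013 = {month: daily_means(sample_df, 2013, month) for month in range(3, 13)}
print(daily_2013[10].head())
```

Keeping the tables in a dictionary means the month number appears once, in the loop, rather than in both a variable name and a query string.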


In [None]:
# Compute monthly pollution averages for March to December 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month >= 3 and month <= 12')

# Average PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2013 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the averages
aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2013.head()


Unnamed: 0,year,month,avg_PM25,avg_PM10
0,2013,3,110.0,145.0
1,2013,4,63.0,108.0
2,2013,5,85.0,141.0
3,2013,6,106.0,129.0
4,2013,7,69.0,85.0


In [None]:
# Compute monthly pollution averages for January to December 2014

In [None]:
filtered_df = aotizhongxin_df.query('year == 2014 and month >= 1 and month <= 12')

# Average PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2014 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the averages
aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2014.head()


Unnamed: 0,year,month,avg_PM25,avg_PM10
0,2014,1,95.0,125.0
1,2014,2,150.0,156.0
2,2014,3,99.0,147.0
3,2014,4,103.0,168.0
4,2014,5,71.0,134.0


In [None]:
# Compute monthly pollution averages for January to December 2015

In [None]:
filtered_df = aotizhongxin_df.query('year == 2015 and month >= 1 and month <= 12')

# Average PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2015 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the averages
aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2015.head()


Unnamed: 0,year,month,avg_PM25,avg_PM10
0,2015,1,91.0,115.0
1,2015,2,85.0,123.0
2,2015,3,88.0,156.0
3,2015,4,77.0,135.0
4,2015,5,59.0,110.0


In [None]:
# Compute monthly pollution averages for January to December 2016

In [None]:
filtered_df = aotizhongxin_df.query('year == 2016 and month >= 1 and month <= 12')

# Average PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2016 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the averages
aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2016.head()


Unnamed: 0,year,month,avg_PM25,avg_PM10
0,2016,1,68.0,86.0
1,2016,2,45.0,61.0
2,2016,3,103.0,155.0
3,2016,4,71.0,115.0
4,2016,5,53.0,76.0


In [None]:
# Compute monthly pollution averages for 2017 (the dataset ends on 2017-02-28, so only January and February are present)

In [None]:
filtered_df = aotizhongxin_df.query('year == 2017 and month >= 1 and month <= 12')

# Average PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2017 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the averages
aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2017.head()


Unnamed: 0,year,month,avg_PM25,avg_PM10
0,2017,1,116.0,128.0
1,2017,2,71.0,84.0


In [None]:
# Helper for running SQL queries against the DataFrames in the global namespace
pysqldf = lambda q: sqldf(q, globals())

In [None]:

query = '''
SELECT
    year,
    ROUND(AVG("PM2.5")) as PM_2_5,
    ROUND(AVG("PM10")) as PM_10
FROM
    aotizhongxin_df
WHERE
    year BETWEEN 2013 AND 2017
GROUP BY
    year
ORDER BY
    year;
'''

# Using sqldf directly, as it is now imported
aotizhongxin_partikulasi_polusi__tahunan = sqldf(query, globals())
aotizhongxin_partikulasi_polusi__tahunan.head()

Unnamed: 0,year,PM_2_5,PM_10
0,2013,82.0,113.0
1,2014,90.0,122.0
2,2015,82.0,112.0
3,2016,74.0,94.0
4,2017,94.0,107.0
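
For comparison, the yearly averages can also be computed without SQL. This is a minimal pandas sketch of the same grouping, using a small made-up `sample_df` in place of `aotizhongxin_df`. One caveat, stated as an assumption about the backends: SQLite's `ROUND` rounds halves away from zero while pandas rounds halves to the nearest even value, so results could differ when a mean falls exactly on .5.

```python
import pandas as pd

# Made-up rows standing in for aotizhongxin_df (placeholder values, not measurements).
sample_df = pd.DataFrame({
    "year": [2013, 2013, 2014, 2014, 2015],
    "PM2.5": [80.0, 84.0, 91.0, 89.0, 82.0],
    "PM10": [110.0, 116.0, 120.0, 124.0, 112.0],
})

yearly = (
    sample_df[sample_df["year"].between(2013, 2017)]  # WHERE year BETWEEN 2013 AND 2017
    .groupby("year", as_index=False)                   # GROUP BY year
    .agg(PM_2_5=("PM2.5", "mean"), PM_10=("PM10", "mean"))
    .round()
    .sort_values("year")                               # ORDER BY year
)
print(yearly)
```

Both approaches skip missing values when averaging, so they should agree apart from the rounding rule.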


In [82]:
aotizhongxin_df[['PM2.5', 'PM10','NO2' ,'CO','year','month','day','hour']].corr(method='spearman')


Unnamed: 0,PM2.5,PM10,NO2,CO,year,month,day,hour
PM2.5,1.0,0.891137,0.682801,0.812637,-0.077156,-0.000423,0.015514,-0.011784
PM10,0.891137,1.0,0.650496,0.714457,-0.11378,-0.03297,0.034192,0.031203
NO2,0.682801,0.650496,1.0,0.735803,-0.153301,0.101877,0.012201,-0.075936
CO,0.812637,0.714457,0.735803,1.0,-0.021805,0.044578,0.006298,-0.034026
year,-0.077156,-0.11378,-0.153301,-0.021805,1.0,-0.184746,-0.004935,0.0
month,-0.000423,-0.03297,0.101877,0.044578,-0.184746,1.0,0.010475,0.0
day,0.015514,0.034192,0.012201,0.006298,-0.004935,0.010475,1.0,0.0
hour,-0.011784,0.031203,-0.075936,-0.034026,0.0,0.0,0.0,1.0
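
As a reminder of what `method='spearman'` measures, the sketch below shows that the Spearman coefficient equals the Pearson coefficient computed on ranks, which is why it captures monotonic rather than strictly linear association. The six value pairs are a small illustrative sample, not the full dataset, and `sample` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Small illustrative sample of daily PM2.5 and CO averages.
sample = pd.DataFrame({
    "PM2.5": [7.0, 31.0, 77.0, 23.0, 149.0, 95.0],
    "CO": [429.0, 825.0, 1621.0, 606.0, 2358.0, 1208.0],
})

# Spearman correlation as computed by pandas.
spearman = sample.corr(method="spearman")

# The same quantity obtained by ranking each column, then applying Pearson.
pearson_on_ranks = sample.rank().corr(method="pearson")

print(spearman)
print(np.allclose(spearman, pearson_on_ranks))
```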







##### 7.1.1.3 Air Quality Change Patterns

In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for March 2013

In [85]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 3 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_03 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_03['avg_PM25'] = pola_perubahan_kualitas_harian_2013_03['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_03['avg_PM10'] = pola_perubahan_kualitas_harian_2013_03['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_03['avg_NO2'] = pola_perubahan_kualitas_harian_2013_03['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_03['avg_CO'] = pola_perubahan_kualitas_harian_2013_03['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_03.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,3,1,7.0,11.0,23.0,429.0
1,2013,3,2,31.0,42.0,67.0,825.0
2,2013,3,3,77.0,121.0,81.0,1621.0
3,2013,3,4,23.0,45.0,46.0,606.0
4,2013,3,5,149.0,184.0,133.0,2358.0


In [87]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for April 2013

In [88]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 4 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_04 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_04['avg_PM25'] = pola_perubahan_kualitas_harian_2013_04['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_04['avg_PM10'] = pola_perubahan_kualitas_harian_2013_04['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_04['avg_NO2'] = pola_perubahan_kualitas_harian_2013_04['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_04['avg_CO'] = pola_perubahan_kualitas_harian_2013_04['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_04.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,4,1,95.0,113.0,39.0,1208.0
1,2013,4,2,100.0,141.0,69.0,1317.0
2,2013,4,3,124.0,205.0,88.0,1454.0
3,2013,4,4,67.0,111.0,85.0,2270.0
4,2013,4,5,55.0,50.0,41.0,1179.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for May 2013

In [89]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 5 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_05 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_05['avg_PM25'] = pola_perubahan_kualitas_harian_2013_05['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_05['avg_PM10'] = pola_perubahan_kualitas_harian_2013_05['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_05['avg_NO2'] = pola_perubahan_kualitas_harian_2013_05['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_05['avg_CO'] = pola_perubahan_kualitas_harian_2013_05['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_05.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,5,1,35.0,91.0,48.0,565.0
1,2013,5,2,81.0,127.0,64.0,1000.0
2,2013,5,3,85.0,76.0,30.0,671.0
3,2013,5,4,102.0,121.0,37.0,638.0
4,2013,5,5,197.0,242.0,72.0,1367.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for June 2013

In [92]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 6 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_06 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_06['avg_PM25'] = pola_perubahan_kualitas_harian_2013_06['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_06['avg_PM10'] = pola_perubahan_kualitas_harian_2013_06['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_06['avg_NO2'] = pola_perubahan_kualitas_harian_2013_06['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_06['avg_CO'] = pola_perubahan_kualitas_harian_2013_06['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_06.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,6,1,70.0,123.0,61.0,571.0
1,2013,6,2,154.0,184.0,68.0,1546.0
2,2013,6,3,118.0,156.0,79.0,1267.0
3,2013,6,4,123.0,167.0,89.0,1671.0
4,2013,6,5,123.0,118.0,56.0,1537.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for July 2013

In [94]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 7 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_07 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_07['avg_PM25'] = pola_perubahan_kualitas_harian_2013_07['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_07['avg_PM10'] = pola_perubahan_kualitas_harian_2013_07['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_07['avg_NO2'] = pola_perubahan_kualitas_harian_2013_07['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_07['avg_CO'] = pola_perubahan_kualitas_harian_2013_07['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_07.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,7,1,175.0,147.0,73.0,2102.0
1,2013,7,2,14.0,28.0,44.0,375.0
2,2013,7,3,27.0,80.0,72.0,504.0
3,2013,7,4,22.0,64.0,57.0,400.0
4,2013,7,5,20.0,48.0,46.0,350.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for August 2013

In [96]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 8 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_08 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_08['avg_PM25'] = pola_perubahan_kualitas_harian_2013_08['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_08['avg_PM10'] = pola_perubahan_kualitas_harian_2013_08['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_08['avg_NO2'] = pola_perubahan_kualitas_harian_2013_08['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_08['avg_CO'] = pola_perubahan_kualitas_harian_2013_08['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_08.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,8,1,43.0,38.0,63.0,623.0
1,2013,8,2,70.0,74.0,59.0,825.0
2,2013,8,3,26.0,52.0,44.0,340.0
3,2013,8,4,78.0,93.0,41.0,643.0
4,2013,8,5,45.0,48.0,48.0,460.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for September 2013

In [97]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 9 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_09 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_09['avg_PM25'] = pola_perubahan_kualitas_harian_2013_09['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_09['avg_PM10'] = pola_perubahan_kualitas_harian_2013_09['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_09['avg_NO2'] = pola_perubahan_kualitas_harian_2013_09['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_09['avg_CO'] = pola_perubahan_kualitas_harian_2013_09['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_09.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,9,1,42.0,91.0,50.0,858.0
1,2013,9,2,48.0,76.0,51.0,812.0
2,2013,9,3,94.0,132.0,56.0,1033.0
3,2013,9,4,70.0,66.0,52.0,917.0
4,2013,9,5,20.0,24.0,38.0,488.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for October 2013

In [98]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 10 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_10 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_10['avg_PM25'] = pola_perubahan_kualitas_harian_2013_10['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_10['avg_PM10'] = pola_perubahan_kualitas_harian_2013_10['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_10['avg_NO2'] = pola_perubahan_kualitas_harian_2013_10['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_10['avg_CO'] = pola_perubahan_kualitas_harian_2013_10['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_10.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,10,1,76.0,86.0,61.0,1029.0
1,2013,10,2,16.0,30.0,39.0,304.0
2,2013,10,3,57.0,92.0,63.0,1167.0
3,2013,10,4,139.0,171.0,92.0,2121.0
4,2013,10,5,281.0,300.0,113.0,2267.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for November 2013

In [99]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 11 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_11 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_11['avg_PM25'] = pola_perubahan_kualitas_harian_2013_11['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_11['avg_PM10'] = pola_perubahan_kualitas_harian_2013_11['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_11['avg_NO2'] = pola_perubahan_kualitas_harian_2013_11['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_11['avg_CO'] = pola_perubahan_kualitas_harian_2013_11['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_11.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,11,1,184.0,201.0,106.0,1758.0
1,2013,11,2,228.0,232.0,112.0,2042.0
2,2013,11,3,50.0,54.0,43.0,692.0
3,2013,11,4,37.0,61.0,59.0,675.0
4,2013,11,5,166.0,222.0,100.0,2117.0


In [None]:
# Daily air quality pattern (PM2.5, PM10, NO2, CO) for December 2013

In [100]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 12 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_12 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_12['avg_PM25'] = pola_perubahan_kualitas_harian_2013_12['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_12['avg_PM10'] = pola_perubahan_kualitas_harian_2013_12['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_12['avg_NO2'] = pola_perubahan_kualitas_harian_2013_12['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_12['avg_CO'] = pola_perubahan_kualitas_harian_2013_12['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_12.head()

Unnamed: 0,year,month,day,avg_PM25,avg_PM10,avg_NO2,avg_CO
0,2013,12,1,66.0,96.0,74.0,1500.0
1,2013,12,2,109.0,134.0,84.0,2683.0
2,2013,12,3,74.0,103.0,62.0,1875.0
3,2013,12,4,86.0,108.0,71.0,2154.0
4,2013,12,5,38.0,57.0,44.0,1175.0


## 8 Visualization & Explanatory Analysis

### 8.1 Question 1:

In [None]:
# In which months is PM2.5 pollution highest?

In [104]:
import plotly.graph_objects as go
import pandas as pd

# Combine the monthly dataframes of all years
all_years_data = pd.concat([aotizhongxin_partikulasi_polusi_bulanan_2013,
                            aotizhongxin_partikulasi_polusi_bulanan_2014,
                            aotizhongxin_partikulasi_polusi_bulanan_2015,
                            aotizhongxin_partikulasi_polusi_bulanan_2016,
                            aotizhongxin_partikulasi_polusi_bulanan_2017])

# Assign a distinct bar color to each year
year_colors = {
    2013: 'red',
    2014: 'blue',
    2015: 'green',
    2016: 'orange',
    2017: 'purple'
}

# Build a grouped bar chart with Plotly
fig = go.Figure()

# Add one trace per year; the upper bound is 2018 so that 2017 is included
for year in range(2013, 2018):
    yearly_data = all_years_data[all_years_data['year'] == year]
    fig.add_trace(go.Bar(
        x=yearly_data['month'],
        y=yearly_data['avg_PM25'],
        name=f'{year}',
        marker_color=year_colors[year],  # Color chosen by year
        hovertemplate='<b>Month %{x}</b><br>Year: %{customdata}<br>PM2.5: %{y:.2f} μg/m³<extra></extra>',
        customdata=yearly_data['year']  # Year values shown in the tooltip
    ))

# Layout and axis labels
fig.update_layout(
    title='Monthly PM2.5 Pollution (2013-2017)',
    xaxis=dict(title='Month', tickvals=list(range(1, 13)),
               ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']),
    yaxis=dict(title='PM2.5 concentration (μg/m³)'),
    barmode='group',
    hovermode='x unified'  # One combined tooltip per month across all years
)

# Show the chart
fig.show()
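
The peak month per year can also be read off numerically instead of from the chart. This is a minimal sketch, assuming a combined table with `year`, `month`, and `avg_PM25` columns. The `monthly` frame below holds only a few of the rounded values displayed earlier, so its result is illustrative rather than a finding.

```python
import pandas as pd

# A few rounded monthly values (partial data, for illustration only).
monthly = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "month": [3, 4, 5, 1, 2, 3],
    "avg_PM25": [110.0, 63.0, 85.0, 95.0, 150.0, 99.0],
})

# idxmax returns, for each year, the row label of the highest avg_PM25.
peak_rows = monthly.loc[monthly.groupby("year")["avg_PM25"].idxmax()]
print(peak_rows)
```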


### 8.2 Question 2:

## 9 Advanced Analysis (Optional)

## 10 Conclusion