<a href="https://colab.research.google.com/github/Wlnfadhil/Analisa-Data-Air-Quality-Control/blob/coca-coba-code/submission/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Analysis Project: PRSA_Data_20130301-20170228
- **Name:** Wildan Fadhil Nazaruddin
- **Email:** wildanfadhil76@gmial.com
- **ID Dicoding:**

## Air Quality Data

Airborne particulate matter has a significant impact on public health, and year-over-year data analysis is crucial for informed decisions about reducing it. This project analyzes weather conditions and air quality recorded at different monitoring sites to better understand environmental trends and their implications. The dataset covers 12 air-quality monitoring stations in Beijing, China, from March 2013 to February 2017. It is examined to identify patterns and provide insights for more effective policy-making on air quality and public health. This analysis can also contribute to better strategies for mitigating ongoing climate change.

### 1.1 Classes

- PRSA_Data_Aotizhongxin: Data from the Aotizhongxin station.
- PRSA_Data_Changping: Data from the Changping station.
- PRSA_Data_Dingling: Data from the Dingling station.
- PRSA_Data_Dongsi: Data from the Dongsi station.
- PRSA_Data_Guanyuan: Data from the Guanyuan station.
- PRSA_Data_Gucheng: Data from the Gucheng station.
- PRSA_Data_Huairou: Data from the Huairou station.
- PRSA_Data_Nongzhanguan: Data from the Nongzhanguan station.
- PRSA_Data_Shunyi: Data from the Shunyi station.
- PRSA_Data_Tiantan: Data from the Tiantan station.
- PRSA_Data_Wanliu: Data from the Wanliu station.
- PRSA_Data_Wanshouxigong: Data from the Wanshouxigong station.

### 1.2 Methodology

1. **Data Collection and Cleaning:** First, we will consolidate the datasets from the 12 monitoring stations. Data cleaning will involve handling missing values, correcting inconsistencies, and standardizing all datasets.
2. **Descriptive Statistics:** Descriptive statistics such as mean, median, standard deviation, and interquartile range (IQR) will summarize the key characteristics of the particulate data (PM2.5, PM10) across stations. Histograms, box plots, and time series plots will show the distribution and spread of the data.
3. **Correlation Analysis:** To identify the relationship between pollutants and weather conditions, we will run a Pearson or Spearman correlation analysis. This helps show how temperature, humidity, or wind speed relate to particulate levels in the air.
4. **Trend Analysis:** Trend analysis will show how air quality changes over time (seasonally or annually) and across stations. Time series decomposition will split the data into trend, seasonal, and residual components, giving a clearer view of the underlying patterns.
5. **Geospatial Analysis:** By plotting the data on maps, we will explore the geographical distribution of particulate matter and observe how air quality varies between locations.
6. **Hypothesis Testing:** Statistical hypothesis tests (such as t-tests or ANOVA) will compare air quality between locations or time periods to determine whether observed differences are statistically significant.
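
The trend-analysis step can be sketched with plain pandas. The block below is a minimal, hand-rolled additive decomposition on synthetic daily data, not the method this notebook necessarily uses: the helper name `decompose_additive`, the 7-day period, and the toy series are all assumptions made for illustration. In practice a library routine such as statsmodels' `seasonal_decompose` would be a more robust choice.

```python
import numpy as np
import pandas as pd


def decompose_additive(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (hypothetical helper).

    Assumes an odd `period`, so a centered moving average is well defined.
    """
    # Trend: centered moving average over one full period
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend

    # Seasonal: mean detrended value at each position within the period
    phase = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(phase).mean()
    seasonal_means -= seasonal_means.mean()  # force the seasonal part to sum to zero
    seasonal = pd.Series(seasonal_means.loc[phase].to_numpy(), index=series.index)

    residual = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Synthetic data: a linear trend plus a weekly pattern that sums to zero
pattern = np.array([3.0, 1.0, -2.0, -4.0, 0.0, 1.0, 1.0])
index = pd.date_range("2013-03-01", periods=28, freq="D")
toy_series = pd.Series(0.5 * np.arange(28) + np.tile(pattern, 4), index=index)

parts = decompose_additive(toy_series, period=7)
print(parts.dropna().head())
```

The first and last three rows have no trend value because the centered window is incomplete there.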

### 1.3 Deployment

Data set

```
! git clone https://github.com/Wlnfadhil/Analisa-Data-Air-Quality-Control.git
```



## 1 Code Engine

In [1]:
# ! git clone https://github.com/Wlnfadhil/Analisa-Data-Air-Quality-Control.git

#### 1.1 Analyzing the Correlation of Temperature (TEMP) and Its Comparison with Pressure (PRES)

##### 1.1.1 Code for Analyzing the Daily Correlation of Temperature (TEMP) and Its Comparison with Pressure (PRES)

In [2]:
def korelasi_suhu_(df, year, month, day_start, day_end):
    
    filtered_df = df.query('year == @year and month == @month and day >= @day_start and day <= @day_end')

    result = (
        filtered_df.groupby(['year', 'month', 'day'])
        .agg(avg_TEMP=('TEMP', 'mean'), avg_O3=('O3', 'mean'), avg_PRES=('PRES', 'mean'))
        .reset_index()
    )

    result['avg_TEMP'] = result['avg_TEMP'].round()
    result['avg_O3'] = result['avg_O3'].round()
    result['avg_PRES'] = result['avg_PRES'].round()

    return result

### 1.2 Code for Analyzing PM2.5 and PM10 Pollutants

#### 1.2.1 Daily Pollution

In [1]:
def partikulasi_polusi(df, year, month, day_start, day_end):
    filtered_df = df.query('year == @year and month == @month and day >= @day_start and day <= @day_end')

    result = (
        filtered_df.groupby(['year', 'month', 'day'])
        .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
        .reset_index()
    )

    result['avg_PM25'] = result['avg_PM25'].round()
    result['avg_PM10'] = result['avg_PM10'].round()

    return result

#### 1.2.2 Weekly Pollution (partikulasi_polusi_mingguan)

In [2]:
def partikulasi_polusi_mingguan(df, year, month):
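    # Filter the DataFrame for the specified year and month; copy it so that
    # adding the 'week' column does not write into a slice of df
    filtered_df = df.query('year == @year and month == @month').copy()

    # Week of month: days 1-7 -> 1, 8-14 -> 2, ..., 29-31 -> 5
    filtered_df['week'] = (filtered_df['day'] - 1) // 7 + 1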

    result = (
        filtered_df.groupby(['year', 'month', 'week'])
        .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
        .reset_index()
    )

    # Round the averages to the nearest integer
    result['avg_PM25'] = result['avg_PM25'].round()
    result['avg_PM10'] = result['avg_PM10'].round()

    return result
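
The week-of-month bucketing used by `partikulasi_polusi_mingguan` can be checked in isolation. The sketch below uses only the standard library; the helper name `week_of_month` is hypothetical and exists only for this illustration.

```python
def week_of_month(day: int) -> int:
    """Map a day of the month (1-31) to a 7-day bucket numbered from 1."""
    return (day - 1) // 7 + 1


# Group every possible day of a 31-day month by its bucket
buckets = {}
for day in range(1, 32):
    buckets.setdefault(week_of_month(day), []).append(day)

for week, days in buckets.items():
    print(week, days[0], days[-1])
```

Note that these buckets are not calendar weeks: bucket 5 holds only days 29 to 31, so its average rests on fewer observations than the others.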


#### 1.2.3 Yearly Pollution (partikulasi_polusi_tahunan)

In [3]:
def partikulasi_polusi_tahunan(df, year):
    filtered_df = df.query('year == @year' )

    result = (
        filtered_df.groupby(['year'])
        .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
        .reset_index()
    )

    # Round the averages to the nearest integer
    result['avg_PM25'] = result['avg_PM25'].round()
    result['avg_PM10'] = result['avg_PM10'].round()

    return result

#### 1.2.4 Monthly Pollution (partikulasi_polusi_bulanan)

In [4]:
def partikulasi_polusi_bulanan(df, year, month):
    filtered_df = df.query('year == @year and month == @month')

    result = (
        filtered_df.groupby(['year', 'month'])
        .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
        .reset_index()
    )

    result['avg_PM25'] = result['avg_PM25'].round()
    result['avg_PM10'] = result['avg_PM10'].round()

    return result
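
The daily, weekly, monthly, and yearly helpers above all follow the same group, average, and round pattern, differing only in the grouping keys. As a sketch of how that pattern could be factored out, the block below defines one generic function. `average_pollutants` and the toy data are assumptions for illustration, and the output columns are named `avg_PM2.5`/`avg_PM10` rather than `avg_PM25`/`avg_PM10` as in the helpers above.

```python
import pandas as pd


def average_pollutants(df, group_keys, columns=("PM2.5", "PM10")):
    """Mean of `columns` per group, rounded to the nearest integer (hypothetical helper)."""
    return (
        df.groupby(list(group_keys))[list(columns)]
        .mean()
        .round()
        .add_prefix("avg_")
        .reset_index()
    )


# Tiny synthetic frame with the same column names as the station CSV files
toy = pd.DataFrame(
    {
        "year": [2013, 2013, 2013, 2013],
        "month": [3, 3, 4, 4],
        "day": [1, 1, 1, 2],
        "PM2.5": [10.0, 20.0, 30.0, 52.0],
        "PM10": [40.0, 60.0, 80.0, 100.0],
    }
)

monthly = average_pollutants(toy, ["year", "month"])
yearly = average_pollutants(toy, ["year"])
print(monthly)
print(yearly)
```

Filtering to a specific year or month can then be done once on the input, for example with `df.query(...)`, before calling the function.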

### 1.3 Analyzing the Correlation of Air Quality Changes

#### 1.3.1 Code for Analyzing the Correlation of Daily Air Quality Changes

In [5]:
def kualitas_udara(df, year, month, day_start, day_end):  # Function definition

    # Filter the data based on year, month, and day range
    filtered_df = df.query('year == @year and month == @month and day >= @day_start and day <= @day_end')

    # Group the filtered data by year, month, and day, and calculate mean of pollutants
    result = (
        filtered_df.groupby(['year', 'month', 'day'])
        .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
        .reset_index()
    )

    # Round the calculated averages
    result['avg_PM25'] = result['avg_PM25'].round()
    result['avg_PM10'] = result['avg_PM10'].round()
    result['avg_NO2'] = result['avg_NO2'].round()
    result['avg_CO'] = result['avg_CO'].round()

    # Return the result
    return result


### 1.4 Data Cleaning 

In [6]:
def preprocess_dataframe(df):
    df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour']]) 
    df.set_index('date', inplace=True) 

    df['wd'] = df['wd'].astype('category')
    df['station'] = df['station'].astype('category')

    pollutants = ['PM2.5', 'PM10', 'SO2', 'NO2', 'CO', 'O3']
    df[pollutants] = df[pollutants].interpolate(method='time')

    meteorological = ['TEMP', 'PRES', 'DEWP', 'RAIN']
    df[meteorological] = df[meteorological].interpolate(method='linear')

    # Forward-fill wind direction and wind speed; fillna(method='ffill') is deprecated in recent pandas
    df['wd'] = df['wd'].ffill()
    df['WSPM'] = df['WSPM'].ffill()

    return df
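
`preprocess_dataframe` uses `method='time'` for pollutants and `method='linear'` for meteorological columns. The two only differ when timestamps are unevenly spaced, which the small sketch below illustrates on a synthetic series (the values and timestamps are made up for this example).

```python
import numpy as np
import pandas as pd

# Three readings at 00:00, 01:00 and 03:00; the middle one is missing
index = pd.to_datetime(["2013-03-01 00:00", "2013-03-01 01:00", "2013-03-01 03:00"])
series = pd.Series([0.0, np.nan, 30.0], index=index)

# 'time' weights the gap by the actual time distance between timestamps
by_time = series.interpolate(method="time")

# 'linear' ignores the index and treats rows as equally spaced
by_position = series.interpolate(method="linear")

print(by_time.iloc[1], by_position.iloc[1])
```

On a complete hourly index the two methods give the same result, so the distinction only matters if rows are missing from the data.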

## 2 Defining the Business Questions

- Question 1:
  - What are the primary trends in air quality levels (PM2.5, PM10) across the 12 monitoring stations over the observed time period (2013-2017)?
- Question 2:
  - How do weather conditions (e.g., temperature, humidity, wind speed) correlate with particulate matter concentrations (PM2.5, PM10) at each station?
- Question 3:
  - Which stations show the highest and lowest levels of particulate matter, and what factors contribute to these differences?
- Question 4:
  - How do seasonal variations (e.g., winter vs. summer) affect air quality across the stations, and what are the contributing factors?
- Question 5:
  - What actionable insights can be drawn from this analysis to inform policy decisions aimed at improving air quality and mitigating public health risks?


## 3 Import All Packages/Libraries Used

In [8]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from IPython.display import Markdown, display
from pandasql import sqldf
import warnings
warnings.filterwarnings('ignore')
import plotly.graph_objects as go




## 4 Data Wrangling

### 4.1 Gathering Data

#### 4.1.1 Load data

##### 4.1.1.1 Specifying the Target Directory

In [13]:
current_dir = os.getcwd()

csv_files = [
    "PRSA_Data_Aotizhongxin_20130301-20170228.csv",
    "PRSA_Data_Changping_20130301-20170228.csv",
    "PRSA_Data_Dingling_20130301-20170228.csv",
    "PRSA_Data_Dongsi_20130301-20170228.csv",
    "PRSA_Data_Guanyuan_20130301-20170228.csv",
    "PRSA_Data_Gucheng_20130301-20170228.csv",
    "PRSA_Data_Huairou_20130301-20170228.csv",
    "PRSA_Data_Nongzhanguan_20130301-20170228.csv",
    "PRSA_Data_Shunyi_20130301-20170228.csv",
    "PRSA_Data_Tiantan_20130301-20170228.csv",
    "PRSA_Data_Wanliu_20130301-20170228.csv",
    "PRSA_Data_Wanshouxigong_20130301-20170228.csv"
]

dataframes = {}
for csv_file in csv_files:
    file_path = os.path.join(current_dir, "data", csv_file)  # Update directory here

    if os.path.exists(file_path):
        print(f"File {csv_file} found at {file_path}")
        try:
            df = pd.read_csv(file_path)
            # The station name is the third underscore-separated token of the file name
            location = csv_file.split('_')[2]
            dataframes[location] = df
        except Exception as e:
            print(f"Error while reading file {csv_file}: {e}")
    else:
        print(f"File {csv_file} NOT found at {file_path}")



In [12]:
aotizhongxin_df = dataframes['Aotizhongxin']

changping_df = dataframes['Changping']

dingling_df = dataframes['Dingling']

dongsi_df = dataframes['Dongsi']

guanyuan_df = dataframes['Guanyuan']

gucheng_df = dataframes['Gucheng']

huairou_df = dataframes['Huairou']

Nongzhanguan_df = dataframes['Nongzhanguan']

shunyi_df = dataframes['Shunyi']

tiantian_df = dataframes['Tiantan']

wanliu_df = dataframes['Wanliu']

Wanshouxigong_df = dataframes['Wanshouxigong']



##### 4.1.1.2 Viewing Data Info

###### 4.1.1.2.1 Aotizhongxin

In [None]:
aotizhongxin_df.head(5)

In [None]:
aotizhongxin_df.describe()

###### 4.1.1.2.2 Changping

In [None]:
changping_df.head(5)

In [None]:
dingling_df.head(5)

In [None]:
dongsi_df.head(5)

In [None]:
guanyuan_df.head(5)

In [None]:
gucheng_df.head(5)

In [None]:
huairou_df.head(5)

In [None]:
Nongzhanguan_df.head(5)

In [None]:
shunyi_df.head(5)

In [None]:
tiantian_df.head(5)

In [None]:
wanliu_df.head(5)

In [None]:
Wanshouxigong_df.head(5)

## 5 Assessing Data

### 5.1 Aotizhongxin

In [None]:
aotizhongxin_df.info()

In [None]:
aotizhongxin_df.isnull().sum()

In [None]:
changping_df.info()

In [None]:
# Count fully duplicated rows instead of printing one boolean per row
aotizhongxin_df.duplicated().sum()

In [None]:

def cek_nilai_tidak_akurat(df):
    print("Checking for inaccurate values:")

    # Define the range of values considered plausible for each column
    rentang_akurat = {
        'PM2.5': (0, 1000),
        'PM10': (0, 1500),
        'SO2': (0, 2000),
        'NO2': (0, 2000),
        'CO': (0, 200),
        'O3': (0, 500),
        'TEMP': (-50, 50),
        'PRES': (800, 1100),
        'DEWP': (-50, 50),
        'RAIN': (0, 500),
        'WSPM': (0, 100)
    }

    hasil_tidak_akurat = {}

    for kolom, (batas_bawah, batas_atas) in rentang_akurat.items():
        if kolom in df.columns:
            nilai_tidak_akurat = df[(df[kolom] < batas_bawah) | (df[kolom] > batas_atas)]
            if not nilai_tidak_akurat.empty:
                hasil_tidak_akurat[kolom] = nilai_tidak_akurat[[kolom]]
                print(f"\nInaccurate values found in column {kolom}:")
                print(nilai_tidak_akurat[[kolom]])

    if not hasil_tidak_akurat:
        print("No values found outside the specified ranges.")

    return hasil_tidak_akurat

# Run the function
hasil_tidak_akurat = cek_nilai_tidak_akurat(aotizhongxin_df)
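
A more compact way to express the same range check is `Series.between`. The sketch below counts out-of-range values per column on a synthetic frame. `out_of_range_counts`, the toy data, and the two-column range table are assumptions for illustration, with the ranges copied from `rentang_akurat` above.

```python
import numpy as np
import pandas as pd

# Two of the ranges from the check above
valid_ranges = {"PM2.5": (0, 1000), "TEMP": (-50, 50)}

toy = pd.DataFrame(
    {
        "PM2.5": [35.0, -1.0, np.nan, 1200.0],
        "TEMP": [12.0, 60.0, -5.0, 20.0],
    }
)


def out_of_range_counts(df, ranges):
    """Count non-missing values outside [low, high] per column (hypothetical helper)."""
    counts = {}
    for column, (low, high) in ranges.items():
        if column in df.columns:
            values = df[column]
            # between() is False for NaN, so exclude missing values explicitly
            mask = values.notna() & ~values.between(low, high)
            counts[column] = int(mask.sum())
    return counts


counts = out_of_range_counts(toy, valid_ranges)
print(counts)
```

Missing values are deliberately not counted here, matching the comparison-based check above, where `NaN < low` and `NaN > high` are both false.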

In [None]:

def cek_nilai_tidak_konsisten(df):
    print("Checking for inconsistent values:")

    hasil_tidak_konsisten = {}

    # 1. Check for inconsistency between PM2.5 and PM10
    if 'PM2.5' in df.columns and 'PM10' in df.columns:
        pm_tidak_konsisten = df[df['PM2.5'] > df['PM10']]
        if not pm_tidak_konsisten.empty:
            hasil_tidak_konsisten['PM2.5 > PM10'] = pm_tidak_konsisten[['PM2.5', 'PM10']]
            print("\nFound PM2.5 values greater than PM10:")
            print(pm_tidak_konsisten[['PM2.5', 'PM10']])

    # 2. Check for invalid wind directions (missing values are also reported here)
    if 'wd' in df.columns:
        arah_angin_valid = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE',
                            'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW']
        arah_angin_tidak_valid = df[~df['wd'].isin(arah_angin_valid)]
        if not arah_angin_tidak_valid.empty:
            hasil_tidak_konsisten['Invalid wind direction'] = arah_angin_tidak_valid[['wd']]
            print("\nFound invalid wind directions:")
            print(arah_angin_tidak_valid[['wd']])

    # 3. Check for inconsistency between temperature and dew point
    if 'TEMP' in df.columns and 'DEWP' in df.columns:
        suhu_tidak_konsisten = df[df['DEWP'] > df['TEMP']]
        if not suhu_tidak_konsisten.empty:
            hasil_tidak_konsisten['DEWP > TEMP'] = suhu_tidak_konsisten[['TEMP', 'DEWP']]
            print("\nFound dew points higher than the temperature:")
            print(suhu_tidak_konsisten[['TEMP', 'DEWP']])

    # 4. Check for negative wind speed
    if 'WSPM' in df.columns:
        kecepatan_angin_negatif = df[df['WSPM'] < 0]
        if not kecepatan_angin_negatif.empty:
            hasil_tidak_konsisten['Negative wind speed'] = kecepatan_angin_negatif[['WSPM']]
            print("\nFound negative wind speeds:")
            print(kecepatan_angin_negatif[['WSPM']])

    if not hasil_tidak_konsisten:
        print("No inconsistent values found for the criteria checked.")

    return hasil_tidak_konsisten

# Run the function
hasil_tidak_konsisten = cek_nilai_tidak_konsisten(aotizhongxin_df)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def cek_outlier(df):
    print("Outlier check using the IQR method:")

    # Select the numeric columns
    kolom_numerik = df.select_dtypes(include=[np.number]).columns

    hasil_outlier = {}

    for kolom in kolom_numerik:
        Q1 = df[kolom].quantile(0.25)
        Q3 = df[kolom].quantile(0.75)
        IQR = Q3 - Q1

        batas_bawah = Q1 - 1.5 * IQR
        batas_atas = Q3 + 1.5 * IQR

        outlier = df[(df[kolom] < batas_bawah) | (df[kolom] > batas_atas)]

        if not outlier.empty:
            hasil_outlier[kolom] = outlier[[kolom]]
            print(f"\nOutliers found in column {kolom}:")
            print(f"Number of outliers: {len(outlier)}")
            print(f"Outlier percentage: {(len(outlier) / len(df)) * 100:.2f}%")
            print(f"Outlier value range: {outlier[kolom].min()} to {outlier[kolom].max()}")


    if not hasil_outlier:
        print("No outliers found using the IQR method.")

    return hasil_outlier

# Run the function
hasil_outlier = cek_outlier(aotizhongxin_df)
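
To make the IQR rule concrete, here is a small worked example on five synthetic values (the numbers are made up for illustration). It uses the same 1.5 x IQR fences as `cek_outlier`.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

q1 = values.quantile(0.25)  # first quartile
q3 = values.quantile(0.75)  # third quartile
iqr = q3 - q1

# Anything beyond 1.5 * IQR from the quartiles is flagged
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, q3, iqr, lower_fence, upper_fence)
print(outliers.tolist())
```

For strongly right-skewed columns such as pollutant concentrations, this rule tends to flag many genuine high-pollution episodes, so flagged values are not necessarily errors.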

In [None]:


# Build the markdown summary table
table = """
| Dataset      | Data types | Missing values | Duplicate data | Inaccurate values |
|--------------|------------|----------------|----------------|-------------------|
| aotizhongxin | Data type issues:<br>1. hour, day, month, and year should be combined into a datetime<br>2. the wd column should be a category<br>3. the station column should be a category | Missing values:<br>1. 925 missing values in PM2.5. | There are 11 duplicate rows. | There are inaccurate values in the age column. |
| orders_df    | Wrong data types for the order_date and delivery_date columns. | - | - | - |
| product_df   | - | - | There are 6 duplicate rows. | - |
| sales_df     | - | There are 19 missing values in the total_price column. | - | - |
"""




In [None]:
display(Markdown(table))


## 6 Cleaning Data

In [None]:
dataframes = {
    'Aotizhongxin': aotizhongxin_df,
    'Changping': changping_df,
    'Dingling': dingling_df,
    'Dongsi': dongsi_df,
    'Guanyuan': guanyuan_df,
    'Gucheng': gucheng_df,
    'Huairou': huairou_df,
    'Nongzhanguan': Nongzhanguan_df,
    'Shunyi': shunyi_df,
    'Tiantan': tiantian_df,
    'Wanliu': wanliu_df,
    'Wanshouxigong': Wanshouxigong_df
}

In [None]:
for name, df in dataframes.items():
    dataframes[name] = preprocess_dataframe(df)
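
Once every station has been cleaned, the per-station frames can be stacked into one table so that stations are compared with a single `groupby`. The sketch below shows the idea on two synthetic frames. The station names, the `station_key` column, and the values are assumptions for illustration.

```python
import pandas as pd

# Stand-ins for the per-station DataFrames held in a dict
frames = {
    "StationA": pd.DataFrame({"PM2.5": [10.0, 30.0]}),
    "StationB": pd.DataFrame({"PM2.5": [50.0, 70.0]}),
}

# Tag each frame with its dict key, then stack them row-wise
combined = pd.concat(
    [frame.assign(station_key=name) for name, frame in frames.items()],
    ignore_index=True,
)

station_means = combined.groupby("station_key")["PM2.5"].mean()
print(station_means)
```

With the real data, `ignore_index=True` would discard a datetime index, so it should be left out if the timestamps need to be kept.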

## 7 Exploratory Data Analysis (EDA)

### 7.1 Explore

#### 7.1.1 Aoti

##### 7.1.1.1 Viewing the Correlations Within Each DataFrame

In [None]:
aotizhongxin_df.corr(numeric_only=True)

##### 7.1.1.2 Analysis of PM2.5 and PM10 Pollutants

1.   Pollutants per day
2.   Pollutants per month
3.   Pollutants per week
4.   Pollutants per year





In [None]:
# Compute daily pollutant averages for March 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 3 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_03 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_03['avg_PM25'] = partikulasi_polusi_harian_2013_03['avg_PM25'].round()
partikulasi_polusi_harian_2013_03['avg_PM10'] = partikulasi_polusi_harian_2013_03['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_03.head()

In [None]:
# Compute daily pollutant averages for April 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 4 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_04 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_04['avg_PM25'] = partikulasi_polusi_harian_2013_04['avg_PM25'].round()
partikulasi_polusi_harian_2013_04['avg_PM10'] = partikulasi_polusi_harian_2013_04['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_04.head()

In [None]:
# Compute daily pollutant averages for May 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 5 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_05 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_05['avg_PM25'] = partikulasi_polusi_harian_2013_05['avg_PM25'].round()
partikulasi_polusi_harian_2013_05['avg_PM10'] = partikulasi_polusi_harian_2013_05['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_05.head()

In [None]:
# Compute daily pollutant averages for June 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 6 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_06 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_06['avg_PM25'] = partikulasi_polusi_harian_2013_06['avg_PM25'].round()
partikulasi_polusi_harian_2013_06['avg_PM10'] = partikulasi_polusi_harian_2013_06['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_06.head()

In [None]:
# Compute daily pollutant averages for July 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 7 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_07 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_07['avg_PM25'] = partikulasi_polusi_harian_2013_07['avg_PM25'].round()
partikulasi_polusi_harian_2013_07['avg_PM10'] = partikulasi_polusi_harian_2013_07['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_07.head()

In [None]:
# Compute daily pollutant averages for August 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 8 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_08 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_08['avg_PM25'] = partikulasi_polusi_harian_2013_08['avg_PM25'].round()
partikulasi_polusi_harian_2013_08['avg_PM10'] = partikulasi_polusi_harian_2013_08['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_08.head()

In [None]:
# Compute daily pollutant averages for September 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 9 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_09 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_09['avg_PM25'] = partikulasi_polusi_harian_2013_09['avg_PM25'].round()
partikulasi_polusi_harian_2013_09['avg_PM10'] = partikulasi_polusi_harian_2013_09['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_09.head()

In [None]:
# Compute daily pollutant averages for October 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 10 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_10 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_10['avg_PM25'] = partikulasi_polusi_harian_2013_10['avg_PM25'].round()
partikulasi_polusi_harian_2013_10['avg_PM10'] = partikulasi_polusi_harian_2013_10['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_10.head()

In [None]:
# Compute daily pollutant averages for November 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 11 and day >= 1 and day <= 30')

partikulasi_polusi_harian_2013_11 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_11['avg_PM25'] = partikulasi_polusi_harian_2013_11['avg_PM25'].round()
partikulasi_polusi_harian_2013_11['avg_PM10'] = partikulasi_polusi_harian_2013_11['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_11.head()

In [None]:
# Compute daily pollutant averages for December 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 12 and day >= 1 and day <= 31')

partikulasi_polusi_harian_2013_12 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

partikulasi_polusi_harian_2013_12['avg_PM25'] = partikulasi_polusi_harian_2013_12['avg_PM25'].round()
partikulasi_polusi_harian_2013_12['avg_PM10'] = partikulasi_polusi_harian_2013_12['avg_PM10'].round()

# Display the result
partikulasi_polusi_harian_2013_12.head()

In [None]:
# Compute monthly pollution for 2013, March through December

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month >= 3 and month <= 12')

# Compute the mean PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2013 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the means
aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2013['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2013.head()


In [None]:
# Compute monthly pollution for 2014, January through December

In [None]:
filtered_df = aotizhongxin_df.query('year == 2014 and month >= 1 and month <= 12')

# Compute the mean PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2014 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the means
aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2014['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2014.head()


In [None]:
# Compute monthly pollution for 2015, January through December

In [None]:
filtered_df = aotizhongxin_df.query('year == 2015 and month >= 1 and month <= 12')

# Compute the mean PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2015 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the means
aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2015['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2015.head()


In [None]:
# Compute monthly pollution for 2016, January through December

In [None]:
filtered_df = aotizhongxin_df.query('year == 2016 and month >= 1 and month <= 12')

# Compute the mean PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2016 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the means
aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2016['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2016.head()


In [None]:
# Compute monthly pollution for 2017 (months 1 through 12; the data files end on 2017-02-28)

In [None]:
filtered_df = aotizhongxin_df.query('year == 2017 and month >= 1 and month <= 12')

# Compute the mean PM2.5 and PM10 per month
aotizhongxin_partikulasi_polusi_bulanan_2017 = (
    filtered_df.groupby(['year', 'month'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    .reset_index()
)

# Round the means
aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM25'] = aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM25'].round()
aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM10'] = aotizhongxin_partikulasi_polusi_bulanan_2017['avg_PM10'].round()

# Display the result
aotizhongxin_partikulasi_polusi_bulanan_2017.head()


In [None]:
# Helper function for running SQL queries
pysqldf = lambda q: sqldf(q, globals())

In [None]:

query = '''
SELECT
    year,
    ROUND(AVG("PM2.5")) as PM_2_5,
    ROUND(AVG("PM10")) as PM_10
FROM
    aotizhongxin_df
WHERE
    year BETWEEN 2013 AND 2017
GROUP BY
    year
ORDER BY
    year;
'''

# Using sqldf directly, as it is now imported
aotizhongxin_partikulasi_polusi__tahunan = sqldf(query, globals())
aotizhongxin_partikulasi_polusi__tahunan.head()
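
pandasql executes its queries with SQLite by default, so the yearly query above can be tried out with the standard-library `sqlite3` module alone. The sketch below runs the same SELECT against a tiny in-memory table. The table name `readings` and the four rows are made up for illustration.

```python
import sqlite3

# (year, PM2.5, PM10) rows standing in for the station data
rows = [
    (2013, 80.0, 100.0),
    (2013, 90.0, 120.0),
    (2014, 60.0, 90.0),
    (2014, 62.0, 96.0),
]

conn = sqlite3.connect(":memory:")
# Column names containing a dot must be double-quoted in SQL
conn.execute('CREATE TABLE readings (year INTEGER, "PM2.5" REAL, "PM10" REAL)')
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

yearly = conn.execute(
    '''
    SELECT year,
           ROUND(AVG("PM2.5")) AS PM_2_5,
           ROUND(AVG("PM10")) AS PM_10
    FROM readings
    WHERE year BETWEEN 2013 AND 2017
    GROUP BY year
    ORDER BY year
    '''
).fetchall()
conn.close()

print(yearly)
```

The double quotes around `"PM2.5"` are what let the query reference a column whose name is not a plain identifier.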

In [None]:
aotizhongxin_df[['PM2.5', 'PM10','NO2' ,'CO','year','month','day','hour']].corr(method='spearman')
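
Spearman correlation, used in the cell above, is Pearson correlation computed on ranks, which is why it captures monotonic relationships that are not linear. The sketch below shows this on synthetic data where `y` is the cube of `x`. The column names and values are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["y"] = toy["x"] ** 3  # monotonic but not linear

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

# Same value obtained by ranking first and then applying Pearson
spearman_via_ranks = toy.rank().corr(method="pearson").loc["x", "y"]

print(pearson, spearman, spearman_via_ranks)
```

Note that correlating pollutants with calendar fields such as `month` or `hour` treats those fields as ordered quantities, so a cyclical pattern (for example high in winter at both ends of the year) can yield a coefficient near zero.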







##### 7.1.1.3 Air Quality Change Patterns

In [None]:
# Daily air quality change pattern, March 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 3 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_03 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_03['avg_PM25'] = pola_perubahan_kualitas_harian_2013_03['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_03['avg_PM10'] = pola_perubahan_kualitas_harian_2013_03['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_03['avg_NO2'] = pola_perubahan_kualitas_harian_2013_03['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_03['avg_CO'] = pola_perubahan_kualitas_harian_2013_03['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_03.head()

In [None]:
# Daily air quality change pattern, April 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 4 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_04 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_04['avg_PM25'] = pola_perubahan_kualitas_harian_2013_04['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_04['avg_PM10'] = pola_perubahan_kualitas_harian_2013_04['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_04['avg_NO2'] = pola_perubahan_kualitas_harian_2013_04['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_04['avg_CO'] = pola_perubahan_kualitas_harian_2013_04['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_04.head()

In [None]:
# Daily air quality change pattern, May 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 5 and day >= 1 and day <= 31')

pola_perubahan_kualitas_harian_2013_05 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

pola_perubahan_kualitas_harian_2013_05['avg_PM25'] = pola_perubahan_kualitas_harian_2013_05['avg_PM25'].round()
pola_perubahan_kualitas_harian_2013_05['avg_PM10'] = pola_perubahan_kualitas_harian_2013_05['avg_PM10'].round()
pola_perubahan_kualitas_harian_2013_05['avg_NO2'] = pola_perubahan_kualitas_harian_2013_05['avg_NO2'].round()
pola_perubahan_kualitas_harian_2013_05['avg_CO'] = pola_perubahan_kualitas_harian_2013_05['avg_CO'].round()

# Display the result
pola_perubahan_kualitas_harian_2013_05.head()

In [None]:
# Daily air quality change pattern, June 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 6 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_06 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_06[avg_cols] = pola_perubahan_kualitas_harian_2013_06[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_06.head()

In [None]:
# Daily air-quality pattern for July 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 7 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_07 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_07[avg_cols] = pola_perubahan_kualitas_harian_2013_07[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_07.head()

In [None]:
# Daily air-quality pattern for August 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 8 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_08 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_08[avg_cols] = pola_perubahan_kualitas_harian_2013_08[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_08.head()

In [None]:
# Daily air-quality pattern for September 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 9 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_09 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_09[avg_cols] = pola_perubahan_kualitas_harian_2013_09[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_09.head()

In [None]:
# Daily air-quality pattern for October 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 10 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_10 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_10[avg_cols] = pola_perubahan_kualitas_harian_2013_10[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_10.head()

In [None]:
# Daily air-quality pattern for November 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 11 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_11 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_11[avg_cols] = pola_perubahan_kualitas_harian_2013_11[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_11.head()

In [None]:
# Daily air-quality pattern for December 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 12 and day >= 1 and day <= 30')

pola_perubahan_kualitas_harian_2013_12 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

avg_cols = ['avg_PM25', 'avg_PM10', 'avg_NO2', 'avg_CO']
pola_perubahan_kualitas_harian_2013_12[avg_cols] = pola_perubahan_kualitas_harian_2013_12[avg_cols].round()

# Display the result
pola_perubahan_kualitas_harian_2013_12.head()
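
The per-month cells above repeat the same filter, aggregation, and rounding. As a rough sketch, the same daily averages could be produced for a whole year in one pass by grouping on `year`, `month`, and `day`. The helper name `daily_pollutant_means` and the small synthetic frame are illustrative assumptions and not part of the original notebook; the column names follow those used above. Note that this sketch does not restrict days to 1-30.

```python
import pandas as pd

def daily_pollutant_means(df, year):
    """Return rounded daily means of PM2.5, PM10, NO2 and CO for one year."""
    return (
        df[df["year"] == year]
        .groupby(["year", "month", "day"])
        .agg(
            avg_PM25=("PM2.5", "mean"),
            avg_PM10=("PM10", "mean"),
            avg_NO2=("NO2", "mean"),
            avg_CO=("CO", "mean"),
        )
        .round()
        .reset_index()
    )

# Synthetic stand-in for aotizhongxin_df (hypothetical values)
sample_df = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013],
    "month": [4, 4, 5, 5],
    "day": [1, 1, 1, 2],
    "PM2.5": [10.0, 20.0, 30.0, 40.0],
    "PM10": [20.0, 40.0, 60.0, 80.0],
    "NO2": [5.0, 7.0, 9.0, 11.0],
    "CO": [300.0, 500.0, 700.0, 900.0],
})

daily_2013 = daily_pollutant_means(sample_df, 2013)
# A single month can then be selected without re-running the aggregation
april_2013 = daily_2013[daily_2013["month"] == 4]
print(april_2013)
```

With the real data, `daily_pollutant_means(aotizhongxin_df, 2013)` would replace the nine separate month cells, and each month becomes a simple row filter.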

##### 7.1.1.4 Correlation of Temperature (TEMP) and O3, Compared with Pressure (PRES)


In [None]:
# Daily TEMP, O3, and PRES averages for Aotizhongxin, March 2013

In [None]:
filtered_df = aotizhongxin_df.query('year == 2013 and month == 3 and day >= 1 and day <= 30')

KORELASI_SUHU_DAN_TEMP_DAN_PERBANDINGAN_DENGAN_PRES_MINGGUAN_AOTI_2013_03  = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_TEMP=('TEMP', 'mean'), avg_O3=('O3', 'mean'), avg_PRES=('PRES', 'mean'))
    .reset_index()
)

avg_cols = ['avg_TEMP', 'avg_O3', 'avg_PRES']
KORELASI_SUHU_DAN_TEMP_DAN_PERBANDINGAN_DENGAN_PRES_MINGGUAN_AOTI_2013_03[avg_cols] = KORELASI_SUHU_DAN_TEMP_DAN_PERBANDINGAN_DENGAN_PRES_MINGGUAN_AOTI_2013_03[avg_cols].round()

# Display the result
KORELASI_SUHU_DAN_TEMP_DAN_PERBANDINGAN_DENGAN_PRES_MINGGUAN_AOTI_2013_03.head()
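
The heading speaks of correlation, while the cell above only lists daily averages. One possible way to quantify the relationship is a Pearson correlation matrix over those daily means, using pandas `DataFrame.corr`. The frame below is synthetic and only stands in for the aggregated result; its values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily averages, standing in for the aggregated frame above
daily_means = pd.DataFrame({
    "avg_TEMP": [2.0, 4.0, 6.0, 8.0, 10.0],
    "avg_O3": [30.0, 40.0, 50.0, 60.0, 70.0],
    "avg_PRES": [1030.0, 1025.0, 1020.0, 1015.0, 1010.0],
})

# Pearson correlation between every pair of columns
corr_matrix = daily_means.corr(method="pearson")
print(corr_matrix.round(2))
```

In this made-up sample, O3 rises linearly with temperature and pressure falls linearly, so the coefficients sit at the extremes; real station data would be expected to give values somewhere in between.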

In [None]:
# Process the Aotizhongxin dataset for April 2013
KORELASI_AOTIZHONGXIN_2013_04 = korelasi_suhu(aotizhongxin_df, year=2013, month=4, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_04.head()

In [None]:
# Process the Aotizhongxin dataset for May 2013
KORELASI_AOTIZHONGXIN_2013_05 = korelasi_suhu(aotizhongxin_df, year=2013, month=5, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_05.head()

In [None]:
# Process the Aotizhongxin dataset for June 2013
KORELASI_AOTIZHONGXIN_2013_06 = korelasi_suhu(aotizhongxin_df, year=2013, month=6, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_06.head()

In [None]:
# Process the Aotizhongxin dataset for July 2013
KORELASI_AOTIZHONGXIN_2013_07 = korelasi_suhu(aotizhongxin_df, year=2013, month=7, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_07.head()

In [None]:
# Process the Aotizhongxin dataset for August 2013
KORELASI_AOTIZHONGXIN_2013_08 = korelasi_suhu(aotizhongxin_df, year=2013, month=8, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_08.head()

In [None]:
KORELASI_AOTIZHONGXIN_2013_09 = korelasi_suhu(aotizhongxin_df, year=2013, month=9, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_09.head()

In [None]:
KORELASI_AOTIZHONGXIN_2013_10 = korelasi_suhu(aotizhongxin_df, year=2013, month=10, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_10.head()

In [None]:
KORELASI_AOTIZHONGXIN_2013_11 = korelasi_suhu(aotizhongxin_df, year=2013, month=11, day_start=1, day_end=30)

KORELASI_AOTIZHONGXIN_2013_11.head()

In [None]:
KORELASI_AOTIZHONGXIN_2013_12 = korelasi_suhu(aotizhongxin_df, year=2013, month=12, day_start=1, day_end=30)
KORELASI_AOTIZHONGXIN_2013_12.head()

In [None]:
# Months to process
months = range(4, 13)  # April (4) through December (12)
korelasi_aotizhongxin = {}

for month in months:
    korelasi_aotizhongxin[month] = korelasi_suhu(aotizhongxin_df, year=2013, month=month, day_start=1, day_end=30)
    print(f"Aotizhongxin correlation for month {month}:")
    print(korelasi_aotizhongxin[month].head())

# Process December with an explicit filter
filtered_df = aotizhongxin_df.query('year == 2013 and month == 12 and day >= 1 and day <= 30')
pola_perubahan_kualitas_harian_2013_12 = (
    filtered_df.groupby(['year', 'month', 'day'])
    .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'), avg_NO2=('NO2', 'mean'), avg_CO=('CO', 'mean'))
    .reset_index()
)

# Round the averages
pola_perubahan_kualitas_harian_2013_12 = pola_perubahan_kualitas_harian_2013_12.round()
# Display the result
pola_perubahan_kualitas_harian_2013_12.head()


### 7.2.1 Changping

#### 7.2.1.1 Temperature Correlation

##### 7.2.1.1.1 Monthly Temperature, 2013-2017

In [None]:
months = range(3, 13)  
korelasi_changping_suhu_bulanan_2013 = {}

for month in months:
    # Call the korelasi_suhu function for each month and store the result
    korelasi_changping_suhu_bulanan_2013[month] = korelasi_suhu(changping_df, year=2013, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping correlation for month {month}:")
    print(korelasi_changping_suhu_bulanan_2013[month].head())  # Print the first few rows of the result


In [None]:
months = range(1, 13)  # Start from 1 (January) to 12 (December)
korelasi_changping_suhu_bulanan_2014 = {}

for month in months:
    # Call the korelasi_suhu function for each month and store the result
    korelasi_changping_suhu_bulanan_2014[month] = korelasi_suhu(changping_df, year=2014, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping correlation for month {month}:")
    print(korelasi_changping_suhu_bulanan_2014[month].head())  # Print the first few rows of the result


In [None]:
months = range(1, 13)  # Start from 1 (January) to 12 (December)
korelasi_changping_suhu_bulanan_2015 = {}

for month in months:
    # Call the korelasi_suhu function for each month and store the result
    korelasi_changping_suhu_bulanan_2015[month] = korelasi_suhu(changping_df, year=2015, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping correlation for month {month}:")
    print(korelasi_changping_suhu_bulanan_2015[month].head())  # Print the first few rows of the result


In [None]:
months = range(1, 13)  # Start from 1 (January) to 12 (December)
korelasi_changping_suhu_bulanan_2016 = {}

for month in months:
    # Call the korelasi_suhu function for each month and store the result
    korelasi_changping_suhu_bulanan_2016[month] = korelasi_suhu(changping_df, year=2016, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping correlation for month {month}:")
    print(korelasi_changping_suhu_bulanan_2016[month].head())  # Print the first few rows of the result


In [None]:
months = range(1, 3)  # January and February 2017 (the dataset ends on 2017-02-28)
korelasi_changping_suhu_bulanan_2017 = {}

for month in months:
    # Call the korelasi_suhu function for each month and store the result
    korelasi_changping_suhu_bulanan_2017[month] = korelasi_suhu(changping_df, year=2017, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping correlation for month {month}:")
    print(korelasi_changping_suhu_bulanan_2017[month].head())  # Print the first few rows of the result


##### 7.2.1.2 Particulate Pollution, Changping 2013-2017

In [None]:
months = range(1, 13)  # Start from 1 (January) to 12 (December)
partikulasi_polusi_changping_suhu_bulanan_2014 = {}

for month in months:

    partikulasi_polusi_changping_suhu_bulanan_2014[month] = partikulasi_polusi(changping_df, year=2014, month=month, day_start=1, day_end=30)
    
    # Print the first few rows of the result for each month
    print(f"Changping particulate pollution for month {month}:")
    print(partikulasi_polusi_changping_suhu_bulanan_2014[month].head())

### 7.3.1 Dingling

In [None]:
months = range(3, 13)
partikulasi_polusi_dingling_suhu_bulanan_2013 = {}

for month in months:
    partikulasi_polusi_dingling_suhu_bulanan_2013[month] = partikulasi_polusi(dingling_df, year=2013, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dingling_suhu_bulanan_2013[month]['avg_PM25'] = round(partikulasi_polusi_dingling_suhu_bulanan_2013[month]['avg_PM25'])
    partikulasi_polusi_dingling_suhu_bulanan_2013[month]['avg_PM10'] = round(partikulasi_polusi_dingling_suhu_bulanan_2013[month]['avg_PM10'])

    display(partikulasi_polusi_dingling_suhu_bulanan_2013[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dingling_suhu_bulanan_2014 = {}

for month in months:
    partikulasi_polusi_dingling_suhu_bulanan_2014[month] = partikulasi_polusi(dingling_df, year=2014, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dingling_suhu_bulanan_2014[month]['avg_PM25'] = round(partikulasi_polusi_dingling_suhu_bulanan_2014[month]['avg_PM25'])
    partikulasi_polusi_dingling_suhu_bulanan_2014[month]['avg_PM10'] = round(partikulasi_polusi_dingling_suhu_bulanan_2014[month]['avg_PM10'])

    display(partikulasi_polusi_dingling_suhu_bulanan_2014[month].head())


In [None]:
months = range(1, 13)
partikulasi_polusi_dingling_suhu_bulanan_2015 = {}

for month in months:
    partikulasi_polusi_dingling_suhu_bulanan_2015[month] = partikulasi_polusi(dingling_df, year=2015, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dingling_suhu_bulanan_2015[month]['avg_PM25'] = round(partikulasi_polusi_dingling_suhu_bulanan_2015[month]['avg_PM25'])
    partikulasi_polusi_dingling_suhu_bulanan_2015[month]['avg_PM10'] = round(partikulasi_polusi_dingling_suhu_bulanan_2015[month]['avg_PM10'])

    display(partikulasi_polusi_dingling_suhu_bulanan_2015[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dingling_suhu_bulanan_2016 = {}

for month in months:
    partikulasi_polusi_dingling_suhu_bulanan_2016[month] = partikulasi_polusi(dingling_df, year=2016, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dingling_suhu_bulanan_2016[month]['avg_PM25'] = round(partikulasi_polusi_dingling_suhu_bulanan_2016[month]['avg_PM25'])
    partikulasi_polusi_dingling_suhu_bulanan_2016[month]['avg_PM10'] = round(partikulasi_polusi_dingling_suhu_bulanan_2016[month]['avg_PM10'])

    display(partikulasi_polusi_dingling_suhu_bulanan_2016[month].head())

In [None]:
months = range(1, 3)
partikulasi_polusi_dingling_suhu_bulanan_2017 = {}

for month in months:
    partikulasi_polusi_dingling_suhu_bulanan_2017[month] = partikulasi_polusi(dingling_df, year=2017, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dingling_suhu_bulanan_2017[month]['avg_PM25'] = round(partikulasi_polusi_dingling_suhu_bulanan_2017[month]['avg_PM25'])
    partikulasi_polusi_dingling_suhu_bulanan_2017[month]['avg_PM10'] = round(partikulasi_polusi_dingling_suhu_bulanan_2017[month]['avg_PM10'])

    display(partikulasi_polusi_dingling_suhu_bulanan_2017[month].head())

### 7.4.1 Dongsi 

#### 7.4.1.1 PM2.5 and PM10 Particulates: Daily, Monthly, and Yearly

In [None]:
months = range(3, 13)
partikulasi_polusi_dongsi_suhu_bulanan_2013 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2013[month] = partikulasi_polusi(dongsi_df, year=2013, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2013[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2013[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2013[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2013[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2013[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_suhu_bulanan_2014 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2014[month] = partikulasi_polusi(dongsi_df, year=2014, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2014[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2014[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2014[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2014[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2014[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_suhu_bulanan_2015 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2015[month] = partikulasi_polusi(dongsi_df, year=2015, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2015[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2015[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2015[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2015[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2015[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_suhu_bulanan_2016 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month] = partikulasi_polusi(dongsi_df, year=2016, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2016[month].head())

In [None]:
months = range(1, 3)
partikulasi_polusi_dongsi_suhu_bulanan_2017 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2017[month] = partikulasi_polusi(dongsi_df, year=2017, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2017[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2017[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2017[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2017[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2017[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_suhu_bulanan_2016 = {}

for month in months:
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month] = partikulasi_polusi_bulanan(dongsi_df, year=2016, month=month)
    
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM25'])
    partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_suhu_bulanan_2016[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_suhu_bulanan_2016[month].head())

In [None]:
partikulasi_polusi_dongsi_suhu_tahunan = {}

for year in range(2013, 2018):
    partikulasi_polusi_dongsi_suhu_tahunan[year] = partikulasi_polusi_tahunan(dongsi_df, year)

annual_summary = pd.concat(partikulasi_polusi_dongsi_suhu_tahunan.values())

display(annual_summary)

### 7.5.1 Guanyuan

#### 7.5.1.1 PM2.5 and PM10 Particulates: Daily, Monthly, and Yearly

##### 7.5.1.1.1 Per Day

In [None]:
months = range(3, 13)
partikulasi_polusi_dongsi_polusi_harian_2013 = {}

for month in months:
    partikulasi_polusi_dongsi_polusi_harian_2013[month] = partikulasi_polusi(guanyuan_df, year=2013, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_polusi_harian_2013[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_polusi_harian_2013[month]['avg_PM25'])
    partikulasi_polusi_dongsi_polusi_harian_2013[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_polusi_harian_2013[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_polusi_harian_2013[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_polusi_harian_2014 = {}

for month in months:
    partikulasi_polusi_dongsi_polusi_harian_2014[month] = partikulasi_polusi(guanyuan_df, year=2014, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_polusi_harian_2014[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_polusi_harian_2014[month]['avg_PM25'])
    partikulasi_polusi_dongsi_polusi_harian_2014[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_polusi_harian_2014[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_polusi_harian_2014[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_polusi_harian_2015 = {}

for month in months:
    partikulasi_polusi_dongsi_polusi_harian_2015[month] = partikulasi_polusi(guanyuan_df, year=2015, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_polusi_harian_2015[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_polusi_harian_2015[month]['avg_PM25'])
    partikulasi_polusi_dongsi_polusi_harian_2015[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_polusi_harian_2015[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_polusi_harian_2015[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_dongsi_polusi_harian_2016 = {}

for month in months:
    partikulasi_polusi_dongsi_polusi_harian_2016[month] = partikulasi_polusi(guanyuan_df, year=2016, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_polusi_harian_2016[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_polusi_harian_2016[month]['avg_PM25'])
    partikulasi_polusi_dongsi_polusi_harian_2016[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_polusi_harian_2016[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_polusi_harian_2016[month].head())

In [None]:
months = range(1, 3)  # January and February 2017 (the dataset ends on 2017-02-28)
partikulasi_polusi_dongsi_polusi_harian_2017 = {}

for month in months:
    partikulasi_polusi_dongsi_polusi_harian_2017[month] = partikulasi_polusi(guanyuan_df, year=2017, month=month, day_start=1, day_end=30)
    
    partikulasi_polusi_dongsi_polusi_harian_2017[month]['avg_PM25'] = round(partikulasi_polusi_dongsi_polusi_harian_2017[month]['avg_PM25'])
    partikulasi_polusi_dongsi_polusi_harian_2017[month]['avg_PM10'] = round(partikulasi_polusi_dongsi_polusi_harian_2017[month]['avg_PM10'])

    display(partikulasi_polusi_dongsi_polusi_harian_2017[month].head())

##### 7.5.1.1.2 Per Week

In [None]:
months = range(3, 13)
partikulasi_polusi_guanyuan_mingguan_2013 = {}

for month in months:
    partikulasi_polusi_guanyuan_mingguan_2013[month] = partikulasi_polusi_mingguan(guanyuan_df, year=2013, month=month)

    display(partikulasi_polusi_guanyuan_mingguan_2013[month].head())


In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_mingguan_2014 = {}

for month in months:
    partikulasi_polusi_guanyuan_mingguan_2014[month] = partikulasi_polusi_mingguan(guanyuan_df, year=2014, month=month)

    display(partikulasi_polusi_guanyuan_mingguan_2014[month].head())


In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_mingguan_2015 = {}

for month in months:
    partikulasi_polusi_guanyuan_mingguan_2015[month] = partikulasi_polusi_mingguan(guanyuan_df, year=2015, month=month)

    display(partikulasi_polusi_guanyuan_mingguan_2015[month].head())


In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_mingguan_2016 = {}

for month in months:
    partikulasi_polusi_guanyuan_mingguan_2016[month] = partikulasi_polusi_mingguan(guanyuan_df, year=2016, month=month)

    display(partikulasi_polusi_guanyuan_mingguan_2016[month].head())


In [None]:
months = range(1, 3)
partikulasi_polusi_guanyuan_mingguan_2017 = {}

for month in months:
    partikulasi_polusi_guanyuan_mingguan_2017[month] = partikulasi_polusi_mingguan(guanyuan_df, year=2017, month=month)

    display(partikulasi_polusi_guanyuan_mingguan_2017[month].head())
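
The definition of `partikulasi_polusi_mingguan` is not shown in this part of the notebook. As a minimal sketch of how a weekly aggregation might be done, the `year`, `month`, and `day` columns can be combined into a date and resampled by week. The helper name `weekly_pm_means` and the sample values are assumptions for illustration; weeks here end on Sunday, which is the pandas default for `"W"`.

```python
import pandas as pd

def weekly_pm_means(df, year, month):
    """Weekly mean PM2.5 and PM10 for one month (weeks end on Sunday)."""
    subset = df[(df["year"] == year) & (df["month"] == month)].copy()
    # pandas assembles a datetime from columns named year, month, day
    subset["date"] = pd.to_datetime(subset[["year", "month", "day"]])
    return (
        subset.set_index("date")[["PM2.5", "PM10"]]
        .resample("W")
        .mean()
        .rename(columns={"PM2.5": "avg_PM25", "PM10": "avg_PM10"})
    )

# Synthetic stand-in for guanyuan_df (hypothetical values)
sample_df = pd.DataFrame({
    "year": [2013] * 4,
    "month": [3] * 4,
    "day": [1, 2, 4, 5],
    "PM2.5": [10.0, 30.0, 50.0, 70.0],
    "PM10": [20.0, 40.0, 60.0, 80.0],
})

weekly = weekly_pm_means(sample_df, 2013, 3)
print(weekly)
```

Because the bins follow calendar weeks, the first and last week of a month are usually partial; whether that is acceptable depends on how the weekly figures are meant to be compared.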


##### 7.5.1.1.3 Per Month

In [None]:
months = range(3, 13)
partikulasi_polusi_guanyuan_bulanan_2013 = {}

for month in months:
    partikulasi_polusi_guanyuan_bulanan_2013[month] = partikulasi_polusi_bulanan(guanyuan_df, year=2013, month=month)
    display(partikulasi_polusi_guanyuan_bulanan_2013[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_bulanan_2014 = {}

for month in months:
    partikulasi_polusi_guanyuan_bulanan_2014[month] = partikulasi_polusi_bulanan(guanyuan_df, year=2014, month=month)
    display(partikulasi_polusi_guanyuan_bulanan_2014[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_bulanan_2015 = {}

for month in months:
    partikulasi_polusi_guanyuan_bulanan_2015[month] = partikulasi_polusi_bulanan(guanyuan_df, year=2015, month=month)
    display(partikulasi_polusi_guanyuan_bulanan_2015[month].head())

In [None]:
months = range(1, 13)
partikulasi_polusi_guanyuan_bulanan_2016 = {}

for month in months:
    partikulasi_polusi_guanyuan_bulanan_2016[month] = partikulasi_polusi_bulanan(guanyuan_df, year=2016, month=month)
    display(partikulasi_polusi_guanyuan_bulanan_2016[month].head())

In [None]:
months = range(1, 3)
partikulasi_polusi_guanyuan_bulanan_2017 = {}

for month in months:
    partikulasi_polusi_guanyuan_bulanan_2017[month] = partikulasi_polusi_bulanan(guanyuan_df, year=2017, month=month)
    display(partikulasi_polusi_guanyuan_bulanan_2017[month].head())

### 7.6.1 Gucheng

In [None]:
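years = range(2013, 2018)  # Years 2013-2017
months = range(1, 13)  # January through December

partikulasi_polusi_per_tahun_bulan = {}

for year in years:
    for month in months:
        # Skip months outside the dataset's span (March 2013 to February 2017)
        if not (2013, 3) <= (year, month) <= (2017, 2):
            continue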
        # Compute particulate pollution for each month and year
        polusi_data = partikulasi_polusi(gucheng_df, year=year, month=month, day_start=1, day_end=30)
        
        # Round the PM2.5 and PM10 averages
        polusi_data['avg_PM25'] = round(polusi_data['avg_PM25'])
        polusi_data['avg_PM10'] = round(polusi_data['avg_PM10'])
        
        # Store the result in the dictionary under the key (year, month)
        partikulasi_polusi_per_tahun_bulan[(year, month)] = polusi_data

        # Show the first 5 rows for each month
        display(polusi_data.head())
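
The loop above walks through year and month pairs. As a sketch, the pairs that fall inside the dataset's span (March 2013 to February 2017, going by the file name `PRSA_Data_20130301-20170228`) can be enumerated directly with `pandas.period_range`, which avoids separate year and month ranges. The variable names are illustrative.

```python
import pandas as pd

# Every calendar month covered by the dataset (March 2013 to February 2017)
periods = pd.period_range(start="2013-03", end="2017-02", freq="M")
year_month_pairs = [(p.year, p.month) for p in periods]

print(len(year_month_pairs))
print(year_month_pairs[0], year_month_pairs[-1])
```

Iterating over `year_month_pairs` in a single `for year, month in ...` loop would then visit each month once, with no pairs outside the recorded period.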


### 7.7.1 Huairou

### 7.8.1 Nongzhanguan

### 7.9.1 Shunyi

### 7.10.1 Tiantan

### 7.11.1 Wanliu

### 7.12.1 Wanshouxigong

## 8 Visualization & Explanatory Analysis

### 8.1 Question 1: What are the main trends in air quality (PM2.5, PM10) across the 12 monitoring stations in China over the observed period (2013-2017)?

#### Aoti_df

In [None]:
def partikulasi_polusi(df, year, month, day_start, day_end):
    # Filter the data by year, month, and day range
    mask = (df['year'] == year) & (df['month'] == month) & (df['day'].between(day_start, day_end))
    monthly_data = df[mask]
    
    # Compute the mean PM2.5 and PM10
    avg_PM25 = monthly_data['PM2.5'].mean()
    avg_PM10 = monthly_data['PM10'].mean()
    
    return pd.DataFrame({
        'month': [month],
        'avg_PM25': [avg_PM25],
        'avg_PM10': [avg_PM10]
    })

# Dictionary holding one summary frame per month
partikulasi_polusi_aoti_df_harian_2013 = {}

# Fill the dictionary month by month
months = range(1, 13)  # January and February 2013 precede the dataset start, so their means are NaN
for month in months:
    partikulasi_polusi_aoti_df_harian_2013[month] = partikulasi_polusi(
        aotizhongxin_df, year=2013, month=month, day_start=1, day_end=30)

# Combine all monthly frames into a single DataFrame
all_months_data = pd.concat(partikulasi_polusi_aoti_df_harian_2013.values())

# Keep the months in calendar order
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Use distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Build the chart with Plotly
fig = go.Figure()

# Add the PM2.5 bars
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert the month to a string for the x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 bars
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert the month to a string for the x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2013 (Aoti)',
    xaxis=dict(
        title='Month',  # x-axis title
        tickvals=list(range(1, 13)),  # x-axis tick positions (months 1 to 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars side by side
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()
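
The cell above builds one single-row frame per month and concatenates them before plotting. A possible shortcut, sketched here under stated assumptions, is to compute the whole monthly table with one `groupby`, so the same helper can feed the chart for any year. The name `monthly_pm_means` and the sample values are hypothetical, and unlike the cell above this sketch does not restrict days to 1-30.

```python
import pandas as pd

def monthly_pm_means(df, year):
    """Mean PM2.5 and PM10 per calendar month for one year."""
    return (
        df[df["year"] == year]
        .groupby("month")
        .agg(avg_PM25=("PM2.5", "mean"), avg_PM10=("PM10", "mean"))
        .reindex(range(1, 13))  # months without data become NaN rows
        .rename_axis("month")
        .reset_index()
    )

# Synthetic stand-in for aotizhongxin_df (hypothetical values)
sample_df = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014],
    "month": [3, 3, 4, 1],
    "PM2.5": [10.0, 30.0, 50.0, 99.0],
    "PM10": [20.0, 60.0, 80.0, 99.0],
})

monthly_2013 = monthly_pm_means(sample_df, 2013)
print(monthly_2013)
```

The resulting `month`, `avg_PM25`, and `avg_PM10` columns match the ones the plotting code reads, so the frame could be passed to the bar traces in place of `all_months_data`.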


In [None]:
def partikulasi_polusi(df, year, month, day_start, day_end):
    # Filter the data by year, month, and day range
    mask = (df['year'] == year) & (df['month'] == month) & (df['day'].between(day_start, day_end))
    monthly_data = df[mask]
    
    # Compute the mean PM2.5 and PM10
    avg_PM25 = monthly_data['PM2.5'].mean()
    avg_PM10 = monthly_data['PM10'].mean()
    
    return pd.DataFrame({
        'month': [month],
        'avg_PM25': [avg_PM25],
        'avg_PM10': [avg_PM10]
    })

# Dictionary holding one summary frame per month
partikulasi_polusi_aoti_df_harian_2014 = {}

# Fill the dictionary month by month
months = range(1, 13)
for month in months:
    partikulasi_polusi_aoti_df_harian_2014[month] = partikulasi_polusi(aotizhongxin_df, year=2014, month=month, day_start=1, day_end=30)

# Combine all monthly frames into a single DataFrame
all_months_data = pd.concat(partikulasi_polusi_aoti_df_harian_2014.values())

# Keep the months in calendar order
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Use distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Membuat plot menggunakan Plotly
fig = go.Figure()

# Tambahkan data untuk PM2.5
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Ubah bulan menjadi string untuk sumbu x
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Bulan %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Tambahkan data untuk PM10
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Ubah bulan menjadi string untuk sumbu x
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Bulan %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout dan label
fig.update_layout(
    title='Kadar PM2.5 dan PM10 per Bulan di Tahun 2014 (Aoti)',
    xaxis=dict(
        title='Bulan',  # Menetapkan judul untuk sumbu x
        tickvals=list(range(1, 13)),  # Nilai untuk sumbu x (bulan 1 sampai 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Label bulan
    ),
    yaxis=dict(title='Kadar PM (μg/m³)'),  # Menetapkan judul untuk sumbu y
    barmode='group',  # Mengelompokkan batang PM2.5 dan PM10
    plot_bgcolor='rgba(0,0,0,0)',  # Warna latar belakang plot
    paper_bgcolor='rgba(0,0,0,0)',  # Warna latar belakang kertas
    font=dict(
        family="Courier New, monospace",  # Jenis font
        size=18,  # Ukuran font
        color="#7f7f7f"  # Warna font
    )
)

# Tampilkan grafik
fig.show()


In [None]:
# partikulasi_polusi() is defined in an earlier cell and reused here

# Dictionary holding one DataFrame per month
partikulasi_polusi_aoti_df_harian_2015 = {}

# Fill in the data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_aoti_df_harian_2015[month] = partikulasi_polusi(aotizhongxin_df, year=2015, month=month, day_start=1, day_end=30)

# Combine the monthly DataFrames into one
all_months_data = pd.concat(partikulasi_polusi_aoti_df_harian_2015.values())

# Ensure the months are ordered correctly
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Define distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create the plot with Plotly
fig = go.Figure()

# Add the PM2.5 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2015 (Aoti)',
    xaxis=dict(
        title='Month',  # x-axis title
        tickvals=list(range(1, 13)),  # Tick positions (months 1 to 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()


In [None]:
# partikulasi_polusi() is defined in an earlier cell and reused here

# Dictionary holding one DataFrame per month
partikulasi_polusi_aoti_df_harian_2016 = {}

# Fill in the data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_aoti_df_harian_2016[month] = partikulasi_polusi(aotizhongxin_df, year=2016, month=month, day_start=1, day_end=30)

# Combine the monthly DataFrames into one
all_months_data = pd.concat(partikulasi_polusi_aoti_df_harian_2016.values())

# Ensure the months are ordered correctly
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Define distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create the plot with Plotly
fig = go.Figure()

# Add the PM2.5 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2016 (Aoti)',
    xaxis=dict(
        title='Month',  # x-axis title
        tickvals=list(range(1, 13)),  # Tick positions (months 1 to 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()


In [None]:
# partikulasi_polusi() is defined in an earlier cell and reused here

# Dictionary holding one DataFrame per month
partikulasi_polusi_aoti_df_harian_2017 = {}

# Fill in the data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_aoti_df_harian_2017[month] = partikulasi_polusi(aotizhongxin_df, year=2017, month=month, day_start=1, day_end=30)

# Combine the monthly DataFrames into one
all_months_data = pd.concat(partikulasi_polusi_aoti_df_harian_2017.values())

# Ensure the months are ordered correctly
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Define distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create the plot with Plotly
fig = go.Figure()

# Add the PM2.5 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 trace
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for the x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2017 (Aoti)',
    xaxis=dict(
        title='Month',  # x-axis title
        tickvals=list(range(1, 13)),  # Tick positions (months 1 to 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()


In [None]:

# Combine the per-year DataFrames
all_years_data = pd.concat([aotizhongxin_partikulasi_polusi_bulanan_2013,
                            aotizhongxin_partikulasi_polusi_bulanan_2014,
                            aotizhongxin_partikulasi_polusi_bulanan_2015,
                            aotizhongxin_partikulasi_polusi_bulanan_2016,
                            aotizhongxin_partikulasi_polusi_bulanan_2017])

# Define a distinct color for each year
year_colors = {
    2013: '#636EFA',  # Blue
    2014: '#EF553B',  # Red
    2015: '#00CC96',  # Green
    2016: '#AB63FA',  # Purple
    2017: '#FFA15A'  # Orange
}

# Create the bar chart with Plotly
fig = go.Figure()

# Add one trace per year in its own color (range end is exclusive, so 2018 includes 2017)
for year in range(2013, 2018):
    yearly_data = all_years_data[all_years_data['year'] == year]
    fig.add_trace(go.Bar(
        x=yearly_data['month'],
        y=yearly_data['avg_PM25'],
        name=f'{year}',
        marker_color=year_colors[year],
        hovertemplate='<b>Month %{x}</b><br>Year: %{customdata}<br>PM2.5: %{y:.2f} μg/m³<extra></extra>',
        customdata=yearly_data['year']
    ))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 Pollution (2013-2017)',
    xaxis=dict(title='Month', tickvals=list(range(1, 13)),
               ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']),
    yaxis=dict(title='PM2.5 Concentration (μg/m³)'),
    barmode='group',
    hovermode='x unified',
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
    )
)

# Show the figure
fig.show()
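
The multi-year chart needs one row per year and month with `year`, `month`, and `avg_PM25` columns. As a minimal sketch, and not necessarily how the `aotizhongxin_partikulasi_polusi_bulanan_*` frames were built, such a table can be produced in a single `groupby` call. The helper name `monthly_pm_means` and the synthetic `sample_df` are hypothetical stand-ins for a station DataFrame such as `aotizhongxin_df`. Unlike the `day_end=30` filter used in the cells above, this version averages over every day of the month, including the 31st.

```python
import pandas as pd


def monthly_pm_means(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (year, month) with the mean PM2.5 and PM10."""
    return (
        df.groupby(['year', 'month'], as_index=False)
          .agg(avg_PM25=('PM2.5', 'mean'), avg_PM10=('PM10', 'mean'))
    )


# Tiny synthetic stand-in for a station DataFrame (hypothetical values)
sample_df = pd.DataFrame({
    'year':  [2013, 2013, 2013, 2014],
    'month': [3, 3, 4, 1],
    'day':   [1, 31, 2, 5],
    'PM2.5': [10.0, 30.0, 50.0, 70.0],
    'PM10':  [20.0, 40.0, 60.0, 80.0],
})

monthly_means = monthly_pm_means(sample_df)
print(monthly_means)
```

The resulting frame can be filtered by `year` in the same way the plotting loop filters `all_years_data`.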


In [None]:
# Compute the yearly average pollution
def partikulasi_polusi_tahunan(df, year):
    # Filter rows by year
    mask = (df['year'] == year)
    yearly_data = df[mask]

    # Compute the mean PM2.5 and PM10
    avg_PM25 = yearly_data['PM2.5'].mean()
    avg_PM10 = yearly_data['PM10'].mean()

    return pd.DataFrame({
        'year': [year],
        'avg_PM25': [avg_PM25],
        'avg_PM10': [avg_PM10]
    })

# Dictionary holding one DataFrame per year
partikulasi_polusi_aoti_df_tahunan = {}

# Fill in the data for each year
years = range(2013, 2018)  # 2013 through 2017
for year in years:
    partikulasi_polusi_aoti_df_tahunan[year] = partikulasi_polusi_tahunan(aotizhongxin_df, year)

# Combine the yearly DataFrames into one
all_years_data = pd.concat(partikulasi_polusi_aoti_df_tahunan.values())

# Define distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create the plot with Plotly
fig = go.Figure()

# Add the PM2.5 trace
fig.add_trace(go.Bar(
    x=all_years_data['year'].astype(str),  # Convert year to string for the x-axis
    y=all_years_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Year %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 trace
fig.add_trace(go.Bar(
    x=all_years_data['year'].astype(str),  # Convert year to string for the x-axis
    y=all_years_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Year %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Yearly PM2.5 and PM10 Levels from 2013 to 2017 (Aoti)',
    xaxis=dict(
        title='Year',  # x-axis title
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()


#### changping df

In [None]:
partikulasi_polusi_changping_df_df_harian_2013 = {}

# Loop over the months that have data: the dataset starts in March 2013,
# and the categories below only cover months 3 to 12
months = range(3, 13)
for month in months:
    partikulasi_polusi_changping_df_df_harian_2013[month] = partikulasi_polusi(changping_df, year=2013, month=month, day_start=1, day_end=30)

# Combine all monthly DataFrames into a single DataFrame
all_months_data = pd.concat(partikulasi_polusi_changping_df_df_harian_2013.values())

# Ensure the month order is correct
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(3, 13)), ordered=True)

# Define different colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create a plot using Plotly
fig = go.Figure()

# Add data for PM2.5
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add data for PM10
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2013 (Changping)',
    xaxis=dict(
        title='Month',  # Set title for x-axis
        tickvals=list(range(3, 13)),  # Values for x-axis (months 3 to 12)
        ticktext=['Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # Set title for y-axis
    barmode='group',  # Group PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Background color of the plot
    paper_bgcolor='rgba(0,0,0,0)',  # Background color of the paper
    font=dict(
        family="Courier New, monospace",  # Font type
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the plot
fig.show()


In [None]:
partikulasi_polusi_changping_df_df_harian_2014 = {}

# Loop to fill data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_changping_df_df_harian_2014[month] = partikulasi_polusi(changping_df, year=2014, month=month, day_start=1, day_end=30)

# Combine all monthly DataFrames into a single DataFrame
all_months_data = pd.concat(partikulasi_polusi_changping_df_df_harian_2014.values())

# Ensure the month order is correct
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Define different colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create a plot using Plotly
fig = go.Figure()

# Add data for PM2.5
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for x-axis
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add data for PM10
fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),  # Convert month to string for x-axis
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2014 (Changping)',
    xaxis=dict(
        title='Month',  # Set title for x-axis
        tickvals=list(range(1, 13)),  # Values for x-axis (months 1 to 12)
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']  # Month labels
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # Set title for y-axis
    barmode='group',  # Group PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Background color of the plot
    paper_bgcolor='rgba(0,0,0,0)',  # Background color of the paper
    font=dict(
        family="Courier New, monospace",  # Font type
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the plot
fig.show()


In [None]:

partikulasi_polusi_changping_df_df_harian_2015 = {}

months = range(1, 13)
for month in months:
    partikulasi_polusi_changping_df_df_harian_2015[month] = partikulasi_polusi(changping_df, year=2015, month=month, day_start=1, day_end=30)

all_months_data = pd.concat(partikulasi_polusi_changping_df_df_harian_2015.values())

all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

color_pm25 = 'blue'
color_pm10 = 'orange'

fig = go.Figure()

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str), 
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2015 (Changping)',
    xaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),
    barmode='group',  
    plot_bgcolor='rgba(0,0,0,0)', 
    paper_bgcolor='rgba(0,0,0,0)',  
    font=dict(
        family="Courier New, monospace", 
        size=18, 
        color="#7f7f7f"  
    )
)

fig.show()


In [None]:

partikulasi_polusi_changping_df_df_harian_2016 = {}

months = range(1, 13)
for month in months:
    partikulasi_polusi_changping_df_df_harian_2016[month] = partikulasi_polusi(changping_df, year=2016, month=month, day_start=1, day_end=30)

all_months_data = pd.concat(partikulasi_polusi_changping_df_df_harian_2016.values())

all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

color_pm25 = 'blue'
color_pm10 = 'orange'

fig = go.Figure()

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str), 
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2016 (Changping)',
    xaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),
    barmode='group',  
    plot_bgcolor='rgba(0,0,0,0)', 
    paper_bgcolor='rgba(0,0,0,0)',  
    font=dict(
        family="Courier New, monospace", 
        size=18, 
        color="#7f7f7f"  
    )
)

fig.show()


In [None]:

partikulasi_polusi_changping_df_df_harian_2017 = {}

# Only January and February have data in 2017 (the dataset ends on 2017-02-28),
# which matches the two categories declared below
months = range(1, 3)
for month in months:
    partikulasi_polusi_changping_df_df_harian_2017[month] = partikulasi_polusi(changping_df, year=2017, month=month, day_start=1, day_end=30)

all_months_data = pd.concat(partikulasi_polusi_changping_df_df_harian_2017.values())

all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 3)), ordered=True)

color_pm25 = 'blue'
color_pm10 = 'orange'

fig = go.Figure()

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str), 
    y=all_months_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Month %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

fig.add_trace(go.Bar(
    x=all_months_data['month'].astype(str),
    y=all_months_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Month %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

fig.update_layout(
    title='Monthly PM2.5 and PM10 Levels in 2017 (Changping)',
    xaxis=dict(
        title='Month',
        tickvals=list(range(1, 3)),
        ticktext=['Jan', 'Feb']
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),
    barmode='group',  
    plot_bgcolor='rgba(0,0,0,0)', 
    paper_bgcolor='rgba(0,0,0,0)',  
    font=dict(
        family="Courier New, monospace", 
        size=18, 
        color="#7f7f7f"  
    )
)

fig.show()


In [None]:
# partikulasi_polusi_tahunan() is defined in an earlier cell and reused here

# Dictionary holding one DataFrame per year
partikulasi_polusi_changping_df_tahunan = {}

# Fill in the data for each year
years = range(2013, 2018)  # 2013 through 2017
for year in years:
    partikulasi_polusi_changping_df_tahunan[year] = partikulasi_polusi_tahunan(changping_df, year)

# Combine the yearly DataFrames into one
all_years_data = pd.concat(partikulasi_polusi_changping_df_tahunan.values())

# Define distinct colors for PM2.5 and PM10
color_pm25 = 'blue'
color_pm10 = 'orange'

# Create the plot with Plotly
fig = go.Figure()

# Add the PM2.5 trace
fig.add_trace(go.Bar(
    x=all_years_data['year'].astype(str),
    y=all_years_data['avg_PM25'],
    name='PM2.5',
    marker_color=color_pm25,
    hovertemplate='<b>Year %{x}</b><br>PM2.5: %{y:.2f} μg/m³<extra></extra>'
))

# Add the PM10 trace
fig.add_trace(go.Bar(
    x=all_years_data['year'].astype(str),
    y=all_years_data['avg_PM10'],
    name='PM10',
    marker_color=color_pm10,
    hovertemplate='<b>Year %{x}</b><br>PM10: %{y:.2f} μg/m³<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Yearly PM2.5 and PM10 Levels from 2013 to 2017 (Changping)',
    xaxis=dict(
        title='Year',  # x-axis title
    ),
    yaxis=dict(title='PM Concentration (μg/m³)'),  # y-axis title
    barmode='group',  # Group the PM2.5 and PM10 bars
    plot_bgcolor='rgba(0,0,0,0)',  # Plot background color
    paper_bgcolor='rgba(0,0,0,0)',  # Paper background color
    font=dict(
        family="Courier New, monospace",  # Font family
        size=18,  # Font size
        color="#7f7f7f"  # Font color
    )
)

# Show the figure
fig.show()


In [None]:
# In which week is the air pollution level highest?
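
One possible way to answer the weekly question is to build a timestamp index and resample by week. The sketch below assumes the station DataFrame has `year`, `month`, `day`, `hour`, and `PM2.5` columns. The helper name `weekly_pm25` and the synthetic `sample_df` are hypothetical and only illustrate the approach. Weeks use the pandas `'W'` default, which ends each week on Sunday.

```python
import pandas as pd


def weekly_pm25(df: pd.DataFrame) -> pd.Series:
    """Return the mean PM2.5 per calendar week (weeks ending on Sunday)."""
    # pd.to_datetime assembles timestamps from year/month/day/hour columns
    timestamps = pd.to_datetime(df[['year', 'month', 'day', 'hour']])
    series = pd.Series(df['PM2.5'].to_numpy(), index=timestamps)
    return series.resample('W').mean()


# Two synthetic weeks (Mon 2014-01-06 to Sun 2014-01-19), one reading per day
sample_df = pd.DataFrame({
    'year': [2014] * 14,
    'month': [1] * 14,
    'day': list(range(6, 20)),
    'hour': [0] * 14,
    'PM2.5': [10.0] * 7 + [50.0] * 7,
})

weekly = weekly_pm25(sample_df)
worst_week = weekly.idxmax()  # label (week-ending date) of the highest weekly mean
print(weekly)
print('Highest weekly mean ends on:', worst_week.date())
```

With a real station DataFrame, `weekly_pm25(changping_df).idxmax()` would give the end date of the most polluted week, provided the columns match the assumption above.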

In [None]:
# Dictionary holding one DataFrame per month
partikulasi_polusi_changping_suhu_harian_2014 = {}

# Fill in the data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_changping_suhu_harian_2014[month] = korelasi_suhu(changping_df, year=2014, month=month, day_start=1, day_end=30)

# Combine the monthly DataFrames into one
all_months_data = pd.concat(partikulasi_polusi_changping_suhu_harian_2014.values())

# Ensure the months are ordered correctly
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Define a distinct color for each month
month_colors = {
    1: 'red', 2: 'blue', 3: 'green', 4: 'orange', 5: 'purple', 6: 'cyan',
    7: 'magenta', 8: 'yellow', 9: 'brown', 10: 'pink', 11: 'gray', 12: 'olive'
}

# Create the plot with Plotly
fig = go.Figure()

# Add one line per month in its own color
for month in range(1, 13):
    monthly_data = partikulasi_polusi_changping_suhu_harian_2014[month]
    if not monthly_data.empty:
        fig.add_trace(go.Scatter(
            x=monthly_data['day'],
            y=monthly_data['avg_TEMP'],
            mode='lines+markers',
            name=f'Month {month}',
            marker=dict(color=month_colors[month]),
            hovertemplate='<b>Day %{x}</b><br>Average Temperature: %{y:.2f}°C<extra></extra>'
        ))

# Layout and labels
fig.update_layout(
    title='Average Daily Temperature in 2014 (Changping)',
    xaxis=dict(title='Day'),
    yaxis=dict(title='Average Temperature (°C)'),
    hovermode='x unified',
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
    )
)

# Show the figure
fig.show()


In [None]:
import plotly.graph_objects as go
import pandas as pd

# Dictionary holding one DataFrame per month
partikulasi_polusi_changping_suhu_bulanan_2014 = {}

# Fill in the data for each month
months = range(1, 13)
for month in months:
    partikulasi_polusi_changping_suhu_bulanan_2014[month] = korelasi_suhu(changping_df, year=2014, month=month, day_start=1, day_end=30)

# Combine the monthly DataFrames into one
all_months_data = pd.concat(partikulasi_polusi_changping_suhu_bulanan_2014.values())

# Ensure the months are ordered correctly
all_months_data['month'] = pd.Categorical(all_months_data['month'], categories=list(range(1, 13)), ordered=True)

# Compute the average temperature per month
# (observed=False keeps every declared month category in the result)
avg_temp_per_month = all_months_data.groupby('month', observed=False)['avg_TEMP'].mean().reset_index()

# Define a distinct color for each month
month_colors = {
    1: 'red', 2: 'blue', 3: 'green', 4: 'orange', 5: 'purple', 6: 'cyan',
    7: 'magenta', 8: 'yellow', 9: 'brown', 10: 'pink', 11: 'gray', 12: 'olive'
}

# Create the bar chart with Plotly
fig = go.Figure()

# Add the average temperature per month
fig.add_trace(go.Bar(
    x=avg_temp_per_month['month'],
    y=avg_temp_per_month['avg_TEMP'],
    marker_color=[month_colors[month] for month in avg_temp_per_month['month']],
    hovertemplate='<b>Month %{x}</b><br>Average Temperature: %{y:.2f}°C<extra></extra>'
))

# Layout and labels
fig.update_layout(
    title='Average Monthly Temperature in 2014 (Changping)',
    xaxis=dict(
        title='Month',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    ),
    yaxis=dict(title='Average Temperature (°C)'),
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
    )
)

# Show the figure
fig.show()
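
The temperature charts can be complemented with a single number that summarizes how temperature and particulate levels move together. The sketch below computes a Pearson correlation and assumes the raw station DataFrame exposes `TEMP` and `PM2.5` columns. The helper name `temp_pm25_correlation` and the synthetic `sample_df` are hypothetical, and this is not the notebook's `korelasi_suhu` function.

```python
import pandas as pd


def temp_pm25_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between TEMP and PM2.5, ignoring rows with missing values."""
    paired = df[['TEMP', 'PM2.5']].dropna()
    return paired['TEMP'].corr(paired['PM2.5'])


# Synthetic readings (hypothetical): PM2.5 falls linearly as temperature rises
sample_df = pd.DataFrame({
    'TEMP':  [-5.0, 0.0, 5.0, 10.0, None],
    'PM2.5': [120.0, 100.0, 80.0, 60.0, 40.0],
})

r = temp_pm25_correlation(sample_df)
print(f'Pearson r between TEMP and PM2.5: {r:.2f}')
```

A value near -1 or +1 indicates a strong linear relationship, while a value near 0 indicates little linear association. Correlation alone does not establish that temperature drives pollution.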


### 8.1 Question 2:

## 9 Further Analysis (Optional)

## 10 Conclusion