# 01 — Data Loading & Scope
This notebook walks through the processing of carbon emissions data.
The steps are:
* Load the carbon emissions data from its source (a CSV file)
* Get an overview of the data (number of rows and columns, sample records, and data types)
* Filter the data to ASEAN countries and the years 2000–2024

The goal of this section is to confirm that the dataset has loaded correctly before moving on to further analysis.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import scipy as sp
import math


## Data Collection
The carbon emissions dataset is read from a CSV file hosted on GitHub.
Once the data is loaded into a DataFrame, the first few rows are displayed to confirm that it was read correctly.


In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv", delimiter=',')
df.head()


Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
0,Afghanistan,1750,AFG,2802560.0,,0.0,0.0,,,,...,,,,,,,,,,
1,Afghanistan,1751,AFG,,,0.0,,,,,...,,,,,,,,,,
2,Afghanistan,1752,AFG,,,0.0,,,,,...,,,,,,,,,,
3,Afghanistan,1753,AFG,,,0.0,,,,,...,,,,,,,,,,
4,Afghanistan,1754,AFG,,,0.0,,,,,...,,,,,,,,,,


This dataset contains emissions information for many countries, covering the years 1750–2024.

Before filtering, a few key things are checked first:
- the size of the dataset (number of rows and columns)
- which columns are available
- which variables are numeric and which are categorical

This step helps ensure that the filtering and later analysis operate on the correct columns.

In [3]:
df.shape

(50411, 79)

This result shows the initial size of the dataset, 50,411 rows and 79 columns, before filtering by country and year.

The next step is to list the columns available in the dataset.
This helps identify the variables that are relevant to the carbon emissions analysis.

In [4]:
df.columns


Index(['country', 'year', 'iso_code', 'population', 'gdp', 'cement_co2',
       'cement_co2_per_capita', 'co2', 'co2_growth_abs', 'co2_growth_prct',
       'co2_including_luc', 'co2_including_luc_growth_abs',
       'co2_including_luc_growth_prct', 'co2_including_luc_per_capita',
       'co2_including_luc_per_gdp', 'co2_including_luc_per_unit_energy',
       'co2_per_capita', 'co2_per_gdp', 'co2_per_unit_energy', 'coal_co2',
       'coal_co2_per_capita', 'consumption_co2', 'consumption_co2_per_capita',
       'consumption_co2_per_gdp', 'cumulative_cement_co2', 'cumulative_co2',
       'cumulative_co2_including_luc', 'cumulative_coal_co2',
       'cumulative_flaring_co2', 'cumulative_gas_co2', 'cumulative_luc_co2',
       'cumulative_oil_co2', 'cumulative_other_co2', 'energy_per_capita',
       'energy_per_gdp', 'flaring_co2', 'flaring_co2_per_capita', 'gas_co2',
       'gas_co2_per_capita', 'ghg_excluding_lucf_per_capita', 'ghg_per_capita',
       'land_use_change_co2', 'land_use_chang

The available columns cover country, year, and a range of metrics for carbon emissions and other greenhouse gases.

## Data Understanding
To understand the structure of the data, the columns are split into:
- numeric (holding quantitative values)
- categorical (holding labels or categories)

This split helps determine which analysis and visualization approaches are appropriate.

In [5]:
# Split columns by dtype using the public select_dtypes API
num_cols = df.select_dtypes(include="number").columns
cat_cols = [col for col in df.columns if col not in num_cols]
print("Numerical columns: ", num_cols)
print("Categorical columns: ", cat_cols)


Numerical columns:  Index(['year', 'population', 'gdp', 'cement_co2', 'cement_co2_per_capita',
       'co2', 'co2_growth_abs', 'co2_growth_prct', 'co2_including_luc',
       'co2_including_luc_growth_abs', 'co2_including_luc_growth_prct',
       'co2_including_luc_per_capita', 'co2_including_luc_per_gdp',
       'co2_including_luc_per_unit_energy', 'co2_per_capita', 'co2_per_gdp',
       'co2_per_unit_energy', 'coal_co2', 'coal_co2_per_capita',
       'consumption_co2', 'consumption_co2_per_capita',
       'consumption_co2_per_gdp', 'cumulative_cement_co2', 'cumulative_co2',
       'cumulative_co2_including_luc', 'cumulative_coal_co2',
       'cumulative_flaring_co2', 'cumulative_gas_co2', 'cumulative_luc_co2',
       'cumulative_oil_co2', 'cumulative_other_co2', 'energy_per_capita',
       'energy_per_gdp', 'flaring_co2', 'flaring_co2_per_capita', 'gas_co2',
       'gas_co2_per_capita', 'ghg_excluding_lucf_per_capita', 'ghg_per_capita',
       'land_use_change_co2', 'land_use_change_c

The numeric columns mostly hold emissions metrics, while the categorical columns hold identifying information such as the country name and ISO code.
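
The numeric/categorical split can also be illustrated on a small made-up frame. This is a minimal sketch using `select_dtypes`; the frame `toy` and its values are illustrative assumptions, not rows from the dataset.

```python
import pandas as pd

# Toy frame mimicking a few columns of the dataset; values are illustrative only.
toy = pd.DataFrame(
    {
        "country": ["Indonesia", "Malaysia"],
        "iso_code": ["IDN", "MYS"],
        "year": [2000, 2001],
        "co2": [1.5, 2.5],
    }
)

# Numeric columns: integer and float dtypes.
num_cols = toy.select_dtypes(include="number").columns.tolist()
# Everything else is treated as categorical (here: string labels).
cat_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("Numerical columns:", num_cols)
print("Categorical columns:", cat_cols)
```

Both lists keep the original column order, which makes them easier to compare against the output of `df.columns`.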

The next step is to check the data type of each column and whether it contains missing values.
This information determines whether data cleaning will be needed in a later stage.

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50411 entries, 0 to 50410
Data columns (total 79 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   country                                    50411 non-null  object 
 1   year                                       50411 non-null  int64  
 2   iso_code                                   42480 non-null  object 
 3   population                                 41167 non-null  float64
 4   gdp                                        15251 non-null  float64
 5   cement_co2                                 29173 non-null  float64
 6   cement_co2_per_capita                      25648 non-null  float64
 7   co2                                        29384 non-null  float64
 8   co2_growth_abs                             27216 non-null  float64
 9   co2_growth_prct                            26239 non-null  float64
 10  co2_including_luc     

This summary shows each column's data type and non-null count, so potential data issues can be spotted early. For example, `gdp` has only 15,251 non-null values out of 50,411 rows, and `iso_code` is missing in 7,931 rows.
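
The non-null counts from `df.info()` can be turned into an explicit missing-value table. The sketch below is one possible way to do that; the helper name `missing_summary` and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing values; not taken from the dataset.
toy = pd.DataFrame(
    {
        "country": ["Brunei", "Brunei", "Vietnam", "Vietnam"],
        "gdp": [np.nan, np.nan, 1.0, np.nan],
        "co2": [5.0, 6.0, np.nan, 7.0],
    }
)


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the missing-value count and share per column, most incomplete first."""
    summary = pd.DataFrame(
        {
            "n_missing": frame.isna().sum(),
            "share_missing": frame.isna().mean(),
        }
    )
    return summary.sort_values("share_missing", ascending=False)


summary = missing_summary(toy)
print(summary)
```

Sorting by the share of missing values puts the least complete columns at the top, which is usually where cleaning decisions start.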

**Interpretation**

With the structure of the data understood, filtering can begin.
Filtering is done in stages so that each change to the dataset can be tracked clearly.
The first stage restricts the data to ASEAN countries only.

### Filter Data
The emissions data is filtered down to ASEAN countries only.
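
Because `isin` silently ignores labels that do not occur in the column, it can be worth checking the country names before relying on a name-based filter. The following is a minimal sketch on made-up data; the frame `toy` and the list `wanted` (including the deliberately wrong label "Burma") are illustrative assumptions.

```python
import pandas as pd

# Toy data; "World" stands in for a non-country aggregate row.
toy = pd.DataFrame(
    {
        "country": ["Brunei", "Vietnam", "World", "Laos"],
        "year": [2000, 2000, 2000, 2000],
    }
)

# "Burma" is a deliberately mismatched label to show what the check catches.
wanted = ["Brunei", "Vietnam", "Laos", "Burma"]

# Names requested by the filter that never appear in the data.
missing_names = sorted(set(wanted) - set(toy["country"].unique()))
print("Names not found in the data:", missing_names)

# The filter itself runs without error even though one name matched nothing.
subset = toy[toy["country"].isin(wanted)]
print(len(subset), "rows kept")
```

An empty `missing_names` list indicates that every requested name matched at least one row.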

In [7]:
asean = ['Indonesia', 'Malaysia', 'Singapore', 'Thailand', 'Philippines', 'Brunei', 'Vietnam', 'Laos', 'Myanmar', 'Cambodia']
df = df[df['country'].isin(asean)]
df


Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
7388,Brunei,1750,BRN,8817.0,,0.000,0.000,,,,...,,,,,,,,,,
7389,Brunei,1751,BRN,,,0.000,,,,,...,,,,,,,,,,
7390,Brunei,1752,BRN,,,0.000,,,,,...,,,,,,,,,,
7391,Brunei,1753,BRN,,,0.000,,,,,...,,,,,,,,,,
7392,Brunei,1754,BRN,,,0.000,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49332,Vietnam,2020,VNM,98079196.0,7.526296e+11,60.066,0.612,362.599,23.287,6.863,...,,0.662,0.004,0.006,0.010,0.0,579.987,412.028,-102.706,-28.325
49333,Vietnam,2021,VNM,98935101.0,7.719120e+11,62.071,0.627,314.197,-48.402,-13.349,...,,0.668,0.004,0.006,0.011,0.0,528.790,365.921,-91.876,-29.241
49334,Vietnam,2022,VNM,99680656.0,8.338039e+11,55.979,0.562,322.653,8.457,2.691,...,,0.673,0.004,0.007,0.011,0.0,539.141,373.927,-94.756,-29.368
49335,Vietnam,2023,VNM,100352189.0,,44.488,0.443,347.399,24.746,7.669,...,,0.679,0.004,0.007,0.011,0.0,564.164,399.984,-99.611,-28.673


The next stage filters the data to the years 2000 through 2024.
This range was chosen to represent carbon emissions in the modern period.

In [8]:
df = df[(df['year'] >= 2000) & (df['year'] <= 2024)]
df


Unnamed: 0,country,year,iso_code,population,gdp,cement_co2,cement_co2_per_capita,co2,co2_growth_abs,co2_growth_prct,...,share_global_other_co2,share_of_temperature_change_from_ghg,temperature_change_from_ch4,temperature_change_from_co2,temperature_change_from_ghg,temperature_change_from_n2o,total_ghg,total_ghg_excluding_lucf,trade_co2,trade_co2_share
7638,Brunei,2000,BRN,326429.0,,0.000,0.000,5.886,-0.092,-1.537,...,,0.017,0.000,0.000,0.000,0.0,9.218,8.123,-2.344,-39.830
7639,Brunei,2001,BRN,333353.0,,0.000,0.000,5.758,-0.128,-2.178,...,,0.017,0.000,0.000,0.000,0.0,9.554,8.428,-2.237,-38.852
7640,Brunei,2002,BRN,340108.0,,0.000,0.000,5.285,-0.473,-8.206,...,,0.017,0.000,0.000,0.000,0.0,8.517,7.424,-1.717,-32.479
7641,Brunei,2003,BRN,346650.0,,0.000,0.000,6.140,0.854,16.162,...,,0.018,0.000,0.000,0.000,0.0,9.600,8.381,-0.516,-8.412
7642,Brunei,2004,BRN,352921.0,,0.000,0.000,5.967,-0.173,-2.817,...,,0.018,0.000,0.000,0.000,0.0,8.826,7.879,-0.508,-8.519
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49332,Vietnam,2020,VNM,98079196.0,7.526296e+11,60.066,0.612,362.599,23.287,6.863,...,,0.662,0.004,0.006,0.010,0.0,579.987,412.028,-102.706,-28.325
49333,Vietnam,2021,VNM,98935101.0,7.719120e+11,62.071,0.627,314.197,-48.402,-13.349,...,,0.668,0.004,0.006,0.011,0.0,528.790,365.921,-91.876,-29.241
49334,Vietnam,2022,VNM,99680656.0,8.338039e+11,55.979,0.562,322.653,8.457,2.691,...,,0.673,0.004,0.007,0.011,0.0,539.141,373.927,-94.756,-29.368
49335,Vietnam,2023,VNM,100352189.0,,44.488,0.443,347.399,24.746,7.669,...,,0.679,0.004,0.007,0.011,0.0,564.164,399.984,-99.611,-28.673


The dataset now contains only carbon emissions data for ASEAN countries in the period 2000–2024.
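
The staged filtering can be sketched with a row count printed after each stage, so the effect of every filter is visible. This is an illustrative sketch on made-up data; the helper `log_step` and the toy frame are assumptions, and `Series.between` is used here as an inclusive alternative to the two comparisons in the cell above.

```python
import pandas as pd

# Toy data with years on both sides of the target range.
toy = pd.DataFrame(
    {
        "country": ["Brunei", "Brunei", "Vietnam", "Vietnam", "World", "World"],
        "year": [1999, 2000, 2024, 2025, 2000, 2010],
    }
)


def log_step(label: str, frame: pd.DataFrame) -> pd.DataFrame:
    """Print the row count for a stage and pass the frame through unchanged."""
    print(f"{label}: {len(frame)} rows")
    return frame


countries = ["Brunei", "Vietnam"]

raw = log_step("raw", toy)
by_country = log_step("after country filter", raw[raw["country"].isin(countries)])
# between() includes both endpoints by default; copy() detaches the result from the source frame.
by_year = log_step(
    "after year filter",
    by_country[by_country["year"].between(2000, 2024)].copy(),
)
```

Keeping each stage in its own variable, rather than overwriting `df`, also makes it possible to re-run a single stage without reloading the CSV.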

In [9]:
df.shape


(250, 79)

The result shows that the dataset has shrunk to 250 rows and 79 columns, which is consistent with 10 countries × 25 years (2000–2024, inclusive).
The dataset is now focused and ready for exploratory analysis and insight discovery in the next stage.
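
A row count alone does not prove that every country has every year. A completeness check along the following lines could confirm that; it is a sketch on a made-up panel of two countries and three years, and all names in it are illustrative assumptions.

```python
import pandas as pd

countries = ["Brunei", "Vietnam"]
years = range(2000, 2003)  # 2000, 2001, 2002

# Toy panel with exactly one row per country and year.
toy = pd.DataFrame(
    [(country, year) for country in countries for year in years],
    columns=["country", "year"],
)

rows_per_country = toy.groupby("country").size()
expected_rows = len(countries) * len(years)

# Complete means: the total matches and every country has one row per year.
is_complete = bool(
    len(toy) == expected_rows and (rows_per_country == len(years)).all()
)
# Duplicated country-year pairs could hide a gap behind a matching total.
has_duplicates = bool(toy.duplicated(subset=["country", "year"]).any())

print(rows_per_country)
print("complete:", is_complete, "| duplicates:", has_duplicates)
```

Applied to the filtered dataset, the same logic would use the ten ASEAN names and `range(2000, 2025)`.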