# Data Merger

In [35]:
import numpy as np
import pandas as pd

In [4]:
train_data = pd.read_csv('california_housing_train.csv')
test_data = pd.read_csv('california_housing_test.csv')

In [5]:
print("train data shape:", train_data.shape)
print("test data shape:", test_data.shape)

train data shape: (17000, 9)
test data shape: (3000, 9)


In [6]:
train_data.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.82,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.925,65500.0


In [7]:
test_data.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-122.05,37.37,27.0,3885.0,661.0,1537.0,606.0,6.6085,344700.0
1,-118.3,34.26,43.0,1510.0,310.0,809.0,277.0,3.599,176500.0
2,-117.81,33.78,27.0,3589.0,507.0,1484.0,495.0,5.7934,270500.0
3,-118.36,33.82,28.0,67.0,15.0,49.0,11.0,6.1359,330000.0
4,-119.67,36.33,19.0,1241.0,244.0,850.0,237.0,2.9375,81700.0


In [None]:
# DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0
# full_data = train_data.append(test_data)
full_data = pd.concat([train_data, test_data], ignore_index=True)

In [10]:
full_data.shape

(20000, 9)

In [11]:
full_data.head()

,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
0,-114.31,34.19,15.0,5612.0,1283.0,1015.0,472.0,1.4936,66900.0
1,-114.47,34.4,19.0,7650.0,1901.0,1129.0,463.0,1.82,80100.0
2,-114.56,33.69,17.0,720.0,174.0,333.0,117.0,1.6509,85700.0
3,-114.57,33.64,14.0,1501.0,337.0,515.0,226.0,3.1917,73400.0
4,-114.57,33.57,20.0,1454.0,326.0,624.0,262.0,1.925,65500.0


# Concat

`concat` is a **pandas function** for **combining** several DataFrames or Series into one. It is essentially a more flexible replacement for the `.append()` method that used to be common.

---

### The `pd.concat` function

* **General syntax:**

```python
pd.concat(objs, axis=0, ignore_index=False, ...)
```

* `objs`: a list or dict of pandas objects (usually DataFrames or Series) to combine.
* `axis`: the direction of concatenation.

  * `axis=0` → concatenate rows (the default): each DataFrame is placed below the previous one.
  * `axis=1` → concatenate columns: each DataFrame is placed to the right of the previous one.
* `ignore_index=True` → discard the original index labels and build a new index numbered from 0.
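
A minimal sketch of what `ignore_index` changes, using two throwaway DataFrames (the names `df1` and `df2` are only illustrative):

```python
import pandas as pd

# Two small DataFrames, both with the default index 0, 1
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3, 4]})

# Default ignore_index=False: the original labels are kept, so they repeat
kept = pd.concat([df1, df2])
print(kept.index.tolist())        # [0, 1, 0, 1]

# ignore_index=True: the old labels are dropped and rows are renumbered from 0
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

With repeated labels, `kept.loc[0]` returns two rows rather than one, which is a common source of surprises.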

---

### Simple example:

Suppose you have two DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

print(df1)
#    A  B
# 0  1  3
# 1  2  4

print(df2)
#    A  B
# 0  5  7
# 1  6  8
```

If you combine them with `concat`:

```python
full_df = pd.concat([df1, df2], ignore_index=True)
print(full_df)
```

Result:

```
   A  B
0  1  3
1  2  4
2  5  7
3  6  8
```

---

### Why use `concat`?

* It is more flexible than `.append()`.
* It can combine many DataFrames in a single call.
* It can concatenate vertically (`axis=0`) or horizontally (`axis=1`).
* It is the recommended approach in current pandas: `DataFrame.append()` was removed in pandas 2.0.
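
A short sketch of combining a whole list of DataFrames in one call, and of the usual replacement for the removed `.append()`. The `parts` list and its `month`/`sales` columns are hypothetical example data:

```python
import pandas as pd

# Hypothetical pieces of one table, e.g. one DataFrame per month
parts = [
    pd.DataFrame({"month": ["Jan"], "sales": [100]}),
    pd.DataFrame({"month": ["Feb"], "sales": [120]}),
    pd.DataFrame({"month": ["Mar"], "sales": [90]}),
]

# A single concat call combines the whole list
combined = pd.concat(parts, ignore_index=True)

# Adding one row: the old combined.append(new_row, ignore_index=True) no longer
# exists in pandas 2.x, so wrap the row in a one-row DataFrame and concat it
new_row = {"month": "Apr", "sales": 130}
combined = pd.concat([combined, pd.DataFrame([new_row])], ignore_index=True)
print(combined)
```

Collecting the pieces in a list and calling `pd.concat` once is generally preferable to growing a DataFrame row by row, because every concat call copies the data into a new object.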


In [12]:
# data berbentuk dict

data1 = {"country" : ["JPY", "CHN", "IND"],
         "year" : [2019, 2019, 2020]}

data2 = {"country" : ["JPY", "CHN", "KOR"],
         "city" : ["Tokyo", "Beijing", "Seoul"]}


In [13]:
# jadiin data frame pake pandas
data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

In [14]:
print(data1)
print(data2)

  country  year
0     JPY  2019
1     CHN  2019
2     IND  2020
  country     city
0     JPY    Tokyo
1     CHN  Beijing
2     KOR    Seoul


```python
df_kolom = pd.concat([data1, data2], axis=1)
```

In the line above, `axis=1` means the concatenation is done **horizontally**, i.e. the **columns are combined**.

---

### Explanation:

* `axis=0` → combine rows (data2 is placed *below* data1)
* `axis=1` → combine columns (data2 is placed *to the right of* data1)

So with `axis=1` the result is a new DataFrame whose **columns are the columns of data1 followed by the columns of data2**. Rows are aligned by index label, so with the default 0-based index the result has as many rows as the longer of the two DataFrames.

---

### Example:

```python
import pandas as pd

data1 = pd.DataFrame({'A': [1, 2, 3]})
data2 = pd.DataFrame({'B': [4, 5, 6]})

df_kolom = pd.concat([data1, data2], axis=1)
print(df_kolom)
```

Result:

```
   A  B
0  1  4
1  2  5
2  3  6
```

Column `A` from `data1` and column `B` from `data2` are combined into one DataFrame.

---

If the rows do not line up, the empty cells are filled with `NaN`. For example, if `data1` has 3 rows but `data2` has only 2, the last row of `data2`'s column will be `NaN`.
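
A minimal sketch of that case, assuming both DataFrames keep their default 0-based index:

```python
import pandas as pd

data1 = pd.DataFrame({"A": [1, 2, 3]})  # index labels 0, 1, 2
data2 = pd.DataFrame({"B": [4, 5]})     # index labels 0, 1 only

# axis=1 aligns rows by index label; label 2 exists only in data1,
# so column B gets NaN there (and becomes float to hold the NaN)
df_kolom = pd.concat([data1, data2], axis=1)
print(df_kolom)
```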

---

In short:

* `axis=1` = combine columns (side by side)
* `axis=0` = combine rows (stacked)


In [15]:
# concatenate to the right (column-wise)
df_kolom = pd.concat([data1, data2], axis=1)
df_kolom

,country,year,country,city
0,JPY,2019,JPY,Tokyo
1,CHN,2019,CHN,Beijing
2,IND,2020,KOR,Seoul


In [16]:
# concatenate downward (row-wise); the original index labels 0-2 are kept, so they repeat
df_baris = pd.concat([data1, data2], axis=0)
df_baris

,country,year,city
0,JPY,2019.0,
1,CHN,2019.0,
2,IND,2020.0,
0,JPY,,Tokyo
1,CHN,,Beijing
2,KOR,,Seoul
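
The output above keeps the original index labels (0, 1, 2 appear twice) and fills the missing `year` and `city` cells with NaN. A small sketch, re-creating the same `data1` and `data2`, of two `pd.concat` options that change this: `join="inner"` keeps only the columns both DataFrames share, and `ignore_index=True` renumbers the rows:

```python
import pandas as pd

data1 = pd.DataFrame({"country": ["JPY", "CHN", "IND"], "year": [2019, 2019, 2020]})
data2 = pd.DataFrame({"country": ["JPY", "CHN", "KOR"], "city": ["Tokyo", "Beijing", "Seoul"]})

# join="inner" keeps only the shared column ("country");
# ignore_index=True replaces the repeated labels 0-2 with 0-5
shared_only = pd.concat([data1, data2], axis=0, join="inner", ignore_index=True)
print(shared_only)
```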


# Join

---

### What is `join` in pandas?

`join` is a way to **combine two DataFrames based on a key column**, much like a **JOIN in SQL**.

It combines data from two tables (DataFrames) by matching values in a shared column, instead of simply sticking rows or columns together.

---

### The difference between `concat` and `join`:

* `concat`: combines DataFrames by **position / index label** (stacking rows or placing columns side by side), without looking at the values inside the columns.
* `join`: combines DataFrames by the **values of a specific key column**, like a SQL JOIN (inner, left, right, outer).
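
A small sketch of the difference, using toy data in which the same `id` values appear in a different order in the second table (all names here are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "nama": ["Ani", "Budi", "Cici"]})
right = pd.DataFrame({"id": [3, 1, 2], "nilai": [80, 95, 90]})  # same ids, shuffled

# concat pairs rows by index label (here: by position), ignoring the "id" values,
# so Ani ends up next to the score that belongs to id 3
by_position = pd.concat([left, right], axis=1)

# merge pairs rows by the value of the "id" key
by_key = pd.merge(left, right, on="id")

print(by_position)
print(by_key)
```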

---

### How to join in pandas

There are two ways:

1. The `.join()` method of a DataFrame
2. The `pd.merge()` function

---

#### 1. `.join()` method

```python
df1.join(df2, on=None, how='left', lsuffix='', rsuffix='')
```

* Typically used to join DataFrames on their index (the default).
* To join on other columns, `pd.merge()` is usually the more flexible choice.
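
A minimal sketch of `.join()` on the index; the frames `names`, `scores`, and `scores_col` are hypothetical toy data:

```python
import pandas as pd

# Hypothetical frames keyed by their index
names = pd.DataFrame({"nama": ["Ani", "Budi", "Cici"]}, index=[1, 2, 3])
scores = pd.DataFrame({"nilai": [90, 80, 70]}, index=[2, 3, 4])

# .join() matches index labels and defaults to how="left":
# label 1 gets no score (NaN), label 4 is dropped
joined = names.join(scores)
print(joined)

# When the key is an ordinary column, move it into the index first
scores_col = pd.DataFrame({"id": [2, 3, 4], "nilai": [90, 80, 70]})
joined_col = names.join(scores_col.set_index("id"))
print(joined_col)
```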

---

#### 2. `pd.merge()` (the more common and more powerful option)

```python
pd.merge(df1, df2, on='key_column', how='inner')
```

* `on`: the column(s) used as the join key.
* `how`: the join type (default `'inner'`), one of:

  * `'inner'`: keep only keys present in both DataFrames (intersection)
  * `'left'`: keep all rows from df1, plus the matching rows from df2
  * `'right'`: keep all rows from df2, plus the matching rows from df1
  * `'outer'`: keep all rows from both DataFrames (union)

---

### Example join with `merge`:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'nama': ['Ani', 'Budi', 'Cici']
})

df2 = pd.DataFrame({
    'id': [2, 3, 4],
    'nilai': [90, 80, 70]
})

# Join on the 'id' column
hasil = pd.merge(df1, df2, on='id', how='inner')
print(hasil)
```

Result:

```
   id  nama  nilai
0   2  Budi     90
1   3  Cici     80
```

This is an **inner join**: only rows whose `id` appears in both DataFrames are kept.
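
Reusing the same `df1` and `df2`, a small loop shows which `id` values survive under each of the four `how` options:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "nama": ["Ani", "Budi", "Cici"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "nilai": [90, 80, 70]})

# Record which ids survive under each join type
ids = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(df1, df2, on="id", how=how)
    ids[how] = result["id"].tolist()
    print(how, ids[how])
```

Only `'right'` and `'outer'` can return ids that are missing from `df1`; in those rows the `nama` column is NaN.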

---


In [18]:
population = pd.read_csv("WorldBank_POP.csv")
population.head()

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,"Population, total",SP.POP.TOTL,54922.0,55578.0,56320.0,57002.0,57619.0,58190.0,...,108727.0,108735.0,108908.0,109203.0,108587.0,107700.0,107310.0,107359.0,,
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,130072080.0,133534923.0,137171659.0,140945536.0,144904094.0,149033472.0,...,623369401.0,640058741.0,657801085.0,675950189.0,694446100.0,713090928.0,731821393.0,750503764.0,,
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,9035043.0,9214083.0,9404406.0,9604487.0,9814318.0,10036008.0,...,34700612.0,35688935.0,36743039.0,37856121.0,39068979.0,40000412.0,40578842.0,41454761.0,,
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,97630925.0,99706674.0,101854756.0,104089175.0,106388440.0,108772632.0,...,429454743.0,440882906.0,452195915.0,463365429.0,474569351.0,485920997.0,497387180.0,509398589.0,,
4,Angola,AGO,"Population, total",SP.POP.TOTL,5231654.0,5301583.0,5354310.0,5408320.0,5464187.0,5521981.0,...,29183070.0,30234839.0,31297155.0,32375632.0,33451132.0,34532429.0,35635029.0,36749906.0,,


# What is melt?

The **`melt`** function in pandas converts data from **wide** format to **long** format.

---

### In short:

* **Wide format:** each variable has its own column (for example, a separate column for every year).
* **Long format:** each row is one observation, with one column identifying the variable (e.g. a "year" column) and one column holding its value (e.g. a "value" column).

---

### What `melt` does:

* Some columns are kept as **identifiers** (`id_vars`); these columns are left unchanged.
* The remaining columns you want to "melt" are turned into two columns: one holding the original column name and one holding its value.

---

### Simple example:

Suppose the original DataFrame is:

| Nama | 2019 | 2020 | 2021 |
| ---- | ---- | ---- | ---- |
| Ani  | 10   | 12   | 15   |
| Budi | 20   | 18   | 22   |

If you melt it:

```python
pd.melt(df, id_vars=['Nama'], var_name='Tahun', value_name='Nilai')
```

The result (shown sorted by `Nama`; `pd.melt` itself returns all 2019 rows first, then 2020, then 2021):

| Nama | Tahun | Nilai |
| ---- | ----- | ----- |
| Ani  | 2019  | 10    |
| Ani  | 2020  | 12    |
| Ani  | 2021  | 15    |
| Budi | 2019  | 20    |
| Budi | 2020  | 18    |
| Budi | 2021  | 22    |
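
A runnable version of this example, assuming the year columns are named with strings (`'2019'`, ...) as in the World Bank files loaded later. Because `pd.melt` emits all 2019 rows first, the sketch sorts by `Nama` and `Tahun` to reproduce the row order of the table:

```python
import pandas as pd

df = pd.DataFrame({
    "Nama": ["Ani", "Budi"],
    "2019": [10, 20],
    "2020": [12, 18],
    "2021": [15, 22],
})

# Every column not listed in id_vars is melted into (Tahun, Nilai) pairs
long_df = pd.melt(df, id_vars=["Nama"], var_name="Tahun", value_name="Nilai")

# pd.melt returns the 2019 rows first, then 2020, then 2021;
# sort to get one block of rows per person, as in the table
long_df = long_df.sort_values(["Nama", "Tahun"], ignore_index=True)
print(long_df)
```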

---

### Summary:

`melt` is very useful for **tidying data** so it is easier to analyze, especially when the original data has many columns that are really "the same variable" with different values, such as years, months, or categories.


In [23]:
population = pd.melt(
    population,
    id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],  # columns kept as-is
    value_vars=[str(y) for y in range(1960, 2025)],  # year columns '1960'..'2024'; also leaves out the empty 'Unnamed: 69'
    var_name='Year',  # name of the new column holding the year
    value_name='Value'  # name of the new column holding the indicator value
)

In [24]:
population.head()

,Country Name,Country Code,Indicator Name,Indicator Code,Year,Value
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0
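
After melting, `Year` holds the original column labels, which are strings such as `'1960'`. A small sketch on a two-country excerpt in the same wide layout (values copied from the outputs above, not loaded from the real file) showing the dtype and one way to convert it:

```python
import pandas as pd

# Tiny excerpt in the same wide layout as the World Bank file
wide = pd.DataFrame({
    "Country Code": ["ABW", "AFG"],
    "1960": [54922.0, 9035043.0],
    "1961": [55578.0, 9214083.0],
})

long_df = pd.melt(wide, id_vars=["Country Code"], var_name="Year", value_name="Value")
print(long_df["Year"].tolist())  # ['1960', '1960', '1961', '1961']

# Convert the year labels to integers so numeric filters behave as expected
long_df["Year"] = long_df["Year"].astype("int64")
print(long_df[long_df["Year"] >= 1961])
```

If you convert `Year` this way, do it in both `population` and `gdp` before merging: pandas refuses to merge an `int64` key column with an `object` (string) one.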


In [20]:
gdp = pd.read_csv("WorldBank_GDP.csv")
gdp.head()

,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,...,2983635000.0,3092429000.0,3276184000.0,3395799000.0,2481857000.0,2929447000.0,3279344000.0,3648573000.0,,
1,Africa Eastern and Southern,AFE,GDP (current US$),NY.GDP.MKTP.CD,24210630000.0,24963980000.0,27078800000.0,31775750000.0,30285790000.0,33813170000.0,...,828942800000.0,972998900000.0,1012306000000.0,1009721000000.0,933391800000.0,1085745000000.0,1191423000000.0,1245472000000.0,,
2,Afghanistan,AFG,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,...,18116570000.0,18753460000.0,18053220000.0,18799440000.0,19955930000.0,14260000000.0,14497240000.0,17233050000.0,,
3,Africa Western and Central,AFW,GDP (current US$),NY.GDP.MKTP.CD,11904950000.0,12707880000.0,13630760000.0,14469090000.0,15803760000.0,16921090000.0,...,694361000000.0,687849200000.0,770495000000.0,826483800000.0,789801700000.0,849312400000.0,883973900000.0,799106000000.0,,
4,Angola,AGO,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,...,52761620000.0,73690150000.0,79450690000.0,70897960000.0,48501560000.0,66505130000.0,104399700000.0,84824650000.0,,


In [25]:
gdp = pd.melt(
    gdp,
    id_vars=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'],  # columns kept as-is
    value_vars=[str(y) for y in range(1960, 2025)],  # year columns '1960'..'2024'; also leaves out the empty 'Unnamed: 69'
    var_name='Year',  # name of the new column holding the year
    value_name='Value'  # name of the new column holding the indicator value
)

In [26]:
gdp.head()

,Country Name,Country Code,Indicator Name,Indicator Code,Year,Value
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,1960,
1,Africa Eastern and Southern,AFE,GDP (current US$),NY.GDP.MKTP.CD,1960,24210630000.0
2,Afghanistan,AFG,GDP (current US$),NY.GDP.MKTP.CD,1960,
3,Africa Western and Central,AFW,GDP (current US$),NY.GDP.MKTP.CD,1960,11904950000.0
4,Angola,AGO,GDP (current US$),NY.GDP.MKTP.CD,1960,


# Outer join

`how="outer"` in `merge()` means you are performing an **outer join** (also called a full outer join).

---

### What is an outer join?

* An outer join keeps **every row from both DataFrames** (from both `population` and `gdp`).
* When a key **matches** in both tables, the rows are combined into one.
* When a row exists in only one of the tables (it has no partner in the other), it is still included, but the columns coming from the other table are filled with **NaN**.

---

### Quick illustration:

```
DataFrame A:       DataFrame B:

Key  Value         Key  Value

1     10           2     20
2     15           3     30
3     20           4     40

Outer join hasil:

Key  Value_A  Value_B
1     10      NaN
2     15      20
3     20      30
4     NaN     40
```
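
The same illustration in code, as a sketch with toy frames `a` and `b`. `suffixes` gives the two clashing `Value` columns readable names instead of the default `_x`/`_y`, and `indicator=True` adds a `_merge` column recording where each row came from:

```python
import pandas as pd

a = pd.DataFrame({"Key": [1, 2, 3], "Value": [10, 15, 20]})
b = pd.DataFrame({"Key": [2, 3, 4], "Value": [20, 30, 40]})

# suffixes renames the two clashing "Value" columns;
# indicator=True adds a "_merge" column: left_only, right_only, or both
outer = a.merge(b, on="Key", how="outer", suffixes=("_A", "_B"), indicator=True)
print(outer)
```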

---

### So:

* `how="outer"` → **full outer join**: every row from both tables appears, with NaN where there is no match.
* Only the rows that match in both tables → `how="inner"`
* All rows from the left table → `how="left"`
* All rows from the right table → `how="right"`

---


In [27]:
# outer join on the composite key Country Code + Country Name + Year
world_bank_outer = population.merge(gdp,
                                    on=["Country Code", "Country Name", "Year"],
                                    how="outer")

In [28]:
world_bank_outer

,Country Name,Country Code,Indicator Name_x,Indicator Code_x,Year,Value_x,Indicator Name_y,Indicator Code_y,Value_y
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0,GDP (current US$),NY.GDP.MKTP.CD,
1,Aruba,ABW,"Population, total",SP.POP.TOTL,1961,55578.0,GDP (current US$),NY.GDP.MKTP.CD,
2,Aruba,ABW,"Population, total",SP.POP.TOTL,1962,56320.0,GDP (current US$),NY.GDP.MKTP.CD,
3,Aruba,ABW,"Population, total",SP.POP.TOTL,1963,57002.0,GDP (current US$),NY.GDP.MKTP.CD,
4,Aruba,ABW,"Population, total",SP.POP.TOTL,1964,57619.0,GDP (current US$),NY.GDP.MKTP.CD,
...,...,...,...,...,...,...,...,...,...
17285,Zimbabwe,ZWE,"Population, total",SP.POP.TOTL,2020,15526888.0,GDP (current US$),NY.GDP.MKTP.CD,2.686794e+10
17286,Zimbabwe,ZWE,"Population, total",SP.POP.TOTL,2021,15797210.0,GDP (current US$),NY.GDP.MKTP.CD,2.724052e+10
17287,Zimbabwe,ZWE,"Population, total",SP.POP.TOTL,2022,16069056.0,GDP (current US$),NY.GDP.MKTP.CD,3.278975e+10
17288,Zimbabwe,ZWE,"Population, total",SP.POP.TOTL,2023,16340822.0,GDP (current US$),NY.GDP.MKTP.CD,3.523137e+10


# Inner join

```python
world_bank_inner = population.merge(gdp,
                                    on=["Country Code", "Country Name", "Year"],
                                    how="inner")
```

With this code, pandas performs an **inner join**:

* Only rows whose values in `Country Code`, `Country Name`, and `Year` **match exactly** in **both DataFrames** (`population` and `gdp`) appear in the result.
* Rows that exist in only one DataFrame, with no partner in the other, are **dropped** from the result.

---

### In short:

* **Inner join** = the intersection: only data that matches in both tables.
* If each key is unique within each table, the result has at most as many rows as the smaller table; duplicated keys can multiply rows.
* It is typically used when you only want records that are complete and valid in both tables.

---

### Simple illustration:

```
Tabel A:      Tabel B:

Key  Val     Key  Val

1    10      2    20
2    15      3    30
3    20      4    40

Inner join:

Key  Val_A  Val_B
2    15     20
3    20     30
```
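
A sketch of the same inner join, plus one caveat: when a key is duplicated in both tables, an inner join pairs every copy with every copy, so the result can have more rows than either input. `validate="one_to_one"` makes pandas raise `pandas.errors.MergeError` in that situation. The frames `a`, `b`, `a_dup`, and `b_dup` are toy data:

```python
import pandas as pd

a = pd.DataFrame({"Key": [1, 2, 3], "Val": [10, 15, 20]})
b = pd.DataFrame({"Key": [2, 3, 4], "Val": [20, 30, 40]})

# validate="one_to_one" checks that each key appears at most once on each side
inner = a.merge(b, on="Key", how="inner", suffixes=("_A", "_B"), validate="one_to_one")
print(inner)

# Duplicated keys: key 2 appears twice on each side, so it yields 2 x 2 = 4 rows
a_dup = pd.DataFrame({"Key": [2, 2, 3], "Val": [15, 16, 20]})
b_dup = pd.DataFrame({"Key": [2, 2, 3], "Val": [20, 21, 30]})
grown = a_dup.merge(b_dup, on="Key", how="inner")
print(len(grown))  # 5, more than the 3 rows of either input

# The same validate check catches this instead of silently inflating the data
raised = False
try:
    a_dup.merge(b_dup, on="Key", validate="one_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```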

---


In [29]:
# inner join: keep only the key combinations present in both tables
world_bank_inner = population.merge(gdp,
                                    on=["Country Code", "Country Name", "Year"],
                                    how="inner")
world_bank_inner

,Country Name,Country Code,Indicator Name_x,Indicator Code_x,Year,Value_x,Indicator Name_y,Indicator Code_y,Value_y
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0,GDP (current US$),NY.GDP.MKTP.CD,
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0,GDP (current US$),NY.GDP.MKTP.CD,2.421063e+10
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0,GDP (current US$),NY.GDP.MKTP.CD,
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0,GDP (current US$),NY.GDP.MKTP.CD,1.190495e+10
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0,GDP (current US$),NY.GDP.MKTP.CD,
...,...,...,...,...,...,...,...,...,...
17285,Kosovo,XKX,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17286,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17287,South Africa,ZAF,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17288,Zambia,ZMB,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,


# Left join

If you use:

```python
world_bank_left = population.merge(gdp,
                                  on=["Country Code", "Country Name", "Year"],
                                  how="left")
```

This is a **left join**.

---

### What is a left join?

* Every row from the **left DataFrame** (`population`) appears in the result.
* Rows from the right DataFrame (`gdp`) whose key **matches** a row in the left DataFrame are merged in.
* If the right DataFrame has no matching row, the columns from the right DataFrame are filled with **NaN**.

---

### Simple illustration:

```
Left table (population):    Right table (gdp):

Key  Val                   Key  Val

1    10                    2    20
2    15                    3    30
3    20                    4    40

Left join hasil:

Key  Val_left  Val_right
1    10        NaN
2    15        20
3    20        30
```
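
A sketch of this left join on toy tables, extended with `indicator=True` to list the left rows that found no partner on the right (sometimes called an anti-join). The frame names are illustrative:

```python
import pandas as pd

left = pd.DataFrame({"Key": [1, 2, 3], "Val": [10, 15, 20]})   # plays the role of population
right = pd.DataFrame({"Key": [2, 3, 4], "Val": [20, 30, 40]})  # plays the role of gdp

merged = left.merge(right, on="Key", how="left",
                    suffixes=("_left", "_right"), indicator=True)
print(merged)

# Left rows that found no partner on the right (an "anti-join")
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["Key"].tolist())  # [1]
```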

---

### In short:

* The data you mainly care about comes from the left DataFrame (`population`).
* Data from the right DataFrame (`gdp`) is added only where there is a match.

---


In [30]:
# left join: keep every row of the left table and only the matching rows of the right table
world_bank_left = population.merge(gdp,
                                   on=["Country Code", "Country Name", "Year"],
                                   how="left")
world_bank_left

,Country Name,Country Code,Indicator Name_x,Indicator Code_x,Year,Value_x,Indicator Name_y,Indicator Code_y,Value_y
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0,GDP (current US$),NY.GDP.MKTP.CD,
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0,GDP (current US$),NY.GDP.MKTP.CD,2.421063e+10
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0,GDP (current US$),NY.GDP.MKTP.CD,
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0,GDP (current US$),NY.GDP.MKTP.CD,1.190495e+10
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0,GDP (current US$),NY.GDP.MKTP.CD,
...,...,...,...,...,...,...,...,...,...
17285,Kosovo,XKX,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17286,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17287,South Africa,ZAF,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17288,Zambia,ZMB,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,


# Right join

If you use:

```python
world_bank_right = population.merge(gdp,
                                   on=["Country Code", "Country Name", "Year"],
                                   how="right")
```

This is a **right join**.

---

### What is a right join?

* Every row from the **right DataFrame** (`gdp`) appears in the result.
* Rows from the left DataFrame (`population`) whose key **matches** a row in the right DataFrame are merged in.
* If the left DataFrame has no matching row, the columns from the left DataFrame are filled with **NaN**.

---

### Simple illustration:

```
Left table (population):    Right table (gdp):

Key  Val                   Key  Val

1    10                    2    20
2    15                    3    30
3    20                    4    40

Right join hasil:

Key  Val_left  Val_right
2    15        20
3    20        30
4    NaN       40
```
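
A quick check, on toy frames named after the two World Bank tables, that a right join gives the same rows as a left join with the tables swapped; only the column order differs. `pd.testing.assert_frame_equal` raises an `AssertionError` if the two frames differ:

```python
import pandas as pd

# Toy stand-ins for the two World Bank tables
pop_toy = pd.DataFrame({"Key": [1, 2, 3], "pop": [10, 15, 20]})
gdp_toy = pd.DataFrame({"Key": [2, 3, 4], "gdp": [20, 30, 40]})

right = pop_toy.merge(gdp_toy, on="Key", how="right")

# Left join with the tables swapped, then reorder the columns to match
swapped = gdp_toy.merge(pop_toy, on="Key", how="left")[["Key", "pop", "gdp"]]

# Raises AssertionError if values, dtypes, or row order differ
pd.testing.assert_frame_equal(right, swapped)
print(right)
```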

---

### In short:

* The main data comes from the right DataFrame (`gdp`).
* Data from the left DataFrame (`population`) is added only where there is a match.

---

In [31]:
# a right join is the mirror image of a left join
world_bank_right = population.merge(gdp,
                                    on=["Country Code", "Country Name", "Year"],
                                    how="right")
world_bank_right

,Country Name,Country Code,Indicator Name_x,Indicator Code_x,Year,Value_x,Indicator Name_y,Indicator Code_y,Value_y
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0,GDP (current US$),NY.GDP.MKTP.CD,
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0,GDP (current US$),NY.GDP.MKTP.CD,2.421063e+10
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0,GDP (current US$),NY.GDP.MKTP.CD,
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0,GDP (current US$),NY.GDP.MKTP.CD,1.190495e+10
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0,GDP (current US$),NY.GDP.MKTP.CD,
...,...,...,...,...,...,...,...,...,...
17285,Kosovo,XKX,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17286,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17287,South Africa,ZAF,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,
17288,Zambia,ZMB,"Population, total",SP.POP.TOTL,2024,,GDP (current US$),NY.GDP.MKTP.CD,


In [34]:
# merge defaults to an inner join when `how` is not given;
# only Country Code and Year are keys here, so Country Name appears twice (_x and _y)
world_bank_multiple = population.merge(gdp, on=["Country Code", "Year"])
world_bank_multiple

,Country Name_x,Country Code,Indicator Name_x,Indicator Code_x,Year,Value_x,Country Name_y,Indicator Name_y,Indicator Code_y,Value_y
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0,Aruba,GDP (current US$),NY.GDP.MKTP.CD,
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0,Africa Eastern and Southern,GDP (current US$),NY.GDP.MKTP.CD,2.421063e+10
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0,Afghanistan,GDP (current US$),NY.GDP.MKTP.CD,
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0,Africa Western and Central,GDP (current US$),NY.GDP.MKTP.CD,1.190495e+10
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0,Angola,GDP (current US$),NY.GDP.MKTP.CD,
...,...,...,...,...,...,...,...,...,...,...
17285,Kosovo,XKX,"Population, total",SP.POP.TOTL,2024,,Kosovo,GDP (current US$),NY.GDP.MKTP.CD,
17286,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,2024,,"Yemen, Rep.",GDP (current US$),NY.GDP.MKTP.CD,
17287,South Africa,ZAF,"Population, total",SP.POP.TOTL,2024,,South Africa,GDP (current US$),NY.GDP.MKTP.CD,
17288,Zambia,ZMB,"Population, total",SP.POP.TOTL,2024,,Zambia,GDP (current US$),NY.GDP.MKTP.CD,
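
The output above shows `Country Name_x` and `Country Name_y` because `Country Name` is not one of the merge keys: any non-key column present in both tables gets the `_x`/`_y` suffixes. A one-row sketch using values taken from the outputs above (`pop` and `gdp` here are toy excerpts, not the full tables):

```python
import pandas as pd

# One-row excerpts of the melted tables
pop = pd.DataFrame({"Country Code": ["AFE"], "Country Name": ["Africa Eastern and Southern"],
                    "Year": ["1960"], "Value": [130072080.0]})
gdp = pd.DataFrame({"Country Code": ["AFE"], "Country Name": ["Africa Eastern and Southern"],
                    "Year": ["1960"], "Value": [24210630000.0]})

# "Country Name" is not a key, so both copies are kept with _x/_y suffixes
two_keys = pop.merge(gdp, on=["Country Code", "Year"])
print(two_keys.columns.tolist())
# ['Country Code', 'Country Name_x', 'Year', 'Value_x', 'Country Name_y', 'Value_y']

# Listing it as a key as well leaves a single "Country Name" column
three_keys = pop.merge(gdp, on=["Country Code", "Country Name", "Year"])
print(three_keys.columns.tolist())
# ['Country Code', 'Country Name', 'Year', 'Value_x', 'Value_y']
```

Naming every shared identifier as a key (or dropping the duplicate afterwards) keeps the merged table free of redundant `_x`/`_y` columns.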


In [33]:
population

,Country Name,Country Code,Indicator Name,Indicator Code,Year,Value
0,Aruba,ABW,"Population, total",SP.POP.TOTL,1960,54922.0
1,Africa Eastern and Southern,AFE,"Population, total",SP.POP.TOTL,1960,130072080.0
2,Afghanistan,AFG,"Population, total",SP.POP.TOTL,1960,9035043.0
3,Africa Western and Central,AFW,"Population, total",SP.POP.TOTL,1960,97630925.0
4,Angola,AGO,"Population, total",SP.POP.TOTL,1960,5231654.0
...,...,...,...,...,...,...
17285,Kosovo,XKX,"Population, total",SP.POP.TOTL,2024,
17286,"Yemen, Rep.",YEM,"Population, total",SP.POP.TOTL,2024,
17287,South Africa,ZAF,"Population, total",SP.POP.TOTL,2024,
17288,Zambia,ZMB,"Population, total",SP.POP.TOTL,2024,
