**Aggregate data** is **data that has been summarized or combined** from raw data to give **simpler, more meaningful information**.

---

### 📊 Simple example:

Suppose you have raw data like this (`Kota` is the city, `Penjualan` is the sales figure):

| Kota    | Penjualan |
| ------- | --------- |
| Jakarta | 100       |
| Bandung | 80        |
| Jakarta | 120       |
| Bandung | 90        |

To find the **total sales per city**, you **aggregate** by city: group the rows by `Kota` and sum `Penjualan` within each group.

The result:

| Kota    | Total Penjualan |
| ------- | --------------- |
| Jakarta | 220             |
| Bandung | 170             |

---

### 🔧 Example aggregation operations:

Aggregation usually involves functions such as:

* `sum()` → adds the values up
* `mean()` → computes the average
* `count()` → counts the non-missing values
* `max()` → largest value
* `min()` → smallest value
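
As a minimal sketch, the five functions listed above can be applied to the `Penjualan` column of the sales table from the simple example. The variable names are illustrative only.

```python
import pandas as pd

# Same raw data as the sales table in the simple example.
data = pd.DataFrame({
    "Kota": ["Jakarta", "Bandung", "Jakarta", "Bandung"],
    "Penjualan": [100, 80, 120, 90],
})

penjualan = data["Penjualan"]

total = penjualan.sum()        # 100 + 80 + 120 + 90
rata_rata = penjualan.mean()   # total divided by the number of values
jumlah = penjualan.count()     # number of non-missing values
tertinggi = penjualan.max()    # largest value
terendah = penjualan.min()     # smallest value

print(total, rata_rata, jumlah, tertinggi, terendah)
```

Each call reduces the whole column to a single number; combined with `groupby()`, the same reduction is done once per group.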

---

### 🧠 When is it used?

* When you want to **analyze trends** rather than inspect every row of raw data.
* For example:

  * Average student score per class
  * Total revenue per month
  * Number of products sold per category

---

### 🔍 Example in pandas:

```python
import pandas as pd

data = pd.DataFrame({
    "Kota": ["Jakarta", "Bandung", "Jakarta", "Bandung"],
    "Penjualan": [100, 80, 120, 90]
})

# Aggregation: total sales per city
hasil = data.groupby("Kota")["Penjualan"].sum()
print(hasil)
```

**Output:**

```
Kota
Bandung    170
Jakarta    220
Name: Penjualan, dtype: int64
```

---


In [1]:
import pandas as pd
import numpy as np

In [2]:
house_data = pd.read_csv("House_Rent_Dataset.csv")

In [3]:
house_data.head()

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,Super Area,Dumdum Park,Kolkata,Unfurnished,Bachelors/Family,1,Contact Owner
4,2022-05-09,2,7500,850,1 out of 2,Carpet Area,South Dum Dum,Kolkata,Unfurnished,Bachelors,1,Contact Owner


# Area Type

In [4]:
house_data["Area Type"].unique()

array(['Super Area', 'Carpet Area', 'Built Area'], dtype=object)

The `groupby()` function in **pandas** **groups data** by one or more columns so that you can run an **aggregation** (such as `sum()`, `mean()`, or `count()`) on each group.

---

### 🔧 **Basic usage**

```python
df.groupby("kolom")
```

This groups the rows of `df` by the unique values in that column. The call itself returns a GroupBy object; results are produced only when you apply an aggregation to it.
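
A small sketch of what the grouping call returns, using a made-up DataFrame (the columns `kolom` and `nilai` are placeholders for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "kolom": ["a", "b", "a", "c"],
    "nilai": [1, 2, 3, 4],
})

grouped = df.groupby("kolom")

# The result is a GroupBy object, not a DataFrame of results.
print(type(grouped).__name__)

# Number of distinct groups found in "kolom".
print(grouped.ngroups)

# Numbers appear only once an aggregation is applied.
jumlah_per_grup = grouped["nilai"].sum()
print(jumlah_per_grup)
```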

---

### 📊 **Illustration:**

Suppose you have a DataFrame like this (`Tipe Area` is the area type, `Sewa` is the rent):

```python
import pandas as pd

data = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Suburban", "Rural"],
    "Sewa": [1500, 1600, 800, 1200, 900]
})
```

| Tipe Area | Sewa |
| --------- | ---- |
| Urban     | 1500 |
| Urban     | 1600 |
| Rural     | 800  |
| Suburban  | 1200 |
| Rural     | 900  |

---

### ✅ **Using `groupby()`**

#### Average rent per area type:

```python
data.groupby("Tipe Area")["Sewa"].mean()
```

**Result:**

```
Tipe Area
Rural        850.0
Suburban    1200.0
Urban       1550.0
Name: Sewa, dtype: float64
```

#### Total rent per area type:

```python
data.groupby("Tipe Area")["Sewa"].sum()
```

#### Number of rows per area type:

```python
data.groupby("Tipe Area").size()
```

---

### 📌 The core idea of `groupby`:

1. **Split** → Divide the data into groups based on a column.
2. **Apply** → Run an aggregation (sum, mean, count, and so on) on each group.
3. **Combine** → Collect the results into a single structure.
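
The three steps can be written out by hand to see what `groupby` does for you. This is only an illustrative sketch on the `Tipe Area` / `Sewa` data from the illustration, not how pandas implements it internally.

```python
import pandas as pd

data = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Suburban", "Rural"],
    "Sewa": [1500, 1600, 800, 1200, 900],
})

# Split: one sub-DataFrame per distinct "Tipe Area" value.
pieces = {
    area: data[data["Tipe Area"] == area]
    for area in sorted(data["Tipe Area"].unique())
}

# Apply: reduce each piece to a single number.
means = {area: piece["Sewa"].mean() for area, piece in pieces.items()}

# Combine: collect the per-group results into one Series.
manual = pd.Series(means, name="Sewa")
manual.index.name = "Tipe Area"

# The one-line equivalent.
via_groupby = data.groupby("Tipe Area")["Sewa"].mean()

print(manual)
print(via_groupby)
```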

---


In [5]:
rent_by_area = house_data.groupby("Area Type")

In [6]:
rent_by_area.head()

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,Super Area,Dumdum Park,Kolkata,Unfurnished,Bachelors/Family,1,Contact Owner
4,2022-05-09,2,7500,850,1 out of 2,Carpet Area,South Dum Dum,Kolkata,Unfurnished,Bachelors,1,Contact Owner
5,2022-04-29,2,7000,600,Ground out of 1,Super Area,Thakurpukur,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
8,2022-06-07,2,26000,800,1 out of 2,Carpet Area,"Palm Avenue Kolkata, Ballygunge",Kolkata,Unfurnished,Bachelors,2,Contact Agent
9,2022-06-20,2,10000,1000,1 out of 3,Carpet Area,Natunhat,Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Owner
10,2022-05-23,3,25000,1200,1 out of 4,Carpet Area,"Action Area 1, Rajarhat Newtown",Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Agent
11,2022-06-07,1,5000,400,1 out of 1,Carpet Area,Keshtopur,Kolkata,Unfurnished,Bachelors/Family,1,Contact Agent


The `.get_group()` method in **pandas** **retrieves the rows of one specific group** after you have called `groupby()`.

---

### 📌 General form:

```python
group = df.groupby("Kolom")
group.get_group("nilai_grup")
```

---

### 📊 Example:

Suppose you have a DataFrame like this:

```python
import pandas as pd

df = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Suburban", "Rural"],
    "Sewa": [1500, 1600, 800, 1200, 900]
})
```

To see every row where `"Tipe Area"` equals `"Rural"`:

```python
grouped = df.groupby("Tipe Area")
rural_data = grouped.get_group("Rural")
print(rural_data)
```

**Output:**

```
  Tipe Area  Sewa
2     Rural   800
4     Rural   900
```

---

### ✅ What `get_group()` is for:

* Retrieving **all the original rows** of a given group.
* Useful when you want to **inspect the raw data** of one specific group rather than only its aggregate.

---

### ⚠️ Note:

* Make sure the group value you ask for **actually exists** in the grouping column; otherwise pandas raises a `KeyError`.
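
One way to guard against that `KeyError` is to check the `.groups` mapping first. The helper `get_group_or_empty` and the group name `"Coastal"` are hypothetical, introduced only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Suburban", "Rural"],
    "Sewa": [1500, 1600, 800, 1200, 900],
})
grouped = df.groupby("Tipe Area")


def get_group_or_empty(grouped, key, template):
    """Hypothetical helper: the group's rows, or an empty frame with the same columns."""
    if key in grouped.groups:
        return grouped.get_group(key)
    return template.iloc[0:0]


rural = get_group_or_empty(grouped, "Rural", df)
coastal = get_group_or_empty(grouped, "Coastal", df)  # "Coastal" is not in the data
print(len(rural), len(coastal))

# Without the guard, asking for a missing group raises KeyError.
try:
    grouped.get_group("Coastal")
    raised = False
except KeyError:
    raised = True
print(raised)
```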

---


In [7]:
super_area = rent_by_area.get_group("Super Area")
super_area.head()

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,Super Area,Dumdum Park,Kolkata,Unfurnished,Bachelors/Family,1,Contact Owner
5,2022-04-29,2,7000,600,Ground out of 1,Super Area,Thakurpukur,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner


In [8]:
rent_by_area.get_group("Carpet Area")

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
4,2022-05-09,2,7500,850,1 out of 2,Carpet Area,South Dum Dum,Kolkata,Unfurnished,Bachelors,1,Contact Owner
8,2022-06-07,2,26000,800,1 out of 2,Carpet Area,"Palm Avenue Kolkata, Ballygunge",Kolkata,Unfurnished,Bachelors,2,Contact Agent
9,2022-06-20,2,10000,1000,1 out of 3,Carpet Area,Natunhat,Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Owner
10,2022-05-23,3,25000,1200,1 out of 4,Carpet Area,"Action Area 1, Rajarhat Newtown",Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Agent
11,2022-06-07,1,5000,400,1 out of 1,Carpet Area,Keshtopur,Kolkata,Unfurnished,Bachelors/Family,1,Contact Agent
...,...,...,...,...,...,...,...,...,...,...,...,...
4739,2022-07-06,2,25000,1040,2 out of 4,Carpet Area,Gachibowli,Hyderabad,Unfurnished,Bachelors,2,Contact Owner
4741,2022-05-18,2,15000,1000,3 out of 5,Carpet Area,Bandam Kommu,Hyderabad,Semi-Furnished,Bachelors/Family,2,Contact Owner
4743,2022-07-10,3,35000,1750,3 out of 5,Carpet Area,"Himayath Nagar, NH 7",Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Agent
4744,2022-07-06,3,45000,1500,23 out of 34,Carpet Area,Gachibowli,Hyderabad,Semi-Furnished,Family,2,Contact Agent


In [None]:
area_local = house_data["Area Locality"].unique()
len(area_local)

2235

In [13]:
by_area_by_local = house_data.groupby(["Area Type", "Area Locality"])

In [14]:
by_area_by_local.get_group(("Super Area", "Bandel"))

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
73,2022-05-30,2,6500,700,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors,1,Contact Owner


In [15]:
rent_by_area["Rent"].mean().sort_values(ascending = False)

Area Type
Carpet Area    52385.897302
Super Area     18673.396566
Built Area     10500.000000
Name: Rent, dtype: float64

In [16]:
by_area_tenant = house_data.groupby(["Area Type", "Tenant Preferred"])

In [17]:
by_area_tenant.head()

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact
0,2022-05-18,2,10000,1100,Ground out of 2,Super Area,Bandel,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner
3,2022-07-04,2,10000,800,1 out of 2,Super Area,Dumdum Park,Kolkata,Unfurnished,Bachelors/Family,1,Contact Owner
4,2022-05-09,2,7500,850,1 out of 2,Carpet Area,South Dum Dum,Kolkata,Unfurnished,Bachelors,1,Contact Owner
5,2022-04-29,2,7000,600,Ground out of 1,Super Area,Thakurpukur,Kolkata,Unfurnished,Bachelors/Family,2,Contact Owner
6,2022-06-21,2,10000,700,Ground out of 4,Super Area,Malancha,Kolkata,Unfurnished,Bachelors,2,Contact Agent
7,2022-06-21,1,5000,250,1 out of 2,Super Area,Malancha,Kolkata,Unfurnished,Bachelors,1,Contact Agent
8,2022-06-07,2,26000,800,1 out of 2,Carpet Area,"Palm Avenue Kolkata, Ballygunge",Kolkata,Unfurnished,Bachelors,2,Contact Agent
9,2022-06-20,2,10000,1000,1 out of 3,Carpet Area,Natunhat,Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Owner


`by_area_tenant.groups.keys()` **lists the group labels** produced by `groupby()` in pandas.

---

### 🔍 Detailed explanation:

Suppose you have a DataFrame `df` and group it like this:

```python
by_area_tenant = df.groupby("Tipe Area")
```

If you then call:

```python
by_area_tenant.groups.keys()
```

you get **every unique value** of the `"Tipe Area"` column used for grouping, in other words a **list of group labels**. When you group by several columns, each label is a tuple with one value per column.

---

### 📊 Example:

```python
import pandas as pd

df = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Suburban", "Rural"],
    "Sewa": [1500, 1600, 800, 1200, 900]
})

by_area_tenant = df.groupby("Tipe Area")
print(by_area_tenant.groups.keys())
```

**Output:**

```
dict_keys(['Rural', 'Suburban', 'Urban'])
```

---

### 🧠 So `keys()`:

* Is like viewing the contents of the grouping column without duplicates.
* Is useful when you want to **loop over every group**, for example:

```python
for k in by_area_tenant.groups.keys():
    print(f"Grup: {k}")
    print(by_area_tenant.get_group(k))
```
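
When the grouping uses two columns, each key is a tuple, and iterating over the GroupBy object directly yields the key together with its sub-DataFrame. A minimal sketch, where the `Penyewa` (tenant) column and its values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Tipe Area": ["Urban", "Urban", "Rural", "Rural"],
    "Penyewa": ["Family", "Bachelors", "Family", "Family"],
    "Sewa": [1500, 1600, 800, 900],
})

by_two = df.groupby(["Tipe Area", "Penyewa"])

# Each key is a (Tipe Area, Penyewa) tuple.
keys = list(by_two.groups.keys())
print(keys)

# Iterating the GroupBy object gives (key, sub-DataFrame) pairs,
# so no separate get_group() call is needed inside the loop.
sizes = {key: len(sub) for key, sub in by_two}
print(sizes)
```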

---


In [18]:
by_area_tenant.groups.keys()

dict_keys([('Built Area', 'Bachelors/Family'), ('Carpet Area', 'Bachelors'), ('Carpet Area', 'Bachelors/Family'), ('Carpet Area', 'Family'), ('Super Area', 'Bachelors'), ('Super Area', 'Bachelors/Family'), ('Super Area', 'Family')])

In `pandas`, `agg()` (an alias of **aggregate**) is very flexible: it accepts many **built-in functions** as well as **functions you write yourself**. The most commonly used aggregation functions are listed below:

---

### ✅ **Built-in pandas/NumPy functions commonly used in `.agg()`**

| Function    | Description                          |
| ----------- | ------------------------------------ |
| `"sum"`     | Sum of the values                    |
| `"mean"`    | Average                              |
| `"median"`  | Median                               |
| `"min"`     | Minimum value                        |
| `"max"`     | Maximum value                        |
| `"count"`   | Number of non-NaN values             |
| `"size"`    | Total number of rows (including NaN) |
| `"std"`     | Standard deviation                   |
| `"var"`     | Variance                             |
| `"nunique"` | Number of unique values              |
| `"first"`   | First value                          |
| `"last"`    | Last value                           |
| `"prod"`    | Product of all values                |
---

### 🔁 **Several functions at once**

Example:

```python
df.groupby("Kategori")["Harga"].agg(["mean", "min", "max", "count"])
```

---

### 💡 **Custom functions**

You can also write your own function:

```python
def over_500(x):
    return sum(x > 500)

df.groupby("Area")["Rent"].agg(["mean", over_500])
```

---

### 🧠 **Use a dictionary to apply different functions per column**

Example:

```python
df.groupby("Area").agg({
    "Rent": ["mean", "max"],
    "Size": "sum"
})
```

Or give the result columns custom names with named aggregation:

```python
df.groupby("Area").agg(
    MeanRent=("Rent", "mean"),
    MaxRent=("Rent", "max")
)
```
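
The two styles produce differently shaped column labels. The sketch below runs both on a small made-up DataFrame; the `TotalSize` name is an extra added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["Urban", "Urban", "Rural"],
    "Rent": [1000, 1200, 800],
    "Size": [50, 70, 90],
})

# Dictionary form: the result columns are (column, function) pairs.
by_dict = df.groupby("Area").agg({"Rent": ["mean", "max"], "Size": "sum"})
print(by_dict.columns.tolist())

# Named aggregation: flat columns with the names you choose.
by_name = df.groupby("Area").agg(
    MeanRent=("Rent", "mean"),
    MaxRent=("Rent", "max"),
    TotalSize=("Size", "sum"),
)
print(by_name)
```

Named aggregation is often easier to work with afterwards because the columns can be selected with a single label.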

---

### ⚠️ Notes

* `.agg()` is normally used to **summarize grouped data**, giving one result per group.
* If you need a result **the same size as the original data**, use `.transform()`.

---


In [19]:
def over_value(x, nilai = 500):

  return sum(x > nilai)

This is a Python function you wrote yourself to **count how many values in `x` are greater than a given `nilai`** (the threshold, `500` by default).

---

### 📌 Line-by-line explanation:

```python
def over_value(x, nilai=500):
```

* Defines a function named `over_value`
* Parameters:

  * `x`: a NumPy array or pandas Series; a plain list does not support `x > nilai` directly
  * `nilai`: the threshold (default: 500)

```python
  return sum(x > nilai)
```

* `x > nilai` compares each element of `x` with `nilai` and yields one boolean per element
* `sum(...)` counts the `True` values, that is, how many elements are **greater than `nilai`**

---

### 📊 Usage example:

#### 1. With a NumPy array (so that `x > nilai` works element-wise):

```python
import numpy as np

x = np.array([100, 550, 700, 450])
over_value(x)
```

**Output:** `2`
Only `550` and `700` are greater than 500.

---

### ⚠️ Important:

* `x > nilai` only works directly when `x` is a **NumPy array** or a **pandas Series**.
* If `x` is a plain list, convert it first.

---

### 🛠 Alternative that works with a plain list:

```python
def over_value(x, nilai=500):
    return sum([i > nilai for i in x])
```
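
A short check of both points, using a plain Python list: the direct comparison fails with a `TypeError`, while the element-by-element version works. The sample values are the same as in the NumPy example.

```python
def over_value(x, nilai=500):
    # Compare element by element, so any iterable of numbers works.
    return sum(i > nilai for i in x)


plain_list = [100, 550, 700, 450]

# A list cannot be compared with a number directly.
try:
    plain_list > 500
    list_comparison_ok = True
except TypeError:
    list_comparison_ok = False

print(list_comparison_ok)
print(over_value(plain_list))             # values above the default 500
print(over_value(plain_list, nilai=600))  # values above 600
```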

---


In [20]:
by_area_tenant["Rent"].agg(["min", "max", "median", "mean", over_value])

Unnamed: 0_level_0,Unnamed: 1_level_0,min,max,median,mean,over_value
Area Type,Tenant Preferred,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Built Area,Bachelors/Family,6000,15000,10500.0,10500.0,2
Carpet Area,Bachelors,3500,3500000,21000.0,45910.780029,691
Carpet Area,Bachelors/Family,1200,1200000,25000.0,53738.206089,1281
Carpet Area,Family,5500,1000000,34000.0,60796.935583,326
Super Area,Bachelors,3500,160000,15000.0,23417.266187,139
Super Area,Bachelors/Family,1500,350000,12000.0,17876.135123,2161
Super Area,Family,5000,260000,18000.0,25957.534247,146


This is an example of using **`.agg()` for a mixed statistical summary**, including a custom function (`over_value`), on a `groupby()` result: one set of statistics per group.

---

### 📌 Explanation of the code:

```python
by_area_tenant["Rent"].agg(["min", "max", "median", "mean", over_value])
```

#### What it means:

* `by_area_tenant` is the result of `house_data.groupby(["Area Type", "Tenant Preferred"])`.
* `"Rent"` is the column being analyzed.
* `.agg([...])` applies **several aggregation functions at once**:

  * `"min"` → smallest value
  * `"max"` → largest value
  * `"median"` → middle value
  * `"mean"` → average
  * `over_value` → the custom function that counts **how many values are > 500**

---

### 📊 The result is a new DataFrame shaped like this (illustrative numbers for three hypothetical area groups, not the rental dataset)

| Tipe Area | min | max  | median | mean | over\_value |
| --------- | --- | ---- | ------ | ---- | ----------- |
| Urban     | 800 | 2000 | 1300   | 1350 | 2           |
| Rural     | 400 | 700  | 550    | 525  | 1           |
| Suburban  | 600 | 900  | 750    | 750  | 2           |

---

### ✅ What makes `.agg()` handy:

* It accepts **built-in functions** such as `"mean"`, `"sum"`, and `"count"`.
* It also accepts **custom functions** such as `over_value`.

---

### 🔁 Changing a parameter of the custom function

To use a threshold other than the default of 500 in `over_value`, bind the argument with `functools.partial`:

```python
from functools import partial

custom_over = partial(over_value, nilai=1000)
by_area_tenant["Rent"].agg(["min", "max", "median", "mean", custom_over])
```
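
A related variation is to report the share of values above a threshold instead of the count. The sketch below uses named aggregation so every result column gets an explicit name; the helper `share_over` and the small DataFrame are assumptions made for illustration, not part of the rental dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["Urban", "Urban", "Urban", "Rural", "Rural"],
    "Rent": [400, 800, 1200, 300, 600],
})


def share_over(x, nilai=500):
    """Hypothetical helper: fraction of values in x greater than nilai."""
    return (x > nilai).mean()


ringkasan = df.groupby("Area")["Rent"].agg(
    jumlah="count",
    share_over_500=share_over,
    # A lambda fixes a different threshold, much like partial does.
    share_over_1000=lambda x: share_over(x, nilai=1000),
)
print(ringkasan)
```

Taking the mean of a boolean Series works because `True` counts as 1 and `False` as 0.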

---



In [21]:
by_area_tenant.agg({"Rent" : ["min", "max", "mean"],
                    "Size" : ["min", "max"]})

Unnamed: 0_level_0,Unnamed: 1_level_0,Rent,Rent,Rent,Size,Size
Unnamed: 0_level_1,Unnamed: 1_level_1,min,max,mean,min,max
Area Type,Tenant Preferred,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Built Area,Bachelors/Family,6000,15000,10500.0,500,1000
Carpet Area,Bachelors,3500,3500000,45910.780029,50,4761
Carpet Area,Bachelors/Family,1200,1200000,53738.206089,10,7000
Carpet Area,Family,5500,1000000,60796.935583,100,4500
Super Area,Bachelors,3500,160000,23417.266187,100,4000
Super Area,Bachelors/Family,1500,350000,17876.135123,20,8000
Super Area,Family,5000,260000,25957.534247,100,4761


In [22]:
for group, df in by_area_tenant:
  print(f"{group[0] and group[1]} and number of rows {df.shape[0]}")

Bachelors/Family and number of rows 2
Bachelors and number of rows 691
Bachelors/Family and number of rows 1281
Family and number of rows 326
Bachelors and number of rows 139
Bachelors/Family and number of rows 2161
Family and number of rows 146


This code **loops over (iterates) a pandas `groupby` result**, but the `print` call is not quite right, because `group` is a tuple.

---

### Explanation:

```python
for group, df in by_area_tenant:
    print(f"{group[0] and group[1]} and number of rows {df.shape[0]}")
```

* `by_area_tenant` comes from a `groupby()` on **more than one column**, so:

  * `group` is a **tuple** holding the group's value for each grouping column.
  * `df` is the sub-DataFrame containing that group's rows.

---

### The problem in the code:

`{group[0] and group[1]}` is not the right way to display the contents of the tuple `group`.

* The `and` operator returns `group[1]` when `group[0]` is truthy, so only `group[1]` is printed.
* The result is therefore not a combination of both values but just the second one (or the first one, if the first is falsy, such as an empty string). That is why the output above shows only the `Tenant Preferred` value.
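
The behaviour of `and` can be checked in isolation. The tuple below copies one of the group keys from the dataset; the `label` format is just one possible choice.

```python
group = ("Super Area", "Bachelors")

# `a and b` evaluates to a when a is falsy, otherwise to b.
second_only = group[0] and group[1]
print(second_only)   # the second element, because "Super Area" is truthy

first_falsy = "" and group[1]
print(repr(first_falsy))  # the empty string, because it is falsy

# To show both parts, format them explicitly.
label = f"{group[0]} / {group[1]}"
print(label)
```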

---

### The right way to display the tuple:

To print the whole tuple:

```python
for group, df in by_area_tenant:
    print(f"{group} and number of rows {df.shape[0]}")
```

Or, to print each element separately:

```python
for group, df in by_area_tenant:
    print(f"{group[0]}, {group[1]} and number of rows {df.shape[0]}")
```

---

### Example with a two-column groupby:

If you group by `["Area Type", "Tenant Type"]`, `group` looks like `("Urban", "Individual")`.

Example output (illustrative values):

```
('Urban', 'Individual') and number of rows 25
('Urban', 'Company') and number of rows 10
('Rural', 'Individual') and number of rows 8
...
```

---



**`transform()` vs `agg()` in pandas**, specifically in the context of `groupby()`:

---

## 1. `.agg()` (aggregate)

* **Purpose:** Produce a **summary per group**.
* Output: **one result row per group**, so usually fewer rows than the original data.
* Example functions: `sum()`, `mean()`, `min()`, `max()`, or a custom function that returns a single value per group.
* Example:

  ```python
  df.groupby("Area")["Rent"].agg("mean")
  ```

  → the result is one mean value per group.

---

## 2. `.transform()`

* **Purpose:** Produce output **the same size as the original data**, with each value computed from the group its row belongs to.
* Output: **number of result rows = number of rows in the original data**.
* Useful for adding a new column that holds a per-group value on every row, such as the group mean or the group standard deviation.
* Example:

  ```python
  df["MeanRent"] = df.groupby("Area")["Rent"].transform("mean")
  ```

  → every row gets the mean rent of its group, so the size matches the original data.

---

## Main differences:

| Aspect      | `agg()`                     | `transform()`                         |
| ----------- | --------------------------- | ------------------------------------- |
| Output rows | One row per group           | Same number of rows as the original   |
| Result      | One summary value per group | One value per row, computed per group |
| Example     | Mean per Area (summary)     | Mean per Area attached to every row   |

---

## Usage example:

Given this data:

| Area  | Rent |
| ----- | ---- |
| Urban | 1000 |
| Urban | 1200 |
| Rural | 800  |

### With `agg()`:

```python
df.groupby("Area")["Rent"].agg("mean")
```

Result:

```
Area
Rural     800.0
Urban    1100.0
Name: Rent, dtype: float64
```

### With `transform()`:

```python
df["MeanRent"] = df.groupby("Area")["Rent"].transform("mean")
```

The data in `df` becomes:

| Area  | Rent | MeanRent |
| ----- | ---- | -------- |
| Urban | 1000 | 1100.0   |
| Urban | 1200 | 1100.0   |
| Rural | 800  | 800.0    |
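
The comparison can be run end to end on the three-row table. The last column, `RentMinusMean`, is an extra illustration of a custom function inside `transform()` and is not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["Urban", "Urban", "Rural"],
    "Rent": [1000, 1200, 800],
})

# agg: one value per group.
per_group = df.groupby("Area")["Rent"].agg("mean")

# transform: one value per original row, aligned on the original index.
df["MeanRent"] = df.groupby("Area")["Rent"].transform("mean")

# Custom function in transform: each rent minus the mean of its own group.
df["RentMinusMean"] = df.groupby("Area")["Rent"].transform(lambda s: s - s.mean())

print(len(per_group), len(df))
print(df)
```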

---



In [23]:
house_data["median_rent_by_area"] = rent_by_area["Rent"].transform("median")

In [25]:
house_data[house_data["Rent"] >= house_data["median_rent_by_area"]]

Unnamed: 0,Posted On,BHK,Rent,Size,Floor,Area Type,Area Locality,City,Furnishing Status,Tenant Preferred,Bathroom,Point of Contact,median_rent_by_area
1,2022-05-13,2,20000,800,1 out of 3,Super Area,"Phool Bagan, Kankurgachi",Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner,13000.0
2,2022-05-16,2,17000,1000,1 out of 3,Super Area,Salt Lake City Sector 2,Kolkata,Semi-Furnished,Bachelors/Family,1,Contact Owner,13000.0
8,2022-06-07,2,26000,800,1 out of 2,Carpet Area,"Palm Avenue Kolkata, Ballygunge",Kolkata,Unfurnished,Bachelors,2,Contact Agent,25000.0
10,2022-05-23,3,25000,1200,1 out of 4,Carpet Area,"Action Area 1, Rajarhat Newtown",Kolkata,Semi-Furnished,Bachelors/Family,2,Contact Agent,25000.0
15,2022-06-01,3,40000,1286,1 out of 1,Carpet Area,New Town Action Area 1,Kolkata,Furnished,Bachelors/Family,2,Contact Owner,25000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4737,2022-07-07,3,15000,1500,Lower Basement out of 2,Super Area,Almasguda,Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Owner,13000.0
4739,2022-07-06,2,25000,1040,2 out of 4,Carpet Area,Gachibowli,Hyderabad,Unfurnished,Bachelors,2,Contact Owner,25000.0
4742,2022-05-15,3,29000,2000,1 out of 4,Super Area,"Manikonda, Hyderabad",Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Owner,13000.0
4743,2022-07-10,3,35000,1750,3 out of 5,Carpet Area,"Himayath Nagar, NH 7",Hyderabad,Semi-Furnished,Bachelors/Family,3,Contact Agent,25000.0
