# 📚**About Dataset**
The dataset obtained from scraping the website "rumah.com" contains information on property listings. The data includes the property's link, location, price, number of bedrooms, number of bathrooms, and floor area. With this dataset, one can analyze property prices and characteristics to gain insights into the real estate market represented on the website.

Rumah.com is an online platform for buying and selling properties.

**Columns**
* nav-link href: URL of the listing
* listing-location: location of the property
* price: price of the property
* bed: number of bedrooms in the property
* bath: number of bathrooms in the property
* listing-floorarea: floor area in m²
* listing-floorarea 2: price of the property per m²


🔗 https://www.kaggle.com/datasets/gerryzani/housing-price-in-south-tangerang-city-indonesia/data



# 🚗 Import Libraries and Data

In [160]:
import pandas as pd
import numpy as np

In [161]:
housing_df = pd.read_csv("/content/tangsel_housing.csv", encoding='latin1')

# 🏗 **STRUCTURING**

## Inspecting Data Types

In [162]:
housing_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29420 entries, 0 to 29419
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   nav-link href        29420 non-null  object 
 1   listing-location     29420 non-null  object 
 2   price                29420 non-null  object 
 3   bed                  29282 non-null  float64
 4   bath                 29215 non-null  float64
 5   listing-floorarea    29420 non-null  object 
 6   listing-floorarea 2  29383 non-null  object 
dtypes: float64(2), object(5)
memory usage: 1.6+ MB


In [163]:
housing_df.describe()

Unnamed: 0,bed,bath
count,29282.0,29215.0
mean,3.754457,2.986137
std,1.356762,1.359788
min,1.0,1.0
25%,3.0,2.0
50%,4.0,3.0
75%,4.0,4.0
max,10.0,10.0


In [164]:
housing_df

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2
0,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten","6,9 M",5.0,5.0,420 m²,Rp 20.720.721 per m²
1,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten","4,5 M",3.0,3.0,190 m²,Rp 12.747.875 per m²
2,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten","3,95 M",3.0,3.0,132 m²,Rp 30.859.375 per m²
3,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten","3,3 M",4.0,3.0,220 m²,Rp 18.333.333 per m²
4,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten","3,5 M",3.0,2.0,180 m²,Rp 27.777.778 per m²
...,...,...,...,...,...,...,...
29415,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten","3,3 M",3.0,3.0,300 m²,Rp 21.710.526 per m²
29416,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",4 M,8.0,4.0,330 m²,Rp 36.363.636 per m²
29417,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten","2,2 M",3.0,2.0,160 m²,Rp 18.333.333 per m²
29418,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten","1,95 M",3.0,2.0,120 m²,Rp 21.666.667 per m²


(To be handled later, in the Enriching section.)

## Convert Floor Area

In [165]:
housing_df['floor_area'] = housing_df['listing-floorarea'].str.replace('m²','')

In [None]:
housing_df['floor_area'] = housing_df['floor_area'].astype(float)

## Convert Price Per m2

In [169]:
housing_df['price_per_m2'] = housing_df['listing-floorarea 2'].str.replace('Rp', '')
# regex=False so '.' is treated as a literal dot, not the "any character" pattern
housing_df['price_per_m2'] = housing_df['price_per_m2'].str.replace('.', '', regex=False)
housing_df['price_per_m2'] = housing_df['price_per_m2'].str[:-7]
housing_df['price_per_m2'] = housing_df['price_per_m2'].str.lstrip()

In [170]:
housing_df['price_per_m2'] = housing_df['price_per_m2'].astype(float)
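
The two conversions above can also be written as small helper functions, which makes the parsing rules easier to test on individual strings. This is a minimal sketch, not the notebook's method: the names `parse_floor_area` and `parse_price_per_m2` are hypothetical, and it assumes every value follows the formats shown in the table above (`420 m²`, `Rp 20.720.721 per m²`).

```python
import re

# Hypothetical helpers; the formats are assumed from the sample rows above.

def parse_floor_area(text):
    """Turn a string such as '420 m²' into 420.0."""
    return float(text.replace("m²", "").strip())

def parse_price_per_m2(text):
    """Turn 'Rp 20.720.721 per m²' into 20720721.0.

    Returns None for missing values (NaN or None), which are not strings.
    """
    if not isinstance(text, str):
        return None
    # Keep only the part before 'per', then drop everything that is not a digit
    # (the 'Rp' prefix, spaces, and the '.' thousands separators).
    digits = re.sub(r"[^\d]", "", text.split("per")[0])
    return float(digits) if digits else None

print(parse_floor_area("420 m²"))
print(parse_price_per_m2("Rp 20.720.721 per m²"))
```

Functions like these could be applied with `housing_df['listing-floorarea'].map(parse_floor_area)`, at the cost of being slower than the vectorized `.str` calls.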

## Convert Price into numeric

In [173]:
housing_df['price'].str.contains('Rp').sum()

np.int64(4)

In [174]:
housing_df.loc[housing_df['price'].str.contains('Rp')]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2
613,https://www.rumah.com/listing-properti/proyek/...,"10 Jl. Ki Hajar Dewantara, Ciputat, Tangerang ...","2,2 M - Rp 2,7 M",4.0,3.0,140 m²,,140.0,
622,https://www.rumah.com/listing-properti/proyek/...,"Jl.Inpres, RT 02/ RW 09, Pamulang, Tangerang S...","875,5 jt - Rp 1,1 M",2.0,2.0,52 m²,,52.0,
623,https://www.rumah.com/listing-properti/proyek/...,"Jl Benosa, Serpong, Tangerang Selatan, Banten","1,08 M - Rp 2,3 M",2.0,2.0,69 m²,,69.0,
624,https://www.rumah.com/listing-properti/proyek/...,"99 Jalan raya pengasinan, BSD, Tangerang Selat...",800 jt - Rp 1 M,3.0,2.0,63 m²,,63.0,


In [175]:
housing_df.loc[613, 'price'] = '2,45 M'
housing_df.loc[622, 'price'] = '987,75 jt'
housing_df.loc[623, 'price'] = '1,69 M'
housing_df.loc[624, 'price'] = '900 jt'
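
The four replacement values above are the midpoints of each price range. As a check on that arithmetic, the sketch below recomputes a midpoint from the raw range string. The helper names `parse_amount` and `range_midpoint` are hypothetical, and the code assumes the range always has the form `<low> - Rp <high>` with units `M` (billion), `jt` (million), or `rb` (thousand).

```python
# Hypothetical helpers for recomputing the manually entered midpoints.
UNIT_MULTIPLIER = {"m": 1e9, "jt": 1e6, "rb": 1e3}

def parse_amount(text):
    """Convert '875,5 jt' or 'Rp 1,1 M' to a rupiah amount as a float."""
    cleaned = text.lower().replace("rp", "").strip()
    number, _, unit = cleaned.partition(" ")
    # Indonesian notation uses ',' as the decimal separator.
    return float(number.replace(",", ".")) * UNIT_MULTIPLIER.get(unit.strip(), 1.0)

def range_midpoint(text):
    """Return the midpoint of a range such as '2,2 M - Rp 2,7 M'."""
    low, high = (parse_amount(part) for part in text.split(" - "))
    return (low + high) / 2

for raw in ["2,2 M - Rp 2,7 M", "875,5 jt - Rp 1,1 M",
            "1,08 M - Rp 2,3 M", "800 jt - Rp 1 M"]:
    print(raw, "->", range_midpoint(raw))
```

The results correspond to 2,45 M, 987,75 jt, 1,69 M, and 900 jt, matching the values assigned above.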

In [176]:
housing_df.loc[housing_df['price'].str.contains('rb')]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2
6175,https://www.rumah.com/listing-properti/dijual-...,"Pamulang, Tangerang Selatan, Banten",875 rb,3.0,2.0,70 m²,Rp 14.583 per m²,70.0,14583.0
8497,https://www.rumah.com/listing-properti/dijual-...,"Serpong, Tangerang Selatan, Banten",300 rb,3.0,1.0,46 m²,Rp 5.000 per m²,46.0,5000.0
8622,https://www.rumah.com/listing-properti/dijual-...,"Serpong, Tangerang Selatan, Banten",600 rb,3.0,1.0,66 m²,Rp 10.000 per m²,66.0,10000.0
11760,https://www.rumah.com/listing-properti/dijual-...,"Pamulang, Tangerang Selatan, Banten",1 rb,2.0,1.0,10 m²,Rp 100 per m²,10.0,100.0
12720,https://www.rumah.com/listing-properti/dijual-...,"Pamulang, Tangerang Selatan, Banten",1 rb,2.0,1.0,10 m²,Rp 100 per m²,10.0,100.0
13288,https://www.rumah.com/listing-properti/dijual-...,"Jl beruang 2 bintaro, Bintaro, Tangerang Selat...",675 rb,2.0,2.0,50 m²,Rp 15.000 per m²,50.0,15000.0
13557,https://www.rumah.com/listing-properti/dijual-...,"Pondok benda pamulang, Pamulang, Tangerang Sel...",800 rb,3.0,2.0,88 m²,Rp 11.429 per m²,88.0,11429.0
14477,https://www.rumah.com/listing-properti/dijual-...,"Jln raya sengkol, Pamulang, Tangerang Selatan,...",750 rb,2.0,1.0,50 m²,Rp 9.868 per m²,50.0,9868.0
16585,https://www.rumah.com/listing-properti/dijual-...,"Serpong, Tangerang Selatan, Banten",900 rb,3.0,1.0,60 m²,Rp 10.000 per m²,60.0,10000.0
17032,https://www.rumah.com/listing-properti/dijual-...,"Pamulang Raya, Pamulang, Tangerang Selatan, Ba...",900 rb,3.0,2.0,54 m²,Rp 7.500 per m²,54.0,7500.0


In [178]:
housing_df['price_num'] = housing_df['price'].str.lower()
housing_df['price_num'] = housing_df['price_num'].str.replace('rp', '')
housing_df['price_num'] = housing_df['price_num'].str.replace(',', '.')

In [179]:
housing_df['price_num']

Unnamed: 0,price_num
0,6.9 m
1,4.5 m
2,3.95 m
3,3.3 m
4,3.5 m
...,...
29415,3.3 m
29416,4 m
29417,2.2 m
29418,1.95 m


In [180]:
for i in range(len(housing_df)):
  if 'm' in housing_df.loc[i,'price_num']:
    housing_df.loc[i, 'price_num'] = housing_df.loc[i,'price_num'].replace('m', '')
    housing_df.loc[i, 'price_num'] = float(housing_df.loc[i,'price_num']) * 1000000000
  elif 'jt' in housing_df.loc[i,'price_num']:
    housing_df.loc[i, 'price_num'] = housing_df.loc[i,'price_num'].replace('jt', '')
    housing_df.loc[i, 'price_num'] = float(housing_df.loc[i,'price_num']) * 1000000
  elif 'rb' in housing_df.loc[i,'price_num']:
    housing_df.loc[i, 'price_num'] = housing_df.loc[i,'price_num'].replace('rb', '')
    housing_df.loc[i, 'price_num'] = float(housing_df.loc[i,'price_num']) * 1000

In [181]:
housing_df['price_num'] = housing_df['price_num'].astype(float)
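
The row-by-row loop above works, but the same unit conversion can be done without a Python loop. The following is a sketch of one vectorized alternative on a small sample Series; the sample values and the variable names (`prices`, `parts`, `multiplier`) are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Sample values in the formats seen in the 'price' column (illustrative only).
prices = pd.Series(["6,9 M", "987,75 jt", "875 rb", "630"])

# Same normalisation as the notebook: lowercase, drop 'rp', ',' -> '.'
cleaned = (
    prices.str.lower()
    .str.replace("rp", "", regex=False)
    .str.replace(",", ".", regex=False)
)

# Split each string into its number and an optional unit suffix.
parts = cleaned.str.extract(r"^\s*(?P<number>[\d.]+)\s*(?P<unit>m|jt|rb)?\s*$")

# m = billion (miliar), jt = million (juta), rb = thousand (ribu);
# values without a unit are kept as-is (multiplier 1).
multiplier = parts["unit"].map({"m": 1e9, "jt": 1e6, "rb": 1e3}).fillna(1.0)
price_num = parts["number"].astype(float) * multiplier

print(price_num)
```

Like the loop, this reads `.` as a decimal point after the comma replacement, so a price written as `1.450 M` is interpreted as 1.45 billion.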

In [182]:
housing_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29420 entries, 0 to 29419
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   nav-link href        29420 non-null  object 
 1   listing-location     29420 non-null  object 
 2   price                29420 non-null  object 
 3   bed                  29282 non-null  float64
 4   bath                 29215 non-null  float64
 5   listing-floorarea    29420 non-null  object 
 6   listing-floorarea 2  29383 non-null  object 
 7   floor_area           29420 non-null  float64
 8   price_per_m2         29383 non-null  float64
 9   price_num            29420 non-null  float64
dtypes: float64(5), object(5)
memory usage: 2.2+ MB


# 🧹**CLEANING**

## Handling Duplicate Data

In [183]:
housing_df.duplicated().sum()

np.int64(5241)

In [184]:
housing_df.drop_duplicates(inplace=True)

## Handling Missing Values

In [185]:
housing_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 24179 entries, 0 to 29419
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   nav-link href        24179 non-null  object 
 1   listing-location     24179 non-null  object 
 2   price                24179 non-null  object 
 3   bed                  24069 non-null  float64
 4   bath                 24016 non-null  float64
 5   listing-floorarea    24179 non-null  object 
 6   listing-floorarea 2  24145 non-null  object 
 7   floor_area           24179 non-null  float64
 8   price_per_m2         24145 non-null  float64
 9   price_num            24179 non-null  float64
dtypes: float64(5), object(5)
memory usage: 2.0+ MB


In [186]:
housing_df.isna().sum()

Unnamed: 0,0
nav-link href,0
listing-location,0
price,0
bed,110
bath,163
listing-floorarea,0
listing-floorarea 2,34
floor_area,0
price_per_m2,34
price_num,0


In [187]:
housing_df.loc[housing_df['bed'].isna()]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2,price_num
52,https://www.rumah.com/listing-properti/dijual-...,"Jl Jombang Raya Bintaro Ciputat, Ciputat, Tang...","1,43 M",,,94 m²,Rp 20.140.845 per m²,94.0,20140845.0,1.430000e+09
285,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",2 M,,,162 m²,Rp 18.181.818 per m²,162.0,18181818.0,2.000000e+09
328,https://www.rumah.com/listing-properti/dijual-...,"Plumeria GR 23, Pondok Aren, Tangerang Selatan...","1,2 M",,,69 m²,Rp 20.000.000 per m²,69.0,20000000.0,1.200000e+09
633,https://www.rumah.com/listing-properti/dijual-...,"BSD, Tangerang Selatan, Banten","19,5 M",,,151 m²,Rp 29.862.175 per m²,151.0,29862175.0,1.950000e+10
923,https://www.rumah.com/listing-properti/dijual-...,"Jl. Discovery Aluvia, Pondok Aren, Tangerang S...","3,159 M",,,110 m²,Rp 35.100.000 per m²,110.0,35100000.0,3.159000e+09
...,...,...,...,...,...,...,...,...,...,...
27026,https://www.rumah.com/listing-properti/dijual-...,"Perumahan bukit mas, Bintaro, Tangerang Selata...",6 M,,,275 m²,Rp 12.000.000 per m²,275.0,12000000.0,6.000000e+09
27029,https://www.rumah.com/listing-properti/dijual-...,"Ciputat, Tangerang Selatan, Banten","1,8 M",,,130 m²,Rp 12.676.056 per m²,130.0,12676056.0,1.800000e+09
27150,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten","6,5 M",,,300 m²,Rp 18.258.427 per m²,300.0,18258427.0,6.500000e+09
27233,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",45 M,,,3000 m²,Rp 35.828.025 per m²,3000.0,35828025.0,4.500000e+10


In [188]:
housing_df['bed'] = housing_df['bed'].fillna(0)
housing_df['bath'] = housing_df['bath'].fillna(0)

In [189]:
housing_df.loc[housing_df['price_per_m2'].isna()]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2,price_num
474,https://www.rumah.com/listing-properti/dijual-...,"jl lengkong wetan, Serpong, Tangerang Selatan,...","1,5 M",4.0,3.0,72 m²,,72.0,,1500000000.0
612,https://www.rumah.com/listing-properti/proyek/...,Jl. Pinang Raya RT. 002 RW. 022 Pamulang Timur...,"842,751837 jt",2.0,1.0,36 m²,,36.0,,842751800.0
613,https://www.rumah.com/listing-properti/proyek/...,"10 Jl. Ki Hajar Dewantara, Ciputat, Tangerang ...","2,45 M",4.0,3.0,140 m²,,140.0,,2450000000.0
614,https://www.rumah.com/listing-properti/proyek/...,"33 Pdk.cempaka Rt.03,Rw.10, Rempoa, Ciputat Ti...","1,9 M",3.0,2.0,84 m²,,84.0,,1900000000.0
616,https://www.rumah.com/listing-properti/proyek/...,"12 Jl Raya SouthCity Utara, Lot 5, Pamulang, T...","1,58286 M",3.0,3.0,71 m²,,71.0,,1582860000.0
622,https://www.rumah.com/listing-properti/proyek/...,"Jl.Inpres, RT 02/ RW 09, Pamulang, Tangerang S...","987,75 jt",2.0,2.0,52 m²,,52.0,,987750000.0
623,https://www.rumah.com/listing-properti/proyek/...,"Jl Benosa, Serpong, Tangerang Selatan, Banten","1,69 M",2.0,2.0,69 m²,,69.0,,1690000000.0
624,https://www.rumah.com/listing-properti/proyek/...,"99 Jalan raya pengasinan, BSD, Tangerang Selat...",900 jt,3.0,2.0,63 m²,,63.0,,900000000.0
625,https://www.rumah.com/listing-properti/proyek/...,"26 Jl. Kunir, Pd. Cabe Udik, Pamulang, Tangera...","1,7362 M",4.0,4.0,126 m²,,126.0,,1736200000.0
1224,https://www.rumah.com/listing-properti/dijual-...,"Jl. Amd V, Bintaro, Tangerang Selatan, Banten","959,8 jt",3.0,3.0,65 m²,,65.0,,959800000.0


In [190]:
housing_df['price_per_m2'] = housing_df['price_per_m2'].fillna(housing_df['price_num']/housing_df['floor_area'])
# estimate the price per m2 by dividing the sale price by the floor area
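
To make the fill step concrete, here is a small worked example on two rows taken from the tables above (index 474, which has no listed price per m², and index 0, which does). The frame `sample` is an illustrative assumption built by hand, not a slice of the real data.

```python
import pandas as pd

# Two hand-copied rows: 474 has a missing price_per_m2, 0 has a listed one.
sample = pd.DataFrame(
    {
        "price_num": [1.5e9, 6.9e9],
        "floor_area": [72.0, 420.0],
        "price_per_m2": [None, 20720721.0],
    },
    index=[474, 0],
)

# fillna aligns on the index, so only the missing row receives the estimate.
estimate = sample["price_num"] / sample["floor_area"]
sample["price_per_m2"] = sample["price_per_m2"].fillna(estimate)

print(sample)
```

Row 474 is filled with 1.5e9 / 72, roughly 20.8 million, while row 0 keeps its listed value. Note that for row 0 the listed figure (about 20.7 million) differs from 6.9e9 / 420 (about 16.4 million), so the site's per-m² figure is apparently not computed from floor area alone and the filled values may be on a different basis.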

## Handling Outliers

In [191]:
housing_df.describe()

Unnamed: 0,bed,bath,floor_area,price_per_m2,price_num
count,24179.0,24179.0,24179.0,24179.0,24179.0
mean,3.740353,2.971256,211.664213,33055320.0,3596452000.0
std,1.374916,1.376736,2721.6283,528301700.0,7259004000.0
min,0.0,0.0,1.0,0.0,1.0
25%,3.0,2.0,80.0,13250000.0,1350000000.0
50%,4.0,3.0,140.0,18400000.0,2500000000.0
75%,4.0,4.0,237.0,24285710.0,4350000000.0
max,10.0,10.0,400000.0,29375000000.0,850000000000.0


In [192]:
# check properties with a price per m2 below 1 million rupiah
housing_df.loc[housing_df['price_per_m2']<1000000]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2,price_num
92,https://www.rumah.com/listing-properti/dijual-...,"Graha raya, Bintaro, Tangerang Selatan, Banten",120 jt,4.0,2.0,170 m²,Rp 666.667 per m²,170.0,666667.0,120000000.0
102,https://www.rumah.com/listing-properti/dijual-...,"Pamulang, Tangerang Selatan, Banten",630,2.0,1.0,50 m²,Rp 7 per m²,50.0,7.0,630.0
219,https://www.rumah.com/listing-properti/dijual-...,"Malibu village gading serpong, Gading Serpong,...",45 jt,3.0,2.0,55 m²,Rp 937.500 per m²,55.0,937500.0,45000000.0
319,https://www.rumah.com/listing-properti/dijual-...,"Jl. Hasyim Ashari, Cipondoh, Ciledug, Tangeran...",50 jt,2.0,2.0,50 m²,Rp 833.333 per m²,50.0,833333.0,50000000.0
526,https://www.rumah.com/listing-properti/dijual-...,"BSD, Tangerang Selatan, Banten",75 jt,3.0,3.0,144 m²,Rp 625.000 per m²,144.0,625000.0,75000000.0
...,...,...,...,...,...,...,...,...,...,...
28893,https://www.rumah.com/listing-properti/dijual-...,"Pamulang, Tangerang Selatan, Banten",800,3.0,2.0,65 m²,Rp 13 per m²,65.0,13.0,800.0
29238,https://www.rumah.com/listing-properti/dijual-...,"Pondok Aren, Tangerang Selatan, Banten",90 jt,6.0,4.0,200 m²,Rp 545.455 per m²,200.0,545455.0,90000000.0
29326,https://www.rumah.com/listing-properti/dijual-...,"Townhouse puyuh barat, Bintaro, Tangerang Sela...","2,799 jt",4.0,2.0,275 m²,Rp 13.995 per m²,275.0,13995.0,2799000.0
29329,https://www.rumah.com/listing-properti/dijual-...,"Jl. Puyuh barat IV, Bintaro, Tangerang Selatan...","2,799 jt",4.0,3.0,220 m²,Rp 17.828 per m²,220.0,17828.0,2799000.0


In [193]:
# decision: drop rows with a price per m2 below 1 million rupiah
housing_df.drop(housing_df.loc[housing_df['price_per_m2']<1000000].index, inplace=True)

In [194]:
housing_df.loc[housing_df['price_per_m2']>5.0e+07]

Unnamed: 0,nav-link href,listing-location,price,bed,bath,listing-floorarea,listing-floorarea 2,floor_area,price_per_m2,price_num
112,https://www.rumah.com/listing-properti/dijual-...,"Gramercy Alam Sutera, Alam Sutera, Tangerang S...","21,1 M",6.0,7.0,451 m²,Rp 50.238.095 per m²,451.0,5.023810e+07,2.110000e+10
157,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",17 M,5.0,4.0,502 m²,Rp 55.194.805 per m²,502.0,5.519480e+07,1.700000e+10
203,https://www.rumah.com/listing-properti/dijual-...,"jl. BSD boulevard utara, BSD, Tangerang Selata...","21,749 M",5.0,5.0,544 m²,Rp 57.997.333 per m²,544.0,5.799733e+07,2.174900e+10
243,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",33 M,5.0,3.0,198 m²,Rp 125.000.000 per m²,198.0,1.250000e+08,3.300000e+10
423,https://www.rumah.com/listing-properti/dijual-...,"BSD, Tangerang Selatan, Banten",1.450 M,3.0,2.0,88 m²,Rp 25.892.857.143 per m²,88.0,2.589286e+10,1.450000e+09
...,...,...,...,...,...,...,...,...,...,...
27820,https://www.rumah.com/listing-properti/dijual-...,"Navapark BSD, BSD, Tangerang Selatan, Banten",15 M,4.0,3.0,300 m²,Rp 75.000.000 per m²,300.0,7.500000e+07,1.500000e+10
27961,https://www.rumah.com/listing-properti/dijual-...,"Lakewood, BSD, Tangerang Selatan, Banten",12 M,7.0,5.0,337 m²,Rp 54.545.455 per m²,337.0,5.454546e+07,1.200000e+10
27994,https://www.rumah.com/listing-properti/dijual-...,"Navapark, BSD, Tangerang Selatan, Banten",23 M,5.0,4.0,653 m²,Rp 61.333.333 per m²,653.0,6.133333e+07,2.300000e+10
29134,https://www.rumah.com/listing-properti/dijual-...,"BSD, Tangerang Selatan, Banten","8,9 M",4.0,5.0,320 m²,Rp 69.531.250 per m²,320.0,6.953125e+07,8.900000e+09
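
The notebook drops rows below 1 million rupiah per m² and only inspects rows above 50 million. If an upper bound were also applied, both limits could be expressed as one boolean mask. The sketch below shows this on made-up sample values; treating 50 million as a cut-off is an assumption for illustration, not a decision taken in the notebook.

```python
import pandas as pd

LOWER = 1_000_000    # lower threshold used in the notebook
UPPER = 50_000_000   # assumption: the notebook inspects, but does not drop, rows above this

# Illustrative values: two implausibly low, one typical, one very high.
sample = pd.DataFrame({"price_per_m2": [7.0, 666667.0, 18400000.0, 125000000.0]})

# between() is inclusive on both ends by default.
in_range = sample["price_per_m2"].between(LOWER, UPPER)
filtered = sample.loc[in_range]

print(filtered)
```

Selecting with a mask returns a new frame instead of mutating `housing_df` in place, which makes it easier to compare row counts before and after filtering.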


# ✨**ENRICHING**

Add the subdistrict (kecamatan) where each property is listed, derived from the listing-location column.

In [195]:
housing_df['listing-location']

Unnamed: 0,listing-location
0,"Gading Serpong, Tangerang Selatan, Banten"
1,"Gading Serpong, Tangerang Selatan, Banten"
2,"Gading Serpong, Tangerang Selatan, Banten"
3,"Gading Serpong, Tangerang Selatan, Banten"
4,"Gading Serpong, Tangerang Selatan, Banten"
...,...
29415,"Bintaro, Tangerang Selatan, Banten"
29416,"Bintaro, Tangerang Selatan, Banten"
29417,"Bintaro, Tangerang Selatan, Banten"
29418,"Bintaro, Tangerang Selatan, Banten"


In [204]:
location_df = housing_df['listing-location'].str.lower()
location_df = location_df.str.replace(', banten','')
location_df = location_df.str.replace(', tangerang selatan','')
location_df = pd.DataFrame(location_df)

In [197]:
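# location_df is a DataFrame at this point, so select the column before calling unique()
list(location_df['listing-location'].unique())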

['gading serpong',
 'bsd',
 'jalan cempaka, bintaro',
 'alam sutera',
 'jl gading serpong, gading serpong',
 'bintaro',
 'jl gading sektor, gading serpong',
 'jl sektor gading serpong, gading serpong',
 'serpong',
 'jl cluster berryl, gading serpong',
 'graha carissa, serpong utara',
 'cluster graha raya bintaro, bintaro',
 'sektor 9, bintaro',
 'bintaro, bintaro',
 'graha bintaro, bintaro',
 'cluster strozzi symphonia gading serpong, bsd',
 'jln pandawa, pamulang',
 'pondok benda, pamulang',
 'serpong paradise city, serpong',
 'jl jombang raya bintaro ciputat, ciputat',
 'lele 5, pamulang',
 'bsd, bsd',
 'alegria park, bsd city, bsd',
 'jalan pondok aren, pondok aren',
 'foresta bsd, bsd',
 'ciputat timur',
 'ciputat',
 'jl. semeru 1 a2 no. 17, jombang, ciputat, ciputat',
 'graha raya, serpong utara',
 'serpong, serpong',
 'jl wortel no 7 sektor 1.6 griya loka, serpong',
 'gading serpong, pondok aren',
 'gading serpong, pamulang',
 'sektor 6, gading serpong',
 'volta, gading serpong',

In [206]:
location_df['subdistrict'] = location_df['listing-location'].apply(
    lambda x: x.rpartition(',')[-1].strip() if isinstance(x, str) else None
)

In [207]:
location_df

Unnamed: 0,listing-location,subdistrict
0,gading serpong,gading serpong
1,gading serpong,gading serpong
2,gading serpong,gading serpong
3,gading serpong,gading serpong
4,gading serpong,gading serpong
...,...,...
29415,bintaro,bintaro
29416,bintaro,bintaro
29417,bintaro,bintaro
29418,bintaro,bintaro


In [209]:
housing_df['subdistrict'] = location_df['subdistrict']
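
The extraction steps above (lowercase, strip the province and city suffixes, keep the text after the last comma) can be collected into one function to check the rule on individual strings. This is a sketch under the assumption that every location ends with `, Tangerang Selatan, Banten`; the name `extract_subdistrict` is hypothetical.

```python
def extract_subdistrict(location):
    """Return the last comma-separated part of a location, lowercased.

    Hypothetical helper mirroring the notebook's steps; returns None
    for non-string input such as NaN.
    """
    if not isinstance(location, str):
        return None
    text = location.lower()
    # Remove the province and city suffixes shared by the listings.
    for suffix in (", banten", ", tangerang selatan"):
        text = text.replace(suffix, "")
    # Whatever follows the last remaining comma is taken as the subdistrict.
    return text.rpartition(",")[-1].strip()

print(extract_subdistrict("Gading Serpong, Tangerang Selatan, Banten"))
print(extract_subdistrict("Jalan Cempaka, Bintaro, Tangerang Selatan, Banten"))
```

The result is whatever area label the site places last, so values such as `bsd` or `gading serpong` may be residential area names rather than administrative subdistricts.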

In [211]:
housing_df = housing_df[['nav-link href','listing-location','subdistrict','price_num','floor_area','price_per_m2','bed','bath']]

# ✅ **VALIDATION**

In [212]:
housing_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 23943 entries, 0 to 29419
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   nav-link href     23943 non-null  object 
 1   listing-location  23943 non-null  object 
 2   subdistrict       23943 non-null  object 
 3   price_num         23943 non-null  float64
 4   floor_area        23943 non-null  float64
 5   price_per_m2      23943 non-null  float64
 6   bed               23943 non-null  float64
 7   bath              23943 non-null  float64
dtypes: float64(5), object(3)
memory usage: 1.6+ MB


In [213]:
housing_df.isnull().sum()

Unnamed: 0,0
nav-link href,0
listing-location,0
subdistrict,0
price_num,0
floor_area,0
price_per_m2,0
bed,0
bath,0


In [214]:
housing_df.duplicated().sum()

np.int64(0)
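
The manual checks in this section (no missing values, no duplicates, no price per m² below the threshold) could be bundled into a single reusable function. The following is a sketch only: the function name `validate` and the sample frames are assumptions for illustration.

```python
import pandas as pd

def validate(df):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if df.isnull().any().any():
        problems.append("missing values present")
    if df.duplicated().any():
        problems.append("duplicate rows present")
    # Same 1 million rupiah threshold as in the cleaning step.
    if (df["price_per_m2"] < 1_000_000).any():
        problems.append("price_per_m2 below 1 million")
    return problems

# Illustrative frames: one clean, one with a duplicate row and a low price.
clean = pd.DataFrame({"price_per_m2": [20720721.0, 12747875.0], "bed": [5.0, 3.0]})
dirty = pd.DataFrame({"price_per_m2": [7.0, 7.0], "bed": [2.0, 2.0]})

print(validate(clean))
print(validate(dirty))
```

Calling `validate(housing_df)` after each cleaning change would show at a glance whether any of these conditions has been reintroduced.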

In [215]:
housing_df.describe()

Unnamed: 0,price_num,floor_area,price_per_m2,bed,bath
count,23943.0,23943.0,23943.0,23943.0,23943.0
mean,3631614000.0,212.433321,33379750.0,3.746857,2.978866
std,7285999000.0,2734.818979,530889000.0,1.374474,1.374513
min,48000000.0,1.0,1000000.0,0.0,0.0
25%,1400000000.0,80.0,13333330.0,3.0,2.0
50%,2500000000.0,142.0,18518520.0,4.0,3.0
75%,4400000000.0,240.0,24333330.0,4.0,4.0
max,850000000000.0,400000.0,29375000000.0,10.0,10.0


In [216]:
housing_df

Unnamed: 0,nav-link href,listing-location,subdistrict,price_num,floor_area,price_per_m2,bed,bath
0,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten",gading serpong,6.900000e+09,420.0,20720721.0,5.0,5.0
1,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten",gading serpong,4.500000e+09,190.0,12747875.0,3.0,3.0
2,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten",gading serpong,3.950000e+09,132.0,30859375.0,3.0,3.0
3,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten",gading serpong,3.300000e+09,220.0,18333333.0,4.0,3.0
4,https://www.rumah.com/listing-properti/dijual-...,"Gading Serpong, Tangerang Selatan, Banten",gading serpong,3.500000e+09,180.0,27777778.0,3.0,2.0
...,...,...,...,...,...,...,...,...
29415,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",bintaro,3.300000e+09,300.0,21710526.0,3.0,3.0
29416,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",bintaro,4.000000e+09,330.0,36363636.0,8.0,4.0
29417,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",bintaro,2.200000e+09,160.0,18333333.0,3.0,2.0
29418,https://www.rumah.com/listing-properti/dijual-...,"Bintaro, Tangerang Selatan, Banten",bintaro,1.950000e+09,120.0,21666667.0,3.0,2.0
