# <CENTER>**`II.` ` DATA CLEANING ` - Gurugram Real Estate Project**</CENTER>

---

#### **OBJECTIVE:**  
This notebook focuses on cleaning and preparing the **raw real estate data** gathered from multiple web sources for further analysis and modeling.  
The raw dataset contains property listings from various websites, with attributes such as **sector**, **price**, **area**, **furnishing**, **amenities**, **seller details**, and **other property-specific features**.

#### **Key Tasks Performed:**
1. Handling missing, inconsistent, and duplicate records.  
2. Standardizing text fields (e.g., sector names, furnishing types, property categories).  
3. Converting numeric columns to proper data types and removing formatting artifacts (like commas, currency symbols, etc.).  
4. Deriving clean, consistent feature names and units for downstream use.  
5. Removing outliers and logically invalid entries (e.g., zero or negative area/price).  

#### **Tools & Libraries Used:**
- **Pandas:** For data manipulation, cleaning operations, and structured transformations.  
- **NumPy:** For numerical operations and efficient array-based computations.  

---

### **`Outcome:`**  
A clean, well-structured dataset ready for **Exploratory Data Analysis (EDA)** and **Feature Engineering**, ensuring consistency, completeness, and integrity of the data used in the modeling pipeline.

---


In [29]:
## Importing necessary data manipulation tools:
import pandas as pd
import numpy as np

pd.options.display.max_columns = None

In [31]:
df = pd.read_csv('Housing_Listings_all_records_(numbers)_FINAL.csv', index_col = 'Unnamed: 0')
df.rename(columns = {'Name':'Flat', 'Vaastu':'Facing'}, inplace = True)
df.head(3)

Unnamed: 0,Flat,Address,Seller_Builder,EMI,Built_Up_Area,Avg_Price,Age_of_property,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
0,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹36.01 K,593 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,12 of 14,East facing,Semi Furnished,Pyramid Elite,68000,68.0 L,2,2,1 Open Parking,1,More than a month ago,"['Lift', 'Power Backup', 'Garden', 'Sports', '...","[['School', ""St. Xavier's High School""], ['Hos...",Looking for a 2 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1761033...
1,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹33.36 K,690 sq.ft,₹9.13 K/sq.ft,1 Years Old,Ready to move,4 of 15,-,Unfurnished,Pyramid Elite,63000,63.0 L,2,2,No Parking,1,17 days ago,,"[['School', ""St. Xavier's High School""], ['Hos...",Best 2 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1681859...
2,2 BHK Flat,"Experion The Heartsong, Sector 108, Gurgaon",Experion Developers,EMI starts at ₹74.47 K,1000 sq.ft,₹15 K/sq.ft,2 Year Old,Ready to move,9 of 26,North-East facing,Semi Furnished,Experion The Heartsong,1.5 Lacs,1.5 Cr,2,2,1 Covered and 1 Open Parking,3,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Gazebo', 'S...","[['School', 'The Shikshiyan School'], ['Hospit...","2 BHK Flat for sale in Sector 108, Gurgaon - c...",https://housing.com/in/buy/resale/page/1767484...


### Using Address column to create Sector column:

In [34]:
## Adding Sector Column to the DF:
def func_1(x):
    ## Return the first address component that starts with 'Sector ',
    ## trimmed to 'Sector <id>'; fall back to '-' when none is found.
    for part in x.split(', '):
        if part.startswith('Sector '):
            tokens = part.split(' ')
            return tokens[0] + ' ' + tokens[1]
    return '-'
        
value = df['Address'].apply(func_1)
value[3302] = 'Sector 15'    ## Anomaly

df.insert(1, 'Sector', value, allow_duplicates=False)
df.head(3)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,Built_Up_Area,Avg_Price,Age_of_property,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
0,2 BHK Flat,Sector 86,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹36.01 K,593 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,12 of 14,East facing,Semi Furnished,Pyramid Elite,68000,68.0 L,2,2,1 Open Parking,1,More than a month ago,"['Lift', 'Power Backup', 'Garden', 'Sports', '...","[['School', ""St. Xavier's High School""], ['Hos...",Looking for a 2 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1761033...
1,2 BHK Flat,Sector 86,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹33.36 K,690 sq.ft,₹9.13 K/sq.ft,1 Years Old,Ready to move,4 of 15,-,Unfurnished,Pyramid Elite,63000,63.0 L,2,2,No Parking,1,17 days ago,,"[['School', ""St. Xavier's High School""], ['Hos...",Best 2 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1681859...
2,2 BHK Flat,Sector 108,"Experion The Heartsong, Sector 108, Gurgaon",Experion Developers,EMI starts at ₹74.47 K,1000 sq.ft,₹15 K/sq.ft,2 Year Old,Ready to move,9 of 26,North-East facing,Semi Furnished,Experion The Heartsong,1.5 Lacs,1.5 Cr,2,2,1 Covered and 1 Open Parking,3,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Gazebo', 'S...","[['School', 'The Shikshiyan School'], ['Hospit...","2 BHK Flat for sale in Sector 108, Gurgaon - c...",https://housing.com/in/buy/resale/page/1767484...
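
The same extraction can also be written as a vectorised regular expression. The following is a minimal sketch, not the notebook's method: the sample addresses are hypothetical stand-ins modelled on the rows above, and the pattern assumes a sector always appears as `Sector <id>` at the start of a comma-separated address component.

```python
import pandas as pd

# Hypothetical sample addresses modelled on the listings shown above.
addresses = pd.Series([
    "Pyramid Elite, Sector 86, Gurgaon",
    "Imperia Esfera, Sector 37C, Gurgaon",
    "Signature Global Park 4 And 5, Sector 36 Sohna, Gurgaon",
    "Tulip Purple, Tulip Violet Society, Gurgaon",
])

# Capture "Sector <id>" only when it starts an address component,
# i.e. at the start of the string or right after ", ".
pattern = r"(?:^|, )(Sector [^ ,]+)"

# Rows without a match come back as NaN; use '-' as the placeholder instead.
sectors = addresses.str.extract(pattern, expand=False).fillna("-")
print(sectors.tolist())
```

Keeping `'-'` as the placeholder matches the convention used elsewhere in this dataset; replacing it with `NaN` would be an equally reasonable choice if later steps rely on `isna()`.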


### Converting EMI from string to integer format:

In [37]:
## Adding "EMI in Rupees" Column to the DF:
def func_2(x):
    if x[1] == 'K':
        return int(float(x[0][1:]) * 1000)
    else:
        return int(float(x[0][1:]) * 100000)

value = df['EMI'].apply(lambda x: x.split('EMI starts at ')[1].split(' ')).apply(func_2)

df.insert(5, 'EMI_in_rupees', value, allow_duplicates=False)
df.sample(5)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Avg_Price,Age_of_property,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
3270,3 BHK Flat,Sector 106,"Godrej Meridien Phase II, Panwala Khusropur, S...",Godrej Properties Ltd.,EMI starts at ₹2.43 Lacs,243000,1503 sq.ft,₹32.53 K/sq.ft,1 Years Old,Ready to move,51 of 60,North-West facing,Semi Furnished,Godrej Meridien Phase II,No Charge,4.89 Cr,3,3,1 Covered and 1 Open Parking,2,6 days ago,"['Swimming Pool', 'Amphitheater', 'Spa', 'Skat...","[['School', 'Euro International School, Sector...","Looking for a good 3 BHK Flat in Sector 106, G...",https://housing.com/in/buy/resale/page/1790738...
4185,3 BHK Flat,Sector 68,"Pareena Mi Casa, Sector 68, Gurgaon",Pareena Infrastructure Builders,EMI starts at ₹1.09 Lacs,109000,1705 sq.ft,₹12.9 K/sq.ft,2 Year Old,Ready to move,18 of 34,North facing,Semi Furnished,Pareena Mi Casa,2.1 Lacs,2.2 Cr,3,3,1 Covered Parking,3,More than a month ago,"['AC', 'Cupboard', 'Geyser', 'Power Backup', '...","[['School', 'The Vivekananda School - Sector 6...","Property for sale in Sector 68, Gurgaon. This ...",https://housing.com/in/buy/resale/page/1747487...
4383,3 BHK Flat,Sector 37C,"Imperia Esfera, Sector 37C, Gurgaon",Imperia Structures Ltd,EMI starts at ₹69.5 K,69500,1815 sq.ft,₹7.71 K/sq.ft,1 Years Old,Ready to move,13 of 25,North-East facing,Unfurnished,Imperia Esfera,No Charge,1.4 Cr,3,3,1 Covered and 1 Open Parking,4,More than a month ago,"['Swimming Pool', 'Gym', 'Lift', 'Power Backup...","[['School', 'Euro International School, Sector...",Best 3 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1758191...
4226,3 BHK Flat,Sector 37C,"Corona Optus, Sector 37C, Gurgaon",Corona Group,EMI starts at ₹1.12 Lacs,112000,1763 sq.ft,₹12.76 K/sq.ft,6 Year Old,Ready to move,7 of 14,-,Semi Furnished,Corona Optus,2.3 Lacs,2.25 Cr,3,3,1 Covered Parking,5,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Volleyball ...","[['School', 'Euro International School, Sector...",Best 3 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1123756...
3514,3 BHK Flat,Sector 109,"ATS Kocoon, Sector 109, Gurgaon",ATS Infrastructure Limited,EMI starts at ₹1.17 Lacs,117000,1745 sq.ft,₹13.47 K/sq.ft,7 Year Old,Ready to move,7 of 24,North-East facing,Semi Furnished,ATS Kocoon,2.4 Lacs,2.35 Cr,3,3,1 Covered and 1 Open Parking,3,More than a month ago,"['Water Purifier', 'Gas Pipeline', 'AC', 'Powe...","[['School', 'Euro International School, Sector...","3 BHK Flat for sale in Babupur Village, Gurgao...",https://housing.com/in/buy/resale/page/1751998...
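
One caveat with `int(float(...) * 100000)`: `int()` truncates, and binary floats cannot represent values such as 1.14 exactly. A sample later in this notebook shows an EMI of ₹1.14 Lacs stored as 113999 and a brokerage of 2.3 Lacs stored as 229999. The sketch below shows one way to avoid this using `decimal.Decimal`. The helper `to_rupees` and the `UNIT_MULTIPLIER` table are hypothetical names introduced for illustration, and the list of unit suffixes is assumed from the values visible in the samples.

```python
from decimal import Decimal

# Multipliers for the unit suffixes seen in the sampled listings
# (assumed complete for this illustration; extend if others appear).
UNIT_MULTIPLIER = {
    "K": 1_000,
    "L": 100_000,
    "Lac": 100_000,
    "Lacs": 100_000,
    "Cr": 10_000_000,
}

def to_rupees(text):
    """Convert strings such as '₹1.14 Lacs' or '68.0 L' to whole rupees."""
    number, unit = text.lstrip("₹").split(" ", 1)
    # Decimal keeps '1.14' exact, so the product is exactly 114000.
    return int(Decimal(number) * UNIT_MULTIPLIER[unit])

print(to_rupees("₹1.14 Lacs"))
print(to_rupees("₹36.01 K"))
print(to_rupees("1.5 Cr"))
```

Wrapping the original product in `round()` before `int()` would be a smaller change with the same effect for these magnitudes. The per-square-foot prices use a `K/sq.ft` suffix, which this sketch does not cover.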


### Converting Built-up Area for properties given in string format to Integer format:

In [40]:
## Adding "Built_up_area_in_sqft" Column to the DF:
df.insert(7, 'Built_up_area_in_sqft', df['Built_Up_Area'].apply(lambda x: int(x.split(' ')[0])).astype('int32'))
df.sample(5)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Age_of_property,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
6560,4 BHK Flat,-,"Tulip Purple, Tulip Violet Society, Gurgaon",Tulip Infratech Pvt Ltd,EMI starts at ₹1.29 Lacs,129000,2400 sq.ft,2400,₹10.83 K/sq.ft,5 Year Old,Ready to move,5 of 15,North-East facing,Semi Furnished,Tulip Purple,2.6 Lacs,2.6 Cr,4,2,2 Covered Parking,4,23 days ago,"['Gas Pipeline', 'AC', 'Pet allowed']","[['School', 'The Vivekananda School'], ['Hospi...","4 BHK Flat for sale in Sector 69, Gurgaon with...",https://housing.com/in/buy/resale/page/1670875...
1277,1 BHK Flat,Sector 33,"Breez Global Heights, Sector 33, Sohna, Gurgaon",Breez Builders and Developers Pvt. Ltd.\t,EMI starts at ₹18.53 K,18530,394 sq.ft,394,₹8.88 K/sq.ft,4 Year Old,Ready to move,-,-,Unfurnished,Breez Global Heights,No Charge,35.0 L,1,1,1 Covered Parking,1,More than a month ago,"['Lift', 'Garden', 'Sports', 'Kids Area', 'CCT...","[['School', 'GD GOENKA SIGNATURE SCHOOL'], ['H...","Looking for a good 1 BHK Flat in Sector 33, Gu...",https://housing.com/in/buy/resale/page/1762297...
2635,3 BHK Flat,Sector 37D,"Sector 37D, Gurgaon",,EMI starts at ₹96.81 K,96810,1521 sq.ft,1521,₹12.82 K/sq.ft,10 Year Old,Ready to move,5 of 20,-,Semi Furnished,-,1.9 Lacs,1.95 Cr,3,2,1 Covered and 1 Open Parking,3,More than a month ago,,"[['School', 'Euro International School, Sector...","3 BHK Flat for sale in Sector 37 D, Gurgaon - ...",https://housing.com/in/buy/resale/page/1758413...
2779,3.5 BHK Flat,Sector 89,"Smart World Gems, Sector 89, Gurgaon",Smartworld Developers,EMI starts at ₹79.43 K,79430,1503 sq.ft,1503,₹10.64 K/sq.ft,-,Ready to move,1 of 4,-,Unfurnished,Smart World Gems,1.6 Lacs,1.6 Cr,4,3,1 Covered and 1 Open Parking,2,9 days ago,"['Swimming Pool', 'Gym', 'Lift', 'Power Backup...","[['School', 'Delhi Public School, Sector 84'],...",Looking for a 3.5 BHK Flat for sale in Gurgaon...,https://housing.com/in/buy/resale/page/1703354...
6462,3 BHK Flat,Sector 72,"TATA Primanti Uberluxe, Sector 72, Gurgaon",Tata Realty and Infrastructure Limited,EMI starts at ₹2.46 Lacs,246000,2550 sq.ft,2550,₹19.41 K/sq.ft,5 Year Old,Ready to move,6 of 44,North facing,Semi Furnished,TATA Primanti Uberluxe,5 Lacs,4.95 Cr,3,3,2 Covered Parking,3,2 days ago,"['Gas Pipeline', 'Cupboard', 'Servant Room', '...","[['School', 'CD International School'], ['Hosp...","Looking for a good 3 BHK Flat in Sector 72, Gu...",https://housing.com/in/buy/resale/page/1793560...


### Converting Price Density for properties given in string format to Integer format:

In [43]:
## Adding "Avg_price_rupee_per_sqft" Column to the DF:
def func_3(x):
    if x[-1] == 'nan':
        return np.nan
    elif x[-1] == 'K/sq.ft':
        return int(float(x[0][1:]) * 1000)
    else:
        return int(float(x[0][1:]) * 100000)

    
value = df['Avg_Price'].astype('str').apply(lambda j: j.split(' ')).apply(func_3)
df.insert(9, 'Avg_price_rupee_per_sqft' , value)
df.sample(5)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
5749,3.5 BHK Flat,Sector 88A,"Godrej Icon, Sector 88A, Gurgaon",Godrej Properties Ltd.,EMI starts at ₹1.2 Lacs,120000,2170 sq.ft,2170,₹11.15 K/sq.ft,11150.0,4 Year Old,Ready to move,12 of 15,-,Semi Furnished,Godrej Icon,2.4 Lacs,2.42 Cr,4,4,2 Covered Parking,3,More than a month ago,"['AC', 'Cupboard', 'Geyser', 'Pet allowed']","[['School', 'Euro International School, Sector...","Property for sale in Sector 88A, Gurgaon. This...",https://housing.com/in/buy/resale/page/1753373...
829,2 BHK Flat,Sector 89A,"Adani Aangan, Sector 89A, Gurgaon",Adani Realty,EMI starts at ₹39.72 K,39720,745 sq.ft,745,₹10.07 K/sq.ft,10070.0,4 Year Old,Ready to move,7 of 14,North-East facing,Semi Furnished,Adani Aangan,75000,75.0 L,2,2,1 Open Parking,2,8 days ago,"['Lift', 'Power Backup', 'Garden', 'Sports', '...","[['School', 'Saraswati Model School'], ['Hospi...",2 BHK Flat for sale in Gurgaon. This property ...,https://housing.com/in/buy/resale/page/1788820...
5848,3 BHK Flat,Sector 92,"Bestech Park View Sanskruti, Sector 92, Gurgaon",Bestech India Pvt Ltd,EMI starts at ₹1.28 Lacs,128000,2120 sq.ft,2120,₹12.12 K/sq.ft,12120.0,6 Year Old,Ready to move,8 of 20,North facing,Semi Furnished,Bestech Park View Sanskruti,2.5 Lacs,2.57 Cr,3,3,1 Covered Parking,3,25 days ago,"['Swimming Pool', 'Gym', 'Lift', 'Intercom', '...","[['School', 'RPS International School'], ['Hos...","Looking for a good 3 BHK Flat in Sector 92, Gu...",https://housing.com/in/buy/resale/page/1774704...
3961,3 BHK Flat,Sector 106,"Godrej Meridien, Panwala Khusropur, Sector 106...",Godrej Properties Ltd.,EMI starts at ₹1.39 Lacs,139000,1855 sq.ft,1855,₹15.09 K/sq.ft,15090.0,1 Years Old,Ready to move,11 of 23,North facing,Semi Furnished,Godrej Meridien,2.8 Lacs,2.8 Cr,3,3,1 Covered and 1 Open Parking,4,22 days ago,"['Swimming Pool', 'Gym', 'Lift', 'Garden', 'Sp...","[['School', 'Euro International School, Sector...","A 3 BHK Flat for sale in Sector 106, Gurgaon. ...",https://housing.com/in/buy/resale/page/1776739...
76,3 BHK Flat,Sector 36,"Signature Global Park 4 And 5, Sector 36 Sohna...",Signature Global Builders Pvt. Ltd.,EMI starts at ₹51.63 K,51630,1000 sq.ft,1000,₹10.4 K/sq.ft,10400.0,1 Years Old,Ready to move,1 of 4,North-East facing,Semi Furnished,Signature Global Park 4 And 5,1 Lac,1.04 Cr,3,2,1 Covered Parking,4,20 days ago,"['Stove', 'Gas Pipeline', 'AC', 'Cupboard', 'G...","[['School', 'GD GOENKA SIGNATURE SCHOOL'], ['M...",Looking for a 3 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1728792...
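
The new column displays as `11150.0` rather than `11150` because a column containing `NaN` is stored as `float64`. If integer values are preferred, pandas' nullable `Int64` dtype is one option. This is a small sketch on hypothetical values, not a step the notebook performs:

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for Avg_price_rupee_per_sqft.
avg_price = pd.Series([11150.0, np.nan, 10070.0])

# "Int64" (capital I) is pandas' nullable integer dtype; NaN becomes <NA>.
avg_price_int = avg_price.astype("Int64")
print(avg_price_int.dtype)
print(avg_price_int.isna().sum())
```

The conversion only succeeds when every non-missing value is a whole number, which holds here because the values were produced with `int(...)`.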


### Converting Property Age feature given in string format to Integer format:

In [46]:
## Adding "Age_of_property_in_years" Column to the DF:
value = df['Age_of_property'].apply(lambda x: x.split(' ')[0])

df.insert(11, 'Age_of_property_in_years', value)
df.sample(3)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Age_of_property_in_years,Possession_status,Floor,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
704,2 BHK Flat,Sector 70A,"Pyramid Urban Homes, Sector 70A, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹38.66 K,38660,602 sq.ft,602,₹12.13 K/sq.ft,12130.0,6 Year Old,6,Ready to move,3 of 14,North facing,Semi Furnished,Pyramid Urban Homes,1 Lac,73.0 L,2,2,No Parking,1,More than a month ago,"['Water Purifier', 'Geyser', 'Power Backup', '...","[['School', 'The Vivekananda School'], ['Hospi...","2 BHK Flat for sale in Sector 70 A, Gurgaon - ...",https://housing.com/in/buy/resale/page/1671469...
6851,4 BHK Flat,Sector 59,"Conscient Elevate, Sector 59, Gurgaon",Conscient Infrastructure,EMI starts at ₹4.22 Lacs,422000,3395 sq.ft,3395,₹25.04 K/sq.ft,25040.0,2 Year Old,2,Ready to move,16 of 33,-,Unfurnished,Conscient Elevate,No Charge,8.5 Cr,4,4,3 Covered Parking,4,24 days ago,"['Amphitheater', 'Multipurpose Room', 'Tennis ...","[['School', 'Unicosmos School'], ['Hospital', ...",4 BHK Flat for sale in Gurgaon. This property ...,https://housing.com/in/buy/resale/page/1709681...
6714,4 BHK Flat,Sector 2,"Ansal Celebrity Homes, Sector 2, Palam Vihar, ...",Ansal API,EMI starts at ₹2.48 Lacs,248000,3750 sq.ft,3750,₹13.33 K/sq.ft,13330.0,18 Year Old,18,Ready to move,2 of 2,North facing,Semi Furnished,Ansal Celebrity Homes,5 Lacs,5.0 Cr,4,4,2 Covered and 1 Open Parking,2,14 days ago,"['Swimming Pool', 'Gym', 'Power Backup', 'Inte...","[['School', 'The Maurya School | Best School I...",Looking for a 4 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1783840...
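
Note that the cell above keeps the extracted age as text, and listings with an unknown age still hold `'-'`. If a numeric column is needed, `pd.to_numeric` with `errors="coerce"` is one way to get there. The sketch below uses hypothetical sample values modelled on the `Age_of_property` strings shown above:

```python
import pandas as pd

# Hypothetical sample modelled on the Age_of_property values above.
age_text = pd.Series(["1 Years Old", "2 Year Old", "-", "18 Year Old"])

# Take the leading token, then coerce anything non-numeric ("-") to NaN.
age_years = pd.to_numeric(age_text.str.split(" ").str[0], errors="coerce")
print(age_years.tolist())
```

Coercing to `NaN` changes how missing ages are represented, so any later step that checks for `'-'` would need to use `isna()` instead.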


### Cleaning up Floor feature to create two new features from it - Floor Number & Building Height:

In [8]:
## Adding two new columns - "Floor_number" and "Building_height":
value_1 = df['Floor'].apply(lambda x: x.split(' of ')[0])
value_2 = df['Floor'].apply(lambda x: x.split(' of ')[-1])

df.insert(14 , 'Floor_number', value_1)
df.insert(15 , 'Building_height', value_2)
df.sample(4)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Age_of_property_in_years,Possession_status,Floor,Floor_number,Building_height,Facing,Furnishing,Society,Brokerage,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
4454,3 BHK Flat,Sector 84,"SS The Coralwood, Sector 84, Gurgaon",SS Group,EMI starts at ₹96.81 K,96810,1890 sq.ft,1890,₹10.32 K/sq.ft,10320.0,5 Year Old,5,Ready to move,14 of 21,14,21,North-East facing,Semi Furnished,SS The Coralwood,1.9 Lacs,1.95 Cr,3,3,1 Covered and 1 Open Parking,4,21 days ago,"['Swimming Pool', 'Gym', 'Lift', 'Power Backup...","[['School', 'Delhi Public School, Sector 84'],...","3 BHK Flat for sale in Sector 84, Gurgaon with...",https://housing.com/in/buy/resale/page/1710193...
2533,3 BHK Flat,Sector 72,"CHD Avenue 71, Sector 72, Gurgaon",CHD Developers Ltd,EMI starts at ₹86.88 K,86880,1400 sq.ft,1400,₹12.5 K/sq.ft,12500.0,11 Year Old,11,Ready to move,-,-,-,North-East facing,Semi Furnished,CHD Avenue 71,No Charge,1.75 Cr,3,3,1 Covered and 1 Open Parking,3,More than a month ago,"['Dining Table', 'Water Purifier', 'Gas Pipeli...","[['School', 'CD International School'], ['Hosp...","3 BHK Flat for sale in Sector 72, Gurgaon. Thi...",https://housing.com/in/buy/resale/page/1718385...
2305,3 BHK Flat,Sector 69,"Tulip White, Sector 69, Gurgaon",Tulip Infratech Pvt Ltd,EMI starts at ₹76.95 K,76950,1326 sq.ft,1326,₹11.69 K/sq.ft,11690.0,10 Year Old,10,Ready to move,9 of 12,9,12,West facing,Semi Furnished,Tulip White,1.5 Lacs,1.55 Cr,3,2,1 Covered and 1 Open Parking,3,More than a month ago,"['Amphitheater', 'Vastu Compliant', 'Fire Spri...","[['School', 'The Vivekananda School'], ['Hospi...","3 BHK Flat for sale in Sector 69, Gurgaon with...",https://housing.com/in/buy/resale/page/1767182...
2160,2 BHK Flat,Sector 68,"Pareena Mi Casa, Sector 68, Gurgaon",Pareena Infrastructure Builders,EMI starts at ₹73.97 K,73970,1245 sq.ft,1245,₹11.97 K/sq.ft,11970.0,1 Years Old,1,Ready to move,8 of 35,8,35,East facing,Semi Furnished,Pareena Mi Casa,1.5 Lacs,1.49 Cr,2,2,1 Covered and 1 Open Parking,3,2 days ago,"['Swimming Pool', 'Gym', 'Lift', 'Power Backup...","[['School', 'The Vivekananda School - Sector 6...",Check out this 2 BHK Flat for sale in Sector 6...,https://housing.com/in/buy/resale/page/1793736...
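
As with the age column, both new columns remain strings and carry `'-'` where the floor is unknown. A possible alternative, sketched here on hypothetical sample values, is to extract both numbers with one regular expression and convert them in the same step. It assumes the `"<n> of <m>"` layout seen in the samples; any other layout would end up as `NaN`.

```python
import pandas as pd

# Hypothetical sample modelled on the Floor values shown above.
floor_text = pd.Series(["14 of 21", "-", "9 of 12"])

# Two named groups give two columns; rows that do not match become NaN.
parts = floor_text.str.extract(
    r"^(?P<Floor_number>\d+) of (?P<Building_height>\d+)$"
)
parts = parts.apply(pd.to_numeric)
print(parts)
```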


### Converting Brokerage for properties given in string format to Integer format:

In [9]:
## Adding "Brokerage_in_rupees" Column to the DF:
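def func_4(x):
    ## Values such as '1.5 Lacs' or '1 Lac' carry a unit; the rest are plain rupee amounts.
    if len(x.split(' ')) == 2:
        return int(float(x.split(' ')[0]) * 100000)
    else:
        ## Strip any thousands separators before converting.
        return int(x.replace(',', ''))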

value = df['Brokerage'].str.replace('No Charge', '0').apply(func_4).astype('int32')

df.insert(20, 'Brokerage_in_rupees', value)
df.sample(4)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Age_of_property_in_years,Possession_status,Floor,Floor_number,Building_height,Facing,Furnishing,Society,Brokerage,Brokerage_in_rupees,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
4303,3 BHK Flat,Sector 84,"SS The Coralwood, Sector 84, Gurgaon",SS Group,EMI starts at ₹90.85 K,90850,1750 sq.ft,1750,₹10.46 K/sq.ft,10460.0,5 Year Old,5,Ready to move,20 of 30,20,30,North-East facing,Semi Furnished,SS The Coralwood,1.4 Lacs,140000,1.83 Cr,3,3,1 Covered Parking,4,More than a month ago,"['Swimming Pool', 'Gym', 'Lift', 'Intercom', '...","[['School', 'Delhi Public School, Sector 84'],...",3 BHK Flat for sale in Gurgaon. This property ...,https://housing.com/in/buy/resale/page/1718313...
6551,3.5 BHK Flat,Sector 108,"Raheja Vedaanta, Sector 108, Gurgaon",Raheja Developers Ltd.,EMI starts at ₹99.29 K,99290,2666 sq.ft,2666,₹7.5 K/sq.ft,7500.0,5 Year Old,5,Ready to move,9 of 15,9,15,-,Semi Furnished,Raheja Vedaanta,2 Lacs,200000,2.0 Cr,4,4,2 Covered and 1 Open Parking,3,More than a month ago,"['Internet / Wi-Fi', 'Tennis Court', 'Sauna Ba...","[['School', 'The Shikshiyan School'], ['Hospit...","3.5 BHK Flat for sale in Sector 108, Gurgaon -...",https://housing.com/in/buy/resale/page/1749103...
6099,3 BHK Flat,Sector 112,"Tata Gurgaon Gateway, Sector 112, Gurgaon",Tata Realty and Infrastructure Limited,EMI starts at ₹1.56 Lacs,156000,2120 sq.ft,2120,₹14.81 K/sq.ft,14810.0,7 Year Old,7,Ready to move,14 of 24,14,24,North-East facing,Semi Furnished,Tata Gurgaon Gateway,3 Lacs,300000,3.14 Cr,3,3,1 Covered and 1 Open Parking,4,8 days ago,"['Cupboard', '4Light', '1Wardrobe']","[['School', 'Prudence Schools - Top & Best CBS...","Looking for a good 3 BHK Flat in Sector 112, G...",https://housing.com/in/buy/resale/page/1788439...
355,2 BHK Flat,Sector 90,"Shree Green Court, Sector 90, Gurgaon",Shree Vardhman Group,EMI starts at ₹34.42 K,34420,636 sq.ft,636,₹10.22 K/sq.ft,10220.0,6 Year Old,6,Ready to move,6 of 14,6,14,-,Unfurnished,Shree Green Court,65000,65000,65.0 L,2,2,1 Open Parking,2,21 days ago,"['Grocery Shop', 'Vastu Compliant', 'Fountains...","[['School', 'RPS International School'], ['Hos...","Property for sale in Sector 90, Gurgaon. This ...",https://housing.com/in/buy/resale/page/1777532...


### Converting Price feature for properties given in string format to Integer format:

In [10]:
## Adding "Price_in_rupees" Column to the DF:
def func_5(x):
    if x[1] == 'L':
        return int(float(x[0]) * 100000)
    elif x[1] == 'Cr':
        return int(float(x[0]) * 10000000)

value = df['Price'].apply(lambda x: x.split(' ')).apply(func_5)

df.insert(22, 'Price_in_rupees', value)
df.sample(4)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Age_of_property_in_years,Possession_status,Floor,Floor_number,Building_height,Facing,Furnishing,Society,Brokerage,Brokerage_in_rupees,Price,Price_in_rupees,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
4450,4 BHK Flat,Sector 65,"Emaar Emerald Hill, Sector 65, Gurgaon",Emaar,EMI starts at ₹1.44 Lacs,144000,1750 sq.ft,1750,₹16.57 K/sq.ft,16570.0,4 Year Old,4,Ready to move,2 of 2,2,2,North facing,Semi Furnished,Emaar Emerald Hill,2.9 Lacs,290000,2.9 Cr,29000000,4,4,1 Covered and 1 Open Parking,3,More than a month ago,"['Water Purifier', 'Gas Pipeline', 'Servant Ro...","[['School', 'Lotus Valley International School...","Property for sale in Sector 65, Gurgaon. This ...",https://housing.com/in/buy/resale/page/1764659...
2947,3 BHK Flat,Sector 81,"Bestech Park View Ananda, Sector 81, Gurgaon",Bestech India Pvt Ltd,EMI starts at ₹89.36 K,89360,1660 sq.ft,1660,₹10.84 K/sq.ft,10840.0,10 Year Old,10,Ready to move,4 of 19,4,19,-,Semi Furnished,Bestech Park View Ananda,1.8 Lacs,180000,1.8 Cr,18000000,3,3,1 Covered Parking,2,24 days ago,"['Stove', 'Water Purifier', 'Gas Pipeline']","[['School', ""St. Xavier's High School""], ['Hos...",One of the finest property in Sector 81 is now...,https://housing.com/in/buy/resale/page/1774760...
6038,4 BHK Flat,Sector 102,"Shapoorji Pallonji JoyVille, Sector 102, Gurgaon",Shapoorji Pallonji Real Estate,EMI starts at ₹1.51 Lacs,151000,2100 sq.ft,2100,₹14.52 K/sq.ft,14520.0,1 Years Old,1,Ready to move,5 of 24,5,24,North-East facing,Semi Furnished,Shapoorji Pallonji JoyVille,3.2 Lacs,320000,3.05 Cr,30500000,4,4,2 Covered and 1 Open Parking,3,More than a month ago,"['Stove', 'Gas Pipeline', 'AC', 'Cupboard', 'G...","[['School', 'Delhi Public School'], ['Hospital...","Looking for a good 4 BHK Flat in Sector 102, G...",https://housing.com/in/buy/resale/page/1751554...
681,2 BHK Flat,Sector 37D,"Signature Global The Millennia I, Sector 37D, ...",Signature Global Builders Pvt. Ltd.,EMI starts at ₹38.13 K,38130,650 sq.ft,650,₹11.08 K/sq.ft,11080.0,4 Year Old,4,Ready to move,3 of 14,3,14,South-West facing,Semi Furnished,Signature Global The Millennia I,69999,69999,72.0 L,7200000,2,2,1 Covered and 1 Open Parking,2,18 days ago,"['Cupboard', 'Power Backup', 'Pet allowed']","[['School', 'Euro International School, Sector...","Looking for a good 2 BHK Flat in Sector 37 D, ...",https://housing.com/in/buy/resale/page/1779914...
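
With price, area, and price per square foot all numeric, the three columns can be cross-checked against each other. For the first listing, 593 sq.ft × ₹11,470 per sq.ft = ₹6,801,710, which agrees with the listed 68.0 L (₹6,800,000) to within display rounding. The sketch below applies the same check to two rows copied from this notebook. The 1% tolerance and the `price_consistent` column name are assumptions made for illustration.

```python
import pandas as pd

# Values copied from rows shown in this notebook (index 0 and index 4450).
check = pd.DataFrame({
    "Built_up_area_in_sqft": [593, 1750],
    "Avg_price_rupee_per_sqft": [11470, 16570],
    "Price_in_rupees": [6_800_000, 29_000_000],
})

# Price implied by area x rate, and its relative gap to the listed price.
implied = check["Built_up_area_in_sqft"] * check["Avg_price_rupee_per_sqft"]
rel_diff = (implied - check["Price_in_rupees"]).abs() / check["Price_in_rupees"]

# The 1% tolerance is an arbitrary assumption to absorb display rounding.
check["price_consistent"] = rel_diff < 0.01
print(check)
```

Rows that fail such a check would be candidates for manual review rather than automatic removal.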


### Splitting the Parking feature into two new features - Covered & Open Parking:

In [21]:
## Adding two new columns - 'Covered_parking' & 'Open_parking' :
def func_6(x):
    if len(x) == 2:
        return [x[0][0] , x[1][0]]
    
    elif x[0][0].isalpha():
        return [0, 0]
        
    elif x[0][0] == '-':
        return ['-', '-']
    
    else:
        if x[0][2] == 'C':
            return [x[0][0] , 0]
        elif x[0][2] == 'O':
            return [0, x[0][0]]
        

value = df['Parking'].apply(lambda x: x.split(' and ')).apply(func_6)

df.insert(26, 'Covered_parking', value.apply(lambda x: x[0]))
df.insert(27, 'Open_parking', value.apply(lambda x: x[1]))
df.sample(4)

Unnamed: 0,Flat,Sector,Address,Seller_Builder,EMI,EMI_in_rupees,Built_Up_Area,Built_up_area_in_sqft,Avg_Price,Avg_price_rupee_per_sqft,Age_of_property,Age_of_property_in_years,Possession_status,Floor,Floor_number,Building_height,Facing,Furnishing,Society,Brokerage,Brokerage_in_rupees,Price,Price_in_rupees,Bedrooms,Bathrooms,Parking,Covered_parking,Open_parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
2119,2 BHK Flat,Sector 37C,"ILD Greens, Sector 37C, Gurgaon",ILD,EMI starts at ₹54.11 K,54110,1320 sq.ft,1320,₹8.26 K/sq.ft,8260.0,5 Year Old,5,Ready to move,3 of 10,3,10,North-East facing,Semi Furnished,ILD Greens,1.1 Lacs,110000,1.09 Cr,10900000,2,2,1 Open Parking,0,1,2,2 days ago,['Cupboard'],"[['School', 'Euro International School, Sector...",One of the finest property in Sector 37 C is n...,https://housing.com/in/buy/resale/page/1747001...
4704,3.5 BHK Flat,Sector 102,"Conscient Heritage Max, Sector 102, Gurgaon",Conscient Infrastructure,EMI starts at ₹1.56 Lacs,156000,2100 sq.ft,2100,₹15 K/sq.ft,15000.0,7 Year Old,7,Ready to move,10 of 25,10,25,North-East facing,Semi Furnished,Conscient Heritage Max,3.1 Lacs,310000,3.15 Cr,31500000,4,3,2 Covered Parking,2,0,3,More than a month ago,"['Swimming Pool', 'Gym', 'Lift', 'Power Backup...","[['School', 'Delhi Public School'], ['Hospital...",3.5 BHK Flat for sale in Gurgaon. This propert...,https://housing.com/in/buy/resale/page/1749257...
3975,3 BHK Flat,Sector 83,"Emaar Palm Gardens, Sector 83, Gurgaon",Emaar,EMI starts at ₹1.22 Lacs,122000,1850 sq.ft,1850,₹13.24 K/sq.ft,13240.0,6 Year Old,6,Ready to move,10 of 18,10,18,East facing,Semi Furnished,Emaar Palm Gardens,2.5 Lacs,250000,2.45 Cr,24500000,3,3,1 Covered Parking,1,0,3,More than a month ago,"['Amphitheater', 'Spa', 'Skating Rink', 'Vastu...","[['School', 'Euro International School, Sector...",One of the finest property in Sector 83 is now...,https://housing.com/in/buy/resale/page/1585604...
3546,3 BHK Flat,Sector 104,"Hero Homes, Sector 104, Gurgaon",Hero Realty Private Limited,EMI starts at ₹1.14 Lacs,113999,1689 sq.ft,1689,₹13.62 K/sq.ft,13620.0,-,-,Ready to move,1 of 36,1,36,-,Semi Furnished,Hero Homes,2.3 Lacs,229999,2.3 Cr,23000000,3,3,2 Covered and 1 Open Parking,2,1,2,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Spa', 'Skat...","[['School', 'Delhi Public School'], ['Hospital...",Check out this 3 BHK Flat for sale in Sector 1...,https://housing.com/in/buy/resale/page/1764034...
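
The parser above reads counts by character position, so it returns a mix of integers (`0`) and one-character strings (`'1'`), assumes single-digit counts, and assumes the covered count comes first. A regex-based variant avoids those assumptions. This is a sketch only: `parse_parking` is a hypothetical helper, and returning `None` for unknown values (`'-'`) is a design choice, not the notebook's behaviour.

```python
import re

def parse_parking(text):
    """Return (covered, open) counts, or (None, None) when unknown ('-')."""
    if text.strip() == "-":
        return (None, None)
    covered = re.search(r"(\d+) Covered", text)
    open_ = re.search(r"(\d+) Open", text)
    # A missing label (including 'No Parking') counts as zero spaces.
    return (int(covered.group(1)) if covered else 0,
            int(open_.group(1)) if open_ else 0)

for sample in ["1 Covered and 1 Open Parking", "2 Covered Parking",
               "1 Open Parking", "No Parking", "-"]:
    print(sample, "->", parse_parking(sample))
```

It could be applied with `df['Parking'].apply(parse_parking)` and unpacked into the two columns in the same way as the original cell.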


### Saving the Cleaned DF as a New CSV File:

In [25]:
## Saving the cleaned dataframe as CSV file:
df.to_csv('Housing_Listings_all_records_(numbers)_FINAL_Cleaned.csv')

----

### Unpacking and Saving the Amenities Feature of each Property Listing for Future Use:

In [156]:
## Unpacking "Amenities" column:
all_amenities = {}
def func_amen(x):
    if pd.isna(x):    ## Missing amenities: nothing to count
        return None
    
    for i in x[1:-1].split(', '):
        j = i[1:-1]
        if j not in all_amenities:
            all_amenities[j] = 1
        else:
            all_amenities[j] += 1

df['Amenities'].apply(func_amen)

all_amenities

{'Lift': 1595,
 'Power Backup': 1516,
 'Garden': 1558,
 'Sports': 1340,
 'Kids Area': 824,
 'CCTV': 438,
 'Gated Community': 358,
 'Amphitheater': 753,
 'Cricket Pitch': 503,
 'Gazebo': 272,
 'Spa': 391,
 'Skating Rink': 407,
 'Fountains': 442,
 'Table Tennis': 392,
 'Stove': 1218,
 'Gas Pipeline': 1914,
 'Cupboard': 2094,
 'Pet allowed': 1805,
 'Intercom': 1271,
 'Water Purifier': 1382,
 '3Fan': 103,
 '3Light': 34,
 '2Wardrobe': 197,
 '6Light': 70,
 'Reflexology Park': 113,
 'Vastu Compliant': 436,
 'Gymnasium': 339,
 'Flower Garden': 6,
 'Reading Lounge': 59,
 'Internet / Wi-Fi': 178,
 'Tennis Court': 383,
 'Garbage Disposal': 119,
 'Swimming Pool': 1972,
 'Landscaping & Tree Planting': 549,
 'Bed': 146,
 'Geyser': 1537,
 'Dining Table': 545,
 'Washing Machine': 678,
 'Sofa': 600,
 'Microwave': 790,
 'Fridge': 681,
 'Gym': 1244,
 'Parking': 62,
 '24X7 Water Supply': 258,
 'Multipurpose Room': 158,
 'Senior Citizen Siteout': 65,
 "Children's Play Area": 248,
 'Entrance Lobby': 32,
 'Y

In [164]:
import json

df.to_csv('Housing_Listings_all_records_(numbers)_FINAL_Cleaned.csv')

with open('All_Amenities_with_counts.json' , 'w') as file:
    json.dump(all_amenities, file)
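
The amenity tally above relies on slicing off brackets and quotes by position, which would miscount any amenity name that itself contains `', '`. Since each cell holds a Python list literal stored as text, `ast.literal_eval` combined with `collections.Counter` is a possible alternative. The sketch below runs on a hypothetical three-row stand-in for `df['Amenities']`:

```python
import ast
from collections import Counter

import pandas as pd

# Hypothetical stand-in for df['Amenities']: list literals stored as text.
amenities = pd.Series([
    "['Lift', 'Power Backup', \"Children's Play Area\"]",
    None,
    "['Lift', 'Internet / Wi-Fi']",
])

counts = Counter()
for raw in amenities.dropna():
    # literal_eval safely parses the list literal, whichever quotes it uses.
    counts.update(ast.literal_eval(raw))

print(counts.most_common(1))
```

A `Counter` is a `dict` subclass, so it can be passed to `json.dump` in the same way as `all_amenities`.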

## Now data is cleaned up! We can start EDA next...
---