# **Dataset Summary**

#### This dataset appears to be scraped from a property listing site like 99acres.com and includes detailed information about flat listings, including location, size, price, amenities, and more.

| Column Name         | Description                                                                                                   |
| ------------------- | ------------------------------------------------------------------------------------------------------------- |
| **property_name**   | Title of the listing, typically showing the number of bedrooms and locality.                                  |
| **link**            | URL link to the specific property listing online.                                                             |
| **society**         | Name of the residential society or building where the flat is located.                                        |
| **price**           | Listed price of the property (e.g., “45 Lac”, “1.47 Crore”).                                                  |
| **area**            | Cost per square foot (e.g., “₹ 5,000/sq.ft.”).                                                                |
| **areaWithType**    | Detailed area with type (e.g., Carpet area or Super Built-up area with size in sq.ft. and sq.m.).             |
| **bedRoom**         | Number of bedrooms in text form (e.g., “2 Bedrooms”).                                                         |
| **bathroom**        | Number of bathrooms in text form (e.g., “2 Bathrooms”).                                                       |
| **balcony**         | Number of balconies (e.g., “1 Balcony”, “3 Balconies”).                                                       |
| **additionalRoom**  | Any additional rooms like study, servant room, etc.                                                           |
| **address**         | Full address of the flat including locality, city, and state.                                                 |
| **floorNum**        | Floor level of the flat and total floors (e.g., “4th of 4 Floors”).                                           |
| **facing**          | Direction the flat faces (e.g., East, West, etc.).                                                            |
| **agePossession**   | Age of the property or expected possession time (e.g., “1 to 5 Year Old”, “Under Construction”, “Dec 2023”).  |
| **nearbyLocations** | List of nearby landmarks like hospitals, temples, schools.                                                    |
| **description**     | Free text description provided by the seller or agent.                                                        |
| **furnishDetails**  | Details of furnishing like number of fans, lights, wardrobes, etc.                                            |
| **features**        | Features and amenities like fire alarm, power backup, etc.                                                    |
| **rating**          | Rating of various aspects (environment, safety, lifestyle) given as text.                                     |
| **property_id**     | Unique identifier for each property.                                                                          |


# Manual Assessment/Cleaning done through Excel

# Assessment/Cleaning through code

### Problems with the dataset: 

#### 1. Some columns are not necessary for our project, such as link and property_id
### Dirty Data:

#### 1. Many missing values are present in various rows (Completeness)
#### 2. The furnishDetails column contains [] instead of descriptions in various rows (Validity)
#### 3. The area column has the mis-encoded rupee sign â‚¹ in its strings (Validity)

### Messy Data: 
#### 1. Feature Engineering needs to be done over areaWithType column
#### 2. Feature Engineering needs to be done over additionalRoom column
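
The `[]` placeholders noted under Dirty Data can be treated as missing values once the list-like cells are parsed. The sketch below runs on a small hand-made Series, not on the real `flats.csv`. The helper name `parse_list_cell` is hypothetical, and it assumes the cells are stored as Python-literal strings, as the `df.head()` output suggests.

```python
import ast

import numpy as np
import pandas as pd


def parse_list_cell(cell):
    """Parse a string such as "['3 Fan', '4 Light']" into a list.

    Missing cells and empty lists ('[]') are both mapped to NaN.
    """
    if not isinstance(cell, str):
        return np.nan  # the cell is already missing
    items = ast.literal_eval(cell)
    return items if items else np.nan


# Hand-made sample: a filled list, an empty list, and a missing value
sample = pd.Series(["['3 Fan', '4 Light']", "[]", np.nan])
parsed = sample.apply(parse_list_cell)
print(parsed.isna().tolist())
```

Mapping `[]` to NaN means the Completeness check (`isnull().sum()`) would also count listings whose furnishing list is present but empty.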

In [36]:
import pandas as pd
import numpy as np

In [37]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [38]:
df = pd.read_csv("flats.csv")

In [39]:
df.head()

Unnamed: 0,property_name,link,society,price,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating,property_id
0,2 BHK Flat in Krishna Colony,https://www.99acres.com/2-bhk-bedroom-apartmen...,maa bhagwati residency,45 Lac,"₹ 5,000/sq.ft.",Carpet area: 900 (83.61 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"Krishna Colony, Gurgaon, Haryana",4th of 4 Floors,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ...",C68850746
1,2 BHK Flat in Ashok Vihar,https://www.99acres.com/2-bhk-bedroom-apartmen...,Apna Enclave,50 Lac,"₹ 7,692/sq.ft.",Carpet area: 650 (60.39 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"46b, Ashok Vihar, Gurgaon, Haryana",1st of 3 Floors,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ...",H68850564
2,2 BHK Flat in Sohna,https://www.99acres.com/2-bhk-bedroom-apartmen...,Tulsiani Easy in Homes,40 Lac,"₹ 6,722/sq.ft.",Carpet area: 595 (55.28 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,,"Sohna, Gurgaon, Haryana",12nd of 14 Floors,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ...",J68850120
3,2 BHK Flat in Sector 61 Gurgaon,https://www.99acres.com/2-bhk-bedroom-apartmen...,Smart World Orchard,1.47 Crore,"₹ 12,250/sq.ft.",Carpet area: 1200 (111.48 sq.m.),2 Bedrooms,2 Bathrooms,2 Balconies,Study Room,"Sector 61 Gurgaon, Gurgaon, Haryana",2nd of 4 Floors,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",,S68849476
4,2 BHK Flat in Sector 92 Gurgaon,https://www.99acres.com/2-bhk-bedroom-apartmen...,Parkwood Westend,70 Lac,"₹ 5,204/sq.ft.",Super Built up area 1345(124.95 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,Study Room,"Sector 92 Gurgaon, Gurgaon, Haryana",5th of 8 Floors,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ...",L47956793


In [40]:
df.tail()

Unnamed: 0,property_name,link,society,price,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating,property_id
3003,3 BHK Flat in Sector 86 Gurgaon,https://www.99acres.com/3-bhk-bedroom-apartmen...,Ansal Heights 86,1.05 Crore,"₹ 5,541/sq.ft.",Super Built up area 1895(176.05 sq.m.),3 Bedrooms,3 Bathrooms,3 Balconies,Servant Room,"Tower C, Sector 86 Gurgaon, Gurgaon, Haryana",9th of 13 Floors,North-East,Under Construction,"['IRIS Broadway Mall', 'Delhi Jaipur Expresswa...",Residential apartment for sell.Located in sect...,,,"['Safety4.5 out of 5', 'Lifestyle5 out of 5', ...",D26586124
3004,5 BHK Flat in Sector 48 Gurgaon,https://www.99acres.com/5-bhk-bedroom-apartmen...,Parsvnath Green Ville3.9 ★,3.3 Crore,"₹ 9,984/sq.ft.",Super Built up area 3905(362.79 sq.m.)Built Up...,5 Bedrooms,5 Bathrooms,3+ Balconies,Servant Room,"Sector 48 Gurgaon, Gurgaon, Haryana",4th of 5 Floors,,10+ Year Old,"['Sri Radhe Krishna Temple', 'Icici bank ATM',...",5 bhk duplex penthouse in low rise building.Av...,,"['Security / Fire Alarm', 'Private Garden / Te...","['Management4 out of 5', 'Green Area4 out of 5...",J17123294
3005,3 BHK Flat in Sector 108 Gurgaon,https://www.99acres.com/3-bhk-bedroom-apartmen...,Raheja Vedaanta3.6 ★,95 Lac,"₹ 5,214/sq.ft.",Super Built up area 1822(169.27 sq.m.),3 Bedrooms,3 Bathrooms,3 Balconies,Others,"Sector 108 Gurgaon, Gurgaon, Haryana",3rd of 22 Floors,,1 to 5 Year Old,,3 bedroom flat with full woodwork. Ready to mo...,,"['Security / Fire Alarm', 'Feng Shui / Vaastu ...","['Management3 out of 5', 'Green Area4 out of 5...",A41215323
3006,3 BHK Flat in DLF Phase 3,https://www.99acres.com/3-bhk-bedroom-apartmen...,Ambience Lagoon3.9 ★,5.8 Crore,"₹ 12,500/sq.ft.",Built Up area: 3700 (343.74 sq.m.),3 Bedrooms,4 Bathrooms,3+ Balconies,"Pooja Room,Study Room,Servant Room,Others","Gurgaon, DLF Phase 3, Gurgaon, Haryana",9th of 9 Floors,North-East,10+ Year Old,"['Micromax moulsari avenue metro station', 'In...",Luxury condominium complex located on delhi gu...,"['1 Water Purifier', '10 Fan', '1 Fridge', '1 ...","['Security / Fire Alarm', 'Private Garden / Te...","['Management5 out of 5', 'Green Area5 out of 5...",J18888617
3007,4 BHK Flat in Sector 54 Gurgaon,https://www.99acres.com/4-bhk-bedroom-apartmen...,DLF The Crest3.6 ★,11 Crore,"₹ 35,222/sq.ft.",Super Built up area 3123(290.14 sq.m.),4 Bedrooms,6 Bathrooms,3 Balconies,Servant Room,"Sector 54 Gurgaon, Gurgaon, Haryana",7th of 36 Floors,,1 to 5 Year Old,"['Sector 53-54 Metro Station', 'Ardee Mall', '...",Club & pool facing\nVrv air conditioning,"['6 Fan', '1 Fridge', '1 Exhaust Fan', '5 Geys...","['Security / Fire Alarm', 'Power Back-up', 'In...","['Management4 out of 5', 'Green Area4 out of 5...",V70296402


In [41]:
df.shape

(3008, 20)

In [42]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3008 entries, 0 to 3007
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   property_name    3007 non-null   object
 1   link             3008 non-null   object
 2   society          3007 non-null   object
 3   price            3007 non-null   object
 4   area             2996 non-null   object
 5   areaWithType     3008 non-null   object
 6   bedRoom          3008 non-null   object
 7   bathroom         3008 non-null   object
 8   balcony          3008 non-null   object
 9   additionalRoom   1694 non-null   object
 10  address          3002 non-null   object
 11  floorNum         3006 non-null   object
 12  facing           2127 non-null   object
 13  agePossession    3007 non-null   object
 14  nearbyLocations  2913 non-null   object
 15  description      3008 non-null   object
 16  furnishDetails   2203 non-null   object
 17  features         2594 non-null   

In [43]:
df.duplicated().sum()

0

In [44]:
df.isnull().sum()

property_name         1
link                  0
society               1
price                 1
area                 12
areaWithType          0
bedRoom               0
bathroom              0
balcony               0
additionalRoom     1314
address               6
floorNum              2
facing              881
agePossession         1
nearbyLocations      95
description           0
furnishDetails      805
features            414
rating              332
property_id           0
dtype: int64
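
To judge how severe the gaps are, the counts above can be expressed as a share of the 3008 rows. The sketch below re-enters the five largest counts by hand instead of reading the file, so it only illustrates the arithmetic.

```python
import pandas as pd

TOTAL_ROWS = 3008  # from df.shape

# Five largest missing-value counts, copied from the isnull().sum() output
missing = pd.Series({
    "additionalRoom": 1314,
    "facing": 881,
    "furnishDetails": 805,
    "features": 414,
    "rating": 332,
})

# Percentage of rows with a missing value in each column
share = (missing / TOTAL_ROWS * 100).round(2)
print(share)
```

On the real DataFrame, `df.isnull().mean().mul(100).round(2)` would give the same percentages for every column in one step.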

In [45]:
# Dropping non-required columns 
df.drop(columns= ['link', 'property_id'], inplace= True)

In [46]:
df['area'] = df['area'].str.replace('â‚¹', '₹', regex=False)
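
The sequence `â‚¹` is what the UTF-8 bytes of `₹` look like when they are decoded as Windows-1252. The short check below demonstrates that mismatch. Whether this is how the debris actually entered `flats.csv` is an assumption.

```python
rupee = "₹"

# Encode as UTF-8, then decode with the wrong codec (Windows-1252)
garbled = rupee.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the mistake recovers the original character
restored = garbled.encode("cp1252").decode("utf-8")
print(restored == rupee)
```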

In [47]:
# Columns to be renamed
df.rename(columns={'area':'price_per_sqft'},inplace= True)

In [48]:
df.head()

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45 Lac,"₹ 5,000/sq.ft.",Carpet area: 900 (83.61 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"Krishna Colony, Gurgaon, Haryana",4th of 4 Floors,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,Apna Enclave,50 Lac,"₹ 7,692/sq.ft.",Carpet area: 650 (60.39 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"46b, Ashok Vihar, Gurgaon, Haryana",1st of 3 Floors,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,Tulsiani Easy in Homes,40 Lac,"₹ 6,722/sq.ft.",Carpet area: 595 (55.28 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,,"Sohna, Gurgaon, Haryana",12nd of 14 Floors,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
3,2 BHK Flat in Sector 61 Gurgaon,Smart World Orchard,1.47 Crore,"₹ 12,250/sq.ft.",Carpet area: 1200 (111.48 sq.m.),2 Bedrooms,2 Bathrooms,2 Balconies,Study Room,"Sector 61 Gurgaon, Gurgaon, Haryana",2nd of 4 Floors,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",
4,2 BHK Flat in Sector 92 Gurgaon,Parkwood Westend,70 Lac,"₹ 5,204/sq.ft.",Super Built up area 1345(124.95 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,Study Room,"Sector 92 Gurgaon, Gurgaon, Haryana",5th of 8 Floors,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ..."


In [49]:
df['society'].value_counts().shape

(636,)

In [50]:
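# Society includes a rating such as "3.9 ★" in some entries; strip it and lowercase the name.
# The .str accessor keeps a missing society as NaN instead of turning it into the string 'nan'.
import re
rating_suffix = re.compile(r'\d+(\.\d+)?\s?★')
df['society'] = df['society'].str.replace(rating_suffix, '', regex=True).str.strip().str.lower()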

In [51]:
# Checking the values of price
df['price'].value_counts()

price
1.25 Crore          79
1.1 Crore           61
1.4 Crore           60
1.5 Crore           59
1.2 Crore           59
90 Lac              58
1.3 Crore           57
95 Lac              53
2 Crore             51
1.75 Crore          47
1 Crore             46
1.6 Crore           43
1.35 Crore          41
1.9 Crore           40
1.55 Crore          40
75 Lac              38
1.65 Crore          38
1.8 Crore           37
1.7 Crore           37
80 Lac              36
2.2 Crore           34
1.15 Crore          33
50 Lac              32
1.45 Crore          31
85 Lac              31
1.05 Crore          30
60 Lac              29
2.5 Crore           29
40 Lac              29
2.1 Crore           26
65 Lac              25
45 Lac              25
1.85 Crore          23
35 Lac              23
2.35 Crore          23
3 Crore             22
70 Lac              21
2.25 Crore          20
55 Lac              20
3.5 Crore           19
2.3 Crore           18
2.4 Crore           18
30 Lac              17
2.65 

In [52]:
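# Drop listings without a numeric price; .copy() avoids SettingWithCopyWarning on later column assignments
df = df[df['price'] != 'Price on Request'].copy()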

In [53]:
df.shape

(2997, 18)

In [54]:
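def reset_price(x):
    # x is the price string split on spaces, e.g. ['45', 'Lac'] or ['1.47', 'Crore'].
    # A missing price arrives as NaN (a float) and is returned unchanged.
    if isinstance(x, float):
        return x
    elif x[1] == 'Lac':
        # Express lac in crore (100 lac = 1 crore)
        return round(float(x[0])/100,2)
    else:
        return round(float(x[0]),2)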

In [55]:
df['price'] = df['price'].str.split(' ').apply(reset_price)
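
In the `df.head()` output that follows this step, rows 0 and 1 show 45.0 and 50.0 rather than 0.45 and 0.50, which suggests that some price strings do not split into exactly `[number, unit]`. One possible cause is irregular whitespace, but that is an assumption that has not been checked against `flats.csv`. The sketch below uses a hypothetical helper, `price_to_crore`, that tolerates extra spaces and differences in unit casing.

```python
import numpy as np
import pandas as pd


def price_to_crore(text):
    """Convert '45 Lac' or '1.47 Crore' to a float in crore; NaN if unparseable."""
    if not isinstance(text, str):
        return np.nan
    parts = text.split()  # splits on any run of whitespace
    if len(parts) != 2:
        return np.nan
    value, unit = parts
    try:
        number = float(value)
    except ValueError:
        return np.nan
    unit = unit.lower()
    if unit == "lac":
        return round(number / 100, 2)  # 100 lac = 1 crore
    if unit == "crore":
        return round(number, 2)
    return np.nan


# Hand-made sample, including a double space and a non-numeric price
sample = pd.Series(["45 Lac", "45  Lac", "1.47 Crore", "Price on Request", np.nan])
prices = sample.apply(price_to_crore)
print(prices.tolist())
```

Returning NaN for anything unexpected keeps odd strings visible in a later `isnull()` check instead of letting them pass through on the wrong scale.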

In [56]:
df.sample(5)

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
601,3 BHK Flat in Sector 37D Gurgaon,bptp terra,1.98,"₹ 9,036/sq.ft.",Super Built up area 2191(203.55 sq.m.),3 Bedrooms,3 Bathrooms,3+ Balconies,Study Room,"1203, Sector 37D Gurgaon, Gurgaon, Haryana",12nd of 23 Floors,North-East,0 to 1 Year Old,"['Airia Mall', 'Dwarka Expressway', 'Golf Cour...",Its ultra luxury apartment and society,"['1 Stove', '6 AC', '1 Chimney', '1 Modular Ki...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Green Area4.5 out of 5', 'Construction4.5 ou..."
783,3 BHK Flat in Sector 65 Gurgaon,emaar emerald hills,1.95,"₹ 13,928/sq.ft.",Carpet area: 1400 (130.06 sq.m.),3 Bedrooms,3 Bathrooms,3+ Balconies,Store Room,"Amber Block, Sector 65 Gurgaon, Gurgaon, Haryana",1st of 2 Floors,North-East,0 to 1 Year Old,"['Emerald Plaza Shopping Mall', 'Southern Peri...","Situated in sector 65 gurgaon, emaar emerald h...","['1 Water Purifier', '5 Fan', '1 Exhaust Fan',...","['Security / Fire Alarm', 'Power Back-up', 'Fe...","['Green Area5 out of 5', 'Construction5 out of..."
2837,3 BHK Flat in BPTP,bptp amstoria,2.25,"₹ 8,653/sq.ft.",Super Built up area 2600(241.55 sq.m.),3 Bedrooms,3 Bathrooms,3+ Balconies,,"BPTP, Gurgaon, Haryana",7th of 19 Floors,,5 to 10 Year Old,"['Early Basket Grocery shop', 'Conscient One M...",We are the proud owners of this 3 bhk apartmen...,,,"['Management4 out of 5', 'Green Area4 out of 5..."
1092,3 BHK Flat in Sector 104 Gurgaon,godrej summit,1.16,"₹ 6,387/sq.ft.",Carpet area: 1816 (168.71 sq.m.),3 Bedrooms,4 Bathrooms,3+ Balconies,Servant Room,"Sector 104 Gurgaon, Gurgaon, Haryana",8th of 17 Floors,East,1 to 5 Year Old,"['MG Road Metro Station', 'The Esplanade Mall'...",Good furnished flat having 3bhk servant room a...,"['2 Wardrobe', '5 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Power Back-up', 'Fe...","['Green Area4.5 out of 5', 'Construction4 out ..."
2181,1 BHK Flat in Sector 56 Gurgaon,kendriya vihar,0.47,"₹ 7,768/sq.ft.",Carpet area: 605 (56.21 sq.m.),1 Bedroom,1 Bathroom,1 Balcony,,"Sector 56 Gurgaon, Gurgaon, Haryana",3rd of 3 Floors,,10+ Year Old,"['Sector metro station', 'Sector metro station...",1 bhk property in a well maintained gated soci...,"['1 Wardrobe', '1 Bed', '1 Water Purifier', '2...","['Power Back-up', 'Maintenance Staff', 'Water ...","['Green Area5 out of 5', 'Construction4 out of..."


In [57]:
df['price_per_sqft'].value_counts()

price_per_sqft
₹ 10,000/sq.ft.     19
₹ 8,000/sq.ft.      16
₹ 12,500/sq.ft.     16
₹ 6,666/sq.ft.      13
₹ 5,000/sq.ft.      13
₹ 7,500/sq.ft.      12
₹ 8,333/sq.ft.      12
₹ 6,000/sq.ft.      11
₹ 8,461/sq.ft.       9
₹ 12,000/sq.ft.      8
₹ 7,000/sq.ft.       8
₹ 9,000/sq.ft.       7
₹ 11,111/sq.ft.      6
₹ 5,500/sq.ft.       6
₹ 6,578/sq.ft.       6
₹ 8,928/sq.ft.       6
₹ 9,230/sq.ft.       6
₹ 11,500/sq.ft.      6
₹ 16,000/sq.ft.      5
₹ 6,500/sq.ft.       5
₹ 14,242/sq.ft.      5
₹ 8,888/sq.ft.       5
₹ 10,714/sq.ft.      5
₹ 8,571/sq.ft.       5
₹ 7,142/sq.ft.       5
₹ 5,556/sq.ft.       5
₹ 7,692/sq.ft.       5
₹ 4,444/sq.ft.       5
₹ 8,205/sq.ft.       5
₹ 5,600/sq.ft.       5
₹ 11,428/sq.ft.      5
₹ 4,615/sq.ft.       5
₹ 7,407/sq.ft.       5
₹ 5,384/sq.ft.       5
₹ 7,641/sq.ft.       5
₹ 4,666/sq.ft.       5
₹ 9,822/sq.ft.       4
₹ 8,432/sq.ft.       4
₹ 4,500/sq.ft.       4
₹ 6,250/sq.ft.       4
₹ 8,043/sq.ft.       4
₹ 8,500/sq.ft.       4
₹ 4,854/sq.ft.     

In [58]:
# Removing ₹, the '/sq.ft.' suffix and ',' from price_per_sqft
df['price_per_sqft'] = df['price_per_sqft'].str.split('/').str.get(0).str.replace('₹','', regex=False).str.replace(',','', regex=False).str.strip().astype('float')

In [59]:
df.head()

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45.0,5000.0,Carpet area: 900 (83.61 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"Krishna Colony, Gurgaon, Haryana",4th of 4 Floors,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,apna enclave,50.0,7692.0,Carpet area: 650 (60.39 sq.m.),2 Bedrooms,2 Bathrooms,1 Balcony,,"46b, Ashok Vihar, Gurgaon, Haryana",1st of 3 Floors,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,tulsiani easy in homes,0.4,6722.0,Carpet area: 595 (55.28 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,,"Sohna, Gurgaon, Haryana",12nd of 14 Floors,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
3,2 BHK Flat in Sector 61 Gurgaon,smart world orchard,1.47,12250.0,Carpet area: 1200 (111.48 sq.m.),2 Bedrooms,2 Bathrooms,2 Balconies,Study Room,"Sector 61 Gurgaon, Gurgaon, Haryana",2nd of 4 Floors,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",
4,2 BHK Flat in Sector 92 Gurgaon,parkwood westend,0.7,5204.0,Super Built up area 1345(124.95 sq.m.),2 Bedrooms,2 Bathrooms,3 Balconies,Study Room,"Sector 92 Gurgaon, Gurgaon, Haryana",5th of 8 Floors,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ..."


In [60]:
df['areaWithType'].unique() #--> It is seen that there are multiple unique items such as Carpet area, Built up area, Super Built up area

array(['Carpet area: 900 (83.61 sq.m.)', 'Carpet area: 650 (60.39 sq.m.)',
       'Carpet area: 595 (55.28 sq.m.)', ...,
       'Super Built up area 1822(169.27 sq.m.)',
       'Built Up area: 3700 (343.74 sq.m.)',
       'Super Built up area 3123(290.14 sq.m.)'], dtype=object)
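
One way to approach the feature engineering on `areaWithType` is to pull each area type into its own numeric column. The sketch below works on three strings copied from the output above. The column names in `PATTERNS` are hypothetical, and the first number in each string is assumed to be the size in sq.ft.

```python
import re

import pandas as pd

sample = pd.Series([
    "Carpet area: 900 (83.61 sq.m.)",
    "Super Built up area 1345(124.95 sq.m.)",
    "Built Up area: 3700 (343.74 sq.m.)",
])

# One pattern per area type; the lookbehind stops "Built up area"
# from matching inside "Super Built up area".
PATTERNS = {
    "super_built_up_area": r"Super Built up area\s*:?\s*([\d.]+)",
    "built_up_area": r"(?<!Super )Built up area\s*:?\s*([\d.]+)",
    "carpet_area": r"Carpet area\s*:?\s*([\d.]+)",
}

areas = pd.DataFrame({
    name: sample.str.extract(pattern, flags=re.IGNORECASE, expand=False).astype(float)
    for name, pattern in PATTERNS.items()
})
print(areas)
```

Listings that report several area types in one string would fill more than one column, while the types that are not mentioned stay NaN.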

In [61]:
#bedrooms
df['bedRoom'].value_counts()

bedRoom
3 Bedrooms    1437
2 Bedrooms     944
4 Bedrooms     478
1 Bedroom      104
5 Bedrooms      31
6 Bedrooms       3
Name: count, dtype: int64

In [62]:
df['bedRoom'].isnull().sum()

0

In [63]:
df['bedRoom']= df['bedRoom'].str.replace('Bedrooms','').str.strip()
df['bedRoom']= df['bedRoom'].str.replace('Bedroom','').str.strip()
df['bedRoom']= df['bedRoom'].astype('int')

In [64]:
df.dtypes

property_name       object
society             object
price              float64
price_per_sqft     float64
areaWithType        object
bedRoom              int32
bathroom            object
balcony             object
additionalRoom      object
address             object
floorNum            object
facing              object
agePossession       object
nearbyLocations     object
description         object
furnishDetails      object
features            object
rating              object
dtype: object

In [65]:
#bathrooms
df['bathroom'].value_counts()

bathroom
2 Bathrooms    1044
3 Bathrooms     989
4 Bathrooms     636
5 Bathrooms     169
1 Bathroom      112
6 Bathrooms      42
7 Bathrooms       5
Name: count, dtype: int64

In [66]:
df['bathroom'].isnull().sum()

0

In [67]:
df['bathroom']= df['bathroom'].str.replace('Bathrooms','').str.strip()
df['bathroom']= df['bathroom'].str.replace('Bathroom','').str.strip()
df['bathroom']= df['bathroom'].astype('int')

In [68]:
#balcony
df['balcony'].value_counts()

balcony
3 Balconies     974
3+ Balconies    862
2 Balconies     749
1 Balcony       315
No Balcony       97
Name: count, dtype: int64

In [69]:
df['balcony'].isnull().sum()

0

In [70]:
df['balcony']= df['balcony'].str.replace('Balconies','').str.strip()
df['balcony']= df['balcony'].str.replace('Balcony','').str.strip()  #--> Can't change the dtype to int because of the '3+' and 'No' values
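
Because `3+` (and `No`) cannot be cast to int, one option is to keep the column as an ordered category. The sketch below uses a hand-made Series with the five values left after the cleaning step. The ordering in `order` is an assumption about how the values should rank.

```python
import pandas as pd

# The five distinct values that remain after stripping 'Balcony'/'Balconies'
balcony = pd.Series(["3", "3+", "2", "1", "No"])

# Assumed ranking from fewest to most balconies
order = ["No", "1", "2", "3", "3+"]
balcony_cat = pd.Categorical(balcony, categories=order, ordered=True)

# Integer codes follow the category order and can feed a model directly
codes = pd.Series(balcony_cat.codes)
print(codes.tolist())
```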

In [71]:
df.head()

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45.0,5000.0,Carpet area: 900 (83.61 sq.m.),2,2,1,,"Krishna Colony, Gurgaon, Haryana",4th of 4 Floors,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,apna enclave,50.0,7692.0,Carpet area: 650 (60.39 sq.m.),2,2,1,,"46b, Ashok Vihar, Gurgaon, Haryana",1st of 3 Floors,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,tulsiani easy in homes,0.4,6722.0,Carpet area: 595 (55.28 sq.m.),2,2,3,,"Sohna, Gurgaon, Haryana",12nd of 14 Floors,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
3,2 BHK Flat in Sector 61 Gurgaon,smart world orchard,1.47,12250.0,Carpet area: 1200 (111.48 sq.m.),2,2,2,Study Room,"Sector 61 Gurgaon, Gurgaon, Haryana",2nd of 4 Floors,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",
4,2 BHK Flat in Sector 92 Gurgaon,parkwood westend,0.7,5204.0,Super Built up area 1345(124.95 sq.m.),2,2,3,Study Room,"Sector 92 Gurgaon, Gurgaon, Haryana",5th of 8 Floors,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ..."


In [72]:
# additionalRoom
df['additionalRoom'].value_counts()

additionalRoom
Servant Room                                     629
Study Room                                       232
Others                                           179
Pooja Room                                       132
Study Room,Servant Room                           81
Store Room                                        76
Pooja Room,Servant Room                           60
Servant Room,Others                               52
Servant Room,Pooja Room                           30
Study Room,Others                                 27
Pooja Room,Study Room,Servant Room,Others         25
Pooja Room,Study Room,Servant Room                24
Servant Room,Store Room                           19
Pooja Room,Study Room                             13
Pooja Room,Study Room,Servant Room,Store Room     12
Study Room,Pooja Room                              8
Servant Room,Study Room                            8
Study Room,Servant Room,Store Room                 7
Pooja Room,Store Room          

In [73]:
df['additionalRoom'].value_counts().shape

(49,)

In [74]:
df['additionalRoom'].isnull().sum()

1305

In [75]:
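# Assign the result back: calling fillna(inplace=True) on a selected column is deprecated in recent pandas
df['additionalRoom'] = df['additionalRoom'].fillna('not available')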

In [76]:
df['additionalRoom'] = df['additionalRoom'].str.lower()
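
The `value_counts()` output above lists the same rooms in different orders (for example `Pooja Room,Study Room` and `Study Room,Pooja Room`), which inflates the number of unique values. One way to handle this is one indicator column per room type. The sketch below uses a hand-made Series in place of the real column.

```python
import pandas as pd

# Hand-made sample in the lowercased format produced by the cell above
rooms = pd.Series(["servant room", "pooja room,study room", "not available"])

# One 0/1 column per room type, regardless of the order in the string
flags = rooms.str.get_dummies(sep=",")

# The placeholder for missing values is not a room, so drop its column
flags = flags.drop(columns="not available")
print(flags)
```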

In [77]:
df[df['additionalRoom'] == 'not available']

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45.0,5000.0,Carpet area: 900 (83.61 sq.m.),2,2,1,not available,"Krishna Colony, Gurgaon, Haryana",4th of 4 Floors,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,apna enclave,50.0,7692.0,Carpet area: 650 (60.39 sq.m.),2,2,1,not available,"46b, Ashok Vihar, Gurgaon, Haryana",1st of 3 Floors,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,tulsiani easy in homes,0.4,6722.0,Carpet area: 595 (55.28 sq.m.),2,2,3,not available,"Sohna, Gurgaon, Haryana",12nd of 14 Floors,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
5,2 BHK Flat in Sector 36 Gurgaon,signature global infinity mall,0.41,6269.0,Built Up area: 654 (60.76 sq.m.),2,2,3,not available,"Sohna Sector 36, Sector 36 Gurgaon, Gurgaon, H...",3rd of 3 Floors,,undefined,,Best in class property available at sector 36 ...,,,
6,3 BHK Flat in Dwarka Expressway Gurgaon,the cocoon,2.0,13333.0,Super Built up area 1500(139.35 sq.m.),3,3,3,not available,"Dwarka Expressway Gurgaon, Gurgaon, Haryana",5th of 25 Floors,,0 to 1 Year Old,"['Shri Multispeciality Hospital', 'Esic Hospit...",Residential apartment for sell.The property co...,,,
7,3 BHK Flat in Sector 104 Gurgaon,ats triumph,1.8,7860.0,Carpet area: 2290 (212.75 sq.m.),3,4,3,not available,"Sector 104 Gurgaon, Gurgaon, Haryana",14th of 27 Floors,,0 to 1 Year Old,"['IFFCO Chowk Metro Station', 'The Esplanade M...",Ats triumph is one of gurgaon's most sought af...,,"['Power Back-up', 'Intercom Facility', 'Lift(s...","['Green Area4 out of 5', 'Amenities4.5 out of ..."
10,2 BHK Flat in Sector 81 Gurgaon,signature global city 81,0.96,9767.0,Carpet area: 1075 (99.87 sq.m.),2,2,2,not available,"Sector 81 Gurgaon, Gurgaon, Haryana",1st of 4 Floors,,Jun-24,"['Ambience Mall New', 'Dwarka Expressway', 'NH...",Signature global city 81 is one of gurgaon's m...,,"['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4.5 out of 5', 'Safety4.5 out of ..."
11,2 BHK Flat in Sohna,hcbs sports ville,0.29,5587.0,Carpet area: 519 (48.22 sq.m.),2,2,1,not available,"Sohna, Gurgaon, Haryana",4th of 13 Floors,,1 to 5 Year Old,"['The roadside cafe', 'GD Goenka Mess', 'ROyal...",Affordable housing in gurgaon and surrounded b...,,"['Security / Fire Alarm', 'Intercom Facility',...","['Green Area4 out of 5', 'Amenities4 out of 5'..."
12,3 BHK Flat in Sector 79 Gurgaon,supertech araville,1.35,6940.0,Carpet area: 1945 (180.7 sq.m.),3,3,3,not available,"Sector 79 Gurgaon, Gurgaon, Haryana",4th of 15 Floors,,1 to 5 Year Old,"['Petrol Pump Indian Oil', 'Petrol Pump', 'Rao...","Well spacious, nicely built with wooden floori...","['1 Water Purifier', '2 Fan', '1 Geyser', '3 L...",,"['Environment4 out of 5', 'Safety4 out of 5', ..."
13,2 BHK Flat in Sector 33 Gurgaon,godrej,0.95,6859.0,Super Built up area 1385(128.67 sq.m.),2,2,3+,not available,"Flat No. :- 301, Sector 33 Gurgaon, Gurgaon, H...",3rd of 20 Floors,South-East,Under Construction,,Its an godrej project nature plus.\nOn 3rd flo...,,,"['Environment3 out of 5', 'Safety4 out of 5', ..."


In [78]:
df['floorNum'].value_counts()

floorNum
2nd   of 4 Floors           74
3rd   of 4 Floors           71
4th   of 4 Floors           62
1st   of 4 Floors           61
12nd   of 14 Floors         49
14th   of 14 Floors         48
Ground of 14 Floors         40
10th   of 14 Floors         35
7th   of 14 Floors          35
8th   of 14 Floors          34
4th   of 14 Floors          28
6th   of 14 Floors          27
2nd   of 2 Floors           26
1st   of 14 Floors          26
3rd   of 3 Floors           26
3rd   of 14 Floors          24
5th   of 14 Floors          24
8th   of 19 Floors          24
11st   of 14 Floors         23
1st   of 1 Floors           23
9th   of 14 Floors          23
9th   of 9 Floors           23
5th   of 12 Floors          22
2nd   of 3 Floors           22
2nd   of 14 Floors          21
8th   of 18 Floors          20
10th   of 19 Floors         18
6th   of 18 Floors          18
10th   of 18 Floors         17
9th   of 13 Floors          17
7th   of 15 Floors          17
12nd   of 12 Floors         17

In [79]:
df['floorNum'].isnull().sum()

2

In [80]:
# First token -> named levels as numbers -> optionally signed integer (so 'Basement' stays -1, not 1)
df['floorNum'] = df['floorNum'].str.split(' ').str.get(0).replace({'Ground': '0', 'Basement': '-1', 'Lower': '0'}).str.extract(r'(-?\d+)', expand=False)
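
The floor-number cleanup can be checked on a few hand-written strings. This is a minimal sketch, assuming the raw values follow the "Nth of M Floors" pattern seen in the counts above. The sample strings (including the "Basement" and "Lower Ground" rows) are hypothetical, and the signed pattern is just one way to keep basement levels negative.

```python
import pandas as pd

# Hypothetical sample of raw floorNum strings in the format shown above.
raw = pd.Series([
    "4th   of 4 Floors",
    "12nd   of 14 Floors",
    "Ground of 14 Floors",
    "Basement of 3 Floors",
    "Lower Ground of 5 Floors",
    None,
])

# Take the first whitespace-separated token, e.g. "4th", "Ground", "Basement".
first_token = raw.str.split().str.get(0)

# Map named levels to numbers, then pull out an optionally signed integer.
floor = (
    first_token
    .replace({"Ground": "0", "Basement": "-1", "Lower": "0"})
    .str.extract(r"(-?\d+)", expand=False)
)

# Nullable integer dtype keeps the missing entry as <NA> instead of failing.
floor = pd.to_numeric(floor).astype("Int64")
print(floor.tolist())
```

Converting to a nullable integer type is optional. The notebook keeps `floorNum` as `object`, which also works but leaves the values as strings.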

In [81]:
df.head()

Unnamed: 0,property_name,society,price,price_per_sqft,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45.0,5000.0,Carpet area: 900 (83.61 sq.m.),2,2,1,not available,"Krishna Colony, Gurgaon, Haryana",4,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,apna enclave,50.0,7692.0,Carpet area: 650 (60.39 sq.m.),2,2,1,not available,"46b, Ashok Vihar, Gurgaon, Haryana",1,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,tulsiani easy in homes,0.4,6722.0,Carpet area: 595 (55.28 sq.m.),2,2,3,not available,"Sohna, Gurgaon, Haryana",12,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
3,2 BHK Flat in Sector 61 Gurgaon,smart world orchard,1.47,12250.0,Carpet area: 1200 (111.48 sq.m.),2,2,2,study room,"Sector 61 Gurgaon, Gurgaon, Haryana",2,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",
4,2 BHK Flat in Sector 92 Gurgaon,parkwood westend,0.7,5204.0,Super Built up area 1345(124.95 sq.m.),2,2,3,study room,"Sector 92 Gurgaon, Gurgaon, Haryana",5,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ..."


In [82]:
df['facing'].value_counts()

facing
North-East    505
East          490
North         301
South         203
West          183
North-West    162
South-East    144
South-West    135
Name: count, dtype: int64

In [83]:
df['facing'].isnull().sum()

874

In [84]:
# Assign the result back; in-place fillna on a column selection is deprecated in recent pandas
df['facing'] = df['facing'].fillna('NA')

In [85]:
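# Derived area in sq.ft.; assumes price is in crore. Rows still in lakh (e.g. 45.0) come out 100x too large.
df.insert(loc=4, column='area', value=round((df['price']*10000000)/df['price_per_sqft']))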

In [86]:
df.head()

Unnamed: 0,property_name,society,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Krishna Colony,maa bhagwati residency,45.0,5000.0,90000.0,Carpet area: 900 (83.61 sq.m.),2,2,1,not available,"Krishna Colony, Gurgaon, Haryana",4,West,1 to 5 Year Old,"['Chintapurni Mandir', 'State bank ATM', 'Pear...",So with lift.Maa bhagwati residency is one of ...,"['3 Fan', '4 Light', '1 Wardrobe', 'No AC', 'N...","['Feng Shui / Vaastu Compliant', 'Security / F...","['Environment4 out of 5', 'Safety4 out of 5', ..."
1,2 BHK Flat in Ashok Vihar,apna enclave,50.0,7692.0,65003.0,Carpet area: 650 (60.39 sq.m.),2,2,1,not available,"46b, Ashok Vihar, Gurgaon, Haryana",1,West,10+ Year Old,"['Chintapurni Mandir', 'Sheetla Mata Mandir', ...","Property situated on main road, railway statio...","['3 Wardrobe', '4 Fan', '1 Exhaust Fan', '1 Ge...","['Security / Fire Alarm', 'Maintenance Staff',...","['Environment4 out of 5', 'Safety4 out of 5', ..."
2,2 BHK Flat in Sohna,tulsiani easy in homes,0.4,6722.0,595.0,Carpet area: 595 (55.28 sq.m.),2,2,3,not available,"Sohna, Gurgaon, Haryana",12,,0 to 1 Year Old,"['Huda City Metro', 'Golf Course extn road', '...","This property is 15 km away from badshapur, gu...",,"['Power Back-up', 'Feng Shui / Vaastu Complian...","['Environment4 out of 5', 'Safety4 out of 5', ..."
3,2 BHK Flat in Sector 61 Gurgaon,smart world orchard,1.47,12250.0,1200.0,Carpet area: 1200 (111.48 sq.m.),2,2,2,study room,"Sector 61 Gurgaon, Gurgaon, Haryana",2,,Dec-23,"['Sector 55-56 Metro station', 'Bestech Centra...",Near to metro station of sector 56 and opposit...,,"['Security / Fire Alarm', 'Private Garden / Te...",
4,2 BHK Flat in Sector 92 Gurgaon,parkwood westend,0.7,5204.0,1345.0,Super Built up area 1345(124.95 sq.m.),2,2,3,study room,"Sector 92 Gurgaon, Gurgaon, Haryana",5,,Under Construction,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",We are the proud owners of this 2 bhk alongwit...,,,"['Environment5 out of 5', 'Safety3 out of 5', ..."
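
The first two rows above show a derived `area` of 90000.0 and 65003.0 against listed carpet areas of 900 and 650 sq.ft. A quick consistency check against `areaWithType` can surface such rows. The sketch below reuses the values from the first three rows. The `listed_area` and `ratio` columns and the threshold of 50 are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Values copied from the first three rows shown above.
sample = pd.DataFrame({
    "price": [45.0, 50.0, 0.4],
    "price_per_sqft": [5000.0, 7692.0, 6722.0],
    "areaWithType": [
        "Carpet area: 900 (83.61 sq.m.)",
        "Carpet area: 650 (60.39 sq.m.)",
        "Carpet area: 595 (55.28 sq.m.)",
    ],
})

# Same formula as the notebook: price (assumed crore) * 1e7 / price per sq.ft.
sample["area"] = (sample["price"] * 1e7 / sample["price_per_sqft"]).round()

# First number in areaWithType, used here only as a rough reference value.
sample["listed_area"] = (
    sample["areaWithType"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
)

# A ratio near 100 suggests the price was recorded in lakh rather than crore.
sample["ratio"] = sample["area"] / sample["listed_area"]
suspect = sample[sample["ratio"] > 50]  # hypothetical threshold
print(suspect[["price", "area", "listed_area", "ratio"]])
```

If rows like these turn up, the likely remedy is to normalise `price` to a single unit before deriving `area`. That step would belong earlier in the cleaning pipeline than the cells shown here.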


In [87]:
df.insert(loc=1,column='property_type',value='flat')

In [88]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2997 entries, 0 to 3007
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   property_name    2996 non-null   object 
 1   property_type    2997 non-null   object 
 2   society          2997 non-null   object 
 3   price            2996 non-null   float64
 4   price_per_sqft   2996 non-null   float64
 5   area             2996 non-null   float64
 6   areaWithType     2997 non-null   object 
 7   bedRoom          2997 non-null   int32  
 8   bathroom         2997 non-null   int32  
 9   balcony          2997 non-null   object 
 10  additionalRoom   2997 non-null   object 
 11  address          2991 non-null   object 
 12  floorNum         2995 non-null   object 
 13  facing           2997 non-null   object 
 14  agePossession    2996 non-null   object 
 15  nearbyLocations  2906 non-null   object 
 16  description      2997 non-null   object 
 17  furnishDetails   22

In [89]:
df.to_csv('flats_cleaned.csv',index=False)
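
One thing to watch when reloading the saved file: `pd.read_csv` treats the literal string `NA` as missing by default, so the `'NA'` placeholder written into `facing` would come back as `NaN`. The sketch below shows the effect on a tiny stand-in frame (`df_small` is hypothetical) and one possible way to keep the placeholder as text.

```python
import io
import pandas as pd

# Tiny stand-in frame; 'NA' mimics the placeholder written by fillna('NA').
df_small = pd.DataFrame({"facing": ["West", "NA"], "floorNum": ["4", "12"]})

buf = io.StringIO()
df_small.to_csv(buf, index=False)

# Default parsing treats the string 'NA' as a missing value.
buf.seek(0)
default_read = pd.read_csv(buf)

# Disabling the default NA list keeps 'NA' as text; only empty cells become NaN.
buf.seek(0)
literal_read = pd.read_csv(buf, keep_default_na=False, na_values=[""])

print(default_read["facing"].isna().tolist())
print(literal_read["facing"].tolist())
```

Choosing a placeholder that is not on the default NA list, such as `'Unknown'`, would avoid the issue without extra reader arguments.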