# **Problem Statement**: Dataset Preprocessing for Real Estate Analysis

* In the realm of real estate analysis, it is crucial to preprocess and refine raw data to extract meaningful insights. Your dataset provides information about various properties, including their attributes and characteristics.
* The goal of this preprocessing task is to streamline the data, enhance its usability, and prepare it for further analysis.
* The specific preprocessing steps involve creating a sector column based on address information, dropping unnecessary features, and ensuring data consistency.

# **Description of Dataset:**

The dataset contains information about properties listed in Gurgaon, Haryana. It includes the following columns:

* property_name: The name or identifier of the property.
* property_type: The type or category of the property (e.g., apartment, villa, commercial space).
* society: The name of the society or complex where the property is located.
* price: The price of the property.
* price_per_sqft: The price per square foot of the property.
* area: The total area of the property in square feet.
* areaWithType: The area together with its measurement type (e.g., carpet area, built-up area, super built-up area, plot area).
* bedRoom: The number of bedrooms in the property.
* bathroom: The number of bathrooms in the property.
* balcony: The number of balconies in the property.
* additionalRoom: Information about any additional rooms in the property.
* address: The address of the property.
* floorNum: The floor number of the property.
* facing: The direction the property is facing (e.g., North, South, East, West).
* agePossession: The age of the property or possession status.
* nearbyLocations: Locations or landmarks near the property's vicinity.
* description: A textual description of the property.
* furnishDetails: Details about the furnishing status of the property.
* features: Notable features or amenities of the property.
* rating: A rating associated with the property's quality or desirability.

# **Preprocessing Steps:**

* **Sector Column Creation**: Extract the sector information from the property's location text and create a new column named sector. The notebook below parses it from the property_name column, which embeds the locality. This column will help categorize properties based on their location.

* **Feature Selection**: Identify features that are not relevant or do not contribute significantly to the analysis and decision-making process. Drop these features from the dataset to streamline the data and improve its quality.

* **Data Consistency**: Check for any inconsistencies or missing values within the remaining features. Handle missing data appropriately by either imputing values or removing incomplete records.

* **Data Format Optimization**: Convert any columns with non-standard formats (e.g., numeric values stored as text) to their appropriate data types.

* **Data Normalization**: If necessary, normalize numeric columns to a common scale to avoid bias in analysis due to varying ranges.

By completing these preprocessing steps, you will be able to transform the raw dataset into a cleaner, more organized, and more suitable format for your real estate analysis. This will ultimately assist you in making informed decisions and deriving valuable insights from the data.
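
The consistency, format, and normalization steps listed above are not implemented in this notebook, so here is a minimal sketch of what they could look like. The toy frame and the new column names `balcony_num` and `area_scaled` are assumptions made for illustration. Treating `3+` as 3 is also only one possible choice, and it discards the "or more" information.

```python
import pandas as pd

# Hypothetical toy frame that mimics a few columns of the dataset.
toy = pd.DataFrame({
    "balcony": ["2", "1", "3+", "3"],
    "price": [0.4, None, 2.55, 1.65],
    "area": [544.0, 569.0, 1950.0, 2003.0],
})

# Data consistency: impute the missing price with the column median.
toy["price"] = toy["price"].fillna(toy["price"].median())

# Data format optimization: strip the trailing "+" and convert text to integers.
toy["balcony_num"] = pd.to_numeric(toy["balcony"].str.rstrip("+")).astype(int)

# Data normalization: min-max scale the area into the range [0, 1].
area_min, area_max = toy["area"].min(), toy["area"].max()
toy["area_scaled"] = (toy["area"] - area_min) / (area_max - area_min)

print(toy)
```

Median imputation and min-max scaling are used here only because they are simple. Whether to impute or drop rows, and which scaler to use, depends on the later analysis.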

# **Import Basic Libraries**

In [1]:
import numpy as np
import pandas as pd

# **Import Dataset**

In [2]:
df = pd.read_csv('gurgaon_properties.csv')

# **Basic Checks**

In [3]:
df.head()

Unnamed: 0,property_name,property_type,society,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Sector 93 Gurgaon,flat,signature global orchard avenue,0.4,7359.0,544.0,Carpet area: 543.53 (50.5 sq.m.),2,2,2,not available,"Sector 93 Gurgaon, Gurgaon, Haryana",11.0,,0 to 1 Year Old,"['JMS Crosswalk Mall', 'Reliance Trends Newtow...","Situated in sector-93 gurgaon, gurgaon, signat...","['1 Fan', '1 Exhaust Fan', '1 Geyser', '5 Ligh...","['Security / Fire Alarm', 'Lift(s)', 'Maintena...","['Green Area4 out of 5', 'Construction4 out of..."
1,2 BHK Flat in Sector 104 Gurgaon,flat,zara aavaas,0.4,7029.0,569.0,Super Built up area 569(52.86 sq.m.),2,2,1,not available,"Sector 104 Gurgaon, Gurgaon, Haryana",14.0,,1 to 5 Year Old,"['Ardee Mall', 'Northern Peripheral Road', 'Mp...",Residential apartment for sell.Located in sect...,"['2 Fan', '2 Light', '2 Wardrobe', 'No AC', 'N...","['Intercom Facility', 'Lift(s)', 'Swimming Poo...","['Environment4 out of 5', 'Lifestyle4 out of 5..."
2,4 BHK Flat in Sector 104 Gurgaon,flat,ats triumph,2.55,13076.0,1950.0,Super Built up area 3150(292.64 sq.m.)Carpet a...,4,4,3+,servant room,"21st, Sector 104 Gurgaon, Gurgaon, Haryana",21.0,East,1 to 5 Year Old,"['IFFCO Chowk Metro Station', 'The Esplanade M...",Looking for a 4 bhk property for sale in gurga...,"['6 Fan', '18 Light', '4 AC', 'No Bed', 'No Ch...","['Water purifier', 'Centrally Air Conditioned'...","['Green Area4 out of 5', 'Construction5 out of..."
3,3 BHK Flat in Sector 108 Gurgaon,flat,experion the heartsong,1.65,8237.0,2003.0,Super Built up area 2003(186.08 sq.m.)Built Up...,3,4,3,study room,"Sector 108 Gurgaon, Gurgaon, Haryana",3.0,North-West,5 to 10 Year Old,"['Galleria 108 Mall', 'Dwarka Expressway', 'Ce...","Experion, one of the most reputed real estate ...","['6 Fan', '6 Light', '4 AC', '1 Modular Kitche...","['Security / Fire Alarm', 'Power Back-up', 'In...","['Green Area5 out of 5', 'Construction4 out of..."
4,3 Bedroom House for sale in Nirvana Country,house,unitech deerwood chase,8.45,235376.0,359.0,Plot area 359(33.35 sq.m.),3,3,2,"study room,servant room","Deerwood, Nirvana Country, Gurgaon, Haryana",2.0,North-East,10+ Year Old,"['Sector 55-56 Metro Station', 'Raheja Mall', ...",Brokers pls excuse....Independent villa availa...,"['5 Fan', '1 Exhaust Fan', '6 Geyser', '1 Stov...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle5 out of 5..."


In [4]:
# shape
df.shape

(3974, 20)

**Drop Duplicates**

In [5]:
df.duplicated().sum()

13

In [6]:
df = df.drop_duplicates()

In [7]:
df.duplicated().sum()

0

In [8]:
df.shape

(3961, 20)

In [9]:
# info
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3961 entries, 0 to 3973
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   property_name    3961 non-null   object 
 1   property_type    3961 non-null   object 
 2   society          3960 non-null   object 
 3   price            3941 non-null   float64
 4   price_per_sqft   3941 non-null   float64
 5   area             3941 non-null   float64
 6   areaWithType     3961 non-null   object 
 7   bedRoom          3961 non-null   int64  
 8   bathroom         3961 non-null   int64  
 9   balcony          3961 non-null   object 
 10  additionalRoom   3961 non-null   object 
 11  address          3950 non-null   object 
 12  floorNum         3940 non-null   float64
 13  facing           2784 non-null   object 
 14  agePossession    3960 non-null   object 
 15  nearbyLocations  3754 non-null   object 
 16  description      3961 non-null   object 
 17  furnishDetails

In [10]:
# Check null values
df.isnull().sum()

property_name         0
property_type         0
society               1
price                20
price_per_sqft       20
area                 20
areaWithType          0
bedRoom               0
bathroom              0
balcony               0
additionalRoom        0
address              11
floorNum             21
facing             1177
agePossession         1
nearbyLocations     207
description           0
furnishDetails     1032
features            709
rating              450
dtype: int64
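
Raw null counts are easier to compare as a share of all rows. The sketch below shows one way to compute that on a small hypothetical frame; `missing_pct` is a name introduced here and is not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values.
toy = pd.DataFrame({
    "facing": ["East", np.nan, np.nan, "North"],
    "price": [0.4, 2.55, np.nan, 1.65],
    "bedRoom": [2, 2, 4, 3],
})

# isnull().mean() gives the fraction of missing values per column.
missing_pct = (toy.isnull().mean() * 100).sort_values(ascending=False)
print(missing_pct)
```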

# **Data Preprocessing**

In [11]:
df.head()

Unnamed: 0,property_name,property_type,society,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Sector 93 Gurgaon,flat,signature global orchard avenue,0.4,7359.0,544.0,Carpet area: 543.53 (50.5 sq.m.),2,2,2,not available,"Sector 93 Gurgaon, Gurgaon, Haryana",11.0,,0 to 1 Year Old,"['JMS Crosswalk Mall', 'Reliance Trends Newtow...","Situated in sector-93 gurgaon, gurgaon, signat...","['1 Fan', '1 Exhaust Fan', '1 Geyser', '5 Ligh...","['Security / Fire Alarm', 'Lift(s)', 'Maintena...","['Green Area4 out of 5', 'Construction4 out of..."
1,2 BHK Flat in Sector 104 Gurgaon,flat,zara aavaas,0.4,7029.0,569.0,Super Built up area 569(52.86 sq.m.),2,2,1,not available,"Sector 104 Gurgaon, Gurgaon, Haryana",14.0,,1 to 5 Year Old,"['Ardee Mall', 'Northern Peripheral Road', 'Mp...",Residential apartment for sell.Located in sect...,"['2 Fan', '2 Light', '2 Wardrobe', 'No AC', 'N...","['Intercom Facility', 'Lift(s)', 'Swimming Poo...","['Environment4 out of 5', 'Lifestyle4 out of 5..."
2,4 BHK Flat in Sector 104 Gurgaon,flat,ats triumph,2.55,13076.0,1950.0,Super Built up area 3150(292.64 sq.m.)Carpet a...,4,4,3+,servant room,"21st, Sector 104 Gurgaon, Gurgaon, Haryana",21.0,East,1 to 5 Year Old,"['IFFCO Chowk Metro Station', 'The Esplanade M...",Looking for a 4 bhk property for sale in gurga...,"['6 Fan', '18 Light', '4 AC', 'No Bed', 'No Ch...","['Water purifier', 'Centrally Air Conditioned'...","['Green Area4 out of 5', 'Construction5 out of..."
3,3 BHK Flat in Sector 108 Gurgaon,flat,experion the heartsong,1.65,8237.0,2003.0,Super Built up area 2003(186.08 sq.m.)Built Up...,3,4,3,study room,"Sector 108 Gurgaon, Gurgaon, Haryana",3.0,North-West,5 to 10 Year Old,"['Galleria 108 Mall', 'Dwarka Expressway', 'Ce...","Experion, one of the most reputed real estate ...","['6 Fan', '6 Light', '4 AC', '1 Modular Kitche...","['Security / Fire Alarm', 'Power Back-up', 'In...","['Green Area5 out of 5', 'Construction4 out of..."
4,3 Bedroom House for sale in Nirvana Country,house,unitech deerwood chase,8.45,235376.0,359.0,Plot area 359(33.35 sq.m.),3,3,2,"study room,servant room","Deerwood, Nirvana Country, Gurgaon, Haryana",2.0,North-East,10+ Year Old,"['Sector 55-56 Metro Station', 'Raheja Mall', ...",Brokers pls excuse....Independent villa availa...,"['5 Fan', '1 Exhaust Fan', '6 Geyser', '1 Stov...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle5 out of 5..."


In [12]:
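# Insert a 'sector' column at position 3. The sector is parsed from property_name (the text after 'in', with 'Gurgaon' removed), because the sector is key location information in metro cities like Gurgaon.
# Note: split('in') splits on every occurrence of the letters 'in', so a locality name that contains 'in' inside a word is cut short.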
df.insert(loc=3,column='sector',value=df['property_name'].str.split('in').str.get(1).str.replace('Gurgaon','').str.strip())
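
As a hedged alternative, the sketch below splits only once on the standalone word " in " instead of on every "in". The sample names are taken from the rows shown above, except "Springfield Colony", which is a hypothetical name added to show the difference. Adopting this in the notebook would change some of the extracted labels, so the outputs recorded here would no longer match exactly.

```python
import pandas as pd

names = pd.Series([
    "2 BHK Flat in Sector 93 Gurgaon",
    "3 Bedroom House for sale in Nirvana Country",
    "2 BHK Flat in New Gurgaon",
    "1 BHK Flat in Springfield Colony",  # hypothetical name containing "in"
])

# Approach used in the notebook: split on every "in".
naive = names.str.split("in").str.get(1).str.replace("Gurgaon", "").str.strip()

# Alternative: split once on " in " so words like "Springfield" stay intact.
robust = (
    names.str.split(" in ", n=1).str.get(1)
    .str.replace("Gurgaon", "", regex=False)
    .str.strip()
    .str.lower()
)

print(pd.DataFrame({"naive": naive, "robust": robust}))
```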

In [13]:
df.head()

Unnamed: 0,property_name,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,...,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Sector 93 Gurgaon,flat,signature global orchard avenue,Sector 93,0.4,7359.0,544.0,Carpet area: 543.53 (50.5 sq.m.),2,2,...,not available,"Sector 93 Gurgaon, Gurgaon, Haryana",11.0,,0 to 1 Year Old,"['JMS Crosswalk Mall', 'Reliance Trends Newtow...","Situated in sector-93 gurgaon, gurgaon, signat...","['1 Fan', '1 Exhaust Fan', '1 Geyser', '5 Ligh...","['Security / Fire Alarm', 'Lift(s)', 'Maintena...","['Green Area4 out of 5', 'Construction4 out of..."
1,2 BHK Flat in Sector 104 Gurgaon,flat,zara aavaas,Sector 104,0.4,7029.0,569.0,Super Built up area 569(52.86 sq.m.),2,2,...,not available,"Sector 104 Gurgaon, Gurgaon, Haryana",14.0,,1 to 5 Year Old,"['Ardee Mall', 'Northern Peripheral Road', 'Mp...",Residential apartment for sell.Located in sect...,"['2 Fan', '2 Light', '2 Wardrobe', 'No AC', 'N...","['Intercom Facility', 'Lift(s)', 'Swimming Poo...","['Environment4 out of 5', 'Lifestyle4 out of 5..."
2,4 BHK Flat in Sector 104 Gurgaon,flat,ats triumph,Sector 104,2.55,13076.0,1950.0,Super Built up area 3150(292.64 sq.m.)Carpet a...,4,4,...,servant room,"21st, Sector 104 Gurgaon, Gurgaon, Haryana",21.0,East,1 to 5 Year Old,"['IFFCO Chowk Metro Station', 'The Esplanade M...",Looking for a 4 bhk property for sale in gurga...,"['6 Fan', '18 Light', '4 AC', 'No Bed', 'No Ch...","['Water purifier', 'Centrally Air Conditioned'...","['Green Area4 out of 5', 'Construction5 out of..."
3,3 BHK Flat in Sector 108 Gurgaon,flat,experion the heartsong,Sector 108,1.65,8237.0,2003.0,Super Built up area 2003(186.08 sq.m.)Built Up...,3,4,...,study room,"Sector 108 Gurgaon, Gurgaon, Haryana",3.0,North-West,5 to 10 Year Old,"['Galleria 108 Mall', 'Dwarka Expressway', 'Ce...","Experion, one of the most reputed real estate ...","['6 Fan', '6 Light', '4 AC', '1 Modular Kitche...","['Security / Fire Alarm', 'Power Back-up', 'In...","['Green Area5 out of 5', 'Construction4 out of..."
4,3 Bedroom House for sale in Nirvana Country,house,unitech deerwood chase,Nirvana Country,8.45,235376.0,359.0,Plot area 359(33.35 sq.m.),3,3,...,"study room,servant room","Deerwood, Nirvana Country, Gurgaon, Haryana",2.0,North-East,10+ Year Old,"['Sector 55-56 Metro Station', 'Raheja Mall', ...",Brokers pls excuse....Independent villa availa...,"['5 Fan', '1 Exhaust Fan', '6 Geyser', '1 Stov...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle5 out of 5..."


In [14]:
df['sector']=df['sector'].str.lower() # for lower case

In [15]:
df.head()

Unnamed: 0,property_name,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,...,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
0,2 BHK Flat in Sector 93 Gurgaon,flat,signature global orchard avenue,sector 93,0.4,7359.0,544.0,Carpet area: 543.53 (50.5 sq.m.),2,2,...,not available,"Sector 93 Gurgaon, Gurgaon, Haryana",11.0,,0 to 1 Year Old,"['JMS Crosswalk Mall', 'Reliance Trends Newtow...","Situated in sector-93 gurgaon, gurgaon, signat...","['1 Fan', '1 Exhaust Fan', '1 Geyser', '5 Ligh...","['Security / Fire Alarm', 'Lift(s)', 'Maintena...","['Green Area4 out of 5', 'Construction4 out of..."
1,2 BHK Flat in Sector 104 Gurgaon,flat,zara aavaas,sector 104,0.4,7029.0,569.0,Super Built up area 569(52.86 sq.m.),2,2,...,not available,"Sector 104 Gurgaon, Gurgaon, Haryana",14.0,,1 to 5 Year Old,"['Ardee Mall', 'Northern Peripheral Road', 'Mp...",Residential apartment for sell.Located in sect...,"['2 Fan', '2 Light', '2 Wardrobe', 'No AC', 'N...","['Intercom Facility', 'Lift(s)', 'Swimming Poo...","['Environment4 out of 5', 'Lifestyle4 out of 5..."
2,4 BHK Flat in Sector 104 Gurgaon,flat,ats triumph,sector 104,2.55,13076.0,1950.0,Super Built up area 3150(292.64 sq.m.)Carpet a...,4,4,...,servant room,"21st, Sector 104 Gurgaon, Gurgaon, Haryana",21.0,East,1 to 5 Year Old,"['IFFCO Chowk Metro Station', 'The Esplanade M...",Looking for a 4 bhk property for sale in gurga...,"['6 Fan', '18 Light', '4 AC', 'No Bed', 'No Ch...","['Water purifier', 'Centrally Air Conditioned'...","['Green Area4 out of 5', 'Construction5 out of..."
3,3 BHK Flat in Sector 108 Gurgaon,flat,experion the heartsong,sector 108,1.65,8237.0,2003.0,Super Built up area 2003(186.08 sq.m.)Built Up...,3,4,...,study room,"Sector 108 Gurgaon, Gurgaon, Haryana",3.0,North-West,5 to 10 Year Old,"['Galleria 108 Mall', 'Dwarka Expressway', 'Ce...","Experion, one of the most reputed real estate ...","['6 Fan', '6 Light', '4 AC', '1 Modular Kitche...","['Security / Fire Alarm', 'Power Back-up', 'In...","['Green Area5 out of 5', 'Construction4 out of..."
4,3 Bedroom House for sale in Nirvana Country,house,unitech deerwood chase,nirvana country,8.45,235376.0,359.0,Plot area 359(33.35 sq.m.),3,3,...,"study room,servant room","Deerwood, Nirvana Country, Gurgaon, Haryana",2.0,North-East,10+ Year Old,"['Sector 55-56 Metro Station', 'Raheja Mall', ...",Brokers pls excuse....Independent villa availa...,"['5 Fan', '1 Exhaust Fan', '6 Geyser', '1 Stov...","['Feng Shui / Vaastu Compliant', 'Private Gard...","['Environment5 out of 5', 'Lifestyle5 out of 5..."


In [16]:
df['sector'].value_counts()

sohna                 163
sector 102            113
sector 85             110
sector 92             104
sector 69              94
                     ... 
near euro               1
ram nagar               1
kheri                   1
shivji park colony      1
vir nagar               1
Name: sector, Length: 301, dtype: int64

* The sector column contains many colony and locality names, but we want sector numbers.
* We looked up online which sector each colony belongs to, and we replace each colony name with that sector.

In [17]:
df['sector'] = df['sector'].str.replace('dharam colony','sector 12')
df['sector'] = df['sector'].str.replace('krishna colony','sector 7')
df['sector'] = df['sector'].str.replace('suncity','sector 54')
df['sector'] = df['sector'].str.replace('prem nagar','sector 13')
df['sector'] = df['sector'].str.replace('mg road','sector 28')
df['sector'] = df['sector'].str.replace('gandhi nagar','sector 28')
df['sector'] = df['sector'].str.replace('laxmi garden','sector 11')
df['sector'] = df['sector'].str.replace('shakti nagar','sector 11')

In [18]:
df['sector'] = df['sector'].str.replace('baldev nagar','sector 7')
df['sector'] = df['sector'].str.replace('shivpuri','sector 7')
df['sector'] = df['sector'].str.replace('garhi harsaru','sector 17')
df['sector'] = df['sector'].str.replace('imt manesar','manesar')
df['sector'] = df['sector'].str.replace('adarsh nagar','sector 12')
df['sector'] = df['sector'].str.replace('shivaji nagar','sector 11')
df['sector'] = df['sector'].str.replace('bhim nagar','sector 6')
df['sector'] = df['sector'].str.replace('madanpuri','sector 7')

In [19]:
df['sector'] = df['sector'].str.replace('saraswati vihar','sector 28')
df['sector'] = df['sector'].str.replace('arjun nagar','sector 8')
df['sector'] = df['sector'].str.replace('ravi nagar','sector 9')
df['sector'] = df['sector'].str.replace('vishnu garden','sector 105')
df['sector'] = df['sector'].str.replace('bhondsi','sector 11')
df['sector'] = df['sector'].str.replace('surya vihar','sector 21')
df['sector'] = df['sector'].str.replace('devilal colony','sector 9')
df['sector'] = df['sector'].str.replace('valley view estate','gwal pahari')

In [20]:
df['sector'] = df['sector'].str.replace('mehrauli  road','sector 14')
df['sector'] = df['sector'].str.replace('jyoti park','sector 7')
df['sector'] = df['sector'].str.replace('ansal plaza','sector 23')
df['sector'] = df['sector'].str.replace('dayanand colony','sector 6')
df['sector'] = df['sector'].str.replace('sushant lok phase 2','sector 55')
df['sector'] = df['sector'].str.replace('chakkarpur','sector 28')
df['sector'] = df['sector'].str.replace('greenwood city','sector 45')
df['sector'] = df['sector'].str.replace('subhash nagar','sector 12')

In [21]:
df['sector'] = df['sector'].str.replace('sohna road road','sohna road')
df['sector'] = df['sector'].str.replace('malibu town','sector 47')
df['sector'] = df['sector'].str.replace('surat nagar 1','sector 104')
df['sector'] = df['sector'].str.replace('new colony','sector 7')
df['sector'] = df['sector'].str.replace('mianwali colony','sector 12')
df['sector'] = df['sector'].str.replace('jacobpura','sector 12')
df['sector'] = df['sector'].str.replace('rajiv nagar','sector 13')
df['sector'] = df['sector'].str.replace('ashok vihar','sector 3')

In [22]:
df['sector'] = df['sector'].str.replace('dlf phase 1','sector 26')
df['sector'] = df['sector'].str.replace('nirvana country','sector 50')
df['sector'] = df['sector'].str.replace('palam vihar','sector 2')
df['sector'] = df['sector'].str.replace('dlf phase 2','sector 25')
df['sector'] = df['sector'].str.replace('sushant lok phase 1','sector 43')
df['sector'] = df['sector'].str.replace('laxman vihar','sector 4')
df['sector'] = df['sector'].str.replace('dlf phase 4','sector 28')
df['sector'] = df['sector'].str.replace('dlf phase 3','sector 24')

In [23]:
df['sector'] = df['sector'].str.replace('sushant lok phase 3','sector 57')
df['sector'] = df['sector'].str.replace('dlf phase 5','sector 43')
df['sector'] = df['sector'].str.replace('rajendra park','sector 105')
df['sector'] = df['sector'].str.replace('uppals southend','sector 49')
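# Note: str.replace matches substrings, so 'sohna' also matches values that already read 'sohna road'.
df['sector'] = df['sector'].str.replace('sohna','sohna road')
# Note: 'ashok vihar' was already rewritten to 'sector 3' in cell 21, so the two 'ashok vihar phase ...' patterns in this cell can no longer match.
df['sector'] = df['sector'].str.replace('ashok vihar phase 3 extension','sector 5')
df['sector'] = df['sector'].str.replace('south city 1','sector 41')
df['sector'] = df['sector'].str.replace('ashok vihar phase 2','sector 5')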
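
The long chain of `str.replace` calls depends on the order of the calls, because each one matches substrings. A possible alternative, sketched below on a toy Series, keeps the lookups in one dictionary and uses `Series.replace`, which matches whole values. The four mappings are copied from the cells above; the name `colony_to_sector` is introduced here for illustration.

```python
import pandas as pd

sector = pd.Series([
    "dharam colony", "sohna", "sohna road",
    "ashok vihar phase 2", "ashok vihar", "sector 102",
])

# Whole-value lookup table (a small excerpt of the mappings used above).
colony_to_sector = {
    "dharam colony": "sector 12",
    "sohna": "sohna road",
    "ashok vihar": "sector 3",
    "ashok vihar phase 2": "sector 5",
}

# Series.replace with a dict only replaces values that match a key exactly.
mapped = sector.replace(colony_to_sector)

# For comparison: a substring replace also rewrites the longer label.
substring = sector.str.replace("ashok vihar", "sector 3", regex=False)

print(pd.DataFrame({"original": sector, "mapped": mapped, "substring": substring}))
```

With whole-value matching, "sohna road" is left alone and "ashok vihar phase 2" gets its own sector, regardless of the order of the dictionary entries.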

In [24]:
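# keep only sectors that appear at least 3 times; rarer labels are dropped
sector_counts = df['sector'].value_counts()
# .copy() avoids SettingWithCopyWarning on the assignments in later cells
df = df[df['sector'].isin(sector_counts[sector_counts >= 3].index)].copy()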

In [25]:
df['sector'].value_counts()

sohna road           163
sector 102           113
sector 85            110
sector 92            104
sector 69             94
                    ... 
sector 110 a           3
a block sector 43      3
sector 17a             3
b block sector 43      3
maruti kunj            3
Name: sector, Length: 131, dtype: int64

In [None]:
df['sector'] = df['sector'].str.replace('sector 95a','sector 95')
df['sector'] = df['sector'].str.replace('sector 23a','sector 23')
df['sector'] = df['sector'].str.replace('sector 12a','sector 12')
df['sector'] = df['sector'].str.replace('sector 3a','sector 3')
df['sector'] = df['sector'].str.replace('sector 110 a','sector 110')
df['sector'] = df['sector'].str.replace('patel nagar','sector 15')
df['sector'] = df['sector'].str.replace('a block sector 43','sector 43')
df['sector'] = df['sector'].str.replace('maruti kunj','sector 12')
df['sector'] = df['sector'].str.replace('b block sector 43','sector 43')

In [None]:
df['sector'] = df['sector'].str.replace('sector-33 sohna road','sector 33')
df['sector'] = df['sector'].str.replace('sector 1 manesar','manesar')
df['sector'] = df['sector'].str.replace('sector 4 phase 2','sector 4')
df['sector'] = df['sector'].str.replace('sector 1a manesar','manesar')
df['sector'] = df['sector'].str.replace('c block sector 43','sector 43')
df['sector'] = df['sector'].str.replace('sector 89 a','sector 89')
df['sector'] = df['sector'].str.replace('sector 2 extension','sector 2')
df['sector'] = df['sector'].str.replace('sector 36 sohna road','sector 36')
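
Most of the replacements in the last two cells reduce a label to "sector" plus its number. A regular expression can do that in one pass, as in the sketch below. This is an assumption about the intent, not the notebook's method, and it does not cover the exceptions above such as 'sector 1 manesar', which the notebook maps to 'manesar' and which would need explicit handling first.

```python
import pandas as pd

sector = pd.Series([
    "sector 95a", "sector 110 a", "a block sector 43",
    "sector-33 sohna road", "sector 4 phase 2", "sohna road", "manesar",
])

# Capture the digits that follow "sector", allowing a space or hyphen between them.
number = sector.str.extract(r"sector[ -]?(\d+)", expand=False)

# Rebuild "sector <number>"; labels without a sector number keep their original value.
normalized = ("sector " + number).fillna(sector)

print(pd.DataFrame({"original": sector, "normalized": normalized}))
```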

In [28]:
df[df['sector'] == 'new']

Unnamed: 0,property_name,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,...,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
597,2 BHK Flat in New Gurgaon,flat,takshila heights sector 37 c,new,0.67,5583.0,1200.0,Super Built up area 1200(111.48 sq.m.),2,2,...,not available,"New Gurgaon, Gurgaon, Haryana",3.0,,1 to 5 Year Old,"['Shri Balaji Hospital and Trauma Center', 'S....",Check out this 2 bhk apartment for sale in tak...,[],"['Lift(s)', 'Swimming Pool', 'Visitor Parking'...",
1603,2 BHK Flat in New Gurgaon,flat,green court,new,0.38,5507.0,690.0,Carpet area: 690 (64.1 sq.m.),2,2,...,not available,"New Gurgaon, Gurgaon, Haryana",7.0,,Under Construction,"['Ing bank ATM', 'Dcb bank ATM', 'Indus ind ba...",We are the proud owners of this 2 bhk apartmen...,[],"['Intercom Facility', 'Lift(s)', 'Maintenance ...",
3709,4 BHK Flat in New Gurgaon,flat,sare homes,new,0.85,4786.0,1776.0,Super Built up area 1776(165 sq.m.),4,4,...,not available,"New Gurgaon, Gurgaon, Haryana",3.0,,5 to 10 Year Old,"['Columbia Asia Hospital', 'Apex Multi Special...",Located in the popular residential address of ...,[],,
3778,4 BHK Flat in New Gurgaon,flat,dlf 76,new,4.0,11428.0,3500.0,Carpet area: 3500 (325.16 sq.m.),4,4,...,"study room,servant room","New Gurgaon, Gurgaon, Haryana",4.0,,Jun-27,"['Shri Balaji Hospital and Trauma Center', 'S....",This lovely 4 bhk apartment/flat in new gurgao...,"['6 Wardrobe', '1 Fridge', '8 Fan', '1 Exhaust...","['Security / Fire Alarm', 'Feng Shui / Vaastu ...",


In [29]:
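# 'new' is what remains of 'New Gurgaon'; assign the sector row by row based on each listing's society
df.loc[597,'sector'] = 'sector 37'
df.loc[1603,'sector'] = 'sector 90'
df.loc[3709,'sector'] = 'sector 92'
df.loc[3778,'sector'] = 'sector 76'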

In [30]:
df[df['sector'] == 'new sector 2']

Unnamed: 0,property_name,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,...,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
8,2 Bedroom House for sale in New Palam Vihar,house,my home,new sector 2,0.34,12592.0,270.0,Plot area 270(25.08 sq.m.),2,2,...,not available,"Ez-19 A, New Palam Vihar, Gurgaon, Haryana",3.0,West,5 to 10 Year Old,"['Palam Vihar Vyapar kendra', 'Palam triangle'...",There are availability of various facilities l...,"['1 Wardrobe', '3 Fan', '6 Light', 'No AC', 'N...","['Water Storage', 'Park', 'Visitor Parking']","['Environment4 out of 5', 'Lifestyle4 out of 5..."
1815,2 BHK Flat in New Palam Vihar,flat,ompee k s residency,new sector 2,1.6,26936.0,594.0,Carpet area: 66 (55.18 sq.m.),2,2,...,not available,"New Palam Vihar, Gurgaon, Haryana",1.0,,1 to 5 Year Old,"['Palam Vihar Vyapar kendra', 'Palam triangle'...",We are the proud owners of this 2 bhk apartmen...,,,"['Environment4 out of 5', 'Safety4 out of 5', ..."
2582,2 BHK Flat in New Palam Vihar,flat,my home,new sector 2,0.22,4400.0,500.0,Carpet area: 500 (46.45 sq.m.),2,2,...,not available,"New Palam Vihar, Gurgaon, Haryana",1.0,,0 to 1 Year Old,"['Palam Vihar Vyapar kendra', 'Palam triangle'...",Cctv surveillance are provided here. There is ...,"['3 Fan', '1 Exhaust Fan', '15 Light', '1 Modu...",,"['Safety4 out of 5', 'Lifestyle4 out of 5', 'E..."
2946,2 BHK Flat in New Palam Vihar,flat,my home,new sector 2,0.28,3166.0,884.0,Carpet area: 900 (83.61 sq.m.),2,1,...,others,"F 150/b, New Palam Vihar, Gurgaon, Haryana",2.0,,1 to 5 Year Old,"['Palam Vihar Vyapar kendra', 'Palam triangle'...","2 bhk room with wooden coverd ,1 drawing room,...","['3 Wardrobe', '5 Light', '1 Chimney', '1 Modu...","['Water Storage', 'Park']","['Environment4 out of 5', 'Safety4 out of 5', ..."
3239,3 Bedroom House for sale in New Palam Vihar,house,independent,new sector 2,1.0,8796.0,1137.0,Plot area 120(100.34 sq.m.)Built Up area: 120 ...,3,2,...,pooja room,"Q-148, New Palam Vihar, Phase-2, Near Royal Oa...",1.0,North,10+ Year Old,"['Palam Vihar Vyapar kendra', 'Palam triangle'...","Ground and first floor, Ground floor: Ground f...",,,"['Environment4 out of 5', 'Lifestyle4 out of 5..."


In [31]:
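# 'new palam vihar' became 'new sector 2' after the 'palam vihar' replacement; map these rows to sector 110
df.loc[[8,1815,2582,2946,3239],'sector'] = 'sector 110'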
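
Hard-coded index labels stop working if the rows are re-indexed or the source file changes. A boolean mask, sketched below on a toy frame, selects the same kind of rows by value instead. The index label 120 and its row are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "property_name": [
            "2 BHK Flat in New Palam Vihar",
            "2 BHK Flat in Sector 93 Gurgaon",
            "3 Bedroom House for sale in New Palam Vihar",
        ],
        "sector": ["new sector 2", "sector 93", "new sector 2"],
    },
    index=[8, 120, 3239],  # 120 is a hypothetical label
)

# Select rows by their current label rather than by index position.
mask = toy["sector"] == "new sector 2"
toy.loc[mask, "sector"] = "sector 110"

print(toy)
```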

In [32]:
df.shape

(3803, 21)

In [35]:
df.sample(5)

Unnamed: 0,property_name,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,...,additionalRoom,address,floorNum,facing,agePossession,nearbyLocations,description,furnishDetails,features,rating
2567,2 BHK Flat in Sector 113 Gurgaon,flat,la vida by tata housing,sector 113,1.55,11654.0,1330.0,Super Built up area 1330(123.56 sq.m.)Built Up...,2,2,...,not available,"Sector 113 Gurgaon , Gurgaon, Haryana",2.0,North-East,0 to 1 Year Old,"['Dwarka Sector 21', 'Pacific D21 Mall', 'Bajg...","Situated in sector 113 gurgaon , la vida by ta...","['2 Light', 'No AC', 'No Bed', 'No Chimney', '...","['Centrally Air Conditioned', 'Water purifier'...",
850,3 BHK Flat in Sector 47 Gurgaon,flat,unitech uniworld gardens,sector 47,2.4,11505.0,2086.0,Super Built up area 2086(193.8 sq.m.),3,3,...,servant room,"Sector 47 Gurgaon, Gurgaon, Haryana",10.0,,5 to 10 Year Old,"['Rajiv Chowk Mosque', 'Standard chartered ATM...","3bhk + sq, 2086 sq feet\nSemi-Furnished \nNewl...","['3 Wardrobe', '6 Fan', '3 Geyser', '15 Light'...","['Security / Fire Alarm', 'Feng Shui / Vaastu ...","['Green Area4 out of 5', 'Construction4 out of..."
3406,3 BHK Flat in Sector 92 Gurgaon,flat,sare homes,sector 92,0.71,5354.0,1326.0,Carpet area: 1326 (123.19 sq.m.),3,3,...,pooja room,"7002, Sector 92 Gurgaon, Gurgaon, Haryana",2.0,North-West,1 to 5 Year Old,"['Yadav Clinic', 'Bangali Clinic', 'Dr. J. S. ...",Sare homes is one of gurgaon's most sought aft...,,"['Security / Fire Alarm', 'Feng Shui / Vaastu ...","['Environment5 out of 5', 'Lifestyle4 out of 5..."
577,3 BHK Flat in Sector 102 Gurgaon,flat,adani oyster greens,sector 102,1.9,10058.0,1889.0,Carpet area: 1889 (175.49 sq.m.),3,3,...,"pooja room,study room,store room","1456, Sector 102 Gurgaon, Gurgaon, Haryana",9.0,North-East,1 to 5 Year Old,"['MSK bedding products', 'INOX Gurgaon Dreamz ...",Looking for a 3 bhk property for sale in gurga...,"['8 Fan', '7 Light', '1 Modular Kitchen', '1 C...",,"['Environment4 out of 5', 'Lifestyle4.5 out of..."
1167,3 BHK Flat in Sector 79 Gurgaon,flat,mapsko mount ville,sector 79,1.28,8557.0,1496.0,Super Built up area 1490(138.43 sq.m.)Carpet a...,3,3,...,study room,"Sector-79, Sector 79 Gurgaon, Gurgaon, Haryana",16.0,South,1 to 5 Year Old,"['Huda Metro Station (Gurugram)', 'Sapphire 83...",1490 sqft 3 bhk 3 bathroom semi furnished apar...,"['3 Wardrobe', '1 Fan', '1 Exhaust Fan', '1 Li...","['Security / Fire Alarm', 'Power Back-up', 'Fe...","['Green Area5 out of 5', 'Construction5 out of..."


# **Drop Features**

In [36]:
# features to drop -> property_name, address, description, rating
df.drop(columns=['property_name', 'address', 'description', 'rating'],inplace=True)

In [37]:
df.sample(5)

Unnamed: 0,property_type,society,sector,price,price_per_sqft,area,areaWithType,bedRoom,bathroom,balcony,additionalRoom,floorNum,facing,agePossession,nearbyLocations,furnishDetails,features
2411,house,independent,sector 55,5.0,44444.0,1125.0,Plot area 125(104.52 sq.m.)Built Up area: 115 ...,9,9,3+,others,5.0,East,0 to 1 Year Old,"['Sector metro station', 'Sector metro station...","['14 Fan', '9 Geyser', '17 Light', '10 AC', '9...","['Private Garden / Terrace', 'Maintenance Staf..."
2159,house,independent,sector 11,2.5,17857.0,1400.0,Plot area 1400(130.06 sq.m.)Built Up area: 185...,5,4,3,store room,1.0,East,10+ Year Old,"['Rajiv Chowk Mosque', 'Hanuman Mandir', 'Hdfc...",[],
3678,flat,eldeco accolade,sohna road,1.1,7549.0,1457.0,Super Built up area 1457(135.36 sq.m.),2,2,3+,"study room,others",17.0,,1 to 5 Year Old,"['Global City Centre', 'Sohna Road', 'Damdama ...","['1 Water Purifier', '1 Exhaust Fan', '2 Geyse...",['Visitor Parking']
952,flat,godrej nature plus,sector 33,1.5,9677.0,1550.0,Super Built up area 1550(144 sq.m.),3,3,3,study room,6.0,East,Under Construction,"['Signature Global Infinity Mall Sohna', 'Bads...","['2 Wardrobe', '2 Fan', '3 Geyser', '5 Light',...","['Feng Shui / Vaastu Compliant', 'Security / F..."
2731,flat,eldeco accolade,sohna road,0.95,6542.0,1452.0,Super Built up area 1452(134.9 sq.m.)Carpet ar...,2,2,3+,study room,6.0,East,0 to 1 Year Old,"['Global City Centre', 'Sohna Road', 'Damdama ...","['2 Wardrobe', '1 Exhaust Fan', '14 Light', '1...",


In [38]:
df.shape

(3803, 17)

* **Feature engineering required** -> areaWithType, additionalRoom, facing, agePossession, furnishDetails, features
* These columns still need feature engineering, so we export the dataset as a CSV for the next stage.

In [39]:
df.to_csv('gurgaon_properties_cleaned_v1.csv',index=False)
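
A quick round-trip check can confirm what `index=False` does to the exported file. The sketch below writes a hypothetical two-row frame to an in-memory buffer instead of a file, then reads it back.

```python
import io

import pandas as pd

# Hypothetical frame with a non-contiguous index, as left behind by row filtering.
toy = pd.DataFrame(
    {"sector": ["sector 93", "sohna road"], "price": [0.4, 1.1]},
    index=[0, 3678],
)

# index=False leaves the old index labels out of the CSV.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Reading it back yields a fresh 0..n-1 index and the same columns.
reloaded = pd.read_csv(buffer)
print(reloaded)
```

Because the index is not written, the original row labels (such as 3678) are not recoverable from the exported file.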