# <CENTER>**`Data Gathering: Gurgaon Real Estate Project`**</CENTER>

#### **`Data Gathering`** - It is the process of **collecting raw data** from various sources to build, train, and test machine learning models.

* 🎯 **Why is it Important?**
    * **Garbage in, garbage out**: The quality and quantity of your data directly affect the performance of your ML model.
    * ML models learn from patterns in the data — if the data is **incomplete, noisy, biased**, or **not relevant**, the model will be too.

---

## **`3. DATA GATHERING:`**

### 3.1 Source of Data
To build a predictive model in the real estate domain, data was collected from prominent Indian real estate listing platforms, namely:

* [99acres.com](https://www.99acres.com)
* [Housing.com](https://housing.com)

These platforms provide comprehensive property listings and were chosen for their detailed and structured property-related information, relevant to the Indian housing market.

---
### 3.2 Types of Data Collected

The dataset comprises diverse parameters describing individual flats/apartments in Gurgaon. Key attributes collected include:

* **Property Details:** Area (in sq. ft.), number of bedrooms (BHK), bathrooms, floor level, furnishing status
* **Pricing Information:** Listed price, price per sq. ft.
* **Builder/Project Info:** Name of builder, project name, possession status
* **Location Attributes:** Locality, city, pin code, proximity to key landmarks
* **Amenities and Features:** Parking, lift, gym, power backup, swimming pool, gated society
* **Posting Details:** Listing date, property age, contact type (owner/builder/broker)

---
### 3.3 Dataset Size and Format

* **Total Records:** 7,962 scraped property listings (7,143 after removing duplicates)
* **Raw Format:** Initially gathered as **JSON** responses via dynamic page scraping
* **Transformed Format:** Converted and cleaned into **CSV** for analysis and modeling
* **Approximate Size:** \~200 MB

This volume is sufficient for exploratory data analysis, feature engineering, and training machine learning models that capture generalizable patterns.
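
The JSON-to-CSV step described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual conversion code: the payload and its field names (`name`, `price`, `area`, `floor`) are hypothetical, and the function name `json_records_to_csv` is introduced here for the example.

```python
import csv
import io
import json

# Hypothetical raw payload; the field names are illustrative,
# not the real schema returned by any listing site.
raw_json = """
[
  {"name": "2 BHK Flat", "price": "68.0 L", "area": "593 sq.ft"},
  {"name": "3 BHK Flat", "price": "74.0 L", "area": "645 sq.ft", "floor": "5 of 14"}
]
"""

def json_records_to_csv(json_text: str) -> str:
    """Flatten a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)

    # Union of keys in first-seen order, so records with extra fields are not lost.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills in fields that a record does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

csv_text = json_records_to_csv(raw_json)
print(csv_text)
```

Collecting the union of keys first matters because listings do not all expose the same attributes; a header built from the first record alone would make `DictWriter` reject later records that carry extra fields.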

---
### 3.4 Tools and Technologies Used

The following Python-based tools were employed during the data extraction process:

* **Selenium:** For browser automation and interaction with dynamically loaded elements (e.g., paginated listings, JavaScript-rendered content)
* **Requests:** For making direct HTTP requests where feasible
* **BeautifulSoup:** For parsing and extracting information from the HTML content of the pages
* **Pandas:** For early-stage data transformation and file format conversion

---

### 3.5 Challenges Faced
* **Anti-bot Mechanisms:** Some websites implemented rate-limiting or CAPTCHA mechanisms that needed careful handling with delays and retries.
* **Dynamic Content Loading:** Many listings were rendered via JavaScript, necessitating Selenium-based scraping rather than simple HTTP requests.
* **Data Inconsistency:** Property descriptions and attribute availability varied across listings, leading to missing or inconsistent values in some cases.
* **Ethical Scraping Considerations:** All scraping was performed responsibly with minimal server load and in accordance with site policies (no login, no private data accessed).
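
The "delays and retries" mentioned above could look roughly like the sketch below. It is an assumption about one reasonable approach (exponential backoff with random jitter), not the exact handling used in this project. The helper name `fetch_with_retries`, its parameters, and the `flaky_fetch` stand-in are all hypothetical.

```python
import random
import time

def fetch_with_retries(fetch, url, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff plus jitter.

    `fetch` is any callable that returns the page or raises an exception,
    for example a thin wrapper around requests.get that raises on a bad status.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise  # Give up after the last attempt.
            # 2 s, 4 s, 8 s, ... plus up to 1 s of random jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)

# Stand-in fetcher that fails twice and then succeeds (for illustration only).
calls = {"count": 0}

def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return f"<html>{url}</html>"

waits = []  # Record the delays instead of actually sleeping.
page = fetch_with_retries(flaky_fetch, "https://example.com/listing", sleep=waits.append)
print(page, len(waits))
```

Passing `sleep` as a parameter keeps the example fast to run; in a real scraping loop the default `time.sleep` would be used so the server gets breathing room between attempts.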

---
---



In [17]:
## Importing tools for Data Gathering from various websites:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

In [None]:
## Set the different options for the browser
chrome_options = Options()
chrome_options.add_experimental_option("detach", True)
chrome_options.add_experimental_option('excludeSwitches', ['enable-logging'])

## Ignore the certificate and SSL errors
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--ignore-ssl-errors')

## Maximize the browser window
chrome_options.add_argument("start-maximized")

---
### **`Script for Scraping HTML of Website :`**

In [None]:
## Importing relevant tools:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

## Creating new HTML file to save HTML:
with open('housing_flat_data_gurgaon__00.html', 'w', encoding = 'utf-8') as file:
    pass

## Setting up Driver for efficiency:
options = Options()
options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument('--headless')

driver = webdriver.Chrome(options = options)
driver.set_page_load_timeout(300)        ## Increasing wait time for site loading to 5 mins from 2 mins

## Link to Housing.com site for Gurgaon Flats or Apartments:
link = 'https://housing.com/in/buy/searches/AC0AL1u6M1P1od1w26jrfqap1jl?gad_campaignid=21643530717&gclid=Cj0KCQjwhafEBhCcARIsAEGZEKJHtH0YzZ27lh6L5RRiQa6LwMoVcZ1eTgsF4Tu3V7MTiwt3Ce_iX-YaAnhdEALw_wcB'

## Launching the browser:
driver.get(link)
time.sleep(10)

prev_height = driver.execute_script("return document.body.scrollHeight")

## Now, Loading the dynamic website iteratively till the end is reached:
while True:
    ## Scrolling using JavaScript command:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(8)

    items = driver.find_element(by=By.XPATH, value='//*[@id="innerApp"]/div[3]/div[2]/div[1]/div[1]/div[2]/div/div/span')
    print(items.text)

    new_height = driver.execute_script("return document.body.scrollHeight")
    print('prev_height : ' , prev_height , 'new_height : ' , new_height , '\n')

    ## Saving the HTML of current loaded website to the HTML file:
    html = driver.page_source
    with open('housing_flat_data_gurgaon_6.html', 'w', encoding = 'utf-8') as file:
        file.write(html)

    ## Deleting variable for efficient memory usage: 
    del html

    ## Breaking condition for loop:
    if prev_height == new_height:
        driver.quit()
        break

    prev_height = new_height

## File 1: Built-up Area in sq.ft -> 0 to 1000+   -> 2021 (1890 actually found!)
## File 2: Built-up Area in sq.ft -> 1000+ to 1400+   -> 1751 (1560 actually found!)
## File 3: Built-up Area in sq.ft -> 1400+ to 1600+   -> 1802 (1560 actually found!)
## File 4: Built-up Area in sq.ft -> 1600+ to (less than 2000 mark) 2000+   -> 1890 (1620 actually found!)
## File 5: Built-up Area in sq.ft -> (less than 2000 mark) 2000+ to (slightly < mid of) 3000+ -> 2032 (1740 actually found!)
## File 6: Built-up Area in sq.ft -> (slightly < mid of) 3000+ to 5000+ -> 1376 (1230 actually found!)


### **OUTCOME:**
- We **scraped 6 HTML pages**, saved in **6 different HTML files**.
- **Each contains initial information and links to 1200+ distinct property listings**.
- Now, **using this information, we'll scrape data for each individual listing** from its web page.
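
The scrolling loop in the script above stops once the page height no longer changes. The same idea can be sketched as a small reusable function. The cap on the number of rounds is an addition assumed here as a safety net, and `scroll_until_stable` and `FakePage` are hypothetical names used only for this illustration.

```python
import time

def scroll_until_stable(get_height, scroll_to_bottom, pause=8.0,
                        max_rounds=500, sleep=time.sleep):
    """Scroll until the page height stops changing or max_rounds is reached.

    Returns the number of scroll rounds performed.
    """
    prev_height = get_height()
    for round_number in range(1, max_rounds + 1):
        scroll_to_bottom()
        sleep(pause)  # Give lazily loaded listings time to render.
        new_height = get_height()
        if new_height == prev_height:
            return round_number
        prev_height = new_height
    return max_rounds

# With Selenium, the two callables could be:
#   get_height = lambda: driver.execute_script("return document.body.scrollHeight")
#   scroll_to_bottom = lambda: driver.execute_script(
#       "window.scrollTo(0, document.body.scrollHeight)")

# Stand-in page that grows three times and then stops (for illustration only).
class FakePage:
    def __init__(self):
        self.heights = [1000, 2000, 3000, 4000]
        self.position = 0

    def height(self):
        return self.heights[self.position]

    def scroll(self):
        self.position = min(self.position + 1, len(self.heights) - 1)

page = FakePage()
rounds = scroll_until_stable(page.height, page.scroll, pause=0, sleep=lambda s: None)
print(rounds)
```

An unchanged height after a single pause can also mean the network was slow rather than that the list ended, which may be one reason the counts "actually found" in the comments above fall short of the counts the site reported.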

---
---

### **`Types of Property Listings :`**
- There are 2 types of Property Listings:
    1. **Individual Flat or Apartment** by Independent owner, Builder or Seller.
    2. **Group of Flats or Apartments** by Independent Builder or Seller.

Now we'll use two separate Python scripts to scrape individual property data for these two property types, because their individual web pages use distinct designs.
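
The scripts in this notebook tell the two types apart by the first character of the card title: a title starting with a digit (such as "2 BHK Flat") is counted as Type 1, and a title starting with a letter is counted as Type 2. A minimal sketch of that rule follows; the function name `classify_listing`, the returned labels, and the sample titles are illustrative assumptions.

```python
def classify_listing(name: str) -> str:
    """Return 'individual', 'group', or 'unknown' from a card title."""
    stripped = name.strip()
    if not stripped:
        return "unknown"  # An empty title would otherwise raise IndexError.
    first = stripped[0]
    if first.isdecimal():
        return "individual"  # Type 1, e.g. "2 BHK Flat"
    if first.isalpha():
        return "group"       # Type 2, e.g. a project name
    return "unknown"

# Sample titles for illustration only.
titles = ["2 BHK Flat", "  3.5 BHK Flat", "Pyramid Elite", ""]
labels = [classify_listing(title) for title in titles]
print(labels)
```

Returning an explicit `"unknown"` label makes it easy to count titles that match neither rule instead of failing on them.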

In [16]:
## Summary of 6 Scraped HTML files:
from bs4 import BeautifulSoup
soup = None

for file_number in range(1,7):
    file_name = 'housing_flat_data_gurgaon_' + str(file_number) + '.html'
    with open(file_name, 'r', encoding = 'utf-8') as file:
        soup = BeautifulSoup(file.read(), 'html.parser')   ## Explicit parser avoids the "no parser specified" warning
    
    prod_cards = soup.find_all('div', {'class' : "T_topSection _1p55exct _j65k64 _e21osq _cxftgi _9s1txw"})

    i, j, k = 0, 0, 0
    
    for prod_card in prod_cards:
        link = "housing.com" + prod_card.find('div' , {'class':'infoTopContainer'}).find('a')['href']
        name = prod_card.find('div' , {'class':'infoTopContainer'}).find('a').text
        sub_name = prod_card.find('div' , {'class':'T_subtitleContainer _h3ftgi T_arrangeElementsInLine _cxftgi _0h1h6o _fcv2br _9s1txw'}).text
        i += name.strip()[0].isalpha()
        j += name.strip()[0].isdecimal()
        k += 1

    print("File Name : " , file_name)
    print("Property Type 1 : " , j , "Percentage of total listings : " , round((j/len(prod_cards))*100 , 2))
    print("Property Type 2 : " , i , "Percentage of total listings : " , round((i/len(prod_cards))*100 , 2))
    print("Total Property Listings : ", i+j)

File Name :  housing_flat_data_gurgaon_1.html
Property Type 1 :  1579 Percentage of total listings :  84.85
Property Type 2 :  282 Percentage of total listings :  15.15
Total Property Listings :  1861
File Name :  housing_flat_data_gurgaon_2.html
Property Type 1 :  1174 Percentage of total listings :  77.59
Property Type 2 :  339 Percentage of total listings :  22.41
Total Property Listings :  1513
File Name :  housing_flat_data_gurgaon_3.html
Property Type 1 :  1135 Percentage of total listings :  75.62
Property Type 2 :  366 Percentage of total listings :  24.38
Total Property Listings :  1501
File Name :  housing_flat_data_gurgaon_4.html
Property Type 1 :  1215 Percentage of total listings :  76.8
Property Type 2 :  367 Percentage of total listings :  23.2
Total Property Listings :  1582
File Name :  housing_flat_data_gurgaon_5.html
Property Type 1 :  1273 Percentage of total listings :  74.97
Property Type 2 :  425 Percentage of total listings :  25.03
Total Property Listings :  1698

---
#### SUMMARY: Property listing data from six HTML files:

| File Name                     | Total Listings | Type 1 Count (%) | Type 2 Count (%) |
| ----------------------------- | -------------- | ---------------- | ---------------- |
| `housing_flat_data_gurgaon_1` | 1861           | 1579 (84.85%)    | 282 (15.15%)     |
| `housing_flat_data_gurgaon_2` | 1513           | 1174 (77.59%)    | 339 (22.41%)     |
| `housing_flat_data_gurgaon_3` | 1501           | 1135 (75.62%)    | 366 (24.38%)     |
| `housing_flat_data_gurgaon_4` | 1582           | 1215 (76.80%)    | 367 (23.20%)     |
| `housing_flat_data_gurgaon_5` | 1698           | 1273 (74.97%)    | 425 (25.03%)     |
| `housing_flat_data_gurgaon_6` | 1185           | 774 (65.32%)     | 411 (34.68%)     |

---

### 📊 Overall Totals:

* **Total Listings Across All Files**: `9340`
* **Total Property Type 1 Listings**: `7150` → **76.55%**
* **Total Property Type 2 Listings**: `2190` → **23.45%**
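
The overall totals can be recomputed from the per-file counts in the summary table. The short check below copies those counts by hand; the variable names are chosen for this example.

```python
# (Type 1 count, Type 2 count) per file, copied from the summary table.
counts = {
    "housing_flat_data_gurgaon_1": (1579, 282),
    "housing_flat_data_gurgaon_2": (1174, 339),
    "housing_flat_data_gurgaon_3": (1135, 366),
    "housing_flat_data_gurgaon_4": (1215, 367),
    "housing_flat_data_gurgaon_5": (1273, 425),
    "housing_flat_data_gurgaon_6": (774, 411),
}

total_type1 = sum(type1 for type1, _ in counts.values())
total_type2 = sum(type2 for _, type2 in counts.values())
total = total_type1 + total_type2

share_type1 = round(total_type1 / total * 100, 2)
share_type2 = round(total_type2 / total * 100, 2)

print("Total listings :", total)
print("Type 1 :", total_type1, f"({share_type1}%)")
print("Type 2 :", total_type2, f"({share_type2}%)")
```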

---
---

## Let's Separate these two Property Types and save them in different files:

In [64]:
## Separating Property Listings Data from 6 HTML Files - based on 'name' of product card:
from bs4 import BeautifulSoup

## Creating two separate files:
with open('housing_flat_data_gurgaon_final(number).html', 'w', encoding = 'utf-8') as file:
        pass
with open('housing_flat_data_gurgaon_final(alphabet).html', 'w', encoding = 'utf-8') as file:
        pass


soup = None
for file_number in range(1,7):
    k = 0
    
    file_name = 'housing_flat_data_gurgaon_' + str(file_number) + '.html'
    with open(file_name, 'r', encoding = 'utf-8') as file:
        soup = BeautifulSoup(file.read(), 'html.parser')   ## Explicit parser avoids the "no parser specified" warning
    
    prod_cards = soup.find_all('div', {'class' : "T_topSection _1p55exct _j65k64 _e21osq _cxftgi _9s1txw"})
    
    for prod_card in prod_cards:
        link = "housing.com" + prod_card.find('div' , {'class':'infoTopContainer'}).find('a')['href']
        name = prod_card.find('div' , {'class':'infoTopContainer'}).find('a').text
        sub_name = prod_card.find('div' , {'class':'T_subtitleContainer _h3ftgi T_arrangeElementsInLine _cxftgi _0h1h6o _fcv2br _9s1txw'}).text

        if name.strip()[0].isalpha():
            with open('housing_flat_data_gurgaon_final(alphabet).html', 'a', encoding = 'utf-8') as file:
                data = f'<div> <h1>{link}</h1> <h2>{name}</h2> <a>{sub_name}</a> </div>'
                file.write(data)
        
        elif name.strip()[0].isdecimal():
            with open('housing_flat_data_gurgaon_final(number).html', 'a', encoding = 'utf-8') as file:
                data = f'<div> <h1>{link}</h1> <h2>{name}</h2> <a>{sub_name}</a> </div>'
                file.write(data)
        
        else:
            k += 1

    print(file_name , '--> Number of Listings with Property Type different from two identified :', k)

housing_flat_data_gurgaon_1.html -----> 0
housing_flat_data_gurgaon_2.html -----> 0
housing_flat_data_gurgaon_3.html -----> 0
housing_flat_data_gurgaon_4.html -----> 0
housing_flat_data_gurgaon_5.html -----> 0
housing_flat_data_gurgaon_6.html -----> 0


#### `Now, we are all set to scrape data for Individual Property Listings!`
---

### Python Script for Scraping Listings Data for Property Type 1:

In [3]:
## Importing necessary tools:
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import os
import time

In [None]:
## What is "headers = {...}" (defined inside the loop below)?
## It is a dictionary of HTTP request headers, including a 'user-agent' key.
## The user agent string tells the server which browser and operating system
## the request claims to come from.
## Some websites serve different content based on the user agent, or block certain user agents
## (often to prevent scraping). Sending a common browser's user agent string makes the request
## resemble a real browser request, which may avoid blocks and return the content a real user would see.
with open('housing_flat_data_gurgaon_final(number).html', 'r', encoding = 'utf-8') as file:
    soup = BeautifulSoup(file.read(), 'html.parser')

## Creating Empty Dataframe to append records:
df = pd.DataFrame()

links = []
for i in soup.find_all('h1'):
    links.append(i.text)

del soup   ## Since no more use....

## Sample links for testing:
#url = 'https://housing.com/in/buy/resale/page/17535654-2-bhk-apartment-in-sector-68-for-rs-15500000'
#url = 'https://housing.com/in/buy/resale/page/17458460-2-bhk-apartment-in-sector-65-for-rs-23900000'
#url = 'https://housing.com/in/buy/resale/page/17898461-2.5-bhk-apartment-in-sector-61-for-rs-17500000'

len_links = len(links)

k = 0   ## Start from....
for j in range(k , len_links):
    url = "https://" + links[j]
    
    headers = {
        'authority': 'www.housing.com',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'accept-language': 'en-US,en;q=0.9',
        'cache-control': 'no-cache',
        'dnt': '1',
        'pragma': 'no-cache',
        'referer': url,
        'sec-ch-ua': '"Chromium";v="107", "Not;A=Brand";v="8"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36',
    }
    
    page = requests.get(url, headers=headers)
    pageSoup = BeautifulSoup(page.content, 'html.parser')

    print(url)
    
    ## Parsing Data from HTML:
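    try:
        Name = pageSoup.find('div', {'class':'css-164r41r'}).find('div',{'class':'css-v78wxh'}).find('div', {'class':'css-js5v7e'}).text
    except AttributeError:
        ## find() returned None, so the expected layout is missing on this page: skip this link.
        continue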
    
    Address = pageSoup.find('div', {'class':'css-164r41r'}).find('div',{'class':'css-v78wxh'}).find('div',{'class':'css-1ty5xzi'}).text
    
    try:
        Seller_Builder = pageSoup.find('div', {'class':'css-164r41r'}).find('div',{'class':'css-v78wxh'}).find('div',{'class':'css-gdymlq'}).find('a').text
    except AttributeError:
        ## Seller/Builder block not present on this listing.
        Seller_Builder = np.nan
    
    EMI = pageSoup.find('div', {'class':'css-164r41r'}).find('div',{'class':'css-pjrxll'}).find('div',{'class':'css-1nssci'}).find('a').text
    Link = url 
    
    Built_Up_Area, Avg_Price, Age_of_property, Possession_status, Floor, Vaastu, Furnishing = '-', '-', '-', '-', '-', '-', '-'
    for i in pageSoup.find('section', {'class':'css-13dph6'}).find_all('div',{'class':'css-c2zxhw'}):
        item = i.get_text(separator = '->').split('->')
        if item[-1] == 'Built Up Area':
            Built_Up_Area = item[0]
        elif item[-1] == 'Avg. Price':
            Avg_Price = item[0]
        elif item[-1] == 'Age of property':
            Age_of_property = item[0]
        elif item[-1] == 'Possession status':
            Possession_status = item[0]
        elif item[-1] == 'Floor':
            Floor = item[0]
        elif item[-1] == 'Facing':
            Vaastu = item[0]
        elif item[-1] == 'Furnishing':
            Furnishing = item[0]
    
    
    Society, Brokerage, Price, Bedrooms, Bathrooms, Parking, Balcony, Advertised = '-', '-', '-', '-', '-', '-', '-', '-'
    
    for i in pageSoup.find('section', {'class':'T_sectionContainerStyle _gdnqn7od _l8exct T_sectionStyle _12m9n7od _140hexct _1qjb14v0 _1cm9i2wt _rz5zidpf _mkh2mm _2du67f'}).find(
        'table').find_all('tr'):
        item = i.get_text(separator = '->').split('->')
        
        if item[0] == 'Project Name':
            Society = item[1]
        elif item[0] == 'Brokerage':
            Brokerage = item[1]
        elif item[0] == 'Price':
            Price = item[1]
        elif item[0] == 'Bedrooms':
            Bedrooms = item[1]
        elif item[0] == 'Bathrooms':
            Bathrooms = item[1]
        elif item[0] == 'Parking':
            Parking = item[1]
        elif item[0] == 'Balcony':
            Balcony = item[1]
        elif item[0] == 'Added':
            Advertised = item[1]
    
    try:
        Amenities = [i.get_text() for i in pageSoup.find(
            'section', {'class':'T_sectionStyle _l8exct T_amenitiesStyle _6chtauyy'}).find_all(
            'div', {'class':'T_cellStyle _1wbo1osq _5jftgi _3f1aa9 _l8edxx _e2u29b cell'})]
    except AttributeError:
        ## Amenities section not present on this listing.
        Amenities = np.nan
    
    Nearby_landmarks = [i.get_text(separator='--').split('--')[:2] for i in pageSoup.find(
        'section',{'class':'T_sectionStyle _12m9n7od _140hexct _1qjb14v0 _1cm9i2wt _rz5zidpf _mkh2mm _2du67f'}
    ).find('div',{'class':'slider-Wrapper T_sliderWrapper _e21osq _tr161g _uc12yw _mkh2mm'}).find_all(
        'div',{'class':'slide-content T_slideContent _e21osq _vy1osq'})]
    
    
    Prop_description = pageSoup.find('div', {'class':'_l84xfc T_aboutProperty'}).find(
        'div',{'class':'about-text T_collapsedStyle _ks15vq _iy1wqb T_textStyle _l01wug _g3m2nv _legktf'}).get_text(separator = '--->').split('--->')[-1]
    
    data = {'Name':Name, 'Address':Address, 'Seller_Builder':Seller_Builder,
            'EMI':EMI, 'Built_Up_Area':Built_Up_Area, 'Avg_Price':Avg_Price, 
            'Age_of_property':Age_of_property, 'Possession_status':Possession_status, 
            'Floor':Floor, 'Vaastu':Vaastu, 'Furnishing':Furnishing, 'Society':Society, 
            'Brokerage':Brokerage, 'Price':Price, 'Bedrooms':Bedrooms, 'Bathrooms':Bathrooms, 
            'Parking':Parking, 'Balcony':Balcony,'Advertised':Advertised,'Amenities':str(Amenities),
            'Nearby_landmarks':str(Nearby_landmarks), 'Prop_description':Prop_description,
           'Link':Link}
    
    df = pd.concat([df,pd.DataFrame(data , index = [0])] , ignore_index = True)
    print(j , " of -> " , len_links)

    del pageSoup, data


## Saving Dataframe to CSV:
## df.to_csv('Housing_Listings_all_records_(numbers)_00000.csv')
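
The long `if`/`elif` chains in the script above each map a label shown on the page to a column name. One possible simplification is a dictionary lookup, sketched below for the overview section. The labels and column names are taken from the script; the names `OVERVIEW_LABELS` and `parse_overview`, the added whitespace stripping, and the sample items are assumptions made for this example.

```python
# Label shown on the page -> column name used in the DataFrame.
OVERVIEW_LABELS = {
    "Built Up Area": "Built_Up_Area",
    "Avg. Price": "Avg_Price",
    "Age of property": "Age_of_property",
    "Possession status": "Possession_status",
    "Floor": "Floor",
    "Facing": "Vaastu",
    "Furnishing": "Furnishing",
}

def parse_overview(items, labels=OVERVIEW_LABELS, missing="-"):
    """Build a record from [value, ..., label] lists.

    Each item mirrors get_text(separator='->').split('->') in the script:
    the value comes first and the label comes last.
    """
    # Start with every column marked as missing, as the script does with '-'.
    record = {column: missing for column in labels.values()}
    for item in items:
        if not item:
            continue
        column = labels.get(item[-1].strip())
        if column is not None:
            record[column] = item[0].strip()
    return record

# Sample items for illustration only.
sample_items = [
    ["593 sq.ft", "Built Up Area"],
    ["12 of 14", "Floor"],
    ["East facing", "Facing"],
    ["ignored", "Some Other Label"],
]
record = parse_overview(sample_items)
print(record)
```

The same pattern would apply to the details table, where the label comes first (`item[0]`) and the value second (`item[1]`).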

---
### **`OUTCOME : `**
- We have created 3 CSV files containing 7,962 property records in total.
- Now we'll merge these 3 files into one single CSV with all records together.

In [173]:
## Three CSV Files are:
df1 = pd.read_csv('Housing_Listings_all_records_(numbers)_1.csv' , index_col = 'Unnamed: 0')
df2 = pd.read_csv('Housing_Listings_all_records_(numbers)_2.csv' , index_col = 'Unnamed: 0')
df3 = pd.read_csv('Housing_Listings_all_records_(numbers)_3.csv' , index_col = 'Unnamed: 0')


print("Shape of 3 DFs : " , df1.shape, df2.shape, df3.shape)
print("Total saved records are : " , df1.shape[0] + df2.shape[0] + df3.shape[0]) 
print("Total Unique Records in File 1 : ", df1.shape[0] - df1.duplicated().sum())
print("Total Unique Records in File 2 : ", df2.shape[0] - df2.duplicated().sum())
print("Total Unique Records in File 3 : ", df3.shape[0] - df3.duplicated().sum())

Shape of 3 DFs :  (5206, 23) (885, 23) (1871, 23)
Total saved records are :  7962
Total Unique Records in File 1 :  4603
Total Unique Records in File 2 :  885
Total Unique Records in File 3 :  1803


#### Merging all three into Single CSV File:

In [195]:
final_df = pd.concat([df1, df2, df3], ignore_index = True)
final_df

Unnamed: 0,Name,Address,Seller_Builder,EMI,Built_Up_Area,Avg_Price,Age_of_property,Possession_status,Floor,Vaastu,...,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
0,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹36.01 K,593 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,12 of 14,East facing,...,68.0 L,2,2,1 Open Parking,1,More than a month ago,"['Lift', 'Power Backup', 'Garden', 'Sports', '...","[['School', ""St. Xavier's High School""], ['Hos...",Looking for a 2 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1761033...
1,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹33.36 K,690 sq.ft,₹9.13 K/sq.ft,1 Years Old,Ready to move,4 of 15,-,...,63.0 L,2,2,No Parking,1,17 days ago,,"[['School', ""St. Xavier's High School""], ['Hos...",Best 2 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1681859...
2,2 BHK Flat,"Experion The Heartsong, Sector 108, Gurgaon",Experion Developers,EMI starts at ₹74.47 K,1000 sq.ft,₹15 K/sq.ft,2 Year Old,Ready to move,9 of 26,North-East facing,...,1.5 Cr,2,2,1 Covered and 1 Open Parking,3,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Gazebo', 'S...","[['School', 'The Shikshiyan School'], ['Hospit...","2 BHK Flat for sale in Sector 108, Gurgaon - c...",https://housing.com/in/buy/resale/page/1767484...
3,3 BHK Flat,"ROF Aalayas 1, Sector 102, Gurgaon",ROF Infratech,EMI starts at ₹39.19 K,645 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,5 of 14,South facing,...,74.0 L,3,2,1 Covered and 1 Open Parking,1,More than a month ago,"['Stove', 'Gas Pipeline', 'Cupboard', 'Pet all...","[['School', 'Delhi Public School'], ['Hospital...","3 BHK Flat for sale in Sector 102, Gurgaon - c...",https://housing.com/in/buy/resale/page/1734127...
4,2 BHK Flat,"Signature Global The Millennia I, Sector 37D, ...",Signature Global Builders Pvt. Ltd.,EMI starts at ₹34.95 K,570 sq.ft,₹11.58 K/sq.ft,4 Year Old,Ready to move,6 of 18,East facing,...,66.0 L,2,2,1 Covered Parking,3,4 days ago,"['Lift', 'Power Backup', 'Intercom', 'Garden',...","[['School', 'Euro International School, Sector...","Property for sale in Sector 37 D, Gurgaon. Thi...",https://housing.com/in/buy/resale/page/1791824...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7957,4 BHK Flat,"Ats marigold, Sector 71, Gurgaon",,EMI starts at ₹2.83 Lacs,3420 sq.ft,₹16.67 K/sq.ft,-,Ready to move,-,-,...,5.7 Cr,4,4,No Parking,No Balcony,More than a month ago,,"[['School', 'The Vivekananda School'], ['Hospi...","Property for sale in Sector 71, Gurgaon. This ...",https://housing.com/in/buy/resale/page/1745346...
7958,4 BHK Flat,"Sector 60, Gurgaon",,EMI starts at ₹2.73 Lacs,2814 sq.ft,₹19.55 K/sq.ft,-,Ready to move,-,-,...,5.5 Cr,4,4,-,-,More than a month ago,,"[['School', 'Unicosmos School'], ['Hospital', ...",Check out this 4 BHK Flat for sale in Sector 6...,https://housing.com/in/buy/resale/page/1768567...
7959,3.5 BHK Flat,"puri residences, Sector 111, Gurgaon",,EMI starts at ₹2.23 Lacs,2440 sq.ft,₹18.44 K/sq.ft,-,Ready to move,-,-,...,4.5 Cr,4,4,-,-,21 days ago,,"[['School', 'Prudence Schools - Top & Best CBS...","3.5 BHK Flat for sale in Sector 17, Gurgaon. T...",https://housing.com/in/buy/resale/page/1777577...
7960,4 BHK Flat,"Intelligentsia Apartment, Sector 49, Gurgaon",,EMI starts at ₹1.09 Lacs,2777 sq.ft,₹7.92 K/sq.ft,-,Ready to move,-,-,...,2.2 Cr,4,4,-,-,24 days ago,,"[['School', 'Unicosmos School'], ['Hospital', ...","A 4 BHK Flat for sale in Sector 49, Gurgaon. P...",https://housing.com/in/buy/resale/page/1775038...


#### Saving in CSV file:

In [197]:
## Dropping Duplicated records:
print("Shape of Final File : " , final_df.shape)
print("Total Unique Records in Final File : ", final_df.shape[0] - final_df.duplicated().sum())
print("Total Duplicated Records in Final File : ", final_df.duplicated().sum())

final_df = final_df[~ final_df.duplicated()].reset_index(drop = True)
final_df.shape

Shape of Final File :  (7962, 23)
Total Unique Records in Final File :  7143
Total Duplicated Records in Final File :  819


(7143, 23)
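
The step above removes rows that are identical in every column. If the same listing was scraped twice at different times, a field such as `Advertised` could differ and the repeat would survive. A stricter alternative, sketched below, keeps one record per `Link`. This assumes that `Link` uniquely identifies a listing, which the notebook does not verify; `dedupe_by_key` and the sample rows are hypothetical.

```python
def dedupe_by_key(records, key="Link"):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique

# Sample rows for illustration only: the first two differ only in 'Advertised'.
rows = [
    {"Name": "2 BHK Flat", "Advertised": "17 days ago", "Link": "https://example.com/a"},
    {"Name": "2 BHK Flat", "Advertised": "18 days ago", "Link": "https://example.com/a"},
    {"Name": "3 BHK Flat", "Advertised": "4 days ago", "Link": "https://example.com/b"},
]
unique_rows = dedupe_by_key(rows)
print(len(rows), "->", len(unique_rows))
```

With pandas, the closest equivalent would be `final_df.drop_duplicates(subset='Link')`.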

In [199]:
## Saving to CSV:
final_df.to_csv('Housing_Listings_all_records_(numbers)_FINAL.csv')

In [200]:
df = pd.read_csv('Housing_Listings_all_records_(numbers)_FINAL.csv')
df

Unnamed: 0.1,Unnamed: 0,Name,Address,Seller_Builder,EMI,Built_Up_Area,Avg_Price,Age_of_property,Possession_status,Floor,...,Price,Bedrooms,Bathrooms,Parking,Balcony,Advertised,Amenities,Nearby_landmarks,Prop_description,Link
0,0,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹36.01 K,593 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,12 of 14,...,68.0 L,2,2,1 Open Parking,1,More than a month ago,"['Lift', 'Power Backup', 'Garden', 'Sports', '...","[['School', ""St. Xavier's High School""], ['Hos...",Looking for a 2 BHK Flat for sale in Gurgaon? ...,https://housing.com/in/buy/resale/page/1761033...
1,1,2 BHK Flat,"Pyramid Elite, Sector 86, Gurgaon",Pyramid Infratech Private Limited,EMI starts at ₹33.36 K,690 sq.ft,₹9.13 K/sq.ft,1 Years Old,Ready to move,4 of 15,...,63.0 L,2,2,No Parking,1,17 days ago,,"[['School', ""St. Xavier's High School""], ['Hos...",Best 2 BHK Flat for modern-day lifestyle is no...,https://housing.com/in/buy/resale/page/1681859...
2,2,2 BHK Flat,"Experion The Heartsong, Sector 108, Gurgaon",Experion Developers,EMI starts at ₹74.47 K,1000 sq.ft,₹15 K/sq.ft,2 Year Old,Ready to move,9 of 26,...,1.5 Cr,2,2,1 Covered and 1 Open Parking,3,More than a month ago,"['Amphitheater', 'Cricket Pitch', 'Gazebo', 'S...","[['School', 'The Shikshiyan School'], ['Hospit...","2 BHK Flat for sale in Sector 108, Gurgaon - c...",https://housing.com/in/buy/resale/page/1767484...
3,3,3 BHK Flat,"ROF Aalayas 1, Sector 102, Gurgaon",ROF Infratech,EMI starts at ₹39.19 K,645 sq.ft,₹11.47 K/sq.ft,1 Years Old,Ready to move,5 of 14,...,74.0 L,3,2,1 Covered and 1 Open Parking,1,More than a month ago,"['Stove', 'Gas Pipeline', 'Cupboard', 'Pet all...","[['School', 'Delhi Public School'], ['Hospital...","3 BHK Flat for sale in Sector 102, Gurgaon - c...",https://housing.com/in/buy/resale/page/1734127...
4,4,2 BHK Flat,"Signature Global The Millennia I, Sector 37D, ...",Signature Global Builders Pvt. Ltd.,EMI starts at ₹34.95 K,570 sq.ft,₹11.58 K/sq.ft,4 Year Old,Ready to move,6 of 18,...,66.0 L,2,2,1 Covered Parking,3,4 days ago,"['Lift', 'Power Backup', 'Intercom', 'Garden',...","[['School', 'Euro International School, Sector...","Property for sale in Sector 37 D, Gurgaon. Thi...",https://housing.com/in/buy/resale/page/1791824...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7138,7138,4 BHK Flat,"Ats marigold, Sector 71, Gurgaon",,EMI starts at ₹2.83 Lacs,3420 sq.ft,₹16.67 K/sq.ft,-,Ready to move,-,...,5.7 Cr,4,4,No Parking,No Balcony,More than a month ago,,"[['School', 'The Vivekananda School'], ['Hospi...","Property for sale in Sector 71, Gurgaon. This ...",https://housing.com/in/buy/resale/page/1745346...
7139,7139,4 BHK Flat,"Sector 60, Gurgaon",,EMI starts at ₹2.73 Lacs,2814 sq.ft,₹19.55 K/sq.ft,-,Ready to move,-,...,5.5 Cr,4,4,-,-,More than a month ago,,"[['School', 'Unicosmos School'], ['Hospital', ...",Check out this 4 BHK Flat for sale in Sector 6...,https://housing.com/in/buy/resale/page/1768567...
7140,7140,3.5 BHK Flat,"puri residences, Sector 111, Gurgaon",,EMI starts at ₹2.23 Lacs,2440 sq.ft,₹18.44 K/sq.ft,-,Ready to move,-,...,4.5 Cr,4,4,-,-,21 days ago,,"[['School', 'Prudence Schools - Top & Best CBS...","3.5 BHK Flat for sale in Sector 17, Gurgaon. T...",https://housing.com/in/buy/resale/page/1777577...
7141,7141,4 BHK Flat,"Intelligentsia Apartment, Sector 49, Gurgaon",,EMI starts at ₹1.09 Lacs,2777 sq.ft,₹7.92 K/sq.ft,-,Ready to move,-,...,2.2 Cr,4,4,-,-,24 days ago,,"[['School', 'Unicosmos School'], ['Hospital', ...","A 4 BHK Flat for sale in Sector 49, Gurgaon. P...",https://housing.com/in/buy/resale/page/1775038...


## **`Conclusion :`**
- Right now, we have gathered data only for Gurgaon, and only for Independent Flat and Apartment type properties.
- Later, we can extend the data to include other property types, such as Independent Plots and Independent Houses/Villas.
- The same can be done for other major Tier 1 cities like 'Hyderabad' and 'Bangalore'.
- For now, we have enough data to start building this project!

---
---
---