# Web Scraping Real Estate Data from MagicBricks

This notebook demonstrates how to scrape real estate property listings from MagicBricks.com using Python. We'll use `Selenium` to automate browser interactions and `BeautifulSoup` to parse the HTML content. The extracted data will then be organized into a Pandas DataFrame and saved as a CSV file.

## 1. Setting Up the Environment and Dependencies

First, we need to import the necessary libraries. If you don't have them installed, you can install them using `pip`:
`pip install selenium beautifulsoup4 pandas`

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import time

# selenium: Automates web browser interactions (Options configures Chrome; Service can point to a specific chromedriver).
# BeautifulSoup: Parses HTML content for data extraction.
# pandas: Used for data manipulation (DataFrames) and CSV export.
# time: Provides a simple way to pause script execution.

## 2. Initializing the Web Driver and Navigating to the URL

This section configures Chrome for automation (a maximized window, plus a flag that makes the automated browser less obvious to bot-detection scripts) and then navigates to the target URL: apartment and builder-floor listings for sale in Delhi.

In [None]:
# Setup Chrome options
options = Options()
options.add_argument("--start-maximized")
options.add_argument("--disable-blink-features=AutomationControlled")

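# Initialize the Chrome WebDriver.
# Selenium 4.6+ locates a matching chromedriver automatically (Selenium Manager).
# On older versions, put chromedriver on your PATH or pass its location explicitly:
# driver = webdriver.Chrome(service=Service("/path/to/chromedriver"), options=options)
driver = webdriver.Chrome(options=options)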

# URL for property listings in Delhi
url = "https://www.magicbricks.com/property-for-sale/residential-real-estate?proptype=Multistorey-Apartment,Builder-Floor-Apartment&cityName=Delhi"
driver.get(url)

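# Give the JavaScript-rendered listings time to load. A fixed pause is simple but
# fragile: on a slow connection, 5 seconds may not be enough.
time.sleep(5)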
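
The fixed `time.sleep(5)` always waits the full five seconds and can still be too short. An explicit wait instead polls until a condition holds or a timeout expires. The sketch below shows the idea in plain Python; `wait_until` and `cards_loaded` are hypothetical helpers written only for this illustration, and the "page" is simulated so the code runs without a browser.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns something truthy, or raise TimeoutError.

    Hypothetical helper illustrating an explicit wait; it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)

# Simulated page: the listing cards "appear" on the third check.
checks = {"count": 0}

def cards_loaded():
    checks["count"] += 1
    return ["card"] * 3 if checks["count"] >= 3 else []

loaded = wait_until(cards_loaded, timeout=5, poll=0.01)
print(len(loaded), checks["count"])  # 3 3
```

Selenium ships this pattern as `WebDriverWait`, for example `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.m-srp-card__container")))`, after importing `WebDriverWait` from `selenium.webdriver.support.ui`, `expected_conditions as EC` from `selenium.webdriver.support`, and `By` from `selenium.webdriver.common.by`. The CSS selector assumes the card class used in section 4.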

## 3. Parsing the Page Content and Quitting the Driver

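After the page loads, BeautifulSoup parses the rendered HTML from `driver.page_source`, which includes content added by JavaScript. The browser is then closed to free up resources. From here on, the script works only with this HTML snapshot, so listings that had not loaded at that moment are not captured.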

In [None]:
# Get the page source and parse with BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")

# Close the browser
driver.quit()
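
To see what the `soup` object offers, the sketch below parses a small, hypothetical HTML fragment that imitates one listing card (the real MagicBricks markup is larger and may differ). It shows `find_all` returning the cards, `find` returning `None` for an absent element, and how `get_text(strip=True)` glues adjacent text fragments together while `separator="|"` keeps them apart, which is why the description block in section 4 is split on `"|"`. The `sample_` names are chosen so the demo does not overwrite the notebook's real `soup`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating one listing card; real MagicBricks markup is larger and may differ.
sample_html = """
<div class="m-srp-card__container">
  <h2 class="m-srp-card__title">3 BHK Flat for Sale in Dwarka, New Delhi</h2>
  <div class="m-srp-card__price">₹1.2 Cr</div>
  <div class="m-srp-card__summary__list__item"><span>Carpet Area</span> <span>1,250 sqft</span></div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_cards = sample_soup.find_all("div", class_="m-srp-card__container")
sample_card = sample_cards[0]

title = sample_card.find("h2", class_="m-srp-card__title").get_text(strip=True)
item = sample_card.find("div", class_="m-srp-card__summary__list__item")
joined = item.get_text(strip=True)                     # fragments glued together
separated = item.get_text(separator="|", strip=True)   # fragments kept apart
missing = sample_card.find("div", class_="m-srp-card__location")  # no such element

print(len(sample_cards))  # 1
print(title)              # 3 BHK Flat for Sale in Dwarka, New Delhi
print(joined)             # Carpet Area1,250 sqft
print(separated)          # Carpet Area|1,250 sqft
print(missing)            # None
```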

## 4. Extracting Property Data

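This is the core scraping logic: it iterates over each property listing card and extracts its details. Optional fields fall back to empty strings. If a required element (title, price, or location) is missing, `find()` returns `None`, the call to `.get_text()` raises an `AttributeError`, and the `except` block reports the error and skips that card. The CSS class names reflect MagicBricks' markup at the time of writing and may need updating if the site changes.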
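
In the loop, a missing title, price, or location makes `.get_text()` fail on `None`, and the whole card is dropped. One alternative is to treat every field as optional. The hypothetical helper `text_or_empty` below (not part of BeautifulSoup) returns an empty string when an element is absent; the sample card is made up for the demonstration.

```python
from bs4 import BeautifulSoup

def text_or_empty(parent, tag, class_name):
    """Return the stripped text of the first matching descendant, or "" if there is none.

    Hypothetical helper, not part of BeautifulSoup.
    """
    element = parent.find(tag, class_=class_name)
    return element.get_text(strip=True) if element is not None else ""

# Hypothetical card that has a price but no location element.
demo_card = BeautifulSoup(
    '<div class="m-srp-card__container">'
    '<div class="m-srp-card__price">₹85 Lac</div>'
    "</div>",
    "html.parser",
).find("div", class_="m-srp-card__container")

demo_price = text_or_empty(demo_card, "div", "m-srp-card__price")
demo_location = text_or_empty(demo_card, "div", "m-srp-card__location")
print(repr(demo_price), repr(demo_location))  # '₹85 Lac' ''
```

Whether a listing with, say, no location should be kept with a blank field or dropped, as the loop does now, is a data-quality decision; the helper only makes the first option easy.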

In [None]:
# Find all property cards
cards = soup.find_all("div", class_="m-srp-card__container")

# List to store extracted data
data = []

# Loop through each property card
for card in cards:
    try:
        # Extract basic information
        name = card.find("h2", class_="m-srp-card__title").get_text(strip=True)
        price = card.find("div", class_="m-srp-card__price").get_text(strip=True)
        
        rate_element = card.find("div", class_="m-srp-card__price--size")
        rate = rate_element.get_text(strip=True) if rate_element else ""
        
        location = card.find("div", class_="m-srp-card__location").get_text(strip=True)
        
        # Extract the summary items. Fields are assigned by position, so this assumes
        # the card lists bedrooms, carpet area, status and floor in that order.
        features = card.find_all("div", class_="m-srp-card__summary__list__item")
        feature_texts = [f.get_text(strip=True) for f in features] + [""] * 4  # pad missing items
        bedroom, carpet_area, status, floor = feature_texts[:4]

        # Extract the description block, splitting its text fragments on "|".
        # As above, the seven fields are assigned by position.
        desc_block = card.find("div", class_="m-srp-card__desc")
        desc_text = desc_block.get_text(separator="|", strip=True) if desc_block else ""
        desc_parts = desc_text.split("|") + [""] * 7  # pad missing parts
        (transaction, facing, overlooking, ownership,
         parking, bathroom, balcony) = desc_parts[:7]
        
        city = "Delhi"  # matches cityName=Delhi in the search URL

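        # Keep only the digits of the area and rate strings (e.g. "1,250 sqft" -> "1250").
        # Note: this also drops decimal points, so "1250.5 sqft" would become "12505".
        carpet_area_sqft = ''.join(filter(str.isdigit, carpet_area))
        rate_per_sqft = ''.join(filter(str.isdigit, rate))
        total_area = ""  # placeholder column; not populated by this scraper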

        # Append data to the list
        data.append({
            "Name": name,
            "Price": price,
            "Rate": rate,
            "Property": name.split('Flat for Sale')[0].strip() if "Flat for Sale" in name else "",
            "Carpet Area": carpet_area,
            "Status": status,
            "Floor": floor,
            "Transaction": transaction,
            "Facing": facing,
            "Overlooking": overlooking,
            "Ownership": ownership,
            "Parking": parking,
            "Bathroom": bathroom,
            "Balcony": balcony,
            "City": city,
            "Location": location,
            "Rate_per_sqft": rate_per_sqft,
            "Bedroom": bedroom.split(" ")[0] if bedroom else "",
            "Carpet_area_sqft": carpet_area_sqft,
            "Total_area": total_area
        })
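    except Exception as e:
        # Most often a required element (title, price or location) was missing:
        # find() returned None, so .get_text() raised. Report it and skip this card.
        print(f"Error parsing property: {e}")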
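
The digit filter used for `Carpet_area_sqft` and `Rate_per_sqft` concatenates every digit in the string, so a decimal value such as `1250.5 sqft` becomes `12505`. A regular-expression parser that removes thousands separators and keeps the first decimal number avoids this. `parse_number` is a hypothetical helper and the sample strings are invented; units (sqft vs. sq. yd., Lac vs. Cr) are not converted, so a price such as `1.2 Cr` would still need separate handling.

```python
import re

def parse_number(text):
    """Return the first number in `text` as a float, or None if there is none.

    Hypothetical helper: commas are removed first so "1,250" reads as 1250,
    and a decimal point is kept. Units are left unconverted.
    """
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None

samples = ["Carpet Area1,250 sqft", "₹12,500 per sqft", "1250.5 sqft", ""]

print([parse_number(s) for s in samples])
# [1250.0, 12500.0, 1250.5, None]

# The digit filter from the loop, for comparison:
print(["".join(filter(str.isdigit, s)) for s in samples])
# ['1250', '12500', '12505', '']
```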

## 5. Creating a Pandas DataFrame and Saving to CSV

Finally, the collected data is converted into a Pandas DataFrame and saved as a CSV file.

In [None]:
# Convert data list to DataFrame
df = pd.DataFrame(data)

# Save DataFrame to CSV
df.to_csv("magicbricks_full_data.csv", index=False)

print(f"Scraping complete. {len(df)} listings saved to magicbricks_full_data.csv")
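
In the DataFrame built above, every column holds strings, including `Rate_per_sqft` and `Carpet_area_sqft`, with `""` marking missing values. A minimal follow-up sketch converts those columns to numbers with `pd.to_numeric(..., errors="coerce")`, which turns empty strings into `NaN` rather than raising, and writes the file with `encoding="utf-8-sig"`; the byte-order mark helps Excel recognize UTF-8 and display characters such as `₹`. The rows and the temporary file path are made up for the example, and the `sample_` names avoid overwriting the real `df`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows shaped like the scraper's output: all strings, "" for missing values.
sample_rows = [
    {"Name": "3 BHK Flat for Sale in Dwarka", "Rate_per_sqft": "12500", "Carpet_area_sqft": "1250"},
    {"Name": "2 BHK Flat for Sale in Rohini", "Rate_per_sqft": "", "Carpet_area_sqft": "900"},
]
sample_df = pd.DataFrame(sample_rows)

# errors="coerce" turns unparseable values such as "" into NaN instead of raising.
for col in ["Rate_per_sqft", "Carpet_area_sqft"]:
    sample_df[col] = pd.to_numeric(sample_df[col], errors="coerce")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "magicbricks_sample.csv")
    # "utf-8-sig" prepends a byte-order mark so Excel detects the UTF-8 encoding.
    sample_df.to_csv(path, index=False, encoding="utf-8-sig")
    reloaded = pd.read_csv(path, encoding="utf-8-sig")

print(reloaded.dtypes)
```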