Part 1: Setup and Initialization

This section focuses on setting up the web scraping environment and initializing the project. It involves:

~Importing the necessary libraries required for data extraction and analysis.

~Creating the project structure within a Jupyter Notebook (.ipynb) for better organization and reproducibility.

~Sending a test HTTP request to the Cars24 website to verify connectivity and ensure that the website can be accessed successfully.

This initial step matters because it confirms that the libraries load and the site is reachable before any scraping begins, giving every team member the same working foundation for the rest of the project workflow.

In [1]:
# Step 1: Importing Required Libraries

import requests                       # For sending HTTP requests
from bs4 import BeautifulSoup         # For parsing HTML content
import pandas as pd                   # For data manipulation and analysis
import os                             # For creating project structure

print("Libraries imported successfully!")

Libraries imported successfully!


In [2]:
# Sending a test HTTP request to verify connectivity

test_url = "https://www.cars24.com/buy-used-hyundai-cars-mumbai/?sort=bestmatch&serveWarrantyCount=true&listingSource=Homepage_Filters"

try:
    response = requests.get(test_url, timeout=10)    # Give up after 10 seconds instead of hanging
    if response.status_code == 200:                  # 200 means the request was successful
        print("Successfully connected to the Cars24 website.")
    else:                                            # Any other status code is reported as a failure
        print(f"Failed to connect to the Cars24 website. Status code: {response.status_code}")
except requests.exceptions.RequestException as e:    # Connection errors, timeouts, invalid URLs, etc.
    print(f"An error occurred while trying to connect to the Cars24 website: {e}")


Successfully connected to the Cars24 website.


In [3]:
# Creating project structure

project_dir = "cars24_hyundai_mumbai"              # name of the project folder
if not os.path.exists(project_dir):                # Check if the directory already exists
    os.makedirs(project_dir)                       # If not, create the directory
    print(f"Project directory '{project_dir}' created successfully.")
else:
    print(f"Project directory '{project_dir}' already exists.")                # Printing a message if it already exists

Project directory 'cars24_hyundai_mumbai' created successfully.
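
The existence check in the cell above can also be folded into the call itself. The following is a minimal sketch, assuming hypothetical `raw` and `cleaned` subfolders that are not part of the original project. `os.makedirs(..., exist_ok=True)` creates any missing folders and does nothing when they already exist, so the cell can be re-run safely.

```python
import os
import tempfile


def ensure_project_dirs(base_dir, subdirs=("raw", "cleaned")):
    """Create base_dir and the given subfolders; return the subfolder paths.

    The subfolder names are illustrative assumptions, not part of the
    original notebook.
    """
    os.makedirs(base_dir, exist_ok=True)
    paths = [os.path.join(base_dir, name) for name in subdirs]
    for path in paths:
        # exist_ok=True makes the call safe to repeat on later notebook runs
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstrated inside a temporary folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "cars24_hyundai_mumbai")
    created = ensure_project_dirs(demo_dir)
    ensure_project_dirs(demo_dir)  # second call finds the folders and changes nothing
    all_exist = all(os.path.isdir(p) for p in created)
    print(f"Created {len(created)} subfolders, all present: {all_exist}")
```

In the notebook itself, the call would be `ensure_project_dirs(project_dir)` with the real project folder name.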


In [None]:
# Step 3: Data Extraction
# Loads the listing page with Selenium and parses the car cards with BeautifulSoup
r"""
Web Scraping Used Hyundai Cars from Cars24 (Mumbai) using Selenium + BeautifulSoup

Requirements:
1. Install Python 3.x
2. Install necessary libraries:
   pip install selenium beautifulsoup4

3. Download ChromeDriver:
   - Go to https://chromedriver.chromium.org/downloads
   - Choose the version that matches your Chrome browser
   - Extract and save the chromedriver.exe somewhere (e.g., C:\chromedriver\chromedriver.exe)

4. Update the 'chrome_service' path below to your ChromeDriver location.

Notes / Tips:
   - ChromeDriver path must be updated in the script.
   - The Chrome browser must remain open while the script scrolls; otherwise, some cars may not load.
   - Slow scrolling is necessary because Cars24 dynamically loads cars as you scroll.
   - The script safely handles missing data using conditional checks.
   - You can later export the 'all_car_data' list to CSV or JSON if needed.
"""

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

# Update this path to your own ChromeDriver location (raw string, so single backslashes are enough)
chrome_service = Service(r"C:\Users\PUNIT AYARE\Downloads\chromedriver-win64\chromedriver-win64\chromedriver.exe")
driver = webdriver.Chrome(service=chrome_service)

all_car_data = []


driver.get(test_url)
time.sleep(5)  # initial wait


print("🖱️ Slowly scrolling to load all cars...")

scroll_pause = 1.5
scroll_increment = 400
all_cards_count = 0
current_height = 0

while True:
    driver.execute_script(f"window.scrollTo(0, {current_height});")
    time.sleep(scroll_pause)
    current_height += scroll_increment
    
    # Count how many car cards are loaded
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    car_cards = soup.find_all('div', class_='styles_normalCardWrapper__qDZjq')
    
    if len(car_cards) > all_cards_count:
        all_cards_count = len(car_cards)
    else:
        # If no new cards loaded after scroll, check if we reached bottom
        page_height = driver.execute_script("return document.body.scrollHeight")
        if current_height >= page_height:
            break  # all cars loaded

time.sleep(2)  # extra wait for last cards
print(f"✅ Finished scrolling. Total cards loaded: {len(car_cards)}")

# --- Parse page ---
soup = BeautifulSoup(driver.page_source, 'html.parser')
car_cards = soup.find_all('div', class_='styles_normalCardWrapper__qDZjq')
print(f"Found {len(car_cards)} cars on the page")

for card in car_cards:
    try:
        # Name
        car_name_tag = card.find('span', class_='sc-braxZu kjFjan')
        car_name = car_name_tag.text.strip() if car_name_tag else None

        # Variant
        variant_info = card.find('span', class_='sc-braxZu lmmumg')
        variant = variant_info.text.strip() if variant_info else None

        # Kilometers, fuel type, and transmission share one class and appear in that order
        spec_tags = card.find_all('p', class_='sc-braxZu kvfdZL')
        kilometers = spec_tags[0].text.strip() if len(spec_tags) > 0 else None
        fuel = spec_tags[1].text.strip() if len(spec_tags) > 1 else None
        transmission = spec_tags[2].text.strip() if len(spec_tags) > 2 else None

        # EMI: the first <p> whose text contains the rupee sign (stored under "EMI" below)
        price_tag = card.find('p', string=lambda t: t and "₹" in t)
        price = price_tag.text.strip() if price_tag else None

        # Listed prices: the <p> tags inside the price wrapper div
        price_block = card.find('div', class_='styles_priceWrap__VwWBV')
        if price_block:
            price_tags = price_block.find_all('p')
            price_display = price_tags[0].text.strip() if len(price_tags) > 0 else None
            price_lakh = price_tags[1].text.strip() if len(price_tags) > 1 else None
        else:
            price_display = None
            price_lakh = None

        car_data = {
            "Car_Name": car_name,
            "Variant": variant,
            "Kilometers": kilometers,
            "Fuel_Type": fuel,
            "Transmission": transmission,
            "EMI": price,
            "Price_original": price_display,
            "Price": price_lakh
        }

        all_car_data.append(car_data)
    except Exception as e:
        print(f"❌ Failed to extract a car card: {e}")

driver.quit()

print(f"\n✅ Total cars scraped: {len(all_car_data)}")
for car in all_car_data:
    print(car)


🖱️ Slowly scrolling to load all cars...
✅ Finished scrolling. Total cards loaded: 436
Found 436 cars on the page

✅ Total cars scraped: 436
{'Car_Name': '2016 Hyundai Grand i10', 'Variant': 'SPORTZ 1.2 KAPPA VTVT', 'Kilometers': '34.11k km', 'Fuel_Type': 'Petrol', 'Transmission': 'Manual', 'EMI': 'EMI ₹6,549/m*', 'Price_original': '₹3.58L', 'Price': '₹3.35 lakh'}
{'Car_Name': '2018 Hyundai Verna', 'Variant': '1.6 VTVT SX (O) AT', 'Kilometers': '30.23k km', 'Fuel_Type': 'Petrol', 'Transmission': 'Auto', 'EMI': 'EMI ₹13,064/m*', 'Price_original': '₹7.83L', 'Price': '₹6.68 lakh'}
{'Car_Name': '2018 Hyundai Grand i10', 'Variant': 'SPORTZ 1.2 KAPPA VTVT', 'Kilometers': '63.23k km', 'Fuel_Type': 'Petrol', 'Transmission': 'Manual', 'EMI': 'EMI ₹6,843/m*', 'Price_original': '₹3.85L', 'Price': '₹3.50 lakh'}
{'Car_Name': '2014 Hyundai Xcent', 'Variant': 'SX 1.2', 'Kilometers': '96.90k km', 'Fuel_Type': 'Petrol', 'Transmission': 'Manual', 'EMI': 'EMI ₹7,900/m*', 'Price_original': '₹3.17L', 'Price
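
The notes in the extraction cell mention exporting `all_car_data` to CSV or JSON. A minimal sketch of that step, using only the standard library, might look like the following. The function name `export_records` and the file names are assumptions for illustration, and the two sample rows are copied from the output above.

```python
import csv
import json
import os
import tempfile


def export_records(records, csv_path, json_path):
    """Write a list of flat dictionaries to a CSV file and a JSON file."""
    if not records:
        raise ValueError("No records to export")
    fieldnames = list(records[0].keys())
    # utf-8-sig writes a byte-order mark, which helps Excel detect the encoding
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    # ensure_ascii=False keeps the rupee sign as-is instead of escaping it
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)


# Two rows copied from the scraper output, standing in for all_car_data
sample_rows = [
    {"Car_Name": "2016 Hyundai Grand i10", "Variant": "SPORTZ 1.2 KAPPA VTVT",
     "Kilometers": "34.11k km", "Fuel_Type": "Petrol", "Transmission": "Manual",
     "EMI": "EMI ₹6,549/m*", "Price_original": "₹3.58L", "Price": "₹3.35 lakh"},
    {"Car_Name": "2018 Hyundai Verna", "Variant": "1.6 VTVT SX (O) AT",
     "Kilometers": "30.23k km", "Fuel_Type": "Petrol", "Transmission": "Auto",
     "EMI": "EMI ₹13,064/m*", "Price_original": "₹7.83L", "Price": "₹6.68 lakh"},
]

# Write to a temporary folder and read both files back to confirm the round trip
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "hyundai_mumbai.csv")    # hypothetical file name
    json_path = os.path.join(tmp, "hyundai_mumbai.json")  # hypothetical file name
    export_records(sample_rows, csv_path, json_path)
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        csv_rows = [dict(row) for row in csv.DictReader(f)]
    with open(json_path, encoding="utf-8") as f:
        json_rows = json.load(f)

print(f"CSV rows: {len(csv_rows)}, JSON rows: {len(json_rows)}")
```

In the notebook, `export_records(all_car_data, ...)` would be called with paths inside the project folder created earlier.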

In [None]:
# Step 4: Data Cleaning
# To be completed by the next team members
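
As a possible starting point for this step: the scraped values are all strings, such as '34.11k km', '₹3.35 lakh', and 'EMI ₹6,549/m*'. The sketch below shows one way to convert them to integers. The helper names and regular expressions are assumptions based only on the sample rows printed by the extraction cell, so other formats on the site may need extra handling.

```python
import re
from decimal import Decimal

# A number with optional thousands separators and an optional decimal part
NUMBER = r"(\d[\d,]*(?:\.\d+)?)"


def _to_decimal(text):
    """Strip thousands separators and convert to Decimal for exact arithmetic."""
    return Decimal(text.replace(",", ""))


def parse_kilometers(text):
    """Convert '34.11k km' to 34110; return None if the text does not match."""
    match = re.search(rf"{NUMBER}\s*(k)?\s*km", text or "", re.IGNORECASE)
    if not match:
        return None
    value = _to_decimal(match.group(1))
    if match.group(2):  # a 'k' suffix means thousands
        value *= 1000
    return int(value)


def parse_price_lakh(text):
    """Convert '₹3.35 lakh' or '₹3.58L' to rupees (1 lakh = 100,000)."""
    match = re.search(rf"₹\s*{NUMBER}\s*(?:lakh|L)\b", text or "", re.IGNORECASE)
    if not match:
        return None
    return int(_to_decimal(match.group(1)) * 100000)


def parse_emi(text):
    """Convert 'EMI ₹6,549/m*' to 6549 rupees per month."""
    match = re.search(rf"₹\s*{NUMBER}", text or "")
    if not match:
        return None
    return int(_to_decimal(match.group(1)))


print(parse_kilometers("34.11k km"))
print(parse_price_lakh("₹3.35 lakh"))
print(parse_emi("EMI ₹6,549/m*"))
```

`Decimal` is used instead of `float` so that values like 34.11 multiply out exactly. Missing fields (stored as `None` by the scraper) pass through as `None`.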
