Part 1: Setup and Initialization

This section focuses on setting up the web scraping environment and initializing the project. It involves:

~Importing the necessary libraries required for data extraction and analysis.

~Creating the project structure within a Jupyter Notebook (.ipynb) for better organization and reproducibility.

~Sending a test HTTP request to the Cars24 website to verify connectivity and ensure that the website can be accessed successfully.

This initial step is crucial because it gives the rest of the team a verified, consistent starting point: once the libraries load, the website responds, and the project folder exists, the later steps can build on the same setup without rework.

In [4]:
# Step 1: Importing Required Libraries

import requests                  # For sending HTTP requests
from bs4 import BeautifulSoup    # For parsing HTML content
import pandas as pd              # For data manipulation and analysis
import os                        # For creating the project structure

print("Libraries imported successfully!")

Libraries imported successfully!


In [8]:
# Step 2: Sending a test HTTP request to verify connectivity

test_url = "https://www.cars24.com/buy-used-hyundai-cars-mumbai/?sort=bestmatch&serveWarrantyCount=true&listingSource=Homepage_Filters"

try:
    # A timeout keeps the cell from hanging if the server never responds
    response = requests.get(test_url, timeout=10)
    if response.status_code == 200:     # 200 means the request was successful
        print("Successfully connected to the Cars24 website.")
    else:                               # Any other status code is reported as a failure
        print(f"Failed to connect to the Cars24 website. Status code: {response.status_code}")
except requests.exceptions.RequestException as e:    # Connection errors, timeouts, invalid URLs
    print(f"An error occurred while trying to connect to the Cars24 website: {e}")


Successfully connected to the Cars24 website.
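
The connectivity check above can also be wrapped in a small reusable function, so later cells can call it for other listing pages. The following is only a sketch under stated assumptions: the name `check_connectivity`, the `DEFAULT_HEADERS` value, and the injectable `get` parameter are hypothetical and not part of the original notebook, and sending a User-Agent header is a common scraping practice rather than a confirmed Cars24 requirement.

```python
import requests

# Hypothetical header value for illustration; not taken from the original notebook.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; cars24-scraper-test)"}


def check_connectivity(url, timeout=10, get=requests.get):
    """Send one GET request to `url` and return (ok, detail).

    `get` defaults to requests.get; it is a parameter only so the
    function can be exercised without real network access.
    """
    try:
        response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts, and malformed URLs
        return False, f"request failed: {exc}"
    if response.status_code == 200:
        return True, "status 200"
    # Any non-200 status is treated as a failed check
    return False, f"status {response.status_code}"
```

In the notebook this could be called as `ok, detail = check_connectivity(test_url)`, printing `detail` when `ok` is `False`.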


In [9]:
# Step 3: Creating project structure

project_dir = "cars24_hyundai_mumbai"              # name of the project folder
if not os.path.exists(project_dir):                # Check if the directory already exists
    os.makedirs(project_dir)                       # If not, create the directory
    print(f"Project directory '{project_dir}' created successfully.")
else:
    print(f"Project directory '{project_dir}' already exists.")                # Printing a message if it already exists

Project directory 'cars24_hyundai_mumbai' already exists.
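
If the project later needs separate folders for scraped and cleaned data, the directory step could be extended roughly as follows. This is a minimal sketch, not the team's agreed layout: the function name `create_project_structure` and the subfolder names `raw` and `processed` are assumptions chosen for illustration.

```python
from pathlib import Path


def create_project_structure(base_dir, subfolders=("raw", "processed")):
    """Create base_dir and the given subfolders; return all paths as a list.

    The subfolder names are hypothetical placeholders, not part of
    the original notebook.
    """
    base = Path(base_dir)
    paths = [base] + [base / name for name in subfolders]
    for path in paths:
        # exist_ok=True makes the call safe to repeat when the notebook is re-run
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `create_project_structure(project_dir)` would create the project folder along with the two subfolders, and re-running the cell would leave existing folders untouched.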


In [None]:
# Step 4: Data Cleaning
# To be completed by the next team members


In [None]:
# Step 5: Data Extraction
# To be completed by the next team members
