# HONDA Cars Web Scraping – AckoDrive

This notebook scrapes Honda car data from AckoDrive, cleans it, and exports it to a CSV file.

**Steps covered:**
1. Research & planning (URL and basic structure)
2. Data extraction with BeautifulSoup
3. Data cleaning & preprocessing
4. Data presentation/export to CSV


In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from datetime import datetime


## 1. Fetch Honda cars page

In [2]:
BASE_URL = "https://ackodrive.com"
HONDA_URL = f"{BASE_URL}/collection/honda+cars/"

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

# A timeout keeps the cell from hanging indefinitely if the site does not respond.
response = requests.get(HONDA_URL, headers=HEADERS, timeout=30)
print("Status code:", response.status_code)
html = response.text


Status code: 200
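
A single request can fail for transient reasons (timeouts, rate limiting). The following is a minimal sketch of a retry wrapper with exponential backoff. `fetch_with_retries` and its parameters are hypothetical names, not part of the notebook, and the broad `except` is a simplification that would normally be narrowed to network errors.

```python
import time


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that returns a result or raises.
    The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # assumption: any exception is worth a retry
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed, so surface the most recent error.
    raise last_error
```

In this notebook it could be called as `fetch_with_retries(lambda: requests.get(HONDA_URL, headers=HEADERS, timeout=30))`, followed by `response.raise_for_status()` to turn HTTP error codes into exceptions.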


## 2. Parse HTML with BeautifulSoup

In [3]:
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string if soup.title else "No title found")

Honda Cars Price in India - Honda Car Models in 2025


## 3. Extract Honda car details

We extract the model name, variant count, fuel type, transmission, and price range. The extraction relies on link text and regular expressions matched against the current structure of the Honda collection page, not on CSS classes. If AckoDrive changes its HTML, adjust the text matches and patterns after checking the new markup with the browser's *Inspect* tool.

In [4]:
def extract_honda_cards(soup: BeautifulSoup):
    cards_data = []
    main_heading = soup.find(string=re.compile(r"Honda cars in India", re.I))
    if main_heading:
        # Move up to a reasonably large container
        container = main_heading.find_parent()
        for _ in range(4):  # climb up a few levels defensively
            if container and container.parent:
                container = container.parent
    else:
        container = soup
    model_links = []
    for a in container.find_all("a", href=True):
        text = a.get_text(strip=True)
        if text.startswith("Honda "):
            model_links.append(a)

    print(f"Found {len(model_links)} Honda model entries")

    for a in model_links:
        model_name = a.get_text(strip=True)
        brand = "Honda"
        detail_href = a["href"]
        detail_url = detail_href if detail_href.startswith("http") else BASE_URL + detail_href
        block_texts = []
        parent_block = a.parent
        for sib in parent_block.next_siblings:
            if getattr(sib, "name", None) is None:
                continue
            txt = sib.get_text(" ", strip=True)
            if not txt:
                continue
            block_texts.append(txt)
            if "Browse Honda cars" in txt or "Home" == txt:
                break

        full_block = " \n ".join(block_texts)

        variants_match = re.search(r"(\d+\s+Variants)", full_block, flags=re.I)
        variants = variants_match.group(1) if variants_match else "N/A"

        # Each fuel type is its own alternative (the "|" between Diesel and Electric was missing).
        fuel_match = re.search(r"(Hybrid[^\n]*|Petrol[^\n]*|Diesel[^\n]*|Electric[^\n]*)", full_block, flags=re.I)
        fuel = fuel_match.group(1).strip() if fuel_match else "N/A"

        trans_match = re.search(r"(Manual\s*•\s*Automatic|Manual|Automatic)", full_block, flags=re.I)
        transmission = trans_match.group(1).strip() if trans_match else "N/A"

        price_match = re.search(r"₹[^\n]*lakh[^\n]*", full_block)
        price_range = price_match.group(0).strip() if price_match else "N/A"

        is_discontinued = bool(re.search(r"Discontinued", full_block, flags=re.I))

        cards_data.append({
            "brand": brand,
            "model": model_name,
            "variants": variants,
            "fuel": fuel,
            "transmission": transmission,
            "price_range_raw": price_range,
            "is_discontinued": is_discontinued,
            "detail_url": detail_url,
        })

    return cards_data
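
To see how the regular expressions in `extract_honda_cards` behave in isolation, here is a small sketch that runs equivalent patterns on two hand-written strings. Both strings and the `apply_patterns` helper are hypothetical and were not scraped from the site. The fuel pattern is written with each fuel type as a separate alternative.

```python
import re

VARIANTS_RE = re.compile(r"(\d+\s+Variants)", re.I)
FUEL_RE = re.compile(r"(Hybrid[^\n]*|Petrol[^\n]*|Diesel[^\n]*|Electric[^\n]*)", re.I)
TRANS_RE = re.compile(r"(Manual\s*•\s*Automatic|Manual|Automatic)", re.I)
PRICE_RE = re.compile(r"₹[^\n]*lakh[^\n]*")


def apply_patterns(block):
    """Return the first match of each pattern, or 'N/A' when nothing matches."""
    def first(pattern):
        match = pattern.search(block)
        return match.group(0).strip() if match else "N/A"

    return {
        "variants": first(VARIANTS_RE),
        "fuel": first(FUEL_RE),
        "transmission": first(TRANS_RE),
        "price_range_raw": first(PRICE_RE),
    }


# Hypothetical card text with one field per line, joined like full_block.
multi_line = " \n ".join(
    ["5 Variants", "Petrol", "Manual • Automatic", "₹11.9 lakh - ₹16.7 lakh"]
)
# Hypothetical card text where fuel and transmission share a line.
one_line = "Petrol Manual • Automatic"

multi_result = apply_patterns(multi_line)
one_result = apply_patterns(one_line)
print(multi_result)
print(one_result)
```

Because `[^\n]*` runs to the end of the line, the fuel pattern also captures the transmission text whenever both sit on the same line, as in the second string.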


In [5]:
cards_data = extract_honda_cards(soup)
len(cards_data), cards_data[:3]

Found 7 Honda model entries


(7,
 [{'brand': 'Honda',
   'model': 'Honda City',
   'variants': 'N/A',
   'fuel': 'Hybrid • Petrol Manual • Automatic',
   'transmission': 'Manual • Automatic',
   'price_range_raw': 'N/A',
   'is_discontinued': False,
   'detail_url': 'https://ackodrive.com/cars/honda-city/'},
  {'brand': 'Honda',
   'model': 'Honda Elevate',
   'variants': 'N/A',
   'fuel': 'Petrol • Manual • Automatic',
   'transmission': 'Manual • Automatic',
   'price_range_raw': 'N/A',
   'is_discontinued': False,
   'detail_url': 'https://ackodrive.com/cars/honda-elevate/'},
  {'brand': 'Honda',
   'model': 'Honda Amaze',
   'variants': 'N/A',
   'fuel': 'Petrol Manual • Automatic',
   'transmission': 'Manual • Automatic',
   'price_range_raw': 'N/A',
   'is_discontinued': False,
   'detail_url': 'https://ackodrive.com/cars/honda-amaze/'}])
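
In the output above, the `fuel` field also contains the transmission text (for example `'Petrol Manual • Automatic'`). One possible way to separate the two is to classify each word against a known vocabulary. This is a sketch only: `split_fuel_and_transmission` is a hypothetical helper, and the vocabularies are assumed from the values seen in this notebook.

```python
import re

# Assumed vocabularies, taken from the values that appear in the scraped output.
FUEL_TYPES = {"hybrid", "petrol", "diesel", "electric"}
TRANSMISSIONS = {"manual", "automatic"}


def split_fuel_and_transmission(text):
    """Return (fuel, transmission) strings, keeping the order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    fuels = [w for w in words if w.lower() in FUEL_TYPES]
    transmissions = [w for w in words if w.lower() in TRANSMISSIONS]
    return " • ".join(fuels), " • ".join(transmissions)


print(split_fuel_and_transmission("Hybrid • Petrol • Diesel Manual • Automatic"))
print(split_fuel_and_transmission("Petrol • Diesel Manual"))
```

Words outside both vocabularies are dropped, so a new fuel type on the site would need to be added to `FUEL_TYPES` first.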

## 4. Create DataFrame

In [14]:
df_raw = pd.DataFrame(cards_data)
df_raw

| | brand | model | variants | fuel | transmission | price_range_raw | is_discontinued | detail_url |
|---|---|---|---|---|---|---|---|---|
| 0 | Honda | Honda City | | Hybrid • Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-city/ |
| 1 | Honda | Honda Elevate | | Petrol • Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-elevate/ |
| 2 | Honda | Honda Amaze | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-amaze/ |
| 3 | Honda | Honda Amaze (2021-2024) | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-amaze-2021-2024/ |
| 4 | Honda | Honda WR-V (2020-2023) | | Petrol • Diesel Manual | Manual | | False | https://ackodrive.com/cars/honda-wr-v-2020-2023/ |
| 5 | Honda | Honda Jazz (2020-2023) | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-jazz-2020-2023/ |
| 6 | Honda | Honda City (2020-2023) | | Hybrid • Petrol • Diesel Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-city-2020-2023/ |
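
The `variants` and `price_range_raw` columns carry no extracted values here, which suggests the patterns did not match the page text. A quick completeness check makes this kind of gap visible right after extraction. The sketch below uses a hypothetical `count_missing` helper and two abbreviated records copied from the output of `extract_honda_cards`.

```python
def count_missing(records, placeholder="N/A"):
    """Count, per field, how many records hold the placeholder value."""
    counts = {}
    for record in records:
        for field, value in record.items():
            if value == placeholder:
                counts[field] = counts.get(field, 0) + 1
    return counts


# Abbreviated records; the values match the first two scraped entries.
sample_records = [
    {"model": "Honda City", "variants": "N/A", "price_range_raw": "N/A"},
    {"model": "Honda Elevate", "variants": "N/A", "price_range_raw": "N/A"},
]

missing = count_missing(sample_records)
print(missing)
```

Running `count_missing(cards_data)` on the full list would show at a glance which fields need their patterns revisited.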


In [15]:
# df is only created in the cleaning step below, so export the raw frame here.
df_raw.to_csv("honda_cars.csv", index=False, encoding="utf-8-sig")

## 5. Data Cleaning

- Normalize column names
- Extract numeric min/max prices from the price range string
- Add scrape date
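
The column names produced by the extraction step are already lower-case with underscores, so the first bullet needs little work in this notebook. For headers that arrive in mixed form, a minimal normalizer could look like the sketch below. `normalize_column_name` is a hypothetical helper and the sample headers are invented for illustration.

```python
import re


def normalize_column_name(name):
    """Lower-case a header and collapse runs of non-alphanumerics into '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
    return cleaned.strip("_")


for header in ["Price Range (Raw)", " Detail URL ", "is_discontinued"]:
    print(repr(header), "->", normalize_column_name(header))
```

With pandas it could be applied as `df.columns = [normalize_column_name(c) for c in df.columns]`.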

In [7]:
df = df_raw.copy()

df["scrape_date"] = datetime.today().date()

def parse_price_range(price_str: str):
    """Parse a price range like '₹13.9 lakh – ₹23.9 lakh' into numeric min/max (in lakh)."""
    if not isinstance(price_str, str) or "₹" not in price_str:
        return None, None
    cleaned = price_str.replace("–", "-")
    nums = re.findall(r"\d+\.\d+|\d+", cleaned)
    if not nums:
        return None, None
    nums = [float(n) for n in nums]
    if len(nums) == 1:
        return nums[0], nums[0]
    return min(nums), max(nums)

df["price_min_lakh"], df["price_max_lakh"] = zip(*df["price_range_raw"].map(parse_price_range))

df

| | brand | model | variants | fuel | transmission | price_range_raw | is_discontinued | detail_url | scrape_date | price_min_lakh | price_max_lakh |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Honda | Honda City | | Hybrid • Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-city/ | 2025-12-07 | | |
| 1 | Honda | Honda Elevate | | Petrol • Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-elevate/ | 2025-12-07 | | |
| 2 | Honda | Honda Amaze | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-amaze/ | 2025-12-07 | | |
| 3 | Honda | Honda Amaze (2021-2024) | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-amaze-2021-2024/ | 2025-12-07 | | |
| 4 | Honda | Honda WR-V (2020-2023) | | Petrol • Diesel Manual | Manual | | False | https://ackodrive.com/cars/honda-wr-v-2020-2023/ | 2025-12-07 | | |
| 5 | Honda | Honda Jazz (2020-2023) | | Petrol Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-jazz-2020-2023/ | 2025-12-07 | | |
| 6 | Honda | Honda City (2020-2023) | | Hybrid • Petrol • Diesel Manual • Automatic | Manual • Automatic | | False | https://ackodrive.com/cars/honda-city-2020-2023/ | 2025-12-07 | | |
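
`parse_price_range` collects every number in the string, so an unrelated number such as a model year would be treated as a price. A stricter variant only accepts numbers that are directly followed by a unit. The sketch below is one way to do that; `parse_price_range_with_units` is a hypothetical name, the sample strings are invented, and the "crore" handling is an assumption, since the notebook itself only mentions lakh.

```python
import re

# Conversion factors into lakh (1 crore = 100 lakh).
UNIT_IN_LAKH = {"lakh": 1.0, "crore": 100.0}


def parse_price_range_with_units(price_str):
    """Return (min, max) in lakh, using only numbers followed by a unit."""
    if not isinstance(price_str, str):
        return None, None
    matches = re.findall(r"(\d+(?:\.\d+)?)\s*(lakh|crore)", price_str, flags=re.I)
    if not matches:
        return None, None
    values = [float(num) * UNIT_IN_LAKH[unit.lower()] for num, unit in matches]
    return min(values), max(values)


print(parse_price_range_with_units("₹13.9 lakh – ₹23.9 lakh"))
print(parse_price_range_with_units("₹95 lakh - ₹1.5 crore"))
print(parse_price_range_with_units("N/A"))
```

One limitation of this approach: a range written with a single trailing unit, such as `₹13.9 - ₹23.9 lakh`, yields only the second number.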


## 6. Export to CSV

In [8]:
output_path = "honda_ackodrive_cars.csv"
df.to_csv(output_path, index=False, encoding="utf-8-sig")
print(f"Saved {len(df)} rows to {output_path}")

Saved 7 rows to honda_ackodrive_cars.csv
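
The `utf-8-sig` encoding writes a byte order mark (BOM) at the start of the file, which helps spreadsheet tools such as Excel detect UTF-8 and display the `₹` symbol correctly. The sketch below writes a tiny CSV with the standard library and reads it back to confirm the round trip. The file name and the price value are made up for illustration.

```python
import csv
import os
import tempfile

# One illustrative row; the price value is hypothetical.
rows = [{"model": "Honda City", "price_range_raw": "₹11.9 lakh"}]
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Write with utf-8-sig so the file starts with a BOM.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "price_range_raw"])
    writer.writeheader()
    writer.writerows(rows)

# The first three bytes of the file are the UTF-8 BOM.
with open(path, "rb") as f:
    has_bom = f.read(3) == b"\xef\xbb\xbf"

# Reading with utf-8-sig strips the BOM again, so headers stay clean.
with open(path, newline="", encoding="utf-8-sig") as f:
    read_back = list(csv.DictReader(f))

print(has_bom, read_back)
```

Reading the same file with plain `utf-8` would leave the BOM attached to the first header name, so the matching `utf-8-sig` on the read side matters.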
