# Data Science project: building a web scraper to extract structured car listings in Greece, processing the data, modelling, and more

## Introduction

Car.gr is one of the largest platforms in Greece for buying and selling cars, scooters, boats, and other vehicles, though it is primarily used for car sales. Sellers list their vehicles, and buyers can contact them directly.

Determining the market value of a used car is challenging due to numerous influencing factors, including mileage, model, brand, year, condition, options, gearbox type, and accident history. Additionally, hidden factors such as location, supply and demand, and even color might impact prices. Given these complexities, gaining insights from data could help better understand pricing dynamics.

### Project Goals

To explore these challenges, I aim to scrape, preprocess, and analyze car listings from Car.gr to build useful tools, such as:

- 📊 Market Analysis: Statistical insights into pricing trends and factors influencing car values.
- 💰 Price Estimation Model: Predict car prices based on listing attributes.
- 🤖 Chatbot using Retrieval-Augmented Generation (RAG): Enable users to query listings using natural language.
- 🔍 Fraud Detection & Anomaly Detection: Identify fraudulent listings using text and image classification.
- 🖼 Image-Based Vehicle Condition Assessment: Use computer vision to assess car conditions.

### Personal Motivation & Learning Goals

After a two-year break from data science, this project serves as a hands-on opportunity to refresh and update my skills. The main technical focus areas include:

- Web Scraping & Data Storage – Reinforcing skills in Python, Google Cloud Platform (GCP), and databases.
- Data Preprocessing & Exploratory Analysis – Extracting meaningful insights from raw data.
- Machine Learning for Price Prediction – Building and evaluating pricing models.
- Implementing a RAG-based Chatbot – Exploring state-of-the-art retrieval-augmented generation techniques.
- Image Classification for Car Condition Assessment – Applying deep learning for visual analysis.

### Next Steps

The first and most crucial step is scraping the website, as it forms the foundation for all subsequent tasks. This will be my immediate focus.

All the code will be available in my GitHub repositories.

## A First Look at Car.gr

On Car.gr, searching for cars on sale without any criteria returns a paginated list of all listed cars.

![Sample Image](../photos/car_gr_list.png)

More than 130,000 cars are listed on the website. Now let's take a look at one ad:

![Sample Image](../photos/car_gr_example_ad.png)

We can see that the ads are organised as follows:

- A title: usually the brand and model of the car
- A short description
- A price
- Information about the seller
- Photos
- Registration year
- Mileage
- Fuel type
- Engine size
- Engine power
- Number of doors
- Gearbox type
- Category of the vehicle
- Color
- A section to calculate the cost of a loan and other sections that don't interest us

Each ad carries extensive information about the car, and that is what we aim to scrape.
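
To make the scraping target concrete, the ad fields listed above could be captured in a single record type. The following is only a minimal sketch: the class name `CarListing`, every field name, and the units (euros, km, cc, hp) are my own assumptions for illustration and are not taken from the website, and the sample values are placeholders rather than real data.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CarListing:
    """One scraped ad. Field names and units are hypothetical."""

    title: str                               # usually brand and model
    description: Optional[str] = None        # short free-text description
    price_eur: Optional[int] = None          # assumed to be in euros
    seller: Optional[str] = None             # seller information
    photo_urls: List[str] = field(default_factory=list)
    registration_year: Optional[int] = None
    mileage_km: Optional[int] = None         # assumed to be in kilometres
    fuel_type: Optional[str] = None
    engine_size_cc: Optional[int] = None     # assumed unit: cubic centimetres
    engine_power_hp: Optional[int] = None    # assumed unit: horsepower
    doors: Optional[int] = None
    gearbox: Optional[str] = None
    category: Optional[str] = None
    color: Optional[str] = None


# Placeholder values, not a real listing.
example = CarListing(
    title="Example Brand Model",
    price_eur=10_000,
    registration_year=2015,
    mileage_km=120_000,
)

# A plain dict is convenient for writing to JSON or a database row.
record = asdict(example)
```

Every field except the title is optional because an ad may omit some attributes; keeping missing values as `None` rather than guessing them leaves that decision to the preprocessing step.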



## Scraping Strategy
