# 🚗 Car Price Dataset - EDA using ydata-profiling

This notebook runs exploratory data analysis (EDA) on the car price dataset with `ydata-profiling`, covering feature distributions, correlations, and missing values.

In [2]:
# Install ydata-profiling into the running kernel's environment
# (pip skips the download if the package is already installed)
%pip install ydata-profiling


In [3]:
import pandas as pd
from ydata_profiling import ProfileReport

# Load the dataset
df = pd.read_csv('car_price_dataset.csv')

# Display basic info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Brand         10000 non-null  object 
 1   Model         10000 non-null  object 
 2   Year          10000 non-null  int64  
 3   Engine_Size   10000 non-null  float64
 4   Fuel_Type     10000 non-null  object 
 5   Transmission  10000 non-null  object 
 6   Mileage       10000 non-null  int64  
 7   Doors         10000 non-null  int64  
 8   Owner_Count   10000 non-null  int64  
 9   Price         10000 non-null  int64  
dtypes: float64(1), int64(5), object(4)
memory usage: 781.4+ KB
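
The `df.info()` output above shows 10,000 non-null entries in every column. As a minimal sketch of how the same facts could be pulled out programmatically, the snippet below counts missing values and duplicate rows and splits the columns by dtype. The `sample` frame and the `summarize_columns` helper are hypothetical: the values are made up and only mirror the column names above. In the notebook, `df` would be passed instead.

```python
import pandas as pd

# Made-up rows that reuse the column names from the df.info() output;
# these are NOT rows from car_price_dataset.csv.
sample = pd.DataFrame({
    "Brand": ["Kia", "Audi", "Ford"],
    "Model": ["Rio", "A4", "Focus"],
    "Year": [2020, 2012, 2018],
    "Engine_Size": [1.4, 2.0, 1.6],
    "Fuel_Type": ["Diesel", "Hybrid", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual"],
    "Mileage": [45000, 150000, 80000],
    "Doors": [4, 4, 5],
    "Owner_Count": [1, 3, 2],
    "Price": [9000, 7500, 8200],
})


def summarize_columns(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: basic structure checks before profiling."""
    return {
        # Missing values per column, as a plain dict
        "missing": frame.isna().sum().to_dict(),
        # Columns split by dtype family
        "numeric": frame.select_dtypes(include="number").columns.tolist(),
        "categorical": frame.select_dtypes(exclude="number").columns.tolist(),
        # Number of fully duplicated rows
        "duplicate_rows": int(frame.duplicated().sum()),
    }


summary = summarize_columns(sample)
print(summary["numeric"])
print(summary["categorical"])
```

Calling `summarize_columns(df)` on the loaded dataset should report zero missing values per column, in line with the non-null counts above.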


In [4]:
# Generate the profiling report
profile = ProfileReport(df, title="Car Price Dataset EDA Report", explorative=True)
profile.to_file("car_price_profile_report.html")


### ✅ Output

- The full EDA report is saved as `car_price_profile_report.html` in the working directory.
- It covers data types, missing values, descriptive statistics, distributions, and correlations.
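
The correlations in the report can be spot-checked with plain pandas. The sketch below ranks numeric features by the absolute Pearson correlation with `Price`. It assumes Pearson is an acceptable stand-in, since the report may use other correlation measures. `price_correlations` is a hypothetical helper and the `sample` rows are invented for illustration; in the notebook, `df` would be passed instead.

```python
import pandas as pd

# Invented rows with a few of the dataset's numeric column names;
# these are NOT rows from car_price_dataset.csv.
sample = pd.DataFrame({
    "Year": [2010, 2012, 2014, 2016, 2018, 2020],
    "Mileage": [200000, 170000, 140000, 100000, 60000, 20000],
    "Price": [3000, 4200, 5500, 7000, 9000, 11500],
})


def price_correlations(frame: pd.DataFrame, target: str = "Price") -> pd.Series:
    """Hypothetical helper: Pearson correlation of each numeric column with `target`."""
    numeric = frame.select_dtypes(include="number")
    # Take the target's column of the correlation matrix and drop its self-correlation
    corr = numeric.corr(method="pearson")[target].drop(target)
    # Strongest relationships first, regardless of sign
    return corr.sort_values(key=abs, ascending=False)


ranked = price_correlations(sample)
print(ranked)
```

Sorting by absolute value keeps strong negative relationships, such as the one between mileage and price in the invented rows, next to strong positive ones.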