# Exploring price, mileage, and condition of used vehicles

This notebook explores and prepares the used car dataset. The goal is to understand the structure, handle missing data, and identify useful patterns for the dashboard.

## Steps:
1. Load and preview the dataset
2. Handle missing values using a group-based strategy with transform()
3. Explore the data visually with a histogram and a scatter plot

In [None]:
import pandas as pd
import plotly.express as px

df = pd.read_csv('vehicles_us.csv')
df.head()

In [None]:
df.isna().sum()

## Missing Values

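Missing values are restored as follows:

- **'is_4wd'**: Filled with 0 and converted to bool, so a missing value is treated as False.
- **'paint_color'**: Filled with 'Unknown', since it's not possible to infer the color from other vehicle features.
- **'cylinders'**, **'model_year'**, **'odometer'**: Filled with group-based medians using the transform() method, so each gap gets a value typical of similar vehicles: cylinders by type, model_year by model, and odometer by model_year. Rows still missing price, odometer, or condition afterwards are dropped.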
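
To make the group-based median fill concrete, here is a minimal sketch on a small toy frame. The values are made up for illustration and are not taken from vehicles_us.csv. It also shows one case worth keeping in mind: when every value in a group is missing, that group's median is NaN and the gap stays unfilled. The global-median fallback at the end is an assumption added for illustration, not something this notebook does.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not from vehicles_us.csv).
toy = pd.DataFrame({
    "type": ["sedan", "sedan", "sedan", "truck", "truck", "bus"],
    "cylinders": [4.0, np.nan, 6.0, 8.0, np.nan, np.nan],
})

# transform() returns a Series aligned with the original index,
# so the result can be assigned straight back to a column.
filled = toy.groupby("type")["cylinders"].transform(
    lambda s: s.fillna(s.median())
)

# The sedan gap takes the sedan median and the truck gap the truck median.
# The only "bus" row has no group median to borrow, so it is still NaN.
print(filled.tolist())

# Optional fallback (an assumption, not part of the notebook):
# fill whatever is left with the overall median of the column.
filled_all = filled.fillna(toy["cylinders"].median())
print(filled_all.tolist())
```

In this notebook, rows whose odometer is still missing after the group fill are removed by the dropna() call rather than filled with a fallback value.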

In [None]:
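# A missing is_4wd is treated as "not 4WD": fill with 0, then cast to bool.
df['is_4wd'] = df['is_4wd'].fillna(0).astype(bool)
# Color cannot be inferred from other columns, so use a placeholder.
df['paint_color'] = df['paint_color'].fillna('Unknown')
# Fill numeric gaps with the median of each row's group.
df['cylinders'] = df.groupby('type')['cylinders'].transform(lambda x: x.fillna(x.median()))
df['model_year'] = df.groupby('model')['model_year'].transform(lambda x: x.fillna(x.median()))
df['odometer'] = df.groupby('model_year')['odometer'].transform(lambda x: x.fillna(x.median()))
# Drop rows that still lack a value in any column used by the plots.
df.dropna(subset=['price', 'odometer', 'condition'], inplace=True)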

In [None]:
df.isna().sum()

## Data Visualization

Explore the cleaned dataset with visualizations to better understand the relationships between features such as price, odometer reading, and model year.

In [None]:
fig1 = px.histogram(df, x='odometer', nbins=50, title='Odometer Distribution')
fig1.show()

In [None]:
fig2 = px.scatter(df, x='odometer', y='price', color='condition',
                  hover_data=['model_year', 'model'],
                  title='Price vs. Odometer by Vehicle Condition')
fig2.show()