
# 📊 Exploratory Data Analysis (EDA) - Used Car Dataset

This notebook presents a detailed **Exploratory Data Analysis (EDA)** of a used car dataset.

We aim to uncover patterns and insights in the data before building predictive models. Key questions we try to answer:
- What are the trends in car prices?
- Which features are correlated with selling price?
- How do fuel type, transmission, and ownership affect pricing?

---


In [None]:
# 📦 Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

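import warnings
# Silence only FutureWarnings (mostly library deprecation notices) rather than
# every warning, so genuine problems such as parsing or dtype warnings stay visible
warnings.filterwarnings('ignore', category=FutureWarning)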

In [None]:
# 📂 Load Dataset
df = pd.read_csv("used_car_data.csv")
df.head()

In [None]:
# 🧼 Basic Info and Missing Values
df.info()  # info() prints its report directly and returns None, so print() would add a stray "None"
print("\nMissing values:")
print(df.isnull().sum())
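
Raw missing-value counts are easier to judge next to the share of rows they represent. The cell below is a minimal sketch of such a summary; the helper name `missing_summary` and the tiny `demo` frame are hypothetical and only illustrate the idea. In the notebook it would be called as `missing_summary(df)`.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Made-up values for illustration only (not taken from used_car_data.csv)
demo = pd.DataFrame({
    "selling_price": [250000, None, 400000, 120000],
    "fuel": ["Petrol", "Diesel", None, None],
})
print(missing_summary(demo))
```

Columns with a high percentage are candidates for dropping or imputation before modeling; which threshold counts as "high" depends on the dataset.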

In [None]:
# 📈 Distribution of Selling Price
plt.figure(figsize=(6,4))
sns.histplot(df['selling_price'], kde=True, color='skyblue')
plt.title('Distribution of Selling Price')
plt.xlabel('Selling Price')
plt.ylabel('Frequency')
plt.show()

In [None]:
# 🔢 Distribution of Kilometers Driven
plt.figure(figsize=(6,4))
sns.histplot(df['km_driven'], bins=30, kde=True, color='orange')
plt.title('Kilometers Driven Distribution')
plt.xlabel('Kilometers Driven')
plt.ylabel('Count')
plt.show()
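
The two histograms above show skew visually; it can also be quantified. The sketch below computes sample skewness before and after a `log1p` transform, which is one common way to tame a long right tail. The helper name `skew_report` and the `demo` values are hypothetical; on the real data it would be called as `skew_report(df, ["selling_price", "km_driven"])`.

```python
import numpy as np
import pandas as pd


def skew_report(frame: pd.DataFrame, columns) -> pd.DataFrame:
    """Sample skewness of each column, raw and after log1p."""
    rows = {}
    for col in columns:
        values = frame[col].dropna()
        rows[col] = {
            "skew_raw": values.skew(),
            # log1p = log(1 + x); defined at zero, unlike a plain log
            "skew_log1p": np.log1p(values).skew(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Made-up prices with one very expensive car, for illustration only
demo = pd.DataFrame({
    "selling_price": [100000, 120000, 150000, 180000,
                      200000, 250000, 300000, 2500000],
})
report = skew_report(demo, ["selling_price"])
print(report)
```

A positive value indicates a right-skewed column. If the log-transformed skew is much closer to zero, modeling the log of the price may be worth trying later.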

In [None]:
# 🔋 Fuel Type Count
plt.figure(figsize=(6,4))
sns.countplot(x='fuel', data=df, palette='Set2')
plt.title('Fuel Type Distribution')
plt.xlabel('Fuel Type')
plt.ylabel('Count')
plt.show()

In [None]:
# 🔄 Seller Type Count
plt.figure(figsize=(6,4))
sns.countplot(x='seller_type', data=df, palette='Set3')
plt.title('Seller Type Distribution')
plt.xlabel('Seller Type')
plt.ylabel('Count')
plt.show()

In [None]:
# ⚙️ Transmission Type Count
plt.figure(figsize=(6,4))
sns.countplot(x='transmission', data=df, palette='Set1')
plt.title('Transmission Type')
plt.xlabel('Transmission')
plt.ylabel('Count')
plt.show()
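
The count plots show how many cars fall into each category, but not how price differs between categories, which is one of the questions posed at the top. A minimal sketch of that comparison follows. The helper name `price_by_category` is hypothetical, the `demo` values are made up, and an ownership column named `owner` is an assumption, since the notebook never references that column by name.

```python
import pandas as pd


def price_by_category(frame: pd.DataFrame, category: str,
                      price_col: str = "selling_price") -> pd.DataFrame:
    """Count, median and mean price per category, highest median first."""
    return (
        frame.groupby(category)[price_col]
        .agg(["count", "median", "mean"])
        .sort_values("median", ascending=False)
    )


# Made-up rows for illustration only (not taken from used_car_data.csv)
demo = pd.DataFrame({
    "selling_price": [200000, 450000, 300000, 550000],
    "fuel": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "transmission": ["Manual", "Manual", "Automatic", "Automatic"],
    "owner": ["First Owner", "Second Owner", "First Owner", "First Owner"],  # assumed column name
})
by_fuel = price_by_category(demo, "fuel")
print(by_fuel)
```

The median is listed first because the price distribution is skewed, so the mean can be pulled up by a few expensive cars. On the real data, the same call can be repeated for `transmission`, `seller_type`, and the ownership column.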

In [None]:
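# 🧠 Correlation Heatmap
# Drop the free-text 'name' column, then one-hot encode the remaining categoricals
# (drop_first=True avoids a redundant, perfectly collinear dummy per category)
df_encoded = pd.get_dummies(df.drop(columns="name"), drop_first=True)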

plt.figure(figsize=(10,6))
sns.heatmap(df_encoded.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()
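
A full heatmap can get crowded once the categorical columns are one-hot encoded. To answer "which features are correlated with selling price?" directly, the target's correlations can be pulled out and ranked by absolute value. This is a minimal sketch; the helper name `correlations_with_target` and the `demo` values are hypothetical. On the real data, high-cardinality text columns such as `name` should be dropped first, as in the heatmap cell.

```python
import pandas as pd


def correlations_with_target(frame: pd.DataFrame,
                             target: str = "selling_price") -> pd.Series:
    """Pearson correlation of every encoded feature with the target,
    ordered from strongest to weakest by absolute value."""
    encoded = pd.get_dummies(frame, drop_first=True, dtype=float)
    corr = encoded.corr()[target].drop(target)
    return corr.sort_values(key=abs, ascending=False)


# Made-up rows for illustration only (not taken from used_car_data.csv)
demo = pd.DataFrame({
    "selling_price": [500000, 400000, 300000, 200000],
    "km_driven": [10000, 30000, 60000, 90000],
    "fuel": ["Diesel", "Diesel", "Petrol", "Petrol"],
})
ranked = correlations_with_target(demo)
print(ranked)
```

Pearson correlation only captures linear relationships, and correlations on dummy columns depend on which category was dropped as the baseline, so the ranking is a starting point rather than a feature-importance measure.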

## ✅ EDA Summary

- The dataset covers several fuel types, seller types, and transmission modes.
- Both selling price and kilometers driven are right-skewed, with a long tail of high values.
- Fuel type, transmission, and ownership history appear to influence price.
- `present_price` and `selling_price` are strongly correlated.

Next steps: use these insights to build predictive models.
