
# 🚗 Used Car Price Prediction

## 📄 Project Description

This project predicts the **selling price** of used cars from features such as year, kilometers driven, fuel type, and seller type.

We perform:
- 📊 **Exploratory Data Analysis (EDA)**
- 🔄 **Data Preprocessing** (one-hot encoding of categorical features)
- 🧠 **Model Training using Random Forest Regressor**
- 📈 **Model Evaluation with Visuals**

The dataset is visualized with Seaborn and Matplotlib, and the trained model is evaluated on a held-out test set using the R² score and mean squared error (MSE).
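
For reference, here is a minimal pure-Python sketch of what the two metrics compute. The function names and the toy numbers are illustrative assumptions and are not part of this project; the notebook itself relies on scikit-learn's `r2_score` and `mean_squared_error`.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, chosen only to make the arithmetic easy to follow
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 6.5, 9.5]

mse_value = mean_squared_error_manual(actual, predicted)
r2_value = r2_score_manual(actual, predicted)
print(f"MSE: {mse_value:.2f}")
print(f"R²: {r2_value:.2f}")
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the actual values.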


In [None]:
# 📦 Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Suppress all library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')


In [None]:
# 📂 Load Dataset
df = pd.read_csv("used_car_data.csv")
df.head()
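
Before plotting, it can help to check for missing values and duplicate rows. The sketch below uses a small hypothetical DataFrame so that it runs on its own; the column names `year`, `km_driven`, and `fuel` are assumptions about the CSV's schema, and the same calls would apply to `df`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset
toy_cars = pd.DataFrame({
    "year": [2014, 2017, 2012],
    "km_driven": [70000, None, 120000],
    "fuel": ["Petrol", "Diesel", None],
})

missing_per_column = toy_cars.isna().sum()    # number of missing values in each column
duplicate_rows = toy_cars.duplicated().sum()  # number of fully duplicated rows

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
```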

In [None]:
# 📊 Selling Price Distribution
plt.figure(figsize=(6,4))
sns.histplot(df['selling_price'], kde=True)
plt.title('Selling Price Distribution')
plt.xlabel('Selling Price')
plt.ylabel('Frequency')
plt.show()

In [None]:
# 🔋 Fuel Type Count
plt.figure(figsize=(6,4))
sns.countplot(x='fuel', data=df)
plt.title('Fuel Type Count')
plt.xlabel('Fuel Type')
plt.ylabel('Count')
plt.show()

In [None]:
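# 🔥 Correlation Heatmap
# Drop the car name column (not used as a feature), then one-hot encode
# the remaining categorical columns so every column is numeric
df_encoded = pd.get_dummies(df.drop("name", axis=1), drop_first=True)

plt.figure(figsize=(8,6))
sns.heatmap(df_encoded.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()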
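
To illustrate what `pd.get_dummies(..., drop_first=True)` does in the cell above, here is a small sketch on hypothetical data. The toy rows are assumptions; only the `fuel` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical toy rows with one numeric and one categorical column
toy_fuel = pd.DataFrame({
    "year": [2015, 2018, 2012],
    "fuel": ["Petrol", "Diesel", "CNG"],
})

# Each category becomes its own indicator column; drop_first=True removes
# the first category in sorted order (here "CNG"), which acts as the baseline
encoded = pd.get_dummies(toy_fuel, drop_first=True)
print(encoded.columns.tolist())  # ['year', 'fuel_Diesel', 'fuel_Petrol']
```

Dropping one level per feature avoids a redundant column, since a row with all remaining indicators set to false must belong to the dropped category.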

In [None]:
# 🧠 Model Building: Random Forest Regressor
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Separate the features (X) from the target (y)
X = df_encoded.drop("selling_price", axis=1)
y = df_encoded["selling_price"]

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model on the training split only
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict on the unseen test split
y_pred = model.predict(X_test)

# Evaluate: R² (higher is better) and MSE (lower is better, in squared price units)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print(f"R² Score: {r2:.2f}")
print(f"MSE: {mse:.2f}")
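
Because MSE is expressed in squared price units, it can be hard to interpret directly. A minimal sketch of two error measures in the original price units follows: root mean squared error (RMSE) and mean absolute error (MAE). The helper names and the toy prices are hypothetical; in the notebook, the RMSE could equally be obtained as `mse ** 0.5`.

```python
import math


def root_mean_squared_error_manual(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mean_absolute_error_manual(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical prices, for illustration only
actual_prices = [100_000, 250_000, 400_000]
predicted_prices = [110_000, 240_000, 430_000]

rmse_value = root_mean_squared_error_manual(actual_prices, predicted_prices)
mae_value = mean_absolute_error_manual(actual_prices, predicted_prices)
print(f"RMSE: {rmse_value:.2f}")
print(f"MAE: {mae_value:.2f}")
```

RMSE penalizes large errors more heavily than MAE, so a large gap between the two suggests a few predictions that are far off.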


In [None]:
# 📈 Actual vs Predicted Plot
plt.figure(figsize=(6,6))
plt.scatter(y_test, y_pred, c='blue')
plt.xlabel('Actual Selling Price')
plt.ylabel('Predicted Selling Price')
plt.title('Actual vs Predicted Selling Price')
# Red diagonal marks perfect predictions (predicted == actual)
limits = [y_test.min(), y_test.max()]
plt.plot(limits, limits, color='red')
plt.show()

## ✅ Conclusion

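- Built a used car price prediction model.
- Explored and visualized the dataset (selling price distribution, fuel type counts, and feature correlations).
- One-hot encoded the categorical features and trained a Random Forest Regressor on an 80/20 train/test split.
- Evaluated the model on the test set using the R² score, MSE, and an actual vs. predicted scatter plot.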

🔗 Ready to share on GitHub!
