# 🚗 Car Price Prediction with Machine Learning

Welcome to this notebook on **Car Price Prediction** using machine learning. In this project, we will:

- Explore and understand the dataset.
- Visualize relationships between features and car prices.
- Build a predictive model using a Random Forest regressor tuned with grid search.
- Evaluate and interpret the model's performance.

Let's get started! 🛠️

In [1]:
# 📦 Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

sns.set(style="whitegrid")  # Aesthetic plots

## 📑 Load the Dataset

We will use the `car data.csv` file located in the same directory. This dataset includes various features like brand, model, year, horsepower, mileage, etc.

In [2]:
# 🔍 Load the dataset
df = pd.read_csv("car data.csv")

# Quick preview
df.head()

## 🧹 Data Cleaning and Exploration

Let's inspect the dataset for missing values, data types, and descriptive statistics.

In [3]:
# Dataset information
df.info()

# Check for missing values (printed explicitly: a cell only displays its last expression)
print(df.isnull().sum())

# Descriptive statistics
df.describe()
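
The checks above can also be collected into one table per column. The following is a minimal sketch on a toy frame, not on the real data: the helper name `summarize_columns` and the `Fuel_Type` column are assumptions made for illustration, and only `Selling_Price` is taken from this notebook.

```python
import numpy as np
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing share (%) for each column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isnull().sum(),
        "pct_missing": frame.isnull().mean() * 100,
    })


# Toy data with one missing value per column (hypothetical values)
toy = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, np.nan, 2.85],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", None],  # hypothetical column
})

summary = summarize_columns(toy)
print(summary)
```

Calling `summarize_columns(df)` on the loaded dataset would give the same overview in a single displayed table.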

## 📊 Data Visualization

Let's visualize relationships between the features and car price.

In [4]:
# Plot price distribution
plt.figure(figsize=(10, 5))
sns.histplot(df['Selling_Price'], kde=True, bins=30, color='teal')
plt.title("Distribution of Selling Prices")
plt.xlabel("Selling Price (in lakhs)")
plt.ylabel("Frequency")
plt.show()

# Correlation heatmap (numeric columns only; text columns cannot be correlated)
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Feature Correlations")
plt.show()
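
A heatmap shows every pairwise correlation, but for prediction the correlations with the target are usually the ones of interest. The sketch below ranks numeric features by the absolute value of their correlation with `Selling_Price`. It runs on toy data, and the columns `Present_Price`, `Kms_Driven`, and `Fuel_Type` are hypothetical stand-ins rather than confirmed columns of `car data.csv`.

```python
import pandas as pd

# Toy data; every column except Selling_Price is a hypothetical example
toy = pd.DataFrame({
    "Selling_Price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Present_Price": [2.0, 4.0, 6.0, 8.0, 10.0],
    "Kms_Driven": [50, 40, 30, 20, 10],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG", "Diesel"],
})

# Correlate numeric columns only, keep the target's column,
# drop the target's self-correlation, and sort by strength
target_corr = (
    toy.corr(numeric_only=True)["Selling_Price"]
    .drop("Selling_Price")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

Sorting by absolute value keeps strongly negative relationships near the top instead of pushing them to the bottom of the list.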

## 🛠️ Feature Engineering

Handle categorical variables and select relevant features.

In [5]:
# Encode categorical variables
df_encoded = pd.get_dummies(df, drop_first=True)

# Feature selection
X = df_encoded.drop('Selling_Price', axis=1)
y = df_encoded['Selling_Price']
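
To make the effect of `drop_first=True` concrete, here is a small sketch on toy data. The `Fuel_Type` column and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy frame with one categorical column (hypothetical name and values)
toy = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 7.25],
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
})

# One-hot encode; drop_first removes the first category in sorted order ("CNG"),
# which then acts as the baseline encoded by all-zero indicator columns
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())
```

Numeric columns pass through unchanged, and each categorical column with k distinct values becomes k - 1 indicator columns. A column with many distinct values, such as a free-text name, would add one column per value, so it may be worth dropping or grouping such a column before encoding.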

## 🔥 Model Building

We'll use a **Random Forest Regressor** for its robustness and strong performance on structured data.

In [6]:
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
rf_model = RandomForestRegressor(random_state=42)

# Hyperparameter tuning: 3 x 4 x 3 = 36 combinations, each scored with 5-fold CV
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(estimator=rf_model, param_grid=param_grid,
                           cv=5, scoring='r2', n_jobs=-1, verbose=2)

grid_search.fit(X_train, y_train)

# Best model
best_model = grid_search.best_estimator_
print(f"Best parameters: {grid_search.best_params_}")
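
The grid above contains 3 × 4 × 3 = 36 parameter combinations, so 5-fold cross-validation fits 180 models before refitting the best one on the full training set. Beyond `best_params_`, the fitted search object exposes the score of every combination in `cv_results_`. The sketch below shows one way to read it, using synthetic data from `make_regression` and a deliberately small grid so that it runs quickly. The names `X_demo`, `y_demo`, and `demo_search` are illustrative and not part of this notebook.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the car dataset
X_demo, y_demo = make_regression(
    n_samples=120, n_features=5, noise=10.0, random_state=42
)

# Small grid: 2 x 2 = 4 combinations, 3 folds each
demo_search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [20, 50], "max_depth": [None, 5]},
    cv=3,
    scoring="r2",
)
demo_search.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first
results = (
    pd.DataFrame(demo_search.cv_results_)
    [["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score")
)
print(results.to_string(index=False))
```

Comparing `mean_test_score` against `std_test_score` helps judge whether the top-ranked combination is meaningfully better than the runners-up or within fold-to-fold noise.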

## 📈 Model Evaluation

Let's evaluate the model on the held-out test set using MAE, RMSE, and R².

In [7]:
# Predict on test set
y_pred = best_model.predict(X_test)

# Evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R² Score: {r2:.2f}")
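
To show what these three numbers measure, the sketch below computes them by hand with NumPy on a few made-up values and checks the results against scikit-learn. The arrays `y_true` and `y_hat` are illustrative and are not results from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted prices
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat

# MAE: average size of the error, in the target's units
mae_manual = np.mean(np.abs(errors))

# RMSE: square root of the average squared error; penalizes large misses more
rmse_manual = np.sqrt(np.mean(errors ** 2))

# R^2: 1 minus (residual sum of squares / total sum of squares around the mean)
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE:  {mae_manual:.3f}")
print(f"RMSE: {rmse_manual:.3f}")
print(f"R²:   {r2_manual:.3f}")

# The manual values agree with scikit-learn's implementations
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_hat))
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

MAE and RMSE are expressed in the same units as the target, while R² is unitless and compares the model against always predicting the mean.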

## 📝 Conclusion

In this project, we built a **car price prediction model** using Random Forest Regression. We:

- Explored and visualized the dataset.
- Engineered features and handled categorical variables.
- Tuned hyperparameters using grid search with 5-fold cross-validation.
- Evaluated the model on a held-out test set using MAE, RMSE, and R².

This pipeline can be adapted to other car datasets and extended with additional features such as brand reputation, safety ratings, and consumer reviews. 🚀