A comprehensive machine learning project for predicting used car selling prices using various vehicle attributes. This project implements Random Forest regression with hyperparameter tuning to provide accurate price predictions for the used car market.
This project aims to help users estimate the selling price of used cars based on key vehicle characteristics such as year, present price, kilometers driven, fuel type, seller type, transmission, and ownership history. The model serves both buyers and sellers in making informed decisions in the used car market.
- Advanced ML Algorithm: Random Forest Regression with hyperparameter optimization
- Feature Engineering: Created car age features and handled categorical variables
- Data Preprocessing: Comprehensive data cleaning and encoding
- Feature Importance Analysis: Identified key factors affecting car prices
- Model Persistence: Saved model for deployment using pickle
- Visualization: Detailed EDA with correlation analysis and distribution plots
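Model persistence with pickle amounts to a dump/load round trip. The sketch below assumes a fitted scikit-learn `RandomForestRegressor`. The synthetic training data and the file name `random_forest_model.pkl` are hypothetical placeholders, not the project's actual artifacts.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data with 8 columns, matching the preprocessed feature count
rng = np.random.default_rng(0)
X = rng.random((50, 8))
y = X @ rng.random(8)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Hypothetical file name, written to a temporary directory for this example
path = os.path.join(tempfile.mkdtemp(), "random_forest_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Reload the model, e.g. inside a deployment service
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded model should reproduce the original predictions exactly
same_predictions = np.allclose(model.predict(X), loaded.predict(X))
```

Only unpickle files from a trusted source. Load them with the same scikit-learn version that was used for training.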
Source: Used Car Price Dataset
Records: 301 car sales records
Features: 8 key attributes after preprocessing
Target Variable: Selling Price (in lakhs)
Original columns:
- Car_Name: Vehicle model name
- Year: Manufacturing year (2003-2018)
- Selling_Price: Actual selling price (target variable)
- Present_Price: Current market price
- Kms_Driven: Total kilometers driven (500 - 500,000 km)
- Fuel_Type: Petrol, Diesel, or CNG
- Seller_Type: Dealer or Individual
- Transmission: Manual or Automatic
- Owner: Number of previous owners (0-3)
Model features after preprocessing:
- Present_Price: Current market value
- Kms_Driven: Vehicle usage indicator
- Owner: Ownership history
- no_year: Car age (2020 - manufacturing year)
- Fuel_Type_Diesel: Binary encoding for diesel cars
- Fuel_Type_Petrol: Binary encoding for petrol cars
- Seller_Type_Individual: Binary encoding for individual sellers
- Transmission_Manual: Binary encoding for manual transmission
- Missing Values: 0 (complete dataset)
- Duplicates: Handled through data validation
- Outliers: Analyzed using box plots and statistical methods
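The checks above can be sketched with pandas. The project does not name its "statistical methods" for outliers, so this sketch assumes the 1.5 × IQR rule. That rule uses the same whisker length a default matplotlib box plot draws. The function name and the toy rows are hypothetical.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, column: str) -> dict:
    """Count missing values and duplicate rows, and flag IQR outliers in one column."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    # Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = df[(df[column] < lower) | (df[column] > upper)]
    return {
        "missing": int(df.isnull().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "outlier_rows": outliers.index.tolist(),
    }

# Hypothetical toy rows: row 5 duplicates row 0, and row 6 has extreme mileage
toy = pd.DataFrame({
    "Kms_Driven": [27000, 43000, 6900, 5200, 42450, 27000, 500000],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60, 3.35, 0.60],
})
report = data_quality_report(toy, "Kms_Driven")
```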
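- Car Age Calculation: no_year = 2020 - Year
- Categorical Encoding: One-hot encoding for Fuel_Type, Seller_Type, and Transmission, with the first level dropped (CNG, Dealer, and Automatic are the baselines)
- Feature Removal: Dropped Car_Name to avoid overfitting, and dropped Year once no_year replaced it
- Data Type Optimization: Ensured proper data types for all features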
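The preprocessing steps above can be sketched in pandas. This assumes `pd.get_dummies` with `drop_first=True`, which matches the dummy columns listed earlier. The function name and the example rows are hypothetical.

```python
import pandas as pd

REFERENCE_YEAR = 2020  # reference year used for the car-age feature

def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn raw listing columns into the model feature table (plus the target)."""
    df = raw.copy()
    # Car age replaces the raw manufacturing year
    df["no_year"] = REFERENCE_YEAR - df["Year"]
    # Car_Name is high-cardinality text; Year is now captured by no_year
    df = df.drop(columns=["Car_Name", "Year"])
    # One-hot encode; drop_first=True leaves CNG, Dealer and Automatic as baselines
    return pd.get_dummies(
        df,
        columns=["Fuel_Type", "Seller_Type", "Transmission"],
        drop_first=True,
        dtype=int,
    )

# Hypothetical rows covering every category level
raw = pd.DataFrame({
    "Car_Name": ["ritz", "sx4", "ciaz", "wagon r"],
    "Year": [2014, 2013, 2017, 2011],
    "Selling_Price": [3.35, 4.75, 7.25, 2.85],
    "Present_Price": [5.59, 9.54, 9.85, 4.15],
    "Kms_Driven": [27000, 43000, 6900, 5200],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG"],
    "Seller_Type": ["Dealer", "Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Manual", "Automatic", "Manual"],
    "Owner": [0, 0, 0, 0],
})
features = preprocess(raw)
```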
- Price Range: ₹0.10 - ₹35.00 lakhs
- Car Age: 2 - 17 years
- Fuel Type Distribution: Petrol (majority), Diesel, CNG
- Transmission: Manual (majority), Automatic
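Summary figures like these come from simple pandas aggregations. The sketch below assumes a frame that already has `no_year` but still holds the raw categorical columns. The function name and toy values are hypothetical.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Value ranges and category shares of the kind reported above."""
    return {
        "price_range": (float(df["Selling_Price"].min()), float(df["Selling_Price"].max())),
        "age_range": (int(df["no_year"].min()), int(df["no_year"].max())),
        # normalize=True returns shares that sum to 1 instead of raw counts
        "fuel_share": df["Fuel_Type"].value_counts(normalize=True).to_dict(),
        "transmission_share": df["Transmission"].value_counts(normalize=True).to_dict(),
    }

# Hypothetical toy rows
toy = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 0.60, 7.25],
    "no_year": [6, 7, 17, 3],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "Petrol"],
    "Transmission": ["Manual", "Manual", "Manual", "Automatic"],
})
summary = summarize(toy)
```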
Random Forest Regressor was chosen for its:
- Robust performance with a mix of numeric and binary-encoded features
- Built-in feature importance estimates
- Lower tendency to overfit than a single decision tree
- Ability to capture non-linear relationships and feature interactions
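- Training Fit: Near-perfect fit on the training data (typical for tree ensembles, so held-out performance is the meaningful measure)
- Feature Importance: Present_Price (38.2%) as top predictor (Extra Trees Regressor analysis)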
- Cross-Validation: 5-fold CV with negative MSE scoring
- Hyperparameter Optimization: RandomizedSearchCV implementation
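The tuning setup (RandomizedSearchCV, 5-fold CV, negative MSE scoring) can be sketched as follows. Several parts are assumptions: the data is synthetic, the search space is hypothetical but chosen to contain the reported best values, and `n_iter` is kept small so the example runs quickly. `max_features='auto'` is written as `1.0`, its regressor equivalent in current scikit-learn.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the 8 preprocessed features (hypothetical data)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical search space that includes the reported best parameters
param_distributions = {
    "n_estimators": [100, 300, 700],
    "max_features": [1.0, "sqrt"],
    "max_depth": [5, 10, 20],
    "min_samples_split": [2, 5, 10, 15],
    "min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,                          # small budget to keep the sketch fast
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

best_model = search.best_estimator_   # refit on all of X_train (refit=True by default)
test_r2 = best_model.score(X_test, y_test)
```

Because the scoring is negated MSE, `search.best_score_` is zero or negative. Its absolute value is the mean cross-validated MSE of the best candidate.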
Best Parameters:
{
'n_estimators': 700,
'min_samples_split': 15,
'min_samples_leaf': 1,
'max_features': 'auto',  # all features for regressors; 'auto' was removed in scikit-learn 1.3, use 1.0
'max_depth': 20
}
Feature importances from the Extra Trees Regressor analysis:
- Present_Price (38.24%) - Current market value
- Fuel_Type_Diesel (22.26%) - Diesel vehicle indicator
- Transmission_Manual (13.18%) - Manual transmission indicator
- Seller_Type_Individual (13.05%) - Individual seller indicator
- no_year (7.86%) - Car age
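Importances like those above come from the impurity-based `feature_importances_` attribute of a fitted `ExtraTreesRegressor`. In the sketch below, the data is synthetic, so the printed percentages are not the project's results. Only the feature names come from this README.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

FEATURES = [
    "Present_Price", "Kms_Driven", "Owner", "no_year",
    "Fuel_Type_Diesel", "Fuel_Type_Petrol",
    "Seller_Type_Individual", "Transmission_Manual",
]

# Synthetic stand-in data; real importances require the actual dataset
X, y = make_regression(n_samples=200, n_features=len(FEATURES), random_state=0)
X = pd.DataFrame(X, columns=FEATURES)

selector = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = (
    pd.Series(selector.feature_importances_, index=FEATURES)
    .sort_values(ascending=False)
    .mul(100)  # express as percentages, as in the list above
)
print(importances.round(2))
```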
- Cross-Validation Score: Negative MSE over 5 folds, used by RandomizedSearchCV to select the best parameters
- Feature Importance: Validated using Extra Trees Regressor
- Residual Analysis: Prediction errors are approximately normally distributed
- Scatter Plot Analysis: Predicted values closely track actual values
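The residual and predicted-vs-actual checks can be summarized numerically alongside the plots. The function name and the example values below are hypothetical. RMSE is taken as the square root of `mean_squared_error` because the `squared=False` option is deprecated in recent scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred) -> dict:
    """Error metrics plus the residual summary behind a residual distribution plot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": r2_score(y_true, y_pred),
        # A mean near zero suggests no systematic over- or under-prediction
        "residual_mean": float(residuals.mean()),
        "residual_std": float(residuals.std(ddof=1)),
        # Pearson correlation between actual and predicted values
        "pred_actual_corr": float(np.corrcoef(y_true, y_pred)[0, 1]),
    }

# Hypothetical actual and predicted prices (in lakhs)
metrics = evaluate([3.0, 5.0, 7.0, 9.0], [3.5, 4.5, 7.5, 8.5])
```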
- Present Price Impact: Strongest predictor (38% importance)
- Fuel Type Effect: Diesel cars tend to have higher resale value
- Seller Type Influence: Cars listed by dealers generally sell for more than those from individual sellers
- Age Depreciation: Clear negative correlation with car age
- Transmission Effect: Manual vs. automatic transmission is the third most important feature (13.18%)
- Price Distribution: Right-skewed distribution with most cars under ₹10 lakhs
- Correlation Analysis: Strong positive correlation (0.88) between present price and selling price
- Fuel Type Patterns: Diesel cars show higher selling prices on average
- Age Impact: Clear depreciation trend with increasing car age
- Seller Type Effect: Dealer prices generally higher than individual sellers
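A correlation figure like the 0.88 above is a Pearson coefficient between numeric columns and the target. The sketch below ranks those correlations by absolute strength. The function name and toy rows are hypothetical, and non-numeric columns are excluded before `corr()` is called.

```python
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str = "Selling_Price") -> pd.Series:
    """Pearson correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute strength so strong negative correlations rank high too
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Hypothetical toy rows
toy = pd.DataFrame({
    "Selling_Price": [3.35, 4.75, 7.25, 2.85, 4.60],
    "Present_Price": [5.59, 9.54, 9.85, 4.15, 6.87],
    "no_year": [6, 7, 3, 9, 6],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol", "CNG", "Diesel"],
})
correlations = target_correlations(toy)
```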
- Correlation heatmap showing feature relationships
- Box plots for outlier detection
- Pair plots revealing feature interactions
- Feature importance bar charts
- Residual distribution plots
For sellers:
- Price Optimization: Set competitive selling prices
- Market Analysis: Understand factors affecting car value
- Timing Decisions: Decide when to sell based on the depreciation trend
For buyers:
- Price Validation: Check whether an asking price is fair
- Negotiation Tool: Use predictions to support price negotiations
- Market Insights: Understand value drivers in the used car market
For dealers:
- Inventory Pricing: Price stock vehicles consistently
- Purchase Decisions: Evaluate cars for acquisition
- Market Trends: Track pricing patterns across segments
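Price validation for a single car requires encoding it exactly like the training data. The helper below is a hypothetical sketch. It assumes the feature order listed under the model features (Present_Price through Transmission_Manual) and the 2020 reference year used for no_year.

```python
REFERENCE_YEAR = 2020
FEATURE_ORDER = [
    "Present_Price", "Kms_Driven", "Owner", "no_year",
    "Fuel_Type_Diesel", "Fuel_Type_Petrol",
    "Seller_Type_Individual", "Transmission_Manual",
]
ALLOWED = {
    "fuel_type": {"Petrol", "Diesel", "CNG"},
    "seller_type": {"Dealer", "Individual"},
    "transmission": {"Manual", "Automatic"},
}

def build_feature_row(present_price, kms_driven, owner, year,
                      fuel_type, seller_type, transmission):
    """Encode one car like the training data (hypothetical helper)."""
    for field, value in (("fuel_type", fuel_type),
                         ("seller_type", seller_type),
                         ("transmission", transmission)):
        if value not in ALLOWED[field]:
            raise ValueError(f"unknown {field}: {value!r}")
    features = {
        "Present_Price": present_price,
        "Kms_Driven": kms_driven,
        "Owner": owner,
        "no_year": REFERENCE_YEAR - year,
        "Fuel_Type_Diesel": int(fuel_type == "Diesel"),
        "Fuel_Type_Petrol": int(fuel_type == "Petrol"),  # CNG is the all-zero baseline
        "Seller_Type_Individual": int(seller_type == "Individual"),
        "Transmission_Manual": int(transmission == "Manual"),
    }
    return [features[name] for name in FEATURE_ORDER]

# Hypothetical car: 2013 diesel, manual, sold by a dealer
row = build_feature_row(9.54, 43000, 0, 2013, "Diesel", "Dealer", "Manual")
```

With the pickled model loaded, `model.predict([row])[0]` gives the estimated selling price in lakhs. If the model was fitted on a DataFrame, pass `pd.DataFrame([row], columns=FEATURE_ORDER)` instead so the feature names match.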
