OSBrainer/CarPrediction


Car Price Prediction


A complete end-to-end machine learning pipeline for predicting the resale value of used cars. It covers every stage, from data exploration and preprocessing, through model training and evaluation, to serving predictions via a REST API and containerized deployment.

🚀 Project Features

  • Exploratory Data Analysis (EDA)
    Investigate data integrity, feature distributions, correlations, and outliers to guide modeling decisions.

  • Data Preprocessing & Feature Engineering
    Automatic handling of missing values, categorical encoding, numerical scaling, and derivation of new features such as car age.

  • Model Development & Hyperparameter Tuning
    Baseline linear regression plus tree-based regressors (Random Forest, Gradient Boosting) with cross-validated grid search.

  • Model Evaluation
    Performance metrics (MAE, RMSE, R²), residual analysis, and feature-importance visualizations.

  • Model Serialization
    A single, versioned sklearn.Pipeline (including preprocessing) saved via joblib for reproducible inference.

  • REST API
    Lightweight Flask service (src/app.py) exposing a /predict endpoint for real-time price estimates.

  • Docker Support
    Dockerfile for containerized deployment on platforms such as Heroku, AWS ECS, or Azure Web Apps.
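The preprocessing and feature-engineering steps listed above can be bundled into a single scikit-learn object so that training and inference apply identical transformations. The sketch below is a minimal illustration under stated assumptions, not the code in src/data_prep.py: the column names (`year`, `mileage`, `engine_size`, `make`, `model`, `fuel_type`), the default reference year used to compute `car_age`, and the helper names `add_car_age` and `build_preprocessor` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical column names; adjust to the real dataset schema.
NUMERIC_COLS = ["car_age", "mileage", "engine_size"]  # car_age is derived from year
CATEGORICAL_COLS = ["make", "model", "fuel_type"]


def add_car_age(df: pd.DataFrame, reference_year: int = 2024) -> pd.DataFrame:
    """Derive car_age from the model year and drop the raw year column."""
    out = df.copy()
    out["car_age"] = reference_year - out["year"]
    return out.drop(columns=["year"])


def build_preprocessor(reference_year: int = 2024) -> Pipeline:
    """Feature engineering followed by per-type imputation, scaling and encoding."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # median is robust to skewed mileage
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # unseen categories -> all zeros
    ])
    columns = ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, CATEGORICAL_COLS),
    ])
    return Pipeline([
        ("age", FunctionTransformer(add_car_age, kw_args={"reference_year": reference_year})),
        ("columns", columns),
    ])


if __name__ == "__main__":
    # Tiny illustrative frame with one missing year and one missing make.
    sample = pd.DataFrame({
        "year": [2015, 2019, np.nan],
        "mileage": [80000, 30000, 50000],
        "engine_size": [1.6, 2.0, 1.4],
        "make": ["Toyota", "BMW", np.nan],
        "model": ["Corolla", "320i", "Polo"],
        "fuel_type": ["Petrol", "Diesel", "Petrol"],
    })
    print(build_preprocessor().fit_transform(sample).shape)
```

Because the age derivation lives inside the pipeline, a serialized model only needs the raw `year` at prediction time. One caveat: joblib pickles `add_car_age` by reference, so the module that defines it must be importable wherever the model is loaded.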


📁 Repository Structure

  /
  ├── data/              # Raw and processed datasets
  ├── images/            # Static assets (figures, banners)
  ├── notebooks/         # Exploratory Data Analysis (EDA) notebooks
  ├── src/               # Application code
  │   ├── data_prep.py   # Data cleaning & feature engineering
  │   ├── train.py       # Model training & hyperparameter tuning
  │   ├── evaluate.py    # Evaluation metrics & plots
  │   └── app.py         # Flask web application
  ├── requirements.txt   # Python dependencies
  └── README.md          # This file

🔍 Overview

We train regression models (Linear Regression, Random Forest, Gradient Boosting) on a used-car dataset to learn how features like age, mileage, engine size, make/model and fuel type influence resale price. After evaluating performance (MAE, RMSE, R²), the best model is serialized and served via a lightweight Flask API.
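For reference, the three evaluation metrics can be computed directly from paired true and predicted prices. The function below is a dependency-free sketch (the name `regression_metrics` is hypothetical); `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error` and `r2_score` for the same purpose. MAE and RMSE are expressed in the same currency units as the price, with RMSE penalizing large misses more heavily, while R² is the fraction of price variance the model explains.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Return MAE, RMSE and R² for paired true and predicted prices."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    ss_res = sum(e * e for e in errors)     # residual sum of squares
    rmse = math.sqrt(ss_res / n)            # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R² is undefined when every true price is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

For example, true prices `[10000, 15000, 20000]` against predictions `[11000, 14000, 20000]` give an MAE of about 666.67, an RMSE of about 816.50 and an R² of 0.96.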


⚙️ Setup & Installation

  1. Clone the repository and navigate into the project folder.
  2. Create a Python virtual environment and activate it.
  3. Install all required packages from requirements.txt.

📊 Exploratory Data Analysis

Open the eda.ipynb notebook under notebooks/.

  • Perform data quality checks (missing values, duplicates, outliers).
  • Visualize distributions for numeric and categorical features.
  • Generate correlation matrices and scatter plots (e.g. price vs. mileage, price vs. age) to inform feature engineering.
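The data quality checks above can also be scripted so they are easy to rerun whenever the dataset changes. This is a minimal pandas sketch, assuming a target column named `price`; the function name `quality_report` is hypothetical, and the 1.5 × IQR outlier rule is one common convention, not necessarily the one used in eda.ipynb.

```python
import pandas as pd


def quality_report(df: pd.DataFrame, numeric_cols: list[str], target: str = "price") -> dict:
    """Summarize missing values, duplicate rows, IQR outliers and target correlations."""
    report = {
        "missing": df.isna().sum().to_dict(),          # NaN count per column
        "duplicate_rows": int(df.duplicated().sum()),  # exact repeats of an earlier row
        "outliers": {},
        # Pearson correlation of each numeric feature with the target
        "corr_with_target": df[numeric_cols].corrwith(df[target]).to_dict(),
    }
    for col in numeric_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # Count values outside the Tukey fences
        report["outliers"][col] = int(((df[col] < low) | (df[col] > high)).sum())
    return report
```

Flagged outliers are worth inspecting rather than dropping automatically: an implausible mileage is likely a data-entry error, while a very expensive car may be a legitimate rare model.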

🏋️ Model Training & Evaluation

  1. Prepare the cleaned, feature-engineered dataset under data/processed/ (cleaning and feature engineering live in src/data_prep.py).
  2. Run src/train.py to fit the regression pipelines and tune their hyperparameters.
  3. Run src/evaluate.py to compute MAE, RMSE and R² on hold-out data and save residual plots.
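The training and evaluation steps can be sketched as one function that tunes each candidate model with cross-validated grid search, keeps the one with the lowest cross-validated MAE, scores it once on a held-out split, and writes models/car_price_pipeline.joblib and reports/metrics.json. This is an illustrative sketch, not the contents of src/train.py or src/evaluate.py: the function name `train_and_evaluate`, the hyperparameter grids, the 80/20 split and the fixed random seed are assumptions.

```python
import json
from pathlib import Path

import joblib
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical search spaces; keys follow the "<step>__<param>" Pipeline convention.
CANDIDATES = {
    "linear_regression": (LinearRegression(), {"model__fit_intercept": [True]}),
    "random_forest": (
        RandomForestRegressor(random_state=42),
        {"model__n_estimators": [100, 200], "model__max_depth": [None, 10]},
    ),
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=42),
        {"model__learning_rate": [0.05, 0.1], "model__max_depth": [2, 3]},
    ),
}


def train_and_evaluate(X, y, preprocessor, models_dir="models", reports_dir="reports", cv=5):
    """Tune each candidate, keep the best by CV MAE, then score it on a hold-out split."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    best_name, best_search = None, None
    for name, (estimator, grid) in CANDIDATES.items():
        pipe = Pipeline([("prep", clone(preprocessor)), ("model", clone(estimator))])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="neg_mean_absolute_error")
        search.fit(X_train, y_train)
        # best_score_ is the negated mean CV MAE, so larger is better.
        if best_search is None or search.best_score_ > best_search.best_score_:
            best_name, best_search = name, search

    # predict() uses best_estimator_, which GridSearchCV refit on the whole training split.
    y_pred = best_search.predict(X_test)
    report = {
        "model": best_name,
        "best_params": best_search.best_params_,
        "cv_mae": float(-best_search.best_score_),
        "mae": float(mean_absolute_error(y_test, y_pred)),
        "rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
        "r2": float(r2_score(y_test, y_pred)),
    }

    Path(models_dir).mkdir(parents=True, exist_ok=True)
    Path(reports_dir).mkdir(parents=True, exist_ok=True)
    joblib.dump(best_search.best_estimator_, Path(models_dir) / "car_price_pipeline.joblib")
    (Path(reports_dir) / "metrics.json").write_text(json.dumps(report, indent=2))
    return report
```

Selecting the model on cross-validated MAE from the training split, and touching the hold-out split only once at the end, keeps the reported hold-out metrics an honest estimate of performance on unseen cars.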

Outputs:

  • models/car_price_pipeline.joblib — serialized sklearn Pipeline
  • reports/metrics.json — performance metrics
  • Residual plots in the reports/ folder
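Once models/car_price_pipeline.joblib exists, the /predict endpoint essentially has to load it once and turn a JSON payload into a one-row DataFrame. The framework-independent sketch below assumes hypothetical request fields (`year`, `mileage`, `engine_size`, `make`, `model`, `fuel_type`) that match the training columns; `load_pipeline`, `predict_from_payload` and `REQUIRED_FIELDS` are illustrative names, not the API of src/app.py.

```python
import joblib
import pandas as pd

# Hypothetical request schema; it must match the columns the pipeline was fitted on.
REQUIRED_FIELDS = ["year", "mileage", "engine_size", "make", "model", "fuel_type"]


def load_pipeline(path: str = "models/car_price_pipeline.joblib"):
    """Load the serialized sklearn Pipeline (preprocessing + regressor)."""
    return joblib.load(path)


def predict_from_payload(pipeline, payload: dict) -> dict:
    """Turn one JSON-like record into a price estimate, or an error message."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}
    # One-row DataFrame with columns in the training order.
    row = pd.DataFrame([{field: payload[field] for field in REQUIRED_FIELDS}])
    price = float(pipeline.predict(row)[0])
    return {"predicted_price": round(price, 2)}
```

In a Flask app this would typically sit behind a route such as `@app.post("/predict")` that passes `request.get_json()` to `predict_from_payload` and returns the result with `jsonify`. Loading the pipeline once at startup, rather than per request, avoids re-reading the file on every call.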
