Car Price Prediction

A complete end-to-end machine learning pipeline for predicting the resale value of used cars. It covers data exploration and preprocessing, model training and evaluation, serving predictions via a REST API, and containerized deployment.

🚀 Project Features

Exploratory Data Analysis (EDA)
Investigate data integrity, feature distributions, correlations, and outliers to guide modeling decisions.
Data Preprocessing & Feature Engineering
Automatic handling of missing values, categorical encoding, numerical scaling, and derivation of new features such as car age.
Model Development & Hyperparameter Tuning
Baseline linear regression plus tree-based regressors (Random Forest, Gradient Boosting) with cross-validated grid search.
Model Evaluation
Performance metrics (MAE, RMSE, R²), residual analysis, and feature-importance visualizations.
Model Serialization
A single, versioned scikit-learn Pipeline (sklearn.pipeline.Pipeline, including preprocessing) saved via joblib for reproducible inference.
REST API
Lightweight Flask service (src/app.py) exposing a /predict endpoint for real-time price estimates.
Docker Support
Dockerfile for containerized deployment on platforms such as Heroku, AWS ECS, or Azure Web Apps.
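
The preprocessing and serialization features above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the code in src/data_prep.py or src/train.py: the column names (year, mileage, engine_size, make, fuel_type), the imputation strategies, and the helper names add_car_age and build_pipeline are assumptions made for the example.

```python
from datetime import date
from typing import Optional

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's schema may differ.
NUMERIC = ["car_age", "mileage", "engine_size"]
CATEGORICAL = ["make", "fuel_type"]


def add_car_age(df: pd.DataFrame, reference_year: Optional[int] = None) -> pd.DataFrame:
    """Return a copy of df with car_age derived from an assumed 'year' column."""
    ref = reference_year if reference_year is not None else date.today().year
    out = df.copy()
    out["car_age"] = ref - out["year"]
    return out


def build_pipeline() -> Pipeline:
    """Bundle imputation, scaling, encoding, and a regressor into one object."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Categories unseen during training are encoded as all zeros.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
    ])
```

Because preprocessing lives inside the pipeline, the same transformations fitted on the training data are applied at inference time when the saved object is loaded.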

📁 Repository Structure

/
├── data/              # Raw and processed datasets
├── images/            # Static assets (figures, banners)
├── notebooks/         # Exploratory Data Analysis (EDA) notebooks
├── src/               # Application code
│   ├── data_prep.py   # Data cleaning & feature engineering
│   ├── train.py       # Model training & hyperparameter tuning
│   ├── evaluate.py    # Evaluation metrics & plots
│   └── app.py         # Flask web application
├── requirements.txt   # Python dependencies
└── README.md          # This file

🔍 Overview

We train regression models (Linear Regression, Random Forest, Gradient Boosting) on a used-car dataset to learn how features like age, mileage, engine size, make/model and fuel type influence resale price. After evaluating performance (MAE, RMSE, R²), the best model is serialized and served via a lightweight Flask API.
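
As a rough sketch of the serving step, a Flask app with a /predict endpoint could look like the following. The create_app factory, the request format (one JSON object of feature values), and the predicted_price response key are assumptions for illustration; the actual src/app.py may be organized differently.

```python
import pandas as pd
from flask import Flask, jsonify, request


def create_app(model) -> Flask:
    """Build a Flask app around any object exposing predict(DataFrame)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        # Reject missing, malformed, or non-object JSON bodies.
        if not isinstance(payload, dict):
            return jsonify(error="Expected a JSON object of car features."), 400
        # One request body becomes a single-row frame for the pipeline.
        features = pd.DataFrame([payload])
        price = float(model.predict(features)[0])
        return jsonify(predicted_price=price)

    return app
```

In practice the model argument would be the serialized pipeline, for example create_app(joblib.load("models/car_price_pipeline.joblib")). Passing the model in, rather than loading it at import time, keeps the endpoint easy to test with a stub.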

⚙️ Setup & Installation

Clone the repository and navigate into the project folder.
Create a Python virtual environment and activate it.
Install all required packages from requirements.txt.

📊 Exploratory Data Analysis

Open the eda.ipynb notebook under notebooks/.

Perform data quality checks (missing values, duplicates, outliers).
Visualize distributions for numeric and categorical features.
Generate correlation matrices and scatter plots (e.g. price vs. mileage, price vs. age) to inform feature engineering.
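
The checks listed above might be sketched with pandas as follows. The function names, the 1.5 × IQR outlier rule, and the price target column are assumptions for this example rather than the notebook's actual contents.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize missing values, duplicate rows, and IQR outliers per numeric column."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # A value is flagged when it lies more than 1.5 * IQR outside the quartiles.
    outliers = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "iqr_outliers": outliers.to_dict(),
    }


def price_correlations(df: pd.DataFrame, target: str = "price") -> pd.Series:
    """Pearson correlation of each numeric column with the target, sorted ascending."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    return corr.sort_values()
```

Strongly negative entries in price_correlations (mileage and age are plausible candidates) are natural first choices for the scatter plots, which can be drawn with df.plot.scatter(x="mileage", y="price").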

🏋️ Model Training & Evaluation

Clean the raw data and write the processed dataset under data/processed/ (src/data_prep.py handles cleaning and feature engineering).
Run the training script in src/train.py to fit your regression pipelines and perform hyperparameter tuning.
Run the evaluation script in src/evaluate.py to compute MAE, RMSE and R² on hold-out data and save residual plots.
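
The tuning and evaluation steps above can be illustrated with a cross-validated grid search followed by hold-out metrics. This is a sketch under assumptions: the tune_and_evaluate helper, the 80/20 split, 3-fold cross-validation, and MAE as the selection score are choices made for the example, not necessarily those in src/train.py and src/evaluate.py.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_evaluate(X, y, param_grid, seed=0):
    """Grid-search a regressor on a training split, then score it on hold-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        param_grid,
        scoring="neg_mean_absolute_error",  # higher is better, hence the negation
        cv=3,
    )
    search.fit(X_train, y_train)

    # GridSearchCV refits the best configuration on the full training split.
    pred = search.predict(X_test)
    metrics = {
        "mae": float(mean_absolute_error(y_test, pred)),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }
    return search.best_estimator_, search.best_params_, metrics
```

A call such as tune_and_evaluate(X, y, {"n_estimators": [100, 300], "max_depth": [2, 3]}) returns the refitted model, the winning parameters, and the three hold-out metrics. Residuals for plotting are simply y_test minus the predictions.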

Outputs:

models/car_price_pipeline.joblib — serialized sklearn Pipeline
reports/metrics.json — performance metrics
Residual plots in the reports/ folder
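
Writing these outputs could look roughly like the sketch below. The save_outputs helper and the flat layout of metrics.json are assumptions; only the two file paths come from the list above.

```python
import json
from pathlib import Path

import joblib


def save_outputs(pipeline, metrics, root="."):
    """Persist the fitted pipeline and its metrics under models/ and reports/."""
    root = Path(root)
    model_path = root / "models" / "car_price_pipeline.joblib"
    metrics_path = root / "reports" / "metrics.json"

    # Create the output folders on first run.
    model_path.parent.mkdir(parents=True, exist_ok=True)
    metrics_path.parent.mkdir(parents=True, exist_ok=True)

    joblib.dump(pipeline, model_path)
    metrics_path.write_text(json.dumps(metrics, indent=2))
    return model_path, metrics_path
```

The saved pipeline can later be restored with joblib.load(model_path). Loading is most reliable with the same scikit-learn version that produced the file, which is one reason to pin versions in requirements.txt.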
