# 📊 Car Price Prediction using AutoGluon (Regression)
In this notebook, we use the **AutoGluon** framework to predict a car's price (a numerical value) from features such as brand, year, mileage, and engine size.

We treat this as a **regression problem**, where the target variable is the `Price` column.

## 🔍 Data Dictionary
- **Brand**: Car manufacturer name
- **Model**: Specific model of the car
- **Year**: Year of manufacture
- **Engine_Size**: Size of the engine (liters)
- **Fuel_Type**: Type of fuel used (e.g., Petrol, Diesel, Hybrid, Electric)
- **Transmission**: Gear system (e.g., Manual, Automatic, Semi-Automatic)
- **Mileage**: Distance the car has traveled (kilometers)
- **Doors**: Number of doors
- **Owner_Count**: Number of previous owners
- **Price**: 💰 Target variable (in dollars)

In [1]:
import pandas as pd
from autogluon.tabular import TabularPredictor

In [2]:
# Load the dataset and drop rows where the target (Price) is missing
df = pd.read_csv('car_price_dataset.csv')
df = df.dropna(subset=['Price'])
df.head()

| | Brand | Model | Year | Engine_Size | Fuel_Type | Transmission | Mileage | Doors | Owner_Count | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kia | Rio | 2020 | 4.2 | Diesel | Manual | 289944 | 3 | 5 | 8501 |
| 1 | Chevrolet | Malibu | 2012 | 2.0 | Hybrid | Automatic | 5356 | 2 | 3 | 12092 |
| 2 | Mercedes | GLA | 2020 | 4.2 | Diesel | Automatic | 231440 | 4 | 2 | 11171 |
| 3 | Audi | Q5 | 2023 | 2.0 | Electric | Manual | 160971 | 2 | 1 | 11780 |
| 4 | Volkswagen | Golf | 2003 | 2.6 | Hybrid | Semi-Automatic | 286618 | 3 | 3 | 2867 |


In [3]:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for final evaluation; fix the seed for reproducibility
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

In [4]:
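# Regress on the Price column and select models by R²
predictor = TabularPredictor(label='Price', problem_type='regression', eval_metric='r2')
# Train with a 10-minute budget (no preset given, so AutoGluon defaults to 'medium')
predictor.fit(train_data, time_limit=600)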

No path specified. Models will be saved in: "AutogluonModels\ag-20250416_005409"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.2
Python Version:     3.11.11
Operating System:   Windows
Platform Machine:   AMD64
Platform Version:   10.0.26100
CPU Count:          22
Memory Avail:       13.48 GB / 31.43 GB (42.9%)
Disk Space Avail:   230.17 GB / 401.65 GB (57.3%)
No presets specified! To achieve strong results with AutoGluon, it is recommended to use the available presets. Defaulting to `'medium'`...
	Recommended Presets (For more details refer to https://auto.gluon.ai/stable/tutorials/tabular/tabular-essentials.html#presets):
	presets='experimental' : New in v1.2: Pre-trained foundation model + parallel fits. The absolute best accuracy without consideration for inference speed. Does not support GPU.
	presets='best'         : Maximize accuracy. Recommended for most users. Use in competitions and benchmarks.
	presets='high'         : Strong accuracy with fast inference speed.
	pres

[1000]	valid_set's l2: 16442.4	valid_set's r2: 0.998357
[2000]	valid_set's l2: 14447.6	valid_set's r2: 0.998556
[3000]	valid_set's l2: 13975.6	valid_set's r2: 0.998603
[4000]	valid_set's l2: 13776.5	valid_set's r2: 0.998623
[5000]	valid_set's l2: 13611.6	valid_set's r2: 0.99864
[6000]	valid_set's l2: 13578.2	valid_set's r2: 0.998643
[7000]	valid_set's l2: 13547.8	valid_set's r2: 0.998646
[8000]	valid_set's l2: 13538.2	valid_set's r2: 0.998647
[9000]	valid_set's l2: 13520.2	valid_set's r2: 0.998649
[10000]	valid_set's l2: 13499.8	valid_set's r2: 0.998651


	0.9987	 = Validation score   (r2)
	105.47s	 = Training   runtime
	0.28s	 = Validation runtime
Fitting model: LightGBM ... Training model for up to 484.37s of the 484.37s of remaining time.


[1000]	valid_set's l2: 18589.6	valid_set's r2: 0.998142
[2000]	valid_set's l2: 17619.9	valid_set's r2: 0.998239
[3000]	valid_set's l2: 17237.6	valid_set's r2: 0.998277
[4000]	valid_set's l2: 17142.4	valid_set's r2: 0.998287
[5000]	valid_set's l2: 17082.3	valid_set's r2: 0.998293
[6000]	valid_set's l2: 17035.6	valid_set's r2: 0.998297
[7000]	valid_set's l2: 17027.5	valid_set's r2: 0.998298
[8000]	valid_set's l2: 17010.4	valid_set's r2: 0.9983
[9000]	valid_set's l2: 17001.8	valid_set's r2: 0.998301
[10000]	valid_set's l2: 16994.6	valid_set's r2: 0.998302


	0.9983	 = Validation score   (r2)
	70.96s	 = Training   runtime
	0.2s	 = Validation runtime
Fitting model: RandomForestMSE ... Training model for up to 411.29s of the 411.29s of remaining time.
	0.9749	 = Validation score   (r2)
	2.4s	 = Training   runtime
	0.16s	 = Validation runtime
Fitting model: CatBoost ... Training model for up to 408.31s of the 408.31s of remaining time.
	Ran out of time, early stopping on iteration 2482.
	0.9998	 = Validation score   (r2)
	408.26s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.00s of the -0.09s of remaining time.
	Ensemble Weights: {'CatBoost': 0.929, 'LightGBMXT': 0.071}
	0.9998	 = Validation score   (r2)
	0.15s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 600.33s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 2725.6 rows/s (800 batch size)
TabularPredictor saved. To load, use: predictor = TabularPre

<autogluon.tabular.predictor.predictor.TabularPredictor at 0x1af071ad0d0>
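The `Ensemble Weights` line in the training log suggests that the final `WeightedEnsemble_L2` prediction is a weighted combination of the base models' predictions. Below is a minimal sketch of that idea, assuming a plain weighted average. The function `weighted_prediction` and the two per-model price predictions are hypothetical illustrations, not AutoGluon internals or outputs of the fitted models.

```python
# Weights copied from the "Ensemble Weights" line of the training log.
ensemble_weights = {"CatBoost": 0.929, "LightGBMXT": 0.071}


def weighted_prediction(base_predictions, weights):
    """Combine per-model predictions into one value with a weighted average.

    Assumption: the ensemble is a simple weighted average of its base models.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * base_predictions[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical predictions for a single car (illustrative values only).
base_predictions = {"CatBoost": 8500.0, "LightGBMXT": 8600.0}

blended = weighted_prediction(base_predictions, ensemble_weights)
print(round(blended, 2))
```

With a weight of about 0.93 on CatBoost, the blended value stays close to the CatBoost prediction, which matches the two models sharing the same rounded validation score in the log.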

In [5]:
predictor.evaluate(test_data)

{'r2': 0.999834205450661,
 'root_mean_squared_error': -39.02933672084759,
 'mean_squared_error': -1523.289124869302,
 'mean_absolute_error': -23.7973583984375,
 'pearsonr': 0.99991739613012,
 'median_absolute_error': -17.32568359375}
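AutoGluon reports every metric in "higher is better" form, so error metrics such as RMSE and MAE appear with a negative sign in the output above. The following sketch converts them back to their conventional positive values. The helper `to_conventional` and the `ERROR_METRICS` set are hypothetical names introduced here for illustration; the scores are copied from the output above.

```python
import math

# Scores copied from the predictor.evaluate(test_data) output above.
scores = {
    "r2": 0.999834205450661,
    "root_mean_squared_error": -39.02933672084759,
    "mean_squared_error": -1523.289124869302,
    "mean_absolute_error": -23.7973583984375,
    "pearsonr": 0.99991739613012,
    "median_absolute_error": -17.32568359375,
}

# Metrics where lower is better, which are therefore reported negated.
ERROR_METRICS = {
    "root_mean_squared_error",
    "mean_squared_error",
    "mean_absolute_error",
    "median_absolute_error",
}


def to_conventional(raw_scores):
    """Flip the sign of error metrics; leave r2 and pearsonr unchanged."""
    return {
        name: (-value if name in ERROR_METRICS else value)
        for name, value in raw_scores.items()
    }


readable = to_conventional(scores)
for name, value in readable.items():
    print(f"{name}: {value:.4f}")

# Consistency check: RMSE squared should equal MSE.
rmse = readable["root_mean_squared_error"]
print(math.isclose(rmse ** 2, readable["mean_squared_error"], rel_tol=1e-9))
```

Read this way, the test-set RMSE is about 39 and the MAE about 24, in the same units as `Price`.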

## ✅ Conclusion Summary
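- The AutoGluon regression model (a `WeightedEnsemble_L2` weighted mostly toward CatBoost) predicts car prices from the vehicle features with very high accuracy on the held-out test set.
- On the 20% test split, the **R² score** is about 0.9998, with an RMSE of about 39 and an MAE of about 24 in price units. AutoGluon reports error metrics with a negative sign so that higher is always better.
- Further improvement may be possible through hyperparameter tuning, feature engineering, or one of the presets recommended in the training log.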
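To make the R² figure concrete, here is a small sketch of how the coefficient of determination is computed: R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The function name `r_squared` and the prices below are hypothetical illustrations, not rows from the dataset or outputs of the model.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Sum of squared residuals between true and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices (illustrative values only).
actual_prices = [8500.0, 12000.0, 11000.0, 3000.0]
predicted_prices = [8450.0, 12100.0, 10950.0, 3050.0]

score = r_squared(actual_prices, predicted_prices)
print(f"R^2 = {score:.6f}")
```

An R² of 1 means the predictions match the true prices exactly, while a model that always predicts the mean price scores 0.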