
carboML/Price_predictor


Second-hand car price predictor

Project overview

Buying a brand-new car is not always the best option. Some people like to change cars every few months. Others simply cannot afford a brand-new car. And someone who has just gotten their driver's license may prefer to learn to drive in a second-hand car. My goal is to estimate the price of a car based on certain features, so that we can tell whether the car we want to buy is cheap or expensive.

For a detailed explanation, see my blog: https://carboml.github.io/

Problem statement

The second-hand market offers plenty of options when buying a car. Prices in this market are set by the sellers, so they can vary widely, and some buyers find it hard to decide whether the price of a car they want is fair. My goal is to help them by training a model that estimates the price of a car from its features. I use a Kaggle dataset: https://www.kaggle.com/datasets/adityadesai13/used-car-dataset-ford-and-mercedes . It contains 100,000 car listings from the UK second-hand market, and each car is described by a set of features. We assume that the price depends on some of these features.

Metrics

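The fit of each model is evaluated with the .score method from scikit-learn, which for regression models returns the coefficient of determination (R²); a score of 1 means a perfect fit. I use this metric because several models are compared, and I need a single metric that can be applied to all of them.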
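For scikit-learn regressors, .score(X, y) returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below checks this on synthetic data (the mileage and price values are made up, not taken from the Kaggle set) by comparing .score against sklearn.metrics.r2_score and a direct computation of the formula.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in data (hypothetical values, not the Kaggle listings)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100_000, size=(200, 1))              # e.g. mileage in miles
y = 20_000 - 0.1 * X[:, 0] + rng.normal(0, 500, 200)    # price in £ with noise

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# For regressors, .score(X, y) is the coefficient of determination R²
score = model.score(X, y)
via_metric = r2_score(y, y_pred)

# The same value computed from the definition R² = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
by_hand = 1 - ss_res / ss_tot

print(f"score={score:.4f}  r2_score={via_metric:.4f}  by hand={by_hand:.4f}")
```

Because every model is scored with the same R², the numbers can be compared directly across models; note that R² can also be negative when a model does worse than always predicting the mean price.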

Data Analysis

Each car is defined by 10 features.

  • price is the price in £.
  • transmission is the type of transmission of the car.
  • mileage is the distance the car has been driven, in miles.
  • mpg stands for miles per gallon of fuel.
  • engineSize is the engine size in liters.
  • tax is the road tax for the car. Since all the listings are from the UK, this is the UK road tax.
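A minimal sketch of turning the columns above into a feature matrix and a target, assuming a pandas DataFrame with these column names. One-hot encoding transmission with pd.get_dummies is one common way to handle a categorical column; it is not necessarily what the notebooks do, and the listing values are hypothetical.

```python
import pandas as pd

# A few hypothetical listings using the column names described above
cars = pd.DataFrame({
    "price": [12000, 8500, 23000],
    "transmission": ["Manual", "Automatic", "Semi-Auto"],
    "mileage": [25000, 60000, 8000],
    "mpg": [55.4, 47.1, 39.8],
    "engineSize": [1.0, 1.6, 2.0],
    "tax": [145, 30, 150],
})

# Target: the price; features: every other column
y = cars["price"]
X = cars.drop(columns="price")

# transmission is categorical, so one-hot encode it; numeric columns pass through unchanged
X = pd.get_dummies(X, columns=["transmission"])
print(X.columns.tolist())
```

The numeric columns keep their original names and the dummy columns are appended at the end, one per transmission type.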

I plot the following:

  • 1 : mpg per car brand. Do some brands use more fuel than others?
  • 2 : Price per car brand. Some brands are more expensive than others, but by how much?
  • 3 : Sales per brand.
  • 4 : Top 10 most sold cars.
  • 5 : Price per year.
  • 6 : Mpg per brand.
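The aggregations behind these plots can be sketched with pandas groupby and value_counts. The brand, model and year columns and all listing values are assumptions for illustration, and listings are counted as a proxy for sales.

```python
import pandas as pd

# Hypothetical listings; "brand", "model" and "year" are assumed column names
cars = pd.DataFrame({
    "brand": ["ford", "ford", "bmw", "bmw", "audi"],
    "model": ["Fiesta", "Focus", "3 Series", "3 Series", "A3"],
    "year": [2017, 2019, 2017, 2019, 2019],
    "price": [9000, 14000, 16000, 22000, 18000],
    "mpg": [60.1, 55.4, 50.4, 47.9, 52.3],
})

mpg_per_brand = cars.groupby("brand")["mpg"].mean()      # plots 1 and 6
price_per_brand = cars.groupby("brand")["price"].mean()  # plot 2
sales_per_brand = cars["brand"].value_counts()           # plot 3 (listings as a proxy for sales)
top_models = cars["model"].value_counts().head(10)       # plot 4
price_per_year = cars.groupby("year")["price"].mean()    # plot 5

print(price_per_brand)
print(top_models)
```

Each resulting Series can then be plotted, for example with Series.plot(kind="bar") or with seaborn.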

Results

After fitting and comparing three regression models (linear regression, k-nearest neighbors and decision tree regression), the best model in this case is linear regression.
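The comparison can be sketched as fitting each regressor on a training split and scoring it on a held-out split. The data here is synthetic and exactly linear plus noise, so linear regression is expected to win by construction; the sketch shows the mechanics only and does not reproduce the author's results. The hyperparameters (n_neighbors=5, random_state=0, test_size=0.2) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 3 numeric features and a noisy linear "price" (hypothetical)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 3))
y = 20_000 - 8_000 * X[:, 0] + 3_000 * X[:, 1] + 1_000 * X[:, 2] + rng.normal(0, 300, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
}

# Fit each model on the training split, then score (R²) on the held-out split
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
best = max(scores, key=scores.get)

for name, s in scores.items():
    print(f"{name}: R² = {s:.3f}")
print("best:", best)
```

Scoring on data the models have not seen matters here: a fully grown decision tree scores R² = 1 on its own training data, which would make it look best if the training split were used.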

Conclusion

Estimating the price of a car based on some of its features is possible. A person looking to buy a second-hand car in the UK will find this project useful.

Problems: Since this dataset only contains listings from the UK, applying the model to other parts of the world will likely result in lower performance. The data also has to be updated from time to time, because factors like inflation or the time of year affect prices in the second-hand market. For example, motorcycle prices rise in summer, and car prices rise in winter.

Future work: It would be interesting to build a model that estimates the price of a car based on more features. I would also like to work on the client profile, since the owner's profile also affects the price of the car. For example, younger drivers tend to treat cars worse, which leads to a lower price.

Installations

The libraries used in this project are:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • sklearn (installed as scikit-learn)

File descriptions

There are 3 Jupyter notebooks and a data folder. The data folder contains all the .csv files needed to build this model. The notebooks:

  • 1.- ETL pipeline: Contains the data preparation.
  • 2.- Data visualization: Contains the different representations of the data.
  • 3.- Model fit: Contains the fitting and comparison of the different models to find the best one.
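A possible shape for the ETL step, assuming the data folder holds one CSV per brand (e.g. ford.csv) and that the brand is taken from the file name. The load_listings helper, the whitespace clean-up of the model column and the duplicate removal are hypothetical illustrations, not the notebook's actual code.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_listings(data_dir: Path) -> pd.DataFrame:
    """Hypothetical helper: concatenate every per-brand CSV in data_dir,
    adding a 'brand' column taken from the file name."""
    frames = []
    for csv_path in sorted(data_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)
        df["brand"] = csv_path.stem  # e.g. "ford" from "ford.csv"
        frames.append(df)
    cars = pd.concat(frames, ignore_index=True)
    # Illustrative cleaning: strip stray whitespace in model names, drop exact duplicates
    cars["model"] = cars["model"].str.strip()
    return cars.drop_duplicates().reset_index(drop=True)


# Demo with two tiny hypothetical files written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    pd.DataFrame({"model": [" Fiesta", " Fiesta"], "price": [9000, 9000]}).to_csv(
        tmp_dir / "ford.csv", index=False
    )
    pd.DataFrame({"model": [" 3 Series"], "price": [16000]}).to_csv(
        tmp_dir / "bmw.csv", index=False
    )
    cars = load_listings(tmp_dir)

print(cars)
```

Files are read in sorted order so the combined table is reproducible, and the duplicate Fiesta row is removed only after the brand column is added, so identical listings from different brands would be kept.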

Author

Pablo Carbonero Álvarez

About

Price estimator based on certain features
