Exploratory Data Analysis & Data Cleaning of Vehicle Sales Dataset

aprilhong/vehiclesales

🚗 Exploratory Data Analysis and Data Cleaning of Vehicle Sales Dataset 🧹

The dataset, obtained from Kaggle, describes each vehicle (make, model, year, features) alongside its sale information (price, date). It also includes estimated market values, which help in tracking market trends, and records each car's condition and mileage, so their effect on selling price can be analyzed.

However, the dataset needs cleaning before it can be used for prediction. The Vehicle Sales Notebook walks through the exploratory data analysis and the cleaning process.

Issues discovered through early exploratory analysis

  • duplicated categorical values with different formats (e.g. Toyota vs. toyota)
  • data entry errors with values placed in the wrong columns
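
Both issues can be surfaced with a few lines of pandas. The sketch below uses toy data, not the Kaggle file, and the column names `make` and `sellingprice` are assumptions for illustration; it is not taken from the notebook. It collapses case and whitespace variants of a categorical column, then flags rows where a column expected to be numeric holds text, which is one common sign of a value landing in the wrong column.

```python
import pandas as pd

# Toy data; column names are assumed for illustration only.
df = pd.DataFrame({
    "make": ["Toyota", "toyota", " TOYOTA ", "Ford", None],
    "sellingprice": ["21500", "19000", "automatic", "30000", "12000"],
})


def normalize_category(series: pd.Series) -> pd.Series:
    """Trim whitespace and lowercase text so variant spellings collapse."""
    return series.str.strip().str.lower()


# Issue 1: the same category written in different formats.
df["make_clean"] = normalize_category(df["make"])
n_before = df["make"].nunique()        # distinct raw spellings
n_after = df["make_clean"].nunique()   # distinct values after normalizing

# Issue 2: values placed in the wrong column.
# Anything that fails numeric parsing (but is not missing) is suspicious.
parsed = pd.to_numeric(df["sellingprice"], errors="coerce")
misplaced = df[parsed.isna() & df["sellingprice"].notna()]

print(n_before, n_after)
print(misplaced)
```

Comparing `nunique()` before and after normalization gives a quick measure of how many duplicate spellings a column contained.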

Methodologies

  • Removing duplicate categorical values
  • Splitting and joining dataframes
  • Filling in missing data based on existing values
    • For example, vehicle model can be found if both make and trim are available.
  • Imputing missing values with median
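
The last two methodologies can be illustrated as follows. This is a minimal sketch on made-up rows, not the notebook's actual code: the column names (`make`, `trim`, `model`, `odometer`) and the choice of the most frequent model per (make, trim) pair are assumptions. It builds a lookup table from rows where the model is known, joins it back to fill the gaps, and then imputes a numeric column with its median.

```python
import pandas as pd

# Made-up rows; column names are assumed for illustration only.
df = pd.DataFrame({
    "make": ["ford", "ford", "ford", "bmw", "bmw"],
    "trim": ["se", "se", "xlt", "328i", "328i"],
    "model": ["fusion", None, "f-150", "3 series", None],
    "odometer": [30000.0, None, 80000.0, 20000.0, 40000.0],
})

# Split off the rows with a known model and build a lookup:
# the most frequent model for each (make, trim) pair.
known = df.dropna(subset=["model"])
lookup = (
    known.groupby(["make", "trim"])["model"]
    .agg(lambda s: s.mode().iloc[0])
    .rename("model_lookup")
    .reset_index()
)

# Join the lookup back and fill only the missing models.
df = df.merge(lookup, on=["make", "trim"], how="left")
df["model"] = df["model"].fillna(df["model_lookup"])
df = df.drop(columns="model_lookup")

# Impute remaining numeric gaps with the column median.
df["odometer"] = df["odometer"].fillna(df["odometer"].median())

print(df)
```

The median is less sensitive than the mean to extreme values, which matters for skewed columns such as mileage or price.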

Recommendations

  • The cleaned dataset can be used to forecast vehicle selling price.

File Descriptions

  • data : folder containing all data files
    • car_prices.csv: raw dataset from Kaggle
    • df_cleaned.csv: cleaned dataset exported from notebook
  • vehicalsales.ipynb : Jupyter notebook with EDA and data cleaning
  • plots.py : module with plotting functions used in the notebook
