The dataset, obtained from Kaggle, provides detailed information on each vehicle (make, model, year, features) alongside sales information (price, date). It also includes estimated market values, which help in tracking market trends, and it supports analysis of how a car's condition and mileage affect its selling price.
However, some cleaning is required before the dataset can be used for prediction. The Vehicle Sales Notebook walks through the exploratory data analysis and cleaning process.
Issues discovered through early exploratory analysis
- duplicated categorical values with inconsistent formatting (e.g., Toyota vs. toyota)
- data entry errors with values placed in the wrong columns
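
The first issue above can be surfaced with a short check. This is a minimal sketch, not the notebook's actual code: the helper `find_case_variants`, the column name `make`, and the sample values are all assumptions made for illustration.

```python
import pandas as pd


def find_case_variants(series: pd.Series) -> dict[str, list[str]]:
    """Group distinct values that differ only by case or surrounding whitespace."""
    groups: dict[str, list[str]] = {}
    for value in series.dropna().astype(str).unique():
        # Use a normalized key so "Toyota" and "toyota" land in the same group.
        groups.setdefault(value.strip().lower(), []).append(value)
    # Keep only the keys that have more than one spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample data; the real column names may differ.
df = pd.DataFrame({"make": ["Toyota", "toyota", "Ford", "ford ", "BMW"]})
variants = find_case_variants(df["make"])
print(variants)
```

Running the same check on every categorical column gives a quick list of values that need normalizing before any grouping or encoding.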
Methodologies
- Removing duplicate categorical values
- Splitting and joining dataframes
- Filling in missing data based on existing values
  - For example, a vehicle's model can be inferred when both make and trim are available.
- Imputing missing values with the median
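
The steps above could be sketched in pandas roughly as follows. This is an illustration under stated assumptions rather than the notebook's implementation: the column names (`make`, `trim`, `model`, `odometer`), the sample rows, and the choice of the most frequent model per (make, trim) pair as the fill value are all hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has more columns and records.
df = pd.DataFrame({
    "make": ["Toyota", "toyota", "Toyota", "Ford"],
    "trim": ["LE", "LE", "LE", "XLT"],
    "model": ["Camry", None, "Camry", "F-150"],
    "odometer": [30000.0, None, 50000.0, 80000.0],
})

# 1. Remove duplicate categorical values by normalizing case and whitespace.
df["make"] = df["make"].str.strip().str.lower()

# 2. Build a (make, trim) -> model lookup from the rows where model is known,
#    then join it back to fill the rows where model is missing.
lookup = (
    df.dropna(subset=["model"])
      .groupby(["make", "trim"])["model"]
      .agg(lambda s: s.mode().iloc[0])  # assumption: most frequent model wins
      .rename("model_lookup")
      .reset_index()
)
df = df.merge(lookup, on=["make", "trim"], how="left")
df["model"] = df["model"].fillna(df["model_lookup"])
df = df.drop(columns="model_lookup")

# 3. Impute missing numeric values with the column median.
df["odometer"] = df["odometer"].fillna(df["odometer"].median())

print(df)
```

Lowercasing is used here only because it is the simplest normalization; a display-friendly casing can be restored afterward if needed. The median is preferred over the mean in this sketch because it is less sensitive to extreme mileage values.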
Recommendations
- The cleaned dataset can be used to forecast vehicle selling price.
File Descriptions