🚗 Exploratory Data Analysis and Data Cleaning of Vehicle Sales Dataset 🧹

The dataset, obtained from Kaggle, provides in-depth details on each vehicle (make, model, year, features) alongside sales information (price, date). It also includes estimated market values to help track market trends, and it lets you analyze how a car's condition and mileage affect its selling price.

However, some cleaning is required before the dataset can be used for prediction. The Vehicle Sales Notebook showcases the exploratory data analysis and cleaning process.

Issues discovered through early exploratory analysis

duplicated categorical values in different formats (e.g., Toyota vs. toyota)
data entry errors with values placed in the wrong columns
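
As an illustration of the first issue, here is a minimal sketch of how case and whitespace variants of a category might be collapsed with pandas. The toy data, the column name `make`, and the helper `normalize_category` are assumptions made for this example and are not taken from the notebook.

```python
import pandas as pd

# Toy data standing in for the raw dataset; the column name "make" is an assumption.
df = pd.DataFrame({"make": ["Toyota", "toyota", "TOYOTA ", "Ford", "ford", None]})


def normalize_category(series: pd.Series) -> pd.Series:
    """Strip surrounding whitespace and lowercase each value.

    Missing values are left as missing.
    """
    return series.str.strip().str.lower()


# Count distinct labels before and after normalization (missing values are ignored).
before = df["make"].nunique()
df["make"] = normalize_category(df["make"])
after = df["make"].nunique()

print(f"distinct makes before: {before}, after: {after}")
```

Comparing `nunique()` before and after is a quick way to see how many labels were duplicates in disguise.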

Methodologies

Removing duplicate categorical values
Splitting and joining dataframes
Filling in missing data based on existing values
- For example, vehicle model can be found if both make and trim are available.
Imputing missing values with median
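
The last two steps can be sketched as follows. This is one possible approach under stated assumptions, not necessarily what the notebook does: the column names (`make`, `trim`, `model`, `odometer`), the toy data, and the helper `fill_model_from_make_trim` are hypothetical. The lookup uses the most frequent model for each make and trim pair, which is a choice made for this example.

```python
import pandas as pd

# Toy data; all column names and values are assumptions for illustration.
df = pd.DataFrame({
    "make": ["toyota", "toyota", "toyota", "ford", "ford"],
    "trim": ["le", "le", "le", "xlt", "xlt"],
    "model": ["camry", None, "camry", "f-150", None],
    "odometer": [30000.0, None, 50000.0, 80000.0, 100000.0],
})


def fill_model_from_make_trim(frame: pd.DataFrame) -> pd.DataFrame:
    """Fill missing 'model' values using rows that share the same make and trim."""
    # Split off the rows where the model is known.
    known = frame.dropna(subset=["model"])
    # Build a lookup table: most frequent model per (make, trim) pair.
    lookup = (
        known.groupby(["make", "trim"])["model"]
        .agg(lambda s: s.mode().iloc[0])
        .rename("model_lookup")
        .reset_index()
    )
    # Join the lookup back onto the full frame and fill only the gaps.
    merged = frame.merge(lookup, on=["make", "trim"], how="left")
    merged["model"] = merged["model"].fillna(merged["model_lookup"])
    return merged.drop(columns="model_lookup")


df = fill_model_from_make_trim(df)

# Impute remaining numeric gaps with the column median.
df["odometer"] = df["odometer"].fillna(df["odometer"].median())

print(df)
```

Rows whose make and trim pair never appears with a known model stay missing after the join, so they can be handled separately.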

Recommendations

The cleaned dataset can be used to forecast vehicle selling price.

File Descriptions

data : folder containing all data files
- car_prices.csv: raw dataset from Kaggle
- df_cleaned.csv: cleaned dataset exported from notebook
vehicalsales.ipynb : Jupyter notebook with EDA and data cleaning
plots.py : module with plotting functions used in the notebook
