This project analyzes a car dataset with Python and Pandas, covering data exploration, cleaning, and modeling. The dataset describes various car features, including fuel economy (mpg, miles per gallon), horsepower, acceleration, and country of origin.
- Dataset Overview:
Load the dataset into a Pandas DataFrame and report the number of features (columns) and examples (rows).
- Missing Values:
Identify features with missing values. Determine the number of missing values for each feature. Implement imputation methods (e.g., mean, median, or mode) to fill missing values.
- Fuel Economy Analysis:
Use a box plot to compare fuel economy (mpg) for different countries.
- Distribution Analysis:
Determine which feature ('acceleration', 'horsepower', or 'mpg') has a distribution most similar to a Gaussian. Visualize histograms for each feature.
- Correlation Analysis:
Create a scatter plot of 'horsepower' vs. 'mpg' to explore their correlation. Fit a simple linear regression of 'mpg' on 'horsepower' using the closed-form (normal equation) solution, and plot the learned regression line over the scatter plot.
- Advanced Regression:
Fit a quadratic regression model, with 'mpg' as a quadratic function of 'horsepower'. Then repeat the simple linear regression using the gradient descent algorithm instead of the closed-form solution.
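For the dataset overview step, a minimal sketch could look like the following. It assumes the data is a CSV file; the inline sample, the file name `cars.csv`, the `origin` column, and the `"?"` placeholder for missing entries are hypothetical stand-ins rather than details of the project's actual data.

```python
import io

import pandas as pd


def summarize(df: pd.DataFrame) -> tuple:
    """Return (number of examples, number of features) for a DataFrame."""
    n_examples, n_features = df.shape  # rows are examples, columns are features
    return n_examples, n_features


# Hypothetical inline sample; for the real project use something like pd.read_csv("cars.csv").
sample_csv = io.StringIO(
    "mpg,horsepower,acceleration,origin\n"
    "18.0,130,12.0,usa\n"
    "15.0,165,11.5,usa\n"
    "25.0,?,19.0,europe\n"
    "31.0,65,19.0,japan\n"
)
# Assumption: missing entries are written as "?"; na_values converts them to NaN.
df = pd.read_csv(sample_csv, na_values="?")

n_examples, n_features = summarize(df)
print(f"{n_examples} examples, {n_features} features")
print(df.dtypes)  # column types, useful for spotting numeric columns read as text
```

If a numeric column shows up as `object` in `df.dtypes`, that often means an unrecognized missing-value placeholder is still in the file.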
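For the missing-values step, one possible sketch counts missing values per feature and then imputes numeric columns with the median and categorical columns with the mode. The small DataFrame and the `origin` column name are illustrative assumptions, and median/mode is just one of the imputation choices the task lists.

```python
import numpy as np
import pandas as pd

# Illustrative data with gaps (not the real dataset).
df = pd.DataFrame({
    "mpg": [18.0, 15.0, np.nan, 31.0, 26.0],
    "horsepower": [130.0, np.nan, 46.0, 65.0, np.nan],
    "origin": ["usa", "usa", None, "japan", "europe"],
})

# Number of missing values per feature, keeping only features that have any.
missing = df.isna().sum()
print(missing[missing > 0])

# Impute: median for numeric features, most frequent value (mode) for the rest.
imputed = df.copy()
for col in imputed.columns:
    if pd.api.types.is_numeric_dtype(imputed[col]):
        imputed[col] = imputed[col].fillna(imputed[col].median())
    else:
        imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(imputed)
```

The median is less sensitive to outliers than the mean, which is why it is used here for the numeric columns; swapping in `.mean()` is a one-word change.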
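For the fuel economy comparison, a box plot per country can be drawn directly from Pandas. This sketch assumes the country is stored in a column named `origin`; the values below are made up for illustration. The group medians are printed as a numeric counterpart to the plot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; the real column holding the country may have a different name.
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0, 26.0, 24.0, 31.0, 32.0, 27.0],
    "origin": ["usa", "usa", "usa", "europe", "europe", "japan", "japan", "japan"],
})

fig, ax = plt.subplots(figsize=(6, 4))
df.boxplot(column="mpg", by="origin", ax=ax)  # one box per country
ax.set_title("Fuel economy by country of origin")
ax.set_xlabel("origin")
ax.set_ylabel("mpg")
fig.suptitle("")  # drop the automatic "Boxplot grouped by origin" title
plt.show()

# Median mpg per country, matching the center line of each box.
medians = df.groupby("origin")["mpg"].median()
print(medians)
```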
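For the distribution analysis, one simple heuristic (an assumption, not necessarily the project's method) is to score each feature by how far its skewness and excess kurtosis are from zero, since both are zero for a Gaussian, and to back that up with histograms. The synthetic columns below only demonstrate the mechanics; with the real data, pass `df[["acceleration", "horsepower", "mpg"]]` instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def gaussian_score(s: pd.Series) -> float:
    """|skewness| + |excess kurtosis|; 0 for a perfectly Gaussian shape."""
    s = s.dropna()
    return abs(s.skew()) + abs(s.kurt())  # pandas kurt() reports excess kurtosis


# Synthetic stand-ins with known shapes (not the real features).
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "normal_like": rng.normal(15.0, 2.5, 1000),
    "right_skewed": rng.exponential(50.0, 1000),
    "flat": rng.uniform(10.0, 45.0, 1000),
})

scores = features.apply(gaussian_score)  # one score per column
most_gaussian = scores.idxmin()
print(scores.sort_values())
print("Most Gaussian-like:", most_gaussian)

# Histograms for a visual check of each feature's shape.
features.hist(bins=30, figsize=(10, 3), layout=(1, 3))
plt.tight_layout()
plt.show()
```

A Q-Q plot or a formal normality test would be a stricter alternative; the moment-based score is just a quick way to rank the candidates.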
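For the correlation analysis, the closed-form fit solves the normal equation $(X^\top X)\,\theta = X^\top y$, where $X$ has a column of ones (intercept) and a column of horsepower values. The sketch below uses synthetic horsepower/mpg-like data, not the real dataset, and draws the fitted line over the scatter plot.

```python
import matplotlib.pyplot as plt
import numpy as np


def fit_linear_closed_form(x, y):
    """Solve (X^T X) theta = X^T y; returns theta = [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with a bias column
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic data with a negative horsepower/mpg trend (illustration only).
rng = np.random.default_rng(1)
horsepower = rng.uniform(50, 220, 200)
mpg = 40.0 - 0.15 * horsepower + rng.normal(0, 2.0, 200)

intercept, slope = fit_linear_closed_form(horsepower, mpg)
print(f"mpg ~ {intercept:.2f} + {slope:.4f} * horsepower")
print("Pearson r:", np.corrcoef(horsepower, mpg)[0, 1])

grid = np.linspace(horsepower.min(), horsepower.max(), 100)
plt.scatter(horsepower, mpg, s=10, alpha=0.6, label="data")
plt.plot(grid, intercept + slope * grid, color="red", label="closed-form fit")
plt.xlabel("horsepower")
plt.ylabel("mpg")
plt.legend()
plt.show()
```

Using `np.linalg.solve` avoids computing an explicit matrix inverse, which is both cheaper and numerically safer than `np.linalg.inv(X.T @ X) @ X.T @ y`.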
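For the quadratic regression, one way to model $\text{mpg} \approx \theta_0 + \theta_1 x + \theta_2 x^2$ (with $x$ the horsepower) is still a linear least-squares problem, just with an extra $x^2$ column. This sketch assumes synthetic data and uses `np.linalg.lstsq` on the design matrix rather than the normal equation, because the $x^2$ column makes $X^\top X$ poorly conditioned.

```python
import numpy as np


def fit_quadratic(x, y):
    """Least-squares fit of y = t0 + t1*x + t2*x**2; returns [t0, t1, t2]."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta


# Synthetic curved relationship (illustration only, not the real data).
rng = np.random.default_rng(2)
horsepower = rng.uniform(50, 220, 200)
mpg = 55.0 - 0.40 * horsepower + 0.001 * horsepower**2 + rng.normal(0, 1.5, 200)

theta = fit_quadratic(horsepower, mpg)
pred = np.column_stack([np.ones_like(horsepower), horsepower, horsepower**2]) @ theta
rmse = np.sqrt(np.mean((mpg - pred) ** 2))  # root-mean-square error of the fit
print("theta:", theta, "RMSE:", round(rmse, 3))
```

Comparing this RMSE with that of the straight-line fit on the same data is a simple way to judge whether the quadratic term helps.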
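For the gradient descent version of simple linear regression, a minimal batch sketch minimizes $J(\theta) = \frac{1}{2n}\sum_i (\theta_0 + \theta_1 z_i - y_i)^2$ with update $\theta \leftarrow \theta - \alpha \frac{1}{n} X^\top (X\theta - y)$. Standardizing horsepower into $z$ is a design choice assumed here so that a learning rate of 0.1 converges quickly; the coefficients are mapped back to the original horsepower scale at the end. The data is synthetic, and `np.polyfit` serves as the closed-form reference.

```python
import numpy as np


def fit_linear_gd(x, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the half mean squared error.

    Returns (intercept, slope) on the original x scale plus the cost history.
    """
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma  # standardized feature
    X = np.column_stack([np.ones_like(z), z])
    theta = np.zeros(2)
    n = len(y)
    history = []
    for _ in range(n_iters):
        residual = X @ theta - y
        history.append((residual @ residual) / (2 * n))  # cost before this update
        theta -= lr * (X.T @ residual) / n  # gradient step
    slope = theta[1] / sigma  # undo the standardization
    intercept = theta[0] - slope * mu
    return intercept, slope, history


rng = np.random.default_rng(1)
horsepower = rng.uniform(50, 220, 200)
mpg = 40.0 - 0.15 * horsepower + rng.normal(0, 2.0, 200)

intercept, slope, history = fit_linear_gd(horsepower, mpg)
ref_slope, ref_intercept = np.polyfit(horsepower, mpg, 1)  # closed-form reference
print(f"GD:          {intercept:.4f} + {slope:.6f} * hp")
print(f"closed form: {ref_intercept:.4f} + {ref_slope:.6f} * hp")
```

Plotting `history` against the iteration number is a quick check that the learning rate is small enough for the cost to decrease steadily.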
Tools used:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn