Car Prices Prediction

1.1 Introduction
In the automotive industry, car prices depend on many factors, such as horsepower and other specifications. For new companies entering car manufacturing, understanding these pricing factors is crucial. Machine learning can be used both to predict car prices and to identify which features affect those prices most. By analyzing the relevant data, we can give new manufacturers insights that help them make informed decisions about car production.

1.2 Metrics
Mean Squared Error (MSE)
Mean Absolute Error (MAE)
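
To make the two metrics concrete, here is a minimal sketch that computes both on a tiny example. The price values are hypothetical and chosen only so the arithmetic is easy to follow; they do not come from the dataset. MSE averages the squared errors, so it penalizes large misses more heavily, while MAE averages the absolute errors and stays in the same unit as the price.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true and predicted prices (illustrative values only)
y_true = np.array([30000.0, 40000.0, 50000.0])
y_pred = np.array([32000.0, 39000.0, 47000.0])

# MSE: mean of squared differences between prediction and truth
mse = mean_squared_error(y_true, y_pred)

# MAE: mean of absolute differences between prediction and truth
mae = mean_absolute_error(y_true, y_pred)

# The same quantities written out with NumPy, to show the definitions
mse_manual = np.mean((y_true - y_pred) ** 2)
mae_manual = np.mean(np.abs(y_true - y_pred))

print("MSE:", mse)
print("MAE:", mae)
```

Lower values are better for both metrics. Because MSE is expressed in squared price units, MAE is usually the easier of the two to interpret directly.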

1.3 Data Source
The dataset used in this analysis can be downloaded from the Kaggle repository linked below. It contains the car specifications and prices that we will use to predict prices for various car models.
https://www.kaggle.com/CooperUnion/cardataset

Within the car market, there is a stark contrast between luxury brands like Bugatti and Lamborghini, which are priced well beyond the average buyer's budget, and more affordable options such as Ford and Toyota. The analysis below examines pricing trends across car brands, using a dataset that records the make, model, specifications, and price (MSRP) of each car.

Table of Contents
1. Car Prices Prediction
1.1 Introduction

1.2 Metrics

1.3 Data Source

1.4 Importing libraries

1.5 Reading the data

2. Exploratory data analysis
2.1 Countplot
   2.1.1 Countplot of Different Car Companies

   2.1.2 Countplot of the total cars per different years

   2.1.3 Counting the cars based on transmission type

   2.1.4 Countplot of Engine Fuel Type

   2.1.5 Countplot of Vehicle Size

2.2 Missingno
2.3 Groupby
   2.3.1 Grouping with 'Make' feature

   2.3.2 Grouping the data on the basis of Year

   2.3.3 Grouping on the basis of Transmission Type

   2.3.4 Grouping on the basis of Make and 'MSRP' values

   2.3.5 Grouping on the basis of Make and 'Engine HP' values

   2.3.6 Grouping on the basis of Driven Wheels

   2.3.7 Grouping on the basis of Make with 'Popularity' values

2.4 Scatterplot between 'highway MPG' and 'city mpg'

2.5 Boxplot
   2.5.1 Boxplot of 'highway MPG'

   2.5.2 Boxplot of 'city mpg'

   2.5.3 Boxplot of 2 features 'city mpg' and 'highway MPG'

   2.5.4 Boxplot of 'Engine HP'

2.6 lmplot
   2.6.1 lmplot between 'Engine HP' and 'Popularity'

   2.6.2 lmplot between 'Engine Cylinders' and 'Popularity'

   2.6.3 lmplot between 'Number of Doors' and 'Popularity'

   2.6.4 lmplot between 'Engine Cylinders' and 'Engine HP'

   2.6.5 lmplot between 'city mpg' and 'highway MPG'

   2.6.6 lmplot between 'city mpg' and 'Engine Cylinders'

2.7 Heatmap

2.8 Grouping on the basis of 'Year'

2.9 Plotting the barplot of 'Years of Manufacture'

3. Manipulating the Data
3.1 Shuffling the data

3.2 Dividing the data into training and testing set

3.3 Encoding the data

3.4 One Hot Encoding

3.5 Standardization and Normalization of data

4. Machine Learning Analysis
4.1 Linear Regression
   4.1.1 Regplot for Linear Regression Output

4.2 Support Vector Regressor
   4.2.1 Regplot for Support Vector Regressor

4.3 K - Neighbors Regressor
   4.3.1 Regplot for K - Neighbors Regressor

4.4 PLS Regression
   4.4.1 Regplot for PLS Regression

4.5 Decision Tree Regressor
   4.5.1 Regplot for Decision Tree Regressor

4.6 Gradient Boosting Regressor
   4.6.1 Regplot for Gradient Boosting Regressor

4.7 MLP Regressor
   4.7.1 Regplot of MLP Regressor

4.8 Dataframe of Machine Learning Models

4.9 (a). Barplot of machine learning models with mean absolute error
4.9 (b). Barplot of machine learning models with mean squared error

5. Conclusion

1.4 Importing libraries
We will import the libraries needed for data analysis, visualization, and machine learning. The following cell lists them.

In [1]:
# Data handling, numerics, and visualization
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Preprocessing, model selection, and metrics
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import GridSearchCV, train_test_split, KFold, cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score, mean_absolute_error
# Regression models
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.tree import DecisionTreeRegressor
import missingno as msno
from sklearn.utils import shuffle
# OneHotEncoder is taken from category_encoders only; the original cell also
# imported it from sklearn.preprocessing, but this later import shadowed that name.
from category_encoders import TargetEncoder, OneHotEncoder
import warnings
warnings.filterwarnings("ignore")  # suppress library warnings in the notebook output
sns.set(rc={'figure.figsize': (20, 20)})  # default figure size for all plots
%matplotlib inline

1.5 Reading the data
We will use Pandas to read the data, storing it in a variable named 'data' for subsequent calculations.

In [2]:
data = pd.read_csv('data/data.csv')

In [3]:
data.shape

(11914, 16)

In [4]:
data.head()

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500
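
As a quick sanity check after loading, it can help to confirm the shape, column types, and missing values. The sketch below does this on an in-memory sample built from the first two rows shown above, so it runs without the CSV file. The names `sample_csv`, `sample`, and `missing_per_column` are illustrative and not part of the original notebook.

```python
import io

import pandas as pd

# Two rows copied from the head() output above, as an in-memory CSV
sample_csv = io.StringIO(
    'Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,'
    'Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,'
    'highway MPG,city mpg,Popularity,MSRP\n'
    'BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,'
    '2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135\n'
    'BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,'
    '2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650\n'
)

# read_csv accepts a file-like object as well as a path
sample = pd.read_csv(sample_csv)

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # numeric columns vs. object (string) columns

# Count missing values per column; on the full dataset this shows
# which features need imputation or row removal
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

The same three calls (`shape`, `dtypes`, `isna().sum()`) can be applied directly to the `data` variable loaded above.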


2.1 Countplot
Countplots are drawn with the seaborn library in Python. A countplot shows how many rows fall into each category of a chosen feature. Below is a list of countplots for several features of interest. Together they show how the data is distributed across each feature and how many classes each feature contains.
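
Before looking at the plots for the real data, here is a minimal sketch of a countplot on a toy DataFrame. The five rows and the name `toy` are made up for illustration; only the column name 'Transmission Type' is taken from the dataset. The bar heights of the plot correspond to what `value_counts()` returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data (hypothetical rows) with one categorical feature
toy = pd.DataFrame(
    {"Transmission Type": ["MANUAL", "AUTOMATIC", "AUTOMATIC", "MANUAL", "AUTOMATIC"]}
)

# value_counts gives the numbers that the countplot draws as bars
counts = toy["Transmission Type"].value_counts()
print(counts)

# Draw one bar per category, ordered from most to least frequent
fig, ax = plt.subplots(figsize=(6, 4))
sns.countplot(data=toy, x="Transmission Type", order=counts.index, ax=ax)
ax.set_title("Number of cars per transmission type (toy data)")
ax.set_ylabel("count")
```

Passing `order=counts.index` sorts the bars by frequency, which tends to make plots with many categories, such as car makes, easier to read.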