In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from scipy.stats import skew
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression, Ridge, RidgeClassifier
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# What drives the price of a car?

![](images/kurt.jpeg)

**OVERVIEW**

In this application, you will explore a dataset from Kaggle. The original dataset contained information on 3 million used cars. The provided dataset contains information on 426K cars to ensure speed of processing. Your goal is to understand what factors make a car more or less expensive. As a result of your analysis, you should provide clear recommendations to your client -- a used car dealership -- as to what consumers value in a used car.

### CRISP-DM Framework

<center>
    <img src = images/crisp.png width = 50%/>
</center>


To frame the task, throughout our practical applications we will refer back to a standard process in industry for data projects called CRISP-DM.  This process provides a framework for working through a data problem.  Your first step in this application will be to read through a brief overview of CRISP-DM [here](https://mo-pcco.s3.us-east-1.amazonaws.com/BH-PCMLAI/module_11/readings_starter.zip).  After reading the overview, answer the questions below.

### Business Understanding

From a business perspective, we are tasked with identifying key drivers for used car prices.  In the CRISP-DM overview, we are asked to convert this business framing to a data problem definition.  Using a few sentences, reframe the task as a data task with the appropriate technical vocabulary. 

Which features in the dataset are associated with a higher car price?

Which features in the dataset are associated with a lower car price?

Which features in the dataset have little or no effect on the car price?

### Data Understanding

After considering the business understanding, we want to get familiar with our data.  Write down some steps that you would take to get to know the dataset and identify any quality issues within.  Take time to get to know the dataset and explore what information it contains and how this could be used to inform your business understanding.

Steps: check the shape of the data; count the missing values per feature; inspect the unique values of `year` and `region`; subset the data by `year`.
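
The steps listed above could be gathered into one small helper. This is a minimal sketch on a made-up toy frame; `profile_missing` and the sample values are illustrative and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def profile_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, worst first."""
    counts = df.isnull().sum()
    pct = (counts / len(df) * 100).round(1)
    out = pd.DataFrame({"missing": counts, "pct_missing": pct})
    return out.sort_values("missing", ascending=False)

# Toy data standing in for vehicles.csv (values are invented)
toy = pd.DataFrame({
    "price": [6000, 11900, 21000, 1500],
    "year": [2014.0, np.nan, 2019.0, 2003.0],
    "size": [np.nan, np.nan, "compact", np.nan],
})

print(toy.shape)               # rows and columns
report = profile_missing(toy)
print(report)                  # missing values per feature
print(toy["year"].nunique())   # number of distinct years
```

Reporting the percentage next to the raw count makes it easier to spot columns that are too sparse to be useful.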

In [12]:
#Vehicle data
vd = pd.read_csv('/Users/satyaki/Documents/PCMLAI/Course_material/Module_11/practical_application_II_starter/data/vehicles.csv')

In [13]:
vd.head()

Unnamed: 0,id,region,price,year,manufacturer,model,condition,cylinders,fuel,odometer,title_status,transmission,VIN,drive,size,type,paint_color,state
0,7222695916,prescott,6000,,,,,,,,,,,,,,,az
1,7218891961,fayetteville,11900,,,,,,,,,,,,,,,ar
2,7221797935,florida keys,21000,,,,,,,,,,,,,,,fl
3,7222270760,worcester / central MA,1500,,,,,,,,,,,,,,,ma
4,7210384030,greensboro,4900,,,,,,,,,,,,,,,nc


In [14]:
vd.shape

(426880, 18)

In [15]:
# Missing values by feature
vd.isnull().sum()

id                   0
region               0
price                0
year              1205
manufacturer     17646
model             5277
condition       174104
cylinders       177678
fuel              3013
odometer          4400
title_status      8242
transmission      2556
VIN             161042
drive           130567
size            306361
type             92858
paint_color     130203
state                0
dtype: int64

### Data Preparation

After our initial exploration and fine tuning of the business understanding, it is time to construct our final dataset prior to modeling.  Here, we want to make sure to handle any integrity issues and cleaning, the engineering of new features, any transformations that we believe should happen (scaling, logarithms, normalization, etc.), and general preparation for modeling with `sklearn`. 
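
One transformation that may be worth trying is a log transform of the target, since listing prices tend to be right-skewed and the data preview contains prices of 0. The sketch below uses a handful of prices copied from the preview shown later in this notebook; the bounds `low` and `high` are assumptions for illustration and were not applied in this analysis.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

# A few prices taken from the data preview, including a 0 listing
toy = pd.DataFrame({"price": [0, 4000, 9000, 8950, 98900, 9400, 7300, 72900]})

# Assumed bounds for a plausible listing price; tune these to the business context
low, high = 500, 150_000
plausible = toy[toy["price"].between(low, high)].copy()

# log1p compresses the long right tail of the price distribution
plausible["log_price"] = np.log1p(plausible["price"])

raw_skew = skew(plausible["price"])
log_skew = skew(plausible["log_price"])
print(raw_skew, log_skew)
```

If a model is trained on `log_price`, its predictions need to be mapped back with `np.expm1` before they are reported as prices.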

In [20]:
vd["year"].value_counts()

year
2017.0    36420
2018.0    36369
2015.0    31538
2013.0    30794
2016.0    30434
          ...  
1943.0        1
1915.0        1
1902.0        1
1905.0        1
1909.0        1
Name: count, Length: 114, dtype: int64

In [22]:
#Subsetting data
vd = vd[vd['year']>=2000]
vd['year'].value_counts()

year
2017.0    36420
2018.0    36369
2015.0    31538
2013.0    30794
2016.0    30434
2014.0    30283
2019.0    25375
2012.0    23898
2011.0    20341
2020.0    19298
2008.0    17150
2010.0    15829
2007.0    14873
2006.0    12763
2009.0    12185
2005.0    10622
2004.0     8971
2003.0     7151
2002.0     5587
2001.0     4443
2000.0     3572
2021.0     2396
2022.0      133
Name: count, dtype: int64

In [25]:
vd.shape

(400425, 18)

In [27]:
vd.isnull().sum()

id                   0
region               0
price                0
year                 0
manufacturer     13053
model             3700
condition       165909
cylinders       170074
fuel              2688
odometer          4252
title_status      7817
transmission      2444
VIN             140492
drive           121914
size            290775
type             83167
paint_color     121424
state                0
dtype: int64

In [29]:
#Drop NA
vdc = vd.dropna()

In [31]:
vdc.shape

(33563, 18)
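
Calling `dropna()` across every column keeps 33,563 of 400,425 rows, largely because columns such as `size` are mostly empty. One alternative, sketched here on invented toy data, is to drop very sparse columns first and only then drop incomplete rows. The 50% threshold and the helper name `drop_sparse_then_rows` are assumptions for illustration, not choices made in this notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_then_rows(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns whose missing fraction exceeds max_missing, then drop incomplete rows."""
    missing_frac = df.isnull().mean()
    keep = missing_frac[missing_frac <= max_missing].index
    return df[keep].dropna()

toy = pd.DataFrame({
    "price": [6000, 11900, 21000, 1500],
    "odometer": [90000.0, 45000.0, np.nan, 150000.0],
    "size": [np.nan, np.nan, "compact", np.nan],  # 75% missing
})

cleaned = drop_sparse_then_rows(toy)
print(toy.dropna().shape)  # plain dropna: every row has at least one gap
print(cleaned.shape)       # sparse column removed first, so more rows survive
```

The trade-off is losing a feature entirely in exchange for keeping many more rows, so it is worth checking whether the dropped column carries signal.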

In [33]:
vdc.info()

<class 'pandas.core.frame.DataFrame'>
Index: 33563 entries, 126 to 426836
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            33563 non-null  int64  
 1   region        33563 non-null  object 
 2   price         33563 non-null  int64  
 3   year          33563 non-null  float64
 4   manufacturer  33563 non-null  object 
 5   model         33563 non-null  object 
 6   condition     33563 non-null  object 
 7   cylinders     33563 non-null  object 
 8   fuel          33563 non-null  object 
 9   odometer      33563 non-null  float64
 10  title_status  33563 non-null  object 
 11  transmission  33563 non-null  object 
 12  VIN           33563 non-null  object 
 13  drive         33563 non-null  object 
 14  size          33563 non-null  object 
 15  type          33563 non-null  object 
 16  paint_color   33563 non-null  object 
 17  state         33563 non-null  object 
dtypes: float64(2), int64(2), object(14)

In [35]:
# Drop features that are not relevant to prediction: id, VIN, model, title_status, and state (assuming a national enterprise)
vdc = vdc.drop(columns=['VIN', 'id', 'title_status', 'state', 'model'])
vdc.info()

<class 'pandas.core.frame.DataFrame'>
Index: 33563 entries, 126 to 426836
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33563 non-null  object 
 1   price         33563 non-null  int64  
 2   year          33563 non-null  float64
 3   manufacturer  33563 non-null  object 
 4   condition     33563 non-null  object 
 5   cylinders     33563 non-null  object 
 6   fuel          33563 non-null  object 
 7   odometer      33563 non-null  float64
 8   transmission  33563 non-null  object 
 9   drive         33563 non-null  object 
 10  size          33563 non-null  object 
 11  type          33563 non-null  object 
 12  paint_color   33563 non-null  object 
dtypes: float64(2), int64(1), object(10)
memory usage: 3.6+ MB


In [37]:
#reset index
vdc=vdc.reset_index(drop=True)

In [39]:
vdc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33563 entries, 0 to 33562
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33563 non-null  object 
 1   price         33563 non-null  int64  
 2   year          33563 non-null  float64
 3   manufacturer  33563 non-null  object 
 4   condition     33563 non-null  object 
 5   cylinders     33563 non-null  object 
 6   fuel          33563 non-null  object 
 7   odometer      33563 non-null  float64
 8   transmission  33563 non-null  object 
 9   drive         33563 non-null  object 
 10  size          33563 non-null  object 
 11  type          33563 non-null  object 
 12  paint_color   33563 non-null  object 
dtypes: float64(2), int64(1), object(10)
memory usage: 3.3+ MB


In [41]:
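# Drop rows with type == 'bus' or cylinders == 'other'
# .copy() avoids pandas' SettingWithCopyWarning when columns are reassigned in later cells
vdc = vdc[(vdc['type'] != 'bus') & (vdc['cylinders'] != 'other')].copy()
vdc.info()  # rows remaining: 33525 after the bus filter, 33437 after both filters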

<class 'pandas.core.frame.DataFrame'>
Index: 33437 entries, 0 to 33562
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33437 non-null  object 
 1   price         33437 non-null  int64  
 2   year          33437 non-null  float64
 3   manufacturer  33437 non-null  object 
 4   condition     33437 non-null  object 
 5   cylinders     33437 non-null  object 
 6   fuel          33437 non-null  object 
 7   odometer      33437 non-null  float64
 8   transmission  33437 non-null  object 
 9   drive         33437 non-null  object 
 10  size          33437 non-null  object 
 11  type          33437 non-null  object 
 12  paint_color   33437 non-null  object 
dtypes: float64(2), int64(1), object(10)
memory usage: 3.6+ MB


In [43]:
#Change datatypes
vdc['year']=vdc['year'].astype('int32')
vdc['odometer']=vdc['odometer'].astype('int32')
vdc.info()

<class 'pandas.core.frame.DataFrame'>
Index: 33437 entries, 0 to 33562
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   region        33437 non-null  object
 1   price         33437 non-null  int64 
 2   year          33437 non-null  int32 
 3   manufacturer  33437 non-null  object
 4   condition     33437 non-null  object
 5   cylinders     33437 non-null  object
 6   fuel          33437 non-null  object
 7   odometer      33437 non-null  int32 
 8   transmission  33437 non-null  object
 9   drive         33437 non-null  object
 10  size          33437 non-null  object
 11  type          33437 non-null  object
 12  paint_color   33437 non-null  object
dtypes: int32(2), int64(1), object(10)
memory usage: 3.3+ MB


In [45]:
vdc=vdc.reset_index(drop=True)

In [47]:
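# Ordinal parameter mapping
# Note: labels missing from the mapping become NaN (5 rows in the output below); those rows are dropped before modeling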
cylinders_mapping = {'3 cylinders': 3, '4 cylinders': 4, '5 cylinders': 5, '6 cylinders': 6, '8 cylinders': 8, '10 cylinders': 10}
vdc['cylinders'] = vdc['cylinders'].map(cylinders_mapping)
vdc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33437 entries, 0 to 33436
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33437 non-null  object 
 1   price         33437 non-null  int64  
 2   year          33437 non-null  int32  
 3   manufacturer  33437 non-null  object 
 4   condition     33437 non-null  object 
 5   cylinders     33432 non-null  float64
 6   fuel          33437 non-null  object 
 7   odometer      33437 non-null  int32  
 8   transmission  33437 non-null  object 
 9   drive         33437 non-null  object 
 10  size          33437 non-null  object 
 11  type          33437 non-null  object 
 12  paint_color   33437 non-null  object 
dtypes: float64(1), int32(2), int64(1), object(9)
memory usage: 3.1+ MB
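
The output above shows 33,432 non-null `cylinders` values out of 33,437, which means five rows carried a label that is absent from `cylinders_mapping`; `Series.map` turns such labels into `NaN`. A quick check like the following sketch would surface them before mapping. The toy values and the helper name `unmapped_labels` are illustrative only.

```python
import pandas as pd

def unmapped_labels(series: pd.Series, mapping: dict) -> list:
    """Return the sorted labels present in `series` but absent from `mapping`."""
    return sorted(set(series.dropna().unique()) - set(mapping))

# Invented example: one label is deliberately left out of the mapping
mapping = {"4 cylinders": 4, "6 cylinders": 6, "8 cylinders": 8}
toy = pd.Series(["4 cylinders", "6 cylinders", "12 cylinders", "8 cylinders"])

missing = unmapped_labels(toy, mapping)
print(missing)

# Mapping anyway silently produces NaN for the uncovered label
mapped = toy.map(mapping)
print(mapped.isnull().sum())
```

Running the check first makes the choice explicit: extend the mapping, or drop those rows on purpose.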


In [49]:
size_mapping = {'full-size': 4, 'mid-size': 3, 'compact': 2, 'sub-compact': 1}
vdc['size']=vdc['size'].map(size_mapping)
vdc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33437 entries, 0 to 33436
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33437 non-null  object 
 1   price         33437 non-null  int64  
 2   year          33437 non-null  int32  
 3   manufacturer  33437 non-null  object 
 4   condition     33437 non-null  object 
 5   cylinders     33432 non-null  float64
 6   fuel          33437 non-null  object 
 7   odometer      33437 non-null  int32  
 8   transmission  33437 non-null  object 
 9   drive         33437 non-null  object 
 10  size          33437 non-null  int64  
 11  type          33437 non-null  object 
 12  paint_color   33437 non-null  object 
dtypes: float64(1), int32(2), int64(2), object(8)
memory usage: 3.1+ MB


In [51]:
condition_mapping = {'new': 6, 'like new': 5, 'excellent': 4, 'good': 3, 'fair': 2, 'salvage': 1}
vdc['condition']=vdc['condition'].map(condition_mapping)
vdc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33437 entries, 0 to 33436
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        33437 non-null  object 
 1   price         33437 non-null  int64  
 2   year          33437 non-null  int32  
 3   manufacturer  33437 non-null  object 
 4   condition     33437 non-null  int64  
 5   cylinders     33432 non-null  float64
 6   fuel          33437 non-null  object 
 7   odometer      33437 non-null  int32  
 8   transmission  33437 non-null  object 
 9   drive         33437 non-null  object 
 10  size          33437 non-null  int64  
 11  type          33437 non-null  object 
 12  paint_color   33437 non-null  object 
dtypes: float64(1), int32(2), int64(3), object(7)
memory usage: 3.1+ MB


In [53]:
vdc.shape

(33437, 13)

### Modeling

With your (almost?) final dataset in hand, it is now time to build some models.  Here, you should build a number of different regression models with the price as the target.  In building your models, you should explore different parameters and be sure to cross-validate your findings.

In [56]:
# One Hot encoding for nominal features
vdcm = pd.get_dummies(vdc, columns=['region', 'manufacturer', 'fuel', 'transmission', 'drive', 'type', 'paint_color'])

In [58]:
vdcm.shape

(33437, 471)
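
`pd.get_dummies` encodes the whole frame before the train/test split. An alternative that keeps the encoding inside the model is a `ColumnTransformer` with `OneHotEncoder(handle_unknown="ignore")`, so categories unseen during training do not break prediction. The sketch below uses invented toy data and an arbitrary `alpha`; it is not the pipeline used in this notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented training data with two numeric features and one nominal feature
train = pd.DataFrame({
    "year": [2010, 2015, 2018, 2020],
    "odometer": [150000, 90000, 40000, 10000],
    "fuel": ["gas", "diesel", "gas", "diesel"],
    "price": [5000, 15000, 20000, 35000],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["year", "odometer"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel"]),
])
model = Pipeline([("preprocess", preprocess), ("ridge", Ridge(alpha=1.0))])
model.fit(train.drop(columns="price"), train["price"])

# 'electric' was never seen in training; it is encoded as all zeros
new_car = pd.DataFrame({"year": [2019], "odometer": [30000], "fuel": ["electric"]})
pred = model.predict(new_car)
print(pred.shape)
```

Because the encoder is fit inside the pipeline, cross-validation refits it on each training fold, which keeps the folds independent.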

In [60]:
vdcm.head(50)

Unnamed: 0,price,year,condition,cylinders,odometer,size,region_SF bay area,region_abilene,region_akron / canton,region_albany,...,paint_color_brown,paint_color_custom,paint_color_green,paint_color_grey,paint_color_orange,paint_color_purple,paint_color_red,paint_color_silver,paint_color_white,paint_color_yellow
0,0,2018,5,6.0,68472,4,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
1,0,2019,5,6.0,69125,4,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
2,0,2018,5,6.0,66555,4,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
3,4000,2002,4,4.0,155000,2,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,9000,2008,4,4.0,56700,2,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
5,8950,2011,4,6.0,164000,4,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
6,98900,2001,3,8.0,20187,3,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
7,9400,2008,3,6.0,129473,4,False,False,False,False,...,False,False,False,False,False,False,False,True,False,False
8,7300,2007,3,6.0,181000,3,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
9,72900,2021,3,8.0,19129,4,False,False,False,False,...,False,False,False,True,False,False,False,False,False,False


In [62]:
vdcm.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33437 entries, 0 to 33436
Columns: 471 entries, price to paint_color_yellow
dtypes: bool(465), float64(1), int32(2), int64(3)
memory usage: 16.1 MB


In [64]:
# Replace infinities with NaN, then drop incomplete rows (this removes the rows with unmapped cylinders)
vdcm = vdcm.replace([np.inf, -np.inf], np.nan).dropna()

In [66]:
vdcm=vdcm.astype(int)
vdcm.info()

<class 'pandas.core.frame.DataFrame'>
Index: 33432 entries, 0 to 33436
Columns: 471 entries, price to paint_color_yellow
dtypes: int64(471)
memory usage: 120.4 MB


In [68]:
vdcm.shape

(33432, 471)

In [70]:
target = 'price'

# Compute the correlation matrix
corr_matrix = vdcm.corr()

# Correlation of every other feature with the target
corr_with_target = corr_matrix[target].drop(target)

# The 15 most positively and the 15 most negatively correlated features
top_15_corr = corr_with_target.sort_values(ascending=False)[:15]
bottom_15_corr = corr_with_target.sort_values(ascending=True)[:15]

# Print both lists
print(top_15_corr)
print(bottom_15_corr)

year                         0.403710
fuel_diesel                  0.398731
type_truck                   0.348961
cylinders                    0.319699
size                         0.296428
drive_4wd                    0.263990
manufacturer_ford            0.188957
paint_color_white            0.150505
type_pickup                  0.148358
manufacturer_ram             0.146870
condition                    0.125163
manufacturer_gmc             0.097373
region_anchorage / mat-su    0.095342
manufacturer_ferrari         0.091798
region_southwest VA          0.079619
Name: price, dtype: float64
fuel_gas               -0.358226
drive_fwd              -0.337100
odometer               -0.307926
type_sedan             -0.279015
manufacturer_honda     -0.104219
type_wagon             -0.102218
paint_color_silver     -0.097942
transmission_other     -0.095072
type_hatchback         -0.091733
manufacturer_nissan    -0.080935
manufacturer_hyundai   -0.075788
type_SUV               -0.075172
region

In [71]:
# Creates a correlation matrix plot heatmap

# Compute the correlation matrix
#corr = vdcm.corr()

# Set up the matplotlib figure
#plt.subplots(figsize=(20, 15))

#sns.heatmap(corr, cmap='coolwarm')
#plt.show()

In [72]:
X = vdcm.drop('price', axis = 1)
y = vdcm['price']
# data set splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=22)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

print(type(X_train), type(y_train)) #should be DataFrame and Series

(23402, 470)
(10030, 470)
(23402,)
(10030,)
<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>


In [73]:
# Scale after splitting so that no information from the test set leaks into the scaler:
# fit the StandardScaler on the training data only, then apply it to both sets.
# Note: the pipelines in the Evaluation section include their own scaler and are fit on the
# unscaled X_train, so these scaled frames are for inspection only.

scaler5 = StandardScaler()

# fit_transform combines scaler5.fit(X_train) and scaler5.transform(X_train)
X_train_scaled = scaler5.fit_transform(X_train)

# Only transform the test data; never fit on it
X_test_scaled = scaler5.transform(X_test)

# Convert the NumPy arrays back to DataFrames, keeping the original column names and row index
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)

### Evaluation

With some modeling accomplished, we aim to reflect on what we identify as a high quality model and what we are able to learn from this.  We should review our business objective and explore how well we can provide meaningful insight on drivers of used car prices.  Your goal now is to distill your findings and determine whether the earlier phases need revisitation and adjustment or if you have information of value to bring back to your client.

In [75]:
# Create a StandardScaler object
scaler = StandardScaler()

# Create a PCA object
pca = PCA(n_components=10)

# Create models
lasso = Lasso(max_iter=100000)
ridge = Ridge(max_iter=100000)
linear = LinearRegression()

# Create an RFE selector object using Linear Regression as the estimator
#rfe = RFE(estimator=linear, n_features_to_select=10, step=1)

# Make pipelines: Lasso and Ridge on scaled features, linear regression on 10 principal components
lasso_pipe = make_pipeline(scaler, lasso)
ridge_pipe = make_pipeline(scaler, ridge)
#linear_pipe = make_pipeline(scaler, rfe, linear)
linear_pipe = make_pipeline(scaler, pca, linear)

# Define the alphas to search; each GridSearchCV below builds its own single-parameter grid
alphas = [0.1, 1, 10, 100, 300, 400]

# Create a GridSearchCV object for each model
lasso_grid = GridSearchCV(lasso_pipe, param_grid={'lasso__alpha': alphas}, cv=5)

ridge_grid = GridSearchCV(ridge_pipe, param_grid={'ridge__alpha': alphas}, cv=5)

# Fit the models
lasso_result = lasso_grid.fit(X_train, y_train)
ridge_result = ridge_grid.fit(X_train, y_train)
linear_result = linear_pipe.fit(X_train, y_train)

# Print the best alpha and its mean cross-validated R-squared (the default GridSearchCV score for regressors)
print("Lasso: Best Alpha =", lasso_result.best_params_['lasso__alpha'], "Best Score =", lasso_result.best_score_)
print("Ridge: Best Alpha =", ridge_result.best_params_['ridge__alpha'], "Best Score =", ridge_result.best_score_)

# Predict on the test set
lasso_pred = lasso_result.predict(X_test)
ridge_pred = ridge_result.predict(X_test)
linear_pred = linear_result.predict(X_test)

# Calculate the RMSE for each model
lasso_rmse = np.sqrt(mean_squared_error(y_test, lasso_pred))
ridge_rmse = np.sqrt(mean_squared_error(y_test, ridge_pred))
linear_rmse = np.sqrt(mean_squared_error(y_test, linear_pred))

# Print the RMSE for each model
print("Lasso RMSE: ", lasso_rmse)
print("Ridge RMSE: ", ridge_rmse)
print("Linear Regression RMSE: ", linear_rmse)

# Calculate the R-squared for each model
lasso_r2 = r2_score(y_test, lasso_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
linear_r2 = r2_score(y_test, linear_pred)

# Print the R-squared for each model
print("Lasso R-squared: ", lasso_r2)
print("Ridge R-squared: ", ridge_r2)
print("Linear Regression R-squared: ", linear_r2)


Lasso: Best Alpha = 10 Best Score = 0.5942880453487532
Ridge: Best Alpha = 400 Best Score = 0.5935094578331671
Lasso RMSE:  8431.423083705564
Ridge RMSE:  8437.379171783541
Linear Regression RMSE:  10192.3052621638
Lasso R-squared:  0.5971033968943155
Ridge R-squared:  0.596533971017123
Linear Regression R-squared:  0.4112421839364685
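
To put the RMSE values in context, it can help to report MAE alongside them and to compare against a naive baseline that always predicts the training mean. This is a minimal sketch with made-up numbers; `regression_report` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred) -> dict:
    """Collect RMSE, MAE and R-squared for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "r2": float(r2_score(y_true, y_pred)),
    }

# Invented prices and predictions
y_train = np.array([5000, 10000, 15000])
y_test = np.array([8000, 12000])
y_pred = np.array([9000, 11000])

# Naive baseline: always predict the mean price of the training set
baseline_pred = np.full_like(y_test, y_train.mean(), dtype=float)

model_report = regression_report(y_test, y_pred)
baseline_report = regression_report(y_test, baseline_pred)
print(model_report)
print(baseline_report)
```

A model is only useful to the dealership to the extent that it beats the baseline by a meaningful margin.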


In [76]:
# Get the feature names
feature_names = X_train.columns

# Get the PCA object from the pipeline
pca = linear_pipe.named_steps['pca']

# List of top 5 principal components
top_pcs = [0, 1, 2, 3, 4]  # 0-indexed



# Get the coefficients of the best models
lasso_coef = lasso_result.best_estimator_.named_steps['lasso'].coef_
ridge_coef = ridge_result.best_estimator_.named_steps['ridge'].coef_
linear_coef = linear_pipe.named_steps['linearregression'].coef_

# Create a DataFrame for each model's coefficients
lasso_df = pd.DataFrame({'Feature': feature_names, 'Coefficient': lasso_coef})
ridge_df = pd.DataFrame({'Feature': feature_names, 'Coefficient': ridge_coef})
linear_df = pd.DataFrame({'Component': ["PC" + str(i+1) for i in range(len(linear_coef))], 'Coefficient': linear_coef})


# Sort the DataFrames by the absolute value of the coefficients and get the top 5
lasso_top5 = lasso_df.reindex(lasso_df.Coefficient.abs().sort_values(ascending=False).index).head(5)
ridge_top5 = ridge_df.reindex(ridge_df.Coefficient.abs().sort_values(ascending=False).index).head(5)
linear_top5 = linear_df.reindex(linear_df.Coefficient.abs().sort_values(ascending=False).index).head(5)

# Print the top 5 features for each model
print("Lasso Top 5 Features:\n", lasso_top5)
print("Ridge Top 5 Features:\n", ridge_top5)
print("Linear Regression Top 5 Principal Components:\n", linear_top5)

Lasso Top 5 Features:
           Feature  Coefficient
0            year  4729.994154
435   fuel_diesel  3624.469745
3        odometer -2488.321833
2       cylinders  1974.348033
359  region_tulsa -1524.885365
Ridge Top 5 Features:
          Feature  Coefficient
0           year  4615.811864
3       odometer -2484.241056
435  fuel_diesel  2037.916508
2      cylinders  1894.815801
437     fuel_gas -1668.738341
Linear Regression Top 5 Principal Components:
   Component  Coefficient
0       PC1  3257.239881
4       PC5 -3074.610964
6       PC7  1884.102042
5       PC6   636.583131
3       PC4   588.821082
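
Coefficients on principal components are hard to read directly, because each component is a mix of the original features. One way to relate them back is to inspect the loadings in `pca.components_`. The following is a minimal sketch on synthetic data with made-up feature names; `top_loadings` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def top_loadings(pca: PCA, feature_names, component: int, n: int = 3) -> pd.Series:
    """Return the n features with the largest absolute loading on one component."""
    loadings = pd.Series(pca.components_[component], index=feature_names)
    order = loadings.abs().sort_values(ascending=False).index
    return loadings.reindex(order).head(n)

# Synthetic data: 'year' and 'odometer' move in opposite directions, 'noise' is unrelated
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = pd.DataFrame({
    "year": base + rng.normal(scale=0.1, size=200),
    "odometer": -base + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
pc1 = top_loadings(pca, X.columns, component=0, n=2)
print(pc1)
```

In this toy example the first component is dominated by `year` and `odometer` with opposite signs, which is the kind of reading that would make the PC coefficients above interpretable for the client.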


### Deployment

Now that we've settled on our models and findings, it is time to deliver the information to the client.  You should organize your work as a basic report that details your primary findings.  Keep in mind that your audience is a group of used car dealers interested in fine tuning their inventory.

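These findings come from the Lasso and Ridge models, which explain about 60% of the variance in price on held-out data (test R-squared of roughly 0.60, RMSE of roughly 8,400), so they indicate direction and relative strength rather than exact price effects.

Model year is the strongest driver of price in both models: the newer the vehicle, the higher the sales price.

Mileage is the strongest factor reducing price: the higher the odometer reading, the lower the sales price.

Vehicles with more cylinders sell for more.

Diesel vehicles sell for more than gas vehicles.

Front-wheel drive is associated with lower prices (correlation with price of about -0.34), while four-wheel drive is associated with higher prices (about 0.26).

Better condition is associated with higher prices, but the relationship is weaker than the factors above (correlation with price of about 0.13).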