# What drives the price of a car?

![](images/kurt.jpeg)

**OVERVIEW**

In this application, you will explore a dataset from Kaggle. The original dataset contained information on 3 million used cars. The provided dataset contains information on 426K cars to ensure speed of processing. Your goal is to understand what factors make a car more or less expensive. As a result of your analysis, you should provide clear recommendations to your client, a used car dealership, as to what consumers value in a used car.

### CRISP-DM Framework

<center>
    <img src="images/crisp.png" width="50%"/>
</center>


To frame the task, throughout our practical applications we will refer back to a standard process in industry for data projects called CRISP-DM.  This process provides a framework for working through a data problem.  Your first step in this application will be to read through a brief overview of CRISP-DM [here](https://mo-pcco.s3.us-east-1.amazonaws.com/BH-PCMLAI/module_11/readings_starter.zip).  After reading the overview, answer the questions below.

### Business Understanding

From a business perspective, we are tasked with identifying key drivers for used car prices.  In the CRISP-DM overview, we are asked to convert this business framing to a data problem definition.  Using a few sentences, reframe the task as a data task with the appropriate technical vocabulary. 

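### Predict used car pricing
The data task is a supervised regression problem: predict the price of a used car from the listing features (region, year, manufacturer, model, condition, cylinders, fuel, odometer, title status, transmission, drive, size, type, paint color, and state).
Candidate models are compared by their prediction error on held-out data, and the model with the lowest error is used to identify which features drive price.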


In [187]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, OneHotEncoder, OrdinalEncoder
from sklearn.preprocessing import StandardScaler
from sklearn import set_config
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.model_selection import train_test_split,GridSearchCV

In [188]:
from sklearn.metrics import mean_squared_error 
set_config(display="diagram")  # render pipelines as diagrams when they are displayed

In [189]:
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import cross_val_score
from sklearn.decomposition import PCA

### Data Understanding

After considering the business understanding, we want to get familiar with our data.  Write down some steps that you would take to get to know the dataset and identify any quality issues within.  Take time to get to know the dataset and explore what information it contains and how this could be used to inform your business understanding.

In [190]:
## Load the data into a DataFrame and inspect it
## info(): columns, dtypes, and non-null counts
## describe(): summary statistics for the numeric columns
## head(): first few rows
car=pd.read_csv('data/vehicles.csv')
print(car.info())
print(car.describe())
print(car.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 426880 entries, 0 to 426879
Data columns (total 18 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   id            426880 non-null  int64  
 1   region        426880 non-null  object 
 2   price         426880 non-null  int64  
 3   year          425675 non-null  float64
 4   manufacturer  409234 non-null  object 
 5   model         421603 non-null  object 
 6   condition     252776 non-null  object 
 7   cylinders     249202 non-null  object 
 8   fuel          423867 non-null  object 
 9   odometer      422480 non-null  float64
 10  title_status  418638 non-null  object 
 11  transmission  424324 non-null  object 
 12  VIN           265838 non-null  object 
 13  drive         296313 non-null  object 
 14  size          120519 non-null  object 
 15  type          334022 non-null  object 
 16  paint_color   296677 non-null  object 
 17  state         426880 non-null  object 
dtypes: f
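
The non-null counts above already point to the main quality issue: several columns are mostly empty (for example, `size` has 120,519 non-null values out of 426,880 rows, so about 72% are missing). One quick way to quantify this is the share of missing values per column. The sketch below uses a small made-up frame; `toy` and its values are illustrative assumptions, not the real `vehicles.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; the values are illustrative only.
toy = pd.DataFrame({
    "price": [4000, 0, 2500, 15000],
    "size": [np.nan, np.nan, "compact", np.nan],
    "condition": ["good", np.nan, "fair", "excellent"],
})

# Share of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running the same two lines on `car` would rank the real columns by how incomplete they are, which helps decide whether to drop a column or drop rows.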

### Data Preparation

After our initial exploration and fine tuning of the business understanding, it is time to construct our final dataset prior to modeling.  Here, we want to make sure to handle any integrity issues and cleaning, the engineering of new features, any transformations that we believe should happen (scaling, logarithms, normalization, etc.), and general preparation for modeling with `sklearn`. 

In [191]:
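## Data cleaning: drop every row that has a NaN in any column
## (this keeps 34,868 of the 426,880 rows, as the info() output below shows)
car = car.dropna()
print(car.isnull().sum()) # confirm that no null values remain in any column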
print(car.info())
car.head()

id              0
region          0
price           0
year            0
manufacturer    0
model           0
condition       0
cylinders       0
fuel            0
odometer        0
title_status    0
transmission    0
VIN             0
drive           0
size            0
type            0
paint_color     0
state           0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
Int64Index: 34868 entries, 126 to 426836
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            34868 non-null  int64  
 1   region        34868 non-null  object 
 2   price         34868 non-null  int64  
 3   year          34868 non-null  float64
 4   manufacturer  34868 non-null  object 
 5   model         34868 non-null  object 
 6   condition     34868 non-null  object 
 7   cylinders     34868 non-null  object 
 8   fuel          34868 non-null  object 
 9   odometer      34868 non-null  float64
 10  title_status  34868 non-null  

Unnamed: 0,id,region,price,year,manufacturer,model,condition,cylinders,fuel,odometer,title_status,transmission,VIN,drive,size,type,paint_color,state
126,7305672709,auburn,0,2018.0,chevrolet,express cargo van,like new,6 cylinders,gas,68472.0,clean,automatic,1GCWGAFP8J1309579,rwd,full-size,van,white,al
127,7305672266,auburn,0,2019.0,chevrolet,express cargo van,like new,6 cylinders,gas,69125.0,clean,automatic,1GCWGAFP4K1214373,rwd,full-size,van,white,al
128,7305672252,auburn,0,2018.0,chevrolet,express cargo van,like new,6 cylinders,gas,66555.0,clean,automatic,1GCWGAFPXJ1337903,rwd,full-size,van,white,al
215,7316482063,birmingham,4000,2002.0,toyota,echo,excellent,4 cylinders,gas,155000.0,clean,automatic,JTDBT123520243495,fwd,compact,sedan,blue,al
219,7316429417,birmingham,2500,1995.0,bmw,525i,fair,6 cylinders,gas,110661.0,clean,automatic,WBAHD6322SGK86772,rwd,mid-size,sedan,white,al
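
Dropping every incomplete row keeps only about 8% of the listings, and the preview above still contains listings with a price of 0. One possible alternative, sketched here on made-up rows, is to drop the sparsest columns first, then the remaining incomplete rows, and finally filter out implausible prices. The helper name `clean_listings` and the thresholds `max_missing=0.5` and `min_price=500` are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np
import pandas as pd

def clean_listings(df, max_missing=0.5, min_price=500):
    """Drop sparse columns, then incomplete rows, then implausible prices.

    max_missing and min_price are illustrative assumptions.
    """
    # Keep only columns whose share of missing values is acceptable.
    keep = df.columns[df.isna().mean() <= max_missing]
    out = df[keep].dropna()
    # Remove listings whose price is too low to be a real sale price.
    return out[out["price"] >= min_price]

toy = pd.DataFrame({
    "price": [0, 4000, 2500, 18000],
    "year": [2018.0, 2002.0, np.nan, 2015.0],
    "size": [np.nan, np.nan, np.nan, "full-size"],
})
cleaned = clean_listings(toy)
print(cleaned)
```

Whether losing a column such as `size` is acceptable depends on how much it matters for pricing, so the thresholds are worth revisiting with the client.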


In [192]:
# Count the distinct values in each categorical column
# (select the columns with a list; recent pandas versions reject a set as an indexer)
cat_cols = ['region','manufacturer','model','condition','cylinders','fuel',
            'title_status','VIN','size','paint_color','state']
print(car[cat_cols].nunique())
print(car['state'].unique())

region            392
VIN             21938
size                4
title_status        6
paint_color        12
condition           6
cylinders           8
state              51
model            5139
fuel                5
manufacturer       41
dtype: int64
['al' 'ak' 'az' 'ar' 'ca' 'co' 'ct' 'dc' 'de' 'fl' 'ga' 'hi' 'id' 'il'
 'in' 'ia' 'ks' 'ky' 'la' 'me' 'md' 'ma' 'mi' 'mn' 'ms' 'mo' 'mt' 'nc'
 'ne' 'nv' 'nj' 'nm' 'ny' 'nh' 'nd' 'oh' 'ok' 'or' 'pa' 'ri' 'sc' 'sd'
 'tn' 'tx' 'ut' 'vt' 'va' 'wa' 'wv' 'wi' 'wy']


In [193]:
# Drop id and VIN: they identify a listing or vehicle and do not affect pricing
car = car.drop(columns=['id','VIN'])


In [194]:
# Strip the "cylinders" suffix and convert the column to a number;
# values that are not numeric become NaN (errors='coerce')
car['cylinders']=car['cylinders'].str.replace('cylinders','',regex=False)
car['cylinders']=pd.to_numeric(car['cylinders'],errors='coerce',downcast='integer')
car.head()
car.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 34868 entries, 126 to 426836
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   region        34868 non-null  object 
 1   price         34868 non-null  int64  
 2   year          34868 non-null  float64
 3   manufacturer  34868 non-null  object 
 4   model         34868 non-null  object 
 5   condition     34868 non-null  object 
 6   cylinders     34772 non-null  float64
 7   fuel          34868 non-null  object 
 8   odometer      34868 non-null  float64
 9   title_status  34868 non-null  object 
 10  transmission  34868 non-null  object 
 11  drive         34868 non-null  object 
 12  size          34868 non-null  object 
 13  type          34868 non-null  object 
 14  paint_color   34868 non-null  object 
 15  state         34868 non-null  object 
dtypes: float64(3), int64(1), object(12)
memory usage: 4.5+ MB
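
The `info()` output shows 34,772 non-null `cylinders` values out of 34,868, so 96 entries did not parse as numbers. A small sketch of the same conversion using a regular expression makes it easy to see which kinds of values end up missing. The sample strings are assumptions about what the column contains (entries such as `other` are assumed, not confirmed by the output above).

```python
import pandas as pd

# Illustrative raw values; "other" is an assumed non-numeric entry.
raw = pd.Series(["6 cylinders", "4 cylinders", "other", "10 cylinders"])

# Extract the digits; entries without digits become NaN.
cylinders = pd.to_numeric(raw.str.extract(r"(\d+)", expand=False), errors="coerce")
print(cylinders.tolist())
```

Those missing values have to be handled explicitly later, either by dropping the rows or by imputing a value.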


In [195]:
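## One-hot encode the object columns and standardize the numeric columns with StandardScaler.
## price is numeric, so it is standardized too; the MSE values below are in standardized price units.
## Note: when passing a list of column names (features), do not wrap it in another pair of brackets.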
#features =['region','manufacturer','model','condition','condition','fuel','title_status','size','paint_color','type','state']
#col_transformer= make_column_transformer((OneHotEncoder(sparse_output= True,drop ='if_binary'),features),
                                        #(StandardScaler(),make_column_selector(dtype_include='number')))
#col_transformer= make_column_transformer((OneHotEncoder(sparse_output= True,drop ='if_binary'),make_column_selector(dtype_include='object')),
                                        #(StandardScaler(),make_column_selector(dtype_include='number')))
col_transformer= make_column_transformer((OneHotEncoder(sparse_output= False,drop ='if_binary'),make_column_selector(dtype_include='object')),
                                        (StandardScaler(),make_column_selector(dtype_include='number')))
car_transformed=col_transformer.fit_transform(car)
num_features=col_transformer.transformers_[1][1].get_feature_names_out()
cat_features=col_transformer.transformers_[0][1].get_feature_names_out(col_transformer.transformers_[0][2])
all_features = list(cat_features) + list(num_features)

## Convert back to dataframe
transformed_car = pd.DataFrame(car_transformed, columns=all_features)
transformed_car = transformed_car.dropna()
#print(transformed_car.head())
## prepare the feature dataset as X_car and price as the y_car 
X_car = transformed_car.drop('price',axis =1)
y_car = transformed_car['price']
X_car.head()


Unnamed: 0,region_SF bay area,region_abilene,region_akron / canton,region_albany,region_albuquerque,region_altoona-johnstown,region_amarillo,region_ames,region_anchorage / mat-su,region_ann arbor,...,state_ut,state_va,state_vt,state_wa,state_wi,state_wv,state_wy,year,cylinders,odometer
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.952655,-0.018661,-0.389632
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.092276,-0.018661,-0.383096
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.952655,-0.018661,-0.40882
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-1.281286,-1.20273,0.476451
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-2.258636,-0.018661,0.032649
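
The cell above fits the encoder and the scaler on the whole table, `price` included, before the train/test split. A common alternative is to separate the target first and put the preprocessing inside a `Pipeline`, so that the scaling statistics are learned from the training rows only and the target stays in its original units. This is a minimal sketch on made-up rows; the toy values, the explicit column lists, and `random_state=42` are assumptions.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy listings; values are illustrative only.
toy = pd.DataFrame({
    "price": [4000, 2500, 18000, 22000, 9000, 12000, 7000, 30000],
    "year": [2002, 1995, 2015, 2018, 2008, 2012, 2005, 2020],
    "odometer": [155000, 110661, 60000, 30000, 120000, 90000, 140000, 10000],
    "fuel": ["gas", "gas", "diesel", "diesel", "gas", "gas", "gas", "diesel"],
})

# Separate the target before any transformation.
X = toy.drop(columns="price")
y = toy["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Preprocessing lives inside the pipeline, so it is fitted on X_train only.
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["fuel"]),
    (StandardScaler(), ["year", "odometer"]),
)
model = make_pipeline(preprocess, Ridge())
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(preds)
```

With this layout the predictions and errors are in the same units as `price`, which is easier to explain to a dealership than standardized units.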


### Modeling

With your (almost?) final dataset in hand, it is now time to build some models.  Here, you should build a number of different regression models with the price as the target.  In building your models, you should explore different parameters and be sure to cross-validate your findings.

In [196]:
# Hold out 30% of the rows as a test set
# (no random_state is set, so the split and the metrics below change from run to run)
X_train, X_test,y_train,y_test = train_test_split(X_car,y_car, test_size = 0.3)
#print(X_train)

In [197]:
# Build a pipeline that reduces the features to 5 principal components with PCA,
# then fits a Ridge regression
pipe = Pipeline([('pca',PCA(n_components=5)),('model', Ridge())])
pipe.fit(X_train,y_train)
train_preds =pipe.predict(X_train)
test_preds = pipe.predict(X_test)
train_mse_1 = mean_squared_error(train_preds,y_train)
test_mse_1 = mean_squared_error(test_preds, y_test)
print(f'Ridge Train MSE: {train_mse_1}')
print(f'Ridge Test MSE: {test_mse_1}')

## Explained variance ratio of each principal component
explained_variance = pipe.named_steps['pca'].explained_variance_ratio_
print(f"explained variance ratio of the PCA components:{explained_variance}")

## Cumulative explained variance ratio
cumulative_explained_variance = np.cumsum(explained_variance)

# Determine the number of components needed to reach 95% variance.
# Caveat: np.argmax returns 0 when no entry satisfies the condition, so a result of 1
# can also mean that the threshold was never reached; check the cumulative values.
num_components = np.argmax(cumulative_explained_variance >= 0.95) + 1
print("Cumulative Explained Variance:", cumulative_explained_variance)
print("Number of components to reach 95% variance:", num_components)

components_sim= pipe.named_steps['pca'].components_

feature_names=X_train.columns
components_df=pd.DataFrame(components_sim, columns=feature_names)
#print(f"components:\n",components_df)

## Find the five features with the largest absolute loadings on each principal component
for i in range(len(components_df)):
    top_5 = components_df.loc[i].abs().nlargest(5).index.tolist()
    print(f"Top 5 contributed features in PC{i+1}:", top_5)
 
## Build a readable description of each PCA component from its feature weights
pca_step=pipe.named_steps['pca']
# original feature names
feature_names=X_train.columns
# describe each component as a weighted sum of the original features
component_names =[]
for i, component in enumerate(pca_step.components_):
    # Combine feature names with their corresponding weight in the component
    component_name =f"PC{i+1}:"+" ".join(f"{round(weight,5)}*{feature}"for weight, feature in zip(component, feature_names))
    component_names.append(component_name)

print(type(component_names))

for name in component_names:
    print(name)
    
# Function to extract weights for a specific component
def get_feature_weights(pca_step, feature_names, component_index):
    component_weights = pca_step.components_[component_index]
    return {feature: weight for feature, weight in zip(feature_names, component_weights)}

# Get weights for the first two components
first_component_weights = get_feature_weights(pca_step, feature_names, 0)
second_component_weights = get_feature_weights(pca_step, feature_names, 1)
# Create a DataFrame to display the weights and features
df_weights_1 = pd.DataFrame(list(first_component_weights.items()), columns=['feature', 'weight'])
df_weights_2 = pd.DataFrame(list(second_component_weights.items()), columns=['feature', 'weight'])

# Show the weights of the features that dominate the first component
Top_5_features = ['cylinders', 'odometer', 'year', 'size_full-size', 'drive_fwd']
selected_features_1 = df_weights_1.loc[df_weights_1['feature'].isin(Top_5_features)]
print(selected_features_1)
selected_features_2 = df_weights_2.loc[df_weights_2['feature'].isin(Top_5_features)]
print(selected_features_2)

Ridge Train MSE: 0.7200699428583004
Ridge Test MSE: 0.7222261558912321
explained variance ratio of the PCA components:[0.13620714 0.10639348 0.06532849 0.04230907 0.03696261]
Cumulative Explained Variance: [0.13620714 0.24260062 0.30792911 0.35023817 0.38720078]
Number of components to reach 95% variance: 1
Top 5 contributed features in PC1: ['cylinders', 'year', 'drive_fwd', 'size_full-size', 'odometer']
Top 5 contributed features in PC2: ['year', 'odometer', 'cylinders', 'drive_4wd', 'drive_fwd']
Top 5 contributed features in PC3: ['odometer', 'year', 'drive_4wd', 'drive_rwd', 'type_SUV']
Top 5 contributed features in PC4: ['condition_excellent', 'drive_4wd', 'condition_good', 'type_SUV', 'paint_color_white']
Top 5 contributed features in PC5: ['condition_excellent', 'condition_good', 'drive_4wd', 'type_SUV', 'cylinders']
<class 'list'>
PC1:-0.00037*region_SF bay area -1e-05*region_abilene -0.00153*region_akron / canton -0.00071*region_albany 6e-05*region_albuquerque -4e-05*region_al
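
The output above reports that 1 component reaches 95% variance even though the five components together explain only about 39%. That happens because `np.argmax` returns 0 when no entry of a boolean array is true. A small helper can make the "threshold not reached" case explicit. The name `components_needed` is hypothetical; the ratios are the five values printed above.

```python
import numpy as np

def components_needed(explained_variance_ratio, threshold):
    """Number of leading components whose cumulative explained variance
    reaches `threshold`, or None if the threshold is never reached."""
    cumulative = np.cumsum(explained_variance_ratio)
    reached = cumulative >= threshold
    if not reached.any():
        return None
    return int(np.argmax(reached)) + 1

# Explained variance ratios of the five components printed above.
ratios = [0.13620714, 0.10639348, 0.06532849, 0.04230907, 0.03696261]
print(components_needed(ratios, 0.95))  # None: five components are not enough
print(components_needed(ratios, 0.30))  # 3
```

scikit-learn also accepts a fraction for `n_components` (for example `PCA(n_components=0.95)`), in which case it keeps as many components as are needed to explain that share of the variance.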

In [198]:
# Build a pipeline with PCA (5 components) followed by Lasso regression
pipe = Pipeline([('pca',PCA(n_components=5)),('model', Lasso())])
pipe.fit(X_train,y_train)
train_preds_2 =pipe.predict(X_train)
test_preds_2 = pipe.predict(X_test)
train_mse_2 = mean_squared_error(train_preds_2,y_train)
test_mse_2 = mean_squared_error(test_preds_2, y_test)
print(f'lasso Train MSE: {train_mse_2}')
print(f'lasso Test MSE: {test_mse_2}')

lasso Train MSE: 1.007109376664994
lasso Test MSE: 0.9791042247873808


In [199]:
# Build a pipeline with PCA (100 components) followed by Ridge regression.
# Check the cumulative explained variance ratio to see how many components are needed to reach 80%.
# In the output below, the cumulative ratio first exceeds 0.80 at the 73rd component, so 100 components are enough (they reach about 84%).
# Map the Ridge coefficients from PCA space back to the original features to find the 10 features with the largest effect on price.
pipe = Pipeline([('pca',PCA(n_components=100)),('model', Ridge())])
pipe.fit(X_train,y_train)
train_preds =pipe.predict(X_train)
test_preds = pipe.predict(X_test)
train_mse = mean_squared_error(train_preds,y_train)
test_mse = mean_squared_error(test_preds, y_test)
print(f'Ridge Train with PCA 100 MSE: {train_mse}')
print(f'Ridge Test with PCA 100 MSE: {test_mse}')

## Explained variance ratio of each principal component
explained_variance = pipe.named_steps['pca'].explained_variance_ratio_
#print(f"explained variance ratio of the PCA components:{explained_variance}")

## Cumulative explained variance ratio
cumulative_explained_variance = np.cumsum(explained_variance)

# Determine the number of components needed to reach 80% variance
num_components = np.argmax(cumulative_explained_variance >= 0.80) + 1
print("Cumulative Explained Variance:", cumulative_explained_variance)
print("Number of components to reach 80% variance:", num_components)

# Ridge coefficients for principal components
ridge_coefs_pcs = pipe.named_steps['model'].coef_
components_sim= pipe.named_steps['pca'].components_
#print("ridge coef:",ridge_coefs_pcs)
ridge_coefs_pcs_df = pd.DataFrame({'ridge_coef_PC':ridge_coefs_pcs})
print (ridge_coefs_pcs_df.head())
# Transform back to original feature space
original_feature_coefs = np.dot(components_sim.T, ridge_coefs_pcs)

# Create a DataFrame for easier interpretation
feature_names=X_train.columns
coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Coefficient': original_feature_coefs})
print("Original Coefficient dataframe:\n")
print(coef_df)

# Sort the DataFrame by the absolute value of the coefficients
coef_df['AbsCoefficient'] = coef_df['Coefficient'].abs()
df_top_features = coef_df.sort_values(by='AbsCoefficient', ascending=False).head(10)
print(" \n Top 10 features which impact pricing:\n",df_top_features[['Feature', 'Coefficient']])

# convert PCA component to a dataframe with feature names for columns
components_df=pd.DataFrame(components_sim, columns=feature_names)
#print(f"components:\n",components_df)



Ridge Train with PCA 100 MSE: 0.5583157725223507
Ridge Test with PCA 100 MSE: 0.5554805166540051
Cumulative Explained Variance: [0.13620714 0.24260062 0.30792911 0.35023817 0.38720078 0.41650957
 0.44111432 0.46387536 0.48155883 0.49835465 0.51375505 0.52848628
 0.54258834 0.55522093 0.56751224 0.57915958 0.58982751 0.60012501
 0.6098522  0.61932178 0.62816262 0.63652396 0.64446995 0.65177417
 0.65716392 0.66232886 0.66741432 0.67224604 0.6769881  0.68153077
 0.68562356 0.68966875 0.69369701 0.69763315 0.7014573  0.70519279
 0.70887906 0.71253339 0.71611029 0.71950675 0.72289222 0.72620261
 0.72943866 0.73258607 0.73562473 0.73863954 0.74156186 0.74445574
 0.74727576 0.75001571 0.75272555 0.75537598 0.7579618  0.76054153
 0.76310801 0.76554723 0.76795563 0.77032642 0.77264852 0.77491294
 0.77715542 0.77937845 0.78153378 0.78367297 0.7858003  0.78792097
 0.79001364 0.79201145 0.79396517 0.7958536  0.79764844 0.79938137
 0.8011102  0.80275974 0.80438427 0.80597454 0.80752661 0.80898477
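
The cell above maps the Ridge coefficients back to the original features with `np.dot(components_sim.T, ridge_coefs_pcs)`. The reasoning is that PCA computes `(X - mean) @ components_.T`, so a linear model on the components is also a linear model on the original features. The sketch below checks this on synthetic data by rebuilding the pipeline's predictions from the back-projected coefficients. The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

pipe = Pipeline([("pca", PCA(n_components=3)), ("model", Ridge())]).fit(X, y)
pca, ridge = pipe.named_steps["pca"], pipe.named_steps["model"]

# Coefficients in the original feature space.
w_orig = pca.components_.T @ ridge.coef_

# PCA centres the data, so the intercept has to absorb pca.mean_.
b_orig = ridge.intercept_ - pca.mean_ @ w_orig

# Predictions rebuilt from the original features only.
manual = X @ w_orig + b_orig
print(np.allclose(manual, pipe.predict(X)))
```

Because the two sets of predictions agree, ranking the back-projected coefficients is a valid way to read this model in terms of the original features.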
 

### Evaluation

With some modeling accomplished, we aim to reflect on what we identify as a high quality model and what we are able to learn from this.  We should review our business objective and explore how well we can provide meaningful insight on drivers of used car prices.  Your goal now is to distill your findings and determine whether the earlier phases need revisitation and adjustment or if you have information of value to bring back to your client.

In [200]:
## First model: PCA with 5 components and Ridge regression.
## train_preds and test_preds were overwritten by the 100-component model above,
## so report the MSE values stored when the first model was fitted.
print(f'Ridge Train MSE: {train_mse_1}')
print(f'Ridge Test MSE: {test_mse_1}')

Ridge Train MSE: 0.7200699428583004
Ridge Test MSE: 0.7222261558912321


In [201]:
## In the second model, used PCA =5 and Lasso model
train_mse_2 = mean_squared_error(train_preds_2,y_train)
test_mse_2 = mean_squared_error(test_preds_2, y_test)
print(f'lasso Train MSE: {train_mse_2}')
print(f'lasso Test MSE: {test_mse_2}')

lasso Train MSE: 1.007109376664994
lasso Test MSE: 0.9791042247873808


Comparing test MSE with 5 PCA components, the Ridge model (about 0.72) performs better than the Lasso model (about 0.98).

The PCA report shows that 5 components explain only about 39% of the variance, far below 95%, so the number of PCA components was increased to 100 with the Ridge model.

In [202]:
## In the third model, used PCA = 100 and Ridge model
train_mse = mean_squared_error(train_preds,y_train)
test_mse = mean_squared_error(test_preds, y_test)
print(f'Ridge Train with PCA 100 MSE: {train_mse}')
print(f'Ridge Test with PCA 100 MSE: {test_mse}')


Ridge Train with PCA 100 MSE: 0.5583157725223507
Ridge Test with PCA 100 MSE: 0.5554805166540051


Increasing the number of PCA components from 5 to 100 with the Ridge model reduced the test MSE from about 0.72 to about 0.56.
The features that contribute most to this model were then ranked by the magnitude of their coefficients.
Increasing the number of components further might improve the fit.
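
Whether more components actually help can be checked with cross-validation instead of a single train/test split. The sketch below compares several values of `n_components` with `cross_val_score` on synthetic data. The data, the candidate values, and `cv=5` are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data where the target depends on 10 of the 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=300)

# Mean cross-validated MSE for each candidate number of components.
cv_mse = {}
for k in (2, 5, 10, 20):
    pipe = Pipeline([("pca", PCA(n_components=k)), ("model", Ridge())])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[k] = -scores.mean()

best_k = min(cv_mse, key=cv_mse.get)
print(cv_mse)
print("best number of components:", best_k)
```

On the car data the same loop could compare, for example, 5, 100, and larger values, at the cost of longer run times.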

In [203]:
# Sort the DataFrame by the absolute value of the coefficients
coef_df['AbsCoefficient'] = coef_df['Coefficient'].abs()
df_top_features = coef_df.sort_values(by='AbsCoefficient', ascending=False).head(10)
print(df_top_features[['Feature', 'Coefficient']])



                        Feature  Coefficient
5578                fuel_diesel     0.558626
5591         transmission_other    -0.547098
5660                   state_ok    -0.453526
5590        transmission_manual     0.388806
5601           type_convertible     0.378073
356                region_tulsa    -0.366162
5611                 type_wagon    -0.323382
5580                   fuel_gas    -0.304967
5624                   state_ak     0.298778
8     region_anchorage / mat-su     0.297470
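
Because `price` was standardized together with the other numeric columns, these coefficients are in units of standard deviations of price, not dollars. Multiplying a coefficient by the standard deviation of the original price column converts it back. The sketch below shows the arithmetic with made-up prices; the `prices` array is an assumption, and only the `fuel_diesel` coefficient is taken from the output above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up prices standing in for the real price column.
prices = np.array([[4000.0], [2500.0], [18000.0], [22000.0], [9000.0]])

# StandardScaler stores the (population) standard deviation in scale_.
scaler = StandardScaler().fit(prices)
price_std = scaler.scale_[0]

# fuel_diesel coefficient from the output above, in standardized units.
coef_std_units = 0.558626
coef_price_units = coef_std_units * price_std
print(f"Estimated effect in price units: {coef_price_units:.2f}")
```

On the real data, `price_std` would come from the scaler fitted on the actual `price` column, which gives the client an effect size in dollars.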


### Deployment

Now that we've settled on our models and findings, it is time to deliver the information to the client.  You should organize your work as a basic report that details your primary findings.  Keep in mind that your audience is a group of used car dealers interested in fine tuning their inventory.

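### Top features impacting price

Using the Ridge model on 100 PCA components (more than 80% explained variance), the features with the largest impact on price are as follows. Coefficients are in standardized price units. These values differ slightly from the output above, presumably because they come from a different run and the train/test split is not seeded.

| Feature | Coefficient |
|----------|----------|
| fuel_diesel | 0.554380 |
| transmission_other | -0.523171 |
| state_ok | -0.479432 |
| type_convertible | 0.434469 |
| transmission_manual | 0.360975 |
| type_wagon | -0.328941 |
| region_tulsa | -0.328314 |
| fuel_gas | -0.523171 |
| condition_like new | 0.308291 |
| state_ak | 0.290623 |
| region_anchorage / mat-su | 0.297470 |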


From the coefficient analysis, the features with a positive impact on price are, in order of importance: diesel fuel, convertible type, manual transmission, like-new condition, state_ak, and region_anchorage / mat-su. Price is positively associated with these features, so stocking inventory with them should support higher prices.

The features with a negative impact are, in order of importance: other transmission, gas fuel, state_ok, wagon type, and the Tulsa region. Price is negatively associated with these features, so avoid stocking inventory with them when the goal is higher prices.