#  Used Car Dealer

**Author:**
[Anthony Strittmatter](http://www.anthonystrittmatter.com)

You take the role of a used car dealer. You have a database with the prices and characteristics of used cars you sold previously (*used\_car\_database.csv*). You want to use this data to predict the prices of the used cars that are currently in your garage. For these cars you have a data set containing the characteristics (*new\_used\_cars.csv*), but not the prices. The characteristics observed in both data sets are described in the file *variable\_description.xlsx*.

Your task is to predict the prices of the used cars that are currently in your garage, using any method of your choice.

## Load Packages and Data

In [1]:
########################  Load Packages and Data  ########################

# Load packages
library(rpart)
library(rpart.plot)
library(grf)
library(glmnet)
library(nnet)

# Load data
cars <- read.csv("used_car_database.csv", sep = ",")
new_cars <- read.csv("new_used_cars.csv", sep = ",")

print('Packages and data successfully loaded.')

#############################################################################

"package 'glmnet' was built under R version 3.6.3"
Loading required package: Matrix
Loaded glmnet 4.1-1


[1] "Packages and data successfully loaded."


## Inspect Data

In [2]:
########################  Describe Old Data  ########################

# Print first few rows of old data
head(cars)

# Number of observations
print(paste0('Old data: ',nrow(cars),' observations'))

######################################################################

id,sales_price,mercedes_c,vw_golf,vw_passat,bmw_320,opel_astra,diesel,other_car_owner,pm_green,...,mile_40,mile_50,mile_100,mile_150,mileage,mileage2,age_3,age_6,age_car_years,age_car_years2
52927,12.06,0,0,1,0,0,1,2,1,...,1,1,1,0,149.0,22201.0,1,1,9.0,81.0
49185,21.121,0,0,1,0,0,1,1,1,...,1,0,0,0,42.0,1764.0,0,0,1.2,1.44
64639,31.7,0,0,0,1,0,0,1,1,...,0,0,0,0,14.3,204.49,0,0,1.7,2.89
11372,14.24,1,0,0,0,0,0,2,1,...,1,1,0,0,56.161,3154.058,1,1,6.4,40.96
7593,20.35,0,0,1,0,0,1,1,1,...,1,1,1,0,101.482,10298.596,1,0,4.3,18.49
89992,11.679,0,1,0,0,0,0,2,1,...,1,1,0,0,71.544,5118.544,1,1,7.0,49.0


[1] "Old data: 48976 observations"


In [3]:
########################  Describe New Data  ########################

# Print first few rows of new data
head(new_cars)

# Number of observations
print(paste0('New data: ',nrow(new_cars),' observations'))

######################################################################

id,mercedes_c,vw_golf,vw_passat,bmw_320,opel_astra,diesel,other_car_owner,pm_green,private_seller,...,mile_40,mile_50,mile_100,mile_150,mileage,mileage2,age_3,age_6,age_car_years,age_car_years2
104720,0,1,0,0,0,0,0,0,1,...,1,0,0,0,40.0,1600.0,0,0,2.5,6.25
32761,0,0,1,0,0,1,0,0,1,...,1,1,1,0,139.8,19544.04,1,0,5.7,32.49
32601,0,0,1,0,0,0,0,1,0,...,1,1,1,0,134.0,17956.0,1,1,8.9,79.21
53732,0,0,1,0,0,1,2,1,0,...,0,0,0,0,34.85,1214.5225,0,0,1.6,2.56
3655,0,1,0,0,0,1,1,1,0,...,1,1,0,0,90.142,8125.5802,1,0,4.9,24.01
98140,1,0,0,0,0,1,1,1,0,...,0,0,0,0,31.299,979.6274,0,0,1.2,1.44


[1] "New data: 1024 observations"


## Prepare Data

In [4]:
########################  Data Preparation  ########################

# Generate outcome and control variables
y <- as.matrix(cars[,2])
x <- as.matrix(cars[,-c(1:2)])
new_x <- as.matrix(new_cars[,-1])

print('Data is prepared.')

#############################################################################

[1] "Data is prepared."


**$\Rightarrow$ Optionally, add further non-linear and interaction terms to `x` and `new_x` (the data already contain squared mileage and squared age).**
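As a minimal sketch of what such terms could look like, the block below appends a mileage-by-age interaction and a cubic mileage term. The helper `add_terms` and the new column names are hypothetical; the column names `mileage` and `age_car_years` are taken from the data shown above. The stand-in matrices are only there so the sketch also runs outside the notebook.

```r
# Stand-in covariates so the sketch runs on its own; skipped if the notebook's objects exist
if (!exists("x") || !exists("new_x")) {
  set.seed(1)
  x <- cbind(mileage = runif(100, 0, 200), age_car_years = runif(100, 0, 10))
  new_x <- x[1:10, ]
}

# Hypothetical helper: append an interaction and a cubic term to a covariate matrix
add_terms <- function(m) {
  cbind(m,
        mileage_x_age = m[, "mileage"] * m[, "age_car_years"],
        mileage3 = m[, "mileage"]^3)
}

# Apply the same transformation to old and new data so the columns stay aligned
x_ext <- add_terms(x)
new_x_ext <- add_terms(new_x)
```

Whether extra terms help depends on the method: penalized regressions can benefit from them, while trees and forests already pick up non-linearities and interactions on their own.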

## Generate Training and Test Sample

In [26]:
########################  Training and Test Samples  ########################

set.seed(???)

# Generate variable with the rows in training data
???

print('Training and test samples created.')

#############################################################################

[1] "Training and test samples created."
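One possible way to fill in the blanks of this cell is a simple random split. The seed value, the 80/20 share, and the object names `training_set`, `x_train`, `y_train`, `x_test`, and `y_test` are assumptions made for illustration, not part of the exercise.

```r
# Stand-in data so the sketch runs on its own; skipped if the notebook's x and y exist
if (!exists("x") || !exists("y")) {
  set.seed(1)
  x <- matrix(rnorm(500 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
  y <- as.matrix(10 + 2 * x[, 1] - x[, 2] + rnorm(500))
}

set.seed(1001)  # arbitrary value; any fixed integer makes the split reproducible

# Draw 80% of the row indices without replacement for the training sample
n <- nrow(x)
training_set <- sample(seq_len(n), size = floor(0.8 * n), replace = FALSE)

# Training sample: used to fit the models
x_train <- x[training_set, , drop = FALSE]
y_train <- y[training_set, , drop = FALSE]

# Test sample: held back to assess out-of-sample performance
x_test <- x[-training_set, , drop = FALSE]
y_test <- y[-training_set, , drop = FALSE]
```

A larger training share leaves more data for fitting but makes the test R-squared noisier; with almost 49,000 observations either sample is large under an 80/20 split.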


## Predict Used Car Prices in Training Sample and Assess Model in Test Sample

### Lasso, Ridge, Elastic Net

In [27]:
########################  LASSO, Ridge, Elastic Net  ##############################


                                   
################################################################

[1] "R-squared Penalized Regression: 0.781"
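The cell above is left empty for the exercise. A hedged sketch of a cross-validated Lasso with `glmnet` could look as follows. It assumes training and test objects named `x_train`, `y_train`, `x_test`, and `y_test` (hypothetical names), and it is not meant to reproduce the R-squared printed above.

```r
library(glmnet)

# Stand-in samples so the sketch runs on its own; skipped if x_train already exists
if (!exists("x_train")) {
  set.seed(1)
  sim_x <- matrix(rnorm(400 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
  sim_y <- as.matrix(10 + 2 * sim_x[, 1] - sim_x[, 2] + rnorm(400))
  x_train <- sim_x[1:300, ]
  y_train <- sim_y[1:300, , drop = FALSE]
  x_test <- sim_x[301:400, ]
  y_test <- sim_y[301:400, , drop = FALSE]
}

set.seed(1001)  # the cross-validation folds are drawn at random

# alpha = 1 is Lasso, alpha = 0 is Ridge, values in between give an Elastic Net
lasso <- cv.glmnet(x_train, y_train, alpha = 1, nfolds = 10, type.measure = "mse")

# Predict in the test sample at the penalty with the lowest cross-validated MSE
pred_lasso <- predict(lasso, newx = x_test, s = "lambda.min")

# Out-of-sample R-squared
r2_lasso <- 1 - mean((y_test - pred_lasso)^2) / mean((y_test - mean(y_test))^2)
print(paste0("R-squared Penalized Regression: ", round(r2_lasso, 3)))
```

Using `s = "lambda.1se"` instead of `"lambda.min"` would select a more strongly penalized, sparser model.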


### Tree

In [28]:
######################  Regression Tree  #######################



################################################################

[1] "R-squared Tree: 0.849"
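For the empty tree cell, one option is to grow a deep tree with `rpart` and prune it back at the complexity parameter with the lowest cross-validated error. The sketch below assumes objects named `x_train`, `y_train`, `x_test`, and `y_test` (hypothetical names); the control values are illustrative choices, and the result need not match the R-squared printed above.

```r
library(rpart)

# Stand-in samples so the sketch runs on its own; skipped if x_train already exists
if (!exists("x_train")) {
  set.seed(1)
  sim_x <- matrix(rnorm(400 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
  sim_y <- as.matrix(10 + 2 * sim_x[, 1] - sim_x[, 2] + rnorm(400))
  x_train <- sim_x[1:300, ]
  y_train <- sim_y[1:300, , drop = FALSE]
  x_test <- sim_x[301:400, ]
  y_test <- sim_y[301:400, , drop = FALSE]
}

# rpart works with data frames and a formula interface
train_df <- data.frame(y = as.numeric(y_train), x_train)
test_df <- data.frame(x_test)

set.seed(1001)  # the internal cross-validation is random

# Grow a deep regression tree (small cp), with 10-fold cross-validation
deep_tree <- rpart(y ~ ., data = train_df, method = "anova",
                   control = rpart.control(cp = 0.0001, minbucket = 10, xval = 10))

# Prune at the complexity parameter with the lowest cross-validated error
best_cp <- deep_tree$cptable[which.min(deep_tree$cptable[, "xerror"]), "CP"]
tree <- prune(deep_tree, cp = best_cp)

# Out-of-sample R-squared
pred_tree <- predict(tree, newdata = test_df)
r2_tree <- 1 - mean((as.numeric(y_test) - pred_tree)^2) /
  mean((as.numeric(y_test) - mean(y_test))^2)
print(paste0("R-squared Tree: ", round(r2_tree, 3)))
```

Since `rpart.plot` is already loaded in the notebook, `rpart.plot(tree)` can be used to inspect the pruned tree.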


### Random Forest

In [29]:
########################  Random Forest  #######################


################################################################

[1] "R-squared Forest: 0.876"
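The forest cell is also left open. A minimal sketch with `regression_forest` from the `grf` package is shown below. The object names `x_train`, `y_train`, `x_test`, and `y_test`, the number of trees, and the seed are assumptions, and the result need not match the R-squared printed above.

```r
library(grf)

# Stand-in samples so the sketch runs on its own; skipped if x_train already exists
if (!exists("x_train")) {
  set.seed(1)
  sim_x <- matrix(rnorm(400 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
  sim_y <- as.matrix(10 + 2 * sim_x[, 1] - sim_x[, 2] + rnorm(400))
  x_train <- sim_x[1:300, ]
  y_train <- sim_y[1:300, , drop = FALSE]
  x_test <- sim_x[301:400, ]
  y_test <- sim_y[301:400, , drop = FALSE]
}

# Fit the forest; the outcome is passed as a plain numeric vector
forest <- regression_forest(x_train, as.numeric(y_train), num.trees = 1000, seed = 1001)

# Predict in the test sample
pred_forest <- predict(forest, newdata = x_test)$predictions

# Out-of-sample R-squared
r2_forest <- 1 - mean((as.numeric(y_test) - pred_forest)^2) /
  mean((as.numeric(y_test) - mean(y_test))^2)
print(paste0("R-squared Forest: ", round(r2_forest, 3)))
```

More trees reduce the simulation noise of the forest at the cost of computation time.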


## Select Favorite Model and Extrapolate to New Data

In [30]:
########################  Out-of-Sample Prediction  #######################

# Fitted values
new_prediction <- ???

print('Out-of-sample sales prices are predicted.')

###########################################################################

[1] "Out-of-sample sales prices are predicted."
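How to fill in `new_prediction` depends on the model you prefer. As one hedged example, suppose the random forest is the favorite because it shows the highest test R-squared in the outputs above. The sketch below refits it on all old data before predicting for the new cars; refitting on the full sample, the name `forest_full`, and the tuning values are assumptions, not part of the exercise.

```r
library(grf)

# Stand-in data so the sketch runs on its own; skipped if the notebook's objects exist
if (!exists("x") || !exists("y") || !exists("new_x")) {
  set.seed(1)
  x <- matrix(rnorm(500 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
  y <- as.matrix(10 + 2 * x[, 1] - x[, 2] + rnorm(500))
  new_x <- matrix(rnorm(20 * 5), ncol = 5, dimnames = list(NULL, paste0("v", 1:5)))
}

# The new data must contain the same covariates in the same order as the old data
stopifnot(identical(colnames(x), colnames(new_x)))

# Refit the chosen model on all old observations
forest_full <- regression_forest(x, as.numeric(y), num.trees = 1000, seed = 1001)

# Fitted values for the cars currently in the garage
new_prediction <- predict(forest_full, newdata = new_x)$predictions
```

If `x` was extended with additional terms, the same terms have to be added to `new_x` before predicting.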


## Store Out-of-Sample Predictions

In [32]:
########################  Store Results  #######################

id_new <- as.matrix(new_cars$id)

# Replace ??? with your name
write.csv(cbind(id_new,new_prediction),"???.csv")

print('File is stored.')
print('Send your results to anthony.strittmatter@ensae.fr')

################################################################

[1] "File is stored."
[1] "Send your results to anthony.strittmatter@ensae.fr"
