<div><img src="http://www.stevinsonauto.net/assets/Icon_Brake.png" width=270 height=270 align='right'> 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/640px-IBM_logo.svg.png" width=90 height=90 align='right' style="margin:0px 25px"></div>

# Classifying Driver Type with Brake Events
##### By Rafi Kurlansik and Ross Lewis

________________________________

**Table of contents**
    
* [Problem Statement](#problemStatement)
    
* [Exploratory Data Analysis](#eda)

* [Modeling](#ml)
    
* [Model Export](#export)

* [Conclusion](#conclusion)

______________________

<a id='problemStatement'></a>

### Problem Statement

The service bays at dealerships have seen an increase in warranty claims related to brakes.  Using historical telematics data of known driver types, can we classify the driving style of customers making warranty claims?

________
<a id='eda'></a>

### Exploratory Data Analysis

In [None]:
# Access 'historical_brake_eventsGM.csv' data file from the project.
brakeEventDF <- as.data.frame(loadDataFrameFromFile(pc, "historical_brake_eventsGM.csv"))

paste("brakeEventDF Type: ", class(brakeEventDF))

In [None]:
head(brakeEventDF)

We see VINs, the type or classification of the brake event, and then a series of columns related to the brake event itself.  

#### Summary Statistics

Let's begin exploring the data by looking at some summary statistics of these events by both type and road type.

In [None]:
install.packages("magrittr")
library(magrittr)

In [None]:
install.packages("dplyr")
library(dplyr)

In [None]:
print("Summary Statistics by Event Type")
group_by(brakeEventDF, type) %>% summarise(avg_braketime = mean(brake_time_sec), avg_brakedistance = mean(brake_distance_ft), avg_brakescore = mean(braking_score), abs_events = sum(abs_event))

print("Summary Statistics by Event Type and Road Type")
aggDF <- group_by(brakeEventDF, type, road_type) %>% summarise(avg_braketime = mean(brake_time_sec), avg_brakedistance = mean(brake_distance_ft), avg_brakescore = mean(braking_score), abs_events = sum(abs_event))

aggDF

Aggressive drivers appear to have lower brake times, distances, and scores, while distracted drivers have more ABS events.  Quality drivers sit at the opposite end of the spectrum on these measures.  

#### Visualization

We can see these relationships visually using the open source R package, ggplot2.  Let's examine the following three relationships:

* Brake Time by Type
* Brake Distance by Braking Score
* ABS Events by Type and Road Type

In [None]:
install.packages("ggplot2")
library(ggplot2)

In [None]:
options(repr.plot.width = 12, repr.plot.height = 3)

ggplot(brakeEventDF, aes(x = brake_time_sec, color = type, fill = type)) + 
    geom_density(alpha = 0.5) +
    labs(x = "Braking Time (seconds)", y = "Observation Density", title = "Distribution of Brake Time by Type") +
    theme_minimal()

ggplot(sample_frac(brakeEventDF, .33), aes(x = brake_distance_ft, y = braking_score)) + 
    geom_point(aes(shape = road_type, color = type), size = 2) +
    scale_shape_manual(values=c(3, 5, 8)) +
    geom_point(color = 'black', size = 0.35, aes(shape = road_type)) +
    labs(x = "Braking Distance (feet)", y = "Braking Score", title = "Braking Score by Distance (ft)") +
    theme_minimal()

ggplot(aggDF, aes(x = road_type, y = abs_events)) + 
    geom_bar(aes(fill = type), stat = 'identity') + 
    coord_flip() +
    labs(x = "Road Type", y = "# of ABS Events", title = "ABS Events by Road Type and Event Type") +
    theme_minimal()

After visually inspecting the data, we see some clear grouping along the lines of event type, road type, and number of ABS events.  The scatter plot also suggests a linear relationship between braking score and braking distance.  This historical data is clean enough to build a model from.
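The linear relationship seen in the scatter plot can also be checked numerically. The following is a minimal sketch on hypothetical toy data: `toyDF` and its values are assumptions standing in for `brakeEventDF`, and only the column names `brake_distance_ft` and `braking_score` come from the notebook. It computes a Pearson correlation and fits a simple linear regression.

```r
# Hypothetical toy data standing in for brakeEventDF; the real values will differ.
set.seed(1)
toyDF <- data.frame(brake_distance_ft = runif(200, min = 20, max = 300))
# Assumed linear relationship plus noise, purely for illustration.
toyDF$braking_score <- 10 + 0.3 * toyDF$brake_distance_ft + rnorm(200, sd = 5)

# Pearson correlation: values near 1 or -1 indicate a strong linear relationship.
r <- cor(toyDF$brake_distance_ft, toyDF$braking_score)

# Simple linear regression; the slope estimates the change in score per foot.
fit <- lm(braking_score ~ brake_distance_ft, data = toyDF)
slope <- unname(coef(fit)["brake_distance_ft"])
r_squared <- summary(fit)$r.squared

print(c(correlation = r, slope = slope, r_squared = r_squared))
```

On the real data, the same two calls could be run on `brakeEventDF` directly, ideally per driver type, since the groups may follow different trends.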

__________

<a id='ml'></a>


### Modeling

We can train a random forest model (an ensemble of decision trees) on the historical brake event data.  It will learn the relationship between the various quantitative variables and the type of brake event, allowing us to classify new records as they come in.  In this case, we will be checking the behavior of drivers making warranty claims.

We hold out 30% of the historical brake events as a test set and train on the remaining 70%.  The following cell trains the model and prints a confusion matrix for the held-out data.

In [None]:
install.packages("caTools")
library(caTools)
install.packages("randomForest")
library(randomForest)

In [None]:
## Preserve VINs to add on after modeling
#vins <- brakeEventDF$VIN

## Set the seed to make the partition reproducible
set.seed(22)
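## Split on the label vector so every row gets its own TRUE/FALSE flag and class ratios are preserved
brakeEventDF$spl = sample.split(brakeEventDF$type, SplitRatio=0.7)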

train=subset(brakeEventDF, brakeEventDF$spl==TRUE)
test=subset(brakeEventDF, brakeEventDF$spl==FALSE)

## Select columns for modeling
trainingDF <- select(train, type, brake_time_sec, brake_distance_ft, road_type, braking_score, 
                 brake_pressure20pct, brake_pressure40pct, brake_pressure60pct,
                 brake_pressure80pct, brake_pressure100pct, abs_event, travel_speed)

brakeEventModel <- randomForest(type ~ ., 
                                data = trainingDF,
                                ntree = 500,
                                proximity = TRUE)

## Load test set
#testingDF <-  read.csv(file = getObjectStorageFileWithCredentials_d7a568f8ac534bc48834f0e1762068f9("DataScienceforAutomotiveWorkshop", "testdata.csv"))

print("Confusion Matrix for Testing Data:")
table(predict(brakeEventModel, select(test, -VIN, -type)), test$type)

The accuracy on this model is strong enough to give us some confidence in using it on new data.  
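To put a number on that accuracy, the confusion matrix can be reduced to an overall accuracy and a per-class recall. This is a minimal sketch using hypothetical labels: the vectors `actual` and `predicted` and the class names are assumptions for illustration, not results from the brake event model.

```r
# Hypothetical predicted and actual labels; class names are illustrative only.
actual <- factor(c("aggressive", "aggressive", "distracted", "distracted",
                   "quality", "quality", "quality", "aggressive"))
predicted <- factor(c("aggressive", "distracted", "distracted", "distracted",
                      "quality", "quality", "aggressive", "aggressive"),
                    levels = levels(actual))

# Rows are predictions, columns are true classes, as in the table() call above.
confMat <- table(predicted, actual)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy <- sum(diag(confMat)) / sum(confMat)

# Per-class recall: correct predictions for a class over its true count (column sums).
recall <- diag(confMat) / colSums(confMat)

print(confMat)
print(round(accuracy, 3))
print(round(recall, 3))
```

Per-class recall is worth reading alongside overall accuracy, because a model can score well overall while still misclassifying one driver type.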

_________

<a id='export'></a>

### Model Export

We can export the random forest model to Object Storage for use in our Shiny app.

In [None]:
saveRDS(object = brakeEventModel, file = "brakeEventModel.rds")

The model has successfully been written to Object Storage.  
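An application consuming the exported file would typically restore it with `readRDS()`. The sketch below shows the round trip under stated assumptions: a small `lm` fit on hypothetical toy data stands in for `brakeEventModel` so the example runs without extra packages, and a temporary file stands in for the real path.

```r
# Stand-in model on hypothetical toy data (not the brake event model itself).
set.seed(42)
toyDF <- data.frame(x = 1:20)
toyDF$y <- 2 * toyDF$x + rnorm(20)
toyModel <- lm(y ~ x, data = toyDF)

# Write the model to a temporary file, then read it back as a new object.
modelPath <- tempfile(fileext = ".rds")
saveRDS(object = toyModel, file = modelPath)
restoredModel <- readRDS(modelPath)

# The restored model should give the same predictions as the original.
newData <- data.frame(x = c(21, 22))
same <- isTRUE(all.equal(predict(toyModel, newData),
                         predict(restoredModel, newData)))
print(same)

# Clean up the temporary file.
unlink(modelPath)
```

For the random forest saved above, the consuming session would presumably also need `library(randomForest)` loaded before calling `predict()` on the restored object.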

________

<a id='conclusion'></a>

### Conclusion

In this notebook we have quickly explored and visualized brake event data using R.  We've also built, tested, and exported a random forest model that can be embedded in applications or used to create reports.  To see the Shiny app where this model is used on customers coming into the service bay, click on 'Tools --> RStudio' in the menu bar above.

_______


<div><br><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/640px-IBM_logo.svg.png" width = 200 height = 200>
</div><br>