---
title: "Week 14 - SVM"
---
Download template [here](templates/labs-14.Rmd)
```{r setup, include=FALSE}
# Delete this chunk!
knitr::opts_chunk$set(eval = FALSE)
```
We will be using tidymodels and the `flights` data set from {nycflights13}.
```{r}
library(tidymodels)
library(nycflights13)
```
We will do the same transformation as we have done before.
```{r}
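# Build a two-level outcome; "Delayed" is the first level,
# so yardstick treats it as the event of interest by default.
flights1 <- flights %>%
  mutate(delay = factor(arr_delay > 0,
                        levels = c(TRUE, FALSE),
                        labels = c("Delayed", "On time"))) %>%
  filter(month == 1, !is.na(delay)) %>%
  select(delay, hour, minute, dep_delay, carrier, distance)

# Fix the seed so the train/test split is reproducible.
set.seed(1234)
flights_split <- initial_split(flights1)
flights_train <- training(flights_split)
flights_test <- testing(flights_split)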
```
# The models
We will start with a `svm_linear()` model. SVMs can be used for both regression and classification, so we need to set the mode explicitly for this model. We will be using the `kernlab` package as the engine.
```{r}
svm_lin_spec <- svm_linear() %>%
  set_mode("classification") %>%
  set_engine("kernlab")
```
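If you want to check which engines and modes a model type supports before committing to one, parsnip can list them. The chunk below is a small sketch of that, assuming {tidymodels} is installed; the object name `svm_lin_call` is only for illustration. `translate()` prints the engine-level call that the specification is turned into, which is a handy way to see what `kernlab` will actually be asked to do.

```{r}
library(tidymodels)

# List the engines registered for svm_linear() and the modes they support.
show_engines("svm_linear")

# Show how the specification is translated into an engine-level call.
# Nothing is fitted here; this only prints the call template.
svm_lin_call <- svm_linear() %>%
  set_mode("classification") %>%
  set_engine("kernlab") %>%
  translate()

svm_lin_call
```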
Then we will fit it right away. The fitting might take a minute or two, which is normal for an SVM on this many rows.
```{r}
svm_lin_fit <- fit(svm_lin_spec, delay ~ ., data = flights_train)
svm_lin_fit
```
We can get the confusion matrix
```{r}
svm_lin_fit %>%
  augment(new_data = flights_train) %>%
  conf_mat(delay, .pred_class) %>%
  autoplot(type = "heatmap")
```
and calculate the accuracy.
```{r}
svm_lin_fit %>%
  augment(new_data = flights_train) %>%
  accuracy(delay, .pred_class)
```
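The accuracy above is computed on the training data, so it may be optimistic. One way to compare it with the held-out data is sketched below. The helper `eval_accuracy()` is a hypothetical name introduced here, and the chunk assumes that `svm_lin_fit`, `flights_train` and `flights_test` exist from the chunks above.

```{r}
library(tidymodels)

# Hypothetical helper: accuracy of a fitted model on any data set
# that contains the true `delay` column.
eval_accuracy <- function(model_fit, data) {
  model_fit %>%
    augment(new_data = data) %>%
    accuracy(delay, .pred_class)
}

# Assumes svm_lin_fit, flights_train and flights_test from the chunks above.
# The names of the arguments end up in the `data` column.
acc_both <- bind_rows(
  train = eval_accuracy(svm_lin_fit, flights_train),
  test = eval_accuracy(svm_lin_fit, flights_test),
  .id = "data"
)

acc_both
```

A large gap between the two rows would suggest overfitting; similar but low values suggest the model is simply too weak for the data as given.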
The linear SVM is not doing well.
Let us try a polynomial SVM model to see if that helps at all.
```{r}
svm_poly_spec <- svm_poly(degree = 2) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

svm_poly_fit <- fit(svm_poly_spec, delay ~ ., data = flights_train)
svm_poly_fit
```
Calculating another confusion matrix shows that this does not help much.
```{r}
svm_poly_fit %>%
  augment(new_data = flights_train) %>%
  conf_mat(delay, .pred_class) %>%
  autoplot(type = "heatmap")
```
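A heatmap can be hard to judge by eye, so it may help to put numbers on it. The sketch below computes a few class metrics for the polynomial model. It assumes `svm_poly_fit` and `flights_train` from the chunks above; `class_metrics` is just an illustrative name. Because "Delayed" is the first factor level, yardstick treats it as the event when computing sensitivity and specificity.

```{r}
library(tidymodels)

# Bundle several yardstick metrics into one function.
class_metrics <- metric_set(accuracy, sensitivity, specificity)

# Assumes svm_poly_fit and flights_train from the chunks above.
poly_metrics <- svm_poly_fit %>%
  augment(new_data = flights_train) %>%
  class_metrics(truth = delay, estimate = .pred_class)

poly_metrics
```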
# Your turn
But wait, we didn't do any preprocessing. Let us do some proper preprocessing to see if we can improve on the model. We also have a `cost` parameter we could tune. Let us try that as well.
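One possible starting point is sketched below; treat it as a suggestion rather than the intended solution. It assumes `flights_train` from the chunks above. The choice of recipe steps, the 5 folds and the 5 candidate values of `cost` are all assumptions made for illustration, and the object names are made up. Expect the tuning step to be slow, since it fits the SVM once per fold and candidate value.

```{r}
library(tidymodels)

# Preprocessing: handle unseen carriers, dummy-code carrier,
# drop constant columns, then centre and scale the predictors.
svm_rec <- recipe(delay ~ ., data = flights_train) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Same linear SVM as before, but with `cost` marked for tuning.
svm_tune_spec <- svm_linear(cost = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

svm_wf <- workflow() %>%
  add_recipe(svm_rec) %>%
  add_model(svm_tune_spec)

# Resamples and candidate values (both choices are assumptions).
set.seed(1234)
flights_folds <- vfold_cv(flights_train, v = 5, strata = delay)
cost_grid <- grid_regular(cost(), levels = 5)

svm_tuned <- tune_grid(
  svm_wf,
  resamples = flights_folds,
  grid = cost_grid,
  metrics = metric_set(accuracy)
)

collect_metrics(svm_tuned)

# Refit on the full training set with the best value of `cost`.
best_cost <- select_best(svm_tuned, metric = "accuracy")
svm_final_fit <- finalize_workflow(svm_wf, best_cost) %>%
  fit(data = flights_train)
```

The final workflow can then be evaluated with `augment()` and `conf_mat()` in the same way as the earlier fits.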