-
Notifications
You must be signed in to change notification settings - Fork 0
/
04_Practical_Logistic.Rmd
executable file
·636 lines (416 loc) · 22.9 KB
/
04_Practical_Logistic.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
---
title: "04_Practical; Logistic Regression"
author: "Ryan Greenup 17805315"
date: "12 March 2019"
output:
html_document:
code_folding: hide
keep_md: yes
theme: flatly
toc: yes
toc_depth: 4
toc_float: no
pdf_document:
toc: yes
always_allow_html: yes
bibliography: ref.bib
##Shiny can be good but {.tabset} will be more compatible with PDF
##but you can submit HTML in turnitin so it doesn't really matter.
##It is rare that you will want to use a floating toc with {.tabset}
##If a floating toc is used in the document only use {.tabset} on more or less copy/pasted
#sections with different datasets
##Otherwise use {.tabsets} instead of TOC or TOC instead of {.tabsets}, one or the other though.
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
if(require('pacman')){
library('pacman')
}else{
install.packages('pacman')
library('pacman')
}
pacman::p_load(caret, e1071, scales, ggplot2, rmarkdown, shiny, ISLR, class, BiocManager, corrplot, plotly, tidyverse, latex2exp, stringr, reshape2, cowplot, ggpubr, rstudioapi, wesanderson, RColorBrewer, colorspace, gridExtra, grid)
#Mass isn't available for R 3.5...
```
# Logistic Regression
Material of Tue 26 2019, week 4
It's worth bearing in mind that Logistic regression is basically the same as linear regression for an output value that is categorical, i.e. if linear assumptions hold and $Y \in \mathbb{Z} \wedge Y = f(X) + \varepsilon \implies f = \sum^n_{i=1} \left[a_i \cdot \frac{e^{\alpha + \beta x_i}}{1+e^{\alpha + \beta x_i}} \right]$.
This is a parametric method because all we need to do now is solve parameter coefficients.
This is supervised learning because there are output values.
## Question 01: Logistic Regression: Use “Default” data sets
### 1. Start R studio. Upload “Default” data set built in R and View the data sets using the Console.
```{r}
def <- read.csv(file = "../0DataSets/Default.csv")
head(def)
# writeLines("\n")
# print("***Dimensions***")
writeLines("\n")
dim(def)
# writeLines("\n")
# print("***Summary***")
# writeLines("\n")
summary(def)
# writeLines("\n")
# print("***Structure***")
# writeLines("\n")
str(def)
# writeLines("\n")
```
#### Dummy Variables
when `glm` is used on a categorical variable that is stored as the `factor` class in ***R*** the categorical variables will be encoded as dummy variables (i.e. they will become numbers that are stored as `factors` ), I thought it might be better to do this yourself because otherwise you have to figure out which dummy variable corresponds to what, however at p. 157 of the textook a different approach is taken; `glm` was used and then `contrasts(def$student)` was used to work out which was what, that's probably the better way to do it.
You shouldn't work with dummy variables at all, that's what contrasts are for, work with contrasts [^contrastR] instead [^datacamp] of doing it all by hand, it will prevent you from making a mistake.
[^datacamp]: [Data Camp: Logistic Regression Tutorial](https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/contrasts)
[^contrastR]: [R Documentation on Contrasts](https://www.rdocumentation.org/packages/stats/versions/3.6.1/topics/contrasts)
If values are already numeric but represent categorical variables, it is necessary to ensure that they are of the `factor` class not `numeric`/`integer`/`double` etc. because otherwise the model will perform sub-optimally. This can be tested by using `class()`.[^class]
[^class]: [DataMentor](https://www.datamentor.io/r-programming/factor/) and [StackOverflow](https://stackoverflow.com/questions/35445112/what-is-the-difference-between-mode-and-class-in-r) and [StackExchange](https://stats.stackexchange.com/questions/3212/mode-class-and-type-of-r-objects)
If values are integers but need to be considered as discrete values, it is usually better to convert them into `factors` as well, otherwise models will simply assume they are continuous variables and perform suboptimally (it may be worth preserving two colums in a data frame for the factorised and original data, ocassionally plots will complain about discrete data data and it will be easier to feed them the numeric data).
first convert the data from factors to something else just so we can see how to do that, the easiest in this case is a `character` (be mindful in ***R*** there is no `string` class, instead there is only the `character` class, this is distinct from *C*-like languages such as java), again this can be tested by using `class()`.
There is also `typeof()` which returns the type of the object from the perspective of ***R***, `class()` refers to the perspective of an Object-Orientated programming language.
```{r}
def$default <- as.factor(def$default)
def$student <- as.factor(def$student)
glm(formula = default ~ balance, data = def, family = "binomial")
```
```{r}
contrasts(def$default)
contrasts(def$student)
```
### 2. Plot Income verses Balance (i.e. Plot the Predictors against Each other) {.tabset}
Generally it would be more appropriate to map the output variable to the colour, because colour is easier to see than shape and it is the output variable we are most concerned with.
#### GGplot
By using the structure from above we can decide to map shape and colour to the default and student variables respectively.
Ge
```{r}
predplot <- ggplot(data = def, aes(y = balance, x = income, col = default, shape = student)) +
geom_point(size = 2, alpha = 0.7) +
labs(y = TeX("Account Balance $(X_2)$"), x = TeX("Income ($X_3$)"), shape = TeX("Student ( X_1 )"), col = "Default (Y)",
title = "Account Balace against Income and Defaut") +
theme_classic()
predplot
#ggplotly(predplot) # This does not support latex!
```
It's hard to see whether or not there are any defaults in the centre of the plot, hence map the alpha-level to the default and let's invert this plot so that the split occurs along left/right, that way it will resemble our box plots (this is done simply for want of being able to visualise the data, both the axis are predivitive features so it doesn't matter):
```{r}
ggplot(data = def, aes(x = balance, y = income, col = default, shape = student)) +
geom_point(size = 5, aes(alpha = default)) +
labs(x = TeX("Account Balance $(X_2)$"), y = TeX("Income ($X_3$)"), shape = TeX("Student ( X_1 )"), col = "Default (Y)",
title = "Account Balace against Income and Defaut") +
theme_classic()
```
Now it can be clearly seen that defaults occur at a higher frequency when there is a higher account balance, and maybe slightly higher for students than non-students (there;s mayve very slightly less triangles on the right side than left.)
It seems to be totally independent of income.
##### BoxPlot
It is necessary to observe the behaviour of the output variable in response to all the predictors, however it is innapropriate to use a scatterplot because it comes out almost useless:
A boxplot is the correct way to do this [^129], using the output along the $x$-axis and the predictor along the $y$ axis, we can create two sets of this in ggplot by mapping the `x=student` and `fill = default`.
[^129]: Refer to page 129 of ISL [@ISL]
```{r}
ggplot(data = def, aes(x = student, y = balance, fill = default)) + #The predict
geom_boxplot() +
theme_classic() +
labs(y = TeX("Balance ($X_2$)"), x = TeX("Student ($X_1$)") , fill = TeX("Default ($Y$)"),
title = TeX("Default given Status as student and Balance")) +
scale_fill_brewer(palette = "Pastel1")
```
From here it can clearly be seen that there are some outliers for balance but it is a significant predictorv of defaulting, it can also be seen that generally students are more likely to default but this is a non-significant observation.
In my opinion the appropriate
The textbook chose to visualise the boxplots as below, I think this is undesirable because the interaction of income is already clear from the preceeding plot and because student has a more significant impact than income that should be considered.
```{r}
balplot <- ggplot(data = def, aes(x = default, y = balance, fill = default)) +
geom_boxplot() +
labs(x = TeX("Default $(Y)$"), y = TeX("Balance $(X_2)$"), title = "Default given Account Balance") +
theme_classic2() +
guides(fill = FALSE) +
scale_fill_brewer(palette = "Pastel1")
incplot <- ggplot(data = def, aes(x = default, y = income, fill = default)) +
geom_boxplot() +
labs(x = TeX("Default $(Y)$"), y = TeX("Balance $(X_33$"), title = "Default given Account Balance") +
theme_classic2() +
guides(fill = FALSE) +
#scale_fill_discrete(direction = 1, h.start = 99, )
scale_fill_brewer(palette = "Pastel1")
grid.arrange(balplot, incplot, nrow = 1)
```
#### Base Plot
```{r}
plot(income ~ balance, data = def,
main = "Credit Defaults",
xlab = "Income",
ylab ="Balance",
pch = c(1, 2)[as.numeric(student)],
col = c("IndianRed", "Royalblue")[as.numeric(default)],
cex = 0.7)
```
### 3. Construct the Multiple Linear Logistic Model for default in terms of the Predictors balance, student and income.
In order to create a logistic model use the `glm` function which is a *generalised linear model* function, because this is just an exponentiated linear function.
* Choose `family = binomial`
- categorical variables are not normally distributed because they are not continuous, a binomial distribution is the corresponding distribution for discrete data
* Choose `link = logit`
- Use `probit` for when the seperation of data is distinct, it performs slightly better.
- the term *logit* is another word for *log odds*:
- $\log{\left(\frac{\text{P}\left(X\right)}{1 - \text{P}\left(X\right)}\right)}$
```{r}
ModDef <- glm(formula = default ~ student + balance + income, family = binomial(link = "logit"), data = def)
contrasts(def$default)
```
### 4. Show how you would calculate the probability (the default = “yes”) for a given set of X (predictor) values.
In order to find the probability just use the predict command:
#### Create the data frame of new data
Use a data frame rather than a list, that way the format of your input, output and predicted data is all equivalent.
```{r}
newdata <- data.frame(
# "student" = as.factor(c("Yes", "Yes", "No", "Yes")), # Normal
#If I wanted to create random points.
"student" = ifelse(sample(c(0, 1), size = 4, replace = TRUE)==0, "No", "Yes"),
"balance" = rnorm(n = 4, 1500, sd = 500),
"income" = rnorm(n = 4, mean = 50000, sd = 15000)
)
```
Now call the `predict` command and use `cbind` to combine the `newdata` with the predicted data.
By default the predict command will output the value of the *logit* or the *log-odds*:[^133]
[^133]: Refer to page 133 Equation 4.4 of ISL
$$
\verb|output| = \log\left( \frac{ \text{Pr}\left(X\right) }{1 - \text{Pr}\left(X\right) } \right)
$$
Clearly the raw probability would be considerably more useful to us, hence specify the parameter `type="Response"`:
```{r}
predVals <- predict(object = ModDef, newdata, type = "response")
cbind("default_prob" = predVals, newdata)
```
all the corresponding values can be returned by not specifying the `newdata` parameter in the predict command, it would also be possible to return the `fitted.values` from the model object, the fitted values will represent the probability because it is the natural position along the $y$-axis, it is the fitted value
```{r}
ModelPreds <- cbind(def, "Probability" = ModDef$fitted.values)
#ModelPreds <- cbind(def, "Probability" = predict(ModDef, type = 'response')) #This would do the same thing
```
In order to move from the probability to the decision the threshold would have to be specified:
```{r}
threshold <- 0.5
ModelPreds$prediction <- rep_len(x ="No", length.out = nrow(ModelPreds))
ModelPreds$prediction[ModelPreds$Probability > threshold] <- "Yes"
ModelPreds$prediction <- as.factor(ModelPreds$prediction)
ModelPreds <- ModelPreds[,c(2,6,7,3,4,5)]
head(ModelPreds)
#Alternative approach
#threshold <- 0.5
#Predictions <- ifelse(ModelPreds$Probability < threshold, 0, 1)
#ModelPreds <- cbind(ModelPreds, "Prediction" = Predictions)
```
### 5. Construct the Misclassification Martrix
The misclassification matrix (or confusion matrix) is a matrix for the False and True Positives, it can be a good idea to wrap this in a function because later on it may be necessary to use a loop to return a ROC curve.
#### Construct functions to Compute FPR and TPR {.tabset}
##### base
This is the method that oliver taught us
```
This is the harder way to do it, this is what Oliver taught us
trupos <- ifelse(ModelPreds$default == ModelPreds$Prediction, 1, 0 ) #fpr
trupos <- sum(trupos)
trupos
FalsePos <- nrow(ModelPreds) - trupos
FalseNeg <- ifelse(ModelPreds$)
```
This is a more elegent strategy using `table(predictions, observations)`, the order should be specified as such by convention I have seen in the TB and [online](https://www.datacamp.com/community/tutorials/confusion-matrix-calculation-r), this way also matches the `confusionMatrix()` functoin inside the `caret` package
By convention the rows will represent the predicted values and the columns will represent the observed values. \footnote{Atleast that's what I'm lead to believe by this [DataCamp Tutorial](https://www.datacamp.com/community/tutorials/confusion-matrix-calculation-r) and this [DataSchool Tutorial](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)}. however one page of the textbook shows it in print the otherway but generates in code this way so that's a little confusing.
```{r}
conf.mat <- table(ModelPreds$prediction, ModelPreds$default, dnn = c("Predicted", "Observed"))
print(conf.mat)
# It can be a little difficult to determine which one is the prediction using table
#if you don't get the order of prediction then observation right, so
# you can also use `confusionMatrix` from the `caret` package to triple check.
conf.mat2 <- confusionMatrix(ModelPreds$prediction, ModelPreds$default)
print(conf.mat2)
```
### 6. Calculate the Prob (Misclassification) or Misclassification Rate
In order to determine the missclassification rate we would, in this case the diagonals are the correct predictions and the off-diagonals are misclassifications \footnote{refer to page 158 of the TB}
```{r}
misclass <- (40+228)/(9627+228+40+105)
print(paste("The misclassification rate is: ", percent(misclass)))
```
in practice we would just take the mean value of the missclassifications, because under the hood true and false correspond to 1/0 in R.
```{r}
misclass <- mean(ModelPreds$default != ModelPreds$prediction)
print(paste("The misclassification rate is: ", percent(misclass)))
```
### 7. Calculate the FPR and FNR
Incorrect will be off-diagonals.
The False positive rate will be the number of times that the prediction is positive but the observation is negative, so the second row, first column of the confusion matrix.
The False Negative rate will be the number of times that the prediction is negative but the observation is positive, so first row second column:
```{r}
conf.mat
FPR <- conf.mat[2,1] / nrow(ModelPreds)
FNR <- conf.mat[1,2] / nrow(ModelPreds)
```
The false positive rate is `r FPR` and the False Negative rate is `r FNR`
#### Choose the ideal threshold based on the ROC curve
##### Terminology
The true positive rate is the number of positives that are true divided by the number of positives observed
$$
\textbf{TruePosRate} \text{ is } \textit{Sensitivity} = \frac{\textsf{# of True Positives}}{\textsf{# of True Positives + # of False Negatives}} \\
\ \\
\textbf{FalsePosRate} \text{ is } \left( 1-\textit{Sensitivity} \right) = \frac{\textsf{# of False Positives}}{\textsf{# of False Positive + # of True True Negative}}
$$
$$
\textit{Sensitivity} \text{ is } \textbf{TruePosRate}\\
\textit{Specificity} \text{ is } \textbf{TrueNegRate}
$$
A ROC Curve is the TruePosRate (i.e. the Sensitivity) on the $y$-axis against the FalsePosRate (i.e. 1-Sensitivity) on the x-axis:
First Create a tpr function:
```{r}
tpr <- function(model, observation, threshold, silent){
probability <- model$fitted.values
prediction <- rep_len(x ="No", length.out = length(observation))
prediction[probability > threshold] <- "Yes"
prediction <- as.factor(prediction)
conf.mat <- table(prediction, observation, dnn = c("Predicted", "Observed"))
conf.mat2 <- confusionMatrix(prediction, observation)
if (!silent) {
print(conf.mat2)
}
tpr <- conf.mat2[["byClass"]][["Sensitivity"]]
# tpr <- 1- conf.mat[2,2]/(conf.mat[2,2] + conf.mat[1,1])
fnr <- 1-conf.mat2[["byClass"]][["Specificity"]]
return(c(tpr, fnr))
}
tpr(ModDef, def$default, threshold = 0.5, silent = FALSE)
```
Now run a loop to try every threshold
```{r, include=FALSE}
# di <- 0.01
#tprvec <- rep(x = c(FALSE, FALSE), length.out = 1/di) # This would need to be a data frame
# for (i in seq(from = 0.01, to = 0.985-di, by = di)) {
# tprvec[i/di] <- tpr(ModDef, def$default, threshold = i , silent = TRUE)
#
# }
#
# rocdata <- data.frame(Sensitivity = tprvec, FPR =
```
## Question 02: Logistic Regression: Use “Smarket” data set.
### 1. Start R studio. Upload “Smarket” data set built in R and View the data sets using the Console.
```{r}
head(Smarket)
```
### 2. Construct the Multiple Linear Logistic Model for Direction in terms of the Predictors Lag1, Lag2, Lag3, Lag4, Lag5 and Volume.
```{r}
#Check that it's a factor
class(Smarket$Direction)
#Make it a factor
Smarket$Direction <- as.factor(Smarket$Direction)
LogModSmarket <- glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Smarket, family = binomial(link = "logit"))
```
### 3. Show how you would calculate the probability (the Direction = “up”) for a given set of X (predictor) values.
Generate some newdata
```{r}
n <- 4
maxval <- max(Smarket$Today)
minval <- min(Smarket$Today)
newdata <- data.frame(
Lag1 = sample(seq(from = minval, to = maxval, length.out = 1000), size = n),
Lag2 = sample(seq(from = minval, to = maxval, length.out = 1000), size = n),
Lag3 = sample(seq(from = minval, to = maxval, length.out = 1000), size = n),
Lag4 = sample(seq(from = minval, to = maxval, length.out = 1000), size = n),
Lag5 = sample(seq(from = minval, to = maxval, length.out = 1000), size = n),
Volume = sample(seq(
from = min(Smarket$Volume), to = max(Smarket$Volume), length.out = 1000)
, size = n)
)
```
Now feed this data into the model:
```{r}
predicteddata <- predict(object = LogModSmarket, newdata, type = "response")
predicteddata <- data.frame(Probability = predicteddata, newdata )
```
In order to create the predictions use `contrasts` to check how the dummy variables have been assigned.
```{r}
contrasts(Smarket$Direction)
```
Hence Up is 1 so in order to create the prediction from the probability (assuming an 0.5 threshold):
```{r}
predicteddata$Predictions <- "Down"
predicteddata$Predictions[predicteddata$Probability > 0.5] <- "Up"
# Don't forget that it needs to be a factor,
# Because of some really strange behaviour in R this MUST be done last
predicteddata$Predictions <- as.factor( predicteddata$Predictions)
predicteddata <- predicteddata[,c(8, 1:7)]
# If Else could also have been used like this
#This is neat because it's using if/else across data without a loop
#Just rember that this returs values, it won't assign them inside like a
#loop otherwise would
predicteddata$Predictions <- ifelse(predicteddata$Probability > 0.5, "Up", "Down")
predicteddata
```
### 4. Construct the Misclassification Martrix
First call upon the model (or predict) to get the modelled values of the data:
* Calling from the model with `glm(Y~X)$fitted.values` will always return the probabilities, The fitted values are the y-values, naturally the prob.
* If you want to youse predict, don't specify new data and it will by default return the training data, however the values will be the log-odd values, make sure to specify `type=response`
```{r}
# Calling from the model will always return the probabilities,
# The fitted values are the y-values, naturally the prob.
preds <- LogModSmarket$fitted.values
# If you want to youse predict, don't specify new data and it
#will by default return the training data, however the values
# will be the log-odd values, make sure to specify
#type=response
probs <- predict(LogModSmarket, type = 'response')
preds <- ifelse(probs > 0.5, "Up", "Down")
preds <- as.factor(preds)
```
The misclassification matrix may be made with:
* The built in `table` command
- When using this you have to remember to do it in the order `prediction, observation`
* the `confusionmatrix()` command from the `caret` package
- the advantage to this is that it will automatically label things and return the tpr
```{r}
conf.mat <- table(preds, Smarket$Direction, dnn = c("Predictions", "Observations"))
conf.mat
# Using confusionMatrix (recall that the `contrasts` told us up was one,
# hence that is specified as positive)
conf.obj <- confusionMatrix(data = preds, reference = Smarket$Direction )
conf.obj
```
### 5. Calculate the Prob (Misclassification) or Misclassification Rate
The diagonals of the matrix represent the correct classifications and the off-diagonals represent the misclassifications, so the misclassification rate is:
```{r}
(conf.mat[1,2] + conf.mat[2,1])/sum(conf.mat)
# This may also have been returned from
1-conf.obj$overall["Accuracy"]
```
### 6. Calculate the False Positive rate
The false positive rate is:
\begin{align}
\textsf{FPR}&= \frac{FP}{Neg}\\
&= \frac{FP}{FP + TN}\\
&= 1 - TNR
\end{align}
If/when you forget that during the exam remember you can just use `confusionMatrix()` command inside the `caret` package.
```{r}
#I've made a mistake here, down is positive when I would have rathered it the other wya and is too late in the evening to fix that,
# It makes it a mess but whatever.
# From the confusion Matrix
conf.mat
FPR <- (conf.mat[1,2])/(sum(conf.mat[,2]))
# Using `caret::confusionMatrix()`
# Specificity is the TNR
#FPR is 1-TNR so use
FPR
1-conf.obj$byClass["Specificity"][1]
```
### 7. Calculate the False Negative Rate
The False Negative rate may be determined by swapping every
occurence of positive with negative and vice versa, hence;
The False Negative Rate is given by:
\begin{align}
\textsf{FNR}&= \frac{FN}{Pos}\\
&= \frac{FN}{FN + TP}\\
&= 1 - TPR
\end{align}
```{r}
#I've fucked up here, dowsn is positive and is too late in the evening to fix that,
# It makes it a mess but whatever.
# From the confusion Matrix
# It's false so we are concerned with the offset
# Just choose the other offset and the other column
FNR <- (conf.mat[2,1])/(sum(conf.mat[,1]))
# Using `caret::confusionMatrix()`
# Specificity is the TNR
#FNR is 1-TPR so use Sensitivity instead
FNR
1-conf.obj$byClass["Sensitivity"][1]
```