# Assignment 13 - Evaluating model performance

***

Load the `ISLR2` and the `tidymodels` packages.

In [10]:
library(ISLR2)
library(tidymodels)

In this assignment we will use the `Default` dataset which includes the default status for credit card customers (`default` variable) in addition to each customer's:

1. credit card balance (`balance` variable),
1. student status (`student` variable), and,
1. income (`income` variable).

In [11]:
Default |> head()

,default,student,balance,income
,<fct>,<fct>,<dbl>,<dbl>
1,No,No,729.5265,44361.625
2,No,Yes,817.1804,12106.135
3,No,No,1073.5492,31767.139
4,No,No,529.2506,35704.494
5,No,No,785.6559,38463.496
6,No,Yes,919.5885,7491.559


We will be modeling `default` with the customer features.

Before we begin let's count how many customers fall into each `default` category.

In [12]:
Default |> count(default)

default,n
<fct>,<int>
No,9667
Yes,333


The data is quite imbalanced: only 333 of the 10,000 customers defaulted. This will be important to keep in mind when we evaluate the performance of our model later.
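
The counts can be turned into class proportions with a little arithmetic. A minimal sketch in base R, with the counts copied from the output above (the names `default_counts` and `default_props` are only illustrative):

```r
# Counts copied from the `count(default)` output above
default_counts = c(No = 9667, Yes = 333)

# Share of customers in each class
default_props = default_counts / sum(default_counts)
round(default_props, 4)
```

Any accuracy figure reported later is easiest to interpret next to the share of the "No" class computed here.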

Run the code below to create **testing** and **training** data from `Default`. We will use the "test" dataset at the end to get a final evaluation of our best model's accuracy.

In [13]:
Default_split = initial_split(Default, prop = 0.90, strata = default)

Default_train = training(Default_split)
Default_test = testing(Default_split)
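
`initial_split` draws rows at random, so the split (and every number computed from it) changes between runs unless a seed is set. The sketch below shows one way to make the split reproducible and to check that `strata = default` kept the class balance similar in both pieces. The seed value `123` and the `_demo` names are arbitrary choices for illustration, not part of the assignment.

```r
library(ISLR2)
library(tidymodels)

set.seed(123)  # arbitrary seed, assumed here only for reproducibility

split_demo = initial_split(Default, prop = 0.90, strata = default)
train_demo = training(split_demo)
test_demo = testing(split_demo)

# Share of each class in the two pieces; stratification should keep them close
train_demo |> count(default) |> mutate(prop = n / sum(n))
test_demo |> count(default) |> mutate(prop = n / sum(n))
```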

Create a logistic regression model called `mod`. Set the engine to `glm` and the mode to `classification`. 

In [14]:

mod = logistic_reg() |>
    set_engine("glm") |>
    set_mode("classification")



Our data is imbalanced. As such, a naive model that *always* predicts a customer to **not default** would be correct quite often. Let's start by calculating the "accuracy" of a naive model. This will be the baseline accuracy by which we evaluate other models.

In [15]:
# This code calculates the accuracy of a model that always predicts default to be "No"

Default_train |>
    mutate(.pred_naive = factor('No', levels = c('No', 'Yes'))) |>
    accuracy(truth = default, .pred_naive)

.metric,.estimator,.estimate
<chr>,<chr>,<dbl>
accuracy,binary,0.9668889
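
The reason the baseline is so high can be seen on a small made-up example: a model that always answers "No" is right exactly as often as "No" occurs, yet it never identifies a single default. The vectors below are hypothetical toy data, not drawn from `Default`.

```r
# Hypothetical toy outcome: 97 non-defaults and 3 defaults
truth_toy = factor(c(rep("No", 97), rep("Yes", 3)), levels = c("No", "Yes"))

# Naive prediction: always "No"
pred_toy = factor(rep("No", length(truth_toy)), levels = c("No", "Yes"))

# Accuracy: share of predictions that match the truth
accuracy_toy = mean(pred_toy == truth_toy)

# Sensitivity: share of actual defaults that were predicted as defaults
sensitivity_toy = mean(pred_toy[truth_toy == "Yes"] == "Yes")

c(accuracy = accuracy_toy, sensitivity = sensitivity_toy)
```

The accuracy matches the share of "No" in the toy data, while the sensitivity is zero because no default is ever predicted.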


***
Let's use k-fold cross-validation to evaluate the performance of a model where the outcome is `default` and the predictors are `income` and `balance`.

To start, use `vfold_cv` to generate 10 validation folds (i.e. set the `v` argument to 10). Set the `strata` argument to `default` so we preserve the distribution of `default` values in each fold.

Create your folds below and use `glimpse` to look at the output table. Call the resulting table of folds `folds`.

In [16]:

folds = vfold_cv(Default_train, v = 10, strata = default)

folds |> glimpse()


Rows: 10
Columns: 2
$ splits <list> [<vfold_split[8100 x 900 x 9000 x 4]>], [<vfold_split[8100 x 9…
$ id     <chr> "Fold01", "Fold02", "Fold03", "Fold04", "Fold05", "Fold06", "Fo…


The code below fits the model once per fold, each time training on the other nine folds and evaluating on the held-out fold. `collect_metrics` then averages each evaluation metric across the ten folds.

In [20]:
mod |> 
    fit_resamples(default ~ income + balance, folds) |>
    collect_metrics()

.metric,.estimator,mean,n,std_err,.config
<chr>,<chr>,<dbl>,<int>,<dbl>,<chr>
accuracy,binary,0.97388889,10,0.002050461,Preprocessor1_Model1
brier_class,binary,0.02150123,10,0.001500481,Preprocessor1_Model1
roc_auc,binary,0.94820937,10,0.004978908,Preprocessor1_Model1
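
The `mean` and `std_err` columns summarise the ten per-fold estimates: `mean` is their average and `std_err` is their standard deviation divided by the square root of the number of folds. A small sketch of that arithmetic in base R, using made-up per-fold accuracies (the values in `fold_acc` are hypothetical, not the ones from this run):

```r
# Hypothetical per-fold accuracies, one value per held-out fold
fold_acc = c(0.970, 0.978, 0.972, 0.981, 0.969, 0.975, 0.977, 0.968, 0.974, 0.976)

# Average across folds
cv_mean = mean(fold_acc)

# Standard error of that average
cv_std_err = sd(fold_acc) / sqrt(length(fold_acc))

c(mean = cv_mean, std_err = cv_std_err)
```

To see the actual per-fold values for a fitted set of resamples, `collect_metrics(summarize = FALSE)` returns one row per fold and metric instead of the averages.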


❓How does the model accuracy compare to the naive model from above?

The model that uses `income` and `balance` as predictors has a cross-validated accuracy of about 0.974, compared with 0.967 for the naive model on the training data. That is a gain of roughly 0.7 percentage points, a little over three times the reported standard error of 0.002. Because the data is imbalanced, the naive baseline is already very high, so this small gain in accuracy says little on its own about how well the model identifies defaults. Metrics such as sensitivity and specificity, or the ROC AUC of 0.948 reported above, give a fuller picture.


Complete the cell below to evaluate a model that also includes the `student` variable as a predictor.
1. use `default ~ income + balance + student` as the formula,
2. encode your `student` variable with `step_dummy`, and,
3. don't forget to `prep` your recipe!

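**Note: adding a `prep()` step to the recipe caused an error however it was formatted, so it is left out below. This appears to be expected: a recipe passed to `add_recipe()` should be unprepped, because the workflow estimates the recipe itself within each resample.**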

In [25]:
rec = recipe(default ~ income + balance + student, data = Default_train) |>
    step_dummy(student) 

workflow() |>
    add_recipe(rec) |>
    add_model(mod) |>
    fit_resamples(folds) |>
    collect_metrics()

.metric,.estimator,mean,n,std_err,.config
<chr>,<chr>,<dbl>,<int>,<dbl>,<chr>
accuracy,binary,0.97333333,10,0.001945767,Preprocessor1_Model1
brier_class,binary,0.02143593,10,0.001504336,Preprocessor1_Model1
roc_auc,binary,0.94883925,10,0.004771394,Preprocessor1_Model1
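
Outside of a workflow, `prep()` and `bake()` can still be used to check what a recipe produces. The sketch below estimates the recipe on the full `Default` data purely for inspection (the `_demo` names are illustrative) and shows the dummy column that `step_dummy` creates from `student`.

```r
library(ISLR2)
library(tidymodels)

rec_demo = recipe(default ~ income + balance + student, data = Default) |>
    step_dummy(student)

# prep() estimates the recipe; bake(new_data = NULL) returns the processed training data
baked_demo = rec_demo |>
    prep() |>
    bake(new_data = NULL)

baked_demo |> head()
```

The original `student` factor is replaced by a numeric indicator column, which is what the `glm` engine receives as a predictor.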


Does it appear that the model that includes `student` improves upon the first model with only `income` and `balance` as predictors?


1. **Model 1 (`default ~ income + balance`)**:
    - Accuracy: 0.9739 (± 0.0021)
    - Brier Score: 0.0215 (± 0.0015)
    - ROC AUC: 0.9482 (± 0.0050)

2. **Model 2 (`default ~ income + balance + student`)**:
    - Accuracy: 0.9733 (± 0.0019)
    - Brier Score: 0.0214 (± 0.0015)
    - ROC AUC: 0.9488 (± 0.0048)

### Thoughts
- The accuracy of the two models is nearly identical (0.9739 vs. 0.9733). The gap of about 0.0006 in favor of Model 1 is well within one standard error.
- The Brier score is marginally lower (better) for Model 2, but the difference is under 0.0001 and far smaller than its standard error.
- The ROC AUC is slightly higher for Model 2 (0.9488 vs. 0.9482), again a difference much smaller than the standard error of about 0.005.


Including the `student` variable **does not meaningfully improve** the model's performance. Every difference is smaller than the corresponding standard error, so these results do not justify the added complexity of including `student`.
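
One rough way to judge whether these gaps matter is to compare each difference with the standard errors reported by `collect_metrics`. The sketch below does that arithmetic with the values copied from the two output tables. Treating a gap smaller than one standard error as noise is a rule of thumb assumed here, not a formal test.

```r
# Means and standard errors copied from the two collect_metrics() outputs
metrics = data.frame(
    metric  = c("accuracy", "brier_class", "roc_auc"),
    model_1 = c(0.97388889, 0.02150123, 0.94820937),
    se_1    = c(0.002050461, 0.001500481, 0.004978908),
    model_2 = c(0.97333333, 0.02143593, 0.94883925),
    se_2    = c(0.001945767, 0.001504336, 0.004771394)
)

# Difference between the models (Model 2 minus Model 1)
metrics$difference = metrics$model_2 - metrics$model_1

# Is the gap smaller than the smaller of the two standard errors?
metrics$within_one_se = abs(metrics$difference) < pmin(metrics$se_1, metrics$se_2)

metrics[, c("metric", "difference", "within_one_se")]
```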


Finally, estimate the accuracy of a `default ~ income + balance` model on the test data, `Default_test`.

Does our model outperform a naive model?

In [26]:
mod |>
    fit(default ~ income + balance, data = Default_train) |>
    predict(new_data = Default_test) |>
    bind_cols(Default_test) |>
    accuracy(truth = default, estimate = .pred_class)



.metric,.estimator,.estimate
<chr>,<chr>,<dbl>
accuracy,binary,0.973


In [27]:
# Calculate the accuracy of a naive model on the test data
Default_test |>
    mutate(.pred_naive = factor('No', levels = c('No', 'Yes'))) |>
    accuracy(truth = default, .pred_naive)



.metric,.estimator,.estimate
<chr>,<chr>,<dbl>
accuracy,binary,0.965


The accuracy of the `default ~ income + balance` model on the test data is **0.973**, which is higher than the accuracy of the naive model (**0.965**). With 1,000 customers in the test set, that corresponds to 8 more correct predictions. This indicates that the model ***does provide better predictions*** by using the features `income` and `balance`, although the margin is small.

However, given the imbalanced nature of the dataset, relying solely on accuracy may not be sufficient. Additional metrics such as sensitivity, specificity, or ROC AUC should also be considered to comprehensively evaluate the model's performance.
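
As a sketch of what that fuller evaluation could look like, the code below refits the same kind of model on a fresh split and reports accuracy, sensitivity, and specificity together with a confusion matrix. The seed `123` and the `_demo` names are assumptions made for illustration, so the numbers will not match the outputs above. `event_level = "second"` tells yardstick to treat "Yes" (the second factor level) as the event of interest.

```r
library(ISLR2)
library(tidymodels)

set.seed(123)  # arbitrary seed, assumed here only for reproducibility

split_demo = initial_split(Default, prop = 0.90, strata = default)
train_demo = training(split_demo)
test_demo = testing(split_demo)

fit_demo = logistic_reg() |>
    set_engine("glm") |>
    set_mode("classification") |>
    fit(default ~ income + balance, data = train_demo)

# Class predictions on the test data, alongside the true outcome
preds_demo = predict(fit_demo, new_data = test_demo) |>
    bind_cols(test_demo)

# Several class metrics at once, with "Yes" as the event of interest
class_metrics = metric_set(accuracy, sensitivity, specificity)

metrics_demo = preds_demo |>
    class_metrics(truth = default, estimate = .pred_class, event_level = "second")

metrics_demo

# Counts of each truth / prediction combination
preds_demo |> conf_mat(truth = default, estimate = .pred_class)
```

Sensitivity here is the share of actual defaults that the model flags, which is the quantity a naive "always No" model cannot improve on.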
