# Modeling
In this notebook, we'll model the data we've previously prepared. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find something worth recommending to those wanting to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e. post-hoc explanations of the neural network, such as an interpretable surrogate model that approximates it locally) or by analyzing a simpler model's structure (e.g. regression coefficients, random forest decision boundaries).
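
As a minimal sketch of the "simpler model" route, the snippet below fits a tiny logistic regression by gradient descent and reads off its coefficients. Everything in it is hypothetical: the data is synthetic, `fit_logistic` is an illustrative helper rather than anything from `utils.model`, and the real dataset has three classes rather than two.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, steps=300):
    """Fit a binary logistic regression with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the dataset before stepping.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

random.seed(0)
# Synthetic, hypothetical data: feature 0 drives the label, feature 1 is pure noise.
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [1 if x[0] > 0 else 0 for x in X]

weights, bias = fit_logistic(X, y)
for name, w in zip(["informative", "noise"], weights):
    print(f"{name}: {w:+.3f}")
```

The informative feature should end up with a much larger coefficient than the noise feature. Coefficients are only comparable like this when the features share a similar scale, so standardizing the inputs first is usually a prerequisite.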

In [1]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [7]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/processed.parquet")
}

# manual search
models["ffnn"].set_hyperparams({
    "learning_rate": .001,
    "batch_size": 64,
    "num_hidden": 16,
    "hidden_size": 64,
    "num_epochs": 50
})

# train & test basic model
for model_type, model in models.items():
    # loading a saved model is disabled for now, so every run retrains from scratch
    # if not model.load_model():
    model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512887 obs, 170945 no diabetes [0], 171126 pre-diabetes [1], 170816 diabetes [2]
Test: 128222 obs, 42758 no diabetes [0], 42577 pre-diabetes [1], 42887 diabetes [2]
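
The split report above, combined with the manual hyperparameters, fixes how much work each epoch does. The arithmetic below is a rough sketch: the observation counts, `batch_size`, and `num_epochs` come from this notebook, `input_size=21` and `output_size=3` are taken from the grid-search log further down, and the parameter count assumes that `num_hidden` is the number of hidden layers, `hidden_size` is their width, and every layer is fully connected with a bias. `mlp_param_count` is a hypothetical helper, not part of `utils.model`.

```python
import math

# Values copied from the split report and the manual hyperparameters.
train_obs = 512_887
test_obs = 128_222
batch_size = 64
num_epochs = 50

steps_per_epoch = math.ceil(train_obs / batch_size)  # the last batch is partial
total_steps = steps_per_epoch * num_epochs
test_fraction = test_obs / (train_obs + test_obs)

def mlp_param_count(input_size, hidden_size, num_hidden, output_size):
    """Count weights and biases of a plain fully connected network (ASSUMED layout)."""
    sizes = [input_size] + [hidden_size] * num_hidden + [output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

params = mlp_param_count(input_size=21, hidden_size=64, num_hidden=16, output_size=3)

print(f"steps per epoch: {steps_per_epoch}")     # 8014
print(f"total optimizer steps: {total_steps}")   # 400700
print(f"test fraction: {test_fraction:.3f}")     # 0.200
print(f"parameters (assumed layout): {params}")  # 64003
```

The 8014 steps per epoch match the iteration count in the training progress bar.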


100%|██████████| 8014/8014 [00:16<00:00, 473.49it/s]
Epoch 1/50, Loss: 0.9859840509627923
Epoch 2/50, Loss: 0.978113356993506
Epoch 3/50, Loss: 0.9721252312528126
Epoch 4/50, Loss: 0.9591944650696911
Epoch 5/50, Loss: 0.9434594016718335
Epoch 6/50, Loss: 0.9299253725800277
Epoch 7/50, Loss: 0.9189981278603827
Epoch 8/50, Loss: 0.909109007644332
Epoch 9/50, Loss: 0.9011274860971492
Epoch 10/50, Loss: 0.8982348591052061
Epoch 11/50, Loss: 0.8903890494801083
Epoch 12/50, Loss: 0.8875726325303086
Epoch 13/50, Loss: 0.8865164354729896
Epoch 14/50, Loss: 0.8802428725564869
Epoch 15/50, Loss: 0.8779654382380102
Epoch 16/50, Loss: 0.8752753020210519
Epoch 17/50, Loss: 0.8702641290814496
Epoch 18/50, Loss: 0.8722536684956846
Epoch 19/50, Loss: 0.8744876040152727
Epoch 20/50, Loss: 0.8695496388257576
Epoch 21/50, Loss: 0.8643697466685464
Epoch 22/50, Loss: 0.8613728264159385
Epoch 23/50, Loss: 0.8601708793345609
Epoch 24/50, Loss: 0.8565260812636472
Epoch 25/50, Loss: 0.8598311676968949
Epoch 26/50, Loss: 0.8549457167356366
Epoch 27/50, Loss: 0.854540077935498
Epoch 28/50, Loss: 0.8483058482609338
Epoch 29/50, Loss: 0.8563944715007095
Epoch 30/50, Loss: 0.8533650489186062
Epoch 31/50, Loss: 0.8631086107468944
Epoch 32/50, Loss: 0.8476130192955742
Epoch 33/50, Loss: 0.8496551217805545
Epoch 34/50, Loss: 0.8420476784633525
Epoch 35/50, Loss: 0.8393999480133137
Epoch 36/50, Loss: 0.8448388025904642
Epoch 37/50, Loss: 0.8491686717333983
Epoch 38/50, Loss: 0.8356242845978914
Epoch 39/50, Loss: 0.8432380623849575
Epoch 40/50, Loss: 0.8442884803948808
Epoch 41/50, Loss: 0.8389939781002252
Epoch 42/50, Loss: 0.8395128786296121
Epoch 43/50, Loss: 0.8710140213723607
Epoch 44/50, Loss: 0.8568723185537818
Epoch 45/50, Loss: 0.8377546460470238
Epoch 46/50, Loss: 0.8499360855809167
Epoch 47/50, Loss: 0.8476901885976451
Epoch 48/50, Loss: 0.8329552982568681
Epoch 49/50, Loss: 0.8318900789053089
Epoch 50/50, Loss: 0.8256360343679514

<Test Report>
Precision: [no diabetes] 0.6871986127277478, [pre-diabetes] 0.7103934968064675, [diabetes] 0.6428519602389533
Recall: [no diabetes] 0.6765751438327331, [pre-diabetes] 0.7471169880451887, [diabetes] 0.6197682281343997
F1-Score: [no diabetes] 0.6818455011490189, [pre-diabetes] 0.7282925991643294, [diabetes] 0.6310990811311347
Support: [no diabetes] 42758, [pre-diabetes] 42577, [diabetes] 42887
Accuracy: 68.0999%
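
The numbers in this report are tied together by standard definitions, which makes a quick consistency check possible. The sketch below recomputes each F1-score as the harmonic mean of precision and recall, and recovers the overall accuracy from per-class recall and support. The macro-averaged F1 is an extra summary that the report itself does not print.

```python
# Per-class values copied from the test report, in the order
# [no diabetes, pre-diabetes, diabetes].
precision = [0.6871986127277478, 0.7103934968064675, 0.6428519602389533]
recall = [0.6765751438327331, 0.7471169880451887, 0.6197682281343997]
support = [42758, 42577, 42887]

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

f1_scores = [f1(p, r) for p, r in zip(precision, recall)]
macro_f1 = sum(f1_scores) / len(f1_scores)

# recall * support is the number of correctly classified rows in each class,
# so the sum over classes divided by the test size is the accuracy.
correct = sum(r * s for r, s in zip(recall, support))
accuracy = correct / sum(support)

print("F1 per class:", [round(s, 4) for s in f1_scores])
print("macro F1:", round(macro_f1, 4))
print("accuracy (%):", round(100 * accuracy, 4))
```

Because the three classes have almost equal support, accuracy and macro-averaged recall are nearly the same number here.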


***
## Hyperparameter Optimization

In [None]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

Fitting 2 folds for each of 360 candidates, totalling 720 fits


[CV 1/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=10, num_hidden=1, output_size=3;, score=nan total time=   0.2s
[CV 1/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=10, num_hidden=3, output_size=3;, score=nan total time=   0.2s
[CV 2/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=10, num_hidden=1, output_size=3;, score=nan total time=   0.2s
[CV 2/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=10, num_hidden=2, output_size=3;, score=nan total time=   0.3s
[CV 2/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=25, num_hidden=1, output_size=3;, score=nan total time=   0.3s
[CV 1/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=10, num_hidden=2, output_size=3;, score=nan total time=   0.3s
[CV 1/2] END batch_size=32, hidden_size=32, input_size=21, learning_rate=1, num_epochs=25, num_hidde

KeyboardInterrupt: 
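
The "720 fits" figure in the log is the number of hyperparameter combinations multiplied by the number of folds. The sketch below reproduces that arithmetic on a deliberately small, hypothetical grid: it lists only the values visible in the truncated log above, not the real 360-candidate search space, which is defined inside `optimize_hyperparams` and not shown here.

```python
from itertools import product

# HYPOTHETICAL grid: only the values visible in the truncated log are listed;
# the real search space (360 candidates) is not shown in this notebook.
param_grid = {
    "batch_size": [32],
    "hidden_size": [32],
    "learning_rate": [1],
    "num_epochs": [10, 25],
    "num_hidden": [1, 2, 3],
}
kfold = 2

# Every combination of one value per hyperparameter is one candidate.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * kfold

print(f"{len(candidates)} candidates x {kfold} folds = {total_fits} fits")

# The same arithmetic with the numbers reported in the log.
logged_candidates, logged_folds = 360, 2
print(f"{logged_candidates} candidates x {logged_folds} folds = {logged_candidates * logged_folds} fits")
```

Every fit shown above reports `score=nan` after roughly 0.2 s, which suggests the fits are failing rather than training. If `optimize_hyperparams` wraps scikit-learn's `GridSearchCV` (an assumption based on the log format), passing `error_score="raise"` to it would surface the underlying exception instead of recording `nan`.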

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion