# Modeling
In this notebook, we'll model the data we prepared previously. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find something worth recommending to those who want to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e., interpretable approximations of the neural network's behavior) or by analyzing a simpler model's structure (e.g., regression coefficients or random forest decision boundaries).
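
To make the "interpretable model that approximates the network" idea concrete, here is a minimal sketch of a global surrogate: a one-split decision stump fit to a black box's predictions rather than to the true labels. This is neither LIME nor SHAP themselves; `black_box`, `fit_stump`, and the toy integer features are all hypothetical stand-ins and are not part of `utils.model`.

```python
from itertools import product


def black_box(row):
    """Hypothetical stand-in for the trained network: returns a class label."""
    x0, x1 = row
    return 1 if 4 * x0 + x1 > 10 else 0


def fit_stump(rows, labels):
    """Exhaustively search for the single split that best reproduces `labels`."""
    best = None  # (agreements, feature, threshold, left_label, right_label)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            agreements = left.count(left_label) + right.count(right_label)
            if best is None or agreements > best[0]:
                best = (agreements, feature, threshold, left_label, right_label)
    return best


# Toy inputs: every combination of two integer features in 0..4.
rows = list(product(range(5), repeat=2))
# The surrogate is fit to the black box's predictions, not to ground-truth labels.
predictions = [black_box(row) for row in rows]

agreements, feature, threshold, left_label, right_label = fit_stump(rows, predictions)
fidelity = agreements / len(rows)  # share of black-box predictions the stump reproduces
print(f"if x{feature} <= {threshold}: predict {left_label}, else predict {right_label}")
print(f"fidelity to black box: {fidelity:.2f}")
```

On this toy grid the stump splits on `x0` at 2 and reproduces 23 of the 25 black-box predictions (fidelity 0.92). Fidelity is worth reporting next to any surrogate-based explanation, since a low value means the simple rule says little about the network.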

In [2]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [4]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/processed.parquet")
}

# manual search
models["ffnn"].set_hyperparams({
    "learning_rate": .0001,
    "batch_size": 128,
    "num_hidden": 1,
    "hidden_size": 4096,
    "num_epochs": 50,
    "classify_fn": "softmax"
})

# train & test basic model
for model_type, model in models.items():
    # loading a saved model is disabled for now, so every run trains from scratch
    # if not model.load_model():
    model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512887 obs, 170945 no diabetes [0], 171126 pre-diabetes [1], 170816 diabetes [2]
Test: 128222 obs, 42758 no diabetes [0], 42577 pre-diabetes [1], 42887 diabetes [2]
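
As a quick sanity check on the split report, the counts can be recomputed by hand. The sketch below copies the numbers from the report and takes `batch_size` from the hyperparameters set earlier. Rounding the batch count up (keeping the final partial batch) is an assumption about how the data loader behaves.

```python
import math

# Counts copied from the train-test split report.
train_counts = {"no diabetes": 170945, "pre-diabetes": 171126, "diabetes": 170816}
test_counts = {"no diabetes": 42758, "pre-diabetes": 42577, "diabetes": 42887}
batch_size = 128  # value passed to set_hyperparams

n_train = sum(train_counts.values())
n_test = sum(test_counts.values())

# Share of all observations held out for testing.
test_fraction = n_test / (n_train + n_test)

# Assumption: the last, smaller batch is kept, so we round up.
batches_per_epoch = math.ceil(n_train / batch_size)

# Accuracy of always predicting the most common test class (chance-level baseline).
majority_baseline = max(test_counts.values()) / n_test

print(f"train={n_train}, test={n_test}, test fraction={test_fraction:.4f}")
print(f"batches per epoch={batches_per_epoch}")
print(f"majority-class baseline accuracy={majority_baseline:.4f}")
```

The class counts sum to the reported totals, the split is 80/20, and the classes are close to balanced, so a trivial classifier would score roughly one in three on the test set.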


Epoch 1/50, Loss: 1.0077912219450722
Epoch 2/50, Loss: 0.9904404963553324
Epoch 3/50, Loss: 0.9880382145227623
Epoch 4/50, Loss: 0.9856697548676346
Epoch 5/50, Loss: 0.9834917887416836
Epoch 6/50, Loss: 0.9811379502085578
Epoch 7/50, Loss: 0.9787372709332126
Epoch 8/50, Loss: 0.9771586130734599
Epoch 9/50, Loss: 0.9742432476012285
Epoch 10/50, Loss: 0.9712461740827215
Epoch 11/50, Loss: 0.9681853098943104
Epoch 12/50, Loss: 0.9651089286399596
Epoch 13/50, Loss: 0.961825847700108
Epoch 14/50, Loss: 0.9593817067765107
Epoch 15/50, Loss: 0.9560523947516314
Epoch 16/50, Loss: 0.9526288818176589
Epoch 17/50, Loss: 0.9499140541571942
Epoch 18/50, Loss: 0.94530585865621
Epoch 19/50, Loss: 0.9416342412916716
Epoch 20/50, Loss: 0.9384627064654201
Epoch 21/50, Loss: 0.9349465804085757
Epoch 22/50, Loss: 0.9347166816938955
Epoch 23/50, Loss: 0.9300468846809724
Epoch 24/50, Loss: 0.928766809811103
Epoch 25/50, Loss: 0.9245872279042938
Epoch 26/50, Loss: 0.9235143063518809
Epoch 27/50, Loss: 0.9198633660424044
Epoch 28/50, Loss: 0.9172807484456122
Epoch 29/50, Loss: 0.9150991856174217
Epoch 30/50, Loss: 0.9157322985232843
Epoch 31/50, Loss: 0.9117563534830168
Epoch 32/50, Loss: 0.9092618026785759
Epoch 33/50, Loss: 0.9097701608762796
Epoch 34/50, Loss: 0.9095233709609267
Epoch 35/50, Loss: 0.9069421980218576
Epoch 36/50, Loss: 0.9075227806863864
Epoch 37/50, Loss: 0.9013937735813409
Epoch 38/50, Loss: 0.9037474630359167
Epoch 39/50, Loss: 0.9011099546634559
Epoch 40/50, Loss: 0.8979623303140879
Epoch 41/50, Loss: 0.8984308829919222
Epoch 42/50, Loss: 0.897641491125372
Epoch 43/50, Loss: 0.894126756800539
Epoch 44/50, Loss: 0.9151238932078813
Epoch 45/50, Loss: 0.8954230703140156
Epoch 46/50, Loss: 0.8912760605977484
Epoch 47/50, Loss: 0.8937571212660502
Epoch 48/50, Loss: 0.8935286043765451
Epoch 49/50, Loss: 0.8889047434748275
Epoch 50/50, Loss: 0.8990425235736629
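
To put the loss values in context, it helps to know what an uninformed three-class model would score. The sketch below assumes the reported loss is the mean softmax cross-entropy in nats. The notebook does not confirm this, although `"classify_fn": "softmax"` is consistent with it. The `softmax` and `cross_entropy` helpers are illustrative and are not taken from `utils.model`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


num_classes = 3
# Equal logits give a uniform prediction, so the loss equals ln(3).
chance_loss = cross_entropy([0.0] * num_classes, target=0)

# Values copied from the training log.
first_epoch_loss = 1.0077912219450722
last_epoch_loss = 0.8990425235736629

print(f"uniform-guess loss: {chance_loss:.4f}")
print(f"epoch 1 loss:       {first_epoch_loss:.4f}")
print(f"epoch 50 loss:      {last_epoch_loss:.4f}")
```

Under that assumption, a uniform guess scores about 1.0986, so the network starts only slightly better than chance and has improved modestly by epoch 50. The loss also fluctuates in later epochs (for example, the jump at epoch 44).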

<Test Report>
Precision: [no diabetes] 0.6628577131309099, [pre-diabetes] 0.5269758437841933, [diabetes] 0.5684811778319423
Recall: [no diabetes] 0.6213574068010664, [pre-diabetes] 0.5882048993588087, [diabetes] 0.5383915871942546
F1-Score: [no diabetes] 0.6414370043095643, [pre-diabetes] 0.5559094793620493, [diabetes] 0.5530273998850355
Support: [no diabetes] 42758, [pre-diabetes] 42577, [diabetes] 42887
Accuracy: 58.2599%
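
The figures in the test report are mutually consistent, which can be checked with a few lines of arithmetic. This sketch recomputes F1 as the harmonic mean of precision and recall, and recovers accuracy from per-class recall and support. The values are copied from the report, and the macro-averaged F1 is an extra summary that the report itself does not print.

```python
# Values copied from the test report, ordered: no diabetes, pre-diabetes, diabetes.
precision = [0.6628577131309099, 0.5269758437841933, 0.5684811778319423]
recall = [0.6213574068010664, 0.5882048993588087, 0.5383915871942546]
support = [42758, 42577, 42887]

# F1 is the harmonic mean of precision and recall.
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# recall = true positives / support, so the correct predictions per class are:
true_positives = [round(r * s) for r, s in zip(recall, support)]

# Accuracy is the total number of correct predictions over all test observations.
accuracy = sum(true_positives) / sum(support)

# Unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1) / len(f1)

print("F1 per class:", [round(score, 4) for score in f1])
print(f"accuracy: {accuracy:.4%}")
print(f"macro F1: {macro_f1:.4f}")
```

The recomputed accuracy matches the reported 58.2599%, and the per-class F1 scores agree with the report.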


***
## Hyperparameter Optimization

In [None]:
# optimize hyperparameters for every model in the lookup (2-fold cross-validation requested)
optimizer_results = {
    model_type: model.optimize_hyperparams(kfold=2)
    for model_type, model in models.items()
}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
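
For reference, a grid search with k-fold cross-validation can be sketched in plain Python as below. This is not the implementation behind `optimize_hyperparams`, whose internals are not shown here. The names `kfold_indices`, `expand_grid`, `grid_search`, and `toy_evaluate` are hypothetical, as are the search-space values, and the toy scoring function stands in for a real train-and-validate step.

```python
from itertools import product


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def expand_grid(search_space):
    """Yield one dict per combination in the Cartesian product of the options."""
    keys = list(search_space)
    for values in product(*(search_space[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_search(search_space, n_samples, k, evaluate):
    """Return (best mean validation score, best params) over all combinations."""
    results = []
    for params in expand_grid(search_space):
        fold_scores = [
            evaluate(params, train_idx, val_idx)
            for train_idx, val_idx in kfold_indices(n_samples, k)
        ]
        results.append((sum(fold_scores) / len(fold_scores), params))
    return max(results, key=lambda item: item[0])


def toy_evaluate(params, train_idx, val_idx):
    """Made-up score for illustration only; a real version would train and validate."""
    return -abs(params["learning_rate"] - 1e-3) - abs(params["hidden_size"] - 1024) / 1e6


# Illustrative search space, not the one used by the notebook.
search_space = {"learning_rate": [1e-4, 1e-3], "hidden_size": [1024, 4096]}
best_score, best_params = grid_search(search_space, n_samples=10, k=2, evaluate=toy_evaluate)
print(best_params)
```

Keeping the fold generator separate from the grid expansion makes it easy to test each piece on its own, for example to confirm that every sample lands in exactly one validation fold.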

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion