# Modeling
In this notebook, we'll be modeling the data we've previously prepared. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find factors worth recommending to people who want to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e. interpretable local explanations that approximate the neural network's behavior around individual predictions) or by analyzing a simpler model's structure directly (e.g. regression coefficients, random forest decision rules)
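As an illustration of the second route (reading a simpler model's structure), here is a minimal pure-Python sketch that fits a binary logistic regression by gradient descent and reports each coefficient as an odds ratio. It assumes a two-class simplification of the three-class target and uses hypothetical synthetic features (`risk_factor`, `noise`), not the processed diabetes data. On the real dataset, a library implementation such as scikit-learn's `LogisticRegression` (fitted coefficients in `coef_`) would be the practical choice.

```python
import math
import random


def fit_logistic_regression(X, y, lr=0.5, epochs=300):
    """Fit binary logistic regression by full-batch gradient descent on log-loss.

    Returns (weights, bias). Assumes features are already on comparable scales.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # d(log-loss)/dz for one sample
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical synthetic data: "risk_factor" truly raises risk, "noise" does not.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    risk_factor, noise = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = 1.0 / (1.0 + math.exp(-2.0 * risk_factor))
    X.append([risk_factor, noise])
    y.append(1 if rng.random() < p_true else 0)

weights, bias = fit_logistic_regression(X, y)
for name, coef in zip(["risk_factor", "noise"], weights):
    # exp(coef) is the multiplicative change in odds per one-unit feature increase
    print(f"{name}: coef={coef:+.2f}, odds ratio={math.exp(coef):.2f}")
```

A positive coefficient raises the log-odds of the positive class, and a coefficient near zero suggests little association; comparing magnitudes across features is only meaningful when the features share a common scale.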

In [1]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [6]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/processed.parquet")
}

# manually chosen hyperparameters (baseline before automated search)
models["ffnn"].set_hyperparams({
    "learning_rate": .0005,
    "batch_size": 128,
    "num_hidden": 3,
    "hidden_size": 1024,
    "num_epochs": 50
})

# train & test basic model
for mt, model in models.items():
    # always train and test from scratch; to reuse a saved model instead,
    # guard the two calls below with `if not model.load_model():`
    model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512887 obs, 170945 no diabetes [0], 171126 pre-diabetes [1], 170816 diabetes [2]
Test: 128222 obs, 42758 no diabetes [0], 42577 pre-diabetes [1], 42887 diabetes [2]


Per epoch: 4007/4007 batches in ~21 s (189-190 it/s)
Epoch 1/50, Loss: 0.9766258215529383
Epoch 2/50, Loss: 0.9277257938946695
Epoch 3/50, Loss: 0.8607926194079356
Epoch 4/50, Loss: 0.8177821273990542
Epoch 5/50, Loss: 0.7956378528951021
Epoch 6/50, Loss: 0.7819146251244113
Epoch 7/50, Loss: 0.7730876860072377
Epoch 8/50, Loss: 0.7652697582062802
Epoch 9/50, Loss: 0.7621774067002876
Epoch 10/50, Loss: 0.7573944740769034
Epoch 11/50, Loss: 0.7535141197439736
Epoch 12/50, Loss: 0.7514079677395193
Epoch 13/50, Loss: 0.749311249021702
Epoch 14/50, Loss: 0.7492745138624823
Epoch 15/50, Loss: 0.7480122842692305
Epoch 16/50, Loss: 0.7470738794519802
Epoch 17/50, Loss: 0.7464302501269103
Epoch 18/50, Loss: 0.7465248968733913
Epoch 19/50, Loss: 0.7453960627280241
Epoch 20/50, Loss: 0.7450083423827037
Epoch 21/50, Loss: 0.7502043679196667
Epoch 22/50, Loss: 0.7487078288680216
Epoch 23/50, Loss: 0.7506327998290075
Epoch 24/50, Loss: 0.7550332948160136
Epoch 25/50, Loss: 0.7572304848456044
Epoch 26/50, Loss: 0.7610766531405438
Epoch 27/50, Loss: 0.7650676122591149
Epoch 28/50, Loss: 0.7708032573432675
Epoch 29/50, Loss: 0.7802106212105928
Epoch 30/50, Loss: 0.782534295732552
Epoch 31/50, Loss: 0.7937661523024043
Epoch 32/50, Loss: 0.8105388038289912
Epoch 33/50, Loss: 0.8116761139868975
Epoch 34/50, Loss: 0.823211250904346
Epoch 35/50, Loss: 0.8482686763307178
Epoch 36/50, Loss: 0.8570376824404433
Epoch 37/50, Loss: 0.8790607343357877
Epoch 38/50, Loss: 0.9188074527133098
Epoch 39/50, Loss: 1.0110369659523313
Epoch 40/50, Loss: 1.079229051971007
Epoch 41/50, Loss: 1.1158246990299772
Epoch 42/50, Loss: 1.0843773540904287
Epoch 43/50, Loss: 1.0843768729394827
Epoch 44/50, Loss: 1.0843766580236085
Epoch 45/50, Loss: 1.093706881372239
Epoch 46/50, Loss: 1.1111667017732025
Epoch 47/50, Loss: 1.0910489906954117
Epoch 48/50, Loss: 1.0888419344087834
Epoch 49/50, Loss: 1.0888414244893367
Epoch 50/50, Loss: 1.0888414261553512

<Test Report>
Precision: [no diabetes] 0.35561185882993107, [pre-diabetes] 0.0, [diabetes] 0.5755375330064126
Recall: [no diabetes] 0.9782029093970719, [pre-diabetes] 0.0, [diabetes] 0.14230419474432812
F1-Score: [no diabetes] 0.5216024941543258, [pre-diabetes] 0.0, [diabetes] 0.22818791946308725
Support: [no diabetes] 42758, [pre-diabetes] 42577, [diabetes] 42887
Accuracy: 37.3797%
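The test report can be sanity-checked from its own numbers. Since recall is TP / support, multiplying each class's recall by its support recovers the number of correct predictions per class, and from those the overall accuracy; averaging the per-class F1 scores gives macro F1. The plain-Python check below copies the values from the report; it is not part of `utils.model`.

```python
# Per-class values copied from the <Test Report> above.
recall = {"no diabetes": 0.9782029093970719, "pre-diabetes": 0.0, "diabetes": 0.14230419474432812}
f1 = {"no diabetes": 0.5216024941543258, "pre-diabetes": 0.0, "diabetes": 0.22818791946308725}
support = {"no diabetes": 42758, "pre-diabetes": 42577, "diabetes": 42887}

n_test = sum(support.values())
# Recall = TP / support, so TP = recall * support (rounded back to a count).
true_positives = {c: round(recall[c] * support[c]) for c in support}
accuracy = sum(true_positives.values()) / n_test
macro_f1 = sum(f1.values()) / len(f1)
# Accuracy of always predicting the most frequent test class.
majority_baseline = max(support.values()) / n_test

print(true_positives)  # {'no diabetes': 41826, 'pre-diabetes': 0, 'diabetes': 6103}
print(f"accuracy={accuracy:.4%}, macro F1={macro_f1:.4f}, majority baseline={majority_baseline:.4%}")
```

The recomputed accuracy matches the reported 37.3797%, which is only about 4 points above the roughly 33.4% obtained by always predicting the largest test class, and no pre-diabetes case is classified correctly (macro F1 of about 0.25).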

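The training loss bottoms out at epoch 20 (0.7450) and then climbs to about 1.089, close to ln 3 ≈ 1.0986, the cross-entropy of a uniform guess over three classes. A rising *training* loss usually points to optimization instability (for example, a learning rate too high for this setup) rather than overfitting. One common guard is early stopping that keeps the best epoch. The sketch below applies a simple patience rule to the logged losses (rounded to 4 decimals). `patience=5` is an arbitrary choice; the rule is applied to training loss only because no validation loss is logged here (monitoring a held-out validation loss is the usual practice), and whether `train_model` supports anything similar is not shown in this notebook. It also checks that ceil(512887 / 128) gives the 4007 batches per epoch reported by the progress bars.

```python
import math

# Training losses per epoch, copied from the log above and rounded to 4 decimals.
losses = [
    0.9766, 0.9277, 0.8608, 0.8178, 0.7956, 0.7819, 0.7731, 0.7653, 0.7622, 0.7574,
    0.7535, 0.7514, 0.7493, 0.7493, 0.7480, 0.7471, 0.7464, 0.7465, 0.7454, 0.7450,
    0.7502, 0.7487, 0.7506, 0.7550, 0.7572, 0.7611, 0.7651, 0.7708, 0.7802, 0.7825,
    0.7938, 0.8105, 0.8117, 0.8232, 0.8483, 0.8570, 0.8791, 0.9188, 1.0110, 1.0792,
    1.1158, 1.0844, 1.0844, 1.0844, 1.0937, 1.1112, 1.0910, 1.0888, 1.0888, 1.0888,
]


def early_stopping_epoch(losses, patience=5):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Stops once the loss has failed to beat the best value seen so far
    for `patience` consecutive epochs (patience=5 is an arbitrary choice).
    """
    best_loss, best_epoch, wait = math.inf, 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(losses)


best_epoch, stop_epoch = early_stopping_epoch(losses)
print(best_epoch, stop_epoch)  # 20 25

# Cross-entropy of a uniform guess over 3 classes is ln(3).
print(f"ln(3) = {math.log(3):.4f}, final loss = {losses[-1]:.4f}")

# 512887 training rows split into batches of 128 (the last batch is partial).
batches_per_epoch = math.ceil(512887 / 128)
print(batches_per_epoch)  # 4007
```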

***
## Hyperparameter Optimization

In [3]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
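The optimizer cell passes `kfold=2` but logs "Testing 7 combinations WITHOUT cross-validation" and then fails with an `IndexError`; both originate inside `utils.model`, which is not shown, so the failure can't be diagnosed from here. For reference, here is a minimal sketch of what a 2-fold grid search does: enumerate every combination with `itertools.product`, score each on k validation folds, and rank by mean score. This is not the notebook's `optimize_hyperparams`. The grid values and the `toy_evaluate` scorer are hypothetical placeholders; a real scorer would train the model on the training fold and return a validation metric such as macro F1, and real folds should be shuffled (ideally stratified by class).

```python
import itertools


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds (no shuffling)."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search(param_grid, evaluate, n_samples, k=2):
    """Rank every combination in param_grid by mean validation score over k folds.

    evaluate(params, train_idx, val_idx) must return a score where higher is better.
    """
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, va) for tr, va in kfold_indices(n_samples, k)]
        results.append((sum(scores) / len(scores), params))
    # Best mean score first.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Hypothetical grid around the manual settings used earlier.
param_grid = {"learning_rate": [1e-4, 5e-4], "hidden_size": [256, 1024]}


def toy_evaluate(params, train_idx, val_idx):
    # Placeholder only: pretends lower learning rates and wider layers score better.
    return -params["learning_rate"] * 1e4 + params["hidden_size"] / 1e4


ranked = grid_search(param_grid, toy_evaluate, n_samples=10, k=2)
for score, params in ranked:
    print(f"{score:+.4f} {params}")
```

Separating fold generation from scoring keeps the search loop independent of the model: swapping `toy_evaluate` for a function that trains and evaluates a real classifier is the only change needed.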

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion