# Modeling
In this notebook, we'll model the data we prepared previously. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our goal here is twofold:

1. Accurately (and fairly) model the diabetes dataset
2. Interpret the results to find something worth recommending to people who want to reduce their risk of diabetes. This can be done with post-hoc explainers such as LIME or SHAP, which approximate the neural network's predictions locally with simpler, interpretable models, or by analyzing the structure of a simpler model (e.g., regression coefficients, or a random forest's splits and feature importances)
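For goal 2, one model-agnostic option that complements LIME/SHAP is permutation feature importance: shuffle one feature column at a time and measure how much a score drops. The sketch below is a swapped-in technique, not this notebook's own method. It assumes a hypothetical `predict` callable that maps a feature matrix to class labels (so it would work for the neural network and a simpler model alike), and it scores with balanced accuracy (mean per-class recall) because the test split reported below is heavily imbalanced (42,741 / 926 / 7,069 rows). The data here are synthetic, not the diabetes dataset.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred, n_classes):
    """Mean per-class recall, so each class counts equally regardless of its size."""
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))

def permutation_importance(predict, X, y, n_classes, n_repeats=5, seed=0):
    """Average drop in balanced accuracy when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = balanced_accuracy(y, predict(X), n_classes)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks any link between feature j and the label.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - balanced_accuracy(y, predict(X_perm), n_classes))
        importances[j] = np.mean(drops)
    return baseline, importances

# Synthetic data: the 3-class label depends only on feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])  # classes 0, 1, 2

def toy_predict(X):
    # Hypothetical "model" that has learned the true rule exactly.
    return np.digitize(X[:, 0], bins=[-0.5, 0.5])

baseline, imp = permutation_importance(toy_predict, X, y, n_classes=3)
print(baseline, imp)
```

Features 1 and 2 get an importance of exactly zero because the toy model ignores them, while shuffling feature 0 drops balanced accuracy to roughly chance level. Balanced accuracy matters here: predicting class 0 for every row of `[0, 0, 0, 1]` gives 0.75 plain accuracy but only 0.5 balanced accuracy.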

In [2]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [3]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet")
}

# manual search
models["ffnn"].set_hyperparams({
    "learning_rate": .0001,
    "batch_size": 128,
    "num_hidden": 2,
    "hidden_size": 2048,
    "num_epochs": 50,
    "classify_fn": "sigmoid"
})

# train & test each model
for model in models.values():
    # to reuse a saved model instead of retraining, wrap the two calls below in:
    # if not model.load_model():
    model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512886 obs, 170962 no diabetes [0], 170962 pre-diabetes [1], 170962 diabetes [2]
Test: 50736 obs, 42741 no diabetes [0], 926 pre-diabetes [1], 7069 diabetes [2]


Epoch 1/50, Loss: 0.9226, Test Loss: 0.8863
Epoch 2/50, Loss: 0.8110, Test Loss: 0.8227
Epoch 3/50, Loss: 0.7704, Test Loss: 0.8213
Epoch 4/50, Loss: 0.7504, Test Loss: 0.8199
Epoch 5/50, Loss: 0.7365, Test Loss: 0.8078
Epoch 6/50, Loss: 0.7248, Test Loss: 0.7953
Epoch 7/50, Loss: 0.7144, Test Loss: 0.7821
Epoch 8/50, Loss: 0.7049, Test Loss: 0.7884
Epoch 9/50, Loss: 0.6984, Test Loss: 0.7942
Epoch 10/50, Loss: 0.6903, Test Loss: 0.8142
Epoch 11/50, Loss: 0.6859, Test Loss: 0.7862
Epoch 12/50, Loss: 0.6812, Test Loss: 0.7830
Epoch 13/50, Loss: 0.6783, Test Loss: 0.7804
Epoch 14/50, Loss: 0.6743, Test Loss: 0.7981
Epoch 15/50, Loss: 0.6719, Test Loss: 0.7842
Epoch 16/50, Loss: 0.6698, Test Loss: 0.7855
Epoch 17/50, Loss: 0.6681, Test Loss: 0.7853
Epoch 18/50, Loss: 0.6661, Test Loss: 0.7856
Epoch 19/50, Loss: 0.6646, Test Loss: 0.7880
Epoch 20/50, Loss: 0.6632, Test Loss: 0.7910
Epoch 21/50, Loss: 0.6625, Test Loss: 0.7869
Epoch 22/50, Loss: 0.6603, Test Loss: 0.7814
Epoch 23/50, Loss: 0.6590, Test Loss: 0.7865
Epoch 24/50, Loss: 0.6587, Test Loss: 0.7969
Epoch 25/50, Loss: 0.6568, Test Loss: 0.7733
Epoch 26/50, Loss: 0.6565, Test Loss: 0.7875
Epoch 27/50, Loss: 0.6555, Test Loss: 0.7676
Epoch 28/50, Loss: 0.6544, Test Loss: 0.8023
Epoch 29/50, Loss: 0.6544, Test Loss: 0.7788
Epoch 30/50, Loss: 0.6535, Test Loss: 0.7772
 36%|███▌      | 1445/4007 [00:24<00:44, 57.84it/s]
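The log shows training loss falling steadily (0.9226 at epoch 1 to 0.6535 at epoch 30) while test loss stops improving early, staying between about 0.77 and 0.81 from epoch 7 onward (its lowest value is 0.7676 at epoch 27). Assuming the reported loss is mean cross-entropy in nats (the log doesn't say), both epoch-1 values are already below ln 3 ≈ 1.0986, the loss of a uniform guess over three classes. Patience-based early stopping would end such a run sooner. The sketch below is a generic implementation, not part of the project's `utils.model` code, replayed on the logged values. One caveat: the log labels these values "Test Loss"; if they come from the held-out test set, picking an epoch with them leaks test information into model selection, so a separate validation split would be the safer signal.

```python
import math

def early_stopping(val_losses, patience=5, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Stops once `patience` consecutive epochs fail to beat the best loss by more
    than `min_delta`; if that never happens, stop_epoch is the last epoch.
    """
    best_loss = math.inf
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# "Test Loss" values for epochs 1-30, copied from the training log above.
logged = [0.8863, 0.8227, 0.8213, 0.8199, 0.8078, 0.7953, 0.7821, 0.7884, 0.7942, 0.8142,
          0.7862, 0.7830, 0.7804, 0.7981, 0.7842, 0.7855, 0.7853, 0.7856, 0.7880, 0.7910,
          0.7869, 0.7814, 0.7865, 0.7969, 0.7733, 0.7875, 0.7676, 0.8023, 0.7788, 0.7772]

print(early_stopping(logged, patience=5))   # (7, 12)
print(early_stopping(logged, patience=10))  # (13, 23)
print(math.log(3))  # uniform-guess cross-entropy for 3 classes, about 1.0986
```

With a patience of 5 epochs the run would have stopped at epoch 12 and kept the epoch-7 weights; a longer patience trades extra epochs for a chance at a later minimum such as epoch 27.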

***
## Hyperparameter Optimization

In [None]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
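The `optimize_hyperparams` implementation isn't shown here, so the `IndexError` can't be diagnosed from this notebook alone; note, though, that the log reports running "WITHOUT cross-validation" even though `kfold=2` was passed. For reference, below is a minimal sketch of a grid search with stratified k-fold cross-validation, written in plain NumPy rather than against the project's API. `fit_and_score` is a hypothetical callable that trains a fresh model on one fold and returns a score where higher is better. Stratifying keeps each fold's class mix close to the full data's, which matters when a class is as rare as pre-diabetes is in the test split. If the balanced training set was produced by oversampling, duplicated minority rows can land in both the training and validation folds and inflate scores; resampling inside each training fold avoids that.

```python
import itertools
import numpy as np

def stratified_kfold_indices(y, k, seed=0):
    """Yield (train_idx, val_idx) pairs whose class mix roughly matches y."""
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        fold_of[idx] = np.arange(len(idx)) % k  # deal this class's rows round-robin
    for f in range(k):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)

def grid_search(param_grid, fit_and_score, X, y, k=2):
    """Score every combination in param_grid with stratified k-fold CV, best first.

    fit_and_score(params, X_tr, y_tr, X_val, y_val) -> float (higher is better) is a
    hypothetical callable that trains a fresh model on the training fold and scores it.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, X[tr], y[tr], X[va], y[va])
                  for tr, va in stratified_kfold_indices(y, k)]
        results.append((float(np.mean(scores)), params))
    return sorted(results, key=lambda r: r[0], reverse=True)

# Toy usage: 8/4/2 rows of classes 0/1/2, split into 2 stratified folds.
y = np.array([0] * 8 + [1] * 4 + [2] * 2)
X = np.arange(len(y), dtype=float).reshape(-1, 1)
for tr, va in stratified_kfold_indices(y, k=2):
    print(np.bincount(y[va], minlength=3))  # [4 2 1] for both folds

def toy_score(params, X_tr, y_tr, X_va, y_va):
    # Stand-in scorer that ignores the data and prefers learning_rate == 1e-3.
    return -abs(params["learning_rate"] - 1e-3)

grid = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [64, 128]}
best_score, best_params = grid_search(grid, toy_score, X, y, k=2)[0]
print(best_score, best_params)
```

Every combination is trained `k` times, so with roughly 69 seconds per epoch in the log above, even a small grid multiplies training time quickly; shortening runs with early stopping or searching on a subsample are common ways to keep this tractable.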

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion