# Modeling
In this notebook, we'll model the data we prepared previously. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning + Other Adjustments (if needed)
4. Best Model Report (best model(s) + settings)
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find something worth recommending to people who want to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e. explanation methods that approximate the neural network's behavior) or by analyzing a simpler model's structure (e.g. regression coefficients, random forest decision boundaries)
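
To make the second goal concrete, here is a minimal sketch of the "simpler model" route: fit a logistic regression and read each coefficient as an odds ratio. It runs on synthetic data rather than the diabetes dataset, and the feature names `risk_factor` and `noise` as well as the helper `fit_logistic` are hypothetical (they are not part of `utils`).

```python
import math
import random


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit binary logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic, standardized toy data (NOT the diabetes dataset):
# "risk_factor" drives the label, "noise" is irrelevant by construction.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    risk_factor, noise = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = 1.0 / (1.0 + math.exp(-1.5 * risk_factor))
    X.append([risk_factor, noise])
    y.append(1 if rng.random() < p_true else 0)

weights, bias = fit_logistic(X, y)
odds_ratios = {name: math.exp(w) for name, w in zip(["risk_factor", "noise"], weights)}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio per +1 standard deviation = {ratio:.2f}")
```

An odds ratio well above 1 marks a feature whose increase raises the predicted odds of the positive class, while a ratio near 1 marks a feature the model mostly ignores. This reading only holds when features are on comparable scales, which is why the toy features are standardized.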

In [1]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [2]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet")
}

# manual search: hand-picked hyperparameters for the baseline run
models["ffnn"].set_hyperparams({
    "learning_rate": .0001,
    "batch_size": 128,
    "num_hidden": 4,
    "hidden_size": 1024,
    "num_epochs": 50,
    "classify_fn": "sigmoid"
})

# train & test basic model
for mt, model in models.items():
    # always train and test from scratch; loading a saved model is disabled for now
    # if not model.load_model():
    model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512724 obs, 170908 no diabetes [0], 170908 pre-diabetes [1], 170908 diabetes [2]
Test: 50736 obs, 42795 no diabetes [0], 944 pre-diabetes [1], 6997 diabetes [2]


100%|██████████| 4006/4006 [00:27<00:00, 147.82it/s]
Epoch 1/50, Loss: 0.9313979456559218
Epoch 2/50, Loss: 0.8206022908344783
Epoch 3/50, Loss: 0.7812785073274144
Epoch 4/50, Loss: 0.7630532888609114
Epoch 5/50, Loss: 0.7512007845947877
Epoch 6/50, Loss: 0.7401863449791105
Epoch 7/50, Loss: 0.7305674186897945
Epoch 8/50, Loss: 0.7218519273961239
Epoch 9/50, Loss: 0.7140890699116159
Epoch 10/50, Loss: 0.7069721635109296
Epoch 11/50, Loss: 0.7012584481724964
Epoch 12/50, Loss: 0.6967364628941074
Epoch 13/50, Loss: 0.6936726187445554
Epoch 14/50, Loss: 0.6893851776243268
Epoch 15/50, Loss: 0.6866096774161725
Epoch 16/50, Loss: 0.6849044918627127
Epoch 17/50, Loss: 0.6812895905418034
Epoch 18/50, Loss: 0.6799319415198406
Epoch 19/50, Loss: 0.6778144447344039
Epoch 20/50, Loss: 0.6763465989546602
Epoch 21/50, Loss: 0.6752207249745213
Epoch 22/50, Loss: 0.6740153650462598
Epoch 23/50, Loss: 0.6726020942077122
Epoch 24/50, Loss: 0.6705709594491119
Epoch 25/50, Loss: 0.6697171642461301
Epoch 26/50, Loss: 0.6687218591206322
Epoch 27/50, Loss: 0.6672890840234961
Epoch 28/50, Loss: 0.6675425426131537
Epoch 29/50, Loss: 0.6661343724949028
Epoch 30/50, Loss: 0.6652607355067804
Epoch 31/50, Loss: 0.6639601779888703
Epoch 32/50, Loss: 0.6633290653370406
Epoch 33/50, Loss: 0.6629458311880105
Epoch 34/50, Loss: 0.6624963592721175
Epoch 35/50, Loss: 0.661631233532668
Epoch 36/50, Loss: 0.6611273336190315
Epoch 37/50, Loss: 0.6606016970847287
Epoch 38/50, Loss: 0.6605312709830965
Epoch 39/50, Loss: 0.6590285007351349
Epoch 40/50, Loss: 0.658507500522683
Epoch 41/50, Loss: 0.6584030791365737
Epoch 42/50, Loss: 0.6580105232139022
Epoch 43/50, Loss: 0.6576887907918786
Epoch 44/50, Loss: 0.6574051248777287
Epoch 45/50, Loss: 0.6559977108725178
Epoch 46/50, Loss: 0.6558280463553166
Epoch 47/50, Loss: 0.6555516147577816
Epoch 48/50, Loss: 0.6558808735234702
Epoch 49/50, Loss: 0.6554071685402499
Epoch 50/50, Loss: 0.6558454898346441

<Test Report>
Precision: [no diabetes] 0.8474442612373647, [pre-diabetes] 0.024390243902439025, [diabetes] 0.15384615384615385
Recall: [no diabetes] 0.8286715737819839, [pre-diabetes] 0.019067796610169493, [diabetes] 0.17921966557095897
F1-Score: [no diabetes] 0.8379527893953357, [pre-diabetes] 0.02140309155766944, [diabetes] 0.16556641140744652
Support: [no diabetes] 42795, [pre-diabetes] 944, [diabetes] 6997
Accuracy: 72.4042%
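
Because the test split is heavily imbalanced, accuracy alone can flatter or hide a weak model. The sketch below only re-derives summary numbers from the test report above (support, recall, and F1 per class, paired by their position in the report). The comparison against a "always predict the largest class" baseline and the macro-averaged F1 are our additions, not something the `utils` report prints.

```python
# (support, recall, f1) per class, paired by position in the test report
per_class = [
    (42795, 0.8286715737819839, 0.8379527893953357),
    (6997, 0.17921966557095897, 0.16556641140744652),
    (944, 0.019067796610169493, 0.02140309155766944),
]

total = sum(support for support, _, _ in per_class)

# recall * support recovers the number of correctly classified rows per class
correct = sum(round(recall * support) for support, recall, _ in per_class)

accuracy = correct / total
# accuracy of a trivial model that always predicts the largest class
majority_baseline = max(support for support, _, _ in per_class) / total
# unweighted mean of per-class F1, so small classes count as much as large ones
macro_f1 = sum(f1 for _, _, f1 in per_class) / len(per_class)

print(f"accuracy:          {accuracy:.4%}")
print(f"majority baseline: {majority_baseline:.4%}")
print(f"macro F1:          {macro_f1:.4f}")
```

The recomputed accuracy matches the reported 72.4042%, but it sits below the roughly 84.3% that the trivial baseline would reach on this split, and the macro F1 is only about 0.34. Per-class recall and macro F1 are therefore the more informative numbers to track in the sections that follow.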


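
The training loss in the log above barely moves over the last several epochs, which suggests the fixed `num_epochs` of 50 could be replaced by a stopping rule. Below is a minimal sketch of a patience-based check. The function `plateau_epoch` and its `patience` and `min_delta` defaults are hypothetical choices, and it is applied here to training loss only because that is what the log reports (early stopping normally monitors a validation loss).

```python
def plateau_epoch(losses, patience=3, min_delta=5e-4):
    """Return the 1-based position in `losses` where training would stop, or None.

    Stops once the best loss seen so far has not improved by at least
    `min_delta` for `patience` consecutive entries.
    """
    best = float("inf")
    stale = 0
    for position, loss in enumerate(losses, start=1):
        if best - loss >= min_delta:
            best, stale = loss, 0  # meaningful improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:
                return position
    return None


# Losses for epochs 41 to 50 from the training log, rounded to four decimals
tail = [0.6584, 0.6580, 0.6577, 0.6574, 0.6560, 0.6558, 0.6556, 0.6559, 0.6554, 0.6558]
stop_at = plateau_epoch(tail)
print("stop at position", stop_at, "of the tail, i.e. epoch", 40 + stop_at)
```

With these particular thresholds the rule fires inside the last ten epochs, so the saving on this run would be small. The more useful change is wiring the same check to a held-out validation loss, where it also guards against overfitting.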
***
## Hyperparameter Optimization

In [3]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
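
The traceback is not shown, so the cause of the `IndexError` can't be pinned down from this output alone. Note also that the run reports "WITHOUT cross-validation" even though `kfold=2` was passed. Independently of how `optimize_hyperparams` is implemented, the sketch below shows the shape a grid search with k-fold cross-validation usually takes. The helpers `param_combinations` and `kfold_indices` and the example grid are hypothetical, and the training step is left as a placeholder.

```python
from itertools import product


def param_combinations(grid):
    """Yield one dict per point in the Cartesian product of the grid values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    if k < 2:
        raise ValueError("k-fold cross-validation needs k >= 2")
    # spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


# Hypothetical grid; the keys mirror hyperparameters set earlier in the notebook
grid = {"learning_rate": [1e-3, 1e-4], "hidden_size": [256, 1024]}

for params in param_combinations(grid):
    for fold, (train_idx, val_idx) in enumerate(kfold_indices(10, k=2)):
        # placeholder: a real run would train on train_idx and score on val_idx
        print(params, "fold", fold, "train", len(train_idx), "val", len(val_idx))
```

Contiguous folds keep the sketch short. With classes as imbalanced as in this dataset, shuffling and stratifying the folds would be the safer choice so that each validation fold contains all three classes.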

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion