# Modeling
In this notebook, we'll be modeling the data we've previously prepared. Our notebook will be laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find something worth recommending to those wanting to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e., explanations from an interpretable model that locally approximates the neural network) or by analyzing a simpler model's structure (e.g., regression coefficients, random forest decision boundaries).

In [2]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [5]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/processed.parquet")
}

# manual search
models["ffnn"].set_hyperparams({
    "learning_rate": .0001,
    "batch_size": 128,
    "num_hidden": 1,
    "hidden_size": 4096,
    "num_epochs": 50,
    "classify_fn": "softmax"
})

# train & test basic model
for mt, model in models.items():
    # attempt to load, else train and test
    if not model.load_model():
        model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512887 obs, 170945 no diabetes [0], 171126 pre-diabetes [1], 170816 diabetes [2]
Test: 128222 obs, 42758 no diabetes [0], 42577 pre-diabetes [1], 42887 diabetes [2]


100%|██████████| 4007/4007 [01:31<00:00, 43.90it/s]
Epoch 1/50, Loss: 0.9945262284507351
Epoch 2/50, Loss: 0.970514615991629
Epoch 3/50, Loss: 0.943732138060262
Epoch 4/50, Loss: 0.906880989540596
Epoch 5/50, Loss: 0.868801548649374
Epoch 6/50, Loss: 0.8376542646925974
Epoch 7/50, Loss: 0.8145139233888103
Epoch 8/50, Loss: 0.7970200334816226
Epoch 9/50, Loss: 0.7835812089685373
Epoch 10/50, Loss: 0.7738872063644872
Epoch 11/50, Loss: 0.7653807949656883
Epoch 12/50, Loss: 0.7589361173034993
Epoch 13/50, Loss: 0.7533884416770126
Epoch 14/50, Loss: 0.7478917862876919
Epoch 15/50, Loss: 0.7438156928895923
Epoch 16/50, Loss: 0.7398013771695926
Epoch 17/50, Loss: 0.7362549687699246
Epoch 18/50, Loss: 0.7329347927075893
Epoch 19/50, Loss: 0.7301067930520965
Epoch 20/50, Loss: 0.7272395572223239
Epoch 21/50, Loss: 0.7243169103443459
Epoch 22/50, Loss: 0.7216469223195251
Epoch 23/50, Loss: 0.719473211269055
Epoch 24/50, Loss: 0.7169382450793014
Epoch 25/50, Loss: 0.7147631733114653
Epoch 26/50, Loss: 0.7129733763729987
Epoch 27/50, Loss: 0.7106493007282917
Epoch 28/50, Loss: 0.7089373016553774
Epoch 29/50, Loss: 0.7070595461262531
Epoch 30/50, Loss: 0.7054128960715823
Epoch 31/50, Loss: 0.7031750328706808
Epoch 32/50, Loss: 0.7019575788756395
Epoch 33/50, Loss: 0.7003597367842179
Epoch 34/50, Loss: 0.6986648966494361
Epoch 35/50, Loss: 0.6973417832990046
Epoch 36/50, Loss: 0.6959317164704304
Epoch 37/50, Loss: 0.694398810419026
Epoch 38/50, Loss: 0.693272481691757
Epoch 39/50, Loss: 0.6919224938312194
Epoch 40/50, Loss: 0.6906614832994734
Epoch 41/50, Loss: 0.6895442439160919
Epoch 42/50, Loss: 0.6884666797315685
Epoch 43/50, Loss: 0.6873618410893263
Epoch 44/50, Loss: 0.6860137115638691
Epoch 45/50, Loss: 0.6852119464900448
Epoch 46/50, Loss: 0.6842252725462441
Epoch 47/50, Loss: 0.6832514290414547
Epoch 48/50, Loss: 0.6822451958044645
Epoch 49/50, Loss: 0.6815192226105745
Epoch 50/50, Loss: 0.680587612425326

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 
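All 50 training epochs completed, so the out-of-memory error most likely comes from the evaluation step. The requested 1.96 GiB is consistent with a single full-batch forward pass over the test split: assuming float32 activations, the 4,096-unit hidden layer alone needs 128,222 × 4,096 × 4 bytes ≈ 1.96 GiB, whereas training only ever materializes 128 rows at a time. Whether `test_model` really evaluates the whole split at once is an assumption that would need to be checked in `utils.model`. The sketch below checks the arithmetic and shows a generic way to predict in mini-batches; `predict_in_batches` and `predict_fn` are hypothetical names, not part of the project's API.

```python
import math

# Figures copied from the split report and hyperparameters above
N_TRAIN = 512_887
N_TEST = 128_222
BATCH_SIZE = 128
HIDDEN_SIZE = 4096
BYTES_PER_FLOAT32 = 4  # assumption: activations are stored as float32

# Mini-batches per epoch (should match the 4007 iterations in the progress bar)
steps_per_epoch = math.ceil(N_TRAIN / BATCH_SIZE)

# One hidden-layer activation tensor for the entire test split in a single pass
full_pass_gib = N_TEST * HIDDEN_SIZE * BYTES_PER_FLOAT32 / 2**30

# The same tensor for one mini-batch of BATCH_SIZE rows
per_batch_mib = BATCH_SIZE * HIDDEN_SIZE * BYTES_PER_FLOAT32 / 2**20


def predict_in_batches(predict_fn, rows, batch_size=BATCH_SIZE):
    """Apply predict_fn to consecutive slices of rows and concatenate the results.

    Hypothetical helper: predict_fn takes a slice of rows and returns one
    prediction per row, so only batch_size rows are processed at a time.
    """
    predictions = []
    for start in range(0, len(rows), batch_size):
        predictions.extend(predict_fn(rows[start:start + batch_size]))
    return predictions


print(steps_per_epoch)          # 4007
print(round(full_pass_gib, 2))  # 1.96
print(per_batch_mib)            # 2.0
```

With PyTorch, `predict_fn` would typically wrap the model call in `torch.no_grad()` and move each batch to the GPU, so peak activation memory scales with the batch size rather than with the size of the test set.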

***
## Hyperparameter Optimization

In [None]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
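The grid-search log reports `WITHOUT cross-validation` even though the cell passes `kfold=2`, and it then fails with an `IndexError`. How `optimize_hyperparams` consumes `kfold` isn't visible in this notebook, so the following is only a generic sketch of grid search with k-fold cross-validation for comparison, not the `utils.model` implementation. The helpers `grid`, `kfold_indices`, and `grid_search_cv`, the example grid, and the toy score function are all hypothetical; in practice the score function would train a model on the `train` indices and return a validation metric computed on the `val` indices.

```python
import itertools
import random


def grid(param_grid):
    """Yield every combination of the candidate values as a dict."""
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, split into k folds, and yield (train, val) index lists."""
    if k < 2:
        raise ValueError("k must be at least 2 for cross-validation")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        # Training indices are every fold except the held-out one
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val


def grid_search_cv(param_grid, score_fn, n, k=2):
    """Return (best_params, best_mean_score), ranking combinations by mean fold score."""
    best = None
    for params in grid(param_grid):
        scores = [score_fn(params, train, val) for train, val in kfold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if best is None or mean > best[1]:
            best = (params, mean)
    return best


# Hypothetical grid with 2 x 2 = 4 combinations and a toy score function
example_grid = {"learning_rate": [1e-4, 1e-3], "hidden_size": [512, 4096]}
toy_score = lambda params, train, val: params["hidden_size"] / 4096 - params["learning_rate"]

print(len(list(grid(example_grid))))  # 4
print(grid_search_cv(example_grid, toy_score, n=10, k=2)[0])
```

Under this scheme every combination is scored on all k folds and ranked by its mean score, so with `kfold=2` each of the 7 reported combinations would be trained twice.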

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion