# Modeling
In this notebook, we'll model the data we prepared previously. Our notebook is laid out as follows:

1. Model Selection & Generation
2. Hyperparameter Optimization
3. Fine-Tuning (if needed)
4. Reporting Best Model(s) + Settings
5. Interpretation
6. Conclusion

Our eventual goal here is two-fold:

1. Accurately [and fairly] model the diabetes dataset
2. Interpret the results to find something worth recommending to those who want to reduce their risk of diabetes. This can be done via LIME/SHAP (i.e., an interpretable model that approximates the neural network) or by analyzing a simpler model's structure (e.g., regression coefficients, random forest decision boundaries)
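
As a rough illustration of the "simpler model" route in goal 2, the sketch below fits a small logistic regression and reads its coefficients as odds ratios. It is a minimal sketch under stated assumptions: the data is synthetic, the feature names `bmi_z` and `noise_z` are hypothetical, the label is binary rather than the three classes used in this notebook, and nothing here comes from `utils.model`.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit binary logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # prediction error for this observation
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic, standardized toy data (NOT the diabetes dataset):
# "bmi_z" drives the label, "noise_z" is unrelated to it.
random.seed(0)
feature_names = ["bmi_z", "noise_z"]
X, y = [], []
for _ in range(400):
    bmi_z, noise_z = random.gauss(0, 1), random.gauss(0, 1)
    X.append([bmi_z, noise_z])
    y.append(1 if random.random() < sigmoid(2.0 * bmi_z) else 0)

weights, bias = fit_logistic(X, y)

# exp(coefficient) is the multiplicative change in the odds of the positive
# class for a one-standard-deviation increase in that feature.
for name, w in zip(feature_names, weights):
    print(f"{name}: coef={w:+.3f}, odds ratio per 1 SD={math.exp(w):.2f}")
```

Because the features are standardized, the coefficients are directly comparable: a coefficient near zero (odds ratio near 1) suggests the feature carries little signal, while a large positive coefficient marks a candidate risk factor worth a closer look.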

In [1]:
# Environment Setup
from utils.model import *
from utils.dataset import *

***
## Model Selection & Generation

In [2]:
# generate lookup for models
models = {
    # "tree": TreeClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet"),
    "ffnn": MLPClassifier(target="diabetes", path="../datasets/pre_split_processed.parquet")
}

# manually chosen starting hyperparameters (no automated search yet)
models["ffnn"].set_hyperparams({
    "learning_rate": 0.0005,
    "batch_size": 128,
    "num_hidden": 2,
    "hidden_size": 2048,
    "num_epochs": 50,
    "classify_fn": "sigmoid"
})

# train & test basic model
for model in models.values():
    # load a previously saved model if one exists; otherwise train from scratch
    if not model.load_model():
        model.train_model(verbose=1)
    model.test_model()

<Train-Test Split Report>
Train: 512886 obs, 170962 no diabetes [0], 170962 pre-diabetes [1], 170962 diabetes [2]
Test: 50736 obs, 42741 no diabetes [0], 926 pre-diabetes [1], 7069 diabetes [2]
[0 0 0 ... 2 0 0]

<Test Report>
Precision: [no diabetes] 0.8576161666049307, [pre-diabetes] 0.07242339832869081, [diabetes] 0.47055524397083565
Recall: [no diabetes] 0.975059076764699, [pre-diabetes] 0.028077753779697623, [diabetes] 0.11868722591597114
F1-Score: [no diabetes] 0.9125745880549625, [pre-diabetes] 0.04046692607003891, [diabetes] 0.1895616809760506
Support: [no diabetes] 42741, [pre-diabetes] 926, [diabetes] 7069
Accuracy: 83.8458%
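
The training split is balanced (170962 observations per class) while the test split is not, so the accuracy figure is easier to judge next to a trivial baseline. The sketch below uses only the numbers printed in the test report above to compute the accuracy of always predicting the majority class (about 84.24%, slightly above the reported 83.8458%) and the macro-averaged F1 score. The variable names are ours, not part of `utils.model`.

```python
# Values copied from the <Test Report> output.
support = {"no diabetes": 42741, "pre-diabetes": 926, "diabetes": 7069}
f1 = {
    "no diabetes": 0.9125745880549625,
    "pre-diabetes": 0.04046692607003891,
    "diabetes": 0.1895616809760506,
}
reported_accuracy = 0.838458

total = sum(support.values())

# Accuracy of a trivial model that always predicts the most common class.
majority_baseline = max(support.values()) / total

# Macro F1 weights every class equally, regardless of its support.
macro_f1 = sum(f1.values()) / len(f1)

print(f"Majority-class baseline accuracy: {majority_baseline:.4%}")
print(f"Reported accuracy:                {reported_accuracy:.4%}")
print(f"Macro F1:                         {macro_f1:.4f}")
```

Read this way, the high overall accuracy mostly reflects the dominant "no diabetes" class. The per-class recall for pre-diabetes and diabetes is the more informative signal for the later sections.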


***
## Hyperparameter Optimization

In [3]:
# optimize hyperparams
optimizer_results = {model_type: model.optimize_hyperparams(kfold=2) for model_type, model in models.items()}
print(optimizer_results)

<Grid-Search>
Testing 7 combinations WITHOUT cross-validation


IndexError: tuple index out of range
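
The run above reports "WITHOUT cross-validation" even though `kfold=2` was passed, and then fails with an `IndexError`. We can't tell from this output what `optimize_hyperparams` does internally, so the following is only a sketch of what a grid search with k-fold validation generally looks like, written with the standard library alone. The toy data, the `offset` and `use_median` hyperparameters, and the helper names `kfold_indices`, `grid_search`, and `score_fn` are all hypothetical.

```python
import itertools
import random
import statistics

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def grid_search(param_grid, score_fn, n, k=2):
    """Score every combination in param_grid by its mean k-fold validation score."""
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in kfold_indices(n, k)]
        results.append((sum(scores) / len(scores), params))
    best = max(results, key=lambda r: r[0])
    return best, results

# Toy 1-D data (NOT the diabetes dataset): class 0 ~ N(-1.5, 1), class 1 ~ N(1.5, 1).
random.seed(1)
ys = [i % 2 for i in range(400)]
xs = [random.gauss(1.5 if label else -1.5, 1) for label in ys]

def score_fn(params, train_idx, val_idx):
    """'Train' a nearest-centroid rule on the training fold; return validation accuracy."""
    center = statistics.median if params["use_median"] else statistics.mean
    c0 = center([xs[i] for i in train_idx if ys[i] == 0])
    c1 = center([xs[i] for i in train_idx if ys[i] == 1])
    boundary = (c0 + c1) / 2 + params["offset"]
    correct = sum((xs[i] > boundary) == (ys[i] == 1) for i in val_idx)
    return correct / len(val_idx)

param_grid = {"offset": [-2.0, 0.0, 2.0], "use_median": [False, True]}
(best_score, best_params), results = grid_search(param_grid, score_fn, n=len(xs), k=2)
print(f"Tested {len(results)} combinations with 2-fold cross-validation")
print(f"Best: {best_params} (mean validation accuracy {best_score:.3f})")
```

Keeping the fold generation separate from the scoring function makes it easy to check that every observation lands in exactly one validation fold, which is a useful sanity check when debugging a failing search.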

***
## Fine-Tuning + Other Adjustments

***
## Best Model Report

***
## Interpretation

***
## Conclusion