# Dielectric Predictor: Bayesian Hyperparameter Optimization

This script sets up and executes a Bayesian hyperparameter optimization pipeline for predicting dielectric properties using a custom `DielectricPredictor` class. The key steps are as follows:

## 1. Environment and Reproducibility Setup
The first cell imports `numpy`, `pandas`, `tensorflow`, and `sklearn`, along with `hyperopt` for Bayesian optimization. A `set_tf_seed()` function fixes the seeds for TensorFlow, NumPy, and Python's `random` module so that repeated runs start from the same random state.


In [1]:
import os
import csv
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout, Add, Activation
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from BayesianCV import DielectricPredictor
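import random
import warnings

# Suppress scikit-learn UserWarnings to keep the optimization log readable.
warnings.filterwarnings('ignore', category=UserWarning, module='sklearn')

def set_tf_seed(seed: int):
    """Seed TensorFlow, NumPy, and Python's random module."""
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)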
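
To illustrate what fixing the seeds buys, here is a minimal sketch that reseeds the global generators and checks that the same draws come back. It covers only Python's `random` and NumPy; TensorFlow is left out to keep the example lightweight, and the helper names `set_seed` and `draw` are hypothetical, not part of the notebook.

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Seed Python's and NumPy's global RNGs (TensorFlow omitted in this sketch)."""
    random.seed(seed)
    np.random.seed(seed)


def draw() -> tuple:
    """Draw one value from each global RNG."""
    return random.random(), float(np.random.rand())


set_seed(42)
first = draw()
set_seed(42)
second = draw()   # same seed as `first`
set_seed(43)
third = draw()    # different seed

print(first == second)  # reseeding with 42 reproduces the draws
print(first == third)   # a different seed gives different draws
```

Seeding alone may not make TensorFlow training bit-identical across runs, particularly on GPU. Recent TensorFlow versions also provide `tf.keras.utils.set_random_seed()` and `tf.config.experimental.enable_op_determinism()`, which may be worth considering if stricter determinism is needed.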

## 2. Predictor Initialization and Data Preprocessing
A `DielectricPredictor` instance is initialized with paths to the input features, the model checkpoints, and the results output. It uses 10-fold cross-validation (`n_splits=10`) with stratified binning (`num_bins=10`) and evaluates up to `max_evals` hyperparameter configurations; the example run below sets `max_evals=10`. The `load_and_preprocess()` method loads and prepares the data.
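
The internals of `DielectricPredictor` are not shown here, so the following is only a sketch of one common way to combine `n_splits` and `num_bins`: bin a continuous target into quantile bins and stratify the folds on the bin labels. The function name `stratified_regression_folds` and the synthetic target are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def stratified_regression_folds(y, n_splits=10, num_bins=10, seed=1):
    """Yield (train_idx, val_idx) pairs stratified on quantile bins of y."""
    y = np.asarray(y, dtype=float)
    # Interior quantile edges split y into num_bins roughly equal-sized bins.
    edges = np.quantile(y, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])
    bin_labels = np.digitize(y, edges)  # integer labels in [0, num_bins - 1]
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # StratifiedKFold only needs X for its length, so a placeholder is enough.
    yield from skf.split(np.zeros(len(y)), bin_labels)


rng = np.random.default_rng(0)
y = rng.normal(size=200)  # synthetic target, for illustration only
folds = list(stratified_regression_folds(y))

for train_idx, val_idx in folds[:2]:
    # Each validation fold should span the full range of the target.
    print(len(train_idx), len(val_idx), round(float(y[val_idx].mean()), 3))
```

Stratifying on bins keeps the target distribution similar across folds, which tends to make per-fold R² and MSE more comparable than with a plain shuffled split.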

## 3. Hyperparameter Search and Optimization
A CSV file is initialized with a header row to record the tuning results, and a search space is defined over three tunable parameters: `num_units` (16 to 256), `num_layers` (3 to 10), and `dropout_rate` (0.2 to 0.5). The `optimize()` method runs Bayesian optimization with `hyperopt`, evaluating each configuration by its average R² and MSE (`avg_r2`, `avg_mse`) and logging the results.
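
The body of `optimize()` is not shown, so the sketch below only illustrates the general shape of an objective function that `hyperopt` can minimize while appending each trial to a CSV. `evaluate_cv` is a hypothetical placeholder that returns synthetic scores rather than real model results, and the choice of MSE as the loss is an assumption.

```python
import csv
import os
import tempfile

STATUS_OK = "ok"  # same string value as hyperopt.STATUS_OK


def evaluate_cv(num_units: int, num_layers: int, dropout_rate: float):
    """Hypothetical stand-in for cross-validated training; returns (r2, mse)."""
    # Synthetic formula for illustration only, not a real model result.
    mse = 0.2 + 0.001 * abs(num_units - 128) + 0.01 * num_layers * dropout_rate
    return 1.0 - mse, mse


def make_objective(csv_path: str):
    """Build an objective that logs every evaluated configuration to csv_path."""
    def objective(params: dict) -> dict:
        # hp.quniform samples floats such as 222.0, so cast before use.
        num_units = int(params["num_units"])
        num_layers = int(params["num_layers"])
        dropout_rate = float(params["dropout_rate"])
        avg_r2, avg_mse = evaluate_cv(num_units, num_layers, dropout_rate)
        with open(csv_path, "a", newline="") as f:
            csv.writer(f).writerow(
                [num_units, num_layers, dropout_rate, avg_r2, avg_mse]
            )
        # hyperopt minimizes "loss"; using MSE here is an assumption.
        return {"loss": avg_mse, "status": STATUS_OK}
    return objective


csv_path = os.path.join(tempfile.mkdtemp(), "hyperopt_results.csv")
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerow(
        ["num_units", "num_layers", "dropout_rate", "avg_r2", "avg_mse"]
    )

objective = make_objective(csv_path)
result = objective({"num_units": 222.0, "num_layers": 6.0, "dropout_rate": 0.4609})

with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
print(result["status"], len(rows))
```

With `hyperopt` installed, an objective of this shape could be passed to `fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=10, trials=Trials())`, which would call it once per trial with a sampled parameter dictionary.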

In [2]:
if __name__ == '__main__':
    # Example usage
    set_tf_seed(42)
    predictor = DielectricPredictor(
        data_filepath='./outputs/encoded_features/encodedOutputLayer8.csv',
        standardized_folder='./checkpoints/BayesianResult/ResultResNet/',
        hyperopt_results_csv='./hyperopt_results.csv',
        seed=1,
        n_splits=10,
        num_bins=10,
        max_evals=10,
    )
    predictor.load_and_preprocess()
    # Initialize hyperopt CSV
    with open(predictor.hyperopt_results_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['num_units', 'num_layers', 'dropout_rate', 'avg_r2', 'avg_mse'])
    # Search space. hp.quniform samples floats (e.g. 222.0), as the trial log shows.
    space = {
        'num_units': hp.quniform('num_units', 16, 256, 1),
        'num_layers': hp.quniform('num_layers', 3, 10, 1),
        'dropout_rate': hp.uniform('dropout_rate', 0.2, 0.5),
    }
    predictor.optimize(space)

Optimizing for ResNet model, trial 1
[1] Current hyperparameters: 
    ✦ num_units:           222.0
    ✦ num_layers:          6.0
    ✦ dropout_rate:        0.46090
[2] Performance for this trial:
    ✦ avg_r2:              0.7884
    ✦ avg_mse:             0.2115


Optimizing for ResNet model, trial 2
[1] Current hyperparameters: 
    ✦ num_units:           217.0
    ✦ num_layers:          5.0
    ✦ dropout_rate:        0.24331
[2] Performance for this trial:
    ✦ avg_r2:              0.7885
    ✦ avg_mse:             0.2116


Optimizing for ResNet model, trial 3
[1] Current hyperparameters: 
    ✦ num_units:           100.0
    ✦ num_layers:          9.0
    ✦ dropout_rate:        0.41679
[2] Performance for this trial:
    ✦ avg_r2:              0.4412
    ✦ avg_mse:             0.5588


Optimizing for ResNet model, trial 4
[1] Current hyperparameters: 
    ✦ num_units:           95.0
    ✦ num_layers:          10.0
    ✦ dropout_rate:        0.45773
[2] Performance for this trial