# Ray Tune for Hyperparameter Optimization

This notebook uses Ray Tune for hyperparameter optimization of a deep learning model. It compares Grid Search, Bayesian Search, and Hyperband on the MNIST dataset using a LeNet-style convolutional network.

## Overview
The key steps involve performing hyperparameter optimization using different search techniques, comparing their efficiency and effectiveness, and analyzing the results.

## Procedure
- **Hyperparameter Optimization**: Apply Grid Search, Bayesian Search, and Hyperband to optimize the hyperparameters of the LeNet model.
- **Performance Metrics**: Measure the time taken and identify the best hyperparameters for each search technique.
- **Analysis**: Compare the time efficiency and model performance across different search methods.

In [14]:
import os
import warnings

import tensorflow as tf
keras = tf.keras
layers = keras.layers  # use the same Keras that `keras = tf.keras` refers to
import GPUtil

import ray
from ray import train, tune

# Create the results directory if it does not already exist.
# (run_tune() below passes storage_path explicitly, and the logs show results under ./logs.)
os.makedirs('./ray_results', exist_ok=True)
os.environ['TUNE_RESULT_DIR'] = './ray_results'

warnings.filterwarnings("ignore")

In [3]:
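ray.shutdown()  # stop any Ray instance left over from a previous run of this cell
ray.init(
    include_dashboard=True,
    # dashboard_host="0.0.0.0",  # uncomment to serve the dashboard on all interfaces
    # dashboard_port=8270,
)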

2024-04-26 20:16:24,550	INFO worker.py:1740 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265


| Item | Value |
|---|---|
| Python version | 3.11.8 |
| Ray version | 2.12.0 |
| Dashboard | http://127.0.0.1:8265 |


In [15]:
import warnings; warnings.filterwarnings("ignore")

def get_mnist_data():
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape((60000, 28, 28, 1)) / 255.0
    x_test = x_test.reshape((10000, 28, 28, 1)) / 255.0
    y_train = keras.utils.to_categorical(y_train)
    y_test = keras.utils.to_categorical(y_test)
    return (x_train, y_train), (x_test, y_test)

def create_model(config):
    model = keras.Sequential(
        [
            layers.Conv2D(config["filters"], kernel_size=(5, 5), activation="relu", input_shape=(28, 28, 1)),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Conv2D(16, kernel_size=(5, 5), activation="relu"),
            layers.MaxPooling2D(pool_size=(2, 2)),
            layers.Flatten(),
            layers.Dense(120, activation="relu"),
            layers.Dropout(config["dropout"]),
            layers.Dense(84, activation="relu"),
            layers.Dense(10, activation="softmax"),
        ]
    )
    model.compile(
        loss="categorical_crossentropy",
        optimizer=keras.optimizers.Adam(learning_rate=config["lr"]),
        metrics=["accuracy"],
    )
    return model

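def train_mnist(config):
    (x_train, y_train), (x_test, y_test) = get_mnist_data()

    model = create_model(config)
    # Note: the test split doubles as the validation set, so the accuracy used to rank
    # trials is also the reported test accuracy (an optimistic estimate for the winner).
    history = model.fit(
        x_train,
        y_train,
        batch_size=config["batch_size"],
        epochs=10,
        validation_data=(x_test, y_test),
        verbose=0,
    )
    accuracy = model.evaluate(x_test, y_test, verbose=0)[1]

    # A single report after all 10 epochs (test accuracy, final training loss): every trial
    # ends at training_iteration == 1, so a scheduler such as HyperBand has no
    # intermediate results on which to stop a trial early.
    train.report({"accuracy": accuracy, "loss": history.history["loss"][-1]})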
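The network in `create_model` is a LeNet-style stack whose only tuned architectural knob is the first layer's `filters`. Its shapes and parameter count can be checked by hand; the plain-Python sketch below assumes the Keras defaults (`padding="valid"` and stride 1 for `Conv2D`, stride equal to `pool_size` for `MaxPooling2D`), and the helper names are hypothetical:

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    # "valid" padding: the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1


def pool_out(size: int, pool: int) -> int:
    # MaxPooling2D default: stride == pool size, "valid" padding
    return (size - pool) // pool + 1


def lenet_summary(filters: int, in_size: int = 28, in_channels: int = 1) -> tuple[int, int]:
    """Return (flattened feature size, trainable parameter count) for create_model()."""
    size = conv_out(in_size, 5)                   # 28 -> 24
    params = (5 * 5 * in_channels + 1) * filters  # conv 1: weights + one bias per filter
    size = pool_out(size, 2)                      # 24 -> 12
    size = conv_out(size, 5)                      # 12 -> 8
    params += (5 * 5 * filters + 1) * 16          # conv 2: 16 filters over `filters` channels
    size = pool_out(size, 2)                      # 8 -> 4
    flat = size * size * 16                       # 4 * 4 * 16 = 256
    for fan_in, fan_out in [(flat, 120), (120, 84), (84, 10)]:
        params += (fan_in + 1) * fan_out          # dense layers (Dropout adds no parameters)
    return flat, params


for f in (64, 256):  # the bounds of tune.qrandint(64, 256)
    print(f, lenet_summary(f))
```

Under these assumptions the count is 426 × filters + 41,870: 69,134 parameters at `filters=64` and 150,926 at `filters=256`, with the second convolution accounting for most of the growth.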



In [16]:
import time
import warnings; warnings.filterwarnings("ignore")

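config = {
    "filters": tune.qrandint(64, 256),          # filter count of the first Conv2D
    "lr": tune.loguniform(1e-3, 1e-1),          # Adam learning rate, sampled on a log scale
    "batch_size": tune.choice([64, 128, 256]),
    "dropout": tune.uniform(0, 1),              # dropout rate after the 120-unit Dense layer
}


def run_tune(
        config,
        bayesian_opt=None,
        hyperband_sch=None,
        num_samples=12,
        local_storage_path=os.path.join(os.getcwd(), "logs"),
):
    """Run one Ray Tune experiment, then print its wall-clock time and best config.

    With no search algorithm, Ray's default sampler draws `num_samples` random
    configurations from `config`; `bayesian_opt` swaps in a searcher and
    `hyperband_sch` a trial scheduler.
    """
    import warnings; warnings.filterwarnings("ignore")

    start = time.time()

    analysis = tune.run(
        train_mnist,
        config=config,
        num_samples=num_samples,
        metric="accuracy",  # the metric that `mode` (and any scheduler) optimizes
        mode="max",
        resources_per_trial={
            "cpu": 10,
            "gpu": 1,
        },
        storage_path=local_storage_path,
        verbose=1,
        search_alg=bayesian_opt,
        scheduler=hyperband_sch,
    )

    end = time.time()

    # Best hyperparameters by final reported accuracy
    best_model_params = analysis.get_best_config(metric="accuracy", mode="max")

    print(f"\nTime elapsed: {end - start:.3f} s")
    print(f"\n- best model params: {best_model_params}")

    return analysis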
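The search space mixes four Ray Tune sampling primitives. The practical difference between `loguniform` and `uniform` is easiest to see numerically: a log-uniform draw puts equal probability on each decade. The sketch below imitates the four distributions with Python's `random` module (an approximation for illustration, not Ray's own sampling code) and checks how often `lr` falls below 1e-2:

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    # Plain-Python stand-ins for the distributions in `config` (not Ray's implementation)
    return {
        "filters": rng.randint(64, 256),  # like tune.qrandint(64, 256) with q=1
        "lr": 10 ** rng.uniform(math.log10(1e-3), math.log10(1e-1)),  # like tune.loguniform
        "batch_size": rng.choice([64, 128, 256]),  # like tune.choice
        "dropout": rng.uniform(0.0, 1.0),  # like tune.uniform(0, 1)
    }


rng = random.Random(0)
samples = [sample_config(rng) for _ in range(100_000)]
frac_log = sum(s["lr"] < 1e-2 for s in samples) / len(samples)

# For a plain uniform(1e-3, 1e-1), the same probability follows from the interval lengths
frac_uniform = (1e-2 - 1e-3) / (1e-1 - 1e-3)

print(f"log-uniform: P(lr < 1e-2) ~ {frac_log:.3f}")    # close to 0.5
print(f"uniform:     P(lr < 1e-2) = {frac_uniform:.3f}")  # about 0.091
```

This matters here because all of the near-chance trials in the results below (accuracy ≈ 0.10 to 0.11) have `lr` above 0.01, so concentrating half of the samples in [1e-3, 1e-2] seems a sensible choice.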



In [17]:
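len(GPUtil.getGPUs())  # number of visible GPUs (not echoed: only a cell's last expression is shown)

os.cpu_count()         # number of CPU cores, shown below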

48

In [18]:
from ray.tune.schedulers import HyperBandScheduler
from ray.tune.search.optuna import OptunaSearch

# "Bayesian" search via Optuna, whose default sampler is TPE (Tree-structured Parzen
# Estimator), a sequential model-based optimization method.
bayesian_optimizer = OptunaSearch(metric='accuracy', mode='max')
# HyperBand stops trials early based on reported results (default time_attr: training_iteration)
hyperband_scheduler = HyperBandScheduler()
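Hyperband hedges over how aggressively to stop trials by running several brackets of successive halving. The bracket schedule from the original Hyperband paper (Li et al.) can be computed directly. This sketch assumes `max_t=81` and `reduction_factor=3`, taken here to be `HyperBandScheduler`'s defaults, with resources counted in `training_iteration`; Ray's scheduler fills brackets as trials arrive, so its exact numbers may differ:

```python
def hyperband_brackets(max_t: int, eta: int) -> dict[int, list[tuple[int, float]]]:
    """Bracket s -> list of (trials alive, iterations per trial) for each halving round."""
    # s_max = floor(log_eta(max_t)), computed with integers to avoid float log error
    s_max = 0
    while eta ** (s_max + 1) <= max_t:
        s_max += 1
    brackets = {}
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) * eta**s / (s + 1)) trials start in bracket s
        n = -(-(s_max + 1) * eta ** s // (s + 1))
        r = max_t / eta ** s  # iterations each trial gets in the first round
        brackets[s] = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
    return brackets


for s, rounds in hyperband_brackets(max_t=81, eta=3).items():
    print(f"bracket {s}: {rounds}")
```

The most aggressive bracket starts 81 trials at 1 iteration each and lets only one reach 81 iterations. In this notebook, though, `train_mnist` reports once after all 10 epochs, so every trial finishes at `training_iteration` 1 (the `iter` column in the result tables) and the scheduler has no intermediate results on which to stop a trial early.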


In [20]:
import warnings; warnings.filterwarnings("ignore")

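# "Grid Search" baseline: with no search_alg, Ray's default sampler (BasicVariantGenerator)
# draws num_samples random configurations from `config`, so this is effectively random search.
results_grid = run_tune(
    config,
    num_samples=12,
)


results_grid.dataframe().drop(columns="date").sort_values(by="accuracy", ascending=False)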

| Field | Value |
|---|---|
| Current time | 2024-04-26 21:40:24 |
| Running for | 00:01:40.84 |
| Memory | 22.7/377.1 GiB |

| Trial name | status | loc | batch_size | dropout | filters | lr | iter | total time (s) | accuracy | loss |
|---|---|---|---|---|---|---|---|---|---|---|
| train_mnist_e201d_00000 | TERMINATED | 10.32.35.93:3012766 | 128 | 0.160794 | 188 | 0.0743088 | 1 | 25.5081 | 0.1135 | 2.30578 |
| train_mnist_e201d_00001 | TERMINATED | 10.32.35.93:3012767 | 64 | 0.0338731 | 235 | 0.0350346 | 1 | 33.8229 | 0.1028 | 2.30461 |
| train_mnist_e201d_00002 | TERMINATED | 10.32.35.93:3012768 | 64 | 0.737601 | 85 | 0.0236457 | 1 | 24.6468 | 0.7675 | 1.02857 |
| train_mnist_e201d_00003 | TERMINATED | 10.32.35.93:3012769 | 64 | 0.733414 | 161 | 0.00121864 | 1 | 29.6916 | 0.9916 | 0.0500076 |
| train_mnist_e201d_00004 | TERMINATED | 10.32.35.93:3014952 | 64 | 0.790922 | 138 | 0.0759002 | 1 | 27.8317 | 0.101 | 2.3071 |
| train_mnist_e201d_00005 | TERMINATED | 10.32.35.93:3015088 | 128 | 0.128914 | 208 | 0.0268274 | 1 | 26.5805 | 0.9563 | 0.189966 |
| train_mnist_e201d_00006 | TERMINATED | 10.32.35.93:3015303 | 128 | 0.802586 | 251 | 0.00192634 | 1 | 29.3589 | 0.9921 | 0.0872715 |
| train_mnist_e201d_00007 | TERMINATED | 10.32.35.93:3015644 | 64 | 0.57216 | 224 | 0.00492556 | 1 | 32.4466 | 0.9861 | 0.072259 |
| train_mnist_e201d_00008 | TERMINATED | 10.32.35.93:3017182 | 256 | 0.400627 | 77 | 0.0296665 | 1 | 16.8184 | 0.1135 | 2.30267 |
| train_mnist_e201d_00009 | TERMINATED | 10.32.35.93:3017351 | 64 | 0.353273 | 134 | 0.00603414 | 1 | 27.5393 | 0.9826 | 0.0651929 |


You may want to consider increasing the `CheckpointConfig(num_to_keep)` or decreasing the frequency of saving checkpoints.
You can suppress this error by setting the environment variable TUNE_WARN_EXCESSIVE_EXPERIMENT_CHECKPOINT_SYNC_THRESHOLD_S to a smaller value than the current threshold (5.0).
2024-04-26 21:40:24,676	INFO tune.py:1004 -- Wrote the latest version of all result files and experiment state to '/scratch/dan9232/ADS/homework06/p2/logs/train_mnist_2024-04-26_21-38-43' in 0.0043s.
2024-04-26 21:40:24,680	INFO tune.py:1036 -- Total run time: 100.86 seconds (100.84 seconds for the tuning loop).



Time elapsed: 100.876 s

- best model params: {'filters': 251, 'lr': 0.0019263435824004797, 'batch_size': 128, 'dropout': 0.8025858331420281}


|  | accuracy | loss | timestamp | checkpoint_dir_name | done | training_iteration | trial_id | time_this_iter_s | time_total_s | pid | hostname | node_ip | time_since_restore | iterations_since_restore | config/filters | config/lr | config/batch_size | config/dropout | logdir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 0.9921 | 0.087272 | 1714181989 |  | False | 1 | e201d_00006 | 29.358913 | 29.358913 | 3015303 | gr034.hpc.nyu.edu | 10.32.35.93 | 29.358913 | 1 | 251 | 0.001926 | 128 | 0.802586 | e201d_00006 |
| 3 | 0.9916 | 0.050008 | 1714181956 |  | False | 1 | e201d_00003 | 29.691643 | 29.691643 | 3012769 | gr034.hpc.nyu.edu | 10.32.35.93 | 29.691643 | 1 | 161 | 0.001219 | 64 | 0.733414 | e201d_00003 |
| 7 | 0.9861 | 0.072259 | 1714181996 |  | False | 1 | e201d_00007 | 32.446572 | 32.446572 | 3015644 | gr034.hpc.nyu.edu | 10.32.35.93 | 32.446572 | 1 | 224 | 0.004926 | 64 | 0.57216 | e201d_00007 |
| 9 | 0.9826 | 0.065193 | 1714182016 |  | False | 1 | e201d_00009 | 27.539263 | 27.539263 | 3017351 | gr034.hpc.nyu.edu | 10.32.35.93 | 27.539263 | 1 | 134 | 0.006034 | 64 | 0.353273 | e201d_00009 |
| 11 | 0.971 | 0.180528 | 1714182024 |  | False | 1 | e201d_00011 | 23.66416 | 23.66416 | 3018245 | gr034.hpc.nyu.edu | 10.32.35.93 | 23.66416 | 1 | 76 | 0.016282 | 64 | 0.238265 | e201d_00011 |
| 5 | 0.9563 | 0.189966 | 1714181984 |  | False | 1 | e201d_00005 | 26.580486 | 26.580486 | 3015088 | gr034.hpc.nyu.edu | 10.32.35.93 | 26.580486 | 1 | 208 | 0.026827 | 128 | 0.128914 | e201d_00005 |
| 2 | 0.7675 | 1.028566 | 1714181951 |  | False | 1 | e201d_00002 | 24.646823 | 24.646823 | 3012768 | gr034.hpc.nyu.edu | 10.32.35.93 | 24.646823 | 1 | 85 | 0.023646 | 64 | 0.737601 | e201d_00002 |
| 0 | 0.1135 | 2.305776 | 1714181952 |  | False | 1 | e201d_00000 | 25.508111 | 25.508111 | 3012766 | gr034.hpc.nyu.edu | 10.32.35.93 | 25.508111 | 1 | 188 | 0.074309 | 128 | 0.160794 | e201d_00000 |
| 10 | 0.1135 | 2.30911 | 1714182024 |  | False | 1 | e201d_00010 | 31.223721 | 31.223721 | 3017743 | gr034.hpc.nyu.edu | 10.32.35.93 | 31.223721 | 1 | 197 | 0.089952 | 64 | 0.582926 | e201d_00010 |
| 8 | 0.1135 | 2.30267 | 1714182003 |  | False | 1 | e201d_00008 | 16.818377 | 16.818377 | 3017182 | gr034.hpc.nyu.edu | 10.32.35.93 | 16.818377 | 1 | 77 | 0.029666 | 256 | 0.400627 | e201d_00008 |
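Because `config` contains only sampling distributions and no `search_alg` is passed, this run draws 12 random configurations: it is random search rather than a grid. An exhaustive grid in Ray Tune is written with `tune.grid_search([...])` for each parameter, and Ray then runs the full grid `num_samples` times. The cost grows multiplicatively, as this plain-Python enumeration of a hypothetical three-values-per-parameter grid shows:

```python
import itertools

# Hypothetical grid: three values for each of the four tuned hyperparameters
grid = {
    "filters": [64, 128, 256],
    "lr": [1e-3, 1e-2, 1e-1],
    "batch_size": [64, 128, 256],
    "dropout": [0.0, 0.25, 0.5],
}


def enumerate_grid(space: dict) -> list[dict]:
    # Cartesian product of all value lists, one dict per configuration
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


configs = enumerate_grid(grid)
print(len(configs))       # 81 configurations (3**4)
print(len(configs) * 12)  # 972 trials if the grid were repeated num_samples=12 times
```

With a fixed budget of 12 trials, random search still tries 12 distinct values of each continuous parameter, whereas a 12-point grid over four parameters cannot even give every parameter two values (2**4 = 16 > 12).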


In [10]:
import warnings; warnings.filterwarnings("ignore")

# HyperBand Search
results_hyperband = run_tune(
    config,
    num_samples=12,
    hyperband_sch=hyperband_scheduler,
)

results_hyperband.dataframe().drop(columns="date").sort_values(by="accuracy", ascending=False)

| Field | Value |
|---|---|
| Current time | 2024-04-26 20:20:10 |
| Running for | 00:01:37.41 |
| Memory | 22.8/377.1 GiB |

| Trial name | status | loc | batch_size | dropout | filters | lr | iter | total time (s) | accuracy | loss |
|---|---|---|---|---|---|---|---|---|---|---|
| train_mnist_aea44_00000 | TERMINATED | 10.32.35.93:2985580 | 128 | 0.623221 | 171 | 0.00142238 | 1 | 24.9648 | 0.9923 | 0.0403927 |
| train_mnist_aea44_00001 | TERMINATED | 10.32.35.93:2985581 | 64 | 0.770109 | 150 | 0.0137656 | 1 | 28.5057 | 0.9394 | 0.463789 |
| train_mnist_aea44_00002 | TERMINATED | 10.32.35.93:2985582 | 64 | 0.919748 | 150 | 0.0110355 | 1 | 28.6451 | 0.6331 | 1.10362 |
| train_mnist_aea44_00003 | TERMINATED | 10.32.35.93:2985583 | 128 | 0.747763 | 183 | 0.00215336 | 1 | 25.631 | 0.991 | 0.0562548 |
| train_mnist_aea44_00004 | TERMINATED | 10.32.35.93:2987797 | 64 | 0.981989 | 147 | 0.0225012 | 1 | 28.3656 | 0.1135 | 2.30336 |
| train_mnist_aea44_00005 | TERMINATED | 10.32.35.93:2987798 | 256 | 0.55448 | 253 | 0.00360248 | 1 | 26.6014 | 0.9913 | 0.0359921 |
| train_mnist_aea44_00006 | TERMINATED | 10.32.35.93:2988168 | 256 | 0.785563 | 75 | 0.0029725 | 1 | 16.8442 | 0.9887 | 0.0923754 |
| train_mnist_aea44_00007 | TERMINATED | 10.32.35.93:2988169 | 64 | 0.0705849 | 136 | 0.00570566 | 1 | 27.3856 | 0.9864 | 0.042377 |
| train_mnist_aea44_00008 | TERMINATED | 10.32.35.93:2989810 | 128 | 0.149281 | 203 | 0.00601169 | 1 | 26.4908 | 0.9895 | 0.0350194 |
| train_mnist_aea44_00009 | TERMINATED | 10.32.35.93:2990405 | 256 | 0.581125 | 96 | 0.0091361 | 1 | 17.8736 | 0.9861 | 0.0719443 |


2024-04-26 20:20:10,605	INFO tune.py:1004 -- Wrote the latest version of all result files and experiment state to '/scratch/dan9232/ADS/homework06/p2/logs/train_mnist_2024-04-26_20-18-33' in 0.0063s.
2024-04-26 20:20:10,610	INFO tune.py:1036 -- Total run time: 97.43 seconds (97.41 seconds for the tuning loop).



Time elapsed: 97.442 s

- best model params: {'filters': 171, 'lr': 0.0014223835207790932, 'batch_size': 128, 'dropout': 0.623221129779948}


|  | accuracy | loss | timestamp | checkpoint_dir_name | done | training_iteration | trial_id | time_this_iter_s | time_total_s | pid | hostname | node_ip | time_since_restore | iterations_since_restore | config/filters | config/lr | config/batch_size | config/dropout | logdir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.9923 | 0.040393 | 1714177141 |  | False | 1 | aea44_00000 | 24.964822 | 24.964822 | 2985580 | gr034.hpc.nyu.edu | 10.32.35.93 | 24.964822 | 1 | 171 | 0.001422 | 128 | 0.623221 | aea44_00000 |
| 5 | 0.9913 | 0.035992 | 1714177172 |  | False | 1 | aea44_00005 | 26.601362 | 26.601362 | 2987798 | gr034.hpc.nyu.edu | 10.32.35.93 | 26.601362 | 1 | 253 | 0.003602 | 256 | 0.55448 | aea44_00005 |
| 3 | 0.991 | 0.056255 | 1714177142 |  | False | 1 | aea44_00003 | 25.630996 | 25.630996 | 2985583 | gr034.hpc.nyu.edu | 10.32.35.93 | 25.630996 | 1 | 183 | 0.002153 | 128 | 0.747763 | aea44_00003 |
| 10 | 0.9907 | 0.039255 | 1714177210 |  | False | 1 | aea44_00010 | 33.112871 | 33.112871 | 2990558 | gr034.hpc.nyu.edu | 10.32.35.93 | 33.112871 | 1 | 231 | 0.001036 | 64 | 0.60503 | aea44_00010 |
| 8 | 0.9895 | 0.035019 | 1714177195 |  | False | 1 | aea44_00008 | 26.490802 | 26.490802 | 2989810 | gr034.hpc.nyu.edu | 10.32.35.93 | 26.490802 | 1 | 203 | 0.006012 | 128 | 0.149281 | aea44_00008 |
| 6 | 0.9887 | 0.092375 | 1714177165 |  | False | 1 | aea44_00006 | 16.844218 | 16.844218 | 2988168 | gr034.hpc.nyu.edu | 10.32.35.93 | 16.844218 | 1 | 75 | 0.002972 | 256 | 0.785563 | aea44_00006 |
| 7 | 0.9864 | 0.042377 | 1714177175 |  | False | 1 | aea44_00007 | 27.3856 | 27.3856 | 2988169 | gr034.hpc.nyu.edu | 10.32.35.93 | 27.3856 | 1 | 136 | 0.005706 | 64 | 0.070585 | aea44_00007 |
| 9 | 0.9861 | 0.071944 | 1714177193 |  | False | 1 | aea44_00009 | 17.873629 | 17.873629 | 2990405 | gr034.hpc.nyu.edu | 10.32.35.93 | 17.873629 | 1 | 96 | 0.009136 | 256 | 0.581125 | aea44_00009 |
| 1 | 0.9394 | 0.463789 | 1714177145 |  | False | 1 | aea44_00001 | 28.505673 | 28.505673 | 2985581 | gr034.hpc.nyu.edu | 10.32.35.93 | 28.505673 | 1 | 150 | 0.013766 | 64 | 0.770109 | aea44_00001 |
| 2 | 0.6331 | 1.103617 | 1714177145 |  | False | 1 | aea44_00002 | 28.645146 | 28.645146 | 2985582 | gr034.hpc.nyu.edu | 10.32.35.93 | 28.645146 | 1 | 150 | 0.011035 | 64 | 0.919748 | aea44_00002 |
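Successive halving is the building block inside each Hyperband bracket: train every trial briefly, keep the best 1/eta, and repeat with eta times the budget. It only saves work when trials report intermediate results, for example once per epoch. A plain-Python sketch over hypothetical per-epoch validation accuracies (the `curves` data is made up for illustration):

```python
def successive_halving(curves: dict[str, list[float]], eta: int = 3, first_rung: int = 1):
    """Successive halving over pre-recorded learning curves (higher is better).

    Assumes every curve has the same number of epochs. Returns (epochs_used, survivors).
    """
    max_epochs = max(len(c) for c in curves.values())
    alive = list(curves)
    epochs_used = {}
    rung = first_rung
    while True:
        rung = min(rung, max_epochs)
        for trial in alive:
            epochs_used[trial] = rung  # survivors are trained up to the current rung
        if len(alive) <= 1 or rung == max_epochs:
            return epochs_used, alive
        # Rank by the metric reported at this rung and keep the top 1/eta
        alive.sort(key=lambda t: curves[t][rung - 1], reverse=True)
        alive = alive[: max(1, len(alive) // eta)]
        rung *= eta


# Hypothetical curves: 9 trials x 10 epochs, trial_i slightly better than trial_(i-1)
curves = {f"trial_{i}": [0.80 + 0.01 * i + 0.005 * epoch for epoch in range(10)] for i in range(9)}
epochs_used, survivors = successive_halving(curves, eta=3)
print(survivors)                  # ['trial_8']
print(sum(epochs_used.values()))  # 21 epochs in total, versus 9 * 10 = 90 without pruning
```

Since `train_mnist` reports only once, every trial in the HyperBand table above runs its full 10 epochs, which is consistent with that run's wall time being close to the baseline's.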


In [11]:
import warnings; warnings.filterwarnings("ignore")

# Bayesian Search (Optuna); note num_samples=10 here versus 12 for the other two runs
results_bayesian = run_tune(
    config,
    num_samples=10,
    bayesian_opt=bayesian_optimizer,
)

results_bayesian.dataframe().drop(columns="date").sort_values(by="accuracy", ascending=False)

| Field | Value |
|---|---|
| Current time | 2024-04-26 20:21:40 |
| Running for | 00:01:29.40 |
| Memory | 22.7/377.1 GiB |

| Trial name | status | loc | batch_size | dropout | filters | lr | iter | total time (s) | accuracy | loss |
|---|---|---|---|---|---|---|---|---|---|---|
| train_mnist_73b47ed1 | TERMINATED | 10.32.35.93:2992841 | 128 | 0.839625 | 192 | 0.00206371 | 1 | 25.8071 | 0.989 | 0.124486 |
| train_mnist_4a37088a | TERMINATED | 10.32.35.93:2992918 | 128 | 0.450035 | 132 | 0.00279648 | 1 | 22.2968 | 0.991 | 0.0329273 |
| train_mnist_1f05c7c4 | TERMINATED | 10.32.35.93:2993081 | 256 | 0.902373 | 154 | 0.0087493 | 1 | 21.1001 | 0.8107 | 0.499097 |
| train_mnist_bb925d26 | TERMINATED | 10.32.35.93:2993246 | 64 | 0.885024 | 249 | 0.0488009 | 1 | 34.6294 | 0.1028 | 2.30558 |
| train_mnist_4ab9e82a | TERMINATED | 10.32.35.93:2994895 | 64 | 0.316036 | 203 | 0.0137987 | 1 | 31.8664 | 0.1135 | 2.30267 |
| train_mnist_a626b13e | TERMINATED | 10.32.35.93:2995061 | 128 | 0.822432 | 104 | 0.00274735 | 1 | 20.7596 | 0.987 | 0.120426 |
| train_mnist_d3432c91 | TERMINATED | 10.32.35.93:2995292 | 64 | 0.853038 | 117 | 0.0912201 | 1 | 26.4623 | 0.0974 | 2.30998 |
| train_mnist_c9887fef | TERMINATED | 10.32.35.93:2996159 | 64 | 0.221157 | 184 | 0.0173998 | 1 | 29.9994 | 0.1135 | 2.30266 |
| train_mnist_c6b4da48 | TERMINATED | 10.32.35.93:2996849 | 256 | 0.140748 | 74 | 0.0084749 | 1 | 16.6189 | 0.9849 | 0.0363487 |
| train_mnist_d38aad0b | TERMINATED | 10.32.35.93:2997492 | 128 | 0.302039 | 122 | 0.0447364 | 1 | 21.6929 | 0.1135 | 2.30451 |


2024-04-26 20:21:40,071	INFO tune.py:1004 -- Wrote the latest version of all result files and experiment state to '/scratch/dan9232/ADS/homework06/p2/logs/train_mnist_2024-04-26_20-20-10' in 0.0051s.
2024-04-26 20:21:40,075	INFO tune.py:1036 -- Total run time: 89.42 seconds (89.40 seconds for the tuning loop).



Time elapsed: 89.430 s

- best model params: {'filters': 132, 'lr': 0.0027964818759629276, 'batch_size': 128, 'dropout': 0.45003453184382947}


|  | accuracy | loss | timestamp | checkpoint_dir_name | done | training_iteration | trial_id | time_this_iter_s | time_total_s | pid | hostname | node_ip | time_since_restore | iterations_since_restore | config/filters | config/lr | config/batch_size | config/dropout | logdir |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.991 | 0.032927 | 1714177239 |  | False | 1 | 4a37088a | 22.29678 | 22.29678 | 2992918 | gr034.hpc.nyu.edu | 10.32.35.93 | 22.29678 | 1 | 132 | 0.002796 | 128 | 0.450035 | 4a37088a |
| 0 | 0.989 | 0.124486 | 1714177239 |  | False | 1 | 73b47ed1 | 25.807137 | 25.807137 | 2992841 | gr034.hpc.nyu.edu | 10.32.35.93 | 25.807137 | 1 | 192 | 0.002064 | 128 | 0.839625 | 73b47ed1 |
| 5 | 0.987 | 0.120426 | 1714177266 |  | False | 1 | a626b13e | 20.759553 | 20.759553 | 2995061 | gr034.hpc.nyu.edu | 10.32.35.93 | 20.759553 | 1 | 104 | 0.002747 | 128 | 0.822432 | a626b13e |
| 8 | 0.9849 | 0.036349 | 1714177286 |  | False | 1 | c6b4da48 | 16.618863 | 16.618863 | 2996849 | gr034.hpc.nyu.edu | 10.32.35.93 | 16.618863 | 1 | 74 | 0.008475 | 256 | 0.140748 | c6b4da48 |
| 2 | 0.8107 | 0.499097 | 1714177240 |  | False | 1 | 1f05c7c4 | 21.100128 | 21.100128 | 2993081 | gr034.hpc.nyu.edu | 10.32.35.93 | 21.100128 | 1 | 154 | 0.008749 | 256 | 0.902373 | 1f05c7c4 |
| 4 | 0.1135 | 2.302668 | 1714177274 |  | False | 1 | 4ab9e82a | 31.866418 | 31.866418 | 2994895 | gr034.hpc.nyu.edu | 10.32.35.93 | 31.866418 | 1 | 203 | 0.013799 | 64 | 0.316036 | 4ab9e82a |
| 9 | 0.1135 | 2.304512 | 1714177300 |  | False | 1 | d38aad0b | 21.692935 | 21.692935 | 2997492 | gr034.hpc.nyu.edu | 10.32.35.93 | 21.692935 | 1 | 122 | 0.044736 | 128 | 0.302039 | d38aad0b |
| 7 | 0.1135 | 2.302659 | 1714177291 |  | False | 1 | c9887fef | 29.999368 | 29.999368 | 2996159 | gr034.hpc.nyu.edu | 10.32.35.93 | 29.999368 | 1 | 184 | 0.0174 | 64 | 0.221157 | c9887fef |
| 3 | 0.1028 | 2.30558 | 1714177257 |  | False | 1 | bb925d26 | 34.629407 | 34.629407 | 2993246 | gr034.hpc.nyu.edu | 10.32.35.93 | 34.629407 | 1 | 249 | 0.048801 | 64 | 0.885024 | bb925d26 |
| 6 | 0.0974 | 2.309982 | 1714177275 |  | False | 1 | d3432c91 | 26.46228 | 26.46228 | 2995292 | gr034.hpc.nyu.edu | 10.32.35.93 | 26.46228 | 1 | 117 | 0.09122 | 64 | 0.853038 | d3432c91 |
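The Bayesian run delegates proposals to Optuna, whose default sampler (used by Ray's `OptunaSearch` when no `sampler` is given) is the Tree-structured Parzen Estimator (TPE). TPE splits past trials into a "good" fraction and the rest, fits a density to each, and proposes points where the good density is high relative to the bad one. Below is a deliberately simplified one-dimensional version (fixed Gaussian bandwidth, no prior, and a made-up objective over log10 of the learning rate); it is not Optuna's implementation:

```python
import math
import random


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def parzen_density(x: float, centers: list[float], bandwidth: float) -> float:
    # Equal-weight mixture of Gaussians centred on past observations
    return sum(gaussian_pdf(x, c, bandwidth) for c in centers) / len(centers)


def tpe_propose(history, low, high, rng, gamma=0.25, n_candidates=24):
    """Propose the next x from (x, score) pairs, where a higher score is better."""
    ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]         # observations defining l(x)
    bad = [x for x, _ in ranked[n_good:]] or good  # observations defining g(x)
    bandwidth = (high - low) / 10                  # fixed bandwidth: a simplification
    best_x, best_ratio = low, -1.0
    for _ in range(n_candidates):
        # Sample a candidate from l(x) and clip it to the search bounds
        x = min(max(rng.choice(good) + rng.gauss(0.0, bandwidth), low), high)
        ratio = parzen_density(x, good, bandwidth) / (parzen_density(x, bad, bandwidth) + 1e-12)
        if ratio > best_ratio:
            best_x, best_ratio = x, ratio
    return best_x


def toy_objective(log10_lr: float) -> float:
    # Hypothetical stand-in for validation accuracy, peaking at lr = 2e-3
    return -(log10_lr - math.log10(2e-3)) ** 2


rng = random.Random(0)
low, high = math.log10(1e-3), math.log10(1e-1)  # bounds of tune.loguniform(1e-3, 1e-1)
history = []
for _ in range(5):  # random warm-up trials
    x = rng.uniform(low, high)
    history.append((x, toy_objective(x)))
for _ in range(15):  # model-guided trials
    x = tpe_propose(history, low, high, rng)
    history.append((x, toy_objective(x)))

best_x, best_score = max(history, key=lambda pair: pair[1])
print(f"best lr found: {10 ** best_x:.5f}")
```

Optuna's TPE also handles integer and categorical parameters such as `filters` and `batch_size`, which this sketch omits.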


**Results of Best Model**

- Grid Search (Ray's default random sampler, 12 trials)
  - Time elapsed: 100.876 s
  - Accuracy: 0.9921
  - Best Params
    - 'filters': 251
    - 'lr': 0.001926
    - 'batch_size': 128
    - 'dropout': 0.802586
- HyperBand Search (12 trials)
  - Time elapsed: 97.442 s
  - Accuracy: 0.9923
  - Best Params
    - 'filters': 171
    - 'lr': 0.001422
    - 'batch_size': 128
    - 'dropout': 0.623221
- Bayesian Search (Optuna, 10 trials)
  - Time elapsed: 89.430 s
  - Accuracy: 0.991
  - Best Params
    - 'filters': 132
    - 'lr': 0.002796
    - 'batch_size': 128
    - 'dropout': 0.450035
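The three runs did not use the same trial budget (12, 12 and 10), so raw wall time understates the Bayesian run's cost per trial. Normalizing the logged `Time elapsed` values by `num_samples` is simple arithmetic. Trials ran concurrently (10 CPUs and 1 GPU each), so this is wall time per unit of budget, not per-trial training time:

```python
# "Time elapsed" and best accuracy copied from the three run logs above
runs = {
    "Grid (default random sampler)": {"seconds": 100.876, "trials": 12, "best_accuracy": 0.9921},
    "HyperBand": {"seconds": 97.442, "trials": 12, "best_accuracy": 0.9923},
    "Bayesian (Optuna)": {"seconds": 89.430, "trials": 10, "best_accuracy": 0.991},
}

# Wall-clock seconds per trial of budget
per_trial = {name: r["seconds"] / r["trials"] for name, r in runs.items()}

for name, seconds in sorted(per_trial.items(), key=lambda kv: kv[1]):
    print(f"{name:<30} {seconds:5.2f} s per trial   best accuracy {runs[name]['best_accuracy']:.4f}")
```

This gives about 8.12 s (HyperBand), 8.41 s (grid) and 8.94 s (Bayesian) per trial, so the Bayesian run moves from fastest in raw wall time to slowest per trial. With a single run per method, differences this small may well be noise.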
