# Problem 3 - Ray Tune for Hyperparameter Optimization (10 points)


In this problem, we will compare the performance of Grid Search, Bayesian Search, and Hyperband for hyperparameter optimization of a deep learning model using Ray Tune. We will use the MNIST dataset along with the LeNet model. The hyperparameters to tune are:

• Number of filters in the first Conv2d layer: 64 to 256 

• Learning Rate: 0.001 to 0.1

• Batch Size: 64,128,256

• Dropout: probability between 0 and 1

Use Ray Tune (https://docs.ray.io/en/latest/tune/index.html) for the search. You can use the same resources per trial and metric as those in Lab 8 and Lab 10 in class.
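
To make the four ranges concrete, here is a minimal pure-Python sketch that draws one random configuration from them. The function name `sample_config` and the log-scale draw for the learning rate are assumptions made for illustration; they are not part of the assignment or of Ray Tune.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter configuration from the ranges listed above."""
    return {
        # Integer number of filters in the first Conv2d layer, 64 to 256 inclusive.
        "conv_filters": rng.randint(64, 256),
        # Learning rate between 0.001 and 0.1, drawn on a log scale (an assumption).
        "lr": 10 ** rng.uniform(math.log10(0.001), math.log10(0.1)),
        # Batch size is one of the three listed values.
        "batch_size": rng.choice([64, 128, 256]),
        # Dropout probability in [0, 1).
        "dropout": rng.random(),
    }


rng = random.Random(0)  # fixed seed so the draw is reproducible
config = sample_config(rng)
print(config)
```

A log-scale draw spends as many samples between 0.001 and 0.01 as between 0.01 and 0.1, which is a common choice when a range spans several orders of magnitude.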

In [1]:
!pip install "ray[tune]"
!pip install pyarrow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Operation cancelled by user
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras import layers, models

import ray
from ray import tune
from ray.tune.integration.keras import TuneReportCallback
from ray.tune.schedulers import HyperBandScheduler

In [4]:
def train_mnist(config):
    batch_size = int(config['batch_size'])
    epochs = 2  # each epoch reports one training_iteration to Tune

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # Scale pixels to [0, 1] and add the channel axis that Conv2D expects.
    x_train = (x_train / 255.0)[..., None]
    x_test = (x_test / 255.0)[..., None]
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(filters=int(config["conv_filters"]), kernel_size=(5, 5), activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(filters=48, kernel_size=(5,5), padding='valid', activation='relu'),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(config["dropout"]),
        tf.keras.layers.Dense(84, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax")
    ])

    model.compile(
        loss="sparse_categorical_crossentropy",
        # `lr` is deprecated in TF 2.x; `learning_rate` is the supported name.
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=config["lr"]),
        metrics=["accuracy"])

    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=0,
        validation_data=(x_test, y_test),
        # Report the Keras training accuracy to Tune under both metric names
        # used by the searches in this notebook.
        callbacks=[TuneReportCallback({
            "accuracy": "accuracy",
            "mean_accuracy": "accuracy"
        })])

1. Perform Grid Search, Bayesian Search, and Hyperband for the given hyperparameter configurations. For Grid Search, you can either sample uniformly between the given ranges or specify a list of values in the given range (e.g., filters = [64, 128, 256], lr = [0.001, 0.01, 0.1], etc.). (6)
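
As a rough illustration of what a list-based grid implies, the sketch below enumerates every combination with `itertools.product`. It is plain Python rather than Ray Tune, and the dropout values are the ones used in this notebook's grid-search cell; the variable names are illustrative.

```python
from itertools import product

grid = {
    "conv_filters": [64, 128, 256],
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [64, 128, 256],
    "dropout": [0, 0.33, 0.66],
}

# Cartesian product of all value lists: one dict per trial.
trials = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(trials))  # 3 * 3 * 3 * 3 = 81
print(trials[0])
```

With three values per hyperparameter the grid already contains 81 trials, which is why grid search tends to be the most expensive of the three techniques.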

In [10]:
# -------------------------------- Hyperband --------------------------
search_list = {
    "batch_size": tune.grid_search([64, 128, 256]),
    "conv_filters": tune.grid_search([64, 128, 256]),
    "dropout": tune.uniform(0, 1),
    "lr": tune.uniform(0.001, 0.1),
}
hyperband_scheduler = HyperBandScheduler(
    time_attr='training_iteration',
    metric='mean_accuracy',
    mode='max',
    max_t=10,
    reduction_factor=3)

Hype_analysis = tune.run(
    train_mnist,
    name="exp",
    resources_per_trial={"gpu": 1},
    config=search_list,
    scheduler=hyperband_scheduler)

Current time: 2022-12-11 19:44:06
Running for: 00:05:40.60
Memory: 2.8/12.7 GiB

| Trial name | status | loc | batch_size | conv_filters | dropout | lr | acc | iter | total time (s) | ts |
|---|---|---|---|---|---|---|---|---|---|---|
| train_mnist_6167f_00000 | TERMINATED | 172.28.0.12:1679 | 64 | 64 | 0.373763 | 0.0223396 | 0.1085 | 10 | 43.1493 | |
| train_mnist_6167f_00001 | TERMINATED | 172.28.0.12:2247 | 128 | 64 | 0.0861023 | 0.014111 | 0.978833 | 10 | 30.8554 | |
| train_mnist_6167f_00002 | TERMINATED | 172.28.0.12:2655 | 256 | 64 | 0.353302 | 0.0973313 | 0.105167 | 10 | 23.2188 | |
| train_mnist_6167f_00003 | TERMINATED | 172.28.0.12:2446 | 64 | 128 | 0.224768 | 0.0723921 | 0.105633 | 7 | 55.6887 | 0.0 |
| train_mnist_6167f_00004 | TERMINATED | 172.28.0.12:2000 | 128 | 128 | 0.685153 | 0.0622605 | 0.103967 | 3 | 16.3711 | |
| train_mnist_6167f_00005 | TERMINATED | 172.28.0.12:2548 | 256 | 128 | 0.166145 | 0.00787397 | 0.988517 | 7 | 37.3817 | 0.0 |
| train_mnist_6167f_00006 | TERMINATED | 172.28.0.12:2163 | 64 | 256 | 0.481493 | 0.0576298 | 0.10375 | 3 | 25.0217 | |
| train_mnist_6167f_00007 | TERMINATED | 172.28.0.12:2365 | 128 | 256 | 0.475957 | 0.096512 | 0.1038 | 3 | 21.1456 | |
| train_mnist_6167f_00008 | TERMINATED | 172.28.0.12:1889 | 256 | 256 | 0.735513 | 0.0843475 | 0.1022 | 7 | 36.6678 | |


[2m[36m(train_mnist pid=1679)[0m 2022-12-11 19:38:32.246458: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=1679)[0m   super(Adam, self).__init__(name, **kwargs)


Trial name,date,done,episodes_total,experiment_id,hostname,iterations_since_restore,mean_accuracy,node_ip,pid,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
train_mnist_6167f_00000,2022-12-11_19-39-13,False,,59717ba6d9ca4f13b18429358274da57,db50b83514e2,10,0.1085,172.28.0.12,1679,43.1493,3.65664,43.1493,1670787553,0,,10,6167f_00000,0.00554705
train_mnist_6167f_00001,2022-12-11_19-42-01,False,,28a3aa4120e74f19a72a9f4f1ca2a449,db50b83514e2,10,0.978833,172.28.0.12,2247,30.8554,2.60831,30.8554,1670787721,0,,10,6167f_00001,0.00332403
train_mnist_6167f_00002,2022-12-11_19-44-06,True,,7a86f0843d8c47ac9af0a5e8e250288c,db50b83514e2,10,0.105167,172.28.0.12,2655,23.2188,1.8015,23.2188,1670787846,0,,10,6167f_00002,0.0032196
train_mnist_6167f_00003,2022-12-11_19-43-08,False,0.0,4762503d03f34cdfbe0c98cfbce83fe3,db50b83514e2,7,0.105633,172.28.0.12,2446,36.9919,4.53338,55.6887,1670787788,0,0.0,7,6167f_00003,0.00726557
train_mnist_6167f_00004,2022-12-11_19-40-38,False,,7426f833d27f418dad3fe0a942636153,db50b83514e2,3,0.103967,172.28.0.12,2000,16.3711,3.69682,16.3711,1670787638,0,,3,6167f_00004,0.00346971
train_mnist_6167f_00005,2022-12-11_19-43-38,True,0.0,c547a6f7c6314421821ea4eace202598,db50b83514e2,7,0.988517,172.28.0.12,2548,24.0879,2.68242,37.3817,1670787818,0,0.0,7,6167f_00005,0.00717592
train_mnist_6167f_00006,2022-12-11_19-41-26,False,,9ce22bdb009f410d9ade3686019a602a,db50b83514e2,3,0.10375,172.28.0.12,2163,25.0217,6.66575,25.0217,1670787686,0,,3,6167f_00006,0.0050025
train_mnist_6167f_00007,2022-12-11_19-42-27,True,,0797be25ad404276ba0e63b8ff36f000,db50b83514e2,3,0.1038,172.28.0.12,2365,21.1456,5.49264,21.1456,1670787747,0,,3,6167f_00007,0.00316238
train_mnist_6167f_00008,2022-12-11_19-40-17,True,,74bdce6262304ca4bd8fdf323df210e3,db50b83514e2,7,0.1022,172.28.0.12,1889,36.6678,4.48088,36.6678,1670787617,0,,7,6167f_00008,0.00319815
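
Successive halving, the pruning step that Hyperband repeats across brackets, can be illustrated without Ray. The sketch below is a simplified toy, not Ray's `HyperBandScheduler`: it assumes each trial's score is fixed (a real scheduler re-evaluates the survivors after more training) and uses the final accuracies of the nine Hyperband trials reported above as stand-in scores. The function name `successive_halving` is hypothetical.

```python
def successive_halving(scores, eta=3):
    """Return the surviving trial names at each rung, keeping the top 1/eta each time."""
    survivors = sorted(scores, key=scores.get, reverse=True)
    rungs = [survivors]
    while len(survivors) > 1:
        keep = max(1, len(survivors) // eta)  # never drop below one survivor
        survivors = survivors[:keep]
        rungs.append(survivors)
    return rungs


# Final accuracies of the nine Hyperband trials (trial suffix -> accuracy).
scores = {
    "00000": 0.1085, "00001": 0.978833, "00002": 0.105167,
    "00003": 0.105633, "00004": 0.103967, "00005": 0.988517,
    "00006": 0.10375, "00007": 0.1038, "00008": 0.1022,
}

rungs = successive_halving(scores, eta=3)
for rung in rungs:
    print(len(rung), rung)
```

With `eta=3`, nine trials shrink to three and then to one, so most of the budget goes to the few configurations that look promising early.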


[2m[36m(train_mnist pid=1807)[0m 2022-12-11 19:39:19.666498: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=1807)[0m   super(Adam, self).__init__(name, **kwargs)
[2m[36m(train_mnist pid=1889)[0m 2022-12-11 19:39:42.667389: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=1889)[0m   super(Adam, self).__init__(name, **kwargs)
[2m[36m(train_mnist pid=2000)[0m 2022-12-11 19:40:23.800325: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=2000)[0m   super(Adam, self).__init__(name, **kwargs)


In [11]:
#---------------Gridsearch---------------------
analysis = tune.run(
        train_mnist,
        name="exp",
        metric="mean_accuracy",
        mode="max",
        stop={
            "mean_accuracy": 0.90,
        },
        resources_per_trial={
            "gpu": 1
        },
        config={
            "conv_filters": tune.grid_search([64, 128, 256]),
            "lr": tune.grid_search([0.001,0.01,0.1]),
            "batch_size": tune.grid_search([64,128,256]),
            "dropout": tune.grid_search([0, .33, .66])
        })

Current time: 2022-12-11 20:23:16
Running for: 00:39:10.02
Memory: 3.0/12.7 GiB

| Trial name | status | loc | batch_size | conv_filters | dropout | lr | acc | iter | total time (s) |
|---|---|---|---|---|---|---|---|---|---|
| train_mnist_2c8b9_00000 | TERMINATED | 172.28.0.12:2781 | 64 | 64 | 0.0 | 0.001 | 0.955933 | 1 | 8.75708 |
| train_mnist_2c8b9_00001 | TERMINATED | 172.28.0.12:2878 | 128 | 64 | 0.0 | 0.001 | 0.946267 | 1 | 7.71541 |
| train_mnist_2c8b9_00002 | TERMINATED | 172.28.0.12:2948 | 256 | 64 | 0.0 | 0.001 | 0.9279 | 1 | 7.13757 |
| train_mnist_2c8b9_00003 | TERMINATED | 172.28.0.12:3017 | 64 | 128 | 0.0 | 0.001 | 0.959417 | 1 | 9.61661 |
| train_mnist_2c8b9_00004 | TERMINATED | 172.28.0.12:3088 | 128 | 128 | 0.0 | 0.001 | 0.9483 | 1 | 8.88195 |
| train_mnist_2c8b9_00005 | TERMINATED | 172.28.0.12:3157 | 256 | 128 | 0.0 | 0.001 | 0.935833 | 1 | 7.90216 |
| train_mnist_2c8b9_00006 | TERMINATED | 172.28.0.12:3227 | 64 | 256 | 0.0 | 0.001 | 0.959083 | 1 | 11.9363 |
| train_mnist_2c8b9_00007 | TERMINATED | 172.28.0.12:3295 | 128 | 256 | 0.0 | 0.001 | 0.949617 | 1 | 11.2091 |
| train_mnist_2c8b9_00008 | TERMINATED | 172.28.0.12:3362 | 256 | 256 | 0.0 | 0.001 | 0.929783 | 1 | 9.92268 |
| train_mnist_2c8b9_00009 | TERMINATED | 172.28.0.12:3429 | 64 | 64 | 0.33 | 0.001 | 0.945617 | 1 | 8.79353 |


[2m[36m(train_mnist pid=2781)[0m 2022-12-11 19:44:13.715221: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=2781)[0m   super(Adam, self).__init__(name, **kwargs)


Trial name,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,mean_accuracy,node_ip,pid,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
train_mnist_2c8b9_00000,2022-12-11_19-44-20,True,,4e7f41b4e97041baa75ba8d33e9fa0f4,,db50b83514e2,1,0.955933,172.28.0.12,2781,8.75708,8.75708,8.75708,1670787860,0,,1,2c8b9_00000,0.00308466
train_mnist_2c8b9_00001,2022-12-11_19-44-32,True,,ccb15e6c4c2d435389f8038c0d972733,,db50b83514e2,1,0.946267,172.28.0.12,2878,7.71541,7.71541,7.71541,1670787872,0,,1,2c8b9_00001,0.00330329
train_mnist_2c8b9_00002,2022-12-11_19-44-44,True,,2587f9f05ee04dbaa175cd0eae3f6de6,,db50b83514e2,1,0.9279,172.28.0.12,2948,7.13757,7.13757,7.13757,1670787884,0,,1,2c8b9_00002,0.00329351
train_mnist_2c8b9_00003,2022-12-11_19-44-58,True,,2d3c2d857b48449cbf46c011cc70af0b,,db50b83514e2,1,0.959417,172.28.0.12,3017,9.61661,9.61661,9.61661,1670787898,0,,1,2c8b9_00003,0.003263
train_mnist_2c8b9_00004,2022-12-11_19-45-11,True,,ced839828dbc44ab85e24df4e4e5a614,,db50b83514e2,1,0.9483,172.28.0.12,3088,8.88195,8.88195,8.88195,1670787911,0,,1,2c8b9_00004,0.00336003
train_mnist_2c8b9_00005,2022-12-11_19-45-23,True,,96909d12a1304eca8775bd0b8134ca3b,,db50b83514e2,1,0.935833,172.28.0.12,3157,7.90216,7.90216,7.90216,1670787923,0,,1,2c8b9_00005,0.00443244
train_mnist_2c8b9_00006,2022-12-11_19-45-39,True,,e84918dffdf747bbaf5e297b7634194f,,db50b83514e2,1,0.959083,172.28.0.12,3227,11.9363,11.9363,11.9363,1670787939,0,,1,2c8b9_00006,0.00359702
train_mnist_2c8b9_00007,2022-12-11_19-45-55,True,,be4746c598454095984ea0e42a7576e4,,db50b83514e2,1,0.949617,172.28.0.12,3295,11.2091,11.2091,11.2091,1670787955,0,,1,2c8b9_00007,0.00762749
train_mnist_2c8b9_00008,2022-12-11_19-46-09,True,,603bc2f0d33c4a848cbf6df7f2cccf75,,db50b83514e2,1,0.929783,172.28.0.12,3362,9.92268,9.92268,9.92268,1670787969,0,,1,2c8b9_00008,0.00320888
train_mnist_2c8b9_00009,2022-12-11_19-46-22,True,,bb99c6a2c5784a1998fb898da5a9d0a7,,db50b83514e2,1,0.945617,172.28.0.12,3429,8.79353,8.79353,8.79353,1670787982,0,,1,2c8b9_00009,0.00342584


[2m[36m(train_mnist pid=2878)[0m 2022-12-11 19:44:27.050535: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=2878)[0m   super(Adam, self).__init__(name, **kwargs)
[2m[36m(train_mnist pid=2948)[0m 2022-12-11 19:44:39.116584: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=2948)[0m   super(Adam, self).__init__(name, **kwargs)
[2m[36m(train_mnist pid=3017)[0m 2022-12-11 19:44:51.017079: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
[2m[36m(train_mnist pid=3017)[0m   super(Adam, self).__init__(name, **kwargs)


In [5]:
!pip install bayesian-optimization

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bayesian-optimization
  Downloading bayesian_optimization-1.4.2-py3-none-any.whl (17 kB)
Collecting colorama>=0.4.6
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama, bayesian-optimization
Successfully installed bayesian-optimization-1.4.2 colorama-0.4.6


In [6]:
#------------ Bayesian search-----------
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.bayesopt import BayesOptSearch
algo = BayesOptSearch(utility_kwargs={"kind": "ucb", "kappa": 2.5, "xi": 0.0})
algo = ConcurrencyLimiter(algo, max_concurrent=4)

search_space = {
    "conv_filters": tune.uniform(64, 256),
    "lr": tune.uniform(0.0009, 0.0011),
    "batch_size": tune.uniform(64, 256),
    "dropout": tune.uniform(0, 1)
}
tuner = tune.Tuner(
    train_mnist,
    tune_config=tune.TuneConfig(
        metric="accuracy",
        mode="max",
        search_alg=algo,
        num_samples=2,
    ),
    param_space=search_space
)
results = tuner.fit()



Current time: 2022-12-11 22:49:28
Running for: 00:13:45.29
Memory: 1.9/12.7 GiB

| Trial name | status | loc | batch_size | conv_filters | dropout | lr | iter | total time (s) | accuracy |
|---|---|---|---|---|---|---|---|---|---|
| train_mnist_2599aa8e | TERMINATED | 172.28.0.12:1002 | 135.912 | 246.537 | 0.731994 | 0.00101973 | 2 | 819.982 | 0.973117 |
| train_mnist_28c03b9c | TERMINATED | 172.28.0.12:1041 | 93.9556 | 93.9509 | 0.0580836 | 0.00107324 | 2 | 508.505 | 0.986483 |


[2m[36m(train_mnist pid=1002)[0m 2022-12-11 22:35:49.686094: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[2m[36m(train_mnist pid=1002)[0m   super(Adam, self).__init__(name, **kwargs)
[2m[36m(train_mnist pid=1041)[0m 2022-12-11 22:35:57.020464: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
[2m[36m(train_mnist pid=1041)[0m   super(Adam, self).__init__(name, **kwargs)


Trial name,accuracy,date,done,episodes_total,experiment_id,experiment_tag,hostname,iterations_since_restore,node_ip,pid,time_since_restore,time_this_iter_s,time_total_s,timestamp,timesteps_since_restore,timesteps_total,training_iteration,trial_id,warmup_time
train_mnist_2599aa8e,0.973117,2022-12-11_22-49-28,True,,b2650d94fd584f758090ffdb71fbcadd,"1_batch_size=135.9117,conv_filters=246.5371,dropout=0.7320,lr=0.0010",3ce0fe93336a,2,172.28.0.12,1002,819.982,284.745,819.982,1670798968,0,,2,2599aa8e,0.00346756
train_mnist_28c03b9c,0.986483,2022-12-11_22-44-24,True,,84e0b431ab1241aca1298ab06ef095ca,"2_batch_size=93.9556,conv_filters=93.9509,dropout=0.0581,lr=0.0011",3ce0fe93336a,2,172.28.0.12,1041,508.505,253.163,508.505,1670798664,0,,2,28c03b9c,0.0161686


2022-12-11 22:49:28,479	INFO tune.py:777 -- Total run time: 825.43 seconds (825.29 seconds for the tuning loop).
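
The Bayesian trials above report fractional values such as `batch_size=135.912` because the search space is continuous; `train_mnist` then truncates them with `int()`. One alternative, sketched here as an assumption rather than what this notebook did, is to snap each suggestion to the values the problem statement allows before building the model. The helper `snap_config` is hypothetical.

```python
ALLOWED_BATCH_SIZES = (64, 128, 256)


def snap_config(config):
    """Map a continuous suggestion onto the discrete values the model accepts."""
    snapped = dict(config)
    # Pick the allowed batch size closest to the suggested value.
    snapped["batch_size"] = min(
        ALLOWED_BATCH_SIZES, key=lambda b: abs(b - config["batch_size"])
    )
    # Filter counts must be integers within 64..256.
    snapped["conv_filters"] = max(64, min(256, round(config["conv_filters"])))
    return snapped


# First Bayesian suggestion from the trial table above.
suggestion = {"batch_size": 135.912, "conv_filters": 246.537,
              "dropout": 0.731994, "lr": 0.00101973}
snapped = snap_config(suggestion)
print(snapped)
```

Snapping keeps the optimizer's continuous view of the space while ensuring that every trained model uses a batch size from the assignment's list.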


2. For each search technique in part 1, display the time taken to perform the analysis and the hyperparameters of the best model. (2)


In [13]:
# Hyperband best model
print(Hype_analysis.get_best_config("mean_accuracy", "max"))
# Grid Search best model
print(analysis.best_config)

{'batch_size': 256, 'conv_filters': 128, 'dropout': 0.16614474270272972, 'lr': 0.0078739746012135}
{'conv_filters': 128, 'lr': 0.01, 'batch_size': 256, 'dropout': 0}


In [12]:
# Bayesian Search--------------------
results.get_best_result("accuracy", "max")

Result(metrics={'accuracy': 0.9864833354949951, 'done': True, 'trial_id': '28c03b9c', 'experiment_tag': '2_batch_size=93.9556,conv_filters=93.9509,dropout=0.0581,lr=0.0011'}, error=None, log_dir=PosixPath('/root/ray_results/train_mnist_2022-12-11_22-35-43/train_mnist_28c03b9c_2_batch_size=93.9556,conv_filters=93.9509,dropout=0.0581,lr=0.0011_2022-12-11_22-35-48'))
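
Part 2 also asks for the time taken by each search. As a minimal sketch, the `Running for` values printed in the status output of the three runs can be converted to seconds and compared. The helper `to_seconds` is hypothetical; the three duration strings are copied from the outputs in this notebook.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS.ss' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# 'Running for' values reported by each search.
running_for = {
    "Hyperband": "00:05:40.60",
    "Grid Search": "00:39:10.02",
    "Bayesian Search": "00:13:45.29",
}

durations = {name: to_seconds(t) for name, t in running_for.items()}
# Print the searches from fastest to slowest.
for name, secs in sorted(durations.items(), key=lambda kv: kv[1]):
    print(f"{name}: {secs:.2f} s")
```

The Bayesian figure agrees with the 825.29 seconds that Ray logs for the tuning loop at the end of that run.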

3. What are your observations regarding time taken and performance of the best model? (2)

For Grid Search and Hyperband, the best models were not the most time-consuming trials.

For the Bayesian Search algorithm, we had to narrow the learning-rate range (to between 0.0009 and 0.0011); otherwise the search took a long time because training was not converging.

Also, once the learning rate was adjusted, the best model was not the most time-consuming one: the best Bayesian trial took 508.5 s, compared with 820.0 s for the other trial.
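
The relationship between accuracy and running time can be checked directly from the trial tables. The sketch below uses the accuracy and total-time columns of the Hyperband trial table; the variable names are illustrative.

```python
# (trial suffix, accuracy, total time in seconds) from the Hyperband trial table.
hyperband_trials = [
    ("00000", 0.1085, 43.1493),
    ("00001", 0.978833, 30.8554),
    ("00002", 0.105167, 23.2188),
    ("00003", 0.105633, 55.6887),
    ("00004", 0.103967, 16.3711),
    ("00005", 0.988517, 37.3817),
    ("00006", 0.10375, 25.0217),
    ("00007", 0.1038, 21.1456),
    ("00008", 0.1022, 36.6678),
]

best = max(hyperband_trials, key=lambda t: t[1])     # highest accuracy
slowest = max(hyperband_trials, key=lambda t: t[2])  # longest total time

print(f"best trial:    {best[0]} acc={best[1]} time={best[2]} s")
print(f"slowest trial: {slowest[0]} acc={slowest[1]} time={slowest[2]} s")
```

In this run the most accurate trial and the longest-running trial are different trials.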