HPO converts all hyperparameters into strings #1238

Open
wxdrizzle opened this issue Apr 2, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@wxdrizzle

Describe the bug

Hi, I'm using the Hyperparameter Optimizer (HPO) but found that the generated tasks fail because all the hyperparameters become strings instead of keeping their original types.

To reproduce

Initially, I have a completed task whose hyperparameters are managed by my code. Specifically, I first run a task manually; my code reads values from a YAML file, generates the hyperparameter dict, and then calls task.set_parameters(dict_params).

Then I started using HPO based on this task. My code detects whether a task is being run by an agent, and if so, it uses task.get_parameters(cast=True) to get the hyperparameters for training. This works well if I manually clone the initial task and send it to a queue.

However, when I used the HPO, the new tasks it created simply failed. I found that the hyperparameter values returned by task.get_parameters(cast=True) are all of type str, except for the hyperparameters I specified to optimize. Is there any way to solve this issue? Thank you very much!

Expected behaviour

When a task is run by the HPO, the hyperparameter values returned by task.get_parameters(cast=True) should have the same types as the values from the task whose ID is passed as base_task_id to HyperParameterOptimizer.
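
For illustration, a minimal sketch of the expected round trip (same API calls as in the reproduction below; project/task names are placeholders):

from clearml import Task

# Sketch: set typed hyperparameters, then read them back with cast=True.
# Expected: values keep their original Python types (list, str, ...).
task = Task.init(project_name='tmp_project', task_name='cast_check')
task.set_parameters({'dataset/modalities': ['CT', 'MRI'], 'model/name': 'u-net'})
params = task.get_parameters(cast=True)
print(type(params['dataset/modalities']))  # expected: <class 'list'>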

Environment

  • Server type: self-hosted
  • ClearML SDK Version: 1.15.0
  • ClearML Server Version: 1.14.0-431
  • Python Version: 3.11.5
  • OS: Linux
wxdrizzle added the bug label on Apr 2, 2024
ainoam (Collaborator) commented Apr 3, 2024

@wxdrizzle Can you provide a simple code example?

BTW, why did you go with dict_params = task.get_parameters(cast=True) and task.set_parameters(dict_params)
rather than dict_params = task.connect(dict_params)?
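
For context, a minimal sketch of the connect()-based pattern being suggested (the dict contents are just placeholders):

from clearml import Task

task = Task.init(project_name='tmp_project', task_name='connect_example')

# Local run: the dict acts as defaults and is logged to the server.
# When the task is cloned and run by an agent, connect() feeds back the
# (possibly edited) values stored on the server into the same dict.
dict_params = {'modalities': ['CT', 'MRI'], 'name': 'u-net'}
dict_params = task.connect(dict_params)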

wxdrizzle (Author) commented
Hi @ainoam, thanks a lot for your reply!

Code example

First, I created a file training.py containing:

from clearml import Task
import sys

cli = sys.argv[1:]
if '--manually' in cli:
    run_by_agent = False
else:
    run_by_agent = True


def read_yaml(run_by_agent):
    if run_by_agent:
        dict_params = {}
    else:
        dict_params = {
            'dataset/modalities': ['CT', 'MRI'],
            'model/name': 'u-net',
        }
    return dict_params


if not run_by_agent:
    task = Task.init(project_name='tmp_project', task_name='tmp_task', task_type="training")
    # the following two lines are because I want the agents to use my existing Python environment
    task.set_base_docker(docker_image='/home/xxx/software/anaconda3/envs/research')
    task.set_packages([])
    dict_params = read_yaml(run_by_agent)
    task.set_parameters(dict_params)
else:
    task = Task.init()
    dict_params = task.get_parameters(cast=True)
print('run by agent?', run_by_agent)
print('dict_params: ', dict_params)
print('type of dataset/modalities', type(dict_params['dataset/modalities']))

Then I executed this file manually with python training.py --manually. Here --manually just tells the code that it is not being executed by an agent. (I'm not sure if there is a simpler way to automatically determine whether the file is run by an agent or manually; if so, please kindly let me know.) The output was:

run by agent? False
dict_params:  {'dataset/modalities': ['CT', 'MRI'], 'model/name': 'u-net'}
type of dataset/modalities <class 'list'>
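
On the side question above about detecting agent runs: one possible alternative to the --manually flag is Task.running_locally(); treat this as an assumption to verify against your SDK version rather than a confirmed answer:

from clearml import Task

# Assumption: Task.running_locally() returns True for a manual/local run
# and False when the script is executed by a clearml-agent.
run_by_agent = not Task.running_locally()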

Note that I used task.set_base_docker() to prevent the subsequent hyperparameter-optimization step from creating a new Python environment. This method is from here.

You can see that the type of the hyperparameter "dataset/modalities" is list, as expected.

Next, I found that the ID of the generated task was "13f202cc8a014ba4b92f1e93e34352d1". Then I created another file, hyperparam_optim.py, containing:

from clearml.automation import DiscreteParameterRange, GridSearch, Objective
from clearml import Task

task = Task.init(project_name='tmp_project', task_name='hyperparam_optim', task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)

objective_metric = Objective('test', 'dice_mean')

optimizer = GridSearch(
    base_task_id='13f202cc8a014ba4b92f1e93e34352d1',
    hyper_parameters=[
        DiscreteParameterRange('model/name', ['resnet']),
    ],
    objective_metric=objective_metric,
    num_concurrent_workers=16,
    objective_metric_title='test',
    objective_metric_series='dice_mean',
    objective_metric_sign='max',
    execution_queue='one_gpu_work',
    max_iteration_per_job=50000,
)

optimizer.start()
optimizer.wait()
optimizer.stop()

Note that the base_task_id will be different if you try to reproduce this.

Then I ran python hyperparam_optim.py, and in the WebApp I saw one task run by the HPO. The console output of that task is:

[screenshot: console output of the HPO-created task, showing the type of dataset/modalities as <class 'str'>]

You can see that the type of the hyperparameter "dataset/modalities" changed from list to str. As a current workaround, I have to manually go through all hyperparameters and use eval(xxxx) to convert each value from str back to the type it should be. If you have any solution I'd be very grateful.
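
A sketch of that workaround using ast.literal_eval instead of eval (a safer substitute for the same idea; the fallback to the raw string is my own addition):

import ast

def cast_params(params):
    """Best-effort conversion of stringified hyperparameter values back to Python objects."""
    casted = {}
    for key, value in params.items():
        if isinstance(value, str):
            try:
                # Parses lists, dicts, numbers, booleans, etc.; raises for plain strings.
                casted[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                casted[key] = value
        else:
            casted[key] = value
    return casted

# Example: a stringified list is converted back to a real list.
print(cast_params({'dataset/modalities': "['CT', 'MRI']"}))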

Regarding why I didn't use task.connect()

One reason is that my hyperparameters fall into different categories: some are for the dataset, some are for the model, and so on. For example, assume the hyperparameter dict dict_params has two keys, "model/name" and "dataset/name". If I use task.connect(dict_params), these keys appear under the "General" section in the WebApp, as shown below:

[screenshot: WebApp hyperparameters page with both keys under the General section]

But I prefer to have separate sections such as "model" and "dataset". If I use task.set_parameters(dict_params), I can achieve this, as shown below:

[screenshot: WebApp hyperparameters page with separate "model" and "dataset" sections]
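
For comparison, a sketch of both approaches described above; whether connect() accepts a name argument for the section is an assumption about the SDK that would need verifying:

from clearml import Task

task = Task.init(project_name='tmp_project', task_name='sections_example')

# set_parameters(): "section/key" names show up as separate WebApp sections.
task.set_parameters({'model/name': 'u-net', 'dataset/name': 'some-dataset'})

# connect(): values land in the "General" section by default; passing a
# section name here is an assumption, not a confirmed behaviour.
model_params = task.connect({'name': 'u-net'}, name='model')
dataset_params = task.connect({'name': 'some-dataset'}, name='dataset')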

Another reason is that I'm actually not sure how to use task.connect() with agents. As you can see above, in my daily practice, when I run experiments manually I read a specific YAML file to get the hyperparameters and initialize dict_params. However, consider the case where I clone an existing task, modify some hyperparameters, and then send it to a queue (so an agent can run it). In this case I don't have a YAML file path, so I can only set dict_params to {} if I want to use dict_params = task.connect(dict_params). I found the result is still {}. In other words, I'm not able to get the modified hyperparameters using task.connect() when running under an agent. Did I do anything wrong? I'd really appreciate suggestions on this as well.
