HPO converts all hyperparameters into strings #1238

Open
wxdrizzle opened this issue Apr 2, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@wxdrizzle

Describe the bug

Hi, I'm using the Hyperparameter Optimizer (HPO) but found that the generated tasks fail because all the hyperparameters become strings instead of keeping their original types.

To reproduce

Initially, I have a completed task whose hyperparameters are managed by my code. Specifically, I first run a task manually; my code reads values from a YAML file, generates the hyperparameter dict, and then calls task.set_parameters(dict_params).

Then I started using HPO based on this task. My code detects whether a task is being run by an agent, and if so, it uses task.get_parameters(cast=True) to get the hyperparameters for training. This works well if I manually clone the initial task and send it to a queue.

However, when I used the HPO, the new tasks it created simply failed. I found that the hyperparameter values returned by task.get_parameters(cast=True) are all of type str, except for the hyperparameters I specified to optimize. Is there any way to solve this issue? Thank you very much!

Expected behaviour

When a task is run by the HPO, the hyperparameter values returned by task.get_parameters(cast=True) should have the same types as the values from the task whose ID is passed as base_task_id to HyperParameterOptimizer.
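
For illustration, a minimal sketch of the expected round trip (same API calls as in the reproduction below; project/task names are placeholders):

from clearml import Task

# Sketch: set typed hyperparameters, then read them back with cast=True.
# Expected: values keep their original Python types (list, str, ...).
task = Task.init(project_name='tmp_project', task_name='cast_check')
task.set_parameters({'dataset/modalities': ['CT', 'MRI'], 'model/name': 'u-net'})
params = task.get_parameters(cast=True)
print(type(params['dataset/modalities']))  # expected: <class 'list'>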

Environment

  • Server type: self-hosted
  • ClearML SDK Version: 1.15.0
  • ClearML Server Version: 1.14.0-431
  • Python Version: 3.11.5
  • OS: Linux
wxdrizzle added the bug label on Apr 2, 2024
ainoam (Collaborator) commented Apr 3, 2024

@wxdrizzle Can you provide a simple code example?

BTW, why did you go with dict_params = task.get_parameters(cast=True) and task.set_parameters(dict_params)
rather than dict_params = task.connect(dict_params)?
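
For context, a minimal sketch of the connect()-based pattern being suggested (the dict contents are just placeholders):

from clearml import Task

task = Task.init(project_name='tmp_project', task_name='connect_example')

# Local run: the dict acts as defaults and is logged to the server.
# When the task is cloned and run by an agent, connect() feeds back the
# (possibly edited) values stored on the server into the same dict.
dict_params = {'modalities': ['CT', 'MRI'], 'name': 'u-net'}
dict_params = task.connect(dict_params)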

wxdrizzle (Author) commented
Hi @ainoam, thanks a lot for your reply!

Code example

First, I created a file training.py containing:

from clearml import Task
import sys

cli = sys.argv[1:]
if '--manually' in cli:
    run_by_agent = False
else:
    run_by_agent = True


def read_yaml(run_by_agent):
    if run_by_agent:
        dict_params = {}
    else:
        dict_params = {
            'dataset/modalities': ['CT', 'MRI'],
            'model/name': 'u-net',
        }
    return dict_params


if not run_by_agent:
    task = Task.init(project_name='tmp_project', task_name='tmp_task', task_type="training")
    # the following two lines are because I want the agents to use my existing Python environment
    task.set_base_docker(docker_image='/home/xxx/software/anaconda3/envs/research')
    task.set_packages([])
    dict_params = read_yaml(run_by_agent)
    task.set_parameters(dict_params)
else:
    task = Task.init()
    dict_params = task.get_parameters(cast=True)
print('run by agent?', run_by_agent)
print('dict_params: ', dict_params)
print('type of dataset/modalities', type(dict_params['dataset/modalities']))

Then I executed this file manually with python training.py --manually. Here --manually just tells the code that it is not being executed by an agent. (I'm not sure if there is a simpler way to automatically determine whether the file is run by an agent or manually; if so, please kindly let me know.) The output was:

run by agent? False
dict_params:  {'dataset/modalities': ['CT', 'MRI'], 'model/name': 'u-net'}
type of dataset/modalities <class 'list'>
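
On the side question above about detecting agent runs: one possible alternative to the --manually flag is Task.running_locally(); treat this as an assumption to verify against your SDK version rather than a confirmed answer:

from clearml import Task

# Assumption: Task.running_locally() returns True for a manual/local run
# and False when the script is executed by a clearml-agent.
run_by_agent = not Task.running_locally()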

Note that I used task.set_base_docker() to prevent the subsequent hyperparameter-optimization step from creating a new Python environment. This method is from here.

You can see that the type of the hyperparameter "dataset/modalities" is list, as expected.

Next, I found that the ID of the generated task was "13f202cc8a014ba4b92f1e93e34352d1". Then I created another file, hyperparam_optim.py, containing:

from clearml.automation import DiscreteParameterRange, GridSearch, Objective
from clearml import Task

task = Task.init(project_name='tmp_project', task_name='hyperparam_optim', task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)

objective_metric = Objective('test', 'dice_mean')

optimizer = GridSearch(
    base_task_id='13f202cc8a014ba4b92f1e93e34352d1',
    hyper_parameters=[
        DiscreteParameterRange('model/name', ['resnet']),
    ],
    objective_metric=objective_metric,
    num_concurrent_workers=16,
    objective_metric_title='test',
    objective_metric_series='dice_mean',
    objective_metric_sign='max',
    execution_queue='one_gpu_work',
    max_iteration_per_job=50000,
)

optimizer.start()
optimizer.wait()
optimizer.stop()

Note that the base_task_id will be different if you try to reproduce this.

Then I ran python hyperparam_optim.py, and in the WebApp I saw one task run by the HPO. The console output of that task is:

[screenshot: console output of the HPO-created task, showing the type of dataset/modalities as <class 'str'>]

You can see that the type of the hyperparameter "dataset/modalities" changed from list to str. As a current workaround, I have to manually go through all hyperparameters and use eval(xxxx) to convert each value from str back to the type it should be. If you have any solution I'd be very grateful.
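
A sketch of that workaround using ast.literal_eval instead of eval (a safer substitute for the same idea; the fallback to the raw string is my own addition):

import ast

def cast_params(params):
    """Best-effort conversion of stringified hyperparameter values back to Python objects."""
    casted = {}
    for key, value in params.items():
        if isinstance(value, str):
            try:
                # Parses lists, dicts, numbers, booleans, etc.; raises for plain strings.
                casted[key] = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                casted[key] = value
        else:
            casted[key] = value
    return casted

# Example: a stringified list is converted back to a real list.
print(cast_params({'dataset/modalities': "['CT', 'MRI']"}))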

Regarding why I didn't use task.connect()

One reason is that my hyperparameters fall into different categories: some are for the dataset, some are for the model, and so on. For example, assume the hyperparameter dict dict_params has two keys, "model/name" and "dataset/name". If I use task.connect(dict_params), these keys appear under the "General" section in the WebApp, as shown below:

[screenshot: WebApp hyperparameters page with both keys under the General section]

But I prefer to have separate sections such as "model" and "dataset". If I use task.set_parameters(dict_params), I can achieve this, as shown below:

[screenshot: WebApp hyperparameters page with separate "model" and "dataset" sections]
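
For comparison, a sketch of both approaches described above; whether connect() accepts a name argument for the section is an assumption about the SDK that would need verifying:

from clearml import Task

task = Task.init(project_name='tmp_project', task_name='sections_example')

# set_parameters(): "section/key" names show up as separate WebApp sections.
task.set_parameters({'model/name': 'u-net', 'dataset/name': 'some-dataset'})

# connect(): values land in the "General" section by default; passing a
# section name here is an assumption, not a confirmed behaviour.
model_params = task.connect({'name': 'u-net'}, name='model')
dataset_params = task.connect({'name': 'some-dataset'}, name='dataset')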

Another reason is that I'm actually not sure how to use task.connect() with agents. As you can see above, in my daily practice, when I run experiments manually I read a specific YAML file to get the hyperparameters and initialize dict_params. However, consider the case where I clone an existing task, modify some hyperparameters, and then send it to a queue (so an agent can run it). In this case I don't have a YAML file path, so I can only set dict_params to {} if I want to use dict_params = task.connect(dict_params). I found the result is still {}. In other words, I'm not able to get the modified hyperparameters using task.connect() when running under an agent. Did I do anything wrong? I'd really appreciate suggestions on this as well.
