
HPO cannot access logger.report_single_value metrics #1221

Closed
ilouzl opened this issue Mar 4, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@ilouzl

ilouzl commented Mar 4, 2024

Describe the bug

HPO will not include metrics generated by logger.report_single_value when searching for the desired objective_metric

To reproduce

  1. Define a base task which logs its result metric using task.get_logger().report_single_value(some_value, 'accuracy')
  2. Define a HPO task that tries to optimize that metric:
...
task = Task.init(project_name="examples", task_name="HP optimizer", task_type=Task.TaskTypes.optimizer)
task.execute_remotely(queue_name="services")

an_optimizer = HyperParameterOptimizer(
    base_task_id=...,
    hyper_parameters=...,
    objective_metric_title="Summary",
    objective_metric_series="accuracy",
)
...

The HP task will have an error of this form:

Traceback (most recent call last):

File "/home/miniconda3/envs/clearml/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/miniconda3/envs/clearml/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/.clearml/venvs-builds.2.3/3.9/lib/python3.9/site-packages/clearml/automation/optimization.py", line 1997, in report_daemon
self.report_completed_status(completed_jobs, cur_completed_jobs, task_logger, title)
File "/home/.clearml/venvs-builds.2.3/3.9/lib/python3.9/site-packages/clearml/automation/optimization.py", line 2052, in report_completed_status
iteration = [it[0] if it else -1 for it in iteration_value]
TypeError: 'NoneType' object is not iterable

Expected behaviour

The metrics should be discovered and optimized like regular (i.e. TensorBoard-like) metrics.

Environment

  • Server type - app.clear.ml
  • ClearML SDK Version - clearml==1.14.4
  • Python Version - 3.9.16
  • Dockerized worker

Related Discussion

https://clearml.slack.com/archives/CTK20V944/p1709203560313889

@ilouzl ilouzl added the bug Something isn't working label Mar 4, 2024
@ainoam
Collaborator

ainoam commented Mar 4, 2024

Thanks for reporting @ilouzl.
We'll update when a fix is available.

@AlexandruBurlacu

Hey @ilouzl, you seem to have a small issue with the way you report the single value. As per the documentation, you first need to provide the name of the single value, and then its value.

Could you please re-run your code with these changes and report whether the issue still persists? I wasn't able to reproduce the problem.
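For reference, the expected argument order is name first, then value. The snippet below illustrates this with a hypothetical StubLogger stand-in (not part of ClearML), so the ordering can be seen without a ClearML server:

```python
# ClearML's Logger API takes the value name first, then the value:
#   task.get_logger().report_single_value("accuracy", some_value)   # correct
#   task.get_logger().report_single_value(some_value, "accuracy")   # swapped, as in the issue text

class StubLogger:
    """Hypothetical stand-in mirroring the (name, value) argument order."""

    def __init__(self):
        self.single_values = {}

    def report_single_value(self, name, value):
        # The name is the series key the value is stored under.
        self.single_values[name] = value

logger = StubLogger()
logger.report_single_value("accuracy", 0.93)
print(logger.single_values)  # {'accuracy': 0.93}
```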

@ilouzl
Author

ilouzl commented Mar 7, 2024

Hi @AlexandruBurlacu, it's just a typo in the issue description.
The actual implementation is correct - first name and then value.

@AlexandruBurlacu

Can you please provide a full example? I couldn't reproduce it on my side using this code:

import logging

from clearml import Task
from clearml.automation import (
    DiscreteParameterRange,
    HyperParameterOptimizer,
    RandomSearch
)

aSearchStrategy = RandomSearch

def job_complete_callback(
    job_id,                 # type: str
    objective_value,        # type: float
    objective_iteration,    # type: int
    job_parameters,         # type: dict
    top_performance_job_id  # type: str
    ):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('WOOT WOOT we broke the record! Objective reached {}'.format(objective_value))


task = Task.init(project_name='Hyper-Parameter Optimization',
                 task_name='Automatic Hyper-Parameter Optimization',
                 task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)


def objective_function():
    import time
    task = Task.current_task()
    epochs = task.get_parameter("General/epochs", cast=True)

    for ep in range(epochs):
        task.get_logger().report_scalar(title="epoch_accuracy", series="epoch_accuracy", iteration=ep, value=ep ** 2)

    task.get_logger().report_single_value("Final value", ep ** 2)
    time.sleep(1)

    return ep ** 2


objective_task = task.create_function_task(objective_function)


# experiment template to optimize in the hyper-parameter optimization
print(">>>>>>>", objective_task.id)
args = {
    'template_task_id': objective_task.id}
args = task.connect(args)

execution_queue = 'queue-7'

an_optimizer = HyperParameterOptimizer(
    base_task_id=args['template_task_id'], 
    hyper_parameters=[
        DiscreteParameterRange('General/epochs', values=list(range(10, 30))),
    ], 
    objective_metric_title='Summary',
    objective_metric_series='Final value',
    objective_metric_sign='max',
    max_number_of_concurrent_tasks=3,
    optimizer_class=aSearchStrategy,
    execution_queue=execution_queue,
    spawn_project=None,
    time_limit_per_job=10.,
    pool_period_min=0.2,
    total_max_jobs=10,
    max_iteration_per_job=30,)

an_optimizer.start(job_complete_callback=job_complete_callback)
an_optimizer.set_time_limit(in_minutes=120.0)
an_optimizer.wait()
top_exp = an_optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])

@ilouzl
Author

ilouzl commented Mar 10, 2024

Well @AlexandruBurlacu, your code does work for me.
I probably had a different error, but it seems to be just fine now.
Thanks!
