Cannot use offline mode #1215

Open
michelkok opened this issue Feb 27, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@michelkok

michelkok commented Feb 27, 2024

Describe the bug

I cannot train with offline mode as it errors out with ValueError: Unsupported keyword arguments: force. When not using offline mode, the training starts just fine.

Stacktrace
  C:\Users\user\anaconda3\envs\conda_wrapper\python.exe C:\Users\user\Documents\GitHub\projects\model_update\train.py --multirun 
  [I 2024-02-27 15:46:29,606] Using an existing study with name 'debug' instead of creating a new one.
  [2024-02-27 15:46:29,610][HYDRA] Study name: debug
  [2024-02-27 15:46:29,610][HYDRA] Storage: sqlite:///C:/Users/user/Documents/Internship_AgroCares/experiments/debug.db
  [2024-02-27 15:46:29,612][HYDRA] Sampler: TPESampler
  [2024-02-27 15:46:29,612][HYDRA] Directions: ['minimize']
  [2024-02-27 15:46:29,724][HYDRA] Launching 1 jobs locally
  [2024-02-27 15:46:29,724][HYDRA]        #0 : learning_rate=0.0009646166816485542 conv_dilation=4 conv_kernel_size=4 conv_filters_0=80 conv_filters_1=72 fc_neurons_0=512 fc_neurons_1=64 fc_neurons_2=512 fc_l2=3.3171161072480045e-05 batch_size=256 activation=relu pooling=avg pooling_size=2
  ClearML Task: created new task id=offline-07e2611c2a684673926cf42cb3a03b51
  Error executing job with overrides: ['learning_rate=0.0009646166816485542', 'conv_dilation=4', 'conv_kernel_size=4', 'conv_filters_0=80', 'conv_filters_1=72', 'fc_neurons_0=512', 'fc_neurons_1=64', 'fc_neurons_2=512', 'fc_l2=3.3171161072480045e-05', 'batch_size=256', 'activation=relu', 'pooling=avg', 'pooling_size=2']
  Traceback (most recent call last):
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 213, in run_and_report
      return func()
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 461, in <lambda>
      lambda: hydra.multirun(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\hydra.py", line 162, in multirun
      ret = sweeper.sweep(arguments=task_overrides)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\optuna_sweeper.py", line 52, in sweep
      return self.sweeper.sweep(arguments)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 391, in sweep
      raise e
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 360, in sweep
      f"Return value must be float-castable. Got '{ret.return_value}'."
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
      raise self._return_value
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 357, in sweep
      values = [float(ret.return_value)]
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
      raise self._return_value
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 186, in run_job
      ret.return_value = task_function(task_cfg)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 230, in _patched_task_function
      return task_function(a_config, *a_args, **a_kwargs)
    File "C:\Users\user\Documents\GitHub\projects\model_update\train.py", line 66, in main
      task = Task.init(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\task.py", line 765, in init
      PatchHydra.delete_overrides()
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 53, in delete_overrides
      cls._current_task.delete_parameter(cls._overrides_section, force=True)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_interface\task\task.py", line 1365, in delete_parameter
      res = self.send(tasks.DeleteHyperParamsRequest(
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\services\v2_9\tasks.py", line 3814, in __init__
      super(DeleteHyperParamsRequest, self).__init__(**kwargs)
    File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\session\request.py", line 31, in __init__
      raise ValueError('Unsupported keyword arguments: %s' % ', '.join(kwargs.keys()))
  ValueError: Unsupported keyword arguments: force
  ClearML Task: Offline session stored in C:/Users/user/.clearml/cache/offline/offline-07e2611c2a684673926cf42cb3a03b51.zip

To reproduce

"""Demonstrate how training can be done in a simple fashion."""
from pathlib import Path
from clearml import Task
import hydra
from hydra.core.config_store import ConfigStore
import os

# TrainConfiguration, ScriptConfiguration, and Trainer are project-specific and not shown here.
ConfigStore.instance().store(name="base_config", node=TrainConfiguration)

@hydra.main(version_base=None, config_path="conf", config_name="sweep")
def main(cfg: ScriptConfiguration):
    """Run."""
    Task.set_offline(offline_mode=True)

    task = Task.init(
        project_name="Test",
        task_name="debugtask",
        tags=['debug']
    )
    trainer = Trainer(config=cfg)
    train_loss = trainer.train()

    task.close()

    # Set offline to false and upload task to server
    Task.set_offline(False)


if __name__ == "__main__":
    main()

Expected behaviour

It should have trained normally, like when offline mode is not on.

Environment

  • Server type: both self-hosted and app.clear.ml
  • ClearML SDK Version: 1.14.3
  • ClearML Server Version (self-hosted): 1.14.1-451
  • Python Version: 3.10.8
  • OS: Windows (per the paths in the stack trace)

michelkok added the bug label on Feb 27, 2024

@eugen-ajechiloae-clearml
Collaborator

Hi @michelkok! Thank you for reporting. We have identified the problem and will release a fix soon.

@zhulinchng

@eugen-ajechiloae-clearml while waiting for the release, would dropping the force argument from the cls._current_task.delete_parameter call in the PatchHydra class in hydra_bind.py fix the issue?
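
For readers following along, here is a minimal sketch of the mechanism the traceback points at and of the workaround being asked about. It does not use ClearML at all: `Request`, `DeleteHyperParamsRequestV29`, both `build_delete_request*` helpers, and the `"overrides-section"` placeholder are hypothetical stand-ins, modeled only on the frames shown in the stack trace. Whether the same change is safe inside the real `hydra_bind.py` is not confirmed anywhere in this thread.

```python
"""Stand-in sketch; none of these classes are ClearML's real implementation."""


class Request:
    """Hypothetical base request that rejects keyword arguments it does not expect."""

    def __init__(self, **kwargs):
        if kwargs:
            raise ValueError(
                "Unsupported keyword arguments: %s" % ", ".join(kwargs.keys())
            )


class DeleteHyperParamsRequestV29(Request):
    """Hypothetical older-schema request that has no ``force`` field."""

    def __init__(self, task, hyperparams, **kwargs):
        # Anything left in kwargs is unknown to this schema and is rejected.
        super().__init__(**kwargs)
        self.task = task
        self.hyperparams = hyperparams


def build_delete_request(task_id, section, **kwargs):
    """Forward every keyword argument unchanged, as the failing call does."""
    return DeleteHyperParamsRequestV29(task=task_id, hyperparams=[section], **kwargs)


def build_delete_request_without_force(task_id, section, **kwargs):
    """Drop ``force`` before building the request (the workaround in question)."""
    kwargs.pop("force", None)
    return build_delete_request(task_id, section, **kwargs)


# Passing force=True through to the older schema reproduces the reported message.
try:
    build_delete_request("offline-task", "overrides-section", force=True)
except ValueError as exc:
    error_message = str(exc)

# Dropping the argument first lets the request be built.
request = build_delete_request_without_force(
    "offline-task", "overrides-section", force=True
)
```

In this toy model the first call raises the same `ValueError` text as the report and the second succeeds. That only shows that removing the keyword avoids the rejection; it says nothing about side effects of skipping `force` in the actual SDK.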

@pollfly
Contributor

pollfly commented May 9, 2024

Hey @michelkok! Just letting you know that this issue has been resolved in v1.15.0. Let us know if there are any issues :)
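
A small sketch for checking whether an SDK version string is at or past the release named above. The helper names `parse_version` and `includes_offline_fix` are made up for illustration, and the parsing assumes a plain dotted numeric version such as `1.14.3` (no pre-release suffixes).

```python
# Release in which the thread reports the offline-mode error as resolved.
FIXED_IN = (1, 15, 0)


def parse_version(version):
    """Turn a dotted release string such as '1.14.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def includes_offline_fix(sdk_version):
    """Return True when sdk_version is 1.15.0 or newer."""
    return parse_version(sdk_version) >= FIXED_IN


reported_version_ok = includes_offline_fix("1.14.3")  # version from the report
fixed_version_ok = includes_offline_fix("1.15.0")
```

The installed version string can be read with `importlib.metadata.version("clearml")` and passed to `includes_offline_fix`.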

@michelkok
Author

Thanks, I will test the coming week! Will close if it is indeed resolved.
