sumx: no valid experiments found #15

svengiegerich · 2021-05-18T22:56:04Z

Hey,
first, thanks for this nice, lightweight tool! Very helpful.

Just one thing: I can't get sumx running. Using version 0.0.10, python -m runx.sumx config_simple gives the following error:
No valid experiments found for /Users/svengiegerich/runx/config_simple

However, the path is correct and there are two successful runs (folders) in it; each subfolder contains a metrics.csv looking like this,

start,start/step,0,timestamp,1621378323.1818151
val,loss,0.013856839059957424,epoch,1,timestamp,1621378337.119943
val,loss,0.00520349506242475,epoch,2,timestamp,1621378344.595231

metrics are added by the following code lines,

metrics_val = {'loss': epoch_loss}
logx.metric(phase='val', metrics=metrics_val, epoch=epoch_i + 1)

All other logging works smoothly, e.g. logx.add_scalar() for tensorboard.

-> Any idea what's wrong here?

My .runx,

LOGROOT: /Users/svengiegerich/runx
CODE_IGNORE_PATTERNS: '*.git,data/raw*,.*,results*'

FARM: bigfarm

# Farm resource needs
bigfarm:
    SUBMIT_CMD: 'submit_job'
    RESOURCES:
        image: mydocker-image-big:1.0
        gpu: 8
        cpu: 64
        mem: 450

and the config_simple.yml,

CMD: 'python train.py'

HPARAMS: [
  {
    logdir: LOGDIR,
    epochs: [1,2],
    RUNX.TAG: 'transformer',
    arch: 'transformer',
  }
]


ajtao · 2021-05-19T02:51:17Z

Hi @svengiegerich! sumx looks in /Users/svengiegerich/runx for directories that contain both metrics.csv and hparams.json. It sounds like you've confirmed that the metrics.csv files exist. Do you see the hparams.json files there too?
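
One way to check this is a small script that lists which run directories contain both files. This is only a minimal sketch, not sumx's actual code: the helper name find_run_dirs is hypothetical, and it assumes runs may sit at any depth below the directory you pass in.

```python
from pathlib import Path

# The two files that, per the comment above, a directory needs for sumx to pick it up.
REQUIRED = ("metrics.csv", "hparams.json")


def find_run_dirs(root):
    """Split directories under `root` into complete and incomplete runs.

    Returns (complete, incomplete):
      complete   - list of directories holding both required files
      incomplete - dict mapping a directory that holds only some of the
                   required files to the names it is missing
    """
    root = Path(root)
    complete, incomplete = [], {}
    candidates = sorted([root, *(p for p in root.rglob("*") if p.is_dir())])
    for directory in candidates:
        present = [name for name in REQUIRED if (directory / name).is_file()]
        if len(present) == len(REQUIRED):
            complete.append(directory)
        elif present:
            incomplete[directory] = [n for n in REQUIRED if n not in present]
    return complete, incomplete
```

Calling find_run_dirs("/Users/svengiegerich/runx/config_simple") and printing the second return value would show each run folder that has metrics.csv but lacks hparams.json.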

svengiegerich · 2021-05-19T09:30:43Z

Ah, no, hparams.json is indeed missing. I run in interactive mode (python -m runx.sumx config_simple -i) because I don't have access to a farm. So this is probably the issue?
Is it possible to configure .runx so that I can use runx non-interactively but also without a farm? In other words, can sumx only be used with a farm?

Reading #9, I tried to modify .runx but with no success.

# not working
FARM: fake

fake:
    SUBMIT_CMD: na
    RESOURCES:
        dummy: na

ajtao · 2021-05-19T15:40:53Z

So first off, I plan to release better support for the 'no farm' mode, where you shouldn't have to define the FARM.

But as a hack, the .runx you show above should actually work. I just confirmed this myself.

What sort of failure are you seeing?

svengiegerich · 2021-05-19T20:09:38Z

Running python -m runx.runx config_simple.yml, I get:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/thesis/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 394, in <module>
    main()
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 387, in main
    run_experiment(args.exp_yml)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 380, in run_experiment
    run_yaml(experiment_copy, runroot)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 330, in run_yaml
    cmd = build_farm_cmd(cmd, job_name, resource_copy, logdir)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/farm.py", line 126, in build_farm_cmd
    raise f'Unsupported farm: {cfg.FARM}'
TypeError: exceptions must derive from BaseException

And if I rename the farm to FARM: ngc, I get:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/thesis/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 394, in <module>
    main()
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 387, in main
    run_experiment(args.exp_yml)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/runx.py", line 361, in run_experiment
    experiment = read_config(args.farm, args.exp_yml)
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/utils.py", line 122, in read_config
    cfg.NGC_LOGROOT = read_config_item(experiment, 'NGC_LOGROOT')
  File "/opt/anaconda3/envs/thesis/lib/python3.7/site-packages/runx/utils.py", line 72, in read_config_item
    raise f'can\'t find {key} in config'
TypeError: exceptions must derive from BaseException

Thanks for your time & help!
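
Both tracebacks end in the same TypeError because the last frame in each uses raise with an f-string rather than an exception object, so Python reports the TypeError instead of the intended message ("Unsupported farm: ..." or "can't find ... in config"). A minimal sketch of the problem and one possible fix follows; read_item_buggy and read_item_fixed are hypothetical stand-ins rather than runx code, and the choice of KeyError is an assumption.

```python
def read_item_buggy(config, key):
    """Mimics the failing pattern: `raise` is given a plain string."""
    if key not in config:
        # Python 3 rejects this with
        # "TypeError: exceptions must derive from BaseException".
        raise f"can't find {key} in config"
    return config[key]


def read_item_fixed(config, key):
    """Same lookup, but the message is wrapped in a real exception class."""
    if key not in config:
        raise KeyError(f"can't find {key} in config")
    return config[key]


for reader in (read_item_buggy, read_item_fixed):
    try:
        reader({}, "NGC_LOGROOT")
    except Exception as err:
        # Show which exception type each variant actually produces.
        print(f"{reader.__name__}: {type(err).__name__}: {err}")
```

With the fixed variant the error names the missing key, which is the information the user needs to correct their .runx.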

ajtao · 2021-05-19T21:59:47Z

Hi Sven, I appreciate your patience!

I've updated runx on PyPI to 0.0.11. Can you please pip install it and try it out? Now your .runx should only need LOGROOT defined, and all that fake FARM stuff isn't needed anymore. Please let me know if it works.

I've been trying to improve the examples a little for this case. It could certainly be improved :).

svengiegerich · 2021-05-19T23:37:14Z

Hey, thanks for the update!

Going through the examples again, I found my issue: I didn't include the hparams=vars(args) argument in logx.initialize(). Now everything works smoothly. As feedback, it would have helped me as a user if this argument was explained in the README; however, I also may have missed it.
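
For anyone hitting the same error, the fix amounts to handing the parsed command-line arguments to logx.initialize() as a dict. The sketch below shows what vars(args) produces. The argument names mirror the config_simple.yml above, the values are made up, and the logx.initialize() call appears only as a comment so the snippet runs without runx installed; passing logdir alongside hparams is an assumption based on the config in this thread.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--logdir", type=str, default="./logs")
parser.add_argument("--epochs", type=int, default=1)
parser.add_argument("--arch", type=str, default="transformer")

# An explicit argument list keeps the sketch reproducible;
# a real train.py would call parser.parse_args() with no arguments.
args = parser.parse_args(["--logdir", "/tmp/run1", "--epochs", "2"])

# vars() turns the argparse Namespace into a plain dict of hyperparameters.
hparams = vars(args)
print(hparams)

# In train.py this dict is what gets passed on (assumed call shape):
#   from runx.logx import logx
#   logx.initialize(logdir=args.logdir, hparams=hparams)
```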

Thanks again for this package!

[Just a side note: the "syntax" of metrics.csv seems to be inconsistent across rows (also in your example). Right now, the first line is two cells short, probably because there is no validation score yet. At least for me, a consistent format, with the first line also containing 7 cells, would simplify analyzing these metrics.csv files.]
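
Rows of different lengths can still be parsed if each row is read as a phase name followed by key/value pairs. That layout is inferred from the sample rows earlier in this thread, not from runx documentation, and the helper parse_metrics_rows is hypothetical.

```python
import csv
import io


def parse_metrics_rows(lines):
    """Parse rows shaped like `phase,key1,value1,key2,value2,...`.

    Each row becomes a dict with a "phase" entry plus one float entry per
    key/value pair, so rows may carry different numbers of cells.
    """
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        phase, rest = row[0], row[1:]
        if len(rest) % 2:
            raise ValueError(f"row has an unpaired key: {row}")
        record = {"phase": phase}
        for key, value in zip(rest[0::2], rest[1::2]):
            record[key] = float(value)
        records.append(record)
    return records


# Two rows copied from the metrics.csv shown at the top of the thread.
sample = io.StringIO(
    "start,start/step,0,timestamp,1621378323.1818151\n"
    "val,loss,0.013856839059957424,epoch,1,timestamp,1621378337.119943\n"
)
for record in parse_metrics_rows(sample):
    print(record)
```

Because every record is a dict, a missing column simply shows up as a missing key rather than shifting the remaining cells.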

svengiegerich closed this as completed May 30, 2021
