
Weights & Biases handlers for MonAI #6305

Closed
wants to merge 33 commits into from

Conversation


@soumik12345 soumik12345 commented Apr 5, 2023

Features Contributed

  • WandbStatsHandler defines a set of Ignite event handlers for all the Weights & Biases logging logic. It can be used with any Ignite Engine (trainer, validator, or evaluator) and supports both epoch-level and iteration-level logging. The expected data sources are Ignite's engine.state.output and engine.state.metrics. Default behaviors:
    • When EPOCH_COMPLETED, write each dictionary item in engine.state.metrics to Weights & Biases.
    • When ITERATION_COMPLETED, write each dictionary item in self.output_transform(engine.state.output) to Weights & Biases.
  • WandbModelCheckpointHandler inherits from ignite.handlers.ModelCheckpoint and can be used to periodically save objects as Weights & Biases artifacts.
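The two default behaviors above can be sketched with a stand-in `log` function in place of `wandb.log` and a dummy engine. Everything below is an illustrative mock, not the PR's actual code:

```python
# Stand-in for wandb.log: collect logged dicts in a list.
logged = []

def log(data: dict) -> None:
    logged.append(data)

class State:
    def __init__(self):
        self.metrics = {}
        self.output = None

class DummyEngine:
    """Minimal stand-in for an Ignite Engine with a .state attribute."""
    def __init__(self):
        self.state = State()

def on_epoch_completed(engine) -> None:
    # When EPOCH_COMPLETED: write each item in engine.state.metrics.
    log(dict(engine.state.metrics))

def on_iteration_completed(engine, output_transform=lambda x: x) -> None:
    # When ITERATION_COMPLETED: write each item in
    # output_transform(engine.state.output).
    out = output_transform(engine.state.output)
    if isinstance(out, dict):
        log(out)
    else:
        log({"Loss": out})  # scalar output falls back to the tag_name default

engine = DummyEngine()
engine.state.metrics = {"val_mean_dice": 0.91}
on_epoch_completed(engine)
engine.state.output = 0.25
on_iteration_completed(engine)
```

In the real handler these two functions would be attached to the engine via `engine.add_event_handler(Events.EPOCH_COMPLETED, ...)` and `Events.ITERATION_COMPLETED`.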

The following Colab notebook and Weights & Biases run demonstrate the usage of these handlers and their results, respectively:


Some additional Weights & Biases features:

  • When TensorBoardStatsHandler and TensorBoardImageHandler are used inside a wandb run, Weights & Biases automatically hosts the TensorBoard instance inside the run if sync_tensorboard is set to True during wandb.init().
  • When used with TensorBoardImageHandler, images and videos are automatically logged to the Weights & Biases media panel if sync_tensorboard is set to True during wandb.init().


Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@soumik12345 soumik12345 marked this pull request as draft April 6, 2023 10:10
@soumik12345 soumik12345 marked this pull request as ready for review April 6, 2023 10:42
@Nic-Ma
Contributor

Nic-Ma commented Apr 6, 2023

Hi @soumik12345 ,

Thanks for your contribution here.
Please add the following:

  1. Unit tests.
  2. Add to docs: https://github.com/Project-MONAI/MONAI/blob/dev/docs/source/handlers.rst.
  3. Add new optional dependency for wandb, referring to: https://github.com/Project-MONAI/MONAI/blob/dev/CONTRIBUTING.md#adding-new-optional-dependencies.

Thanks in advance.
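For item 3, the optional-dependency pattern the CONTRIBUTING guide describes can be sketched as below. This is a simplified stand-in for monai.utils.optional_import (the real helper has more features, such as version checks and descriptive errors):

```python
import importlib

def optional_import(name: str):
    """Return (module, True) if importable, else (None, False):
    a simplified sketch of MONAI's optional-import pattern, which
    defers the failure until the module is actually needed."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False

# wandb stays optional: code paths that need it check the flag first.
wandb, has_wandb = optional_import("wandb")
json_mod, has_json = optional_import("json")  # stdlib, always importable
```

Handler code would then raise a clear error only when instantiated without wandb installed, instead of failing at import time.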

monai/handlers/wandb_handlers.py (outdated)
tag_name: when iteration output is a scalar, tag_name is used to plot, defaults to `'Loss'`.
"""
if wandb.run is None:
raise wandb.Error("You must call `wandb.init()` before WandbStatsHandler()")
Contributor
Hi @soumik12345,
I think it would be better if the handler itself processed the initialization of wandb at the beginning of the experiment and kept the run inside the handler. @Nic-Ma, what do you think?
If the experiment run is inside the handler, please be careful about the multiprocessing and single-process-multi-task cases.

Thanks,
Bin

Author

Hi @binliunls, since there are 2 different handlers (and potentially a third one in the future, as per #6305 (comment)), I think it's best if the initialization of the wandb run is left independent of the handlers. Would of course love to know your thoughts as well.

cc: @Nic-Ma

@dzenanz
Contributor

dzenanz commented Apr 6, 2023

Looking at your run, the visualization of 3D images is hard to comprehend. The images overflow my screen (as if they were high-resolution) but appear very blocky. Is this expected?

Another question, can this tensorboard 3D visualization plugin be used with W&B, and how? If not, could it be ported/integrated? It is open source.

@soumik12345
Author

Looking at your run, the visualization of 3D images is hard to comprehend. The images overflow my screen (as if they were high-resolution) but appear very blocky. Is this expected?

Another question, can this tensorboard 3D visualization plugin be used with W&B, and how? If not, could it be ported/integrated? It is open source.

Hi @dzenanz, currently the 3D images are being logged as GIFs by the TensorBoardImageHandler. If you wish to exclude them from the wandb run, you can remove sync_tensorboard=True from wandb.init.

I can explore whether it is possible to integrate tensorboard-plugin-3d with Weights & Biases. However, I can also create a WandbImageHandler, similar to the TensorBoardImageHandler, that would use image overlays for segmentation maps combined with the media-panel step slider to visualize 3D volumetric images and compare visualizations across multiple runs.
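A rough sketch of what such a hypothetical WandbImageHandler could log: each depth slice of a volume paired with its segmentation mask via wandb.Image's masks argument, so the media-panel step slider scrubs through depth. The data, class labels, and handler shape here are all illustrative, and the wandb import is guarded so the sketch has no hard dependency:

```python
import numpy as np

try:
    import wandb  # optional: only needed for the actual logging call
except ImportError:
    wandb = None

# Fake depth x H x W image volume and a matching label map.
volume = np.random.rand(4, 64, 64)
seg = np.zeros((4, 64, 64), dtype=np.uint8)
seg[:, 16:48, 16:48] = 1  # a square "tumor" region on every slice

slices = []
for z in range(volume.shape[0]):
    if wandb is not None:
        slices.append(
            wandb.Image(
                volume[z],
                masks={"prediction": {
                    "mask_data": seg[z],
                    "class_labels": {0: "background", 1: "tumor"},
                }},
            )
        )
# Inside an active run, one would then call:
# wandb.log({"volume_slices": slices})
```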

@soumik12345
Author

Hi @soumik12345 ,

Thanks for your contribution here. Please add the following:

  1. Unit tests.
  2. Add to docs: https://github.com/Project-MONAI/MONAI/blob/dev/docs/source/handlers.rst.
  3. Add new optional dependency for wandb, referring to: https://github.com/Project-MONAI/MONAI/blob/dev/CONTRIBUTING.md#adding-new-optional-dependencies.

Thanks in advance.

Hi @Nic-Ma

Added the following as per your request:

  • Unit tests for the handlers
  • Documentation for the handlers
  • New optional dependency for wandb

@Nic-Ma
Contributor

Nic-Ma commented Apr 7, 2023

Please add your test file here so the min tests skip it, since it has an optional dependency:
https://github.com/Project-MONAI/MONAI/blob/dev/tests/min_tests.py#L203
And please fix the other CI failures.

Thanks.
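The requested change amounts to adding the new test module's name to the exclusion list in tests/min_tests.py, roughly as below (the wandb test name is hypothetical; the real file lists many more entries):

```python
# Sketch of the exclusion list in tests/min_tests.py: tests whose modules
# need optional dependencies are listed here so the minimal-dependency
# CI run skips them instead of failing on import.
exclude_cases = [
    "test_handler_tb_stats",   # existing tensorboard-dependent test
    "test_wandb_handlers",     # new: depends on the optional wandb package
]

def should_skip(test_name: str) -> bool:
    return test_name in exclude_cases
```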

myron and others added 19 commits April 7, 2023 09:30
Issues: #6291

Allows skipping the already-trained algos and continuing training only for
the non-trained ones.

After this PR, the default option AutoRunner(train=None) will have this
behavior, whereas manually setting AutoRunner(train=True/False) will
always train all or skip all training. Previously we could only train all
or skip all (without any option to resume).

---------

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
for more information, see https://pre-commit.ci

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
for more information, see https://pre-commit.ci

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Reverts #6290
fixes #6294

this commit is not compatible with the integration tests

---------

Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Mingxin Zheng <18563433+mingxin-zheng@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mingxin Zheng <18563433+mingxin-zheng@users.noreply.github.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Fixes #6184  .

### Description

Enable bundle_gen to skip an algorithm if the BundleAlgo
pre_check_skip_algo() function sets skip_bundlegen=True. SegResNet2D
overrides pre_check_skip_algo() and will be skipped if the data is not
highly anisotropic.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
…6254)

Fixes #6193

### Description

- adds `buffer_step` (code adapted from @myron's idea/implementation)
- clean up multioutput resizing logic
- perf optimize
- `overlap` to support different values along the spatial dimensions

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [x] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [x] In-line docstrings updated.
- [x] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
to solve: #6285

Currently ClassesToIndicesd can be used to pre-compute and cache the
locations of all background/foreground classes.
This significantly speeds up the cropping transform
RandCropByLabelClassesd, since the coordinates of class voxels do not
need to be computed every time.

Unfortunately, for an average dataset it requires ~80 GB of extra RAM,
with the average cache (per image) on the order of the image size itself
(most of it due to background coordinates, or to some large segmentation
classes).

This PR (optionally) clips the number of coordinates in each class to a
parameter max_indices_per_class (e.g. 5000), which greatly reduces the
cache size requirements to only a few MB. Generally these coordinates
are used to randomly choose a cropping center at each epoch, e.g. 1000
epochs -> 1000 centers per image, so we don't need to store ALL the
foreground coordinates; it's sufficient to store just some large random
subset.
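The clipping described above can be sketched as a per-class random subsample (the function and variable names here are illustrative, not the PR's code):

```python
import numpy as np

def clip_class_indices(per_class_indices, max_indices_per_class, seed=0):
    """Keep at most a fixed-size random subset of each class's voxel
    indices, bounding the cache size while leaving enough coordinates
    to draw cropping centers from."""
    rng = np.random.default_rng(seed)
    clipped = []
    for idx in per_class_indices:
        idx = np.asarray(idx)
        if idx.size > max_indices_per_class:
            idx = rng.choice(idx, size=max_indices_per_class, replace=False)
        clipped.append(idx)
    return clipped

background = np.arange(1_000_000)  # e.g. a huge background coordinate list
tumor = np.arange(200)             # small classes are kept whole
clipped = clip_class_indices([background, tumor], max_indices_per_class=5000)
```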

PS: I wasn't sure whether it's better to also change ClassesToIndices (not
ClassesToIndicesD); feel free to edit this PR.

thank you


### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

---------

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Co-authored-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
To solve: #6299

### Description

You can find the description of the changes in issue #6299

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
…6309)

Fixes # .
#6297

### Description

The PyTorch ONNX exporter API changed in 1.10 to remove
example_outputs as a required input argument. Special handling is added
in this PR to support PyTorch versions older than 1.10. The unit test is
also extended to cover this case.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [x] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u
--net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick
--unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/`
folder.

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Second PR for issue #6291

Since the previous PR #6290
was reverted #6295

Allows skipping the already-trained algos and continuing training only for
the non-trained ones.

After this PR, the default option AutoRunner(train=None) will have this
behavior, whereas manually setting AutoRunner(train=True/False) will
always train all or skip all training. Previously we could only train all
or skip all (without any option to resume).

I changed import_bundle_algo_history() to return a better algo_dict.

Previously it returned "list[dict(name: algo)]": a list of dicts, where
each dict had to have a single key mapping "name => algo". Now it returns
a list of dicts, each with several keys: dict(AlgoEnsembleKeys.ID: name,
AlgoEnsembleKeys.ALGO: algo, "is_trained": bool, etc.).
This allows putting additional metadata inside each algo_dict, and
it's easier to read it back.

Previously, to get a name we had to use "name = history[0].keys()[0]";
now it's more elegant: "name = history[0][AlgoEnsembleKeys.ID]".
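The shape change can be illustrated as follows (the key string values assigned to AlgoEnsembleKeys below are assumptions for the sketch; only the access pattern matters here):

```python
# Stand-in for MONAI's AlgoEnsembleKeys enum (string values assumed).
class AlgoEnsembleKeys:
    ID = "identifier"
    ALGO = "algo_instance"

algo = object()  # stand-in for a trained BundleAlgo

# Old history: a list of single-key dicts; getting the name is awkward.
old_history = [{"segresnet_0": algo}]
name = list(old_history[0].keys())[0]

# New history: explicit keys, with room for extra metadata per entry.
new_history = [{
    AlgoEnsembleKeys.ID: "segresnet_0",
    AlgoEnsembleKeys.ALGO: algo,
    "is_trained": True,
}]
name2 = new_history[0][AlgoEnsembleKeys.ID]  # direct, readable access
```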

This, however, required changing many files: everywhere
import_bundle_algo_history and export_bundle_algo_history were used.

All the tests have passed, except for the "integration GPU utilization
tests", but those errors seem unrelated.

After this PR, tutorials need to be updated too
Project-MONAI/tutorials#1288

---------

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
pre-commit-ci bot and others added 11 commits April 7, 2023 09:30
for more information, see https://pre-commit.ci

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
for more information, see https://pre-commit.ci

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
@soumik12345
Author

soumik12345 commented Apr 7, 2023

Please add your test file here to skip the min tests as it has optional dependency: https://github.com/Project-MONAI/MONAI/blob/dev/tests/min_tests.py#L203 And please fix the other CI failures.

Thanks.

Hi @Nic-Ma, I attempted to fix the CI failures, but ran into a couple of issues:

  • The reason this test is failing seems to be an inheritance issue with PyTorch Ignite itself and how the ignite.handlers.checkpoint.ModelCheckpoint class is created. Could you please advise how we can resolve this?

  • A few CI checks (such as this one) are also failing due to the following error:

Traceback (most recent call last):
  File "/Users/runner/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/runner/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/runner/work/MONAI/MONAI/tests/min_tests.py", line 229, in <module>
    assert not err_mod, f"err_mod={err_mod} not empty"
AssertionError: err_mod=['monai.handlers'] not empty

Could you please advise how we can resolve these issues?

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
@Nic-Ma
Contributor

Nic-Ma commented Apr 7, 2023

Hi @vfdev-5 ,

Could you please help give some guidance for the inheritance issue of ignite handler?

Thanks in advance.

@soumik12345
Author

Hi @vfdev-5 ,

Could you please help give some guidance for the inheritance issue of ignite handler?

Thanks in advance.

Hi @vfdev-5
Could you please guide me in resolving the CI issues?

Thanks in advance.

@vfdev-5
Member

vfdev-5 commented Apr 11, 2023

@soumik12345

The first issue is a mypy issue:

mypy
mypy 1.2.0 (compiled: yes)
monai/handlers/wandb_handlers.py:134:13: error: Too few arguments  [call-arg]
monai/handlers/wandb_handlers.py:147:13: error: Too few arguments  [call-arg]
monai/handlers/wandb_handlers.py:305:5: error: Signature of "__call__" incompatible with supertype "Checkpoint"  [override]
monai/handlers/wandb_handlers.py:305:5: note:      Superclass:
monai/handlers/wandb_handlers.py:305:5: note:          def __call__(self, engine: Engine) -> None
monai/handlers/wandb_handlers.py:305:5: note:      Subclass:
monai/handlers/wandb_handlers.py:305:5: note:          def __call__(self, engine: Engine, to_save: Mapping[Any, Any]) -> None
Found 3 errors in 1 file (checked 1071 source files)

We can fix it, but do we really need to subclass ignite's Checkpoint? Other experiment-tracking systems like ClearML and Neptune provide ClearMLSaver and NeptuneSaver to save checkpoints to their platforms/cloud.
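For reference, the [override] part of the mypy report asks that a subclass __call__ accept the same signature as the parent; one way to satisfy it is binding to_save at construction time instead of passing it per call. A simplified sketch with a stand-in Checkpoint class (not ignite's real implementation):

```python
from typing import Any, Mapping

class Checkpoint:
    """Stand-in for ignite.handlers.Checkpoint's call signature."""
    def __call__(self, engine: Any) -> None:
        raise NotImplementedError

class WandbModelCheckpoint(Checkpoint):
    def __init__(self, to_save: Mapping[str, Any]):
        self.to_save = to_save       # bound once, not passed per call
        self.saved: list[str] = []

    def __call__(self, engine: Any) -> None:  # matches the supertype
        for name in self.to_save:
            self.saved.append(name)  # real code would persist the object

handler = WandbModelCheckpoint({"model": object()})
handler(engine=None)
```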

@soumik12345
Author

soumik12345 commented Apr 11, 2023

@soumik12345

The first issue is a mypy issue:

mypy
mypy 1.2.0 (compiled: yes)
monai/handlers/wandb_handlers.py:134:13: error: Too few arguments  [call-arg]
monai/handlers/wandb_handlers.py:147:13: error: Too few arguments  [call-arg]
monai/handlers/wandb_handlers.py:305:5: error: Signature of "__call__" incompatible with supertype "Checkpoint"  [override]
monai/handlers/wandb_handlers.py:305:5: note:      Superclass:
monai/handlers/wandb_handlers.py:305:5: note:          def __call__(self, engine: Engine) -> None
monai/handlers/wandb_handlers.py:305:5: note:      Subclass:
monai/handlers/wandb_handlers.py:305:5: note:          def __call__(self, engine: Engine, to_save: Mapping[Any, Any]) -> None
Found 3 errors in 1 file (checked 1071 source files)

We can fix it, but do we really need to subclass ignite's Checkpoint? Other experiment-tracking systems like ClearML and Neptune provide ClearMLSaver and NeptuneSaver to save checkpoints to their platforms/cloud.

Hi @vfdev-5
I can refactor the Checkpoint Handler logic without subclassing an Ignite Handler.
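One possible shape for that refactor, in the style of ClearMLSaver/NeptuneSaver: a save-handler callable that ignite's Checkpoint can invoke, which writes the checkpoint to disk and would upload it as a W&B artifact. All names here are hypothetical, and the wandb calls are commented out so the sketch has no hard dependency:

```python
import os
import tempfile

class WandbSaver:
    """Hypothetical save handler: Checkpoint calls it with the state dict
    and a filename; it persists the file and logs it to the active run."""

    def __init__(self, dirname: str):
        self.dirname = dirname
        os.makedirs(dirname, exist_ok=True)

    def __call__(self, checkpoint: dict, filename: str, metadata=None) -> str:
        path = os.path.join(self.dirname, filename)
        with open(path, "w") as f:
            f.write(repr(checkpoint))  # real code would use torch.save
        # With wandb installed and a run active, one would then do:
        # artifact = wandb.Artifact(filename, type="model", metadata=metadata)
        # artifact.add_file(path)
        # wandb.log_artifact(artifact)
        return path

saver = WandbSaver(tempfile.mkdtemp())
saved_path = saver({"epoch": 3}, "net_checkpoint_3.pt")
```

This keeps the checkpointing logic in ignite's Checkpoint and isolates the W&B-specific behavior in the saver, which is what avoids the signature-override problem.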

@soumik12345 soumik12345 closed this by deleting the head repository May 15, 2023
@soumik12345
Author

soumik12345 commented May 16, 2023

Re-raised the PR #6519
