
Weights & Biases handler for MonAI#6519

Open
soumik12345 wants to merge 31 commits into Project-MONAI:dev from soumik12345:feat/wandb-stats-handler

Conversation

@soumik12345

Features Contributed

  • WandbStatsHandler defines a set of Ignite event handlers for all the Weights & Biases logging logic. It can be used with any Ignite Engine (trainer, validator, or evaluator) and supports both epoch-level and iteration-level logging. The expected data sources are Ignite's engine.state.output and engine.state.metrics. Default behaviors:
    • When EPOCH_COMPLETED, write each dictionary item in engine.state.metrics to Weights & Biases.
    • When ITERATION_COMPLETED, write each dictionary item in self.output_transform(engine.state.output) to Weights & Biases.
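
The two default behaviors can be sketched roughly as follows. This is not the code from the PR: `StatsLoggerSketch`, `log_fn`, and the fake engine are hypothetical stand-ins so the example runs without Ignite or wandb installed, and treating a bare output value as `"loss"` is an assumption. In a real run, `log_fn` could be `wandb.log`.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


class StatsLoggerSketch:
    """Hypothetical stand-in mirroring the two default behaviors described above."""

    def __init__(
        self,
        log_fn: Callable[[Dict[str, Any]], None],
        output_transform: Callable[[Any], Any] = lambda x: x,
    ) -> None:
        self.log_fn = log_fn  # in a real run this could be wandb.log
        self.output_transform = output_transform

    def epoch_completed(self, engine: Any) -> None:
        # Log every item in engine.state.metrics.
        metrics = dict(engine.state.metrics)
        if metrics:
            self.log_fn(metrics)

    def iteration_completed(self, engine: Any) -> None:
        # Log every item of the transformed iteration output.
        output = self.output_transform(engine.state.output)
        if not isinstance(output, dict):
            output = {"loss": output}  # assumption: a bare value is treated as the loss
        self.log_fn(dict(output))


# A fake engine exposing only the state fields the sketch reads.
engine = SimpleNamespace(
    state=SimpleNamespace(output={"loss": 0.5}, metrics={"dice": 0.9})
)
records = []
logger = StatsLoggerSketch(log_fn=records.append)
logger.iteration_completed(engine)
logger.epoch_completed(engine)
print(records)  # [{'loss': 0.5}, {'dice': 0.9}]
```

With Ignite, the two methods would be registered for `Events.ITERATION_COMPLETED` and `Events.EPOCH_COMPLETED` via `engine.add_event_handler`.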

The following colab notebook and Weights & Biases run demonstrate the usage of these handlers and their results respectively:


Some additional Weights & Biases features:

  • When TensorBoardStatsHandler and TensorBoardImageHandler are used inside a wandb run, Weights & Biases automatically hosts the TensorBoard instance inside the run, provided sync_tensorboard is set to True in wandb.init().
  • With TensorBoardImageHandler, images and videos are also logged automatically to the Weights & Biases media panel, again provided sync_tensorboard is set to True in wandb.init().
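
A minimal sketch of the setting mentioned in both points, assuming a hypothetical project name and hypothetical helper names. Only the `project` and `sync_tensorboard` arguments of `wandb.init()` are used.

```python
from typing import Any, Dict


def tensorboard_sync_settings(project: str) -> Dict[str, Any]:
    """Keyword arguments for wandb.init() that turn on TensorBoard syncing."""
    return {"project": project, "sync_tensorboard": True}


def start_synced_run(project: str) -> Any:
    """Start a wandb run with sync_tensorboard enabled."""
    import wandb  # imported lazily so the settings helper works without wandb installed

    return wandb.init(**tensorboard_sync_settings(project))


settings = tensorboard_sync_settings("monai-tensorboard-demo")  # hypothetical project name
print(settings)  # {'project': 'monai-tensorboard-demo', 'sync_tensorboard': True}
```

The run would typically be started before the TensorBoard handlers are created and attached to the engine.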

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@soumik12345 soumik12345 marked this pull request as draft May 16, 2023 07:03
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 062959c
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 42dc1fb
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 163364e
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: a01b2b0
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 38dd508
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 19380c5
I, Soumik Rakshit <19soumik.rakshit96@gmail.com>, hereby add my Signed-off-by to this commit: 3952742

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
@soumik12345 soumik12345 force-pushed the feat/wandb-stats-handler branch from f9f6f49 to 3e0039a Compare May 16, 2023 07:31
soumik12345 and others added 14 commits May 16, 2023 13:25

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
@soumik12345 soumik12345 marked this pull request as ready for review May 18, 2023 10:43
@wyli wyli requested review from Nic-Ma, binliunls and ericspod May 18, 2023 15:40
Comment thread tests/test_handler_wandb_stats.py Outdated
Comment thread monai/handlers/wandb_handlers.py Outdated
Comment thread monai/handlers/wandb_handlers.py
Comment thread tests/test_handler_wandb_stats.py Outdated
os.system("wandb offline")
os.environ["WANDB_DIR"] = tempdir.name

wandb.init(dir=tempdir.name)
Contributor

@Nic-Ma Nic-Ma May 19, 2023

I see that wandb is global. Is it thread-safe, and does it support multi-processing?
I would suggest adding some tests that call wandb.init(dir=tempdir.name) and the whole handler in multi-threaded and multi-processing logic.
For multi-processing test, you can refer to:
https://github.com/Project-MONAI/MONAI/blob/dev/tests/test_cumulative_average_dist.py
For multi-thread test, you can easily add a test function here, refer to:
https://github.com/Project-MONAI/MONAI/blob/dev/monai/data/dataset.py#L875-L878
What do you think?

Thanks.

Comment thread monai/handlers/wandb_handlers.py Outdated
when every iteration completed. The default behavior is to print loss from output[0] as
output is a decollated list and we replicated loss value for every item of the decollated
list. `engine.state` and `output_transform` inherit from the
ignite concept: https://pytorch.org/ignite/concepts.html#state, explanation and usage
Contributor

I cannot open the ignite concept link and the tutorial link below. May need to update these links.

Thanks,
Bin

engine: Ignite Engine, it can be a trainer, validator or evaluator.
"""
if self.epoch_event_writer is not None:
    self.epoch_event_writer(engine)
Contributor

In the doc-string, it says that the epoch_event_writer: ..... Must accept the parameter "engine" and "summary_writer" ..... Whereas, I don't see the summary_writer parameter here. It's a little bit confusing for me. Could you please explain it?

Thanks,
Bin

Author

It's a typo from an older draft of the PR, where I was mimicking the behavior of the TensorBoard stats handler. I will update the docstring.
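
To illustrate the calling convention discussed in this thread, here is a minimal sketch in which a custom epoch writer receives only `engine`. `EpochDispatchSketch`, `log_fn`, and the fallback branch are hypothetical; the hunk above only shows the call `self.epoch_event_writer(engine)`, so the default path is an assumption.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


class EpochDispatchSketch:
    """Hypothetical stand-in for the epoch-completed dispatch in the handler."""

    def __init__(
        self,
        log_fn: Callable[[Dict[str, Any]], None],
        epoch_event_writer: Optional[Callable[[Any], None]] = None,
    ) -> None:
        self.log_fn = log_fn
        self.epoch_event_writer = epoch_event_writer

    def epoch_completed(self, engine: Any) -> None:
        if self.epoch_event_writer is not None:
            # The custom writer is called with the engine only.
            self.epoch_event_writer(engine)
        else:
            # Assumed default: log the metrics dictionary as-is.
            self.log_fn(dict(engine.state.metrics))


logged: List[Dict[str, Any]] = []


def writer_with_epoch(engine: Any) -> None:
    # Custom writer: tag the metrics with the current epoch number.
    logged.append({"epoch": engine.state.epoch, **engine.state.metrics})


engine = SimpleNamespace(state=SimpleNamespace(epoch=3, metrics={"dice": 0.9}))
EpochDispatchSketch(logged.append, epoch_event_writer=writer_with_epoch).epoch_completed(engine)
print(logged)  # [{'epoch': 3, 'dice': 0.9}]
```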


def __init__(
    self,
    iteration_log: bool = True,
Contributor

Should a dict parameter such as hyperparameters be added to the init function to record user-defined parameters with wandb.config?

Thanks,
Bin
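
One possible shape for this suggestion, as a sketch only. The `hyperparameters` and `config` parameters are hypothetical and not part of the PR. The config object is injected so the example runs with a plain dict; in a real run it could be `wandb.config`, which also provides an `update` method.

```python
from typing import Any, Dict, Optional


class HandlerWithConfigSketch:
    """Hypothetical handler constructor that records user-defined parameters."""

    def __init__(
        self,
        iteration_log: bool = True,
        hyperparameters: Optional[Dict[str, Any]] = None,  # hypothetical parameter
        config: Optional[Any] = None,  # hypothetical: any object with update(), e.g. wandb.config
    ) -> None:
        self.iteration_log = iteration_log
        if hyperparameters and config is not None:
            # Record the user-defined parameters once, at construction time.
            config.update(hyperparameters)


run_config: Dict[str, Any] = {}  # stand-in for wandb.config
HandlerWithConfigSketch(
    hyperparameters={"learning_rate": 1e-3, "batch_size": 4},
    config=run_config,
)
print(run_config)  # {'learning_rate': 0.001, 'batch_size': 4}
```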

@Nic-Ma
Contributor

Nic-Ma commented May 19, 2023

@SachidanandAlle can help share more concerns about running the handler in multiple threads.

Thanks.

@binliunls
Contributor

When running the multi-threaded code shown below, the wandb handler records the results from all threads into one run, as shown in the picture. However, I think several individual runs should be created to record the different threads.

from __future__ import annotations

import unittest
import wandb
import torch
from concurrent.futures import ThreadPoolExecutor

from ignite.engine import Engine

from monai.handlers import WandbStatsHandler


def dummy_train(start):
    # set up engine
    def _train_func(engine, batch):
        return batch + 1.0

    engine = Engine(_train_func)

    # set up testing handler
    handler = WandbStatsHandler(
        output_transform=lambda x: x,
    )
    handler.attach(engine)
    engine.run(torch.tensor([start]), max_epochs=5)


class TestHandlerWB(unittest.TestCase):
    def test_multi_thread(self):
        wandb.init(
            project="multithread-handlers", save_code=True, sync_tensorboard=True
        )
        with ThreadPoolExecutor(2, "Training") as executor:
            for t in range(2):
                executor.submit(dummy_train, t + 2)


if __name__ == "__main__":
    unittest.main()
[Image: Screenshot 2023-05-22 at 15 37 51]
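
The behavior asked for here (one record per thread) can be sketched by giving each task its own run object instead of relying on a global one. `FakeRun` is a hypothetical stand-in, not the wandb API; whether and how wandb supports several concurrent runs in one process is not established in this thread. The loop loosely mirrors the code above, where each of the 5 epochs yields `start + 1.0`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


class FakeRun:
    """Hypothetical stand-in for a run object with its own history."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: List[Dict[str, Any]] = []

    def log(self, data: Dict[str, Any]) -> None:
        self.history.append(data)


def dummy_train(start: float, run: FakeRun, max_epochs: int = 5) -> FakeRun:
    # Each task logs to the run it was given, never to a shared global.
    for _ in range(max_epochs):
        run.log({"output": start + 1.0})
    return run


with ThreadPoolExecutor(2, "Training") as executor:
    futures = [
        executor.submit(dummy_train, t + 2, FakeRun(f"thread-{t}")) for t in range(2)
    ]
runs = [future.result() for future in futures]

for run in runs:
    print(run.name, len(run.history), run.history[0])
# thread-0 5 {'output': 3.0}
# thread-1 5 {'output': 4.0}
```

Passing the run explicitly keeps the results deterministic even if the executor reuses one worker thread for both tasks.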

Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Comment thread monai/handlers/wandb_handlers.py
@Nic-Ma
Contributor

Nic-Ma commented Aug 8, 2023

Hi @soumik12345 ,

Do you still plan to complete this PR?

Thanks.

@soumik12345
Author

> Hi @soumik12345,
>
> Do you still plan to complete this PR?
>
> Thanks.

Hi @Nic-Ma
I will be completing this PR soon, thanks for being patient :)

soumik12345 and others added 4 commits August 9, 2023 18:35
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
@soumik12345 soumik12345 requested review from Nic-Ma and binliunls August 9, 2023 13:34
@DanielNobbe
Contributor

Is this still being worked on?

@ericspod
Member

Hi @DanielNobbe sorry about this PR, it has seemingly fallen through the cracks quite a while ago. @soumik12345 I realised this is rather old now but would you be able to update the PR so we can merge this? Looking at it now I think we're fine to merge this if we can resolve the conflicts and run the tests. Thanks!

@KumoLiu KumoLiu self-requested a review as a code owner January 30, 2026 07:12

5 participants