Weights & Biases handlers for MONAI #6305
Conversation
Hi @soumik12345, thanks for your contribution here.
Thanks in advance.
tag_name: when iteration output is a scalar, tag_name is used to plot, defaults to `'Loss'`.
"""
if wandb.run is None:
    raise wandb.Error("You must call `wandb.init()` before WandbStatsHandler()")
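To make the behavior of this guard concrete, here is a minimal self-contained sketch; the stub class and the `WandbStatsHandlerSketch` name are illustrative stand-ins, not MONAI's or wandb's actual code:

```python
# Sketch of the guard shown above, with a stub standing in for the real wandb
# module (the class names and error type here are illustrative only).

class _StubWandb:
    """Mimics the wandb module: `run` stays None until init() is called."""
    def __init__(self):
        self.run = None

    def init(self, **kwargs):
        self.run = object()  # real wandb returns a Run; any sentinel works here
        return self.run

wandb = _StubWandb()

class WandbStatsHandlerSketch:
    def __init__(self):
        # Same guard as in the snippet above: refuse to construct without a run.
        if wandb.run is None:
            raise RuntimeError("You must call `wandb.init()` before WandbStatsHandler()")

# Constructing before init() raises; after init() it succeeds.
try:
    WandbStatsHandlerSketch()
    raised_before_init = False
except RuntimeError:
    raised_before_init = True

wandb.init()
handler_after_init = WandbStatsHandlerSketch()
```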
Hi @soumik12345,
I think it would be better for the handler itself to process the initialization of wandb at the beginning of the experiment and keep the run inside the handler. @Nic-Ma, what do you think about it?
If the experiment run is inside the handler, please be careful about the multiprocessing and single-process-multi-tasks cases.
Thanks,
Bin
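For the multiprocessing case raised above, a common pattern is to initialize the experiment run only on the main process. A minimal sketch (the helper name and the use of a `RANK` environment variable are assumptions for illustration, not MONAI's or wandb's implementation):

```python
import os

def init_on_rank_zero(init_fn, rank_env="RANK"):
    """Call init_fn only on the main process (rank 0).

    Illustrative helper, not a MONAI/wandb API: distributed launchers
    conventionally expose the process rank via the RANK environment variable,
    and worker ranks skip experiment-run initialization entirely.
    """
    rank = int(os.environ.get(rank_env, "0"))
    return init_fn() if rank == 0 else None

calls = []
os.environ["RANK"] = "0"
main_run = init_on_rank_zero(lambda: calls.append("init") or "run")

os.environ["RANK"] = "3"
worker_run = init_on_rank_zero(lambda: calls.append("init") or "run")
```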
Hi @binliunls, since there are 2 different handlers (and potentially a third one in the future, as per #6305 (comment)), I think it's best if the initialization of the wandb run is left independent of the handlers. Would of course love to know your thoughts as well.
cc: @Nic-Ma
Looking at your run, the visualization of 3D images is hard to comprehend. They overflow my screen (as if they are high resolution) but appear very blocky. Is this expected? Another question: can this TensorBoard 3D visualization plugin be used with W&B, and how? If not, could it be ported/integrated? It is open source.
Hi @dzenanz, currently the 3D images are being logged as GIFs by the handler. I can explore whether it is possible to integrate the plugin.
Hi @Nic-Ma, added the following as per your request:
Please add your test file here to skip the min tests, as it has an optional dependency. Thanks.
Issues: #6291. Allows skipping the already trained algos and continuing training only for the non-trained ones. After this PR, the default option `AutoRunner(train=None)` will have this behavior, whereas manually setting `AutoRunner(train=True/False)` will always train all or skip all training. Previously we could only train all or skip all (without any option to resume).

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
for more information, see https://pre-commit.ci Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Reverts #6290, fixes #6294; that commit is not compatible with the integration tests.

Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Mingxin Zheng <18563433+mingxin-zheng@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Mingxin Zheng <18563433+mingxin-zheng@users.noreply.github.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Fixes #6184.

### Description
Enables bundle_gen to skip an algorithm if the BundleAlgo `pre_check_skip_algo()` function sets `skip_bundlegen=True`. SegResNet2D overrides `pre_check_skip_algo()` and will be skipped if the data is not highly anisotropic.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: heyufan1995 <heyufan1995@gmail.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
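As an illustration of the skip hook this commit describes, here is a hedged sketch; the class and attribute names mirror the commit message, but the method bodies and the 3.0 anisotropy threshold are made-up stand-ins, not MONAI's actual logic:

```python
class BundleAlgoSketch:
    """Stand-in for BundleAlgo: subclasses may flag themselves for skipping."""
    skip_bundlegen = False

    def pre_check_skip_algo(self, spacing):
        pass  # base algos never skip themselves

class SegResNet2DSketch(BundleAlgoSketch):
    def pre_check_skip_algo(self, spacing):
        # A 2D algo only makes sense for highly anisotropic data; the 3.0
        # threshold here is illustrative, not what MONAI actually uses.
        if max(spacing) / min(spacing) < 3.0:
            self.skip_bundlegen = True

isotropic = SegResNet2DSketch()
isotropic.pre_check_skip_algo((1.0, 1.0, 1.0))    # skipped: data is isotropic

anisotropic = SegResNet2DSketch()
anisotropic.pre_check_skip_algo((1.0, 1.0, 5.0))  # kept: thick-slice data
```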
…6254) Fixes #6193.

### Description
- Adds `buffer_step` (code adapted from @myron's idea/implementation).
- Cleans up the multi-output resizing logic.
- Performance optimizations.
- `overlap` now supports different values along the spatial dimensions.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [x] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [x] In-line docstrings updated.
- [x] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
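The per-dimension `overlap` support mentioned in this commit can be sketched as follows; `normalize_overlap` is a hypothetical helper written for illustration, not the actual MONAI implementation:

```python
def normalize_overlap(overlap, spatial_dims):
    """Accept `overlap` as one float or one value per spatial dimension.

    Hypothetical helper: broadcasts a scalar to all spatial dimensions and
    validates that each value lies in [0, 1), as sliding-window overlap must.
    """
    if isinstance(overlap, (int, float)):
        overlap = (float(overlap),) * spatial_dims
    overlap = tuple(float(o) for o in overlap)
    if len(overlap) != spatial_dims or any(not (0.0 <= o < 1.0) for o in overlap):
        raise ValueError("overlap must be in [0, 1) for each spatial dimension")
    return overlap

scalar_case = normalize_overlap(0.25, spatial_dims=3)      # broadcast to 3 dims
per_dim_case = normalize_overlap((0.5, 0.25), spatial_dims=2)
```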
To solve #6285.

### Description
Currently `ClassesToIndicesd` can be used to pre-compute and cache the locations of all background/foreground classes. This significantly speeds up the cropping transform `RandCropByLabelClassesd`, since the coordinates of class voxels do not need to be computed every time. Unfortunately, for an average dataset it requires ~80 GB of extra RAM, with an average cache (per image) on the order of the image size itself (most of it due to background coordinates, or to some large segmentation classes).

This PR (optionally) clips the number of coordinates in each class to a parameter `max_indices_per_class` (e.g. 5000), which greatly reduces the cache size requirements to only a few MB. Generally these coordinates are used to randomly choose a cropping center at each epoch, e.g. 1000 epochs -> 1000 centers per image, so we don't need to store ALL the foreground coordinates; it's sufficient to store just some large random subset.

PS: I wasn't sure if it's better to also change `ClassesToIndices` (not `ClassesToIndicesD`); feel free to edit this PR. Thank you.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Wenqi Li <wenqil@nvidia.com>
Co-authored-by: Wenqi Li <wenqil@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
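The clipping idea in this commit can be sketched in a few lines; the helper below is illustrative (the real change lives inside MONAI's `ClassesToIndices` transforms), with only the `max_indices_per_class` name taken from the commit message:

```python
import random

def clip_indices_per_class(indices, max_indices_per_class, seed=0):
    """Keep at most max_indices_per_class coordinates per class.

    Illustrative helper: a random subset suffices because each epoch only
    draws one crop center per image from these cached coordinates.
    """
    if len(indices) <= max_indices_per_class:
        return list(indices)
    return random.Random(seed).sample(list(indices), max_indices_per_class)

background = list(range(100_000))  # background voxel indices dominate the cache
clipped = clip_indices_per_class(background, max_indices_per_class=5000)
small = clip_indices_per_class([1, 2, 3], max_indices_per_class=5000)
```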
…6309) Fixes #6297.

### Description
The PyTorch ONNX exporter API changed in 1.10 to remove `example_outputs` as a required input argument. Special handling is added in this PR to support PyTorch versions older than 1.10. The unit test is also extended to cover this case.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [x] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
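The version-dependent handling this commit describes can be sketched as a simple keyword-argument gate; the function below is a hypothetical illustration with simplified version parsing, not the PR's actual code:

```python
def onnx_export_kwargs(torch_version, example_outputs):
    """Build version-appropriate kwargs for an ONNX export call.

    Hypothetical helper: torch.onnx.export stopped accepting the
    `example_outputs` argument in PyTorch 1.10, so only older versions
    receive it. Version parsing here is deliberately simplified.
    """
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    kwargs = {}
    if (major, minor) < (1, 10):
        kwargs["example_outputs"] = example_outputs  # required pre-1.10 only
    return kwargs

legacy_kwargs = onnx_export_kwargs("1.9.1", example_outputs=["dummy"])
modern_kwargs = onnx_export_kwargs("1.13.0", example_outputs=["dummy"])
```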
Second PR for issue #6291, since the previous PR #6290 was reverted in #6295.

Allows skipping the already trained algos and continuing training only for the non-trained ones. After this PR, the default option `AutoRunner(train=None)` will have this behavior, whereas manually setting `AutoRunner(train=True/False)` will always train all or skip all training. Previously we could only train all or skip all (without any option to resume).

I changed `import_bundle_algo_history()` to return a better algo_dict. Previously it returned `list[dict(name: algo)]` - a list of dicts, but each dict had to have a single key name ("name => algo"). Now it returns a list of dicts, each with several keys: `dict(AlgoEnsembleKeys.ID: name, AlgoEnsembleKeys.ALGO: algo, "is_trained": bool, etc.)`. This allows putting additional metadata inside each algo_dict, and it's easier to read it back. Previously, to get a name we had to use `name = history[0].keys()[0]`; now it's more elegant: `name = history[0][AlgoEnsembleKeys.ID]`. This, however, required changing many files, everywhere `import_bundle_algo_history` and `export_bundle_algo_history` were used.

All the tests have passed except for the "integration GPU utilization tests", but those errors seem unrelated. After this PR, tutorials need to be updated too: Project-MONAI/tutorials#1288

Signed-off-by: myron <amyronenko@nvidia.com>
Signed-off-by: Soumik Rakshit <19soumik.rakshit96@gmail.com>
Hi @Nic-Ma, I attempted to fix the CI failures but ran into a couple of issues:
Could you please recommend how we can resolve these issues?
Hi @vfdev-5, could you please give some guidance on the inheritance issue with the ignite handler? Thanks in advance.
The first issue is a mypy issue:
We can fix it, but do we really need to subclass ignite's Checkpoint? Other experiment systems like ClearML or Neptune provide `ClearMLSaver` and `NeptuneSaver` to save checkpoints to their platforms/clouds.
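The composition-based alternative suggested above (a standalone saver instead of subclassing) might look roughly like this; every name and the upload callable here are hypothetical, sketched only to contrast with inheriting from `ignite.handlers.ModelCheckpoint`:

```python
class WandbSaverSketch:
    """Composition-based saver sketch (hypothetical name and API).

    Instead of subclassing ignite's Checkpoint, it wraps a callable that
    uploads a saved checkpoint file to the experiment-tracking platform.
    """
    def __init__(self, upload_fn):
        self.upload_fn = upload_fn  # e.g. an artifact-logging callable
        self.saved = []

    def __call__(self, filepath):
        # A real saver would serialize the objects first; this sketch just
        # records the path and delegates the upload.
        self.saved.append(filepath)
        self.upload_fn(filepath)

uploaded = []
saver = WandbSaverSketch(uploaded.append)
saver("checkpoint_epoch_1.pt")
```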
Hi @vfdev-5
Re-raised the PR as #6519.
### Features Contributed

`WandbStatsHandler` defines a set of Ignite event handlers for all the Weights & Biases logging logic. It can be used with any Ignite `Engine` (trainer, validator, and evaluator) and supports both epoch-level and iteration-level logging. The expected data sources are Ignite's `engine.state.output` and `engine.state.metrics`. Default behaviors:

- On `EPOCH_COMPLETED`, write each dictionary item in `engine.state.metrics` to Weights & Biases.
- On `ITERATION_COMPLETED`, write each dictionary item in `self.output_transform(engine.state.output)` to Weights & Biases.

`WandbModelCheckpointHandler` inherits from `ignite.handlers.ModelCheckpoint` and can be used to periodically save objects as Weights & Biases artifacts.

The following Colab notebook and Weights & Biases run demonstrate the usage of these handlers and their results, respectively:

Some additional Weights & Biases features:

- If `TensorBoardStatsHandler` and `TensorBoardImageHandler` are used inside a wandb run, Weights & Biases automatically hosts the TensorBoard instance inside the run if, during `wandb.init()`, `sync_tensorboard` is set to `True`.
- When using `TensorBoardImageHandler`, the images and videos are automatically logged to the Weights & Biases media panel if, during `wandb.init()`, `sync_tensorboard` is set to `True`.
### Types of changes

- Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- Documentation updated, tested `make html` command in the `docs/` folder.
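The default epoch-level and iteration-level behaviors described in the feature summary can be sketched without any ignite or wandb dependency; all names below are illustrative stand-ins for the `engine.state` plumbing:

```python
class MiniEngineState:
    """Stand-in for ignite's engine.state: per-iteration output, per-epoch metrics."""
    def __init__(self):
        self.output = None
        self.metrics = {}

def run_loop(batches, on_iteration_completed, on_epoch_completed):
    state = MiniEngineState()
    for batch in batches:
        state.output = {"loss": batch * 0.1}  # per-iteration scalar, as from a trainer
        on_iteration_completed(state)         # ITERATION_COMPLETED analogue
    state.metrics = {"val_mean_dice": 0.9}    # per-epoch metrics, as from an evaluator
    on_epoch_completed(state)                 # EPOCH_COMPLETED analogue

iteration_logs, epoch_logs = [], []
run_loop(
    [1, 2, 3],
    on_iteration_completed=lambda s: iteration_logs.append(dict(s.output)),
    on_epoch_completed=lambda s: epoch_logs.append(dict(s.metrics)),
)
```

In the real handlers, the two appends would instead forward the dictionaries to `wandb.log`.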