
Conversation

@wyli (Contributor) commented May 4, 2022

Signed-off-by: Wenqi Li wenqil@nvidia.com

Description

Basic profiling of __torch_function__ in MetaTensor; the benchmark needs to be run manually.

The code is based on https://github.com/pytorch/pytorch/tree/v1.11.0/benchmarks/overrides_benchmark.
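
For context, a minimal sketch of the timing approach in that PyTorch benchmark (an illustrative reimplementation, not the upstream script; the subclass names mirror the ones reported in the results below):

import timeit

import torch

NUM_REPEATS = 1000
NUM_REPEAT_OF_REPEATS = 30


class SubTensor(torch.Tensor):
    """Plain subclass without a __torch_function__ override."""


class SubWithTorchFunc(torch.Tensor):
    """Subclass with a trivial __torch_function__ override."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        return super().__torch_function__(func, types, args, kwargs or {})


def bench(cls):
    a, b = cls([1.0]), cls([2.0])
    # total seconds for NUM_REPEATS additions, repeated NUM_REPEAT_OF_REPEATS times
    times = timeit.repeat(lambda: a + b, number=NUM_REPEATS, repeat=NUM_REPEAT_OF_REPEATS)
    times_us = torch.tensor(times) / NUM_REPEATS * 1e6  # microseconds per call
    print(f"Type {cls.__name__}: min {times_us.min().item():.3f} us, std {times_us.std().item():.3f} us")


if __name__ == "__main__":
    for klass in (torch.Tensor, SubTensor, SubWithTorchFunc):
        bench(klass)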

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@wyli requested review from Nic-Ma, ericspod and rijobro on May 4, 2022 11:25
@wyli force-pushed the adds-metatensor-profiling branch from ebb9db6 to 45a0c9a on May 4, 2022 11:33
@wyli (Contributor, Author) commented May 4, 2022

The current results on macOS are:

Type Tensor had a minimum time of 2.0996250677853823 us and a standard deviation of 0.5214191623963416 us.
Type SubTensor had a minimum time of 9.690787643194199 us and a standard deviation of 1.265820348635316 us.
Type SubWithTorchFunc had a minimum time of 12.085512280464172 us and a standard deviation of 0.9849673369899392 us.
Type MetaTensor had a minimum time of 177.9918670654297 us and a standard deviation of 27.68106199800968 us.

I think we should improve the __torch_function__ override.

@rijobro (Contributor) commented May 4, 2022

Thanks for this. It might be worth noting that 177 µs is still pretty quick, and presumably the difference between the classes will decrease for more non-trivial tasks (1+2 is about as simple as it can get). I'll check that now.

@rijobro (Contributor) commented May 4, 2022

Locally (also macOS), I got this for 1 + 2:

Type Tensor had a minimum time of 1.4180639991536736 us and a standard deviation of 0.7132108439691365 us.
Type SubTensor had a minimum time of 6.272729020565748 us and a standard deviation of 2.3775098379701376 us.
Type SubWithTorchFunc had a minimum time of 6.717714015394449 us and a standard deviation of 0.6704257102683187 us.
Type MetaTensor had a minimum time of 48.26740548014641 us and a standard deviation of 13.995005749166012 us.

And this for torch.rand((1, 200, 200, 200), dtype=torch.float32) + torch.rand((1, 200, 200, 200), dtype=torch.float32):

Type Tensor had a minimum time of 5421.039462089539 us and a standard deviation of 536.7174744606018 us.
Type SubTensor had a minimum time of 5442.803502082825 us and a standard deviation of 284.26727280020714 us.
Type SubWithTorchFunc had a minimum time of 5333.672761917114 us and a standard deviation of 490.83579331636435 us.
Type MetaTensor had a minimum time of 5656.501054763794 us and a standard deviation of 693.1112706661224 us.

I agree though that we should try to optimise where possible!

P.S. I think displaying the mean might be more useful than the minimum:

bench_time = float(torch.sum(torch.Tensor(bench_times))) / (NUM_REPEATS * NUM_REPEAT_OF_REPEATS)

instead of:

bench_time = float(torch.min(torch.Tensor(bench_times))) / NUM_REPEATS
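
For comparison, a small sketch (with placeholder timings; NUM_REPEATS and NUM_REPEAT_OF_REPEATS as in the benchmark script) showing the three candidate summary statistics side by side:

import torch

NUM_REPEATS, NUM_REPEAT_OF_REPEATS = 1000, 30
# each entry is the total elapsed time (in seconds) for one batch of NUM_REPEATS calls
bench_times = (torch.rand(NUM_REPEAT_OF_REPEATS) * 1e-3).tolist()

times = torch.Tensor(bench_times)
min_time = float(torch.min(times)) / NUM_REPEATS  # current metric: best case, hides outliers
mean_time = float(torch.sum(times)) / (NUM_REPEATS * NUM_REPEAT_OF_REPEATS)  # proposed mean
median_time = float(torch.median(times)) / NUM_REPEATS  # robust middle ground
print(f"min {min_time:.3e} s, mean {mean_time:.3e} s, median {median_time:.3e} s")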

@rijobro (Contributor) commented May 4, 2022

Using the mean makes things a little bit less clear-cut:

Type Tensor had a mean time of 5256.852149963379 us and a standard deviation of 383.22440814226866 us.
Type SubTensor had a mean time of 5128.483772277832 us and a standard deviation of 132.8055397607386 us.
Type SubWithTorchFunc had a mean time of 5159.413814544678 us and a standard deviation of 241.35354906320572 us.
Type MetaTensor had a mean time of 5491.213798522949 us and a standard deviation of 324.95150808244944 us.

@wyli (Contributor, Author) commented May 4, 2022

Sure, this is mainly to show the overhead of creating/copying the meta info. Good point on the metric; I'll try median values as well.
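
To illustrate where that overhead comes from, here is a schematic toy subclass (not MONAI's actual MetaTensor implementation; the class name and the copying strategy are simplified assumptions) whose __torch_function__ deep-copies a meta dict on every call:

from copy import deepcopy

import torch


class ToyMetaTensor(torch.Tensor):
    """Toy subclass: carries a meta dict and copies it for every torch op result."""

    @staticmethod
    def __new__(cls, x, meta=None, *args, **kwargs):
        return torch.as_tensor(x, *args, **kwargs).as_subclass(cls)

    def __init__(self, x, meta=None):
        self.meta = meta or {}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        ret = super().__torch_function__(func, types, args, kwargs or {})
        if isinstance(ret, cls):
            # deep-copying metadata from the first ToyMetaTensor argument keeps the
            # result independent of its inputs; this is the per-call extra cost
            src = next((a for a in args if isinstance(a, cls)), None)
            ret.meta = deepcopy(getattr(src, "meta", {})) if src is not None else {}
        return ret


a = ToyMetaTensor(torch.rand(3), meta={"affine": torch.eye(4)})
c = a + a  # dispatches through __torch_function__; c.meta is an independent copy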

@rijobro (Contributor) commented May 4, 2022

Yeah, maybe we can use something like cProfile to profile MetaTensor.__torch_function__.

@rijobro (Contributor) commented May 4, 2022

Here are some preliminary results using cProfile and snakeviz. Of the 16.6s spent adding, 12.2s was spent in torch._C._TensorBase.add. The majority of the remaining time was spent updating metadata, of which 2.9s was spent doing deepcopy. Reducing the deepcopy would certainly speed things up, but in the case of c = a + b, I feel that the metadata need to be copied, as c should be independent of both a and b.

[snakeviz screenshot: profile of c = a + b]
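
For reference, a similar text breakdown can be obtained without snakeviz using the standard-library pstats module (a sketch; the profile file name here is illustrative and is whatever was passed to cProfile.run):

import pstats

# load a dump previously produced by cProfile.run("c = a + b", filename="meta_add.prof")
stats = pstats.Stats("meta_add.prof")
# print the 10 entries with the largest cumulative time,
# e.g. torch._C._TensorBase.add and copy.deepcopy in the case above
stats.sort_stats("cumulative").print_stats(10)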

@wyli (Contributor, Author) commented May 4, 2022

Thanks, it looks interesting. Perhaps we can come back to the optimisation topic once MetaTensor is in good shape, what do you think?

(I'm still looking into this deepcopy issue https://github.com/Project-MONAI/MONAI/runs/6287828695?check_suite_focus=true)

@wyli (Contributor, Author) commented May 4, 2022

@rijobro could you share the cProfile commands? I can include them in the PR if it's simple to set up. I think this PR is useful for monitoring performance during our development.

@rijobro (Contributor) commented May 9, 2022

Sorry for the slow reply, @wyli. Here's the profiling code; it requires cProfile and something like snakeviz to view the output. It will create two files: one for adding images of shape (3, 10, 10, 10) and the other for adding images of shape (3, 200, 200, 200). In the former, updating the metadata dominates, whereas in the latter it contributes negligibly.

from monai.data.meta_tensor import MetaTensor
import torch
import cProfile

if __name__ == "__main__":
    n_chan = 3
    for hwd in (10, 200):
        shape = (n_chan, hwd, hwd, hwd)
        a = MetaTensor(torch.rand(shape), meta={"affine": torch.eye(4) * 2, "fname": "something1"})
        b = MetaTensor(torch.rand(shape), meta={"affine": torch.eye(4) * 3, "fname": "something2"})
        cProfile.run("c = a + b", filename=f"out_{hwd}.prof")
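
As a usage note: assuming snakeviz is installed (e.g. via pip install snakeviz), the resulting files can be opened interactively with snakeviz out_10.prof and snakeviz out_200.prof, or summarised in text with the pstats snippet shown earlier in this thread.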

[snakeviz screenshot: profile for adding tensors of shape (3, 10, 10, 10)]

[snakeviz screenshot: profile for adding tensors of shape (3, 200, 200, 200)]

@wyli force-pushed the feature/MetaTensor branch from c022ba2 to cd8baa4 on May 11, 2022 23:03
Signed-off-by: Wenqi Li <wenqil@nvidia.com>
@wyli force-pushed the adds-metatensor-profiling branch from 66a088e to 3a81148 on May 18, 2022 16:57
@wyli (Contributor, Author) commented Jun 14, 2022

Hi @rijobro, I've added the cProfile script here. Do you have any other concerns about this PR?

@rijobro (Contributor) commented Jun 14, 2022

@wyli, no, it looks good to me. Thanks!

@wyli merged commit 36dc126 into Project-MONAI:feature/MetaTensor on Jun 14, 2022
@wyli deleted the adds-metatensor-profiling branch on June 14, 2022 15:12