
Conversation

@wyli (Contributor) commented May 4, 2022

Signed-off-by: Wenqi Li wenqil@nvidia.com

Description

Basic profiling of __torch_function__ in MetaTensor; the benchmark needs to be run manually.

The code is based on https://github.com/pytorch/pytorch/tree/v1.11.0/benchmarks/overrides_benchmark.
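
For context, a minimal sketch of the timing approach in that PyTorch benchmark (an illustrative reimplementation, not the upstream script; the subclass names mirror the ones reported in the results below):

import timeit

import torch

NUM_REPEATS = 1000
NUM_REPEAT_OF_REPEATS = 30


class SubTensor(torch.Tensor):
    """Plain subclass without a __torch_function__ override."""


class SubWithTorchFunc(torch.Tensor):
    """Subclass with a trivial __torch_function__ override."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        return super().__torch_function__(func, types, args, kwargs or {})


def bench(cls):
    a, b = cls([1.0]), cls([2.0])
    # total seconds for NUM_REPEATS additions, repeated NUM_REPEAT_OF_REPEATS times
    times = timeit.repeat(lambda: a + b, number=NUM_REPEATS, repeat=NUM_REPEAT_OF_REPEATS)
    times_us = torch.tensor(times) / NUM_REPEATS * 1e6  # microseconds per call
    print(f"Type {cls.__name__}: min {times_us.min().item():.3f} us, std {times_us.std().item():.3f} us")


if __name__ == "__main__":
    for klass in (torch.Tensor, SubTensor, SubWithTorchFunc):
        bench(klass)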

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@wyli requested review from Nic-Ma, ericspod and rijobro on May 4, 2022 11:25
@wyli force-pushed the adds-metatensor-profiling branch from ebb9db6 to 45a0c9a on May 4, 2022 11:33
@wyli (Contributor, Author) commented May 4, 2022

The current results on macOS are:

Type Tensor had a minimum time of 2.0996250677853823 us and a standard deviation of 0.5214191623963416 us.
Type SubTensor had a minimum time of 9.690787643194199 us and a standard deviation of 1.265820348635316 us.
Type SubWithTorchFunc had a minimum time of 12.085512280464172 us and a standard deviation of 0.9849673369899392 us.
Type MetaTensor had a minimum time of 177.9918670654297 us and a standard deviation of 27.68106199800968 us.

I think we should improve the __torch_function__ override.

@rijobro (Contributor) commented May 4, 2022

Thanks for this. It might be worth noting that 177 µs is still pretty quick, and presumably the difference between the classes will decrease for more non-trivial tasks (1+2 is about as simple as it can get). I'll check that now.

@rijobro (Contributor) commented May 4, 2022

Locally (also macOS), I got this for 1 + 2:

Type Tensor had a minimum time of 1.4180639991536736 us and a standard deviation of 0.7132108439691365 us.
Type SubTensor had a minimum time of 6.272729020565748 us and a standard deviation of 2.3775098379701376 us.
Type SubWithTorchFunc had a minimum time of 6.717714015394449 us and a standard deviation of 0.6704257102683187 us.
Type MetaTensor had a minimum time of 48.26740548014641 us and a standard deviation of 13.995005749166012 us.

And this for torch.rand((1, 200, 200, 200), dtype=torch.float32) + torch.rand((1, 200, 200, 200), dtype=torch.float32):

Type Tensor had a minimum time of 5421.039462089539 us and a standard deviation of 536.7174744606018 us.
Type SubTensor had a minimum time of 5442.803502082825 us and a standard deviation of 284.26727280020714 us.
Type SubWithTorchFunc had a minimum time of 5333.672761917114 us and a standard deviation of 490.83579331636435 us.
Type MetaTensor had a minimum time of 5656.501054763794 us and a standard deviation of 693.1112706661224 us.

I agree though that we should try to optimise where possible!

P.S. I think displaying the mean might be more useful than the minimum:

bench_time = float(torch.sum(torch.Tensor(bench_times))) / (NUM_REPEATS * NUM_REPEAT_OF_REPEATS)

instead of:

bench_time = float(torch.min(torch.Tensor(bench_times))) / NUM_REPEATS
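
For comparison, a small sketch (with placeholder timings; NUM_REPEATS and NUM_REPEAT_OF_REPEATS as in the benchmark script) showing the three candidate summary statistics side by side:

import torch

NUM_REPEATS, NUM_REPEAT_OF_REPEATS = 1000, 30
# each entry is the total elapsed time (in seconds) for one batch of NUM_REPEATS calls
bench_times = (torch.rand(NUM_REPEAT_OF_REPEATS) * 1e-3).tolist()

times = torch.Tensor(bench_times)
min_time = float(torch.min(times)) / NUM_REPEATS  # current metric: best case, hides outliers
mean_time = float(torch.sum(times)) / (NUM_REPEATS * NUM_REPEAT_OF_REPEATS)  # proposed mean
median_time = float(torch.median(times)) / NUM_REPEATS  # robust middle ground
print(f"min {min_time:.3e} s, mean {mean_time:.3e} s, median {median_time:.3e} s")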

@rijobro (Contributor) commented May 4, 2022

Using the mean makes things a little bit less clear-cut:

Type Tensor had a mean time of 5256.852149963379 us and a standard deviation of 383.22440814226866 us.
Type SubTensor had a mean time of 5128.483772277832 us and a standard deviation of 132.8055397607386 us.
Type SubWithTorchFunc had a mean time of 5159.413814544678 us and a standard deviation of 241.35354906320572 us.
Type MetaTensor had a mean time of 5491.213798522949 us and a standard deviation of 324.95150808244944 us.

@wyli (Contributor, Author) commented May 4, 2022

Sure, this is mainly to show the overhead of creating/copying the meta info. Good point on the metric; I'll try median values as well.
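
To illustrate where that overhead comes from, here is a schematic toy subclass (not MONAI's actual MetaTensor implementation; the class name and the copying strategy are simplified assumptions) whose __torch_function__ deep-copies a meta dict on every call:

from copy import deepcopy

import torch


class ToyMetaTensor(torch.Tensor):
    """Toy subclass: carries a meta dict and copies it for every torch op result."""

    @staticmethod
    def __new__(cls, x, meta=None, *args, **kwargs):
        return torch.as_tensor(x, *args, **kwargs).as_subclass(cls)

    def __init__(self, x, meta=None):
        self.meta = meta or {}

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        ret = super().__torch_function__(func, types, args, kwargs or {})
        if isinstance(ret, cls):
            # deep-copying metadata from the first ToyMetaTensor argument keeps the
            # result independent of its inputs; this is the per-call extra cost
            src = next((a for a in args if isinstance(a, cls)), None)
            ret.meta = deepcopy(getattr(src, "meta", {})) if src is not None else {}
        return ret


a = ToyMetaTensor(torch.rand(3), meta={"affine": torch.eye(4)})
c = a + a  # dispatches through __torch_function__; c.meta is an independent copy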

@rijobro (Contributor) commented May 4, 2022

Yeah, maybe we can use something like cProfile to profile MetaTensor.__torch_function__.

@rijobro (Contributor) commented May 4, 2022

Here are some preliminary results using cProfile and snakeviz. Of the 16.6s spent adding, 12.2s was spent in torch._C._TensorBase.add. The majority of the remaining time was spent updating metadata, of which 2.9s was spent doing deepcopy. Reducing the deepcopy would certainly speed things up, but in the case of c = a + b, I feel that the metadata need to be copied, as c should be independent of both a and b.

[snakeviz screenshot: profile of c = a + b]
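
For reference, a similar text breakdown can be obtained without snakeviz using the standard-library pstats module (a sketch; the profile file name here is illustrative and is whatever was passed to cProfile.run):

import pstats

# load a dump previously produced by cProfile.run("c = a + b", filename="meta_add.prof")
stats = pstats.Stats("meta_add.prof")
# print the 10 entries with the largest cumulative time,
# e.g. torch._C._TensorBase.add and copy.deepcopy in the case above
stats.sort_stats("cumulative").print_stats(10)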

@wyli (Contributor, Author) commented May 4, 2022

Thanks, it looks interesting. Perhaps we can come back to the optimisation topic once MetaTensor is in good shape, what do you think?

(I'm still looking into this deepcopy issue https://github.com/Project-MONAI/MONAI/runs/6287828695?check_suite_focus=true)

@wyli (Contributor, Author) commented May 4, 2022

@rijobro could you share the cProfile commands? I can include them in the PR if it's simple to set up. I think this PR is useful for monitoring performance during our development.

@rijobro (Contributor) commented May 9, 2022

Sorry for the slow reply, @wyli. Here's the profiling code; it requires cProfile and something like snakeviz to view the output. It will create two files: one for adding images of shape (3, 10, 10, 10) and the other for adding images of shape (3, 200, 200, 200). In the former, updating the metadata dominates, whereas in the latter it contributes negligibly.

from monai.data.meta_tensor import MetaTensor
import torch
import cProfile

if __name__ == "__main__":
    n_chan = 3
    for hwd in (10, 200):
        shape = (n_chan, hwd, hwd, hwd)
        a = MetaTensor(torch.rand(shape), meta={"affine": torch.eye(4) * 2, "fname": "something1"})
        b = MetaTensor(torch.rand(shape), meta={"affine": torch.eye(4) * 3, "fname": "something2"})
        cProfile.run("c = a + b", filename=f"out_{hwd}.prof")
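
As a usage note: assuming snakeviz is installed (e.g. via pip install snakeviz), the resulting files can be opened interactively with snakeviz out_10.prof and snakeviz out_200.prof, or summarised in text with the pstats snippet shown earlier in this thread.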

[snakeviz screenshot: profile for adding tensors of shape (3, 10, 10, 10)]

[snakeviz screenshot: profile for adding tensors of shape (3, 200, 200, 200)]

@wyli force-pushed the feature/MetaTensor branch from c022ba2 to cd8baa4 on May 11, 2022 23:03
Signed-off-by: Wenqi Li <wenqil@nvidia.com>
@wyli force-pushed the adds-metatensor-profiling branch from 66a088e to 3a81148 on May 18, 2022 16:57
@wyli (Contributor, Author) commented Jun 14, 2022

Hi @rijobro, I've added the cProfile script here. Do you have any other concerns about this PR?

@rijobro (Contributor) commented Jun 14, 2022

@wyli, no, it looks good to me. Thanks!

@wyli merged commit 36dc126 into Project-MONAI:feature/MetaTensor on Jun 14, 2022
@wyli deleted the adds-metatensor-profiling branch on June 14, 2022 15:12