
[AOT] Calculate used memory at the callsite of primitive functions #11208

Merged · 5 commits into apache:main · Jun 25, 2022

Conversation

@lhutton1 (Contributor) commented on May 4, 2022

Introduces a new pass in the AOT executor called "AnnotateUsedMemory" which applies liveness analysis to the callsite of each primitive function in order to calculate the total size of the live tensors at this point of execution. The result is provided as a function annotation called "used_memory", which can be consumed by later stages of the compiler (e.g. external codegens) to provide more information about the current memory consumption. This can be useful for some optimizations.

Note: this PR is dependent on #11091 so also shows the contents of that PR.
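The liveness idea behind the pass can be pictured with a small self-contained sketch (a toy model for illustration only, not TVM's actual implementation; the function name, call sequence, and sizes below are all invented): a tensor is live from the callsite that defines it until its last use, and the "used_memory" at a callsite is the sum of the sizes of the tensors live at that point.

```python
# Toy sketch of liveness-based "used_memory" per callsite (illustrative only,
# not TVM's implementation). Each call consumes input tensors and defines one
# output tensor; a tensor is live from its definition to its last use.

def used_memory_at_callsites(calls, sizes):
    """calls: list of (output_name, [input_names]); sizes: name -> bytes.
    Returns the total bytes live at each callsite, in call order."""
    # Record the last callsite index at which each tensor appears.
    last_use = {}
    for i, (out, ins) in enumerate(calls):
        last_use[out] = i
        for name in ins:
            last_use[name] = i

    live = set()
    annotations = []
    for i, (out, ins) in enumerate(calls):
        live |= set(ins) | {out}                      # inputs and output are live here
        annotations.append(sum(sizes[t] for t in live))
        live = {t for t in live if last_use[t] > i}   # drop tensors dead after this call
    return annotations

calls = [
    ("a", ["x"]),        # a = f1(x)
    ("b", ["a"]),        # b = f2(a)
    ("c", ["a", "b"]),   # c = f3(a, b) -- "a" stays live until here
]
sizes = {"x": 100, "a": 50, "b": 50, "c": 10}
print(used_memory_at_callsites(calls, sizes))  # -> [150, 100, 110]
```

Note how "a" is still counted at the second callsite because it is used again later; that is exactly the kind of pressure information a later codegen stage could consume.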

cc @Mousius @NicolaLancellotti @ekalda @manupa-arm

@lhutton1 (Contributor, Author) commented on May 9, 2022

also cc @mbs-octoml @areusch

@lhutton1 (Contributor, Author) commented on the diff:

No functional changes here; simply moving manifest_lifetimes.cc up a directory (outside the scope of vm) and splitting it into .cc/.h.

@manupak (Contributor) left a comment

Thanks @lhutton1!

I took a first look; it looks broadly good. A few suggestions for more test cases and a question about using the stack.

Files discussed:
src/relay/backend/aot/annotate_used_memory.cc
tests/python/relay/aot/test_used_memory_annotator.py
@areusch (Contributor) left a comment

Hi @lhutton1 @manupa-arm, just had a couple of questions on this one.

Files discussed:
src/relay/backend/aot/annotate_used_memory.cc
src/relay/backend/aot_executor_codegen.cc
src/relay/backend/manifest_lifetimes.cc
@lhutton1 (Contributor, Author) left a comment

Thanks for the reviews @manupa-arm, @areusch, @altanh; hoping to have a revised version ready soon!

Files discussed:
src/relay/backend/aot/annotate_used_memory.cc
src/relay/backend/manifest_lifetimes.cc
@areusch (Contributor) commented on May 23, 2022

Thanks @lhutton1 for the replies! Ping us when this is ready for review again.

@lhutton1 (Contributor, Author)

Apologies for the delay, this is ready for another look!

@areusch (Contributor) commented on May 26, 2022

OK, thanks @lhutton1! I'll defer to @altanh on this one.

@altanh (Contributor) left a comment

Mostly LGTM, with some small questions/nits!

Files discussed:
src/relay/backend/annotate_used_memory.cc
@lhutton1 force-pushed the annotate-mem-usage branch 2 times, most recently from 297c62e to 4d95daa, on May 31, 2022 at 08:31
@altanh (Contributor) left a comment

Thanks for the changes! LGTM

Files discussed:
src/relay/backend/annotate_used_memory.cc
@manupak (Contributor) left a comment

LGTM!

@manupak (Contributor) commented on Jun 24, 2022

@lhutton1 since it has been 18 days, should we re-run a round of CI, just to be sure? :)

Commits:

* [AOT] Calculate used memory at the callsite of primitive functions (Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89)
* small fix to file description (Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2)
* Various improvements addressing comments (Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86)
* addressing comments (Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd)
* add note for dynamic shapes (Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0)
@manupak manupak merged commit 6d6e070 into apache:main Jun 25, 2022
@manupak (Contributor) commented on Jun 25, 2022

Thanks @lhutton1 @altanh @areusch! This is merged now.

@lhutton1 lhutton1 deleted the annotate-mem-usage branch June 25, 2022 18:30
blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
[AOT] Calculate used memory at the callsite of primitive functions (apache#11208)

* [AOT] Calculate used memory at the callsite of primitive functions

Introduces a new pass in the AOT executor called "AnnotateUsedMemory"
which applies liveness analysis to the callsite of each primitive
function in order to calculate the total size of the live tensors at
this point of execution. The result is provided as a function annotation
called "used_memory", which can be consumed by later stages of the
compiler (e.g. external codegens) to provide more information about the
current memory consumption. This can be useful for some optimizations.

Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89

* small fix to file description

Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2

* Various improvements addressing comments

In addition, a new "io_used_memory" annotation is added to the main
function which refers to the total size of the IO tensors in the
provided module, enabling these to be discounted from memory pressure
calculations where necessary.

Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86

* addressing comments

Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd

* add note for dynamic shapes

Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0
@zhaoyang-star (Contributor)

Hi @lhutton1, thanks for your contribution.

After running the FuseOps pass, I want to get the memory usage per op or per primitive function via the AnnotateUsedMemory pass, for future optimization. I get a resnet18 IR module and pass it as the input IRModule to the AnnotateUsedMemory pass. The output IRModule has no used_memory attr. Test code as follows:

import pytest
from collections import OrderedDict
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing


def AnnotateUsedMemory():
    return relay.transform._ffi_api.AnnotateUsedMemory()


def _get_data(in_data_shapes, dtype="float32"):
    in_data = OrderedDict()
    for name, shape in in_data_shapes.items():
        in_data[name] = np.random.uniform(size=shape).astype(dtype)
    return in_data


def _run_relay(mod, params, in_data, pass_enabled):
    target = "llvm"
    dev = tvm.device("llvm", 0)
    in_data = [tvm.nd.array(value) for value in in_data.values()]

    if pass_enabled:
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.ToANormalForm()(mod)
        mod = relay.transform.InferType()(mod)
        mod = AnnotateUsedMemory()(mod)
        # create primitive functions
        mod = relay.transform.FuseOps()(mod)

    print(f'\nmod when AnnotateUsedMemory is {pass_enabled}:\n {mod}')

    out_data = relay.create_executor(
        "graph", mod, device=dev, target=target).evaluate()(*in_data, **params)
    return out_data.numpy()


def _verify_results(mod, params, in_data, rtol=1e-5, atol=1e-5):
    before = _run_relay(mod, params, in_data, False)
    after = _run_relay(mod, params, in_data, True)
    np.testing.assert_allclose(before, after, rtol, atol)


def test_resnet():
    num_class = 1000
    in_data_shapes = OrderedDict({"data": (1, 3, 224, 224)})
    in_data = _get_data(in_data_shapes, dtype="float32")
    for n in [18]:  # 18, 34, 50, 101
        mod, params = tvm.relay.testing.resnet.get_workload(
            batch_size=1, num_classes=num_class, num_layers=n)
        _verify_results(mod, params, in_data)


if __name__ == "__main__":
    pytest.main([__file__])

I am not familiar with the AnnotateUsedMemory pass. Can the memory usage per op or per primitive function be obtained with your pass? If not, how can I get it based on your pass? Thanks in advance ^_^

@lhutton1 (Contributor, Author)

Hi @zhaoyang-star, thanks for taking a look, it's great to see this pass being used elsewhere. The pass currently expects the input to be a module of primitive functions, so I would suggest running AnnotateUsedMemory after FuseOps, similar to:

mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps()(mod)
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToANormalForm()(mod)
mod = relay.transform.InferType()(mod)
mod = AnnotateUsedMemory()(mod)

I did try running your example locally with the above change and this produced the relevant used_memory annotations. However, it looks like there is an issue while building the module after having run the AnnotateUsedMemory pass. Without digging too much into it, I would suspect it's because this pass wasn't considered for the graph executor, only for the AOT executor. I believe changes similar to #11091 would be needed in the graph executor to support A-normal form. Hope this helps :)
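Once the annotations are present, pulling them back out is just a matter of reading function attributes. Here is a rough sketch (the helper name is hypothetical, the stand-in objects and byte counts are invented, and in real TVM you would iterate the actual IRModule's functions and read each function's attrs rather than the SimpleNamespace mock used here so the snippet runs without TVM):

```python
# Sketch of collecting "used_memory" annotations after the pass has run.
# A SimpleNamespace stand-in mimics an IRModule so this runs without TVM;
# the function names and sizes are purely illustrative.
from types import SimpleNamespace

def collect_used_memory(mod):
    """Return {function_name: [used_memory values]} for annotated functions."""
    result = {}
    for name, func in mod.functions.items():
        attrs = func.attrs or {}
        if "used_memory" in attrs:
            # The annotation holds one size (in bytes) per callsite of the
            # primitive function.
            result[name] = list(attrs["used_memory"])
    return result

# Stand-in module: two annotated primitive functions plus main, which only
# carries the separate "io_used_memory" annotation.
mod = SimpleNamespace(functions={
    "tvmgen_default_fused_nn_conv2d": SimpleNamespace(attrs={"used_memory": [802816]}),
    "tvmgen_default_fused_nn_relu": SimpleNamespace(attrs={"used_memory": [401408]}),
    "main": SimpleNamespace(attrs={"io_used_memory": 1204224}),
})
print(collect_used_memory(mod))
```

Note that main is skipped: its io_used_memory attribute is the separate IO-tensor total, which the PR adds so IO can be discounted from memory-pressure calculations.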

@zhaoyang-star (Contributor) commented on Oct 17, 2022

> Hi @zhaoyang-star, thanks for taking a look, its great to see this pass being used elsewhere. The pass currently expects the input to be a module of primitive functions so I would suggest running AnnotateUsedMemory after FuseOps similar to:
>
> mod = relay.transform.InferType()(mod)
> mod = relay.transform.FuseOps()(mod)
> mod = relay.transform.InferType()(mod)
> mod = relay.transform.ToANormalForm()(mod)
> mod = relay.transform.InferType()(mod)
> mod = AnnotateUsedMemory()(mod)
>
> I did try running your example locally with the above change and this produced the relevant used_memory annotations. However, it looks like there is an issue while building the module after having run the AnnotateUsedMemory pass. Without digging too much into it I would suspect it's because this pass wasn't considered for the graph executor; only for the AOT executor. I believe changes similar to #11091 would be needed in the graph executor to support A-normal form. Hope this helps :)

@lhutton1, I want to confirm: did you reproduce the issue (no used_memory attr in the output log) using my script above? If it ran correctly for you, could you please share your script? I found only one io_used_memory attr and no used_memory attr after running my script.

If I place FuseOps before AnnotateUsedMemory, just as you showed, there is an error: Check failed: (tensor_type) is false:. You have mentioned that maybe we should support ANF in the graph executor to solve the error.

@lhutton1 (Contributor, Author)

Hi @zhaoyang-star, yes I was able to reproduce the issue with your script. The script I have would be the same as yours, just with a different pass order as mentioned above. Placing FuseOps before AnnotateUsedMemory seems like the correct thing to do here; if you print out the module (mod) after the AnnotateUsedMemory pass you should be able to see the used_memory annotations. The Check failed: (tensor_type) is false: error comes later in the compilation, so it seems as though some later optimization passes cannot deal with ANF yet.

mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023