
[CodeGenC][Redo] Handle GlobalVar callee as internal function call #15835

Merged

Conversation

Lunderberg
Contributor

This reverts commit [`e88d0d`](apache#15725), which itself reverted [`9ff71f`](apache#15103) for breakages on the metal backend. Now that the CI contains compile-time testing of the metal codegen, the original breakage should be identifiable.
@Lunderberg
Contributor Author

This PR is currently listed as a draft, to ensure that the CI can catch the failure mode that was reported here.

@Lunderberg
Contributor Author

Lunderberg commented Sep 29, 2023

And the fix is now implemented and included in this PR. It ended up being a <10 line fix once the CI was able to catch the problem, but getting the CI to that point was the harder part.

@MasterJH5574 Can you verify that this PR doesn't re-introduce the issue you reported here? The CI tests should be sufficient, but it would be good to confirm it as well.

@Lunderberg Lunderberg changed the title [Draft][CodeGenC][Redo] Handle GlobalVar callee as internal function call [CodeGenC][Redo] Handle GlobalVar callee as internal function call Oct 2, 2023
@Lunderberg Lunderberg marked this pull request as ready for review October 2, 2023 13:39
@Lunderberg
Contributor Author

@MasterJH5574 Have you had a chance to verify that this PR does not cause a regression on OSX? I'd like to avoid leaving it idle for too long, as that allows conflicts with unrelated changes to creep in.

Contributor

@csullivan csullivan left a comment


With #15756 the original cause for reversion (#15725) has been resolved and is now tested on main.

@csullivan csullivan merged commit 698531e into apache:main Oct 18, 2023
19 of 20 checks passed
@Lunderberg Lunderberg deleted the codegen_c_globalvar_as_function_call branch October 18, 2023 19:07
@MasterJH5574
Contributor

Hi Eric, I apologize for missing this for so long. Yes, it now works perfectly for Metal. However, we just noticed that the changes in this PR break the “iPhone” target (https://github.com/mlc-ai/mlc-llm/blob/200653a82d025be7d58d0d7f04442f85aee52c98/mlc_llm/utils.py#L542-L561) in MLC LLM.

I suppose the issue only happens when building end-to-end models: it does not seem reproducible when building a single TIR function on my side, so I have not yet produced a minimal reproduction.

Meanwhile, the end-to-end build command in MLC LLM can reproduce the issue:

python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target iphone --max-seq-len 768 --quantization q4f16_1 --build-model-only

Here is the error message:

Error message
> python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target iphone --max-seq-len 768 --quantization q4f16_1 --build-model-only

Using path "dist/models/Llama-2-7b-chat-hf" for model "Llama-2-7b-chat-hf"
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
Save a cached module to dist/Llama-2-7b-chat-hf-q4f16_1/mod_cache_before_build.pkl.
Compilation error:
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:142:8: error: redefinition of 'Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t'
struct Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t {
       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:8:8: note: previous definition is here
struct Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t {
       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:146:78: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel(  device half* A [[ buffer(0) ]],
                                                                             ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:147:31: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  device half* T_transpose [[ buffer(1) ]],
                              ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:148:72: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  constant Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t& arg [[ buffer(2) ]],
                                                                       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:149:19: warning: 'threadgroup_position_in_grid' attribute ignored on function declaration [-Wignored-attributes]
  uint blockIdx [[threadgroup_position_in_grid]],
                  ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:150:20: warning: 'thread_position_in_threadgroup' attribute ignored on function declaration [-Wignored-attributes]
  uint threadIdx [[thread_position_in_threadgroup]]
                   ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:152:77: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void Llama_2_7b_chat_hf_q4f16_1_extend_te_kernel(  device half* A [[ buffer(0) ]],
                                                                            ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:153:29: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  device half* concat_te [[ buffer(1) ]],
                            ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:154:71: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  constant Llama_2_7b_chat_hf_q4f16_1_extend_te_kernel_args_t& arg [[ buffer(2) ]],
                                                                      ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:155:19: warning: 'threadgroup_position_in_grid' attribute ignored on function declaration [-Wignored-attributes]
  uint blockIdx [[threadgroup_position_in_grid]],
                  ^

(... many similar warnings are omitted)

245 warnings and 1 error generated.
LLVM ERROR: Error opening '/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.air': No such file or directory!
Traceback (most recent call last):
  File "/Users/ruihang-macstudio/Workspace/miniforge3/envs/python310/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/ruihang-macstudio/Workspace/miniforge3/envs/python310/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/build.py", line 47, in <module>
    main()
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/core.py", line 742, in build_model_from_args
    build(mod, args)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/core.py", line 629, in build
    ex = relax.build(mod_deploy, args.target, system_lib=args.system_lib)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/relax/vm_build.py", line 343, in build
    return _vmlink(builder, target, tir_mod, ext_libs, params, system_lib=system_lib)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/relax/vm_build.py", line 242, in _vmlink
    lib = tvm.build(
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/driver/build_module.py", line 281, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc", line 527, in tvm::$_5::operator()(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target) const
    return TIRToRuntime(inputs_arg, host_target);
  File "/Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc", line 510, in tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
    device_modules.push_back(codegen::Build(device_mod, it.first));
  File "/Users/ruihang-macstudio/Workspace/tvm/src/target/codegen.cc", line 73, in tvm::codegen::Build(tvm::IRModule, tvm::Target)
    return (*bf)(mod, target);
  File "/Users/ruihang-macstudio/Workspace/tvm/src/target/source/codegen_metal.cc", line 360, in tvm::codegen::BuildMetal(tvm::IRModule, tvm::Target)
    fsource = (*fmetal_compile)(fsource, target).operator std::string();
tvm.error.InternalError: Traceback (most recent call last):
  3: tvm::$_5::operator()(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target) const
        at /Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc:527
  2: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc:510
  1: tvm::codegen::Build(tvm::IRModule, tvm::Target)
        at /Users/ruihang-macstudio/Workspace/tvm/src/target/codegen.cc:73
  0: tvm::codegen::BuildMetal(tvm::IRModule, tvm::Target)
        at /Users/ruihang-macstudio/Workspace/tvm/src/target/source/codegen_metal.cc:360
  File "/Users/ruihang-macstudio/Workspace/tvm/include/tvm/runtime/packed_func.h", line 836
InternalError: Check failed: type_code_ == kTVMStr (4 vs. 11) : expected str but got NULL

I reverted the PR locally (https://github.com/mlc-ai/relax/commits) and am not going to revert it here, so hopefully the revert will not disturb the unity branch too much. I would appreciate it if you could take a look at this iPhone build issue to see whether it can be fixed quickly. Thank you so much.

@tqchen
Member

tqchen commented Nov 1, 2023

I think one issue revealed by the latest set of regressions is that we are trying to build a generic set of functionality that is not necessarily used in some of the subclass settings, and such generalization increases the complexity of the overall code during concurrent development. We observed similar problems in the WebGPU backend.

In this case, perhaps decoupling the codegen logic, without over-coupling the common parts, would help. Specifically, for the metal codegen:

  • Do not reuse PrintFunctionSignature in the metal codegen; metal codegen does not need to handle GlobalVar callees anyway, and the indirection (through function signature printing) adds extra complexity to maintaining the code
  • Directly reuse the original metal codegen logic

@tqchen
Member

tqchen commented Nov 1, 2023

Looking at the error.

This particular issue is due to the fact that the metal codegen constructs a struct to pass all of the arguments. Right now the common codegen first declares and then defines each function, resulting in the same struct definition appearing twice.

WebGPU likely had a similar issue, so we simply rolled back to the original implementation in the unity branch. We do not need to handle cross-function calls in GPU shaders, so one simple approach would be to skip the declaration step.
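The duplicate-struct mechanism described above can be sketched with a toy emitter (all names here are hypothetical stand-ins, not TVM's actual codegen API): because Metal packs scalar arguments into a per-kernel struct that is printed alongside the function signature, a declare-then-define flow prints that struct twice.

```python
# Toy sketch of the failure mode described above; names are illustrative,
# not TVM's actual codegen classes.

def emit_signature(name):
    # Metal-style codegen: the per-kernel argument struct is emitted
    # together with the function signature.
    return (f"struct {name}_args_t {{ int n; }};\n"
            f"kernel void {name}(constant {name}_args_t& arg)")

def declare_then_define(kernels):
    src = []
    for k in kernels:                 # declaration pass prints each struct once...
        src.append(emit_signature(k) + ";\n")
    for k in kernels:                 # ...and the definition pass prints it again
        src.append(emit_signature(k) + " { /* body */ }\n")
    return "".join(src)

source = declare_then_define(["transpose6_kernel"])
# The args struct is emitted twice -- the same shape as the
# "redefinition of '..._args_t'" error the Metal compiler reports above.
assert source.count("struct transpose6_kernel_args_t") == 2
```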

@tqchen
Member

tqchen commented Nov 1, 2023

My read is that for kernel generators (where we know there won't be cross-function calls, or we know we do not yet support kernel-to-kernel calls), we can safely remove

https://github.com/Lunderberg/tvm/blob/e8ef2f25f45f44aabfa1652027a05b3aae4c67f8/src/target/source/codegen_metal.cc#L352C2-L354C6

and add a comment that we skip the declaration because cross-kernel calls are not supported
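A minimal sketch of this approach, assuming a hypothetical codegen class hierarchy (not TVM's actual C++ API): the shader subclass overrides the declaration step as a no-op, so each argument struct is emitted exactly once.

```python
# Hedged sketch of the proposed fix; class and method names are
# illustrative only, not TVM's actual CodeGenC/CodeGenMetal API.

class ToyCodeGenC:
    """Generic C-family codegen: declare every function, then define it."""
    def __init__(self):
        self.src = []

    def emit_signature(self, name):
        # The per-kernel args struct is printed with the signature.
        self.src.append(f"struct {name}_args_t {{ int n; }};\n"
                        f"kernel void {name}(constant {name}_args_t& arg)")

    def declare_function(self, name):
        self.emit_signature(name)
        self.src.append(";\n")

    def define_function(self, name):
        self.emit_signature(name)
        self.src.append(" { /* body */ }\n")

class ToyCodeGenMetal(ToyCodeGenC):
    def declare_function(self, name):
        # Skip declarations: cross-kernel calls are not supported in
        # shaders, so forward declarations are never needed.
        pass

def build(codegen, kernels):
    for k in kernels:
        codegen.declare_function(k)
    for k in kernels:
        codegen.define_function(k)
    return "".join(codegen.src)

# Generic flow emits the struct twice; the shader flow emits it once.
assert build(ToyCodeGenC(), ["k0"]).count("struct k0_args_t") == 2
assert build(ToyCodeGenMetal(), ["k0"]).count("struct k0_args_t") == 1
```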

@tqchen
Member

tqchen commented Nov 1, 2023

Actually, it turns out to be more complicated, although the above analysis is indeed right. Given that shader generators won't need inter-function calls, I think we can keep the original logic, which ensures that the signature and auxiliary data structures are printed only once.

MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
This PR restores the Metal codegen to the one before apache#15835.
Since there will likely be no internal function calls in Metal,
we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen
will not fail and will work properly.

The iPhone codegen failure occurs because multiple declarations
of the same function lead to multiple emissions of the same struct,
which the metal compiler rejects.
@MasterJH5574
Contributor

Filed PR #16033, which restores the Metal codegen logic. I verified that it works correctly for both iPhone and Metal codegen.

MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
tqchen pushed a commit that referenced this pull request Nov 3, 2023
* [Codegen][Metal] Disable cross-function call in Metal codegen

This PR restores the Metal codegen to the one before #15835.
Since there will likely be no internal function calls in Metal,
we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen
will not fail and will work properly.

The iPhone codegen failure occurs because multiple declarations
of the same function lead to multiple emissions of the same struct,
which the metal compiler rejects.

* Fix the action script
jinhongyii pushed a commit to jinhongyii/tvm that referenced this pull request Nov 3, 2023
… call (apache#15835)"

This reverts commit 698531e.

The original PR fixes the metal codegen issue, but building for iPhone
in MLC LLM is still broken. Revert the commit for now.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
The functionality to express a call from one `PrimFunc` to another was
introduced in apache#14889.  While this
was initially planned to be supported at codegen for all targets (see
apache#15835), this resulted in breakage
on some backends (see apache#16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining is
skipped.  This commit includes unit tests for these unsupported
constructs, to validate that `InlinePrivateFunctions` produces
well-formed output even when they are present.
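The `tir::Evaluate` restriction above can be illustrated with a toy statement-level inliner over hypothetical IR node classes (not TVM's actual `tir` API): only a call that appears as a bare statement can be replaced by the subroutine's statement body, with primitive arguments substituted; anything else is left untouched.

```python
# Toy IR sketch of statement-level inlining under the restrictions above.
# Node classes and names are illustrative stand-ins, not TVM's tir API.
from dataclasses import dataclass

@dataclass
class Call:            # call expression: subroutine name + primitive args
    name: str
    args: list

@dataclass
class Evaluate:        # statement wrapping a bare expression
    expr: object

@dataclass
class SeqStmt:         # sequence of statements
    stmts: list

def inline_private(body, subroutines):
    """Replace Evaluate(Call(f, args)) with f's body, args substituted."""
    new_stmts = []
    for stmt in body.stmts:
        if isinstance(stmt, Evaluate) and isinstance(stmt.expr, Call) \
                and stmt.expr.name in subroutines:
            params, sub_body = subroutines[stmt.expr.name]
            subst = dict(zip(params, stmt.expr.args))
            # Substitute the primitive arguments into the inlined body.
            new_stmts.append(("inlined", sub_body.format(**subst)))
        else:
            # Unsupported callsite shapes are left as-is (inlining skipped).
            new_stmts.append(stmt)
    return SeqStmt(new_stmts)

subs = {"memset_zero": (["i"], "B[{i}] = 0")}
mod = SeqStmt([Evaluate(Call("memset_zero", [4]))])
print(inline_private(mod, subs).stmts[0])   # ('inlined', 'B[4] = 0')
```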
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Dec 19, 2023
Lunderberg added a commit that referenced this pull request Jan 3, 2024
* [TIR] Update DeclBuffer nodes when specializing PrimFunc

Prior to this commit, a buffer whose parameters (e.g. shape/stride)
contained a specialized parameter would not be updated when appearing
in a `DeclBuffer` node.  This commit updates the `Specialize` function
to update buffers that occur in `DeclBuffer` nodes.

* [TIR] Handle specialization that remaps a buffer var

* [TIR] Handle specialization of buffer variable to PrimExpr

* [TIR][Transform] Implement InlinePrivateFunctions

The functionality to express a call from one `PrimFunc` to another was
introduced in #14889.  While this
was initially planned to be supported at codegen for all targets (see
#15835), this resulted in breakage
on some backends (see #16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining of those
functions is skipped.  This commit includes unit tests for these
unsupported constructs, to validate that `InlinePrivateFunctions`
produces well-formed output even when they are present.

* Updates based on review comments

* ci bump

* CI bump