
[CodeGenC][Redo] Handle GlobalVar callee as internal function call #15835

Merged

Conversation

Lunderberg
Contributor

This reverts commit [`e88d0d`](apache#15725), which itself reverted [`9ff71f`](apache#15103) for breakages on the metal backend. Now that the CI contains compile-time testing of the metal codegen, the original breakage should be identifiable.
@Lunderberg
Contributor Author

This PR is currently listed as a draft, to ensure that the CI can catch the failure mode that was reported here.

@Lunderberg
Contributor Author

Lunderberg commented Sep 29, 2023

And the fix is now implemented and included in this PR. It ended up being a <10 line fix once the CI was able to catch the problem, but getting the CI to that point was the harder part.

@MasterJH5574 Can you verify that this PR doesn't re-introduce the issue you reported here? The CI tests should be sufficient, but it would be good to confirm it as well.

@Lunderberg Lunderberg changed the title [Draft][CodeGenC][Redo] Handle GlobalVar callee as internal function call [CodeGenC][Redo] Handle GlobalVar callee as internal function call Oct 2, 2023
@Lunderberg Lunderberg marked this pull request as ready for review October 2, 2023 13:39
@Lunderberg
Contributor Author

@MasterJH5574 Have you had a chance to verify that this PR does not cause a regression on OSX? I'd like to avoid leaving it idle for too long, as that allows conflicts with unrelated changes to creep in.

Contributor

@csullivan csullivan left a comment


With #15756 the original cause for reversion (#15725) has been resolved and is now tested on main.

@csullivan csullivan merged commit 698531e into apache:main Oct 18, 2023
19 of 20 checks passed
@Lunderberg Lunderberg deleted the codegen_c_globalvar_as_function_call branch October 18, 2023 19:07
@MasterJH5574
Contributor

Hi Eric, I apologize for missing this for so long. Yes, it now works perfectly for Metal. However, we just noticed that the changes in this PR break the “iPhone” target (https://github.com/mlc-ai/mlc-llm/blob/200653a82d025be7d58d0d7f04442f85aee52c98/mlc_llm/utils.py#L542-L561) in MLC LLM.

I suppose the issue only happens when building end-to-end models: it does not seem reproducible when building a single TIR function on my side, so I have not yet produced a minimal reproduction.

Meanwhile, the end-to-end build command in MLC LLM can reproduce the issue:

python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target iphone --max-seq-len 768 --quantization q4f16_1 --build-model-only

Here is the error message:

Error message
> python3 -m mlc_llm.build --model Llama-2-7b-chat-hf --target iphone --max-seq-len 768 --quantization q4f16_1 --build-model-only

Using path "dist/models/Llama-2-7b-chat-hf" for model "Llama-2-7b-chat-hf"
Target configured: metal -keys=metal,gpu -libs=iphoneos -max_function_args=31 -max_num_threads=256 -max_shared_memory_per_block=32768 -max_threads_per_block=256 -thread_warp_size=1
Save a cached module to dist/Llama-2-7b-chat-hf-q4f16_1/mod_cache_before_build.pkl.
Compilation error:
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:142:8: error: redefinition of 'Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t'
struct Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t {
       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:8:8: note: previous definition is here
struct Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t {
       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:146:78: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel(  device half* A [[ buffer(0) ]],
                                                                             ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:147:31: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  device half* T_transpose [[ buffer(1) ]],
                              ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:148:72: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  constant Llama_2_7b_chat_hf_q4f16_1_transpose6_kernel_args_t& arg [[ buffer(2) ]],
                                                                       ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:149:19: warning: 'threadgroup_position_in_grid' attribute ignored on function declaration [-Wignored-attributes]
  uint blockIdx [[threadgroup_position_in_grid]],
                  ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:150:20: warning: 'thread_position_in_threadgroup' attribute ignored on function declaration [-Wignored-attributes]
  uint threadIdx [[thread_position_in_threadgroup]]
                   ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:152:77: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
kernel void Llama_2_7b_chat_hf_q4f16_1_extend_te_kernel(  device half* A [[ buffer(0) ]],
                                                                            ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:153:29: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  device half* concat_te [[ buffer(1) ]],
                            ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:154:71: warning: 'buffer' attribute ignored on function declaration [-Wignored-attributes]
  constant Llama_2_7b_chat_hf_q4f16_1_extend_te_kernel_args_t& arg [[ buffer(2) ]],
                                                                      ^
/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.metal:155:19: warning: 'threadgroup_position_in_grid' attribute ignored on function declaration [-Wignored-attributes]
  uint blockIdx [[threadgroup_position_in_grid]],
                  ^

(... many similar warnings are omitted)

245 warnings and 1 error generated.
LLVM ERROR: Error opening '/var/folders/0k/ckmv_5b93fl26rqn3fx8v07w0000gp/T/tmpesw5bpqd/my_lib.air': No such file or directory!
Traceback (most recent call last):
  File "/Users/ruihang-macstudio/Workspace/miniforge3/envs/python310/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/ruihang-macstudio/Workspace/miniforge3/envs/python310/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/build.py", line 47, in <module>
    main()
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/core.py", line 742, in build_model_from_args
    build(mod, args)
  File "/Users/ruihang-macstudio/Workspace/mlc-llm/mlc_llm/core.py", line 629, in build
    ex = relax.build(mod_deploy, args.target, system_lib=args.system_lib)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/relax/vm_build.py", line 343, in build
    return _vmlink(builder, target, tir_mod, ext_libs, params, system_lib=system_lib)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/relax/vm_build.py", line 242, in _vmlink
    lib = tvm.build(
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/driver/build_module.py", line 281, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc", line 527, in tvm::$_5::operator()(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target) const
    return TIRToRuntime(inputs_arg, host_target);
  File "/Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc", line 510, in tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
    device_modules.push_back(codegen::Build(device_mod, it.first));
  File "/Users/ruihang-macstudio/Workspace/tvm/src/target/codegen.cc", line 73, in tvm::codegen::Build(tvm::IRModule, tvm::Target)
    return (*bf)(mod, target);
  File "/Users/ruihang-macstudio/Workspace/tvm/src/target/source/codegen_metal.cc", line 360, in tvm::codegen::BuildMetal(tvm::IRModule, tvm::Target)
    fsource = (*fmetal_compile)(fsource, target).operator std::string();
tvm.error.InternalError: Traceback (most recent call last):
  3: tvm::$_5::operator()(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target) const
        at /Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc:527
  2: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/driver/driver_api.cc:510
  1: tvm::codegen::Build(tvm::IRModule, tvm::Target)
        at /Users/ruihang-macstudio/Workspace/tvm/src/target/codegen.cc:73
  0: tvm::codegen::BuildMetal(tvm::IRModule, tvm::Target)
        at /Users/ruihang-macstudio/Workspace/tvm/src/target/source/codegen_metal.cc:360
  File "/Users/ruihang-macstudio/Workspace/tvm/include/tvm/runtime/packed_func.h", line 836
InternalError: Check failed: type_code_ == kTVMStr (4 vs. 11) : expected str but got NULL

I reverted the PR locally (https://github.com/mlc-ai/relax/commits) and am not going to revert it here, so hopefully the revert will not disturb the unity branch too much. I would appreciate it if you could take a look at this iPhone build issue to see whether it can be fixed quickly. Thank you so much.

@tqchen
Member

tqchen commented Nov 1, 2023

I think one issue revealed by the latest set of regressions is that we are trying to build a generic set of functionality that is not necessarily used in some of the subclass settings, and such generalization increases the complexity of the overall code during concurrent development. We observed similar problems in the WebGPU backend.

In this case, perhaps decoupling the codegen logic, without over-coupling the common parts, would help. Specifically, for the metal codegen:

  • Do not reuse PrintFunctionSignature in the metal codegen; metal codegen does not need to handle GlobalVar callees anyway, and the indirection (through function signature printing) adds extra complexity to maintaining the code
  • Directly reuse the original metal codegen logic

@tqchen
Member

tqchen commented Nov 1, 2023

Looking at the error.

This particular issue is due to the fact that the metal codegen constructs a struct to pass all of the arguments. Right now the common codegen first declares and then defines each function, resulting in the same struct definition appearing twice.

WebGPU likely had a similar issue, so we simply rolled back to the original implementation in the unity branch. We do not need to handle cross-function calls in GPU shaders, so one simple approach would be to skip the declaration step.
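The duplicate-struct mechanism described above can be sketched with a toy emitter (all names here are hypothetical stand-ins, not TVM's actual codegen API): because Metal packs scalar arguments into a per-kernel struct that is printed alongside the function signature, a declare-then-define flow prints that struct twice.

```python
# Toy sketch of the failure mode described above; names are illustrative,
# not TVM's actual codegen classes.

def emit_signature(name):
    # Metal-style codegen: the per-kernel argument struct is emitted
    # together with the function signature.
    return (f"struct {name}_args_t {{ int n; }};\n"
            f"kernel void {name}(constant {name}_args_t& arg)")

def declare_then_define(kernels):
    src = []
    for k in kernels:                 # declaration pass prints each struct once...
        src.append(emit_signature(k) + ";\n")
    for k in kernels:                 # ...and the definition pass prints it again
        src.append(emit_signature(k) + " { /* body */ }\n")
    return "".join(src)

source = declare_then_define(["transpose6_kernel"])
# The args struct is emitted twice -- the same shape as the
# "redefinition of '..._args_t'" error the Metal compiler reports above.
assert source.count("struct transpose6_kernel_args_t") == 2
```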

@tqchen
Member

tqchen commented Nov 1, 2023

My read is that for kernel generators (where we know there won't be cross-function calls, or we know we do not yet support kernel-to-kernel calls), we can safely remove

https://github.com/Lunderberg/tvm/blob/e8ef2f25f45f44aabfa1652027a05b3aae4c67f8/src/target/source/codegen_metal.cc#L352C2-L354C6

and add a comment that we skip the declaration because cross-kernel calls are not supported
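A minimal sketch of this approach, assuming a hypothetical codegen class hierarchy (not TVM's actual C++ API): the shader subclass overrides the declaration step as a no-op, so each argument struct is emitted exactly once.

```python
# Hedged sketch of the proposed fix; class and method names are
# illustrative only, not TVM's actual CodeGenC/CodeGenMetal API.

class ToyCodeGenC:
    """Generic C-family codegen: declare every function, then define it."""
    def __init__(self):
        self.src = []

    def emit_signature(self, name):
        # The per-kernel args struct is printed with the signature.
        self.src.append(f"struct {name}_args_t {{ int n; }};\n"
                        f"kernel void {name}(constant {name}_args_t& arg)")

    def declare_function(self, name):
        self.emit_signature(name)
        self.src.append(";\n")

    def define_function(self, name):
        self.emit_signature(name)
        self.src.append(" { /* body */ }\n")

class ToyCodeGenMetal(ToyCodeGenC):
    def declare_function(self, name):
        # Skip declarations: cross-kernel calls are not supported in
        # shaders, so forward declarations are never needed.
        pass

def build(codegen, kernels):
    for k in kernels:
        codegen.declare_function(k)
    for k in kernels:
        codegen.define_function(k)
    return "".join(codegen.src)

# Generic flow emits the struct twice; the shader flow emits it once.
assert build(ToyCodeGenC(), ["k0"]).count("struct k0_args_t") == 2
assert build(ToyCodeGenMetal(), ["k0"]).count("struct k0_args_t") == 1
```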

@tqchen
Member

tqchen commented Nov 1, 2023

Actually, it turns out to be more complicated, although the above analysis is indeed right. Given that shader generators won't need inter-function calls, I think we can keep the original logic, which ensures that the signature and auxiliary data structures are printed only once.

MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
This PR restores the Metal codegen to the one before apache#15835.
Since there will likely be no internal function calls in Metal,
we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen
will not fail and will work properly.

The iPhone codegen failure occurs because multiple declarations
of the same function lead to multiple emissions of the same struct,
which the metal compiler rejects.
@MasterJH5574
Contributor

Filed PR #16033, which restores the Metal codegen logic. I verified that it works correctly for both iPhone and Metal codegen.

MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Nov 1, 2023
tqchen pushed a commit that referenced this pull request Nov 3, 2023
* [Codegen][Metal] Disable cross-function call in Metal codegen

This PR restores the Metal codegen to the one before #15835.
Since there will likely be no internal function calls in Metal,
we think it is safe to do so.

Verified that with this PR, the metal codegen and iPhone codegen
will not fail and will work properly.

The iPhone codegen failure occurs because multiple declarations
of the same function lead to multiple emissions of the same struct,
which the metal compiler rejects.

* Fix the action script
jinhongyii pushed a commit to jinhongyii/tvm that referenced this pull request Nov 3, 2023
… call (apache#15835)"

This reverts commit 698531e.

The original PR fixes the metal codegen issue, but building for iPhone
in MLC LLM is still broken. Revert the commit for now.
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
The functionality to express a call from one `PrimFunc` to another was
introduced in apache#14889.  While this
was initially planned to be supported at codegen for all targets (see
apache#15835), this resulted in breakage
on some backends (see apache#16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining is
skipped.  This commit includes unit tests for these unsupported
constructs, to validate that `InlinePrivateFunctions` produces
well-formed output even when they are present.
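The `tir::Evaluate` restriction above can be illustrated with a toy statement-level inliner over hypothetical IR node classes (not TVM's actual `tir` API): only a call that appears as a bare statement can be replaced by the subroutine's statement body, with primitive arguments substituted; anything else is left untouched.

```python
# Toy IR sketch of statement-level inlining under the restrictions above.
# Node classes and names are illustrative stand-ins, not TVM's tir API.
from dataclasses import dataclass

@dataclass
class Call:            # call expression: subroutine name + primitive args
    name: str
    args: list

@dataclass
class Evaluate:        # statement wrapping a bare expression
    expr: object

@dataclass
class SeqStmt:         # sequence of statements
    stmts: list

def inline_private(body, subroutines):
    """Replace Evaluate(Call(f, args)) with f's body, args substituted."""
    new_stmts = []
    for stmt in body.stmts:
        if isinstance(stmt, Evaluate) and isinstance(stmt.expr, Call) \
                and stmt.expr.name in subroutines:
            params, sub_body = subroutines[stmt.expr.name]
            subst = dict(zip(params, stmt.expr.args))
            # Substitute the primitive arguments into the inlined body.
            new_stmts.append(("inlined", sub_body.format(**subst)))
        else:
            # Unsupported callsite shapes are left as-is (inlining skipped).
            new_stmts.append(stmt)
    return SeqStmt(new_stmts)

subs = {"memset_zero": (["i"], "B[{i}] = 0")}
mod = SeqStmt([Evaluate(Call("memset_zero", [4]))])
print(inline_private(mod, subs).stmts[0])   # ('inlined', 'B[4] = 0')
```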
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Nov 29, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Dec 19, 2023
Lunderberg added a commit that referenced this pull request Jan 3, 2024
* [TIR] Update DeclBuffer nodes when specializing PrimFunc

Prior to this commit, a buffer whose parameters (e.g. shape/stride)
contained a specialized parameter would not be updated when appearing
in a `DeclBuffer` node.  This commit updates the `Specialize` function
to update buffers that occur in `DeclBuffer` nodes.

* [TIR] Handle specialization that remaps a buffer var

* [TIR] Handle specialization of buffer variable to PrimExpr

* [TIR][Transform] Implement InlinePrivateFunctions

The functionality to express a call from one `PrimFunc` to another was
introduced in #14889.  While this
was initially planned to be supported at codegen for all targets (see
#15835), this resulted in breakage
on some backends (see #16033).
After discussion, the plan was changed to support TIR inlining, which
would enable the same high-level functionality in TIR without
requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform
`InlinePrivateFunctions`, which can be used as part of lowering in a
follow-up commit.

Because this is initially implemented for use quite late in the
lowering flow, many constructs are not currently supported.  The
current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function.  Because a
  subroutine may be called multiple times, inlining of a subroutine
  that contains `tir::Block` would result in non-unique names.
  Support of subroutines with `tir::Block` instances will require
  de-duplication of block names.

* The subroutine's callsite must occur within a `tir::Evaluate` block.
  Because inlining a subroutine inserts the `tir::Stmt` body at the
  point of use, replacement must occur in a context where a
  `tir::Stmt` can be returned.  Support of subroutines that are called
  within an expression (e.g. Replacing `func` in `Buf[0] = func(1) +
  func(2)`) would require hoisting preprocessing done in the
  subroutine to the parent `tir::Stmt`.

* The subroutine may only accept primitive arguments, and must have an
  empty `buffer_map`.  Support of subroutines that are called with
  `tir::Buffer` or `tir::BufferRegion` arguments would require a way
  to represent these arguments at the callsite, and substitution of
  the buffer into the callee.

If these unsupported constructs are used, then the inlining of those
functions is skipped.  This commit includes unit tests for these
unsupported constructs, to validate that `InlinePrivateFunctions`
produces well-formed output even when they are present.

* Updates based on review comments

* ci bump

* CI bump