[CodeGenC][Redo] Handle GlobalVar callee as internal function call #15835
This reverts commit [`e88d0d`](apache#15725), which itself reverted [`9ff71f`](apache#15103) for breakages on the metal backend. Now that the CI contains compile-time testing of the metal codegen, the original breakage should be identifiable.
This PR is currently listed as a draft, to ensure that the CI can catch the failure mode that was reported here.
The fix is now implemented and included in this PR. It ended up being a <10 line fix once the CI was able to catch the problem, but getting the CI to that point was the harder part. @MasterJH5574 Can you verify that this PR doesn't re-introduce the issue that you reported here? The CI tests should be sufficient, but it would be good to confirm it as well.
@MasterJH5574 Have you had a chance to verify that this PR does not cause a regression on OSX? I'd like to avoid leaving it idle for too long, as that allows conflicts with unrelated changes to creep in.
Hi Eric, I apologize for missing this for so long. Yes, I think it now works perfectly for Metal. However, we just noticed that the changes in this PR break the “iPhone” target (https://github.com/mlc-ai/mlc-llm/blob/200653a82d025be7d58d0d7f04442f85aee52c98/mlc_llm/utils.py#L542-L561) in MLC LLM. I suspect the issue only appears when building end-to-end models: it does not seem reproducible when building a single TIR function on my side, so I do not yet have a minimal reproducible example. Meanwhile, the end-to-end build command in MLC LLM can reproduce the issue:
Here is the error message: Error message
I locally reverted the PR (https://github.com/mlc-ai/relax/commits) and am not going to revert it here, so hopefully the revert will not bother the unity branch too much. I would appreciate it if you could take a look at this iPhone build issue to see whether it can be fixed quickly. Thank you so much.
I think one issue revealed by the latest set of regressions is that we are trying to build a generic set of functionality that is not necessarily used in some of the subclass settings, and such generalization increases the complexity of the overall code during concurrent development. We also observed similar problems in the WebGPU backend. In this case, perhaps decoupling the codegen logic, rather than over-coupling it with the common parts, would help. Specifically for the case of the Metal codegen:
Looking at the error: this particular issue arises because the Metal codegen constructs a struct buffer to pass all the arguments. Right now the normal codegen first declares and then defines each function, which causes the same struct definition to appear twice. WebGPU likely had a similar issue, so we simply rolled back to the original implementation in the unity branch. We do not need to handle cross-function calls in GPU shaders, so one simple approach would be to skip the declaration step.
My read is that for kernel generators (where we know there won't be cross-function calls, or we know we do not yet support kernel function calls), we can safely skip the declaration and add a comment that we skip it because cross-kernel calls are not supported.
Actually, it turns out to be more complicated, although the above analysis is indeed right. Given that a shader generator has no need for inter-function calls, I think we can keep the original logic, which ensures that the signature and auxiliary data structures are printed once.
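The failure mode discussed above can be illustrated with a small toy model. This is only a sketch, not TVM's `CodeGenMetal`: the class `ToyKernelCodegen`, its methods, and the Metal-like text it emits are all hypothetical names invented for illustration. It assumes that both the forward declaration and the definition need the argument struct, so a declare-then-define flow emits the struct twice, while a define-only flow emits it once.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKernelCodegen:
    """Hypothetical stand-in for a shader code generator (not TVM code)."""

    lines: list = field(default_factory=list)

    def _emit_arg_struct(self, name, params):
        # Scalar kernel arguments are packed into one struct per function.
        self.lines.append(f"struct {name}_args_t {{")
        for ty, param in params:
            self.lines.append(f"  {ty} {param};")
        self.lines.append("};")

    def _signature(self, name):
        return f"kernel void {name}(constant {name}_args_t& arg)"

    def declare(self, name, params):
        # The signature mentions the struct type, so the struct is emitted here.
        self._emit_arg_struct(name, params)
        self.lines.append(self._signature(name) + ";")

    def define(self, name, params, body):
        # The definition emits the struct as well, followed by the body.
        self._emit_arg_struct(name, params)
        self.lines.append(self._signature(name) + " {")
        self.lines.append(f"  {body}")
        self.lines.append("}")

    def source(self):
        return "\n".join(self.lines)


def declare_then_define(funcs):
    # Generic flow: declare every function first, then define each one.
    cg = ToyKernelCodegen()
    for name, params, _ in funcs:
        cg.declare(name, params)
    for name, params, body in funcs:
        cg.define(name, params, body)
    return cg.source()


def define_only(funcs):
    # Kernel-generator flow: no cross-function calls, so no declarations.
    cg = ToyKernelCodegen()
    for name, params, body in funcs:
        cg.define(name, params, body)
    return cg.source()


funcs = [("add_one", [("int", "n")], "/* kernel body */")]
two_pass = declare_then_define(funcs)
one_pass = define_only(funcs)
```

In this model `two_pass` contains two definitions of `add_one_args_t`, which a C-family compiler would reject as a redefinition, whereas `one_pass` contains exactly one.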
This PR restores the Metal codegen to its state before apache#15835. Because there will likely be no internal function calls in Metal, we think it is safe to do so. Verified that with this PR, the Metal codegen and the iPhone codegen no longer fail and work properly. The iPhone codegen failed because multiple declarations of the same function led to multiple emissions of the same struct, which the Metal compiler does not accept.
Filed PR #16033, which restores the Metal codegen logic. I verified that it has no problems with either the iPhone or the Metal codegen.
* [Codegen][Metal] Disable cross-function call in Metal codegen
* Fix the action script
… call (apache#15835)" This reverts commit 698531e. The original PR fixes the Metal codegen issue, but building for iPhone in MLC LLM is still broken, so revert the commit for now.
The functionality to express a call from one `PrimFunc` to another was introduced in apache#14889. While this was initially planned to be supported at codegen for all targets (see apache#15835), this resulted in breakage on some backends (see apache#16033). After discussion, the plan was changed to support TIR inlining, which would enable the same high-level functionality in TIR without requiring immediate low-level support across all codegens.

This commit implements and tests a new IRModule transform `InlinePrivateFunctions`, which can be used as part of lowering in a follow-up commit.

Because this is initially implemented for use quite late in the lowering flow, many constructs are not currently supported. The current implementation has the following restrictions.

* `tir::Block` nodes may not occur in the inlined function. Because a subroutine may be called multiple times, inlining of a subroutine that contains `tir::Block` would result in non-unique names. Support of subroutines with `tir::Block` instances will require de-duplication of block names.
* The subroutine's callsite must occur within a `tir::Evaluate` block. Because inlining a subroutine inserts the `tir::Stmt` body at the point of use, replacement must occur in a context where a `tir::Stmt` can be returned. Support of subroutines that are called within an expression (e.g. replacing `func` in `Buf[0] = func(1) + func(2)`) would require hoisting preprocessing done in the subroutine to the parent `tir::Stmt`.
* The subroutine may only accept primitive arguments, and must have an empty `buffer_map`. Support of subroutines that are called with `tir::Buffer` or `tir::BufferRegion` arguments would require a way to represent these arguments at the callsite, and substitution of the buffer into the callee.

If these unsupported constructs are used, then the inlining of those functions is skipped. This commit includes unit tests for these unsupported constructs, to validate that `InlinePrivateFunctions` produces well-formed output even when they are present.
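The restrictions above can be made concrete with a toy inliner. This is a sketch over a made-up miniature IR, not the TVM implementation: `Call`, `Evaluate`, `Store`, `Block`, `Func`, `can_inline`, and `inline_private_calls` are hypothetical names, expressions are modelled as format strings, and only one level of calls is inlined. It assumes a call is inlined only when it is the direct value of an `Evaluate` statement and the callee is private, has an empty `buffer_map`, and contains no `Block`; anything else is left untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Call:
    callee: str
    args: Tuple[int, ...]  # primitive arguments only


@dataclass(frozen=True)
class Evaluate:
    value: Union[Call, str]  # a call, or an opaque expression template


@dataclass(frozen=True)
class Store:
    target: str
    value: Union[Call, str]


@dataclass(frozen=True)
class Block:
    name: str
    body: Tuple[object, ...]


@dataclass
class Func:
    params: Tuple[str, ...]
    body: List[object]
    private: bool = True
    buffer_map: Dict[str, str] = field(default_factory=dict)


def can_inline(func: Func) -> bool:
    # Skip public functions and functions that take buffers.
    if not func.private or func.buffer_map:
        return False
    # Skip functions containing blocks, whose names would be duplicated.
    return not any(isinstance(stmt, Block) for stmt in func.body)


def substitute(stmt, mapping):
    # Replace parameter placeholders such as "{n}" with the call arguments.
    if isinstance(stmt, Evaluate) and isinstance(stmt.value, str):
        return Evaluate(stmt.value.format(**mapping))
    return stmt


def inline_private_calls(module: Dict[str, Func], caller: str) -> List[object]:
    result = []
    for stmt in module[caller].body:
        callee = None
        # Only a call that is the whole Evaluate statement is a candidate.
        if isinstance(stmt, Evaluate) and isinstance(stmt.value, Call):
            callee = module.get(stmt.value.callee)
        if callee is not None and can_inline(callee):
            mapping = dict(zip(callee.params, stmt.value.args))
            result.extend(substitute(s, mapping) for s in callee.body)
        else:
            result.append(stmt)  # unsupported construct: leave as-is
    return result


module = {
    "main": Func(params=(), private=False, body=[
        Evaluate(Call("helper", (3,))),         # inlined
        Evaluate(Call("blocky", (4,))),         # skipped: callee has a Block
        Store("Buf[0]", Call("helper", (1,))),  # skipped: call inside an expression
    ]),
    "helper": Func(params=("n",), body=[Evaluate("touch({n})")]),
    "blocky": Func(params=("n",), body=[Block("compute", (Evaluate("touch({n})"),))]),
}
inlined = inline_private_calls(module, "main")
```

Here only the first statement of `main` is rewritten; the other two call sites survive unchanged, mirroring the "inlining is skipped" behaviour described in the commit message.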
* [TIR] Update DeclBuffer nodes when specializing PrimFunc

  Prior to this commit, a buffer whose parameters (e.g. shape/stride) contained a specialized parameter would not be updated when appearing in a `DeclBuffer` node. This commit updates the `Specialize` function to update buffers that occur in `DeclBuffer` nodes.
* [TIR] Handle specialization that remaps a buffer var
* [TIR] Handle specialization of buffer variable to PrimExpr
* [TIR][Transform] Implement InlinePrivateFunctions
* Updates based on review comments
* ci bump
* CI bump
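The `DeclBuffer` item in the commit list above can be sketched in the same toy style. The classes and the `specialize` helper below are hypothetical and do not reflect TVM's `Specialize` implementation; symbolic dimensions are modelled as plain variable names. The point is only that a buffer's declaration and its uses should all see the same specialized shape.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple, Union

Dim = Union[int, str]  # a symbolic dimension is modelled as a variable name


@dataclass(frozen=True)
class Buffer:
    name: str
    shape: Tuple[Dim, ...]


@dataclass(frozen=True)
class DeclBuffer:
    buffer: Buffer


@dataclass(frozen=True)
class BufferStore:
    buffer: Buffer
    index: int
    value: int


def specialize_buffer(buf: Buffer, var_map: Dict[str, int]) -> Buffer:
    # Substitute every symbolic dimension that has a specialized value.
    shape = tuple(var_map.get(d, d) if isinstance(d, str) else d for d in buf.shape)
    return replace(buf, shape=shape)


def specialize(body: List[object], var_map: Dict[str, int]) -> List[object]:
    # Declarations and uses are rewritten alike, so they stay consistent.
    return [replace(s, buffer=specialize_buffer(s.buffer, var_map)) for s in body]


A = Buffer("A", ("n",))
body = [DeclBuffer(A), BufferStore(A, 0, 1)]
specialized = specialize(body, {"n": 16})
```

After specialization, both the `DeclBuffer` and the `BufferStore` refer to a buffer of shape `(16,)`. If the declaration were skipped, it would still carry the symbolic `"n"` and disagree with its uses.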