
[CodeGenC] Handle GlobalVar callee as internal function call #15103

Merged
13 commits merged into apache:main from Lunderberg:codegen_c_support_subroutine_calls
Aug 8, 2023

Conversation

Lunderberg
Contributor

Analogous to #14901, treat GlobalVar callees as internal function calls in CodeGenC. This specific PR doesn't provide new end-to-end functionality, since the target="c" backend isn't compiled. It is, however, a step toward allowing subroutines in any target whose codegen derives from CodeGenC, which will depend on the single-module lowering flow in #14985.
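
For illustration, a `PrimFunc` that calls another `PrimFunc` in the same `IRModule` through its `GlobalVar` should lower to an ordinary C call, with the callee's forward declaration generated from its own signature. Below is a rough sketch of the kind of C output this enables; the function names are hypothetical and not taken from the actual codegen.

```c
#include <stdint.h>

/* Hypothetical sketch of generated C: "subroutine" is another PrimFunc
 * in the same IRModule, referenced through its GlobalVar.  Its forward
 * declaration is produced from the callee's own signature, and the call
 * site is a plain C function call rather than a packed-function lookup. */
static int32_t subroutine(float* buf, int32_t n);

int32_t main_kernel(float* buf, int32_t n) {
  return subroutine(buf, n); /* internal call to a GlobalVar callee */
}

static int32_t subroutine(float* buf, int32_t n) {
  for (int32_t i = 0; i < n; ++i) {
    buf[i] = buf[i] * 2.0f;
  }
  return 0;
}
```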

@tvm-bot
Collaborator

tvm-bot commented Jun 14, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: codegenc See #10317 for details

Generated by tvm-bot

@Lunderberg Lunderberg force-pushed the codegen_c_support_subroutine_calls branch 4 times, most recently from 3860422 to 2a9ddc3 Compare June 21, 2023 14:10
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 21, 2023
Prior to this commit, the tests in
`test_tir_transform_inject_ptx_async_copy.py` registered the
`"tvm_callback_cuda_postproc"` function at pytest collection time, and
used a global variable to disable its functionality outside of the
tests in this file.  This had two major issues.  First, if any other
test also installs a postproc function, these tests would fail.
Second, if these tests fail, the global variable controlling the
postproc function would not be reset, causing any subsequent
CUDA-related tests to fail.

This commit updates these NVPTX tests to conditionally install the
postproc function, to de-register it after the test instead of
disabling its functionality, and to de-register it regardless of the
test result.

This issue was initially found when debugging
apache#15103, when a failure in
`test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else`
caused failures in 32 unrelated tests ([CI
link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 21, 2023
Prior to this commit, the tests in
`test_tir_transform_inject_ptx_async_copy.py` registered the
`"tvm_callback_cuda_postproc"` function during pytest collection, and
used a global variable to disable its functionality outside of the
tests in this file.  This had two major issues.  First, if any other
test also installs a postproc function, the postproc function
required by the NVPTX tests would be overwritten.  Second, if one of
the NVPTX tests fails, the global variable controlling the postproc
function would not be reset, causing any subsequent CUDA-related tests
to also fail.

This commit updates these NVPTX tests to conditionally install the
postproc function, to de-register it after the test instead of
disabling its functionality, and to de-register it regardless of the
test result.

This issue was initially found when debugging
apache#15103, when a failure in
`test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else`
caused failures in 32 unrelated tests ([CI
link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
@Lunderberg Lunderberg force-pushed the codegen_c_support_subroutine_calls branch from 4fd4497 to 82f0a26 Compare June 21, 2023 21:01
@Lunderberg Lunderberg force-pushed the codegen_c_support_subroutine_calls branch from 4852ca2 to 60da49a Compare July 3, 2023 16:28
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jul 4, 2023
Prior to this commit, the tests in
`test_tir_transform_inject_ptx_async_copy.py` registered the
`"tvm_callback_cuda_postproc"` function during pytest collection, and
used a global variable to disable its functionality outside of the
tests in this file.  This had two major issues.  First, if any other
test also installs a postproc function, the postproc function
required by the NVPTX tests would be overwritten.  Second, if one of
the NVPTX tests fails, the global variable controlling the postproc
function would not be reset, causing any subsequent CUDA-related tests
to also fail.

This commit updates these NVPTX tests to conditionally install the
postproc function, to de-register it after the test instead of
disabling its functionality, and to de-register it regardless of the
test result.

This issue was initially found when debugging
apache#15103, when a failure in
`test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else`
caused failures in 32 unrelated tests ([CI
link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
- Call `DeclareFunction` for each `PrimFunc`, prior to any
  `AddFunction` calls

- Provide both `GlobalVar` and `PrimFunc` to `AddFunction` calls.
C's automatic pointer conversions (e.g. `void*` to `int*`) mean that using
the call-site arguments to infer the function signature may be incorrect.  If a
`call_extern` refers to a function within the same module, only output
a single forward declaration based on the PrimFunc's parameters, not
based on the CallNode's arguments.
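
To make the pointer-conversion concern concrete, here is a small standalone C example (with hypothetical names, not the actual generated code) of why a declaration inferred from call-site argument types can conflict with the callee's real signature, and why the single forward declaration should instead come from the PrimFunc's parameters.

```c
/* Single forward declaration taken from the callee's own signature
 * (the PrimFunc's parameters). */
void kernel(float* data, int n);

/* If the declaration were instead inferred from the call-site argument
 * types, it would read `void kernel(void* data, int n);`, which would
 * conflict with the definition below. */
void caller(void* workspace, int n) {
  /* C converts void* to float* implicitly, so the call itself is fine. */
  kernel(workspace, n);
}

void kernel(float* data, int n) {
  for (int i = 0; i < n; ++i) {
    data[i] = 0.0f;
  }
}
```
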
These `Call` instances can return a
`PointerType(PrimType(pointee_dtype))` rather than a
`PrimType(DataType::Handle())`.
@Lunderberg Lunderberg force-pushed the codegen_c_support_subroutine_calls branch from 33bc277 to 02e9ddf Compare July 5, 2023 12:59
Previously, the micro kernels for gemm, avg_pool, max_pool, and
tensordot relied on C's implicit type conversions for the arguments,
when the caller's argument types differ from the signature's parameter
types.  This works, except when the codegen has auto-generated a
forward declaration based on the caller's argument types, such as
during AOT, which then causes a conflicting definition.

Since the codegen cannot determine the function names from the
`"pragma_import_c"` in order to suppress these forward declarations,
this conflict can be more easily resolved by updating the micro kernel
signatures.  The three types of mismatches are below.

- Use of `int` or `long` parameters, whose width may vary by compiler,
  instead of fixed-width types.

- TIR expecting the data array's integer type to also be used as an
  error code's return type, rather than the micro kernels' `int32_t`
  error code.

- Pointer conversion done during argument conversion.

Type conversions are done at the start of each micro kernel, to avoid
changing types that are used within the computational sections of each
micro kernel.
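
As an illustration of the convention described above, here is a hypothetical micro kernel (not one of the actual gemm/pooling/tensordot signatures changed in this PR) using fixed-width parameter types, an `int32_t` error-code return, and conversions performed once at the top of the body.

```c
#include <stdint.h>

/* Hypothetical micro kernel following the updated convention. */
static int32_t example_micro_kernel(int32_t rows, int32_t cols,
                                    const int8_t* input, int32_t* output) {
  /* Convert once up front to the types used by the computational
   * section, so the inner-loop code does not need to change. */
  int n_rows = (int)rows;
  int n_cols = (int)cols;
  const int8_t* in = input;

  for (int i = 0; i < n_rows; ++i) {
    for (int j = 0; j < n_cols; ++j) {
      output[i * n_cols + j] = (int32_t)in[i * n_cols + j];
    }
  }
  return 0; /* error code expected by the caller */
}
```
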
@Lunderberg Lunderberg force-pushed the codegen_c_support_subroutine_calls branch from 02e9ddf to 2bf2313 Compare July 5, 2023 13:47
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 15, 2023
…#15136)

Prior to this commit, the tests in
`test_tir_transform_inject_ptx_async_copy.py` registered the
`"tvm_callback_cuda_postproc"` function during pytest collection, and
used a global variable to disable its functionality outside of the
tests in this file.  This had two major issues.  First, if any other
test also installs a postproc function, the postproc function
required by the NVPTX tests would be overwritten.  Second, if one of
the NVPTX tests fails, the global variable controlling the postproc
function would not be reset, causing any subsequent CUDA-related tests
to also fail.

This commit updates these NVPTX tests to conditionally install the
postproc function, to de-register it after the test instead of
disabling its functionality, and to de-register it regardless of the
test result.

This issue was initially found when debugging
apache#15103, when a failure in
`test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else`
caused failures in 32 unrelated tests ([CI
link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
Contributor

@csullivan csullivan left a comment


Approved with only a few questions and nitpicks.

@csullivan csullivan merged commit 9ff71f4 into apache:main Aug 8, 2023
17 checks passed
@Lunderberg Lunderberg deleted the codegen_c_support_subroutine_calls branch August 9, 2023 02:48
ekalda added a commit to ekalda/tvm that referenced this pull request Aug 10, 2023
The tests in `tests/python/relay/aot` were run as part of
`task_python_integration.sh`, which is run for `ci-cpu` and `ci-arm`, but
some of the tests in that folder require CMSIS-NN or FVP, so in
practice they were not running anywhere in the upstream CI.

Since all the tests in that folder could run on `ci-cortexm`, change
the scripts to run them there.

Also fix some tests that started failing as a result of apache#15103 (but
didn't manifest in upstream CI) and `test_device_api_hooks_unpacked_api`
that clearly hasn't been running in any CI for a while.
lhutton1 pushed a commit that referenced this pull request Aug 11, 2023
[TEST] Run tests in tests/python/relay/aot in `ci-cortexm`

The tests in `tests/python/relay/aot` were run as part of
`task_python_integration.sh`, which is run for `ci-cpu` and `ci-arm`, but
some of the tests in that folder require CMSIS-NN or FVP, so in
practice they were not running anywhere in the upstream CI.

Since all the tests in that folder could run on `ci-cortexm`, change
the scripts to run them there.

Also fix some tests that started failing as a result of #15103 (but
didn't manifest in upstream CI) and `test_device_api_hooks_unpacked_api`
that clearly hasn't been running in any CI for a while.
@MasterJH5574
Contributor

Hello @Lunderberg, unfortunately this PR/commit completely breaks our Metal codegen.

You can reproduce this issue on a Mac with the following steps:

  • open tests/python/unittest/test_allreduce_cuda.py (I just picked an arbitrary test for CUDA),
  • replace all occurrences of cuda with metal,
  • run this test_allreduce_cuda.py.

Then you will see the following error:

~/W/tvm/t/p/unittest @4d7e93cd *1 !2 ❯ python test_allreduce_cuda.py                                 python310 ruihang-macstudio@GS22385 08:12:05 PM
Traceback (most recent call last):
  File "/Users/ruihang-macstudio/Workspace/tvm/tests/python/unittest/test_allreduce_cuda.py", line 104, in <module>
    test_allreduce_metal()
  File "/Users/ruihang-macstudio/Workspace/tvm/tests/python/unittest/test_allreduce_cuda.py", line 99, in test_allreduce_metal
    check_sum(d1, d2, d3)
  File "/Users/ruihang-macstudio/Workspace/tvm/tests/python/unittest/test_allreduce_cuda.py", line 69, in check_sum
    f(a, b)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/runtime/module.py", line 201, in __call__
    return self.entry_func(*args)
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/Users/ruihang-macstudio/Workspace/tvm/python/tvm/_ffi/base.py", line 476, in raise_last_ffi_error
    raise py_err
  File "/Users/ruihang-macstudio/Workspace/tvm/src/runtime/library_module.cc", line 76, in tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
    ICHECK_EQ(ret, 0) << TVMGetLastError();
tvm.error.InternalError: Traceback (most recent call last):
  0: tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/library_module.cc:76
  6: tvm::runtime::ModuleNode::GetFuncFromEnv(tvm::runtime::String const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/module.cc:114
  5: tvm::runtime::ModuleNode::GetFunction(tvm::runtime::String const&, bool)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/module.cc:66
  4: tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_module.mm:246
  3: void tvm::runtime::metal::AutoReleasePoolWrapper::operator<<<tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0>(tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0 const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_common.h:89
  2: tvm::runtime::MetalModuleNode::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_0::operator()() const
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_module.mm:257
  1: tvm::runtime::MetalWrappedFunc::Init(tvm::runtime::MetalModuleNode*, tvm::runtime::ObjectPtr<tvm::runtime::Object>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long, unsigned long, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_module.mm:187
  0: tvm::runtime::MetalModuleNode::GetPipelineState(unsigned long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
        at /Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_module.mm:109
  File "/Users/ruihang-macstudio/Workspace/tvm/src/runtime/metal/metal_module.mm", line 109
  File "/Users/ruihang-macstudio/Workspace/tvm/src/runtime/library_module.cc", line 76
InternalError: Check failed: ret == 0 (-1 vs. 0) : TVMError: Fail to compile metal source:program_source:10:1: error: expected parameter declarator
;
^
program_source:10:1: error: expected ')'
program_source:8:36: note: to match this '('
kernel void default_function_kernel(  device float* A [[ buffer(0) ]],
                                   ^
program_source:8:58: warning: 'buffer' attribute ignored on function declaration
kernel void default_function_kernel(  device float* A [[ buffer(0) ]],
                                                         ^
program_source:9:22: warning: 'buffer' attribute ignored on function declaration
  device float* B [[ buffer(1) ]],
                     ^
program_source:11:20: error: 'threadgroup_position_in_grid' attribute only applies to parameters
  uint3 blockIdx [[threadgroup_position_in_grid]],
                   ^
program_source:11:9: error: program scope variable must reside in constant address space
  uint3 blockIdx [[threadgroup_position_in_grid]],
        ^
program_source:11:50: error: expected ';' at end of declaration
  uint3 blockIdx [[threadgroup_position_in_grid]],
                                                 ^
                                                 ;
program_source:12:21: error: 'thread_position_in_threadgroup' attribute only applies to parameters
  uint3 threadIdx [[thread_position_in_threadgroup]]
                    ^
program_source:12:9: error: program scope variable must reside in constant address space
  uint3 threadIdx [[thread_position_in_threadgroup]]
        ^
program_source:12:53: error: expected ';' after top level declarator
  uint3 threadIdx [[thread_position_in_threadgroup]]
                                                    ^
                                                    ;

It says that the generated Metal code cannot be parsed.

By reverting this commit, I am able to get this test to run (though it fails the numerical comparison due to a precision issue, which is a non-issue here).

Could you make a fix for this issue? I would appreciate it if the problem could be resolved soon. Thank you!

@junrushao
Member

We might need some basic testing for Metal to prevent this issue from happening... For now, to quickly ensure that MLC LLM is not impacted, @MasterJH5574 shall we submit a PR to temporarily revert this commit? Thanks a lot!

junrushao added a commit that referenced this pull request Sep 12, 2023
junrushao added a commit that referenced this pull request Sep 12, 2023
junrushao added a commit that referenced this pull request Sep 12, 2023
MasterJH5574 pushed a commit that referenced this pull request Sep 12, 2023
…#15725)

Revert "[CodeGenC] Handle GlobalVar callee as internal function call (#15103)"

This reverts commit 9ff71f4, a recent change that breaks the Metal backend.
MasterJH5574 added a commit to MasterJH5574/tvm that referenced this pull request Sep 13, 2023
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Sep 15, 2023
Prior to this commit, the CI compiled TVM with `USE_METAL=ON` on OSX,
as defined in `conda/recipe/build.sh`, but did not validate the
execution of any generated metal kernels.  As a result, breakage could
occur without being caught by the CI, such as found following
apache#15103.

This commit adds the execution of a single metal kernel as a minimal
functionality test of the metal backend.
Lunderberg added a commit that referenced this pull request Sep 27, 2023
* [Unittest][Metal] Add minimal metal functionality test to CI

Prior to this commit, the CI compiled TVM with `USE_METAL=ON` on OSX,
as defined in `conda/recipe/build.sh`, but did not validate the
execution of any generated metal kernels.  As a result, breakage could
occur without being caught by the CI, such as found following
#15103.

This commit adds the execution of a single metal kernel as a minimal
functionality test of the metal backend.

* CI testing, attempt a compile-only test case

* CI testing, moved intentional failure from test-case to contrib.xcode

* Move intentional failure point into codegen

* ci bump

* Removing the intentional failure during metallib compilation
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Sep 27, 2023
This reverts commit
[`e88d0d`](apache#15725), which
itself reverted
[`9ff71f`](apache#15103) for
breakages on the metal backend.  Now that the CI contains
compile-time testing of the metal codegen, the original
breakage should be identifiable.
csullivan pushed a commit that referenced this pull request Oct 18, 2023
…15835)

* [CodeGenC][Redo] Handle GlobalVar callee as internal function call

This reverts commit
[`e88d0d`](#15725), which
itself reverted
[`9ff71f`](#15103) for
breakages on the metal backend.  Now that the CI contains
compile-time testing of the metal codegen, the original
breakage should be identifiable.

* Added codegen metal CI debug print

* Print function decl to the argument stream

* Remove the codegen metal CI debug print-outs