Hide shim function linkables behind typing/lowering pass, removes c_ext_shim_func variable #102

Closed
isVoid wants to merge 7 commits into NVIDIA:main from isVoid:fea-link-inplace

isVoid commented Feb 25, 2025

numba-cuda introduced a new way to automatically link external functions to declared device functions in Numba. Numbast leverages this mechanism to auto-link shim functions, hiding their linkage from the user-facing API.

Currently failing with a linker error: `undefined reference to add_1 in <unnamed-ptx>`

isVoid commented Mar 5, 2025

This error happens because the `link` argument is only picked up if the declared device function goes through a proper type inference pass. In Numbast, however, the link to the external function is used directly in lowering via `compile_internal`, which bypasses the typing pass.
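
To make the failure mode concrete, here is a minimal toy model in plain Python. It is not numba-cuda code: `DeclaredFunction`, `ToyCompiler`, and all of their methods are hypothetical stand-ins, and the only assumption carried over from the comment above is that linkables are collected during typing and never during lowering.

```python
# Toy model only: none of these classes are numba-cuda APIs.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeclaredFunction:
    """Stand-in for a declared device function that carries linkables."""
    name: str
    link: tuple = ()


@dataclass
class ToyCompiler:
    """Collects linkables only while the typing pass runs."""
    linkables: list = field(default_factory=list)

    def typing_pass(self, callee):
        # The typing pass is the only place that looks at `callee.link`.
        self.linkables.extend(callee.link)
        return callee.name

    def lowering_pass(self, symbol):
        # Lowering only emits a call to the symbol; it never sees `link`.
        return f"call {symbol}"

    def compile_typed(self, callee):
        # Regular path: typing first, then lowering.
        return self.lowering_pass(self.typing_pass(callee))

    def compile_internal_like(self, callee):
        # Mimics jumping straight into lowering, skipping the typing pass.
        return self.lowering_pass(callee.name)

    def unresolved(self, symbol):
        # A symbol is unresolved if no collected linkable mentions it.
        return not any(symbol in src for src in self.linkables)


add_1 = DeclaredFunction("add_1", link=("int add_1(int x) { return x + 1; }",))

typed = ToyCompiler()
typed.compile_typed(add_1)

bypassed = ToyCompiler()
bypassed.compile_internal_like(add_1)
```

In this sketch `typed.unresolved("add_1")` is false while `bypassed.unresolved("add_1")` is true, which mirrors the undefined-reference error: the call is emitted, but the code that defines the symbol never reaches the linker.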

isVoid changed the title from "Use declare_device(link) to add shim functions" to "Hide shim function linkables behind typing/lowering pass, removes c_ext_shim_func variable" on Mar 12, 2025

isVoid commented Mar 12, 2025

Progress:

In its current state, the PR adds the linkable object at lowering time to an additional attribute, `_external_linkage`, on `CUDATargetContext`. numba-cuda then iterates over this set and adds each item to the linker. There is a drawback: the context stays alive throughout the program. If we launch many kernels, the set on the context keeps growing, which lengthens launch latency. Ideally, the linkable code should be attached to something that is tied only to the current kernel launch, for example the `ExternFunction`'s `link` attribute*. However, `ExternFunction` isn't general enough: for operators, the typemap does not contain a reference to the loaded external function, so its `link` attribute isn't visible at launch time.

Some of this relates to what's visible at lowering time. It seems that some metadata can be carried through the compilation pipeline and end up in the final `CompileResult`, but without further digging I'm not sure how to add the linkables there.

*Technically, the `ExternFunction` object also stays alive throughout the program. However, the maximum number of linkable objects in this function equals the number of overloaded functions in C++, which is usually small (otherwise the clang parser would have hung, which is a larger issue).
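
The growth concern described in this comment can be sketched with a toy model. Only the attribute name `_external_linkage` comes from the PR; `ToyContext`, `ToyKernel`, `own_linkage`, and the two helper functions are hypothetical and do not reflect numba-cuda's actual classes.

```python
# Toy model only: compares a long-lived shared set with per-kernel storage.
class ToyContext:
    """Long-lived object, loosely standing in for a target context."""

    def __init__(self):
        self._external_linkage = set()


class ToyKernel:
    """One compiled kernel and the shim linkables it needs."""

    def __init__(self, name, shims):
        self.name = name
        self.shims = frozenset(shims)
        self.own_linkage = set()  # hypothetical per-kernel alternative


def lower_into_context(ctx, kernel):
    # Approach described in the comment: accumulate on the shared context.
    ctx._external_linkage |= kernel.shims


def lower_into_kernel(kernel):
    # Alternative: keep linkables scoped to the kernel that needs them.
    kernel.own_linkage |= kernel.shims


ctx = ToyContext()
context_counts, kernel_counts = [], []
for i in range(3):
    kernel = ToyKernel(f"kernel_{i}", {f"shim_{i}"})
    lower_into_context(ctx, kernel)
    lower_into_kernel(kernel)
    # Number of linkables that would be handed to the linker at launch.
    context_counts.append(len(ctx._external_linkage))
    kernel_counts.append(len(kernel.own_linkage))
```

With three kernels that each need one distinct shim, the shared set grows with every kernel while the per-kernel set stays at one entry, which is the latency concern in miniature.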

isVoid commented Mar 12, 2025

So far I prefer adding linkable code at lowering time, since it keeps the shim function local. It requires far fewer code changes than the AbstractTemplate approach. It also makes sense logically: resolving external linkage is part of the "implementation", not of type inference.

isVoid commented Mar 18, 2025

Superseded by #106

isVoid closed this on Mar 18, 2025
isVoid mentioned this pull request on Mar 18, 2025
isVoid added a commit that referenced this pull request May 5, 2025
- The linkable code object that contains the shim functions is added to
`active_linking_library` in Python.
- Streamed shim function writes: the shim functions are written to a
string stream at lowering time using a special class named
`KeyedStringIO` that prevents double writes of the same shim function.
- Ruff formatting of generated code (with import sorting): #108
- Generation metadata added to binding: #116
- Allow user to specify entry point and retain file list in config file
using relative path (d1e7aa2)
- Allow user to add additional imports via `Additional Import` config
item (8f7bd22)
- Allow user to override the include line in shim function via `Shim
Include Override` config item (8f7bd22)
- Allow user to specify whether to include a check in the binding to
assert the existence of pynvjitlink (bad0c69)
- Allow user to specify a custom macro, which dictates both how
clangTooling parses the header file and how the shim function is
compiled with NVRTC (0f0ee9c)
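
As a rough illustration of the `KeyedStringIO` idea from the list above, the sketch below subclasses `io.StringIO` and skips any write whose key has already been seen. Only the class name comes from the commit message; the method name `write_with_key`, its return value, and the example shim source are assumptions made for illustration and may differ from Numbast's actual implementation.

```python
import io


class KeyedStringIO(io.StringIO):
    """String stream that ignores repeated writes under the same key.

    Sketch only: the real class in Numbast may have a different interface.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._keys = set()

    def write_with_key(self, key, text):
        """Write `text` once per `key`; return True if it was written."""
        if key in self._keys:
            return False
        self._keys.add(key)
        self.write(text)
        return True


# Hypothetical shim source; the real generated shims may look different.
shim = (
    'extern "C" __device__ int add_1_shim(int *ret, int *x) '
    "{ *ret = add_1(*x); return 0; }\n"
)

shim_stream = KeyedStringIO()
first = shim_stream.write_with_key("add_1_shim", shim)
second = shim_stream.write_with_key("add_1_shim", shim)  # skipped
```

Keying on the shim name rather than on the text keeps the check cheap and lets a function that is lowered many times contribute its shim to the stream only once.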

[ORIGINAL]
This PR supersedes #102, hand-picking the parts where only lowering
modifications are involved. In essence, this PR uses a new attribute in
the numba-cuda target context named `_external_linkage`. By inserting the
correct external linkables in the context, this PR hides the shim
function usage from the user, therefore eliminating the need for the
`c_ext_shim_func` variable.

Additionally, since each shim function is isolated in the sub-scope
of each function's lowering, the monolithic shim function block no
longer exists. This should presumably speed up compilation, because
the RTC has less shim function source to read and parse.

~~depends on NVIDIA/numba-cuda#165~~

Developed jointly with NVIDIA/numba-cuda#166

Includes changes from #116, #108

---------

Co-authored-by: isVoid <isVoid@users.noreply.github.com>
Co-authored-by: Yevhenii Havrylko <yhavrylko@nvidia.com>