[C++][Gandiva] Refactor built-in stub functions to use FunctionRegistry for registration #39052

@niyue

Describe the enhancement requested

Description

Currently, Gandiva has a number of internal stub functions, which are registered in two steps:

  1. Function metadata is registered in multiple internal registries, such as:
    1.1) GetStringFunctionRegistry in function_registry_string.cc
    1.2) GetMathOpsFunctionRegistry in function_registry_math_ops.cc
    1.3) etc.
  2. The stub functions' implementations are mapped into the LLVM engine in:
    2.1) ExportedStubFunctions::AddMappings
    2.2) ExportedHashFunctions::AddMappings
    2.3) ExportedStringFunctions::AddMappings
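
The two steps above can be pictured with a small, self-contained sketch. It does not use the real Gandiva headers: ToyDataType, ToyLlvmType, ToyNativeFunction, ToyEngine, GetToyStringFunctionRegistry, AddToyStringMappings and the toy_ stub are all hypothetical stand-ins, and only the overall shape (metadata in one function, LLVM-typed mapping in another) is taken from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Gandiva's data types, LLVM types, and engine.
enum class ToyDataType { kInt32, kUtf8 };
enum class ToyLlvmType { kI64, kI8Ptr, kI32, kI32Ptr };

// Metadata describing a function as seen from expressions.
struct ToyNativeFunction {
  std::string base_name;                 // name used in expressions
  std::vector<ToyDataType> param_types;  // data-type signature
  ToyDataType return_type;
  std::string pc_name;                   // symbol name of the C stub
};

// Minimal engine that only records symbol -> (LLVM types, pointer) mappings.
struct ToyEngine {
  struct Mapping {
    ToyLlvmType return_type;
    std::vector<ToyLlvmType> args;
    void* function_ptr;
  };
  std::map<std::string, Mapping> mappings;

  void AddGlobalMappingForFunc(const std::string& name, ToyLlvmType return_type,
                               std::vector<ToyLlvmType> args, void* function_ptr) {
    mappings[name] = Mapping{return_type, std::move(args), function_ptr};
  }
};

// A trivial stub implementation (placeholder body, not the real masking logic).
const char* toy_mask_show_first_n_utf8_int32(std::int64_t /*context*/, const char* data,
                                             std::int32_t /*data_length*/,
                                             std::int32_t /*n_to_show*/,
                                             std::int32_t* out_length) {
  *out_length = 0;
  return data;
}

// Step 1: metadata lives in a registry function.
std::vector<ToyNativeFunction> GetToyStringFunctionRegistry() {
  std::vector<ToyNativeFunction> registry;
  registry.push_back(ToyNativeFunction{"mask_show_first_n",
                                       {ToyDataType::kUtf8, ToyDataType::kInt32},
                                       ToyDataType::kUtf8,
                                       "toy_mask_show_first_n_utf8_int32"});
  return registry;
}

// Step 2: the same signature is restated by hand as LLVM-typed args elsewhere.
void AddToyStringMappings(ToyEngine* engine) {
  std::vector<ToyLlvmType> mask_args = {
      ToyLlvmType::kI64,     // context
      ToyLlvmType::kI8Ptr,   // data
      ToyLlvmType::kI32,     // data_length
      ToyLlvmType::kI32,     // n_to_show
      ToyLlvmType::kI32Ptr,  // out_length
  };
  engine->AddGlobalMappingForFunc(
      "toy_mask_show_first_n_utf8_int32", ToyLlvmType::kI8Ptr, mask_args,
      reinterpret_cast<void*>(toy_mask_show_first_n_utf8_int32));
}
```

In this toy model the symbol name and the argument list each appear twice, once per step, and nothing checks that the two copies agree.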

There are some issues with this organizing approach:

  • When adding or removing a stub function, developers have to find and change two places, which is inconvenient. For example, adding a new string function requires modifying both GetStringFunctionRegistry in function_registry_string.cc and ExportedStringFunctions::AddMappings in gdv_string_function_stubs.cc.
  • The LLVM type information passed to the AddMappings API largely duplicates the function signature metadata provided by the GetXXXFunctionRegistry API, so developers spend extra time and effort keeping the two in sync.

Proposal

In PR #38632, we added the capability to programmatically map a function signature (NativeFunction) to LLVM-typed args, so the LLVM args for each function in AddMappings can be derived directly from its NativeFunction.
This proposal is to use FunctionRegistry's API for registering C functions to register the existing stub functions internally, leveraging the mapping capability above. For stub functions, metadata registration and implementation mapping can then be combined into one step, so that:

  • Stub function metadata and implementation are associated and registered in one place, so developers don't have to look in two places for maintenance.
  • When adding or updating a stub function's signature, developers no longer need to manually map the Arrow data type signature into LLVM-typed args, which makes the code easier to maintain and less error-prone. This will also simplify the code considerably; the change is expected to remove 500~1000 lines of code.
    • After this refactoring, code like the following is no longer necessary and is expected to be removed:

        // mask-show-n
        mask_args = {
            types->i64_type(),      // context
            types->i8_ptr_type(),   // data
            types->i32_type(),      // data_length
            types->i32_type(),      // n_to_show
            types->i32_ptr_type()   // out_length
        };
        engine->AddGlobalMappingForFunc(
            "gdv_mask_show_first_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args,
            reinterpret_cast<void*>(gdv_mask_show_first_n_utf8_int32));
        engine->AddGlobalMappingForFunc(
            "gdv_mask_show_last_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args,
            reinterpret_cast<void*>(gdv_mask_show_last_n_utf8_int32));

  • Some of the stub functions are not exposed to end users in expressions and are expected to be used only internally, such as the context helper functions, some functions related to the "in" expression, and the gdv_fn_time_with_zone time casting function. These functions may have to be left as they are, since they have no corresponding metadata. There are probably more functions like them; they are not easy to find, and we will have to go through all the stub functions to identify them.
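
As a rough illustration of the one-step registration proposed here, the sketch below derives the LLVM-typed args from a data-type signature at registration time, so the caller supplies only the signature and the function pointer. Everything in it is hypothetical (SketchDataType, SketchIrType, SketchSignature, DeriveIrArgs, SketchRegistry, RegisterSketchStubs and the sketch_ stub), and the lowering convention it uses (context as a leading i64, a utf8 parameter as pointer plus i32 length, a utf8 return adding a trailing i32* out-length) is an assumption read off the mask_args listing, not the actual rules implemented in PR #38632.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real Gandiva or LLVM types.
enum class SketchDataType { kInt32, kUtf8 };
enum class SketchIrType { kI64, kI8Ptr, kI32, kI32Ptr };

struct SketchSignature {
  std::string pc_name;                      // symbol name of the C stub
  std::vector<SketchDataType> param_types;  // data-type signature
  SketchDataType return_type;
  bool needs_context;                       // stub takes a leading context arg
};

// Assumed lowering convention (see the note above the block).
std::vector<SketchIrType> DeriveIrArgs(const SketchSignature& sig) {
  std::vector<SketchIrType> args;
  if (sig.needs_context) args.push_back(SketchIrType::kI64);
  for (SketchDataType type : sig.param_types) {
    if (type == SketchDataType::kUtf8) {
      args.push_back(SketchIrType::kI8Ptr);  // data
      args.push_back(SketchIrType::kI32);    // data_length
    } else {
      args.push_back(SketchIrType::kI32);    // plain int32 value
    }
  }
  if (sig.return_type == SketchDataType::kUtf8) {
    args.push_back(SketchIrType::kI32Ptr);   // out_length
  }
  return args;
}

// Registry that stores metadata, derived IR args, and the pointer together.
struct SketchRegistry {
  struct Entry {
    SketchSignature signature;
    std::vector<SketchIrType> ir_args;
    void* function_ptr;
  };
  std::map<std::string, Entry> entries;

  void Register(SketchSignature sig, void* function_ptr) {
    std::vector<SketchIrType> ir_args = DeriveIrArgs(sig);
    std::string key = sig.pc_name;
    entries[key] = Entry{std::move(sig), std::move(ir_args), function_ptr};
  }
};

// A trivial stub implementation (placeholder body, not the real masking logic).
const char* sketch_mask_show_first_n_utf8_int32(std::int64_t /*context*/, const char* data,
                                                std::int32_t /*data_length*/,
                                                std::int32_t /*n_to_show*/,
                                                std::int32_t* out_length) {
  *out_length = 0;
  return data;
}

// Single registration site: no hand-written list of IR types.
void RegisterSketchStubs(SketchRegistry* registry) {
  registry->Register(
      SketchSignature{"sketch_mask_show_first_n_utf8_int32",
                      {SketchDataType::kUtf8, SketchDataType::kInt32},
                      SketchDataType::kUtf8,
                      /*needs_context=*/true},
      reinterpret_cast<void*>(sketch_mask_show_first_n_utf8_int32));
}
```

Under the assumed convention, the derived list for this stub has the same five entries as the hand-written mask_args, which is the duplication the refactoring aims to remove.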

Component(s)

C++ - Gandiva
