[C++][Gandiva] Refactor built-in stub functions to use `FunctionRegistry` for registration

### Describe the enhancement requested

# Description
Gandiva currently has a number of internal stub functions, which are registered in two steps:
1) Function metadata is registered in several internal registries, such as:
1.1) `GetStringFunctionRegistry` in `function_registry_string.cc`
1.2) `GetMathOpsFunctionRegistry` in `function_registry_math_ops.cc`
1.3) etc
2) The stub functions' implementations are mapped to the LLVM engine in:
2.1) `ExportedStubFunctions::AddMappings`
2.2) `ExportedHashFunctions::AddMappings`
2.3) `ExportedStringFunctions::AddMappings`
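
The two-step pattern above can be sketched with toy stand-ins. Everything prefixed `Toy`/`toy_` here is hypothetical and is not Gandiva's actual API; the sketch only illustrates that one stub function has to be described in two separate places, once as metadata and once as a hand-written list of LLVM argument types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these are Gandiva's real types.
enum class ToyArrowType { kInt32, kUtf8 };
enum class ToyLlvmType { kI32, kI64, kI8Ptr, kI32Ptr };

// Metadata describing a function as seen from expressions.
struct ToyMetadata {
  std::string name;
  std::vector<ToyArrowType> params;
  ToyArrowType result;
  std::string stub_symbol;
};

// What the engine needs in order to call the stub.
struct ToyMapping {
  ToyLlvmType return_type;
  std::vector<ToyLlvmType> args;
  void* address;
};

// A trivial stub: returns its input unchanged and reports its length.
const char* toy_identity_utf8(std::int64_t /*context*/, const char* data,
                              std::int32_t data_length, std::int32_t* out_length) {
  assert(out_length != nullptr);
  *out_length = data_length;
  return data;
}

// Step 1 (first place to edit): register the function metadata.
std::vector<ToyMetadata> GetToyStringFunctionRegistry() {
  return {ToyMetadata{"identity", {ToyArrowType::kUtf8}, ToyArrowType::kUtf8,
                      "toy_identity_utf8"}};
}

// Step 2 (second place to edit): map the implementation, repeating the
// signature by hand as LLVM types.
void ToyAddMappings(std::map<std::string, ToyMapping>* mappings) {
  std::vector<ToyLlvmType> args = {ToyLlvmType::kI64,     // context
                                   ToyLlvmType::kI8Ptr,   // data
                                   ToyLlvmType::kI32,     // data_length
                                   ToyLlvmType::kI32Ptr};  // out_length
  mappings->emplace("toy_identity_utf8",
                    ToyMapping{ToyLlvmType::kI8Ptr, args,
                               reinterpret_cast<void*>(toy_identity_utf8)});
}
```

Nothing in this layout ties the argument list in `ToyAddMappings` to the parameter list in `GetToyStringFunctionRegistry`, so the two can drift apart without any compile-time error.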

This organization has some drawbacks:
* When adding or removing a stub function, developers have to find and change two places. For example, adding a new string function requires modifying both `GetStringFunctionRegistry` in `function_registry_string.cc` and `ExportedStringFunctions::AddMappings` in `gdv_string_function_stubs.cc`.
* The LLVM type information passed to the `AddMappings` API largely duplicates the function signature metadata provided by the `GetXXXFunctionRegistry` API, so the same information has to be maintained twice.

# Proposal
In PR https://github.com/apache/arrow/pull/38632, we added the capability to programmatically map a `NativeFunction` signature to LLVM-typed args, so the LLVM args for each function in `AddMappings` could be derived directly from its `NativeFunction`.
This proposal is to use `FunctionRegistry`'s `Register` C function API to register the existing stub functions internally. This leverages the mapping capability above and, for stub functions, combines metadata registration and implementation mapping into one step, so that:
* Stub function metadata and implementation are associated and registered in one place, and developers no longer have to maintain two places.
* When adding or updating a stub function's signature, developers do not need to manually map the Arrow data type signature to LLVM-typed args, which is easier to maintain and less error-prone. It should also simplify the code considerably: the change is expected to remove roughly 500 to 1000 lines of code.
  * After this refactoring, code like the following is no longer necessary and is expected to be removed:
    * https://github.com/apache/arrow/blob/47dadb02c3426c5bdd0df903dbc0f6ec17c5c784/cpp/src/gandiva/gdv_function_stubs.cc#L1197-L1212
* Some stub functions are not exposed to end users in expressions and are expected to be used only internally, such as the context helper functions, some `in`-related functions, and the `gdv_fn_time_with_zone` time casting function. These functions may have to stay as they are, since they have no corresponding metadata. There are probably more functions like these; they are not easy to find, so all the stub functions will need to be reviewed to identify them.
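
The one-step registration proposed above could look roughly like the following sketch. All `Toy`/`toy_` names are hypothetical, and the derivation rule (an optional `i64` context, a `utf8` parameter expanding to a data pointer plus a length, and a trailing `i32*` output length for a `utf8` result) is a simplification assumed from the shape of the `mask_args` list in `gdv_function_stubs.cc`; it is not the actual logic added in PR 38632.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of these are Gandiva's real types.
enum class ToyArrowType { kInt32, kUtf8 };
enum class ToyLlvmType { kI32, kI64, kI8Ptr, kI32Ptr };

// Signature metadata, loosely modelled on the role of `NativeFunction`.
struct ToyNativeFunction {
  std::string stub_symbol;
  std::vector<ToyArrowType> params;
  ToyArrowType result;
  bool needs_context;
};

struct ToyMapping {
  ToyLlvmType return_type;
  std::vector<ToyLlvmType> args;
  void* address;
};

// Derive the LLVM return type and argument list from the signature
// (assumed, simplified rule; see the paragraph above).
std::pair<ToyLlvmType, std::vector<ToyLlvmType>> ToyDeriveLlvmTypes(
    const ToyNativeFunction& fn) {
  std::vector<ToyLlvmType> args;
  if (fn.needs_context) args.push_back(ToyLlvmType::kI64);
  for (ToyArrowType param : fn.params) {
    if (param == ToyArrowType::kUtf8) {
      args.push_back(ToyLlvmType::kI8Ptr);  // data
      args.push_back(ToyLlvmType::kI32);    // data_length
    } else {
      args.push_back(ToyLlvmType::kI32);
    }
  }
  const bool returns_utf8 = fn.result == ToyArrowType::kUtf8;
  if (returns_utf8) args.push_back(ToyLlvmType::kI32Ptr);  // out_length
  return std::make_pair(returns_utf8 ? ToyLlvmType::kI8Ptr : ToyLlvmType::kI32,
                        args);
}

// One call records the metadata and the engine mapping together.
class ToyFunctionRegistry {
 public:
  void Register(const ToyNativeFunction& fn, void* address) {
    assert(address != nullptr);
    auto types = ToyDeriveLlvmTypes(fn);
    metadata_.push_back(fn);
    mappings_[fn.stub_symbol] = ToyMapping{types.first, types.second, address};
  }
  const std::vector<ToyNativeFunction>& metadata() const { return metadata_; }
  const std::map<std::string, ToyMapping>& mappings() const { return mappings_; }

 private:
  std::vector<ToyNativeFunction> metadata_;
  std::map<std::string, ToyMapping> mappings_;
};

// Hypothetical stub with the same parameter shape as the mask-show-n stubs.
const char* toy_show_first_n(std::int64_t /*context*/, const char* data,
                             std::int32_t data_length, std::int32_t n_to_show,
                             std::int32_t* out_length) {
  if (n_to_show < 0) n_to_show = 0;
  *out_length = n_to_show < data_length ? n_to_show : data_length;
  return data;
}
```

With this shape, a caller would write a single `registry.Register(fn, reinterpret_cast<void*>(toy_show_first_n))` per stub, and the hand-maintained argument vectors go away. Internal-only stubs that have no metadata would still need a separate path, as noted in the last bullet.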

### Component(s)

C++ - Gandiva

Code referenced by the permalink in the proposal (`gdv_function_stubs.cc`, lines 1197-1212):

	// mask-show-n
	mask_args = {
	    types->i64_type(),     // context
	    types->i8_ptr_type(),  // data
	    types->i32_type(),     // data_length
	    types->i32_type(),     // n_to_show
	    types->i32_ptr_type()  // out_length
	};

	engine->AddGlobalMappingForFunc(
	    "gdv_mask_show_first_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args,
	    reinterpret_cast<void*>(gdv_mask_show_first_n_utf8_int32));

	engine->AddGlobalMappingForFunc(
	    "gdv_mask_show_last_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args,
	    reinterpret_cast<void*>(gdv_mask_show_last_n_utf8_int32));

[C++][Gandiva] Refactor built-in stub functions to use `FunctionRegistry` for registration #39052
