Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38589: [C++][Gandiva] Support registering external C functions #38632

Merged
merged 9 commits into from Nov 17, 2023

Conversation

niyue
Copy link
Contributor

@niyue niyue commented Nov 8, 2023

Rationale for this change

This PR tries to enhance Gandiva by supporting registering external C functions to its function registry, so that developers can author third party functions with complex dependency and expose them as C functions to be used in Gandiva expression. See more details in GH-38589.

What changes are included in this PR?

This PR primarily adds a new API to the FunctionRegistry so that developers can use it to register external C functions:

arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);

Are these changes tested?

  • The changes are tested via unit tests in this PR, and the unit tests include several C functions written using C++ and we confirm this kind of functions can be used by Gandiva after registration using the above mentioned new API.
  • Additionally, locally I wrote some Rust based functions, and integrate the Rust based functions into a C++ program by using the new registration API and verified this approach did work, but this piece of work is not included in the PR.

Are there any user-facing changes?

There are several new APIs added to FunctionRegistry class:

/// \brief register a C function into the function registry
  /// @param func the registered function's metadata
  /// @param c_function_ptr the function pointer to the
  /// registered function's implementation
  /// @param function_holder_maker this will be used as the function holder if the
  /// function requires a function holder
  arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);

  /// \brief get a list of C functions saved in the registry
  const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const;

  const FunctionHolderMakerRegistry& GetFunctionHolderMakerRegistry() const;

Notes

Copy link

github-actions bot commented Nov 8, 2023

⚠️ GitHub issue #38589 has been automatically assigned in GitHub to PR creator.

@niyue niyue force-pushed the feature/external-c-interface-func branch from 77a8135 to 2fcd234 Compare November 8, 2023 06:10
return std::make_pair(args, ret_llvm_type);
}

arrow::Status ExternalCInterfaceFunctions::AddMappings(Engine* engine) const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the LLVM types of stub functions arguments are hard coded in the code base, now we automatically convert arrow data types of these function's parameter types into their LLVM equivalent types so that users don't have to specify similar info twice.

}

private:
static map_type& makers() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move into cc file and make this state non static by copying them into each registry's instance.

/// function requires a function holder
arrow::Status Register(
NativeFunction func, void* c_interface_function_ptr,
std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);
Copy link
Contributor Author

@niyue niyue Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new API used for registering external C interface functions. If a function requires function holder, it can use the function_holder_maker to specify it

@niyue niyue force-pushed the feature/external-c-interface-func branch from 2fcd234 to 26491ce Compare November 8, 2023 06:15
@@ -3608,4 +3608,80 @@ TEST_F(TestProjector, TestExtendedFunctions) {
EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0));
}

TEST_F(TestProjector, TestExtendedCInterfaceFunctions) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case demonstrates registering an external C interface function (but this function doesn't use function holder/context)

EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0));
}

TEST_F(TestProjector, TestExtendedCInterfaceFunctionsWithFunctionHolder) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case demonstrates registering an external C interface function that uses function holder

EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0));
}

TEST_F(TestProjector, TestExtendedCInterfaceFunctionThatNeedsContext) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case demonstrates registering an external C interface function that uses context.

arrow::Status Engine::AddGlobalMappings() {
ARROW_RETURN_NOT_OK(ExportedFuncsRegistry::AddMappings(this));
ExternalCInterfaceFunctions c_interface_funcs(function_registry_);
return c_interface_funcs.AddMappings(this);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The external C interface functions are added and mapped to the JIT engine here.

@@ -38,7 +38,7 @@

namespace gandiva {

void ExportedDecimalFunctions::AddMappings(Engine* engine) const {
arrow::Status ExportedDecimalFunctions::AddMappings(Engine* engine) const {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For external C interface functions, they may cause some error when adding mapping for them, so I change the AddGlobalMappings and ExportedFuncsBase::AddMappings functions to return arrow::Status to represent the result

@niyue
Copy link
Contributor Author

niyue commented Nov 8, 2023

@js8544 @kou
As discussed in the recent PR #38116, I submitted a follow up PR to further enhance Gandiva's external function registration capability, allowing registering C interface functions (stub function called by Gandiva internally) in this PR. Could you please help to review? Thanks.

notes

  • The name C interface function is different with stub function since I think stub function may not be easily understood externally, and I am looking for something equivalent to LLVM IR based function but I am not sure if there is a better term than C interface function, please advice.
  • I am not sure if I need to post a new discussion thread to the mailing list since it was discussed previously for PR 38116, and this PR is a smaller enhancement to the previous one. Please let me know if I need to post a new discussion thread for this. Thanks.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming "stub function" as "C function" not "C interface function"?
(And how about naming LLVM IR based function as "IR function"?)
It seems that "interface" is redundant.

cpp/src/gandiva/external_c_interface_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/external_c_interface_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/external_c_interface_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/function_holder_maker_registry.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/function_registry.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/gdv_string_function_stubs.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/tests/test_util.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/tests/test_util.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 9, 2023
@niyue niyue changed the title GH-38589: [C++][Gandiva] Support registering external C interface functions GH-38589: [C++][Gandiva] Support registering external C functions Nov 9, 2023
@niyue niyue force-pushed the feature/external-c-interface-func branch from 33c1bb1 to c8aaa50 Compare November 9, 2023 11:40
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 9, 2023
@niyue niyue force-pushed the feature/external-c-interface-func branch from c8aaa50 to 258f613 Compare November 9, 2023 12:13
@niyue
Copy link
Contributor Author

niyue commented Nov 9, 2023

How about naming "stub function" as "C function" not "C interface function"?

Thanks for the suggestion. I rename the PR/corresponding issue description/code to use the name C function now.

@niyue niyue force-pushed the feature/external-c-interface-func branch from 258f613 to e2c3f24 Compare November 9, 2023 13:39
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@js8544 You may want to review this too.

I am not sure if I need to post a new discussion thread to the mailing list since it was discussed previously for PR 38116, and this PR is a smaller enhancement to the previous one.

I think that we don't need this because we don't have a new specification here.

cpp/src/gandiva/external_c_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/external_c_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/external_c_functions.cc Outdated Show resolved Hide resolved
Comment on lines +42 to +47
template <typename HolderType>
static arrow::Result<FunctionHolderPtr> HolderMaker(const FunctionNode& node) {
std::shared_ptr<HolderType> derived_instance;
ARROW_RETURN_NOT_OK(HolderType::Make(node, &derived_instance));
return derived_instance;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change HolderType::Make() to use Result instead of Status (e.g. Status LikeHolder::Make(...) -> Result<std::shared_ptr<LikeHolder>> LikeHolder::Make(...)), we can remove this helper template function?
If so, we can do it as a follow-up task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. If I make this change, we need to change some existing classes, like LikeHolder/ReplaceHolder/etc. They are GANDIVA_EXPORT classes, but I am not sure if they are really used externally. The code will be simpler after this refactoring. I am glad to submit another PR for this if this is desired.

cpp/src/gandiva/function_registry.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/function_registry.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the awaiting change review Awaiting change review label Nov 9, 2023
@github-actions github-actions bot added the awaiting merge Awaiting merge label Nov 9, 2023
Copy link
Collaborator

@js8544 js8544 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Thanks! Just a couple of nits.

#include "gandiva/function_registry.h"
#include "gandiva/function_signature.h"
#include "gandiva/in_holder.h"
#include "gandiva/node.h"
#include "gandiva/regex_functions_holder.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this include needed?

Copy link
Contributor Author

@niyue niyue Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found LikeHolder is used below but its header is not included so I added it (it is likely included indirectly by other header files ). Let me know if this is not recommended in the project (if not included, I ran into some issues in other projects, during refactoring, a indirectly included header file was removed, and causing the other file failed to be compiled)

cpp/src/gandiva/external_c_functions.cc Outdated Show resolved Hide resolved
cpp/src/gandiva/function_registry.cc Outdated Show resolved Hide resolved
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge in a few days if nobody objects it.

cpp/src/gandiva/CMakeLists.txt Outdated Show resolved Hide resolved
cpp/src/gandiva/external_c_functions.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting merge Awaiting merge awaiting changes Awaiting changes labels Nov 13, 2023
@niyue niyue force-pushed the feature/external-c-interface-func branch from a3d0625 to ec24d23 Compare November 17, 2023 02:51
@niyue
Copy link
Contributor Author

niyue commented Nov 17, 2023

To resolve merge conflicts, rebased onto the latest main branch.

@kou
Copy link
Member

kou commented Nov 17, 2023

No objection. I'll merge this.

@kou kou merged commit c353c81 into apache:main Nov 17, 2023
30 of 31 checks passed
@kou kou removed the awaiting change review Awaiting change review label Nov 17, 2023
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit c353c81.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

pitrou pushed a commit that referenced this pull request Nov 29, 2023
…ult (#38873)

### Rationale for this change
* This PR tries to make Gandiva `FunctionHolder` classes to return `arrow::Result` instead of using output parameters, and this tries to address the follow up task mentioned in #38632 (comment) and makes the code slightly simpler

### What changes are included in this PR?
* A refactoring task to return `arrow::Result` in Gandiva FunctionHolder classes

### Are these changes tested?
It should be covered by existing unit tests.

### Are there any user-facing changes?
No
* Closes: #38920

Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ns (apache#38632)

### Rationale for this change
This PR tries to enhance Gandiva by supporting registering external C functions to its function registry, so that developers can author third party functions with complex dependency and expose them as C functions to be used in Gandiva expression. See more details in apacheGH-38589.

### What changes are included in this PR?
This PR primarily adds a new API to the `FunctionRegistry` so that developers can use it to register external C functions:
```C++
arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);
```

### Are these changes tested?
* The changes are tested via unit tests in this PR, and the unit tests include several C functions written using C++ and we confirm this kind of functions can be used by Gandiva after registration using the above mentioned new API.
* Additionally, locally I wrote some Rust based functions, and integrate the Rust based functions into a C++ program by using the new registration API and verified this approach did work, but this piece of work is not included in the PR.

### Are there any user-facing changes?
There are several new APIs added to `FunctionRegistry` class:
```C++
/// \brief register a C function into the function registry
  /// @ param func the registered function's metadata
  /// @ param c_function_ptr the function pointer to the
  /// registered function's implementation
  /// @ param function_holder_maker this will be used as the function holder if the
  /// function requires a function holder
  arrow::Status Register(
      NativeFunction func, void* c_function_ptr,
      std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt);

  /// \brief get a list of C functions saved in the registry
  const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const;

  const FunctionHolderMakerRegistry& GetFunctionHolderMakerRegistry() const;
```

* Closes: apache#38589

### Notes
* This PR is related with apache#38116, which adds the initial support for registering LLVM IR based external functions into Gandiva.

Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…ow Result (apache#38873)

### Rationale for this change
* This PR tries to make Gandiva `FunctionHolder` classes to return `arrow::Result` instead of using output parameters, and this tries to address the follow up task mentioned in apache#38632 (comment) and makes the code slightly simpler

### What changes are included in this PR?
* A refactoring task to return `arrow::Result` in Gandiva FunctionHolder classes

### Are these changes tested?
It should be covered by existing unit tests.

### Are there any user-facing changes?
No
* Closes: apache#38920

Authored-by: Yue Ni <niyue.com@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++][Gandiva] Allow registering external C functions
3 participants