Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16677: [C++] Support nesting of function registries #13252

Merged
merged 9 commits into from Jun 18, 2022

Conversation

rtpsw
Copy link
Contributor

@rtpsw rtpsw commented May 27, 2022

Currently, only a default function-registry is supported. Modifying this registry has global effects, which is often undesirable. Support for nesting function-registries will provide scoping for such modifications.

@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@rtpsw
Copy link
Contributor Author

rtpsw commented May 27, 2022

cc @icexelloss

@rtpsw
Copy link
Contributor Author

rtpsw commented May 27, 2022

This proposed feature, along with the one in #13232 , would enable scoping for all registries used for UDFs.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 2, 2022

@westonpace - could you review?

@westonpace westonpace self-requested a review June 2, 2022 16:32
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this a bit but this looks useful.

cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry_test.cc Show resolved Hide resolved
cpp/src/arrow/compute/registry_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry_test.cc Outdated Show resolved Hide resolved
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, picked the wrong radio bubble :)

@rtpsw rtpsw requested a review from westonpace June 3, 2022 14:12
@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 3, 2022

Looks like the failed jobs are not due to the changes here.

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 7, 2022

@westonpace, is this good to go?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these changes are minor but we do need to clean up the constructors.

I realized as I was doing the review that some of these method comments (e.g. \brief) were following existing patterns in this file. However, I think the suggestions move us more in line with the rest of the code base (e.g. \brief should be a one line description. The things that follow are a note) and we can clean up the others later.

I find the nested ternary (i.e. (a ? b : c) & d a little too compact to follow. Now that Status && Status is unavailable I think we should probably avoid & when short-circuit is desired when using Status.

It might be worth noting somewhere (assuming my understanding is correct here) that even if allow_overwrite is true it a registry will never modify its parent.

Lastly, I'm not sure why we didn't check for existing function names in AddAlias. I think we should probably correct that here rather than persist the status quo. But it's possible I'm not understanding some nuance.

cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
FunctionRegistry::FunctionRegistry() { impl_.reset(new FunctionRegistryImpl()); }
std::unique_ptr<FunctionRegistry> FunctionRegistry::Make(FunctionRegistry* parent) {
return std::unique_ptr<FunctionRegistry>(
new FunctionRegistry(new FunctionRegistry::FunctionRegistryImpl(&*parent->impl_)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new FunctionRegistry(new FunctionRegistry::FunctionRegistryImpl(&*parent->impl_)));
new FunctionRegistry(parent->impl_.get());

The way it is currently will create a copy of the parent. This pointer leaks but, more importantly, if you did something like...

auto parent = GetFunctionRegistry();
auto nested = FunctionRegistry::Make(parent);
// The function added here (using parent) will not be visible by nested
// and, if I'm understanding the purpose correctly, we would want that
parent->AddFunction(...);

Instead we should just take a non-owning "view" pointer to the parent. If the parent is deleted before the nested registry is deleted then bad things will happen but that is to be expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change leads to a failed test:

/mnt/user1/tscontract/github/rtpsw/arrow/cpp/src/arrow/compute/registry_test.cc:124: Failure
Failed
'default_registry->CanAddFunction(func)' failed with Key error: Already have a function registered with name: f1
[  FAILED  ] TestRegistry.RegisterTempFunctions (0 ms)

The existing code makes a new FunctionRegistryImpl with a parent of the same type, which is what we want for scoping. Though I did fix this code to use .get().

cpp/src/arrow/compute/registry.cc Outdated Show resolved Hide resolved
@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 9, 2022

Most of these changes are minor but we do need to clean up the constructors.

I realized as I was doing the review that some of these method comments (e.g. \brief) were following existing patterns in this file. However, I think the suggestions move us more in line with the rest of the code base (e.g. \brief should be a one line description. The things that follow are a note) and we can clean up the others later.

I find the nested ternary (i.e. (a ? b : c) & d a little too compact to follow. Now that Status && Status is unavailable I think we should probably avoid & when short-circuit is desired when using Status.

It might be worth noting somewhere (assuming my understanding is correct here) that even if allow_overwrite is true it a registry will never modify its parent.

Lastly, I'm not sure why we didn't check for existing function names in AddAlias. I think we should probably correct that here rather than persist the status quo. But it's possible I'm not understanding some nuance.

I agree with almost everything. I think your comments about logic are not minor :) I ended up fixing and cleaning up the code quite a bit following them. Thanks for catching.

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to go. Thanks for that last round of cleanup. I think there is some overlap now with what is in #13375. Shall I merge this first and let you merge this into #13375? Or would you rather hold off and merge everything all at once as part of #13375?

@rtpsw
Copy link
Contributor Author

rtpsw commented Jun 17, 2022

This can get pushed and I'll handle the merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants