Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1310: Allowlist Support for plugin filter #1314

Closed
wants to merge 17 commits into from

Conversation

deshanxiao
Copy link
Contributor

@deshanxiao deshanxiao commented Nov 14, 2022

What changes were proposed in this pull request?

This PR is aimed to add allowlist support for plugin filter.

Why are the changes needed?

ServiceLoader will load all the interfaces in current classpath. When we have two implementations of PluginFilterService, Both of them will be loaded as plugin filters. This PR will provide a configuration "orc.filter.plugin.allowlist". ORC reader will only load the class which is in the allowlist when the configuration is not null.

How was this patch tested?

UT

@github-actions github-actions bot added the JAVA label Nov 14, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, however, Apache ORC intentionally aims to avoid non-inclusive languages like whitelist. Could you revise your PR to use more conscious language?

FYI, we are checking like the following.

Screenshot 2022-11-14 at 9 15 53 AM

@dongjoon-hyun
Copy link
Member

cc @williamhyun and @wgtmac , too

@williamhyun
Copy link
Member

+1 for @dongjoon-hyun's comment

@deshanxiao
Copy link
Contributor Author

Sorry @dongjoon-hyun , will update it. Thanks~

@deshanxiao deshanxiao changed the title ORC-1310: Whitelist Support for plugin filter ORC-1310: Allowlist Support for plugin filter Nov 15, 2022
@deshanxiao
Copy link
Contributor Author

@dongjoon-hyun I have updated the jira and the pull request.

@dongjoon-hyun
Copy link
Member

Thank you for update, @deshanxiao .


PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist",
"orc.filter.plugin.allowlist",
"",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a breaking change if this mean that we ban every plugins by default.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a breaking change to the existing production, @deshanxiao . Is the intention of this PR correct?

@deshanxiao
Copy link
Contributor Author

deshanxiao commented Nov 16, 2022

It looks like a breaking change to the existing production, @deshanxiao . Is the intention of this PR correct?

Actually if orc.filter.plugin.allowlist is empty. Everything will the same as before. I think it will be better if change the default value to *.

* loaded if it is not in allowList.
*/
static void deleteFilterNotInAllowList(List<BatchFilter> filters, List<String> allowList) {
if (allowList != null && allowList.size() > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the function name, deleteFilterNotInAllowList and description, this logic looks wrong to me.

@@ -56,4 +60,24 @@ public void testErrorCreatingFilter() {
public void testMissingFilter() {
assertTrue(FilterFactory.findPluginFilters("file://db/table11/file1", conf).isEmpty());
}

@Test
public void testAllowListFilter() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add additional test cases explicitly and revise the logic.

  • testAllowListFilterAllowAll
  • testAllowListFilterDisallowAll

assertEquals(1, pluginFilters.size());
// Empty allowlist.
List<String> emptyAllowListNotHit = new ArrayList<>();
FilterFactory.deleteFilterNotInAllowList(pluginFilters, emptyAllowListNotHit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Empty allowlist. sounds misleading. We had better clarify what empty allowlist means.

PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist",
"orc.filter.plugin.allowlist",
"*",
"A comma-separated list of allowlist classes. It is used to specify the "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like recursive definition. We cannot use allowlist in the definition sentence of allowlist. Maybe, the following is better?

- A comma-separated list of allowlist classes
+ A list of comma-separated class names

@pavibhai
Copy link
Contributor

Thanks @deshanxiao for raising this. Can you please help me understand the use a little better? Is it something like

Is this more like I have started a spark session with a set of dependencies (that in this case might include plugin filter classes). However I might not want to engage these filters for some queries while I might use it for other queries from the same session.

@deshanxiao
Copy link
Contributor Author

deshanxiao commented Nov 29, 2022

@pavibhai Yes, you are right. In fact, a spark query does not want to use all plugin filter classes because some plugin filter classes are used by other queries. Now, according to the mechanism of ServiceLoader , all plugin filter classes will be loaded into any queies.
With this PR we can choose the plugin filter classes we want to use.

java/core/src/java/org/apache/orc/OrcConf.java Outdated Show resolved Hide resolved
java/core/src/java/org/apache/orc/OrcConf.java Outdated Show resolved Hide resolved
if (!pluginFilters.isEmpty()) {
LOG.debug("Added plugin filters {} to the read", pluginFilters);
filters.addAll(pluginFilters);
List<BatchFilter> allowFilters =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move this logic into findPluginFilters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this to pluginFilters to minize the diff.

@deshanxiao
Copy link
Contributor Author

@pavibhai Thanks for your suggestions!

@deshanxiao
Copy link
Contributor Author

Hi @pavibhai, I have updated all the comments, please help to see if you have time, thanks~

@pavibhai
Copy link
Contributor

pavibhai commented Dec 6, 2022

Hi @pavibhai, I have updated all the comments, please help to see if you have time, thanks~

Sure. I shall take a look first thing tomorrow morning

@pavibhai
Copy link
Contributor

pavibhai commented Dec 6, 2022

Hi @pavibhai, I have updated all the comments, please help to see if you have time, thanks~

Looks good. I have no further comments on this.

return new ArrayList<>();
}
return Arrays.asList(confStr.split(","));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, could you open a new JIRA and PR for this utility method, @deshanxiao ?

And, it would be great to remove white space and empty strings after split. For example, a,,b,c, ,d,?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw your JIRA and have been waiting for your PR, @deshanxiao :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged #1337 .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1338 is merged. Could you rebase to main branch and resolve the conflicts?

@deshanxiao
Copy link
Contributor Author

@dongjoon-hyun I have resolved the conflicts, thank you!

@dongjoon-hyun
Copy link
Member

Thank you for update, @deshanxiao .

* @param filters whole BatchFilter list we load from class path.
* @param allowList a Class-Name list that we want to load in.
*/
static List<BatchFilter> getAllowedFilters(List<BatchFilter> filters, List<String> allowList) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it open for test? I'd recommend to switch to private static?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convert to `private static` method
@deshanxiao
Copy link
Contributor Author

Thank you, I merge it.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun dongjoon-hyun added this to the 1.9.0 milestone Dec 13, 2022
@dongjoon-hyun
Copy link
Member

Merged to main for Apache ORC 1.9.0. Thank you, @deshanxiao and @pavibhai .

cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
### What changes were proposed in this pull request?
This PR is aimed to add allowlist support for plugin filter.

### Why are the changes needed?
**ServiceLoader** will load all the interfaces in current classpath. When we have two implementations of PluginFilterService, Both of them will be loaded as plugin filters. This PR will provide a configuration "orc.filter.plugin.allowlist". ORC reader will only load the class which is in the allowlist when the configuration is not null.

### How was this patch tested?
UT

Closes apache#1314 from deshanxiao/deshan/1310.

Lead-authored-by: deshanxiao <deshanxiao@microsoft.com>
Co-authored-by: Deshan Xiao <deshanxiao@microsoft.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Deshan Xiao <42019462+deshanxiao@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants