New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORC-1310: Allowlist Support for plugin filter #1314
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making a PR, however, Apache ORC intentionally aims to avoid non-inclusive
languages like whitelist
. Could you revise your PR to use more conscious language?
FYI, we are checking like the following.
cc @williamhyun and @wgtmac , too |
+1 for @dongjoon-hyun's comment |
Sorry @dongjoon-hyun , will update it. Thanks~ |
@dongjoon-hyun I have updated the jira and the pull request. |
Thank you for update, @deshanxiao . |
|
||
PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist", | ||
"orc.filter.plugin.allowlist", | ||
"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a breaking change if this mean that we ban every plugins by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a breaking change to the existing production, @deshanxiao . Is the intention of this PR correct?
Actually if orc.filter.plugin.allowlist is empty. Everything will the same as before. I think it will be better if change the default value to *. |
* loaded if it is not in allowList. | ||
*/ | ||
static void deleteFilterNotInAllowList(List<BatchFilter> filters, List<String> allowList) { | ||
if (allowList != null && allowList.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the function name, deleteFilterNotInAllowList
and description, this logic looks wrong to me.
@@ -56,4 +60,24 @@ public void testErrorCreatingFilter() { | |||
public void testMissingFilter() { | |||
assertTrue(FilterFactory.findPluginFilters("file://db/table11/file1", conf).isEmpty()); | |||
} | |||
|
|||
@Test | |||
public void testAllowListFilter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add additional test cases explicitly and revise the logic.
testAllowListFilterAllowAll
testAllowListFilterDisallowAll
assertEquals(1, pluginFilters.size()); | ||
// Empty allowlist. | ||
List<String> emptyAllowListNotHit = new ArrayList<>(); | ||
FilterFactory.deleteFilterNotInAllowList(pluginFilters, emptyAllowListNotHit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Empty allowlist.
sounds misleading. We had better clarify what empty allowlist
means.
PLUGIN_FILTER_ALLOWLIST("orc.filter.plugin.allowlist", | ||
"orc.filter.plugin.allowlist", | ||
"*", | ||
"A comma-separated list of allowlist classes. It is used to specify the " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like recursive definition. We cannot use allowlist
in the definition sentence of allowlist
. Maybe, the following is better?
- A comma-separated list of allowlist classes
+ A list of comma-separated class names
Thanks @deshanxiao for raising this. Can you please help me understand the use a little better? Is it something like
|
@pavibhai Yes, you are right. In fact, a spark query does not want to use all plugin filter classes because some plugin filter classes are used by other queries. Now, according to the mechanism of ServiceLoader , all plugin filter classes will be loaded into any queies. |
java/core/src/java/org/apache/orc/impl/filter/FilterFactory.java
Outdated
Show resolved
Hide resolved
if (!pluginFilters.isEmpty()) { | ||
LOG.debug("Added plugin filters {} to the read", pluginFilters); | ||
filters.addAll(pluginFilters); | ||
List<BatchFilter> allowFilters = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can move this logic into findPluginFilters
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename this to pluginFilters
to minize the diff.
java/core/src/test/org/apache/orc/impl/filter/TestPluginFilterService.java
Outdated
Show resolved
Hide resolved
@pavibhai Thanks for your suggestions! |
Hi @pavibhai, I have updated all the comments, please help to see if you have time, thanks~ |
Sure. I shall take a look first thing tomorrow morning |
Looks good. I have no further comments on this. |
return new ArrayList<>(); | ||
} | ||
return Arrays.asList(confStr.split(",")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't mind, could you open a new JIRA and PR for this utility method, @deshanxiao ?
And, it would be great to remove white space and empty strings after split
. For example, a,,b,c, ,d,
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your JIRA and have been waiting for your PR, @deshanxiao :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged #1337 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1338 is merged. Could you rebase to main
branch and resolve the conflicts?
@dongjoon-hyun I have resolved the conflicts, thank you! |
Thank you for update, @deshanxiao . |
* @param filters whole BatchFilter list we load from class path. | ||
* @param allowList a Class-Name list that we want to load in. | ||
*/ | ||
static List<BatchFilter> getAllowedFilters(List<BatchFilter> filters, List<String> allowList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it open for test
? I'd recommend to switch to private static
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a PR to you, @deshanxiao .
Convert to `private static` method
Thank you, I merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Merged to main for Apache ORC 1.9.0. Thank you, @deshanxiao and @pavibhai . |
### What changes were proposed in this pull request? This PR is aimed to add allowlist support for plugin filter. ### Why are the changes needed? **ServiceLoader** will load all the interfaces in current classpath. When we have two implementations of PluginFilterService, Both of them will be loaded as plugin filters. This PR will provide a configuration "orc.filter.plugin.allowlist". ORC reader will only load the class which is in the allowlist when the configuration is not null. ### How was this patch tested? UT Closes apache#1314 from deshanxiao/deshan/1310. Lead-authored-by: deshanxiao <deshanxiao@microsoft.com> Co-authored-by: Deshan Xiao <deshanxiao@microsoft.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: Deshan Xiao <42019462+deshanxiao@users.noreply.github.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR is aimed to add allowlist support for plugin filter.
Why are the changes needed?
ServiceLoader will load all the interfaces in current classpath. When we have two implementations of PluginFilterService, Both of them will be loaded as plugin filters. This PR will provide a configuration "orc.filter.plugin.allowlist". ORC reader will only load the class which is in the allowlist when the configuration is not null.
How was this patch tested?
UT