DRILL-8465: Check Input Data for Iceberg Plugin #2853

pjfanning · 2023-12-01T15:38:17Z

Description

https://issues.apache.org/jira/browse/DRILL-8465

I'm not happy with the class check here. I don't know to what extent we need to support subclasses that a user might make of Iceberg classes. If we do need to support them, it might be better to add a config option that allows users to extend the allow list for classes in this deserialization code.

In this PR, the allow list is:

- Java primitives
- java/javax classes
- org.apache.iceberg classes
- org.apache.drill classes
- arrays of the types listed above
- any subclass of the Iceberg ScanTask class (supporting this is the messiest bit)
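
As a rough illustration of the allow list described above, the check could look something like the following. This is a minimal sketch, not the code in this PR: `AllowListSketch`, `isAllowed`, and `ScanTaskLike` are hypothetical names, and `ScanTaskLike` stands in for Iceberg's ScanTask so that the example compiles without Iceberg on the classpath.

```java
import java.util.List;

// Hypothetical stand-in for Iceberg's ScanTask (assumption, not the real type).
interface ScanTaskLike {}

final class AllowListSketch {
  // Package prefixes taken from the allow list in the description.
  private static final List<String> ALLOWED_PREFIXES =
      List.of("java.", "javax.", "org.apache.iceberg.", "org.apache.drill.");

  static boolean isAllowed(Class<?> type) {
    // Unwrap arrays (including multi-dimensional ones) to their element type.
    while (type.isArray()) {
      type = type.getComponentType();
    }
    if (type.isPrimitive()) {
      return true;
    }
    // Any subclass of the scan-task type is accepted, whatever its package.
    if (ScanTaskLike.class.isAssignableFrom(type)) {
      return true;
    }
    String name = type.getName();
    return ALLOWED_PREFIXES.stream().anyMatch(name::startsWith);
  }

  public static void main(String[] args) {
    System.out.println(isAllowed(int.class));             // true
    System.out.println(isAllowed(String[][].class));      // true
    System.out.println(isAllowed(AllowListSketch.class)); // false
  }
}
```

The subclass rule is the weak point of a package-based list: any class that implements the scan-task type passes, regardless of where it comes from.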

pjfanning · 2023-12-10T12:31:08Z

@cgivre I'm not that familiar with how Drill plugin configurations work. If I were to extend IcebergFormatPluginConfig to add a configurable allowPackageList (String[] or List<String>, whichever works best), how would I go about accessing this from the IcebergWork class? It is IcebergWork where I need to apply the allow list.

@vvysotskyi you appear to have written most of the Iceberg code. Would you have any idea whether this issue is one that we need to worry about? If it is, it looks like it will be hard to get the config values injected into IcebergWork, because the class seems to be instantiated only by a custom Jackson deserializer that is itself created only via Java reflection.

cgivre · 2023-12-10T16:02:30Z

> @cgivre I'm not that familiar with how Drill plugin configurations work. If I were to extend IcebergFormatPluginConfig to add a configurable allowPackageList (String[] or List<String>, whichever works best), how would I go about accessing this from the IcebergWork class? It is IcebergWork where I need to apply the allow list.

I think the thing to do would be to add an argument to the IcebergWork constructor. Once that's done, it looks like IcebergWork is instantiated in IcebergGroupScan:

drill/contrib/format-iceberg/src/main/java/org/apache/drill/exec/store/iceberg/IcebergGroupScan.java

Lines 236 to 241 in fe57fd1

    
    private List<IcebergWork> convertWorkList(List<IcebergCompleteWork> workList) {
      return workList.stream()
        .map(IcebergCompleteWork::getScanTask)
        .map(IcebergWork::new)
        .collect(Collectors.toList());
    }

The GroupScan has access to the IcebergPlugin and from there you can access the IcebergConfig. Does that make sense?
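
The suggestion above could be sketched roughly as follows. Every type here is a simplified, hypothetical stand-in (`ScanTaskStub`, `CompleteWorkStub`, `WorkStub`, `GroupScanSketch`), not a real Drill or Iceberg class, and `allowPackageList` is the config field name proposed earlier in this thread rather than an existing option.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an Iceberg scan task.
final class ScanTaskStub {
  final String name;
  ScanTaskStub(String name) { this.name = name; }
}

// Hypothetical stand-in for IcebergCompleteWork.
final class CompleteWorkStub {
  private final ScanTaskStub scanTask;
  CompleteWorkStub(ScanTaskStub scanTask) { this.scanTask = scanTask; }
  ScanTaskStub getScanTask() { return scanTask; }
}

// Hypothetical stand-in for IcebergWork with the extra constructor argument.
final class WorkStub {
  final ScanTaskStub scanTask;
  final List<String> allowPackageList;
  WorkStub(ScanTaskStub scanTask, List<String> allowPackageList) {
    this.scanTask = scanTask;
    this.allowPackageList = allowPackageList;
  }
}

// Hypothetical stand-in for IcebergGroupScan.
final class GroupScanSketch {
  // In the real group scan this would be read from the plugin's config.
  private final List<String> allowPackageList;

  GroupScanSketch(List<String> allowPackageList) {
    this.allowPackageList = allowPackageList;
  }

  List<WorkStub> convertWorkList(List<CompleteWorkStub> workList) {
    return workList.stream()
        .map(CompleteWorkStub::getScanTask)
        // A lambda replaces the constructor reference so the allow list can be passed along.
        .map(task -> new WorkStub(task, allowPackageList))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    GroupScanSketch scan = new GroupScanSketch(List.of("org.example."));
    List<WorkStub> works =
        scan.convertWorkList(List.of(new CompleteWorkStub(new ScanTaskStub("task-1"))));
    System.out.println(works.get(0).allowPackageList);
  }
}
```

This only covers work objects built by the group scan itself.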

> @vvysotskyi you appear to have written most of the Iceberg code. Would you have any idea whether this issue is one that we need to worry about? If it is, it looks like it will be hard to get the config values injected into IcebergWork, because the class seems to be instantiated only by a custom Jackson deserializer that is itself created only via Java reflection.

pjfanning · 2023-12-10T16:54:39Z

@cgivre unfortunately, IcebergWork is also created by IcebergWorkDeserializer, and that deserializer class is itself constructed using reflection based on @JsonDeserialize(using = IcebergWork.IcebergWorkDeserializer.class).

There is no point in changing IcebergWork unless we can find a way to inject the config value into IcebergWorkDeserializer too.

cgivre · 2024-01-03T14:00:24Z

@jnturton Do you have any thoughts here? This seems like it would be a good PR to get into the bug fix release.

jnturton · 2024-01-05T09:53:02Z

I've started looking at this. First question: if we're adding dynamically loaded class checks to protect against untrusted code then is checking the package name worth much? Or do we need to do something like verify signatures against a list of trusted keys? Second question: if this is about security then is the code we're loading actually untrusted or is it only ever loaded from serialisations that we produced ourselves (e.g. in IcebergWorkSerializer)?

P.S. Please include this "Why we're doing this" background that I'm lacking in the Jira issue when it's nontrivial.

EDIT: I've just seen the security label on this PR, so that gives some clue. There's also a Security "component" in Jira that we should add to the issue (along with the background mentioned above).

pjfanning · 2024-01-05T11:21:16Z

The short background to this is in this link: https://lists.apache.org/thread/vpjz467rg8449m63v1n9nl3o56twwyzt (a private thread requiring ASF login).

I'm no expert on Iceberg or the Drill Iceberg Plugin but I was hoping to maybe engage with someone who knows more about how they work and to get an understanding of whether we need some constraints. Due to the security aspect of this, I'm not too comfortable going into more detail here.

jnturton · 2024-01-05T15:25:43Z

Got it @pjfanning. Let's discuss further in the right forum.

pjfanning added 2 commits December 1, 2023 15:39

DRILL-8465. check iceberg input

9f1a17d

refactor

9a2b8f5

pjfanning marked this pull request as draft December 1, 2023 15:41

cgivre requested a review from vvysotskyi December 1, 2023 16:39

cgivre assigned pjfanning Dec 1, 2023

cgivre added the security and backport-to-stable labels Dec 1, 2023

cgivre changed the title ~~DRILL-8465. check input data for iceberg plugin~~ DRILL-8465: Check Input Data for Iceberg Plugin Dec 3, 2023

config

7a642d1
