gengliangwang
Member

What changes were proposed in this pull request?

The explain() method prints the arguments of tree nodes in logical/physical plans. The arguments could contain a map-type option that contains sensitive data.
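We should redact map-type options in the output of explain(). Otherwise, sensitive data will be visible in the explain output and in the Spark UI.
![image](https://user-images.githubusercontent.com/1097932/113719178-326ffb00-96a2-11eb-8a2c-28fca3e72941.png)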
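
As a rough illustration of the idea (not the code from this patch), the sketch below masks the values of sensitive-looking keys in an options map before rendering a plan-node description. It is written in Python for brevity; the key pattern, the placeholder text, and the names `redact_options` and `describe_node` are assumptions made for this example only.

```python
import re
from typing import Dict, Mapping

# Hypothetical pattern and placeholder for this sketch; Spark's own settings may differ.
SENSITIVE_KEY = re.compile(r"(?i)secret|password|token")
REDACTED = "*********(redacted)"


def redact_options(options: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `options` with the values of sensitive-looking keys masked."""
    return {
        key: REDACTED if SENSITIVE_KEY.search(key) else value
        for key, value in options.items()
    }


def describe_node(name: str, options: Mapping[str, str]) -> str:
    """Render a plan-node-like string from the redacted options only."""
    rendered = ", ".join(f"{k}={v}" for k, v in redact_options(options).items())
    return f"{name} [{rendered}]"


print(describe_node("Scan", {"password": "MyPassWord", "key": "value"}))
# prints: Scan [password=*********(redacted), key=value]
```

Matching on the key rather than the value keeps non-sensitive entries readable, so the plan output stays useful for debugging.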

Why are the changes needed?

Data security.

Does this PR introduce any user-facing change?

Yes. Map-type options are now redacted in the output of explain().

How was this patch tested?

Unit tests

…f explain()

Closes apache#32066 from gengliangwang/redactOptions.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
gengliangwang changed the title from "[SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()" to "[SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()" on Apr 8, 2021
@gengliangwang
Member Author

This PR backports #32066 to branch-3.0.

@SparkQA

SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/

@SparkQA

SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41630/

@SparkQA

SparkQA commented Apr 8, 2021

Test build #137052 has finished for PR 32085 at commit f69b7e2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a backport, @gengliangwang .
The second test case fails like the following. Could you fix it?

[info] ExplainSuite:
08:38:16.800 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[info] - SPARK-34970: Redact Map type options in explain output (1 second, 409 milliseconds)
[info] - SPARK-34970: Redact CaseInsensitiveMap type options in explain output *** FAILED *** (1 second, 884 milliseconds)
[info]   "== Parsed Logical Plan ==
[info]   'UnresolvedRelation [t]
[info]
[info]   == Analyzed Logical Plan ==
[info]   id: bigint
[info]   SubqueryAlias spark_catalog.default.t
[info]   +- Relation[id#xL] json
[info]
[info]   == Optimized Logical Plan ==
[info]   Relation[id#xL] json
[info]
[info]   == Physical Plan ==
[info]   FileScan json default.t[id#xL] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/dongjoon/PRS/SPARK-PR-32085/sql/core/spark-warehouse/org.apache.spa..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
[info]
[info]   " did not contain "value" (ExplainSuite.scala:66)
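
The failure message shows that the test expects the string "value" to appear in the explain output. A two-sided check of that kind (sensitive values must be absent, non-sensitive values must still be present) could be sketched as follows. This is a Python illustration with a hypothetical helper name and sample strings, not the assertions from ExplainSuite.

```python
from typing import Iterable, List


def check_explain_output(
    output: str, secrets: Iterable[str], visible: Iterable[str]
) -> List[str]:
    """Return a list of problems found in `output`; an empty list means the check passed."""
    # Any secret that appears verbatim in the output is a leak.
    problems = [f"leaked: {s!r}" for s in secrets if s in output]
    # Any non-sensitive value that is missing means the options were not printed at all.
    problems += [f"missing: {v!r}" for v in visible if v not in output]
    return problems


# Hypothetical output in which the options map is not printed at all.
sample = "FileScan json default.t[id#xL] Batched: false, Format: JSON"
print(check_explain_output(sample, secrets=["MyPassWord"], visible=["value"]))
# prints: ["missing: 'value'"]
```

Checking both directions distinguishes "options were redacted" from "options were never printed", which is the situation the failing output above appears to show.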

@gengliangwang
Member Author

@dongjoon-hyun yes.

dongjoon-hyun changed the title from "[SPARK-34970][3.0][SQL][SERCURITY] Redact map-type options in the output of explain()" to "[SPARK-34970][3.0][SQL] Redact map-type options in the output of explain()" on Apr 8, 2021
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you, @gengliangwang .

@SparkQA

SparkQA commented Apr 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41669/

@SparkQA

SparkQA commented Apr 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41669/

dongjoon-hyun pushed a commit that referenced this pull request Apr 8, 2021
…ain()

Closes #32085 from gengliangwang/redact3.0.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Member

Merged to branch-3.0.

@SparkQA

SparkQA commented Apr 8, 2021

Test build #137091 has finished for PR 32085 at commit be00c1e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

5 participants