[SPARK-24542] [SQL] UDF series UDFXPathXXXX allow users to pass carefully crafted XML to access arbitrary files #21549

Closed
wants to merge 3 commits into from

Conversation

gatorsmile
Member

What changes were proposed in this pull request?

The UDFXPathXXXX family of UDFs lets users pass carefully crafted XML that reads arbitrary files through external entity references. Spark does not have built-in access control, so when users rely on an external access control library, a crafted query might bypass it and expose file contents.

This PR ports the Hive fix to Apache Spark: https://issues.apache.org/jira/browse/HIVE-18879
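
For readers unfamiliar with this class of issue, the following is a minimal sketch of the general mitigation, assuming the standard JAXP parser bundled with the JDK. It is not the code from this PR or from HIVE-18879; the class name `SafeXPathSketch` and its methods are hypothetical and exist only for illustration. The idea is to switch off external entity loading on the `DocumentBuilderFactory` before any user-supplied XML is parsed, so that a reference such as `&embed;` is not replaced with file contents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Spark or Hive.
final class SafeXPathSketch {

    // Standard SAX feature names for external entity handling.
    private static final String EXTERNAL_GENERAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";
    private static final String EXTERNAL_PARAMETER_ENTITIES =
        "http://xml.org/sax/features/external-parameter-entities";

    private SafeXPathSketch() {}

    // Builds a parser that does not load external general or parameter entities.
    static DocumentBuilder newHardenedBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(EXTERNAL_GENERAL_ENTITIES, false);
        dbf.setFeature(EXTERNAL_PARAMETER_ENTITIES, false);
        return dbf.newDocumentBuilder();
    }

    // Parses the XML with the hardened parser, then evaluates the XPath
    // expression against the resulting document and returns its string value.
    static String evalString(String xml, String path) throws Exception {
        Document doc = newHardenedBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(path, doc, XPathConstants.STRING);
    }
}
```

With these features disabled, evaluating `/foo` against a document whose `<foo>` element contains only an external entity reference should not return the referenced file's contents. A stricter alternative, where DOCTYPE declarations are never needed, is to reject them outright; whether that is acceptable depends on the inputs the function must support.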

How was this patch tested?

A unit test case

@gatorsmile
Member Author

cc @petermaxlee @rxin @cloud-fan

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91749 has finished for PR 21549 at commit 07f71d3.

  • This patch fails Java style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


val xml =
"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + "<!DOCTYPE test [ \n" +
" <!ENTITY embed SYSTEM \"" + fname + "\"> \n" + "]>\n" + "<foo>&embed;</foo>"
Contributor

Can we use a multiline string to make it easier to read?

Member Author

XML has its own syntax, so it is sometimes a little hard to make a multiline string work.

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91784 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91786 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91787 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 15, 2018

Test build #91943 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 15, 2018

Test build #91948 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 16, 2018

Test build #91952 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 16, 2018

Test build #91960 has finished for PR 21549 at commit 52e706b.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

@HyukjinKwon is it possible that the constant build failure is caused by the Java style checker? Other PRs that don't touch Java files are fine.

@HyukjinKwon
Member

@cloud-fan, I will take a look tonight (Singapore timezone). Please feel free to disable it for now to unblock other PRs if you think that is needed. From a very quick look (I'm on mobile), I think it needs some time.

@HyukjinKwon
Member

You can comment out this line https://github.com/HyukjinKwon/spark/blob/master/dev/run-tests.py#L577 and add pass instead.

@HyukjinKwon
Member

If you think it's not super urgent, please give me a few days. I have some speculations about it. I will test it and try a fix tomorrow, as soon as we can access the Jenkins results.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jun 18, 2018

Test build #92021 has finished for PR 21549 at commit 52e706b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 18, 2018

Test build #92033 has finished for PR 21549 at commit f9a9f68.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 18, 2018

Test build #92042 has finished for PR 21549 at commit f9a9f68.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jun 19, 2018

Test build #92053 has finished for PR 21549 at commit f9a9f68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/2.3!

asfgit pushed a commit that referenced this pull request Jun 19, 2018
…lly crafted XML to access arbitrary files

## What changes were proposed in this pull request?

The UDFXPathXXXX family of UDFs lets users pass carefully crafted XML that reads arbitrary files through external entity references. Spark does not have built-in access control, so when users rely on an external access control library, a crafted query might bypass it and expose file contents.

This PR ports the Hive fix to Apache Spark: https://issues.apache.org/jira/browse/HIVE-18879

## How was this patch tested?

A unit test case

Author: Xiao Li <gatorsmile@gmail.com>

Closes #21549 from gatorsmile/xpathSecurity.

(cherry picked from commit 9a75c18)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in 9a75c18 Jun 19, 2018
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…lly crafted XML to access arbitrary files

The UDFXPathXXXX family of UDFs lets users pass carefully crafted XML that reads arbitrary files through external entity references. Spark does not have built-in access control, so when users rely on an external access control library, a crafted query might bypass it and expose file contents.

This PR ports the Hive fix to Apache Spark: https://issues.apache.org/jira/browse/HIVE-18879

A unit test case

Author: Xiao Li <gatorsmile@gmail.com>

Closes apache#21549 from gatorsmile/xpathSecurity.

(cherry picked from commit 9a75c18)

RB=1807957
BUG=APA-6723
G=superfriends-reviewers
R=mshen,latang,fli,zolin,yezhou
A=chsingh