Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35105][SQL] Support multiple paths for ADD FILE/JAR/ARCHIVE commands #32205

Closed
wants to merge 7 commits into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Apr 16, 2021

What changes were proposed in this pull request?

This PR extends ADD FILE/JAR/ARCHIVE commands to be able to take multiple path arguments like Hive.

Why are the changes needed?

To make those commands more useful.

Does this PR introduce any user-facing change?

Yes. In the current implementation, those commands can take a path which contains whitespaces without enclose it by neither ' nor " but after this change, users need to enclose such paths.
I've note this incompatibility in the migration guide.

How was this patch tested?

New tests.

@github-actions
Copy link

Test build #755422826 for PR 32205 at commit 7cd0628.

@SparkQA
Copy link

SparkQA commented Apr 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42061/

@SparkQA
Copy link

SparkQA commented Apr 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42061/

@SparkQA
Copy link

SparkQA commented Apr 16, 2021

Test build #137486 has finished for PR 32205 at commit 7cd0628.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@sarutak would you mind rebasing your branch against the latest master branch in Apache Spark? there have been a couple of CI updates so it has to be synced.

@sarutak
Copy link
Member Author

sarutak commented Apr 19, 2021

@HyukjinKwon Thank you for letting me know. I'll update it.

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42113/

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42113/

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Test build #137540 has finished for PR 32205 at commit 002eaed.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class PandasOnSparkFrameMethods(object):
  • class PandasOnSparkSeriesMethods(object):
  • class PandasOnSparkPlotAccessor(PandasObject):
  • class PandasOnSparkBarPlot(PandasBarPlot, TopNPlotBase):
  • class PandasOnSparkBoxPlot(PandasBoxPlot, BoxPlotBase):
  • class PandasOnSparkHistPlot(PandasHistPlot, HistogramPlotBase):
  • class PandasOnSparkPiePlot(PandasPiePlot, TopNPlotBase):
  • class PandasOnSparkAreaPlot(PandasAreaPlot, SampledPlotBase):
  • class PandasOnSparkLinePlot(PandasLinePlot, SampledPlotBase):
  • class PandasOnSparkBarhPlot(PandasBarhPlot, TopNPlotBase):
  • class PandasOnSparkScatterPlot(PandasScatterPlot, TopNPlotBase):
  • class PandasOnSparkKdePlot(PandasKdePlot, KdePlotBase):
  • class PandasOnSparkUsageLogger(object):

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Test build #137572 has finished for PR 32205 at commit 7cd0628.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42132/

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42132/

@SparkQA
Copy link

SparkQA commented Apr 19, 2021

Test build #137596 has finished for PR 32205 at commit 9fe3eee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sarutak
Copy link
Member Author

sarutak commented Apr 23, 2021

@HyukjinKwon Do you have any comment on this change?

@SparkQA
Copy link

SparkQA commented Apr 27, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42528/

@SparkQA
Copy link

SparkQA commented Apr 28, 2021

Test build #138009 has finished for PR 32205 at commit c7175de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, these are all supported in Hive too, right? If that's the case, I think the change is fine

@sarutak
Copy link
Member Author

sarutak commented Apr 28, 2021

@HyukjinKwon

Just to clarify, these are all supported in Hive too, right? If that's the case, I think the change is fine

Yes. It works in Hive as documented (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Commands).
I also confirmed with Hive 2.3.7.

@sarutak sarutak closed this in 132cbf0 Apr 29, 2021
@sarutak
Copy link
Member Author

sarutak commented Apr 29, 2021

Merged to master. Thanks @HyukjinKwon for the review.

xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
…mmands

### What changes were proposed in this pull request?

This PR extends `ADD FILE/JAR/ARCHIVE` commands to be able to take multiple path arguments like Hive.

### Why are the changes needed?

To make those commands more useful.

### Does this PR introduce _any_ user-facing change?

Yes. In the current implementation, those commands can take a path which contains whitespaces without enclose it by neither `'` nor `"` but after this change, users need to enclose such paths.
I've note this incompatibility in the migration guide.

### How was this patch tested?

New tests.

Closes apache#32205 from sarutak/add-multiple-files.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants