Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38652][K8S] uploadFileUri should preserve file scheme #36010

Closed
wants to merge 1 commit into from
Closed

[SPARK-38652][K8S] uploadFileUri should preserve file scheme #36010

wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Mar 30, 2022

What changes were proposed in this pull request?

This PR replaces new Path(fileUri.getPath) with new Path(fileUri).
By using Path class constructor with URI parameter, we can preserve file scheme.

Why are the changes needed?

If we use, Path class constructor with String parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}

Does this PR introduce any user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

How was this patch tested?

Pass the CIs.

In addition, this PR and #36009 will recover K8s IT DepsTestsSuite.

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dcoliversun
Copy link
Contributor

+1, LGTM.

@dcoliversun
Copy link
Contributor

dcoliversun commented Mar 30, 2022

@dongjoon-hyun I agree this PR could fix SPARK-38652.
Does spark upload file with schema file:// using hadoop-aws-3.3.2 still throw a PathIOException in normal usage job other than DepsTestsSuite?

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Mar 30, 2022

Thank you, @LuciferYang , @dcoliversun , @wangyum . Merged to master/3.3/3.2/3.1/3.0.

All tests including K8s IT passed.

KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with no resources & statefulset allocation
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- SPARK-38398: Schedule pod creation from template
- PVs with local hostpath storage on statefulsets
- PVs with local hostpath and storageClass on statefulsets
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- SPARK-37576: Rolling decommissioning
Run completed in 22 minutes, 16 seconds.
Total number of tests run: 31
Suites: completed 2, aborted 0
Tests: succeeded 31, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

To @dcoliversun . No, it should not. This PR aims to fix uploadFileUri itself for all usages. Like the test case written in the PR description, Apache Spark uploadFileUri uses Utils.resolveURI(uri) to add file scheme before invoking uploadFileToHadoopCompatibleFS.

dongjoon-hyun added a commit that referenced this pull request Mar 30, 2022
### What changes were proposed in this pull request?

This PR replaces `new Path(fileUri.getPath)` with `new Path(fileUri)`.
By using `Path` class constructor with URI parameter, we can preserve file scheme.

### Why are the changes needed?

If we use, `Path` class constructor with `String` parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

```scala
test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}
```

### Does this PR introduce _any_ user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

### How was this patch tested?

Pass the CIs.

In addition, this PR and #36009 will recover K8s IT `DepsTestsSuite`.

Closes #36010 from dongjoon-hyun/SPARK-38652.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cab8aa1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Mar 30, 2022
### What changes were proposed in this pull request?

This PR replaces `new Path(fileUri.getPath)` with `new Path(fileUri)`.
By using `Path` class constructor with URI parameter, we can preserve file scheme.

### Why are the changes needed?

If we use, `Path` class constructor with `String` parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

```scala
test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}
```

### Does this PR introduce _any_ user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

### How was this patch tested?

Pass the CIs.

In addition, this PR and #36009 will recover K8s IT `DepsTestsSuite`.

Closes #36010 from dongjoon-hyun/SPARK-38652.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cab8aa1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Mar 30, 2022
### What changes were proposed in this pull request?

This PR replaces `new Path(fileUri.getPath)` with `new Path(fileUri)`.
By using `Path` class constructor with URI parameter, we can preserve file scheme.

### Why are the changes needed?

If we use, `Path` class constructor with `String` parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

```scala
test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}
```

### Does this PR introduce _any_ user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

### How was this patch tested?

Pass the CIs.

In addition, this PR and #36009 will recover K8s IT `DepsTestsSuite`.

Closes #36010 from dongjoon-hyun/SPARK-38652.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cab8aa1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Mar 30, 2022
### What changes were proposed in this pull request?

This PR replaces `new Path(fileUri.getPath)` with `new Path(fileUri)`.
By using `Path` class constructor with URI parameter, we can preserve file scheme.

### Why are the changes needed?

If we use, `Path` class constructor with `String` parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

```scala
test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}
```

### Does this PR introduce _any_ user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

### How was this patch tested?

Pass the CIs.

In addition, this PR and #36009 will recover K8s IT `DepsTestsSuite`.

Closes #36010 from dongjoon-hyun/SPARK-38652.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cab8aa1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-38652 branch March 30, 2022 15:29
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
### What changes were proposed in this pull request?

This PR replaces `new Path(fileUri.getPath)` with `new Path(fileUri)`.
By using `Path` class constructor with URI parameter, we can preserve file scheme.

### Why are the changes needed?

If we use, `Path` class constructor with `String` parameter, it loses file scheme information.
Although the original code works so far, it fails at Apache Hadoop 3.3.2 and breaks dependency upload feature which is covered by K8s Minikube integration tests.

```scala
test("uploadFileUri") {
   val fileUri = org.apache.spark.util.Utils.resolveURI("/tmp/1.txt")
   assert(new Path(fileUri).toString == "file:/private/tmp/1.txt")
   assert(new Path(fileUri.getPath).toString == "/private/tmp/1.txt")
}
```

### Does this PR introduce _any_ user-facing change?

No, this will prevent a regression at Apache Spark 3.3.0 instead.

### How was this patch tested?

Pass the CIs.

In addition, this PR and apache#36009 will recover K8s IT `DepsTestsSuite`.

Closes apache#36010 from dongjoon-hyun/SPARK-38652.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit cab8aa1)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants