[SPARK-41970][SQL][FOLLOWUP] Revert SparkPath changes to FileIndex and FileRelation #39808
Conversation
Commits 8f1c9bb to 1102f05: SparkPath -> String
Is this a partial revert, @databricks-david-lewis ?
cc @cloud-fan and @HyukjinKwon and @xinrong-meng
Yes it is, @dongjoon-hyun.
```diff
-  override def inputFiles: Array[SparkPath] =
-    allFiles().map(SparkPath.fromFileStatus).toArray
+  override def inputFiles: Array[String] =
+    allFiles().map(fs => SparkPath.fromFileStatus(fs).urlEncoded).toArray
```
So, this part is not reverting to the original code, `allFiles().map(_.getPath.toUri.toString).toArray`, right?
Ah, good point. It is not exactly a revert, but it reuses the `SparkPath` code to translate from a Hadoop `FileStatus` to a url-encoded string.
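For context, here is a minimal sketch of what "translate a path to a url-encoded string" can look like. This is not Spark's actual `SparkPath` implementation; it uses `java.net.URI` purely for illustration, and the object and method names are invented for this sketch:

```scala
import java.net.URI

// Hypothetical sketch, not Spark's SparkPath: turn a raw filesystem
// path into a url-encoded string, the form the reverted inputFiles
// signature is documented to return.
object UrlEncodedPath {
  def urlEncoded(rawPath: String): String =
    // The multi-argument URI constructor percent-encodes characters
    // that are illegal in the path component (e.g. spaces become %20).
    new URI("file", null, rawPath, null).toString
}

object UrlEncodedPathDemo {
  def main(args: Array[String]): Unit =
    println(UrlEncodedPath.urlEncoded("/data/my table/part-0001"))
    // file:/data/my%20table/part-0001
}
```

The point of round-tripping through a url-encoded form is that characters which are legal in filesystem paths but reserved in URIs survive the string representation unambiguously.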
+1, LGTM (Pending CIs). Thank you for reducing the breaking changes.
Also, cc @sunchao
Merged to master and branch-3.4.
…d FileRelation

### What changes were proposed in this pull request?

This PR reverts the `SparkPath` changes to `FileIndex` and `FileRelation` because they provided little benefit to Open Source Spark, but are widely used extension points for other open source projects. For the 3.4.0 release we want to preserve this type of binary compatibility. That said, we reserve the right to make this change for Spark 4.0.

### Why are the changes needed?

Revert `inputFiles: Array[SparkPath]` back to `inputFiles: Array[String]`, with an explicit comment that the strings are expected to be url-encoded.

### Does this PR introduce _any_ user-facing change?

This is to revert an internal interface change.

### How was this patch tested?

Existing unit tests.

Closes #39808 from databricks-david-lewis/SPARK_PATH_FOLLOWUP.

Authored-by: David Lewis <david.lewis@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 3887e71)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
late LGTM
What changes were proposed in this pull request?
This PR reverts the `SparkPath` changes to `FileIndex` and `FileRelation` because they provided little benefit to Open Source Spark, but are widely used extension points for other open source projects. For the 3.4.0 release we want to preserve this type of binary compatibility. That said, we reserve the right to make this change for Spark 4.0.
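To make the binary-compatibility concern concrete, here is a hypothetical sketch (the trait and class names are invented for illustration, mirroring only the shape of the extension point) of a third-party implementation compiled against the `Array[String]` signature. Changing the return type to `Array[SparkPath]` would break such implementations, since the overridden method's signature would no longer match the one their compiled bytecode references:

```scala
// Hypothetical sketch: a minimal trait standing in for the relevant
// part of FileRelation, and a third-party class compiled against it.
trait MyFileRelation {
  // Per this PR's contract, the returned strings are url-encoded paths.
  def inputFiles: Array[String]
}

class ThirdPartyRelation(files: Seq[String]) extends MyFileRelation {
  override def inputFiles: Array[String] = files.toArray
}

object RelationDemo {
  def main(args: Array[String]): Unit = {
    val rel = new ThirdPartyRelation(Seq("file:/data/part-0000"))
    println(rel.inputFiles.mkString(","))
  }
}
```

If the trait's method became `def inputFiles: Array[SparkPath]`, the override above would fail both at compile time for source builds and at link time for already-compiled jars, which is exactly the breakage the revert avoids for 3.4.0.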
Why are the changes needed?
Revert `inputFiles: Array[SparkPath]` back to `inputFiles: Array[String]`, with an explicit comment that the strings are expected to be url-encoded.

Does this PR introduce any user-facing change?
This is to revert an internal interface change.
How was this patch tested?
Existing unit tests.