[SPARK-15454][SQL] Filter out files starting with _ #13227
cc @liancheng and @marmbrus
Test build #59017 has finished for PR 13227 at commit
LGTM, pending tests.
// because Parquet needs to find those metadata files from leaf files returned by this method.
// We should refactor this logic to not mix metadata files with data files.
(pathName.startsWith("_") || pathName.startsWith(".")) &&
  !pathName.startsWith("_common_metadata") && !pathName.startsWith("_metadata")
Why startsWith instead of == here?
Just in case we add other variants of these names here.
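To make the `startsWith` versus `==` question concrete, here is a minimal sketch of the condition from the diff, wrapped in a hypothetical `shouldSkip` helper and compared with an equality-based variant. The object name, both helper names, and the sample file names (including `_metadata.bak`) are assumptions for illustration only and are not taken from the patch.

```scala
object PathFilterSketch {
  // Hypothetical helper: same condition as in the diff. Names starting with
  // "_" or "." are skipped unless they start with a Parquet summary-file name.
  def shouldSkip(pathName: String): Boolean =
    (pathName.startsWith("_") || pathName.startsWith(".")) &&
      !pathName.startsWith("_common_metadata") && !pathName.startsWith("_metadata")

  // Comparison only: the stricter variant that exempts exact names.
  def shouldSkipExact(pathName: String): Boolean =
    (pathName.startsWith("_") || pathName.startsWith(".")) &&
      pathName != "_common_metadata" && pathName != "_metadata"

  def main(args: Array[String]): Unit = {
    // Sample names are made up; "_metadata.bak" stands in for "other variants".
    val names = Seq(
      "part-00000.parquet", "_temporary", ".hidden",
      "_metadata", "_common_metadata", "_metadata.bak")
    names.foreach { n =>
      println(s"$n skip=${shouldSkip(n)} skipExact=${shouldSkipExact(n)}")
    }
  }
}
```

With the prefix check, a variant such as `_metadata.bak` is still treated as a metadata file and kept, whereas the equality check would filter it out like any other `_`-prefixed file.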
LGTM except for one minor comment.
Test build #59020 has finished for PR 13227 at commit
Merging in master/2.0. Thanks.
## What changes were proposed in this pull request?
Many other systems (e.g. Impala) use _xxx as staging, and Spark should not be reading those files.
## How was this patch tested?
Added a unit test case.
Author: Reynold Xin <rxin@databricks.com>
Closes #13227 from rxin/SPARK-15454.
(cherry picked from commit dcac8e6)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Test build #59018 has finished for PR 13227 at commit
What changes were proposed in this pull request?
Many other systems (e.g. Impala) use files named _xxx as staging files, and Spark should not be reading those files.
How was this patch tested?
Added a unit test case.