[SPARK-47157][SQL] Refactor file listing with ScanFileListing interface #45224

Closed
wants to merge 3 commits into from


costas-db
Contributor

@costas-db costas-db commented Feb 22, 2024

What changes were proposed in this pull request?

In this pull request, we introduce the ScanFileListing trait and its implementation, the GenericScanFileListing class, to encapsulate file listing results behind a single interface. This abstraction makes the code more modular and allows file listings to be managed more flexibly within the system.
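To make the shape of this abstraction concrete, here is a minimal Scala sketch that assumes a simplified model of files and partitions. `FileEntry`, `PartitionEntry`, `filterPartitions`, and `ListingReport.summarize` are hypothetical names, and the methods shown are illustrative only; this is not the trait or class as merged into Spark.

```scala
// HYPOTHETICAL, simplified model of a file listing. These types are
// illustrative only and are NOT Spark's actual classes or signatures.
final case class FileEntry(path: String, sizeInBytes: Long)

// One partition directory: its partition column values plus the files in it.
final case class PartitionEntry(values: Map[String, String], files: Seq[FileEntry])

// The abstraction: callers ask questions about the listing without
// knowing how files and partitions are stored underneath.
trait ScanFileListing {
  def partitionCount: Int
  def totalNumberOfFiles: Long
  def totalFileSize: Long
  // Returns a new listing that keeps only the partitions whose values match.
  def filterPartitions(keep: Map[String, String] => Boolean): ScanFileListing
}

// A straightforward implementation backed by an in-memory sequence of partitions.
final case class GenericScanFileListing(partitions: Seq[PartitionEntry])
    extends ScanFileListing {
  override def partitionCount: Int = partitions.size
  override def totalNumberOfFiles: Long = partitions.map(_.files.size.toLong).sum
  override def totalFileSize: Long =
    partitions.flatMap(_.files).map(_.sizeInBytes).sum
  override def filterPartitions(keep: Map[String, String] => Boolean): ScanFileListing =
    GenericScanFileListing(partitions.filter(p => keep(p.values)))
}

// Caller code depends only on the trait, never on a concrete representation.
object ListingReport {
  def summarize(listing: ScanFileListing): String =
    s"${listing.partitionCount} partitions, ${listing.totalNumberOfFiles} files, " +
      s"${listing.totalFileSize} bytes"
}
```

The design point is that `ListingReport.summarize` only sees the trait, so a different implementation (for example, one that stores file metadata more compactly) could be substituted without touching the caller.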

Why are the changes needed?

These constructs define a standardized API for file listing operations that is independent of how files and partitions are represented underneath. Making the code more modular in this way enables future changes that can improve both runtime performance and memory usage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This is purely a refactoring with no new behavior, so existing tests suffice.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Feb 22, 2024
@dtenedor
Contributor

@HyukjinKwon @cloud-fan if this change looks good to you, would one of you mind helping to merge it? 🙏

@HyukjinKwon
Member

Seems fine, but would you mind filing a JIRA and putting it in the PR title? See also https://spark.apache.org/contributing.html

@costas-db costas-db changed the title Refactor file listing with ScanFileListing interface [SPARK-47157][SQL] Refactor file listing with ScanFileListing interface Feb 26, 2024
@HyukjinKwon
Member

Merged to master.

jpcorreia99 pushed a commit to jpcorreia99/spark that referenced this pull request Feb 26, 2024
### What changes were proposed in this pull request?

In this pull request, we introduce the `ScanFileListing` trait and its implementation, the `GenericScanFileListing` class, to encapsulate file listing results behind a single interface. This abstraction makes the code more modular and allows file listings to be managed more flexibly within the system.

### Why are the changes needed?

These constructs define a standardized API for file listing operations that is independent of how files and partitions are represented underneath. Making the code more modular in this way enables future changes that can improve both runtime performance and memory usage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

This is purely a refactoring with no new behavior, so existing tests suffice.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#45224 from costas-db/refactorFileListing.

Lead-authored-by: Costas Zarifis <costas.zarifis@databricks.com>
Co-authored-by: Shoumik Palkar <shoumik.palkar@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Mar 5, 2024