[SPARK-45940][PYTHON] Add InputPartition to DataSourceReader interface #44085

Closed

allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR introduces a new Python class, `InputPartition`, that represents a partition value returned by the `partitions` method of `DataSourceReader`.
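A minimal sketch of the default usage, assuming only the Python standard library: `InputPartition` here is a local stand-in that simply wraps a `value`, and `NumberReader` is a hypothetical reader. The real classes are defined in PySpark's `pyspark.sql.datasource` module and may differ in detail; the loop at the end only imitates how the engine asks for partitions and then calls `read` once per partition.

```python
from typing import Any, Iterator, List, Tuple


class InputPartition:
    """Local stand-in for PySpark's InputPartition: wraps one partition value."""

    def __init__(self, value: Any) -> None:
        self.value = value


class NumberReader:
    """Hypothetical reader that emits one row per partition index."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions

    def partitions(self) -> List[InputPartition]:
        # Each partition carries its index as its value.
        return [InputPartition(i) for i in range(self.num_partitions)]

    def read(self, partition: InputPartition) -> Iterator[Tuple[int, int]]:
        # `partition` is always an InputPartition, so `.value` is always present.
        yield (partition.value, partition.value * 10)


# Imitates the engine: ask for partitions, then read each one.
reader = NumberReader(3)
rows = [row for part in reader.partitions() for row in reader.read(part)]
print(rows)  # [(0, 0), (1, 10), (2, 20)]
```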

Why are the changes needed?

Before this PR, the `partitions` method could return any value, which made it hard to tell what the `partition` argument of the `read(self, partition)` method represents.

Adding `InputPartition` makes the Python data source API more intuitive and user-friendly by giving that argument an explicit type.
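Partitions that need more than one field can subclass `InputPartition`. The sketch below again uses a local stand-in rather than PySpark itself; `RangeInputPartition` and `RangeReader` are hypothetical names, and a dataclass subclass is just one reasonable pattern, not necessarily what this PR's tests use.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Tuple


class InputPartition:
    """Local stand-in for PySpark's InputPartition (not the real class)."""

    def __init__(self, value: Any) -> None:
        self.value = value


@dataclass
class RangeInputPartition(InputPartition):
    """Hypothetical partition describing the half-open range [start, end)."""

    start: int
    end: int


class RangeReader:
    """Hypothetical reader that splits row ids [0, num_rows) into ranges."""

    def __init__(self, num_rows: int, num_partitions: int) -> None:
        self.num_rows = num_rows
        self.num_partitions = num_partitions

    def partitions(self) -> List[InputPartition]:
        # Ceiling division so every row lands in exactly one partition;
        # max(1, ...) keeps the range() step valid when num_rows is 0.
        step = max(1, -(-self.num_rows // self.num_partitions))
        return [
            RangeInputPartition(start, min(start + step, self.num_rows))
            for start in range(0, self.num_rows, step)
        ]

    def read(self, partition: InputPartition) -> Iterator[Tuple[int]]:
        # The declared type tells the reader author what to expect here.
        if not isinstance(partition, RangeInputPartition):
            raise TypeError(f"unexpected partition type: {type(partition).__name__}")
        # Emit every row id in the partition's half-open range.
        for i in range(partition.start, partition.end):
            yield (i,)


reader = RangeReader(num_rows=10, num_partitions=3)
for part in reader.partitions():
    print(part, list(reader.read(part)))
```

Because each subclass declares its own fields, `read` can access `partition.start` and `partition.end` directly instead of guessing the shape of an untyped value.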

Does this PR introduce any user-facing change?

No

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@allisonwang-db
Contributor Author

cc @HyukjinKwon @cloud-fan

@allisonwang-db
Contributor Author

The test failure seems unrelated.

@HyukjinKwon
Member

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Dec 4, 2023
…on and PySpark environments are available

### What changes were proposed in this pull request?

This is a test-only follow-up PR for #44085 to make Python data source tests depend on the availability of Python and PySpark environments.
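The follow-up's actual diff is not reproduced in this thread. As a rough Python analogue of gating tests on the environment, the sketch below skips a hypothetical test class with `unittest.skipUnless` when no `python3` executable is on `PATH` or the `pyspark` package cannot be found; the class name and both checks are assumptions, not the code from #44164.

```python
import importlib.util
import io
import shutil
import unittest

# Hypothetical environment checks, evaluated once at import time.
PYTHON_AVAILABLE = shutil.which("python3") is not None
PYSPARK_AVAILABLE = importlib.util.find_spec("pyspark") is not None


@unittest.skipUnless(
    PYTHON_AVAILABLE and PYSPARK_AVAILABLE,
    "requires a python3 executable and an importable pyspark package",
)
class PythonDataSourceTests(unittest.TestCase):
    def test_pyspark_importable(self) -> None:
        # Runs only when both checks above passed; otherwise reported as skipped.
        import pyspark  # noqa: F401


def run_suite() -> unittest.TestResult:
    """Run the suite quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PythonDataSourceTests)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


result = run_suite()
print("skipped:", len(result.skipped), "successful:", result.wasSuccessful())
```

Skipping, rather than failing, keeps the suite green on machines that lack the Python or PySpark setup while still exercising the tests wherever it is present.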

### Why are the changes needed?

To fix tests that fail when the Python or PySpark environment is not available.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Test only.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44164 from allisonwang-db/spark-45940-follow-up.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
Closes apache#44085 from allisonwang-db/spark-45940-input-partition.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
asl3 pushed a commit to asl3/spark that referenced this pull request Dec 5, 2023
…on and PySpark environments are available

### What changes were proposed in this pull request?

This is a test-only follow-up PR for apache#44085 to make Python data source tests depend on the availability of Python and PySpark environments.

### Why are the changes needed?

To fix tests.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

test only.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#44164 from allisonwang-db/spark-45940-follow-up.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
dbatomic pushed a commit to dbatomic/spark that referenced this pull request Dec 11, 2023
…on and PySpark environments are available

### What changes were proposed in this pull request?

This is a test-only follow-up PR for apache#44085 to make Python data source tests depend on the availability of Python and PySpark environments.

### Why are the changes needed?

To fix tests.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

test only.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#44164 from allisonwang-db/spark-45940-follow-up.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>