[SPARK-45654][PYTHON] Add Python data source write API #43516

Closed

allisonwang-db (Contributor) commented on Oct 24, 2023

What changes were proposed in this pull request?

This PR adds the Python data source write API and the DataSourceWriter class in datasource.py.

Here is an overview of the writer class:

class DataSourceWriter(ABC):
    @abstractmethod
    def write(self, iterator: Iterator[Row]) -> Any:
        ...
    
    def commit(self, messages: List[Any]) -> None:
        ...
    
    def abort(self, messages: List[Any]) -> None:
        ...
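
As a rough illustration of how the three methods might fit together, here is a minimal sketch that runs without Spark. It assumes that `write` is called once per partition and returns a commit message, that `commit` receives the messages from all tasks on success, and that `abort` is called on failure; the PR description does not spell these semantics out. `Row` is a stand-in type alias, `DataSourceWriter` is a local copy of the interface above rather than the PySpark class, and `CountingWriter` and `run_write` are hypothetical names used only for this example.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, List, Tuple

# Stand-in for pyspark.sql.Row so the sketch runs without Spark (assumption).
Row = Tuple[Any, ...]


class DataSourceWriter(ABC):
    """Local copy of the interface outlined above, not the PySpark class."""

    @abstractmethod
    def write(self, iterator: Iterator[Row]) -> Any:
        ...

    def commit(self, messages: List[Any]) -> None:
        ...

    def abort(self, messages: List[Any]) -> None:
        ...


class CountingWriter(DataSourceWriter):
    """Hypothetical writer that only counts the rows it is given."""

    def __init__(self) -> None:
        self.committed_rows = 0
        self.aborted = False

    def write(self, iterator: Iterator[Row]) -> Any:
        # Assumed to run once per partition; the return value is the message.
        return sum(1 for _ in iterator)

    def commit(self, messages: List[Any]) -> None:
        # Assumed to run once, with one message per successful write call.
        self.committed_rows = sum(messages)

    def abort(self, messages: List[Any]) -> None:
        # Assumed to run instead of commit when any write call fails.
        self.aborted = True


def run_write(writer: DataSourceWriter, partitions: Iterable[List[Row]]) -> List[Any]:
    """Hypothetical single-process driver imitating write/commit/abort."""
    messages: List[Any] = []
    try:
        for part in partitions:
            messages.append(writer.write(iter(part)))
    except Exception:
        writer.abort(messages)
        raise
    writer.commit(messages)
    return messages
```

Calling `run_write(CountingWriter(), [[(1,), (2,)], [(3,)]])` collects one message per partition and then commits them. In a real cluster the writer would presumably be serialized to executors, so state kept on `self` inside `write` would not be visible to `commit`; returning everything needed through the message, as this sketch does, avoids relying on that.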

Why are the changes needed?

To support writing data with Python data sources.

Does this PR introduce any user-facing change?

No. This PR alone does not introduce any user-facing change.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@@ -24,7 +24,7 @@
 from pyspark.sql._typing import OptionalPrimitiveType


-__all__ = ["DataSource", "DataSourceReader"]
+__all__ = ["DataSource", "DataSourceReader", "DataSourceWriter"]


 class DataSource(ABC):
Member

Let's add @since 4.0.0

@HyukjinKwon (Member)

@allisonwang-db mind resolving conflicts please?

@HyukjinKwon (Member)

Merged to master.
