[SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter #37461

Closed
wants to merge 2 commits into from

HyukjinKwon
Member

What changes were proposed in this pull request?

This PR proposes to improve the examples in pyspark.sql.streaming.readwriter by making each one self-contained, with a brief explanation and a slightly more realistic scenario.

Why are the changes needed?

To make the documentation more readable, and to make the examples work when copied and pasted directly into the PySpark shell.

Does this PR introduce any user-facing change?

Yes, it changes the documentation.

How was this patch tested?

Manually ran each doctest.

@HyukjinKwon
Member Author

cc @viirya and @HeartSaVioR: would you mind taking a look when you find some time?

Comment on lines +811 to +813
>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.format("text")
<pyspark.sql.streaming.readwriter.DataStreamWriter object ...>
Member

Is this redundant? It does not look related to the example below.

Contributor

Is the purpose to show the type of DataStreamWriter?

Member Author

Yeah. It's a separate example. In the documentation, it is rendered like this:
[Screenshot: Screen Shot 2022-08-11 at 12 16 45 PM]
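
For readers wondering why the expected output in the snippet under discussion ends in `...`: the sketch below illustrates how a doctest can match an object repr whose memory address changes on every run. It assumes the doctests are executed with the standard library's `doctest.ELLIPSIS` option enabled, which this thread does not state. The `DataStreamWriter` class here is a hypothetical stand-in so the sketch runs without Spark, and `EXAMPLE` and `run_example` are illustrative names, not part of PySpark.

```python
import doctest


class DataStreamWriter:
    """Hypothetical stand-in for pyspark's DataStreamWriter (no Spark needed)."""

    def format(self, source):
        # Return self so the call echoes an object repr, as in the reviewed snippet.
        self._source = source
        return self


# The expected output uses "..." for the module prefix and the memory address.
EXAMPLE = """
>>> writer = DataStreamWriter()
>>> writer.format("text")
<...DataStreamWriter object ...>
"""


def run_example(optionflags):
    # Parse the docstring-style text into a doctest and run it with the given flags.
    test = doctest.DocTestParser().get_doctest(
        EXAMPLE, {"DataStreamWriter": DataStreamWriter}, "example", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the (failed, attempted) counts are returned.
    return runner.run(test, out=lambda text: None)


with_ellipsis = run_example(doctest.ELLIPSIS)
without_ellipsis = run_example(0)
print(with_ellipsis.failed, without_ellipsis.failed)
```

With `ELLIPSIS` enabled, both statements pass. Without it, the second statement fails because the literal `...` no longer matches the actual repr. This is why such an example can document the returned type while remaining testable.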

Contributor

@HeartSaVioR HeartSaVioR left a comment

Just to make sure I understand correctly, either 1) you've tested these examples manually or 2) these examples will be automatically tested via CI?

@HyukjinKwon
Member Author

either 1) you've tested these examples manually or 2) these examples will be automatically tested via CI?

Both, yes :-).

@HyukjinKwon
Member Author

Merged to master.

@HyukjinKwon
Member Author

I am touching these examples a lot, so all post hoc reviews are much appreciated!

@HyukjinKwon HyukjinKwon deleted the SPARK-40027 branch January 15, 2024 00:49