[SPARK-40027][PYTHON][SS][DOCS] Add self-contained examples for pyspark.sql.streaming.readwriter #37461

HyukjinKwon · 2022-08-10T06:30:09Z

What changes were proposed in this pull request?

This PR proposes to improve the examples in pyspark.sql.streaming.readwriter by making each one self-contained, adding a brief explanation, and using a somewhat more realistic scenario.

Why are the changes needed?

To make the documentation more readable, and to let users copy and paste the examples directly into the PySpark shell.

Does this PR introduce any user-facing change?

Yes, it changes the documentation.

How was this patch tested?

Manually ran each doctest.

HyukjinKwon · 2022-08-10T06:30:24Z

cc @viirya and @HeartSaVioR mind taking a look please when you find some time?

viirya · 2022-08-10T06:47:04Z

python/pyspark/sql/streaming/readwriter.py

+        >>> df = spark.readStream.format("rate").load()
+        >>> df.writeStream.format("text")
+        <pyspark.sql.streaming.readwriter.DataStreamWriter object ...>


Is this redundant? It does not look related to the example below.

zhengruifeng · 2022-08-10

Is the purpose to show the type of DataStreamWriter?

HyukjinKwon · 2022-08-11

Yeah. It's a separate example. In the documentation, it shows like:

HeartSaVioR

Just to make sure I understand correctly, either 1) you've tested these examples manually or 2) these examples will be automatically tested via CI?

HyukjinKwon · 2022-08-11T01:46:02Z

either 1) you've tested these examples manually or 2) these examples will be automatically tested via CI?

Both, yes :-).
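
A minimal sketch of how a docstring example like the one under review can be checked automatically with Python's standard doctest module. FakeStreamWriter and check_docstring are hypothetical names made up for illustration, and this is not a description of how Spark's CI actually invokes its doctests. The point it shows is that the ELLIPSIS option lets "..." in the expected output stand in for text that varies between runs, such as a memory address.

```python
import doctest


class FakeStreamWriter:
    """Hypothetical stand-in for a writer object; not a PySpark class."""

    def format(self, source):
        """Record the output format and return the writer for chaining.

        >>> FakeStreamWriter().format("text")
        <...FakeStreamWriter object ...>
        """
        self.source = source
        return self


def check_docstring(func, namespace):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, dict(namespace), func.__name__, None, 0)
    # ELLIPSIS lets "..." in the expected output match any text, such as
    # the module prefix and the memory address in a default object repr.
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    # Failure reports are discarded here; only the counts are returned.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted


failed, attempted = check_docstring(
    FakeStreamWriter.format, {"FakeStreamWriter": FakeStreamWriter}
)
print(f"{attempted} example(s) run, {failed} failed")
```

Without the ELLIPSIS flag the same example would fail, because the literal "..." would be compared against the real repr text.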

HyukjinKwon · 2022-08-11T11:19:03Z

Merged to master.

HyukjinKwon · 2022-08-11T11:19:24Z

I am touching these examples a lot, so any post hoc reviews are much appreciated!

Add self-contained examples for SS IO

e4daee1

HyukjinKwon force-pushed the SPARK-40027 branch from c916bd4 to e4daee1 Compare August 10, 2022 06:31

github-actions bot added CORE PYTHON SQL STRUCTURED STREAMING labels Aug 10, 2022

viirya reviewed Aug 10, 2022

HeartSaVioR reviewed Aug 10, 2022

Cleanup and address comments

aef7c63

zhengruifeng approved these changes Aug 11, 2022

HyukjinKwon closed this in 7179241 Aug 11, 2022

HyukjinKwon deleted the SPARK-40027 branch January 15, 2024 00:49

viirya Aug 10, 2022

zhengruifeng Aug 10, 2022

HyukjinKwon Aug 11, 2022
