-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-15933][SQL][STREAMING] Refactored DF reader-writer to use readStream and writeStream for streaming DFs #13653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| import org.apache.spark.sql.execution.streaming.StreamingRelation | ||
| import org.apache.spark.sql.types.StructType | ||
|
|
||
| @Experimental |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add docs.
|
Test build #60452 has finished for PR 13653 at commit
|
|
Test build #60460 has finished for PR 13653 at commit
|
|
Test build #60461 has finished for PR 13653 at commit
|
|
Overall looks pretty good. Feel free to merge after addressing comments / passing tests to avoid more conflicts. |
|
@marmbrus Overall, I have changed the following.
|
|
Test build #60516 has finished for PR 13653 at commit
|
|
Test build #60519 has finished for PR 13653 at commit
|
| self._jwrite.mode(mode).jdbc(url, table, jprop) | ||
|
|
||
|
|
||
| class DataStreamReader(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add new classes to __all__ = ["DataFrameReader", "DataFrameWriter"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will fix this in the follow up pr #13673
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in #13673
…Stream and writeStream for streaming DFs ## What changes were proposed in this pull request? Currently, the DataFrameReader/Writer has method that are needed for streaming and non-streaming DFs. This is quite awkward because each method in them through runtime exception for one case or the other. So rather having half the methods throw runtime exceptions, its just better to have a different reader/writer API for streams. - [x] Python API!! ## How was this patch tested? Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #13653 from tdas/SPARK-15933. (cherry picked from commit 214adb1) Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
What changes were proposed in this pull request?
Currently, the DataFrameReader/Writer has method that are needed for streaming and non-streaming DFs. This is quite awkward because each method in them through runtime exception for one case or the other. So rather having half the methods throw runtime exceptions, its just better to have a different reader/writer API for streams.
How was this patch tested?
Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.