@tdas
Contributor

@tdas tdas commented Jun 13, 2016

What changes were proposed in this pull request?

Currently, the DataFrameReader/Writer has methods that are needed for both streaming and non-streaming DFs. This is quite awkward because each method in them throws a runtime exception for one case or the other. So rather than having half the methods throw runtime exceptions, it's better to have a different reader/writer API for streams.

  • Python API!!
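
To make the motivation above concrete, here is a minimal sketch in plain Python. It does not use Spark; `UnifiedReader`, `BatchReader`, and `StreamReader` are hypothetical toy classes that only mimic the shape of the problem (one class whose methods fail at runtime for the wrong mode) and of the proposed split (one class per mode, as with DataFrameReader and DataStreamReader).

```python
class UnifiedReader:
    """Toy stand-in for one reader shared by batch and streaming (hypothetical)."""

    def __init__(self, streaming):
        self.streaming = streaming

    def load(self):
        # Batch-only entry point: misuse is only detected when the call runs.
        if self.streaming:
            raise RuntimeError("load() is not supported on a streaming reader")
        return "batch DataFrame"

    def stream(self):
        # Streaming-only entry point: same problem in the other direction.
        if not self.streaming:
            raise RuntimeError("stream() is not supported on a batch reader")
        return "streaming DataFrame"


class BatchReader:
    """Toy batch-only reader: every method it exposes is valid for batch."""

    def load(self):
        return "batch DataFrame"


class StreamReader:
    """Toy streaming-only reader: every method it exposes is valid for streams."""

    def load(self):
        return "streaming DataFrame"


# With the unified class, the wrong call fails only at runtime.
try:
    UnifiedReader(streaming=True).load()
    unified_failed = False
except RuntimeError:
    unified_failed = True

# With the split classes, there is no invalid method to call by accident.
batch_result = BatchReader().load()
stream_result = StreamReader().load()
```

In the split version, a method that makes no sense for streams is simply absent from the streaming class, so the mistake shows up as a missing attribute rather than as an exception buried in a method body.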

How was this patch tested?

Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.

import org.apache.spark.sql.execution.streaming.StreamingRelation
import org.apache.spark.sql.types.StructType

@Experimental
Contributor Author

add docs.

@SparkQA

SparkQA commented Jun 14, 2016

Test build #60452 has finished for PR 13653 at commit a59498b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 14, 2016

Test build #60460 has finished for PR 13653 at commit e118631.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DataStreamReader(object):
    • class DataStreamWriter(object):

@SparkQA

SparkQA commented Jun 14, 2016

Test build #60461 has finished for PR 13653 at commit bbfff70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

Overall looks pretty good. Feel free to merge after addressing comments / passing tests to avoid more conflicts.

@tdas
Contributor Author

tdas commented Jun 14, 2016

@marmbrus Overall, I have changed the following.

  • Renamed writeStream.save() to writeStream.start()
  • Refactored writeStream.foreach() to not start the query. Instead, writeStream.foreach().start() starts the query.
  • Removed writeStream.parquet(path) for now. It's good to have a single method, start(), actually start the background query. Experience says that this makes the code much easier to read and debug.
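
The "configure first, start explicitly" pattern described in the list above can be sketched with a toy builder. This is plain Python, not Spark code: `ToyStreamWriter`, `ToyQuery`, and `STARTED_QUERIES` are hypothetical names used only for illustration. The point is that `format()`, `option()`, and `foreach()` only record configuration, and `start()` is the single place where a query is launched.

```python
from dataclasses import dataclass, field

# Every query that has actually been launched, so the effect of start() is observable.
STARTED_QUERIES = []


@dataclass
class ToyQuery:
    """Hypothetical handle returned once a query has been started."""
    sink: str
    options: dict = field(default_factory=dict)


class ToyStreamWriter:
    """Hypothetical builder: configuration methods never start anything."""

    def __init__(self):
        self._sink = None
        self._options = {}

    def format(self, sink):
        self._sink = sink
        return self

    def option(self, key, value):
        self._options[key] = value
        return self

    def foreach(self, fn):
        # Only records the callback; the query is not started here.
        self._sink = "foreach"
        self._options["fn"] = fn
        return self

    def start(self):
        # The single method that launches the (toy) background query.
        if self._sink is None:
            raise ValueError("no sink configured")
        query = ToyQuery(self._sink, dict(self._options))
        STARTED_QUERIES.append(query)
        return query


writer = ToyStreamWriter().foreach(lambda row: None)
started_before = len(STARTED_QUERIES)  # nothing has been launched yet
query = writer.start()                 # the query is launched only here
```

With this shape, a reader looking for where a query begins only has to search for `start()`, which is the readability and debuggability argument made above.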

@SparkQA

SparkQA commented Jun 14, 2016

Test build #60516 has finished for PR 13653 at commit 536c25e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 14, 2016

Test build #60519 has finished for PR 13653 at commit 29ca23b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

self._jwrite.mode(mode).jdbc(url, table, jprop)


class DataStreamReader(object):
Member

nit: add new classes to __all__ = ["DataFrameReader", "DataFrameWriter"]
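
For context on why this matters, here is a small self-contained sketch of how `__all__` controls what `from module import *` exposes. It builds a throwaway module named `toy_readwriter` (a hypothetical name, not the real PySpark module) with four empty placeholder classes, and `star_exports` is a helper invented for this illustration.

```python
import sys
import types

# Source for a throwaway module: four classes, but only two listed in __all__.
SOURCE = '''
__all__ = ["DataFrameReader", "DataFrameWriter"]

class DataFrameReader(object): pass
class DataFrameWriter(object): pass
class DataStreamReader(object): pass
class DataStreamWriter(object): pass
'''

toy = types.ModuleType("toy_readwriter")
exec(SOURCE, toy.__dict__)
sys.modules["toy_readwriter"] = toy  # make it importable by name


def star_exports(module_name):
    """Return the public names that `from <module_name> import *` brings in."""
    namespace = {}
    exec("from {} import *".format(module_name), namespace)
    # Drop dunder entries such as __builtins__ that exec adds to the namespace.
    return sorted(name for name in namespace if not name.startswith("__"))


exports_before = star_exports("toy_readwriter")

# Adding the new classes to __all__ makes them visible to star imports.
toy.__all__ += ["DataStreamReader", "DataStreamWriter"]
exports_after = star_exports("toy_readwriter")
```

Before the change, the star import sees only the two DataFrame classes; after extending `__all__`, all four are exported.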

Contributor Author

I will fix this in the follow-up PR #13673.

Contributor Author

Done in #13673

@tdas
Contributor Author

tdas commented Jun 15, 2016

I am merging this to master and 2.0 for now, to unblock #13673. Please keep reviewing this, and I will address any further comments in #13673.

asfgit pushed a commit that referenced this pull request Jun 15, 2016
…Stream and writeStream for streaming DFs

## What changes were proposed in this pull request?
Currently, the DataFrameReader/Writer has methods that are needed for both streaming and non-streaming DFs. This is quite awkward because each method in them throws a runtime exception for one case or the other. So rather than having half the methods throw runtime exceptions, it's better to have a different reader/writer API for streams.

- [x] Python API!!

## How was this patch tested?
Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13653 from tdas/SPARK-15933.

(cherry picked from commit 214adb1)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in 214adb1 Jun 15, 2016