Skip to content

[SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.#14087

Closed
ScrapCodes wants to merge 2 commits intoapache:masterfrom
ScrapCodes:textFile
Closed

[SPARK-16411][SQL][STREAMING] Add textFile to Structured Streaming.#14087
ScrapCodes wants to merge 2 commits intoapache:masterfrom
ScrapCodes:textFile

Conversation

@ScrapCodes
Copy link
Member

What changes were proposed in this pull request?

Adds the textFile API which exists in DataFrameReader and serves same purpose.

How was this patch tested?

Added corresponding testcase.

@SparkQA
Copy link

SparkQA commented Jul 7, 2016

Test build #61908 has finished for PR 14087 at commit ac82232.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a text file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No actually it should be text files.

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61976 has finished for PR 14087 at commit 0c76ef9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ScrapCodes
Copy link
Member Author

@marmbrus Do you think this is useful ?

@marmbrus
Copy link
Contributor

/cc @tdas

@ScrapCodes
Copy link
Member Author

@tdas Ping !

@SparkQA
Copy link

SparkQA commented Oct 3, 2016

Test build #66261 has finished for PR 14087 at commit 1c43ffa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems unrelated and takes us out of sync with the batch version. I don't think this means a JVM interface, but rather the interface in API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks for correcting !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should text files be plural here? The api would be more intuitive by copying the non-streaming equivalent with a vararg-method for multiple parameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to be corrected, as I just followed the convention over here. Since this class does not have any vararg method for other APIs, I was doubtful in adding one myself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be weird to add var args, since the streaming case would always be to watch a directory (not list a bunch of files). I think its fine to leave it out for now.

This is existing, but its a little odd that the methods in this file talk about loading files rather than watching directories of files and processing them as they appear.

@SparkQA
Copy link

SparkQA commented Oct 4, 2016

Test build #66303 has finished for PR 14087 at commit 25dfd09.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 5, 2016

Test build #66376 has finished for PR 14087 at commit ecdf653.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("read from textfile") {
withTempDirs { case (src, tmp) =>
val textStream = spark.readStream.textFile(src.getCanonicalPath)
val filtered = textStream.filter($"value" contains "keep")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment. I'd use the typed API here since that is the whole point of textFile vs text.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it. :)

@SparkQA
Copy link

SparkQA commented Oct 7, 2016

Test build #66498 has finished for PR 14087 at commit 867394d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Oct 7, 2016

Thanks, merging to master.

@asfgit asfgit closed this in bb1aaf2 Oct 7, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

Adds the textFile API which exists in DataFrameReader and serves same purpose.

## How was this patch tested?

Added corresponding testcase.

Author: Prashant Sharma <prashsh1@in.ibm.com>

Closes apache#14087 from ScrapCodes/textFile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants