[SPARK-3916] [Streaming] discover new appended data for fileStream() #2806

davies · 2014-10-14T21:04:47Z

When new data is continuously appended to existing files, fileStream() should discover the newly appended data. This patch adds that ability to fileStream().
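Here is a minimal sketch of the bookkeeping this implies. It assumes (this is not taken from the patch) that the stream remembers how many bytes of each file earlier batches have consumed, and that each batch emits the byte range between that offset and the file's current length. `FileSegment` and `AppendTracker` are hypothetical names. File lengths are passed in as a plain map rather than read from a Hadoop `FileSystem`.

```scala
import scala.collection.mutable

// Hypothetical: a byte range [start, end) of one file that a batch should read.
final case class FileSegment(path: String, start: Long, end: Long)

// Hypothetical helper (not Spark API): remembers, per file, how many bytes
// earlier batches have already consumed.
final class AppendTracker {
  private val processedUpTo = mutable.Map.empty[String, Long]

  // Given the current length of each file, return the newly appended
  // segments (sorted by path) and advance the remembered offsets.
  def nextBatch(currentLengths: Map[String, Long]): Seq[FileSegment] = {
    val segments = currentLengths.toSeq.sortBy(_._1).flatMap { case (path, len) =>
      val start = processedUpTo.getOrElse(path, 0L)
      // Files that have not grown (or have shrunk) contribute nothing here.
      if (len > start) Some(FileSegment(path, start, len)) else None
    }
    segments.foreach(s => processedUpTo.update(s.path, s.end))
    segments
  }
}
```

A real implementation would also have to decide what to do in two cases this sketch ignores. The first is a segment that ends in the middle of a record, such as a half-written line. The second is a file that shrinks or is replaced; the sketch simply skips any file whose length has not grown.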

To build an RDD from only part of a file, this patch adds a private partialHadoopRDD() API.
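Hadoop's `FileSplit` already describes input as a path plus a start offset and a length, so one plausible shape for a partial-file RDD is one split per appended byte range. The sketch below shows only the underlying read of bytes `[start, end)` from a local file, using `java.io.RandomAccessFile`. `PartialRead.readRange` is a hypothetical helper, not the signature of the PR's `partialHadoopRDD()`.

```scala
import java.io.RandomAccessFile
import java.nio.file.Path

object PartialRead {
  // Hypothetical helper (not the PR's partialHadoopRDD): read bytes
  // [start, end) of a local file, clamping `end` to the file's current
  // length so a file that shrank does not cause a read past EOF.
  def readRange(path: Path, start: Long, end: Long): Array[Byte] = {
    require(start >= 0 && start <= end, s"invalid range [$start, $end)")
    val raf = new RandomAccessFile(path.toFile, "r")
    try {
      val len = math.max(0L, math.min(end, raf.length()) - start)
      val buf = new Array[Byte](len.toInt) // assumes one range fits in memory
      raf.seek(start)
      raf.readFully(buf)
      buf
    } finally raf.close()
  }
}
```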

cc @tdas

SparkQA · 2014-10-14T21:09:45Z

QA tests have started for PR 2806 at commit 05ad755.

This patch merges cleanly.

SparkQA · 2014-10-14T21:10:48Z

QA tests have finished for PR 2806 at commit 05ad755.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class CustomPathFilter(maxModTime: Long)
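The bot only reports the signature `CustomPathFilter(maxModTime: Long)`. Below is a minimal sketch of a modification-time window filter. It assumes the filter accepts files modified after the previous batch's cutoff and no later than `maxModTime`. The lower bound, `FileInfo`, and `ModTimeWindowFilter` are all assumptions. A real Hadoop `org.apache.hadoop.fs.PathFilter` receives only a `Path` in `accept`, so it would have to look up the modification time itself, for example through `FileSystem.getFileStatus`.

```scala
// Hypothetical stand-in for a file listing entry; a real filter would get
// this from Hadoop's FileStatus (getPath, getModificationTime).
final case class FileInfo(path: String, modTime: Long)

// Hypothetical sketch of a modification-time window filter: accept files
// modified after the previous cutoff and no later than maxModTime.
final class ModTimeWindowFilter(prevModTime: Long, maxModTime: Long) {
  require(prevModTime <= maxModTime, "window must not be inverted")

  def accept(f: FileInfo): Boolean =
    f.modTime > prevModTime && f.modTime <= maxModTime
}
```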

AmplabJenkins · 2014-10-14T21:10:49Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21736/
Test FAILed.

SparkQA · 2014-10-14T21:24:43Z

QA tests have started for PR 2806 at commit 09561e8.

This patch merges cleanly.

SparkQA · 2014-10-14T22:13:50Z

QA tests have finished for PR 2806 at commit 09561e8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class CustomPathFilter(maxModTime: Long)

AmplabJenkins · 2014-10-14T22:13:53Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21739/
Test FAILed.

SparkQA · 2014-10-17T05:53:45Z

QA tests have started for PR 2806 at commit 09561e8.

This patch merges cleanly.

SparkQA · 2014-10-17T07:53:46Z

Tests timed out for PR 2806 at commit 09561e8 after a configured wait of 120m.

SparkQA · 2014-10-17T08:17:37Z

QA tests have started for PR 2806 at commit 09561e8.

This patch merges cleanly.

SparkQA · 2014-10-17T09:08:32Z

QA tests have finished for PR 2806 at commit 09561e8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class CustomPathFilter(maxModTime: Long)

davies · 2014-10-17T21:22:54Z

@tdas Could you help review this? The failing tests pass reliably when I run them locally; I'm investigating.

SparkQA · 2014-10-18T06:06:33Z

QA tests have started for PR 2806 at commit 09561e8.

This patch merges cleanly.

SparkQA · 2014-10-18T07:01:46Z

QA tests have finished for PR 2806 at commit 09561e8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class CustomPathFilter(maxModTime: Long)

tdas · 2014-11-11T11:27:45Z

@davies this is a significant PR. Let's talk about it after the 1.2 rush is over.

tdas · 2014-12-27T00:09:56Z

There has been significant refactoring done in the FileInputStream. Can you update the PR accordingly?

tdas · 2014-12-27T00:11:56Z

Also, I took a quick look at the PR. It seems a little complicated to understand just by reading the code, so could you write a short design doc (or update the PR description) describing the high-level technique used to implement this? It does not have to be very detailed, just enough for anyone to understand the logic and then verify it in the code.

tdas · 2015-03-23T23:57:25Z

Since we are not working on this feature right now, mind closing this? We can reopen it when we want to work on it.

discover new appended data for fileStream()

05ad755

fix scalastyle and newFilesOnly

09561e8

davies closed this Mar 24, 2015
