
[SPARK-3916] [Streaming] discover new appended data for fileStream() #2806

Closed
wants to merge 2 commits

Conversation

davies (Contributor) commented Oct 14, 2014

In cases where new data is continuously appended to existing files, fileStream() should discover the newly appended data. This patch brings that ability to fileStream().

In order to get an RDD based on partial data of a file, this patch adds a private partialHadoopRDD() API.

cc @tdas
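
For context, here is a minimal illustrative sketch (not the actual patch) of the high-level idea: remember the last-processed length of each file between batches, and when a file grows, process only the newly appended byte range. The AppendTracker name is hypothetical, and the real patch's partialHadoopRDD() signature may differ.

```scala
// Illustrative sketch only (hypothetical names): track how much of each file
// has already been processed, and compute the byte range of data appended
// since the previous batch.
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable

class AppendTracker(fs: FileSystem) {
  // Last processed length per file, keyed by path string.
  private val processedLengths = mutable.Map[String, Long]()

  /** Returns the (start, end) byte offsets appended since the last batch,
    * or None if the file has not grown. */
  def newRange(path: Path): Option[(Long, Long)] = {
    val currentLen  = fs.getFileStatus(path).getLen
    val previousLen = processedLengths.getOrElse(path.toString, 0L)
    if (currentLen > previousLen) {
      processedLengths(path.toString) = currentLen
      Some((previousLen, currentLen)) // only the newly appended bytes
    } else {
      None
    }
  }
}
```

In each batch the stream could then build an RDD covering only that byte range (via something like the private partialHadoopRDD() mentioned above), instead of re-reading whole files.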

SparkQA commented Oct 14, 2014

QA tests have started for PR 2806 at commit 05ad755.

  • This patch merges cleanly.

SparkQA commented Oct 14, 2014

QA tests have finished for PR 2806 at commit 05ad755.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CustomPathFilter(maxModTime: Long)

AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21736/

SparkQA commented Oct 14, 2014

QA tests have started for PR 2806 at commit 09561e8.

  • This patch merges cleanly.

SparkQA commented Oct 14, 2014

QA tests have finished for PR 2806 at commit 09561e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CustomPathFilter(maxModTime: Long)

AmplabJenkins commented:

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21739/

SparkQA commented Oct 17, 2014

QA tests have started for PR 2806 at commit 09561e8.

  • This patch merges cleanly.

SparkQA commented Oct 17, 2014

Tests timed out for PR 2806 at commit 09561e8 after a configured wait of 120m.

SparkQA commented Oct 17, 2014

QA tests have started for PR 2806 at commit 09561e8.

  • This patch merges cleanly.

SparkQA commented Oct 17, 2014

QA tests have finished for PR 2806 at commit 09561e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CustomPathFilter(maxModTime: Long)

davies (Contributor, Author) commented Oct 17, 2014

@tdas Could you help review this? The failing tests run stably locally; I'm investigating it.

SparkQA commented Oct 18, 2014

QA tests have started for PR 2806 at commit 09561e8.

  • This patch merges cleanly.

SparkQA commented Oct 18, 2014

QA tests have finished for PR 2806 at commit 09561e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class CustomPathFilter(maxModTime: Long)

tdas (Contributor) commented Nov 11, 2014

@davies this is a significant PR. Let's talk about it after the 1.2 rush is over.

tdas (Contributor) commented Dec 27, 2014

There has been significant refactoring done in the FileInputStream. Can you update the PR accordingly?

tdas (Contributor) commented Dec 27, 2014

Also, I took a quick look at the PR. It seems a little complicated to understand just by looking at the code, so could you write a short design doc (or update the PR description) on the high-level technique used to implement this? It does not have to be very detailed, just enough for anyone to understand the logic and then verify it in the code.

tdas (Contributor) commented Mar 23, 2015

Since we are not working on this feature right now, would you mind closing this? We can reopen it when we want to work on it.

davies closed this Mar 24, 2015