Conversation

tdas (Contributor) commented Aug 21, 2015

The current code only checks checkpoint files in local filesystem, and always tries to create a new Python SparkContext (even if one already exists). The solution is to do the following:

  1. Use the same code path as Java to check whether a valid checkpoint exists
  2. Create a new Python SparkContext only if there is no active one.

There is no test for this code path, as it is hard to test distributed filesystem paths in a local unit test. I am going to test it manually with a distributed file system to verify that this patch works.
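The two steps above can be sketched in plain Python. This is a minimal, Spark-free illustration of the recovery logic, not the patch itself: `read_checkpoint`, `setup_func`, and `active_context` are hypothetical stand-ins for the JVM-side checkpoint reader, the user's context factory, and the active-SparkContext registry the patch consults.

```python
def get_or_create(checkpoint_path, setup_func, read_checkpoint, active_context=None):
    """Return a context, recovering from a checkpoint when one exists.

    Hypothetical sketch of the getOrCreate flow described in this PR.
    """
    # Step 1: delegate the lookup to the (Java-side) checkpoint reader, which
    # understands non-local filesystems such as HDFS or S3 -- a plain
    # local-filesystem check would miss checkpoints stored there.
    checkpoint = read_checkpoint(checkpoint_path)
    if checkpoint is None:
        # No valid checkpoint: build everything fresh via the user callback.
        return setup_func()
    # Step 2: reuse an already-active SparkContext if there is one; only
    # create a new context when none is active.
    sc = active_context if active_context is not None else "new-context"
    return {"sc": sc, "recovered_from": checkpoint}
```

With no checkpoint found, the user's setup function runs; with a checkpoint and an active context, the active context is reused rather than replaced.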

SparkQA commented Aug 21, 2015

Test build #41390 has finished for PR 8366 at commit 9bf151b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas tdas changed the title [SPARK-10142][STREAMING] Made python checkpoint recovery use same codepath as Java to find the checkpoint files [SPARK-10142][STREAMING] Made python checkpoint recovery handle non-local checkpoint paths and existing clusters Aug 22, 2015
@tdas tdas changed the title [SPARK-10142][STREAMING] Made python checkpoint recovery handle non-local checkpoint paths and existing clusters [SPARK-10142][STREAMING] Made python checkpoint recovery handle non-local checkpoint paths and existing SparkContexts Aug 22, 2015
SparkQA commented Aug 22, 2015

Test build #41400 has finished for PR 8366 at commit 2dd4ae5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tdas (Contributor, Author) commented Aug 22, 2015

@zsxwing Please take a look?

SparkQA commented Aug 22, 2015

Test build #41401 has finished for PR 8366 at commit 3afa666.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zsxwing (Member) commented Aug 22, 2015

LGTM

tdas (Contributor, Author) commented Aug 24, 2015

I tested the non-local path support with S3 myself. I am merging this to master and 1.5. Thanks @zsxwing for checking this out.

asfgit pushed a commit that referenced this pull request Aug 24, 2015
…local checkpoint paths and existing SparkContexts

The current code only checks checkpoint files in local filesystem, and always tries to create a new Python SparkContext (even if one already exists). The solution is to do the following:
1. Use the same code path as Java to check whether a valid checkpoint exists
2. Create a new Python SparkContext only if there is no active one.

There is no test for this code path, as it is hard to test distributed filesystem paths in a local unit test. I am going to test it manually with a distributed file system to verify that this patch works.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8366 from tdas/SPARK-10142 and squashes the following commits:

3afa666 [Tathagata Das] Added tests
2dd4ae5 [Tathagata Das] Added the check to not create a context if one already exists
9bf151b [Tathagata Das] Made python checkpoint recovery use java to find the checkpoint files

(cherry picked from commit 053d94f)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in 053d94f Aug 24, 2015