[SPARK-7467] Dag visualization: treat checkpoint as an RDD operation #6004

Closed
wants to merge 3 commits into from

andrewor14
Contributor

This change treats checkpoint as an RDD operation so that a checkpoint RDD does not end up in an unrelated scope on the UI, e.g. `take`. We've seen this in streaming.

@AmplabJenkins

Merged build triggered.

@andrewor14
Contributor Author

@zsxwing @tdas

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 8, 2015

Test build #32217 has started for PR 6004 at commit 19bc07b.

@SparkQA

SparkQA commented May 8, 2015

Test build #32217 has finished for PR 6004 at commit 19bc07b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32217/
Test PASSed.

@zsxwing
Member

zsxwing commented May 8, 2015

I just tried this PR, but it looks like it does not work.

Test code:

    sc.setCheckpointDir(".")
    val r1 = sc.parallelize(1 to 100)
    r1.checkpoint()
    r1.map(_ + 2).filter(_ < 50).collect()
    r1.map(_ + 2).filter(_ < 50).collect()

I opened the Spark shell and typed the above code. The DAG of job 2 is:
[screenshot: screen shot 2015-05-08 at 10 24 14 am]

The first one is still `collect`.

@andrewor14
Contributor Author

Oops, thanks for the screen shot. It appears that it's still not being captured correctly.

@andrewor14
Contributor Author

By the way I have a simpler reproduction:

    sc.setCheckpointDir("/tmp")
    val rdd = sc.parallelize(1 to 10)
    rdd.checkpoint()
    rdd.take(10)

This reproduces the problem because `take` runs multiple jobs and we do the actual checkpointing at the end of each job.
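
For anyone who wants to confirm the checkpoint timing outside the UI, here is a rough standalone sketch. It is not part of this patch. The object name `CheckpointTimingSketch`, the `local[2]` master, and the temporary checkpoint directory are assumptions made only for illustration; the RDD calls (`checkpoint`, `isCheckpointed`, `getCheckpointFile`, `toDebugString`) are the public RDD API.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper, not from the patch: records whether the RDD is
// checkpointed before and after the action that triggers the job(s).
object CheckpointTimingSketch {
  def run(): (Boolean, Boolean, Seq[Int]) = {
    // Assumption: local master with 2 threads and a throwaway checkpoint dir.
    val conf = new SparkConf().setMaster("local[2]").setAppName("checkpoint-timing-sketch")
    val sc = new SparkContext(conf)
    try {
      sc.setCheckpointDir(Files.createTempDirectory("checkpoint").toString)
      val rdd = sc.parallelize(1 to 10)
      rdd.checkpoint()                   // only marks the RDD for checkpointing
      val before = rdd.isCheckpointed    // read before any job has run
      val taken = rdd.take(10).toSeq     // the action; may run more than one job
      val after = rdd.isCheckpointed     // read after the action has returned
      println(rdd.getCheckpointFile)     // where the checkpoint data was written, if any
      println(rdd.toDebugString)         // lineage as Spark reports it after the action
      (before, after, taken)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (before, after, taken) = run()
    println(s"checkpointed before action: $before, after action: $after, taken: $taken")
  }
}
```

This only shows when the checkpoint is materialized relative to the action. Which scope the checkpoint RDD lands in still has to be checked in the DAG visualization on the UI.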

@andrewor14
Contributor Author

I think I fixed it in the latest commit.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 9, 2015

Test build #32296 has started for PR 6004 at commit 9217439.

@SparkQA

SparkQA commented May 9, 2015

Test build #32296 has finished for PR 6004 at commit 9217439.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HasCheckpointInterval(Params):
    • class ALS(JavaEstimator, HasCheckpointInterval, HasMaxIter, HasPredictionCol, HasRegParam, HasSeed):
    • class ALSModel(JavaModel):

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32296/
Test FAILed.

@andrewor14
Contributor Author

retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 9, 2015

Test build #32308 has started for PR 6004 at commit 9217439.

@SparkQA

SparkQA commented May 9, 2015

Test build #32308 has finished for PR 6004 at commit 9217439.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32308/
Test PASSed.

@andrewor14
Contributor Author

OK, I'm merging this into master and 1.4.

asfgit closed this in f3e8e60 May 12, 2015
asfgit pushed a commit that referenced this pull request May 12, 2015
Such that a checkpoint RDD does not go into random scopes on the UI, e.g. `take`. We've seen this in streaming.

Author: Andrew Or <andrew@databricks.com>

Closes #6004 from andrewor14/dag-viz-checkpoint and squashes the following commits:

9217439 [Andrew Or] Fix checkpoints
4ae8806 [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-viz-checkpoint
19bc07b [Andrew Or] Treat checkpoint as an RDD operation

(cherry picked from commit f3e8e60)
Signed-off-by: Andrew Or <andrew@databricks.com>
andrewor14 deleted the dag-viz-checkpoint branch May 12, 2015 08:46
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015