Add caching information to rdd.toDebugString #1535

Closed
nkronenfeld wants to merge 9 commits into apache:master from nkronenfeld:feature/debug-caching2

@nkronenfeld
Contributor

I find it useful to see where in an RDD's DAG data is cached, so I figured others might too.

I've added both the caching level, and the actual memory state of the RDD.

Some of this is redundant with the web UI (notably the actual memory state), but (a) that is temporary, and (b) putting it in the DAG tree shows some context that can help a lot.

For example:

(4) ShuffledRDD[3] at reduceByKey at <console>:14
 +-(4) MappedRDD[2] at map at <console>:14
    |  MapPartitionsRDD[1] at mapPartitions at <console>:12
    |  ParallelCollectionRDD[0] at parallelize at <console>:12

should change to

(4) ShuffledRDD[3] at reduceByKey at <console>:14 [Memory Deserialized 1x Replicated]
 |       CachedPartitions: 4; MemorySize: 50.8 MB; TachyonSize: 0.0 B; DiskSize: 0.0 B
 +-(4) MappedRDD[2] at map at <console>:14 [Memory Deserialized 1x Replicated]
    |  MapPartitionsRDD[1] at mapPartitions at <console>:12 [Memory Deserialized 1x Replicated]
    |      CachedPartitions: 4; MemorySize: 109.1 MB; TachyonSize: 0.0 B; DiskSize: 0.0 B
    |  ParallelCollectionRDD[0] at parallelize at <console>:12 [Memory Deserialized 1x Replicated]
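
To illustrate the bracketed tag in the output above, here is a minimal sketch of how such a description could be assembled and appended to a lineage line. `Level` and `LineageTag.annotate` are hypothetical names made up for this illustration; this is not Spark's `StorageLevel` class and not the code in this patch.

```scala
// Hypothetical stand-in for a storage level (assumption: four flags are enough
// to reproduce the text shown in the example output).
final case class Level(useDisk: Boolean, useMemory: Boolean, deserialized: Boolean, replication: Int) {
  // Builds text such as "Memory Deserialized 1x Replicated".
  def description: String = {
    val media = List(
      if (useDisk) Some("Disk") else None,
      if (useMemory) Some("Memory") else None
    ).flatten
    val form = if (deserialized) "Deserialized" else "Serialized"
    (media ++ List(form, s"${replication}x Replicated")).mkString(" ")
  }
}

object LineageTag {
  // Appends "[description]" when a level is set; otherwise returns the line unchanged.
  def annotate(line: String, level: Option[Level]): String =
    level.fold(line)(l => s"$line [${l.description}]")
}
```

Calling `LineageTag.annotate(line, None)` leaves an uncached entry exactly as it was printed before, so only persisted entries gain the extra text.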

@pwendell
Contributor

Hey, do you mind putting an example of what the output looks like in the PR description?

nkronenfeld referenced this pull request Jul 22, 2014
…es more clear

Changes RDD.toDebugString() to show hierarchy and shuffle transformations more clearly

New output:

```
(3) FlatMappedValuesRDD[325] at apply at Transformer.scala:22
 |  MappedValuesRDD[324] at apply at Transformer.scala:22
 |  CoGroupedRDD[323] at apply at Transformer.scala:22
 +-(5) MappedRDD[320] at apply at Transformer.scala:22
 |  |  MappedRDD[319] at apply at Transformer.scala:22
 |  |  MappedValuesRDD[318] at apply at Transformer.scala:22
 |  |  MapPartitionsRDD[317] at apply at Transformer.scala:22
 |  |  ShuffledRDD[316] at apply at Transformer.scala:22
 |  +-(10) MappedRDD[315] at apply at Transformer.scala:22
 |     |   ParallelCollectionRDD[314] at apply at Transformer.scala:22
 +-(100) MappedRDD[322] at apply at Transformer.scala:22
     |   ParallelCollectionRDD[321] at apply at Transformer.scala:22
```

Author: Gregory Owen <greowen@gmail.com>

Closes #1364 from GregOwen/to-debug-string and squashes the following commits:

08f5c78 [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
1603f7b [Gregory Owen] toDebugString: prettier debug printing to show shuffles and joins more clearly
@nkronenfeld
Contributor Author

Done, and I also left a comment on Greg Owen's PR from yesterday asking him for formatting comments.

@nkronenfeld
Contributor Author

Sorry, forgot to move one small formatting issue over from the old branch, I'll check that in as soon as I test it. [DONE]

@SparkQA

SparkQA commented Jul 22, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16987/consoleFull

@SparkQA

SparkQA commented Jul 22, 2014

QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16987/consoleFull

@pwendell
Contributor

@gowen mind taking a look?

Contributor

s"$partitionStr $desc"
s"$nextPrefix $desc"

Contributor

And elsewhere in this PR, avoid string concatenation with + when string interpolation would be equally clear or clearer.
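
For readers who have not used Scala's `s` interpolator, a small sketch of the two styles side by side. `InterpolationDemo` and its method names are made up for illustration; `partitionStr` and `desc` are borrowed from the suggestion above, and the values passed in are assumptions.

```scala
object InterpolationDemo {
  // Concatenation with +: the separating space has to be spelled out by hand.
  def viaConcat(partitionStr: String, desc: String): String =
    partitionStr + " " + desc

  // s-interpolator: the template reads like the final string.
  def viaInterpolation(partitionStr: String, desc: String): String =
    s"$partitionStr $desc"
}
```

Both methods return the same string; the interpolated form just makes the spacing visible in the template itself.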

@nkronenfeld
Contributor Author

Thanks Mark, I had no idea that existed.

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17005/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17005/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17008/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17008/consoleFull

@nkronenfeld
Contributor Author

I'm not sure what to do about this test failure; all I've changed is toDebugString, and the failure is in a Spark Streaming test which never calls it, so I'm pretty sure it has nothing to do with my change.

@markhamstra
Contributor

Jenkins, test this please

@SparkQA

SparkQA commented Jul 23, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17035/consoleFull

@SparkQA

SparkQA commented Jul 23, 2014

QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17035/consoleFull

Contributor

Hey on this one, this is actually an extremely operation... I wonder if maybe for now it's better to not put this in there and only put the storage level.

Contributor Author

I'm not sure what you mean - do you mean "an extremely costly operation"?

Assuming that to be the case, two comments:

  • I thought about attaching flags to the function so one could specify the type of debug information desired; I think that makes the function too complex, but I'm hardly firm in that idea.
  • This whole function is specifically to help a developer with debugging. I don't think having it be costly is all that bad.

Contributor

Ah sorry, yeah, I mean this is very costly. I'd rather not do this in a debug function - because people will do things like print debug statements inside of loops. In that case the debugging will significantly alter the performance of their application. There is a separate JIRA to make this function faster (it's a function also used in the UI), but until that's fixed I'd rather not call it here:

https://issues.apache.org/jira/browse/SPARK-2316
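
The concern about debug printing inside loops can be sketched as follows. `DebugInLoop`, `Lineage`, and `debugString()` are hypothetical stand-ins for an expensive call (the counter exists only to make the cost visible); none of them is Spark API.

```scala
object DebugInLoop {
  // Hypothetical stand-in for an object whose debug string is expensive to build.
  final class Lineage {
    var calls: Int = 0 // counts how often the expensive call ran
    def debugString(): String = { calls += 1; s"lineage snapshot #$calls" }
  }

  // The pattern being warned about: the expensive call runs on every iteration.
  def logEveryIteration(lineage: Lineage, iterations: Int): Seq[String] =
    (1 to iterations).map(i => s"step $i: ${lineage.debugString()}")

  // One alternative: take a single snapshot before the loop and reuse it.
  def logOnce(lineage: Lineage, iterations: Int): Seq[String] = {
    val snapshot = lineage.debugString()
    (1 to iterations).map(i => s"step $i: $snapshot")
  }
}
```

The trade-off is that a snapshot taken before the loop can go stale if the cached state changes while the loop runs, so this only helps when a single reading is good enough.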

Contributor

BTW - we can create a JIRA to add this back once SPARK-2316 is fixed if you'd like.

@GregOwen
Contributor

Looks good to me.

…e shown or not. Default is for it not to be shown.
@nkronenfeld
Contributor Author

I just parameterized the memory so one can display it or not as desired (with not displaying it the default) - is that sufficient?
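
As a rough sketch of that shape (not the actual patch): a boolean parameter that defaults to `false`, so callers who pass nothing get the shorter output. `DebugStringSketch`, `Node`, and `debugLines` are hypothetical names, and `showMemory` is assumed as the parameter name.

```scala
object DebugStringSketch {
  // Hypothetical node: one lineage line plus optional memory details.
  final case class Node(description: String, memoryInfo: Option[String])

  // showMemory defaults to false, so calls without the argument omit the memory line.
  def debugLines(node: Node, showMemory: Boolean = false): List[String] = {
    val memoryLine =
      if (showMemory) node.memoryInfo.map(info => s" |       $info").toList
      else Nil
    node.description :: memoryLine
  }
}
```

Even with a default value, adding a parameter changes the compiled method signature, which is presumably why the commit list for this PR mentions exclusions for a signature change in toDebugString.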

I forgot to put the note about the JIRA into the code; I'll definitely add that too, or I can back out the optional nature and just leave in the code comment about the JIRA.

Also, while I was at it, I marked this method as DeveloperAPI - it seems to me an oversight that it isn't, but if I'm wrong, or if that should be in a separate PR, let me know, it's trivial to put back, of course.

Let me know which you want, please.

Thanks,
-Nathan

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17302/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17302/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17304/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1535:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17304/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1535. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17310/consoleFull

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1535:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17310/consoleFull

@nkronenfeld
Contributor Author

If I'm reading that correctly, that test failure is from an MLlib change that has nothing to do with what I've done? Perhaps I'll just try it again; maybe it's a bad sync with master:

Jenkins, please test this

@SparkQA

SparkQA commented Jul 29, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17363/consoleFull

@SparkQA

SparkQA commented Jul 29, 2014

QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17363/consoleFull

@pwendell
Contributor

Hey @nkronenfeld - I traced through the exact function call more closely and I actually think it's fine. The issue I pointed out in the JIRA is orthogonal. So I'm fine to just revert this back to always showing the status. However, we should not mark this as a developer API. This is a stable API we are happy to support forever.

Still, this will cause a significant amount of object allocation due to the way other internal function calls happen (it is basically O(all blocks) for an application). It might be nice to add a note to the docs that the operation might be expensive and should not be called inside of a critical code path. Though we could likely optimize those things down the road.

@nkronenfeld
Contributor Author

Thanks, @pwendell. I can revert it back if you want - is that preferable to the way it is now, with the option to include the memory info or not?

I'll start with taking out the DeveloperAPI and adjusting the docs; I'll leave off taking out the optional memory parameter until I hear from you again.

@pwendell
Contributor

yeah to keep it simple let's just always have it show memory. I'd rather not add a new public API for this showMemory thing at the moment.

@SparkQA

SparkQA commented Jul 31, 2014

QA tests have started for PR 1535. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17612/consoleFull

@SparkQA

SparkQA commented Jul 31, 2014

QA results for PR 1535:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17612/consoleFull

@nkronenfeld
Contributor Author

OK, @pwendell, I think it's set now. Let me know if there are merge problems; I can resubmit on a clean branch if necessary.

@pwendell
Contributor

Hey, this looks good. Merging it now into master. Sorry about the delay.

asfgit closed this in fba8ec3 Aug 15, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>

Closes apache#1535 from nkronenfeld/feature/debug-caching2 and squashes the following commits:

40490bc [Nathan Kronenfeld] Back out DeveloperAPI and arguments to RDD.toDebugString, reinstate memory output
794e6a3 [Nathan Kronenfeld] Attempt to merge mima changes from master
6fe9e80 [Nathan Kronenfeld] Add exclusions to allow for signature change in toDebugString (will back out if necessary)
31d6769 [Nathan Kronenfeld] Attempt to get rid of style errors.  Add comments for the new memory usage parameter.
a0f6f76 [Nathan Kronenfeld] Add parameter to RDD.toDebugString to allow detailed memory info to be shown or not.  Default is for it not to be shown.
f8f565a [Nathan Kronenfeld] Fix code style error
8f54287 [Nathan Kronenfeld] Changed string addition to string interpolation as per PR comments
2a0cd4d [Nathan Kronenfeld] Fixed a small formatting issue I forgot to copy over from the old branch
8fbecb6 [Nathan Kronenfeld] Add caching information to rdd.toDebugString