[SPARK-4072][Core]Display Streaming blocks in Streaming UI #6672
Conversation
cc @tdas

Test build #34281 has finished for PR 6672 at commit

Awesome! Screenshot is looking pretty good. Here are some nitpicky comments.
How will it look if one of the 2 copies of the block falls off the executor?
2x will become 1x. When one of the 2 copies is removed, an update with StorageLevel.NONE will be sent. Then removeBlockFromBlockManager will create a new StorageLevel with replication = 1 for this block.
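To illustrate the bookkeeping described above, here is a minimal sketch with simplified stand-in types (BlockUIData here is a stand-in, not Spark's actual class): when a replica is dropped, the remaining locations determine the new replication count shown in the UI.

```scala
// Sketch only: simplified stand-in for the replication bookkeeping described above.
case class BlockUIData(blockId: String, replication: Int, locations: Set[String])

class BlockTracker {
  private val blocks = scala.collection.mutable.Map[String, BlockUIData]()

  // Called when an executor reports a new (or updated) copy of a block.
  def updateBlock(blockId: String, executorId: String): Unit = {
    val data = blocks.getOrElse(blockId, BlockUIData(blockId, 0, Set.empty))
    val newLocations = data.locations + executorId
    blocks(blockId) = data.copy(replication = newLocations.size, locations = newLocations)
  }

  // Called when an executor reports that it dropped its copy of a block.
  def removeBlockFromBlockManager(blockId: String, executorId: String): Unit = {
    blocks.get(blockId).foreach { data =>
      val newLocations = data.locations - executorId
      if (newLocations.isEmpty) {
        blocks -= blockId // last copy gone: drop the block entirely
      } else {
        // Rebuild the entry with replication = number of remaining copies,
        // so "2x Replicated" becomes "1x" in the UI.
        blocks(blockId) = data.copy(replication = newLocations.size, locations = newLocations)
      }
    }
  }

  def blockInfo(blockId: String): Option[BlockUIData] = blocks.get(blockId)
}
```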
That's cool then. Will take a more detailed look at the code soon.
On Fri, Jun 5, 2015 at 4:36 PM, Shixiong Zhu notifications@github.com wrote:

In core/src/main/scala/org/apache/spark/storage/BlockStatusListener.scala #6672 (comment):

```scala
        useOffHeap = externalBlockStoreSize > 0,
        deserialized = storageLevel.deserialized,
        replication = newLocations.size)
      blocks.put(blockId,
        BlockUIData(blockId, newStorageLevel, memSize, diskSize,
          externalBlockStoreSize, newLocations))
    } else {
      // If isValid is not true, it means we should drop the block.
      blocksInBlockManager -= blockId
      removeBlockFromBlockManager(blockId, blockManagerId)
```

https://github.com/apache/spark/pull/6672/files#r31860010
Test build #34365 has finished for PR 6672 at commit

Test build #34418 has finished for PR 6672 at commit

retest this please

Test build #34437 has finished for PR 6672 at commit
@JoshRosen Can you take a look at this PR, especially the SparkListener updates?
Isn't UpdateBlockInfo private[spark]? Then it's not right for a public class to refer to an internal class that people will not be able to use.
Right. I just added a new developer API class, BlockUpdatedInfo. I also updated JsonProtocol to support the new SparkListenerBlockUpdated.
Test build #35441 has finished for PR 6672 at commit

retest this please

Test build #35451 has finished for PR 6672 at commit

retest this please

Test build #35458 has finished for PR 6672 at commit
I've noticed that you're not persisting the SparkListenerBlockUpdated events in the event logging listener. Since there might be tons of these events, I can understand why we might not want to persist them in the event log, similar to how we don't persist ExecutorMetricsUpdate events. If this is intentional, I think we should add an override and comment to EventLoggingListener to make this more explicit: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L201
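The suggested explicit no-op might look roughly like this. This is a sketch against simplified stand-in types, not the actual SparkListener trait or EventLoggingListener source:

```scala
// Stand-ins modeled loosely on the Spark classes discussed above.
case class BlockUpdate(blockId: String)

// Every handler defaults to a no-op, like SparkListener.
trait Listener {
  def onBlockUpdated(event: BlockUpdate): Unit = {}
  def onOtherEvent(event: Any): Unit = {}
}

class LoggingListener extends Listener {
  val loggedEvents = scala.collection.mutable.Buffer[Any]()

  override def onOtherEvent(event: Any): Unit = loggedEvents += event

  // Explicit no-op with a comment, so readers know the omission is intentional:
  // block updates can be extremely frequent, like ExecutorMetricsUpdate.
  override def onBlockUpdated(event: BlockUpdate): Unit = {}
}
```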
What happens if we get an unknown class name here? That situation might occur if we're parsing a newer event log using an older version of the Spark HistoryServer. This point might be moot if we're not persisting these events to the log (see my earlier comment about EventLoggingListener). If this is safe due to lack of persistence, maybe we should just leave a note here explaining that.
That situation might occur if we're parsing a newer event log using an older version of the Spark HistoryServer.
Good point. But actually, I think JsonProtocol only supports backward compatibility. JsonProtocol.sparkEventFromJson does not handle unknown class names either. Because ReplayListenerBus throws an exception if it fails to parse a JSON event log, it cannot support forward compatibility.
BTW, this will throw scala.MatchError for an unknown class name.
It looks like ReplayListenerBus tries to log exceptions except when it receives an IOException (which might be fatal from its perspective). AFAIK it will try to keep going when encountering unknown events, but I don't know that there's a test for this. I'll investigate.
ReplayListenerBus will log exceptions except IOException, but it also skips the rest of the content in the file. BTW, scala.MatchError is a RuntimeException.
I see that you have changed the data structures to suit the desired table structure. But since the table structure has to change, I guess the data structures have to change. So I will not take a look at that right now.

Test build #36627 has finished for PR 6672 at commit

Test build #36680 has finished for PR 6672 at commit
These public functions can get called at different times, leading to inconsistent values in the UI. Can you define a method that returns both pieces of information in a single synchronized call?
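One way to do that, as a sketch with hypothetical names (SizeTracker and its fields are illustrative, not the actual listener class), is to replace the two separate getters with a single synchronized read that returns both values together:

```scala
// Hypothetical sketch: return both sizes from one synchronized read so
// callers never observe memSize from one update and diskSize from another.
class SizeTracker {
  private var memSize: Long = 0L
  private var diskSize: Long = 0L

  def update(mem: Long, disk: Long): Unit = synchronized {
    memSize = mem
    diskSize = disk
  }

  // Single synchronized call replacing two independent getters.
  def sizes: (Long, Long) = synchronized {
    (memSize, diskSize)
  }
}
```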
Test build #36934 has finished for PR 6672 at commit

Screenshot looks great! We can do it separately, but maybe we should remove the RDD table if there are no RDD blocks cached.
Everything else looks good, but I just realized that there are no UI tests! There should be tests added to StorageTabSuite, shouldn't there?

I agree that we should do unit tests for UI stuff. Let me see how to do some unit tests for

Updated it in this PR.
Test build #37004 has finished for PR 6672 at commit

retest this please

Test build #37009 has finished for PR 6672 at commit

retest this please

Jenkins, test this please.

Test build #37055 has finished for PR 6672 at commit

Jenkins, test this please.

Test build #37072 has finished for PR 6672 at commit

LGTM, merging this in master!



Replaces #6634.

This PR adds SparkListenerBlockUpdated to SparkListener so that it can monitor all block update infos that are sent to BlockManagerMasterEndpoint, and also adds new tables in the Storage tab to display the stream block infos.
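As a sketch of how a consumer might use the new event, here is a simplified stand-in (BlockUpdatedInfo's fields and the listener class are illustrative approximations, not the PR's exact definitions) that tallies stream block sizes per executor, roughly the aggregation the new Storage-tab tables need:

```scala
// Stand-in definitions modeled loosely on the event added by this PR.
case class BlockUpdatedInfo(
    blockManagerId: String, blockId: String, memSize: Long, diskSize: Long)
case class SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo)

// Tallies total block size per executor from the update events.
class StreamBlockListener {
  private val sizes =
    scala.collection.mutable.Map[String, Long]().withDefaultValue(0L)

  def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
    val info = event.blockUpdatedInfo
    sizes(info.blockManagerId) += info.memSize + info.diskSize
  }

  def totalSize(executor: String): Long = sizes(executor)
}
```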