New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20647][core] Port StorageTab to the new UI backend. #19679
Conversation
This required adding information about StreamBlockId to the store, which is not available yet via the API. So an internal type was added until there's a need to expose that information in the API. The UI only lists RDDs that have cached partitions, and that information wasn't being correctly captured in the listener, so that's also fixed, along with some minor (internal) API adjustments so that the UI can get the correct data. Because of the way partitions are cached, some optimizations w.r.t. how often the data is flushed to the store could not be applied to this code; because of that, some different ways to make the code more performant were added to the data structures tracking RDD blocks, with the goal of avoiding expensive copies when lots of blocks are being updated. Tested with existing and updated unit tests.
For context:
|
Note that unit tests will fail; they depend on changes being added in #19678. I'm posting this to speed up the review process. |
Test build #83509 has finished for PR 19679 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm (at least, as much I can say now till the other parts are merged in and tests pass etc.)
I still need to make sure I understand StreamBlocks better before I would merge, but I do think you've got the same behavior as before
@@ -685,12 +700,55 @@ class AppStatusListenerSuite extends SparkFunSuite with BeforeAndAfter { | |||
assert(exec.info.diskUsed === 0L) | |||
} | |||
|
|||
// Add a block from a different RDD. Verify the executor is updated correctly and also that | |||
// the distribution data for rdd1 is up to date. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're only checking the distribution data for rdd2, right?
I figured out what is going on with StreamBlocks now, both before and after this change, so I'm OK with it |
Test build #83560 has finished for PR 19679 at commit
|
Test build #83604 has finished for PR 19679 at commit
|
Test build #83619 has finished for PR 19679 at commit
|
merged to master |
This required adding information about StreamBlockId to the store,
which is not available yet via the API. So an internal type was added
until there's a need to expose that information in the API.
The UI only lists RDDs that have cached partitions, and that information
wasn't being correctly captured in the listener, so that's also fixed,
along with some minor (internal) API adjustments so that the UI can
get the correct data.
Because of the way partitions are cached, some optimizations w.r.t. how
often the data is flushed to the store could not be applied to this code;
because of that, some different ways to make the code more performant
were added to the data structures tracking RDD blocks, with the goal of
avoiding expensive copies when lots of blocks are being updated.
Tested with existing and updated unit tests.