[SPARK-25719][UI] : Search functionality in datatables in stages page should search over formatted data rather than the raw data #24419
…uld search over formatted data rather than the raw data The pull request that added datatables to the stages page (SPARK-21809) has been merged. The search in those datatables is a great improvement for looking through a large number of tasks, but it searches the raw data rather than the formatted data displayed in the tables. It would be great if the search covered the formatted data as well. Added code to enable searching over the data as displayed in the tables, e.g. "165.7 MiB" or "0.3 ms".
ok to test
I have also added search on the Shuffle Read Bytes and Shuffle Remote Reads columns, which were somehow missed earlier.
def formatBytes(bytes: Long): String = {
if (bytes == 0) {
return "0.0 B"
}
nit, indentation.
Done. Thank you.
|| containsValue(task.taskMetrics.get.shuffleReadMetrics.fetchWaitTime)
|| containsValue(UIUtils.formatDuration(
  task.taskMetrics.get.shuffleReadMetrics.fetchWaitTime))
|| containsValue(UIUtils.formatBytes(
This also adds new fields to the search. No big deal, but it should probably be mentioned in the PR description.
Done
Test build #104762 has finished for PR 24419 at commit
test this please
Test build #104764 has finished for PR 24419 at commit
Test build #104813 has finished for PR 24419 at commit
@@ -145,6 +145,16 @@ private[spark] object UIUtils extends Logging {
}
}

def formatBytes(bytes: Long): String = {
FYI, there is Utils.bytesToString already, but it uses decimal units. In this case it makes sense to format in binary units, since you are trying to match what the UI shows for the search. You should probably add a comment above the method to state its purpose. How about renaming it to formatBytesBinary?
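To make the binary-unit suggestion concrete, here is a minimal sketch of such a formatter. It is not the code from this PR: the object name `ByteFormatSketch`, the unit list, and the loop-based scaling are assumptions made for illustration. Only the name `formatBytesBinary` and the `"0.0 B"` result for zero come from the discussion above.

```scala
import java.util.Locale

// Illustrative sketch only; not the implementation merged in this PR.
object ByteFormatSketch {
  // Binary (1024-based) unit labels, assumed for this example.
  private val units = Array("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB")

  // Formats a non-negative byte count with one decimal place in binary units.
  def formatBytesBinary(bytes: Long): String = {
    require(bytes >= 0, "byte count must be non-negative")
    var value = bytes.toDouble
    var i = 0
    // Divide by 1024 until the value fits the current unit or units run out.
    while (value >= 1024 && i < units.length - 1) {
      value /= 1024
      i += 1
    }
    // Locale.ROOT keeps the decimal separator a dot regardless of JVM locale.
    "%.1f %s".formatLocal(Locale.ROOT, value, units(i))
  }
}
```

A search term typed in the UI, such as "165.7 MiB", can only match if the backend produces the same string as the table does, which is why the unit base and the number of decimals matter here.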
Makes sense, have updated the code. Thank you.
Test build #105294 has finished for PR 24419 at commit
…RK-25719 [SPARK-25719] : Upmerging with master branch
Test build #105315 has finished for PR 24419 at commit
Test build #105343 has finished for PR 24419 at commit
👍 thanks @pgandhi999
Merged to master
The pull request that added datatables to the stages page (SPARK-21809) has been merged. The search in those datatables is a great improvement for looking through a large number of tasks, but it searches the raw data rather than the formatted data displayed in the tables. It would be great if the search covered the formatted data as well.
What changes were proposed in this pull request?
Added code to enable searching over the data as displayed in the tables, e.g. searching for "165.7 MiB" or "0.3 ms" now returns matching results. The search was also missing two columns of the task table, "Shuffle Read Bytes" and "Shuffle Remote Reads", which are added here.
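The idea of matching a search term against both the raw and the displayed value can be sketched as follows. Everything here is hypothetical and simplified: `TaskRow`, `displayDuration`, and `rowMatches` are names invented for this example, and the duration formatting is a stand-in rather than the formatting the Spark UI actually uses.

```scala
import java.util.Locale

// Illustrative sketch only; names and formatting rules are assumptions.
object FormattedSearchSketch {
  // Hypothetical, simplified row; real task data carries many more metrics.
  final case class TaskRow(fetchWaitTimeMs: Long)

  // Stand-in display formatter: milliseconds below one second, seconds above.
  def displayDuration(ms: Long): String =
    if (ms < 1000) s"$ms ms"
    else "%.1f s".formatLocal(Locale.ROOT, ms / 1000.0)

  // A row matches when the term occurs in the raw value OR in its displayed form.
  def rowMatches(row: TaskRow, searchTerm: String): Boolean = {
    val needle = searchTerm.toLowerCase(Locale.ROOT)
    def containsValue(candidate: Any): Boolean =
      candidate.toString.toLowerCase(Locale.ROOT).contains(needle)
    containsValue(row.fetchWaitTimeMs) ||
      containsValue(displayDuration(row.fetchWaitTimeMs))
  }
}
```

With this shape, a term like "1.5 s" matches a row whose raw value is 1500, even though the raw number does not contain that text. Byte-valued columns would follow the same pattern with a byte formatter in place of `displayDuration`.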
How was this patch tested?
Manual Tests