[SPARK-8701][Streaming][WebUI] Add input metadata in the batch page #7081
@@ -33,7 +33,7 @@ import org.apache.spark.streaming.Time
 @DeveloperApi
 case class BatchInfo(
     batchTime: Time,
     streamIdToNumRecords: Map[Int, Long],
This is a problem. We cannot expose the InputInfo class through BatchInfo, as the former is private[streaming] whereas the latter is public.
I am wondering what the right way is here. Any ideas?
Use two parameters: streamIdToNumRecords and streamIdToMetadata?
Okay, I chatted with @pwendell, and the decision is as follows.
- Name = StreamInputInfo.
- It will have the field info: Map[String, Object]. Rather than a string, the most future-proof choice is a Map[String, Object], as we may want to store not just strings but actual objects. For example, for files, one of the keys in the map could be "files" and the value a list of files.
- Deprecate streamIdToNumRecords and introduce streamIdToInputInfo: Map[Int, StreamInputInfo].
- object StreamInputInfo can have well-known key names, such as one for the number of records. They are used to store the corresponding data in the map.
Maybe case class StreamInputInfo(numRecords: Int, metadata: Map[String, Object]) is fine.
And the metadata map will have at least one standard field named "Description", which will map to the string that will be shown in the UI.
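To make the proposal concrete, here is a minimal sketch of the shape being discussed. It is not the merged code: the constant name MetadataKeyDescription and the helper metadataDescription are hypothetical, numRecords is typed as Long on the assumption that it should match the existing streamIdToNumRecords: Map[Int, Long], and Map[String, Any] stands in for Map[String, Object].

```scala
// Hypothetical sketch of the proposed public class; names other than
// StreamInputInfo, numRecords, metadata and "Description" are assumptions.
final case class StreamInputInfo(
    numRecords: Long,
    metadata: Map[String, Any] = Map.empty) {
  require(numRecords >= 0, "numRecords must not be negative")

  /** The text to show in the UI, if the stream stored a String under the well-known key. */
  def metadataDescription: Option[String] =
    metadata.get(StreamInputInfo.MetadataKeyDescription).collect { case s: String => s }
}

object StreamInputInfo {
  /** Well-known metadata key whose value is the string shown in the batch page. */
  val MetadataKeyDescription: String = "Description"
}
```

A file-based stream could then report something like StreamInputInfo(2L, Map("Description" -> "a.txt\nb.txt", "files" -> List("a.txt", "b.txt"))), keeping the display string and the structured value side by side.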
Also, please make the string prettier by introducing newlines and tabs, and making the HTML preserve newlines and tabs
private def metadataDescriptionToHTML(metadataDescription: String): Seq[Node] = {
  // tab to 4 spaces and "\n" to "<br/>"
  Unparsed(StringEscapeUtils.escapeHtml4(metadataDescription).
    replaceAllLiterally("\t", "&nbsp;&nbsp;&nbsp;&nbsp;").replaceAllLiterally("\n", "<br/>"))
Because HTML doesn't support tab, I use 4 spaces instead.
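The conversion can be tried outside Spark with a dependency-free sketch like the one below. The object MetadataHtml and its escapeHtml helper are hypothetical stand-ins: escapeHtml covers only four characters and is not a replacement for StringEscapeUtils.escapeHtml4, and the sketch assumes each tab becomes four &nbsp; entities, since browsers collapse plain spaces.

```scala
// Hypothetical, self-contained sketch; not the code in the PR.
object MetadataHtml {
  // Simplified stand-in for StringEscapeUtils.escapeHtml4 (assumption: only
  // these four characters need escaping for this illustration).
  private def escapeHtml(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '&' => sb.append("&amp;")
      case '<' => sb.append("&lt;")
      case '>' => sb.append("&gt;")
      case '"' => sb.append("&quot;")
      case c   => sb.append(c)
    }
    sb.toString
  }

  /** Escape first, then add markup, so the inserted entities and tags are not escaped themselves. */
  def toHtml(description: String): String =
    escapeHtml(description)
      .replace("\t", "&nbsp;&nbsp;&nbsp;&nbsp;")
      .replace("\n", "<br/>")
}
```

The order matters: escaping after the replacements would turn the inserted <br/> into visible text.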
}
val description = offsetRanges.map { offsetRange =>
  s"topic: ${offsetRange.topic}\tpartition: ${offsetRange.partition}\t" +
    s"range: [${offsetRange.fromOffset}, ${offsetRange.untilOffset})"
Can you use the format X --> Y (where Y = untilOffset - 1)?
BTW, if the range is from: 5 until: 5 (that is, no data), then you should ignore that range in the UI.
Or "offsets: X to Y"
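One way to read the two suggestions together is sketched below. The OffsetRange case class here is a local stand-in, not the Kafka integration's class, and OffsetRangeDescription.describe is a hypothetical helper. The sketch assumes Y is the inclusive end (untilOffset - 1) and that ranges with no data are dropped before formatting.

```scala
// Local stand-in for illustration only; field names follow the diff above.
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object OffsetRangeDescription {
  /** One line per non-empty range, in the "offsets: X to Y" form with an inclusive Y. */
  def describe(offsetRanges: Seq[OffsetRange]): String =
    offsetRanges
      // A range such as from: 5 until: 5 carries no data, so it is skipped.
      .filter(range => range.fromOffset < range.untilOffset)
      .map { range =>
        s"topic: ${range.topic}\tpartition: ${range.partition}\t" +
          s"offsets: ${range.fromOffset} to ${range.untilOffset - 1}"
      }
      .mkString("\n")
}
```

Whether Y should be inclusive is a presentation choice; keeping untilOffset as the exclusive end would only change the last interpolation.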
Merged build finished. Test PASSed.
Merging this to master! Thanks @zsxwing
This PR adds metadata to InputInfo. InputDStream can report its metadata for a batch, and it will be shown in the batch page.
For example, FileInputDStream will display the new files for a batch, and DirectKafkaInputDStream will display its offset ranges.