Conversation
This commit changes the labels in the Monitor Activity Menu a bit and adds a new Compaction Overview page. The coordinator activity table was moved from the Manager page to this new page, along with tables for queue activity, running compactions by table and group.
|
#6251 will be closed and this merged in its place when this is ready. Currently having an issue with the data in the compactions by table/group tables not showing |
| FMetric fm = FMetric.getRootAsFMetric(binary); | ||
| for (int i = 0; i < fm.tagsLength(); i++) { | ||
| FTag t = fm.tags(i); |
There was a problem hiding this comment.
Can allocate FMetric and FTag objects once before the outer loop and call methods that populate them like FMetric.getRootAsFMetric(binary, fm) and fm.tags(t, i)
There was a problem hiding this comment.
Great catch. I didn't know about those methods.
| cols.add(new MetricColumnFactory(Metric.TSERVER_TABLETS_LONG_ASSIGNMENTS)); | ||
| List<Metric> tabletServerMetrics = tabletServerMetrics(); | ||
| ColumnFactoryList cfl = new ColumnFactoryList(tabletServerMetrics.size()); | ||
| tabletServerMetrics.forEach(tsm -> cfl.add(new MetricColumnFactory(tsm))); |
There was a problem hiding this comment.
Before this change the tablet server columns were grouped in such a way that scan columns, write columns, and conditional update columns were ordered together. Also a subset of all metrics were selected. After these changes the order is scrambled and alot metrics are included.
There was a problem hiding this comment.
I implemented a change in 1b19113 that sorts the list of columns after everything is added. It should put the common columns first and sort by the metric key. Our naming convention of the metric keys should group like things next to each other. If the user wants the columns in a specific order in their browser, then they should be able to drag and drop the columns in the order they want.
There was a problem hiding this comment.
What is the goal of these changes with the tserver view? In #6329 the goal was to make the tserver view use a curated set of columns with a given initial order. Those web page columns could be derived from multiple metrics. In the future we could have columns derived from metadata table or zookeeper data and not just metrics (like a count of tablets in metadata table w/ future set to a tserver) or even a combination of data from those sources. Attached a screenshot of these changes which could be compared w/ the screenshot in #6329. I had to zoom way way out to get the screenshot.
There was a problem hiding this comment.
I was trying to reduce the lines of code here that were setting the ColumnFactory to the MetricColumnFactory. MetricColumnFactory seems like the default implementation, so I was applying that to everything and then applying your overrides. I don't think I realized that you removed some of the columns when replacing them with the new computed columns. It makes sense, I just didn't put it together when reviewing your PR. I'm fine with backing out these changes to what you had in #6329.
However, I think I'm confused as to the direction were going here with where we are defining columns for the Monitor UI pages. My PR #6278 modified the code to display all of the metrics as columns and it moved the column and order information from the front-end code. @DomGarguilo had reservations about this saying that he believed the front-end code should request the specific columns that it wants from the back-end. I'm not sure if the changes in #6329 makes that harder or not.
There was a problem hiding this comment.
We could do a few things in the future to improve this.
- Create a per server page that show all the metrics in a long table where each row is a single metric. Then the view of all servers could link to this for each server. With this hopefully the overall view can indicate there is a problem with a server and the per server view an offer a lot more details.
- Potentially create an end point that advertises what columns are available for a server view and have UI elements for selecting which ones you want to see. Have a default smaller set. The web columns available for a server may not have a 1:1 relationship w/ metrics.
Not sure about these options, but it seems like for now we need a curated view to make the pages useful. We should iterate on how to make more information available in a way that is useful.
| .add(fm); | ||
| private ServersView createCompactionQueueSummary(final Set<ServerId> managers) { | ||
|
|
||
| final Column COMPACTION_QUEUE_COL = |
There was a problem hiding this comment.
This is not following the pattern of the existing code where static methods in server view compute the ServerView. However its seems computing the ServerView in SystemInformation is better overall. Also this PR uses ServerView to compute a table view for the monitor that is not server based. So maybe we could rename ServerView to TableView and maybe the rest endpoint could be made more generic.
Not a change for this PR, but thinking ServerView/TableView should be made a pure data class like
record TableView(List<Column> columns, List<Map<String,Object>> rows, long timestamp){}then we move the computation of these into SystemInformation. The way the current ServersView constructor works is constraining, like having a colunmn that included data from ZK or metatdata table would be cumbersome w/ the current constructor. If SystemInformation directly computed TableView objects, it would make it easier to compute columns based on metrics,metadata table info, and/or ZK info.
Not a change for this PR, I think the code makes it difficult, could collapse the Running Compactions By Queue and Compaction Coordinator Queues tables on the web page into a single table because both tables are keyed on queued.id. This would allow running and queued compactions to be on the same row in the same table.
There was a problem hiding this comment.
this PR uses ServerView to compute a table view for the monitor that is not server based. So maybe we could rename ServerView to TableView and maybe the rest endpoint could be made more generic.
Yeah, the ServersView object is now poorly named. That's my fault as it was solely used to create the tables for the server specific pages on the monitor. Really, it's a generic self-describing object for table data. Here I'm using it that way where I am taking metrics from the Manager and decomposing them into N views of compaction queue specific information.
There was a problem hiding this comment.
For code organization maybe we could keep the ServersView class and introduce a new TableView pure data class. The ServersView class would just contain static methods for computing TableView objects for server tables using data from SystemInformation. Not completely sure about this, was trying to figure out how to avoid shifting too much code into SystemInformation.
There was a problem hiding this comment.
As I was writing these changes I realized that ServersView was no longer an appropriate name and had thought of the name TableData. We could use TableView, doesn't matter to me. We could move the static methods to a TableViewFactory class, but we might need to provide access to creating a TableView directly for some edge cases where a factory method might not make sense.
There was a problem hiding this comment.
Reorganizing into TableData or TableView and a factory class sounds good to me. That would make the code easier to work with and avoid putting too much code into SystemInformation.
This commit changes the labels in the Monitor Activity Menu a bit and adds a new Compaction Overview page. The coordinator activity table was moved from the Manager page to this new page, along with tables for queue activity, running compactions by table and group.