Skip to content

Created Compaction Overview page#6328

Open
dlmarion wants to merge 10 commits intoapache:mainfrom
dlmarion:monitor-coordinator-overview
Open

Created Compaction Overview page#6328
dlmarion wants to merge 10 commits intoapache:mainfrom
dlmarion:monitor-coordinator-overview

Conversation

@dlmarion
Copy link
Copy Markdown
Contributor

This commit changes the labels in the Monitor Activity Menu a bit and adds a new Compaction Overview page. The coordinator activity table was moved from the Manager page to this new page, along with tables for queue activity, running compactions by table and group.

This commit changes the labels in the Monitor Activity Menu a bit
and adds a new Compaction Overview page. The coordinator activity
table was moved from the Manager page to this new page, along with
tables for queue activity, running compactions by table and group.
@dlmarion dlmarion added this to the 4.0.0 milestone Apr 21, 2026
@dlmarion dlmarion self-assigned this Apr 21, 2026
@dlmarion
Copy link
Copy Markdown
Contributor Author

#6251 will be closed and this merged in its place when this is ready. Currently having an issue with the data in the compactions by table/group tables not showing

@dlmarion dlmarion marked this pull request as ready for review April 23, 2026 16:11
Comment on lines +554 to +556
FMetric fm = FMetric.getRootAsFMetric(binary);
for (int i = 0; i < fm.tagsLength(); i++) {
FTag t = fm.tags(i);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can allocate FMetric and FTag objects once before the outer loop and call methods that populate them like FMetric.getRootAsFMetric(binary, fm) and fm.tags(t, i)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I didn't know about those methods.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this change in 1b19113

cols.add(new MetricColumnFactory(Metric.TSERVER_TABLETS_LONG_ASSIGNMENTS));
List<Metric> tabletServerMetrics = tabletServerMetrics();
ColumnFactoryList cfl = new ColumnFactoryList(tabletServerMetrics.size());
tabletServerMetrics.forEach(tsm -> cfl.add(new MetricColumnFactory(tsm)));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change the tablet server columns were grouped in such a way that scan columns, write columns, and conditional update columns were ordered together. Also a subset of all metrics were selected. After these changes the order is scrambled and alot metrics are included.

Copy link
Copy Markdown
Contributor Author

@dlmarion dlmarion Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a change in 1b19113 that sorts the list of columns after everything is added. It should put the common columns first and sort by the metric key. Our naming convention of the metric keys should group like things next to each other. If the user wants the columns in a specific order in their browser, then they should be able to drag and drop the columns in the order they want.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the goal of these changes with the tserver view? In #6329 the goal was to make the tserver view use a curated set of columns with a given initial order. Those web page columns could be derived from multiple metrics. In the future we could have columns derived from metadata table or zookeeper data and not just metrics (like a count of tablets in metadata table w/ future set to a tserver) or even a combination of data from those sources. Attached a screenshot of these changes which could be compared w/ the screenshot in #6329. I had to zoom way way out to get the screenshot.

Screenshot at 2026-04-24 10-22-11

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are also including duplicate data relative to #6329. Like in #6329 a single web column is computed from multiple cache metrics. These changes include those raw metrics and the derived data in columns.

Copy link
Copy Markdown
Contributor Author

@dlmarion dlmarion Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to reduce the lines of code here that were setting the ColumnFactory to the MetricColumnFactory. MetricColumnFactory seems like the default implementation, so I was applying that to everything and then applying your overrides. I don't think I realized that you removed some of the columns when replacing them with the new computed columns. It makes sense, I just didn't put it together when reviewing your PR. I'm fine with backing out these changes to what you had in #6329.

However, I think I'm confused as to the direction were going here with where we are defining columns for the Monitor UI pages. My PR #6278 modified the code to display all of the metrics as columns and it moved the column and order information from the front-end code. @DomGarguilo had reservations about this saying that he believed the front-end code should request the specific columns that it wants from the back-end. I'm not sure if the changes in #6329 makes that harder or not.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted these changes in eecabcf

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a few things in the future to improve this.

  1. Create a per server page that show all the metrics in a long table where each row is a single metric. Then the view of all servers could link to this for each server. With this hopefully the overall view can indicate there is a problem with a server and the per server view an offer a lot more details.
  2. Potentially create an end point that advertises what columns are available for a server view and have UI elements for selecting which ones you want to see. Have a default smaller set. The web columns available for a server may not have a 1:1 relationship w/ metrics.

Not sure about these options, but it seems like for now we need a curated view to make the pages useful. We should iterate on how to make more information available in a way that is useful.

.add(fm);
private ServersView createCompactionQueueSummary(final Set<ServerId> managers) {

final Column COMPACTION_QUEUE_COL =
Copy link
Copy Markdown
Contributor

@keith-turner keith-turner Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not following the pattern of the existing code where static methods in server view compute the ServerView. However its seems computing the ServerView in SystemInformation is better overall. Also this PR uses ServerView to compute a table view for the monitor that is not server based. So maybe we could rename ServerView to TableView and maybe the rest endpoint could be made more generic.

Not a change for this PR, but thinking ServerView/TableView should be made a pure data class like

record TableView(List<Column> columns, List<Map<String,Object>> rows, long timestamp){}

then we move the computation of these into SystemInformation. The way the current ServersView constructor works is constraining, like having a colunmn that included data from ZK or metatdata table would be cumbersome w/ the current constructor. If SystemInformation directly computed TableView objects, it would make it easier to compute columns based on metrics,metadata table info, and/or ZK info.

Not a change for this PR, I think the code makes it difficult, could collapse the Running Compactions By Queue and Compaction Coordinator Queues tables on the web page into a single table because both tables are keyed on queued.id. This would allow running and queued compactions to be on the same row in the same table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR uses ServerView to compute a table view for the monitor that is not server based. So maybe we could rename ServerView to TableView and maybe the rest endpoint could be made more generic.

Yeah, the ServersView object is now poorly named. That's my fault as it was solely used to create the tables for the server specific pages on the monitor. Really, it's a generic self-describing object for table data. Here I'm using it that way where I am taking metrics from the Manager and decomposing them into N views of compaction queue specific information.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For code organization maybe we could keep the ServersView class and introduce a new TableView pure data class. The ServersView class would just contain static methods for computing TableView objects for server tables using data from SystemInformation. Not completely sure about this, was trying to figure out how to avoid shifting too much code into SystemInformation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was writing these changes I realized that ServersView was no longer an appropriate name and had thought of the name TableData. We could use TableView, doesn't matter to me. We could move the static methods to a TableViewFactory class, but we might need to provide access to creating a TableView directly for some edge cases where a factory method might not make sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganizing into TableData or TableView and a factory class sounds good to me. That would make the code easier to work with and avoid putting too much code into SystemInformation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants