Save shard-specific query plans to metrics #1229
Comments
I have been working on this and currently have the subPlans created by the VisitorFunction added to the metrics. @ivakegg mentioned that it would be a good idea to open a discussion about which route to take design-wise on a few different aspects. I wanted to get some design opinions on:
*Edit - My branch in the Datawave repo: datawave/tree/bugfix/DATAWAVE-1609 |
So when the metric makes it to the query metric service, we need to add fields to the query event stored in Accumulo that encode this information. I am thinking that we might add fields as follows:
Where hash1 and hash2 are the hashes of the respective query plan strings. The reason for this somewhat awkward storage is that we expect the query plans to be potentially huge and the same sub-plan will exist for many of the ranges. This will avoid storing those sub-plans multiple times for the same value. |
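The deduplication idea described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names `SUBPLAN.<hash>` and `RANGES.<hash>`, the helper names, and the choice of a truncated SHA-256 digest are hypothetical and are not taken from the Datawave code.

```python
import hashlib
from collections import defaultdict


def plan_hash(plan):
    """Hypothetical hash choice: truncated SHA-256 hex digest of the plan string."""
    return hashlib.sha256(plan.encode("utf-8")).hexdigest()[:16]


def encode_subplans(range_to_plan):
    """Map each range to the hash of its sub-plan, storing each distinct plan once.

    Returns a dict of field name -> list of values, using the assumed
    field names SUBPLAN.<hash> and RANGES.<hash>.
    """
    fields = defaultdict(list)
    for rng, plan in range_to_plan.items():
        h = plan_hash(plan)
        # Store the (potentially huge) plan text only the first time it is seen.
        if not fields["SUBPLAN." + h]:
            fields["SUBPLAN." + h].append(plan)
        # Every range that shares this plan is recorded under the same hash.
        fields["RANGES." + h].append(rng)
    return dict(fields)
```

With this layout, two ranges that share a sub-plan cost one copy of the plan text plus two short range values.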
I talked with @ivakegg and I think we settled on something a little different...
What I put above roughly represents what the mutations going to the metrics shard table should look like. You'll notice that the RANGES field has multiple values. This is intentional, and part of this effort will be to write an Accumulo combining iterator that merges the values of those entries into a single value. So, what we will eventually write to the table will look more like this:
Combining the RANGES fields should help to save space, since the idea is that we'll be writing a lot of plans. A follow-on task would be to update the query metrics HTML endpoint to display the contents of the RANGES field as a set of clickable links. When you click on one of these links, we should reload the metrics for the given query and swap the displayed PLAN with the selected SUBPLAN. For now, though, I would just work on the first part; we can work out the UI details at a later time. |
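As a rough model of what the combining step would do, here is a plain-Python sketch of merging several RANGES values for one key into a single value. It does not use the Accumulo API; the function name and the comma delimiter are assumptions made for illustration only.

```python
def combine_ranges(values, delimiter=","):
    """Merge several RANGES values for one key into a single value.

    Each input value may itself be an already-combined, delimited list,
    so the operation can be applied repeatedly without changing the result.
    """
    merged = set()
    for value in values:
        # Split previously combined values and drop empty fragments.
        merged.update(part for part in value.split(delimiter) if part)
    # Sort for a deterministic result regardless of input order.
    return delimiter.join(sorted(merged))
```

Because the function accepts its own output as input, it stays correct if the table is compacted more than once, which is a property a real combining iterator would also need.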
Wouldn't it make lookups faster if we kept the RANGES separate? RANGES.HASH1:20220101_23 |
I suppose there may be an issue when we have so many ranges that we blow out the size of a Key in Accumulo... |
Shard-specific query plans are created by the visitor function. We should add these query plans to the metrics for the query, and make the metrics viewable via the metrics endpoint.
The metrics endpoint should be updated to accept a shard query parameter which can be used to determine which query plan is displayed. We could/should also display a column with the shards associated with the query, and potentially make each one a hyperlink to the metrics entry specific to that shard.
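The shard-based selection described above could look something like the sketch below. The metric layout (a dict with `PLAN`, `SUBPLAN`, and `RANGES` entries keyed by hash), the function name, and the fallback to the overall plan for an unknown shard are all assumptions for illustration, not the actual endpoint behaviour.

```python
def select_plan(metric, shard=None):
    """Pick the plan to display for a query metric.

    metric is assumed to look like:
      {"PLAN": str, "SUBPLAN": {hash: str}, "RANGES": {hash: [shard, ...]}}
    """
    if shard is None:
        # No shard requested: show the overall query plan.
        return metric["PLAN"]
    for plan_hash, shards in metric["RANGES"].items():
        if shard in shards:
            return metric["SUBPLAN"][plan_hash]
    # Assumed behaviour: fall back to the overall plan for an unknown shard.
    return metric["PLAN"]


def shards_for(metric):
    """List every shard associated with the query, e.g. to render as links."""
    return sorted(s for shards in metric["RANGES"].values() for s in shards)
```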