[WIP] Batch collect gap metrics collection #22221
base: master
Conversation
Force-pushed from 0f0a937 to 6ce5073
Force-pushed from 6ce5073 to cecb6d6
Force-pushed from cecb6d6 to 54d55b0
Force-pushed from 54d55b0 to 6fd9bc8
@Fryguy The commits look good. I just left it as WIP because I wanted to discuss which of these commits we want to pull out into separate PRs. Let me know if you want to discuss them (or if some of the commits need better documentation).
This pull request has been automatically marked as stale because it has not been updated for at least 3 months. If these changes are still valid, please remove the stale label. Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.
Force-pushed from 6fd9bc8 to 55e5a9a
Force-pushed from 55e5a9a to 2d50b8e
This pull request is not mergeable. Please rebase and repush.
This pull request has been automatically closed because it has not been updated for at least 3 months. Feel free to reopen this pull request if these changes are still valid. Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.
Only method movement and indentation changes, cutting up perf_process to make it easier to profile:

- consolidate multiple process_counter_values calls into a single call
- extract perf_capture_states for multiple perf_capture calls
- extract update_last_perf_capture_on
- extract raise_perf_complete_alerts
- extract rollup_to_parent_metrics
We run rollups for a single EMS and a single zone, but we were looking up an EMS and my_zone for every resource rollup. Optimization: pass the zone to MiqQueue, since it is known and we already have the EMS/zone in memory. We have had to do this optimization for a number of other MiqQueue calls, so it makes sense. This code uses the parent object for the EMS; any one would do. If this method gets converted to a static method, then use any one of the resources' EMS.

Numbers
-------

Running capture for only 16 VMs. They generate:

- ManageIQ::Providers::Vmware::InfraManager::HostEsx#perf_capture_realtime => 6
- ManageIQ::Providers::Vmware::InfraManager::Vm#perf_capture_realtime => 16

So this saves 22 EMS and 22 zone lookups (the total number of VMs + hosts). Minor savings, but it removes queries and makes the query counts easier to compare:

| ms | bytes | objects | query | qry ms | rows | comments |
| ---:| ---:| ---:| ---:| ---:| ---:| --- |
| 19,312.2 | 138,947,654* | 32,019,635 | 1,151 | 6,993.9 | 262 | `capture-vc91-250-before` |
| 19,192.5 | 108,353,470* | 31,971,845 | 1,117 | 6,869.5 | 228 | `capture-vc91-250-after` |
| | | | 3% | | 13% | delta |
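The lookup-hoisting idea above can be sketched in plain Ruby. This is an illustrative stand-in, not ManageIQ's actual API: `Resource`, `Ems`, and `queue_rollups` are hypothetical names, and the hash literal stands in for a real `MiqQueue.put(..., :zone => zone)` call. The point is that the zone is resolved once from the parent EMS instead of once per resource.

```ruby
# Hypothetical sketch: resolve the EMS zone once and reuse it for every
# queued rollup message, instead of looking it up per resource.
Ems      = Struct.new(:name, :my_zone)
Resource = Struct.new(:name, :ems)

def queue_rollups(parent_ems, resources)
  zone = parent_ems.my_zone # looked up once, not once per resource
  resources.map do |resource|
    # stand-in for MiqQueue.put(..., :zone => zone)
    { :target => resource.name, :zone => zone }
  end
end

ems      = Ems.new("vc91", "default")
vms      = [Resource.new("vm1", ems), Resource.new("vm2", ems)]
messages = queue_rollups(ems, vms)
# every message carries the shared zone without a per-resource lookup
```

With 22 resources (16 VMs + 6 hosts), this shape of change is what removes the 22 EMS and 22 zone lookups cited above.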
Since this is on an instance method, just use the common ems in the parent. All of the targets has the same ems
- Use a common last_perf_capture_on so captures can be grouped together. - This also sets the value from before contacting the server, so we don't introduce micro gaps when collecting right around the hour.
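The "set the value from before contacting the server" point can be shown with a small sketch. All names here (`perf_capture`, `fetch_samples_from_server`, the hash-based target) are illustrative, not the real ManageIQ methods: the idea is that if the timestamp were taken *after* a slow provider round-trip, the next collection window would start late and leave a micro gap around the boundary.

```ruby
# Illustrative stub for a slow provider API round-trip.
def fetch_samples_from_server(_target)
  sleep 0.01 # simulate latency talking to the provider
  [:sample]
end

# Hypothetical capture method: record the timestamp BEFORE the round-trip,
# so the next capture window starts where this one actually began.
def perf_capture(target)
  capture_started_on = Time.now.utc # taken before contacting the server
  samples = fetch_samples_from_server(target)
  target[:last_perf_capture_on] = capture_started_on
  samples
end

target = {}
perf_capture(target)
# target[:last_perf_capture_on] now predates the server response time
```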
Sort targets for metrics collection so that consistent records are batched together.
Entries are sent in batches of up to 250 records. Currently, they are still batched by day
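A minimal sketch of that batching, assuming a hypothetical `batch_targets` helper (not ManageIQ's actual method): entries stay grouped per day, and each day's targets are sent in slices of up to 250.

```ruby
# Hypothetical sketch of per-day batching with a 250-record cap.
BATCH_SIZE = 250

def batch_targets(targets_by_day, batch_size = BATCH_SIZE)
  targets_by_day.flat_map do |day, targets|
    # each_slice caps every queue message at batch_size targets
    targets.each_slice(batch_size).map { |batch| { :day => day, :targets => batch } }
  end
end

targets_by_day = { "2023-01-01" => (1..600).to_a }
batches = batch_targets(targets_by_day)
batches.map { |b| b[:targets].size } # => [250, 250, 100]
```

So 600 targets for one day become three queue messages instead of 600.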
Force-pushed from 2d50b8e to fa6f72b
This pull request has been automatically closed because it has not been updated for at least 3 months. Feel free to reopen this pull request if these changes are still valid. Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.
We still have more work to do here.
Checked commits kbrock/manageiq@e70ca4e~...fa6f72b with ruby 2.7.8, rubocop 1.56.3, haml-lint 0.51.0, and yamllint
Overview
When we detect that metrics capture has not been run for a while, or never run for that matter, we submit gap/historical metrics requests.
followup to #20071
Problem
Since queue entries are split out by day, we could end up putting hundreds of requests onto the queue for only a dozen VMs.
Solution
When a gap is detected for inventory items, send the gap collection request in batches. This change is minimal, since the work to handle batches of inventory items has already been implemented.
For a coordinator with a troublesome C&U installation, gaps are by far the biggest issue we've had in the past in terms of overextending our queue.
Now, it puts a dozen messages on the queue for hundreds of VMs.
The backend has already been changed to handle batched collection. This PR changes the coordinator to send the collection requests in batches.
This PR is split up into several improvements. They build on each other a little but are relatively independent:

- testing:
- performance:
- enhancements:
This has a big performance improvement because there are up to 250 times fewer queue entries (# VMs * # days / 250 instead of # VMs * # days).
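The arithmetic behind that claim can be sketched with illustrative numbers (the helper names and the 300-VM / 30-day figures below are made up for the example; per the earlier note, entries are still batched by day):

```ruby
# Before: one queue message per VM per day of gap.
def gap_messages_before(vms, days)
  vms * days
end

# After: per day, VMs are batched into messages of up to 250 targets.
def gap_messages_after(vms, days)
  days * (vms / 250.0).ceil
end

gap_messages_before(300, 30) # => 9000
gap_messages_after(300, 30)  # => 60
```

So for 300 VMs with a 30-day gap, the queue goes from 9,000 messages to 60.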