Fixes #25415 - import Hypervisor facts from Candlepin #7821
Conversation
@@ -35,6 +35,15 @@ def get_all(uuids)
       consumers
     end

+    # workaround for https://bugzilla.redhat.com/1647724
This might have a heavy performance impact, as it queries each hypervisor directly. Do we have tests that cover this?
The number of hypervisors won't be too big, I think, and it also runs asynchronously... we should ask QE for an explicit test if we're in doubt, but this should be good until Candlepin provides the data on the /consumers endpoint.
We're currently batching the requests 75 at a time, so if we have 15k hypervisors (and users do), the old method makes 200 HTTPS requests, the new one 15,000. I'll run a few tests on my local box to get some before/after numbers.
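For reference, a minimal sketch of the two request patterns being compared. `get_all` (bulk fetch) and `get` (single consumer) come from the diff above; the batch size of 75 is from this comment, and the surrounding scaffolding is illustrative:

```ruby
consumers = []

# old path: one bulk request per batch of 75 UUIDs
# -> ~200 HTTPS requests for 15k hypervisors
uuids.each_slice(75) { |batch| consumers.concat(get_all(batch)) }

# new path: one request per consumer so the facts come along
# -> 15,000 HTTPS requests for the same 15k hypervisors
uuids.each { |uuid| consumers << get(uuid) }
```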
With 500 hypervisors, 50 guests each:
before patch: 0 (run) + 110 (finalize) sec for new hosts, 0 (run) + 20 (finalize) sec for updates
after patch: 180 (run) + 200 (finalize) sec for new hosts, 75 (run) + 175 (finalize) sec for updates
I'll see how we can speed that up :)
@lzap @jlsherrill if you have ideas, please :)
My idea is to immediately add telemetry to this file to get some real-world numbers; then we can talk about poking the codebase. It's super easy; the very same thing is done in this plugin: https://github.com/theforeman/foreman_discovery/pull/408/files#diff-45af6b2f1c078550eba223b206b16cb4
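A hedged sketch of that suggestion, using Foreman's `telemetry_duration_histogram` helper the way the linked foreman_discovery PR does; the metric name here is made up for illustration:

```ruby
include ::Foreman::TelemetryHelper

def get_all_with_facts(uuids)
  # record how long the per-consumer fact fetching takes, in milliseconds
  telemetry_duration_histogram(:candlepin_consumer_facts_fetch, :ms) do
    uuids.collect { |uuid| get(uuid) }
  end
end
```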
@evgeni yeah, we used to query each host individually and recently changed that to speed things up, so I'm not surprised this slows it down quite a bit.
If we are okay with facts being imported eventually (but not necessarily immediately), one option would be to use our 'event_queue' to import them. This would involve just throwing the host id on the queue; it would then get processed in the background and the facts would eventually be imported.
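A hedged sketch of that idea: push just the host id and let a background event handler do the import. `Katello::EventQueue` is real; the `'import_hypervisor_facts'` event type and its handler are hypothetical here:

```ruby
# enqueue one event per hypervisor host; a background handler would
# fetch the facts from Candlepin and import them later
@hosts.each do |_uuid, host|
  ::Katello::EventQueue.push_event('import_hypervisor_facts', host.id)
end
```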
Force-pushed from e2ba5e7 to 3720367
[test katello]
I tested with local libvirt and got a duplicate host; otherwise it works great. More details: I already had server ibm-x3655-03...com, on which I run foreman+katello+libvirt. I configured virt-who on the same machine and restarted the virt-who service. A new host `virt-who-ibm-x3655-03...com-1` appeared, and I see the facts correctly set. But given the history of duplicated hosts, I think we should try to map it correctly to existing hosts, as we always have the Foreman host available in the DB after installation.
Sorry, got confused: this only adds the facts fetching (which works), the duplicate-host issue is already there.
Force-pushed from 49cd832 to b2863a7
Not sure why that one test is failing now :/
So, I played a bit more with this, and the slowdown does not come from this PR at all. It's from bd456f150c00ab782b38f663af9cd6e3880c9a7e, where we started to correctly load data from Candlepin. Before that commit, see katello/app/lib/actions/katello/host/hypervisors_update.rb, lines 106 to 115 (at a0e0f89).
So, to revisit my numbers:
without facts: …
with facts: …
With that said, I would say I am mostly happy with the performance, as I knew it would have an impact. Still need to figure out why that one test fails.
The test failure is due to https://projects.theforeman.org/issues/25546 and thus unrelated.
Force-pushed from 8d9a8a0 to c7795ae
[test katello]
Hah, 💚 tests! @jlsherrill, if you could have another look, that'd be awesome :)
inline comment
def get_all_with_facts(uuids)
  consumers = []
  uuids.each do |uuid|
    consumers << get(uuid)
This can probably be better written as `uuids.collect { |uuid| get(uuid) }`
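That is, the whole helper would collapse to (just the suggestion above written out as a method):

```ruby
def get_all_with_facts(uuids)
  # collect returns the array of consumers directly,
  # so no accumulator variable is needed
  uuids.collect { |uuid| get(uuid) }
end
```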
  @hosts.each do |uuid, host|
    update_subscription_facet(uuid, host)
  end
end
Out of curiosity, why did you move this to the run phase and put it in a transaction?
The FactImporter needs to run outside of a transaction, so I moved the code to the run phase. But I also wanted the rest of the code to run in a transaction, so I wrapped it in one. Should I add a comment with this explanation?
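A simplified sketch of the structure described above: the fact import happens in the run phase outside any transaction, while the remaining updates are wrapped in one explicitly. `import_facts` is a stand-in name; `update_subscription_facet` appears in the diff above:

```ruby
def run
  import_facts # FactImporter must run outside a transaction

  # everything else still runs atomically
  ActiveRecord::Base.transaction do
    @hosts.each do |uuid, host|
      update_subscription_facet(uuid, host)
    end
  end
end
```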
Testing this against master, I saw these timings with ~450 hypervisors: initial load 172s -> 266s. This seems like a lot to me, although it seems to conflict with your findings? I can send you some user-provided JSON with these 450 hypervisors if you want to try it yourself. If this performance decrease is accurate, I'd suggest one of a few things: …
Just re-ran this on a fresh VM with 500 hypervisors (2 guests each):
without patch: …
with patch: …
So not far from what you've seen (and I'd say also not too different from the previous numbers, which were in the 60-90 sec increase ballpark, even though the "new data" increase is more like 120 sec here).
@evgeni I think my bigger concern was around DB locking and increasing that time (although the transaction is only increased by however long it takes to fetch the facts, not to store them). I wonder if we should just move the entire thing outside of a transaction? Since it's written to be idempotent (and can be re-run at any time), I think that would be better? Curious about your thoughts.
Yeah, I can see this being a concern. We could break that up into two transactions? One for …
I didn't dare to touch this aspect yet. One of the "hidden" "gems" of this task is that …
Yeah, I think that makes sense!
Fair enough, I may file an issue and try to tackle this soon in some manner.
@jlsherrill updated with two transactions
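A rough sketch of what the two-transaction split discussed above could look like, so the slow Candlepin fetch no longer sits inside one long-running transaction. Method names other than `update_subscription_facet` are stand-ins, not the actual code:

```ruby
# first transaction: create/find the hypervisor host records
ActiveRecord::Base.transaction do
  create_or_find_hypervisor_hosts
end

# network I/O (fact fetching) happens outside any transaction,
# so it cannot extend DB lock times
facts = fetch_facts_from_candlepin

# second transaction: update the subscription facets
ActiveRecord::Base.transaction do
  @hosts.each do |uuid, host|
    update_subscription_facet(uuid, host)
  end
end
```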
[test katello]
@evgeni I was able to get facts importing from a virt-who hypervisor with this PR 👍
But when I try to get the facts from the API, it still returns null. I couldn't find out why this is happening for just the virt-who hypervisors vs. other content hosts. Let me know if you see the same; it would be helpful for this bug.
Also, should …
@johnpmitsch I've seen that; it seems to me that it will return fine on … And no …
This worked well for me and the code looks good. My other comment seems to be a valid issue, but it looks pre-existing and not caused by this PR, so I think we can fix that separately.
I haven't tested or evaluated any of the scaling performance; I'll leave it up to others who have been in those conversations to give their approval.
We discussed this PR during grooming. The performance impact is OK, thanks @evgeni!
💚 thanks everyone for reviewing, helping, discussing! 🎉