Blacklist kvm and iTCO modules #132
Yes, this can be done. I can change the image and add:
With the content
We would need this for both LI and VLI. And yeah, blacklisting those modules is what we want. Thanks!
Hi Jeff, yes, this is also valid for LI and VLI.
Thanks
Image builds to change:
This is a general setting, image independent!
Same here, which means you need this change for all of them. So we need Bugzilla reports for the released ones to handle the request properly.
We would like the change in all images, please. Entered in Bugzilla, ID 1130713.
Robert has already added the change to the SLE15 SP1 Devel images.
Let me welcome @jesusbv to the Azure LI/VLI team. Jesus will take a look at this one to begin with and to become familiar with the project :) Jesus, I'm happy to have you on board. As discussed, I'll assign this one to you.
Welcome aboard, @jesusbv, nice to meet you virtually! I look forward to working with you more closely down the road.
Thank you, @jeffaco, nice to meet you too!
I have updated the rest of the images.
That's a little bit too vague. Please update the checkbox list from #132 (comment). Also, if you have done the images marked as "released", please update the Bugzilla entry too: https://bugzilla.suse.com/show_bug.cgi?id=1130713 Thanks
@jesusbv Thanks, it all looks great now. I have taken the commits in the Devel projects and applied them to the Stable builds, from which we can now provide new testing images. This one completes the milestone. Thanks
I'm using the latest SLES 12 SP3 test image. That's odd: this change only made it halfway (well, 50.1% of the way). 😄 First, looking at the loaded modules:
So the iTCO modules are gone, but
Looking at the contents of file
So
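Checking which of the targeted modules are actually loaded, and whether anything holds a reference to them, can be scripted from /proc/modules (the same data lsmod prints). This is a minimal Python sketch; the target list (kvm, kvm_intel, iTCO_wdt, iTCO_vendor_support) is an assumption about what "the kvm and iTCO modules" are on these systems, not something the thread spells out, and the helper names are hypothetical.

```python
import os

# Assumed target list: the thread only says "kvm" and "iTCO modules".
TARGETS = ["kvm", "kvm_intel", "iTCO_wdt", "iTCO_vendor_support"]


def parse_proc_modules(text):
    """Parse /proc/modules into {name: (size_bytes, refcount, used_by)}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, size, refcount, used_by = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # Column 4 lists dependent modules separated by commas (with a trailing
        # comma), or "-" when nothing depends on the module.
        users = [] if used_by == "-" else [m for m in used_by.split(",") if m]
        modules[name] = (size, refcount, users)
    return modules


def report(text, targets):
    """Describe, for each target, whether it is loaded and what uses it."""
    modules = parse_proc_modules(text)
    status = {}
    for name in targets:
        if name not in modules:
            status[name] = "not loaded"
            continue
        _, refcount, users = modules[name]
        status[name] = "loaded, refcount=%d, used by: %s" % (refcount, ", ".join(users) or "nothing")
    return status


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as handle:
        for name, state in report(handle.read(), TARGETS).items():
            print(name, "->", state)
```

A refcount of 0 with no dependent modules generally means the module can be unloaded with rmmod.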
Hmm, even if a module is blacklisted, it can still be loaded proactively. Something must be loading the module manually via modprobe. It's gonna be hard to find out where this happens. Can you check in
Sigh, I don't like the way this is shaping up so far! 😦
Any other ideas on where to look?
Hmm, can you safely unload the module
I'm asking to make sure it's not in use. Next, can you check whether it's loaded by the initrd? You need to reboot the system and add the following to the kernel command line:
Does that make any difference?
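The exact parameter requested at this step is not preserved in the thread. Assuming it was dracut's rd.driver.blacklist= (which applies inside the initrd) or kmod's modprobe.blacklist=, this minimal Python sketch confirms after the reboot that the parameter actually reached the running kernel by reading /proc/cmdline. The helper name is hypothetical, and the whitespace split ignores quoted values, which is fine for these two parameters.

```python
import os


def blacklisted_drivers(cmdline):
    """Collect module names from rd.driver.blacklist= and modprobe.blacklist=."""
    names = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Both parameters take a comma-separated list and may be repeated.
        if sep and key in ("rd.driver.blacklist", "modprobe.blacklist"):
            names.extend(name for name in value.split(",") if name)
    return names


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as handle:
        print(blacklisted_drivers(handle.read()))
```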
Oh, yuck. I hate rebooting; it's a pain and takes forever! But ask, and I'll do it begrudgingly. First:
So it's not in use. After the reboot:
And the boot command line:
Let me know the next steps, thanks.
Thanks much for taking on the burden of rebooting :) I hate it too. So there is good news and bad news. The good news is that it's not loaded as part of the initrd; fixing that would have been more work and touches sensitive code. The bad news is that I have no clue what loads kvm on your system. It must be loaded manually by some script or program via
kvm itself is disabled by your BIOS, which means the CPU-specific parts of kvm (e.g. kvm_intel) do not exist and are not loaded. Thus the remaining interface part should not hurt in any way. This is a lame excuse, I know, but at this point I can only guess what loads kvm on the machine. Interestingly enough, when I boot the image in my integration system (a virtual machine), none of the modules are loaded. What you can try:
By default there shouldn't be any qemu packages installed. At least the image does not provide any, but I don't know whether you install anything via the config file. qemu was known to load the kvm module manually. All this is just a guess. We tested the plain image in a VM, and since no kvm module was loaded, we sent it out for testing. It's really weird that you see the module loaded on your machine, especially because kvm is disabled in the BIOS, which makes loading it completely useless. Can you confirm that none of the components you add through the YAML config (scripts, packages, ...) do anything with module loading?
Well, interesting:
So that's not it. No need to reboot for that, so there's that. I don't really install much for testing. I'm not even installing the UCS drivers automatically, although I'll need to do that. There is a test package that I install, just to verify that installation works, and that's still in place. Let me remove that, redeploy, and see where we stand with regard to the
The
Anyway, I'll modify my setup to not install squat, and then see where I stand. I'll report back shortly (after a reboot, of course, for the deployment).
Okay, the entire section to install software is commented out in my YAML file. And, unfortunately:
What can we do to try and isolate who is loading
Let's dig a little deeper:
Since VT-x is disabled in the BIOS, you should not see
And yes, you can remove
Where $KERNEL_VERSION needs to be replaced with the version of the kernel you are running (the output of uname -r). After the reboot, we should see some errors in the boot log.
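To pick the relevant lines out of the boot log after the reboot, a small filter helps. A minimal Python sketch, assuming the log text comes from something like dmesg or journalctl -k -b (the thread does not show which command was used); module_messages is a hypothetical helper that keeps lines naming any of the given modules as a whole word.

```python
import re


def module_messages(log_text, modules):
    """Return log lines that mention any of the given module names as whole words."""
    # \b keeps "kvm" from matching inside "kvm_intel", since "_" is a word character.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(m) for m in modules) + r")\b")
    return [line for line in log_text.splitlines() if pattern.search(line)]
```

Passing both kvm and kvm_intel in the module list shows messages from either module, which is useful when one of them fails to load because the other is missing.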
The reason why
The output of $ modinfo kvm
...
depends: irqbypass
...
Thus, blacklisting
@jeffaco, if you could update
Hopefully, that would solve it. However, in case
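The modinfo excerpt above is the crux: its depends field names the modules that must be present first. A minimal Python sketch that extracts that field from modinfo's text output; the helper names are hypothetical. For a single module, modinfo -F depends kvm prints the field directly.

```python
def modinfo_fields(text):
    """Parse modinfo "key: value" output into {key: [values, ...]}."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Keys such as "alias" or "parm" can repeat, so keep every value.
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def depends(text):
    """Return the module names listed in the "depends" field (comma-separated)."""
    names = []
    for value in modinfo_fields(text).get("depends", []):
        names.extend(name for name in value.split(",") if name)
    return names


print(depends("depends:        irqbypass\n"))  # ['irqbypass']
```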
Okay, I first tried what @jesusbv helpfully suggested, but no go. I did the following for @jesusbv:
So I moved on to what @rjschwei asked for:
After reboot:
However, I was unsure if you wanted the entire output from
It's interesting that, in the full
Whoops, I'm sorry. I totally missed the last part of what @jesusbv said:
So sorry. So here's the current
After a reboot:
Is there anything in particular I should check in terms of the health/stability of the system? Or do you want to make that change to the
@jeffaco, thanks much for testing. The final setup looks good to me. I'm wondering whether we still need to blacklist edac_core and sb_edac. Please note that the EDAC project covers modules for "Error Detection and Correction", and I don't want to have them blacklisted. For the purpose of not loading kvm, this should not be needed. Can you do a test with:
If that leads to the expected results, we should still take a closer look at dmesg. To be honest, I'm not sure that blacklisting is a good idea. What exactly is your motivation for it? As you saw from the modinfo output, a module is only loaded automatically if the hardware matches one of its aliases. Blacklisting modules only makes sense if there are conflicts. For example, VirtualBox virtualization had a conflict with kvm in the past, which made VirtualBox non-functional if kvm was loaded. However, in your case the reason for all this blacklisting is unknown to me.
This, to be honest, doesn't sound like a reason why we should blacklist them. Loading a module, even if it is not needed, takes away a few bytes of your main memory. Compared to the huge amount of main memory you have available, I would say it doesn't matter at all :)
In my understanding, we should be more careful with changes and make them only for a good reason. Hope that makes sense to you too? Thanks
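The "few bytes" argument can be put into numbers: /proc/modules reports each loaded module's size in bytes in its second column. A hedged Python sketch that totals that column for a set of module names; it only counts what /proc/modules reports, not memory the modules allocate at runtime, and the module set in the usage is an illustrative assumption.

```python
import os


def module_memory(proc_modules_text, names):
    """Sum the size column (bytes) of /proc/modules for the named loaded modules."""
    total = 0
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in names:
            total += int(fields[1])
    return total


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as handle:
        used = module_memory(handle.read(), {"kvm", "kvm_intel", "irqbypass"})
    print("%.1f KiB" % (used / 1024))
```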
@schaefi Please see #41 on disabling EDAC. Both our hardware vendors (for LI and VLI) dictate that EDAC must be disabled; they are both quite clear on this. I never got a completely straight answer when I suggested that EDAC should be detection-only and shouldn't modify system behavior (other than reporting). When I asked, I was told that use of the EDAC module can cause timing problems. As for the KVM and iTCO modules, I asked @RalfKlahr to comment on this. Clearly, the HANA systems should never use KVM, so disabling it shouldn't be harmful. @RalfKlahr Was your request for disabling KVM and iTCO a preference thing? Or does SAP recommend this? Have we experienced problems without those modules being disabled?
Ah, crud. @RalfKlahr is OOF until April 28th. We may not get a very prompt response on this ... Can we proceed with disabling the modules for now, at least for test purposes? Thanks ... By the way, can #135 be picked up for milestone Azure Testing Next, by chance, so we can have that in our next production image? I know that won't be coming our way for a while, but it would be nice if it included all our requests to date, thanks.
@jeffaco, thanks for collecting the logs. Based on the "grep log", it is clear that kvm_intel is getting loaded, as that module complains about missing symbols that the kvm module provides. I suspect that "kvm_intel" gets loaded because we are on Intel HW. Then it is detected that the VT-x extensions are disabled, and the "kvm_intel" module gets dropped. However, the module it depends on, "kvm", does not get unloaded. I'll see if I can confirm this theory.
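The theory can be made concrete with the kernel's dependency file, modules.dep under /lib/modules/<kernel release>/. This is a simplified Python model, not kmod's actual code: it resolves the insertion order for kvm_intel (dependencies first) and lists what would remain loaded if kvm_intel itself then failed to initialise and nothing rolled its dependencies back, which is the behaviour suspected above. Helper names are hypothetical.

```python
import os
import platform


def module_name(path):
    """Map a modules.dep path to a module name, e.g. ".../kvm-intel.ko.xz" -> "kvm_intel"."""
    base = os.path.basename(path)
    # Drop ".ko" and any compression suffix after it; module names use "_" not "-".
    return base.split(".ko", 1)[0].replace("-", "_")


def parse_modules_dep(text):
    """Parse modules.dep ("module.ko: dep1.ko dep2.ko") into {name: [dep names]}."""
    deps = {}
    for line in text.splitlines():
        target, sep, rest = line.partition(":")
        if sep:
            deps[module_name(target)] = [module_name(p) for p in rest.split()]
    return deps


def load_order(deps, name):
    """Order in which the modules would be inserted for `name`, dependencies first."""
    order, seen = [], set()

    def visit(module):
        if module in seen:
            return
        seen.add(module)
        for dep in deps.get(module, []):
            visit(dep)
        order.append(module)

    visit(name)
    return order


def left_behind(deps, name):
    """Dependencies that stay loaded if `name` itself fails to initialise (per the theory)."""
    return [module for module in load_order(deps, name) if module != name]


if __name__ == "__main__":
    dep_file = os.path.join("/lib/modules", platform.release(), "modules.dep")
    if os.path.exists(dep_file):
        with open(dep_file) as handle:
            print(left_behind(parse_modules_dep(handle.read()), "kvm_intel"))
```

If the theory holds, the printed list on an Intel host with VT-x disabled would include kvm, matching what lsmod showed.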
Thanks @jeffaco for the details on EDAC.
Your theory is definitely correct, which also means we have to blacklist the CPU-specific kvm module.
From my perspective, yes. Could you confirm that the above blacklist setup works flawlessly for you?
First, I'd like to get clarity on the successful completion of the prioritized topics in the current milestone. This includes
We will create a new testing image as soon as this blacklisting topic has been resolved. I expect that new image to address the milestone issues, which makes it a production candidate. Once this is done, we will jump on the other open issues. The rushing game didn't work well in the past, and we basically received concerns due to stupid mistakes that happened on our side. In the end, it took longer than it should have. We will avoid that in the future, which, however, puts a bit more strictness on the process. Please let us first nail down the issues currently being worked on and then jump on the next ones. Thanks
We should be able to get away without setting the modules to point to /bin/false. @jeffaco, could you please also test with:
Sorry for the seemingly endless reboot testing, but module loading is more art than science.
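The difference between the two approaches comes from modprobe.d(5): blacklist only tells modprobe to ignore a module's internal aliases (the ones hardware autoloading uses), so an explicit modprobe by name still loads it, while install replaces the insertion with the given command, which with /bin/false means nothing is inserted. This is a simplified Python model of that decision, not modprobe's code; it ignores dependencies, softdeps, and options, and the configuration lines are hypothetical.

```python
def parse_config(lines):
    """Collect blacklist and install directives from modprobe.d-style lines."""
    blacklist, install = set(), {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if parts[0] == "blacklist" and len(parts) >= 2:
            blacklist.add(parts[1])
        elif parts[0] == "install" and len(parts) >= 3:
            install[parts[1]] = " ".join(parts[2:])
    return blacklist, install


def would_insert(module, via_alias, blacklist, install):
    """Simplified model: would this modprobe request insert `module` itself?"""
    if module in install:
        # modprobe runs the install command instead of inserting the module.
        # With /bin/false (or /bin/true) nothing is inserted; this model
        # treats every install command that way.
        return False
    if via_alias and module in blacklist:
        # blacklist makes modprobe ignore the module's own aliases, which is
        # how hardware autoloading finds it.
        return False
    # An explicit "modprobe <name>" does not consult the blacklist by default.
    return True


# Hypothetical configuration, for illustration only.
blacklisted, installs = parse_config(["blacklist kvm", "install kvm_intel /bin/false"])
print(would_insert("kvm", True, blacklisted, installs))         # False
print(would_insert("kvm", False, blacklisted, installs))        # True
print(would_insert("kvm_intel", False, blacklisted, installs))  # False
```

modprobe's -b (--use-blacklist) option, which udev uses, applies the blacklist to explicit module names as well; the model leaves that out.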
With kvm_intel explicitly blacklisted, this might work; however, we could still end up with kvm being loaded... let's see
Hopefully it is
@schaefi I'm not proposing going back to "rushing"; that was clearly ill-conceived all around. I totally appreciated your efforts to be responsive, but it obviously just wasn't working, regardless of best intentions. My proposal here is to pick up #135 into Azure Testing Next, following standard procedures. This could mean that the production image might be delayed, but I do this in the (perhaps false) hope that this will be the last set of changes for a while, and then we can start focusing on VLI. I concede that this hope may be false, but I think all the eyes that needed to be on the image have been on the image, so unless something significant was missed, we should (hopefully) be "good" for a while at least. In retrospect, perhaps we should get used to regular updates to the image anyway, in which case it wouldn't matter. I guess I'm saying: it would be easier if #135 could be picked up, but if, for whatever reason, that proves difficult, I can live with that. @rjschwei I've applied the set of changes to the
Okay, it came back just in time (before I needed to leave). This appears to be good so far:
Let me know the next steps, thanks. Note that I never got a good test of #119 because of the network dependency problem, so I definitely need a new test image before moving forward. Thanks.
Ok, thanks for the feedback. I will adapt the images and upload a new testing image for you. This will be my last action before my vacation starts :) Stay tuned.
All images are updated and building. Expect an e-mail with the SAS URL in the next two hours.
SAP HANA doesn't need either KVM or iTCO modules.
Can these be blacklisted so they don't load at boot time?
Thanks.
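In modprobe.d(5) terms, the request amounts to one blacklist line per module in a .conf file under /etc/modprobe.d/. A minimal Python sketch that renders such content; the module names iTCO_wdt and iTCO_vendor_support are an assumption about which "iTCO modules" are meant, the helper name is hypothetical, and the file name is left open.

```python
def render_blacklist(modules, comment=None):
    """Render modprobe.d-style content with one blacklist directive per module."""
    lines = ["# " + comment] if comment else []
    seen = set()
    for module in modules:
        if module not in seen:  # skip duplicates, keep the given order
            seen.add(module)
            lines.append("blacklist " + module)
    return "\n".join(lines) + "\n"


# Assumed module names; see the lead-in.
print(render_blacklist(["kvm", "iTCO_wdt", "iTCO_vendor_support"],
                       comment="not needed by SAP HANA"), end="")
```

Note that blacklist only suppresses alias-based autoloading; an explicit modprobe by name still loads the module.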