Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: feat(livenet): add memory size check depending on live image size #2344

Merged

Conversation

tblume
Copy link
Collaborator

@tblume tblume commented May 17, 2023

For writeable live images, the memory must hold the complete image. That means it must be at least the size of the image plus some RAM necessary to run processes.
With too small RAM, the OOM Killer steps in and makes the boot hang. This patch lets the system go into the emergency shell instead.

The minimum RAM of 512Mb is just arbitrarily choosen. We could set any other number that appears appropriate instead. Alternatively, we could add a new parameter rd.min.ram or such.

This pull request changes...

Changes

livenet module

Checklist

  • [X ] I have tested it locally
  • [X ] I have reviewed and updated any documentation if relevant
  • I am providing new code and test(s) for it

Fixes #
OOM in livenetboot with too low memory.

@github-actions github-actions bot added livenet Issues related to the livenet module modules Issue tracker for all modules labels May 17, 2023
@LaszloGombos
Copy link
Collaborator

@tblume Thank you for working on this.

Before considering this PR, I wanted to get your opinion on #1780 . #1780 is certainly different but between this PR and #1780 it is common that both trying to handle OOM and both using dmsquash-live dracut module.

Do you see a way to resolve both issues at once, or you think that issues are different and needs to be resolved separately with separate PRs ?

@LaszloGombos LaszloGombos added the enhancement Issue adding new functionality label May 17, 2023
@tblume
Copy link
Collaborator Author

tblume commented May 17, 2023

@tblume Thank you for working on this.

Before considering this PR, I wanted to get your opinion on #1780 . #1780 is certainly different but between this PR and #1780 it is common that both trying to handle OOM and both using dmsquash-live dracut module.

Do you see a way to resolve both issues at once, or you think that issues are different and needs to be resolved separately with separate PRs ?

Yes, my patch would cover this issue.
But we need to agree on a baseline for the lower memory limit that covers all use cases.
From my experience, I'd say that 512M free RAM left after subtracting the live image size, should be sufficient.
But considering the Fedora memory requirements, it is probably safer to have 1G left.
Still, that could exclude use cases on systems with very small memory (e.g. embedded).

Would it be acceptable to set 1G as default lower limit and add an boot parameter that could modify this?

@LaszloGombos
Copy link
Collaborator

LaszloGombos commented May 17, 2023

Yes, my patch would cover #1780.

I am not sure. #1780 is not using livenet, just local ISO.

CC @Conan-Kudo

@Conan-Kudo
Copy link
Member

Would it be acceptable to set 1G as default lower limit and add an boot parameter that could modify this?

Yes, I think that's reasonable. Distributions will want to set their own minimum limits, and giving a way to do that makes sense.

@tblume
Copy link
Collaborator Author

tblume commented May 19, 2023

Yes, my patch would cover #1780.

I am not sure. #1780 is not using livenet, just local ISO.

CC @Conan-Kudo

Indeed, I overlooked this, sorry.
Still, the calculation of subtracting the live image size from the main memory should work for the writable case.
So, there should probably be a similar code for the dmsquash-live module.
And with an additional parameter to specify the minimum leftover RAM, this can be adjusted to the particular needs.
I will work on that.

@tblume tblume force-pushed the check-memory-for-live-images branch 3 times, most recently from d8c954c to 7dc0638 Compare June 21, 2023 13:43
Copy link
Member

@aafeijoo-suse aafeijoo-suse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Thomas! Some things I miss:

  • Could you add some doc about the new parameter rd.minmem.liverw in dracut.cmdline.7, close to rd.writable.fsimg?
  • All the calls to curl are encapsulated in functions within url-lib.sh, maybe the best approach is create a new function there with the code that retrieves the value of imgsize. What do you think?
  • Some kind of error handling for getting imgsize. What happens if the status code of the curl request is not 200? If the liveurl address cannot be reached, we get the error bash: / (1024 * 1024): syntax error: operand expected (error token is "/ (1024 * 1024)"). Another example that can be easily fixed, if it returns 302 (redirect), the headers does not contain Content-Length. E.g. with our OBS:
sh-5.2# curl -I "https://download.opensuse.org/repositories/systemsmanagement:/Agama:/Staging/images/iso/agama-live.aarch64-ALP.iso"
HTTP/2 302 
date: Wed, 21 Jun 2023 13:34:58 GMT
server: Apache
location: https://ftp.gwdg.de/pub/opensuse/repositories/systemsmanagement%3A/Agama%3A/Staging/images/iso/agama-live.aarch64-ALP.iso
vary: Accept,COUNTRY
content-type: application/x-iso9660-image

sh-5.2# curl -I "https://ftp.gwdg.de/pub/opensuse/repositories/systemsmanagement%3A/Agama%3A/Staging/images/iso/agama-live.aarch64-ALP.iso"
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 21 Jun 2023 13:36:04 GMT
Content-Type: application/octet-stream
Content-Length: 991166464
Last-Modified: Fri, 16 Jun 2023 14:09:36 GMT
Connection: keep-alive
ETag: "648c6d20-3b140000"
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Accept-Ranges: bytes

In this case, the -L option does the trick:

sh-5.2# curl -IL "https://download.opensuse.org/repositories/systemsmanagement:/Agama:/Staging/images/iso/agama-live.aarch64-ALP.iso"
HTTP/2 302 
server: Apache
date: Wed, 21 Jun 2023 13:46:55 GMT
vary: Accept,COUNTRY
location: https://ftp.gwdg.de/pub/opensuse/repositories/systemsmanagement%3A/Agama%3A/Staging/images/iso/agama-live.aarch64-ALP.iso
content-type: application/x-iso9660-image

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 21 Jun 2023 13:46:56 GMT
Content-Type: application/octet-stream
Content-Length: 991166464
Last-Modified: Fri, 16 Jun 2023 14:09:36 GMT
Connection: keep-alive
ETag: "648c6d20-3b140000"
Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Accept-Ranges: bytes

@LaszloGombos
Copy link
Collaborator

  • Could you add some doc about the new parameter rd.minmem.liverw in dracut.cmdline.7, close to rd.writable.fsimg?

I hope to discuss the name of the new parameter a bit. As discussed there are other scenarios where something similar like a minmem can be used (e.g. together with rd.live.ram=1).

To keep the possibility of reusing this argument, what do you think of simply using the name of rd.minmem (and clarify in the documentation that this is the memory needed on top of images loaded into ram).

  • All the calls to curl are encapsulated in functions within url-lib.sh, maybe the best approach is create a new function there with the code that retrieves the value of imgsize. What do you think?

I am somewhat on a different opinion on this. I think of a function is only used by one dracut module, then it make sense to only include that logic in that one module - regardless of the functionality.
If there is a need for at least two modules sharing a function, it make sense to find a way to share the code (even if it is small code).

  • Some kind of error handling for getting imgsize.

Yes, please !

@tblume tblume force-pushed the check-memory-for-live-images branch from 7dc0638 to 988fd0c Compare July 7, 2023 07:47
@github-actions github-actions bot added dmsquash-live Issues related to the dmsquash-live module base Issues related to the base module labels Jul 7, 2023
@tblume
Copy link
Collaborator Author

tblume commented Jul 7, 2023

Thanks for the feedback.
I've updated the PR according to your recommendations.

@tblume tblume force-pushed the check-memory-for-live-images branch 2 times, most recently from bbbedab to 3380ab8 Compare July 7, 2023 07:58
@@ -16,6 +16,13 @@ netroot="$2"
liveurl="${netroot#livenet:}"
info "fetching $liveurl"

if getargbool 0 'rd.writable.fsimg'; then

imgsize=$(($(curl -IL "$liveurl" | sed -n 's/Content-Length: *\([[:digit:]]*\).*/\1/p') / (1024 * 1024)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You accidentally deleted the -s from the curl call, otherwise it also prints the progress:

sh-5.2# curl -IL "$liveurl" | sed -n 's/Content-Length: *\([[:digit:]]*\).*/\1/p'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0  885M    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
928514048
sh-5.2# curl -sIL "$liveurl" | sed -n 's/Content-Length: *\([[:digit:]]*\).*/\1/p'
928514048

Is that what you intended to do?

Other than that, the patch looks good, thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that was an mistake.
Thanks for spotting!
->fixed

Copy link
Collaborator

@LaszloGombos LaszloGombos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

For writeable live images, the memory must hold the complete image.
That means it must be at least the size of the image plus some RAM
necessary to run processes.
With too small RAM, the OOM Killer steps in and makes the boot hang.
This patch lets the system go into the emergency shell instead.

The default minimum RAM after subtracting the live image size is set to 1G.
The parameter rd.minmem is added to modify this.
@tblume tblume force-pushed the check-memory-for-live-images branch from 3380ab8 to 1faf3a9 Compare July 7, 2023 11:21
@aafeijoo-suse aafeijoo-suse merged commit 52351cf into dracutdevs:master Jul 7, 2023
70 of 78 checks passed
@tblume tblume deleted the check-memory-for-live-images branch October 16, 2023 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
base Issues related to the base module dmsquash-live Issues related to the dmsquash-live module enhancement Issue adding new functionality livenet Issues related to the livenet module modules Issue tracker for all modules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants