fix: properly assign default server option values #649
You can access the deployment of this PR at https://renku-ci-nb-649.dev.renku.ch
Thanks for the immediate action on this! I have 4 questions/remarks:
Why not use the same logic for all parameters, i.e. allow every parameter to be missing and simply use the defaults from values.yaml for whichever ones are missing? I think that would be the simpler logic, and it would allow admins to, for example, hide the memory selection in the UI through the values file if they wanted to for some reason.
I would prefer to have these defaults (also) set in the values.yaml that we ship with the renku-notebook chart. The helm chart is currently the only way people deploy this application and the values.yaml is where people go first to look for configuration. If defaults are also set in the code, preferably only in the app config.
I guess that is already the case now for pods that are evicted because of insufficient disk space on a node. So in that sense it's already an improvement over the status quo, since this PR makes sure that the eviction hits a session that indeed consumes a lot of disk space. One more thing to figure out for sure: do you know if the notebook container's usage of the emptyDir volume is also counted against the container's ephemeral-storage quota? I noticed yesterday that a pod without any limit on the emptyDir got evicted when I added more than ~22GB of data to the emptyDir volume from within the session. This session had the default ephemeral-storage limit of 20GB. I'd find this rather unexpected behaviour. But if what I describe above holds true, we might want to set the ephemeral-storage limit on the notebook container to 20GB + the emptyDir limit to get more predictable behaviour.
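To make the "20GB + emptyDir limit" suggestion concrete, here is a minimal Python sketch of the arithmetic. It is not code from renku-notebooks: `parse_quantity` and `ephemeral_storage_limit` are hypothetical helpers, and the parser only handles the plain integer or decimal quantities with the common Kubernetes size suffixes (for example `20G` or `1Gi`).

```python
import re

# Multipliers for the common Kubernetes quantity suffixes (decimal and binary).
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(quantity: str) -> int:
    """Convert a size such as '20G' or '1Gi' to a number of bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _SUFFIXES:
        raise ValueError(f"Unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _SUFFIXES[match.group(2)])


def ephemeral_storage_limit(base_limit: str, empty_dir_limit: str) -> int:
    """Container ephemeral-storage limit in bytes: base limit plus emptyDir size limit.

    Assumes, as discussed in this thread, that emptyDir usage counts against
    the container's ephemeral-storage limit.
    """
    return parse_quantity(base_limit) + parse_quantity(empty_dir_limit)


if __name__ == "__main__":
    # A 20G base limit plus a 10G emptyDir size limit.
    print(ephemeral_storage_limit("20G", "10G"))
```

With this approach a user would hit the emptyDir size limit before the container limit, assuming the rest of the container writes less than the base limit.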
You are correct @ableuler, the emptyDir usage counts against the ephemeral-storage limit. I started a session where the ephemeral-storage limit was set to 1GB but the size limit on the emptyDir was set to 10GB. After adding 2GB worth of data in the user session, the pod was evicted with the message:
|
… possible in ui server options
@ableuler I addressed all your comments from above. In addition, I already implemented the changes required for #651, because including them here was easier than leaving them out and then adding logic to parse and extract data from the current way we provide the server options. Now the values.yaml files have the following sections:

    serverOptionsUI:
      defaultUrl:
        order: 1
        displayName: Default Environment
        type: enum
        default: /lab
        options: [/lab]
      cpu_request:
        order: 2
        displayName: Number of CPUs
        type: enum
        default: 0.5
        options: [0.5, 1.0]
      # ....
    serverOptionsDefaults:
      defaultUrl: /lab
      cpu_request: 0.5
      mem_request: 1G
      disk_request: 1G
      gpu_request: 0
      lfs_auto_fetch: false

Other than that, all the other points we discussed are implemented:
I think this covers all use cases we may come across now and in the future.
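As an illustration of how the two sections could work together, the following Python sketch fills in every missing option from `serverOptionsDefaults` and checks enum values against `serverOptionsUI`. This is an assumption about the intended behaviour, not the code in this PR: `resolve_server_options` is a hypothetical function, and the dictionaries stand in for the parsed values.yaml sections.

```python
from typing import Any, Dict


def resolve_server_options(
    requested: Dict[str, Any],
    ui_options: Dict[str, Dict[str, Any]],
    defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a complete set of server options (hypothetical sketch).

    Any option absent from the request falls back to its entry in `defaults`.
    Options described as `enum` in `ui_options` must use one of the listed values.
    """
    resolved: Dict[str, Any] = {}
    for name, default in defaults.items():
        value = requested.get(name, default)
        spec = ui_options.get(name)
        # Options not shown in the UI have no spec and are accepted as-is.
        if spec is not None and spec.get("type") == "enum":
            if value not in spec["options"]:
                raise ValueError(f"Invalid value {value!r} for option {name!r}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    ui = {
        "defaultUrl": {"type": "enum", "default": "/lab", "options": ["/lab"]},
        "cpu_request": {"type": "enum", "default": 0.5, "options": [0.5, 1.0]},
    }
    defaults = {
        "defaultUrl": "/lab",
        "cpu_request": 0.5,
        "mem_request": "1G",
        "disk_request": "1G",
        "gpu_request": 0,
        "lfs_auto_fetch": False,
    }
    # Only the CPU request is supplied; everything else comes from the defaults.
    print(resolve_server_options({"cpu_request": 1.0}, ui, defaults))
```

Keeping the defaults separate from the UI description means an admin can drop an option from `serverOptionsUI` to hide it while the session still receives a sensible value.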
After this PR is merged, SwissDataScienceCenter/renku#2121 should also be merged.
Looks great, thanks for the updates! 👍
@olevski any reason not to merge this?
No, no reason at all. I should have merged it earlier.
This properly assigns the default values to server options in all cases.
The logic is as follows:
As for limiting the size when using emptyDir, I tested this and k8s will evict pods that go over their emptyDir limit by even a few megabytes. When the limit is exceeded, the eviction happens within minutes, if not seconds. However, there are a few issues:
The pod status is set to evicted, but the UI and notebooks do not really use or properly communicate this status. So a session is essentially stuck in limbo: it does not go away and its status is set to evicted, but the notebooks service considers anything that does not have the status running to be loading. The UI does not allow the user to shut down a session while it is loading, so the evicted session can only be cleaned up by using kubectl or calling the API directly.
I have created a follow-up issue to properly handle the size limit for emptyDir and the cleanup of evicted sessions here: #648.
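The status problem described above can be sketched in Python. This is only an illustration of one possible mapping, not the renku-notebooks implementation or the fix planned in #648: `session_status` and `can_shut_down` are hypothetical names. The one fact relied on is that Kubernetes reports an evicted pod with `status.phase` set to `Failed` and `status.reason` set to `Evicted`.

```python
from typing import Optional


def session_status(phase: str, reason: Optional[str] = None) -> str:
    """Map a pod's phase and reason to a session status (hypothetical sketch).

    Evicted pods are reported explicitly instead of falling through to
    'loading' like every other non-running state.
    """
    if phase == "Failed" and reason == "Evicted":
        return "evicted"
    if phase == "Running":
        return "running"
    return "loading"


def can_shut_down(status: str) -> bool:
    """Allow users to clean up evicted sessions as well as running ones."""
    return status in {"running", "evicted"}


if __name__ == "__main__":
    status = session_status("Failed", "Evicted")
    print(status, can_shut_down(status))
```

With a distinct status, the UI could offer the shut-down action for evicted sessions without needing kubectl or a direct API call.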
One last question I have is what we want to do with the GPU and disk size portions of the server options in values.yaml. As the code is written right now, the CPU, memory, default URL and LFS auto-fetch options are mandatory in values.yaml. GPUs and disk size are not mandatory, and defaults for them are hardcoded in the renku-notebooks code.
/deploy