-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
limit Magpie usernames #396
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but consider my suggestion to group all configs together, if possible.
services: | ||
magpie: | ||
environment: | ||
MAGPIE_USER_NAME_EXTRA_REGEX: "^[a-z0-9]+$$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, instead of this env var could we have done this in the magpie.ini
file? I have a slight preference for all configs to be grouped together in one file for ease of tracing them.
Is the magpie.ini
route not possible or there are advantages to use env var?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that I put this here is that we only need to enforce this limitation if both the magpie and jupyterhub components are enabled.
Yes, this will be 99% of the use-cases but it's important to keep the dependencies clear in the way we structure things in this repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to think about this: This follows the "pluggable component architecture" that we've been trying to create since #214
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see ! Never thought about that one, having Magpie without JupyterHub do not need this config. You are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I just think of a counter argument in favor of magpie.ini
since we are discussing about control users already violating the extra regex.
Even if Magpie can initially be activated without Jupyterhub, we better ensure all the users created are "JupyterHub forward compatible". Otherwise the poor node admin will be in for a surprise the day they activate JupyterHub.
That is to say, it's more future proof to have a stricter regex, then relax it if needed, than the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mishaschwartz Not sure if you've seen my previous comment above but I think it's better to have the new extra regex in magpie.ini
so all users created are forward-compatible if JupyterHub is enabled later.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2181/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-20.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1383/NOTEBOOK TEST RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to bring this up again, but just curious to see if all these Magpie changes were actually necessary. Maybe the solution was taken too hastily?
Instead of forcing a subset of escaped/lowercase characters from JupyterHub to Mapgie, it seems we could simply update the references to use {raw_username}
instead of {username}
and keep the large character set of Magpie usernames. Magpie names are restricted to ^[A-Za-z0-9]+(?:[\s_\-\.][A-Za-z0-9]+)*$
, which should all be valid characters for the file system.
And if Or even override
|
As @tlvu mentions here (#393 (comment)) having a different username in Magpie as in Jupyterhub adds an additional burden on admins who are trying to manage the system. Also, having mismatched names may cause confusion for users who log in with one username and see a different username displayed in jupyterlab. Finally, any strategy that requires us to escape usernames means we have to be very very careful about conflicts. For example, in Magpie, we can theoretically have two users: I'm not saying we can't construct an escape function that will be foolproof, I just think that this adds a ton of extra complexity that we have to worry about. In my opinion it is simpler for both admins, developers, and users to simply restrict the usernames in Magpie |
@mishaschwartz |
You're right, my comments were only responding to your suggestion to override escape_name. However:
I really think that the best and simplest option is to just restrict user names |
@mishaschwartz class CustomDockerSpawner(DockerSpawner):
@property
def escaped_name(self):
return self.user.name Since the authenticator is also a custom implementation, it is a very quick to override
Good point. |
Just to recap, we are still discussing about this PR simply to have I don't think it is worth it. No users have ever complained to us in production that they did not like the username we gave them and need to have I would rather have the regular |
We do have some users that have Recaping everything, I would suggest that |
@fmigneault This would have been useful to know earlier in the discussion so that I could have understood why you were objecting so strongly to this change. With regards to the admin-catalog user: this user is only added if the (deprecrated) catalog component is enabled and it is added to jupyterhub's blacklist anyway. I'm not inclined to limit this change in order to support a deprecated component. Are there other control users that you are concerned about that are not deprecated?
This is giving me pause too @tlvu . I don't want to override these functions and then create other conflicts later on. However, the fact that jupyter provides public configuration options to override these settings makes me feel a little bit better about it. I would be willing to consider overriding these functions after thorough testing. In summary:
|
The This is just to say, unless the effort to test the override functions is simple and quick, then yes go for it since it's nice to have Note that we can always defer overriding the functions later since it's easy to relax existing restriction. It's much harder to tighten restrictions since it implies manually fixing all the relaxed usernames that become illegal with harder restriction. |
I can try and test this a little later. I'm just working on other stuff right now so it might be a few days until I can get to this. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2217/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-133.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1400/NOTEBOOK TEST RESULTS |
Testing the user nameSet up
Test 1: create a user with a space in their name
Test 2: create a user with capital letter in their name
Test 3: create a user with capital letter in their name, sign in as lowercase username equivalent
Test 4: create a user with capital letter in their name, and another with the lowercase username equivalent, sign in as capitalized user
Test 5: create a user with capital letter in their name, and another with the lowercase username equivalent, sign in as lowercase user
Test 6: create a user with a special character in their name
Summary:I don't see any issues with jupyterhub if all we do is keep jupyter from converting the username and ensuring that magpie does not allow whitespace in usernames BUT... there is an issue in Magpie where username capitalization in ignored when logging in. We need to make sure that Magpie doesn't confuse users whose usernames only differ in terms of capitalization. Probably what is happening is that the username look up is not case sensitive. TODO:
|
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2218/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1402/NOTEBOOK TEST RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Will have to see how that works with Ouranosinc/Magpie#596.
I propose we update this PR with the Magpie version that has the fix at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new regex surprised me.
CHANGES.md
Outdated
cannot be mounter to the jupyterlab container. | ||
|
||
This fixes the issue by limiting Magpie usernames to contain only lowercase ascii characters and digits. This ensures | ||
that no characters will be escaped in Jupyterhub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention Magpie is updated to a new version.
When Magpie changes, there is always a risk of failed DB migration on existing deployments in production so all node admin should always know if there is a Magpie update in a change.
@@ -1,5 +1,7 @@ | |||
version: "3.4" | |||
services: | |||
magpie: | |||
environment: | |||
MAGPIE_USER_NAME_EXTRA_REGEX: ^\S+$$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am completely confused here. So \S
will "matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v]." (definition from https://docs.python.org/3/howto/regex.html#simple-patterns).
So we want the usernames to only contain non-writespace char ???
Other than non-whitespace, we probably would want to avoid other kind of illegal char like /
, *
, #
... Unless this is already handled by Magpie built-in username validation?
Or I might have completely mis-understood how the new MAGPIE_USER_NAME_EXTRA_REGEX
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opps sorry, my brain is dead past midnight, yes we want non-whitespace char.
The regex is right but I'd like something more conservative, like an explicit allow list: "^[a-zA-Z0-9_\-]+$$"
. This explicit list will protect us from any other illegal char that we ever forgot to think about. I'd rather be safe than sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex is used in combination with the one already enforced by Magpie by default ^[A-Za-z0-9]+(?:[\s_\-\.][A-Za-z0-9]+)*$
(see #396 (review)).
The username must match both that default regex as well as ^\S+$
. The default regex protects us from unexpected characters, the second regex protects us from whitespace.
I'd rather not duplicate most of the default regex here since the more complex we make it, the more prone to errors we are if we change it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to override the regex either way, I agree with @tlvu that it is better to have it explicitly, even if they overlap partially. It is easier to analyze one location than having to go back into Magpie code to validate the full set. Even if we do make a mistake, the regex on Magpie side will still double-check it.
host machine are discovered properly since we expect the username to match the username | ||
set by Magpie. | ||
""" | ||
return self.user.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I am not sure I want to override escape_name()
. This will make {username}
for old existing username that potentially can have invalid char actually use the raw username with invalid char.
MAGPIE_USER_NAME_EXTRA_REGEX
take effect on new username only. Existing ones, if they are invalid, are still invalid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any user who would have that issue will not be able to spawn a jupyterlab server right now and you would know about it already.
If we don't override escape_name
then we have to limit the usernames to lowercase ascii and digits. I believe that @fmigneault is still not in favour of that strategy though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Jupyter usernames were always bounded by Magpie usernames at least. The only mismatch this escape caused was about lower/upper case (which won't even be a thing anymore with an upcoming Magpie PR), and .-_
characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any user who would have that issue will not be able to spawn a jupyterlab server right now and you would know about it already.
Agreed. For existing user that never log into Jupyter before, if they generate an error, hopefully they will contact us and we'll create them a new username that is "valid".
Now that I think about it though, isn't overriding escape_name()
to return self.user.name
the same as simply using {raw_username}
? Why overriding when we can simply use {raw_username}
?
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2222/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-46.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1403/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2235/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1412/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2236/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-133.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1413/NOTEBOOK TEST RESULTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the fix of lower/upper-case user name handling is done in Magpie, this looks fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add back the extra regex in magpie.ini
in explicit form.
For {raw_username}
vs overriding escape_name()
this is just a my reflection, don't have to change it if no new advantages.
Otherwise, the rest looks good, thanks.
@@ -1,7 +1,5 @@ | |||
version: "3.4" | |||
services: | |||
magpie: | |||
environment: | |||
MAGPIE_USER_NAME_EXTRA_REGEX: ^\S+$$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you migrate this change to magpie.ini
to ensure forward compat if later JupyterHub is enabled? Missing corresponding change in magpie.ini
. Please use an explicit regex "^[a-zA-Z0-9_\-\.]+$$"
which would allow upper case, _-.
and nothing else.
This is to ensure even if Magpie validation regex changes later, we are independent and as @fmigneault said, this is to simplify the life of other node admin, avoiding them to lookup the validation regex on Magpie side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding back the extra regex, please add a comment that Magpie already has a built-in regex too, just to raise awareness for those not in the know.
I guess now that Magpie do not allow uppercase (Ouranosinc/Magpie#597), the extra regex becomes "[a-z0-9_\-\.]+$$"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new changes, we do not need to use the MAGPIE_USER_NAME_EXTRA_REGEX
feature at all. The current default checks that Magpie does are sufficient (without the need to do an extra check), see Ouranosinc/Magpie#597 for details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. In this case, just to help other future node admin not in the know, can you document in magpie.ini
the exact permanent link to the magie regex, this one currently https://github.com/Ouranosinc/Magpie/blob/595602d2cebae94223b952f0cf04a1caa64f6546/magpie/api/management/user/user_utils.py#L66.
So this is the regex we expect at this moment. If in the future this regex changes on Magpie side, and somehow break us, we will know what was the original expected working regex.
host machine are discovered properly since we expect the username to match the username | ||
set by Magpie. | ||
""" | ||
return self.user.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any user who would have that issue will not be able to spawn a jupyterlab server right now and you would know about it already.
Agreed. For existing user that never log into Jupyter before, if they generate an error, hopefully they will contact us and we'll create them a new username that is "valid".
Now that I think about it though, isn't overriding escape_name()
to return self.user.name
the same as simply using {raw_username}
? Why overriding when we can simply use {raw_username}
?
This only works in the places where jupyterhub allows you to specify {raw_username} in a template. We want to ensure that this applied in all cases (not just when mounting user workspaces to the container) |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2308/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1438/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2311/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1440/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2314/Result : failure BIRDHOUSE_DEPLOY_BRANCH : limit-magpie-usernames DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-133.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1443/NOTEBOOK TEST RESULTS |
Overview
When Jupyterhub spawns a new jupyterlab container, it escapes any non-ascii, non-digit character in the username. This results in a username that may not match the expected username (as defined by Magpie). This mismatch results in the container failing to spawn since expected volumes cannot be mounter to the jupyterlab container.
This fixes the issue by ensuring that juptyerhub does not convert the username that is receives from Magpie.
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false