network mode #589
base: master
Conversation
I haven't finished a full review. I will need to study the features and security aspects more. We may need to discuss the feature design.
I'm not sure I'm sold on the `MAGPIE_NETWORK_NAME_PREFIX` approach to generating new users/groups. This can cause weird situations, such as permissions being set on two entities that should represent the same user. It would be better to have a distinct table, as in the case of pending users and external identities, and link it to users to "combine" their various network references.
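A hypothetical sketch of that design (all table and column names are illustrative, shown as plain SQL via sqlite3 for brevity, not the actual Magpie models): remote identities live in their own table, unique per node, and each optionally links back to one local user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE);
CREATE TABLE network_nodes (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- one row per (node, remote user); user_id "combines" them into a local user
CREATE TABLE network_remote_users (
    id INTEGER PRIMARY KEY,
    network_node_id INTEGER NOT NULL REFERENCES network_nodes(id) ON DELETE CASCADE,
    remote_user_name TEXT NOT NULL,
    user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
    UNIQUE (network_node_id, remote_user_name)
);
""")
```

This keeps `users.user_name` unique while still allowing the same local user to carry several network references.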
It seems like the custom external login code is essentially doing what `authomatic` would perform, but using non-standard OAuth methodology and endpoints.
```python
op.add_column("users", sa.Column("network_node_id", sa.Integer,
                                 sa.ForeignKey("network_nodes.id", onupdate="CASCADE", ondelete="CASCADE"),
                                 nullable=True))
op.drop_constraint("uq_users_user_name", "users")
```
This constraint cannot be removed without extensive validation that all the code still works. User names are used for login and for querying the API; they must be unique so that lookups find the right user directly. Nothing was designed to handle potentially returning multiple matches.
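A minimal illustration of the pattern the codebase relies on (plain sqlite3 stand-in, not actual Magpie code): once the unique constraint is dropped, lookups written with single-result semantics silently depend on data that may now contain duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# note: no UNIQUE on user_name anymore, as after the migration above
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany("INSERT INTO users (user_name) VALUES (?)", [("alice",), ("alice",)])

rows = conn.execute("SELECT id FROM users WHERE user_name = ?", ("alice",)).fetchall()
# code that did the equivalent of fetchone()/.one() would either pick an
# arbitrary row or raise, depending on the query style
```

With the constraint in place this situation was impossible by construction, which is why none of the existing lookups guard against it.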
Yeah, I tried to search through the code to catch all the current lookups and enforce that the associated network node be null. Another pair of eyes would be great.
In general, whenever `<user>.user_name` is accessed in the code, that `<user>` reference must be resolved beforehand based on what was passed as input from the API, UI, CLI or configuration startup.
This is an important one:
Line 193 in 1d602ee: `def get_user(request, user_name_or_token=None, user_status=None):`
There are also some considerations for `class UserPending(Base):` (Line 210 in 1d602ee) and `class TemporaryToken(BaseModel, Base):` (Line 921 in 1d602ee), which generate somewhat related definitions for a future user, but are only really "linked" via `user_name` until the relevant registration/approval process is completed, since the actual `user_id` is not defined yet.
This code never checks for potential multiple results returned from `user_name` matches because it has always assumed the unique constraint. Similarly, all webhook events for interactions with Cowbird work using `user_name`, since this is what the API paths use.
magpie/api/schemas.py
Outdated
```python
if get_constant("MAGPIE_NETWORK_MODE", settings_name="magpie.network_mode"):
    TAG_DESCRIPTIONS[NetworkNodeTag] = "Management of references to other Magpie instances in the network."
```
I think you can insert it into the dict even if the mode is not activated.
Either way, until the application has parsed its INI file, this could cause inconsistent results (e.g. while importing), because the env-var could exist but not (yet) its settings equivalent.
Maybe I don't understand the order here. When does the INI file get parsed? At run time, after the Python files are parsed but before the server starts?
Until the WSGI app is returned from `def main(global_config=None, **settings):` (Line 98 in 1d602ee) and actually started by Gunicorn, any intermediate `import ...` operation causes the imported module to be executed immediately, before the "current app" that `get_constant` can detect is defined.
Therefore, any call to `get_constant` at module level, such as in this case, is highly dependent on the order in which modules are imported, which could be before or after the app is defined. Something that works at one point could suddenly start breaking simply from changing imports, and this is hard to track. However, if you use `get_constant` within a function, the "current app" should exist, because the function executes at runtime, contrary to its definition.
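A minimal self-contained illustration of this pitfall (the `get_constant` stand-in and settings dict are simplified, not the actual Magpie helpers):

```python
import os

_settings = {}  # stands in for the app settings, populated once the INI is parsed

def get_constant(name, default=None):
    return _settings.get(name) or os.getenv(name) or default

# BAD: evaluated at import time, possibly before the app (and its settings) exists
MODULE_LEVEL_MODE = get_constant("EXAMPLE_NETWORK_MODE", default=False)

# GOOD: evaluated at runtime, after startup has parsed the INI into the settings
def network_mode_enabled():
    return get_constant("EXAMPLE_NETWORK_MODE", default=False)

_settings["EXAMPLE_NETWORK_MODE"] = True  # simulate INI parsing during startup
# MODULE_LEVEL_MODE was frozen at import time; only the function sees the setting
```

The module-level value stays `False` no matter what the INI says, while the function call reflects the parsed settings.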
magpie/api/schemas.py
Outdated
```python
colander.Integer(),
description="Token Expiry (in seconds)",
example=2000,
default=get_constant("MAGPIE_DEFAULT_TOKEN_EXPIRY")
```
Similar to the other comment: the default should be evaluated in the code where it is used instead of here, otherwise the actual default could be wrong if it is defined in the INI instead of the env-var.
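A sketch of that fix (the helper and its precedence rules are simplified stand-ins; only `MAGPIE_DEFAULT_TOKEN_EXPIRY` comes from the PR): resolve the default at call time against the fully parsed settings, not at schema-definition time.

```python
import os

def get_constant(name, settings=None, default=None):
    # simplified stand-in: INI settings take precedence, then env-var, then default
    settings = settings or {}
    return settings.get(name) or os.getenv(name) or default

def create_token(settings, requested_expiry=None):
    # default resolved here, at request time, so an INI-only override is honoured
    expiry = requested_expiry or int(
        get_constant("MAGPIE_DEFAULT_TOKEN_EXPIRY", settings=settings, default=2000))
    return {"expires_in": expiry}
```

The schema can keep a purely documentary `example`/`default`, while the view computes the effective value.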
magpie/api/schemas.py
Outdated
```python
    path="/token/validate",
    name="TokenValidate")
NetworkNodeAPI = Service(
    path="/network-nodes/{node}",
```
Replace with `node_id` or `node_name`, according to what is expected, so that the API doc displays it correctly right away.
magpie/constants.py
Outdated
```diff
@@ -105,6 +105,9 @@ def _get_default_log_level():
 MAGPIE_CRON_LOG = os.getenv("MAGPIE_CRON_LOG", "~/magpie-cron.log")
 MAGPIE_DB_MIGRATION = asbool(os.getenv("MAGPIE_DB_MIGRATION", True))  # run db migration on startup
 MAGPIE_DB_MIGRATION_ATTEMPTS = int(os.getenv("MAGPIE_DB_MIGRATION_ATTEMPTS", 5))
+MAGPIE_NETWORK_MODE = os.getenv("MAGPIE_NETWORK_MODE", False)
```
Missing `asbool`.
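The one-line fix, with a minimal `asbool` stand-in (the real one is whatever `constants.py` already imports): without it, `os.getenv` returns the truthy string `"false"` when a user sets the variable to disable the mode.

```python
import os

def asbool(value):
    # simplified stand-in; the real implementation also rejects unknown values
    return str(value).strip().lower() in ("true", "1", "yes", "on")

MAGPIE_NETWORK_MODE = asbool(os.getenv("MAGPIE_NETWORK_MODE", False))
```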
```python
group.description = "Group for users who have accounts on a different Magpie instance on this network.".format(name)
group.discoverable = False

request.tm.commit()
```
Unless it is really necessary to resolve db query timing issues, the transaction commit should be left to the request handler, which does it automatically once the request closes its connection. Overusing explicit commits can cause issues if the db connection gets reused somewhere else.
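For context, Pyramid apps typically delegate this to the transaction manager, which commits on a successful response and aborts on error. Assuming `pyramid_tm` is enabled in the application's INI (a sketch, not necessarily Magpie's exact configuration), no explicit `request.tm.commit()` is needed in the view:

```ini
[app:main]
pyramid.includes = pyramid_tm
```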
```python
node_name = request.POST.get("name")
node_url = request.POST.get("url")
ax.verify_param(all([node_name, node_url]), is_true=True,
```
Should use the checking utilities to validate each parameter against some regex. This only ensures a non-empty string; any malformed value could still slip through and cause issues.
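A sketch of the stricter validation (the patterns and helper name are illustrative, not Magpie's actual checking utilities): each parameter is checked for shape, not just presence.

```python
import re

# hypothetical patterns: a conservative node-name alphabet and an http(s) URL
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")
URL_PATTERN = re.compile(r"^https?://[^\s]+$")

def validate_node_params(node_name, node_url):
    if not (node_name and NAME_PATTERN.match(node_name)):
        raise ValueError("invalid node name")
    if not (node_url and URL_PATTERN.match(node_url)):
        raise ValueError("invalid node url")

validate_node_params("node-a", "https://magpie.example.com")  # passes silently
```

In the actual view this would be expressed through `ax.verify_param` with a regex comparison rather than raw exceptions.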
magpie/api/login/login.py
Outdated
```python
try:
    authenticated_user = user_from_token(request, token)
except jwt.exceptions.InvalidIssuerError:
```
Perform the error handling closer to the part where it actually happens. Return `None` there and handle it accordingly here.
The `user_from_token` function is also used in `validate_token_view`. These functions handle the exceptions differently depending on which `jwt.exceptions` class is raised. If I handle it in the `user_from_token` function and then return `None`, the calling functions are not able to handle different errors differently.
Ok.
The "how" of returning the error needs more thought than just `None`.
However, this is why I would prefer the try/except closer to the call that can raise it (I assume `jwt.decode`).
If `user_from_token` got refactored to handle exceptions differently, we might not notice that it breaks how it is used elsewhere. At least, I didn't notice until you pointed it out.
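One way to reconcile both concerns, sketched here with PyJWT (which the PR already imports as `jwt`; the wrapper and error class are hypothetical): wrap `jwt.decode` itself, translating library exceptions into a small domain error that still carries the failure reason, so each caller can react differently without depending on `jwt` internals.

```python
import jwt

class TokenValidationError(Exception):
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason  # e.g. "invalid-issuer", "expired", "malformed"

def decode_token(token, secret, issuer):
    # error handling lives next to the call that can actually raise
    try:
        return jwt.decode(token, secret, algorithms=["HS256"], issuer=issuer)
    except jwt.exceptions.InvalidIssuerError as exc:
        raise TokenValidationError("invalid-issuer") from exc
    except jwt.exceptions.ExpiredSignatureError as exc:
        raise TokenValidationError("expired") from exc
    except jwt.exceptions.PyJWTError as exc:
        raise TokenValidationError("malformed") from exc
```

Both the login view and `validate_token_view` can then branch on `err.reason` instead of scattering `jwt.exceptions` handling across call sites.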
```python
new_name = request.POST.get("name")
new_url = request.POST.get("url")
ax.verify_param(any([new_name, new_url]), is_true=True,
                http_error=HTTPBadRequest, msg_on_fail=s.BadRequestResponseSchema.description)
```
Same checks.
```python
ax.evaluate_call(lambda: create_network_node(request, node_name, node_url),
                 http_error=HTTPConflict,
                 msg_on_fail=s.NetworkNodes_POST_ConflictResponseSchema.description)
return ax.valid_http(http_success=HTTPCreated, detail=s.NetworkNode_GET_OkResponseSchema.description)
```
Might need more than this.
For now, any admin could register a node, which could be updated/deleted/recreated at some other point by any other admin, without any handshake validation of the node and associated tokens generated for it. If one node in the network gets hacked, that could allow a malicious user to generate tokens over the entire network.
Similar to when registering an external auth provider, there should be some kind of client ID/secret pair that is generated, and which the user must register back on their side to sign the tokens for that new node. This way, when a token is generated for a given node and sent to it, that node would have some ID to validate the token against and ensure it comes from a trusted location.
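The registration handshake described above could be sketched like this (all names are illustrative, and HMAC signatures stand in for whatever token-signing scheme is chosen): registering a node issues a client ID/secret pair, and tokens arriving from that node carry a signature the receiver can verify against the stored secret.

```python
import hashlib
import hmac
import secrets

REGISTERED_NODES = {}  # client_id -> client_secret (would live in the DB)

def register_node(node_name):
    # performed once, during the admin-driven handshake with the new node
    client_id, client_secret = secrets.token_hex(8), secrets.token_hex(32)
    REGISTERED_NODES[client_id] = {"name": node_name, "secret": client_secret}
    return client_id, client_secret

def sign_token(token, client_secret):
    return hmac.new(client_secret.encode(), token.encode(), hashlib.sha256).hexdigest()

def verify_token_origin(token, client_id, signature):
    node = REGISTERED_NODES.get(client_id)
    if not node:
        return False  # unknown or revoked node
    return hmac.compare_digest(sign_token(token, node["secret"]), signature)
```

Revoking a compromised node then amounts to deleting its `client_id` entry, which immediately invalidates anything it signs.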
Ok a few questions about this...
For now, any admin could register a node, which could be updated/deleted/recreated at some other point by any other admin
Do you mean an admin for the same node or an admin for a different node? If you're talking about an admin for node A changing stuff of node B, I don't know how they would do that unless they had an account on both?
If one node in the network gets hacked, that could allow a malicious user to generate tokens over the entire network.
Yes, I guess that if a node got hacked they could validate a token and return any user name they wanted. Not that that isn't bad and requires some thought, but I think that even with Oauth2, if the provider is compromised they can send any response they want anyway. Is this scenario different from that?
This way, when a token is generated for a given node and sent to it, that node would have some ID to validate the token against and ensure it comes from a trusted location.
Right now, the JWT is validated to ensure that the token was issued by the right node. We could add another check to ensure that it was sent by one of the expected location(s). I think that's a great idea. Would that be something that using Oauth and authomatic would automatically get us?
Do you mean an admin for the same node or an admin for a different node?
An admin of the node where the POST/PUT/DELETE requests are done. This is because of the default security check of Magpie that blocks requests from non-admin users.
Yes, I guess that if a node got hacked they could validate a token and return any user name they wanted. Not that that isn't bad and requires some thought, but I think that even with Oauth2, if the provider is compromised they can send any response they want anyway. Is this scenario different from that?
It wouldn't stop someone from sending requests from within the hacked node, but if an admin user is compromised, it makes it slightly more complicated to impersonate a fake Magpie node, because the other nodes they could try to send token validations to would not have registered or performed a handshake with that fake node. It would also be easier to respond to a breach by revoking the hacked node's client ID/secret.
Right now, the JWT is validated to ensure that the token was issued by the right node. We could add another check to ensure that it was sent by one of the expected location(s). I think that's a great idea. Would that be something that using Oauth and authomatic would automatically get us?
Yes. This is something like what I had in mind. However, the "location" doesn't need to be a URL, because that can be spoofed as well with requests. Doesn't hurt to have more validation layers though.
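The extra validation layer mentioned here can be expressed with standard JWT claims (sketched with PyJWT and HS256 for brevity; node names and secret are placeholders): binding the token to its target node via the `aud` claim means a leaked token cannot be replayed against a different node, complementing the existing issuer check.

```python
import jwt

def issue_token(issuer_node, target_node, secret):
    # "iss" says who generated the token, "aud" says which node may accept it
    return jwt.encode({"iss": issuer_node, "aud": target_node}, secret, algorithm="HS256")

def validate_token(token, expected_issuer, this_node, secret):
    # raises InvalidIssuerError / InvalidAudienceError on mismatch
    return jwt.decode(token, secret, algorithms=["HS256"],
                      issuer=expected_issuer, audience=this_node)
```

Unlike a "sent-from location" check, the audience claim is covered by the token signature and cannot be spoofed by the sender.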
Yes, most of the features should be fairly well supported. It is used here, for example: https://github.com/Ouranosinc/Magpie/blob/master/magpie/security.py#L171-L190
I've been thinking about this: approaching it from the perspective of OAuth is actually the wrong angle. Implementing a full OAuth provider is overkill for our purposes, but we can adopt some of the OAuth ideas to improve the functionality and security of this new feature. The goal of this feature is simply to provide the user with an API token. This API token is analogous to OAuth's "access token" but:
We can still make the token exchange more secure by passing a client secret when one node passes the token to another. Please let me know if you have any ideas for a different approach that I'm missing here.
I'm not sure I understand this part. When a UserNodeB wants to access a resource from NodeA using a token generated for (NodeB, UserNodeB) by NodeA, that token would need to be generated on behalf of that user after the NodeA-NodeB handshake was accomplished. Therefore, UserNodeB would need to be logged in to NodeB in order to perform a "Request NodeA Token". That token request could be the "external provider login" procedure, which Did you have another procedure in mind? I don't see how else the nodes can validate that the user requesting an access token is legitimate if NodeA cannot re-validate UserNodeB on NodeB with an appropriate login.
Technically speaking, once Magpie can authenticate the user, it gives the same access via Twitcher. There is no distinction between these two. When logged in as an "admin", you obtain access to all Twitcher services and Magpie UI pages. When non-admin, you get restricted Twitcher access based on Magpie permissions and limited access to Magpie UI to
Yeah...
Just to clarify a few things:
- The token is generated by NodeB for UserNodeB, not by NodeA.
- The token is not for NodeA; the token is generated and validated exclusively by NodeB. The same token can be included in a request to any node (NodeA, NodeB, NodeC, NodeD) and it will still work.
This is the problem. You are missing a key step here:
This seems to contradict the first point: "UserNodeB would like to access a resource on NodeA". No matter how the token/cookie is generated, when Twitcher validates with Magpie for AuthN/AuthZ of said token, it will be for accessing resources on its own instance. Twitcher/Magpie of NodeA do not care about how NodeB resources are accessed and by whom; that is the problem of that instance's Twitcher/Magpie.
That is simply not possible, conceptually, for security reasons. Each node should have its own distinct secret for signing its own tokens.
Ok, just to be clear, I understand what you're saying. You're describing a solution using OAuth; I'm describing one that doesn't use OAuth but relies on limited resource access and short token lifespans to mitigate unauthorized access. The main advantage of my solution is that it requires fewer steps for users and node administrators to set up a system that lets them access resources on multiple nodes.
BUT... after reviewing your proposed solution again, I'm happy to go with your proposal, and we can provide the user with client-side tools to simplify their workflow instead. I don't know what that will look like yet, but I'm sure we can figure something out. I think the most important thing is to get this working in the short time that CRIM is still available for the project.
So it does for someone trying to bypass security! Hence why I see it, on the flip side, as a disadvantage. The main issue I see with the proposed solution is that it uses a less secure approach by design. If it were a security method on its own, it wouldn't be as problematic, but since it inevitably ties back into the Twitcher authorization at some point, that new method would create the weakest link security-wise, affecting the overall solution. I'm not too strict about using OAuth or an alternative method, as long as the core principles around it align.
I believe this could be made relatively transparent for the user. It is rather the Magpie nodes themselves that would need to perform a few more request-validation steps to sync tokens between each other behind the scenes. Once the Cookie or Authorization headers are set, the user shouldn't see a difference in where the resources are being accessed from.
I have two new proposals for the access flow. Both options are similar to OAuth's client credentials flow, with the added protection that both nodes validate each other's requests/responses using asymmetric public/private keys (this can be implemented by passing RS256-signed JWTs).
Option 1: UserA acquires a token to access a specific resource on a remote NodeB.
For both proposals, we assume the following has already been set up:
Access flow for Option 1:
Access flow for Option 2:
Let me know if you have any questions or an opinion about which method you'd prefer. Some security implications to consider:
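The asymmetric signing both options rely on can be sketched as follows (using PyJWT with the `cryptography` package; node names and claims are placeholders): each node signs with its private key and publishes only the public key, so a receiving node can verify origin without ever holding a shared secret.

```python
import jwt
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# NodeB generates its keypair once; only the public key is shared during setup
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_pem = private_key.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo)

# NodeB signs a request/token with its private key (RS256 = RSA + SHA-256)
token = jwt.encode({"iss": "node-b", "sub": "UserA"}, private_key, algorithm="RS256")

# NodeA verifies using only NodeB's public key exchanged at registration
claims = jwt.decode(token, public_pem, algorithms=["RS256"], issuer="node-b")
```

Compromising a receiving node never exposes the signing key of its peers, which is the main advantage over a shared-secret scheme.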
Option 1:
How does NodeB perform that check?
Option 2:
That could work, but we need to figure out how we will define the user to avoid potential name conflicts. One thing to consider is that, if UserA is not an admin on NodeB (they shouldn't be, otherwise they wouldn't need the token), they cannot list users on NodeB. Therefore, they have limited capabilities to "associate" their UserA to some "NodeBUserA". A NodeB trusted by NodeA could do this association on behalf of UserA, but again, we must consider how to handle name conflicts.
In both options, the rest of the procedure seems adequate. I tend to lean toward Option 2, because it involves fewer (no) modifications in the "effective permission" logic. There is also less API/UI to develop to support "admin revoking a resource token", since it can use the current service/permission interface. After resolving the JWT token into some auth/cookie, Magpie/Twitcher would see the resolved "NodeBUserA" as any other local NodeB user (and applicable groups, if need be).
This is also a compelling argument for that option. It allows the user to directly interact with all their resources without having to manage individual tokens per endpoint, which would be very complicated if they tried to do so themselves (e.g. when accessing data in a notebook).
Extra Considerations
Keys of remote nodes would have to be stored in the DB with an extra hashing pass using the secret of the local instance.
In the way that we discussed above and is currently implemented in this PR:
NOTE: If we're going with Option 2 anyway (see below) this doesn't matter anymore
As discussed here (#589 (comment)): remote users will be stored in a separate table and have a foreign key reference to the
Only an admin on NodeB should be creating those associations at the moment. It could be something an administrator sets up when creating a new account. We could allow users to create their own associations between nodes using a version of OAuth's implicit flow:
That might be a different PR though.
Ok, that's a good enough reason to choose this option.
Great idea, let's do that.
Ok. Sounds good for the admin creating the associations for now, and having a separate user table. Now, considering that separate user table, how would the admin make that association?
Yes, for sure. The plan is to start with the API endpoints and then add the UI additions as a second step. The plan for the UI is to add a new field to the form on Then we can add a new page to list all users with associated remote user names (admin view only). We could also display this information on
…ismatch between node and anonymous user for another node
Introduce "Network Mode" which allows other Magpie instances to act as external authentication providers using access tokens.
This allows users registered across multiple Magpie instances in a network to more easily gain access to the resources within the network, without requiring the duplication of user credentials across the network.
In response to the discussion here: DACCS-Climate/DACCS-executive-committee#8.
This change is fully backwards compatible and is only enabled if `MAGPIE_NETWORK_MODE` is set (not set by default). Otherwise, Magpie admins and regular users should see no changes at all.
TODO: