Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion on design of the DbUser model: no UUID and email is unique but mutable #6174

Open
sphuber opened this issue Nov 10, 2023 · 4 comments

Comments

@sphuber
Copy link
Contributor

sphuber commented Nov 10, 2023

The DbUser does not define a UUID column, unlike (almost?) all other database models. Rather, it more or less uses the email column for this purpose by making it unique. The email is used as a proxy of a real identifier for instances of DbUser (and User in the front-end ORM). For example, the default user of a profile is indicated by its email in the configuration file. Currently if the email of a user is changed (which is perfectly possible, even through the User ORM class), the profile no longer has a default user, if that user was the default one. Since AiiDA requires a default user to be set when creating any database entities, the code will start to except.

There are also some problems to anticipate with archive imports I think. Currently the code explicitly checks for users that exist both in the profile and the archive, and only users with emails that don't yet exist in the profile are added. However, there are some weird edge cases. Consider the following.

In an existing profile:

  • Create user a@email
  • Create a node for that user
  • Export the node (archive will now contain a@email user)
  • Change the email of the user to b@email
  • Now import the created archive

The import code will notice that the node already exists and skips it. However, the code thinks that the user a@email doesn't exist yet (because it checks based on email) and it adds it. We know have b@email in the database which has a single node, and a@email which has no nodes. Not sure yet whether this is a real problem.

This could be fixed by adding UUIDs to users and simply use that, however, that would have downsides as well:

  • What do we do when a user (based on UUID) to be imported already exists and the other properties differ between archive and profile. Do we ignore those, override, or provide some complicated logic to update (like is done for extras for example)
  • Currently if a user creates different profiles and uses the same email in all of them, when moving data between profiles, the nodes are automatically added to the same User instance. From a user perspective, this is actually probably what is expected and desired. If we were to change this based on UUID, what is really the same user is now different in different profiles just because the UUID is different.

Especially the last point is I think a strong argument to not add a UUID. The current behavior is actually what we want I think. We should maybe just consider whether to make the email immutable (at least through the front-end ORM). Not sure if there are good use-cases for it, and it could make behavior more complex with potential bugs.

One advantage that I could see now for keeping the email mutable is that we could even make the --email option in the profile setup completely optional by providing some default. This will make setting up a profile easier and if the user is actually interested in the generated data later and wants to export it with a "real" email, they can update the email.

@sphuber
Copy link
Contributor Author

sphuber commented Nov 10, 2023

tagging @mbercx

@mbercx
Copy link
Member

mbercx commented Nov 10, 2023

Currently if a user creates different profiles and uses the same email in all of them, when moving data between profiles, the nodes are automatically added to the same User instance. From a user perspective, this is actually probably what is expected and desired. If we were to change this based on UUID, what is really the same user is now different in different profiles just because the UUID is different.

Yeah, also just thought about this one and wanted to make the same comment.

One advantage that I could see now for keeping the email mutable is that we could even make the --email option in the profile setup completely optional by providing some default. This will make setting up a profile easier and if the user is actually interested in the generated data later and wants to export it with a "real" email, they can update the email.

Indeed, as you know I'm in favor of this. Another use case is that users do change email from time to time, e.g. when they move from EPFL to PSI. ^^

Currently if the email of a user is changed (which is perfectly possible, even through the User ORM class), the profile no longer has a default user, if that user was the default one. Since AiiDA requires a default user to be set when creating any database entities, the code will start to except.

Can we not have the email property setter check if this is the default user of the profile and update the configuration if this is the case?

The import code will notice that the node already exists and skips it. However, the code thinks that the user a@email doesn't exist yet (because it checks based on email) and it adds it. We know have b@email in the database which has a single node, and a@email which has no nodes. Not sure yet whether this is a real problem.

This doesn't seem like a big problem to me. We definitely don't want to duplicate the nodes, and after all these operations the nodes are "owned" by the latest update of the user, so that seems fine?

So, in my view:

  • UUID is probably a bad idea.
  • Changing email should be possible, but only if we can update the default user in case its email is changed.

@sphuber
Copy link
Contributor Author

sphuber commented Nov 12, 2023

Can we not have the email property setter check if this is the default user of the profile and update the configuration if this is the case?

We could, but this is a bad idea, because then you are hard-coupling the ORM to the configuration code. For other use cases that are similar, we have the Manager as the central place that coordinates these kinds of actions that affect both the storage as well as the config.

Maybe we should prevent changing the user email directly through a User instance and instead provide a utility function change_user_email that does everything correctly. This can then also easily be exposed in verdi.

@sphuber
Copy link
Contributor Author

sphuber commented Apr 21, 2024

Had another quick thought about this and found another disadvantage. The main idea for even having a User assigned to all database entities is that when they are exported, the nodes are always labeled with the user that created them. If we make this mutable, however, it essentially foregoes the purpose because the nodes of a particular user can always be easily changed to someone else just by changing the email address of the associated User node. Of course, even if the API doesn't provide this option, a user could always directly update the database, so ultimately there is nothing stopping anyone from doing this. So maybe this is not strong enough a reason to not support changing a User's email, since there are cases where it is useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants