New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collaborators: dataset-level permissions #5346
Conversation
It would be nice to integrate the package members into the existing group-based member table but I can't think of a clean way of doing it so for now I'm going with a separate table. I have not added the stateful mixin to it and tried to keep it as simple as possible.
These are similar to the existing member_* actions, but simplified and with a more restricted interface. package_member_create also updates, delete actually deletes the DB record. In addition, package_member_list_for_user returns all memberships for a user. In terms of auth, this initial implementation restricts permissions to admins from the organization a dataset belongs to (ie there are no admin collaborators than can add other collaborators. I believe this will follow more closely the current org-based perms, in that publishers will still have control on who can edit their org datasets. For package_member_list_for_user I restricted this to the own user (and sysadmins of course)
Org-level auth always takes precedence for approval, ie if a user belongs to an org as an editor, it can edit the the dataset regardless of if it is a collaborator or not. But if org-level authorization fails, then we check if the user is a collaborator with editor capacity (no admins yet). If it is, it passes auth.
All datasets get a generic label for all potential collaborators (collaborator-{dataset id}). On the user labels, they get a collaborator-{dataset id} label for each dataset they are a member of.
These may be a bit overzealous but better to be extra cautious. There's also a bit of repetition that could probably be addressed with parametrization
A new tab on the dataset Manage interface, with the same UI as the organization member manangement ones.
Add the ability to turn on and off the dataset collaborators via the `ckan.auth.allow_dataset_collaborators` config option. As this would affect existing authorization policies in place for existing sites, it is turned off by default.
Admin collaborators can manage dataset collaborators in addition to edit it. This allows users from outside the organization the dataset belongs to give permissions on the dataset to other external users, so this setting is turned off by default.
@amercader one item with editor collaborator is the potential to alter dataset ownership and/or visibility. Blocking updates that modify those parameters by non-owners would probably be sufficient. On a similar note, requiring an owner org admin to add collaborators seems a little restrictive. It is probably a good default, but enabling editors to modify collaborators would be handy. |
This is finally ready for review! 🐢 |
@pdekraker-epa I don't think the visibility change protection should be part of core. Preventing the change of owner organization is a sane default to prevent dataset owners getting locked out of their datasets but the rest of fields would depend on the specific needs of the site, so it's probably better placed in an extension. |
ckan/logic/action/create.py
Outdated
|
||
user = model.User.get(user_id) | ||
if not user: | ||
raise NotFound(_('User not found')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During last year we've merged few PRs which were trying to prevent exposing of the user account to anonymous request. For example, you won't see User not found when resetting the password, you'll see email was sent to your mailbox even if you misspelled username. But this error comes before access check and creates a way to verify, whether the account exists on the portal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. What error would you suggest returning here? Something generic like "Object not found" for both missing datasets and users is not very useful as it's trivial to check if a (public) dataset exists. So maybe raise NotAuthorized
if the user is not found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's optimal solution, so +1 for NotAuthorized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValidationError
seems more apt for a problem with one parameter of an API call, but I agree it should be done after the _check_access
call below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wardi the problem I see with ValidationError
is that is generally tied to one or more fields values (eg "id is missing", "value not allowed", etc) and I'm not sure what error message we would include with one of them (ValidationError: User not authorized
seems wrong). I went for a NotAuthorized but happy to change it if you have an alternative.
Late in the day to ask, so more for clarity than to propose a change: why do this in core vs an extension? If some core functionality is needed what is it and could we just add a new plugin point? What are the trade-offs overall? (Motivation: keeping core "surface area" minimal so that the codebase is more maintanable and extensible). |
ckan/migration/versions/097_f789f233226e_add_package_member_table.py
Outdated
Show resolved
Hide resolved
There are a couple of auth checks performed by `has_user_permission_for_group_or_org` that use the "membership" permission. This permission is not actually present in the ROLE_PERMISSIONS mapping but the function defaults to checking "admin" which has essentially the same effect. To be consistent this adds the "membership" perm to the mapping.
Return NotAuthorized in users are not found
Add a function in authz that checks if a record for the package_member table exists with a provided user id and dataset id exists
Use a user_id, package_id primary key constraint
@smotornyuk @pdelboca @wardi Thanks a lot for your review comments. I think I addressed them all now. |
What is this and why it's useful
From the docs:
Naming
This was a tricky one as always with naming things, but I decided to go with "Collaborators" for the user facing naming of this feature. Although they resemble quite closely organization "Members" it felt more natural to call them collaborators as a dataset is not meant to be a collection of users. But I'm not a native English speaker so if people feel "Dataset members" work fine I'm happy to do a big search and replace.
Internally and at the API level though I thought it was important to be aligned with all the existing member based names (
Member
models andmember_*
actions) so models are namedPackageMember
and API actionspackage_member_*
.Feature overview
Authorization settings are a delicate thing that require some planning on the maintainers and implementers side so this is feature is turned off by default (at least on 2.9)
Enabling it adds a new tab on the dataset Manage section to manage collaborators. Authorized users (more on this later) can add any existing user in the instance (not invite for now) as collaborator of the dataset, with different roles. By default, these rolese are:
So by default, organization admins retain full control over who can edit their organization datasets. If they want to allow collaborators to add new collaborators they can enable the Admin role via config:
Another scenario enabled by this is non-org based CKAN where datasets don't need to belong to an organization. From the docs:
So to sum, in this implementation a user can manage collaborators if:
ckan.auth.allow_admin_collaborators
is set to True)ckan.auth.create_dataset_if_not_in_organization
andckan.auth.create_unowned_dataset
set to True)By default, collaborators can not change the owner organization of a dataset unless they are admins or editors in both the source and destination organizations. To allow collaborators to change the owner organization even if they don't belong to the source organization, set
ckan.auth.allow_collaborators_to_change_owner_org
toTrue
.Implementation overview
This adds a new
package_member
table to store the relations. I tried to use the existingmember
one but it has agroup_id
foreign key, and rather to do some complex pg migration I just went for the simpler approach of having another table.There are four new actions, the first three simplified versions of their
member
counterparts.package_member_create
- Create / update collaboratorspackage_member_delete
- Removepackage_member_list
- Return the list of all collaborators for a given packagepackage_member_list_for_user
- Return a list of all packages the user is a collaborator inAuth works as described in the previous section.
The critical parts of the existing authorization logic where it hooks are:
package_update
: Organization-based auth always take precedence. Only if org-based auth fails we check if the user is a collaboratorTests
Loads of them! as it touches auth stuff I rather be cautious and maybe a bit redundant. They could use some parametrization for sure but I still don't know pytest that well, and we can always improve them later.
TODO