Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collaborators: dataset-level permissions #5346

Merged
merged 49 commits into from Jun 26, 2020
Merged

Collaborators: dataset-level permissions #5346

merged 49 commits into from Jun 26, 2020

Conversation

amercader
Copy link
Member

@amercader amercader commented Apr 15, 2020

Screenshot_2020-04-14 Collaborators - Dataset - CKAN

What is this and why it's useful

From the docs:

In addition to traditional organization-based permissions, CKAN instances can also enable
the dataset collaborators feature, which allows dataset-level authorization. This provides
more granular control over who can access and modify datasets that belong to an organization,
or allows authorization setups not based on organizations. It works by allowing users with
appropriate permissions to give permissions to other users over individual datasets, regardless
of what organization they belong to.

Naming

This was a tricky one as always with naming things, but I decided to go with "Collaborators" for the user facing naming of this feature. Although they resemble quite closely organization "Members" it felt more natural to call them collaborators as a dataset is not meant to be a collection of users. But I'm not a native English speaker so if people feel "Dataset members" work fine I'm happy to do a big search and replace.

Internally and at the API level though I thought it was important to be aligned with all the existing member based names (Member models and member_* actions) so models are named PackageMember and API actions package_member_*.

Feature overview

Authorization settings are a delicate thing that require some planning on the maintainers and implementers side so this is feature is turned off by default (at least on 2.9)

Enabling it adds a new tab on the dataset Manage section to manage collaborators. Authorized users (more on this later) can add any existing user in the instance (not invite for now) as collaborator of the dataset, with different roles. By default, these rolese are:

  • Member: can read private datasets
  • Editor: can edit or delete datasets (including changing the organization)

So by default, organization admins retain full control over who can edit their organization datasets. If they want to allow collaborators to add new collaborators they can enable the Admin role via config:

  • Admin: can also add, update or remove collaborators.

Another scenario enabled by this is non-org based CKAN where datasets don't need to belong to an organization. From the docs:

For instance on instances where datasets don't need to belong to an organization (both :ref:ckan.auth.create_dataset_if_not_in_organization and :ref:ckan.auth.create_unowned_dataset are True), the user that originally created a dataset can also add collaborators to it (allowing admin collaborators or not depending on the ckan.auth.allow_admin_collaborators setting). Note that in this case though, if the dataset is assigned to an organization, the original creator might no longer be able to access and edit, as organization permissions take precedence over collaborators ones.

So to sum, in this implementation a user can manage collaborators if:

  1. Is an administrator of the organization the dataset belongs to
  2. Is a collaborator with role "admin" (assuming ckan.auth.allow_admin_collaborators is set to True)
  3. Is the creator of the dataset and the dataset does not belong to an organization (requires ckan.auth.create_dataset_if_not_in_organization and ckan.auth.create_unowned_dataset set to True)

By default, collaborators can not change the owner organization of a dataset unless they are admins or editors in both the source and destination organizations. To allow collaborators to change the owner organization even if they don't belong to the source organization, set ckan.auth.allow_collaborators_to_change_owner_org to True.

Implementation overview

This adds a new package_member table to store the relations. I tried to use the existing member one but it has a group_id foreign key, and rather to do some complex pg migration I just went for the simpler approach of having another table.

There are four new actions, the first three simplified versions of their member counterparts.

  • package_member_create - Create / update collaborators
  • package_member_delete - Remove
  • package_member_list - Return the list of all collaborators for a given package
  • package_member_list_for_user - Return a list of all packages the user is a collaborator in

Auth works as described in the previous section.

The critical parts of the existing authorization logic where it hooks are:

  • package_update: Organization-based auth always take precedence. Only if org-based auth fails we check if the user is a collaborator
  • Default permission labels: there are labels added for the dataset collaborators

Tests

Loads of them! as it touches auth stuff I rather be cautious and maybe a bit redundant. They could use some parametrization for sure but I still don't know pytest that well, and we can always improve them later.

TODO

  • Admin collaborators tests
  • Tests for unowned datasets
  • Handle organization changes

It would be nice to integrate the package members into the existing
group-based member table but I can't think of a clean way of doing it so
for now I'm going with a separate table.

I have not added the stateful mixin to it and tried to keep it as simple
as possible.
These are similar to the existing member_* actions, but simplified and
with a more restricted interface. package_member_create also updates,
delete actually deletes the DB record.

In addition, package_member_list_for_user returns all memberships for a
user.

In terms of auth, this initial implementation restricts permissions to
admins from the organization a dataset belongs to (ie there are no admin
collaborators than can add other collaborators. I believe this will
follow more closely the current org-based perms, in that publishers will
still have control on who can edit their org datasets.

For package_member_list_for_user I restricted this to the own user (and
sysadmins of course)
Org-level auth always takes precedence for approval, ie if a user
belongs to an org as an editor, it can edit the the dataset regardless
of if it is a collaborator or not.

But if org-level authorization fails, then we check if the user is a
collaborator with editor capacity (no admins yet). If it is, it passes
auth.
All datasets get a generic label for all potential collaborators
(collaborator-{dataset id}). On the user labels, they get a
collaborator-{dataset id} label for each dataset they are a member of.
These may be a bit overzealous but better to be extra cautious.
There's also a bit of repetition that could probably be addressed with
parametrization
A new tab on the dataset Manage interface, with the same UI as the
organization member manangement ones.
Add the ability to turn on and off the dataset collaborators via the
`ckan.auth.allow_dataset_collaborators` config option. As this would
affect existing authorization policies in place for existing sites, it
is turned off by default.
Admin collaborators can manage dataset collaborators in addition to
edit it. This allows users from outside the organization the dataset
belongs to give permissions on the dataset to other external users, so
this setting is turned off by default.
@amercader amercader changed the title Collaborators Collaborators: dataset-level permissions Apr 15, 2020
@amercader amercader added this to the CKAN 2.9 milestone Apr 15, 2020
@pdekraker-epa
Copy link
Contributor

@amercader one item with editor collaborator is the potential to alter dataset ownership and/or visibility. Blocking updates that modify those parameters by non-owners would probably be sufficient.

On a similar note, requiring an owner org admin to add collaborators seems a little restrictive. It is probably a good default, but enabling editors to modify collaborators would be handy.

@amercader
Copy link
Member Author

This is finally ready for review! 🐢
If someone from the @ckan/core team can help @pdelboca with the review that would be great!

@amercader
Copy link
Member Author

@pdekraker-epa I don't think the visibility change protection should be part of core. Preventing the change of owner organization is a sane default to prevent dataset owners getting locked out of their datasets but the rest of fields would depend on the specific needs of the site, so it's probably better placed in an extension.
And in any case I think this PR is already too big :) We can expand the feature in future versions.

@amercader amercader marked this pull request as ready for review June 15, 2020 12:08
ckan/authz.py Show resolved Hide resolved

user = model.User.get(user_id)
if not user:
raise NotFound(_('User not found'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During last year we've merged few PRs which were trying to prevent exposing of the user account to anonymous request. For example, you won't see User not found when resetting the password, you'll see email was sent to your mailbox even if you misspelled username. But this error comes before access check and creates a way to verify, whether the account exists on the portal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. What error would you suggest returning here? Something generic like "Object not found" for both missing datasets and users is not very useful as it's trivial to check if a (public) dataset exists. So maybe raise NotAuthorized if the user is not found?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's optimal solution, so +1 for NotAuthorized

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidationError seems more apt for a problem with one parameter of an API call, but I agree it should be done after the _check_access call below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wardi the problem I see with ValidationError is that is generally tied to one or more fields values (eg "id is missing", "value not allowed", etc) and I'm not sure what error message we would include with one of them (ValidationError: User not authorized seems wrong). I went for a NotAuthorized but happy to change it if you have an alternative.

@rufuspollock
Copy link
Member

Late in the day to ask, so more for clarity than to propose a change: why do this in core vs an extension? If some core functionality is needed what is it and could we just add a new plugin point? What are the trade-offs overall?

(Motivation: keeping core "surface area" minimal so that the codebase is more maintanable and extensible).

ckan/authz.py Outdated Show resolved Hide resolved
ckan/migration/alembic.ini Outdated Show resolved Hide resolved
@rufuspollock rufuspollock moved this from In progress to Review in progress in Working On Jun 25, 2020
ckan/lib/helpers.py Outdated Show resolved Hide resolved
ckan/logic/action/create.py Outdated Show resolved Hide resolved
ckan/logic/action/get.py Outdated Show resolved Hide resolved
ckan/logic/auth/update.py Outdated Show resolved Hide resolved
ckan/logic/validators.py Outdated Show resolved Hide resolved
There are a couple of auth checks performed by
`has_user_permission_for_group_or_org` that use the "membership"
permission. This permission is not actually present in the
ROLE_PERMISSIONS mapping but the function defaults to checking "admin"
which has essentially the same effect. To be consistent this adds the
"membership" perm to the mapping.
Return NotAuthorized in users are not found
Add a function in authz that checks if a record for the package_member
table exists with a provided user id and dataset id exists
Use a user_id, package_id primary key constraint
@amercader
Copy link
Member Author

@smotornyuk @pdelboca @wardi Thanks a lot for your review comments. I think I addressed them all now.

@wardi wardi merged commit 709a660 into master Jun 26, 2020
Working On automation moved this from Review in progress to Done Jun 26, 2020
@smotornyuk smotornyuk deleted the collaborators branch February 2, 2022 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Working On
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants