Dataset workflow plugins #108

Closed
wardi opened this issue Nov 13, 2014 · 12 comments

Comments

@wardi
Contributor

wardi commented Nov 13, 2014

I'm proposing a new plugin interface to define a site's dataset editing and publishing workflows. In my case "workflow" means:

  • one or more approval stages on new datasets and edits to datasets within an organization
  • last approval stage might be a timed release "publish at date/time: X"
  • arrival at each approval stage may notify users that are responsible for that stage
  • users in the organization can tell what stage an approval is waiting on
  • users in the organization may view the currently published and pending versions

These workflows could be configured per organization/dataset type with plugins that register which types or orgs they handle. This is something like the way resource view plugins work with resource types.

On the dataset editing form, when changes are pending, the enabled plugins would choose which version to present to the user (possibly highlighting fields that have changed) and expose new buttons besides "save" and "delete", such as "publish", "submit for approval", "save for later" and "suggest deletion". The default "save"/"delete" would likely be disabled for normal users. Users should be able to switch back to the "real" current version of the dataset to see the original values, or possibly submit different competing changes into the approval process.

On the dataset view page users in the organization would be able to select the pending changes to view. Plugins would determine the default for a given user.

On the dataset search page the changes submitted would be indexed like private datasets. We would need URLs for suggested changes so that we can link directly to them, e.g. `/dataset/traffic-data@suggested993`

Changeset data model

Most workflow plugins could be implemented on top of a single new model I've called "changeset" (feel free to suggest a better name)

| Changeset field | notes |
| --- | --- |
| user_id | user that submitted this change |
| name | slug to identify the change (e.g. "suggested993") |
| created_date | date |
| target_id | id of the dataset being modified (perhaps orgs, groups in future) |
| target_type | 'dataset' |
| plugin_type | identifier to find plugin responsible for managing this changeset |
| jsonpatch | changes to make (in a format that works for edits and deletion?) |
| approvals | list of approval strings (postgres text array?) |
| scheduled_date | for future scheduled publish/delete events, null normally |
| plugin_data | extra data (json) as required by plugin |
| deleted | bool |
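
To make the field list above concrete, here is a minimal sketch of it as a plain Python dataclass. This is only an illustration: the proposal does not specify column types or a storage layer, so the Python types, the defaults and the `Changeset` class itself are assumptions, and no CKAN model code is implied.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class Changeset:
    """Hypothetical in-memory mirror of the proposed changeset fields."""
    user_id: str                      # user that submitted this change
    name: str                         # slug identifying the change, e.g. "suggested993"
    created_date: datetime
    target_id: str                    # id of the dataset being modified
    plugin_type: str                  # identifies the responsible workflow plugin
    jsonpatch: List[Dict[str, Any]]   # assumed: a list of JSON Patch operations
    target_type: str = "dataset"
    approvals: List[str] = field(default_factory=list)
    scheduled_date: Optional[datetime] = None  # None unless a publish/delete is scheduled
    plugin_data: Dict[str, Any] = field(default_factory=dict)
    deleted: bool = False             # True once withdrawn or rejected


# Example values are made up for illustration.
change = Changeset(
    user_id="user-a",
    name="suggested993",
    created_date=datetime(2014, 11, 13),
    target_id="traffic-data",
    plugin_type="suggested_changes",
    jsonpatch=[{"op": "replace", "path": "/title", "value": "Traffic data (2014)"}],
)
```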

Changesets can be active (able/unable to apply) or deleted (withdrawn/rejected). Unable to apply means that the jsonpatch 'test' conditions fail for the current version of the dataset.
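
The "able/unable to apply" distinction can be sketched with a tiny, standard-library-only subset of JSON Patch. This is an assumption-laden illustration, not the proposed implementation: it handles only `test`, `add`, `replace` and `remove` on dictionary keys, ignores JSON Pointer escaping and list indexes, and the names `apply_patch`, `can_apply` and `PatchDoesNotApply` are hypothetical.

```python
import copy


class PatchDoesNotApply(Exception):
    """Raised when a 'test' operation fails against the current dataset."""


def apply_patch(dataset, patch):
    """Return a patched copy of dataset, leaving the original untouched."""
    result = copy.deepcopy(dataset)
    for op in patch:
        # "/extras/license" -> parents ["extras"], key "license"
        *parents, key = op["path"][1:].split("/")
        node = result
        for part in parents:
            node = node[part]
        if op["op"] == "test":
            # A missing key or a different value means the patch no longer applies.
            if key not in node or node[key] != op["value"]:
                raise PatchDoesNotApply(op["path"])
        elif op["op"] == "add":
            node[key] = op["value"]
        elif op["op"] == "replace":
            if key not in node:
                raise KeyError(key)
            node[key] = op["value"]
        elif op["op"] == "remove":
            del node[key]
        else:
            raise ValueError("unsupported op: %r" % op["op"])
    return result


def can_apply(dataset, patch):
    """True if the changeset is 'able to apply' to the current dataset."""
    try:
        apply_patch(dataset, patch)
    except (PatchDoesNotApply, KeyError):
        return False
    return True


published = {"title": "Traffic data", "notes": "Counts per hour"}
patch = [
    {"op": "test", "path": "/title", "value": "Traffic data"},
    {"op": "replace", "path": "/title", "value": "Traffic data 2014"},
]
```

With these values `can_apply(published, patch)` is true; if someone else changes the title first, the `test` operation fails and the changeset stays active but unable to apply.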

Changesets are not hierarchical - they describe changes that apply directly to the current version of the dataset. It may be possible for a plugin to implement changesets that are displayed in a hierarchical manner using information in plugin_data or in other tables, but supporting such use is not part of the design of this feature.

IWorkflow methods

(to be completed)

Example: suggested changes

This kind of workflow plugin would allow the creation of up to one changeset per user per dataset: "User A suggests the following changes to dataset B"

Users that have permission to view a dataset but not edit it would be presented with a "suggest changes" button on the dataset view page. Clicking it would lead to the normal dataset and resource edit pages, but instead of "save" the user would have a "suggest change" button. After suggesting a change, the edit screen for this dataset would also have a "withdraw suggested change" button.

A user who has a suggested change may, when viewing that dataset, toggle between their suggested version and the currently published version of that dataset.

Users with permission to edit a dataset may view all users' suggested change versions. When viewing suggested changes they may "reject change" or "apply change".

Notifications: suggestions would be displayed on the dashboards of users with permissions to change the dataset. If a suggestion is accepted or rejected the change author could be notified by email. If a suggested change no longer applies (the original fields modified no longer match the values being changed) the change author could also be notified by email.

Example: moderated edits (single approval)

This kind of workflow plugin would replicate the "moderated edits" workflow: one "pending" version of a dataset exists that users in the same organization may view and edit. Admins may approve pending versions.

Users in the organization viewing a dataset can switch between the one official pending version of the dataset and the current published version. They may suggest deletion of a dataset, create a new shared pending version of a dataset (automatically deleting the existing one) or edit/withdraw an existing pending version.

Users in the organization may create a blank, private dataset that will exist to reserve a dataset id and allow creation of a to-be-published version based on a changeset.

Admins would be able to apply pending changesets on the dataset edit page or with a bulk approval tool.

Example: multiple approval

Similar to single approval above, but approvals are stored in the changeset until a changeset receives enough approvals (as determined by the workflow plugin) to actually publish the change.

Users in the organization would be able to see what approvals the pending changeset has received so far.

The workflow plugin would decide who to notify as a changeset is approved.
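
One way a plugin might decide that a changeset has "enough approvals" is sketched below, assuming approvals are plain strings naming the stage that signed off. The stage names and both helper functions are hypothetical; the proposal leaves the rule entirely to the workflow plugin.

```python
# Hypothetical approval stages a plugin might require before publishing.
REQUIRED_APPROVALS = ("data-quality", "legal")


def record_approval(approvals, stage):
    """Return a new approvals list with stage added at most once."""
    if stage in approvals:
        return list(approvals)
    return list(approvals) + [stage]


def is_ready_to_publish(approvals, required=REQUIRED_APPROVALS):
    """True once every required approval string is present."""
    return set(required).issubset(approvals)


def missing_approvals(approvals, required=REQUIRED_APPROVALS):
    """Stages still outstanding, in the order the plugin requires them."""
    return [stage for stage in required if stage not in approvals]
```

`missing_approvals` is what would let users in the organization see which stage a pending changeset is waiting on.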

Example: scheduled approval

In any workflow plugin instead of immediately applying a change, a changeset may be marked with a scheduled_date. There will be a common mechanism to automatically attempt to apply any changeset with a scheduled_date in the past.
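
The common mechanism could start from a selection step like the following sketch. It assumes changesets are available as dictionaries keyed by the field names above and that `scheduled_date` values are timezone-aware; how the job is triggered (cron, a background worker) and how a due changeset is then applied are left open, and `due_changesets` is a hypothetical name.

```python
from datetime import datetime, timezone


def due_changesets(changesets, now=None):
    """Return active changesets whose scheduled_date is in the past."""
    now = now or datetime.now(timezone.utc)
    return [
        c for c in changesets
        if not c["deleted"]
        and c["scheduled_date"] is not None
        and c["scheduled_date"] <= now
    ]
```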

A workflow plugin supporting scheduled changes would have to decide whether to support edits on a new changeset while a scheduled change is still pending, or even to support multiple scheduled changes on the same dataset. A simpler approach may be to abort the scheduled change or disallow new changes until the scheduled change has been published.

@wardi
Contributor Author

wardi commented Nov 20, 2014

@rossjones @davidread I'd like your input at this time. DGU has been extended to support interesting workflows, right?

@rossjones
Contributor

I like these ideas a lot. I'm not sure of the DGU position on workflow (it tends to be reasonably straightforward), but I know of at least two other CKAN instances (Leeds and London) that would love this level of control. I think in actual use the current core ACL/Workflow doesn't actually do what most people want.

Do you think this is feasible without changes to core? What changes do you think might be required - for instance, I've had issues where I've had to monkey patch some of the current state handling (where it redirects to based on status).

I like the idea of keeping it simple (one change per dataset per user), but I worry that with competing changes there are likely to be merge conflicts. It's probably enough to warn of this if the original dataset has changed since your changeset was created - we probably want to avoid turning it into a git clone ;)

@wardi
Contributor Author

wardi commented Dec 3, 2014

@rossjones I am suggesting core changes, specifically creating a new IWorkflow plugin interface (methods still TBD) and a model like the one above.

Absolutely agree: no merging or branches. Either your patch applies or it doesn't. If it doesn't, it's up to you (or possibly custom logic in a plugin) to create a new one.

@rossjones
Contributor

I'm +10 on this, especially if #67 can be implemented in the same code. It'd be great to get some wider user stories from existing sites about how they would prefer the workflow to work, and what they would like it to enable.

@Aaron-M

Aaron-M commented Dec 8, 2014

We (Landcare in NZ) are +1 on this too. We have some 'pre-paid' support time with OKFN/CKAN and we would be prepared to direct some of that toward this work to help progress it/ensure it happens... and to be involved in testing etc.

@wardi
Contributor Author

wardi commented Dec 8, 2014

@rossjones might not be ideal as a moderation queue mechanism, but both goals feel similar to me. I think we could cover a lot of the workflow and approval cases with something like this.

@Aaron-M what are your specific workflow requirements?

@Aaron-M

Aaron-M commented Dec 8, 2014

The examples you have given pretty much cover the use cases I can think of. We're a Govt research organisation, using CKAN to publish our datasets. I (Research Data Manager) want to make it easy for researchers to 'self-serve', but at the same time allowing people to post data places a lot of trust in them. We need to be sure that (for example) the data is of good quality before we release it, that we own/have permission to release the data, and there may be an embargo period to allow the researcher to publish their results... So we want some checkpoints along the way for another set of eyes to check before data is made public.

So you've thought of the things that are immediately useful to us.*

There is an aspect to workflow that is not captured here, but I think is outside the scope of what is achievable/sensible as part of this 'issue' - that of workflow whilst the data is still being actively used as part of the research (more data being gathered/added, tidying up, error checking, (private) sharing (à la Dropbox) with project members for analysis... prior to it being 'finalised' and ready to publish) - that is very appealing too, but a significant project in its own right I expect. But it is probably worth you being aware of as part of the bigger picture.

I think working on the issue/use cases as you have already described is a good place to start, and we'd be happy to help with that.

*One thing I've just now thought of - we are setting up some semi-automated processes whereby extracts from some of our other databases are summarised, and automatically deposited to CKAN using the API. In such cases, where a rigorous (and repeated) process is set up, you would want to be able to bypass any moderation steps, i.e. the checking/approval is done once when you establish the 'procedure', and after that each run is recognised as a trusted depositor/process which can go straight to a published dataset (or new resource within an existing dataset). So that would be one additional thing to consider from us. How that is implemented we'd be open to discussion/ideas... and it's not a showstopper if not doable.
[PS - we have set up a specific user account in CKAN purely for the automated deposits, so it could simply be a matter of a user having a flag saying they 'are trusted' and don't require workflow moderation.]

@JJIguiniz

Would it be possible to get a timestamp for each approval step or change in order to get basic performance measures out of the workflow? That way you could in theory see how long each step takes and which users or organizations are more efficient, and use the plugin to inform Six Sigma-style continuous improvement.
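
As a rough sketch of what such measures could look like: if each approval were stored together with the time it was granted (an assumption, since the proposed `approvals` field holds plain strings), the time spent waiting on each stage falls out of consecutive differences. The function name and the `(stage, datetime)` pair format are hypothetical.

```python
from datetime import datetime


def stage_durations(created, approvals):
    """Map each stage to the time elapsed since the previous step.

    approvals is a list of (stage, datetime) pairs in the order granted;
    created is when the changeset was submitted.
    """
    durations = {}
    previous = created
    for stage, granted_at in approvals:
        durations[stage] = granted_at - previous
        previous = granted_at
    return durations
```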

@wardi
Contributor Author

wardi commented Oct 22, 2015

We aren't doing any work on this for the moment. This is a tricky thing to generalize and it might be better to have smaller, more focused features to address specific parts of this proposal.

@rbvictor

Hello there! Could there be any chance of this issue being addressed in the short term?
If not, is it possible to implement these features by creating extensions or must the core be modified in order to accomplish this?

@wardi
Contributor Author

wardi commented Mar 16, 2018

alternate approach in #211 that is simpler and allows for lots of other great features at the same time.

wardi closed this as completed Mar 16, 2018

@romeosanjose

This is similar to what we implemented as a plugin (notification system included) for a government agency site back in November 2015.
