Dataset workflow plugins #108

wardi · 2014-11-13T15:55:02Z

I'm proposing a new plugin interface to define a site's dataset editing and publishing workflows. In my case "workflow" means:

- one or more approval stages on new datasets and edits to datasets within an organization
- last approval stage might be a timed release "publish at date/time: X"
- arrival at each approval stage may notify users that are responsible for that stage
- users in the organization can tell what stage an approval is waiting on
- users in the organization may view the currently published and pending versions

These workflows could be configured per organization/dataset type with plugins that register which types or orgs they handle. This is something like the way resource view plugins work with resource types.
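
A minimal sketch of how that registration might look, assuming a simple in-memory registry. `WorkflowRegistry`, `register` and `lookup` are hypothetical names for illustration, not existing CKAN APIs:

```python
from typing import Dict, Optional, Tuple


class WorkflowRegistry:
    """Hypothetical registry: maps (organization, dataset type) to a plugin name."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[Optional[str], Optional[str]], str] = {}

    def register(self, plugin_type: str, org: Optional[str] = None,
                 dataset_type: Optional[str] = None) -> None:
        # None acts as a wildcard for "any organization" or "any dataset type".
        self._handlers[(org, dataset_type)] = plugin_type

    def lookup(self, org: str, dataset_type: str) -> Optional[str]:
        # Try the most specific registration first, then fall back to wildcards.
        for key in ((org, dataset_type), (org, None),
                    (None, dataset_type), (None, None)):
            if key in self._handlers:
                return self._handlers[key]
        return None


registry = WorkflowRegistry()
registry.register("moderated_edits", org="landcare")
registry.register("suggested_changes")  # catch-all for everything else
```

The lookup order (exact match, then organization only, then type only, then catch-all) is just one possible precedence rule.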

On the dataset editing form, when changes are pending, the enabled plugins would choose which version to present to the user (possibly with highlighting of fields that have changed) and expose new buttons besides "save" and "delete", such as "publish", "submit for approval", "save for later", "suggest deletion". The default "save"/"delete" would likely be disabled for normal users. Users should be able to switch back to the "real" current version of the dataset to see the original values or possibly submit different competing changes into the approval process.

On the dataset view page users in the organization would be able to select the pending changes to view. Plugins would determine the default for a given user.

On the dataset search page the changes submitted would be indexed like private datasets. We would need URLs for suggested changes so that we can link directly to them, e.g. `/dataset/traffic-data@suggested993`
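
A rough sketch of splitting such a URL segment into the dataset name and the changeset slug. `split_dataset_ref` is a hypothetical helper, and it assumes `@` never appears in a dataset name:

```python
from typing import Optional, Tuple


def split_dataset_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'name@changeset' into its parts; the changeset part is optional."""
    name, sep, changeset = ref.partition("@")
    # Without an '@' the reference points at the current published version.
    return name, (changeset if sep else None)


dataset_name, changeset_name = split_dataset_ref("traffic-data@suggested993")
```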

Changeset data model

Most workflow plugins could be implemented on top of a single new model I've called "changeset" (feel free to suggest a better name)

| Changeset field | notes |
| --- | --- |
| user_id | user that submitted this change |
| name | slug to identify the change (e.g. "suggested993") |
| created_date | date |
| target_id | id of the dataset being modified (perhaps orgs, groups in future) |
| target_type | `'dataset'` |
| plugin_type | identifier to find plugin responsible for managing this changeset |
| jsonpatch | changes to make (in a format that works for edits and deletion?) |
| approvals | list of approval strings (postgres text array?) |
| scheduled_date | for future scheduled publish/delete events, null normally |
| plugin_data | extra data (json) as required by plugin |
| deleted | bool |
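
For illustration, the fields above could be written as a plain Python dataclass. This is only a sketch: the field names come from the table, but the Python types and defaults are assumptions, and a real implementation would presumably be a database model instead:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class Changeset:
    user_id: str                      # user that submitted this change
    name: str                         # slug, e.g. "suggested993"
    target_id: str                    # id of the dataset being modified
    plugin_type: str                  # identifies the responsible plugin
    jsonpatch: List[Dict[str, Any]]   # list of JSON Patch operations
    target_type: str = "dataset"
    created_date: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    approvals: List[str] = field(default_factory=list)
    scheduled_date: Optional[datetime] = None   # null normally
    plugin_data: Dict[str, Any] = field(default_factory=dict)
    deleted: bool = False             # withdrawn/rejected when True


change = Changeset(
    user_id="user-a",
    name="suggested993",
    target_id="dataset-b",
    plugin_type="suggested_changes",
    jsonpatch=[{"op": "replace", "path": "/title", "value": "Traffic counts"}],
)
```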

Changesets can be active (able/unable to apply) or deleted (withdrawn/rejected). Unable to apply means that the jsonpatch 'test' conditions fail for the current version of the dataset.
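
A minimal sketch of that "able to apply" check, using only the standard library. It evaluates just the `test` operations of a JSON Patch against the current dataset dict; `can_apply` and `_resolve` are hypothetical helper names, and a real implementation would more likely rely on an existing JSON Patch library:

```python
from typing import Any, Dict, List


def _resolve(doc: Any, pointer: str) -> Any:
    """Follow a JSON Pointer such as '/resources/0/name' through dicts and lists."""
    node = doc
    for token in pointer.split("/")[1:]:
        # Undo JSON Pointer escaping: '~1' means '/', '~0' means '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def can_apply(dataset: Dict[str, Any], patch: List[Dict[str, Any]]) -> bool:
    """Return True if every 'test' operation matches the current dataset."""
    for op in patch:
        if op["op"] != "test":
            continue  # only the preconditions are checked here
        try:
            if _resolve(dataset, op["path"]) != op["value"]:
                return False
        except (KeyError, IndexError, ValueError, TypeError):
            # The tested path no longer exists in the dataset.
            return False
    return True


patch = [
    {"op": "test", "path": "/title", "value": "Traffic data"},
    {"op": "replace", "path": "/title", "value": "Traffic counts"},
]
still_applies = can_apply({"title": "Traffic data"}, patch)
```

Pairing each `replace` with a `test` on the old value is what lets a changeset detect that the field it modifies has changed since it was submitted.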

Changesets are not hierarchical - they describe changes that apply directly to the current version of the dataset. It may be possible for a plugin to implement changesets that are displayed in a hierarchical manner using information in plugin_data or in other tables, but supporting such use is not part of the design of this feature.

IWorkflow methods

(to be completed)

Example: suggested changes

This kind of workflow plugin would allow the creation of up to one changeset per user per dataset: "User A suggests the following changes to dataset B"

Users that have permission to view a dataset but not edit it would be presented with a "suggest changes" button on the dataset view page. Clicking it would lead to the normal dataset and resource edit pages, but instead of "save" the user would have a "suggest change" button. After suggesting a change the edit screen for this dataset would also have a "withdraw suggested change" button.

A user who has a suggested change may, when viewing that dataset, toggle between their suggested version and the currently published version of that dataset.

Users with permission to edit a dataset may view all users' suggested change versions. When viewing suggested changes they may "reject change" or "apply change".

Notifications: suggestions would be displayed on the dashboards of users with permissions to change the dataset. If a suggestion is accepted or rejected the change author could be notified by email. If a suggested change no longer applies (the original fields modified no longer match the values being changed) the change author could also be notified by email.

Example: moderated edits (single approval)

This kind of workflow plugin would replicate the "moderated edits" workflow: one "pending" version of a dataset exists that users in the same organization may view and edit. Admins may approve pending versions.

Users in the organization viewing a dataset can switch between the one official pending version of the dataset and the current published version. They may suggest deletion of a dataset, create a new shared pending version of a dataset (automatically deleting the existing one) or edit/withdraw an existing pending version.

Users in the organization may create a blank, private dataset that will exist to reserve a dataset id and allow creation of a to-be-published version based on a changeset.

Admins would be able to apply pending changesets on the dataset edit page or with a bulk approval tool.

Example: multiple approval

Similar to single approval above, but approvals are stored in the changeset until a changeset receives enough approvals (as determined by the workflow plugin) to actually publish the change.
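
A small sketch of how a plugin might make that decision, assuming each approval is a plain string (as in the `approvals` field) and that the plugin knows which approvals it requires. The function names and the example approval strings are hypothetical:

```python
from typing import List


def missing_approvals(approvals: List[str], required: List[str]) -> List[str]:
    """Return the required approvals not yet recorded, in the required order."""
    return [stage for stage in required if stage not in approvals]


def ready_to_publish(approvals: List[str], required: List[str]) -> bool:
    """A changeset may be published once no required approval is missing."""
    return not missing_approvals(approvals, required)


required = ["data-quality", "legal"]
pending = missing_approvals(["data-quality"], required)
```

The list of missing approvals could also drive what users in the organization see as the stage a changeset is waiting on.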

Users in the organization would be able to see what approvals the pending changeset has received so far.

The workflow plugin would decide who to notify as a changeset is approved.

Example: scheduled approval

In any workflow plugin instead of immediately applying a change, a changeset may be marked with a scheduled_date. There will be a common mechanism to automatically attempt to apply any changeset with a scheduled_date in the past.
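
A sketch of the selection step of such a mechanism, assuming changesets are available as dicts with the `scheduled_date` and `deleted` fields from the model above and that dates are timezone-aware. `due_changesets` is a hypothetical name; how it would be triggered (cron job, background worker) is left open:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def due_changesets(changesets: List[Dict[str, Any]],
                   now: Optional[datetime] = None) -> List[Dict[str, Any]]:
    """Return active changesets whose scheduled_date is in the past."""
    now = now or datetime.now(timezone.utc)
    return [
        c for c in changesets
        if not c["deleted"]
        and c["scheduled_date"] is not None
        and c["scheduled_date"] <= now
    ]


now = datetime.now(timezone.utc)
changesets = [
    {"name": "embargoed", "scheduled_date": now + timedelta(days=30), "deleted": False},
    {"name": "release-me", "scheduled_date": now - timedelta(hours=1), "deleted": False},
    {"name": "unscheduled", "scheduled_date": None, "deleted": False},
]
to_apply = due_changesets(changesets, now=now)
```

Each due changeset would then go through the usual "able to apply" check before being applied.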

A workflow plugin supporting scheduled changes would have to decide whether to support edits on a new changeset while a scheduled change is still pending, or even to support multiple scheduled changes on the same dataset. A simpler approach may be to abort the scheduled change or disallow new changes until the scheduled change has been published.

wardi · 2014-11-20T19:24:36Z

@rossjones @davidread I'd like your input at this time. DGU has been extended to support interesting workflows, right?

rossjones · 2014-12-02T11:48:24Z

I like this idea a lot. I'm not sure of the DGU position on workflow (it tends to be reasonably straightforward), but I know of at least two other CKAN instances (Leeds and London) that would love this level of control. I think in actual use the current core ACL/workflow doesn't actually do what most people want.

Do you think this is feasible without changes to core? What changes do you think might be required? For instance, I've had issues where I've had to monkey patch some of the current state handling (where it redirects based on status).

I like the idea of keeping it simple (one change per dataset per user), but I worry that with competing changes there are likely to be merge conflicts. It's probably enough to warn of this if the original dataset has changed since your changeset was created; we probably want to avoid turning it into a git clone ;)

wardi · 2014-12-03T18:04:41Z

@rossjones I am suggesting core changes, specifically creating a new IWorkflow plugin interface (methods still TBD) and a model like the one above.

Absolutely agree: no merging or branches. Either your patch applies or it doesn't. If it doesn't, it's up to you (or possibly custom logic in a plugin) to create a new one.

rossjones · 2014-12-08T08:50:57Z

I'm +10 on this, especially if #67 can be implemented in the same code. It'd be great to get some wider user stories from existing sites about how they would prefer the workflow to work, and what they would like it to enable.

Aaron-M · 2014-12-08T20:13:26Z

We (Landcare in NZ) are +1 on this too. We have some 'pre-paid' support time with OKFN/CKAN and we would be prepared to direct some of that toward this work to help progress it/ensure it happens... and to be involved in testing etc.

wardi · 2014-12-08T20:52:46Z

@rossjones might not be ideal as a moderation queue mechanism, but both goals feel similar to me. I think we could cover a lot of the workflow and approval cases with something like this.

@Aaron-M what are your specific workflow requirements?

Aaron-M · 2014-12-08T21:48:02Z

The examples you have given pretty much cover the use cases I can think of. We're a Govt research organisation, using CKAN to publish our datasets. I (Research Data Manager) want to make it easy for researchers to 'self-serve', but at the same time allowing people to post data places a lot of trust in them. We need to be sure that (for example) the data is of good quality before we release it, that we own/have permission to release the data, and that any embargo period allowing the researcher to publish their results is respected. So we need some checkpoints along the way for another set of eyes to check the data before it is made public.

So you've thought of the things that are immediately useful to us.*

There is an aspect to workflow that is not captured here, but I think it is outside the scope of what is achievable/sensible as part of this 'issue': workflow whilst the data is still being actively used as part of the research (more data being gathered/added, tidying up, error checking, (private) sharing (à la Dropbox) with project members for analysis... prior to it being 'finalised' and ready to publish). That is very appealing too, but a significant project in its own right, I expect. It is probably worth you being aware of it as part of the bigger picture.

I think working on the issue/use cases as you have already described is a good place to start, and we'd be happy to help with that.

*One thing I've just now thought of: we are setting up some semi-automated processes whereby extracts from some of our other databases are summarised and automatically deposited to CKAN using the API. In such cases, where a rigorous (and repeated) process is set up, you would want to be able to bypass any moderation steps, i.e. the checking/approval is done once when you establish the 'procedure', and after that each run is recognised as coming from a trusted depositor/process which can go straight to a published dataset (or a new resource within an existing dataset). So that would be one additional thing to consider from us. We'd be open to discussion/ideas on how that is implemented... and it's not a show-stopper if it's not doable.
[PS - we have set up a specific user account in CKAN purely for the automated deposits, so it could simply be a matter of a user having a flag saying they 'are trusted' and don't require workflow moderation.]

JJIguiniz · 2015-10-07T01:05:49Z

Would it be possible to get a time stamp for each approval step or change in order to get basic performance measures out of the workflow? That way you could in theory see how long each step takes, which users, or organizations are more efficient, and use the plugin to inform six sigma style continuous improvement.

wardi · 2015-10-22T12:58:04Z

We aren't doing any work on this for the moment. This is a tricky thing to generalize and it might be better to have smaller, more focused features to address specific parts of this proposal.

rbvictor · 2016-06-22T14:19:26Z

Hello there! Could there be any chance of this issue being addressed in the short term?
If not, is it possible to implement these features by creating extensions or must the core be modified in order to accomplish this?

wardi · 2018-03-16T16:33:58Z

There is an alternate approach in #211 that is simpler and allows for lots of other great features at the same time.

romeosanjose · 2019-06-07T04:39:02Z

This is similar to what we implemented as a plugin (notification system included) for a government agency site back in November 2015.

wardi mentioned this issue Dec 3, 2014

Moderation queue for objects #67

Open

This was referenced Feb 3, 2015

Workflow: Add 'embargo' feature for datasets to be released on specific date/time #51

Open

Index packages of all status (except 'deleted') not just the 'active' ones ckan/ckan#2261

Closed

wardi mentioned this issue Mar 16, 2018

Dataset workflow plugins (take 2) #211

Open

wardi closed this as completed Mar 16, 2018

rufuspollock added the superseded label May 14, 2020
