Dataset workflow plugins #108
Comments
@rossjones @davidread I'd like your input at this time. DGU has been extended to support interesting workflows, right?
I like this idea a lot. I'm not sure of the DGU position on workflow (it tends to be reasonably straightforward), but I know of at least two other CKAN instances (Leeds and London) that would love this level of control. I think in actual use the current core ACL/workflow doesn't actually do what most people want. Do you think this is feasible without changes to core? What changes do you think might be required? For instance, I've had issues where I've had to monkey-patch some of the current state handling (where it redirects to based on status). I like the idea of keeping it simple (one change per dataset per user), but I worry that with competing changes there are likely to be merge conflicts. It's probably enough to warn of this if the original dataset has changed since your changeset was created — probably want to avoid turning it into a git clone ;)
@rossjones I am suggesting core changes, specifically creating a new IWorkflow plugin interface (methods still TBD) and a model like the one above. Absolutely agree: no merging or branches. Either your patch applies or it doesn't. If it doesn't, it's up to you (or possibly custom logic in a plugin) to create a new one.
I'm +10 on this, especially if #67 can be implemented in the same code. It'd be great to get some wider user stories from existing sites about how they would prefer the workflow to work and what they would like it to enable.
We (Landcare in NZ) are +1 on this too. We have some 'pre-paid' support time with OKFN/CKAN and we would be prepared to direct some of that toward this work to help progress it/ensure it happens... and to be involved in testing etc.
@rossjones it might not be ideal as a moderation queue mechanism, but both goals feel similar to me. I think we could cover a lot of the workflow and approval cases with something like this. @Aaron-M what are your specific workflow requirements?
The examples you have given pretty much cover the use cases I can think of. We're a government research organisation using CKAN to publish our datasets. As the Research Data Manager, I want to make it easy for researchers to 'self-serve', but at the same time allowing people to post data involves a lot of trust. We need to be sure that (for example) the data is of good quality before we release it, that we own or have permission to release the data, and that any embargo period allowing the researcher to publish their results first is respected. So: some checkpoints along the way for another set of eyes to check before the data is made public. You've thought of the things that are immediately useful to us.*

There is an aspect of workflow that is not captured here, but which I think is outside the scope of what is achievable/sensible as part of this issue: workflow while the data is still being actively used as part of the research (more data being gathered/added, tidying up, error checking, (private) sharing (à la Dropbox) with project members for analysis, prior to it being 'finalised' and ready to publish). That is very appealing too, but a significant project in its own right, I expect. Probably worth you being aware of it as part of the bigger picture, though. I think working on the issue/use cases as you have already described is a good place to start, and we'd be happy to help with that.

*One thing I've just now thought of: we are setting up some semi-automated processes whereby extracts from some of our other databases are summarised and automatically deposited to CKAN using the API. In such cases, where a rigorous (and repeated) process is set up, you would want to be able to bypass any moderation steps, i.e. the checking/approval is done once when you establish the procedure, and after that each run is recognised as coming from a trusted depositor/process which can go straight to a published dataset (or a new resource within an existing dataset).
So that would be one additional thing to consider from us. How that is implemented we'd be open to discussion/ideas... and it's not a show stopper if not doable.
Would it be possible to get a timestamp for each approval step or change, in order to get basic performance measures out of the workflow? That way you could in theory see how long each step takes and which users or organizations are more efficient, and use the plugin to inform Six Sigma-style continuous improvement.
We aren't doing any work on this for the moment. This is a tricky thing to generalize and it might be better to have smaller, more focused features to address specific parts of this proposal. |
Hello there! Could there be any chance of this issue being addressed in the short term? |
An alternate approach is proposed in #211 that is simpler and allows for lots of other great features at the same time.
This is similar to what we implemented (notification system included) as a plugin for a government agency site back in November 2015.
I'm proposing a new plugin interface to define a site's dataset editing and publishing workflows. In my case "workflow" means:
These workflows could be configured per organization/dataset type with plugins that register which types or orgs they handle. This is something like the way resource view plugins work with resource types.
On the dataset editing form, when changes are pending, the enabled plugins would choose which version to present to the user (possibly with highlighting of fields that have changed) and expose new buttons besides "save" and "delete", such as "publish", "submit for approval", "save for later" and "suggest deletion". The default "save"/"delete" buttons would likely be disabled for normal users. Users should be able to switch back to the "real" current version of the dataset to see the original values, or possibly submit different competing changes into the approval process.
On the dataset view page users in the organization would be able to select the pending changes to view. Plugins would determine the default for a given user.
On the dataset search page the submitted changes would be indexed like private datasets. We would need URLs for suggested changes so that we can link directly to them, e.g. `/dataset/traffic-data@suggested993`.
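To make the proposed URL scheme concrete, here is a minimal sketch of parsing such a path. Only the `/dataset/<name>@suggested<id>` pattern comes from the proposal; the helper name and return shape are illustrative assumptions.

```python
import re

# Hypothetical helper: split a path like "/dataset/traffic-data@suggested993"
# into the dataset name and the optional changeset reference.
# The "@suggested<N>" suffix format is from the proposal; everything
# else here is an illustrative assumption.
_CHANGESET_URL = re.compile(
    r"^/dataset/(?P<name>[^@/]+)(?:@suggested(?P<changeset>\d+))?$"
)

def parse_dataset_url(path):
    """Return (dataset_name, changeset_id_or_None), or None if the
    path is not a dataset URL."""
    m = _CHANGESET_URL.match(path)
    if not m:
        return None
    return m.group("name"), m.group("changeset")
```

A plain dataset URL parses with `None` for the changeset, so the same route handler could serve both the published view and a suggested-change view.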
Changeset data model
Most workflow plugins could be implemented on top of a single new model I've called "changeset" (feel free to suggest a better name)
'dataset'
Changesets can be active (able/unable to apply) or deleted (withdrawn/rejected). Unable to apply means that the jsonpatch 'test' conditions fail for the current version of the dataset.
Changesets are not hierarchical - they describe changes that apply directly to the current version of the dataset. It may be possible for a plugin to implement changesets that are displayed in a hierarchical manner using information in plugin_data or in other tables, but supporting such use is not part of the design of this feature.
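A minimal sketch of what such a changeset model might look like. The dataset reference, the jsonpatch-style patch with 'test' conditions, the active/deleted states, `plugin_data` and `scheduled_date` all come from this proposal; the exact field names, types and the simplified top-level-key patch check are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Changeset:
    # Field names are illustrative, not an agreed schema.
    dataset_id: str
    user_id: str
    patch: list                       # jsonpatch-style operations
    state: str = "active"             # "active" or "deleted" (withdrawn/rejected)
    plugin_data: dict = field(default_factory=dict)
    scheduled_date: Optional[str] = None

    def applies_to(self, dataset: dict) -> bool:
        """A changeset is 'unable to apply' if any jsonpatch 'test'
        condition fails against the current version of the dataset.
        (Top-level keys only, for brevity.)"""
        for op in self.patch:
            if op["op"] == "test":
                key = op["path"].lstrip("/")
                if dataset.get(key) != op["value"]:
                    return False
        return True
```

Because changesets are flat (not hierarchical), checking applicability is just this one pass over the patch against the current dataset, with no ancestor changesets to resolve first.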
IWorkflow methods
(to be completed)
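The interface methods are explicitly left TBD above, so the sketch below is purely hypothetical: it only illustrates the kinds of hooks the example workflows later in this issue would seem to need. None of these method names or signatures are proposed or agreed.

```python
class IWorkflow:
    """Hypothetical plugin interface sketch. The proposal leaves the
    methods to be completed; these hooks are guesses derived from the
    example workflows described below, not an agreed API."""

    def handles(self, org_id, dataset_type):
        """Return True if this plugin manages workflow for the given
        organization/dataset type (mirroring how resource view plugins
        register the resource types they handle)."""
        raise NotImplementedError

    def edit_buttons(self, user, dataset, changesets):
        """Return the buttons to show instead of save/delete, e.g.
        ["publish", "submit for approval", "suggest deletion"]."""
        raise NotImplementedError

    def default_version(self, user, dataset, changesets):
        """Choose which pending changeset (or the published version)
        to present to this user by default."""
        raise NotImplementedError
```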
Example: suggested changes
This kind of workflow plugin would allow the creation of up to one changeset per user per dataset: "User A suggests the following changes to dataset B"
Users that have permission to view a dataset but not edit it would be presented with a "suggest changes" button on the dataset view page. Clicking it would lead to the normal dataset and resource edit pages, but instead of "save" the user would have a "suggest change" button. After suggesting a change, the edit screen for this dataset would also have a "withdraw suggested change" button.
When a user has a suggested change, viewing that dataset lets them toggle between their suggested version and the currently published version.
Users with permission to edit a dataset may view all users' suggested change versions. When viewing suggested changes they may "reject change" or "apply change".
Notifications: suggestions would be displayed on the dashboards of users with permissions to change the dataset. If a suggestion is accepted or rejected the change author could be notified by email. If a suggested change no longer applies (the original fields modified no longer match the values being changed) the change author could also be notified by email.
Example: moderated edits (single approval)
This kind of workflow plugin would replicate the "moderated edits" workflow: one "pending" version of a dataset exists that users in the same organization may view and edit. Admins may approve pending versions.
Users in the organization viewing a dataset can switch between the one official pending version of the dataset and the current published version. They may suggest deletion of a dataset, create a new shared pending version of a dataset (automatically deleting the existing one) or edit/withdraw an existing pending version.
Users in the organization may create a blank, private dataset that will exist to reserve a dataset id and allow creation of a to-be-published version based on a changeset.
Admins would be able to apply pending changesets on the dataset edit page or with a bulk approval tool.
Example: multiple approval
Similar to single approval above, but approvals are stored in the changeset until it receives enough approvals (as determined by the workflow plugin) to actually publish the change.
Users in the organization would be able to see what approvals the pending changeset has received so far.
The workflow plugin would decide whom to notify as a changeset is approved.
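The multiple-approval rule above can be sketched as follows. Storing approvals in the changeset's `plugin_data` is consistent with the earlier note that plugins may keep extra state there; the function name, data shape, and threshold handling are assumptions.

```python
# Illustrative sketch: approvals accumulate inside the changeset's
# plugin data until the threshold (decided by the workflow plugin)
# is met. Names and structure are assumptions for illustration.
def record_approval(plugin_data, user_id, required):
    """Record one user's approval; return True once enough distinct
    approvers have signed off and the change may be published."""
    approvals = set(plugin_data.setdefault("approvals", []))
    approvals.add(user_id)  # duplicate approvals by one user count once
    plugin_data["approvals"] = sorted(approvals)
    return len(approvals) >= required
```

Keeping the approver list in the changeset also gives users in the organization exactly what the previous paragraph asks for: visibility into which approvals a pending changeset has received so far.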
Example: scheduled approval
In any workflow plugin, instead of immediately applying a change, a changeset may be marked with a `scheduled_date`. There will be a common mechanism to automatically attempt to apply any changeset with a `scheduled_date` in the past. A workflow plugin supporting scheduled changes would have to decide whether to support edits on a new changeset while a scheduled change is still pending, or even to support multiple scheduled changes on the same dataset. A simpler approach may be to abort the scheduled change or disallow new changes until the scheduled change has been published.
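The common mechanism for scheduled changesets could be as simple as a periodic job selecting the due ones. A minimal sketch, assuming changesets are dicts with the `state` and `scheduled_date` fields from this proposal (the function name and dict shape are assumptions):

```python
from datetime import datetime, timezone

def due_changesets(changesets, now=None):
    """Illustrative version of the common scheduling mechanism:
    select every active changeset whose scheduled_date is in the
    past, so the caller can attempt to apply each one (and notify
    or abort if its jsonpatch 'test' conditions no longer hold)."""
    now = now or datetime.now(timezone.utc)
    return [
        cs for cs in changesets
        if cs.get("state") == "active"
        and cs.get("scheduled_date") is not None
        and cs["scheduled_date"] <= now
    ]
```

Applying each due changeset still goes through the normal applicability check, so a dataset edited after the change was scheduled would fail its 'test' conditions rather than being silently overwritten.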