Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Curation tasks using the scripts and processes functionality #2820

Merged

Conversation

jonas-atmire
Copy link
Contributor

References

Add references/links to any related tickets or PRs.

Description

This PR adds in the functionality of the curation tasks to the recently merged scripts and processes functionality.
Documentation on the curation task can be found above.

Instructions for Reviewers

List of changes in this PR:

  • The curation tasks process has been moved to the scripts.xml config
  • CurationCli has been made a DSpaceRunnable
  • Configuration of the script can be found in CurationClientOptions
  • At present, all curation tasks are admin only (Which is checked in the CurationScriptConfiguration)

Starting a curation task can be done by performing a POST request on the endpoint

/api/system/scripts/curate/processes

All parameter values should be provided in the body that has to use the multipart/form-data content type.
The simples of this is with a properties parameter, than contains JSON with the required options, ie the following properties will start the 'noop' curation task(-t) for the provided eperson(-e) and handle (-i)

  {
    "name": "-e",
    "value": "atmirenv@gmail.com"
  },
  {
    "name": "-t",
    "value":"noop"
  },
  {
    "name": "-i",
    "value":"123456789/8"
  }
]

When using the '-T' option, this allows the code to handle a text file containing a list of tasknames to be performed. (An example which is also used in the IT can be found here
Note:
CurationScriptIT at present does not yet contain the check to see if calling the 'noop' task actually ends up on (This IS logged in the dspace logs, but not explicitly returned as a particular status, apart from the scripts and processes returning the properties that have been sent to the endpoint, but that is not a marker to see if it has been executed or not)
We need to investigate to properly and cleanly mock the 'NoOpCurationTask' (In combination with the autowiring that is being done in the class), after which we can verify if the 'perform' method has been called correctly.

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR includes new, third-party dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR modifies the REST API, I've linked to the REST Contract page (or open PR) related to this change.

@MarieVerdonck MarieVerdonck force-pushed the w2p-71513_Curation-tasks-scripts-and-processes branch from e3d5c54 to fec0456 Compare July 2, 2020 13:49
@benbosman benbosman added this to the 7.0beta4 milestone Jul 9, 2020
@benbosman benbosman added this to Needs Reviewers Assigned in DSpace 7 Beta 4 via automation Jul 9, 2020
@tdonohue tdonohue changed the base branch from master to main July 13, 2020 21:46
@tdonohue tdonohue moved this from Needs Reviewers Assigned to Under Review in DSpace 7 Beta 4 Jul 14, 2020
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonas-atmire : Overall, this code looks good. Just a minor request inline.

However, I will note this PR looks like it has the same security issue as the "metadata-import" script has... It should not allow specifying the --email param from the Admin UI. Instead, it should define that automatically to be the current user. See #2822

You may want to touch base with @KevinVdV on whether this issue needs to be resolved more generally. In general, no one should be allowed to bypass authorizations when running Scripts from the Admin UI. So, the --email param should be banned from the REST API & Admin UI for all scripts. If need be, we can solve this in a follow-up PR, but I'd like a clear plan for how we plan to resolve it prior to merging this PR.

I'm also surprised to see that the REST API is unable to return the results of the Curation Task. No matter which task I run, the results are only written to the log files, which makes this endpoint less useful -- you need commandline access anyhow to see the output, so why not just run it from commandline?

dspace/config/modules/curate.cfg Show resolved Hide resolved
DSpace 7 Beta 4 automation moved this from Under Review to Reviewer Approved Jul 31, 2020
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks @jonas-atmire . I've re-reviewed this and tested it with DSpace/dspace-angular#799 and everything is now working as expected.

As we previously discussed, the task output & security issues with the --eperson flag will be handled in separate PRs. So, this looks good to me as-is. Thanks again!

@tdonohue tdonohue added the tools: curation-tasks Related to Curation Task system or specific tasks label Jul 31, 2020
@tdonohue
Copy link
Member

Merging this with one approval, as this backend has received several reviews from me & testing via the corresponding Angular PR from several individuals.

@tdonohue tdonohue merged commit 1301e43 into DSpace:main Jul 31, 2020
DSpace 7 Beta 4 automation moved this from Reviewer Approved to Done Jul 31, 2020
cwilper added a commit to atmire/dspace-replicate that referenced this pull request Apr 25, 2021
This re-creates curate.cfg based on the out-of-box curate.cfg
for DSpace 7, adding only those configurations that are relevant
for RTS in DSpace 7.

Notably, the ui-related configurations have been removed due
to curation tasks now executing from the scripts and processes
functionality: DSpace/DSpace#2820

In addition, this corrects an error in the sample (commented) configs
for bagit and checkm, where the plugin syntax was not updated to
use the new way of specifying multiple values as of DSpace 6.
bbranan pushed a commit to DSpace/dspace-replicate that referenced this pull request Sep 22, 2021
This re-creates curate.cfg based on the out-of-box curate.cfg
for DSpace 7, adding only those configurations that are relevant
for RTS in DSpace 7.

Notably, the ui-related configurations have been removed due
to curation tasks now executing from the scripts and processes
functionality: DSpace/DSpace#2820

In addition, this corrects an error in the sample (commented) configs
for bagit and checkm, where the plugin syntax was not updated to
use the new way of specifying multiple values as of DSpace 6.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tools: curation-tasks Related to Curation Task system or specific tasks
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants