-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DS-3763: Scripts and processes endpoint #17
DS-3763: Scripts and processes endpoint #17
Conversation
Hi @tomdesair : while I like this idea overall, I worry it's a possible extension of the scope of DSpace 7. We haven't made any promises in DSpace 7 to make CLI tools available from the Admin UI or REST API. That said, I do think it'd be nice to kick off administrative scripts via REST API and/or an Admin UI in Angular. I just worry this effort may require a significant refactor to the existing DSpace CLI, unless it is scoped more to kicking off Curation Tasks (which does need to occur from the Admin UI, as it exists in DSpace 6 in that same way). Overall, I'd like to better understand what changes are being proposed at the Java API layer. I'd rather not undertake another significant Java API refactor during the DSpace 7 efforts, as our time is better spent on getting an initial DSpace 7 release out the door. |
We would only need a small refactoring in some DSpace API script classes so that they can list their parameters in a standardised way. The actual parameters or the way a script is called from the CLI will not change. We will also not modify the actual script implementations. The idea is that starting a script from the REST API will result in an external process (in the same way if you would have invoked the script from the CLI). This has the advantage that the script is not executed within Tomcat and will not crash Tomcat if it fails (which can happen in DSpace 6 if you start, for example, an export of your full repository). I agree that we should not support all scripts right away. I think we should focus first on the following cases which are also available in DSpace 6:
So we will only modify those classes, but it will not be a full refactor. That's why I included this in the contract description:
I think the effort for supporting these features in 5 dedicated endpoints will be similar to creating 1 generic endpoint. In later versions of DSpace 7, we can then add support for more CLI scripts. |
I'm in favor of a generic endpoint able to execute pluggable processes. IMHO these should be the curation tasks and few scripts that if were developed after the creation of the curation framework would be done as curation tasks ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other than the formal changes to the contract please review also my general comment to decide if we want to proceed on this maybe limiting it at the start to the current curation tasks
processes-endpoint.md
Outdated
``` | ||
|
||
## Execution File Output | ||
**GET /api/admin/processes/<:process-id>/file** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can have a /files that list all the existent file to download? or we can flat the files at the upper level and use the file name as rel name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean /api/admin/processes/<:process-id>/files
and /api/admin/processes/<:process-id>/files/<:file-name>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, or we can simplify in
/api/admin/processes/<:process-id>/<:file-name>
assuming (ie adding the convention) that no scripts will use output as the name of one of the generated file
scripts-endpoint.md
Outdated
* `string`: A regular string value | ||
* `date`: A string that has a valid date pattern | ||
* `boolean`: true or false | ||
* `file`: For parameters with this type, the user has to provide a file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should specify how file as supplied. Obviously one option is to provide them in a multipart POST request using the parameter as name of the file but we want to document that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll document the multipart post instructions. The ultimate goal for me would be to have file upload support using something like the tus protocol. Because then people can upload for example really large imports and datasets. But that's probably something for DSpace 7.1 or DSpace 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will like to have support for resumable upload but it is probably out-of-scope for dspace 7. Once introduced it should be added to all the upload endpoints (including the submission)
This was discussed a bit in today's DSpace 7 Mtg. Just noting that I'm withdrawing my concern about expansion of scope. Based on discussion with @artlowel and @abollini, it sounds like the primary goal of this work is to enable Curation Tasks at the REST API layer, but that @tomdesair is being careful to design this in a way that it could also apply to other CLI commands (in launcher.xml). That sounds like a good approach to me. I'd just simply recommend that, in terms of the implementation, we should not spend too much time enabling all CLI commands. A few is fine, but others can be implemented/refactored later (possibly even post DSpace 7). |
I like the capabilities that are suggested here. I have had a long time desire to extend the curation system so that it could (1)take task appropriate parameters (2)persist output.
Looking at the proposed scripts endpoint, could the proposed GET/POST methods described there also apply to any/all curation tasks? A few questions about output files
|
Updates based on community feedback and support for features such as export to ZIP or export to CSV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functionality from this pull request is useful for features such as the export to ZIP and export to CSV we're currently working on.
For that reason, I've revived the pull request and:
- solved the problems mentioned by the reviewers
- reduced the scope to ensure the initial implementation doesn't create too much work
- ensured it supports all data required to use it for features which are already supported in DSpace 6 from the UI (export to CSV, export to ZIP, curation tasks, …)
Concerning access to curation tasks: please be aware of DSpace/DSpace#2180 and DSpace/DSpace#2181 which are pending. UPDATE: these were merged to master. |
Thx @benbosman ! |
I'm confused about the status of this PR and the ongoing implementation work. @benbosman are your working on implementing it? do you plan to use this contract to support the curation task system from the UI or just to implement the export feature? I'm not sure about how to use this contract for the curation tasks UI. It is true that a curation task can be executed as a script but I don't expect to have all the curation tasks listed in the script-launcher.xml |
This PR is waiting for an approval since all feedback was processed. Once approved, the development can start. Specifically for curation tasks, more work is indeed required to also create a list of the available tasks. |
The contract looks fine to me in regard of its purpose (execute administrative script). Please note that I'm not convinced that it is useful for the immediate tasks to be done in the UI (import / export / curation tasks) as I'm still convinced that for such features dedicated endpoints/strategies need to be used. More precisely, to export something I suggest to use a simple GET request to the main target of the export with an appropriate Content-Type, i.e. we should use content-negotiation for the export. For the curation tasks as already stated we will need the ability to know which tasks are available. Of course we can expose such list as a controlled list for the Finally, for the implementation of the scripts and processes endpoints keep in mind the global goal to keep the REST webapp stateless and horizontal scalable, this mean that we should store the running processes in a shareable, temporary-persistent storage |
Thanks for approving it, to be used at least for executing administrative scripts. I think the discussion on whether this will be used for import, export, curation tasks would fit better in the DSpace meetings. The solution will indeed be implemented to ensure the webapp remains stateless and horizontal scalable. |
I'd like to discuss this further in our DSpace 7 meeting this week. I'm still concerned about possibly extending the scope of DSpace 7 in this endpoint. I agree that we need an endpoint for Curation Tasks, and an endpoint for Administrative import / export (of Zips). However, this endpoint as described seems to involve functionality that isn't in DSpace (yet)... namely the entire So, I'd like to better understand the implementation plans/proposal for this endpoint, so that we can ensure we are not expanding our scope. |
DSpace 6 does not export items or metadata through the GUIs, so far as I can see. That would not make much sense, since you'd have to go to the server anyway to collect the output. |
@mwoodiupui DSpace 6 XMLUI has a feature to export the metadata of the search results (or all items in a collection/community) as CSV and items in a collection or community can be exported as ZIP files. (Both may be enabled only for admins.) |
I will add my voice to those asserting that curation tasks and CLI tools need separate treatment. |
Where is this export facility? Do we document it somewhere? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : Thanks for the updates and continued work here. Overall, this looks good. But, it looks like there is one outstanding request (that I agree with, see below) that was never addressed. It's a minor fix though, so I'll go ahead and give this a +1.
This is a contract proposal for https://jira.duraspace.org/browse/DS-3763
DSpace offers functionality like import and export of metadata and items, running curation tasks, running an OAI harvest from the web UI. While we can implement each of these features as a dedicated endpoint, I think it would require the same effort to implement one generic endpoint that is able to run (REST-enabled) DSpace CLI scripts for the REST API.
I'm very interested to hear your thoughts on this.