Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DS-4281: Metadata suggestions in the live import #83

Merged
merged 10 commits into from
Dec 10, 2019

Conversation

benbosman
Copy link
Member

@benbosman benbosman commented Oct 15, 2019

https://jira.duraspace.org/browse/DS-4281
This Rest Contract will introduce the live import from https://wiki.duraspace.org/dsdoc6x/using-dspace/ingesting-content-and-metadata/submission-user-interface/2016-framework-for-live-import-from-external-sources
This will allow the user to choose an external source to import from, retrieve suggestions, and apply the metadata changes once the user has confirmed the suggested changes are applicable.

This Rest Contract is related to #74

It ensures a submission can be started based on an external source.
This uses a POST on the /api/submission/workspaceitems, with a URI-list (with the collection and external source).
The item metadata is imported in this case

@paulo-graca
Copy link
Contributor

Thank you @benbosman for adding this!
I think "Metadata suggestions from live import" should be brought to a wider group for discussion. I don't think it's specific to Entities group. So I would suggest to split this PR in 2. One to address "Metadata suggestions from live import" and the other for how to integrate the external sources.

metadata-suggestions.md Outdated Show resolved Hide resolved
metadata-suggestions.md Outdated Show resolved Hide resolved
The changes embedded are the suggested metadata changes to be applied to the given item

### Changes for a single entry
**/api/integration/metadata-suggestions/<:suggestion-name>/entryValues/<:entry-id>/changes**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint seems odd to me too... if you have the workspaceitem or workflowitem passed into /api/integration/metadata-suggestions/<:suggestion-name>/entryValues/<:entry-id>, couldn't you just return the "changes" from that endpoint? I'm not sure this changes subpath is necessary, plus it doesn't seem to follow our REST Best Practices. It's not really a "subresource" of the entry (represented by entry-id). It's more like a different "view" or "filter" of that entry. I'm not sure what to suggest here though yet, as I'm still trying to understand the parent endpoints here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are currently assumed not to be embedded in the /api/integration/metadata-suggestions/<:suggestion-name>/entries endpoint for performance reasons (although it's hard to estimate the performance impact prior to the implementation).
In order to ensure this scales well, it was introduced as a sub-resource. The changes are calculated based on the combination of the information in the entry and in the item

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbosman : To clarify, my original statement here, I wasn't suggesting to embed changes...instead, I was suggesting that this specific endpoint might return a different response based on parameters. So for example:

  • GET /api/integration/metadata-suggestions/<:suggestion-name>/entryValues/<:entry-id> would return the full external entry (which would seem to be nearly identical to what /api/integration/externalsources/<:source-name>/entryValue/<:entry-id> might return (if the backends were identical)).
  • GET /api/integration/metadata-suggestions/<:suggestion-name>/entryValues/<:entry-id>?workspaceitem=1111-2222-3333 would return the difference (i.e. the "changes") between <:entry-id> and that WorkSpaceItem

So, this would mean you'd no longer need a subresource called changes (which as I said, doesn't seem like a normal subresource anyways).

It's just a different idea of how to possibly approach this specific endpoint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the features of both endpoints are quite different, it doesn't make much sense to ask for the metadata of e.g. PubMed's live import using /api/integration/externalsources/pubmed/entryValue/31680121 especially since there's a reasonable chance /api/integration/externalsources/pubmed doesn't exist (is not configured)

It would be similar to updating the collection metadata using /api/core/items/<:collectionID> just because items also have metadata, even though there's no item with that ID

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes subresource still bugs me, as it doesn't seem very RESTful. I understand the purpose here, and it seems like the goal is to simplify the client side processing, so the client side doesn't need to compare the response of the /metadata-suggestions request to the current metadata in the WorkspaceItem/WorkflowItem.

However, at the same time, I'm not sure the Server Side (REST API) will always know whether the /changes should return an add or replace or delete operation. For example, if the /metadata-suggestion returns one dc.title while the WorkspaceItem has a different dc.title, is that an add operation (to end up with two alternative titles) or a replace operation (to end up with one title)? Would the Server Side be able to make a proper /changes suggestion without input from an End User?

I'm starting to wonder if it's easier to drop this /changes endpoint altogether, and do the comparison on the Client Side (as these are simple string comparisons). In the above scenario, I could imagine a UI that simply shows that there's a difference in the values of dc.title to let the End User select the one they prefer, or even allow them to select both. Based on the User's selection, the UI would generate either a noop, an add or a replace using the existing Item PATCH metadata functionality.

This is also a topic that we could bring to the DSpace 7 Working Group meeting -- as this feature is not specific to Entities in any way. Maybe others see a different way of achieving this endpoint in a RESTful fashion.

@tdonohue tdonohue changed the title Metadata suggestions in the live import DS-4281: Metadata suggestions in the live import Nov 4, 2019
@tdonohue
Copy link
Member

tdonohue commented Nov 4, 2019

@benbosman : Revisiting this Contract, I'm looking for DSpace 6.x examples where we had Metadata suggestions using the Live Import framework (I'm not familiar with how this currently works, and I didn't realize it already supported Metadata suggestions -- for some reason I thought it was a full import in 6.x). Do you have examples or a demo from 6.x that we can base this Contract from? I'm still finding the use case differences between "Live Import" and PR#74 (Import from External Sources) to not be clear. I know you've tried to explain the differences, but I think it'd help me to step back here and walk through use cases (based on how this worked in DSpace 6.x).

@benbosman
Copy link
Member Author

benbosman commented Nov 5, 2019

@tdonohue in DSpace 6, the live import starts from an existing item as well. It offers a search feature, which returns a list of matching articles in PubMed. The UI displays a preview with the title and first authors.
The user is able to select a publication from PubMed to import. It doesn't create a new item, but rather updates the metadata of the current item.

This is a completely different feature compared to #74, with the overlap being:
If you have the same source configured for both live imports and external source, the same entry ID would typically result in the same preview metadata

Since the features are quite different, it doesn't make much sense to ask for the metadata of e.g. PubMed's live import using /api/integration/externalsources/pubmed/entryValue/31680121 especially since there's a reasonable chance /api/integration/externalsources/pubmed doesn't exist (is not configured)

@tdonohue
Copy link
Member

tdonohue commented Nov 5, 2019

@benbosman : I feel we need to discuss this endpoint further. I'm worried about the complexity of having to configure multiple "external APIs" multiple times for similar import mechanisms. I'm having a hard time following the logic that an import using "live imports" into an Item is vastly different from an import using "external sources" into an Entity (also an Item). It seems like, at a minimum, these two import mechanisms should share some underlying code or configuration -- so I'd like to better understand the plans for doing so. For example, just a few questions include:

  1. Are they planned to be configured via a single Spring Bean?
  2. Are they entirely separate configurations that need to be enabled separately, even though they use the same external API?
  3. Is there a reason why "external sources" couldn't just use/extend the existing "live import" feature/architecture to import into Entities?

Just for clarity, I (personally) don't like to see integrations/configurations that sound duplicative -- this sounds quite duplicative as we are talking about using the same external APIs for importing data, both triggerable in the submission UI, but in two different setups/configurations. I'd like to tease this apart further before I can approve it, as I still don't know how to describe the differences here to other DSpace users (and if I'm confused, I expect others will be confused as well)

I'd request we bring this for discussion in either an Entities WG or a DSpace 7 WG meeting.

@paulo-graca
Copy link
Contributor

@benbosman can you please consider this on this PR? #88

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbosman : Updating my feedback on this PR, after more thought (and since we last discussed this in a meeting).

  1. Overall, I (now) understand the reasons why we'd want this new /metadata-suggestions endpoint. The overall goal of this endpoint is to port the "Live import" feature (from 6.x) to our new REST API. As that feature is primarily oriented towards metadata enhancement (and as we discussed in a meeting "live import" is an odd name for this feature), I agree that "metadata-suggestions" is a better name.
  2. However, I'm still not yet convinced that the /changes subresource is something we should implement. See this comment: https://github.com/DSpace/Rest7Contract/pull/83/files#r343257121

If the /changes subresource was removed, I think the rest of this Contract is mostly straightforward. However, I think this also may take away some of the usefulness of these metadata suggestions. I've been trying to locate other examples of REST APIs that provide these sorts of "suggestions" for changes/enhancements, but haven't found any.

I'm not sure of a better way to move this forward. We could discuss this more though in either the DSpace 7 Entities WG or the DSpace 7 WG to see if we can come up with ideas.

Copy link
Contributor

@paulo-graca paulo-graca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been following closely the enhances and comments in this PR.
I'm ok with the current status and I don't have any strong opinion about the use of metadata PATCH for informing which metadata to be changed if it will be done on the client of server side. So, whatever decided with @tdonohue, I will also be ok with it.

@paulo-graca
Copy link
Contributor

@tdonohue it would help to clarify if instead of:
/api/integration/metadata-suggestions/orcid/entryValues/0000-0002-4271-0436/changes

a different end-point was proposed (Changes for a single entry)?
/api/integration/metadata-suggestions/orcid/entryValuesChanges/0000-0002-4271-0436

@tdonohue
Copy link
Member

tdonohue commented Dec 3, 2019

I talked with @benbosman and Lieven about this further today. I suggested the following:

  1. First, I like @paulo-graca's suggestion above to move the /changes subresource into a separate resource path altogether...as it serves a different purpose than entryValues. So, creating a separate /entryValueChanges (or similar) path seems more appropriate to me.
  2. Second, I asked whether we might want to consider having this path return more of a traditional diff style response (e.g. think of a commandline diff or a GitHub diff, where the response just displays where differences exist... and that would allow the UI to determine what sort of JSON Patch request it wants to create (based on user input regarding which changes the user wants to keep or discard from the diff). I suggested this as I'm still not certain that the REST API can always accurately determine which types of changes a User might want to make based on the external source (e.g. as noted previously, the REST API may not be able to determine whether a different dc.contributor.author should be a replace request or an add request, whereas a user can usually tell whether two different author names likely represent the same author or not.)
    • @benbosman is going to talk with @artlowel for feedback on which type of response is easier for the Angular UI.
    • If Ben & Art agree that the JSON Patch formatted response is ideal, I'm willing to live with this endpoint response "as-is" for now. We can always revisit this later on during implementation if it needs future enhancements.

@benbosman
Copy link
Member Author

benbosman commented Dec 5, 2019

@paulo-graca I've updated the endpoint based on your suggestion

@tdonohue I've discussed the proposal with Art, and have changed it to be more focussed towards what the difference is while still ensuring the REST does the heavy lifting. With this implementation, the REST can suggest multiple ways to change the metadata, and the user can determine the most applicable solution (while REST can still suggest a sensible default as the first option). This new format will still keep it easy for Angular to build the actual PATCH based on the user's selected action

I've also adjusted the workspaceitem POST to be in line with https://github.com/DSpace/Rest7Contract/blob/master/items.md#creating-an-archived-item-from-an-external-source

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbosman : I like these new changes! I think this is looking good. I only have minor feedback/tweaks to suggest before I think this can be merged.

@paulo-graca : If you have a chance, could you give this a quick look again to be sure you approve of the new direction here? (I know you previously approved, but as there's been some significant rework, it'd be good to ensure it still looks good to you.)

metadata-suggestions.md Outdated Show resolved Hide resolved
metadata-suggestions.md Outdated Show resolved Hide resolved
metadata-suggestions.md Show resolved Hide resolved
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks for the updates, @benbosman . This looks good to me overall. I did clarify one minor request (which would be nice to still fix) in this comment. But, I approve of this direction.

@benbosman
Copy link
Member Author

Thanks @tdonohue, I've applied that change

@benbosman
Copy link
Member Author

Merging since this has 2 approvals

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants