-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Entities support for external authority sources #74
Conversation
…-authority-sources
improving authority control contract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @benbosman for your work and important contribution for this discussion. As I see, we need to discuss about what is proposed. I think "external-authority-sources" is a redundant solution, one could achieve the same using "authorities" and thoroughly discuss about the "POST" methods. Inviting @abollini for that discussion.
external-authority-sources.md
Outdated
} | ||
``` | ||
|
||
**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/entity** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a single point for creating entities and, to me, must be align with @abollini concerns. Meaning entity it should be started on https://github.com/DSpace/Rest7Contract/blob/43ae7d5a3c37359becead13d9ebc43f772f865e6/workflowitems.md#post-method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @paulo-graca that we should avoid having ANY POST
commands on this new endpoint. As much as possible, I'd like to align this endpoint with /api/integrations/authorities
(in the hopes that maybe we can merge the two in DSpace 8 or beyond). Therefore, I think this externalsources
endpoint should behave similar to authorities
...it should be a READ-ONLY endpoint with a goal of being able to search/locate an external authority system (via its API) and select one (or more) entries.
My suggestion would be to use the existing POST /api/core/items
to create new Entities (or if that endpoint is insufficient, we could create a new /api/core/entities
endpoint / resource). https://github.com/DSpace/Rest7Contract/blob/master/items.md#creating-an-archived-item
However, there are two ways in which we could use POST /api/core/items
, depending on whether we want this Entity creation to occur more on the client or server side.
- [Option 1: Use existing endpoint as-is] In this option, the Client side must do a bit more of the "lifting" here, as it is expected to first use
GET /externalsources
to locate a specificentry-id
, and then callPOST /items
, passing in all (or some) of the metadata fields returned for thatentry-id
. - [Option 2: Do mapping on server-side] In this option, we'd enhance the
POST /items
endpoint to also optionally accept atext/uri-list
. In this scenario, once the client has found a specificentry-id
(again usingGET /externalsources
), it'd simply pass the full URI of thatentry-id
(e.g.https://dspace7.4science.cloud/server/api/integrations/externalsources/entries/<:entry-id>
) to thePOST /items
. Then, on the server side, a second request would be made to that URI to gather any metadata and "map" it into a new Entity. This behavior is somewhat similar to how we create aWorkflowItem
from aWorkspaceItem
(by passing in a uri-list): https://github.com/DSpace/Rest7Contract/blob/cb86c89bb6918128f4ab3a6d2c5e2ccb835776be/workflowitems.md#post-method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two options can also support creating the new item. I would prefer the second in that case because:
- The former will be very complicated for the UI
- The https://github.com/DSpace/Rest7Contract/blob/master/workflowitems.md#post-method has to be fixed to support a URI-list for defining the collection anyway (one of the current problems when starting a new submission). So adding the URI of the entry id at the same time and include the metadata would be more straightforward.
- There may be some metadata fields required for the external source which can't be entered using the submission forms (e.g. you'd need to store the source and the ID in that source in dedicated fields so it can be verified later whether there's an entity which matches this external record exactly)
external-authority-sources.md
Outdated
} | ||
``` | ||
|
||
**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/authority** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this requires more discussion and it creation should be based on authorities endpoint:
POST /api/integration/authorities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have discussed this integration internally and we arose to the point where we consider the "Import feature" should only be applicable to Entities. In the Authority context the authority value or key is stored alongside with it's related metadatavalue entry and, after the item is submitted/saved, on server side, an Event Consumer it's executed (https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/authority/indexer/AuthorityConsumer.java) to process it. You will only know what to import or to process at this point. So it doesn't make sense, to us to have here this POST method when we really need it's a PATCH to store the authority value and confidence for a certain metadatavalue https://github.com/DSpace/Rest7Contract/blob/master/metadata-patch.md).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulo-graca This is actually part of the behavior which was discussed to be "problematic" or "confusing" in DSpace 6 and below.
Where in DSpace 6, you'd search on an ORCID author, and it somehow starts importing it into authority without the user understanding what happened, this is now a clear process.
The POST will create the authority value in solr (what used to be part of this consumer which worked with temp IDs). It will return the authority value.
The PATCH on the item metadata will add the returned authority value to the item.
For the user, this is what the import button will do: it will create the authority value and add it to the item. But these two parts got entangled deep in the code in previous versions of DSpace and are fixed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more thought here, I'm starting to agree with @paulo-graca . I don't believe this POST
endpoint is necessary, as authorities do NOT need to be POST
ed (at all) as there is no DSpaceObject to "create" when saving an authority value to an Item. This is because (as Paulo points out), Authorities are saved in the same manner as any other metadata field/value (as the "authority" property and "confidence" property appear on the Metadata fields). So, we should be able to use existing endpoints to save authorities (regardless of whether we call it "import" or not in the UI). For example, again as Paulo notes, PATCH for metadata fields can save authorities already: https://github.com/DSpace/Rest7Contract/blob/master/metadata-patch.md
Additionally, with regards to saving the entry to Solr, I'd recommend we continue to use the existing AuthorityConsumer
here. I don't like the idea of having a POST
to actually update Solr. We aren't really creating a REST Resource here (which is how POST
is supposed to be used)...instead we are adding an authority entry to a metadata field, and then it is being indexed into Solr using the AuthorityConsumer
. I'd prefer to keep that same process for DSpace 7 (to avoid changing the current Authority System).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The POST is not included here to replace the PATCH for metadata fields. The POST is included here to update solr. It should respond with the authority value, which is hereafter used in the item using the PATCH for metadata fields.
The reason for including this POST is to avoid using the AuthorityConsumer
. The way the AuthorityConsumer
works is creating a large amount of complex customizations to support ORCID directly from authority control. With this new external sources functionality, we don't need these customizations anymore and we can directly support REST access to convert an external source to the local authority.
I'd prefer not to extend the entire functionality underneath the AuthorityConsumer
which was implemented for converting an ORCID ID to an authority record, and which would need to support any external source. It can be implemented much more cleanly with a separate POST. And this implementation will require a lot less work and is much less error prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman: While I understand that the AuthorityConsumer
is not ideal in this scenario, I'm hesitant for the Entities Working Group to change the behavior of the Authority backend, as that is outside the scope of our working group. If we feel strongly that we must go this route, I feel we then must propose this to the broader DSpace 7 Working Group.
Overall, I still feel that having a POST
generate an update to Solr is not an ideal approach in that it's not aligned with REST Best practices. POST
commands are supposed to create resources (usually objects). But, at this time, because of known flaws with how Authorities currently work, DSpace doesn't store an Authority "resource" in the database...instead DSpace stores the Authority in a metadata field, and links to an entry in a Solr index.
I agree that, longer term, we should rethink this current approach to storing Authorities. They likely should be managed in the database more formally. However, that is out-of-scope for DSpace 7...and I think this POST
command only is "valid" once we fix the behavior in how Authorities are stored.
This is why I'm not sure this POST
belongs in DSpace 7. It could be (re)introduced in a future release (DSpace 8 perhaps) once we find time to redesign Authorities and/or merge Entities and Authorities (if plausible). But, until then, I think we should retain the current behavior of Authorities. If this means that we cannot "import" into Authorities in the same way as we can with Entities, then we should re-scope our plans for DSpace 7 and perhaps (as Paulo suggested above) limit import to Entities only for now.
* entries: the list of values managed by the external source | ||
* entryValues: the endpoint to retrieve a single value | ||
|
||
## Linked entities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, these Entities differs from what we have being calling Entities (the Enhanced Items).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does indeed differ, most rest contracts contain a "Linked entities" section to detail the other REST endpoints linked from the main REST endpoint
@paulo-graca I understand you missed the meeting which discussed the explanation of separating an authority lookup from importing data from an external source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : I've done a much deeper analysis of this PR and the existing authorities
implementation. I've tried to summarize my thoughts below. Overall (after looking at the current implementations), I'm now realizing you are correct that we need a new endpoint here. However, I agree with @paulo-graca that this new endpoint should be READ-ONLY, and we should look to enhance existing POST
commands on other related endpoints (more on that inline below). I also want to AVOID touching/changing the current Authority Control backend...I think any improvements to Authority Control need to be left for a future release (DSpace 8 or beyond).
I'm glad to talk through this further this week as needed...just ping me on Slack if you want to talk through these comments.
The supported parameters are: | ||
* page, size [see pagination](README.md#Pagination) if supported by the external source | ||
* query: the terms, keywords or prefix to search: mandatory | ||
* parent: the key of the parent authority when searching in a hierarchical authority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify for others, this set of parameters
(which allow for searching/filtering external sources) is one of the key differences with the /authorities
endpoint. Internal Authorities require filtering by metadata field, while these external sources may have different needs/requirements (as they are all different APIs), so this set of supported parameters is likely to grow (and different external APIs may require different params)
external-authority-sources.md
Outdated
} | ||
``` | ||
|
||
**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/authority** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more thought here, I'm starting to agree with @paulo-graca . I don't believe this POST
endpoint is necessary, as authorities do NOT need to be POST
ed (at all) as there is no DSpaceObject to "create" when saving an authority value to an Item. This is because (as Paulo points out), Authorities are saved in the same manner as any other metadata field/value (as the "authority" property and "confidence" property appear on the Metadata fields). So, we should be able to use existing endpoints to save authorities (regardless of whether we call it "import" or not in the UI). For example, again as Paulo notes, PATCH for metadata fields can save authorities already: https://github.com/DSpace/Rest7Contract/blob/master/metadata-patch.md
Additionally, with regards to saving the entry to Solr, I'd recommend we continue to use the existing AuthorityConsumer
here. I don't like the idea of having a POST
to actually update Solr. We aren't really creating a REST Resource here (which is how POST
is supposed to be used)...instead we are adding an authority entry to a metadata field, and then it is being indexed into Solr using the AuthorityConsumer
. I'd prefer to keep that same process for DSpace 7 (to avoid changing the current Authority System).
external-authority-sources.md
Outdated
} | ||
``` | ||
|
||
**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/entity** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @paulo-graca that we should avoid having ANY POST
commands on this new endpoint. As much as possible, I'd like to align this endpoint with /api/integrations/authorities
(in the hopes that maybe we can merge the two in DSpace 8 or beyond). Therefore, I think this externalsources
endpoint should behave similar to authorities
...it should be a READ-ONLY endpoint with a goal of being able to search/locate an external authority system (via its API) and select one (or more) entries.
My suggestion would be to use the existing POST /api/core/items
to create new Entities (or if that endpoint is insufficient, we could create a new /api/core/entities
endpoint / resource). https://github.com/DSpace/Rest7Contract/blob/master/items.md#creating-an-archived-item
However, there are two ways in which we could use POST /api/core/items
, depending on whether we want this Entity creation to occur more on the client or server side.
- [Option 1: Use existing endpoint as-is] In this option, the Client side must do a bit more of the "lifting" here, as it is expected to first use
GET /externalsources
to locate a specificentry-id
, and then callPOST /items
, passing in all (or some) of the metadata fields returned for thatentry-id
. - [Option 2: Do mapping on server-side] In this option, we'd enhance the
POST /items
endpoint to also optionally accept atext/uri-list
. In this scenario, once the client has found a specificentry-id
(again usingGET /externalsources
), it'd simply pass the full URI of thatentry-id
(e.g.https://dspace7.4science.cloud/server/api/integrations/externalsources/entries/<:entry-id>
) to thePOST /items
. Then, on the server side, a second request would be made to that URI to gather any metadata and "map" it into a new Entity. This behavior is somewhat similar to how we create aWorkflowItem
from aWorkspaceItem
(by passing in a uri-list): https://github.com/DSpace/Rest7Contract/blob/cb86c89bb6918128f4ab3a6d2c5e2ccb835776be/workflowitems.md#post-method
@tdonohue thanks for your in-depth analysis. I'll also try to provide an example of the HAL links I was talking about yesterday. If an item has previously been created with the source set to ORCID and the ORCID ID set to 0000-0003-0809-5991, it would immediately display that author since you can't import it as a new entity. When double-checking the REST contract, I did however notice I forgot to include the HAL links in If we'd not be able to embed a link to the existing authority or entity in How would you approach this part? |
@benbosman : I'd recommend against having an endpoint that looks like So, the path here you've designed is not using best practices of REST. Instead, could we simply consider including as HAL links a direct link to the Entity/object itself? E.g. if the Entity matching this "entry-id" is know to have UUID |
Small changes to the external authority control
Looking to the meeting notes, the inline discussion and the proposed contract I'm still a bit confused about the purpose of introducing a different endpoint for "external authorities". If the need is to use more advanced api provided by ORCID or other external service we should think if this can be generalized so to add this capabilities as additional interface for the authority framework allowing each implementation to pick the interfaces that they can implement. If these cannot be generalized we should provide a dedicated endpoint. If all of that is just related to the "import the external ORCID record in the SOLRAuthority implementation"... oh well as already noted by other in the inline comments I would suggest to keep this out of DSpace 7 and postpone this discussion for DSpace 8. What I recommend here is to describe in the PR description or in a dedicated wiki page the use cases: what we need to do? which is the limitation of the current REST API that we want to sort out? Please keep in mind that authorities are for their nature "external", the SOLRAuthority implementation is just one of the implementation. |
authorities.md
Outdated
@@ -14,7 +14,6 @@ Example: | |||
{ | |||
"id": "srsc", | |||
"name": "srsc", | |||
"scrollable": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this flag was intent to be used by authorities that allow scrolling of all values without providing a filter query. It is needed by the dropdown, the current implementation doesn't use this information but I still see a reason for it (it could detect misconfiguration when a dropdown is used with a not compatibile authority and maybe allow when working in suggest mode to automatically propose suggestion for scrollable authorities without user input when the input receive the focus)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abollini This flag is however not functional at all. It's never set to true, and the dropdowns are currently broken.
Since the feature is not documented, not functional, and has usability issues compared to DSpace 6, it's best to have it removed unless there's an intention to rectify this in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : I'm confused why this PR is still modifying the authorities
REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources
endpoint.
authorities.md
Outdated
}, | ||
{ | ||
"id": "Image, 3-D", | ||
"display": "Image, 3-D", | ||
"value": "Image, 3-D", | ||
"otherInformation": {}, | ||
"type": "authority" | ||
"metadata": {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata has a special meaning in dspace so this could be confusing. The idea here is that the authority can expose raw information not necessary "metadata" and to deal with them additional knowledge is usually required to the client.
An example of additional information that could result useful and is not a metadata are metrics value, number of items in the repository or in general number of publications for a researcher etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The information being included here exactly matches the definition of metadata. It can also be represented similar to item metadata since the information currently included here is e.g.:
- An ORCID identifier
- A description
- A parent reference
By ensuring this is also metadata, the UI can render the search results in a manner consistent with other lists
The information you're referencing here is not included in the authority functionality and don't seem to be relevant to include in an authority record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : As above, I'm confused why this PR is still modifying the authorities
REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources
endpoint. Maybe we should move this discussion to a separate PR to update the authorities
endpoint based on current behavior?
…authority control can be kept separate from this PR Adding sample HAL link to an entity
@tdonohue Since the solution to create the entity and to create the authority control value is still under discussion, I've removed these from the contract so we at least have a read-only contract to work with. We'll discuss the options and PRO/CON in more detail in the entities meeting I assume that the HAL link to reference the entity is the part https://github.com/DSpace/Rest7Contract/pull/74/files#diff-1f0dda62a7e6e5d84bb011254146955bR203 in the REST contract. Can you confirm that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work, with the added comments it's now more clear to me than what it was before. Thank you @benbosman and @tdonohue for this clarification! I now understand why we need both authorities and externalsources endpoints separated. I've just some minor enhancement suggestions and to me, if others agree, this PR can be merged.
"id": "orcid", | ||
"name": "orcid", | ||
"hierarchical": false, | ||
"type": "externalsource", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a "metadata" attribute should be available here. In case you need to describe in more detail each source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From how I understand this based on the chat, this would be a single property. This may perhaps be better named "description" to reserve the term "metadata" for something where you have field names and values (e.g. https://github.com/DSpace/Rest7Contract/pull/74/files#diff-1f0dda62a7e6e5d84bb011254146955bR110)
It seems like a limited amount of work to have a configuration property for each source to describe it. Alternatively, this could be considered a UI aspect where internationalization labels are used to add detail per source (which would also ensure it can be displayed in the user's preferred language)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a limited amount of work to have a configuration property for each source to describe it. Alternatively, this could be considered a UI aspect where internationalization labels are used to add detail per source (which would also ensure it can be displayed in the user's preferred language)
But, considering we are supplying this source through REST API, that "description" should be implemented in all REST clients. But I'm ok with having it on the angular side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't pay much attention to @abollini comments but I think he raised some important questions. As @tdonohue referred, we might get back to this later.
Thank you @benbosman for your patience!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The readonly endpoint is fine, now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : This is looking better overall. Though, I suspect we need to remove the changes to authorities
altogether -- I'm not sure I understand why they relate to Entities support for external sources. Also, I've got a question below around whether each external source will have a different metadata
response, or if we will pre-map them to a DSpace schema?
authorities.md
Outdated
@@ -14,7 +14,6 @@ Example: | |||
{ | |||
"id": "srsc", | |||
"name": "srsc", | |||
"scrollable": false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : I'm confused why this PR is still modifying the authorities
REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources
endpoint.
authorities.md
Outdated
}, | ||
{ | ||
"id": "Image, 3-D", | ||
"display": "Image, 3-D", | ||
"value": "Image, 3-D", | ||
"otherInformation": {}, | ||
"type": "authority" | ||
"metadata": {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbosman : As above, I'm confused why this PR is still modifying the authorities
REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources
endpoint. Maybe we should move this discussion to a separate PR to update the authorities
endpoint based on current behavior?
Only keeping the fixes to ensure the contract represents the implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Thanks for the hard work here @benbosman
A few discussion points to log here for the future:
- As discussed in today's Entities Mtg and last week's meeting we are moving forward with this read-only endpoint as described.
- However, as noted by @abollini in comments above, it's possible that (during implementation) we may find difficulties in "mapping" all external sources to the single, generic
/externalsources
endpoint described in this PR. If we hit issues in implementation, we will revisit this generic endpoint and perhaps choose to replace it with separate endpoints per external source (e.g. an ORCID specific endpoint, etc). However, we hope this generic endpoint will make things easier for REST clients (like the Angular UI) without needing to understand different responses per external source. - Additionally, as a sidenote, the Entities team feels that in DSpace 8 we may want to revisit whether the
/authorities
and/externalsources
endpoints could be merged into a single endpoint. However, to minimize the scope for DSpace 7, we've decided to keep them separate for now. This has been logged as a topic to discuss in the future at https://wiki.duraspace.org/display/DSPACE/DSpace+Release+8.0+Status
This PR documents the REST implementation for supporting external sources, with queries and conversion to local authority and local entities
This is based on the functionality discussed in the meeting https://wiki.duraspace.org/display/DSPACE/2019-08-27+DSpace+7+Entities+WG+Meeting