Entities support for external authority sources #74

Merged
merged 13 commits into from
Oct 15, 2019
49 changes: 34 additions & 15 deletions authorities.md
@@ -14,7 +14,6 @@ Example:
{
"id": "srsc",
"name": "srsc",
"scrollable": false,
Member

This flag was intended for authorities that allow scrolling through all values without providing a filter query. The dropdown needs it. The current implementation doesn't use this information, but I still see a reason to keep it: it could detect misconfiguration when a dropdown is used with an incompatible authority, and, in suggest mode, it could automatically propose suggestions for scrollable authorities without user input when the input receives focus.

Member Author

@abollini This flag is however not functional at all. It's never set to true, and the dropdowns are currently broken.
Since the feature is not documented, not functional, and has usability issues compared to DSpace 6, it's best to have it removed unless there's an intention to rectify this in the near future.

Member

@benbosman : I'm confused why this PR is still modifying the authorities REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources endpoint.

"hierarchical": true,
"type": "authority",
"_links": {
@@ -32,7 +31,6 @@ Example:
{
"id": "common_types",
"name": "common_types",
"scrollable": false,
"hierarchical": false,
"type": "authority",
"_links": {
@@ -50,7 +48,6 @@ Example:
{
"id": "common_iso_languages",
"name": "common_iso_languages",
"scrollable": false,
"hierarchical": false,
"type": "authority",
"_links": {
@@ -89,7 +86,6 @@ Provide detailed information about a specific authority. The JSON response docum
{
"id": "srsc",
"name": "srsc",
"scrollable": false,
"hierarchical": true,
"type": "authority"
}
@@ -125,22 +121,37 @@ sample for an authority /server/api/integration/authorities/common_types/entries
"id": "Dataset",
"display": "Dataset",
"value": "Dataset",
"otherInformation": {},
"type": "authority"
"metadata": {},
"type": "authority",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/common_types/entryValues/Dataset"
}
}
},
{
"id": "Image, 3-D",
"display": "Image, 3-D",
"value": "Image, 3-D",
"otherInformation": {},
"type": "authority"
"metadata": {},
Member

"metadata" has a special meaning in DSpace, so this could be confusing. The idea here is that the authority can expose raw information that is not necessarily "metadata", and the client usually needs additional knowledge to deal with it.

Examples of additional information that could be useful and is not metadata are metrics values, the number of items in the repository, or, more generally, the number of publications for a researcher.

Member Author

The information being included here exactly matches the definition of metadata. It can also be represented similarly to item metadata, since the information currently included here is e.g.:

  • An ORCID identifier
  • A description
  • A parent reference

By ensuring this is also metadata, the UI can render the search results in a manner consistent with other lists.

The information you're referencing here is not included in the authority functionality and doesn't seem relevant to include in an authority record.

Member

@benbosman : As above, I'm confused why this PR is still modifying the authorities REST Contract, as I thought the decision was to change this PR to only create the Read-Only external-sources endpoint. Maybe we should move this discussion to a separate PR to update the authorities endpoint based on current behavior?

"type": "authority",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/common_types/entryValues/Image%2C%203-D"
}
}
},
{
"id": "Book",
"display": "Book",
"value": "Book",
"otherInformation": {},
"type": "authority"
"metadata": {},
"type": "authority",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/common_types/entryValues/Book"
}
}
}
]
},
@@ -176,25 +187,33 @@ sample for a hierarchical authority (srsc): /server/api/integration/authorities
"id": "VR131402",
"display": "Research Subject Categories::SOCIAL SCIENCES::Social sciences::Social work::Family research",
"value": "Research Subject Categories::SOCIAL SCIENCES::Social sciences::Social work::Family research",
"otherInformation": {
"metadata": {
"parent": "SCB1314",
"note": "Familjeforskning"
},
"type": "authority",
"_links": {
"https://dspace7-internal.atmire.com/server/api/integration/authorities/srsc/entryValues/SCB1314": {
"href": "parent"
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/srsc/entryValues/VR131402"
},
"parent": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/srsc/entryValues/SCB1314"
}
}
},
{
"id": "ResearchSubjectCategories",
"display": "Research Subject Categories",
"value": "Research Subject Categories",
"otherInformation": {
"metadata": {
"note": "Ämneskategorier för vetenskapliga publikationer"
},
"type": "authority"
"type": "authority",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/srsc/entryValues/ResearchSubjectCategories"
}
}
}
]
},
283 changes: 283 additions & 0 deletions external-authority-sources.md
@@ -0,0 +1,283 @@
# External sources Endpoints
[Back to the list of all defined endpoints](endpoints.md)

## Main Endpoint
**/api/integration/externalsources**
tdonohue marked this conversation as resolved.

Provide access to the configured external sources. It returns the list of existing external sources.

Example:
```json
{
"_embedded": {
"externalsources": [
{
"id": "orcid",
"name": "orcid",
"hierarchical": false,
"type": "authority",
"_links": {
"entryValues": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entryValues"
},
"entries": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entries"
},
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid"
}
}
},
{
"id": "ciencia",
"name": "ciencia",
"hierarchical": false,
"type": "authority",
"_links": {
"entryValues": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/ciencia/entryValues"
},
"entries": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/ciencia/entries"
},
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/ciencia"
}
}
},
{
"id": "my_staff_db",
"name": "my_staff_db",
"hierarchical": false,
"type": "authority",
"_links": {
"entryValues": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/my_staff_db/entryValues"
},
"entries": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/my_staff_db/entries"
},
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/my_staff_db"
}
}
}
]
},
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources"
}
},
"page": {
"size": 20,
"totalElements": 3,
"totalPages": 1,
"number": 0
}
}
```
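
As an illustration of how a client might consume this listing, the following minimal Python sketch collects the `entries` link of each source from the HAL-style response. The `LISTING` constant is a trimmed copy of the sample above, and `entries_links` is a hypothetical helper name, not part of the contract.

```python
import json

# Trimmed copy of the listing shown above: two sources, only the `entries` link kept.
LISTING = json.loads("""
{
  "_embedded": {
    "externalsources": [
      {
        "id": "orcid",
        "_links": {
          "entries": {"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entries"}
        }
      },
      {
        "id": "my_staff_db",
        "_links": {
          "entries": {"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/my_staff_db/entries"}
        }
      }
    ]
  }
}
""")


def entries_links(listing):
    """Map each external source id to the href of its `entries` link."""
    sources = listing.get("_embedded", {}).get("externalsources", [])
    return {source["id"]: source["_links"]["entries"]["href"] for source in sources}
```

Following the links from the response, rather than assembling paths by hand, keeps the client independent of the server's base URL.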

## Single Authority
**/api/integration/externalsources/<:authority-name>**
benbosman marked this conversation as resolved.

Provide detailed information about a specific external source. The JSON response document is as follows:
```json
{
"id": "orcid",
"name": "orcid",
"hierarchical": false,
"type": "authority"
}
```

Exposed links:
* entries: the list of values managed by the external source
* entryValues: the endpoint to retrieve a single value

## Linked entities
Contributor

Just to clarify, these entities differ from what we have been calling Entities (the Enhanced Items).

Member Author

This does indeed differ. Most REST contracts contain a "Linked entities" section that details the other REST endpoints linked from the main REST endpoint.

### external source entries
**/api/integration/externalsources/<:authority-name>/entries**
benbosman marked this conversation as resolved.

It returns the filtered entries managed by the external source; see below.

The supported parameters are:
* page, size [see pagination](README.md#Pagination) if supported by the external source
* metadata: the metadata field for which the authority is used: mandatory
* query: the terms, keywords or prefix to search: mandatory
* parent: the key of the parent authority when searching in a hierarchical authority
Member

To clarify for others, this set of parameters (which allow for searching/filtering external sources) is one of the key differences with the /authorities endpoint. Internal Authorities require filtering by metadata field, while these external sources may have different needs/requirements (as they are all different APIs), so this set of supported parameters is likely to grow (and different external APIs may require different params)


It returns the entries in the external source matching the query

sample for an external source /server/api/integration/externalsources/orcid/entries?metadata=dc.contributor.author&query=Smith&size=2
```json
{
"_embedded": {
"externalSourceEntries": [
{
"id": "Smith, Dean",
"display": "Smith, Dean",
"value": "Smith, Dean",
"metadata": {
"dc.identifier.orcid": "0000-0002-4271-0436",
"dc.identifier.uri": "https://orcid.org/0000-0002-4271-0436",
"dc.contributor.other": "University of Texas Southwestern Medical Center: TX, TX, US",
"person.familyName": "Smith",
"person.givenName": "Dean"
},
"type": "externalSource",
"_links": {
"authority": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/authors/entryValues/d4b5ca88-9d6d-4a87-b905-fef0f8cae26c"
},
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436"
}
}
},
{
"id": "Smith, Charles",
"display": "Smith, Charles",
"value": "Smith, Charles",
"metadata": {
"dc.identifier.orcid": "0000-0003-3681-2038",
"dc.identifier.uri": "https://orcid.org/0000-0003-3681-2038",
"dc.contributor.other": "University of Mississippi: University, MS, US",
"person.familyName": "Smith",
"person.givenName": "Charles"
},
"type": "externalSource",
"_links": {
"entity": {
"href": "https://dspace7-internal.atmire.com/server/api/core/item/6fd90bf5-b84f-47b3-aaec-a55bde3a2a5a"
},
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entryValues/0000-0003-3681-2038"
}
}
}
]
}
}
```
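
A minimal sketch of how a client could assemble the request URL for this endpoint from the supported parameters. The function name `entries_url` and the `base` argument are assumptions for illustration; only the parameter names (`metadata`, `query`, `page`, `size`, `parent`) come from the contract text above.

```python
from urllib.parse import quote, urlencode


def entries_url(base, source, metadata, query, page=None, size=None, parent=None):
    """Build the entries URL for an external source.

    `metadata` and `query` are mandatory per the contract; the other
    parameters are only added when a value is given.
    """
    if not metadata or not query:
        raise ValueError("'metadata' and 'query' are mandatory")
    params = {"metadata": metadata, "query": query}
    for name, value in (("page", page), ("size", size), ("parent", parent)):
        if value is not None:
            params[name] = value
    path = f"{base}/api/integration/externalsources/{quote(source, safe='')}/entries"
    return f"{path}?{urlencode(params)}"
```

Calling `entries_url("/server", "orcid", "dc.contributor.author", "Smith", size=2)` reproduces the sample request shown above.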

### single entry
**GET /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>**
benbosman marked this conversation as resolved.

It returns the data from one entry in an external source

sample for an external source /api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436
```json
{
"id": "Smith, Dean",
"display": "Smith, Dean",
"value": "Smith, Dean",
"metadata": {
"dc.identifier.orcid": "0000-0002-4271-0436",
"dc.identifier.uri": "https://orcid.org/0000-0002-4271-0436",
"dc.contributor.other": "University of Texas Southwestern Medical Center: TX, TX, US",
"person.familyName": "Smith",
"person.givenName": "Dean"
},
"type": "externalSource",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436"
}
}
}
```
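
Entry ids are placed in the URL path, so ids containing commas or spaces need percent-encoding, as in the `Image%2C%203-D` link of the authorities sample. A small sketch, assuming a hypothetical helper `entry_value_url` and a caller-supplied `base`:

```python
from urllib.parse import quote


def entry_value_url(base, source, entry_id):
    """Build the single-entry URL, percent-encoding both path segments."""
    source_segment = quote(source, safe="")
    entry_segment = quote(entry_id, safe="")
    return f"{base}/api/integration/externalsources/{source_segment}/entryValues/{entry_segment}"
```

Passing `safe=""` also encodes `/`, so an id containing a slash cannot be mistaken for an extra path segment.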

**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/authority**
Contributor

I think this requires more discussion, and its creation should be based on the authorities endpoint:
POST /api/integration/authorities

Contributor

We have discussed this integration internally and concluded that the "Import feature" should only be applicable to Entities. In the Authority context, the authority value or key is stored alongside its related metadatavalue entry and, after the item is submitted/saved, an Event Consumer is executed on the server side (https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/authority/indexer/AuthorityConsumer.java) to process it. Only at that point do you know what to import or process. So it doesn't make sense to us to have this POST method here, when what we really need is a PATCH to store the authority value and confidence for a certain metadatavalue (https://github.com/DSpace/Rest7Contract/blob/master/metadata-patch.md).

Member Author

@paulo-graca This is actually part of the behavior which was discussed to be "problematic" or "confusing" in DSpace 6 and below.

Whereas in DSpace 6 you'd search for an ORCID author and it would somehow start importing it into authority without the user understanding what happened, this is now a clear process.

The POST will create the authority value in solr (what used to be part of this consumer which worked with temp IDs). It will return the authority value.
The PATCH on the item metadata will add the returned authority value to the item.

For the user, this is what the import button will do: it will create the authority value and add it to the item. But these two parts got entangled deep in the code in previous versions of DSpace and are fixed here.
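
For illustration only, the two-step flow described above (POST to create the authority value, then PATCH the item metadata) might look like this on the client. Everything here is an assumption rather than part of the contract: the helper names, taking the authority key from the last segment of the `self` link, the JSON Patch shape (modelled on RFC 6902), and the default confidence value of 600.

```python
def authority_key_from(post_response):
    """Assumption: the authority key is the last path segment of the self link."""
    href = post_response["_links"]["self"]["href"]
    return href.rstrip("/").rsplit("/", 1)[-1]


def metadata_patch(field, value, authority_key, confidence=600):
    """Build a JSON Patch body that appends one authority-controlled value.

    The path layout and the confidence default are illustrative assumptions.
    """
    return [
        {
            "op": "add",
            "path": f"/metadata/{field}/-",
            "value": {
                "value": value,
                "authority": authority_key,
                "confidence": confidence,
            },
        }
    ]


# Step 1 result: a response shaped like the POST sample in this file.
created = {
    "value": "Smith, Dean",
    "_links": {
        "self": {
            "href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/authors/entries/d4b5ca88-9d6d-4a87-b905-fef0f8cae26c"
        }
    },
}

# Step 2 input: the patch body the client would send for the item.
patch_body = metadata_patch(
    "dc.contributor.author", created["value"], authority_key_from(created)
)
```

The sketch only builds the request body; it sends nothing and does not settle whether the first step should be a POST at all.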

Member @tdonohue, Oct 1, 2019

After more thought here, I'm starting to agree with @paulo-graca . I don't believe this POST endpoint is necessary, as authorities do NOT need to be POSTed (at all) as there is no DSpaceObject to "create" when saving an authority value to an Item. This is because (as Paulo points out), Authorities are saved in the same manner as any other metadata field/value (as the "authority" property and "confidence" property appear on the Metadata fields). So, we should be able to use existing endpoints to save authorities (regardless of whether we call it "import" or not in the UI). For example, again as Paulo notes, PATCH for metadata fields can save authorities already: https://github.com/DSpace/Rest7Contract/blob/master/metadata-patch.md

Additionally, with regards to saving the entry to Solr, I'd recommend we continue to use the existing AuthorityConsumer here. I don't like the idea of having a POST to actually update Solr. We aren't really creating a REST Resource here (which is how POST is supposed to be used)...instead we are adding an authority entry to a metadata field, and then it is being indexed into Solr using the AuthorityConsumer. I'd prefer to keep that same process for DSpace 7 (to avoid changing the current Authority System).

Member Author

The POST is not included here to replace the PATCH for metadata fields. The POST is included here to update solr. It should respond with the authority value, which is hereafter used in the item using the PATCH for metadata fields.

The reason for including this POST is to avoid using the AuthorityConsumer. The way the AuthorityConsumer works is creating a large amount of complex customizations to support ORCID directly from authority control. With this new external sources functionality, we don't need these customizations anymore and we can directly support REST access to convert an external source to the local authority.
I'd prefer not to extend the entire functionality underneath the AuthorityConsumer which was implemented for converting an ORCID ID to an authority record, and which would need to support any external source. It can be implemented much more cleanly with a separate POST. And this implementation will require a lot less work and is much less error prone

Member @tdonohue, Oct 2, 2019

@benbosman: While I understand that the AuthorityConsumer is not ideal in this scenario, I'm hesitant for the Entities Working Group to change the behavior of the Authority backend, as that is outside the scope of our working group. If we feel strongly that we must go this route, I feel we then must propose this to the broader DSpace 7 Working Group.

Overall, I still feel that having a POST generate an update to Solr is not an ideal approach in that it's not aligned with REST Best practices. POST commands are supposed to create resources (usually objects). But, at this time, because of known flaws with how Authorities currently work, DSpace doesn't store an Authority "resource" in the database...instead DSpace stores the Authority in a metadata field, and links to an entry in a Solr index.

I agree that, longer term, we should rethink this current approach to storing Authorities. They likely should be managed in the database more formally. However, that is out-of-scope for DSpace 7...and I think this POST command only is "valid" once we fix the behavior in how Authorities are stored.

This is why I'm not sure this POST belongs in DSpace 7. It could be (re)introduced in a future release (DSpace 8 perhaps) once we find time to redesign Authorities and/or merge Entities and Authorities (if plausible). But, until then, I think we should retain the current behavior of Authorities. If this means that we cannot "import" into Authorities in the same way as we can with Entities, then we should re-scope our plans for DSpace 7 and perhaps (as Paulo suggested above) limit import to Entities only for now.


It creates an authority record from the external source

sample for an external source /api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436/authority
```json
{
"id": "Smith, Dean",
"display": "Smith, Dean",
"value": "Smith, Dean",
"metadata": {
"dc.identifier.orcid": "0000-0002-4271-0436",
"dc.identifier.uri": "https://orcid.org/0000-0002-4271-0436",
"dc.contributor.other": "University of Texas Southwestern Medical Center: TX, TX, US",
"person.familyName": "Smith",
"person.givenName": "Dean"
},
"type": "authority",
"_links": {
"self": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/authorities/authors/entries/d4b5ca88-9d6d-4a87-b905-fef0f8cae26c"
},
"externalsource": {
"href": "https://dspace7-internal.atmire.com/server/api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436"
}
}
}
```

**POST /api/integration/externalsources/<:authority-name>/entryValues/<:entry-id>/entity**
Contributor

I think we should have a single point for creating entities and, to me, it must align with @abollini's concerns. This means entity creation should start at https://github.com/DSpace/Rest7Contract/blob/43ae7d5a3c37359becead13d9ebc43f772f865e6/workflowitems.md#post-method

Member @tdonohue, Oct 1, 2019

I agree with @paulo-graca that we should avoid having ANY POST commands on this new endpoint. As much as possible, I'd like to align this endpoint with /api/integrations/authorities (in the hopes that maybe we can merge the two in DSpace 8 or beyond). Therefore, I think this externalsources endpoint should behave similar to authorities...it should be a READ-ONLY endpoint with a goal of being able to search/locate an external authority system (via its API) and select one (or more) entries.

My suggestion would be to use the existing POST /api/core/items to create new Entities (or if that endpoint is insufficient, we could create a new /api/core/entities endpoint / resource). https://github.com/DSpace/Rest7Contract/blob/master/items.md#creating-an-archived-item

However, there are two ways in which we could use POST /api/core/items, depending on whether we want this Entity creation to occur more on the client or server side.

  • [Option 1: Use existing endpoint as-is] In this option, the Client side must do a bit more of the "lifting" here, as it is expected to first use GET /externalsources to locate a specific entry-id, and then call POST /items, passing in all (or some) of the metadata fields returned for that entry-id.
  • [Option 2: Do mapping on server-side] In this option, we'd enhance the POST /items endpoint to also optionally accept a text/uri-list. In this scenario, once the client has found a specific entry-id (again using GET /externalsources), it'd simply pass the full URI of that entry-id (e.g. https://dspace7.4science.cloud/server/api/integrations/externalsources/entries/<:entry-id>) to the POST /items. Then, on the server side, a second request would be made to that URI to gather any metadata and "map" it into a new Entity. This behavior is somewhat similar to how we create a WorkflowItem from a WorkspaceItem (by passing in a uri-list): https://github.com/DSpace/Rest7Contract/blob/cb86c89bb6918128f4ab3a6d2c5e2ccb835776be/workflowitems.md#post-method
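
As a rough sketch of Option 2, the client-side request could be built as below. The request is constructed but not sent. The helper name and the example URLs are hypothetical, and any further parameters the items endpoint may require are left out.

```python
from urllib.request import Request


def build_import_request(items_url, entry_uri):
    """Build (but do not send) a POST whose body is a one-line text/uri-list."""
    # text/uri-list bodies hold one URI per line, terminated by CRLF.
    body = (entry_uri + "\r\n").encode("utf-8")
    return Request(
        items_url,
        data=body,
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


# Hypothetical URLs, for illustration only.
request = build_import_request(
    "https://example.org/server/api/core/items",
    "https://example.org/server/api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436",
)
```

With this shape the client never handles the external metadata itself; the server resolves the URI and performs the mapping.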

Member Author

These two options can also support creating the new item. I would prefer the second in that case because:

  • The former will be very complicated for the UI
  • The https://github.com/DSpace/Rest7Contract/blob/master/workflowitems.md#post-method has to be fixed to support a URI-list for defining the collection anyway (one of the current problems when starting a new submission). So adding the URI of the entry id at the same time and include the metadata would be more straightforward.
  • There may be some metadata fields required for the external source which can't be entered using the submission forms (e.g. you'd need to store the source and the ID in that source in dedicated fields so it can be verified later whether there's an entity which matches this external record exactly)


It creates an entity from the external source

sample for an external source /api/integration/externalsources/orcid/entryValues/0000-0002-4271-0436/entity
```json
{
"uuid": "83914286-666b-450c-9c42-0d276b30c2f2",
"name": "Smith, Dean",
"handle": "10673/20",
"metadata": {
"dc.identifier.orcid": [
{
"value": "0000-0002-4271-0436",
"language": null,
"authority": null,
"confidence": -1,
"place": 0
}
],
"dc.identifier.uri": [
{
"value": "https://orcid.org/0000-0002-4271-0436",
"language": null,
"authority": null,
"confidence": -1,
"place": 0
}
],
"dc.contributor.other": [
{
"value": "University of Texas Southwestern Medical Center: TX, TX, US",
"language": null,
"authority": null,
"confidence": -1,
"place": 0
}
],
"person.familyName": [
{
"value": "Smith",
"language": null,
"authority": null,
"confidence": -1,
"place": 0
}
],
"person.givenName": [
{
"value": "Dean",
"language": null,
"authority": null,
"confidence": -1,
"place": 0
}
]
},
"inArchive": true,
"discoverable": true,
"withdrawn": false,
"lastModified": "2019-09-02T00:40:54.970+0000",
"type": "item"
}
```
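
The external source entries above carry a flat `field: value` metadata map, while the item response uses a list of value objects per field. A minimal sketch of that mapping, assuming one value per field and the default `language`, `authority`, `confidence` and `place` values seen in the sample (`to_item_metadata` is a hypothetical name):

```python
def to_item_metadata(flat_metadata):
    """Convert a flat {field: value} map into the item metadata layout."""
    return {
        field: [
            {
                "value": value,
                "language": None,
                "authority": None,
                "confidence": -1,
                "place": 0,
            }
        ]
        for field, value in flat_metadata.items()
    }


entry_metadata = {
    "dc.identifier.orcid": "0000-0002-4271-0436",
    "person.familyName": "Smith",
    "person.givenName": "Dean",
}
item_metadata = to_item_metadata(entry_metadata)
```

A real mapping on the server side may add, rename or drop fields; this sketch only mirrors the one-to-one copy visible in the two samples.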