Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring authority support / value pairs / controlled vocabularies #128

Merged
merged 13 commits into from
Jul 15, 2020

Conversation

abollini
Copy link
Member

@abollini abollini commented Jun 8, 2020

This PR try to formalize what have been discussed in the wiki https://wiki.lyrasis.org/display/DSPACE/4Science+proposal+to+improve+authority+support+in+DSpace7

to be sure that we have a common understanding to use as base of the implementation rework.
Of course some changes could become necessary (and will be proposed) during the implementation if we found any unforeseen technical issues.

Please note that compared to the latest name/wording proposed on the wiki I have only changes the subpath from
api/integration/vocabularies/<:vocabulary-name>/entries
to
api/integration/vocabularies/<:vocabulary-name>/suggestions

to make clear that the "objects" returned by this endpoint are not necessary resources available in the other endpoint
/api/integration/vocabularyEntries/<:vocabulary-name>:<entry-id>
the suggestions related to "value-pairs" are an example of such case as they don't have a stable ID (and no authority will be stored in the metadata)

@tdonohue
Copy link
Member

tdonohue commented Jun 8, 2020

@abollini : I'm not sure I agree with the decision to rename /entries to be /suggestions. The term /suggestions implies they are not required or can be modified (like the separate feature "metadata suggestions" which is planned for 7.1). Using that term here may add confusion between a "metadata suggestion" and a "vocabulary suggestion". I'd much rather keep the term VocabularyEntries here.

I also don't understand why we couldn't let value-pairs (like common-types or similar) also appear on the /vocabularyEntries endpoint (simply for consistency, and to support future improvements). The "ID" in this situation could just be "common-types:Book" (which is unique as there should be only one entry in "common-types" named "Book"). Keep in mind, I'm NOT suggesting we store this value-pairs ID as an authority in metadata (or use it anywhere) at this time (as we already decided against that). But, it'd establish some consistency between these three types of controlled vocabulary, and it'd allow us to call these all "VocabularyEntries" (which I feel is the right term here).

@abollini
Copy link
Member Author

abollini commented Jun 8, 2020

There are vocabularies that can offer different textual description of the same record all linked to the same entry (it can be a name authority that know the different variants used by the same person, or a vocabulary that manage synonyms, etc). The /api/integration/vocabularyEntries/<:vocabulary-name>:<entry-id> is intended to provide access to an unique record where it is defined. As dropdown doesn't have such "stable ID" we shouldn't have a record here.

If we add a fake ID to the dropdown vocabulary it would become more tricky for the rest client to understand which id are true and which one generated by the system for uniformity.

That said, I would like to keep this difference but I'm ok with renaming suggestions in whatever you find more understable including entries (terms, choices) if you feel comfortable with it

@abollini
Copy link
Member Author

abollini commented Jun 9, 2020

hi @tdonohue I have updated the vocabulary to reflect the re-wording discussed on slack

Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abollini : I gave this a more thorough review. Overall, the concepts look fine, but I still have several inline questions about expected behavior of these endpoints. Nothing major honestly, but a lot of minor cleanup or clarifications are needed.

vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Outdated Show resolved Hide resolved
vocabularies.md Show resolved Hide resolved
vocabularyEntryDetails.md Outdated Show resolved Hide resolved
vocabularyEntryDetails.md Outdated Show resolved Hide resolved
vocabularyEntryDetails.md Outdated Show resolved Hide resolved
@abollini
Copy link
Member Author

@tdonohue the contract now should reflect the agreed changes, all your feedback should have been included

DSpace 7 Beta 3 automation moved this from Ready to review to Reviewer approved Jun 16, 2020
Copy link
Member

@tdonohue tdonohue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me now. Thanks @abollini !

DSpace 7 Beta 3 automation moved this from Reviewer approved to Ready to review Jun 18, 2020
Copy link
Member

@benbosman benbosman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included some inline comments, but there's also quite some English language cleanup necessary. Some of the sentences are really hard to understand given the current wording.

vocabularies.md Outdated Show resolved Hide resolved
* name: see id
* scrollable: if true mean that it is possible to scroll all the entries in the authority without providing a filter parameter, see (vocabulary entries)[vocabularies.md#vocabulary-entries]
* hierarchical: if true means that the vocabulary expose a tree structure where some entries are parent of others
* preloadLevel: for hierarchical vocabularies express the preference to preload the tree at a specific level of depth (0 only the top nodes are shown, 1 also their children are preloaded and so on)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this intended to be used? This doesn't exist in DSpace 6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be used by the Angular UI to open an hierarchical vocabulary with a number of level pre-expanded. This was highly demanded by users in an internal UX test. It is not strictly necessary so if needed we can remove that from the contract and the implementation (that already exists, see screens here DSpace/DSpace#2743 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if I understand it correctly, there are requests for this, and screenshots, but no implementation or config yet?
In that case I think it's best to remove it since we don't want to add more features making the config even more complicated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation is already here because we have implemented it firstly for DSpace-CRIS and in https://jira.lyrasis.org/browse/DS-4475 we have noted that explicitly as this allow to keep the effort for DSpace 7 limited.

About the configuration it is just a single line, with a proper default, similar to existing configuration for the DSpaceControlledVocabulary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see in https://github.com/DSpace/DSpace/pull/2743/files#diff-a60e58808bf42c545ed0dc04a4d6d5f9 there's indeed an open PR to get this in
I'll leave this for @tdonohue whether we allow more features outside the DSpace 6 scope

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll allow this into the contract (especially since an implementation PR exists as of now in DSpace/DSpace#2792). But, I'll want to review the implementation to see the impact. I'm generally OK with it though, as it seems small.

vocabularies.md Outdated

The supported parameters are:
* page, size [see pagination](README.md#Pagination)
* metadata: the metadata field name (e.g. "dc.type") for which the controlled vocabulary is used: mandatory. The entries returned by a vocabulary may be different for different metadata fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can they be different for separate metadata fields but still using the same vocabulary ID? Can you provide a sample?
Otherwise I'd recommend to remove this parameter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this seems to be confusing to Tim and me, I'd suggest to add an example to the contract. This doesn't have to be a full JSON example, but simply state that for the vocabulary-name solr (or what the actual name is), the behavior differs based on the metadata and collection because it's a single vocabulary whose data is retrieved via the dspace.cfg choices.plugin.dc.contributor.author = SolrAuthorAuthority (if that's how this field is determined)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to have included it here 0c57822#diff-1a0f138d702d2d84a2c28904e36e8150R271
or have I misunderstood your request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand it correctly, there are already separate endpoints per solr field. The fact that this vocabulary is named SolrAuthorAuthority does determine which entries to return (and the collection or metadata field still won't impact it)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly. It is named SolrAuthorAuthority because this is the name used in our dspace.cfg examples. It can be configured over more than one metadata and as you can see in the above link to the existing sourcecode (unchanged from previous version) the SolrAuthority will use the metadata information to query the authority core

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added the below comment which applies here as well. metadata needs to be a required field because of value-pairs (which require it). So, I think we'll have to keep this as-is for now in the contract. If need be, we can always merge this as-is & I will gladly add more text to describe why both collection & metadata are required at this time -- the best example of both is value-pairs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to add in at least some detail on this line explaining this is currently only applicable for the solr authority, since it's still unclear from this contract.
Value pairs don't differ per metadata field, only per vocabulary-name
That should be a really small change, and this is very obviously an unclear area

vocabularies.md Outdated
The supported parameters are:
* page, size [see pagination](README.md#Pagination)
* metadata: the metadata field name (e.g. "dc.type") for which the controlled vocabulary is used: mandatory. The entries returned by a vocabulary may be different for different metadata fields.
* collection: the uuid of the collection where the item is being submitted: mandatory. The entries returned by a vocabulary may differ per collection based on your submission configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can they be different for separate collections fields but still using the same vocabulary ID? Can you provide a sample?
Otherwise I'd recommend to remove this parameter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dc.type can be configured as a controlled vocabulary in the submission form for a specific collection (form definition) and as free text for another submission. Moreover, the collection parameter is part of the choice interface so some implementation could also return different entries depending on it similar to how the SOLRAuthority is currently using the metadata parameter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #128 (comment), I assume this currently also only returns different entries for the same vocabulary-name for solr. If that example is included as I requested in #128 (comment), it should be much clearer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@tdonohue tdonohue Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbosman : I initially had the same confusion over the collection param until I looked at the current (DSpace 6 and below) code implementation. It is possible to configure in input-forms.xml (now submission-forms.xml) a value-pairs that is collection specific (as any submission field can be made to only appear for specific collections, or even have different configurations for different collections -- e.g. one collection could have a "dc.type" that uses value-pairs, while in another "dc.type" can be configured to be free text, or a different set of value-pairs). So, this means value-pairs really are both metadata field & collection specific....you need both to load the proper set of value-pairs.

You'll see in this prior discussion I was the one who suggested the above wording. So, I think that, for now both metadata and collection must be required to work with our current code (especially for value-pairs). In the future though, I fully agree this is a bit odd, and we may want to revisit this in a future release (not 7.0, more like 8.0 unless we find a need to change it earlier).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, https://jira.lyrasis.org/browse/DS-4371 is indeed another example clarifying how the mandatory metadata and collection fields can be misinterpreted, this time the vocabulary was determined based on the parameters instead of the vocabulary name.

Perhaps the parameters should only be set mandatory for authorities (a smaller change compared to completely removing them), since the others are not supposed to use them. That makes their use clearer. Angular is of course still free to include them for non-authorities, assuming REST will ignore them.
By using that in-between step, it's also easier to redesign the authority core to expect a separate name in e.g. 7.1 and remove the parameters completely

Copy link
Member

@tdonohue tdonohue Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abollini : I'm OK with @benbosman's suggestion here. If it's easier for now to just make these params only mandatory for authorities, then that seems like a step in the right direction. I also agree we can just have the Angular team pass these params the time (if desired), but in the REST API they can be ignored except when needed for authorities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, I don't see that as a simplification nor as a step forward the right solution. All the three things value-pairs, vocabularies and authorities need to be managed in the same way. Add an exception here is not the way to go imho. If we don't want to remove the collection/metadata from the authority interfaces we need to keep them also for the other "things" making sure that they are used in a consistent ways (i.e. they match with the specified vocabulary name to avoid the issue DS-4371. This is exactly what is done here in the current implementation https://github.com/DSpace/DSpace/pull/2792/files#diff-151c8c35ac6f56dc3482b82c56b0474cR85 see also the corresponding test https://github.com/DSpace/DSpace/pull/2792/files#diff-379fecc95ee8bd1f6f91349f0cd7e9d3R193-R206 ). Anyway, I'm trying to implement the change to remove completely these params in a parallel branch so that you can check exactly how much changes are needed (it seems less than what I initially suspected)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here the result of my investigation: 4Science/DSpace#157 it looks to me as a better, cleaner approac. Do you agree? should I remove completely the metadata & collection parameters? if you like that I will apply the related change also to this contract 4Science/DSpace#157

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since that change looks simple enough. I'd rather we just remove the metadata/collection parameters. If the three of us are confused by them, they will be even more confusing to others. So, if we can easily remove them now, I'm in favor of doing so.

vocabularies.md Show resolved Hide resolved
}
```

Attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't it planned to also contain a boolean authority?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems possible to avoid to introduce such attribute because the authority will be exposed only where it is liked to store it. In the srsc authority for instance where we need the "id" to allow the hierarchical browsing the authority is NOT included in the vocabularyEntry see
https://github.com/DSpace/Rest7Contract/pull/128/files#diff-1a0f138d702d2d84a2c28904e36e8150R215

https://github.com/DSpace/Rest7Contract/pull/128/files#diff-1a0f138d702d2d84a2c28904e36e8150R224

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For storage that would indeed work, but how can the submission UI determine when to display the lookup button if it's only determined in the returned values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the job of the submission-form definition endpoint. Looking to that I get the chance to update the attribute name to reflect the new wording (controlledVocabulary instead than authority)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're suggesting the submission-form definition endpoint is extended to add an "authority": true to ensure the UI can identify whether the lookup button should be displayed?
Schermafdruk 2020-06-25 12 08 49
Or will it always display the lookup unless the submission form type is dropdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the UI should use the type and the controlledVocabulary attribute in the formField to determine which widget is needed (no real changes on this aspect compared to what is already implemented in the master).
It can be a dropdown, a lookup button or a suggest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the plan for determining the UI would be:

  • If in the submission form type=dropdown => the vocabulary as a dropdown (and hierarchical is not supported)
  • Otherwise, if in vocabulary definition the hierarchical=true => render as a hierarchical browse and suggest
  • Otherwise if there's a vocabulary => render as a suggest
  • Support for lookup is not exposed yet => needs to be handled as a separate ticket?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added more details about the submission forms, no changes are proposed here is just to document what exists f7076e5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this functionality. Since this is new in 7.0 and was not documented yet, I wasn't aware of that implementation
I didn't verify whether it supports https://github.com/DSpace/DSpace/blob/master/dspace/config/dspace.cfg#L1507 as well, or only adds the lookup buttons based on the submission-forms. But that's an implementation detail, and not a contract detail.

With this info, I assume the plan for determining the UI would be:

Since this is complicated set of rules, I would recommend this is documented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we need to provide advice to the UI / client on how this information should be used. I've added suggestions inline in this Contract for how to document what @benbosman noted above. See this inline comment: https://github.com/DSpace/Rest7Contract/pull/128/files#r447073406

vocabularyEntryDetails.md Outdated Show resolved Hide resolved
@abollini
Copy link
Member Author

hi @benbosman I have added more information and example as by your feedback. Thanks

@abollini
Copy link
Member Author

we are reviewing internally the implementation and discovered an inconsistency, the attribute used to hold the array in the json response for linked entries must reflect the subpath see e0de465

@benbosman
Copy link
Member

Thanks @abollini for the updates, I did update the threads and mark as resolved where I didn't see any further issues
These should again be very small things to update

@heathergreerklein heathergreerklein moved this from Needs Reviewers Assigned to Under Review in DSpace 7 Beta 3 Jun 23, 2020
@tdonohue tdonohue added this to the 7.0beta3 milestone Jun 23, 2020
@tdonohue
Copy link
Member

@abollini : it looks like this is nearly ready to go. Just needs minor updates per Ben's feedback from June 18: #128 (review)

@tdonohue tdonohue moved this from Under Review to Reviewer approved in DSpace 7 Beta 3 Jun 24, 2020
@tdonohue
Copy link
Member

tdonohue commented Jun 25, 2020

@benbosman and @abollini : I've added some inline comments above to try and help this along. For these outstanding questions though, they all seem minor & I think the approach in this Contract is "good enough" for now. It gives us what we wanted (which is better backwards compatibility to 6.x) without too much additional major refactoring of authority / value pairs / controlled vocab. (Larger refactors would have to wait for 8.0, when we have more of a chance to rethink these three similar concepts.)

So, if you both agree, I'd like to move this forward as it currently stands. If we need to improve some of the examples, I'd volunteer to do this. I also think some of these questions might be answered in the new implementation PR at DSpace/DSpace#2792

@benbosman
Copy link
Member

Thanks for the update.
This clarifies one of the problems, ensuring it's documented for the future.

There are still a few things which need documentation, but I think I know what the intended behavior is now.
I will review the implementation based on that, but the documentation is of course still required in this PR

submissionforms.md Outdated Show resolved Hide resolved
@tdonohue tdonohue modified the milestones: 7.0beta3, 7.0beta4 Jul 2, 2020
@heathergreerklein heathergreerklein added this to Needs Reviewers Assigned in DSpace 7 Beta 4 via automation Jul 8, 2020
@heathergreerklein heathergreerklein moved this from Needs Reviewers Assigned to Reviewer Approved in DSpace 7 Beta 4 Jul 8, 2020
@heathergreerklein heathergreerklein removed this from Reviewer approved in DSpace 7 Beta 3 Jul 8, 2020
@abollini
Copy link
Member Author

abollini commented Jul 9, 2020

@tdonohue @benbosman I have simplified the contract removing the collection and metadata params as discussed here #128 (comment) this is now also aligned with the REST API PR DSpace/DSpace#2792

@tdonohue tdonohue changed the base branch from master to main July 13, 2020 20:44
Copy link
Member

@benbosman benbosman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
e/16 Estimate in hours high priority
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants