-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CVOC : Indexed field accuracy (Ontoportal integration) #10505
CVOC : Indexed field accuracy (Ontoportal integration) #10505
Conversation
…iguration + handle ontoportal json formats from externalvocabularyvalue
|
||
### Updates on Support for External Vocabulary Services | ||
|
||
#### Indexed field accuracy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change:
Updates on Support for External Vocabulary Services
Multiple extensions of the External Vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.
These changes include:
Improved Indexing with Compound Fields
When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the indexIn
in retrieval-filtering
.
For more information, please check GDCC/dataverse-external-vocab-support documentation.
Broader Support for Indexing Service Responses
Indexing of the results from 'retrieval-filtering' responses can now handle additional formats including Json Arrays of Strings and values from arbitrary keys within a JSON Object.
(+ entries from the other Ontoportal PRs)
for (JsonString elm : childFields.getValuesAs(JsonString.class)) { | ||
dft = findByNameOpt(elm.getString()); | ||
logger.info("Found: " + dft.getName()); | ||
if (jo.containsKey("managed-fields")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming - this was a bug? - the schema requires managed-fields these days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bug, I would say old code, child-fields
does not exists anymore so I replaced it with managed-fields
. I think this code is not useful but at least the key is not obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - looks like it is just for logging purposes/to warn about incorrect field choices.
+ elm.getString()); | ||
+ managedFields.getString(s)); | ||
} else { | ||
logger.info("Found: " + dft.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change logger.info() to fine()
* | ||
* Retrieves indexable strings from a cached externalvocabularyvalue entry filtered through retrieval-filtering configuration. | ||
* <p> | ||
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change:
This method assumes externalvocabularyvalue entries have been filtered and that they contain a single JsonObject.
* Retrieves indexable strings from a cached externalvocabularyvalue entry filtered through retrieval-filtering configuration. | ||
* <p> | ||
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject. | ||
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change: Is handled : - > Cases Handled : A String, an Array of Strings, an Array of Objects with "value" or "content" keys, an Object with one or more entries that have String values or Array values with a set of String values.
(Trying to wordsmith as well as removing the "lang" requirement which isn't in the code.)
* <p> | ||
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject. | ||
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value. | ||
* The string, or the "value/content"s for each language are added to the set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change:
The string(s), or ...
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject. | ||
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value. | ||
* The string, or the "value/content"s for each language are added to the set. | ||
* This method can retrieve string values to be indexed in term-uri-field (parameter defined in CVOC configuration) or in "indexIn" field (optional parameter of retrieval-filtering defined in CVOC configuration). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested change:
Retrieved string values are indexed in the term-uri-field (parameter defined in CVOC configuration) by default, or in the field specified by an optional "indexIn" parameter in the retrieval-filtering defined in the CVOC configuration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I suggested some doc changes and noted one place where a logger.info should be dropped to a fine(). Otherwise the code and tests looks good.
W.r.t. docs - across all of these PR, it seems like adding 'services based on Ontoportal' to the list of SkosMos and ORCID in https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#using-external-vocabulary-services would be good. Maybe adding 'how indexing is handled' to the sentence starting with "Configuration involves " as well.
…ntoportal integration) IQSS#10505'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All documentation suggestions had been taken in account.
for (JsonString elm : childFields.getValuesAs(JsonString.class)) { | ||
dft = findByNameOpt(elm.getString()); | ||
logger.info("Found: " + dft.getName()); | ||
if (jo.containsKey("managed-fields")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a bug, I would say old code, child-fields
does not exists anymore so I replaced it with managed-fields
. I think this code is not useful but at least the key is not obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
…ntoportal integration) IQSS#10505'
@luddaniel can you please update this branch with the latest from develop. There are merge conflicts. Thanks! |
Following agreements on next steps of #10145
What this PR does / why we need it:
This PR allows admins to specify which indexed fields must be use for a :CVocConf configured service.
Suggestions on how to test this:
Related documentation : gdcc/dataverse-external-vocab-support#21
Ontoportal CVocConf example : cvoc-agroportal-ontoportal-conf.json
Default behaviour (with
indexIn
):With
indexIn
:Does this PR introduce a user interface change?
Indirectly, it changes search result snippet.