Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVOC : Indexed field accuracy (Ontoportal integration) #10505

Conversation

luddaniel
Copy link
Contributor

Following agreements on next steps of #10145

What this PR does / why we need it:

This PR allows admins to specify which indexed fields must be use for a :CVocConf configured service.

Suggestions on how to test this:

Related documentation : gdcc/dataverse-external-vocab-support#21
Ontoportal CVocConf example : cvoc-agroportal-ontoportal-conf.json

Default behaviour (with indexIn):

Screenshot from 2024-04-18 16-31-45

With indexIn:

Screenshot from 2024-04-18 16-28-38

Does this PR introduce a user interface change?
Indirectly, it changes search result snippet.

…iguration + handle ontoportal json formats from externalvocabularyvalue
@luddaniel luddaniel changed the title 9276 - CVOC : Indexed field accuracy (Ontoportal integration) CVOC : Indexed field accuracy (Ontoportal integration) Apr 18, 2024
@coveralls
Copy link

coveralls commented Apr 18, 2024

Coverage Status

coverage: 20.65% (+0.08%) from 20.574%
when pulling 4a6d3e4 on Recherche-Data-Gouv:9276-allow-mapping-of-indexable-fields-in-cvoc-conf
into 5bf6b6d on IQSS:develop.


### Updates on Support for External Vocabulary Services

#### Indexed field accuracy
Copy link
Member

@qqmyers qqmyers Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:

Updates on Support for External Vocabulary Services

Multiple extensions of the External Vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.

These changes include:

Improved Indexing with Compound Fields

When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the indexIn in retrieval-filtering.
For more information, please check GDCC/dataverse-external-vocab-support documentation.

Broader Support for Indexing Service Responses

Indexing of the results from 'retrieval-filtering' responses can now handle additional formats including Json Arrays of Strings and values from arbitrary keys within a JSON Object.

(+ entries from the other Ontoportal PRs)

for (JsonString elm : childFields.getValuesAs(JsonString.class)) {
dft = findByNameOpt(elm.getString());
logger.info("Found: " + dft.getName());
if (jo.containsKey("managed-fields")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming - this was a bug? - the schema requires managed-fields these days

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bug, I would say old code, child-fields does not exists anymore so I replaced it with managed-fields. I think this code is not useful but at least the key is not obsolete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - looks like it is just for logging purposes/to warn about incorrect field choices.

+ elm.getString());
+ managedFields.getString(s));
} else {
logger.info("Found: " + dft.getName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change logger.info() to fine()

*
* Retrieves indexable strings from a cached externalvocabularyvalue entry filtered through retrieval-filtering configuration.
* <p>
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject.
Copy link
Member

@qqmyers qqmyers Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:
This method assumes externalvocabularyvalue entries have been filtered and that they contain a single JsonObject.

* Retrieves indexable strings from a cached externalvocabularyvalue entry filtered through retrieval-filtering configuration.
* <p>
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject.
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value.
Copy link
Member

@qqmyers qqmyers Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change: Is handled : - > Cases Handled : A String, an Array of Strings, an Array of Objects with "value" or "content" keys, an Object with one or more entries that have String values or Array values with a set of String values.

(Trying to wordsmith as well as removing the "lang" requirement which isn't in the code.)

* <p>
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject.
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value.
* The string, or the "value/content"s for each language are added to the set.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:
The string(s), or ...

* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject.
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value.
* The string, or the "value/content"s for each language are added to the set.
* This method can retrieve string values to be indexed in term-uri-field (parameter defined in CVOC configuration) or in "indexIn" field (optional parameter of retrieval-filtering defined in CVOC configuration).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:
Retrieved string values are indexed in the term-uri-field (parameter defined in CVOC configuration) by default, or in the field specified by an optional "indexIn" parameter in the retrieval-filtering defined in the CVOC configuration).

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I suggested some doc changes and noted one place where a logger.info should be dropped to a fine(). Otherwise the code and tests looks good.

W.r.t. docs - across all of these PR, it seems like adding 'services based on Ontoportal' to the list of SkosMos and ORCID in https://guides.dataverse.org/en/latest/admin/metadatacustomization.html#using-external-vocabulary-services would be good. Maybe adding 'how indexing is handled' to the sentence starting with "Configuration involves " as well.

@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Apr 23, 2024
Copy link
Contributor Author

@luddaniel luddaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All documentation suggestions had been taken in account.

for (JsonString elm : childFields.getValuesAs(JsonString.class)) {
dft = findByNameOpt(elm.getString());
logger.info("Found: " + dft.getName());
if (jo.containsKey("managed-fields")) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bug, I would say old code, child-fields does not exists anymore so I replaced it with managed-fields. I think this code is not useful but at least the key is not obsolete.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

luddaniel added a commit to Recherche-Data-Gouv/dataverse that referenced this pull request May 14, 2024
@sekmiller sekmiller self-assigned this Jun 10, 2024
@sekmiller
Copy link
Contributor

@luddaniel can you please update this branch with the latest from develop. There are merge conflicts. Thanks!

@sekmiller sekmiller merged commit ad58f3e into IQSS:develop Jun 13, 2024
12 checks passed
@pdurbin pdurbin added this to the 6.3 milestone Jul 3, 2024
@luddaniel luddaniel deleted the 9276-allow-mapping-of-indexable-fields-in-cvoc-conf branch September 19, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: 🚀 Done (Recherche Data Gouv)
Development

Successfully merging this pull request may close these issues.

6 participants