Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10288 Add keywordTermURI metadata in keyword block #10371

Conversation

stevenferey
Copy link
Contributor

@stevenferey stevenferey commented Mar 13, 2024

What this PR does / why we need it:

To continue on the subject of OntoPortal integration

Which issue(s) this PR closes:

Closes #10288

Special notes for your reviewer:

In addition to the addition of the keywordTermURI field, we added a script allowing administrators to optionally migrate data from keywordValue to keywordTermURI when the data starts with "http".

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

  • The keyword metadata block is modified:

Capture d’écran du 2024-02-28 17-51-00

  • Before data migration, here is an example with a keywordValue containing data starting with "http" :
    Capture d’écran du 2024-03-13 10-48-35

  • After optional data migration :
    Capture d’écran du 2024-03-13 10-48-52

Is there a release notes update needed for this change?:

Yes, a proposal is visible in the file doc/release-notes/10288-add-term_uri-metadata-in-keyword-block.md

@jggautier
Copy link
Contributor

Heya @stevenferey. Have you had a chance to check out the guidelines for metadata text at https://docs.google.com/document/d/1tY5t3gjrIgAGoRxVMWQSCh46fnbSmnFDLQ7aLkNLhJ8?

datasetfieldtype.keywordValue.title=Term
datasetfieldtype.keywordVocabulary.title=Controlled Vocabulary Name
datasetfieldtype.keywordVocabularyURI.title=Controlled Vocabulary URL
datasetfieldtype.keywordVocabularyURL.title=Controlled Vocabulary URL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to change to URL from URI? (While you're adding Term URI?) Minimally I think we should be consistent and, unless we're sure that there are no URNs in use, sticking with URI might be the better option. (I do see that we focus on URLs in the descriptions and the idea of web locations for things instead of talking about URIs as identifiers that might also be resolvable on the web. I'm not sure what the best way to resolve that is, or that it needs to be handled in this PR.) @jggautier - any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure and might not understand this enough. But maybe I could help see if there are URNs in use?

In case this info is helpful, I think we used "keywordVocabularyURI" for the database name because DDI Codebook uses "vocabURI". And maybe we thought that most people would be more familiar with the term "URLs", so that's what we used for the label people see on the page and in the tooltips. Of course this was all before I joined Dataverse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion with Recherche-Data-Gouv team, we decided to rollback modifications from keywordVocabularyURI to keywordVocabularyURL. Update is coming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update is done @qqmyers :)

@@ -0,0 +1,5 @@
-- update of the "keywordVocabularyURI" metadata to make it consistent with its name "Controlled Vocabulary URL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is obsolete now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description also still mentions changing from URI to URL (not in the code, but might confuse in review or QA)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes absolutely, I delete the script and update the PR description

@qqmyers
Copy link
Member

qqmyers commented Apr 25, 2024

@jggautier - any reason this can't get on the board now to move forward?

@jggautier
Copy link
Contributor

jggautier commented Apr 25, 2024

Hmm maybe. Could this get on the board in the next sprint's "This Sprint" column while we wait to hear back from @stevenferey about the tooltip text and how it does or doesn't follow the guidelines for metadata text? Or should that be resolved first?

I'd like to make sure that those guidelines are easy to find and are helpful when folks are working on the text associated with metadata, like the field labels and tooltips shown in the UI.

@stevenferey, I asked about this a few weeks ago because the current tooltip text for this Term URI field makes me think that you haven't seen those guidelines because they're hard to find or that the guidelines could be clearer.

@stevenferey
Copy link
Contributor Author

Sorry for the late response,
I was able to read the document after your message.

Is this description better?
Keyword URI points to the web presence of the term. Enter an absolute URI, e.g. starting by https://

We listen to your suggestions for improvement.

@jggautier
Copy link
Contributor

Hey @stevenferey. Thanks for taking a look!

The more important thing is to avoid adding instructions in the tooltip for how depositors should add metadata, since people looking for data also see those tooltips and that information isn't relevant for them. And we hope that the watermark and the field validation will ensure that depositors are entering what you expect.

So I think we should remove the second sentence about entering an absolute URI. And the rest of the changes I suggested below are mostly style changes for consistency:

A URI that points to the web presence of the Keyword Term

@jggautier
Copy link
Contributor

jggautier commented Apr 26, 2024

It would be cool if we could encourage folks to use a tool that would apply our style guidelines to their text.

The paid versions of Grammarly and a less popular tool called ProWritingAid say they can do this. Maybe there are free tools, too!

@stevenferey
Copy link
Contributor Author

Thank you for this feedback,

I pushed a label update to follow the recommendation.

@jggautier
Copy link
Contributor

@qqmyers after you asked me last if this could get on the board to move forward, I asked if it could while we waited to hear back about the tooltip text or if that should be addressed first.

I think the issue with that text's been addressed, so I can't think of any reason why this couldn't get on the board to move forward.

}
}

if (StringUtils.isNotBlank(subject)) {
subject_check = writeOpenTag(xmlw, "subjects", subject_check);
writeSubjectElement(xmlw, subjectScheme, schemeURI, subject, language);
// we prioritize the keywordTermURI metadata to populate schemeURI
writeSubjectElement(xmlw, subjectScheme, StringUtils.isNotBlank(keywordTermURI) ? keywordTermURI : keywordVocabURI, subject, language);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenferey - The DataCite Schema includes both a schemeURI and valueURI attributes - should this now include both rather than swapping the term URI into the schemeURI attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers , Thank you for this feedback
Indeed we can consider using the two metadata as specified in the xsd

With the following example:
Capture d’écran du 2024-05-15 16-44-44

We would obtain the following export:

<subjects>
<subject>Agricultural Sciences</subject>
<subject valueURI="https://term-uri.com/" schemeURI="https://vocabulary-uri.com/" subjectScheme="cvoc name">keyword term</subject>
</subjects>

Does this correspond to the proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qqmyers, The branch is updated with the modification of the OpenAire export. Thanks a lot

@coveralls
Copy link

coveralls commented May 15, 2024

Coverage Status

coverage: 20.589% (+0.009%) from 20.58%
when pulling a79cdbf on Recherche-Data-Gouv:10288-add-Term-URI-metadata-in-Keyword-block
into 1f9a682 on IQSS:develop.

@qqmyers
Copy link
Member

qqmyers commented May 28, 2024

@stevenferey - I think the change looks good. There is a (probably unrelated) failing test though that may be related to destroying a dataset - can you merge with develop again to pick up #10566. Details: expected: <404> but was: <500>
Stacktrace
org.opentest4j.AssertionFailedError: expected: <404> but was: <500>
at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
at edu.harvard.iq.dataverse.api.HarvestingServerIT.testSingleRecordOaiSet(HarvestingServerIT.java:630)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

@luddaniel
Copy link
Contributor

Hello @qqmyers, as @stevenferey is in holiday for a full month, I will handle the next exchanges.
This branch is already up to date, and by jumping into debug mode I can tell that the expected should be <500>.
Screenshot from 2024-05-29 15-19-51
Caused by: jakarta.persistence.NoResultException: getSingleResult() did not retrieve any entities. handled error will throw .add("code", 500) response.
By fast crawling git blame I would say that no code has been modified for the last 3 months so I don't know when this IT code started to fail.
I did go ahead and fixed the error in the last commit a79cdbf, tell me if you want to do something else.

@qqmyers
Copy link
Member

qqmyers commented May 29, 2024

@luddaniel - thanks. We're seeing the test fail in other branches, one with only docs changes, so definitely unrelated. That said, we're looking to see if we can understand why it started failing recently and will probably want to pick up your try/catch fix. I'll let you know if there's anything else to do.

@pdurbin
Copy link
Member

pdurbin commented May 30, 2024

will probably want to pick up your try/catch fix

Yes, I cherry-picked it here:

Thanks, @luddaniel!

@pdurbin
Copy link
Member

pdurbin commented Jun 3, 2024

@stevenferey we cherry-picked a commit from this PR into #10601 and merged it. Can you please merge develop into this PR? Thanks.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is done at this point. The last build failed with the AWS instance not starting but probably not related. (Should rerun at some point.)

@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Jun 13, 2024
@stevenwinship stevenwinship self-assigned this Jun 14, 2024
@stevenwinship
Copy link
Contributor

Issue with "Data migration to the new keywordTermURI field"
No validation takes place until someone tries to edit the metadata in the UI

Before update:
Screenshot 2024-06-14 at 9 55 51 AM
After Update:
Screenshot 2024-06-14 at 9 57 24 AM
After trying to update the metadata:
Screenshot 2024-06-14 at 9 58 33 AM

@qqmyers
Copy link
Member

qqmyers commented Jun 14, 2024

I think this is how all fields of type url work, i.e. not related to the one field added in the PR.

@stevenwinship stevenwinship merged commit 00020e2 into IQSS:develop Jun 14, 2024
15 checks passed
@stevenwinship stevenwinship removed their assignment Jun 14, 2024
@pdurbin pdurbin added this to the 6.3 milestone Jun 14, 2024
@luddaniel luddaniel deleted the 10288-add-Term-URI-metadata-in-Keyword-block branch September 19, 2024 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
Status: 🚀 Done (Recherche Data Gouv)
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: Add "Term URI" metadata in Keyword block
8 participants