Remap oai_dc fields dc:type, dc:date, and dc:rights #10737
base: develop
The `oai_dc` export and harvesting format has had the following fields remapped:

- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped only to the field "Publication Date".
- dc:rights was not mapped to anything. Now it is mapped (when available) to terms of use, restrictions, and license.
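A minimal sketch of the three rules above, written with the JDK's StAX `XMLStreamWriter`. This is not `DublinCoreExportUtil`; the `VersionMetadata` record is a hypothetical, simplified stand-in for a dataset version, and emitting one dc:rights element per available value is an assumption about the layout:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class OaiDcRemapSketch {

    static final String OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical, simplified view of a dataset version; null means "not set".
    record VersionMetadata(String publicationDate, String termsOfUse,
                           String restrictions, String license) {}

    // Writes <dc:name>value</dc:name>, skipping null or blank values.
    static void writeDc(XMLStreamWriter w, String name, String value) throws XMLStreamException {
        if (value == null || value.isBlank()) {
            return;
        }
        w.writeStartElement("dc", name, DC_NS);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    static String toOaiDc(VersionMetadata md) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("oai_dc", "dc", OAI_DC_NS);
            w.writeNamespace("oai_dc", OAI_DC_NS);
            w.writeNamespace("dc", DC_NS);
            // dc:type no longer comes from "Kind of Data"; it is always "Dataset".
            writeDc(w, "type", "Dataset");
            // dc:date comes only from Publication Date (no Production Date fallback).
            writeDc(w, "date", md.publicationDate());
            // dc:rights: terms of use, restrictions, and license, when available.
            for (String rights : new String[] {md.termsOfUse(), md.restrictions(), md.license()}) {
                writeDc(w, "rights", rights);
            }
            w.writeEndElement();
            w.flush();
        } catch (XMLStreamException e) {
            // Writing to an in-memory StringWriter should not fail; treat it as a bug.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

For example, a version with a publication date, terms of use, and a license yields one dc:type, one dc:date, and two dc:rights elements, and a draft with nothing set yields only dc:type.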
@jggautier, heads up that this relates to this issue, in that we are now adding "dc:rights" to the …
@tcoupin I'm requesting a review from you because you modified the dc:date logic in the following pull request, and I changed it (as explained above): …
Re: dc:date, should it be mapped to the same field as https://guides.dataverse.org/en/latest/api/native-api.html#set-citation-date-field-type-for-a-dataset? That is `publicationDate` by default.
@qqmyers Well, Publication Date is what @philippconzett asked for in the issue (#8129).
@philippconzett's notes also point out that this date is potentially going to be interpreted as the citation date. Since we allow configuring that in the local installation, it seems like it could be confusing to hardcode it for harvesting. If the harvester used the field from that setting, citations would be consistent between the local display and harvesting sites, and it would default to `publicationDate` as requested in the issue.
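A minimal sketch of that suggestion. The helper name `dcDate`, passing the dataset's date fields as a map, and falling back to `publicationDate` when the configured field is empty for a given dataset are all assumptions for illustration, not existing Dataverse API:

```java
import java.util.Map;

public class CitationDateSketch {

    // The citation date field type used when none has been configured.
    static final String DEFAULT_FIELD = "publicationDate";

    // Hypothetical helper: take dc:date from the same field the installation uses
    // for citations; fall back to publicationDate if that field is unset or blank.
    static String dcDate(Map<String, String> dateFields, String citationDateFieldType) {
        String field = (citationDateFieldType == null) ? DEFAULT_FIELD : citationDateFieldType;
        String value = dateFields.get(field);
        if (value == null || value.isBlank()) {
            value = dateFields.get(DEFAULT_FIELD);
        }
        return value;
    }
}
```

With no setting, this behaves exactly like the PR's "Publication Date only" rule; with a setting such as `dateOfDeposit`, harvesters would see the same date the local citation uses.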
I don't have a strong opinion about it.
I think @qqmyers's suggestion for dc:date makes sense. |
I've taken @tcoupin's role on Dataverse issues, so I am looking at this for him. Part of the context for the change he implemented (mapping dc:date to Publication Date if Production Date is empty) was that when Dataverse harvests another OAI-PMH repository, dc:date is mapped to productionDate, and this production date is then used in the citation of the harvested dataset. #8733 and #8732 were both part of an effort to guarantee coherence between citation dates when harvesting another Dataverse installation. So I agree with @qqmyers's suggestion on … Hardcoding … There might be an alternative solution where there is always at least a …
@plecor thanks. One thing to consider with "dc:type" is that types other than datasets (like software and workflows) are coming ... so maybe we can revisit "dc:type" once that pull request is merged.

To all, I pushed some tests to exercise export and setting the citation date. Now I'm trying to see if there's a small change I can make to DublinCoreExportUtil to get the citation date out. I can get just the year (YYYY) with code like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String citation = version.getCitation();
// We're looking for ", YYYY, " in a citation like this:
// Finch, Fiona, 1999, "Darwin's Finches", https://doi.org/10.5072/FK2/WSSYBE, Root, V1
Pattern pattern = Pattern.compile(", (\\d{4}), ");
Matcher matcher = pattern.matcher(citation);
// group(1) throws IllegalStateException if find() returned false, so check first.
if (matcher.find()) {
    String yearInCitation = matcher.group(1);
    writeFullElement(xmlw, dcFlavor + ":" + "date", yearInCitation);
}
```

... but I need the full YYYY-MM-DD version to put in the …
I dug a little more, and our citation code is focused on returning just a 4-digit year for the date. This would be a change from what we do now (YYYY-MM-DD). @philippconzett @plecor @qqmyers what do you think? Should we change …? The spec Philipp found seems to say it's OK. Check out the year 1650 as an example at https://www.base-search.net/about/en/faq_oai.php#dc-date
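A small sketch of what the switch would mean for values. It assumes the date-only subset of W3CDTF (an ISO 8601 profile that DCMI points to for dc:date), where YYYY, YYYY-MM, and YYYY-MM-DD are all valid granularities; the class and method names are hypothetical, and the regex checks shape only, not month or day ranges:

```java
import java.time.LocalDate;
import java.util.regex.Pattern;

public class DcDateGranularity {

    // Date-only W3CDTF forms: YYYY, YYYY-MM, or YYYY-MM-DD (time-of-day forms omitted).
    private static final Pattern W3CDTF_DATE = Pattern.compile("\\d{4}(-\\d{2}(-\\d{2})?)?");

    static boolean isDateOnlyW3cdtf(String value) {
        return value != null && W3CDTF_DATE.matcher(value).matches();
    }

    // What an existing record would change to: YYYY-MM-DD becomes YYYY.
    static String toYearOnly(String isoDate) {
        return String.valueOf(LocalDate.parse(isoDate).getYear());
    }
}
```

Both "1650" and "2024-06-01" pass the check, so moving to year-only values would stay within the same format, just at a coarser granularity.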
Since the citationDateFieldType is part of the Dataset, I'd think at some point it could/should be part of the DatasetDTO and JSON export, thereby being available to other exporters (will the SPA or another client need this info, in the JSON returned from the dataset API, at some point?). If that's too much for now, I think the idea of parsing it from the citation as YYYY makes sense, assuming that's sufficient for how people want to use that field. Alternately, I think you could 'go around' the exporter SPI and get the full value directly pretty easily as well, e.g. with something like:
This would not be the only current exporter doing that (e.g., the DDI exporter grabs the `ExportInstallationAsDistributorOnlyWhenNotSet` setting it needs).
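A rough sketch of combining the options discussed here: prefer a full date when one can be obtained (read directly around the exporter SPI, or from a future DTO/JSON field), and otherwise fall back to the year parsed from the citation string. The `fullDateLookup` supplier and the class and method names are hypothetical, not Dataverse code:

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DcDateSourceSketch {

    // Matches ", YYYY, " in a citation such as:
    // Finch, Fiona, 1999, "Darwin's Finches", https://doi.org/10.5072/FK2/WSSYBE, Root, V1
    private static final Pattern CITATION_YEAR = Pattern.compile(", (\\d{4}), ");

    static Optional<String> yearFromCitation(String citation) {
        if (citation == null) {
            return Optional.empty();
        }
        Matcher m = CITATION_YEAR.matcher(citation);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // fullDateLookup (hypothetical) stands in for however the full YYYY-MM-DD
    // citation date is obtained; an empty result triggers the year-only fallback.
    static Optional<String> dcDate(Supplier<Optional<String>> fullDateLookup, String citation) {
        return fullDateLookup.get().or(() -> yearFromCitation(citation));
    }
}
```

Keeping the lookup behind a `Supplier` means the exporter logic would not need to change if the date later moves into the DTO/JSON export.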
What this PR does / why we need it:
The `oai_dc` export and harvesting format has had the dc:type, dc:date, and dc:rights fields remapped, as described above. As these are backward-incompatible changes, they have been noted in the API changelog: https://dataverse-guide--10737.org.readthedocs.build/en/10737/api/changelog.html
Which issue(s) this PR closes:
Special notes for your reviewer:
Should these backward-incompatible changes be hidden behind a feature flag?
Suggestions on how to test this:
See rules above under "what this PR does".
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.
Is there a release notes update needed for this change?:
Yes, included.
Additional documentation:
Yes, I updated the API changelog: https://dataverse-guide--10737.org.readthedocs.build/en/10737/api/changelog.html