Format the value of publicationYear in oai_datacite #30

Closed
cessda-bitbucket-importer opened this issue Jun 20, 2022 · 4 comments
Labels
bug Something isn't working major

@cessda-bitbucket-importer
Contributor

Original report on BitBucket by Toni Sissala (GitHub: toni-sissala).


publicationYear is supposed to be in format YYYY. The aggregator renders a full datestamp if available in source data.

Format the value by taking only the first four characters of the source string:

  • YYYY-MM-DD → YYYY
  • YYYY-MM → YYYY
  • YYYY → YYYY

If the source datestamp is in a different format, this will yield invalid results.
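
The truncation described above can be sketched as follows. The function name `publication_year` is hypothetical, and the regular-expression guard is an addition that is not part of the report; it only illustrates one way to reject input in an unexpected format instead of returning an invalid year.

```python
import re

# Hypothetical helper, not taken from the aggregator's code base.
# Accepts a string that starts with four ASCII digits followed by
# either a hyphen or the end of the string.
_YEAR_PREFIX = re.compile(r"[0-9]{4}(?:-|$)")


def publication_year(datestamp):
    """Return the leading four-character year, or None if the
    string does not start with a YYYY component."""
    if _YEAR_PREFIX.match(datestamp) is None:
        return None
    return datestamp[:4]
```

With this guard, a value such as `31.12.2001` yields `None` rather than the invalid year `31.1`.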

  • The primary source of this value is (in DDI-C): stdyDscr/citation/prodStmt/prodDate
  • And if not found, secondary is stdyDscr/citation/distStmt/distDate/@date

The distDate/@date format is enforced by the CMV profile, so it should give us correct results.

Should the order of lookup be reversed?

  • The new primary source of this value would be (in DDI-C): stdyDscr/citation/distStmt/distDate/@date
  • And if not found, secondary would be stdyDscr/citation/prodStmt/prodDate
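
If the order were reversed, the fallback could look like the minimal sketch below. The function and parameter names are hypothetical; `dist_date` and `prod_date` stand for the strings already extracted from distDate/@date and prodDate, and a missing value is assumed to arrive as `None` or an empty string.

```python
def pick_publication_date(dist_date, prod_date):
    """Prefer the distDate/@date value; fall back to prodDate.

    Both arguments are assumed to be strings or None. Empty strings
    are treated the same way as missing values.
    """
    for candidate in (dist_date, prod_date):
        if candidate:
            return candidate
    return None
```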

@cessda-bitbucket-importer
Contributor Author

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


It would be better to parse the ISO date and then extract the year from that.

@cessda-bitbucket-importer
Contributor Author

Original comment by Matthew Morris (GitHub: matthew-morris-cessda).


In regards to the lookup order, the profiles only specify distDate, so indeed that should be preferred.

@cessda-bitbucket-importer
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


> In regards to the lookup order, the profiles only specify distDate, so indeed that should be preferred.

The lookup order will be reversed.

> It would be better to parse the ISO date and then extract the year from that.

Could you elaborate on how this would be better?

The source data is a string: it is stored as a string in the DB, and the OAI-PMH Repo Handler receives it as a string inside a JSON object. Parsing it as a date would require first loading it into a date object, then extracting the year and casting it back to a string. That seems like overkill, since the date format is enforced by the CMV profile and will always begin with 'YYYY' (the first four characters represent the year).

The profile states that the distDate/@date format is:

Ideally 'YYYY-MM-DDThh:mm:ssZ' format, but can accept 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY' as date format.

Code snippets to compare

Parse as a date and extract the year as a string:

import datetime
str(datetime.datetime.strptime('2001-12-31T12:33:59Z', '%Y-%m-%dT%H:%M:%SZ').year)

Or extract the first four characters from the string:

'2001-12-31T12:33:59Z'[:4]

Parsing as a date becomes even more complex if the source can also be in other formats ('YYYY-MM-DD', 'YYYY-MM', 'YYYY'), as every format needs to be specified separately.
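
To make the comparison concrete, here is a rough sketch of what the parsing approach might look like once all four formats named in the profile are covered. The function name `year_by_parsing` and the format tuple are illustrative assumptions, not code from the repository.

```python
import datetime

# One strptime pattern per format listed in the profile, tried in order.
_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d", "%Y-%m", "%Y")


def year_by_parsing(datestamp):
    """Return the year as a string, or None if no format matches."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.datetime.strptime(datestamp, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
        return str(parsed.year)
    return None
```

Compared with slicing, this rejects values such as `2001-13` that merely look like a date, at the cost of one explicit pattern per accepted format.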

@cessda-bitbucket-importer
Contributor Author

Original comment by Toni Sissala (GitHub: toni-sissala).


Fix bitbucket bugs #29, #30 & #31

All changes concern OAI Datacite serialization.

Add primary lookup location for Publisher. The previous lookup
location will remain as a secondary.

Format the value of publicationYear to only contain a year. Change
lookup order so that primary is
study.publication_years.attr_distribution_date.value, secondary is
study.publication_years.value.

Include property Date and use
study.publication_years.attr_distribution_date.value as source.

Require kuha_oai_pmh_repo_handler 1.0.2 in setup.py and requirements.txt.

Bump version to TBD.

Add changelog entry for TBD and write about the changes.

Fixes #29, #30, #31 at BitBucket.
