Wrong date parsed in crossref API #191

Closed
Adafede opened this issue Mar 19, 2022 · 10 comments

@Adafede

Adafede commented Mar 19, 2022

Hi,

Thanks to @Daniel-Mietchen, we noticed an error in our bot: it was taking the wrong date from the Crossref API.

My first reflex was to look at how you are doing it, and it looks like we are doing it the same way...

p.publication_date = datetime.datetime.fromtimestamp(int(r['created']['timestamp']) / 1000)

This is the date the entry was created in CrossRef and not the date of publication, see http://api.crossref.org/works/10.1016%2Fs0031-9422%2800%2994305-x for an example.
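To make the difference concrete, here is a minimal sketch using a hand-written dict shaped like a Crossref work record. The dict, its values, and the helper names `created_date` and `issued_parts` are assumptions for illustration, not code from either bot.

```python
from datetime import datetime, timezone

# Hand-written stand-in for a Crossref work record (illustrative values only).
record = {
    "created": {"timestamp": 1027555200000},    # when the record entered Crossref (ms)
    "issued": {"date-parts": [[1978, 1, 21]]},  # earliest known publication date
}

def created_date(rec):
    """Date the metadata record was created (what the line quoted above reads)."""
    ts_ms = int(rec["created"]["timestamp"])
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()

def issued_parts(rec):
    """Raw [year, month, day] parts of the issued date; month and day may be absent."""
    return rec["issued"]["date-parts"][0]

print(created_date(record))  # 2002-07-25
print(issued_parts(record))  # [1978, 1, 21]
```

The two fields can be decades apart, which is why reading `created` as the publication date goes wrong for older papers.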

This might imply some heavy curation of the article dates on WD...

Also tagging @bjonnh in case!

Happy to help!

@Adafede
Author

Adafede commented May 13, 2022

Any news?

andrawaag added a commit that referenced this issue Aug 31, 2022
@Daniel-Mietchen
Contributor

Thanks, @andrawaag !

@andrawaag
Collaborator

The incorrect date was indeed parsed and used. The issue was that the Crossref API uses a data model that provides standardized timestamps for all included dates except the publication date. There, the date is only presented as a list within a list, where the dates might take different forms. I assumed that the date in that list follows year-month-day order, and that the list within the list actually holds only one date.

Format of the publication date: [image]

Format of other dates: [image]
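The two assumptions stated above (year-month-day order, and only the first inner list being read) can be sketched as a small defensive parser. The function name `parse_date_parts` is hypothetical, and the handling of empty or null parts is an assumption about what a record might contain, not documented behaviour of the bot.

```python
def parse_date_parts(date_field):
    """Return (year, month, day), using None for parts Crossref did not supply.

    Assumes year-month-day order and reads only the first inner list.
    """
    parts_list = date_field.get("date-parts") or []
    if not parts_list or not parts_list[0] or parts_list[0][0] is None:
        raise ValueError("no usable date-parts")
    parts = list(parts_list[0])[:3]
    parts += [None] * (3 - len(parts))  # pad a missing month and/or day
    year, month, day = parts
    return (
        int(year),
        int(month) if month is not None else None,
        int(day) if day is not None else None,
    )

print(parse_date_parts({"date-parts": [[1978, 1, 21]]}))  # (1978, 1, 21)
print(parse_date_parts({"date-parts": [[1978, 1]]}))      # (1978, 1, None)
print(parse_date_parts({"date-parts": [[1978]]}))         # (1978, None, None)
```

Returning `None` rather than a default keeps the decision about missing parts separate from the parsing itself.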

@Adafede
Author

Adafede commented Sep 1, 2022

Thank you very much indeed!

@Adafede
Author

Adafede commented Sep 1, 2022

@Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?

andrawaag added a commit that referenced this issue Sep 1, 2022
…d day value to function. When crossref does not provide those values the function fails. The fix is to add precision to the script for month and year and in those case where the month and/or the day values are missing the middle values are provided. ie. middle of the year July 2nd and middle of the month the 15th
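The fallback described in this commit message (July 2nd when the month is missing, the 15th when only the day is missing, plus a matching precision) might look roughly like the sketch below. The function name `fill_missing` is hypothetical, and using Wikidata's precision codes (9 = year, 10 = month, 11 = day) is an assumption about what "precision" refers to here; this is not the committed code.

```python
from datetime import date

# Wikidata time precision codes: 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11

def fill_missing(year, month=None, day=None):
    """Return (date, precision), substituting middle values for missing parts."""
    if month is None:
        return date(year, 7, 2), YEAR       # middle of the year
    if day is None:
        return date(year, month, 15), MONTH  # middle of the month
    return date(year, month, day), DAY

print(fill_missing(1978))         # (datetime.date(1978, 7, 2), 9)
print(fill_missing(1978, 1))      # (datetime.date(1978, 1, 15), 10)
print(fill_missing(1978, 1, 21))  # (datetime.date(1978, 1, 21), 11)
```

Because the precision travels with the date, a consumer can tell that the 15th or July 2nd is a placeholder rather than a reported value.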
@Daniel-Mietchen
Contributor

> @Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?

@Adafede I'm not aware of anyone running such a bot job. Given the scale of the edits, it should also probably be done by a dedicated account. Will think about it.

@carlinmack

Hi, I'm currently looking at fixing this issue on Wikidata. I first want to elaborate on the conversation so far.

The example given in the OP of this issue is 10.1016/s0031-9422(00)94305-x:

Listed are all the dates associated with this paper:

  • indexed: 2023-06-17
  • published-print: 1978-01
  • created: 2002-07-25
  • deposited: 2019-04-21
  • issued: 1978-01-21
  • journal-issue["published-print"]: 1978-01
  • published: 1978-01

Originally the publication date used created and then in c9ac5f4 this behaviour was changed to use 'issued' instead.

Now let's look at this example:

Listed are all the dates associated with this paper:

  • indexed: 2023-02-22
  • created: 2002-07-28
  • published-online: 2009-04-13
  • deposited: 2018-08-02
  • issued: 2009-04-13
  • published: 2009-04-13

Wikidata has the correct date for this paper (2002, source). However, using the now-preferred issued property, we would say this was published in 2009.

issued does seem to be correct most of the time, but it would be great to figure this out. I have more mismatches to go through and will update with other examples.
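One rough way to surface records like the second example for manual review is to flag any work whose issued date falls after its created date. This is only a heuristic sketch: the helper names are hypothetical, and a flagged record is merely suspicious, not proven wrong.

```python
from datetime import date

def first_date(parts):
    """Turn [y], [y, m] or [y, m, d] into the earliest date it can mean."""
    padded = list(parts) + [1] * (3 - len(parts))
    return date(*padded[:3])

def issued_after_created(issued_parts, created_parts):
    """True when the stated publication date is later than the Crossref record itself."""
    return first_date(issued_parts) > first_date(created_parts)

# Second example above: issued 2009-04-13, created 2002-07-28.
print(issued_after_created([2009, 4, 13], [2002, 7, 28]))  # True
# First example above: issued 1978-01-21, created 2002-07-25.
print(issued_after_created([1978, 1, 21], [2002, 7, 25]))  # False
```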

@Adafede
Author

Adafede commented Sep 19, 2023

Hi @carlinmack...sorry for not replying earlier, could you find out more?

@carlinmack

I haven't looked thoroughly through the mismatches, but I haven't found any other similar cases since. I found some documentation in the API for these dates:

created - sort by created date
deposited - sort by time of most recent deposit
indexed - sort by time of most recent index
is-referenced-by-count - sort by number of times this DOI is referenced by other Crossref DOIs
issued - sort by issued date (earliest known publication date)
published - sort by publication date
published-online - sort by online publication date
published-print - sort by print publication date
references-count - sort by number of references included in the references section of the document identified by this DOI
relevance - sort by relevance score
score - sort by relevance score
updated - sort by date of most recent change to metadata, currently the same as deposited
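For reference, the fields listed above are values for the `sort` parameter of the `/works` endpoint. A minimal sketch of building such a query URL follows; no request is sent, and the helper name `works_url` and the default `rows` value are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE = "https://api.crossref.org/works"

# Sort fields copied from the list above.
SORT_FIELDS = {
    "created", "deposited", "indexed", "is-referenced-by-count", "issued",
    "published", "published-online", "published-print", "references-count",
    "relevance", "score", "updated",
}

def works_url(sort_field, order="asc", rows=5):
    """Build a /works query URL sorted by one of the listed fields."""
    if sort_field not in SORT_FIELDS:
        raise ValueError(f"unknown sort field: {sort_field}")
    query = urlencode({"sort": sort_field, "order": order, "rows": rows})
    return f"{BASE}?{query}"

print(works_url("issued"))
# https://api.crossref.org/works?sort=issued&order=asc&rows=5
```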

So I think issued is the most correct date, and I should most probably just report the issue with 10.1110/ps.4690102.

@Adafede
Author

Adafede commented Feb 28, 2024

> > @Daniel-Mietchen Do you know if there will be a bot taking care of the correction of the already created dates?
>
> @Adafede I'm not aware of anyone running such a bot job. Given the scale of the edits, it should also probably be done by a dedicated account. Will think about it.

I just started a batch of 48k corrections: https://quickstatements.toolforge.org/#/batch/225537 (and 20 next ones)
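For readers unfamiliar with QuickStatements, a correction of a publication date (P577) can be expressed in its V1 tab-separated syntax as a removal line followed by an addition line. The sketch below only illustrates that format; the item ID is a placeholder, the helper names are hypothetical, and this is not necessarily how the batch above was generated.

```python
def qs_time(year, month=None, day=None):
    """Format a date as a QuickStatements V1 time value with precision suffix."""
    precision = 9 if month is None else 10 if day is None else 11
    return f"+{year:04d}-{month or 1:02d}-{day or 1:02d}T00:00:00Z/{precision}"

def replace_p577(qid, old_value, new_value):
    """Return two V1 commands: remove the old P577 value, then add the new one."""
    return [
        f"-{qid}\tP577\t{old_value}",  # leading '-' removes a matching statement
        f"{qid}\tP577\t{new_value}",
    ]

# Placeholder item ID; swap the created date for the issued date.
for line in replace_p577("Q00000000", qs_time(2002, 7, 25), qs_time(1978, 1, 21)):
    print(line)
```

Missing month or day parts are written as 01 here, with the precision suffix (9, 10 or 11) indicating how much of the value is meaningful.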
