Are Dataverse installations' harvesting sets exporting regularly? #5392

Closed
jggautier opened this issue Dec 11, 2018 · 6 comments

Comments

@jggautier
Contributor

Harvard Dataverse harvests datasets from QDR's Dataverse installation. An installation admin, @adam3smith, noticed that only 31 of QDR's 48 published datasets had been harvested into Harvard Dataverse. He wanted to know why, and what could be done so that Harvard Dataverse harvests all of QDR's datasets. QDR's OAI set includes all of its published datasets.

My dashboard on Harvard Dataverse showed that the last harvest ran on the most recent Sunday, so scheduled harvesting from QDR appears to be running on the schedule I set (every Sunday). I then ran a manual harvest by pushing the "Run harvesting" button. The dashboard reported another successful harvest, but 0 datasets were harvested and the total was still 31.

Sebastian then ran a manual export of QDR's OAI set from his installation's dashboard. When I ran a manual harvest again, 17 datasets were harvested, bringing the total to 48. Many of these 17 datasets were published months ago: only three were published this December, and none were published in November.

Is the harvesting server exporting OAI sets on a regular schedule? Or do installation admins need to export their OAI sets themselves (e.g., by pushing the "Run harvesting" button in their dashboard) so that other systems harvesting those sets can import the latest dataset metadata?
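Since a harvesting client can only import what the server's OAI set currently exposes, one way to check the server side is to request the set directly. Below is a minimal Python sketch that builds an OAI-PMH 2.0 `ListIdentifiers` request URL. The base URL `https://data.qdr.syr.edu/oai` is an assumption (Dataverse installations conventionally serve OAI-PMH under `/oai`), and `list_identifiers_url` is a hypothetical helper. The optional `from` argument is the protocol's selective-harvesting parameter; whether Dataverse's harvesting client sends it is not confirmed here.

```python
from urllib.parse import urlencode

# Assumed OAI-PMH endpoint for QDR (hypothetical; Dataverse conventionally uses /oai).
QDR_OAI_BASE = "https://data.qdr.syr.edu/oai"


def list_identifiers_url(base_url, set_spec, metadata_prefix="oai_dc",
                         from_date=None, resumption_token=None):
    """Build an OAI-PMH ListIdentifiers request URL for one set.

    Per OAI-PMH 2.0, a request carrying a resumptionToken must not repeat
    the other arguments, so only the verb and the token are sent then.
    """
    if resumption_token is not None:
        params = {"verb": "ListIdentifiers", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListIdentifiers",
                  "metadataPrefix": metadata_prefix,
                  "set": set_spec}
        if from_date is not None:
            # Selective harvesting: only records with a datestamp on/after this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the request for QDR's "qdr_whole" set (set name taken from this thread).
    print(list_identifiers_url(QDR_OAI_BASE, "qdr_whole"))
```

Opening that URL in a browser shows the identifiers the set currently lists, which can be compared with the number of published datasets.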

@poikilotherm
Contributor

This seems related to #5345 and the refactoring of timers.

@jggautier
Contributor Author

jggautier commented Dec 12, 2018

The number of records Harvard Dataverse has harvested also doesn't match the number of records in the following installation's OAI sets. I ran the harvests manually just to make sure.

(I checked the total number of records in each set using a Python script at https://gist.github.com/rlskoeser/880a6f9f20bbaf9202fb)
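For reference, here is an independent sketch of that kind of count (not the linked gist's code). It pages through a set with `ListIdentifiers`, follows `resumptionToken`s until an empty or absent token marks the last page, and counts headers marked `status="deleted"` separately. It assumes standard OAI-PMH 2.0 XML responses; `count_set_records` and its `fetcher` parameter are hypothetical names, and `fetcher` can be swapped for a stub in tests.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 XML namespace.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url):
    """Fetch a URL over HTTP(S) and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def count_set_records(base_url, set_spec, metadata_prefix="oai_dc", fetcher=http_get):
    """Count records in an OAI-PMH set, following resumptionTokens.

    Returns (active, deleted); headers with status="deleted" are counted
    separately because they do not represent datasets a client would import.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix, "set": set_spec}
    active = deleted = 0
    while True:
        root = ET.fromstring(fetcher(f"{base_url}?{urlencode(params)}"))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            # noRecordsMatch simply means the set is empty.
            if error.get("code") == "noRecordsMatch":
                break
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iterfind("oai:ListIdentifiers/oai:header", OAI_NS):
            if header.get("status") == "deleted":
                deleted += 1
            else:
                active += 1
        token = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
        # An absent or empty resumptionToken marks the last page of results.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests send only the verb and the token, per the spec.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
    return active, deleted
```

Comparing the `active` count with the number of records a client reports as harvested is one way to spot a set that has not been re-exported.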

@juancorr

juancorr commented Dec 13, 2018 via email

@djbrooke
Contributor

djbrooke commented Feb 6, 2019

@jggautier - I think we were going to check in about this in order to scope it, but I lost track of it. Let's figure out what we're trying to estimate here so that we can work on it.

@jggautier
Contributor Author

Definitely. Harvard Dataverse is harvesting the 48 data projects (datasets) in QDR's "qdr_whole" OAI-PMH set. Since this issue was opened, QDR has published two more data projects. All QDR data projects should be included in QDR's "qdr_whole" OAI-PMH set, so the set is missing the 49th and 50th data projects.

In case something other than an export timer problem is going on, @djbrooke asked me to look into whether anything is different about the metadata of these two data projects (https://doi.org/10.5064/F65BVECY and https://doi.org/10.5064/F6RL3PS2) compared to the ones that are in the set. I haven't noticed anything yet.
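To pinpoint which data projects a set is missing, the set's identifiers can be compared against the list of published DOIs with a set difference. A minimal Python sketch, assuming the OAI identifiers take the `doi:10.5064/...` form while published DOIs are written as `https://doi.org/...` URLs; `normalize_doi` and `missing_from_set` are hypothetical helpers, and the set contents in the usage example are placeholders.

```python
# Prefixes under which the same DOI may appear: doi.org URLs (as cited in this
# thread) and "doi:"-prefixed OAI identifiers (assumed Dataverse style).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value):
    """Reduce a DOI URL or 'doi:' identifier to a bare, upper-cased DOI.

    DOIs are case-insensitive, so upper-casing makes comparisons reliable.
    """
    value = value.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value.upper()


def missing_from_set(published, set_identifiers):
    """Return the published DOIs (normalized, sorted) absent from the OAI set."""
    in_set = {normalize_doi(i) for i in set_identifiers}
    return sorted({normalize_doi(p) for p in published} - in_set)


if __name__ == "__main__":
    # The two DOIs come from this thread; the set contents are a hypothetical placeholder.
    published = ["https://doi.org/10.5064/F65BVECY", "https://doi.org/10.5064/F6RL3PS2"]
    set_identifiers = ["doi:10.5064/HYPOTHETICAL1"]
    print(missing_from_set(published, set_identifiers))
    # prints ['10.5064/F65BVECY', '10.5064/F6RL3PS2']
```

Normalizing both sides first avoids false mismatches caused only by URL prefixes or letter case.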

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@cmbz cmbz closed this as completed Aug 20, 2024

5 participants