Re-harvesting datasets from Roper Center #204
Folks at the Roper Center let me know that they changed things on their end so that the links the Harvard Dataverse has for Roper datasets (in the collection at https://dataverse.harvard.edu/dataverse/roper) now lead to the datasets, instead of leading to error pages like they did earlier this year. I haven't tried again to use the Roper Center's OAI-PMH feed to harvest. I'll try again today or next week and report here and in the email thread with the folks from Roper.
This is cool! TBH, I gave up on the Roper records we have in the database a while ago; I just assumed they were useless. They are most likely way out of date, even if some are resolving now.
Ah okay. I'm not able to start a harvest on one of the test boxes. I was going to use Demo Dataverse, but I won't now. It sounds better to me if someone else uses a test box. It's more likely that whoever can do that will also be more capable of figuring out what went wrong if something goes wrong. And thinking more about it, it's probably better that anyone who continues to work on this wait until @sbarbosadataverse can prioritize this on the "Harvard Dataverse Repository Instance" column on the Dataverse Global backlog.
OK, I'll do that.
Gene Wang from the Roper Center has been following up regularly about this. I can let him know we haven't looked into this more yet. But I think it would be helpful if we could say when we could try harvesting from them, even if it's not right away. Is that possible?
OK, I'll do it (an experimental harvest) this week, maybe even today. dataverse-internal is really not a good server for that (it's being used for testing PRs and needs to be restarted constantly), so I'm thinking of trying it on the perf cluster. Will post any updates here.
Just deleting all the old, stale Roper records from the prod. database is going to be a little non-trivial. I've experimented with that a bit this week on the perf cluster (using a copy of the prod. db there). If you do it the supported way, through the harvesting clients panel, our application attempts to delete all the records at once, and that's a bulky transaction with 20K+ datasets. I'd like to avoid having to delete them one by one, so I'm figuring that part out.
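The trade-off described above (one bulky 20K+-record transaction vs. deleting records one by one) has a common middle ground: fixed-size batches, committed separately. This is only a sketch of that idea, not Dataverse code; the `delete_batch` callback and the batch size are hypothetical placeholders.

```python
# Sketch: delete a large set of harvested records in fixed-size batches,
# so no single transaction has to cover all 20K+ datasets.
# `delete_batch` is a hypothetical callback (e.g. one DB transaction per call).

def chunked(ids, size):
    """Yield successive batches of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def delete_in_batches(ids, delete_batch, size=500):
    """Call delete_batch once per chunk; returns the number of ids processed."""
    deleted = 0
    for batch in chunked(ids, size):
        delete_batch(batch)   # each call can commit independently
        deleted += len(batch)
    return deleted
```

With ~20,000 ids and a batch size of 500, that comes out to about 40 small transactions instead of one huge one.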
2023/12/19: Prioritized during meeting on 2023/12/18. Added to Needs Sizing.
2023/12/19: Roper's OAI does not implement OAI Dublin Core. Unclear on their approach. @landreev will contact them to follow up and determine next steps.
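Whether an endpoint offers Dublin Core (`oai_dc`) can be checked from the response to the standard OAI-PMH `ListMetadataFormats` verb. Here is a minimal sketch of that check; the sample XML below is illustrative, not a real capture from Roper's server.

```python
# Sketch: check which metadata prefixes an OAI-PMH endpoint advertises
# in a ListMetadataFormats response. SAMPLE is an invented example response.
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def supported_prefixes(list_metadata_formats_xml):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(list_metadata_formats_xml)
    return [el.text for el in root.iter(OAI_NS + "metadataPrefix")]

SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_ddi</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""
```

An endpoint that does not list `oai_dc` here cannot be harvested with a generic Dublin Core client, which matches the problem noted above.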
I don't have much in the way of a status update. I haven't been able to re-test their OAI server because it's been down or broken for the past few days; all of these calls are returning a 500:
On the other hand,
are working, so their OAI server is still there, just not working properly. I'm a little self-conscious about following up on that RT ticket (330637), since it's so old and since we (I) have dropped the ball on it before. But if their OAI doesn't come back to life miraculously in the next couple of days, I'll reach out and ask.
Email Debt Forgiveness Day is on Feb 29 😛 (I'm kidding, of course!)
Resized to 3 during sprint kickoff.
Their OAI service was not showing any intent to "fix itself", so I finally emailed them via RT, hoping that the people on the other end of the ticket are still employed and willing to talk to me.
Resumed communication with the developer(s) on the Roper side. Hopefully we'll nail it this time around. Happy International Email Debt Forgiveness Day!
2024/07/10
Managers of the Roper Center for Public Opinion Research emailed to let us know that they now make their dataset metadata available over OAI-PMH. See https://help.hmdc.harvard.edu/Ticket/Display.html?id=330637.
But when I tested harvesting the records, Demo Dataverse wasn't able to harvest any:
When I created the harvesting client, for the "Archive Type" I selected "Generic OAI Archive".
I tried the "Roper Archive" option, too, but that didn't work either.
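For context on what a generic OAI harvest has to do: a client pages through `ListRecords` responses, following the `resumptionToken` until the server stops returning one. This sketch parses a single page; the sample XML and identifiers are invented for illustration, not taken from Roper's feed.

```python
# Sketch: extract record identifiers and the resumptionToken from one
# OAI-PMH ListRecords response page. PAGE is an invented example response.
import xml.etree.ElementTree as ET

NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_list_records_page(xml_text):
    """Return (record identifiers, resumptionToken or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(NS + "identifier")]
    token_el = root.find(".//" + NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token

PAGE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would loop: fetch a page, collect the identifiers, and re-issue `ListRecords` with the returned token until it comes back empty.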
I let the folks at Roper know that the Dataverse development team is working on improving how Dataverse harvests using OAI-PMH, and that once those improvements made it to the Harvard repository and Demo Dataverse, I would try to harvest again.
I also asked them what we should do about the stale records in the Harvard repository (https://dataverse.harvard.edu/dataverse/roper) whose links lead to error pages. Similar to the stale ICPSR records (#63), people who find these Roper datasets and realize that the links don't work could still go to Roper's website (or even try a general search engine) and search there by the dataset's title.
So maybe we could leave them there until we're able to re-harvest them using OAI-PMH.
Someone at Roper asked in the email thread if, in the meantime, we're able to make the links redirect to the dataset pages:
Or maybe we could remove them sooner (e.g. using the destroy dataset API endpoint)?
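For reference, Dataverse's destroy endpoint is a superuser-only `DELETE /api/datasets/$id/destroy` call authenticated with an `X-Dataverse-key` header. The sketch below only builds such a request without sending it; the server URL and token are placeholders, and destroying datasets is irreversible, so this is not something to run casually against production.

```python
# Sketch: construct (but do not send) a request to Dataverse's
# destroy-dataset endpoint. Server URL and API token are placeholders.
import urllib.request

def build_destroy_request(server, dataset_id, api_token):
    """Build a DELETE request for /api/datasets/{id}/destroy."""
    url = f"{server}/api/datasets/{dataset_id}/destroy"
    req = urllib.request.Request(url, method="DELETE")
    req.add_header("X-Dataverse-key", api_token)  # superuser token required
    return req
```

Actually sending ~20K of these would also want batching and throttling, per the earlier comment about avoiding one bulky operation.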
So for now I plan to:
Definition of done:
We're able to harvest the metadata from all datasets in Roper's OAI-PMH feed, and the stale records in https://dataverse.harvard.edu/dataverse/roper have been removed.