Purge all the stale Nesstar harvested dataverses in production #153
Comments
Laura Huis in 't Veld from DANS-KNAW asked today if we could remove the records harvested at https://dataverse.harvard.edu/dataverse/dans, which were harvested from a Nesstar repository. The email is at https://help.hmdc.harvard.edu/Ticket/Display.html?id=324230. I let them know that we'll work on it. On the Harvard repository's Manage Clients page, there's no longer an entry for this client.
2023/12/19: Prioritized during meeting on 2023/12/18. Added to Needs Sizing.
2023/12/19: Sized at 10 during sizing meeting.
For the practical purposes of our users seeing these records in the search results, they have all been "purged" already, in that they were all dropped from the Solr index when 6.0 was deployed. But they were still sitting in the database. They are being deleted now. It's just going to take some time, since there doesn't appear to be any better/faster way than using the Destroy API, one by one.
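For anyone who needs to do a similar cleanup, here is a minimal sketch of what a one-by-one destroy loop could look like. It assumes the native API's `DELETE /api/datasets/$ID/destroy` endpoint with a superuser token in the `X-Dataverse-key` header; the function names, the way the dataset ids are obtained, and the injectable `send` callable are all hypothetical and not what was actually run in production.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def build_destroy_request(server_url: str, dataset_id: int, api_token: str) -> urllib.request.Request:
    """Build the DELETE request for one dataset (assumed endpoint: /api/datasets/$ID/destroy)."""
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/destroy"
    return urllib.request.Request(url, method="DELETE", headers={"X-Dataverse-key": api_token})


def send_request(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their status code instead.
        return err.code


def destroy_all(
    server_url: str,
    dataset_ids: Iterable[int],
    api_token: str,
    send: Callable[[urllib.request.Request], int] = send_request,
) -> List[int]:
    """Destroy datasets one at a time; return the ids whose request did not return 200."""
    failed = []
    for dataset_id in dataset_ids:
        status = send(build_destroy_request(server_url, dataset_id, api_token))
        if status != 200:
            failed.append(dataset_id)
    return failed
```

Returning the list of failed ids rather than stopping at the first error means a long-running purge can be re-run on just the leftovers. Passing a fake `send` makes it possible to dry-run the loop without touching a real installation.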
Confirming that all the Nesstar-harvested records were deleted overnight.
... and several thousand harvested datasets associated with them. These have all been removed from the actual production database. So, as of next Monday, when the db copy is updated, the numbers above will both be zero.
Thanks @landreev. I'll let Laura Huis in 't Veld from DANS-KNAW know, in our email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=324230, that the harvested datasets were removed and that by the end of next week I'll delete the collection at https://dataverse.harvard.edu/dataverse/dans. And I'll let them know to reach out to us again if they'd like us to harvest the metadata, wherever it exists now. For example, one of these purged harvested datasets used to be at https://easy.dans.knaw.nl/ui/datasets/id/easy-dataset:33366, and it's now accessible at https://doi.org/10.17026/dans-xtu-d36b, which is at the DANS Data Station for Social Sciences and Humanities. So maybe they'd like us to harvest the datasets from that installation.
Yeah, if any of these metadata can be re-harvested from up-to-date sources, we can/should do that.
@landreev, I'm not able to delete the collection at https://dataverse.harvard.edu/dataverse/SND, which had the harvested datasets that have been removed. When I try, the UI shows the error message: Do you think there's still something in the database that's preventing that collection from being deleted?
@jggautier Yes, it looks like there was still some junk in the database referencing these Nesstar collections that prevented deleting them (old legacy stuff). I cleaned it up and was able to delete SND. Please let me know if you run into anything similar with the other collections.
Short version: This is very stale content that we have no means to refresh or to serve meaningfully. All these harvested objects do is pad our dataset counts. But they are more of an embarrassment than they are worth by now, IMO.
History: we haven't supported harvesting from Nesstar repositories since v. 4 (!!!). The Nesstar-based harvesting clients and the corresponding dataverses and harvested datasets we have in production were grandfathered from DVN v3 via database migration. That was done on the assumption that we would add Nesstar support or otherwise revisit the issue sometime soon (sigh). Nesstar as a system has been completely reimplemented since then, so if we want to harvest content from these repositories in the future it would need to be reimplemented from scratch on our end. A lot of this content is completely stale by now.
There is some evidence that these harvesting clients and the corresponding dataverses cannot be removed using the normal client manager: #142. So purging them may require some manual API and/or database work (just like creating them did early on).