Unable to create harvesting set of datasets in "Original Murray Collection" #124

jggautier · 2021-09-27T20:40:18Z

We need to create a harvesting set containing "original" Murray datasets (this discussion and work is tracked in #68). Datasets in the Murray Research Archive Dataverse were reorganized so that the "original" datasets are within a newly created Dataverse collection (or within its subcollections) called Original Murray Collection (https://dataverse.harvard.edu/dataverse/originalMRA), which is within the Murray Research Archive Dataverse.

When I try to create a harvesting set of the datasets in the newer Original Murray Collection, the "Create Harvesting Set" popup tells me that the search query returned no results:

5112855 is the database ID of the Original Murray Collection.

And when I created the set, no datasets were in it.

The search page and the Search API return the 339 datasets that are within the Original Murray Collection and that we need to be in a harvesting set.

Some troubleshooting

Harvard Dataverse Repository and Demo Dataverse were both on v5.6 when I tried creating the harvesting set and did the following troubleshooting.

To see if the issue was related to all of the datasets being moved into the collection (as opposed to being created in the collection), on Demo Dataverse I created a new Dataverse collection, moved already-published datasets into that new collection, and tried to create a harvesting set. Demo Dataverse told me that the search query returned no results.

To see if the issue might instead or also be related to trying to create a harvesting set of datasets contained in a relatively new collection (created past a certain date), I found a collection on Harvard Dataverse Repository published today with a published dataset (also created today), and tried to create a harvesting set using that "subtree" query to include datasets in that collection. The "Create Harvesting Set" popup told me that it found one dataset.

I didn't want to try isolating the issue further by moving or creating new collections or datasets on the Harvard Dataverse Repository (as opposed to Demo Dataverse) because of the extra work involved in notifying people about the testing and destroying datasets or moving others' datasets back to their original location.

But hopefully this is helpful for more investigation into what's not allowing me to create a harvesting set that contains the datasets in the Original Murray Collection (https://dataverse.harvard.edu/dataverse/originalMRA).

djbrooke · 2021-10-27T18:44:36Z

Check if the subtreePaths has to include the database ID of all parent collections of the collection whose datasets need to be in the harvesting set (excluding the "Root" Dataverse collection).

That is, the query for creating a harvesting set containing datasets in the Original MRA Collection should be subtreePaths:"/10/5112855", since the ID for the parent MRA Dataverse collection is 10. If that's how it should work, update the guide (@djbrooke and @jggautier)

jggautier · 2021-10-27T19:09:23Z

subtreePaths:"/10/5112855" worked. https://dataverse.harvard.edu/oai?verb=ListRecords&set=Original_MRA_Collection&metadataPrefix=oai_dc

So the subtreePaths has to include the database IDs of each of the collection's parent collections (excluding the "Root" Dataverse collection). I suppose it's not really a "path" if it doesn't include a kind of breadcrumb to the collection whose datasets need to be in the harvesting set. So a user could infer from the word "path" in "subtreePaths" that it must include the database IDs of its parent collections.
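To make the rule above concrete, here is a minimal sketch of how the query string could be assembled from a collection's chain of database IDs. The helper name `build_subtree_paths_query` is hypothetical and not part of Dataverse; it only formats the string and assumes you already know the IDs of the parent collections (excluding Root), listed from the top-most parent down to the target collection.

```python
def build_subtree_paths_query(collection_ids):
    """Format a subtreePaths query from database IDs.

    collection_ids: IDs ordered from the top-most parent collection
    (excluding the "Root" collection) down to the target collection.
    This is an illustrative helper, not a Dataverse API.
    """
    if not collection_ids:
        raise ValueError("at least the target collection's ID is required")
    # Join the IDs into a breadcrumb-style path, e.g. /10/5112855
    path = "/" + "/".join(str(int(cid)) for cid in collection_ids)
    return f'subtreePaths:"{path}"'


# MRA Dataverse collection (10) -> Original Murray Collection (5112855)
query = build_subtree_paths_query([10, 5112855])
print(query)
```

For the Original Murray Collection this produces the same string that worked in the "Create Harvesting Set" popup.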

We could include this explanation in the guides.

In the v5.7 Admin Guide, here's the part of the "Managing Harvesting Server and Sets" page that describes subtreePaths:

I'm also wondering why the system couldn't figure out the path on its own, given the ID of the collection.

djbrooke · 2021-10-27T19:11:19Z

@jggautier - great that it works! I'll make a PR with a change that describes this in more detail - I didn't know about this either.

pdurbin · 2021-10-27T19:19:34Z

> I'm also wondering why the system couldn't figure out the path on its own, given the ID of the collection.

I just thought I'd pipe in and say the system does figure it out for the Search API and you can pass the alias of the dataverse collection. The logic is all in Search.java and is only used by the Search API but it could be centralized. The Search API ultimately uses subtreePaths under the covers but it's pretty low-level. As you can see, you have to put database IDs in it. I think a longer term fix would be for an issue with a title something like "For harvesting, deprecate subtreePaths and introduce 'subtree' variable like the Search API". That is, make harvesting as easy as the Search API when it comes to creating the query. Centralize the logic. Stop using the low-level subtreePaths in harvesting. I hope that makes sense.
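For comparison, here is a rough sketch of the Search API side, where the collection alias is enough and no database IDs are needed. The helper name `search_api_url` is made up for illustration; it only builds the URL and does not send a request. The alias `originalMRA` is taken from the collection URL earlier in this issue.

```python
from urllib.parse import urlencode


def search_api_url(base_url, alias, query="*"):
    """Build a Search API URL limited to datasets under one collection.

    Illustrative helper only: it formats the URL and sends no request.
    """
    params = {
        "q": query,         # search everything by default
        "subtree": alias,   # collection alias, not a database ID
        "type": "dataset",  # only return datasets
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


url = search_api_url("https://dataverse.harvard.edu", "originalMRA")
print(url)
```

The contrast with the harvesting set query is the point: here the alias alone identifies the subtree, while subtreePaths needs the full chain of database IDs.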

jggautier added the "bug" label (Something isn't working) Sep 27, 2021

djbrooke mentioned this issue Oct 28, 2021

doc updates to better explain harvest server query format for subcollections IQSS/dataverse#8197

Merged

pdurbin added a commit to IQSS/dataverse that referenced this issue Oct 28, 2021

link to API guide IQSS/dataverse.harvard.edu#124

7b71433

kcondon closed this as completed in IQSS/dataverse#8197 Nov 1, 2021

jggautier mentioned this issue Jan 25, 2022

Create OAI-PMH set for Murray dataverse #68

Closed
