Add Harvesting Client name to the Metadata Source facet #10464

Merged on Apr 10, 2024 (4 commits)
@@ -897,7 +897,8 @@ public SolrInputDocuments toSolrDocs(IndexableDataset indexableDataset, Set<Long

if (dataset.isHarvested()) {
solrInputDocument.addField(SearchFields.IS_HARVESTED, true);
solrInputDocument.addField(SearchFields.METADATA_SOURCE, HARVESTED);
solrInputDocument.addField(SearchFields.METADATA_SOURCE,
Member

Looks good. This probably needs a release note about the feature that also mentions that (async/background) reindexing is needed to populate the facet.

Contributor Author

Hi @qqmyers 👋🏼

Thanks! I will add the release note shortly. I am looking at another PR right now, but I will add it ASAP. Regarding the tests, I am not sure; I will look into that as well.

Best,
Juan

Contributor Author

Note added and test condition added to the harvesting test to search by the new collection name. 😃

Contributor

Putting what I said during standup in writing:
Rather than using the HarvestingClient's nickname for this facet, we should probably use the name of the local collection into which the client is harvesting.
The clear advantages of doing that:

  1. This mirrors what we are doing for the local datasets: solrInputDocument.addField(SearchFields.METADATA_SOURCE, rootDataverse.getName());
  2. The name of the collection is likely to be more descriptive/better-looking to a human user
  3. While both the client nickname and the name of the local collection are chosen by the local admin, it is far easier to change the latter. The former is not editable at all. With the current implementation, if a prod. instance admin realizes that they named the harvesting client oai_3 and that's what's going to show up in the facet, the only way for them to address it would be to delete the client (and all the content associated with it), and re-create it with a better-looking nickname, then re-harvest.
  4. This will make it unnecessary to add an extra field with a descriptive label to the client class (as was mentioned during standup).
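
To make the two options above concrete, here is a minimal, self-contained sketch of how the facet label could be chosen under either approach. It is not the PR's code: `LocalCollection`, `HarvestingClient`, `Strategy`, and `facetLabel` are hypothetical stand-ins, the value of `HARVESTED` is assumed, and "Partner Archive" is an invented collection name.

```java
public class MetadataSourceLabel {
    // Hypothetical stand-ins for the real entities; only the fields needed here.
    record LocalCollection(String name) {}
    record HarvestingClient(String nickname, LocalCollection targetCollection) {}

    // Assumed fallback label, used when no better name is available.
    static final String HARVESTED = "Harvested";

    enum Strategy { CLIENT_NICKNAME, COLLECTION_NAME }

    // Returns the value that would be indexed in the Metadata Source facet.
    static String facetLabel(HarvestingClient client, Strategy strategy) {
        if (client == null) {
            return HARVESTED;
        }
        String label = switch (strategy) {
            case CLIENT_NICKNAME -> client.nickname();
            case COLLECTION_NAME -> client.targetCollection() == null
                    ? null
                    : client.targetCollection().name();
        };
        // Fall back to the generic label if the chosen name is missing or blank.
        return (label == null || label.isBlank()) ? HARVESTED : label;
    }

    public static void main(String[] args) {
        HarvestingClient client =
                new HarvestingClient("oai_3", new LocalCollection("Partner Archive"));
        System.out.println(facetLabel(client, Strategy.CLIENT_NICKNAME)); // oai_3
        System.out.println(facetLabel(client, Strategy.COLLECTION_NAME)); // Partner Archive
        System.out.println(facetLabel(null, Strategy.COLLECTION_NAME));   // Harvested
    }
}
```

With the collection-name strategy, an admin who dislikes the label only has to rename the collection (and reindex), whereas the nickname strategy ties the label to a value that cannot be edited.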

Contributor Author

What would happen if someone has multiple clients harvesting into the same collection? 🤔 Should we consider this scenario? Also, @DS-INRA commented on the issue a few minutes ago confirming that the name would be what they need, which, as I understand it, is what the other PR associated with this one covers.

Contributor

It is possible to harvest into the same collection, yes. But then one could argue that if the local admin wants datasets from different OAI archives (or sets/clients) to appear as the same collection to their users, they may actually prefer to have them under the same facet too...
All that said, I agree that it makes sense to implement it the way the original requestor wants it to work. But let's make sure everybody is on the same page. Let me ask some follow-up questions in the issue.
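
The multiple-clients scenario discussed here can be illustrated with a small sketch that counts facet values under each labeling choice. Everything in it is hypothetical: the `HarvestedDataset` record, the `facetCounts` helper, and the sample nicknames and collection name are invented for illustration and are not part of the Dataverse codebase.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FacetGroupingSketch {
    // Hypothetical flattened view of a harvested dataset.
    record HarvestedDataset(String title, String clientNickname, String collectionName) {}

    // Counts datasets per facet value, given a function that picks the label.
    static Map<String, Long> facetCounts(List<HarvestedDataset> datasets,
                                         Function<HarvestedDataset, String> labelOf) {
        return datasets.stream()
                .collect(Collectors.groupingBy(labelOf, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Two clients harvesting into the same local collection.
        List<HarvestedDataset> datasets = List.of(
                new HarvestedDataset("A", "oai_1", "Partner Archives"),
                new HarvestedDataset("B", "oai_2", "Partner Archives"),
                new HarvestedDataset("C", "oai_2", "Partner Archives"));

        // Labeling by client nickname keeps the two sources apart.
        System.out.println(facetCounts(datasets, HarvestedDataset::clientNickname));
        // prints {oai_1=1, oai_2=2}

        // Labeling by collection name merges them into a single facet value.
        System.out.println(facetCounts(datasets, HarvestedDataset::collectionName));
        // prints {Partner Archives=3}
    }
}
```

Which of the two outputs is preferable depends on whether the admin wants users to distinguish the sources, which is the open question raised for the issue.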

dataset.getHarvestedFrom() != null ? dataset.getHarvestedFrom().getName() : HARVESTED);
} else {
solrInputDocument.addField(SearchFields.IS_HARVESTED, false);
solrInputDocument.addField(SearchFields.METADATA_SOURCE, rdvName); //rootDataverseName);