Preferring original resources in presentation of record duplicates #316

Open
teckart opened this issue Dec 1, 2020 · 2 comments
teckart commented Dec 1, 2020

In cases where the VLO importer identifies record duplicates (currently based on name and language), the record presented on the search page might not be the one from the resource owner but another record provided by an external catalogue. Ways to reduce this behaviour have to be evaluated and implemented.

Example: "Arabic Speech Corpus" OTA vs. ELRA

Helpful links:

@twagoo twagoo added this to the VLO 4.10 milestone Dec 1, 2020

teckart commented Dec 3, 2020

The Solr collapsing mechanism provides min/max/sort parameters to select a group's head. We could create an (optional) index field that indicates the preference for a specific resource based on its origin and use it in the query, but it is still unclear what information we would base it on. We could, for example, maintain a list of endpoints that are mostly "aggregators" (of external resources) for downvoting, but this would mean additional configuration and maintenance and would be somewhat arbitrary in some cases (like LINDAT's "LRT inventory"). The same might apply when preferring one dataProvider over others.
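To make the idea concrete, here is a minimal sketch of how such a preference field could be passed to Solr's collapse query parser via its `sort` local parameter. The field names `_signature` (grouping key) and `originPreference` (higher value = closer to the resource owner) are assumptions for illustration only, not existing VLO configuration.

```python
from urllib.parse import urlencode


def collapse_filter(group_field, preference_field=None):
    """Build a Solr collapse filter query (fq) string.

    Both field names are supplied by the caller; the ones used below
    are hypothetical and only serve as an illustration.
    """
    if preference_field is None:
        # Without min/max/sort, the highest-scoring document heads each group.
        return "{!collapse field=%s}" % group_field
    # Prefer the highest preference value; fall back to relevance on ties.
    return "{!collapse field=%s sort='%s desc, score desc'}" % (
        group_field,
        preference_field,
    )


def search_params(query, group_field, preference_field=None):
    """Assemble request parameters for a collapsed search."""
    return {"q": query, "fq": collapse_filter(group_field, preference_field)}


# Hypothetical field names: "_signature" and "originPreference".
params = search_params("arabic speech corpus", "_signature", "originPreference")
print(urlencode(params))
```

Keeping `score desc` as the secondary sort criterion means relevance still decides between records with the same preference value.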

@teckart teckart self-assigned this Dec 3, 2020

twagoo commented Dec 4, 2020

Something to keep in mind: we already have boosts in place for things like availability, presence of description, position in hierarchy (see solrconfig.xml) that now help determine the group's head. By default the selection takes into account relevance with respect to the query as well.

We will have to carefully decide whether we want to add logic 'on top' of this, or have a completely separate policy for the selection of the head. I don't have a clear preference right now but we have to make sure that we don't inadvertently discard a useful ranking mechanism.
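The difference between the two options can be sketched outside of Solr with a small in-memory simulation. Everything here is hypothetical: the `Record` fields, the `preference` value, the `weight` parameter and the sample scores are assumptions chosen to show how the outcomes can diverge, not measurements from the VLO index.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    group: str        # duplicate signature, e.g. derived from name + language
    score: float      # relevance including the existing boosts
    preference: int   # hypothetical origin preference, higher = more original


def _heads(records, key):
    """Pick one head per group: the record with the largest key."""
    heads = {}
    for record in records:
        current = heads.get(record.group)
        if current is None or key(record) > key(current):
            heads[record.group] = record
    return {group: record.record_id for group, record in heads.items()}


def heads_on_top(records, weight=0.5):
    """Option A: add the origin preference on top of the existing score."""
    return _heads(records, key=lambda r: r.score + weight * r.preference)


def heads_separate(records):
    """Option B: origin preference decides; score only breaks ties."""
    return _heads(records, key=lambda r: (r.preference, r.score))


# Made-up sample: the aggregator copy scores higher than the owner's copy.
records = [
    Record("owner-1", "arabic-speech|ar", score=2.0, preference=1),
    Record("aggregator-1", "arabic-speech|ar", score=3.0, preference=0),
]

print(heads_on_top(records))
print(heads_separate(records))
```

With these sample values, option A keeps the aggregator record as head unless the weight is large enough to outweigh the score gap, while option B always selects the owner's record and uses the existing ranking only among equally preferred records.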

@twagoo twagoo modified the milestones: VLO 4.10, 4.11 Apr 22, 2021
@twagoo twagoo modified the milestones: VLO 4.11, VLO 4.12 Sep 20, 2022