DwC-A: cannot import occurrence data for existing species #2581

Open
sergeitarasov opened this issue Oct 14, 2021 · 23 comments

@sergeitarasov

sergeitarasov commented Oct 14, 2021

As an experiment, I tried to import (many times with different modifications) one occurrence record using DwC-A (attached) with ‘restrict record to existing nmcl’ to match the species that is already in TW. But it does not work:

Protonym Parachorius not found with that name and/or classification. Importing new names is disabled by import settings.

I wonder how I can fix it? :)

DwC_Parach.xlsx

@mjy added the "question" label (Person who filed the issue wants feedback from others) on Oct 14, 2021
@LocoDelAssembly
Contributor

Problem seems to be that the importer is trying to locate Parachorius thomsoni Harold, 1873 directly under root since the dataset is not providing higher classification.

I could probably change the way existing names are located when restricting is enabled, @mjy? This would mean ignoring the higher classification if the name cannot be found in the provided parent path (it would also fix cases where the dataset does not agree with the existing classification in the database). This requires a fairly significant algorithm change.

@mjy
Member

mjy commented Oct 14, 2021

I think that he doesn't want to restrict, he wants them to be created, correct @sergeitarasov? He is just missing an option somewhere?

@proceps
Contributor

proceps commented Oct 14, 2021

I can envision many cases where the classification does not match. I would say that the classification should be ignored and only the name string should match. There could be some issues, though. For example, we may have both a Protonym and a Combination with the same ScientificName. We may also have homonyms. In some cases a manual resolution would still be required.

@proceps
Contributor

proceps commented Oct 14, 2021

Following @mjy, my understanding is that @sergeitarasov specifically restricted creation of new names; he wants to link specimen records to the existing classification. That would be the requirement in most cases when we import data to the 3i Auchenorrhyncha project as well.

@mjy
Member

mjy commented Oct 14, 2021

We should definitely maintain the mode where import only succeeds when the hierarchy fully matches as an option, and the default. Having an alternate mode where name matches OTU#name or Otu.taxon_name.cached only (and there is only one match), as you imply, is also useful.
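
A rough sketch of the two modes discussed above, for illustration only. This is not TaxonWorks code: the `Name` record, the in-memory list of names, and both lookup functions are hypothetical stand-ins for the real models and queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for an existing taxon name record."""
    name: str    # full name string, e.g. "Parachorius thomsoni"
    path: tuple  # ancestors from highest to lowest, Root excluded


def find_strict(names, name, path):
    """Strict mode: the name AND the whole parent path must match."""
    hits = [n for n in names if n.name == name and n.path == tuple(path)]
    return hits[0] if len(hits) == 1 else None


def find_by_name_only(names, name):
    """Relaxed mode: ignore classification, accept only a unique match."""
    hits = [n for n in names if n.name == name]
    return hits[0] if len(hits) == 1 else None


existing = [
    Name("Parachorius thomsoni",
         ("Scarabaeinae", "Parachoriini", "Parachorius")),
]

# A dataset row that provides no higher classification at all.
strict_hit = find_strict(existing, "Parachorius thomsoni", ())
relaxed_hit = find_by_name_only(existing, "Parachorius thomsoni")
```

In this sketch the strict lookup finds nothing for a row without higher classification, while the relaxed lookup succeeds as long as exactly one name matches. Homonyms produce more than one hit, so the relaxed lookup returns nothing and the row would still need manual resolution.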

@sergeitarasov
Author

> Following @mjy, my understanding is that @sergeitarasov specifically restricted creation of new names; he wants to link specimen records to the existing classification. That would be the requirement in most cases when we import data to the 3i Auchenorrhyncha project as well.

Yep, that's correct @proceps, I would like to link the records to the existing sp. This is going to be my most frequent task when importing with DwC-A. Do you have any idea how I can fix it now? By adding a higher-level taxon to the CSV?

@mjy
Member

mjy commented Oct 14, 2021

Yes, adding the higher levels will make it work. The classification must match all the way up.

@proceps
Contributor

proceps commented Oct 14, 2021

It is hard to envision managing classification in DwC. We have a table for 250 holotypes (just species names); updating the higher classification all the way up would be a job comparable to creating collection objects manually using the Comprehensive task.

@sergeitarasov
Author

Current classification on TW: 'Parachorius thomsoni -> Parachorius-> Parachoriini-> Scarabaeinae-> Root'
I added 'tribe' and 'subfamily' but the import still returns the same error. Does that mean that I need to change the current classification to include the entire taxonomic path (all the way to Animalia)?

@mjy
Member

mjy commented Oct 14, 2021

Not too hard. A pre-step is to build something like the geographic name matcher service we have. You paste in one column of names, you get the matching higher names back, and you paste those into your columns.

Again, both modes are warranted, I'm not debating that, but people importing data from diverse datasets are going to want the strict mode as well.
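
A minimal sketch of what such a name matcher could do, assuming the existing classification is available as a simple child-to-parent map. The map, the function names, and the pipe-delimited output format (the same shape as the higherClassification value tried later in this thread) are assumptions for illustration, not an existing TaxonWorks service.

```python
def higher_classification(name, parent_of, root="Root"):
    """Walk up a child -> parent map; return ancestors joined by '|', highest first."""
    ancestors = []
    seen = {name}  # guard against accidental cycles in the map
    current = parent_of.get(name)
    while current is not None and current != root and current not in seen:
        ancestors.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return "|".join(reversed(ancestors))


def match_column(names, parent_of):
    """For each pasted name, return (name, classification), or (name, None) if unknown."""
    return [
        (n, higher_classification(n, parent_of) if n in parent_of else None)
        for n in names
    ]


# Hypothetical snapshot of the classification reported in this thread.
parent_of = {
    "Parachorius thomsoni": "Parachorius",
    "Parachorius": "Parachoriini",
    "Parachoriini": "Scarabaeinae",
    "Scarabaeinae": "Root",
}

rows = match_column(["Parachorius thomsoni", "Unknown name"], parent_of)
```

Names that are not found come back with `None`, so they can be reviewed by hand before anything is pasted into the spreadsheet.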

@mjy
Member

mjy commented Oct 14, 2021

This is literally the challenge everyone wants to solve "trivially", which is anything but trivial when you want to make many decisions about your data.

For your data @proceps you have already pre-validated it all, this is different from others bringing in data that they haven't looked at.

@sergeitarasov
Author

sergeitarasov commented Oct 14, 2021

I added 'higherClassification': Scarabaeinae|Parachoriini|Parachorius
Now the error:
’Protonym thomsoni not found with that name and/or classification. Importing new names is disabled by import settings.’

@LocoDelAssembly
Contributor

Which project is this? Is it in production?

@sergeitarasov
Author

> Which project is this? Is it in production?

Yep, in production 'Dung_Beetles'

@LocoDelAssembly
Contributor

Getting the production database onto my development machine for testing. Will take me around 15 mins to set up.

@LocoDelAssembly
Contributor

@sergeitarasov Sorry for the late reply, I had some problems with the database and a long meeting afterwards.

I tried with the attached spreadsheet and it had no trouble handling the name, but complained that the repository referenced in institutionCode with acronym AN does not exist (I changed it to something else to test):
image
image

Spreadsheet used: DwC_Parach.xlsx (higherClassification added on the far right)

@sergeitarasov
Author

Thanks @LocoDelAssembly!
I tried your file and fixed the institution acronym. It still does not work for me, though, if I enable 'Restrict import to existing nomenclature only'. The error is the same: it cannot find the protonym thomsoni.
However, if I do not restrict then it works but creates another Parachorius thomsoni within the existing Parachorius.

@LocoDelAssembly
Contributor

@sergeitarasov right, sorry. On the first try I forgot to restrict to existing nomenclature, so it created the duplicate, but on the second try I enabled it and it still succeeded. Investigating why this is happening...

image

@sergeitarasov
Author

The second try works for me too (with the restriction). TW adds the records to the P. thomsoni that was previously imported with DwC-A, but not to the original P. thomsoni.

@LocoDelAssembly
Contributor

Found the problem. The importer was expecting authorship and year as data in the taxon name itself (as that is the way it creates them), but was failing in this case because the authorship is derived from the original citation. Fixed by matching on rank and name only; it now matches the existing name:
image
image
image
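
To illustrate the difference described above, here is a small sketch. It is not the importer's actual code: the `Protonym` record and its `verbatim_author` and `year` fields are hypothetical, chosen only to mimic a name whose authorship lives in a linked citation rather than on the name itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Protonym:
    """Hypothetical record; author and year may be empty on the name itself."""
    name: str
    rank: str
    verbatim_author: Optional[str] = None
    year: Optional[int] = None


def match_with_authorship(existing, name, rank, author, year):
    """Old behaviour in this sketch: author and year must also be stored on the name."""
    return [
        p for p in existing
        if (p.name, p.rank, p.verbatim_author, p.year) == (name, rank, author, year)
    ]


def match_rank_and_name(existing, name, rank):
    """Behaviour after the fix in this sketch: compare rank and name only."""
    return [p for p in existing if (p.name, p.rank) == (name, rank)]


# Authorship is derived from the original citation, so these fields are empty.
existing = [Protonym("thomsoni", "species")]

old_hits = match_with_authorship(existing, "thomsoni", "species", "Harold", 1873)
new_hits = match_rank_and_name(existing, "thomsoni", "species")
```

With the stricter comparison the dataset's "Harold, 1873" never equals the empty fields, so nothing is found; comparing rank and name alone finds the existing record.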

@LocoDelAssembly
Contributor

@sergeitarasov forgot to mention, this fix won't be available until we release 0.20.1. Please be sure to delete the duplicate name created by the importer to avoid confusion.

@sergeitarasov
Author

Works for me now! Thanks for the help @LocoDelAssembly :)
Two quick Qs:

  1. Does that mean that the import currently behaves differently for a taxon name linked to an author citation vs. one linked to verbatim authorship?
  2. Approximately when will TW 0.20.1 be released?

@mjy
Member

mjy commented Mar 22, 2023

Can we close this?
