taxon name, formatting, and uniqueness #1704

Closed
dustymc opened this issue Sep 21, 2018 · 32 comments
Labels: Function-Taxonomy/Identification; Help wanted (I have a question on how to use Arctos); NeedsDocumentation (When the issue is resolved in Arctos repository, this should be moved to the Documentation-wiki repo); Priority-High (Needed for work) (High because this is causing a delay in important collection work)

Comments

@dustymc
Contributor

dustymc commented Sep 21, 2018

https://arctos.database.museum/taxonomy.cfm?taxon_name=Liopistha

contains

Liopistha concentrica

and

Liopistha (Psilomya) concentrica

(and Liopistha (Psilomya) concentricia)

That feels WRONG to me - I don't think the inclusion of the subgenus changes the "taxon concept" enough to warrant a new name in Arctos. I doubt I can reliably prevent that behavior either.

  • is this something we want to encourage/accept/tolerate?
  • if not, what should we do about it?
  • if so, can someone please document whatever it is that we're doing here?

(For whatever it's worth, I'd probably get rid of "traditional subgenus formatting" altogether. The inconsistent formatting makes creating hierarchies difficult, makes users guess what we've done to find specimens, and is indistinguishable from other "traditional" uses of that format.)

@dustymc dustymc added Function-Taxonomy/Identification NeedsDocumentation When the issue is resolved in Arctos repository, this should be moved to the Documentation-wiki repo Help wanted I have a question on how to use Arctos labels Sep 21, 2018
@dustymc dustymc added this to the Needs Discussion milestone Sep 21, 2018
@campmlc

campmlc commented Sep 21, 2018 via email

@DerekSikes

DerekSikes commented Sep 21, 2018 via email

@Jegelewicz Jegelewicz added this to Not Reviewed in Taxonomy Committee Sep 21, 2018
@dustymc
Contributor Author

dustymc commented Sep 28, 2018

@campmlc most all taxa are eventually changed, so it's good to be able to find related things; this actively prevents that by ~doubling the number of records that have to be tracked down and linked up (or I don't understand something). No relationships were added with the recently-created names.

It's only necessary to find records entered by one of the variants if we allow the variants. Minimally, I'd like documentation here. "When encountering {situation that's preventing users from finding specimens at this very moment}, do {something that might somehow minimize that}." Ideally I'd like rules - "disallow parentheses in taxon_name.scientific_name" is absolutely where I'd go, but that's from the perspective of a data-shuffler and not a taxonomist. I'm asking ya'll where we can or should go with this.

Note also that display_name is automagically generated and what's generally displayed. Given taxon name=Bla bla+genus=Bla + subgenus=Bla + species=Bla bla I can show most users "Bla (Bla) bla" by slightly altering the code which generates display name.
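As a rough illustration of that idea, here is a minimal Python sketch of building a "traditional" display string from separate classification terms. The function name, its parameters, and the rule of dropping the first word of the species term are assumptions made for illustration; this is not the actual Arctos code that generates display_name.

```python
def generate_display_name(genus, subgenus=None, species=None, author=None):
    """Build an italicized display string from classification terms.

    Hypothetical helper: `species` is assumed to hold the full binomial
    (e.g. "Bla bla"), so its first word (the genus) is dropped.
    """
    parts = [genus]
    if subgenus:
        # Subgenus lives in the classification, not in the taxon name;
        # it is only re-inserted here, in parentheses, for display.
        parts.append(f"({subgenus})")
    if species:
        parts.extend(species.split()[1:])
    name = "<i>" + " ".join(parts) + "</i>"
    return f"{name} {author}" if author else name


print(generate_display_name("Bla", subgenus="Bla", species="Bla bla"))
# <i>Bla (Bla) bla</i>
```

The taxon name itself stays "Bla bla"; only the generated display string carries the parenthetical subgenus.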

former is the same species as the latter.

I guess that's the root of my question - is there some difference in something that we care about between the two, or is this entirely "tradition"? E.g., given two specimens, is there a defensible reason for someone to catalog one of them as "Liopistha (Psilomya) concentrica" and the other as "Liopistha concentrica"?

There's more discussion of string-variants and automation and "preferred" in #757.

@campmlc

campmlc commented Sep 28, 2018 via email

@dustymc
Contributor Author

dustymc commented Sep 28, 2018

I think you may be mixing up names and classifications, but I'm not really sure of that either.

Agents are NOT analogous. I can name my poor kid 𓀉 𓃡 * aB1한글⌛ (Unicode is fun!) and you just have to deal with it. I very fortunately can't do that with taxonomy - we should (theoretically, anyway) be able to definitively say "that string does not appear as anything which might be considered a name in any publication which anyone might consider to be part of the taxonomic literature, so we won't accept it as a taxon name."

There are a couple million species-names in Arctos, and 97 of them have subgenus. Those are incredibly difficult to integrate with everything else. I see a few possibilities:

  1. Do nothing, except maybe document that subgenera are weird and it's likely nobody will find your specimens if you use them.
  2. Do something as names - allow subgenera, try to magic-link them to the "normal" format or add documentation or whatever.
  3. Do something in classifications and ban parentheses from taxon names. This could be fully automated (through display_name), or we could add some sort of "alternative namestring" concept, or whatever.

I think the best answer comes from the answer to

given two specimens, is there a defensible reason for someone to catalog one of them as "Liopistha (Psilomya) concentrica" and the other as "Liopistha concentrica"

and if that works the way I think/hope then (3) above is probably the best approach (which still doesn't mean it's what we have to do - maybe tradition wins??).
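The "ban parentheses from taxon names" part of option (3) could be sketched as a simple check applied before a name is accepted. This is a Python illustration only; the function name is made up, and how Arctos would actually enforce the rule (for example, as a database constraint) is not stated in this thread.

```python
def validate_scientific_name(scientific_name: str) -> str:
    """Reject any candidate taxon name that contains parentheses.

    Hypothetical check: subgenus (and author) information is expected
    to live in the classification rather than in the name string.
    """
    if "(" in scientific_name or ")" in scientific_name:
        raise ValueError(
            f"parentheses are not allowed in taxon names: {scientific_name!r}"
        )
    return scientific_name


validate_scientific_name("Liopistha concentrica")  # accepted
try:
    validate_scientific_name("Liopistha (Psilomya) concentrica")
except ValueError as err:
    print(err)
```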

@campmlc

campmlc commented Sep 28, 2018 via email

@DerekSikes

DerekSikes commented Sep 28, 2018 via email

@dustymc
Contributor Author

dustymc commented Sep 28, 2018

too soon

Na. I might not be able to implement something, but I can probably avoid painting myself into that particular corner if I know what ya'll are thinking.

use subgenera

I'm not proposing to eliminate subgenera from the model, just wondering if we can avoid the funky formatting - eg, move them from the name into the classification, like we've done with all other similar data.

parentheses around author names

Those are classification data. "Echidna" is the name, "Echidna Forster, 1788" is metadata of the name (eg, classification data).

consistent

A model which allows nothing else will do it!

Taxon Concepts

I've never quite figured out how that could be usably implemented from the taxonomy side. I think the botanists have this one correct: it's "identification concepts" and not taxonomy at all. That does, to the extent I understand the concept of taxon concepts, the same sort of THING, at least from a specimen/collection perspective: "this critter is a member of that group-of-critters according to how this publication circumscribes said group-of-critters" - but it does it without making the mess that is taxonomy into a messier mess. This has been implemented in Arctos for quite some time.

@dustymc
Contributor Author

dustymc commented Oct 2, 2018

Lacking further feedback, I'll move forward with this.

Ursus (Euarctos) will be deleted, and parentheses will be disallowed in taxon_name.scientific_name.

Nothing will change when subgenus is not included.

screen shot 2018-10-02 at 11 18 57 am

UAM@ARCTOSTE>  select generateDisplayName('1076622') from dual;

GENERATEDISPLAYNAME('1076622')
------------------------------------------------------------------------------------------------------------------------
<i>Ursus</i>

Euarctos will be created as a name. The classification for a subgenus-term should include genus.

screen shot 2018-10-02 at 11 14 41 am

Display name will be formatted in the traditional way:


UAM@ARCTOSTE>  select generateDisplayName('1ADFAF35-B80C-E3B7-29DF47F8A5FA241C') from dual;

GENERATEDISPLAYNAME('1ADFAF35-B80C-E3B7-29DF47F8A5FA241C')
------------------------------------------------------------------------------------------------------------------------
<i>Ursus (Euarctos)</i> Some Dude, Probably

Subgeneric terms will also generate "traditional" display name when subgenus is included.

screen shot 2018-10-02 at 11 21 42 am

 UAM@ARCTOSTE>  select generateDisplayName('97') from dual;

GENERATEDISPLAYNAME('97')
------------------------------------------------------------------------------------------------------------------------
<i>Ursus (Euarctos) americanus</i> Pallas 1780

screen shot 2018-10-02 at 11 22 20 am

UAM@ARCTOSTE> 
GENERATEDISPLAYNAME('1076623')
------------------------------------------------------------------------------------------------------------------------
<i>Ursus (Euarctos) americanus amblyceps</i>

If there are no objections or better ideas in ~a week, I'll move this to production.

Here are current subgenus-like names:

select scientific_name from taxon_name where scientific_name like '%(%';

SCIENTIFIC_NAME
------------------------------------------------------------------------------------------------------------------------
Anodontia (Anodontia) alba
Cassis (Echinophoria)
Chiton (Rhyssoplax) olivaceus
Galba (Galba) palustris
Lepidochitona (Lepidochitona) cinerea
Liopistha (Psilomya) concentrica
Liopistha (Psilomya) concentricia
Liopistha (Psilomya) elongata
Marginella (Prunum)
Neoechinorhynchus (Neoechinorhynchus)
Odostomia (Boonea) bisuturalis
Oxytropidoceras (Adkinsites) belknapi
Oxytropidoceras (Adkinsites) imlayi
Oxytropidoceras (Manuaniceras)
Oxytropidoceras (Manuaniceras) elaboratum
Oxytropidoceras (Oxytropidoceras) bravoensis
Paludina (Vivipara)
Pecten (Camptonectes) bubonis
Pecten (Pallium)
Potamides (Pirenella)
Sigaretus (Eunaticina) textilis
Symphynota (Lasmigona) costata
Ursus (Euarctos)
Vexillum (Costellaria)
Serpula (Cycloserpula) cragini
Fossarus (Gottoina) bella
Physa (Gyrina)
Alaria (Paralaria)
Alasmidonta (Pressodonta) calceola
Aporrhais (Perissoptera) prolabiata
Anchura (Drepanocheilus) mudgeana
Anchura (Drepanochilus) calcaris
Cardium (Granocardium) tippanum
Cardium (Trachycardium) longstreeti
Fulvia (Fulvia) laevigata
Mangilia (Pleurotomella) blakeana
Cucullaea (Idonearca) capax
Pedetontus (Verhoeffilis)
Fasciolaria (Piestochilus) galpiniana
Meretrix (Flaventia) belviderensis
Ostrea (Lopha) subovata
Diplopoma (Troschelvindex)
Clava (Clava) fasciata
Plagiola (Amygdalonaias) donaciformis
Plagiola (Amygdalonaias) elegans
Labiostomum (Eugenuris)
Astraea (Fissicella) denticulata
Turritella (Haustator) whitei
Turritella (Itaustator) whitei
Pachydiscus (Pachydiscus) kamishakensis
Plagiorchis (Plagiorchis)
Catostomus (Pantosteus) clarkii
Catostomus (Pantosteus) discobolus
Catostomus (Pantosteus) plebeius
Hexaplex (Trunculariopsis) princeps
Xenophora (Onustus) exuta
Lampsilis (Eurynia) iris
Lampsilis (Lampsilis) multiradiatus
Lampsilis (Proptera) alata
Lampsilis (Proptera) alatus
Lampsilis (Proptera) amphichaenus
Lampsilis (Proptera) inflata
Lampsilis (Proptera) purpuratus
Calliostoma (Jujubinus) exasperatum
Lymnaea (Galba)
Quadrula (Fusconaia) rubiginosa
Quadrula (Fusconaia) solida
Quadrula (Theliderma) asper
Quadrula (Theliderma) lachrymosa
Quadrula (Theliderma) sphaerica
Pterotrigonia (Scabrotrigonia) thoracica
Ischnochiton (Stenoradsia) conspicuus
Mortoniceras (Angolaites) wintoni
Aleochara (Calochara)
Nassa (Amycla) corniculum
Nassarius (Plicarcularia) jonasi
Hybodus (Leiacanthus)
Gryphaea (Texigryphaea)
Gryphaea (Texigryphaea) belviderensis
Gryphaea (Texigryphaea) navia
Gryphaea (Texigryphaea) pitcheri
Gryphaea (Texigryphaea) tucumcarii
Gryphaea (Texigryphaea) washitaensis
Alabastrina (Alabastrina)
Alabastrina (Alabastrina) subvanvincquiae
Alabastrina (Atlasica)
Alabastrina (Atlasica) aguergourensis
Alabastrina (Atlasica) interica
Alabastrina (Atlasica) tamanarensis
Alabastrina (Atlasica) tildiana
Alabastrina (Siretia)
Alabastrina (Siretia) pallaryi
Pleuriocardia (Dochmocardia) pauperculum
Pleurobema (Pleurobema) raveneliana
Elliptio (Elliptio) gibbosus
Amphidonte (Ceratostreon)
Amphidonte (Ceratostreon) texanum

97 rows selected.

and names which contain subgenus-bearing local classifications:

select scientific_name from taxon_name, taxon_term where taxon_name.taxon_name_id=taxon_term.taxon_name_id and
source in ('Arctos','Arctos Plants') and term_type='subgenus';

are https://docs.google.com/spreadsheets/d/144cwe4pPmcVduFye9ikn_J6Hty7OUtcAS8idNpgaAeM/edit?usp=sharing
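To make the intended migration concrete, here is a hedged Python sketch of splitting one of the parenthesized names listed above into a bare taxon name plus a subgenus term for the classification. The regular expression and function name are assumptions for illustration, not the procedure actually used; the pattern cannot tell a subgenus from any other parenthetical, so results would still need review.

```python
import re

# "Genus (Subgenus)" optionally followed by one or more epithets.
LEGACY = re.compile(r"^(?P<genus>\S+) \((?P<subgenus>[^)]+)\)(?: (?P<rest>.+))?$")


def split_legacy_name(name):
    """Return (bare_name, subgenus); subgenus is None if no match."""
    m = LEGACY.match(name)
    if m is None:
        return name, None
    bare = m["genus"] if m["rest"] is None else f'{m["genus"]} {m["rest"]}'
    return bare, m["subgenus"]


print(split_legacy_name("Liopistha (Psilomya) concentrica"))
# ('Liopistha concentrica', 'Psilomya')
print(split_legacy_name("Ursus (Euarctos)"))
# ('Ursus', 'Euarctos')
```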

@dustymc dustymc modified the milestones: Needs Discussion, Next Task Oct 2, 2018
@dustymc dustymc added the Priority-High (Needed for work) High because this is causing a delay in important collection work.. label Oct 2, 2018
@dustymc dustymc self-assigned this Oct 2, 2018
@DerekSikes

DerekSikes commented Oct 2, 2018 via email

@dustymc dustymc modified the milestones: Next Task, Active Development Oct 8, 2018
@dustymc
Contributor Author

dustymc commented Oct 9, 2018

Attached are the intended identification -->taxon name targets.

temp_former_subgenus_ids.csv.zip

Affected collections are:

select
  substr(guid,1,instr(guid,':',1,2)) || ' @ ' || count(*)
from
  temp_former_subgenus_ids
group by
  substr(guid,1,instr(guid,':',1,2));

SUBSTR(GUID,1,INSTR(GUID,':',1,2))||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
MSB:Para: @ 49
MSB:Host: @ 2
UNR:Fish: @ 2
USNPC:Para: @ 3
MSB:Fish: @ 4648
UNM:ES: @ 137
KNWR:Ento: @ 1
UAM:ES: @ 1
CHAS:Inv: @ 138
UAM:Ento: @ 3

10 rows selected.

That should happen today; I plan to leave the ID strings and just switch the pointers to taxa.
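A minimal Python sketch of that plan, under stated assumptions: the ID string is left untouched, only the pointer to the taxon name changes, and affected records are tallied by collection prefix (everything up to and including the second colon of the GUID, as in the query above). The class, field names, and example GUIDs are hypothetical and do not reflect the real Arctos schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Identification:
    guid: str             # e.g. "MSB:Fish:1" (made-up GUID)
    scientific_name: str  # ID string as entered; never modified here
    taxon_name: str       # pointer to the taxon name record


def collection_prefix(guid: str) -> str:
    """Return the GUID up to and including its second colon."""
    second = guid.index(":", guid.index(":") + 1)
    return guid[: second + 1]


def repoint(identifications, targets):
    """Switch taxon pointers per `targets`; return per-collection counts."""
    counts = Counter()
    for ident in identifications:
        new_name = targets.get(ident.taxon_name)
        if new_name is not None:
            ident.taxon_name = new_name
            counts[collection_prefix(ident.guid)] += 1
    return counts


ids = [Identification("MSB:Fish:1", "Catostomus (Pantosteus) clarkii",
                      "Catostomus (Pantosteus) clarkii")]
print(repoint(ids, {"Catostomus (Pantosteus) clarkii": "Catostomus clarkii"}))
# Counter({'MSB:Fish:': 1})
```

Unlike the Oracle expression, `collection_prefix` raises `ValueError` for a GUID with fewer than two colons instead of returning an empty value.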

Here's one of the newly-created names (http://arctos.database.museum/name/Gryphaea%20pitcheri)

screen shot 2018-10-09 at 7 58 08 am

I clicked "seed hierarchy" for the Genus - note that this solves the "subgenera break the editor" thing (which has always really been an "inconsistent data cannot be hierarchical, and the editor can be nothing else" problem).

screen shot 2018-10-09 at 7 58 47 am

And can we add a point or two to #1698? There are something like 6 fractured hierarchies for 7 species here; it would be difficult to design a system better at hiding specimens from users.

@dustymc
Contributor Author

dustymc commented Oct 9, 2018

done

@dustymc dustymc closed this as completed Oct 9, 2018
@Jegelewicz Jegelewicz moved this from Not Reviewed to Approved for Implementation in Taxonomy Committee Oct 24, 2018
@Jegelewicz Jegelewicz moved this from Approved for Implementation to Implemented or No Action Recommended in Taxonomy Committee Oct 24, 2018
@sharpphyl

How do I get a new taxon name to include the subgenus in parentheses?

Specifically, WoRMS (in their "match taxa tool") refers me to Cancellaria (Pyruclia) solida aphiaID 464689 as an "exact subgenus match" for Cancellaria solida which it doesn't recognize.

Screen Shot 2019-04-05 at 11 55 10 AM

I was able to clone the WoRMS classification into the WoRMS (via Arctos) classification but now the Taxon Name is Cancellaria solida and the species in the WoRMS (via Arctos) classification is Cancellaria (Pyruclia) solida.

Screen Shot 2019-04-05 at 1 20 04 PM

Screen Shot 2019-04-05 at 1 12 59 PM

Somehow this didn't get into our tables through the normal WoRMS (via Arctos) refreshes.

@dustymc
Contributor Author

dustymc commented Apr 5, 2019

How do I get a new taxon name to include the subgenus in parentheses?

You can't; that's the point! Cancellaria solida and Cancellaria (Pyruclia) solida are the same THING; if we allow them both, then people can't find what they're looking for. The information is retained in the classification - that's what classifications do, and this one is no different. If you want to use something other than the bare name as the identification, the A {string} formula is available, same as any other case of not wanting to use the verbatim name as the identification.
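For readers unfamiliar with the A {string} form, here is a small Python sketch of how such an identification string pairs a bare taxon name with the text the identifier wants shown. The regular expression and function name are illustrative assumptions; how Arctos itself stores or parses this formula is not described in this thread.

```python
import re

# "<taxon name> {<verbatim string>}", e.g.
# "Cancellaria solida {Cancellaria (Pyruclia) solida}"
A_STRING = re.compile(r"^(?P<taxon>[^{}]+?)\s*\{(?P<verbatim>[^{}]+)\}$")


def parse_a_string(identification):
    """Return (taxon_name, verbatim); verbatim is None for a plain name."""
    m = A_STRING.match(identification)
    if m is None:
        return identification, None
    return m["taxon"], m["verbatim"]


print(parse_a_string("Cancellaria solida {Cancellaria (Pyruclia) solida}"))
# ('Cancellaria solida', 'Cancellaria (Pyruclia) solida')
```

The link to taxonomy goes through the parenthesis-free name, while the braces carry whatever formatting the identifier prefers.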

@sharpphyl

Got it!

@Jegelewicz
Member

From #988

I am not seeing the inclusion of the subgenus in the display name. The example given above https://arctos.database.museum/name/Gryphaea%20pitcheri does not include the subgenus in the display name anywhere that I can find.

In fact, on the catalog record pages, what the display name does is put the subgenus where the genus should be.

image

I would expect this identification to be Gryphaea (Texigryphaea) pitcheri as shown on the classification.

image

Texigryphaea pitcheri is a thing, but apparently so is Gryphaea (Texigryphaea) pitcheri and now we have them muddled together? Or am I missing something?

With regard to #1704 (comment)

If someone wants to identify something as Gryphaea (Texigryphaea) our current position in Arctos is that they must use Gryphaea {Gryphaea (Texigryphaea)} correct? But we allow the use of this subgenus in the classification, so we will end up with competing classifications for Gryphaea, one that includes the subgenus and one that doesn't? None of this feels good to me and I'm sure we are eventually going to have a conversation about a better way to handle it.

@Jegelewicz Jegelewicz reopened this Jan 21, 2021
@dustymc
Contributor Author

dustymc commented Jan 21, 2021

I am not seeing the inclusion of the subgenus in the display name

You're just not seeing display_name.

Screen Shot 2021-01-21 at 9 08 06 AM

is identification.scientific_name.

I could do WHATEVER with display_name (I don't think we currently do anything at all); that almost certainly needs to be discussed in a dedicated Issue.

what the display name does is put the subgenus where the genus should be.

If you mean the little gray bits....

Screen Shot 2021-01-21 at 9 11 25 AM

... that's full_taxon_name (which is autogenerated by https://github.com/ArctosDB/PG_DDL/blob/master/function/getFlatTaxonomy.sql and could be changed)

expect this identification to be

I'd expect it to be whatever was specified, and it is.

apparently so is Gryphaea (Texigryphaea) pitcheri

That cannot be a THING in Arctos - that's what this Issue (wisely!) prevents. https://arctos.database.museum/name/Gryphaea%20pitcheri is a THING, if it has something to do with https://arctos.database.museum/name/Texigryphaea%20pitcheri then that should be documented via Relationships. I do not think that's something that can safely be derived from strings - a taxonomic assertion is needed.

If someone wants to identify something as Gryphaea (Texigryphaea) our current position in Arctos is that they must use Gryphaea {Gryphaea (Texigryphaea)} correct?

Correct.

But we allow the use of this subgenus in the classification, so we will end up with competing classifications for Gryphaea, one that includes the subgenus and one that doesn't?

Potentially yes, and your collection could prefer the "same as WHATEVER, but with subgenus" in front of the WHATEVER classifications. That's what classifications DO - allow various assertions of metadata for a name.

@Jegelewicz
Member

BUT you could easily have both classifications used in a single collection - Some things ID'd to the subgenus and some not.

@dustymc
Contributor Author

dustymc commented Jan 21, 2021

I'm lost.

Classification preference is by collection - there's some consistency in how things happen within a collection, but not necessarily between collections. So CollectionA might prefer my "same as WHATEVER, but with subgenus" classification while CollectionB ignores it, but everything in CollectionA would encounter the preferred classifications in the same order and end up with the same data.
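The per-collection preference described here can be sketched in Python as "take the first classification that appears in the collection's ordered preference list". The data shapes, the function name, and the "Arctos with subgenus" source label are hypothetical; the sketch only shows why everything in one collection ends up with the same data for a given name.

```python
def first_preferred(classifications, preferred_sources):
    """Return (source, terms) for the first preferred source available.

    `classifications` maps source name -> classification terms for ONE
    taxon name; `preferred_sources` is a collection's ordered list.
    """
    for source in preferred_sources:
        terms = classifications.get(source)
        if terms is not None:
            return source, terms
    return None


gryphaea = {
    "Arctos": {"genus": "Gryphaea"},
    "Arctos with subgenus": {"genus": "Gryphaea", "subgenus": "Texigryphaea"},
}
collection_a = ["Arctos with subgenus", "Arctos"]
collection_b = ["Arctos"]

print(first_preferred(gryphaea, collection_a)[0])  # Arctos with subgenus
print(first_preferred(gryphaea, collection_b)[0])  # Arctos
```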

If you mean they'll only sometimes-maybe use the A {string} IDs then yes, probably, that's up to them and their procedures (and it's why we've separated classifications and taxonomy, and why I'm insistent that we don't denormalize taxonomy).

@Jegelewicz
Member

I mean that Derek could say "This one is Gryphaea {Gryphaea (Texigryphaea)} and that one is Gryphaea", but he can only choose ONE taxonomic source for his classifications, so all of his Gryphaea will either include the subgenus (Texigryphaea) or they won't.

@dustymc
Contributor Author

dustymc commented Jan 21, 2021

Ah - correct, both of those would use the first-encountered preferred classification for "Gryphaea." Perhaps that's a place where we could do more with taxon concepts - presumably asserting concepts is the goal if those are planned assertions and not just gappy procedures.

@Jegelewicz
Member

Jegelewicz commented Jan 27, 2021

With regard to TPT Taxonomy, I plan to enter classifications that include a subgenus like this one.

Dennyus carljonesi carljonesi

Note that the name does not include the subgenus, but the classification does....

So far, I have not run into the problem discussed above, so I think this should work. @campmlc thoughts? Also, @sharpphyl what do you think?

@Jegelewicz
Member

Of course, this makes the display name look weird...

image

@dustymc
Contributor Author

dustymc commented Jan 27, 2021

makes the display name look weird...

The formatter expects "Collodennyus" as subgenus.

@Jegelewicz
Member

makes the display name look weird...

The formatter expects "Collodennyus" as subgenus.

Yep, but that doesn't seem to be how subgenera are displayed in a classification?

image

@Jegelewicz
Member

This is probably why there are issues with WoRMS classifications?

image

@Jegelewicz
Member

But it looks good on the catalog record
image

@Jegelewicz
Member

I think we just need to change the way display name formats when a subgenus is involved. BUT is display name USED for anything?

@Jegelewicz
Member

Also, I think we need a way for people to SEARCH Cancellaria (Pyruclia) solida and get all of the Cancellaria solida because there will be people who do that....
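One possible shape for that, sketched in Python under the assumption that search terms are normalized before matching: strip any parenthesized token from the query so the subgenus-formatted string resolves to the bare name. The function name is made up, and the pattern would also drop other parenthetical text (for example an author in parentheses), so this is only a starting point.

```python
import re


def strip_parenthetical(query: str) -> str:
    """Remove "(...)" groups and collapse the remaining whitespace."""
    stripped = re.sub(r"\s*\([^)]*\)", "", query)
    return " ".join(stripped.split())


print(strip_parenthetical("Cancellaria (Pyruclia) solida"))
# Cancellaria solida
```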

@campmlc

campmlc commented Jan 27, 2021 via email

@sharpphyl

This is probably why there are issues with WoRMS classifications?

YES

And until I linked Cancellaria solida to the WoRMS aphiaID 464689 for Cancellaria (Pyruclia) solida I was using a taxon name without a WoRMS aphiaID so it didn't automatically update. I catch them one by one but, as you noted, the name that shows on our catalog record is not the same as in the WoRMS (via Arctos) classification.

Screen Shot 2021-01-27 at 11 21 35 AM

Screen Shot 2021-01-27 at 11 21 47 AM

@dustymc
Contributor Author

dustymc commented Jan 27, 2021

even if we do something weird with the classification?

https://en.wikipedia.org/wiki/gigo, but that does raise an interesting point for #3311 - I get the sense that we want to do more with display name, but that relies on various things (predictable format, the existence of nomenclatural_code, the existence of ranks, etc.) that can't be expected to exist in "external" classifications (nor apparently in "internalized" classifications).

5 participants