Change order of DMNS:Inv secondary identifications from "0" to "2" #6579

genevieve-anderegg · 2023-07-31T16:45:22Z

With the recent implementation of identification orders, I noticed that all secondary identifications on DMNS:Inv records were changed to order=0, with the primary/most recent identification as order=1. For the majority of DMNS:Inv records with two identifications, the secondary identification is the one given to the specimen by the collector (or dealer); it is now out of date and has been superseded by revised taxonomy as per WoRMS.

These legacy identifications are not "incorrect"; they are just "less correct" than the primary identification, because the taxonomy has often been updated since. Therefore, it would be more accurate to give these secondary identifications order=2 instead of order=0, as per the Arctos handbook: https://handbook.arctosdb.org/documentation/identification.html#identification-order
Changing the legacy to order=2 from order=0 also displays the secondary ID in black instead of gray text, which we like!

However, there are some records where the legacy ID was clearly incorrect and we have since re-identified the specimen correctly. In those cases the legacy ID should stay at order=0 (as discussed with @sharpphyl @acdoll), so some data cleaning and review will have to happen.
@dustymc @Jegelewicz, would there be a way to have a csv of the identifications for each record, with their remarks, sent to me for review, and have the rest of the secondary IDs magicked to order=2? Then I can go through the csv, determine which secondary identifications are in fact incorrect, and keep those as order=0. Or is there a better way to go about doing this?
Thanks!
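
For readers following along, here is a minimal sketch of the kind of review CSV being requested: one row per identification, grouped by record, with the order and remarks side by side. It runs against an in-memory SQLite table, not Arctos. The table layout, column names, and sample names (`Aus bus` etc.) are assumptions made up for illustration; the real identification model is considerably more complex.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified stand-in for the identification data.
# Table and column names are assumptions, not the Arctos schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identification (
        identification_id INTEGER PRIMARY KEY,
        guid TEXT NOT NULL,
        scientific_name TEXT NOT NULL,
        identification_order INTEGER NOT NULL,
        identification_remarks TEXT
    );
    INSERT INTO identification VALUES
        (1, 'DMNS:Inv:1', 'Aus bus', 1, 'updated per WoRMS'),
        (2, 'DMNS:Inv:1', 'Aus cus', 0, 'collector ID'),
        (3, 'DMNS:Inv:2', 'Dus eus', 1, NULL);
""")

HEADER = ["guid", "identification_id", "identification_order",
          "scientific_name", "identification_remarks"]

def export_identifications(conn, guid_prefix):
    """Return CSV text with one row per identification for matching records."""
    rows = conn.execute(
        """SELECT guid, identification_id, identification_order,
                  scientific_name, identification_remarks
           FROM identification
           WHERE guid LIKE ?
           ORDER BY guid, identification_order DESC, identification_id""",
        (guid_prefix + "%",),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)  # NULL remarks are written as empty cells
    return buf.getvalue()

csv_text = export_identifications(conn, "DMNS:Inv:")
print(csv_text)
```

Keeping `identification_id` in the export means a reviewer can hand back an unambiguous list of the identifications that should stay at order=0.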

Jegelewicz · 2023-07-31T16:54:47Z

This looks like a job for Write SQL, or we need an Identifications view and download.

genevieve-anderegg · 2023-07-31T18:12:47Z

> This looks like a job for Write SQL, or we need an Identifications view and download.

Can I 1) download a csv of identifications for review, and 2) bulk change ID orders myself? If so, some guidance would be great! I don't have a lot of experience with SQL.

Jegelewicz · 2023-07-31T18:39:06Z

> Can I 1) download a csv of identifications for review, and 2) bulk change ID orders myself?

I don't think you can do either of those things right now, but both seem like reasonable things to expect to be able to do. For the first, a tool like the identifiers view, but for identifications, would make sense. For the second, we would need the ability to edit identifications based on the identification ID. That seems a bit more complicated, but I could be wrong.

Probably we need input from @dustymc

dustymc · 2023-07-31T18:51:25Z

I can help update. Could definitely be a loader, but I think we need some maturity with this model before considering such things.

I can get CSV, but I need to know exactly what you want in it (see #6532, there are no "columns"!).

Bigger-picture, I didn't quite do what I said I was going to in #3540 - a bunch of things (most recently #6552 (comment)) are built around the idea that there's one "best" identification so I didn't break that until there's a use case and some data to help everyone get on the same page. This looks like a use case and is definitely a movement towards there no longer being one clear 'best,' so - help?

https://arctos.database.museum/guid/DMNS:Inv:21694 is currently...


arctosprod@arctos>> select scientific_name from flat where guid='DMNS:Inv:21694';
 scientific_name 
-----------------
 Ostrea megodon

and I think it should probably be - assuming all the previous IDs get elevated above zero - something more like...

select string_agg(identification.scientific_name,'; ' order by identification_order)
from flat
inner join identification on flat.collection_object_id=identification.collection_object_id
where guid='DMNS:Inv:21694'
;

 Ostrea megodon; Ostrea megodon; Ostrea megodon; Undulostrea megodon; Ostrea megodon; Ostrea megodon; Ostrea megodon

That's of course assuming that we have to shove something into a "simple" table cell - the actual identification is eg #6532 (comment) (a big complex data object potentially involving lots of also-complicated data objects arranged in complex ways).

I suspect that'll also melt kinda every aggregator, and have absolutely no idea how to balance this complexity with the other issue, which is a request to somehow impossibly simplify the same data to appease those same aggregators!
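
The ordered concatenation in the query above can also be sketched outside the database. This is only an illustration with made-up rows and names; the `min_order` filter is an assumption added here (the query above does not filter, it assumes every previous ID has already been elevated above zero).

```python
from collections import defaultdict

# Made-up rows for illustration: (guid, scientific_name, identification_order)
identifications = [
    ("EX:Inv:1", "Aus bus", 2),
    ("EX:Inv:1", "Aus cus", 1),
    ("EX:Inv:1", "Aus dus", 0),
    ("EX:Inv:2", "Eus fus", 1),
]

def aggregate_names(rows, min_order=1):
    """Join each record's names by ascending order, skipping orders below min_order."""
    by_guid = defaultdict(list)
    for guid, name, order in rows:
        if order >= min_order:
            by_guid[guid].append((order, name))
    # Sorting the (order, name) pairs puts order 1 first; ties fall back to name.
    return {
        guid: "; ".join(name for _, name in sorted(pairs))
        for guid, pairs in by_guid.items()
    }

names_by_guid = aggregate_names(identifications)
print(names_by_guid)
```

With `min_order=1` the order=0 identification is left out of the cell; passing `min_order=0` would include it at the front of the string.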

Jegelewicz · 2023-07-31T19:30:22Z

> I think it should probably be - assuming all the previous IDs get elevated above zero

> I suspect that'll also melt kinda every aggregator

yes and yes BUT we can simplify a lot of this by ONLY sending identifications with order 1 to the aggregators. We could also send the identification history extension for everything else. This would not solve ALL aggregator problems, but would solve a lot of them and pare down the need for us to squish a bunch of information into a 'field' that only needs one thing.

dustymc · 2023-07-31T19:35:20Z

> ONLY sending identifications with order 1 to the aggregators

I expect that'll result in a lot of things not having any identification at all,
I expect that'll still result in a lot of things having multiple identifications,
If that's NOT what we're doing for "local cache" stuff (I think that's what "yes and yes" means) then it will require (significantly) more resources than I have available.
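
The first two concerns can be checked against data before committing to an order-1-only export. A minimal sketch, using made-up rows (not real Arctos data), that buckets records by how many order=1 identifications they carry:

```python
from collections import Counter

# Made-up rows for illustration: (guid, identification_order)
rows = [
    ("EX:Inv:1", 1), ("EX:Inv:1", 2),
    ("EX:Inv:2", 1), ("EX:Inv:2", 1),
    ("EX:Inv:3", 2), ("EX:Inv:3", 0),
]

def classify_by_order_one(rows):
    """Bucket each record by its number of order=1 identifications."""
    all_guids = {guid for guid, _ in rows}
    counts = Counter(guid for guid, order in rows if order == 1)
    result = {"none": [], "one": [], "multiple": []}
    for guid in sorted(all_guids):
        n = counts[guid]  # Counter returns 0 for records with no order=1 rows
        key = "none" if n == 0 else "one" if n == 1 else "multiple"
        result[key].append(guid)
    return result

buckets = classify_by_order_one(rows)
print(buckets)
```

Records in the "none" bucket would reach an aggregator with no identification at all, and records in "multiple" would still need some rule for picking or combining names.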

sharpphyl · 2023-07-31T23:16:20Z

Is there a reason that "0" was chosen as the default for the legacy (or older, now unaccepted) scientific name instead of "2"?

dustymc · 2023-07-31T23:29:33Z

> chosen

#3540 (comment) - order 1 (best) and 0 (don't accept) are identical to previous data/functionality/accepted_id_fg - there wasn't much choice involved once the model was solidified.

genevieve-anderegg · 2023-08-01T18:20:16Z

> I can help update. Could definitely be a loader, but I think we need some maturity with this model before considering such things.

That would be great. Once I have a csv (see below), then I know which identifications on which records we want to change from order=0 to order=2 with your help (and also legacy identifications to keep as order=0 because they actually are incorrect, which is probably a small minority).

> I can get CSV, but I need to know exactly what you want in it

A CSV with the catalog number/guid (DMNS:Inv:#####) and then additional columns with the specimen's identification data. If they all get concatenated into one column based on how that information is stored on the record then that's fine, I can just read through and parse all that information and then make a list of which records we don't want to update legacy IDs on. The main workflow block for me is just being able to view all that information at once, so even if it's a pretty ugly spreadsheet that's fine. Whatever you can get me will be great!

Jegelewicz · 2023-08-01T18:23:51Z

@dustymc I think the format of the part view/download would make sense, where each row includes the GUID and the identification fields plus identification attributes. Let me know if that's just crazy talk....

dustymc · 2023-08-01T22:56:55Z

> crazy talk

It doesn't include anything about taxa and can't accommodate an unknown number of attributes, so limited anyway.

> pretty ugly

Here's a conversation starter.

temp_dmns_inv_ids.csv.zip

genevieve-anderegg · 2023-08-07T22:58:00Z

Thanks for the sheet Dusty! I'll take a look at this.
In the meantime, would you be able to update all identifications that are currently order=0 to order=2? I think this is what we want for the vast majority of our records. I can use spreadsheets and the Arctos backup I just made to review and try to catch any outliers (the few legacy identifications we have that were just plain wrong, as opposed to revised taxonomy, which is most of them).

sharpphyl · 2023-08-08T13:09:29Z

@genevieve-anderegg - I've looked at the csv and agree that we should change all the 0 to 2. Of the 8,743 records marked 0, there are 170 strings marked 0 which we might want to leave that way, but we can change them back when we look them over. Everything else should probably be marked 2.

dustymc · 2023-08-14T23:28:42Z

> update all identifications that are currently order=0 to order=2

temp_dmns_inv_ids_zero.csv.zip

UPDATE 8739
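
For reference, a minimal sketch of what a bulk change like this looks like, run against an in-memory SQLite table and not Arctos. The table layout, the `promote_zero_to_two` helper, and the optional `keep_at_zero` list are all assumptions for illustration; this is not the statement that was actually run.

```python
import sqlite3

# Hypothetical, simplified stand-in for the identification data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identification (
        identification_id INTEGER PRIMARY KEY,
        guid TEXT NOT NULL,
        identification_order INTEGER NOT NULL
    );
    INSERT INTO identification VALUES
        (1, 'DMNS:Inv:1', 1),
        (2, 'DMNS:Inv:1', 0),
        (3, 'DMNS:Inv:2', 0),
        (4, 'OTHER:Coll:1', 0);
""")

def promote_zero_to_two(conn, guid_prefix, keep_at_zero=()):
    """Set order 0 -> 2 for one collection, skipping reviewed IDs; return rows changed."""
    keep = list(keep_at_zero)
    sql = ("UPDATE identification SET identification_order = 2 "
           "WHERE identification_order = 0 AND guid LIKE ?")
    params = [guid_prefix + "%"]
    if keep:
        # Exclude identifications a reviewer flagged as genuinely incorrect.
        placeholders = ", ".join("?" for _ in keep)
        sql += f" AND identification_id NOT IN ({placeholders})"
        params.extend(keep)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(sql, params)
    return cur.rowcount

updated = promote_zero_to_two(conn, "DMNS:Inv:", keep_at_zero=[3])
print(updated)
```

Comparing the returned row count with the number of rows in the reviewed CSV is a cheap sanity check before and after the change.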

genevieve-anderegg · 2023-08-15T15:26:57Z

Thank you so much Dusty!

genevieve-anderegg added the Function-Taxonomy/Identification and Data Quality labels Jul 31, 2023

dustymc mentioned this issue Aug 1, 2023

Error - Insufficient Information Given (Bulkload Citations) #6590

Closed

Jegelewicz mentioned this issue Aug 8, 2023

Identification Order? #6614

Closed

dustymc closed this as completed Aug 14, 2023
