
Change order of DMNS:Inv secondary identifications from "0" to "2" #6579

Closed
genevieve-anderegg opened this issue Jul 31, 2023 · 15 comments

@genevieve-anderegg

With the recent implementation of identification orders, I noticed that all secondary identifications on DMNS:Inv records were changed to order=0, with the primary/most recent identification as order=1. In most DMNS:Inv records with two identifications, the secondary identification is the one given to the specimen by the collector (or dealer); it is now out of date and has been updated with revised taxonomy per WoRMS, as below:

[screenshot]

These legacy identifications are not "incorrect"; they are just "less correct" than the primary identification, often because the taxonomy has since been updated. It would therefore be more accurate to give these secondary identifications order=2 instead of order=0, as per the Arctos handbook: https://handbook.arctosdb.org/documentation/identification.html#identification-order
Changing the legacy IDs from order=0 to order=2 also displays the secondary ID in black instead of gray text, which we like!
[screenshot]

However, there are some records where the legacy ID was clearly incorrect and has since been correctly re-identified by us. In these cases the legacy ID should stay order=0 (as discussed with @sharpphyl @acdoll). So some data cleaning and review will have to happen.
@dustymc @Jegelewicz, would there be a way to send me a csv of the identifications for each record, with their remarks, for review, and have the rest of the secondary IDs magicked to order=2? Then I can go through the csv and determine which secondary identifications are in fact incorrect, and keep those as order=0. Or is there a better way to go about this?
Thanks!

@Jegelewicz
Member

This looks like a job for Write SQL, or we need an Identifications view and download.

@genevieve-anderegg
Author

This looks like a job for Write SQL, or we need an Identifications view and download.

Can I 1) download a csv of identifications for review, and 2) bulk change ID orders myself? If so, some guidance would be great! I don't have a lot of experience with SQL.

@Jegelewicz
Member

Can I 1) download a csv of identifications for review, and 2) bulk change ID orders myself?

I don't think you can do either of those things right now, but both seem like reasonable expectations of tasks one could perform. For the first, a tool like the identifiers view, but for identifications, would make sense. For the second, the ability to edit identifications based on the identification ID would be needed; that seems a bit more complicated, but I could be wrong.

Probably we need input from @dustymc

@dustymc
Contributor

dustymc commented Jul 31, 2023

I can help update. Could definitely be a loader, but I think we need some maturity with this model before considering such things.

I can get CSV, but I need to know exactly what you want in it (see #6532, there are no "columns"!).

Bigger-picture, I didn't quite do what I said I was going to in #3540 - a bunch of things (most recently #6552 (comment)) are built around the idea that there's one "best" identification so I didn't break that until there's a use case and some data to help everyone get on the same page. This looks like a use case and is definitely a movement towards there no longer being one clear 'best,' so - help?

https://arctos.database.museum/guid/DMNS:Inv:21694 is currently...


arctosprod@arctos>> select scientific_name from flat where guid='DMNS:Inv:21694';
 scientific_name 
-----------------
 Ostrea megodon

and I think it should probably be - assuming all the previous IDs get elevated above zero - something more like...

select string_agg(identification.scientific_name,'; ' order by identification_order)
from flat
inner join identification on flat.collection_object_id=identification.collection_object_id
where guid='DMNS:Inv:21694'
;

 Ostrea megodon; Ostrea megodon; Ostrea megodon; Undulostrea megodon; Ostrea megodon; Ostrea megodon; Ostrea megodon

That's of course assuming that we have to shove something into a "simple" table cell; the actual identification is something like #6532 (comment) (a big complex data object potentially involving lots of also-complicated data objects arranged in complex ways).

I suspect that'll also melt kinda every aggregator, and have absolutely no idea how to balance this complexity with the other issue, which is a request to somehow impossibly simplify the same data to appease those same aggregators!

@Jegelewicz
Member

I think it should probably be - assuming all the previous IDs get elevated above zero

I suspect that'll also melt kinda every aggregator

yes and yes BUT we can simplify a lot of this by ONLY sending identifications with order 1 to the aggregators. We could also send the identification history extension for everything else. This would not solve ALL aggregator problems, but would solve a lot of them and pare down the need for us to squish a bunch of information into a 'field' that only needs one thing.
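A minimal sketch of the "order 1 only" filter, assuming the `flat` and `identification` tables and the `collection_object_id`, `guid`, `scientific_name` and `identification_order` columns used in the query above. It runs against a throwaway SQLite database; the `EXAMPLE:` GUIDs and `Taxon` names are hypothetical, and this is not how Arctos actually builds its aggregator exports.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the Arctos tables; only the columns
# named in this thread are modelled.
SCHEMA = """
create table flat (collection_object_id integer primary key, guid text);
create table identification (
    identification_id integer primary key,
    collection_object_id integer,
    scientific_name text,
    identification_order integer
);
"""

# Keep only order-1 identifications, as proposed for aggregator exports.
ORDER_ONE_QUERY = """
select flat.guid, identification.scientific_name
from flat
inner join identification
    on flat.collection_object_id = identification.collection_object_id
where identification.identification_order = 1
order by flat.guid
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("insert into flat values (?, ?)",
                 [(1, "EXAMPLE:1"), (2, "EXAMPLE:2")])
conn.executemany(
    "insert into identification"
    " (collection_object_id, scientific_name, identification_order)"
    " values (?, ?, ?)",
    [
        (1, "Taxon A", 1),  # current identification
        (1, "Taxon B", 2),  # legacy identification, order 2
        (2, "Taxon C", 0),  # record with no order-1 identification
    ],
)
rows = conn.execute(ORDER_ONE_QUERY).fetchall()
print(rows)  # [('EXAMPLE:1', 'Taxon A')]
```

Note that `EXAMPLE:2` drops out of the result entirely because it has no order-1 identification.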

@dustymc
Contributor

dustymc commented Jul 31, 2023

ONLY sending identifications with order 1 to the aggregators

  1. I expect that'll result in a lot of things not having any identification at all,
  2. I expect that'll still result in a lot of things having multiple identifications,
  3. If that's NOT what we're doing for "local cache" stuff (I think that's what "yes and yes" means) then it will require (significantly) more resources than I have available
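Concerns 1 and 2 can be measured before settling on an export rule. This sketch, again on a throwaway SQLite database with hypothetical records and assuming the same table and column names as the query above, lists every record whose number of order-1 identifications is not exactly one.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the Arctos tables.
SCHEMA = """
create table flat (collection_object_id integer primary key, guid text);
create table identification (
    identification_id integer primary key,
    collection_object_id integer,
    scientific_name text,
    identification_order integer
);
"""

# Count order-1 identifications per record and keep records where the count
# is 0 (nothing to send) or more than 1 (no single "best"). The left join
# keeps records that have no identifications at all.
PROBLEM_QUERY = """
select flat.guid,
       sum(case when identification.identification_order = 1 then 1 else 0 end)
           as n_order_one
from flat
left join identification
    on flat.collection_object_id = identification.collection_object_id
group by flat.guid
having sum(case when identification.identification_order = 1 then 1 else 0 end) <> 1
order by flat.guid
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("insert into flat values (?, ?)",
                 [(1, "EXAMPLE:1"), (2, "EXAMPLE:2"), (3, "EXAMPLE:3")])
conn.executemany(
    "insert into identification"
    " (collection_object_id, scientific_name, identification_order)"
    " values (?, ?, ?)",
    [
        (1, "Taxon A", 1),  # exactly one order-1 identification
        (2, "Taxon B", 0),  # no order-1 identification (concern 1)
        (3, "Taxon C", 1),  # two order-1 identifications (concern 2)
        (3, "Taxon D", 1),
    ],
)
problems = conn.execute(PROBLEM_QUERY).fetchall()
print(problems)  # [('EXAMPLE:2', 0), ('EXAMPLE:3', 2)]
```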

@sharpphyl

Is there a reason that "0" was chosen as the default for the legacy (or older, now unaccepted) scientific name instead of "2"?

@dustymc
Contributor

dustymc commented Jul 31, 2023

chosen

#3540 (comment) - order 1 (best) and 0 (don't accept) are identical to previous data/functionality/accepted_id_fg - there wasn't much choice involved once the model was solidified.

@genevieve-anderegg
Copy link
Author

I can help update. Could definitely be a loader, but I think we need some maturity with this model before considering such things.

That would be great. Once I have a csv (see below), I'll know which identifications on which records we want to change from order=0 to order=2 with your help (and which legacy identifications to keep as order=0 because they really are incorrect, which is probably a small minority).

I can get CSV, but I need to know exactly what you want in it

A CSV with the catalog number/guid (DMNS:Inv:#####) and then additional columns with the specimen's identification data. If they all get concatenated into one column based on how that information is stored on the record then that's fine, I can just read through and parse all that information and then make a list of which records we don't want to update legacy IDs on. The main workflow block for me is just being able to view all that information at once, so even if it's a pretty ugly spreadsheet that's fine. Whatever you can get me will be great!
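One hedged way to get a reviewable file is one row per identification rather than one column per field. A minimal sketch, assuming the same table and column names as the query above plus an `identification_id` key (an assumption), with hypothetical records. The real identification carries far more structure (taxa, attributes, agents) than these four columns, so this only flattens the basics.

```python
import csv
import io
import sqlite3

# Hypothetical, minimal stand-ins for the Arctos tables.
SCHEMA = """
create table flat (collection_object_id integer primary key, guid text);
create table identification (
    identification_id integer primary key,
    collection_object_id integer,
    scientific_name text,
    identification_order integer
);
"""

# One row per identification; a record with several identifications
# appears on several consecutive rows.
EXPORT_QUERY = """
select flat.guid,
       identification.identification_id,
       identification.identification_order,
       identification.scientific_name
from flat
inner join identification
    on flat.collection_object_id = identification.collection_object_id
order by flat.guid, identification.identification_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("insert into flat values (1, 'EXAMPLE:1')")
conn.executemany(
    "insert into identification values (?, ?, ?, ?)",
    [(10, 1, "Taxon A", 1), (11, 1, "Taxon B", 0)],
)

# Write the header plus query results as CSV text.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["guid", "identification_id",
                 "identification_order", "scientific_name"])
writer.writerows(conn.execute(EXPORT_QUERY))
csv_text = buf.getvalue()
print(csv_text)
```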

@Jegelewicz
Copy link
Member

@dustymc I think the format of the part view/download would make sense, where each row includes the GUID, identification fields, and identification attributes. Let me know if that's just crazy talk....

@dustymc
Contributor

dustymc commented Aug 1, 2023

crazy talk

It doesn't include anything about taxa and can't accommodate an unknown number of attributes, so it's limited anyway.

pretty ugly

Here's a conversation starter.

temp_dmns_inv_ids.csv.zip

@genevieve-anderegg
Author

Thanks for the sheet Dusty! I'll take a look at this.
In the meantime, would you be able to update all identifications that are currently order=0 to order=2? I think this is what we want for the vast majority of our records. I can use spreadsheets and the Arctos backup I just made to review and try to catch any outliers (the few legacy identifications we have that were just plain wrong, as opposed to revised taxonomy, which covers most).

@sharpphyl

@genevieve-anderegg - I've looked at the csv and agree that we should change all the 0s to 2. Of the 8,743 records marked 0, there are 170 strings marked 0 that we might want to leave that way, but we can change them back when we look them over. Everything else should probably be marked 2.

@dustymc
Contributor

dustymc commented Aug 14, 2023

update all identifications that are currently order=0 to order=2

temp_dmns_inv_ids_zero.csv.zip

UPDATE 8739
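The bulk change can be sketched as a single UPDATE. This is not the statement that was actually run; it assumes the same table and column names as the earlier query, and assumes DMNS:Inv records can be selected by their GUID prefix. It runs on a throwaway SQLite database with hypothetical records, and an order-0 identification from another collection is left untouched.

```python
import sqlite3

# Hypothetical, minimal stand-ins for the Arctos tables.
SCHEMA = """
create table flat (collection_object_id integer primary key, guid text);
create table identification (
    identification_id integer primary key,
    collection_object_id integer,
    scientific_name text,
    identification_order integer
);
"""

# Move order-0 identifications to order 2, but only for records whose
# GUID matches the LIKE pattern passed as the parameter.
PROMOTE_ZERO = """
update identification
set identification_order = 2
where identification_order = 0
  and collection_object_id in (
      select collection_object_id from flat where guid like ?
  )
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("insert into flat values (?, ?)",
                 [(1, "DMNS:Inv:EXAMPLE-1"), (2, "OTHER:Inv:EXAMPLE-2")])
conn.executemany(
    "insert into identification values (?, ?, ?, ?)",
    [
        (1, 1, "Taxon A", 1),  # current ID, unchanged
        (2, 1, "Taxon B", 0),  # legacy ID in DMNS:Inv, becomes 2
        (3, 2, "Taxon C", 0),  # other collection, stays 0
    ],
)
updated = conn.execute(PROMOTE_ZERO, ("DMNS:Inv:%",)).rowcount
conn.commit()

orders = dict(conn.execute(
    "select identification_id, identification_order"
    " from identification order by identification_id"
).fetchall())
print(updated, orders)  # 1 {1: 1, 2: 2, 3: 0}
```

Identifications that turn out to be genuinely wrong can then be set back to 0 individually, for example by `identification_id`.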

@dustymc dustymc closed this as completed Aug 14, 2023
@genevieve-anderegg
Author

Thank you so much Dusty!


4 participants