Code Table Request - AMNH: American Museum of Natural History #5886

Open
4 tasks
dustymc opened this issue Mar 13, 2023 · 24 comments
Labels
  • CodeTableCleanup: Our bad data leads to more bad data. Fix it!
  • Denormalizer: Issue is making data less-accessible
  • Priority - Wildfire Potential: ignore this at everyone's peril, may smolder for now ...

Comments

@dustymc
Contributor

dustymc commented Mar 13, 2023

Instructions

This is a template to facilitate communication with the Arctos Code Table Committee. Submit a separate request for each relevant value. This form is appropriate for exploring how data may best be stored, for adding vocabulary, or for updating existing definitions.

Reviewing documentation before proceeding will result in a more enjoyable experience.


Initial Request

Goal: Describe what you're trying to accomplish. This is the only necessary step to start this process. The Committee is available to assist with all other steps. Please clearly indicate any uncertainty or desired guidance if you proceed beyond this step.

Every identifier of type AMNH: American Museum of Natural History should be replaced with other ID type = other identifier, issued by agent American Museum of Natural History.
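
A minimal sketch of the requested conversion, assuming an identifier can be modeled as a type, a value, and an issuing agent. The `Identifier` class, its field names, and the `convert` helper are hypothetical and are not the Arctos schema; only the type and agent strings come from the request above.

```python
from dataclasses import dataclass, replace
from typing import Optional

LEGACY_TYPE = "AMNH: American Museum of Natural History"
NEW_TYPE = "other identifier"
ISSUER = "American Museum of Natural History"


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for one identifier row; not the Arctos schema."""
    id_type: str
    value: str
    issued_by: Optional[str] = None


def convert(ident: Identifier) -> Identifier:
    """Move a legacy AMNH-typed identifier to the generic type plus issuing agent."""
    if ident.id_type != LEGACY_TYPE:
        return ident  # identifiers of any other type are left untouched
    # The value is carried over unchanged; only type and issuer are rewritten.
    return replace(ident, id_type=NEW_TYPE, issued_by=ISSUER)
```

In this sketch the identifier value is never modified, which is the sense in which the change only relocates existing information.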

Proposed Value: Proposed new value. This should be clear and compatible with similar values in the relevant table and across Arctos.

Proposed Definition: Clear, complete, non-collection-type-specific functional definition of the value. Avoid discipline-specific terminology if possible, include parenthetically if unavoidable.

Context: Describe why this new value is necessary and existing values are not.

Table: Code Tables are http://arctos.database.museum/info/ctDocumentation.cfm. Link to the specific table or value. This may involve multiple tables and will control datatype for Attributes. OtherID requests require BaseURL (and example) or explanation. Please ask for assistance if unsure.

Collection type: Some code tables contain collection-type-specific values. collection_cde may be found from https://arctos.database.museum/home.cfm

Priority: Please describe the urgency and/or choose a priority-label to the right. You should expect a response within two working days, and may utilize Arctos Contacts if you feel response is lacking.

Available for Public View: Most data are by default publicly available. Describe any necessary access restrictions.

Project: Add the issue to the Code Table Management Project.

Discussion: Please reach out to anyone who might be affected by this change. Leave a comment or add this to the Committee agenda if you believe more focused conversation is necessary.

Approval

All of the following must be checked before this may proceed.

The How-To Document should be followed. Pay particular attention to terminology (with emphasis on consistency) and documentation (with emphasis on functionality).

  • Code Table Administrator[1] - check and initial, comment, or thumbs-up to indicate that the request complies with the how-to documentation and has your approval
  • Code Table Administrator[2] - check and initial, comment, or thumbs-up to indicate that the request complies with the how-to documentation and has your approval
  • DBA - The request is functionally acceptable. The term is not a functional duplicate, and is compatible with existing data and code.
  • DBA - Appropriate code or handlers are in place as necessary. (ID_References, Media Relationships, Encumbrances, etc. require particular attention)

Rejection

If you believe this request should not proceed, explain why here. Suggest any changes that would make the change acceptable, alternate (usually existing) paths to the same goals, etc.

  1. Can a suitable solution be found here? If not, proceed to (2)
  2. Can a suitable solution be found by Code Table Committee discussion? If not, proceed to (3)
  3. Take the discussion to a monthly Arctos Working Group meeting for final resolution.

Implementation

Once all of the Approval Checklist is appropriately checked and there are no Rejection comments, or in special circumstances by decree of the Arctos Working Group, the change may be made.

Review everything one last time. Ensure the How-To has been followed. Ensure all checks have been made by appropriate personnel.

Make changes as described above. Ensure the URL of this Issue is included in the definition.

Close this Issue.

DO NOT modify Arctos Authorities in any way before all points in this Issue have been fully addressed; data loss may result.

Special Exemptions

In very specific cases and by prior approval of The Committee, the approval process may be skipped, and implementation requirements may be slightly altered. Please note here if you are proceeding under one of these use cases.

  1. Adding an existing term to additional collection types may proceed immediately and without discussion, but doing so may also subject users to future cleanup efforts. If time allows, please review the term and definition as part of this step.
  2. The Committee may grant special access on particular tables to particular users. This should be exercised with great caution only after several smooth test cases, and generally limited to "taxonomy-like" data such as International Commission on Stratigraphy terminology.
@dustymc
Contributor Author

dustymc commented Mar 13, 2023

I will plan on proceeding with this about 2023-03-27 if there are no objections.

I will proceed immediately upon approval of each of the involved collections.

Data:
temp_amnh_american_museum_of_natural_history.csv.zip

Summary:

 guid_prefix   | numrecs | approve
---------------+---------+---------
 Arctos:Entity |       1 |
 CHAS:Bird     |       8 | yes
 DMNS:Bird     |       3 |
 DMNS:Mamm     |      46 |
 MSB:Bird      |      30 |
 MSB:Host      |       1 |
 MSB:Mamm      |    3563 |
 MVZ:Bird      |      84 |
 MVZ:Herp      |      12 |
 MVZObs:Bird   |       1 |
 OWU:ES        |      22 |
 UAM:Ento      |      22 |
 UAM:ES        |       2 |
 UAMObs:Ento   |       1 |
 UCM:Egg       |     567 | yes
 UCM:Herp      |      21 | yes
 UCM:Mamm      |     178 | yes

Users:
@ebraker
@mkoo
@campmlc
@ccicero
@DerekSikes
@atrox10
@wellerjes
@jrpletch
@AdrienneRaniszewski
@acdoll
@jldunnum
@jrdemboski
@msbparasites
@mlbowser
@catherpes
@Jegelewicz
@droberts49

See also #5771

@campmlc

campmlc commented Mar 14, 2023

Cannot be converted to other identifier without additional processing. Used to link records across institutions. MSB:Mamm links could be replaced more accurately by AMNH:Mamm, #5887, or however they reference their mammal specimens at GBIF. The MSB Mamm records will need to explicitly reference an AMNH Mamm catalog number, not an other ID.

@dustymc
Contributor Author

dustymc commented Mar 14, 2023

Nothing about the proposed change affects any of the things you've listed in any way.

@Jegelewicz
Member

Jegelewicz commented Mar 14, 2023

Cannot be converted to other identifier without additional processing. Used to link records across institutions.

@campmlc I want to reassure you that the proposed change will make things BETTER. Perhaps this would be more palatable if we used American Museum of Natural History Mammology Collection as the agent for all mamm collections? Note that the agent is linked to the collection's entry in GRSciColl.

I can create agents for the other types of collections too - although the inverts might get crazy...

@Jegelewicz
Member

bird collections - American Museum of Natural History Ornithology Collection

herp collections - American Museum of Natural History Herpetology Collection

The ES and Ento stuff needs review one by one and should probably just use the generic AMNH agent until someone can determine which of the AMNH collections issued the identifiers (see the list of collections by scrolling down this page).
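
The agent assignment proposed in these comments can be sketched as a simple lookup. This is only an illustration: the agent names are copied from the comments above, while the `proposed_agent` helper and the idea of keying on the part of `guid_prefix` after the colon are assumptions. Individual records that do not fit their collection's usual type would still need manual review.

```python
GENERIC_AGENT = "American Museum of Natural History"

# Agent names as proposed in the comments above; the lookup itself is hypothetical.
AGENT_BY_COLLECTION_TYPE = {
    "Mamm": "American Museum of Natural History Mammology Collection",
    "Bird": "American Museum of Natural History Ornithology Collection",
    "Herp": "American Museum of Natural History Herpetology Collection",
}


def proposed_agent(guid_prefix: str) -> str:
    """Return the proposed issuing agent for a guid_prefix such as 'MSB:Mamm'."""
    # The part after the colon is treated as the collection type.
    _, _, collection_type = guid_prefix.partition(":")
    # ES, Ento, and anything else fall back to the generic AMNH agent.
    return AGENT_BY_COLLECTION_TYPE.get(collection_type, GENERIC_AGENT)
```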

@dustymc
Contributor Author

dustymc commented Mar 14, 2023

for all mamm collections?

That's not an ENTIRELY safe assumption; every mammal collection (and probably everyone else) seems to have a few random things. That aside, if we can somehow be more specific then this is a great opportunity to do so. (And there's now a remarks field in which we might explain some assumptions if necessary.)

@campmlc

campmlc commented Mar 14, 2023

Mammalogy collection is better, but the MSB records need to be updated.
Regardless, this cannot proceed until #6004 and #6002 are resolved.

@Jegelewicz
Member

MSB Birds asks for institutional catalog number instead of other identifier.

@dustymc
Contributor Author

dustymc commented Mar 15, 2023

MSB Birds asks for institutional catalog number instead of other identifier.

#6012

@wellerjes

Reviewed; okay to proceed for CHAS

@ebraker
Contributor

ebraker commented Mar 16, 2023

This is getting complicated checking through all this. For UCM:

  • It is fine to make this conversion, but first all UCM:Mamm records relating to this field will need "AMNH " appended to the prefix (note that most of the AMNH catalog numbers already have alpha characters, so it will be: AMNH [space]

image

UCM:Herp and UCM:Fish already have the prefix in the field.

  • I just did a couple-hour dive and I think the UCM:Egg records are actually all USNM, not AMNH (our Denis Gale egg collection labels and ledger both say "Nat. Museum", and it looks like the Smithsonian has a few hundred Denis Gale eggs, whereas AMNH has 9 clutches, which boosts the rationale). I'm pretty hesitant to do a global update on such a tight timeline, but I'd rather have UCM:Egg point to 'assigned by "Smithsonian National Museum of Natural History"' rather than AMNH (and stop short of adding the acronym as requested above). Is this possible?

image

@Jegelewicz
Member

@ebraker I suggest that you do not add the "AMNH " prefix. I can't find their database, but in GBIF, their catalog numbers are just as you have them now:

image

image

@dustymc
Contributor Author

dustymc commented Mar 16, 2023

complicated checking through all this

I'm not sure we're being clear on this point: If you do nothing, this is a lateral move. You don't have to do anything, and nothing will be lost if you don't do anything. (But it's a good opportunity to cleanup/check/note problems, or even fix/improve if possible.)

I'll post bulkloader files in a bit.

@dustymc
Contributor Author

dustymc commented Mar 16, 2023

Here are UCM files for Other Identifiers - Bulk Unload and Other Identifiers - Bulkload:

  • temp_ucm_amnh_loader.csv
  • temp_ucm_amnh_unloader.csv

These should be easy to adjust as needed, or I can rebuild them with adjustments for your review if not. Let me know if you need something else.

@ebraker
Contributor

ebraker commented Mar 16, 2023

@dustymc Thanks, this will work.

I suggest that you do not add the "AMNH " prefix. I can't find their database, but in GBIF, their catalog numbers are just as you have them now

@Jegelewicz I think it is important to have the institutional identifier display alongside the catalog number (and I am certain that AMNH would want catalog numbers cited with the institutional acronym, as the 'MO' is the collection prefix for mammals). When other_ID_type had AMNH included, it served this purpose. With the suggested transition, "assigned by [full institution name]" is less explicit than having the acronym "stamp", which generally conveys institutional ownership at some point. So I plan to make sure all UCM records retain this info. Most already do; there's just a bit of unevenness for some of them.
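
One way to even out values so that each carries the acronym exactly once is sketched below. The `with_acronym` helper is hypothetical and is not part of any Arctos loader. Whether the prefix should be added at all is disputed in this thread, since the GBIF records cited above show bare catalog numbers.

```python
def with_acronym(value: str, acronym: str = "AMNH") -> str:
    """Return value prefixed with 'ACRONYM ' unless it already starts with the acronym."""
    stripped = value.strip()
    if stripped.upper().startswith(acronym.upper()):
        return stripped  # already carries the acronym; leave it as is
    return f"{acronym} {stripped}"
```

Because values that already start with the acronym are returned unchanged, running the helper twice gives the same result as running it once.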

@ebraker
Contributor

ebraker commented Mar 16, 2023

@dustymc Just double-checking before I hit the button... I should be seeing this message while unloading identifiers(?):

image

I'm sort of terrified since I combined all the unbulkload files together so this is well over a thousand records.

@dustymc
Contributor Author

dustymc commented Mar 16, 2023


arctosprod@arctos>> select status,username,count(*) from cf_temp_delete_oids group by status,username;
     status     |  username  | count 
----------------+------------+-------
 Found count: 0 | ebraker    |     4
 Found count: 1 | acdoll     |     8
 Found count: 1 | sharpphyl  |    68
 Found count: 1 | jegelewicz |    77
 Found count: 1 | kderieg    |   108
 Found count: 0 | jtgiermak  |     2
 Found count: 0 | mbreslin   |   195
 Found count: 1 | ACHINN     |    20
 Found count: 1 | DEMBOSKI   |   453
 Found count: 1 | ebraker    |  1218
 Found count: 1 | adrienner  |   262
 Found count: 1 | ganderegg  |    74
 Found count: 1 | jtgiermak  |   635
 Found count: 1 | ffdss      |    64
 Found count: 1 | jessicatir |    82
(15 rows)

Looks like you're good - and an upgrade is on The List.

@ebraker
Contributor

ebraker commented Mar 16, 2023

Thanks. Fixed UCM records via identifier loaders.

dustymc added the CodeTableCleanup, Priority - Wildfire Potential, and Denormalizer labels on May 3, 2023
@mkoo
Member

mkoo commented Nov 18, 2023

I can see why this issue got stalled, but with MVZ:Bird and MVZ:Herp, let's keep it simple since we have a mix of prefixes too. @dustymc Please only set issued_by to American Museum of Natural History (https://arctos.database.museum/agent/10000305 -- not to a division) where null, and change id_type to 'identifier'. If we one day confirm their catalog numbers with AMNH, hopefully they will also have a proper URL we can link to. Meanwhile please help us clean this up!

I manually fixed our MVZObs and the Arctos Entity
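
The update requested here (fill in the issuer only where it is null, and switch the type) can be sketched against an in-memory SQLite table. The table name `other_id`, its columns, the sample rows, and the `migrate` helper are all hypothetical; the real Arctos tables and the statement actually run are not shown in this thread.

```python
import sqlite3

LEGACY_TYPE = "AMNH: American Museum of Natural History"
ISSUER = "American Museum of Natural History"

conn = sqlite3.connect(":memory:")
# Hypothetical table and columns, for illustration only.
conn.execute(
    "CREATE TABLE other_id (guid_prefix TEXT, id_type TEXT, value TEXT, issued_by TEXT)"
)
conn.executemany(
    "INSERT INTO other_id VALUES (?, ?, ?, ?)",
    [
        ("MVZ:Bird", LEGACY_TYPE, "example-1", None),
        ("MVZ:Herp", LEGACY_TYPE, "example-2", "Some Existing Agent"),
        ("MSB:Mamm", LEGACY_TYPE, "example-3", None),
    ],
)


def migrate(conn, guid_prefixes):
    """Switch the type for the given collections, filling issued_by only where NULL."""
    marks = ",".join("?" for _ in guid_prefixes)
    cur = conn.execute(
        "UPDATE other_id "
        "SET id_type = 'identifier', issued_by = COALESCE(issued_by, ?) "
        f"WHERE id_type = ? AND guid_prefix IN ({marks})",
        [ISSUER, LEGACY_TYPE, *guid_prefixes],
    )
    return cur.rowcount  # number of rows changed


updated = migrate(conn, ["MVZ:Bird", "MVZ:Herp"])
```

`COALESCE` keeps any issuer that is already recorded, and restricting the update by `guid_prefix` leaves collections that have not approved the change untouched.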

@dustymc
Contributor Author

dustymc commented Nov 18, 2023

temp_amnh_mvz.csv

UPDATE 95

Here's what's left
temp_amnh.csv.zip


 guid_prefix | count 
-------------+-------
 MSB:Mamm    |  3563
 MSB:Bird    |    30
 OWU:ES      |    22
 UAM:ES      |     2
 DMNS:Bird   |     3
 DMNS:Mamm   |    46
 MSB:Host    |     1
 UAMObs:Ento |     1
 UAM:Ento    |    22
 CHAS:Bird   |     8
(10 rows)

@mkoo
Member

mkoo commented Nov 18, 2023

CHAS:Bird said yes, go ahead. Can you do that one please?!

@dustymc
Contributor Author

dustymc commented Nov 18, 2023

temp_amnh_chas.csv

UPDATE 8

thx

@jrpletch

jrpletch commented Mar 8, 2024

OWU's are updated; ours were Department of Vertebrate Paleontology.

@dustymc
Contributor Author

dustymc commented Sep 9, 2024

@jldunnum there's absolutely no way #8042 (comment) can be addressed while this is half-done. The proposal is still to simply move the information to a more-accessible system, which will not change anything at all about the information carried by the legacy type and does not require any cleanup. (It does provide a clearly-needed mechanism to be much more precise, but that's for future use and has nothing to do with legacy data.) #5886 (comment) is, was, and will remain accurate.


7 participants