Part Cleanup #3708

Closed
dustymc opened this issue Jul 2, 2021 · 30 comments
Labels
CodeTableCleanup Our bad data leads to more bad data. Fix it! Help wanted I have a question on how to use Arctos

Comments

@dustymc
Contributor

dustymc commented Jul 2, 2021

Spreadsheet: https://docs.google.com/spreadsheets/d/1c1PL2Ij1x-lweDPzBeqdJJ6OTthWebPLRfhRhfGIN0Y/edit#gid=609934249




original below

Preservation is now separated, let's not lose momentum on cleaning parts. Maybe this is better as a bunch of Issues, but I think there's a lot of overlap so I'll start as one and we can split if necessary.


DNA extraction - can this be merged with DNA, or does that reference somehow unextracted DNA, or ???


carcass - docs suggest these are just poorly managed but I think the term may be intended to mean something else??


  • leg bone
  • long bone

These seem overly vague, maybe. It would be very useful to somehow get input from users - is this level of detail important? Could this be 'bone' + remark? Do we need something analogous to https://arctos.database.museum/info/ctDocumentation.cfm?table=cttooth_type to go with part=bone? Should these just remain parts??

NMMNH:Paleo, UAM:ES, and UTEP:ES have loans involving these and might be able to shed light.


  • ... lithified
  • ... slide

could be moved to https://arctos.database.museum/info/ctDocumentation.cfm?table=ctpart_preservation


Confounded condition (and maybe attributes and perhaps other stuff)

  • ...dissected
  • ...headless
  • ...skinned
  • ... sectioned
  • ...partial
  • skull-less

can those all be simply prepended to condition and removed from the part, or is something else going on there?
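The "prepend to condition" idea could be sketched roughly as follows. This is only an illustration: it assumes the legacy part names follow a `<part>, <term>` pattern, which has not been checked against the actual code table, and the function name `split_confounded_part` and the term list are hypothetical.

```python
# Hypothetical condition-like terms, taken from the list above.
CONDITION_TERMS = ("dissected", "headless", "skinned", "sectioned", "partial")


def split_confounded_part(part_name, condition=""):
    """Move a trailing condition term out of the part name and into condition.

    Assumes legacy names look like "<part>, <term>"; anything that does not
    match is returned unchanged so it can be reviewed by hand.
    """
    base, sep, tail = part_name.rpartition(",")
    term = tail.strip()
    if sep and term in CONDITION_TERMS:
        # Prepend the extracted term to any existing condition text.
        new_condition = f"{term}; {condition}" if condition else term
        return base.strip(), new_condition
    return part_name, condition
```

Names that do not end in one of the listed terms (for example `skeleton, spicule`) pass through untouched, which is where the "is something else going on there?" cases would surface.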


Can these make use of simple parts and https://arctos.database.museum/info/ctDocumentation.cfm?table=ctanatomical_direction ?

  • ... caudal
  • ... cervical
  • ... dorsal
  • ... dermal
  • ... rostral
  • ... front
  • ... hind
  • fore...
  • hind...
  • ... upper

Not quite parts - these aren't the same THING as a liver sample or skeleton - can/should we somehow try to separate them?

  • ...content
  • urine
  • ectoparasite
  • calcareous worm tube
  • ... flush
  • nest
  • pellet

"body parts" - maybe this actually makes sense if "other" can contain the above.


Everything containing "mount" probably deserves a closer look, but I suspect there are various probably-taxon-specific parts and techniques embedded in that.


Everything containing "skin" (except skin itself), and maybe 'clip', seems like it should be skin + some attribute or condition.


can/should we somehow avoid the word 'tissue' (which means something entirely different by itself)?

  • mammary tissue
  • pinned tissues

skeletal element

="bone"??


"plus" - all need merged into whatever happens in #3654

ditto tongue, trachea, and syrinx


trace fossil --->"trace" + preservation????


Help!

@dustymc dustymc added Help wanted I have a question on how to use Arctos CodeTableCleanup Our bad data leads to more bad data. Fix it! labels Jul 2, 2021
@dustymc dustymc added this to the Requires User Action milestone Jul 2, 2021
@Jegelewicz
Member

centrum cleanup - see #3713

This was referenced Jul 8, 2021
@lingnancollection

lingnancollection commented Jul 15, 2021

The new code tables for Part_Name and Part_preservation do not include a single option for whole organism (formalin-fixed, 70% ethanol). Now the only options are ethanol and formalin as separate values. I saw that for specimens previously uploaded as formalin-fixed, 70% ethanol, the specimen search now automatically shows two attributes. Does this mean we need to add 2 Part_preservation columns per specimen now?

@lingnancollection

So I just used the bulkloader and it does not allow two columns for Part_preservation for the same part_name. So the answer to my previous question is that now we need to include double the columns for Part_name, Part_preservation, Part_condition, Part_disposition, to indicate that the specimen is preserved with both methods. Can we account for bulkloader efficiency when the code tables are completely rearranged?

@dustymc
Contributor Author

dustymc commented Jul 15, 2021

@Jegelewicz
Member

So the answer to my previous question is that now we need to include double the columns for Part_name, Part_preservation, Part_condition, Part_disposition, to indicate that the specimen is preserved with both methods.

This will create two parts, each with only one preservation, which will cause you more difficulty!

@lingnancollection the best method for recording multiple part preservations is to catalog without parts, then add parts using the part bulkload tool which will allow you to add multiple part attributes per part. Please let me know if you need any help with this.

@dustymc
Contributor Author

dustymc commented Jul 15, 2021

then add parts using the part bulkload tool

Note that this is the same as using the "data entry extras" - it's just a UI to the parts (and other) bulkloaders.

@lingnancollection

https://handbook.arctosdb.org/how_to/How-to-Enter-Data-for-a-Single-Record.html#extras

@dustymc add extras for a single record seems to suggest doing one record at a time to add a second preservation method. What we need is to be able to bulkload records, not to do things manually. Is that possible? Can we just go back to the option when both preservations (formalin-fixed, 70% ethanol) are added as one? This is how most amphibian and reptile specimens are preserved.

@Jegelewicz We bulkload all our records. By adding parts using the part bulkload tool, do you mean there is a way to bulkload the records and then do a second bulkload just to add multiple attributes per part? This will still result in two bulkloads instead of one with all the data, but it is better than adding one by one manually.

@cjconroy

"Can we just go back to the option when both preservations (formalin-fixed, 70% ethanol) are added as one?"
I agree. I do most of my data entry with the data entry screen, and this new situation is going to make my job harder. Because of the extra steps, it is surely going to lead to mistakes. Maybe it was already suggested, but why not preservation one, preservation two for each part on the data entry screen?

@Jegelewicz
Member

By adding parts using the part bulkload tool, do you mean there is a way to bulkload the records and then do a second bulkload just to add multiple attributes per part? This will still result in two bulkloads instead of one with all the data, but it is better than adding one by one manually.

Yes but not exactly as you stated. The parts bulkload tool will allow you to load a part with up to six part attributes. So the steps are to load your catalog records without any parts, then load the parts + their attributes using the part bulkload tool. I will be writing up some documentation for this process shortly - apologies that it hasn't been complete already! If you want me to meet up with you to demonstrate, let me know.
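As a rough illustration of the second step, the sketch below builds one CSV row per part and spreads several preservation values over numbered attribute columns. Every column name, the record identifier, and the preservation values are placeholders; the real headers should be taken from the template provided by the part bulkload tool itself.

```python
import csv
import io

MAX_ATTRIBUTES = 6  # the tool is described above as accepting up to six part attributes


def part_row(guid, part_name, condition, disposition, preservations):
    """Build one row for one part, one numbered attribute slot per preservation.

    All column names are placeholders, not the real template headers.
    """
    if len(preservations) > MAX_ATTRIBUTES:
        raise ValueError("too many attributes for one part")
    row = {"guid": guid, "part_name": part_name,
           "condition": condition, "disposition": disposition}
    for i, value in enumerate(preservations, start=1):
        row[f"attribute_type_{i}"] = "preservation"
        row[f"attribute_value_{i}"] = value
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header covering all six attribute slots."""
    fields = ["guid", "part_name", "condition", "disposition"]
    for i in range(1, MAX_ATTRIBUTES + 1):
        fields += [f"attribute_type_{i}", f"attribute_value_{i}"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One whole organism with two preservation attributes on the same part.
example = part_row("TEST:Herp:1", "whole organism", "good", "in collection",
                   ["formalin", "ethanol"])
csv_text = rows_to_csv([example])
```

The point of the layout is that both preservations stay attached to a single part, rather than producing two parts with one preservation each.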

@Jegelewicz
Member

@lingnancollection I have created a first draft of How To Bulkload Parts. Please let me know if it is missing something or something in it doesn't make sense. As soon as I have some parts to load I'll work on making a video tutorial as well.

@Nicole-Ridgwell-NMMNHS

leg bone
long bone

I think it would be ok to replace these with bone + part modifier appendicular.

@lingnancollection

@Jegelewicz Thank you for drafting the instructions on how to use the parts bulkload tool. However, this creates a lot more issues. If I understood correctly, this tool requires leaving all the part sections empty during the first main bulkload. Then it requires creating a second csv file from the "bulkloadPartAttributes" template. However, the template available does not have part_preservation, part_condition, part_lot_count, and part_disposition as options. We also have 3 parts per specimen: whole organism and two tissues (which are now separated because the code name table just changed from tissue to muscle and liver). Can we add all the parts in the same "bulkloadPartAttributes" csv file, or do we require 3 different csv files (one per part)? Either way, this adds a lot of extra work preparing the data and sorting the records. It requires multiple csv files and increases the room for errors in managing the collection database.

@dustymc @Jegelewicz Is adding "formalin-fixed, 70% ethanol" as a preservation method an option? It seems from @cjconroy's comment that more people would benefit from this. This would solve some of the part issues. As a side note, I think changing the code names completely makes it really hard as a user to keep up with Arctos. Our collection is in a different time zone so we cannot attend the working groups. Maybe a wide survey of Arctos users before making huge changes would help to prevent numerous issues.

@Jegelewicz
Member

@lingnancollection That is not the correct template. Please see the Bulkload Part Tool, NOT the Part Attribute Bulkload Tool. Again, I will offer to meet up if we need to, but the Part Bulkload tool contains all of the fields needed to create a part along with up to six attributes for the part.

Is adding "formalin-fixed, 70% ethanol" as a preservation method an option?

There has been much discussion about combined code table values both in parts and attributes and these cause issues with discoverability. You can create a code table request and the community can discuss it if you feel strongly about this.
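The discoverability concern can be illustrated with a small sketch. The record ids, the values, and the `LEGACY_TO_SINGLE` mapping are made up for the example and do not reflect the actual code tables or how Arctos searches.

```python
# Hypothetical mapping from a legacy combined value to single values.
LEGACY_TO_SINGLE = {"formalin-fixed, 70% ethanol": ["formalin", "ethanol"]}


def split_combined(value):
    """Return the single values for a legacy combined value, else the value alone."""
    return LEGACY_TO_SINGLE.get(value, [value])


def find_by_preservation(records, wanted):
    """Return sorted record ids whose preservation list contains `wanted` exactly."""
    return sorted(rid for rid, values in records.items() if wanted in values)


# Two made-up records: one with a combined value, one with a single value.
combined = {"rec1": ["formalin-fixed, 70% ethanol"], "rec2": ["formalin"]}

# The same records after each combined value is split into single values.
separated = {rid: [v for value in values for v in split_combined(value)]
             for rid, values in combined.items()}
```

With the combined value in place, an exact search for `formalin` finds only `rec2`; after splitting, the same search finds both records.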

As a side note, I think changing the code names completely makes it really hard as a user to keep up with Arctos. Our collection is in a different time zone so we cannot attend the working groups. Maybe a wide survey of Arctos users before making huge changes would help to prevent numerous issues.

This change is a lot, but the community felt very strongly that it needed to be done in order to ensure that Arctos can supply appropriate information for various purposes. The lumping together of parts, preservations and other information in part names and preservation types makes it more difficult for users to query in ways that allow for discovery of everything that has ever been in formalin. A wholesale change like this is not the norm and the community has spent at least two years discussing it and talking through the details here on GitHub. I realize that it is not possible for everyone to attend the working group meetings as they are currently scheduled, which is why the community tries to have and document important conversations in GitHub where anyone can read through them and contribute to the conversation. That being said - could the community schedule a meeting time monthly that would allow you to participate? I would be happy to hold a second working group issues meeting each month that would facilitate this! The community also attempts to announce big changes and ask for comments in the Arctos Newsletter, but it appears it has yet to happen for this change. I'll add that to the newsletter task list. I also cannot find any announcements through our Google Groups although the change was announced on the Arctos banner. I agree that more communication would have been better and I am going to look at ways the community can ensure that doesn't happen again.

Let me know if I can help with the part bulkload tool.

@lingnancollection

lingnancollection commented Jul 22, 2021

Thanks for the thoughtful and detailed reply @Jegelewicz. Our collection just joined Arctos in 2019, so we are still getting familiar with how everything works.

I still have a couple of questions and I will be happy to schedule a meeting if you have some time to go over them. What is the best way to schedule a zoom call? here or via email?
1. Bulkloading in general. So there is no way to bulkload all the information at the same time? Is the only way to give a specimen two part_preservation values to bulkload without parts and then upload the parts separately using the Bulkload Part tool?
2. Bulkload part tool. I am looking at the csv template and it does not have part_preservation. But part_preservation is still a column option in the bulkloader template. So I assume that if I add it, it should be fine, although this is not obvious.
3. Modify part_preservation. Can I add a second part preservation to multiple records that are already uploaded, in a "bulkload way" and not manually to each record?

I was looking at other issues and it seems like they overlap. Here are the links; maybe @Nicole-Ridgwell-NMMNHS is struggling with similar issues?
3750#issuecomment-883729323
issues/3747
issues/3748
issues/3750

@Jegelewicz
Member

I still have a couple of questions and I will be happy to schedule a meeting if you have some time to go over them. What is the best way to schedule a zoom call? here or via email?

Here is a poll so that we can select a good time: https://www.when2meet.com/?12364246-wZsa6

  1. Bulkloading in general. So there is no way to bulkload all the information at the same time? Is the only way to give a specimen two part_preservation values to bulkload without parts and then upload the parts separately using the Bulkload Part tool?

Yes, the only way to give a single part more than one part attribute is to bulkload it separately. There are two ways you can do this.

  1. Bulkload your catalog record with the part and one attribute, then use the Part Attribute Bulkload Tool to load the second part attribute.
  2. Bulkload your catalog record without parts, then use the Part Bulkload Tool to bulkload the parts and their associated attributes.

I prefer the second, because it can get complicated if you happen to have multiple parts of the same name in a single catalog record.
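The complication with the first approach can be checked ahead of time. The sketch below flags records holding more than one part of the same name; it assumes a row that names only the record and the part cannot say which of the duplicates it means. The input shape (a list of `(guid, part_name)` tuples) is an assumption for the example, not an Arctos export format.

```python
from collections import Counter


def ambiguous_parts(parts):
    """Return the (guid, part_name) pairs that occur more than once.

    `parts` is an iterable of (guid, part_name) tuples. For the returned
    pairs, naming only the record and the part does not identify one part.
    """
    counts = Counter(parts)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Made-up data: record A:1 has two liver parts, so it is flagged.
existing = [("A:1", "liver"), ("A:1", "liver"), ("A:1", "muscle"), ("A:2", "liver")]
flagged = ambiguous_parts(existing)
```

Records that come back flagged are the ones where loading parts and attributes together in a single step avoids the ambiguity.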

  2. Bulkload part tool. I am looking at the csv template and it does not have part_preservation. But part_preservation is still a column option in the bulkloader template. So I assume that if I add it, it should be fine, although this is not obvious.

You can get the Part Bulkload Template from within the tool itself (it is different than the main bulkload template):
image

I have also attached a copy of the template for you. There is no field for part preservation. Preservation is simply one of the attributes that can be assigned to a part.

  3. Modify part_preservation. Can I add a second part preservation to multiple records that are already uploaded, in a "bulkload way" and not manually to each record?

Yes, this can be done with the Part Attribute Bulkload Tool. This tool operates much like the Part Bulkload Tool except that it does not allow you to add any part field information (condition, disposition, remark) as it assumes the part is already in the catalog record.

Part Bulkload Template:
part_bulkload-template.zip

@Nicole-Ridgwell-NMMNHS

If it would be helpful to have an example of a completed part bulkload file, you can take a look at the last one I did:
2019_PartsUpload.xlsx

@lingnancollection

Thank you @Jegelewicz for the detailed instructions. I am sorry to hear bulkloading all at once is not possible. I filled out the when2meet, there are not many options that we overlap due to the time difference but I hope it works out! Let me know which of the dates you prefer. Thanks again for all your help:)

Thank you @Nicole-Ridgwell-NMMNHS for the bulkload file example! it helps a lot!

@Jegelewicz
Member

@lingnancollection I sent an invite from my Google Calendar and I also invited @mkoo if she can make it. Let me know if you don't get the invite!

@lingnancollection

@lingnancollection I sent an invite from my Google Calendar and I also invited @mkoo if she can make it. Let me know if you don't get the invite!

@Jegelewicz We did not get the Google Calendar invite. Did you send it by email? Could you please send it to lingnancollection@ln.edu.hk and cc itzuecs@ln.edu.hk?

Thanks!

@Jegelewicz
Member

Sent!

@Jegelewicz
Member

Meeting notes - 2021-07-27

Walked through part and part attribute bulkload tools and demoed how to sort search results by catalog integer.

@lingnancollection

Thank you so much @Jegelewicz for taking the time to meet and explain the process to me. I really appreciate your patience and detailed explanations. I will use the bulkload part tool and bulkload part attribute for now. But I hope we can work on making bulkloading a one-step process moving forward. Thanks!

@Jegelewicz Jegelewicz added this to Code Table Admins Discussing in Code Table Management Aug 15, 2021
@dustymc
Contributor Author

dustymc commented Sep 2, 2021

AWG: Start a spreadsheet with specific parts

@dustymc
Contributor Author

dustymc commented Sep 2, 2021

@Nicole-Ridgwell-NMMNHS

I think we need to find a way to distinguish:

scenario A 'this is the anterior half of this bone'

from

scenario B 'this bone is in an anterior position in this organism'.

A few possible but maybe not so great solutions off the top of my head:

  1. keep separate code tables for A and B - not good if we're trying to reduce the number of part attribute tables.
  2. the latter is basically part of condition, so we could define that anatomical directions in specific parts only means B - would this be clear to the average user?
  3. do nothing and emphasize that collections should use remarks/condition to clarify the difference

@Jegelewicz
Member

I think 2 is the right option.

'this is the anterior half of this bone'

would translate to

| part | condition | anatomical direction reference attribute |
| --- | --- | --- |
| bone | partial | anterior |

'this bone is in an anterior position in this organism'

translates to

| part | condition | anatomical direction reference attribute |
| --- | --- | --- |
| bone | complete (or no comment at all) | anterior |

Comments in remarks could also help clarify, even if they are repetitive.
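One way to read the two translations above is sketched below. The `Part` class and `describe` function are hypothetical and encode only this reading: a direction combined with condition "partial" describes a portion of the part, while a direction with any other condition describes the part's position in the organism.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Minimal stand-in for a part record; field names are illustrative only."""
    name: str
    condition: str = ""
    direction: str = ""  # anatomical direction reference attribute
    remark: str = ""


def describe(part):
    """Spell out how the direction attribute is being interpreted for this part."""
    if not part.direction:
        return part.name
    if part.condition == "partial":
        # Scenario A: the direction says which portion of the part is present.
        return f"{part.direction} portion of {part.name}"
    # Scenario B: the direction says where the part sits in the organism.
    return f"{part.name} in {part.direction} position"
```

Under this reading a partial bone that also sits in an anterior position cannot be told apart from scenario A by these fields alone, which is where a remark would be needed.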

@Jegelewicz
Member

of course, there is also 'this is the anterior portion of the anterior instance of this bone'....

| part | condition | part remark | anatomical direction reference attribute |
| --- | --- | --- | --- |
| bone | partial | anterior portion of anterior bone | anterior |

@dustymc
Contributor Author

dustymc commented Oct 28, 2021

These will probably need split out to be resolved, but here are some preliminary considerations for the remaining messes.


These are taxa, the entire record should be rearranged:

amber - https://arctos.database.museum/name/Amber
mineral - https://arctos.database.museum/name/Mineral

-- these are taxa with a history; I doubt we'll be able to get rid of them, but perhaps they can be arranged in some way that gives the user a small chance of finding relevant material

nematode
cestode
trematode
ectoparasite
endoparasite


these are representations and should be moved to

model
cast


"limb" (ish, sorta)

arm
flipper
forelimb
-- maybe this is skin?? What exactly gets clipped?? See note at bottom.
fin clip


These are traces:
"a [fossil of a] footprint, trail, burrow, or other trace of an animal rather than of the animal itself."

trace fossil --> part='unknown'
bill content
cheek content
crop content
hindgut content
nest
pellet
pinned nest
pinned tissues
SEM stub
karyotype
stomach content
stomach plus crop
stomach plus crop content
stomach plus stomach content
urine
vaginal plug
venom
nasal flush
filter paper
egg content -- not so sure about this, it presumably/sometimes contains a critter


observation - evil, #1302


embryo - age class, but denormalize to part modifier for a realistic chance of actually getting something done
fetus - ditto, but do we need both terms?


IDK, perhaps more than one thing here, but these need help

body
body parts
carcass
skeletal element
skeleton
skeleton, spicule

hindquarters


fossil
lithified wood
tendon, ossified


needle - bone + modifier ???


Note for documentation-documentation: The wikipedia definitions are not in any way helpful; they detract from the data by suggesting things that I don't think align with the intention. Eg fin clip:

A clip of the most distinctive anatomical features of a fish. They are composed of bony spines or rays protruding from the body with skin covering them and joining them together, either in a webbed fashion, as seen in most bony fish, or similar to a flipper, as seen in sharks.

I'm just trying to figure out if it's all skin and can be merged there or if there's other stuff involved - a question similar to that potential users (who can probably also use wikipedia if that becomes necessary) might ask. Strongly suggest we find a way to better focus documentation.


done, data:

temp_swab_oc.csv.zip

swab, oral and cloacal

@ccicero these are all yours, same data as #4026 just with an extra attribute - I don't think there's anything potentially controversial, I'll go ahead and move them if I don't hear from you soonish.


@Jegelewicz
Member

We had the quill ---> hair discussion already - #3189

@Nicole-Ridgwell-NMMNHS

mineral discussion: #3080

Projects
No open projects
Code Table Management
  
Code Table Edit Complete

5 participants