Feature Request - habitat and collection method should be in collection event, not specimen event #5703
Comments
Two standard "use cases"
I don't think we can defensibly adjust that part of the model, but you also don't have to use it - you can hang most anything off of collecting events. |
I agree - and would add collecting source to that list. Also, "specimen event" needs to be changed to "catalog item event". The layer of our event stack that associates the location and time of an event with the catalog record should only include
The "verifier" stuff really seems a bit redundant to me, wouldn't a confidence by the determiner be good enough? |
Ideally these are collected initially during the same collecting event. When the parasites are removed from the hosts, that is a different collecting event and the distance in time and space between the two can open up questions about whether the parasite was actually collected with the host it was found on. (A bunch of mice from a trap line get thrown in a bag and transported to the museum, they are then combed for lice. Any louse found on any of those mice could have easily jumped from some other mouse in the bag). In any case, the parasites were collected from the wild in the same place as the hosts. |
Also should include two events - the larger trap line and the single trap location are important and should be recorded so that they can be useful. Nobody wants to do this because it is work, but if we could take the time to record things as they are rather than however is simplest, perhaps we would have research-grade data! End of rant. |
Dusty, please explain your #1 better. It makes no sense to me. I'm thinking of the billions of entomology specimens we'd love to have in Arctos, making these weird legacy vertebrate issues something like <0.0001% of the data (and not a compelling reason to force an awkward, unintuitive model that easily leads to low-quality data on everyone else). And regarding #2 - that argument begs the question; that is, "can't conceivably be separate events" depends on how we define collection event (which, btw, has the word 'collection', as in 'collection method', in it). I define an event as habitat+method+date/time, as do all entomologists and probably most collectors. |
@DerekSikes the solution is to request two new event attributes: habitat - free-text describing the habitat at the place and time of the And let the denormalization commence! |
Another problem - I now routinely do this: collect, return & make locality records & associated collection event records with unique event names that my lab techs can use to link specimens to the events. I put the collection method and habitat data in the collection event remarks so my techs can copy that information and put it in the specimen event habitat and method fields. As the data entry form is currently operating, when one uses a collection event name, all the linked information does not magically appear in the form as it had done with the old form, so the lab techs cannot see the event remarks I have typed, so this makes it very challenging for them to get the habitat and method data. This is a relatively minor issue but would vanish if habitat and method were in the CE rather than the SE. And re: upgrading legacy data - not impossible, just make a new CE for each different habitat. |
I'm suspecting one of the main reasons this might not happen is it will be difficult to do because the data are low quality because of the design. That is, one CE with a single collection method, could have many records that have that collection method spelled many different ways (pitfall trap vs pitfall traps vs pitfall) etc. Habitats are likely to be even worse! This would require tons of cleaning of data in those fields to standardized terms to get them into collection events - or risk having tons of pseudoduplicate collection events. But not fixing this means the problem will only be worse later and it is a really obvious design problem. |
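The spelling-variant problem described above can be illustrated with a minimal sketch. It assumes a deliberately crude normalization rule (lowercase, collapse whitespace, drop a trailing "s"); the function names and sample values are hypothetical and not part of Arctos.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, and drop one trailing 's' (crude, illustrative only)."""
    cleaned = " ".join(term.lower().split())
    return cleaned[:-1] if cleaned.endswith("s") else cleaned


def group_variants(values):
    """Map each normalized term to the set of raw spellings that collapse to it."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    return dict(groups)


# Hypothetical free-text collection methods entered on separate specimen events.
methods = ["pitfall trap", "Pitfall traps", "pitfall  trap", "pitfall", "sweep net"]
variants = group_variants(methods)

for normalized, raw_spellings in sorted(variants.items()):
    print(normalized, "<-", sorted(raw_spellings))
```

Note that "pitfall" still ends up in its own group: a rule this simple cannot tell that it means the same thing as "pitfall trap", which is why cleanup of this kind would probably still need human review.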
My understanding of the current model is that collecting events are
space/time only and specimen events are used to capture method. In our
field collections we may be in one place at one time, eg a single
collecting event, using different collecting methods in different habitats,
eg different specimen events. So the collecting event is shared between all
these, but the specimen event will reflect the different collection methods.
On Fri, Feb 24, 2023 at 12:00 PM, DerekSikes wrote:
I'm suspecting one of the main reasons this might not happen is it will be
difficult to do because the data are low quality because of the design.
That is, one CE with a single collection method, could have many records
that have that collection method spelled many different ways (pitfall trap
vs pitfall traps vs pitfall) etc. Habitats are likely to be even worse!
This would require tons of cleaning of data in those fields to
standardized terms to get them into collection events - or risk having tons
of pseudoduplicate collection events. But not fixing this means the problem
will only be worse later and it is a really obvious design problem.
|
Yes, that is the current model, but it is a poor model (and different from
what I expect all other collection databases use, and not intuitive, etc.)
for the reasons I've explained. Among other points, a specimen cannot be caught with more than one method at the same time (nor in more than one habitat at the same time). Thus, all the specimens caught in a given trap will need to have that trap type name repeated in every specimen event (100 specimens = 100 different occurrences of the trap type name that could all be spelled differently). Same problem for habitat.
|
I disagree with this statement. Habitat descriptions are subjective and they often have a scale component; wetland vs cattails vs amongst the rhizomes of cattails - each one of these could be applied to different specimens collected at the same time at the same coordinates. I'm not against adding these attributes to the collecting events (it does make sense for well designed/documented collecting efforts), but I feel they should also stay with the catalog record event to cover many historical specimens that have low res locality data, yet specific habitat data. Will that lead to conflicting habitat qualifications? Quite possibly, but I don't see a great way to capture both needs. |
You make my point. A single specimen can be collected in only one of the 3 habitats you list: wetland vs cattails vs amongst the rhizomes of cattails. And all the specimens that share that habitat should, in a relational database, be 'children' of that habitat. Yes, all collected at the same time & coordinates, but if 500 specimens were taken from cattails then there should be a collection event record that says 'cattails' as habitat which is shared by all 500. In the current model there is nothing to stop someone spelling 'cattail' many different ways, or editing some but not all of the habitat fields of those 500 at a later date, etc. It's no better than an Excel file for quality control of keeping track of the habitat (& collection method) of all the specimens that were caught in that habitat with that method. Perhaps this is less of a concern to vertebrate workers who only catch 1 specimen at a time :) but entomologists want a good way to manage the data of many hundreds of specimens that share the same habitat & collection method at that time & coordinates. |
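The parent/child arrangement argued for above can be sketched with Python's built-in sqlite3 module. The `event` and `specimen` tables and their columns are simplified, hypothetical stand-ins, not the actual Arctos schema; the point is only that one edit on the parent is seen by every child.

```python
import sqlite3

# In-memory database with hypothetical, simplified tables (not the Arctos schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        habitat  TEXT,
        method   TEXT
    );
    CREATE TABLE specimen (
        specimen_id INTEGER PRIMARY KEY,
        event_id    INTEGER NOT NULL REFERENCES event(event_id)
    );
""")

# One event, entered with a misspelled habitat, shared by 500 specimens.
conn.execute("INSERT INTO event VALUES (1, 'cattail', 'pitfall trap')")
conn.executemany("INSERT INTO specimen (event_id) VALUES (?)", [(1,)] * 500)

# A single edit on the parent row corrects the habitat for every child specimen.
conn.execute("UPDATE event SET habitat = 'cattails' WHERE event_id = 1")

rows = conn.execute("""
    SELECT s.specimen_id, e.habitat
    FROM specimen s JOIN event e USING (event_id)
""").fetchall()
distinct_habitats = {habitat for _, habitat in rows}
print(len(rows), distinct_habitats)
```

Under this layout the habitat string exists in exactly one row, so there is nothing for individual specimens to disagree about. The trade-off raised elsewhere in the thread is that specimens needing different habitats would then need different events.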
Yep, but as far as denormalization goes it doesn't seem very far out on the EVIL spectrum from here - it's going to be strings, not very searchable no matter what, and I'd assume any collection organized enough to create and use events probably just isn't going to make huge messes of them. (Edit: maybe not, see below.)
About everything here - specimens, events, localities - can be milliseconds or centuries, individuals or reefs, cubic centimeters or continents. I don't think there's a single viewpoint which adequately encompasses the diversity represented by Arctos, and making sure we don't paint ourselves into any overly-constrained corners has always been a prime development consideration. HOWEVER anyone thinks something ought to be done, it's pretty safe to assume some other user (or potential user) has/will find a compelling reason to do something else. (And those "something elses" tend to make Arctos a little better for everyone.)
AHA! That's easy: open an issue requesting a new categorical attribute (off whatever node seems most appropriate - above suggests collecting events) using a new (probably) table containing whatever values you wish to control. Or better yet, assuming you have decent coordinate-ish information (and I think you do), help me find a way to make something like #5597 happen - "what's the habitat in this place on this date?" doesn't sound like a difficult question for a good GIS system to answer. |
That would be amazing, but would it cover the habitat descriptions I'm putting down for post-fire work? |
I think the GIS probably stops around 'cattails' or 'wetland' in most places, which might also be about the granularity that could work as a controlled vocabulary. "Under a burned log several meters from edge of intact stream system" will probably always be free-text. (Now someone go do something amazing and prove me wrong!) |
Yes please! |
I don't need controlled vocabulary for habitat (or collection method) - just fields in Collection Event to put the data so that all the specimens of that event have the same habitat / collection method. If one edits either of these yet to be made fields in a CE then all the specimens of that event will get the edit as they should, because they were all collected in that habitat and with that method. Currently, to do this we have to bulk-edit all the specimens' specimen events. So.. if we have attributes 'habitat' and 'collection method' in Collection Events... but still have 'habitat' and 'collection method' in specimen event... won't that confuse the hell out of everyone? And how could we possibly present the data in a unified way for downloads and label making and searching etc. I don't think making attributes really solves this problem. I think the problem is in the model. |
Those are functionally the same thing. (Ish...)
yep, works for event attributes too
Ideally (??) we'd have unique names ("event " prefix?), but with decent documentation on any new attributes I don't think it's much of a problem either way. Anyone looking very close should notice the structure, and that's just unavoidably different. There's still some chance they can be misunderstood and used interchangeably and etc., but what can't? This seems to be a simple way to solve a real problem, and if it is somehow a horrible mistake the exit path is dead simple: push the event attribute to all specimen_events, nuke the attribute, and we're back where we are now.
I'm always up for better ideas, but they've got to support the reality of the data. Forcing these upstream of specimen_event does not do that in any way I can see. |
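The "exit path" described above (push the event attribute to all specimen_events, then remove the attribute) might look roughly like the following sketch. The `event_attribute` table, all column names, and the attribute name 'event habitat' are assumptions made for illustration; only the table name `specimen_event` comes from the thread. The choice to fill only empty habitat values is also an assumption, since a real migration would need a policy for conflicting values.

```python
import sqlite3

# Hypothetical, simplified tables; column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specimen_event (
        specimen_event_id   INTEGER PRIMARY KEY,
        collecting_event_id INTEGER NOT NULL,
        habitat             TEXT
    );
    CREATE TABLE event_attribute (
        collecting_event_id INTEGER NOT NULL,
        attribute_type      TEXT NOT NULL,
        attribute_value     TEXT NOT NULL
    );
    INSERT INTO specimen_event (collecting_event_id, habitat)
        VALUES (1, NULL), (1, NULL), (2, 'marsh');
    INSERT INTO event_attribute VALUES (1, 'event habitat', 'cattails');
""")

with conn:  # run both steps in one transaction
    # Step 1: copy the event-level value down to specimen events with no habitat.
    conn.execute("""
        UPDATE specimen_event
        SET habitat = (
            SELECT a.attribute_value FROM event_attribute a
            WHERE a.collecting_event_id = specimen_event.collecting_event_id
              AND a.attribute_type = 'event habitat'
        )
        WHERE habitat IS NULL
          AND EXISTS (
            SELECT 1 FROM event_attribute a
            WHERE a.collecting_event_id = specimen_event.collecting_event_id
              AND a.attribute_type = 'event habitat'
        )
    """)
    # Step 2: remove the event-level attribute.
    conn.execute("DELETE FROM event_attribute WHERE attribute_type = 'event habitat'")

habitats = [row[0] for row in conn.execute(
    "SELECT habitat FROM specimen_event ORDER BY specimen_event_id")]
remaining = conn.execute("SELECT COUNT(*) FROM event_attribute").fetchone()[0]
print(habitats, remaining)
```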
I'm late to this discussion, but it reminds me of an issue that seems
relevant. Are all specimen events unique, even if the data are shared by
all specimens collected at same place time habitat by same method? Or are
they shared based on above? If shared, how do we find all records sharing
the same specimen event? And how do we make bulk updates to the events for
those records? We can do this multiple ways for collecting events. But not
that I'm aware of for specimen events, making it difficult to manage batches of
records with same event data using bulk edit tools, such as those under the
catalog record results Manage drop-down.
I know there is a broken tool there, but I've been confused by all this for
a long time.
On Fri, Mar 3, 2023, 10:50 AM, dustymc wrote:
just fields in Collection Event
don't think making attributes really solves
Those are functionally the same thing. (Ish...)
edits either of these yet to be made fields in a CE then all the specimens
of that event will get the edit
yep, works for event attributes too
attributes 'habitat' and 'collection method' in Collection Events... but
still have 'habitat' and 'collection method' in specimen event... won't
that confuse the hell out of everyone?
Ideally (??) we'd have unique names ("event " prefix?), but with decent
documentation on any new attributes I don't think it's much of a problem
either way. Anyone looking very close should notice the structure, and
that's just unavoidably different. There's still some chance they can be
misunderstood and used interchangeably and etc., but what can't? This seems
to be a simple way to solve a real problem, and if it is somehow a horrible
mistake the exit path is dead simple: push the event attribute to all
specimen_events, nuke the attribute, and we're back where we are now.
problem is in the model
I'm always up for better ideas, but they've got to support the reality of
the data. Forcing these upstream of specimen_event does not do that in any
way I can see.
|
Mariel - there is a bulk update specimen event tool that is pretty nice (at least it was last I used it). But of course it would be a lot easier and better design to go to 1 collection event to edit a single habitat field that is shared by potentially thousands of specimens than it is to find thousands of specimens and bulk edit their specimen event habitat fields. Dusty - "I'm always up for better ideas, but they've got to support the reality of the data. Forcing these upstream of specimen_event does not do that in any way I can see." Isn't this really a decision of what is the lesser of two evils? 'The reality of the data' depends on which data we want to suffer the most when those data conflict - the rare or the common? a) current model - accommodates rare (<0.001% of potential data) desire (not need) to have 2+ different habitats linked to 1 collection event (and/or 2+ different collection methods linked to 1 collection event). b) better and more universally understood model - requires such rare data to have 2+ collection events to manage their weird data, WHICH causes no problems that are worse than the convolutions we are considering or actually doing to manage our current model for the vast majority of the data that have lots of specimens that share the same habitat + collection methods from a single event. "Forcing these upstream of specimen_event does not do that in any way I can see." = forcing those people to have 2+ collection events is not a big deal!! Forcing everyone else to have hundreds of thousands of replicated habitat + collection method text strings that need to be kept identical using special tools and human effort that could be easily undone is a big deal. I also think this stems from a very vertebrateology-centric way of dealing with specimens and thus feels very much like it gives vertebrates more weight than invertebrates (thus contributing to a dislike of Arctos by those with the most specimens). |
@DerekSikes I stand by my suggestion above - let's just request new collecting event attributes and let the conversation move there. It is a solution that is available to you now and perhaps we can then get everyone else on board with using it. |
Call me crazy but I really think if we are to use attributes for this they should be used for the rare (weird) data in specimen event and the 'real' non-attribute fields for habitat and collection method should be in collection event. If 99.99% of the data fit that model (of these fields being in collection event) why make those users use attributes? |
Yes, we'd like to retain the current (verbatim or not) "habitat" and "collection method" in the specimen record even if they are added as new event attributes. We use single-record data entry for almost all of our collection, so maintaining consistency with more and more fields in the collecting event would become even more difficult. |
I'd like to hear more from @dustymc regarding the consequences of this idea to add these as attributes of collection events. Habitat & collection method are, in my opinion, two of the key fields that should be available in flat. If we start using these as attributes of collection event - will they be displayable in a catalog record search result list as they are now? Presumably it would not be hard to have these in my report csv for label generation? What happens when a collection uses BOTH - eg has data in specimen-event-habitat AND collection-event-habitat? Presumably these will have different names and users will be able to select which (or both) to display in their results. As an attribute, will they accept uncontrolled text strings as they do now? This is desirable given the type of data they handle, but minimizing undesired inconsistency is one of the goals of having these in collection-event instead of specimen-event. I still think that sometimes we should solve problems like these by reducing complexity rather than increasing it. The latter is always tempting because it lets everyone get what they want, but the complexity growth over time has costs - not the least of which is the increase in the learning curve for all those new to Arctos. And I have yet to hear what bad things will happen if we go with my original proposal of moving these fields to collection event. Those who want 2+ habitats for the same event will be forced to have 2+ events - big deal? Why is that worse than our current design? |
#5703 (comment) - #5703 (comment) - sure.
That's a different issue, there's complexity, but from the record it's the same shape as the specimen_event fields so whatever's been done with them could be easily done for event attributes. They'd add themselves as part of the json locality stack object so not any trouble for folks used to more complex data.
No problem (at least for those with simple/flat data, and not much problem no matter the complexity).
Then they're both available (and hopefully documented and used defensibly in some way that makes sense for those particular data).
Up to you - and multiple not-quite-overlapping things with different structures is fine too. (There could be number+units, categorical, and free-text attributes all expressing slightly different aspects of the same system-or-whatever. "27 (number) percent (units) water coverage", "cattails" and/or "marsh" from some categorical attribute, and "Under a burned log several meters from edge of intact stream system" as free-text (plus maybe water depth and elevation from locality and ecozone from some BetterBerkeleyMapper) are all completely compatible.)
Absolutely, and not just new users! If something can be simplified then it should be simplified. This however (as I understand it) is not a good candidate for simplification for all the reasons given above; this exists as it does because it needs to (and not just for some easily-ignored minority or those pesky vertebrates, perhaps unless liver flukes are very sneaky about showing their spines). "As simple as possible, as complex as necessary." |
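The idea of several differently structured attributes describing the same event can be sketched as a small data structure. The class, field names, and attribute type names below are hypothetical illustrations, not Arctos definitions; the example values are the ones mentioned in the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventAttribute:
    """One attribute hung off an event; all field names are illustrative assumptions."""
    attribute_type: str
    value: str
    units: Optional[str] = None  # only meaningful for number+units attributes

    def describe(self) -> str:
        suffix = f" {self.units}" if self.units else ""
        return f"{self.attribute_type}: {self.value}{suffix}"


# Three attributes of one event, each with a different structure.
attributes = [
    EventAttribute("water coverage", "27", "percent"),      # number + units
    EventAttribute("event habitat category", "cattails"),   # categorical
    EventAttribute(                                          # free text
        "event habitat",
        "Under a burned log several meters from edge of intact stream system",
    ),
]
summary = [attribute.describe() for attribute in attributes]
for line in summary:
    print(line)
```

Because each attribute is its own row-like object, adding a new kind of description would not require changing the shape of the event itself.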
Can those be summarized? I didn't see any compelling reasons. @sharpphyl indicated a dislike of making the single-data-entry form more complex. The repeated refrain of parasites + hosts "needing" the same collection event but needing different habitats & collection methods doesn't make sense to me. I still don't understand why having 2 collection events in such a case would be bad. Please explain. |
Forcing users to manage what should be identical data in multiple places is not something that any human has ever been very successful at. Avoiding that is a fundamental tenet of good database design.
I'm not sure what this means, but on that note: #5193 is scheduled to happen soon (perhaps next week) and we've agreed that the bulkloader will then be stable for a few years; if you (or anyone) would like to use event attributes (of any kind, or anything else), please add to that Issue ASAP. (Adding event attributes to the data entry app as "extras" would be a separate consideration, and that generally doesn't have the stability concerns the core table does.) |
We're going around in circles. The 'should be identical data' in my mind is CE+Habitat; in others' minds is CE without habitat. I agree that forcing users to manage identical data (habitat & collection method) in multiple places is bad. In the current design I have to manage these in upwards of hundreds or thousands of different places, with each being a separate specimen-event. So we're not dealing with a situation of choosing among a) not forcing users to manage identical data in multiple places vs. b) forcing users to manage identical data in multiple places. We're dealing with this situation: a) forcing users to manage identical data in multiple places vs. b) forcing users to manage identical data in multiple places. The only difference between a & b is what the 'identical data' are (a = date/time b= habitat+collection method). |
Nope. I'm proposing a simple optional addition that lets you (and I'd guess others) manage your data how you want, without affecting anyone who doesn't want to use it in any way. (I think you're also asking to force a significant portion of the Arctos community into an indefensible model for reasons that I do not understand; preventing that is the extent of my objection, and perhaps that looks circular??)
You do not, what I think you are asking for is an attribute request away.
There is no aspect of that in the current model, nor the current model with some extra event attributes. |
I would like to add my two cents that having parasites and hosts share collecting events is critical to our current model for tracking these relationships, especially given that so much of our original model for parasite and host linkages has been altered/lost over the past decade. And there are entire collections devoted to these linked records - it is not a fringe use case. |
Ok. I support adding habitat & collection method attributes to collection event. [But for the record, I think it would better if the real fields were in collection event and the attributes used for the specimen event.] |
If this is in some way true there should definitely be something more than derogatory comments.
Thoughts on names and definitions? Giving them unique names would make it slightly less cryptic to treat them as columns, if there is or comes to be some reason to do so. The name+structure should make them pretty obvious, but we should probably spell out 'at the event' anyway. Here's a start, feel very free to ignore it all:
And particularly the second might make a lot of sense for eg manufacture events; bio-centric examples seem fine (if we need examples at all??) but the definition itself should not be too focused on collecting, if that's compatible with what's being requested. (If it's not that should be clear from the definition.) |
I like these (edits in bold):
Since we have 'specimen events' it seems important to emphasize the above are attributes of 'collection events'. I wasn't sure if you were thinking that conflating manufacturing method with collection method would be a good thing or not. I would think that manufacturing method would be better as its own attribute. |
Is a misnomer! These should be referred to as collection item events as they include:
So I would still leave collection out of the event habitat definition.
or the kind of method (collection, manufacture, creation) could be inferred from the event type. |
I think @DerekSikes is one item up the chain, at also-poorly-named table collecting_event. It should just be 'event' - lots of things (those you list, for starters) happen there nowadays.
Yes, that's exactly what I was thinking.
I still think I lean towards generic being better, but I don't think that's entirely clearcut so ??? |
they're both poorly named. "collecting events" should be events. They're (about) place plus time (both of which can be any precision), and can be/often are shared. "specimen events" should be record_events (or somesuch). They're the link between collecting_events and records, include habitat and such, and are not shared. https://arctos.database.museum/tblbrowse.cfm?tbl=collecting_event https://arctos.database.museum/tblbrowse.cfm?tbl=specimen_event |
|
Closing this as the issues above cover the issues raised here. |
Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html
A specimen cannot be collected in two or more different habitats or with two or more different methods at the same time.
I have a lot to say on this issue but just want to get the conversation re-started (not sure where /when it was last discussed) so I can better understand why Arctos has this structure - it's not explained here (eg allowing 1000 specimens from 1 trap event to have potentially 1000 differently spelled habitats and methods, among other problems).
Priority
Please assign a priority-label. Unprioritized issues get sent into a black hole of despair.
medium