Feature Request - more data entry/bulkloader columns #5193
Comments
Can we have fewer parts - do we really need 12? |
Remove the preservation attribute shortcut and allow for at least two attributes per part in the bulkloader. |
Defaults are documented at https://handbook.arctosdb.org/documentation/bulkloader.html: this is a shortcut to creating a part attribute of type preservation. Attribute date will default to current_date and determiner will default to enteredAgent. |
Yes please. I feel all museums want more stuff.
What does this mean? I like being able to select the preservation and then get it to load without having to go through a second time. I probably just do not understand what the shortcut does. |
I'm happy with removing preservation attribute shortcut from the bulkloader. We should keep the option in the data entry forms - but it is really common to have two preservation types - e.g. 95% ethanol and frozen, or fixed in 95% EtOH and preserved in 70% EtOH, and we need to make it as easy as possible to capture this info. |
That's not possible - the data entry form is just(ish) a UI for the catalog record bulkloader.
I need to know exactly what this means. |
I like @campmlc's idea of a limit to the number of columns in any given bulkload file, but allowing them to be any combination of columns (maybe I want 12 parts with no part attributes, but she wants one part with 5 part attributes). Possible? |
I just meant removing the limit on number of attributes etc. |
Not really - I mean, I suppose I can refuse to deal with more than 86 columns or something, but why? My hard limit is what I can get PG to accept, which is something less than 1600 (depending on some techy details). If ya'll can deal with WHATEVER, as long as it's below that, then I can too. I can't deal with anything above that, at least not in a simple table structure. (I could potentially parse to component loaders or something, but anyone who could navigate that probably doesn't need it.)
For flat data, the only way to do that is for you to tell me exactly what you want - we either do or do not have a column called 'part_75_attribute_16,' there is no way for a flat object to just take whatever comes.
That's the "ish" above - the data entry form is also a UI to a bunch of component loaders, and those are not flat - you can happily add your 947th part, I don't care or need to know, it'll just work. (And I thought that had solved all of this, but here we are anyway - this needs clear instructions from ya'll to proceed.) |
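The flat-table constraint described here can be illustrated with a minimal sketch: every column has to be enumerated ahead of time, so a given header either contains `part_75_attribute_16` or it does not. The naming pattern follows the example in the comment above; the function name and its parameters are hypothetical and are not Arctos code.

```python
def part_attribute_columns(number_parts: int, attributes_per_part: int) -> list[str]:
    """Enumerate one flat column name per part/attribute pair (hypothetical naming)."""
    return [
        f"part_{part}_attribute_{attribute}"
        for part in range(1, number_parts + 1)
        for attribute in range(1, attributes_per_part + 1)
    ]


columns = part_attribute_columns(number_parts=12, attributes_per_part=2)
print(len(columns))                       # 24
print("part_75_attribute_16" in columns)  # False: nobody asked for it, so it does not exist
```

A real part attribute needs several columns rather than one, so the true header would be correspondingly wider; the sketch only shows why the counts have to be agreed on up front.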
Just as a side note - if the data entry form and the bulkloader are truly
interchangeable, I think it would be clearer if all the field names could
be the same between both forms, so that is obvious to everyone. Ditto for
the component loaders. But this can be a longer term project.
|
Let's call this an administrative problem. If there's a difference it's because someone's asked for it; I would LOVE to have a policy to point at while refusing to relabel the next time this comes up.
There is not really a bulkloader 'form' - it's very purposefully done in the most portable manner possible (CSV), and there's been an API for a very long time so there's no real reason to use any Arctos form. If we're doing this, the table columns have to be the controlling element. Related, columns are hard to change/rename. We can mitigate that by picking good names in this issue. I'll start:
|
Here's an attempt at a table which addresses the issues mentioned, I hope, maybe. Implementing this as attached would require a few other issues to be addressed, I'm hoping we can get this and everything attached to it as one big release (anything else is going to require rebuilding the same complex things over and over). First considerations:
If we get past the above, Eventually:
Current table in first comment. |
Consider pulling #4707 into this as well, it's going to change a lot of the vocabulary and targeting - although we probably DO want to retain part_condition (but make it optional) for magicking bare-bones 'condition report' part attributes. And does that mean we need more than 2 attributes per part? A goal of this should be stability, suggest that's sufficient to bump to 3 part attributes. |
Possibly the same issue as above? |
This seems like a reasonable and good idea - see also #6103 (comment) where I suggested something of this nature. |
This has its own issue - posting this there for posterity. |
This definitely makes fixing things ourselves much easier and I support making the order of columns consistent however we can, if we can. This includes Bulkloader Builder. I do not know the technical difficulties involved (which may be too difficult to overcome), but if naming columns appropriately can facilitate sorting that makes sense, perhaps that is where we should be looking to improve. |
Yes - is there a different csv list of column headers? Sorry, I was looking at the summary . . . |
CSV: Dusty's actual can always be found here - #5193 (comment) |
So I guess I scrolled through and carefully examined 1133 column headers in the wrong file :( |
I am truly sorry for that - I just don't have time to re-categorize everything every time the csv changes. :( |
I'll look over temp_maybe_new_bulkloader(8).csv and provide updated comments. |
A couple of comments on what I hope is the correct csv this time:
1, 2, and 5 above are the only really critical things that need fixing, in my view. Otherwise looks good? |
I vote we drop the "ORIG" |
I agree with adding "TYPE" to this to make the expected contents more clear. |
Not all users will care about the type and the issuer may be more important (so those humans will want issued by to come first). This is one of those places where we have two ways of doing things and some people prefer one while others prefer the other. However, I don't really care so much about the order and if nobody actively opposes this proposal, then I think that putting them in the order issued_by, type, value should make this workable for most? |
To be clear, for MSB, we need the order to be type, value, issued by. This
is necessary because our primary identifier is in the format "NK 12345",
with NK as the type.
This would also make more sense for collector and preparatory numbers, eg:
"collector number ABJ 12345 issued by Andrew B. Johnson"
…On Mon, Apr 17, 2023 at 5:13 PM Teresa Mayfield-Meyer < ***@***.***> wrote:
Move all OTHER_ID_NUM_TYPE to the first column before OTHER_ID_NUM_VALUE
and before issued by. I need to be able to write : "NK" as the type
followed by "12345" as the number, so I can keep track of what identifier
is what in the row. In the current order, we have the number and issued by
first, followed by the type - please swap so that type is first to the
left, as this is how humans would read the data. This is also consistent
with how we record attributes, with the "type" field first, followed by the
value.
|
Not to be a broken record - but this is what MSB needs and others may "need" issued by then number so this one just isn't super clear cut in my opinion. |
Then we have a major problem.
|
I don't understand why. I suggested the order
Which puts type right next to value but allows for those who want issued_by next to value to see things that way too. However, I'm not even sure any of this placement is possible? #5193 (comment) |
Apologies, I've had dinner now and can think clearly. Putting issued by first is fine, as long as ID TYPE and ID VALUE are together, so that we have the same UI as we currently see in the catalog record page. |
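As a rough sketch of the ordering settled on in this exchange (issued by first, with type and value kept adjacent), the identifier columns for each slot could be generated in a fixed field order. The `identifier_{n}_{field}` names below are hypothetical placeholders, not the final column names.

```python
# Field order from the discussion above: issued_by, then type, then value.
IDENTIFIER_FIELD_ORDER = ("issued_by", "type", "value")


def identifier_columns(number_identifiers: int) -> list[str]:
    """Build identifier column names so type and value always sit side by side."""
    return [
        f"identifier_{n}_{field}"
        for n in range(1, number_identifiers + 1)
        for field in IDENTIFIER_FIELD_ORDER
    ]


print(identifier_columns(2))
```

With this layout a row can be read left to right as issuer, then "NK", then "12345", which keeps the type/value pairing that the catalog record page shows.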
Just confirming what the status is on this?
|
Code table working group discussing now. Consensus is full rebuild, don't go out of the way to preserve any column names. Still not clear if we have too many columns at the moment - @mkoo ?? Not at all clear how we actually finalize this and begin development (https://github.com/ArctosDB/internal/issues/258#issuecomment-1515440177) |
What's the summary of the meeting? (Or where is it?) Just trying to understand the consensus and needs... |
See https://docs.google.com/spreadsheets/d/1qstLM0xpW8gkkEnRxUpZWOZGtJkv2NTu8zIgKxv-nYc/edit?usp=sharing BUT - also I will be posting an issue regarding localities/georeferencing that could alter this and I think we should hash it out soon. |
temp_maybe_new_bulk_cols.csv.zip I'll put this somewhere more "official" at some point, but for now I think this is primarily to make sure I've properly understood the proposal. temp_maybe_new_bulkloader is what the bulkloader may become, temp_maybe_new_bulk_cols is a transposed version (but still built from the builder code, so there should be no differences - use whichever makes the most sense to you). I think there's still some question as to how many columns need to be in here, but that should be adjusting some variable ('number_parts' or similar) which is an easy exercise. I'm feeling completely overwhelmed, and from that at the moment I think the locality stuff should probably be skipped - I don't see sorting that out now-ish, and putting the bulkloader builder off yet again doesn't seem like a great idea. (And the rebuild is going to take significantly more time if it involves rebuilding basically everything, so the stretch is already getting stretched.) Rebuilding the bulkloader because we have a new model seems a completely different thing than rebuilding the bulkloader because someone decided they need one more thing. I also like the idea of stability. I'm not sure how to balance those. I think some of the locality concerns also involve DWC, which might be a simple mapping adjustment - after we've rebuilt the bulkloader to fully incorporate #5120. |
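For anyone who prefers to review column names as rows rather than as one very wide header, a transposed listing like the one described above can be produced from any CSV with a few lines. This is only an illustrative sketch; the sample column names are made up and the function is not part of the builder code.

```python
import csv
import io


def transpose_header(csv_text: str) -> list[tuple[int, str]]:
    """Return (position, column_name) pairs taken from the first row of a CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return list(enumerate(header, start=1))


# Hypothetical three-column header, one name per output line.
sample = "guid_prefix,part_1_name,part_1_attribute_1\n"
for position, name in transpose_header(sample):
    print(position, name)
```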
This has become something else, closing. |
@dustymc In today's AWG it was made clear that two things are missing from the columns listed in the file above: associated species |
This may be dead
The replacement proposal is #6171 (more columns, but also rename everything)
Implementation will begin after 2023-04-14. All change or addition requests must be received before then.
Briefly discussed by AWG, consensus is that lots of columns is a workable idea. https://docs.google.com/document/d/1VEUSR-8UK0-9WeFOyiJDRit9UCIDJbq-HMpXUbBRvm8/edit#
@Jegelewicz @ccicero @Nicole-Ridgwell-NMMNHS @ebraker @wellerjes @mkoo @genevieve-anderegg @campmlc @atrox10
Whatever's decided here will be in effect for (some time - 5 years, maybe?) - please pass this on to anyone who might care.
Current table is always https://arctos.database.museum/tblbrowse.cfm?tbl=bulkloader
CSV:
temp_maybe_new_bulkloader(8).csv
Summary: https://docs.google.com/spreadsheets/d/1ssNP_jAiOok7TYIPq8b-OKcLCw9NngtCElPwbS6ajFo/edit#gid=0
Column Count: 1133
lastedit: add identification_order_
lastedit: two identifications
MAYBE:
encumbrances (#703) - ask is not clear so requirements are not clear
TODO
DOCUMENTATION! (merge from #5196)
OLDSTUFF
bumping to 15 attributes for #5210
AWG discussion suggests 2 part attributes isn't sufficient, try 4, and moving to format in #5193 (comment)
Identifications: #4416
event attributes (#4230) - each requires 7 columns
Generalizing, we've been maybe overly-conservative about adding stuff, perhaps we don't need to anymore? No idea what Excel supports, PG supports ~a thousand columns, let's see if we can use 'em.
Parts are currently 7 columns, one of which is preservation.
Part attributes require 6 columns.
Attributes (see #5210) require 7 columns.
Identifiers (see #5164) are currently one column.
I'm not merging, but #5120 should be resolved here too - is that magic-mapping of existing columns, use the existing locality attributes, ???
Coordinate-stuff should be better arranged, see #4716
QUESTION: Should we also add taxon concepts? Hard "no" vote from DLM - it's not being used, there's no way of knowing if the shape will change.
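The per-group column costs in the notes above lend themselves to a quick budget check against the PostgreSQL ceiling. The sketch below uses the counts stated in those notes (7 per part, 6 per part attribute, 7 per attribute, 7 per event attribute); the number of event attributes is an assumption chosen purely for illustration, and the total leaves out every non-repeating column.

```python
PG_MAX_COLUMNS = 1600  # PostgreSQL's upper bound; wide rows can lower the usable number

# Columns needed per repeated group, as listed in the notes above.
COLUMNS_PER_PART = 7
COLUMNS_PER_PART_ATTRIBUTE = 6
COLUMNS_PER_ATTRIBUTE = 7
COLUMNS_PER_EVENT_ATTRIBUTE = 7


def repeated_group_columns(parts: int, part_attributes_each: int,
                           attributes: int, event_attributes: int) -> int:
    """Total columns consumed by the repeated groups only."""
    return (
        parts * COLUMNS_PER_PART
        + parts * part_attributes_each * COLUMNS_PER_PART_ATTRIBUTE
        + attributes * COLUMNS_PER_ATTRIBUTE
        + event_attributes * COLUMNS_PER_EVENT_ATTRIBUTE
    )


# 12 parts, 4 attributes per part, 15 attributes; 6 event attributes is a guess.
total = repeated_group_columns(parts=12, part_attributes_each=4,
                               attributes=15, event_attributes=6)
print(total, total < PG_MAX_COLUMNS)  # 519 True
```

Part attributes dominate the total because they multiply with the number of parts, so that pair of numbers is the one to settle first.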