UCM Agent Bulkload Request #7894
Comments
and others with a dash in preferred name. First names should not include punctuation other than a period. Are we sure these are people and should they be:
?
Is not a person, but a research vessel? If so, this may be added as an organization.
This really feels like someone somewhere mis-transcribed an A for an R or the other way around?
Ditto for the F and R in these two
And the D and K here
Add the "spouse of" relationship between these after they are added?
I assume these people are related? Do we know how? Can I be convinced that these are really not the same person?
Or these two?
All of the "not the same as" relationships require a method and determiner. I am not trying to be obstructionist, but it seems like there is still some cleanup that could be done before we add these agents? I stopped looking at the near matches, so there are probably others I would add to the categories above. |
No worries. Thanks for catching those. Updated agent list attached: |
@dustymc Thanks for including me in the #7649 issue. Maybe we should pare our list down so that the only agents that get uploaded are ones that have full names (i.e., no initials) or have one or more attributes that distinguish them from other agents? So, for instance, if we have a J. Smith, the only way we can upload that person as an agent is if we had an attribute, say "child of", linked to that agent. Would that work? |
That will help, but the ones I am struggling with include things like:
Barbara Waleis, which feels like it may be a mistranscription of Barbara T. Waters.
Charles A. Nelson, which feels like a mistranscription of Charles D. Nelson (or perhaps it is the other way around; A and D can look very similar when written. Or maybe these ARE two different people, but I have no way to decide that).
Chin-Tsong Lewis and Chin-Tsong Lo: is one of these a misspelling, an alternate name for the same person, or are they related people?
You may have no way to figure out if my "feelings" are justified, but if you do, it might be good to get things like this sorted before making agents. As before, I did not peruse the entire list to look for these internal issues, but there are probably others! Do not take this as a summary of everything that I think needs review; these are just ideas for looking at the data you have in-house even before comparisons to Arctos agents. |
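For an in-house pass like the one suggested above, a rough sketch using Python's standard `difflib` can flag name pairs that differ by only a character or two. This is not how the Arctos pre-bulkload checker works (the thread does not describe that); the function name `near_matches` and the 0.8 threshold are assumptions for illustration, and a flagged pair still needs a human to decide whether it is one person or two.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_matches(names, threshold=0.8):
    """Return (name_a, name_b, ratio) for distinct names that look alike.

    The threshold is an arbitrary starting point, not a tuned value.
    """
    flagged = []
    # Deduplicate first so identical strings are never paired with themselves.
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


# Names taken from the comment above.
names = [
    "Barbara Waleis",
    "Barbara T. Waters",
    "Charles A. Nelson",
    "Charles D. Nelson",
    "Chin-Tsong Lewis",
    "Chin-Tsong Lo",
]

for a, b, ratio in near_matches(names):
    print(f"{a!r} vs {b!r}: similarity {ratio}")
```

A string-similarity score only surfaces candidates; it cannot tell a mistranscription from two real people, which is why the "not the same as" relationships still need a method and determiner.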
I can confirm that Barbara Waleis and Barbara T. Waters are two different people. Waleis is a collector from the 1930s, while Waters is a collector from the 1980s. The others are all agents for the invert zoo collection, which will need to be checked by @Krmartin3 when she gets back from vacation. I can say that the Chinese do use hyphenated first names. So, Arctos may need to figure that one out, but I'll let Kelly chime in when she is back.
In the meantime, I'm going to pull all of invert zoo's agents from the sheet, as I think most of the issues are coming from that side (sorry Kelly). I'll upload a new sheet of agents here in a bit. |
@Jegelewicz new list of agents attached |
@javanveldhuizen the dates in that CSV have been mangled (probably by Excel?). |
@dustymc Interesting, the dates look fine on my end. Should I use a different program to edit the CSV instead? |
@dustymc Ok. I edited the CSV using Notepad and changed all the dates into the desired format: yyyy-mm-dd. Let me know if that doesn't work. |
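Rather than fixing dates by hand, one option is to check the CSV for anything that is not already yyyy-mm-dd before attaching it. The sketch below only reports offending cells and does not try to guess what an ambiguous value like 3/14/85 meant. The column name `birth_date` and the sample rows are hypothetical, and it assumes full dates only; if the loader accepts partial dates (yyyy or yyyy-mm), the pattern would need relaxing.

```python
import csv
import io
import re
from datetime import datetime

# Strict shape check: four-digit year, two-digit month, two-digit day.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(value):
    """True only for a real calendar date written as yyyy-mm-dd."""
    if not ISO_DATE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:  # right shape, impossible date (e.g. February 30)
        return False
    return True


def find_bad_dates(csv_text, date_columns):
    """Return (row_number, column, value) for non-empty cells that fail the check."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column in date_columns:
            value = (row.get(column) or "").strip()
            if value and not is_iso_date(value):
                bad.append((row_number, column, value))
    return bad


# Hypothetical sample data; real column names come from the agent template.
sample = """preferred_name,birth_date
Example Person,1985-03-14
Another Person,3/14/85
Third Person,1985-02-30
"""

print(find_bad_dates(sample, ["birth_date"]))
```

Reading the file as plain text this way avoids opening it in a spreadsheet program at all, so nothing gets silently reinterpreted on save.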
Yea, but they don't SAVE fine (eg unambiguously), which is why we require CSV. https://handbook.arctosdb.org/how_to/How-to-Excel-for-Arctos.html#dates (I wrote the 'eat your data' bits but not the niceties at the top!) Thanks, I've got those in the pre-loader. The first thing in my view is "Humboldt Museum" - surely that's https://arctos.database.museum/agent/21336826 or https://arctos.database.museum/agent/21348575?? |
It's actually neither of those. The specimens I have tied to the Humboldt Museum were donated to us by a researcher at the Humboldt-Universität zu Berlin. What's unclear is whether these were actually part of the museum at that university, which later became the Museum fuer Naturkunde der Humboldt-Universitaet Berlin, or part of a researcher's lab collection. I kept it as Humboldt Museum until I could fully untangle it. Feel free to delete it from the list if you feel that it is not an appropriate true agent. |
@dustymc Here is the agent sheet again with the Humboldt Museum removed |
Ugh, that should not be the path, @ArctosDB/agents-committee HELP! Lacking further guidance, that seems a somewhat defensible position to me (and a remark would be useful, if that's not already there). I loaded data to https://docs.google.com/spreadsheets/d/1it7JgDc0Fxnccn5yD_bO6kdYFjPRrbJhqptOVAOu3G8/edit?gid=907589706#gid=907589706 Again an "interesting" situation on the first line! First your agent will load, then Arctos will run....
except two results will be returned - this one and the one just created - which will result in an error. Maybe that's somehow my problem, but I'm not quite sure how to address it. https://arctos.database.museum/agent/21333592 will always be unambiguous, but isn't great for humans to work with in a spreadsheet. Beyond that, I don't know how to proceed. (I'd use verbatim agents as a first pass so we don't have to guess from strings, but I seem to have lost that argument!)
look pretty suspicious (and maybe that's OK, I don't know, this should still not be my call @ArctosDB/agents-committee !!) I didn't scroll very far, just enough to grab a couple examples. I don't see any super-obvious duplicates or mistyped agents or such in the file. I REALLY don't want this to be my call (see above, I'd do something entirely different!), and the ~30 flagged by the checker could definitely use careful review, but loading this doesn't seem unreasonable. @Jegelewicz @mkoo thoughts?? |
I have deleted David Taylor from my list and will make him a verbatim agent for now until that issue is fixed. I can confirm that the David Taylor already in Arctos is not the same David Taylor in my data.
For some reason Sarah Reiboldt keeps reappearing in this list even though I keep deleting it. Anyway, I've deleted it once again and I can confirm that the Sarah Reiboldt already in Arctos is the same Sarah Reiboldt in my data.
The Bill Simpson I have in my data is an amateur collector in the Denver area and not the William Simpson already in Arctos. These are two separate people, as indicated by the "not the same as" attribute.
The BYU Museum of Paleontology and the BYU Life Science Museum are two different organizations. Here are their websites so you can confirm: New list here: |
You can also just create the agent manually (where everything involves IDs instead of strings).
Sorry, I didn't look very carefully (was aiming for general considerations, not specifics!), thanks!
running.... https://docs.google.com/spreadsheets/d/1SBF83EZncUko6u1KkVzbQdhaPGDULnNVNKuSEn6Leak/edit?usp=sharing I suppose I should just load that??? @mkoo |
@javanveldhuizen I found a problem on my end and am rolling a partial load back, but during that I noticed Ward Scientific in these data. Surely those are both duplicates of https://arctos.database.museum/agent/21293521? |
@dustymc I deleted those agents. They need some verification. New list here: |
Done and blamed on you, @javanveldhuizen. There's one full-duplicate low-data copy of another low-data agent that maybe ought to have something done with it.
and one that errored out |
cf_temp_pre_bulk_agent_download_version ready.csv
Please bulkload the agents in the attached file.
Note: The file should be results from the Agent Prebulkload Tool. If the file is too large for Github attachments, comment here and an email address or shared file space will be provided to you.