Sort out the 3 different tables we have of the author names spreadsheet for Herbarium C. #93

Closed
RebekkaML opened this issue May 1, 2024 · 2 comments
Labels
data issue issues with the data NHMD Natural History Museum Denmark

Comments

RebekkaML commented May 1, 2024

This was originally part of #74, but it now has its own issue.

The author spreadsheet (#74) currently exists in 3 parts with partial overlaps.

1: The "authorDropdown" sheet, created by Jan to cover boxes 1-285.
2: The "authorDropdown_box202_215" sheet, created by Jan after realizing that boxes 202-215 were missing from the first version. (This was because the project wasn't set to DaSSCo in Specify, so they weren't included when he exported everything.)
3: The "Author list_missing entries" sheet, where we added everything we couldn't find in the first sheet while filling it in. This also included boxes 202-215 because we made it before Jan made his supplementary sheet for those boxes.

Ideally, we want to have only 1 table that includes everything.

Problem 1: There is a partial overlap between 2 and 3 (but this is easy to sort out).
Problem 2: The first two tables have a dropdown function for authors based on Specify exports, while table 3 was filled in manually. Does this affect importing the data back into Specify? How do we best merge them? (We will need to ask @bhsi-snm or @FedorSteeman about this.)

@RebekkaML RebekkaML added NHMD Natural History Museum Denmark data issue issues with the data labels May 1, 2024
@RebekkaML RebekkaML self-assigned this May 1, 2024
@RebekkaML

Meeting with @bhsi-snm about this on 23.04.2024.

We can combine the 3 sheets into one, as long as we do it without the dropdown function (copy only the values into a new Excel sheet). The dropdown is not needed for the import into Specify, so discarding it is not a problem.

RebekkaML commented May 1, 2024

Merged the 3 sheets into 1:

  • Copied only the values from the 2 dropdown tables into a new worksheet, so the dropdown doesn’t interfere with anything.
  • Adjusted the “missing entries” sheet to match the other two, mainly by putting “Box” in front of the box numbers. Copied and pasted it into the new worksheet with the other 2 tables. Added a column called “source” that is either “Specify export” or “missing_entries”, to be able to check the origin of the data.
  • Sorted the datasheet by box number, then taxon name, then source, so that duplicate entries always end up next to each other. Went through it and, for each pair of duplicates, deleted the one from the missing_entries sheet since it doesn’t include the taxon id. If both entries had a comment attached, I transferred the comment from the deleted entry to the entry I was keeping.
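For anyone who needs to repeat this, the manual steps above could be sketched in Python roughly as follows. This is only an illustration of the logic, not what was actually done (the merge was done by hand in Excel). The column names `box`, `taxon`, `taxon_id` and `comment`, and the function names, are assumptions and would need to be adapted to the real sheet headers.

```python
from typing import Dict, List

Row = Dict[str, str]


def normalise_box(value: str) -> str:
    """Put 'Box ' in front of bare box numbers so both sources match."""
    value = value.strip()
    return value if value.lower().startswith("box") else f"Box {value}"


def box_number(label: str) -> int:
    """Extract the numeric part of a box label so 'Box 10' sorts after 'Box 2'."""
    digits = "".join(ch for ch in label if ch.isdigit())
    return int(digits) if digits else 0


def merge_author_rows(specify_rows: List[Row], missing_rows: List[Row]) -> List[Row]:
    # Tag every row with its origin (hypothetical column name "source").
    tagged: List[Row] = []
    for row in specify_rows:
        tagged.append({**row, "box": normalise_box(row["box"]), "source": "Specify export"})
    for row in missing_rows:
        tagged.append({**row, "box": normalise_box(row["box"]), "source": "missing_entries"})

    # Sort by box number, then taxon name, then source, with the
    # Specify export row first inside each group of duplicates.
    tagged.sort(key=lambda r: (box_number(r["box"]), r["taxon"], r["source"] != "Specify export"))

    merged: List[Row] = []
    specify_by_key: Dict[tuple, Row] = {}
    for row in tagged:
        key = (row["box"], row["taxon"])
        keeper = specify_by_key.get(key)
        if row["source"] == "missing_entries" and keeper is not None:
            # Duplicate: drop the missing_entries row (it has no taxon id),
            # but carry its comment over to the row that is kept.
            extra = row.get("comment", "").strip()
            if extra and extra != keeper.get("comment", ""):
                keeper["comment"] = "; ".join(p for p in (keeper.get("comment", ""), extra) if p)
            continue
        merged.append(row)
        if row["source"] == "Specify export":
            specify_by_key.setdefault(key, row)
    return merged
```

In this sketch, two rows count as duplicates only when box and taxon name match exactly, so spelling variants would still need a manual check. It also carries a comment over even when the kept row had none, which goes slightly beyond the rule described above.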

Stored new document on the N Drive: "N:\SCI-SNM-DigitalCollections\DaSSCo\Workflows and workstations\Herbarium\authorDropdown_MERGE.xlsx"

Or here:
authorDropdown_MERGE.xlsx
