As a global admin, I want a one-time import of all documents and fragments currently in the PGP spreadsheet and the fields in the db populated accordingly, in order to work with the data in the database. #66
Comments
Some potential pitfalls in parsing each field are listed in #1
Re-estimating as 5 points since the story now includes importing documents and fragments.
Hello @sluescher — several questions and notes based on what's been done so far. (I'm sure you'll need to confer or coordinate with others on many of these, but thought I'd let you decide who and how to resolve.)
shelfmark & join questions
language questions and requests
List of languages that need to be corrected by the PGP team:
We'll need the PGP team to either add these languages to the ontology spreadsheet or correct them in the metadata spreadsheet.
Preliminary import is available on the test site. Note that some aspects of this are still provisional, pending data work and decisions from the team. For now, we're using a provisional mapping of languages in the spreadsheet to language names in the new Language+Script model, and we're ignoring records that need to be demerged. I'm attaching the full output of the import script. It's somewhat noisy, since it reports on all the skipped records with multiple types and missing languages, but it may be useful to refer to as you're testing.
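For anyone curious, the provisional mapping works roughly like a lookup from spreadsheet language strings to Language+Script pairs. A minimal sketch of the idea; the table entries and the `map_language` helper are hypothetical, not the actual mapping used by the import script:

```python
# Hypothetical provisional mapping from spreadsheet language values
# to (language, script) pairs in the new Language+Script model.
LANGUAGE_MAPPING = {
    "Judaeo-Arabic": ("Judaeo-Arabic", "Hebrew script"),
    "Hebrew": ("Hebrew", "Hebrew script"),
    "Arabic": ("Arabic", "Arabic script"),
}


def map_language(spreadsheet_value):
    """Look up a spreadsheet language string; return a
    (language, script) pair, or None if it is unmapped
    (unmapped values get reported in the import output)."""
    return LANGUAGE_MAPPING.get(spreadsheet_value.strip())
```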
@rlskoeser - I'm having trouble tracking down any multifragments that show up as such in the database. None of the 25 documents that the PGP search finds (under "multifragment" in the description) seem to have enough information in the spreadsheet to migrate to the new database. I'm trying to think of a creative way to find the right documents to search, but will need to come back to this later with more energy. Everything else looks great!
Ah, sounds like the multifragment filter problem you identified on #75 is a blocker for the testing. Should have a fix for that soon.
What kind of filter are you thinking would be useful? (I tested, and there are too many values for the Django list filter to be useful; I wasn't sure if you meant search, but that doesn't seem ideal either.)

From the last meeting, I now know we would like to sort on this field. I'd like to track that as a separate user story, since it's a new requirement. If we add logic to parse this field into an actual date/time (or a partial or approximate date for some records), then we could turn on Django's date hierarchy, which would give you a nice way to drill down by date. We'll also want to add logic to set the 'date_entered' field for new documents created in the database after the migration is complete.

FYI, ~9000 of the documents currently imported have no legacy input date. Do we need to do anything about that?

@richmanrachel can you coordinate writing a story for the input date and adding it to GitHub? It could just be something to the effect that you want to sort and do date-based filtering on input date, with a brief explanation of why it's valuable.
Changing status to "tested needs attention" while awaiting a fix for search by multifragment. @rlskoeser - could you give me a sample of what kind of entries are in the input date field so I can think about how to make it into a user story?
Sure. Some of them are full dates in MM/DD/YYYY format; some are year only; a few have ranges; some have notes in addition to a date; and some include multiple dates, which I'm not sure how we should handle. Here are some examples:
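A parsing pass over this field could normalize the common shapes described above into partial dates suitable for Django's date hierarchy. A minimal sketch, handling only the MM/DD/YYYY and year-only cases; the `parse_legacy_input_date` helper and its return shape are assumptions, not part of the actual import script:

```python
import re


def parse_legacy_input_date(value):
    """Parse a legacy input-date string into a (year, month, day)
    tuple, with None for missing parts. Returns None when no
    recognizable date is present (e.g. notes-only entries)."""
    if not value:
        return None
    # full date in MM/DD/YYYY (or M/D/YYYY) format
    match = re.search(r"(\d{1,2})/(\d{1,2})/(\d{4})", value)
    if match:
        month, day, year = (int(g) for g in match.groups())
        return (year, month, day)
    # bare four-digit year, possibly surrounded by notes; for ranges
    # or multiple dates this picks the first year, which still needs
    # a team decision
    match = re.search(r"\b(\d{4})\b", value)
    if match:
        return (int(match.group(1)), None, None)
    return None
```

The partial-date tuples could then back a proper date field, enabling sorting and the admin's date drill-down.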
Revise multifragment handling on import:
@kmcelwee - Marina really likes your Display Name category and is adding a new column in the languages ontology: https://docs.google.com/spreadsheets/d/1m-6SWU2gSNcferuU4Uzri2IQLUSbj3U5voas0xCofOQ/edit#gid=0 She's trying to address most of the other languages in that spreadsheet too. Coptic numerals are a bigger issue: they're technically Greek, but you don't want to say there's Greek on a document without Greek language, and most people call them "Coptic".
I added a new story so we can track importing language+script display name separately. #98
Looks great! Closing :)
testing notes
Check a variety of documents and fragments from the PGP metadata spreadsheet and test how they have been imported.
for fragments:
for documents:
Check a few documents with joins to confirm that the document is linked to all fragments referenced by shelfmark in the join column
I want the following fields populated from the spreadsheet: library, shelfmark (current/historical), recto or verso, language/script, description, type and tags, and, if available, link to image.
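The join check above could be partly scripted by splitting the join column into individual shelfmarks and comparing them against the document's linked fragments. A minimal sketch; the '+'-separated join format and the example shelfmarks are assumptions about the spreadsheet, not confirmed conventions:

```python
def parse_join_shelfmarks(join_value, delimiter="+"):
    """Split a join-column value into individual shelfmark strings.
    The '+' delimiter is an assumption about the spreadsheet format."""
    if not join_value:
        return []
    return [part.strip() for part in join_value.split(delimiter)
            if part.strip()]
```

A test could then assert that every shelfmark returned here corresponds to a fragment linked to the imported document.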
dev notes
revisions after testing:
infer missing library based on shelfmark (data cleanup requested)