As an admin, I want a bulk import of content from Gale/ECCO so that I can add content to the site that is not available from HathiTrust. #369
Comments
Delivering first version of Gale/ECCO import (missing MARC metadata). Here's the summary information from the script running in QA:
Here are the 5 ids that were not found:
Fixed the IDs (Excel had deleted trailing zeros) to the following, which RSK imported individually. Added CW0107745171 and CB0131329414.
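A quick pre-import check can catch IDs mangled by a spreadsheet before the script runs. The sketch below is illustrative only: it assumes a Gale/ECCO item ID is two uppercase letters followed by ten digits, which is inferred solely from the IDs quoted in this issue and is not a documented Gale format. The names `GALE_ID_PATTERN` and `find_suspect_ids` are hypothetical.

```python
import re

# Assumption: two uppercase letters + ten digits, inferred only from the
# ids quoted in this issue (e.g. CW0107745171); not a documented format.
GALE_ID_PATTERN = re.compile(r"[A-Z]{2}\d{10}")


def find_suspect_ids(ids):
    """Return the ids that do not match the assumed Gale id shape."""
    return [
        item_id
        for item_id in ids
        if not GALE_ID_PATTERN.fullmatch(item_id.strip())
    ]


if __name__ == "__main__":
    # The third value is a made-up example of a truncated id.
    sample = ["CW0107745171", "CB0131329414", "CW01077451"]
    print(find_suspect_ids(sample))
```

Running a check like this against the CSV's ID column before import would flag truncated values for manual correction rather than letting them fail as "not found".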
1183 records from Gale successfully imported; the number in admin matches the expected number in the most up-to-date CSV. On the admin site, Source ID, author, and page count metadata were successfully imported from ECCO. @rlskoeser assuming the missing Place of Publication, Publisher, and Pub date metadata will be supplied via MARC records? Confirmed the log entry in individual record history documenting import by script (e.g., link). Collections metadata successfully and accurately imported from the CSV. Internal curation notes also successfully imported into the correct field. (Glanced through results pages and hand-checked the following IDs, which represent a range of collection membership and note content: CW0114965944; CW0114031766; CB0127365931; CW0111758520; CW0114122952; CW0112450946; CW0111189178.) Testing in django admin is complete; need to test public-facing functionality before closing.
@mnaydan yes, the metadata I want to pull from MARC records is title, subtitle, sort title, place of publication, publisher, and pub date. We need to decide whether to track that on this issue or make a new issue for that as a refinement.
Decided to track the MARC record issue separately as #389, so closing.
testing notes
Review in django admin:
Review in public site archive search and detail pages — I think this will be most fully tested based on the user stories for public site functionality. If you want to test and review independently, I could give you access to the test Solr instance so you can see how page content is being indexed.
dev notes
- New `gale_import` script equivalent to existing `hathi_import` script
- `add_from_gale` method on `DigitizedWork` analogous to `add_from_hathi`
- `Gale` as an option for `DigitizedWork` source field
- the following (and more) may need to be refactored as part of this task:
  - `Page.page_index_data` should be split out into sub methods for HT (current logic) and Gale
  - `DigitizedWork.count_pages`
  - `DigitizedWork.page_index_data`
  - `DigitizedWork.get_metadata` (assumes HT bib API)
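One possible shape for the `page_index_data` split is a dispatch on the work's source. This is a minimal sketch under stated assumptions, not the project's implementation: the `DigitizedWork` dataclass here is a plain-Python stand-in for the Django model, and the source codes, the `page_texts` field, the index document keys, and the ID padding are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Hypothetical source codes; the real model's choices may differ.
SOURCE_HATHI = "HT"
SOURCE_GALE = "G"


@dataclass
class DigitizedWork:
    """Plain-Python stand-in for the Django model named in the notes."""
    source_id: str
    source: str = SOURCE_HATHI
    # Stand-in for page text, one entry per page; how each source
    # actually supplies its page text is outside this sketch.
    page_texts: List[str] = field(default_factory=list)

    def count_pages(self) -> int:
        return len(self.page_texts)


def hathi_page_index_data(work: DigitizedWork) -> Iterator[dict]:
    """Placeholder for the current HT logic (id format is made up)."""
    for order, text in enumerate(work.page_texts, start=1):
        yield {"id": f"{work.source_id}.{order:08d}",
               "source_id": work.source_id, "content": text, "order": order}


def gale_page_index_data(work: DigitizedWork) -> Iterator[dict]:
    """Placeholder for new Gale logic (id format is made up)."""
    for order, text in enumerate(work.page_texts, start=1):
        yield {"id": f"{work.source_id}.{order:04d}",
               "source_id": work.source_id, "content": text, "order": order}


# Map each source to its sub-method so new sources can be added in one place.
PAGE_INDEXERS = {
    SOURCE_HATHI: hathi_page_index_data,
    SOURCE_GALE: gale_page_index_data,
}


def page_index_data(work: DigitizedWork) -> Iterator[dict]:
    """Dispatch to the source-specific page indexing sub-method."""
    try:
        indexer = PAGE_INDEXERS[work.source]
    except KeyError:
        raise ValueError(f"unsupported source: {work.source!r}") from None
    return indexer(work)
```

Keeping the HT logic in its own function means the existing behavior can move unchanged, and the same lookup-table approach could be applied to `count_pages` and `get_metadata` if they also turn out to be source-specific.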