As an admin, I want a bulk import of content from Gale/ECCO so that I can add content to the site that is not available from HathiTrust. #369

rlskoeser · 2021-06-03T20:54:56Z

testing notes

Review in Django admin:

filter by source = Gale to see everything imported from Gale
confirm metadata is imported (provisional for now since we're not using MARC yet)
confirm there is a log entry in the record history documenting import by script
confirm that collection membership is set based on the flags in the import CSV file

Review in public site archive search and detail pages — I think this will be most fully tested based on the user stories for public site functionality. If you want to test and review independently, I could give you access to the test Solr instance so you can see how page content is being indexed.

dev notes

New gale_import script equivalent to existing hathi_import script

Write new API wrapper code analogous to the Hathi bib API wrapper. (No need to store content locally like we do with Hathi pairtree data; index directly from the API feed, since that's how Gale provides it.)
New add_from_gale method on DigitizedWork, analogous to add_from_hathi
Add Gale as an option for the DigitizedWork source field
The script will take a CSV file and, in addition to the standard import, should set collection assignment based on the spreadsheet
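
A minimal sketch of the CSV handling described above, assuming the spreadsheet has an ID column, one flag column per collection, and a notes column. The column names (`ID`, `Literary`, `Linguistic`, `Notes`) and the accepted flag values are hypothetical; the real spreadsheet layout is not shown in this issue.

```python
import csv
import io

# Hypothetical column names and flag values, for illustration only.
COLLECTION_COLUMNS = ["Literary", "Linguistic"]
TRUE_VALUES = {"y", "yes", "true", "1", "x"}


def read_import_rows(csvfile):
    """Yield (source_id, collection_names, notes) for each spreadsheet row."""
    for row in csv.DictReader(csvfile):
        source_id = (row.get("ID") or "").strip()
        if not source_id:
            continue  # ignore rows without an identifier
        collections = [
            name
            for name in COLLECTION_COLUMNS
            if (row.get(name) or "").strip().lower() in TRUE_VALUES
        ]
        yield source_id, collections, (row.get("Notes") or "").strip()


sample = io.StringIO(
    "ID,Literary,Linguistic,Notes\n"
    "CW0114965944,Y,,check edition\n"
    "CB0127365931,,Y,\n"
)
rows = list(read_import_rows(sample))
```

Keeping the row parsing separate from the import itself would let the flag handling be tested without calling the Gale API.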

The following (and possibly more) may need to be refactored as part of this task:

Page.page_index_data should be split out into sub methods for HT (current logic) and Gale
DigitizedWork.count_pages
DigitizedWork.page_index_data
DigitizedWork.get_metadata (assumes HT bib API)
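
One way the split by source could look, as a rough sketch only. The `Work` class, the source constants, and the placeholder page dictionaries are stand-ins invented for illustration; they are not the project's DigitizedWork or Page code.

```python
from dataclasses import dataclass

# Hypothetical source labels; the real field choices are not shown here.
HATHI = "HathiTrust"
GALE = "Gale"


@dataclass
class Work:
    """Simplified stand-in for DigitizedWork."""
    source_id: str
    source: str


def hathi_page_index_data(work):
    # Placeholder for the existing HathiTrust logic (local pairtree data).
    yield {"work": work.source_id, "order": 1, "source": HATHI}


def gale_page_index_data(work):
    # Placeholder for new logic that reads pages from the Gale API feed.
    yield {"work": work.source_id, "order": 1, "source": GALE}


PAGE_INDEXERS = {HATHI: hathi_page_index_data, GALE: gale_page_index_data}


def page_index_data(work):
    """Dispatch to the source-specific page generator."""
    try:
        indexer = PAGE_INDEXERS[work.source]
    except KeyError:
        raise ValueError(f"unsupported source: {work.source}") from None
    return indexer(work)


pages = list(page_index_data(Work("CW0114965944", GALE)))
```

A lookup table like this keeps the HathiTrust path untouched while a new source is added; count_pages and get_metadata could follow the same pattern.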


rlskoeser · 2021-06-15T14:24:31Z

Delivering first version of Gale/ECCO import (missing MARC metadata).

Here's the summary information from the script running in QA:

Processed 1,182 items for import.
Imported 1,177; skipped 0; 5 errors; imported 385,155 pages.
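
For illustration, a summary in this shape can be produced from a simple tally. This is a sketch, not the script's actual code; the counter keys are hypothetical.

```python
from collections import Counter


def summarize(stats):
    """Format import counts; missing Counter keys count as zero."""
    return (
        f"Processed {stats['total']:,} items for import.\n"
        f"Imported {stats['imported']:,}; skipped {stats['skipped']:,}; "
        f"{stats['error']:,} errors; imported {stats['pages']:,} pages."
    )


# Counts taken from the QA run reported above.
stats = Counter(total=1182, imported=1177, error=5, pages=385155)
summary = summarize(stats)
print(summary)
```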

Here are the 5 IDs that were not found:

CB132045539
CB110589871
CB129347115
CB125450132
CB128058273

mnaydan · 2021-06-15T15:59:17Z

Fixed IDs (Excel had dropped the zero that follows the two-letter prefix) to the following, which RSK imported individually.
CB0132045539
CW0110589871 (duplicate - deleted)
CB0129347115
CW0125450132
CB0128058273

Added CW0107745171 and CB0131329414.
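
A pre-import check could flag IDs damaged this way before the script runs. The sketch below assumes a Gale ID is two uppercase letters followed by ten digits; that pattern is inferred only from the IDs quoted in this thread, not from Gale documentation, and `check_gale_id` is a hypothetical helper.

```python
import re

# Assumed ID shape, inferred from the examples in this thread.
GALE_ID = re.compile(r"^[A-Z]{2}\d{10}$")
SHORT_ID = re.compile(r"^([A-Z]{2})(\d{9})$")


def check_gale_id(value):
    """Return (id, status), where status is "ok", "padded", or "invalid"."""
    value = value.strip()
    if GALE_ID.match(value):
        return value, "ok"
    short = SHORT_ID.match(value)
    if short:
        # Restore the zero that follows the two-letter prefix.
        return f"{short.group(1)}0{short.group(2)}", "padded"
    return value, "invalid"


results = [check_gale_id(raw) for raw in ["CB132045539", "CW0114965944", "n/a"]]
```

Padding alone would not have fixed every ID here: two of the five also needed the prefix changed from CB to CW, so any padded ID would still need to be verified by hand.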

mnaydan · 2021-06-15T16:17:31Z

1,183 records from Gale successfully imported; the number in admin matches the expected number in the most up-to-date CSV.

On the admin site, Source ID, author, and page count metadata were successfully imported from ECCO. @rlskoeser, I'm assuming the missing Place of Publication, Publisher, and Pub date metadata will be supplied via MARC records?

Log entry for individual record history documenting import by script confirmed. (e.g., link).

Collections metadata successfully and accurately imported from CSV. Internal curation notes also successfully imported into the correct field. (Glanced through results pages and hand-checked the following IDs, which represent a range of collection membership and note content: CW0114965944; CW0114031766; CB0127365931; CW0111758520; CW0114122952; CW0112450946; CW0111189178)

Testing in Django admin is complete; public-facing functionality still needs testing before closing.

rlskoeser · 2021-06-15T16:19:08Z

@mnaydan yes, the metadata I want to pull from MARC records is title, subtitle, sort title, place of publication, publisher, and pub date. We need to decide whether to track that on this issue or make a new issue for that as a refinement.

mnaydan · 2021-06-16T17:08:30Z

Decided to track the MARC record issue separately as #389, so closing.

rlskoeser added this to the v3.6 milestone Jun 3, 2021

rlskoeser self-assigned this Jun 7, 2021

rlskoeser added the awaiting testing label Jun 15, 2021

mnaydan closed this as completed Jun 16, 2021

mnaydan removed the awaiting testing label Jun 16, 2021

mnaydan mentioned this issue Mar 4, 2024

As an admin, I want a bulk import of metadata and full text from EEBO-TCP works so that I can add content to the site that is not available from HathiTrust or Gale/ECCO. #600

