As an admin, I want to include book excerpts and articles as well as full volumes from Gale/ECCO, so that I can include material that is specifically about prosody from longer works about other subjects. #443

Closed
4 tasks done
rlskoeser opened this issue Jul 27, 2022 · 15 comments

Comments

@rlskoeser

rlskoeser commented Jul 27, 2022

Gale equivalent of #393

dev notes

  • update gale_page_index_data to honor page_span, equivalent to the logic in hathi_page_index_data
  • add a unit test to confirm excerpt import logic
  • check that document details display the page range
  • update gale link to reference the page if possible
  • modify gale import to include excerpt page ranges, item type, and metadata
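
For illustration, honoring a page span when generating page index data might look roughly like the sketch below. The function name `pages_in_span` and its arguments are hypothetical, and it assumes the span is available as a set of 1-based page numbers; it is not the project's actual gale_page_index_data code.

```python
def pages_in_span(pages, page_span=None):
    """Yield (page_number, page) pairs, limited to page_span when one is set.

    `pages` is any sequence of page objects; `page_span` is an assumed
    set of 1-based page numbers, or None for a full work.
    """
    for number, page in enumerate(pages, start=1):
        # A full work has no span, so every page is included.
        if page_span is None or number in page_span:
            yield number, page


# Usage: an excerpt covering pages 2-3 of a four-page volume.
excerpt = list(pages_in_span(["p1", "p2", "p3", "p4"], page_span={2, 3}))
```

With a span set, only the listed pages would reach the index; with no span, every page is yielded, which matches the expectation that existing full works are unaffected.
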
rlskoeser added this to the v3.8 milestone Jul 27, 2022
rlskoeser self-assigned this Aug 16, 2022
@rlskoeser

Ran the updated export script with the latest version of the Gale excerpt spreadsheets. Here's the summary output:

  • ECCO_Excerpts_Book_8.16.22.csv

    Processed 193 items for import.
    Imported 193; 0 missing MARC records; skipped 0; 0 errors; imported 4,547 pages.

  • ECCO_Excerpts_Periodicals_8.16.22.csv

    Processed 105 items for import.
    Imported 105; 0 missing MARC records; skipped 0; 0 errors; imported 752 pages.

@mnaydan

mnaydan commented Aug 18, 2022

Testing notes for myself:

  • check that you can edit new excerpt fields in the admin interface:
    • item type
    • book/journal title
    • volume
    • original page range
    • page range in source
  • check that item type is displayed in the digitized works list view, and defaults to full for all existing content
  • test that invalid page ranges are rejected for digital/source pages (try out of order, non-numeric, etc); make sure that multiple ranges are supported as expected
  • test converting an existing record into an excerpt: when you save with digital page range, it should recalculate the page total for that item and it should update the index to only include the pages in range.
  • spot check that the records imported from the script are searchable on frontend, and that only the excerpted pages are searchable (check CB and CW, discontinuous page range, multiple excerpts from single source)
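
The page range checks in the list above (out of order, non-numeric, multiple ranges) can be pictured with a small standalone sketch. `parse_page_ranges` is a hypothetical helper written for this note, assuming ranges are entered as comma-separated values such as "10-15, 20"; the application's real validation may work differently.

```python
def parse_page_ranges(value):
    """Parse a string such as "10-15, 20" into a sorted list of page numbers.

    Raises ValueError for empty, non-numeric, or out-of-order ranges.
    """
    pages = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty page range")
        start, sep, end = part.partition("-")
        start, end = start.strip(), end.strip()
        # Both ends of a range must be plain decimal numbers.
        if not start.isdecimal() or (sep and not end.isdecimal()):
            raise ValueError(f"non-numeric page range: {part!r}")
        first = int(start)
        last = int(end) if sep else first
        if last < first:
            raise ValueError(f"out-of-order page range: {part!r}")
        pages.update(range(first, last + 1))
    return sorted(pages)
```

Under these assumptions, a discontinuous entry like "10-12, 20" is accepted, while "15-10" or "x-3" is rejected with an error.
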

@mnaydan

mnaydan commented Aug 18, 2022

Everything passed! @rlskoeser can you think of anything to test that I missed here, before I close?

@rlskoeser

@mnaydan the only other thing I can think of is to check that log entries were created documenting when/how the excerpt records were created

@mnaydan

mnaydan commented Aug 18, 2022

@rlskoeser it's creating the log entries. The only weird thing is that when I change a field, it also logs that I changed place of publication when I didn't. This happened when I changed just the title field, and again when I changed just the volume field.

@rlskoeser

@mnaydan hmm, that does sound weird. Could it be a whitespace change? Do you want to share a link to an example?

@rlskoeser

@mnaydan it might be a difference between None and the empty string "", which are different values in Python but I think would be invisible to you. If that's the case, my import script may be setting things to the wrong empty value.
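
To make that distinction concrete, here is a tiny sketch of how None and "" compare in Python. The `normalize_empty` helper is hypothetical, shown only as one way an import script could settle on a single empty value; it is not taken from the actual import code.

```python
def normalize_empty(value):
    """Return "" for None so that both 'empty' values compare as equal."""
    return "" if value is None else value


imported_value = None  # what an import script might store for a blank field
form_value = ""        # what a saved form would typically submit for a blank field

# Both look blank on screen, but a plain comparison sees a change.
looks_changed = imported_value != form_value
still_changed = normalize_empty(imported_value) != normalize_empty(form_value)
```
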

@mnaydan

mnaydan commented Aug 18, 2022

@rlskoeser Ah, that makes sense! It's minor enough that I don't want you spending time on it unless it's a really quick fix.

@rlskoeser

@mnaydan code change should be trivial; would require another round of testing, but we could do just one or two records

@rlskoeser

aha, I think it's a trailing whitespace issue!

@mnaydan

mnaydan commented Aug 18, 2022

in the code?

@rlskoeser

@mnaydan it looks like some of the fields have trailing whitespace. For publisher, for example, we have some logic to strip out part of the text in the MARC records that we don't care about, but apparently in some cases that leaves trailing whitespace behind. And I think that trailing whitespace must automatically get removed when you save the record via admin, which would explain why the field is logged as changed. Working on a fix.
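
A rough sketch of how this kind of trailing whitespace can appear, and one way to avoid it. The pattern, the function name `clean_publisher`, and the sample string are all made up for illustration; the real cleanup logic for MARC publisher text is not shown in this thread.

```python
import re

# Hypothetical cleanup pattern: drop a bracketed note from a publisher string.
BRACKETED_NOTE = re.compile(r"\[[^\]]*\]")


def clean_publisher(raw):
    """Remove the unwanted text, then strip whitespace left at either end."""
    without_note = BRACKETED_NOTE.sub("", raw)
    # Without this strip(), the space before the removed note would remain.
    return without_note.strip()


sample = "Example Press [note]"  # invented value, not real MARC data
unstripped = BRACKETED_NOTE.sub("", sample)  # "Example Press " with a trailing space
cleaned = clean_publisher(sample)            # "Example Press"
```

If the stored value keeps the trailing space and a later save removes it, the two values differ, so the field would be reported as changed even though nothing visible was edited.
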

@rlskoeser

@mnaydan I updated the test site, deleted the records with source id CW0113164666 and then re-imported — please test one of these records and see if the behavior is resolved.

@mnaydan

mnaydan commented Aug 18, 2022

@rlskoeser looks good now! Thanks!
