We need a workflow for attaching image links to records #940

Open
annamorphism opened this issue Aug 4, 2023 · 22 comments

@annamorphism

In an email, Debra requested "a method for me to submit or request that image links be attached to records".
"I used to send Jan a URL of a manuscript in a digital library, and he created the links. Depending on the urls, he could match our folio numbers to the correct images. (Sometimes we could just link to the first page, but for many we have direct links to each page.)"

I will note for posterity that on OldCantus the Admin area included a link simply labeled "Image Links", which leads to https://cantus.uwaterloo.ca/image-links-csv; it seems somebody (presumably Jan) could upload a CSV with all the links here.

image

This particular CSV seems to have the source id, folio #s in sequence, and a URL for their page view (which increment by 1 every folio)--presumably each CSV is/was individually created by inspection of the URLs and then gets fed into the image links for each chant view.
image
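
For a source whose page URLs simply count upward, a CSV like the one described could be generated rather than typed by hand. The following is a minimal sketch, not the OldCantus tooling: the column names, the source ID, the example.org URL pattern, and the assumption that each recto and verso advances the page number by one are all hypothetical.

```python
import csv
import io


def folio_labels(first, last):
    """Yield folio labels in reading order: 1r, 1v, 2r, 2v, ..."""
    for number in range(first, last + 1):
        for side in ("r", "v"):
            yield f"{number}{side}"


def build_image_link_rows(source_id, first_folio, last_folio, url_template, first_page):
    """Pair each folio label with a URL whose page number rises by one per side."""
    rows = []
    for offset, folio in enumerate(folio_labels(first_folio, last_folio)):
        rows.append((source_id, folio, url_template.format(page=first_page + offset)))
    return rows


def rows_to_csv(rows):
    """Serialize the rows with a header line (column names are assumptions)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source_id", "folio", "image_link"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_image_link_rows(
    source_id=123456,  # hypothetical source ID
    first_folio=1,
    last_folio=2,
    url_template="https://example.org/SomeSource/page{page}",  # hypothetical URL pattern
    first_page=1,
)
print(rows_to_csv(rows))
```

Sources that skip or repeat folio numbers would still need manual correction of the generated rows.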

@jacobdgm
Contributor

jacobdgm commented Aug 8, 2023

We've let Debra know that she needs to send us (whoever's developing on CantusDB at the time) a link to the images and a link to the MS, and we'll add the proper image links to all the chants in the MS. We can set up tools to help us accomplish this however we want (since we think Jan was the only one to interact with this page on OldCantus, we don't need to set it up in exactly the same way), so it might make sense to wait for Debra to send us a few of these. After going through the import process several times, the best way to automate it should become clear.

@annamorphism
Author

I think I have a list of manuscripts needing links somewhere, actually! Is what you want the Source ID# and the URL to e.g. folio 1r?

@jacobdgm
Contributor

jacobdgm commented Aug 8, 2023

> I think I have a list of manuscripts needing links somewhere, actually! Is what you want the Source ID# and the URL to e.g. folio 1r?

yes, that should be all we need!

@dchiller
Contributor

dchiller commented Sep 1, 2023

We have our first new example of this!

GB-Ob Can. Lit. 202 has images available on the Digital Bodleian website. This is a case where we have direct links to each page; take, for example, this link to folio 22r.

Thoughts on how to proceed?

As Jacob has noted elsewhere, we have a bare-bones way of doing this in CantusUltimus -- an interface for mapping each image in an IIIF manifest with a specific folio. I don't know/think that we want to limit images linked in CantusDB to IIIF servers only, so we can't steal that version whole-cloth (plus, that version needs some work anyway).

@annamorphism
Author

I'm guessing the previous steps to do this went something like:

  1. Given a link to some source facsimile, create links for each page. For some sources this is easy (if the link is www.library.org/SomeSource/page1, you can just increment the page number); for Digital Bodleian, each page has a long unintelligible modifier, but these can be found in the IIIF manifest. I don't know what approach one would want for something that has weird links and no manifest (sounds annoying).
  2. Given a CSV containing a bunch of pages in some source with their relevant links (from above), for each page, find all the chants that match (source) and (page), and send an edit request to update (link) using the third value in the list. (Is this a thing that is possible?)

(If there is a way to mass-update a bunch of chants satisfying certain conditions, I can imagine it coming in handy in other ways, like "increment all the folios from 139 to 150 by one, because I mislabeled 138w as 139" or "I deleted a chant on the folio, please decrease the sequence numbers by one to reflect this" or something like that.)
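
Step 2 above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `Chant` class is a hypothetical stand-in rather than the CantusDB model, and the CSV column names, source ID, and URLs are invented for the example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Chant:
    # Hypothetical stand-in for a chant record; not the real CantusDB model.
    source_id: int
    folio: str
    image_link: str = ""


def apply_image_links(chants, csv_text):
    """Set image_link on every chant matching a CSV row's (source_id, folio).

    Returns the number of chants whose link was changed.
    """
    links = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["source_id"]), row["folio"].strip())
        links[key] = row["image_link"].strip()

    updated = 0
    for chant in chants:
        link = links.get((chant.source_id, chant.folio))
        if link is not None and chant.image_link != link:
            chant.image_link = link
            updated += 1
    return updated


csv_text = """source_id,folio,image_link
123456,001r,https://example.org/SomeSource/page1
123456,001v,https://example.org/SomeSource/page2
"""

chants = [
    Chant(123456, "001r"),
    Chant(123456, "001r"),  # several chants can share one folio
    Chant(123456, "001v"),
    Chant(123456, "002r"),  # no CSV row for this folio, so it is left alone
]
updated = apply_image_links(chants, csv_text)
print(updated)
```

In a Django project the inner loop would presumably become a queryset filter on source and folio followed by an update, but the model and field names would have to come from the actual codebase.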

@dchiller
Contributor

dchiller commented Sep 1, 2023

This seems right to me.

2 is definitely possible (in the sense that it would be fairly easy to implement in Django, not that it is already implemented in some way -- I don't know if Cantus DB has any sort of "bulk update" process already). I think the question for 1 is whether we want to automate any of that (for example, in the case where the images are served by an IIIF server or in the case where there is some easy thing to increment) or leave that up to whoever is creating the CSV and their own Excel or Python magic to one-off create the CSV.

@jacobdgm jacobdgm removed the on hold label Sep 1, 2023
@jacobdgm jacobdgm self-assigned this Sep 1, 2023
@jacobdgm
Contributor

jacobdgm commented Sep 1, 2023

The way the OldCantus API works (accepting a CSV file where each line has a source ID and a folio, as well as a URL) makes sense. I'll try to get something like this set up within the next week.

@annamorphism
Author

I think having an "if there is an IIIF manifest, do blah" step seems sensible--those manuscripts will eventually be in CU anyway (if they aren't already), so it seems worthwhile to spend the time figuring out where the manifest is, how it is set up, etc., in some way that hopefully can be productive across both platforms.
I don't think the non-IIIF ones are going to be sufficiently standard for it to be worth trying to automate it (vs. just doing some Excel magic on it), but I could be wrong.

@ahankinson
Member

It might be worth the Cantus site hosting its own IIIF viewer and then opening IIIF manifests in it (e.g. https://cantusdatabase.org/viewer.html?manifest=https://example.com/manifest.json).

Most viewers will let you link to canvases directly, so you would control the page links yourself, rather than relying on outside sites.
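
A link of that shape could be built and parsed as in the sketch below. The viewer page is only the illustrative URL from this comment and does not exist, and the `canvas` query parameter is a hypothetical name; a real viewer defines its own way of addressing a canvas.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Illustrative viewer location taken from the example URL; not an existing page.
VIEWER_URL = "https://cantusdatabase.org/viewer.html"


def viewer_link(manifest_url, canvas_id=None):
    """Build a viewer URL, percent-encoding the manifest URL as a query value."""
    params = {"manifest": manifest_url}
    if canvas_id is not None:
        params["canvas"] = canvas_id  # hypothetical parameter name
    return f"{VIEWER_URL}?{urlencode(params)}"


def manifest_from_link(link):
    """Recover the manifest URL from a link produced by viewer_link."""
    return parse_qs(urlsplit(link).query)["manifest"][0]


link = viewer_link(
    "https://example.com/manifest.json",
    canvas_id="https://example.com/canvas/22r",  # hypothetical canvas URI
)
print(link)
print(manifest_from_link(link))
```

Encoding the manifest URL keeps any query string it carries from being confused with the viewer's own parameters.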

@jacobdgm
Contributor

jacobdgm commented Sep 5, 2023

I was thinking of writing a function that would take a CSV file and add/update image links based on the information in the file - the thought was that we would do some amount of manual set-up in building the CSV file depending on how the image links worked, and then it could be loaded into this function. Eventually we can set up a view to call this function, but initially it could be entirely non-user-facing.

I have no experience with IIIF - I'm thinking I'll start working on this function on Thursday when I'm next in the lab. But if I should spend some time researching IIIF to set this up to be more automatic right from the get-go, I can do that - let me know within the next ~40 hours if this would be a better idea than setting up the simple CSV-based function.

@dchiller
Contributor

dchiller commented Sep 5, 2023

> I was thinking of writing a function that would take a CSV file and add/update image links based on the information in the file - the thought was that we would do some amount of manual set-up in building the CSV file depending on how the image links worked, and then it could be loaded into this function. Eventually we can set up a view to call this function, but initially it could be entirely non-user-facing.

+1 to this initial approach.

> I have no experience with IIIF - I'm thinking I'll start working on this function on Thursday when I'm next in the lab. But if I should spend some time researching IIIF to set this up to be more automatic right from the get-go, I can do that - let me know within the next ~40 hours if this would be a better idea than setting up the simple CSV-based function.

Let's maybe talk? We could crib from Cantus Ultimus and have code that basically extracts what's needed from the IIIF manifests (this is exactly what we do in the CU map folios procedure).
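
To give a sense of what "extracting what's needed" might look like, here is a small sketch that is not the Cantus Ultimus code. It assumes a manifest in the IIIF Presentation API 2.x layout (`sequences` containing `canvases`), plain-string canvas labels that happen to match folio names, and made-up example.org URIs; version 3 manifests are structured differently.

```python
def canvases_by_label(manifest):
    """Map each canvas label to its canvas URI (Presentation API 2.x layout)."""
    mapping = {}
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            mapping[canvas["label"]] = canvas["@id"]
    return mapping


# Hypothetical, heavily trimmed manifest used only for illustration.
sample_manifest = {
    "@id": "https://example.org/iiif/manifest.json",
    "sequences": [
        {
            "canvases": [
                {"@id": "https://example.org/iiif/canvas/a1b2", "label": "22r"},
                {"@id": "https://example.org/iiif/canvas/c3d4", "label": "22v"},
            ]
        }
    ],
}

mapping = canvases_by_label(sample_manifest)
for folio, canvas_uri in mapping.items():
    print(folio, canvas_uri)
```

In practice the labels in a library's manifest often do not match the folio names used in the database, which is presumably why a manual mapping step exists.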

Alternatively, we could go with the "host a viewer" approach. Although in this case, maybe we just treat Cantus Ultimus as the "viewer." For example, for the Salzinnes manuscript, the image links on Cantus DB link directly to Cantus Ultimus.

@fujinaga
Member

fujinaga commented Sep 8, 2023

The long-term plan is to use Cantus Ultimus as the viewer whenever there're IIIF images available.

@dchiller
Contributor

dchiller commented Sep 8, 2023

> The long-term plan is to use Cantus Ultimus as the viewer whenever there're IIIF images available.

So if, as is the case now, we have a manuscript for which we just found publicly-available images (MS. Canon. Liturg. 202) and those images are available from an IIIF server, should we just put that manuscript on Cantus Ultimus and link from Cantus DB to Cantus Ultimus? (I guess, in other words, by long-term do you mean it shouldn't be done now?)

@fujinaga
Member

fujinaga commented Sep 9, 2023

You should be putting in as many IIIF manuscripts as possible on CU. If you are planning to link them to Cantus DB, we just need to be careful about the URLs.
Note that in CU: "Visit record in Cantus Database" points to the old Cantus DB. (Separate issue?)

@dchiller
Contributor

dchiller commented Sep 9, 2023

> You should be putting in as many IIIF manuscripts as possible on CU. If you are planning to link them to Cantus DB, we just need to be careful about the URLs.

Ok, perfect. The most recent version of Cantus Ultimus now has stable IDs for manuscripts, so we should have fewer (no?) problems with links going forward.

> Note that in CU: "Visit record in Cantus Database" points to the old Cantus DB. (Separate issue?)

Yes...we have a pull request ready to go for that but need to wait for a fix that has been made in CantusDB to go to production or there will be a broken link. So that should be fixed very soon!

@jacobdgm
Contributor

jacobdgm commented Sep 12, 2023

> Yes...we have a pull request ready to go for that but need to wait for a fix that has been made in CantusDB to go to production or there will be a broken link. So that should be fixed very soon!

Which CantusDB fix is this that needs to go to production? I feel like I'm missing something - nothing in #1027 seems relevant to CU...?

@dchiller
Contributor

Hmm...I thought I was waiting on #969, but it seems like that has already been put on production. I re-ran tests a few days ago and it failed so I assumed the issue was still that the redirects were not on production. But clearly it's something else! I'll investigate.

@dchiller
Contributor

@jacobdgm On more investigation, it looks like the failing test has to do with the json_node_export API in CantusDB. I've opened a new issue, #1031.

@jacobdgm
Contributor

Just to clarify: the failing test is happening on CU and not CD, correct?

(there are several intermittently failing tests in the CD test suite - see, for example, #778; there are several others - that I've been meaning to fix for the past year, and that developers currently need to just learn to ignore)

@dchiller
Contributor

> the failing test is happening on CU and not CD, correct

Yes...to recap as I understand it: the failing test is on CU, it was failing because of the change to the json_node_export API from OldCantus to NewCantus, the way that NewCantus does it now is good, so I've made changes to CU so that it works with the NewCantus API, and now the tests on CU pass. (and we can "close" this tangent on this issue -- sry!)

@jacobdgm jacobdgm changed the title from "attaching image links to records" to "We need a workflow for attaching image links to records" Sep 22, 2023
@jacobdgm
Contributor

jacobdgm commented Jan 5, 2024

relevant - in an email from Debra:

> I'm looking at the Muenster Antiphoner
> https://cantusdatabase.org/chants/?source=123724&folio=330r&feast=&search_text=&genre=
>
> And there is a printer error on the folio numbers = 319 goes to 330 but there is nothing missing.
>
> Our image links from CantusDB, however, jump to 340 at this spot (for 330). I think they are all matching up before that point in the source.
> Our folio 330r should go to this one: https://daten.digitale-sammlungen.de/0009/bsb00090351/images/index.html?fip=193.174.98.30&id=00090351&seite=645
>
> Our 340 still goes to the image for 350 but it gets sorted out sometime after that. Do you need me to find where it's correct again? It seems fine by folio 350 and I spot-checked to the end.
>
> Let me know if you have any questions, or if you want a more detailed list of the correct and incorrect image links.
> (And, should I just fix these by manually copy-and-pasting or is it better for you to fix them?)

@annamorphism
Author

annamorphism commented Jan 5, 2024

just inspecting a little further...it's possible the Muenster Antiphoner problem was half-corrected at some point but not quite done right.
As Debra says, folio 330r should point to image 645, but in fact points to 665. This stays that way until folio 340r, which also points to 665 (correctly). So it's just [the chants on] folios 330-339 that need to be remapped to images 645-664.
for this particular set of images and URLs this shouldn't be hard to do automatically and would save enough work to be worthwhile.
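
As a rough sketch of that automatic fix, the corrected links for folios 330-339 can be computed from the URL pattern in Debra's email. This assumes each folio has exactly two consecutive images, recto then verso, so that 330r is image 645 and 339v is image 664; the function name is made up for the example.

```python
# URL pattern copied from the link in Debra's email, with the image number parameterized.
URL_TEMPLATE = (
    "https://daten.digitale-sammlungen.de/0009/bsb00090351/images/index.html"
    "?fip=193.174.98.30&id=00090351&seite={image}"
)


def remap_folios(first_folio, last_folio, first_image):
    """Assign consecutive image numbers to folio sides, recto before verso."""
    mapping = {}
    image = first_image
    for number in range(first_folio, last_folio + 1):
        for side in ("r", "v"):
            mapping[f"{number}{side}"] = URL_TEMPLATE.format(image=image)
            image += 1
    return mapping


corrected = remap_folios(330, 339, 645)
for folio, url in corrected.items():
    print(folio, url)
```

The resulting folio-to-URL pairs could then be written out as CSV rows for source 123724 and applied to the affected chants.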
