Add user tags for tasks to enable shared image albums #631

Closed
alexandermendes opened this Issue Feb 6, 2018 · 22 comments

alexandermendes (Member) commented Feb 6, 2018

alexandermendes (Member) commented Mar 1, 2018

It would also be cool to then generate an IIIF manifest for these user-generated albums so that they can be shared via any IIIF viewer.

alexandermendes (Member) commented Mar 16, 2018

The like button will probably be removed when these customisable tags are added. We now have a big enough sample to check whether there is any value in keeping it.

If anyone has used the like button up until now, perhaps we should transform those likes into a "liked" tag.

mialondon (Collaborator) commented Apr 6, 2018

Can you see how many times it's been used via the backend?

alexandermendes (Member) commented Apr 6, 2018

Yep. It hasn't.

Well, apart from by me and this person 😉

Addaci commented Apr 6, 2018

MarineLives & Signs of Literacy (our new community to study historical literacy) are doing some serious thinking about creating IIIF manifests to display markes, initials and signatures contained in manuscript pages. The functionality we are interested in is the ability to display text areas (or image areas) within a manuscript image and/or its matching full-text transcription which contain relevant markes, initials and signatures, as well as to display the whole image page or the whole matching full-text transcription page. We want to be able to create IIIF manifests which will pull up relevant markes, initials and signatures from multiple institutions with IIIF servers and content, for example the British Library and the Stadsarchief Amsterdam.

We are thinking how to semantically annotate or tag the images or text pages, and the specific image or text regions within the whole pages, so that manifests can be created.

The sort of tags or annotations we are thinking of are simple, e.g.:

  • Occupation [wine cooper; mariner; shipwright]
  • Type of signoff [marke; initial(s); signature]
  • Place of residence [e.g. Wapping; Cadiz]
  • Date of signoff [e.g. March 13th, 1629]

A specific signoff (marke, initial or signature) could have multiple tags, e.g. text reading "Jo Bloggs, mariner, living in Wapping, aged 23" in a deposition dated March 13th 1629 could be tagged:

mariner; Wapping; age 23; 1629

Our application envisages dealing with tens of thousands of legal records from the English High Court of Admiralty (TNA) and the Amsterdam notarial archives (Stadsarchief Amsterdam). Probably a minimum of 50,000 images, but could be many more.

We want to be able to crowdsource the tagging of the signoffs, having used Transkribus, with its line and text area recognition capability (or some other technology), to create an XML pixel-level map of where the signoff is on the page of the manuscript.

[image]

[image]

The tagging data, in the case of a High Court of Admiralty deposition, will actually be derived from the start of the deposition, which could be up to three or five pages before the signoff at the end of the deposition, though it is usually on the same page.

[image]

Prior to crowdsourcing this tagging, we would want to take all the admiralty court and notarial images and upload them to one or more IIIF servers.

That's how far we have got. I have had an initial discussion with Digirati about this, but have parked the discussion for the moment, lacking funding. I will be giving a paper at the IIIF Washington DC conference, May 21-25th, at which I will be laying out a vision for greater integration of IIIF and Transkribus ecosystems, and making a pitch for the development of the above functionality I have described.

I would be very interested to hear your thoughts on this, and to see what sort of design solutions you come up with. I also have some interest from the Technical Director of Pelagios/Recogito, Rainer Simon at the Austrian Institute of Technology, for this sort of functionality.

See our Signsofliteracy GitHub wiki
See our agenda and issue analysis for planned June 5th, 2018 Stadsarchief workshop on Technology Tools to explore Historical Literacy
See proposed invitees to our workshop

Addaci commented Apr 7, 2018

A second related idea we are looking at is user-created shareable tag-driven IIIF manifests. User-created as opposed to project- or archive-created, though these are also important. We would probably want an authorship label in the manifest metadata to show the author, e.g.:

  • BL; Stadsarchief Amsterdam; Jo Bloggs

Users LOVE personal and themed boards. Witness Pinterest. But our concept is to make the boards or manifests independent of the viewing platform. So you could view them through any IIIF-compatible viewer, such as Universal Viewer or Mirador, and not be forced to cut and paste your favourite playbills from LibCrowds. Your tags, and ideally your and other people's semantic annotations, would be available through your chosen IIIF viewer for interrogation.

How would it work? Here is an example: a user could select the tags:

  • lighterman + thames + wapping + signoff

This would generate a manifest containing all IIIF served images containing those tags.
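The tag selection above is essentially a set intersection. A minimal sketch of that query, assuming a hypothetical in-memory tag-to-image index (LibCrowds' real data model may differ):

```python
# Sketch: select images whose tag sets contain ALL of the chosen tags.
# The tag_index structure and image IDs are illustrative assumptions.
def images_for_tags(tag_index, selected_tags):
    """tag_index maps tag -> set of image IRIs; return IRIs carrying every tag."""
    sets = [tag_index.get(t, set()) for t in selected_tags]
    if not sets:
        return set()
    return set.intersection(*sets)

tag_index = {
    "lighterman": {"img1", "img2", "img3"},
    "thames": {"img2", "img3"},
    "wapping": {"img3", "img4"},
    "signoff": {"img3"},
}
result = images_for_tags(tag_index, ["lighterman", "thames", "wapping", "signoff"])
print(result)  # {'img3'}
```

The resulting image set would then be the input to manifest generation.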

We would have a simple pre-specified ontology of tags. In this case, lighterman is in the class of the tag occupation; thames is in the class of the tag place of work; wapping is in the class of the tag place of residence; and signoff is a class tag.

The class tag signoff would contain four primary component or base tags. They would be

  • marke; initial; initials; signature

The ontology of the place of residence class tag is more complex, since in our case it needs to allow for

  • streets; sub-parish; parish; town/city; county; region; country; continent [ignoring for the moment the need to handle different geographic nesting structures in different countries]

Because Signs of Literacy intends to be a comparative multi-country user driven community with multiple archival contributors of images and a mixture of general public and academic users, we need to think carefully about the conceptual design of the above.

We also need to think through the relationship between tags and annotations. As we engage with the IIIF consortium, with the help of IIIF technical coordinator Glen Robson and of the Recogito annotation platform team, we hope to develop this thinking.

We would love to involve LibCrowds in this discussion and to learn from your experience and experimentation with tag driven IIIF manifests.

I would be delighted to include tag driven IIIF manifests as a specific discussion point on the agenda of our technical tools for exploring historical literacy workshop on June 5th 2018 in Amsterdam.

Addaci commented Apr 9, 2018

@alexandermendes @mialondon @christianalgar We at MarineLives/Signs of Literacy would be interested in a conversation about BL's/our own proposed functionality for user tags to enable shared IIIF image albums, both in the short term and in the medium/longer term.

Core to our Signs of Literacy long-term plans to apply machine learning in the study of historical literacy is the creation of large-scale training data sets and control data sets. Both data sets will require large-scale markup of images: both regions on the images containing markes, initials and signatures, and regions on the images containing metadata for person name, occupation, place of residence and date of deposition.

Core to our plans is also the use of crowdsourcing to tag and/or annotate the training and control data sets. Hence our strong interest in LibCrowds as an IIIF-compliant crowdsourcing platform with a strong development team and multiple use cases, from catalogues to playbills.

Proposition

Short term

  • MarineLives/Signs of Literacy piggyback onto your implementation of user tags to enable shared image albums (IIIF manifests). We can't offer short-term funding, but we would be happy to contribute to design ideas, possibly with some technical input from the Recogito technical team, and to user-test your functionality using your Play Bills use case or any other use case you want to test.

  • Using our own resource, we would create a very small demo of the functionality you develop (using our own use case of markes, initials and signatures). We would use in the order of twenty images from High Court of Admiralty data and twenty images from Alle Amsterdamser Akten data, which we would mark up and tag using LibCrowds. Easiest would be on your server, though potentially we could implement our own server, but we would have to look at what would be required on the backend. (It is possible that the Recogito technical team would donate some technical time to do this for us; Digirati are also a possibility, though I would have to pay them.)

  • We could demo something at the IIIF Washington conference, in late May 2018, with full acknowledgement to the BL, to supplement our broad vision of IIIF enablement and the use of IIIF-compliant crowdsourcing to explore historical literacy. Alternatively, we could demo at our June 5th, 2018 workshop in Amsterdam, to which I have invited @alexandermendes in person or by Skype, and would be more than delighted if Alex would present on LibCrowds and show the tag demo in that context. If the functionality development took longer, we have a final demo opportunity at the Digital Humanities Congress 2018, in Sheffield, in early September, at which I am presenting a co-authored paper with Dr Mark Hailwood (Bristol), again about IIIF enablement. Glen Robson, IIIF technical coordinator, is showing a keen interest in our initiative, and may be Skyping into our Amsterdam workshop.

Medium term/longer term

  • We could partner to build LibCrowds into the Signs of Literacy initiative as its core crowdsourcing platform. This would encourage exposure of the LibCrowds platform to multiple GLAM institutions, researchers and the general public.

  • Chronoscopic Education/Signs of Literacy would explicitly raise money to support British Library development costs, infrastructure costs, overhead and other costs related to any such partnership.

Brief context

  • Colin Greenstreet is giving a paper at IIIF conference in Washington DC, May 21-25. Title: Creating an IIIF/Transkribus enabled manuscript community to explore C17th literacy. One key topic from our abstract is "We are exploring the potential of IIIF standards, viewers and annotation tools to support our vision. We are developing an IIIF demo of manifests for markes, initials and signatures from multiple IIIF image servers (e.g. by occupation, type, year range), which we will use with historians and computer scientists as the starting point for a robust, flexible spec."

  • It is our intention to show a mock-up at the conference of the functionality we envisage for IIIF manifests of markes, initials and signatures. We do not have the funding to create a real demo for the conference, so at the moment our intention is to show a dummy, static demo using "storyboards".

  • We (the Signs of Literacy community initiative for research into historical literacy, sponsored by Chronoscopic Education, MarineLives and Dr Mark Hailwood (Bristol), and intending to add further community partners) are spending April-December 2018 developing our vision for technology enablement for research into historical literacy. At the same time we are seeking to generate technical, GLAM and research interest (historians, linguists), and potential funder interest in Signs of Literacy. We plan multiple grant bids in 1H 2019, rather than one huge grant bid. This will enable individual GLAM institutions in the community to pair up with researchers to pursue topics of particular interest to them, under an overall umbrella and community governance.

  • Driving our initiative is a vision of using pattern recognition and machine learning at scale (absolute minimum of 50,000 marke, initials and signature images, but it could be in the hundreds of thousands). We are working with the Alan Turing Institute to develop our proposal further, and are likely to bring in additional machine learning experts from Bristol and the Netherlands.

  • Our scope is comparative, with the initial focus on data from England and the Netherlands: the English data being English High Court of Admiralty depositions (HCA 13), sourced from the TNA/MarineLives, and the Dutch data sourced from the Alle Amsterdamser Akten (the Amsterdam notarial archives, a UNESCO-rated resource which is being mass-digitised with Mondrian Stichting funding), but with the approach and governance designed to allow the addition of countries and GLAM institutions.

  • We are running a half-day workshop on Tuesday, June 5th, 2018, hosted by Mark Ponte of the Stadsarchief Amsterdam, who is a member of the Alle Amsterdamser Akten core project team. This workshop, which is by invitation and has a maximum capacity of 24, is attracting considerable interest from technical and content researchers; invitees include, for example, Dr Rainer Simon, technical director of the Recogito/Pelagios community initiative, Dr Marieke van Erp, the new head of the Digital Humanities Lab at the KNAW Humanities Cluster (Huygens ING, IISH and Meertens Institute), and Dr Jelle van Lottum, a senior maritime & economic historian at the Meertens Institute.

  • The workshop is designed to examine technology tools to support historical literacy research, and will examine the potential of IIIF, Transkribus and Recogito to enable and contribute to this, and will drill down into the potential for pattern recognition and machine learning in historical literacy research.


alexandermendes (Member) commented Apr 12, 2018

Hi @Addaci, I'm waiting to have a conversation about this internally when people are back in the office, so apologies for the slow reply, I will get back to this!

alexandermendes (Member) commented Apr 12, 2018

It sounds like there could be some interesting opportunities for collaboration but ultimately it's going to come down to a matter of resources. We may not be able to commit that much time to developing this feature - in terms of development work it's only me working on this, as you've probably noticed!

However, I do hope to build this in a generic enough way that it could be useful for other projects. As mentioned before, the solution here probably involves generating IIIF Annotation lists for each tag, or set of tags. A reference to these lists could then be included in the original manifests.

Actually, we're already part of the way to this being implemented, in some form. We already have a way of creating one specific type of tag per project (e.g. see our 'Mark the titles' projects). These tags are serialised as Web Annotations but the plan is to also make them available as IIIF Annotation lists. It might be interesting to look at the LibCrowds data model.
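The "tags as IIIF Annotation lists" idea could be sketched roughly as follows. This is a hypothetical illustration only: the list URI, the shape of the input annotations, and the helper name are all assumptions, in IIIF Presentation 2.x style, not the actual LibCrowds implementation.

```python
# Sketch: wrap the tagging Web Annotations for one tag into an IIIF
# Presentation 2.x AnnotationList, which a manifest could then reference
# via otherContent. All names and URIs here are illustrative.
def annotation_list(list_uri, tag, annotations):
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_uri,
        "@type": "sc:AnnotationList",
        "label": "Items tagged '%s'" % tag,
        "resources": annotations,
    }

tag_annos = [{"@type": "oa:Annotation", "motivation": "tagging"}]
alist = annotation_list("https://example.org/annos/title.json", "title", tag_annos)
print(alist["label"])  # Items tagged 'title'
```

A reference to each such list could then be added to the original manifests, as described above.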

If you were to use your own server, it should hopefully be pretty easy to get an instance of LibCrowds up and running. But I'll also check with others about the possibility of setting up a collection microsite on ours.

Addaci commented Apr 12, 2018

No problem. Feel free to call me to chat about it, or I'm happy to come into the British Library. I put the idea forward to the LibCrowds team for discussion, and even if there is no internal interest, I am still very happy to contribute further to the development of tagging/annotation capability tied to user creation of IIIF manifests. As further context, I am beginning to work informally with the Pelagios Commons team, including Rainer Simon, their technical director, who is driving the development of Recogito. This is not a formal partnership, but we will see where it goes. Pelagios/Recogito are interested in getting closer to GLAMs and to IIIF. I have introduced Rainer Simon to Glen Robson, technical coordinator of IIIF, and they are now in discussion. I will also be giving a paper at the IIIF Washington DC conference on 'Creating an IIIF/Transkribus/Recogito enabled manuscript community to explore C17th literacy'. We are having a @Signsofliteracy discussion on overlapping ecosystems, including IIIF and Glen Robson's response (issue 5), and on user tag driven IIIF manifests (issue 8).

Addaci commented Apr 12, 2018

@alexandermendes @mialondon Many thanks for this further response. Fully understand the resource issues. The collaboration idea(s) is a medium- to long-term idea, kicking off in 2019, and would include funding from Chronoscopic Education (which is applying end May 2018 for charitable incorporated status). In the short term, we have tech constraints. Potentially, I could ask Rainer Simon, lead developer at Pelagios Commons, if he could help us install an IIIF server as a favour; presumably we need an annotation server as well. Or I could pay Digirati to do this. I could even try Klokan Technologies, through their IIIF server service, to see if they would help, presumably again paid. Ideal, though, since I only want to do a tiny demo (max 40 images), would be to be able to use a collection microsite: I would put up (say) twenty English High Court of Admiralty images and twenty Alle Amsterdamser Akten (Amsterdam notarial archives) images, and show the functionality in principle. I would ideally demo this at the IIIF conference in Washington, May 21-25, 2018, with full credit to LibCrowds. If that timing is too quick, I would demo it for the first time at our June 5th workshop at the Stadsarchief Amsterdam. I am at an early stage of getting the Huygens ING institute and the Digital Humanities Lab of the KNAW Humanities Cluster interested in collaborating in 2019 on one or more projects to be structured, and grant money to be raised, under the Signs of Literacy umbrella, and I am sure they would also be interested in this tag-driven IIIF manifest functionality.

alexandermendes (Member) commented May 11, 2018

Minor update - have been thinking about this for the past few days and I think we're going to settle on a relatively simple solution for our tagging system. Too much going on at the moment to go into this in any huge depth but we should be able to come up with the start of something interesting.

Our user tags will be sent to a Web Annotations server with something like the structure below:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "motivation": "tagging",
  "type": "Annotation",
  "body": {
    "type": "TextualBody",
    "value": "foo",
    "format": "text/plain"
  },
  "target": [
    {
      "source": "https://api.bl.uk/image/iiif/ark:/81055/vdc_100022589157.0x000003/full/max/0/default.jpg",
      "type": "Image",
      "scope": "https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589158.0x000002/manifest.json",
      "renderedVia": {
        "id": "https://api.bl.uk/image/iiif",
        "type": "Software",
        "schema:softwareVersion": "2.0"
      }
    },
    {
      "source": "https://api.bl.uk/image/iiif/ark:/81055/vdc_100022589157.0x000004/full/max/0/default.jpg",
      "type": "Image",
      "scope": "https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589158.0x000002/manifest.json",
      "renderedVia": {
        "id": "https://api.bl.uk/image/iiif",
        "type": "Software",
        "schema:softwareVersion": "2.0"
      }
    }
  ]
}

Each target links to the image itself, as we also want to use this for images not served from an IIIF Image API service (we also have projects using Flickr images). However, to help identify those that do have an associated image server, we add some details of that server in the renderedVia section (I still need to choose a suitable tag that already exists as part of the context to identify the compliance level).

We then take the IRI for one of these annotations and use it to iterate over the targets and generate a basic manifest, probably using iiif-prezi. The manifest for each item is in there as the scope, so we can later decide how we might want to go back and handle the metadata of the original items.
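As a rough illustration of that step, here is a minimal sketch that builds the Presentation 2.x JSON by hand rather than via iiif-prezi. All URIs, labels and the function name are placeholder assumptions, and real code would fetch each image's info.json for canvas dimensions.

```python
# Sketch only: derive a basic IIIF manifest from a tagging annotation's
# targets, one canvas per target image. URIs and labels are hypothetical.
def manifest_from_annotation(manifest_uri, label, annotation):
    canvases = []
    for i, target in enumerate(annotation["target"]):
        canvas_uri = "%s/canvas/%d" % (manifest_uri, i)
        canvases.append({
            "@id": canvas_uri,
            "@type": "sc:Canvas",
            "label": "Image %d" % (i + 1),
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_uri,
                "resource": {"@id": target["source"], "@type": "dctypes:Image"},
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": manifest_uri,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }

anno = {"target": [{"source": "u1"}, {"source": "u2"}]}
m = manifest_from_annotation("https://example.org/m1", "Tagged: foo", anno)
```

Because each target also carries the original manifest as its scope, a later pass could pull item-level metadata from there into each canvas.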

Addaci commented May 11, 2018

alexandermendes referenced this issue May 18, 2018: Dev #767 (merged)

alexandermendes added this to the v1.0.0-beta.10 milestone May 23, 2018

mialondon (Collaborator) commented Jun 21, 2018

@alexandermendes a quick reminder to fix the markdown issue for the tagging link text!

mialondon (Collaborator) commented Jun 21, 2018

@alexandermendes how do I add multi-word tags? It seems to submit as soon as I type a space.

alexandermendes (Member) commented Jun 21, 2018

Use a hyphen; there is an explanation of this in the modal!

alexandermendes (Member) commented Jun 21, 2018

There is a reason for this. Using spaces in PostgreSQL full-text searches complicates things, as the queries operate on a per-token basis. It is possible to run phrase searches, but that would probably require the creation of a separate API endpoint. I could go into this further, but basically this is the simplest way to handle multi-word tags without creating a bunch of extra work!
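The per-token behaviour can be illustrated with a toy Python simulation (not PostgreSQL itself, and the tokenisation rules are deliberately simplified): a space-separated phrase matches any record containing both words anywhere, while a hyphenated tag stays a single token and matches only an exact occurrence.

```python
# Toy illustration of per-token matching. Real PostgreSQL full-text search
# (to_tsvector/to_tsquery) also applies stemming and other rules omitted here.
def tokenise(text):
    return text.lower().split()

def matches(query, record):
    """True if every query token appears somewhere among the record's tokens."""
    record_tokens = set(tokenise(record))
    return all(tok in record_tokens for tok in tokenise(query))

print(matches("drury lane", "lane near drury house"))     # True: both tokens present
print(matches("drury-lane", "lane near drury house"))     # False: single token absent
print(matches("drury-lane", "playbill from drury-lane"))  # True: exact token match
```

This is why hyphenated multi-word tags behave predictably without a separate phrase-search endpoint.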

mialondon (Collaborator) commented Jun 21, 2018

Ah! We'll need to tweak that text so the most important information is first. I stopped reading somewhere around 'programmatic research purposes'.

alexandermendes added a commit that referenced this issue Jun 21, 2018

mialondon (Collaborator) commented Jun 27, 2018

@alexandermendes is it forcing tags to lower-case? That's inappropriate when you're tagging place names or personal names (actors, cities, etc.).

alexandermendes (Member) commented Jun 27, 2018

There isn't actually a totally straightforward solution here. Do we store "english", "English" and "ENGLISH" as separate tags?

When the user goes to the browse page and types in "eng", they will see all of these options in the dropdown box; which do they select? And are they then shown items tagged as "english" but not as "English"? Or do we normalise them in some way, and if not into lowercase, how do we normalise?

mialondon (Collaborator) commented Jun 28, 2018

@alexandermendes title case, as elsewhere?

alexandermendes (Member) commented Jun 28, 2018

OK, will change all tags to title case.
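The title-case normalisation agreed on above could look something like this minimal sketch; the helper name is hypothetical, and it simply uses Python's str.title(), so variants of the same word collapse to one stored tag.

```python
# Sketch: normalise tags to title case before storing, so "english",
# "English" and "ENGLISH" all collapse to a single stored tag and the
# browse dropdown shows one entry per tag.
def normalise_tag(raw):
    return raw.strip().title()

tags = {normalise_tag(t) for t in ["english", "English", "ENGLISH"]}
print(tags)  # {'English'}
print(normalise_tag("drury-lane"))  # Drury-Lane
```

Note that str.title() also capitalises after hyphens, which suits the hyphenated multi-word tags discussed earlier.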
