Author: Daniel van Strien https://orcid.org/0000-0003-1684-6556
Originally delivered as part of the Digital Humanities and Digital Archives workshop at the National Library of Estonia. This notebook is intended to work as a standalone resource, but it gives an overview of a range of topics rather than covering each one in depth.
This workshop/notebook aims to cover a few main things:
- 📒 show how Jupyter notebooks can be particularly useful for working with digitised collections at scale
- 👀 give a brief sense of what is possible using computer vision with image collections
- 🤖 give some ideas for how existing GLAM infrastructure (in this case IIIF) can support new machine learning-based approaches
The workshop uses maps as an example, but most of it is not specific to maps.
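As a rough illustration of why IIIF is convenient for machine learning work, the sketch below builds an IIIF Image API request URL for a small version of an image, so that a model can be fed downsized images without downloading full-resolution files. The server address `https://example.org/iiif` and the identifier `collection/page-001` are hypothetical placeholders, and `iiif_image_url` is a helper written for this example, not part of any library.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is a single path segment, so any slashes must be escaped.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
# "!250,250" asks for the image scaled to fit inside a 250x250 box.
thumb_url = iiif_image_url("https://example.org/iiif",
                           "collection/page-001",
                           size="!250,250")
print(thumb_url)
```

Requesting images at the size a model actually needs keeps downloads small, which matters when working through thousands of images.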
The notebooks make use of the delightful Newspaper Navigator dataset from the Library of Congress:
This notebook uses a sample of images drawn from the Newspaper Navigator dataset, specifically images predicted as 'maps'. It turns out that machine learning isn't perfect! In this notebook we look at some potential approaches to:
- 'cleaning' up this dataset
- using ipywidgets as a middle ground between developing 'full' GUIs and 'code only' sharing of methods
- training a computer vision model
- a couple of potential approaches for working with images at scale
- brief discussion of working with probable labels
The model trained in the notebook is trained on a dataset of images from Newspaper Navigator predicted as maps, with human-corrected labels.
An appendix notebook generates an example JSON dataset of predicted labels for 10,000 images from the Newspaper Navigator dataset that were predicted as maps.
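One simple way to work with predicted (probable) labels is to separate confident predictions from those that may need human review. The sketch below assumes a hypothetical JSON layout with `image`, `label` and `score` fields; the actual structure of the appendix dataset may differ, and the 0.9 threshold is an arbitrary choice for illustration.

```python
import json

# Hypothetical sample of predicted labels; field names are assumptions.
sample_json = """
[
  {"image": "map_001.jpg", "label": "map", "score": 0.97},
  {"image": "map_002.jpg", "label": "map", "score": 0.55},
  {"image": "map_003.jpg", "label": "not_map", "score": 0.91}
]
"""


def split_by_confidence(predictions, threshold=0.9):
    """Split predictions into (confident, uncertain) lists by score."""
    confident = [p for p in predictions if p["score"] >= threshold]
    uncertain = [p for p in predictions if p["score"] < threshold]
    return confident, uncertain


predictions = json.loads(sample_json)
confident, uncertain = split_by_confidence(predictions, threshold=0.9)

print("confident:", [p["image"] for p in confident])
print("needs review:", [p["image"] for p in uncertain])
```

Raising the threshold sends more images to human review; lowering it accepts more of the model's predictions as they are.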
Credit: This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.