
Jupyter Notebooks using the British Library’s Digital Collections & Data

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Below is a list of Jupyter Notebook projects that use the BL's digital collections and data:

  • Applied Data Analysis as taught at DHOxSS 2019 (Oxford DH Summer School 2019): covers tidying data, visualization, modeling, and advanced applications of data analysis. By Giovanni Colavizza and Matteo Romanello (run them on Binder);

  • LibCrowds Jupyter Notebooks - explore the data created via the British Library's LibCrowds platform, which includes crowdsourced transcriptions of historic playbills and catalogue data from card catalogues (run them on Binder);

  • topic-modeling-billing Jupyter Notebook - topic models are a type of statistical language model used to discover hidden structure in a collection of texts. This example is based on a dataset of 264 volumes of digitised theatrical playbills published between 1660 and 1902 (mostly in the 19th century) in England, Scotland, Wales and Ireland. The volumes were digitised from the British Library's physical collection of over 500 volumes of playbills, and the dataset consists of text files produced by Optical Character Recognition (OCR). More information about the dataset at (run it on Binder);

  • Using the openVirus EThOS API - the theses indexed by the British Library's EThOS service can be queried through an Apache Solr API. This notebook finds all the theses that mention 'coronavirus' or 'coronaviruses' and counts their hits. Read more in "Searching eTheses for the openVirus project" by Andrew N. Jackson, Technical Lead for the UK Web Archive, and on Peter Murray-Rust’s GitHub pages about the openVirus Project (open in Colab);

  • GLAM Workbench: Web Archives: a collection of Jupyter notebooks that document web archive data sources and standards, and walk through methods of harvesting, analysing and visualising that data. To see which notebooks work with the UKWA (UK Web Archive), consult the Web Archives entry on the GLAM Workbench site. By Tim Sherratt, "historian and hacker", Associate Professor of Digital Heritage at the University of Canberra, Australia;

  • British Library FREYA Persistent Identifier case study: a simple notebook demonstrating how to fetch information about British Library DOIs from the DataCite GraphQL API and summarise it using Python. This work was done as part of the EU-funded FREYA project (run it on Binder).
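The sketch below illustrates what a topic model such as the one in the topic-modeling-billing notebook does. It fits Latent Dirichlet Allocation (LDA) with collapsed Gibbs sampling in plain Python. It is not the notebook's own code, which may well use a library instead. The function name `lda_gibbs`, its default hyperparameters and the toy token lists are hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic proportions and,
    for each topic, a dict mapping every vocabulary word to its probability.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]                # topic counts per document
    nkw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                 # total tokens per topic
    z = []                                              # topic assigned to each token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # ...then resample its topic from the collapsed full conditional.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab} for t in topics
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Toy, made-up token lists standing in for pre-processed OCR text.
    docs = [
        ["comedy", "farce", "laughter", "comedy"],
        ["farce", "comedy", "laughter"],
        ["opera", "orchestra", "soprano", "opera"],
        ["orchestra", "soprano", "opera"],
    ]
    _, topic_word = lda_gibbs(docs, n_topics=2)
    for t, dist in enumerate(topic_word):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {t}: {top}")
```

On real OCR text you would first tokenise the text, lowercase it and drop stopwords. A dedicated library is usually preferable for a corpus of this size.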
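The openVirus EThOS item describes querying a Solr index of theses. Here is a minimal sketch of building such a query with standard Solr parameters (`q`, `rows`, `wt`) and reading the standard `numFound` count from the JSON response. `SOLR_SELECT_URL` is a hypothetical placeholder, not the real endpoint. The sketch counts matching theses, which may differ from how the notebook counts "hits".

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: use the openVirus EThOS Solr endpoint from the notebook.
SOLR_SELECT_URL = "https://example.org/solr/ethos/select"


def build_solr_query(terms, rows=0):
    """Build a Solr /select URL matching theses that contain any of the terms.

    rows=0 asks Solr for the hit count only, without returning documents.
    """
    q = " OR ".join(f'"{term}"' for term in terms)
    return f"{SOLR_SELECT_URL}?{urlencode({'q': q, 'rows': rows, 'wt': 'json'})}"


def count_hits(response_text):
    """Read the total number of matching documents from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


def fetch_hit_count(terms):
    """Network call; only works once SOLR_SELECT_URL points at a real endpoint."""
    with urlopen(build_solr_query(terms)) as resp:
        return count_hits(resp.read().decode("utf-8"))
```

For example, `build_solr_query(["coronavirus", "coronaviruses"])` produces a URL whose `q` parameter is `"coronavirus" OR "coronaviruses"`.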
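For the FREYA case study, this is a rough sketch of sending a GraphQL request to DataCite and summarising the reply using only the standard library. The query's field names (`works`, `totalCount`, `nodes`, `publicationYear`, `titles`) are assumptions about the DataCite schema and are not taken from the notebook. Check them against the DataCite GraphQL documentation before use. The search string is also illustrative.

```python
import json
from collections import Counter
from urllib.request import Request, urlopen

DATACITE_GRAPHQL_URL = "https://api.datacite.org/graphql"

# Assumed query shape: check the field names against the DataCite GraphQL schema.
WORKS_QUERY = """
query ($q: String!, $first: Int!) {
  works(query: $q, first: $first) {
    totalCount
    nodes { doi publicationYear titles { title } }
  }
}
"""


def build_payload(search, first=100):
    """Serialise a GraphQL request body (query plus variables) as JSON."""
    return json.dumps({"query": WORKS_QUERY, "variables": {"q": search, "first": first}})


def summarise(response_text):
    """Return (totalCount, {year: works returned for that year}).

    totalCount covers all matches; the per-year counts cover only the
    nodes returned in this page of results.
    """
    works = json.loads(response_text)["data"]["works"]
    years = Counter(
        node["publicationYear"]
        for node in works["nodes"]
        if node.get("publicationYear") is not None
    )
    return works["totalCount"], dict(sorted(years.items()))


def fetch(search, first=100):
    """POST the query to DataCite (network access required)."""
    req = Request(
        DATACITE_GRAPHQL_URL,
        data=build_payload(search, first).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return summarise(resp.read().decode("utf-8"))
```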

Other projects using BL collections data and sources

  • r-for-newspapers-data - The repository for the in-progress open book, 'R for Newspaper Data'. By Yann Ryan, former Newspaper Curator for the British Library.

In this Repository (Work In Progress)

  • 19th Century Books - a series of Notebooks that explore the collection of 65,000 digitised books largely from the 19th Century:

    • base.ipynb: examples of different types of access and ways to load data from the British Library’s digital collections; includes some basic processing and visualisation (see it on JN Viewer);

    • load_JSON_files.ipynb: an example of how to load JSON files from a directory and its sub-directories into a single DataFrame, removing empty blocks of text (see it on JN Viewer; run it on Binder or on Google Colab. Note: on Colab you will need to download the 295MB zip file used as a sample for this process; it is already included on Binder);

    • maps_from_Flickr_BLPhotos_19cBooks_Collection.ipynb: explores a subset of 53,367 images from the British Library's Flickr collection of 19th Century Books images that were identified as maps (see it on JN Viewer; run it on Binder or on Google Colab. Note: on Colab you will need to download the 20.8MB zip source file; it is already included on Binder).

  • LOD_SPARQL / British National Bibliography:

    • Books by Subject: retrieve records of books (with an ISBN) indexed under a given LCSH subject/topic (see it on JN Viewer);

    • Resources for Two Subjects: compare, year by year, the number of resources published under two given LCSH subjects/topics (see it on JN Viewer);

    • Linked Data / Interactive Map: queries the BNB SPARQL endpoint and enriches the results with Linked Open Data from GeoNames and Wikidata to display the places of publication on an interactive map. Forked/copied from the original by Gustavo Candela, Research and Development department at the Biblioteca Virtual Miguel de Cervantes, University of Alicante, Spain (see it on JN Viewer).
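The load_JSON_files notebook recursively loads JSON files and drops empty text. The notebook's own code and the exact file layout are not shown here. This minimal sketch assumes a hypothetical layout in which each file holds a JSON list of `[page, text]` pairs. It collects the non-empty blocks into a list of row dicts that pandas can turn into a single DataFrame.

```python
import json
from pathlib import Path


def load_json_blocks(root):
    """Recursively load every *.json file under root into a list of row dicts.

    Hypothetical input layout: each file holds a JSON list of [page, text]
    pairs. Blocks whose text is empty or whitespace-only are dropped.
    """
    rows = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            blocks = json.load(fh)
        for page, text in blocks:
            if text and text.strip():  # skip empty text blocks
                rows.append({"file": str(path.relative_to(root)), "page": page, "text": text})
    return rows


# With pandas installed, the rows become a single DataFrame:
#   import pandas as pd
#   df = pd.DataFrame(load_json_blocks("path/to/unzipped/files"))
```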
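The Books by Subject and Resources for Two Subjects notebooks query the BNB SPARQL endpoint. Here is a rough sketch of the same idea: build a subject query, request the standard W3C SPARQL JSON results, and count the results per year for two subjects. The endpoint URL is a hypothetical placeholder. The predicates `dct:subject`, `rdfs:label`, `bibo:isbn10` and `dct:issued` are assumptions about the BNB data model and are not taken from the notebooks.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: substitute the BNB SPARQL endpoint used in the notebooks.
SPARQL_ENDPOINT = "https://example.org/bnb/sparql"


def subject_query(subject_label):
    """Build a SPARQL query for books with an ISBN under a subject with this label.

    The predicates below are assumed, not confirmed; check them before use.
    """
    label = subject_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?book ?isbn ?issued WHERE {{
  ?book dct:subject ?subject ;
        bibo:isbn10 ?isbn ;
        dct:issued ?issued .
  ?subject rdfs:label "{label}" .
}}"""


def run_query(query):
    """Send the query with GET, asking for W3C SPARQL JSON results (network required)."""
    req = Request(
        f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def counts_by_year(results_json):
    """Count result rows per year, assuming ?issued values start with YYYY."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return Counter(int(b["issued"]["value"][:4]) for b in bindings if "issued" in b)


def compare(counts_a, counts_b):
    """Align two per-year counts as (year, count_a, count_b) rows."""
    years = sorted(set(counts_a) | set(counts_b))
    return [(year, counts_a.get(year, 0), counts_b.get(year, 0)) for year in years]
```

A two-subject comparison would call `counts_by_year(run_query(subject_query(label)))` once for each label and then pass both results to `compare`.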

Explore and experiment with the British Library’s digital collections

The British Library community is able to flourish online thanks to freely available resources such as this.

You can help support our mission to continue making our collection accessible to everyone, for research, inspiration and enjoyment, by donating via the British Library supporter webpage.

Thank you for supporting the British Library.

The British Library

