The data comes from the GLAM Workbench website (follow the link at the bottom of the page to download it) and comprises 9,738 documents harvested and kindly made available by Tim Sherratt.
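As a quick sanity check after downloading, you could load the texts and confirm the document count. The sketch below is not the notebook's own code. It assumes the download has been unpacked into a directory of plain-text files, one per document; the directory name `data` and the helper `load_texts` are hypothetical.

```python
from pathlib import Path
from typing import Dict


def load_texts(data_dir: str) -> Dict[str, str]:
    """Read every .txt file in data_dir into a {filename stem: text} dict.

    Assumption (hypothetical layout): one plain-text file per harvested document.
    """
    texts = {}
    for path in sorted(Path(data_dir).glob("*.txt")):
        # errors="replace" stops a stray undecodable byte in the OCR text
        # from aborting the whole load
        texts[path.stem] = path.read_text(encoding="utf-8", errors="replace")
    return texts


if __name__ == "__main__":
    texts = load_texts("data")  # hypothetical directory name
    print(f"Loaded {len(texts):,} documents")
```

If the layout assumption holds, the printed count should match the number of documents in the harvest.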
The Trove_Digitised_Books.ipynb file is a Jupyter notebook documenting my initial exploration of the data. The code is written in Python 3.7. The notebook can also be viewed here.
If you are using the Anaconda distribution, you can reproduce my virtual environment from the provided environment.yml configuration file by running the following in a terminal:

conda env create -f environment.yml
Note: on macOS, I had to use the following command to install pycld2, the Python bindings for the CLD2 library:
export CC=clang; CFLAGS=-stdlib=libc++ pip install --ignore-installed pycld2
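Once pycld2 is installed, a quick way to check that it works is to detect the language of a short passage. This is a minimal sketch, not necessarily how the notebook uses CLD2. The helper name `detect_language` is hypothetical, and stripping non-printable characters first is an assumed precaution against OCR debris that CLD2 may reject as invalid input.

```python
from typing import Tuple


def detect_language(text: str) -> Tuple[str, bool]:
    """Return (language code, is_reliable) for the most likely language of text."""
    # Imported inside the function so this module still loads where the
    # compiled CLD2 bindings are unavailable
    import pycld2 as cld2

    # Assumed precaution: drop control characters that often appear in OCR output
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    is_reliable, _bytes_found, details = cld2.detect(cleaned)
    # details holds up to three (name, code, percent, score) tuples, best match first
    return details[0][1], is_reliable
```

For example, `detect_language("The quick brown fox jumps over the lazy dog.")` should return a code of `"en"`, although the reliability flag can be `False` for very short passages.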
The text is released under a Creative Commons Attribution 4.0 International License, and the code is released under the MIT license.