Tools for geolocation analysis of textual data with related workshop notebook
This repository accompanies a workshop presented at the 2022 National Annual Conference of the Australian Society of Archivists (ASA) on 20 Oct 2022. The workshop taught how to use software to recognise placenames in historical documents, then use online gazetteers to determine which known locations those placenames correspond to and to gather related geolocation data such as coordinates.
This site provides a combination of slides and Jupyter notebooks, along with audio presentations and explanations.
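As a rough illustration of the two-step idea described above (find placename candidates, then look them up in a gazetteer), here is a minimal sketch using only the Python standard library. It is not the workshop's method: the notebooks use spaCy for named entity recognition and online gazetteers for lookup, whereas this sketch swaps in a naive capitalised-word pattern and a tiny hypothetical in-memory gazetteer. The names `TOY_GAZETTEER`, `find_candidate_placenames` and `geolocate` are invented for illustration, and the coordinates are approximate.

```python
import re

# Hypothetical miniature gazetteer: placename -> approximate (latitude, longitude).
# A real workflow would query an online gazetteer instead.
TOY_GAZETTEER = {
    "Sydney": (-33.87, 151.21),
    "Melbourne": (-37.81, 144.96),
    "Ballarat": (-37.56, 143.85),
}


def find_candidate_placenames(text):
    """Return runs of capitalised words as naive placename candidates.

    This is a crude stand-in for named entity recognition: it also picks up
    sentence-initial words such as "The", which the gazetteer lookup filters out.
    """
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)


def geolocate(text, gazetteer):
    """Map each candidate that appears in the gazetteer to its coordinates."""
    found = {}
    for name in find_candidate_placenames(text):
        if name in gazetteer:
            found[name] = gazetteer[name]
    return found


sample = "The family left Ballarat in 1854 and settled near Melbourne."
matches = geolocate(sample, TOY_GAZETTEER)
for name, (lat, lon) in matches.items():
    print(f"{name}: lat {lat}, lon {lon}")
```

Exact string matching like this misses spelling variants and cannot tell apart places that share a name, which is part of why a trained recogniser and a full gazetteer are used in practice.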
- Geolocating Australian Historical Resources: Finding placenames and locations with gazetteers (Workshop slides)
- Introduction to Named Entity Recognition with spaCy (Jupyter Python notebook)
- ATAP Notebook for the Geolocation project (Jupyter Python notebook)
This was developed as part of the ATAP project.
The above links to the Binder service let you load a notebook in an online cloud environment, rather than having to install the software on your own computer (it might take a little while to load). This is a free service hosted by AARNet, but users must sign in through CILogon, which is normally only available to academic researchers. Note that cloud sessions will close if you stop using the notebooks, and no data will be saved. Make sure you download any changed notebooks or harvested data that you want to keep.
To execute each stage of a notebook, select a command cell and click the "run" triangle symbol. The cells are sequential, so all earlier cells must already have been executed, in order, before you run the current one.
The Australian Text Analytics Platform (ATAP) is an open source environment that provides researchers with tools and training for analysing, processing, and exploring text. It draws on a range of resources, including the Language Technology and Data Analysis Laboratory (LADAL), which aims to help develop computational and digital skills by providing information and practical, hands-on tutorials on data and text analytics as well as on statistical methods relevant for language research.
The ATAP projects received investment from the Australian Research Data Commons (ARDC). The ARDC is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).
