RecordSearch

This repository contains Jupyter notebooks to work with data from the National Archives of Australia's RecordSearch database.

RecordSearch is the online collection database of the National Archives of Australia. Based on the series system, RecordSearch provides rich, contextual information about series, items, agencies, and functions.

Unfortunately RecordSearch doesn't provide access to machine-readable data through an API, so we have to resort to screen scraping. The notebooks here make use of either the RecordSearch Data Scraper or the older RecordSearch Tools library to handle the scraping. I'm in the process of upgrading all the notebooks to use the newer scraper.
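As a rough illustration of what screen scraping involves, here is a minimal sketch that pulls label/value pairs out of an HTML table using only Python's standard library. The HTML snippet, the `LabelValueParser` class, and the `scrape_fields` function are all made up for this example. They are not the RecordSearch markup or the API of the RecordSearch Data Scraper, which handles far more detail than this.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for an item details page. The real RecordSearch
# markup is different and is handled by the scraper libraries.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><td>Example file title</td></tr>
  <tr><th>Access status</th><td>Open</td></tr>
</table>
"""


class LabelValueParser(HTMLParser):
    """Collect th/td pairs from table rows into a dictionary (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # the th/td cell we are currently inside, if any
        self._label = None  # the most recent th text waiting for its td value

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "th":
            self._label = text
        elif self._tag == "td" and self._label is not None:
            self.fields[self._label] = text
            self._label = None


def scrape_fields(html):
    """Return a dict of label -> value extracted from the table cells."""
    parser = LabelValueParser()
    parser.feed(html)
    return parser.fields


fields = scrape_fields(SAMPLE_HTML)
print(fields)
```

Scrapers like this break whenever the page layout changes, which is one reason the notebooks rely on a shared library rather than each parsing pages themselves.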

See the RecordSearch section of the GLAM Workbench for more details.

Notebook topics

Harvesting data

Harvest items from a search in RecordSearch – save the results of an item search in RecordSearch as a downloadable dataset; you can also save images and PDFs from digitised files
Harvest files with the access status of 'closed' – find out what we're not allowed to see by harvesting details of 'closed' files
Harvest recently digitised files from RecordSearch – save details of files digitised in the past month
Harvest details of all series in RecordSearch – get details of all series registered in RecordSearch; also generates a summary dataset with the total number of items described, digitised, and in each access category
Harvesting functions from the RecordSearch interface – extract information from the RecordSearch interface about the hierarchy of functions it uses to describe the work of government agencies
Harvest agencies associated with all functions – loops through the list of functions saving details of the agencies associated with each

Analysing data

Exploring harvested series data – generates some basic statistics from the harvest of series data
How many of the functions are actually used? – looks at the harvest of functions to see how many are actually in use
Who's responsible? – pick a function to see which agencies have been responsible for it over time

Useful tools

DIY Redaction Art Collages – generates a random sample of ASIO redactions and packs them into one big image
Download the contents of a digitised file – get a digitised file as a folder full of images
Get a list of agencies associated with a function – pick a function and create a downloadable list of agencies responsible for it
DFAT Cable Finder – helps you find numbered cables created by DFAT

Data downloads

Summary data about all series in RecordSearch (15MB CSV) – contains basic descriptive information about all the series currently registered on RecordSearch (May 2021) as well as the total number of items described, digitised, and in each access category.
Recently digitised files (CSV) – contains details of files digitised between 25 February and 26 March 2021. For an ongoing record of digitised files, see this repository, which creates weekly snapshots.
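Once downloaded, a dataset like the series summary can be explored with a few lines of Python. The sketch below totals the described and digitised item counts and works out the proportion digitised. The column names (`identifier`, `described_total`, `digitised_total`) and the sample rows are placeholders invented for this example, so check the header row of the actual CSV and adjust them to match.

```python
import csv
import io

# Invented sample rows with assumed column names. The real CSV may use
# different headers and will contain many more columns.
SAMPLE_CSV = """identifier,described_total,digitised_total
X1,100,25
X2,0,0
X3,40,40
"""


def summarise(csv_file):
    """Total the described and digitised counts across all series rows."""
    series = 0
    described = 0
    digitised = 0
    for row in csv.DictReader(csv_file):
        series += 1
        # Treat empty cells as zero rather than failing on int("")
        described += int(row["described_total"] or 0)
        digitised += int(row["digitised_total"] or 0)
    percent = round(100 * digitised / described, 1) if described else 0.0
    return {
        "series": series,
        "described": described,
        "digitised": digitised,
        "percent_digitised": percent,
    }


summary = summarise(io.StringIO(SAMPLE_CSV))
print(summary)
```

To run this against a downloaded file, pass an open file handle instead of the `io.StringIO` sample, opening the file with `newline=""` as the `csv` module recommends.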

Cite as

See the GLAM Workbench or Zenodo for up-to-date citation details.

This repository is part of the GLAM Workbench.
If you think this project is worthwhile, you might like to sponsor me on GitHub.
