read an EAD XML archival finding aid and output the item/folder contents, or replace stuff in the XML document

BAM-PFA/EADoodler

EADoodler

This script uses lxml to build a representation of an EAD XML document that you can then query or modify. The basic EAD class handles the XML namespaces that EAD includes and has a built-in function to extract all the item- or folder-level info from the finding aid. One major assumption is that the EAD was generated by ArchivesSpace, but the script is probably applicable to other sources, as long as they use the same XML namespaces provided by ArchivesSpace/the EAD spec.

Modes

The script runs in one of two modes, items or replace.

items

items is the default. In this mode the script produces a CSV file in which each line represents an item- or folder-level description from the finding aid. Each line includes the System ID, Title, Date, and Scope & Content Note.
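As a rough illustration of what items mode does, here is a minimal sketch using lxml directly. It is not the script's own code: the namespace URI (the EAD 2002 one), the use of the level attribute to find items and folders, the use of the id attribute as the System ID, and the helper names first_text and item_rows are all assumptions made for this example.

```python
import csv
import io
from lxml import etree

# Assumed namespace URI (EAD 2002); check the root element of your own file.
NS = {"e": "urn:isbn:1-931666-22-9"}

# Tiny made-up finding aid, just enough to exercise the extraction.
SAMPLE = b"""<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <dsc>
      <c01 id="aspace_1" level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 id="aspace_2" level="file">
          <did>
            <unittitle>Letters</unittitle>
            <unitdate>1970-1975</unitdate>
          </did>
          <scopecontent><p>Incoming letters.</p></scopecontent>
        </c02>
        <c02 id="aspace_3" level="item">
          <did><unittitle>Postcard</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def first_text(node, path):
    """Return the whitespace-normalized text of the first match, or ''."""
    hits = node.xpath(path, namespaces=NS)
    return " ".join("".join(hits[0].itertext()).split()) if hits else ""


def item_rows(root):
    """Yield one row per component whose level is 'item' or 'file'."""
    for c in root.xpath("//*[@level='item' or @level='file']"):
        yield [
            c.get("id", ""),                      # assumed to be the System ID
            first_text(c, "e:did/e:unittitle"),
            first_text(c, "e:did/e:unitdate"),
            first_text(c, "e:scopecontent"),
        ]


root = etree.fromstring(SAMPLE)
rows = list(item_rows(root))

# Write the rows as CSV into memory; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["System ID", "Title", "Date", "Scope & Content Note"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Missing fields come out as empty cells, so every row keeps the same four columns.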

replace

replace lets you modify part of the finding aid in bulk, given a condition to be met and a replacement value. This requires a CSV with three things in each row: a valid XPATH expression that selects the target data you want to replace, a value for the XPATH to evaluate, and the replacement value. The example CSV in this directory shows the format to use. The first row must contain the column headers.

Getting the right XPATH expression can be a pain, but there are many online XPATH constructors where you can test out your expression.

Note! You also need to add the e: prefix to tag names in your XPATH expression. This is how the script reaches the tags that sit in the file's default (unprefixed) namespace. For example, write //*[e:c03] rather than //*[c03]. If you're using an online XPATH validator, you may not need to include the prefix there; just be sure to add it in the CSV you create for this script.
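To see why the prefix matters, here is a small sketch with lxml. The namespace URI is an assumption (it is the EAD 2002 one), and the sample XML is made up for the example; only the e prefix itself comes from this README.

```python
from lxml import etree

# Assumed namespace URI (EAD 2002), mapped to the "e" prefix.
NS = {"e": "urn:isbn:1-931666-22-9"}

# Made-up fragment: every tag inherits the default namespace from <ead>.
SAMPLE = b"""<ead xmlns="urn:isbn:1-931666-22-9"><archdesc><dsc>
<c01><c02><c03 id="x1"/></c02></c01>
</dsc></archdesc></ead>"""

root = etree.fromstring(SAMPLE)

# Without the prefix, XPath looks for a c03 in *no* namespace and finds nothing.
without_prefix = root.xpath("//*[c03]", namespaces=NS)

# With the prefix, the c03 in the default namespace is matched,
# so its parent (c02) is returned.
with_prefix = root.xpath("//*[e:c03]", namespaces=NS)

print(len(without_prefix))
print([etree.QName(el).localname for el in with_prefix])
```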

For example, if you want to change all the URLs in the digital object <dao> tags, you would have these three elements in each line of the CSV:

  • XPATH expression to get to the href attribute inside the <dao> tag (this is probably the same for all the rows)
  • Some ID or other hook for the XPATH to search for, for example the id attribute of the parent <c> tag, or perhaps the unittitle or some other unique way to get to the right dao
  • The URL that you want to apply to the <dao href="__"> attribute

You could also replace the text content of a tag using the same process.
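The replace process described above could be sketched roughly like this. This is not the script's actual implementation: the CSV column names (xpath, match, replacement), the use of an lxml XPath variable named $match to carry the hook value, the namespace URI, and the function name apply_replacements are all assumptions for illustration. Check the example CSV in this directory for the real format.

```python
import csv
import io
from lxml import etree

NS = {"e": "urn:isbn:1-931666-22-9"}  # assumed namespace URI (EAD 2002)

# Made-up finding aid fragment with two digital objects.
SAMPLE = b"""<ead xmlns="urn:isbn:1-931666-22-9"><archdesc><dsc>
<c01 id="aspace_1"><did><unittitle>Reel 1</unittitle><dao href="http://old.example/1"/></did></c01>
<c01 id="aspace_2"><did><unittitle>Reel 2</unittitle><dao href="http://old.example/2"/></did></c01>
</dsc></archdesc></ead>"""

# Hypothetical CSV layout: header row, then XPATH, hook value, replacement.
CSV_TEXT = """xpath,match,replacement
//e:c01[@id=$match]/e:did/e:dao/@href,aspace_1,https://new.example/1
//e:c01[@id=$match]/e:did/e:unittitle,aspace_2,Reel 2 (restored)
"""


def apply_replacements(root, rows):
    """Apply each CSV row to the tree and return how many nodes changed."""
    changed = 0
    for row in rows:
        # lxml passes keyword arguments to the expression as XPath variables.
        hits = root.xpath(row["xpath"], namespaces=NS, match=row["match"])
        for hit in hits:
            if getattr(hit, "is_attribute", False):
                # Attribute result: set the new value on the owning element.
                hit.getparent().set(hit.attrname, row["replacement"])
            elif etree.iselement(hit):
                # Element result: replace its text content.
                hit.text = row["replacement"]
            else:
                continue  # other result types are ignored in this sketch
            changed += 1
    return changed


root = etree.fromstring(SAMPLE)
changed = apply_replacements(root, csv.DictReader(io.StringIO(CSV_TEXT)))

hrefs = [str(h) for h in root.xpath("//e:dao/@href", namespaces=NS)]
titles = [t.text for t in root.xpath("//e:unittitle", namespaces=NS)]
print(changed, hrefs, titles)
```

The first row swaps a URL in a dao href attribute and the second replaces the text of a unittitle, so one code path covers both kinds of change.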

The output is a new XML file, saved in the same directory as the input EAD file, with _new appended to the file name.
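One way to produce that kind of output name is sketched below. It assumes that _new goes before the file extension (so finding_aid.xml becomes finding_aid_new.xml); the script itself may place it differently. The function names new_path_for and save_copy are hypothetical.

```python
import tempfile
from pathlib import Path
from lxml import etree


def new_path_for(source):
    """Return a sibling path with _new added before the extension (assumed)."""
    source = Path(source)
    return source.with_name(f"{source.stem}_new{source.suffix}")


def save_copy(tree, source):
    """Write the tree next to the input file and return the new path."""
    target = new_path_for(source)
    tree.write(str(target), encoding="UTF-8", xml_declaration=True)
    return target


# Demonstrate in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "finding_aid.xml"
    source.write_bytes(b'<ead xmlns="urn:isbn:1-931666-22-9"/>')
    tree = etree.parse(str(source))
    target = save_copy(tree, source)
    written_name = target.name
    same_directory = target.parent == source.parent
    reparsed_tag = etree.parse(str(target)).getroot().tag

print(written_name)
```

Writing to a separate file leaves the original finding aid untouched, which makes it easy to diff the two before replacing anything.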

future steps?

The next logical step would be to include an add mode where you can add new tags to the finding aid, like a new <scopecontent> note for all the items, or what have you.

dependencies

pip3 install lxml
