A script to automate and version the export of updated data from ArchivesSpace.
- Python 3.4 or higher. Make sure you install the correct version; on some operating systems this may require additional steps. It is also helpful to have pip installed.
- ArchivesSnake
- requests_toolbelt
- git
- Install dependencies.
- Get a copy of the repo:

      git clone git@github.com:RockefellerArchiveCenter/as_export.git

  or just download the zip file of this repo.
- Create a local configuration file named local_settings.cfg in the same directory as the script and add variables. A sample file looks like this (a sketch of reading these settings appears after the setup steps below):

      [ARCHIVESSPACE]
      baseurl:http://localhost:8089
      repository:2
      user:admin
      password:admin

      [EAD]
      unpublished:false
      daos:true
      numbered:false

      [LAST_EXPORT]
      filepath:last_export.txt

      [DESTINATIONS]
      data = data
      ead = ead
      mets = mets
- Set up repositories:
  - Create local git repositories at your data export locations:

        git init

  - Create Github repositories to push to.
  - Add a remote named github in each of your local repositories pointing to the appropriate Github repository:

        git remote add github git@github.com:YourGithubAccount/YourRepo.git

  - Create (if necessary) and add your SSH key to Github.
  - Make sure your Github username and email are correctly configured on the server:

        git config --global user.name "Your Name"
        git config --global user.email you@example.com

- Set a cron job to run as_export.py at an interval of your choice. This should be done in the crontab of the user whose SSH key has been added to Github.
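As noted above, the values in local_settings.cfg could be read with Python's configparser roughly as follows. This is an illustrative sketch, not the script's actual code, and the variable names are placeholders:

```python
from configparser import ConfigParser

# Read the local configuration file described in the setup steps.
config = ConfigParser()
config.read("local_settings.cfg")

# Connection details for the ArchivesSpace API.
baseurl = config.get("ARCHIVESSPACE", "baseurl")
repository = config.get("ARCHIVESSPACE", "repository")
user = config.get("ARCHIVESSPACE", "user")
password = config.get("ARCHIVESSPACE", "password")

# EAD export options, the last-export tracking file, and export destinations.
include_unpublished = config.getboolean("EAD", "unpublished")
include_daos = config.getboolean("EAD", "daos")
numbered_cs = config.getboolean("EAD", "numbered")
last_export_filepath = config.get("LAST_EXPORT", "filepath")
ead_destination = config.get("DESTINATIONS", "ead")
mets_destination = config.get("DESTINATIONS", "mets")
```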
The first time you run this, the script may take some time to execute, since it will attempt to export all published resource records in your ArchivesSpace repository. If you ever want to do a complete export, simply delete last_export.txt and the last_export variable will be set to zero (i.e. the epoch, which was long before ArchivesSpace or any of the resources in it existed).
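A minimal sketch of how this last-export timestamp might be tracked is shown below; the function names are placeholders, and the file path corresponds to the [LAST_EXPORT] section of the configuration:

```python
import os
import time

LAST_EXPORT_FILEPATH = "last_export.txt"  # value of filepath in [LAST_EXPORT]

def get_last_export_time(filepath=LAST_EXPORT_FILEPATH):
    """Return the Unix timestamp of the last export, or 0 if the file is missing."""
    if os.path.isfile(filepath):
        with open(filepath) as f:
            return int(f.read().strip())
    # No file means the timestamp falls back to 0 (the epoch), so everything is exported.
    return 0

def update_last_export_time(filepath=LAST_EXPORT_FILEPATH):
    """Record the current time so the next run only picks up newer updates."""
    with open(filepath, "w") as f:
        f.write(str(int(time.time())))
```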
The script supports a few arguments, which include or exclude specific functions. These arguments are also documented on the command line; type as_export -h to see them.
- --update_time updates the last exported time stored in the external file to the current time. This is useful when you want to avoid exporting everything after running a reindex, for example when migrating to a new version.
- --digital exports METS for all digital object records, regardless of when those records were last updated. When this argument is used, the script does not update the last run time.
- --resource %identifier% exports EAD for a specific resource record matching the ArchivesSpace %identifier%, regardless of when that resource was last updated. When this argument is used, the script does not update the last run time.
- --resource_digital %identifier% exports METS for digital object records associated with the resource record matching the ArchivesSpace %identifier%, regardless of when those records were last updated. When this argument is used, the script does not update the last run time.
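A minimal sketch of how these options could be declared with argparse is shown below; this is illustrative, not the script's actual argument parsing:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Export updated data from ArchivesSpace.")
parser.add_argument("--update_time", action="store_true",
                    help="Set the stored last export time to now without exporting.")
parser.add_argument("--digital", action="store_true",
                    help="Export METS for all digital object records.")
parser.add_argument("--resource", metavar="identifier",
                    help="Export EAD for the resource record with this ArchivesSpace identifier.")
parser.add_argument("--resource_digital", metavar="identifier",
                    help="Export METS for digital objects associated with this resource record.")
args = parser.parse_args()
```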
By default, the script exports EAD files from published resource records updated since the last export (including updates to any child components or associated agents and subjects), as well as METS records for digital object records associated with those resource records. If a resource record is unpublished, the script will remove the EAD, PDF and any associated METS records. Exported or deleted files are logged to a text file, log.txt.
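For orientation only, here is a heavily simplified sketch of that default export flow using ArchivesSnake. The API routes shown are standard ArchivesSpace endpoints, but everything else (error handling, deletions, METS and PDF export, logging, and pushing to Github) is omitted, and the details are assumptions rather than the script's actual implementation:

```python
from asnake.client import ASnakeClient

# Connect with the values from local_settings.cfg (hard-coded here for brevity).
client = ASnakeClient(baseurl="http://localhost:8089",
                      username="admin", password="admin")
client.authorize()

repository = 2
last_export = 0  # or the value read from last_export.txt (see the earlier sketch)

# Ids of resource records modified since the last export.
resource_ids = client.get(
    "repositories/{}/resources".format(repository),
    params={"all_ids": True, "modified_since": last_export}).json()

for resource_id in resource_ids:
    resource = client.get(
        "repositories/{}/resources/{}".format(repository, resource_id)).json()
    if resource.get("publish"):
        # Export EAD for published resources; options mirror the [EAD] config section.
        ead = client.get(
            "repositories/{}/resource_descriptions/{}.xml".format(repository, resource_id),
            params={"include_unpublished": False,
                    "include_daos": True,
                    "numbered_cs": False})
        # Write into the ead destination configured in [DESTINATIONS].
        with open("ead/{}.xml".format(resource_id), "w") as f:
            f.write(ead.text)
```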
This repository contains a configuration file for git pre-commit hooks, which help ensure that code is linted before it is checked into version control. It is strongly recommended that you install these hooks locally by installing pre-commit and running pre-commit install.
This code is released under the MIT License. See LICENSE.md for more information.