artefactual-labs/AIPscan


About

AIPscan was developed to provide a more in-depth reporting solution for Archivematica users. It crawls METS files from AIPs in the Archivematica Storage Service to generate tabular and visual reports about repository holdings. It is designed to run as a stand-alone add-on to Archivematica. It only needs a valid Storage Service API key to fetch source data.

License

Apache License Version 2.0
Copyright Artefactual Systems Inc (2021)

Contents

Screenshots

AIPscan fetch job

screencap1

Finding an AIP

screencap2

Viewing an AIP

screencap3

Selecting a report

screencap4

Example: pie chart "format types" report

screencap5

Example: tabular "largest files" report

screencap6

Installation

AIPscan is a web-based application that is built using the Python Flask micro-framework. Below are the developer quickstart instructions. See INSTALL for production deployment instructions. See CONTRIBUTING for guidelines on how to contribute to the project, including how to create a new AIPscan report.

AIPscan Flask server

  • Clone files and cd to directory: git clone https://github.com/artefactual-labs/AIPscan && cd AIPscan
  • Set up virtualenv in the project root directory: virtualenv -p python3 venv
  • Activate virtualenv: source venv/bin/activate
  • Install requirements (this includes Flask & Celery): pip install -r requirements/base.txt
  • Enable DEBUG mode if desired for development: export FLASK_CONFIG=dev
  • In a terminal window, start the Flask server: python run.py
  • Confirm that the Flask server and AIPscan application are up and running at localhost:5000 in your browser. You should see a blank AIPscan page like this:

screencap5
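As an alternative to opening a browser, the last step can be checked from a script. The following is a minimal sketch using only the Python standard library; the function name server_is_up is hypothetical and is not part of AIPscan. It assumes the development server answers plain HTTP on the default address given above.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:5000/", timeout=5.0):
    """Return True if an HTTP GET to `url` succeeds (hypothetical helper)."""
    # An empty ProxyHandler bypasses any system proxy, which would otherwise
    # get in the way of a request to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status (4xx/5xx).
        return False


if __name__ == "__main__":
    if server_is_up():
        print("AIPscan Flask server is responding")
    else:
        print("No response from localhost:5000")
```

A False result only means the page could not be fetched; the terminal running python run.py is still the best place to look for the actual error.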

Background workers

Crawling and parsing many Archivematica AIP METS XML files at a time is resource intensive. Therefore, AIPscan uses the RabbitMQ message broker and the Celery task manager to coordinate this activity as background worker tasks. Both RabbitMQ and Celery must be running properly before attempting a METS fetch job.
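To illustrate the general pattern only, here is a small standard-library sketch of a queue feeding background workers. It is not AIPscan's code: queue.Queue stands in for the RabbitMQ broker, threads stand in for Celery workers, and parse_mets and run_fetch_job are hypothetical names.

```python
import queue
import threading


def parse_mets(aip_uuid):
    """Placeholder for the expensive download-and-parse step (hypothetical)."""
    return f"parsed {aip_uuid}"


def worker(tasks, results):
    """Take task payloads off the queue until a None sentinel arrives."""
    while True:
        aip_uuid = tasks.get()
        if aip_uuid is None:
            break
        results.append(parse_mets(aip_uuid))


def run_fetch_job(aip_uuids, worker_count=2):
    """Queue one task per AIP and let background workers process them."""
    tasks = queue.Queue()  # stands in for the RabbitMQ queue
    results = []           # list.append is thread-safe in CPython
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for aip_uuid in aip_uuids:
        tasks.put(aip_uuid)
    for _ in threads:
        tasks.put(None)    # one stop sentinel per worker
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_fetch_job(["aip-1", "aip-2", "aip-3"]))
```

The point of the design is that the web application only enqueues work and returns immediately, while the slow parsing happens elsewhere. That is also why a fetch job cannot make progress unless both the broker and the workers are running.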

RabbitMQ

You can download and install the RabbitMQ server directly on your local or cloud machine, or you can run it in either location from a Docker container.

Docker installation

docker run --rm \
  -it \
  --hostname my-rabbit \
  -p 15672:15672 \
  -p 5672:5672 rabbitmq:3-management

Download and install

  • Download RabbitMQ installer.

  • In another terminal window, start RabbitMQ queue manager:

    export PATH=$PATH:/usr/local/sbin
    sudo rabbitmq-server

RabbitMQ dashboard

  • The RabbitMQ dashboard is available at http://localhost:15672/
  • username: guest / password: guest
  • AIPscan connects to the RabbitMQ queue on port 5672.
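A quick way to confirm that RabbitMQ is listening on both ports listed above is a plain TCP connection test. This is a minimal sketch using the standard library; port_is_open is a hypothetical helper, and the host localhost is assumed from the instructions above. It only checks that the ports accept connections, not that the guest credentials work.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    for name, port in (("queue", 5672), ("dashboard", 15672)):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"RabbitMQ {name} port {port}: {state}")
```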

Celery

Celery is installed as a Python module dependency during the initial AIPscan requirements import command: pip install -r requirements/base.txt

To start up Celery workers that are ready to receive tasks from RabbitMQ:

  • Open a new terminal tab or window.
  • Navigate to the AIPscan root project directory.
  • Activate the Python virtualenv in the AIPscan project directory so that the Celery dependency gets automatically loaded: source venv/bin/activate
  • Enter the following command:
    celery -A AIPscan.worker.celery worker --loglevel=info
  • You should see terminal output similar to this to indicate that the Celery task queue is ready:

screencap6

Development

Requires Docker CE and Docker Compose.

Clone the repository and go to its directory:

git clone https://github.com/artefactual-labs/AIPscan
cd AIPscan

Build images, initialize services, etc.:

docker-compose up -d

Optional: attach AIPscan to the Docker Archivematica container network directly:

docker-compose -f docker-compose.yml -f docker-compose.am-network.yml up -d

In this case, the AIPscan Storage Service record's URL field can be set with the Storage Service container name:

http://archivematica-storage-service:8000

Access the logs:

docker-compose logs -f aipscan rabbitmq celery-worker

Shut down the AIPscan Docker containers:

docker-compose down

Shut down the AIPscan Docker containers and remove the rabbitmq volumes:

docker-compose down --volumes

Usage

  • Ensure that the Flask server, RabbitMQ server, and Celery worker queue are up and running.
  • Go to localhost:5000 in your browser.
  • Select "New Storage Service".
  • Add an Archivematica Storage Service record, including the API key and the Storage Service URL, e.g. https://amdemo.artefactual.com:8000
  • Select "New Fetch Job".
  • Check the black and green terminal to confirm that AIPscan successfully connected to the Archivematica Storage Service, that it received the lists of available packages from Archivematica, and that it has begun downloading and parsing the AIP METS files.
  • This could take a while (e.g. a few hours), depending on the total number of AIPs in your Storage Service and the size of your METS XML files. If you have the option, test AIPscan on a smaller subset of your full AIP holdings first. This should help you estimate the total time to run AIPscan against all packages in your Storage Service.
  • When the Fetch Job completes, select the "View AIPs" button, the "AIPs" menu, or the "Reports" menu to view information about your Archivematica content in a variety of layouts.
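The subset test suggested above can be turned into a rough estimate with simple proportional scaling. This is a sketch under a strong assumption: that fetch time grows linearly with the number of AIPs. Since METS file size also matters, treat the result as a ballpark figure. The function name and the example numbers are hypothetical.

```python
def estimate_full_run_hours(sample_aips, sample_minutes, total_aips):
    """Extrapolate total fetch time in hours from a timed subset run.

    Assumes time scales linearly with AIP count, which ignores
    differences in METS file size between packages.
    """
    if sample_aips <= 0:
        raise ValueError("sample_aips must be a positive number")
    return sample_minutes * total_aips / sample_aips / 60


if __name__ == "__main__":
    # Hypothetical numbers: a 50-AIP test run took 12 minutes,
    # and the Storage Service holds 2000 AIPs in total.
    hours = estimate_full_run_hours(50, 12, 2000)
    print(f"Estimated full run: {hours:.1f} hours")
```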
