Ingests the contents of Data Conservancy Packages into a Fedora 4 repository.

DataConservancy/dcs-package-ingest

Simple Data Conservancy Package Ingest Service

This package ingest service is intended to transfer the contents of file archives (i.e. “packages”) into an LDP linked data repository such as Fedora. It includes:

  • A core library in Java for ingesting packages of various formats
  • A Simple HTTP API
  • An API-X extension for exposing deposit endpoints on repository containers.

Premise

An archive contains custodial content (i.e. packaged files), and possibly additional packaging-specific metadata.  A profile defines how these are distinguished.  For example, it can be presumed that all content of a simple zip or tar file is custodial content.  BagIt defines custodial content as all files underneath a /data directory, and specifies additional “tag files” which may describe the circumstances of creating a bag (its author, date, etc), checksums for files, etc.  
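As a rough illustration of how a profile separates custodial content from packaging metadata, the sketch below applies the BagIt rule to a list of file paths. It is not the service's implementation: the function names `custodial_files` and `tag_files` are hypothetical, and the input is assumed to be bag-relative paths, one per line.

```shell
# Print only the custodial (payload) paths of a BagIt bag.
# Assumption: stdin holds paths relative to the bag root, one per line.
custodial_files() {
    # Payload files live under data/; "|| true" keeps an empty result from failing.
    grep '^data/' || true
}

# Print the remaining paths, i.e. the tag files (bagit.txt, manifests, ...).
tag_files() {
    grep -v '^data/' || true
}
```

For an unpacked bag in a directory named `my-bag` (a placeholder), the payload could be listed with `(cd my-bag && find . -type f | sed 's|^\./||' | custodial_files)`. For a plain zip or tar profile no such filter is needed, since every file is custodial.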

The package ingest service creates a repository resource (an LDPR) from each file in the custodial content of a package.  

Additional processing rules may apply for each supported profile, which may enhance the contents of LDPRs (e.g. add metadata) or create additional LDPRs.  For example, if the package relates its resources into an LDP containment or membership hierarchy, the packaging profile may provide a way to encode this information when it is not otherwise present within the resources in the package.

The original package may be discarded, or may be kept as part of an audit trail, used for authorization, etc. based upon policy.   At minimum, the package ingest service will provide a log of all events that occurred during ingest.

If ingesting a package succeeds, further interaction with the newly created resources may be performed as usual via Fedora’s LDP-based API.

Goals

  • Accommodate arbitrarily large packages with stream-oriented processing
  • Allow simple command-line tools (e.g. curl, grep) to be used to deposit packages and verify success or failure
  • Accommodate backend workflows and policies
  • Support synchronous and asynchronous paradigms in exposed APIs

Workflow

  1. Produce a package, for example by:
      • Zipping up a file system
      • Exporting from a repository
      • Generating resources by some local process (e.g. a desktop GUI, laboratory instrument, etc.)
  2. Choose a container in the repository to deposit into (an LDPC, identified by its URI).
      • No specific discovery mechanism is defined; it is presumed that a client can inspect repository resources and pick one to deposit into, or is given a URI for this purpose.
  3. Submit the package to the container.
      • A new member resource will be created, and the contents of the package placed into it.
  4. Follow the deposit results.
      • An event stream indicates processing as it happens, and indicates success or failure.
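To make the last workflow step concrete, here is a minimal sketch of how a client might turn the event stream into a pass/fail result. It assumes the stream is in text/event-stream form and that a final event named `success` marks a successful ingest, as in the Quick start; anything else is treated as failure. The function names are hypothetical.

```shell
# Read a text/event-stream on stdin and print the name of the last event.
last_event() {
    awk '/^event:/ {
             sub(/^event:[ \t]*/, "")   # drop the field name
             sub(/\r$/, "")             # tolerate CRLF line endings
             last = $0
         }
         END { print last }'
}

# Exit 0 only if the final event in the stream is "success".
# Assumption: any other final event (or none at all) means the ingest failed.
ingest_succeeded() {
    [ "$(last_event)" = "success" ]
}
```

A deposit command could be piped into it, for example `curl -sN ... | tee ingest.log | ingest_succeeded && echo "deposited"`, which also keeps the full event log in `ingest.log` (a placeholder file name) for later inspection.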

Quick start

A docker-compose file is provided as a quick way to get the package ingest extension running in Docker for demonstration or evaluation purposes. It runs the API-X, Fedora, and package ingest extension Docker images. See package-ingest docker for a description of the package ingest Docker image and how it is configured.

Start the package ingest extension, Fedora, and API-X in Docker

  1. Install docker and docker-compose. See the API-X demo instructions for how to install and verify docker and docker-compose.
  2. Edit the .env file to set any environment variables you want (e.g. to change the defaults). This is optional, except for users of docker-machine, who have to edit the APIX_BASEURI variable and change the host from localhost to the IP address of their docker-machine instance.
  3. Start the services via docker-compose up -d. Use docker-compose down to stop all containers and destroy all data, or docker-compose stop merely to stop the containers.

Deposit a package

  1. Create (or select) a container in Fedora to deposit into, via the UI or the command line (e.g. curl -X POST -H "Slug: myContainer" http://localhost/fcrepo/rest, which will create a container myContainer at http://localhost/fcrepo/rest/myContainer).
  2. Obtain a package to deposit, or use a test package.
  3. Use standard API-X service discovery to find the package ingest endpoint for the container, or just craft a URI that you know will work in the demo environment: http://localhost/services/myContainer/dcs:ingest
  4. POST the package to the ingest endpoint for the container. curl -v -X POST -H "Content-Type: application/zip" --data-binary @my-package.zip http://localhost/services/myContainer/dcs:ingest
  5. You'll see a response of type text/event-stream. If successful, the last event will be:
    event: success
    data: Ingest successfully completed
  6. Look inside the container you just deposited into. Browse the contents of http://localhost/fcrepo/rest/myContainer
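The deposit steps above can be collected into a small script. This is only a sketch for the demo environment: the `BASE` variable and the function names `ingest_endpoint` and `deposit_demo` are hypothetical, the endpoint is crafted from the demo URI pattern rather than discovered through API-X, and the package is assumed to be a zip file.

```shell
# Host of the demo environment (assumption: the localhost default).
BASE=${BASE:-http://localhost}

# Build the ingest endpoint URI for a container, following the demo URI pattern.
ingest_endpoint() {
    printf '%s/services/%s/dcs:ingest\n' "$BASE" "$1"
}

# Create a container, deposit a zip package into it, then show the container.
# Usage: deposit_demo myContainer my-package.zip
deposit_demo() {
    container=$1
    package=$2
    # Step 1: create the container via Fedora's LDP API.
    curl -s -X POST -H "Slug: $container" "$BASE/fcrepo/rest" &&
    # Step 4: POST the package; -N prints events as they arrive.
    curl -sN -X POST -H "Content-Type: application/zip" \
         --data-binary "@$package" "$(ingest_endpoint "$container")" &&
    # Step 6: fetch the container to see the newly created resources.
    curl -s "$BASE/fcrepo/rest/$container"
}
```

Docker-machine users would set `BASE` to the address of their docker-machine instance before calling `deposit_demo`.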
