Simple Data Conservancy Package Ingest Service
This package ingest service is intended to transfer the contents of file archives (i.e. “packages”) into an LDP linked data repository such as Fedora. It includes:
- A core library in Java for ingesting packages of various formats
- A Simple HTTP API
- An API-X extension for exposing deposit endpoints on repository containers.
An archive contains custodial content (i.e. packaged files), and possibly additional packaging-specific metadata. A profile defines how these are distinguished. For example, it can be presumed that all content of a simple zip or tar file is custodial content. BagIt defines custodial content as all files underneath a /data directory, and specifies additional “tag files” which may describe the circumstances of creating a bag (its author, date, etc.), checksums for files, and so on.
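As a rough illustration of how a profile separates the two kinds of content, the sketch below classifies entry paths from a BagIt package. It assumes the paths are already relative to the bag's root directory; `classify_bag_paths` is a hypothetical helper name, not part of the package ingest service.

```shell
#!/bin/sh
# Sketch under stated assumptions: input is one archive entry path per line,
# relative to the bag root. classify_bag_paths is a hypothetical helper.
classify_bag_paths() {
  while IFS= read -r path; do
    case "$path" in
      */)     continue ;;                              # skip directory entries
      data/*) printf 'custodial\t%s\n' "$path" ;;      # payload under data/
      *)      printf 'tag\t%s\n' "$path" ;;            # everything else is a tag file
    esac
  done
}

# Possible usage with a tar-packaged bag, stripping the leading bag directory:
#   tar -tf my-bag.tar | sed 's|^[^/]*/||' | classify_bag_paths
```

For a plain zip or tar profile no such split is needed, since every file is treated as custodial content.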
The package ingest service creates a repository resource (an LDPR) from each file in the custodial content of a package.
Additional processing rules may apply for each supported profile. These may enhance the contents of LDPRs (e.g. add metadata) or create additional LDPRs. For example, if the package relates its resources into an LDP containment or membership hierarchy, the packaging profile may provide a way to encode this information when it is not otherwise present within the resources in the package.
The original package may be discarded, or may be kept as part of an audit trail, used for authorization, etc. based upon policy. At minimum, the package ingest service will provide a log of all events that occurred during ingest.
If ingesting a package succeeds, further interaction with the newly created resources may be performed as usual via Fedora’s LDP-based API.
- Accommodate arbitrarily large packages with stream-oriented processing
- Allow the use of simple command-line tools to deposit and verify success/failure (e.g. curl, grep, etc.)
- Accommodate backend workflows and policies
- Support synchronous and asynchronous paradigms in exposed APIs
- Produce a package. For example:
  - Zipping up a file system
  - Export from a repository
  - Generating resources by some local process (e.g. a desktop GUI, laboratory instrument, etc.)
- Choose a container in the repository to deposit into (an LDPC, identified by its URI).
  - No specific discovery mechanism is defined; it is presumed that a client can inspect repository resources and pick one to deposit into, or is given a URI for this purpose.
- Submit the package to the container.
  - A new member resource will be created, and the contents of the package placed into it.
- Follow the deposit results.
  - An event stream reports processing as it happens, and indicates success or failure.
A docker-compose file is provided to quickly get the package ingest extension running in Docker for demonstration or evaluation purposes. It runs the API-X, Fedora, and package ingest extension Docker images. See package-ingest docker for a description of the package ingest Docker image and how it is configured.
Start the package ingest extension, Fedora, and API-X in Docker
- Install docker and docker-compose. See the API-X demo instructions for how to install and verify docker and docker-compose
- Edit the .env file to set any environment variables you want (e.g. to change the defaults). This is optional, except for users of docker-machine. Docker-machine users have to edit the APIX_BASEURI variable and change the host from localhost to the IP address of their
- Start the services via docker-compose up -d. Use docker-compose down to stop all containers and destroy all data, or docker-compose stop merely to stop the containers.
Deposit a package
- Create (or select) a container in Fedora to deposit into, via the UI or the command line. For example, the following creates a container:
curl -X POST -H "Slug: myContainer" http://localhost/fcrepo/rest
- Obtain a package to deposit, or use a test package
- Use standard API-X service discovery to find the package ingest endpoint for the container, or just craft a URI that you know will work in the demo environment
- POST the package to the ingest endpoint for the container.
curl -v -X POST -H "Content-Type: application/zip" --data-binary @my-package.zip http://localhost/services/myContainer/dcs:ingest
- You'll see a response of type text/event-stream. If successful, the last event will be:
event: success
data: Ingest successfully completed
- Look inside the container you just deposited into. Browse the contents of
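
The deposit steps above can also be scripted. The following is a minimal sketch, not part of the package ingest service: `create_container`, `last_event_type`, and `deposit_ok` are hypothetical helper names, and the URLs in the usage comments are the demo values used above. It assumes the repository answers a successful POST with a `Location` header, as LDP servers do, and it treats any final event other than `success` as a failed deposit, since the name of the failure event is not given here.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical and URLs are the demo values.

# create_container BASE_URI SLUG
# POSTs to BASE_URI with a Slug header and prints the Location header of the
# response, i.e. the URI the repository actually assigned to the container.
create_container() {
  curl -s -i -X POST -H "Slug: $2" "$1" |
    tr -d '\r' |
    awk 'tolower($1) == "location:" { print $2; exit }'
}

# last_event_type
# Reads a text/event-stream body on stdin and prints the name carried by the
# last "event:" field it saw.
last_event_type() {
  tr -d '\r' |
    awk '/^event:/ { sub(/^event: ?/, ""); last = $0 } END { print last }'
}

# deposit_ok PACKAGE_FILE INGEST_URI
# POSTs a zip package and succeeds only if the final event is "success".
deposit_ok() {
  [ "$(curl -s -N -X POST -H "Content-Type: application/zip" \
        --data-binary "@$1" "$2" | last_event_type)" = "success" ]
}

# Possible usage in the demo environment:
#   create_container http://localhost/fcrepo/rest myContainer
#   deposit_ok my-package.zip http://localhost/services/myContainer/dcs:ingest \
#     && echo "ingest succeeded" || echo "ingest did not report success"
```

Reading the `Location` header rather than assuming the Slug was honored matters because a Slug is only a naming hint; the repository may assign a different URI.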