Ingests the contents of Data Conservancy Packages into a Fedora 4 repository.
Latest commit 7bfb2f7 Feb 23, 2017 @birkland birkland committed on GitHub Add HTTP API specification (#31)

Simple Data Conservancy Package Ingest Service

This package ingest service is intended to transfer the contents of file archives (i.e. “packages”) into an LDP linked data repository such as Fedora. It includes:

  • A core library in Java for ingesting packages of various formats
  • A simple HTTP API
  • An API-X extension for exposing deposit endpoints on repository containers.

Premise

An archive contains custodial content (i.e. packaged files), and possibly additional packaging-specific metadata. A profile defines how these are distinguished. For example, all content of a simple zip or tar file can be presumed to be custodial content. BagIt, by contrast, defines custodial content as all files underneath a /data directory, and specifies additional “tag files” which may describe the circumstances of creating a bag (its author, date, etc.), provide checksums for files, and so on.

The package ingest service creates a repository resource (an LDPR) from each file in the custodial content of a package.  
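
As a rough illustration of how a profile might separate custodial content from packaging metadata, the following sketch classifies entry paths for the two examples above. The class and field names (`CustodialContent`, `SIMPLE_ARCHIVE`, `BAGIT`) are hypothetical and are not taken from the package ingest code. It also assumes that entry paths are already relative to the root of the bag.

```java
import java.util.function.Predicate;

/** Hypothetical illustration; these names are not part of the package ingest API. */
public class CustodialContent {

    /** Simple zip or tar: every file entry is presumed to be custodial content. */
    public static final Predicate<String> SIMPLE_ARCHIVE =
            path -> !path.endsWith("/");

    /**
     * BagIt: only files underneath data/ are custodial content.
     * Assumption: paths are relative to the bag's base directory.
     */
    public static final Predicate<String> BAGIT =
            path -> path.startsWith("data/") && !path.endsWith("/");

    public static void main(String[] args) {
        String[] entries = {"bagit.txt", "manifest-sha256.txt", "data/", "data/report.pdf"};
        for (String entry : entries) {
            System.out.printf("%-22s simple=%-5b bagit=%b%n",
                    entry, SIMPLE_ARCHIVE.test(entry), BAGIT.test(entry));
        }
    }
}
```

Under the BagIt rule, tag files such as `bagit.txt` are excluded, so no LDPR would be created for them. Under the simple-archive rule, every file qualifies.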

Additional processing rules may apply for each supported profile; these may enhance the contents of LDPRs (e.g. add metadata) or create additional LDPRs. For example, if the package relates its resources in an LDP containment or membership hierarchy, and this information is not otherwise present within the resources in the package, the packaging profile may provide a way to encode it.

The original package may be discarded, or may be kept as part of an audit trail, used for authorization, etc., based upon policy. At minimum, the package ingest service will provide a log of all events that occurred during ingest.

If ingesting a package succeeds, further interaction with the newly created resources may be performed as usual via Fedora’s LDP-based API.

Goals

  • Accommodate arbitrarily large packages with stream-oriented processing
  • Allow the use of simple command-line tools (e.g. curl, grep) to deposit and verify success or failure
  • Accommodate backend workflows and policies
  • Support synchronous and asynchronous paradigms in exposed APIs
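
To make the stream-oriented goal concrete, here is a minimal sketch of reading a zip archive one entry at a time with the JDK's `ZipInputStream`, so that memory use stays bounded by a single buffer regardless of package size. It does not show the service's actual implementation. The class name `StreamingWalk` and its methods are hypothetical, and the sample archive is built in memory only so the example is runnable on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration of stream-oriented package processing. */
public class StreamingWalk {

    /** Reads a zip stream entry by entry, reusing one fixed-size buffer. */
    public static Map<String, Long> entrySizes(InputStream in) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                long total = 0;
                int n;
                // A real ingest would forward these bytes to the repository here.
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
                sizes.put(entry.getName(), total);
            }
        }
        return sizes;
    }

    /** Builds a tiny in-memory zip so the sketch needs no external files. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("dir/b.txt"));
            zip.write("hi".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entrySizes(new ByteArrayInputStream(sampleZip())));
    }
}
```

Because entries are consumed in the order they appear in the stream, this approach works on input arriving over a network connection without first spooling the whole package to disk.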

Workflow

  1. Produce a package, for example by:
    • Zipping up a file system
    • Exporting from a repository
    • Generating resources by some local process (e.g. a desktop GUI, laboratory instrument, etc.)
  2. Choose a container in the repository to deposit into (an LDPC, identified by its URI)
    • No specific discovery mechanism is defined; it is presumed that a client can inspect repository resources and pick one to deposit into, or is given a URI for this purpose.
  3. Submit the package to the container.
    • A new member resource will be created, and the contents of the package will be placed into it
  4. Follow the deposit results.
    • An event stream indicates processing as it happens, and indicates success or failure
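
Steps 2 and 3 of the workflow could look roughly like the following from a Java 11+ client. This is a sketch under several assumptions that the text above does not confirm: that a deposit is an HTTP POST of the package bytes to a deposit URI, that `application/zip` is an accepted content type, and that the URI shown is valid (it is a placeholder). The class name `DepositSketch` is hypothetical. The HTTP API specification in the repository is the authority on the real request format.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.function.Supplier;

/** Hypothetical client-side sketch; request shape is assumed, not specified here. */
public class DepositSketch {

    /**
     * Builds (but does not send) a deposit request. The package is supplied
     * as a stream so that large packages need not be held in memory.
     */
    public static HttpRequest depositRequest(URI container, Supplier<InputStream> pkg) {
        return HttpRequest.newBuilder(container)
                .header("Content-Type", "application/zip") // assumed media type
                .POST(HttpRequest.BodyPublishers.ofInputStream(pkg))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder URI: substitute the container chosen in step 2.
        URI container = URI.create("http://localhost:8080/fcrepo/rest/deposits");
        HttpRequest request = depositRequest(container,
                () -> new ByteArrayInputStream(new byte[0]));
        System.out.println(request.method() + " " + request.uri());
        // With a live endpoint, the request could be sent and the response read
        // line by line, which suits an event stream:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofLines());
    }
}
```

Reading the response as a stream of lines mirrors step 4, where the client follows events as they are produced and watches for a final success or failure indication.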