The examples in this project show how to implement a client in Java that interacts with the EASY SWORD2 Deposit service at DANS.
Depositing in EASY via the SWORD v2.0 protocol is basically a two-phase process:
- Submitting a deposit for ingest.
- Tracking the state of the deposit as it goes through the ingest-flow, until it reaches ARCHIVED status.
The following diagram details this a bit further.
1. Client creates a deposit package.
2. Client sends deposit package to SWORD2 Service, getting back a URL to track the deposit's state.
3. SWORD2 Service unzips and validates deposit.
4. EASY Ingest Flow performs checks and transformations and creates a dataset in Archival Storage.
5. EASY Ingest Flow reports back success or failure to SWORD2 Service.

During steps 3-5 the Client periodically checks the deposit state through the URL received in step 2.
If the final state of ARCHIVED is reached, the process is concluded successfully. Other outcomes may be INVALID (the package did not meet the requirements of the SWORD service) or REJECTED (the package did not meet the requirements of the EASY Ingest Flow). In case the server encountered an unknown error, FAILED will be returned.
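A client usually needs to tell these outcomes apart in code. The following is a minimal sketch of such a classification, using only the state names mentioned above; the class name `DepositOutcome` and its `Kind` enum are hypothetical and not part of this project.

```java
import java.util.Locale;

/** Sketch: classify the deposit states named in the text above. */
final class DepositOutcome {
    enum Kind { SUCCESS, CLIENT_ERROR, SERVER_ERROR, IN_PROGRESS }

    /** Maps a state label reported by the service to a coarse outcome. */
    static Kind classify(String state) {
        switch (state.trim().toUpperCase(Locale.ROOT)) {
            case "ARCHIVED":
                return Kind.SUCCESS;
            case "INVALID":   // package rejected by the SWORD service
            case "REJECTED":  // package rejected by the EASY Ingest Flow
                return Kind.CLIENT_ERROR;
            case "FAILED":    // unknown error on the server side
                return Kind.SERVER_ERROR;
            default:
                // Assumption: any other label means the deposit is still being processed.
                return Kind.IN_PROGRESS;
        }
    }

    /** True when no further state changes are expected. */
    static boolean isFinal(String state) {
        return classify(state) != Kind.IN_PROGRESS;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"SUBMITTED", "ARCHIVED", "INVALID", "REJECTED", "FAILED"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Treating every unknown label as "still in progress" keeps the client polling instead of stopping early; a stricter client could fail on unknown labels instead.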
The following is a step-by-step instruction on how to run a simple example using the DANS acceptance test server at https://demo.easy.dans.knaw.nl/.
Getting access to the acceptance server
- From your account manager at DANS request access to the acceptance test server. The account manager will provide the information necessary to connect.
- Create an EASY account via https://demo.easy.dans.knaw.nl/ui/register.
- From your account manager at DANS request the account to be enabled for SWORD deposits.
- From your account manager at DANS inquire which flow (see next section) the account is configured for.
- You will start receiving reports via e-mail concerning the deposits you are sending.
Depending on the type of agreement that the depositor organization has with DANS, your deposits will be processed by different flows. The flow configured for your account will be one of the following:
- Agreement: The datasets will be disseminated by DANS. DANS will mint DOIs for the datasets.
- NoAccess: The files are not to be disseminated by DANS. The depositor organization must mint DOIs for the datasets.
- NoDoi: The files are not to be disseminated by DANS. The depositor organization must not mint DOIs for the datasets; DANS will not mint DOIs for the datasets either.
Depositing your first dataset
Running the SimpleDeposit example
If your account is configured for NoAccess, the following extra step is required (for Agreement you can skip this, for NoDoi you can use the
- Copy the directory src/main/resources/noaccess-flow/valid/audiences to a temporary directory, say
- Change the DOI in audiences/metadata/dataset.xml to another value (it must be unique).
- Calculate the MD5 checksum for
- Change the line for
audiences/tagmanifest-md5.txt overwriting the existing MD5 with the new one.
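The checksum and manifest steps above can also be done from Java. The following is a minimal sketch using only the JDK: it computes an MD5 hex digest and rewrites the matching line in a list of manifest lines. The class and method names are hypothetical, and the assumed manifest layout (checksum, whitespace, relative path) follows the BagIt convention.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Md5Sketch {
    /** Returns the lower-case hex MD5 digest of everything readable from the stream. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Convenience overload for a file on disk. */
    static String md5Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Replaces the checksum on the manifest line that refers to relativePath. */
    static List<String> replaceChecksum(List<String> manifestLines, String relativePath, String newMd5) {
        List<String> result = new ArrayList<>();
        for (String line : manifestLines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2 && parts[1].equals(relativePath)) {
                result.add(newMd5 + "  " + relativePath);
            } else {
                result.add(line); // other entries are left untouched
            }
        }
        return result;
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(md5Hex(Paths.get(args[0])));
        } else {
            String md5 = md5Hex(new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8)));
            // Made-up manifest lines, for illustration only.
            List<String> lines = Arrays.asList("0000  some/file.xml", "1111  bagit.txt");
            System.out.println(replaceChecksum(lines, "some/file.xml", md5));
        }
    }
}
```

Run with a file path as the only argument to print that file's MD5; write the result of `replaceChecksum` back to the manifest with `Files.write` once it looks right.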
Execute the following command from the base directory of your clone of this project:
./run.sh Simple https://demo.easy.dans.knaw.nl/sword2/collection/1 <user> <password> <bag>
<user> your EASY account name;
<password> the password of your EASY account;
src/main/resources/agreement-flow/valid/audiences if your account is configured for
tmp/audiences if your account is configured for
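For readers writing their own client, the sketch below shows how the deposit request behind such a command could be assembled with the JDK 11+ `java.net.http` API. It is not the code used by this project. The header names (`Content-Disposition`, `Content-MD5`, `Packaging`, `In-Progress`) come from the SWORD v2 profile; HTTP Basic authentication and the class name `DepositRequestSketch` are assumptions. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DepositRequestSketch {
    /** Builds (but does not send) a POST of a zipped bag to a SWORD v2 collection IRI. */
    static HttpRequest build(URI collectionIri, String user, String password,
                             byte[] zippedBag, String md5Hex, boolean inProgress) {
        // Assumption: the service accepts HTTP Basic authentication.
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(collectionIri)
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/zip")
                .header("Content-Disposition", "attachment; filename=bag.zip")
                .header("Content-MD5", md5Hex)
                .header("Packaging", "http://purl.org/net/sword/package/BagIt")
                .header("In-Progress", Boolean.toString(inProgress))
                .POST(HttpRequest.BodyPublishers.ofByteArray(zippedBag))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own collection IRI, credentials and bag.
        HttpRequest request = build(
                URI.create("https://demo.easy.dans.knaw.nl/sword2/collection/1"),
                "myuser", "mypassword", new byte[] {0x50, 0x4b}, "0123456789abcdef", false);
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

Sending the request would be a matter of passing it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and reading the deposit receipt from the response body.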
In the introduction the SWORD2 ingest process is described in 5 steps; the response messages give some indication of how far along the process is. The output will take the following form, starting with the part of the response representing step 2. The UUID will of course be different.
SUCCESS. Deposit receipt follows:
<entry xmlns="http://www.w3.org/2005/Atom">
    <generator uri="http://www.swordapp.org/" version="2.0" />
    <id>https://demo.easy.dans.knaw.nl/sword2/container/a5bb644a-78a3-47ae-907a-0bdf162a0cd4</id>
    <link href="https://demo.easy.dans.knaw.nl/sword2/container/a5bb644a-78a3-47ae-907a-0bdf162a0cd4" rel="edit" />
    <link href="https://demo.easy.dans.knaw.nl/sword2/container/a5bb644a-78a3-47ae-907a-0bdf162a0cd4" rel="http://purl.org/net/sword/terms/add" />
    <link href="https://demo.easy.dans.knaw.nl/sword2/media/a5bb644a-78a3-47ae-907a-0bdf162a0cd4" rel="edit-media" />
    <packaging xmlns="http://purl.org/net/sword/terms/">http://purl.org/net/sword/package/BagIt</packaging>
    <link href="https://demo.easy.dans.knaw.nl/sword2/statement/a5bb644a-78a3-47ae-907a-0bdf162a0cd4" rel="http://purl.org/net/sword/terms/statement" type="application/atom+xml; type=feed" />
    <treatment xmlns="http://purl.org/net/sword/terms/"> unpacking  verifying integrity  storing persistently</treatment>
    <verboseDescription xmlns="http://purl.org/net/sword/terms/">received successfully: bag.zip; MD5: 494dd614e36edf5c929403ed7625b157</verboseDescription>
</entry>
Retrieving Statement IRI (Stat-IRI) from deposit receipt ...
Stat-IRI = https://demo.easy.dans.knaw.nl/sword2/statement/a5bb644a-78a3-47ae-907a-0bdf162a0cd4
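In the receipt above, the Stat-IRI is the `href` of the Atom `link` whose `rel` is `http://purl.org/net/sword/terms/statement`. The following is a minimal sketch of extracting it with the JDK's DOM parser; the class and method names are illustrative and not necessarily how this project's code does it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class StatIriSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String STATEMENT_REL = "http://purl.org/net/sword/terms/statement";

    /** Returns the href of the statement link in a deposit receipt, if present. */
    static Optional<String> statIri(String receiptXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the XML comes from the network.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(receiptXml.getBytes(StandardCharsets.UTF_8)));
            NodeList links = doc.getElementsByTagNameNS(ATOM_NS, "link");
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if (STATEMENT_REL.equals(link.getAttribute("rel"))) {
                    return Optional.of(link.getAttribute("href"));
                }
            }
            return Optional.empty();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse deposit receipt", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up receipt for illustration.
        String receipt = "<entry xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<link href=\"https://example.org/sword2/container/1\" rel=\"edit\" />"
                + "<link href=\"https://example.org/sword2/statement/1\" rel=\"" + STATEMENT_REL + "\" />"
                + "</entry>";
        System.out.println("Stat-IRI = " + statIri(receipt).orElse("(not found)"));
    }
}
```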
As the deposit is being processed by the server the client polls the Stat-IRI to track the status of the deposit. During this stage steps 3 and 4 are performed.
Start polling Stat-IRI for the current status of the deposit, waiting 10 seconds before every request ...
Checking deposit status ... SUBMITTED
Checking deposit status ... SUBMITTED
Checking deposit status ... SUBMITTED
Checking deposit status ... SUBMITTED
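The polling behaviour shown in this output can be sketched as a small loop. In the sketch below the actual HTTP request is abstracted behind a `Supplier<String>` so the loop can be tried without a server; the class name, the attempt limit and the set of final states are assumptions based on the states described in this document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Supplier;

final class PollSketch {
    // Assumption: these four states are final, every other state means "keep polling".
    static final Set<String> FINAL_STATES =
            new HashSet<>(Arrays.asList("ARCHIVED", "INVALID", "REJECTED", "FAILED"));

    /**
     * Waits waitMillis before every request, as in the output above, and
     * returns the first final state reported by fetchState.
     */
    static String pollUntilFinal(Supplier<String> fetchState, long waitMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            sleep(waitMillis);
            String state = fetchState.get();
            System.out.println("Checking deposit status ... " + state);
            if (FINAL_STATES.contains(state)) {
                return state;
            }
        }
        throw new IllegalStateException("No final state after " + maxAttempts + " attempts");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while polling", e);
        }
    }

    public static void main(String[] args) {
        // Canned states stand in for real requests to the Stat-IRI.
        Iterator<String> canned = Arrays.asList("SUBMITTED", "SUBMITTED", "ARCHIVED").iterator();
        System.out.println("Final state: " + pollUntilFinal(canned::next, 10, 10));
    }
}
```

In a real client the supplier would fetch the statement from the Stat-IRI and extract the state from it; the attempt limit prevents the client from polling forever if a deposit gets stuck.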
The 5th and final step of the process is represented by the following response messages.
Checking deposit status ... ARCHIVED
SUCCESS. Deposit has been archived at: <urn:uuid:a5bb644a-78a3-47ae-907a-0bdf162a0cd4>. With DOI: [10.17026/test-Lwgy-zrn-jfyy]. Dataset landing page will be located at: <https://demo.easy.dans.knaw.nl/ui/datasets/id/easy-dataset:24>.
Complete statement follows:
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://demo.easy.dans.knaw.nl/sword2/statement/a5bb644a-78a3-47ae-907a-0bdf162a0cd4</id>
    <link href="https://demo.easy.dans.knaw.nl/sword2/statement/a5bb644a-78a3-47ae-907a-0bdf162a0cd4" rel="self" />
    <title type="text">Deposit a5bb644a-78a3-47ae-907a-0bdf162a0cd4</title>
    <author>
        <name>DANS-EASY</name>
    </author>
    <updated>2019-05-23T14:51:15.356Z</updated>
    <category term="ARCHIVED" scheme="http://purl.org/net/sword/terms/state" label="State">http://demo.easy.dans.knaw.nl/ui/datasets/id/easy-dataset:24</category>
    <entry>
        <content type="multipart/related" src="urn:uuid:a5bb644a-78a3-47ae-907a-0bdf162a0cd4" />
        <id>urn:uuid:a5bb644a-78a3-47ae-907a-0bdf162a0cd4</id>
        <title type="text">Resource urn:uuid:a5bb644a-78a3-47ae-907a-0bdf162a0cd4</title>
        <summary type="text">Resource Part</summary>
        <updated>2019-05-23T14:51:22.342Z</updated>
        <link href="https://doi.org/10.5072/dans-Lwgy-zrn-jfyy" rel="self" />
    </entry>
</feed>
The deposit will go through a number of statuses. The following statuses are possible after sending a SWORD deposit:
||The deposit is being prepared by the depositor. It is not submitted to the archive yet and still open for additional data.|
||The deposit is in the process of being submitted. It is waiting to be finalized. The data is completely uploaded. It will automatically move to the next stage and the status will be updated accordingly.|
||The deposit is in the process of being submitted. It is being checked for validity. It will automatically move to the next stage and the status will be updated accordingly.|
||The deposit is not accepted by the archive as the submitted bag is not valid. The description will detail what part of the bag is not according to specifications. The depositor is asked to fix the bag and resubmit the deposit.|
||The deposit is valid and being processed by the Ingest Flow. It will automatically move to the next stage and the status will be updated accordingly.|
||The deposit does not meet the requirements of the Ingest Flow for its type. The description will detail what part of the deposit is not according to specifications. The depositor is asked to fix and resubmit the deposit.|
||The deposit failed to be archived because of an unexpected condition during the Ingest Flow. DANS monitors the FAILED reports and aims to fix these issues as readily as possible. A following report should typically list the FAILED deposits as ARCHIVED.|
||The deposit is successfully archived in the data vault.|
If an error occurs the deposit will end up INVALID, REJECTED (client error) or FAILED (server error).
The text of the category element will contain details about the state.
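As an illustration, the state and its details can be read from a statement feed by looking for the Atom `category` element whose `scheme` is `http://purl.org/net/sword/terms/state`: the `term` attribute holds the state and the element text holds the details. The following is a minimal sketch with the JDK's DOM parser; the class name and the sample feed are made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class StatementStateSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String STATE_SCHEME = "http://purl.org/net/sword/terms/state";

    /** Returns {state, details}, or null when the feed has no state category. */
    static String[] stateAndDetails(String statementXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the XML comes from the network.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(statementXml.getBytes(StandardCharsets.UTF_8)));
            NodeList categories = doc.getElementsByTagNameNS(ATOM_NS, "category");
            for (int i = 0; i < categories.getLength(); i++) {
                Element category = (Element) categories.item(i);
                if (STATE_SCHEME.equals(category.getAttribute("scheme"))) {
                    return new String[] {
                        category.getAttribute("term"),     // e.g. ARCHIVED
                        category.getTextContent().trim()   // details about the state
                    };
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse statement", e);
        }
    }

    public static void main(String[] args) {
        // Made-up statement with an invented details text.
        String feed = "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
                + "<category term=\"INVALID\" scheme=\"" + STATE_SCHEME + "\" label=\"State\">"
                + "example details</category></feed>";
        String[] result = stateAndDetails(feed);
        System.out.println(result[0] + ": " + result[1]);
    }
}
```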
Creating test data
The easy-sword2 service requires deposits to be sent as zipped bags (see BagIt). The EASY archive adds some extra requirements. These are documented in the DANS BagIt Profile. A command line tool called xmllint can be used to validate XML files locally.
Some examples of bags which meet the specifications of the SWORD depositing interface can be found in the resources directory. These bags are categorized by the flow which they are designed for. You can use these as starting points for your test data or start a new bag from scratch (see next section).
Creating your own examples
To upload a dataset it must be properly formatted. Some example bags can be found in the resources directory, as well as the specifications the bags must follow.
A dataset can be created by performing the following steps. For this you will need the bagit command line tool, which is only available on macOS and can be installed with the brew command. See this blog post for a list of other BagIt tools.
- Run mkdir my-bag; mkdir my-bag/data; mkdir my-bag/metadata; bagit baginplace my-bag to create the bag
- Place the data files in the
- Create the
my-bag/metadata/files.xml and add the appropriate metadata. See DANS BagIt Profile and the pre-made examples for guidance about what constitutes appropriate metadata.
- Update the my-bag/bag-info.txt to include the Created date:
- Update the checksums with bagit makecomplete my-bag my-bag --payloadmanifestalgorithm SHA1
- Verify that the bag is valid according to BagIt with bagit verifyvalid my-bag
Testing different scenarios
This project contains 4 Java example programs which can be used as a guide to writing a custom client to deposit datasets using the SWORD2 protocol.
The examples take one or more bags as input parameters. These bags may be directories or ZIP files.
The code copies each bag to the target folder of the project, zips it (if necessary) and sends it to the specified SWORDv2 service.
The copying step has been built in because in some examples the bag must be modified before it is sent.
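To give an idea of what the zipping step involves, here is a minimal sketch using `java.util.zip`. It is not the project's own implementation. It assumes that the bag directory itself should be the top-level folder inside the ZIP file and that the ZIP file is written outside the bag directory; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class ZipBagSketch {
    /** Zips all regular files under bagDir; entry names start with the bag directory name. */
    static void zipDirectory(Path bagDir, Path zipFile) {
        Path root = bagDir.toAbsolutePath().normalize();
        Path parent = root.getParent();
        if (parent == null) {
            throw new IllegalArgumentException("bagDir must have a parent directory");
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile));
             Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                // ZIP entry names always use forward slashes.
                String name = parent.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists the entry names of a ZIP file, handy for checking the result. */
    static List<String> entryNames(Path zipFile) {
        try (ZipFile zf = new ZipFile(zipFile.toFile())) {
            return zf.stream().map(ZipEntry::getName).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: ZipBagSketch <bag-directory> <zip-file>");
            return;
        }
        zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
        entryNames(Paths.get(args[1])).forEach(System.out::println);
    }
}
```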
- SimpleDeposit.java sends a zipped dataset in a single chunk and reports on the status.
- ContinuedDeposit.java sends a zipped bag in chunks of configurable size and reports on the status.
- SequenceSimpleDeposit.java calls the SimpleDeposit class multiple times to send multiple bags belonging to a sequence.
- SequenceContinuedDeposit.java calls the ContinuedDeposit class multiple times to send multiple bags belonging to a sequence.

The Common.java class contains elements which are used by all the other classes, including parsing, zipping and sending of files.
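To illustrate the idea behind a continued deposit, the sketch below splits a zipped bag into chunks of a configurable size and marks all but the last chunk as "in progress". That flag reflects the `In-Progress` header of the SWORD v2 profile; how ContinuedDeposit.java actually names and sends its chunks is not shown here, and the class name `ChunkSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkSketch {
    /** Splits data into consecutive chunks of at most chunkSize bytes. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            int to = Math.min(data.length, from + chunkSize);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] zippedBag = new byte[10]; // stand-in for the bytes of a real bag.zip
        List<byte[]> chunks = split(zippedBag, 4);
        for (int i = 0; i < chunks.size(); i++) {
            // Assumption: every chunk except the last is sent with In-Progress: true.
            boolean inProgress = i < chunks.size() - 1;
            System.out.println("chunk " + (i + 1) + ": " + chunks.get(i).length
                    + " bytes, In-Progress: " + inProgress);
        }
    }
}
```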
The project directory contains a run.sh script that can be used to invoke the Java programs. For example:
mvn clean install # Only necessary if the code was not previously built.
./run.sh Simple https://demo.easy.dans.knaw.nl/sword2/collection/1 myuser mypassword bag
./run.sh Continued https://demo.easy.dans.knaw.nl/sword2/collection/1 myuser mypassword chunksize bag
./run.sh SequenceSimple https://demo.easy.dans.knaw.nl/sword2/collection/1 myuser mypassword bag1 bag2 bag3
./run.sh SequenceContinued https://demo.easy.dans.knaw.nl/sword2/collection/1 myuser mypassword chunksize bag1 bag2 bag3
DANS sends out e-mails concerning the status of the deposits both in the deposit area and the DANS archives.
DOI report for prefix <prefix>
<prefix>-doi-report-<date>.csv: An overview of all the DOIs with this prefix in the DANS archives.
DANS-EASY Error report: status of failed EASY deposits. This e-mail contains two reports about failed deposits:
DANS-EASY-report-error-yesterday-<date>.csv: A deposit-report with all the FAILED / REJECTED / INVALID deposits of the last day. This
DANS-EASY-report-error-<date>.csv: A deposit-report with all the failed deposits that are in the deposit area. In case a REJECTED deposit has been resent, the old one is still mentioned here.
DANS-EASY Report: status of EASY deposits. An e-mail with reports about all deposits in the deposit area:
DANS-EASY-report-full-yesterday-<date>.csv: A deposit-report containing all the deposits made in the last day, both
DANS-EASY-report-summary-<date>.txt: A summary of the data that's being held in the deposit area, split into the different Statuses.
DANS-EASY-report-summary-yesterday-<date>.txt: A summary of the data that's being added to the deposit area in the last day, split into the different Statuses.
The deposit-reports are CSV files with the following columns:
|DEPOSITOR||the account name of the depositor|
|DEPOSIT_ID||the UUID under which the deposit is registered at DANS-EASY|
|BAG_NAME||the directory name of the bag|
|DEPOSIT_STATE||the state of the deposit, see the
|ORIGIN||the source of the deposit, either SWORD or an internal source|
|LOCATION||the current location of the deposit|
|DANS_DOI||the DOI that DANS-EASY assigns to the deposit, if any|
|ORGANIZATIONAL_ID||the organizational identifier given by the depositor in the bag-info.txt, if any|
|DOI_REGISTERED||whether the DANS_DOI has been registered at DataCite|
|FEDORA_ID||the identifier of the deposit in the web interface.|
|DATAMANAGER||the name of the datamanager assigned to the deposit, or
|DEPOSIT_UPDATE_TIMESTAMP||the timestamp of the last update on this deposit during the ingest into the DANS archive|
|DESCRIPTION||a description of the current state of the deposit. To be used together with DEPOSIT_STATE|
|NBR_OF_CONTINUED_DEPOSITS||the number of packages received for this deposit so far|
|STORAGE_IN_BYTES||the amount of data stored in the deposit area for this deposit|
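A report with these columns can be processed automatically, for example to list the deposits in a given state. The sketch below is deliberately naive: it assumes that the first line is a header with the column names above, that fields are separated by commas, and that no field contains a quoted comma. Real reports may not satisfy that last assumption (the DESCRIPTION column in particular), in which case a proper CSV library is the safer choice. The class name and sample data are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DepositReportSketch {
    /** Parses a simple comma-separated report into one map per row, keyed by column name. */
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c].trim(), c < cells.length ? cells[c].trim() : "");
            }
            rows.add(row);
        }
        return rows;
    }

    /** Returns the DEPOSIT_ID of every row whose DEPOSIT_STATE equals the given state. */
    static List<String> depositIdsInState(String csv, String state) {
        List<String> ids = new ArrayList<>();
        for (Map<String, String> row : parse(csv)) {
            if (state.equals(row.get("DEPOSIT_STATE"))) {
                ids.add(row.get("DEPOSIT_ID"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Invented sample rows using a subset of the documented columns.
        String csv = "DEPOSITOR,DEPOSIT_ID,BAG_NAME,DEPOSIT_STATE\n"
                + "myuser,id-1,audiences,ARCHIVED\n"
                + "myuser,id-2,audiences,FAILED\n";
        System.out.println("FAILED deposits: " + depositIdsInState(csv, "FAILED"));
    }
}
```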