Pools may copy files to nearline storage such as tape (a process also refered to as "flushing") and recall those files at a later point (aka "staging").
To interface with such nearline storage systems, a driver is needed. The default driver shipped with dCache merely calls out to a configurable script. Although such scripts are easy to develop, the requirement to call out to an external script for every file being flushed to nearline storage or staged from nearline storage significantly limits the scalability of this approach. In particular for tape systems it is crucial to batch together recalls from tapes to minimize tape mounts, however a script invocation per file limits how many files can be recalled at a time.
dCache has a third party accessible plugin system. One of the pluggable components is the nearline storage interface.
A nearline storage drive must support three types of requests: flush to the nearline storage, stage from the nearline storage, and removal of a file from a nearline storage. dCache calls out to the driver providing a corresponding request object, either org.dcache.pool.nearline.spi.FlushRequest, org.dcache.pool.nearline.spi.StageRequest, or org.dcache.pool.nearline.spi.RemoveRequest. The request object describes the file that is to be flushed, staged or removed, and acts as a callback through which the driver reports progress, completion or failure of the request.
Each request object has a unique ID and dCache may cancel a nearline request at any time using this ID.
In dCache, a physical copy of a logical file is called a replica. Thus pools contain replicas. It is the responsibility of the nearline storage driver to copy replicas from the pool to the nearline storage and to copy replicas back from the nearline storage into the pool.
Traditionally, dCache pools have stored replicas in the local file system, and a
flush or stage request would refer to the replica in the local file system by
path. Starting with dCache 3.0, replica store plugins may provide alternative
backends such as CEPH and thus replicas in a pool may not be accessible through
the local file system. For this reason, dCache 3.0 and newer identify the
replica in the pool by URI. Unless an alternative replica store is used, such a
URI will always use the file:
scheme. If compatibility with versions earlier
than 2.17 is desired, a driver should refrain from using the getReplicaUri
method.
Once flushed to a nearline storage, the replica is identified by a URI generated
by the driver. This URI should use the nearline storage type (typically
enstore
or osm
) as the scheme and the nearline storage instance name as the
authority. The rest of the URI can be used to encode other information needed by
the driver to identify the file. dCache stores the generated URI in its name
space. Upon recalling a replica from nearline storage or removing it from
nearline storage, the URI is used to identify it.
From the point of view of dCache, the request goes through three steps: QUEUED, ACTIVE and then either COMPLETED or FAILED. It is the responsibility of the driver to queue and schedule requests as appropriate. This is because only the driver knows e.g. the tape layout and can schedule requests to minimize tape mounts. The driver must however issue the appropriate callbacks to dCache to signal the transition from QUEUED to ACTIVE and from ACTIVE to either COMPLETED or FAILED.
A nearline request has a completion deadline. Once the deadline is reached, the pool will likely cancel the request. This is not a guarantee that it will be cancelled, nor is this a guarantee that it will not be cancelled earlier. A driver may use the deadline in it scheduling decision.
Upon activating a request, the driver must be prepared that dCache aborts the
operation. E.g. upon flushing a file, dCache will verify that the file still
exists in the name space and if not will abort the flush. Similarly, a driver
must be prepared that the activation is not instantaneous. E.g. upon staging a
file from tape, the pool does not reserves space for the file until the stage is
activated and thus activation may block until space becomes available (files are
garbage collected or previously staged files are migrated away from the pool).
For this reason the activation callback is asynchronous, providing a
ListenableFuture
as a result. The driver can obtain the result of the
activation from this future once it is available.
A third party plugin must implement two interfaces: An implementation of org.dcache.pool.nearline.spi.NearlineStorageProvider acts as a loader or factory for the nearline storage, while org.dcache.pool.nearline.spi.NearlineStorage is that actual driver instance.
dCache uses the Java service loader facility to discover plugins. To make the
plugin discoverable by dCache, the compiled classes and a file called
META-INF/services/org.dcache.pool.nearline.spi.NearlineStorageProvider
have to
be packaged as a jar file. The
org.dcache.pool.nearline.spi.NearlineStorageProvider
file must contain the
fully qualified class name of the NearlineStorageProvider
implementation.
Place this jar in a sub-directory underneath /usr/local/share/dcache/plugins/
.
After restarting the dCache pool, the driver should be visible in the admin
shell.
To simplify setting up the basic structure of a nearline storage plugin, we provide a Maven archetype. See below for further details.
Although it appears simple, the NearlineStorageProvider
interface can be
difficult to implement as it is an asynchronous interface. Only an asynchronous
interface allows the nearline storage SPI to scale to tens or hundreds of
thousands simulatenous requests. Compared to the classic callout to a script,
even a blocking implementation will provide significant advantages.
To simplify development, dCache provides the AbstractBlockingNearlineStorage base class. Subclasses implement the abstract flush, stage and remove methods in a blocking fashion and the base class deal with request activation, cancellation and completion or failure callbacks. Subclasses provide three thread pools to use, allowing full control over concurrency and request queueing.
In fact, the script
driver shipped with dCache is a subclass of
AbstractBlockingNearlineStorage
.
The easiest way to develop your own nearline storage plugin is to use a Maven
Archetype, a template for new Maven projects. dCache provides an archetype, called
dcache-nearline-plugin-archetype
, that generates all essential directories and
files to create a ready to use plugin. To use the archetype for your own plugin,
you have to execute the following steps:
First, you need to get the source code of dCache and change it to version 6.0.11. This can easily be done with
$ git clone https://github.com/dCache/dcache.git
$ cd dcache
$ git checkout 6.0.11
Furthermore, you have to check if Maven uses Java 8. To check the version number, run
$ mvn -v
This will give you an output with the java-version among other information. You are now ready to install the archetype. Therefore run the following command in your dCache directory:
$ mvn clean install -DskipTests -am -pl archetypes/dcache-nearline-plugin-archetype
Now you're ready to generate the archetype. Head into the directory where you want to develop the plugin and run
$ mvn archetype:generate -DarchetypeCatalog=local
Maven will prompt you to select an archetype from a list. Select
org.dcache:dcache-nearline-plugin-archetype
. Maven will then prompt you to
enter a group ID (typically starts with a domain name under your control in
reverse order, e.g. org.example
), an artifact ID (which identifies your Maven
project), a project version number, a Java package name (often starts with the
group ID), a descriptive string, and a driver name that identifies it in the
admin interface.
The dCache version defaults to the archetype version. It can be altered by refusing to accept the entered values. This will cause Maven to prompt for the values a second time and this time is also prompts for the dCache version. This should rarely be necessary for a nearline storage plugin, as the SPI does not change often. There is a good chance that your plugin will work with many different versions of dCache (we provide no guarantees though!).
Example:
$ mvn archetype:generate -DarchetypeCatalog=local
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:3.1.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:3.1.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO]
[INFO] --- maven-archetype-plugin:3.1.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: local -> org.dcache:dcache-nearline-plugin-archetype (dCache Nearline Storage Plugin Archetype)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Define value for property 'groupId': org.example
Define value for property 'artifactId': test-nearlinestorage
Define value for property 'version' 1.0-SNAPSHOT: :
Define value for property 'package' org.example: : org.example.dcache
[INFO] Using property: dcache = 6.0.11
Define value for property 'description': Test NearlineStorage to demonstrate the archetype
Define value for property 'name' org.example.dcache.test-nearlinestorage: : org.example.test
Confirm properties configuration:
groupId: org.example
artifactId: test-nearlinestorage
version: 1.0-SNAPSHOT
package: org.example.dcache
dcache: 6.0.11
description: Test NearlineStorage to demonstrate the archetype
name: org.example.test
Y: : y
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: dcache-nearline-plugin-archetype:6.0.11
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: org.example
[INFO] Parameter: artifactId, Value: test-nearlinestorage
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: org.example.dcache
[INFO] Parameter: packageInPathFormat, Value: org/example/dcache
[INFO] Parameter: package, Value: org.example.dcache
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: name, Value: org.example.test
[INFO] Parameter: dcache, Value: 6.0.11
[INFO] Parameter: groupId, Value: org.example
[INFO] Parameter: description, Value: Test NearlineStorage to demonstrate the archetype
[INFO] Parameter: artifactId, Value: test-nearlinestorage
[INFO] Project created from Archetype in dir: /private/tmp/test-nearlinestorage
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:56 min
[INFO] Finished at: 2020-03-18T12:26:17+01:00
[INFO] ------------------------------------------------------------------------
The result is a ready to compile Maven project:
$ cd test-nearlinestorage/
$ tree
.
├── LICENSE.txt
├── README.md
├── pom.xml
└── src
└── main
├── assembly
│ └── plugin.xml
├── defaults
│ └── org.example.test.properties
├── docs
│ └── README.md
├── java
│ └── org
│ └── example
│ └── dcache
│ ├── PluginNearlineStorage.java
│ └── PluginNearlineStorageProvider.java
└── resources
└── META-INF
└── services
└── org.dcache.pool.nearline.spi.NearlineStorageProvider
12 directories, 9 files
To build your plugin and create a jar file, you can use Maven, too:
$ mvn package
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.example:test-nearlinestorage >------------------
[INFO] Building Test NearlineStorage to demonstrate the archetype 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ test-nearlinestorage ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ test-nearlinestorage ---
[INFO] Changes detected - recompiling the module!
[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!
[INFO] Compiling 2 source files to /private/tmp/test-nearlinestorage/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ test-nearlinestorage ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /private/tmp/test-nearlinestorage/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ test-nearlinestorage ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ test-nearlinestorage ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ test-nearlinestorage ---
[INFO] Building jar: /private/tmp/test-nearlinestorage/target/test-nearlinestorage-1.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-assembly-plugin:2.6:single (make-assembly) @ test-nearlinestorage ---
[INFO] Reading assembly descriptor: src/main/assembly/plugin.xml
[INFO] Copying files to /private/tmp/test-nearlinestorage/target/test-nearlinestorage-1.0-SNAPSHOT
[WARNING] Assembly file: /private/tmp/test-nearlinestorage/target/test-nearlinestorage-1.0-SNAPSHOT is not a regular file (it may be a directory). It cannot be attached to the project build for installation or deployment.
[INFO] Building tar: /private/tmp/test-nearlinestorage/target/test-nearlinestorage-1.0-SNAPSHOT.tar.gz
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.508 s
[INFO] Finished at: 2020-03-18T12:28:41+01:00
[INFO] ------------------------------------------------------------------------
The resulting jar-file in target/
can be shipped to customers and moved to
/usr/local/share/dcache/plugins
. Obviously you need to modify the generated
files to do something useful first.
The simplest functional nearline storage driver is probably the copy
driver
that ships with dCache. It sequentially copies the replicas to and from another
directory in the local file system. The main class is
CopyNearlineStorage,
although the bulk of the code is in the base class
FileSystemNearlineStorage,
itself a subclass of AbstractBlockingNearlineStorage
.
For a more complicated example, check out the script
driver. It demonstrates
configurable concurrency. The main class is
ScriptNearlineStorage.
For a full blown asynchronous driver, check out the [Endit Provider] (https://github.com/neicnordic/dcache-endit-provider/) developed at NDGF to interface with the Endit tape integration system.