Exporter Script Server


Exposes configured exporter / validation scripts over HTTP.

Usage

In Script Container

Custom Use

To create a custom setup not based on the demo container, this server can be configured by performing the following steps.

🔥
Currently, this server app only supports Linux and macOS environments.
  1. Download the latest release from the releases page and unpack the archive in your project directory.

  2. Run the command ./service gen-config to generate a server configuration template file. This file will be named config.tpl.yml.

  3. Edit the configuration file with your desired service name and command configurations.

    ℹ️
    Check the config file readme for more information about the configuration file.
  4. Once you have edited the configuration file, rename it and place it in the desired location relative to the service binary.

  5. Validate the configuration by running the command ./service check-config --config=path/to/your/config.yml. This command will validate the configuration and report any errors.

    Bad Config Example
    $ ./service --config=config.yml check-config

Script Expectations

This server is language-agnostic when it comes to calling scripts. Anything may be used as long as it can be executed via a CLI call.

Output

The script expectations are based on the Galaxy tooling. This means scripts are expected to output a .tgz file suitable for being directly loaded into iRODS.

The pattern for the tar file output name must match the iRODS trigger’s expected format of dataset_u{user-id}_t{timestamp-int64}_p{process-id}.tgz. For example: dataset_u12345_t1647364816_p132.tgz.

The tar file must contain a meta.json, a dataset.json, and a directory named datafiles containing the actual dataset file(s).

This server will automatically pick up that file and return it to the udis service to be loaded into iRODS.

The reason for this is that the HTTP server, which unpacks the upload and starts the command, does not know what does or does not belong in the iRODS tar.
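
To make the expected layout concrete, here is a minimal Python sketch of the packaging step a handler script might perform. How the user ID, timestamp, and input file names reach the script is up to your configuration (for example via the <<ds-user-id>>, <<timestamp>>, and <<input-files>> variables described under Variables below); the argument positions and the JSON contents shown here are illustrative assumptions, not requirements of the server.

Example packaging step (sketch)
#!/usr/bin/env python3
# Minimal sketch of a handler script packaging its results into the tar
# layout described above.  Argument positions and JSON contents are
# illustrative only.
import json
import os
import sys
import tarfile

user_id   = sys.argv[1]   # e.g. expanded from <<ds-user-id>>
timestamp = sys.argv[2]   # e.g. expanded from <<timestamp>>
inputs    = sys.argv[3:]  # e.g. expanded from <<input-files>>

# Build the required contents: meta.json, dataset.json, and datafiles/.
os.makedirs("datafiles", exist_ok=True)
for path in inputs:
    os.replace(path, os.path.join("datafiles", os.path.basename(path)))

with open("meta.json", "w") as fh:
    json.dump({"name": "example dataset"}, fh)           # placeholder content
with open("dataset.json", "w") as fh:
    json.dump({"type": "example", "files": inputs}, fh)  # placeholder content

# The name must match dataset_u{user-id}_t{timestamp-int64}_p{process-id}.tgz.
out_name = f"dataset_u{user_id}_t{timestamp}_p{os.getpid()}.tgz"
with tarfile.open(out_name, "w:gz") as tar:
    tar.add("meta.json")
    tar.add("dataset.json")
    tar.add("datafiles")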

Error handling

A script can indicate a fatal error by exiting with a non-zero status code. An error message will be generated using the text printed to stderr by the script.

Errors may be plaintext, a JSON object, or plaintext followed by a JSON object.

To indicate that an error is a user error and not a script failure, the script must return a JSON object error.

Valid stderr output examples
# A plaintext error
Some error message that will be recorded as a script/server error.

# Error output followed by a final fatal error.
Some error text followed by a {"error": "json", "message": "object"}

# A structured error
{"error": "user", "message": "some user error"}

When returning both plaintext and a JSON object, the plaintext will be prepended to the message field of the recorded error.

For example:

Raw stderr output
Some error messages
Logged while executing the script
{"error": "fatal", "message": "failed to open input file"}
Recorded error
{
  "error": "fatal",
  "message": "Some error messages\nLogged while executing the script\nfailed to open input file"
}

JSON Object Error Format

{
  "error": "user", (1)
  "message": "some error text" (2)
}
  1. The "error" field is used to define the type of error. A value of "user" indicates that the error was a user error and the user import service will record the error as such. Any other value will be recorded as a script error.

  2. The "message" field defines the error message to record.

🔥
Both fields are required.
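
As an illustration, a Python script might report a user error in this format as follows. This is a minimal sketch; the validation condition is purely illustrative.

Example error reporting (sketch)
#!/usr/bin/env python3
# Minimal sketch of reporting a structured user error.  Printing the JSON
# object to stderr and exiting non-zero causes the error to be recorded as
# a user error rather than a script failure.
import json
import sys

def fail_user(message: str) -> None:
    print(json.dumps({"error": "user", "message": message}), file=sys.stderr)
    sys.exit(1)

if len(sys.argv) < 2:
    fail_user("no input files were provided in the upload")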

Config File

Components

config.yml
service-name: my service (1)
command:
  executable: /some/path/to/your/executable (2)
  args: (3)
    - <<input-files>> (4)
  1. The display name of the running service

  2. Path to an executable script or binary to be called by the server to process uploaded datasets.

  3. An array of arguments that will be passed to the configured executable. Each array element is effectively a single, quote-wrapped CLI argument (see the sketch after this list).

    ⚠️
    Space-separated entries in the args array will be treated as a single argument.
  4. An injected variable (see the Variables section below).
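
As a concrete illustration of how the args entries reach the script after variable expansion, the following Python sketch simply prints the arguments it receives. The config fragment in the comment is hypothetical and only demonstrates the one-element-per-argument behavior.

Argument handling (sketch)
#!/usr/bin/env python3
# Sketch of how configured args arrive in a script.  Assuming a hypothetical
# config containing:
#
#   args:
#     - --verbose
#     - --user <<ds-user-id>>
#
# each args element becomes exactly one argument, so the second element is
# received as the single string "--user 123456" rather than as two arguments.
import sys

for i, arg in enumerate(sys.argv[1:], start=1):
    print(f"argv[{i}] = {arg!r}")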

Command Context

The configured command will be executed in an isolated subshell but will be provided the same environment as the server itself. This means environment variables can be set for the script simply by setting them on the Docker container.

The output of the script’s stdout will be piped through the server’s logging mechanism and will appear in the container logs.

The output of the script’s stderr will be piped through the server’s logging mechanism (like stdout) and will also be captured for parsing and returning to the caller.
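
The following Python sketch relies on this execution context. The MY_SCRIPT_OPTION environment variable name is an assumption for illustration only.

Execution context (sketch)
#!/usr/bin/env python3
# Environment variables set on the container are visible to the script;
# stdout goes to the container logs, while stderr is both logged and
# captured for error reporting.
import os
import sys

option = os.environ.get("MY_SCRIPT_OPTION", "default")
print(f"running with MY_SCRIPT_OPTION={option}")                 # logged only
print("anything on stderr is logged and captured", file=sys.stderr)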

Variables

<<cwd>>

The working directory for the job that is running in the current request.

Example
/workspace/12345

<<date>>

The current date formatted as YYYY-MM-DD.

Example
1994-02-13

<<date-time>>

The current datetime in RFC3339 format.

Example
2018-10-31T23:37:18.013557+05:00

<<ds-description>>

The user provided description for the current dataset upload.

Example
My dataset upload containing foo and bar

<<ds-name>>

The user provided name for the current dataset upload.

Example
My Dataset 3

<<ds-summary>>

The user provided summary of the current dataset upload.

Example
Some summary text for my dataset upload

<<ds-origin>>

The source/origin of the user dataset; this will be either galaxy or direct-upload.

Example
direct-upload

<<ds-user-id>>

WDK user ID of the user that uploaded the dataset.

Example
123456

<<input-files>>

A space-separated list of the files that were unpacked from the uploaded zip or tar file, sorted by name ascending.

Examples
Upload Contents
dataset.tgz
 ├─ foo.txt
 ├─ bar.xml
 ├─ fizz.json
 └─ buzz/
     └─ fazz.yml
<<input-files>>
bar.xml buzz fizz.json foo.txt
<<input-files[n]>>

Array-style access into the input file list, allowing retrieval of a single file name by index.

Examples

These examples use the upload contents shown above for <<input-files>>.

<<input-files[0]>>
bar.xml
<<input-files[2]>>
fizz.json

<<time>>

The current time formatted as HH:MM:SS.

Example
03:47:58

<<timestamp>>

The current unix timestamp in seconds.

Example
783647299

<<projects>>

A comma-separated list of project identifiers.

Example
VectorBase,PlasmoDB

<<handler-params.p1>>

Provides access to handler-specific parameter values from the request body of the import request.

Examples
Request Body
{
    "datasetName": "foo",
    "projects": [
        "PlasmoDB"
    ],
    "datasetType": "gene-list",
    "origin": "direct-upload",
    "formatParams": {
        "genome": "test-genome-1",
        "otherParam": "other-param-value"
    }
}
<<handler-params.genome>>
test-genome-1
<<handler-params.otherParam>>
other-param-value