Alibaba Cloud BCS backend and OSS storage nio interface (#3101)
crisish authored and geoffjentry committed Jan 9, 2018
1 parent 0fb8584 commit fdd70eb
Showing 67 changed files with 5,737 additions and 4 deletions.
20 changes: 20 additions & 0 deletions build.sbt
@@ -39,6 +39,11 @@ lazy val gcsFileSystem = (project in file("filesystems/gcs"))
.dependsOn(core % "test->test")
.dependsOn(cloudSupport % "test->test")

lazy val ossFileSystem = (project in file("filesystems/oss"))
.withLibrarySettings("cromwell-ossFileSystem", ossFileSystemDependencies)
.dependsOn(core)
.dependsOn(core % "test->test")

lazy val databaseSql = (project in file("database/sql"))
.withLibrarySettings("cromwell-database-sql", databaseSqlDependencies)

@@ -100,9 +105,20 @@ lazy val jesBackend = (project in backendRoot / "jes")
.dependsOn(gcsFileSystem % "test->test")
.dependsOn(services % "test->test")

lazy val bcsBackend = (project in backendRoot / "bcs")
.withLibrarySettings("cromwell-bcs-backend", bcsBackendDependencies)
.dependsOn(backend)
.dependsOn(ossFileSystem)
.dependsOn(gcsFileSystem)
.dependsOn(core % "test->test")
.dependsOn(backend % "test->test")
.dependsOn(ossFileSystem % "test->test")
.dependsOn(services % "test->test")

lazy val engine = project
.withLibrarySettings("cromwell-engine", engineDependencies, engineSettings)
.dependsOn(backend)
.dependsOn(ossFileSystem)
.dependsOn(gcsFileSystem)
.dependsOn(wdl)
.dependsOn(cwl)
@@ -112,6 +128,7 @@ lazy val engine = project
// For now, all the engine tests run on the "Local" backend, an implementation of an impl.sfs.config backend.
.dependsOn(sfsBackend % "test->compile")
.dependsOn(gcsFileSystem % "test->test")
.dependsOn(ossFileSystem % "test->test")

// Executables

@@ -132,6 +149,7 @@ lazy val root = (project in file("."))
// Next level of projects to include in the fat jar (their dependsOn will be transitively included)
.dependsOn(engine)
.dependsOn(jesBackend)
.dependsOn(bcsBackend)
.dependsOn(tesBackend)
.dependsOn(sparkBackend)
.dependsOn(engine % "test->test")
@@ -149,8 +167,10 @@ lazy val root = (project in file("."))
.aggregate(dockerHashing)
.aggregate(engine)
.aggregate(gcsFileSystem)
.aggregate(ossFileSystem)
.aggregate(jesBackend)
.aggregate(services)
.aggregate(bcsBackend)
.aggregate(sfsBackend)
.aggregate(sparkBackend)
.aggregate(tesBackend)
49 changes: 49 additions & 0 deletions cromwell.examples.conf
@@ -188,6 +188,14 @@ engine {
# gcs {
# auth = "application-default"
# }
# oss {
# auth {
# endpoint = ""
# access-id = ""
# access-key = ""
# security-token = ""
# }
# }
local {
#enabled: true
}
@@ -319,6 +327,47 @@ backend {
# }
#}

#BCS {
# actor-factory = "cromwell.backend.impl.bcs.BcsBackendLifecycleActorFactory"
# config {
# root = "oss://your-bucket/cromwell-exe"
# dockerRoot = "/cromwell-executions"
# region = ""

# #access-id = ""
# #access-key = ""
# #security-token = ""

# filesystems {
# oss {
# auth {
# #endpoint = ""
# #access-id = ""
# #access-key = ""
# #security-token = ""
# }
# }
# }

# default-runtime-attributes {
# #failOnStderr: false
# #continueOnReturnCode: 0
# #cluster: "cls-mycluster"
# #mounts: "oss://bcs-bucket/bcs-dir/ /home/inputs/ false"
# #docker: "ubuntu/latest oss://bcs-reg/ubuntu/"
# #userData: "key value"
# #reserveOnFail: true
# #autoReleaseJob: true
# #verbose: false
# #workerPath: "oss://bcs-bucket/workflow/worker.tar.gz"
# #systemDisk: "cloud 50"
# #dataDisk: "cloud 250 /home/data/"
# #timeout: 3000
# #vpc: "192.168.0.0/16 vpc-xxxx"
# }
# }
#}

#SGE {
# actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
# config {
252 changes: 252 additions & 0 deletions docs/backends/BCS.md
@@ -0,0 +1,252 @@
**Alibaba Cloud BCS Backend**

This backend adds support for executing workflow jobs on Alibaba Cloud's BatchCompute service.

### Configuring Backend

The backend is specified via the actor factory `BcsBackendLifecycleActorFactory`:

```hocon
backend {
  providers {
    BCS {
      actor-factory = "cromwell.backend.impl.bcs.BcsBackendLifecycleActorFactory"
      config {
        # ... other configuration
      }
    }
  }
}
```

You'll likely also want to make this the default backend by setting the top-level `backend.default` configuration value:

```hocon
backend {
  default = BCS
}
```

Before reading further in this section, please see [Getting started on Alibaba Cloud](../tutorials/BCSIntro.md) for instructions on configuring access to Alibaba Cloud services.

The configuration for the BCS backend looks like the following:

```hocon
backend {
  providers {
    BCS {
      actor-factory = "cromwell.backend.impl.bcs.BcsBackendLifecycleActorFactory"
      config {
        root = "oss://<test-bucket>/cromwell-dir"
        region = "<test-region>"
        access-id = "<test-access-id>"
        access-key = "<test-access-key>"
        filesystems {
          # ... to be filled in
        }
        default-runtime-attributes {
          # ... to be filled in
        }
      }
    }
  }
}
```

- `<test-bucket>` : OSS bucket name.
- `<test-region>` : Alibaba Cloud region in which Cromwell runs jobs; it must be the same as the region of `<test-bucket>`.
- `<test-access-id>` : Access ID used to call Alibaba Cloud services through the RESTful API.
- `<test-access-key>` : Access key used to call Alibaba Cloud services through the RESTful API.

The values above are required for Cromwell to submit workflow jobs to the Alibaba Cloud BatchCompute service and to poll their status.
The `filesystems` stanza in the backend config defines how to configure the Alibaba Cloud OSS filesystem; the details are explained in the next section.

### File Systems

Currently, this backend only works with objects on an Alibaba Cloud OSS filesystem. All values under the configuration key
`backend.providers.BCS.config.filesystems.oss.auth` must be supplied for BCS backend jobs to read and write OSS objects. A typical config looks like this:

- `<test-oss-endpoint>` - API endpoint used to access the OSS bucket `<test-bucket>`.
- `<test-access-id>` - Access ID used to call Alibaba Cloud services through the RESTful API.
- `<test-access-key>` - Access key used to call Alibaba Cloud services through the RESTful API.

```hocon
backend {
  providers {
    BCS {
      config {
        # BCS related configurations mentioned above
        filesystems {
          oss {
            auth {
              endpoint = "<test-oss-endpoint>"
              access-id = "<test-access-id>"
              access-key = "<test-access-key>"
            }
          }
        }
        default-runtime-attributes {
          # ... to be filled in
        }
      }
    }
  }
}
```
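Beyond the backend-level `filesystems` block, the `cromwell.examples.conf` in this commit also adds an `engine.filesystems.oss` stanza, so that the Cromwell engine itself can resolve OSS paths outside of any backend. A minimal sketch mirroring that example, with placeholder values:

```hocon
engine {
  filesystems {
    oss {
      auth {
        # placeholder credentials, mirroring backend.providers.BCS.config.filesystems.oss.auth
        endpoint = "<test-oss-endpoint>"
        access-id = "<test-access-id>"
        access-key = "<test-access-key>"
        security-token = ""   # left empty here, as in the example config
      }
    }
  }
}
```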

### Runtime Attributes

This backend supports additional runtime attributes whose defaults are specified under the configuration key `backend.providers.BCS.config.default-runtime-attributes`.
It uses the same syntax as specifying runtime attributes in a WDL task. A typical set of default runtime attributes for the BCS backend looks like this:

```hocon
backend {
  providers {
    BCS {
      config {
        # BCS and OSS related configurations mentioned above
        default-runtime-attributes {
          cluster: "OnDemand ecs.sn1ne.large"
          mounts: "oss://<test-bucket>/inputs/ /home/inputs/ false"
          docker: "ubuntu/latest oss://<test-bucket>/registry/ubuntu/"
          userData: "key value"
          reserveOnFail: true
          autoReleaseJob: true
          verbose: false
          workerPath: "oss://<test-bucket>/cromwell_test/worker.tar.gz"
          systemDisk: "cloud 50"
          dataDisk: "cloud 250 /home/data/"
          timeout: 3000
        }
      }
    }
  }
}
```

#### cluster

There are two different ways of specifying an Alibaba Cloud BatchCompute cluster in which workflow jobs run.

- Reserved cluster - The ID of a pre-created cluster in the BatchCompute service, like this:

```hocon
default-runtime-attributes {
  cluster: "cls-your-cluster-id"
}
```

- Auto cluster - A cluster configuration used to create a new cluster at runtime, bound to the workflow job:

- `<resource-type>` - Type of resource; only `OnDemand` and `Spot` are currently supported.
- `<instance-type>` - Type of VM instance. See <a href="https://help.aliyun.com/document_detail/25378.html" target="_blank">Alibaba Cloud BatchCompute Instance Type</a> to choose a suitable type.
- `<image-id>` - ID of the Alibaba Cloud BatchCompute image used to create the VM.

```hocon
default-runtime-attributes {
  cluster: "<resource-type> <instance-type> <image-id>"
  # e.g. cluster: "OnDemand ecs.sn1ne.large img-ubuntu"
}
```

#### mounts

BCS jobs can mount an OSS object or an OSS prefix into the VM's local filesystem as a file or a directory, respectively.
Mounting uses distributed caching and lazy loading to optimize concurrent read requests against the OSS filesystem.
You can mount your OSS objects into the VM like this:

- `<mount-src>` - An OSS object path or an OSS prefix to mount from.
- `<mount-destination>` - A Unix file or directory path to mount to in the VM.
- `<write-support>` - Whether the mount destination is writable; only applies to directories.

```hocon
default-runtime-attributes {
  mounts: "<mount-src> <mount-destination> <write-support>"
}
```
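
For concreteness, a sketch using a hypothetical bucket and paths, which mounts an OSS prefix as a read-only directory inside the VM:

```hocon
default-runtime-attributes {
  # hypothetical bucket and paths: the OSS prefix is mounted as a read-only directory in the VM
  mounts: "oss://my-bucket/reference-data/ /home/reference/ false"
}
```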



#### docker

This backend supports Docker images pulled from an OSS registry path.

```hocon
default-runtime-attributes {
  docker: "<docker-image> <oss-registry-path>"
}
```

- `<docker-image>` - Docker image name, such as `ubuntu:latest`.
- `<oss-registry-path>` - Path in the OSS filesystem to which the Docker image has been pushed.

#### userData

If an auto cluster is specified (see `cluster` above), it's possible to pass environment variables to the VM when running BCS jobs.
It looks like this:

```hocon
default-runtime-attributes {
  userData: "key1 value1, key2 value2"
}
```

#### autoReleaseJob

The number of jobs may hit the user quota of the Alibaba Cloud BatchCompute service if finished jobs are not deleted in time.
Setting `autoReleaseJob` to `true` tells the backend to delete the related BatchCompute job when a workflow task finishes:

```hocon
default-runtime-attributes {
  autoReleaseJob: true
}
```

#### workerPath

This backend needs a worker package to run workflow jobs. A prepared worker package is provided, but it still needs to be uploaded to OSS, and its object path must be specified as the value of the runtime attribute `workerPath`:

- `<oss-object-path>` - OSS object path to which the worker package has been uploaded, e.g. `"oss://<test-bucket>/worker.tar.gz"`.

```hocon
default-runtime-attributes {
  workerPath: "<oss-object-path>"
}
```

#### systemDisk

If a job needs a particular system disk type or size, the runtime attribute `systemDisk` can be used to specify them.

- `<disk-type>` - Disk type to be used; only `cloud` and `cloud_efficiency` are currently supported.
- `<disk-size-in-GB>` - Disk size to be used, in GB.

```hocon
default-runtime-attributes {
  systemDisk: "<disk-type> <disk-size-in-GB>"
}
```

#### dataDisk

The system disk supports sizes up to 500 GB. An additional data disk can be mounted in the VM if more space is needed.

- `<disk-type>` - Disk type to be used; only `cloud` and `cloud_efficiency` are currently supported.
- `<disk-size-in-GB>` - Disk size to be used, in GB.
- `<mount-point>` - Path at which the data disk is mounted in the VM.

```hocon
default-runtime-attributes {
  dataDisk: "<disk-type> <disk-size-in-GB> <mount-point>"
}
```
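
A filled-in sketch combining both disk attributes (the sizes and the data disk type are illustrative):

```hocon
default-runtime-attributes {
  # illustrative values: 50 GB system disk plus a 250 GB data disk mounted at /home/data/
  systemDisk: "cloud 50"
  dataDisk: "cloud_efficiency 250 /home/data/"
}
```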
3 changes: 3 additions & 0 deletions docs/backends/Backends.md
@@ -13,6 +13,8 @@ Cromwell distribution:
* Launch jobs on servers that support the GA4GH Task Execution Schema (TES).
* **[Spark](Spark)**
* Supports execution of Spark jobs.
* **[Alibaba Cloud](BCS)**
* Launch jobs on Alibaba Cloud BatchCompute service.

HPC backends are put under the same umbrella because they all use the same generic configuration that can be specialized to fit the need of a particular technology.

@@ -54,6 +56,7 @@ The backend/filesystem pairings are as follows:

* Local, HPC and Spark backend use the [Shared Local Filesystem](HPC/#filesystems).
* Google backend uses the [Google Cloud Storage Filesystem](Google/#google-cloud-storage-filesystem).
* Alibaba Cloud backend uses the OSS Storage FileSystem.

Additional filesystems capabilities can be added depending on the backend.
For instance, an HPC backend can be configured to work with files on Google Cloud Storage. See the [HPC documentation](HPC) for more details.
