simple jobs submission via command-line for AWS batch
Clone or download
brentp interactive jobs (#6)
* output command to ssh into instance and exec into docker container

* get full ssh command

there is apparently no way in the ecs api to get the actual docker id
so we get that by introspection of the metadata URL from the actual
instance.
Latest commit da19d38 Nov 29, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
cmd/batchit
ddv update imports Nov 29, 2017
exsmount
logof get role directly Feb 1, 2018
s3upload parallel s3 uploading Jun 11, 2018
submit interactive jobs (#6) Nov 29, 2018
LICENSE Initial commit Nov 17, 2017
README.md interactive jobs (#6) Nov 29, 2018
batchit.go try to get more device addresses Nov 22, 2018

README.md

batchit is a collection of utilities for working with AWS batch.

usage

batchit Version: $version

ddv        : detach and delete a volume by id
ebsmount   : create and mount an EBS volume from an EC2 instance
efsmount   : EFS drive from an EC2 instance
localmount : RAID and mount local storage
submit     : run a batch command


submit

example:

batchit submit \
            --image worker:latest \
            --role worker-role \
            --queue big-queue \
            --jobname my-work \
            --cpus 32 \
            --mem 32000
            --envvars "sample=SS-1234" "reference=hg38" "bucket=my-s3-bucket" \
            --ebs /mnt/my-ebs:500:st1:ext4 \
            align.sh

Interactive

To get an interactive job, use the submit command, but instead of a script (align.sh) above, use, for example, "interactive:20" to get an interactive job that will run for 20 minutes.

This command will start a job that sleeps for 20 minutes and the output an ssh command that will drop the user into the docker container running that command.

This is useful for debugging as it quickly drops a user into the same environment that the jobs will be run in.

batchit requirements

AWS

AWS Batch itself requires the AWSBatchServiceRole and ecsInstanceRole generated by running the first-run wizard. Because batchit containers use EC2 and S3 services, batchit requires an ecsTaskRole with AmazonEC2FullAccess and AmazonS3FullAccess. This is the worker-role above.

Docker

The image must be present in your elastic container registry and the container will itself need batchit as a dependency if --ebs is used. A typical Dockerfile entry for this will look like:

RUN apt-get install -y wget
RUN wget -qO /usr/bin/batchit https://github.com/base2genomics/batchit/releases/download/v0.4.2/batchit
RUN chmod +x /usr/bin/batchit

In this example align.sh contains the commands to be run. It will have access to a 500GB st1 volume created with ext4 and mounted to /mnt/my-ebs. This will automatically set docker to run in privileged mode so that it has access to the EBS volume that is attached to /dev in the instance. (We stole this idea from aegea)

The volume will be cleaned up automatically when the container exits.

Note that array jobs are also supported with --arraysize INT parameter. Currently, the user is responsible for specifying the dependency mode (N_TO_N or SEQUENTIAL) to the --dependson parameter.

For this example a simplified align.sh might look like (always include the first two lines):

set -euo pipefail
cd $TMPDIR
aws s3 cp s3://${bucket}/${sample}_r1.fq .
aws s3 cp s3://${bucket}/${sample}_r2.fq .
aws s3 sync s3://${bucket}/assets/${reference} .
bwa mem -t ${cpus} ${reference}.fa ${sample}_r1.fq ${sample}_r2.fq \
      | samtools sort -o ${sample}.bam
samtools index ${sample}.bam
aws s3 cp ${sample}.bam s3://${bucket}/
aws s3 cp ${sample}.bam.bai s3://${bucket}/

ebsmount

create, attach, format, and mount an EBS volume of the specified size and type to the specified mount-point. If -n is greater than 1, then it will automatically RAID0 (performance, not reliability) the drives. This is used (in shorthand) by the --ebs argument to batchit submit above.

Usage: batchit [--size SIZE] --mountpoint MOUNTPOINT [--volumetype VOLUMETYPE] [--fstype FSTYPE] [--iops IOPS] [--n N] [--keep]

Options:
  --size SIZE, -s SIZE   size in GB of desired EBS volume [default: 200]
  --mountpoint MOUNTPOINT, -m MOUNTPOINT
                         directory on which to mount the EBS volume
  --volumetype VOLUMETYPE, -v VOLUMETYPE
                         desired volume type; gp2 for General Purpose SSD; io1 for Provisioned IOPS SSD; st1 for Throughput Optimized HDD; sc1 for HDD or Magnetic volumes; standard for infrequent [default: gp2]
  --fstype FSTYPE, -t FSTYPE
                         file system type to create (argument must be accepted by mkfs) [default: ext4]
  --iops IOPS, -i IOPS   Provisioned IOPS. Only valid for volume type io1. Range is 100 to 20000 and <= 50\*size of volume.
  --n N, -n N            number of volumes to request. These will be RAID0'd into a single volume for better write speed and available as a single drive at the specified mount point. [default: 1]
  --keep, -k             dont delete the volume(s) on termination (default is to delete)
  --help, -h             display this help and exit
  --version              display version and exit

efsmount

This is a trivial wrapper around mounting an EFS volume.

Usage: batchit [--mountoptions MOUNTOPTIONS] EFS MOUNTPOINT

Positional arguments:
  EFS                    efs DNS and mount path (e.g.fs-XXXXXX.efs.us-east-1.amazonaws.com:/mnt/efs/)
  MOUNTPOINT             local directory on which to mount the EBS volume

Options:
  --mountoptions MOUNTOPTIONS, -o MOUNTOPTIONS
                         options to send to mount command
  --help, -h             display this help and exit