SciBob is a longer-term project to create a meta-builder and management tool for scientific software. It will drive other tools such as EasyBuild/EESSI, Spack, and Conda/Mamba to build a single environment using Lmod modules, and it integrates documentation from various sites.
The first element of SciBob is `aws-eb`, a tool to build recent EasyBuild software (EasyConfigs) in AWS.
This tool does two things:
- First, it gives you quick access to a large stack of performance-optimized HPC software compiled by the EasyBuild (EB) framework, which could otherwise take more than a week to deploy. You can install the software, including the Lmod environment modules, directly from a public S3 bucket (s3://easybuild-cache). Use the `aws-eb download` sub-command to load the software onto your machine.
- Second, it allows you to run a fully automated build of all latest EB (and soon Bioconda) packages (newest version only). After a successful build, it tars and uploads these packages to AWS S3 for later use and sharing with others. Previously built packages are automatically uploaded/downloaded, which allows aws-eb to use unreliable instances in the low-cost AWS spot market by default. You can also use the `aws-eb launch` sub-command on-premises, but you need an S3-compatible bucket (e.g. Ceph). The EB root (EASYBUILD_PREFIX) is currently set to /opt/eb.
Note: https://www.eessi.io/ is a new approach for using software generated by EasyBuild that does not require you to download anything. As more software is added to EESSI over time, it will likely become the preferred approach for accessing EasyBuild scientific software.
curl -Ls https://raw.githubusercontent.com/dirkpetersen/scibob/main/aws-eb.py?token=$(date +%s) -o ~/.local/bin/aws-eb
chmod +x ~/.local/bin/aws-eb
python3 -m pip install --user --upgrade boto3
Boto3 is the Python interface to AWS. Make sure ~/.local/bin is in your PATH.
To download the software, the target folder (default: /opt/eb) must exist, be writable, and have about 300 GB of free disk space. Each package is compressed as an archive ending in .eb.tar.gz and will be automatically untarred after downloading. Each operating-system/CPU-type combination requires about 100 GB of .eb.tar.gz files in S3.
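The unpack step can be pictured roughly as follows. This is a minimal sketch only (the function name `unpack_all` is hypothetical; the real tool also handles the parallel S3 download via rclone and per-archive error reporting):

```python
import tarfile
from pathlib import Path

def unpack_all(prefix="/opt/eb"):
    """Untar every downloaded .eb.tar.gz archive in place,
    extracting each one into the folder it was downloaded to."""
    for archive in Path(prefix).rglob("*.eb.tar.gz"):
        print(f"Unpacking {archive} into {archive.parent} ...")
        with tarfile.open(archive) as tar:
            tar.extractall(path=archive.parent)
```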
By default, please use Amazon Linux 2023 with a modern --cpu-type such as `graviton-3`, `epyc-gen-4`, or `xeon-gen-4`. If you prefer using the latest RHEL (aka Rocky) or the latest Ubuntu LTS, please select `xeon-gen-1`. This will be the best choice if you are using the tool on-premises. CPU type `xeon-gen-1` will work on all x86-64 Intel and AMD CPUs that are offered on AWS, but performance will not be optimized on newer CPUs. We will start with `graviton-3`, as this ARM CPU is the cheapest option, with performance similar to Xeon but about 1/3 slower than Epyc.
ec2-user@aws-eb:~$ aws-eb download --cpu-type graviton-3
Downloading packages from s3://easybuild-cache/aws/amzn-2023_graviton-3 to /opt/eb ...
Downloading Modules ...
Rclone copy: 4198 file(s) with 3.262 MiB transferred.
Downloading Software ...
Rclone copy: 2097 file(s) with 76.219 GiB transferred.
Untarring packages ...
Unpacking /opt/eb/software/Anaconda3/Anaconda3-2023.09-0.eb.tar.gz into /opt/eb/software/Anaconda3...
Unpacking /opt/eb/software/ASAP/ASAP-2.1-foss-2022a.eb.tar.gz into /opt/eb/software/ASAP...
Unpacking /opt/eb/software/BEDTools/BEDTools-2.31.0-GCC-12.3.0.eb.tar.gz into /opt/eb/software/BEDTools...
Successfully unpacked: /opt/eb/software/BEDTools/BEDTools-2.31.0-GCC-12.3.0.eb.tar.gz
Unpacking /opt/eb/software/Automake/Automake-1.16.4-GCCcore-11.2.0.eb.tar.gz into /opt/eb/software/Automake...
Unpacking /opt/eb/software/BLAST+/BLAST+-2.2.31.eb.tar.gz into /opt/eb/software/BLAST+...
Successfully unpacked: /opt/eb/software/Automake/Automake-1.16.5-GCCcore-12.3.0.eb.tar.gz
All software was downloaded to: /opt/eb
To use these software modules, source .bashrc after adding MODULEPATH, e.g.:
echo "export MODULEPATH=${MODULEPATH}:/opt/eb/modules/all" >> ~/.bashrc
source ~/.bashrc
Now you can use the HPC module system (e.g. Lmod) to load and run software:
$ ml R
$ which R
/opt/eb/software/R/4.3.2-gfbf-2023a/bin/R
$ R
R version 4.3.2 (2023-10-31) -- "Eye Holes"
The `aws-eb download` sub-command will try to detect your OS and download the binaries compiled for your OS. If that does not work, you can use the `--prefix` option instead of `--cpu-type`. You can also download to a different directory, for example:
aws-eb download --prefix ubuntu-22.04_xeon-gen-1 /your/folder
Note: If you select a different download folder, you need to create a symlink (`ln -s /your/folder /opt/eb`) so that calls to /opt/eb are redirected to your other folder.
Before building your own EasyBuild stack, you should first review how many packages have already been built with success or error, and which ones have been skipped, for example because a compiler toolchain is too old. Then we will run `aws-eb config` to adjust a few settings, for example your own AWS bucket.
Each OS/CPU combination has an eb-build-status.json file where the status of each package build is tracked. `aws-eb buildstatus` provides a summary.
$ aws-eb buildstatus amzn-2023_graviton-3
Summarizing s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json ...
Status: 'success'
Total Occurrences: 1357
Reasons:
- easyconfig built successfully: 1357 occurrences
Status: 'error'
Total Occurrences: 509
Reasons:
- n/a: 509 occurrences
Status: 'skipped'
Total Occurrences: 1886
Reasons:
- toolchain not supported: intel: 398 occurrences
- toolchain version too old: GCCcore-10.2.0: 140 occurrences
- toolchain version too old: foss-2021b: 105 occurrences
- toolchain version too old: foss-2021a: 104 occurrences
- dependencies have errors: 94 occurrences
- dependency requires too old toolchain: 10 occurrences
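A summary like the one above can also be produced locally from a downloaded copy of eb-build-status.json. The sketch below is a hypothetical re-implementation (the function names are mine, not aws-eb's; the JSON field names match the eb-build-status.json excerpt shown later in this document):

```python
import json
from collections import Counter, defaultdict

def summarize(path):
    """Group entries in eb-build-status.json by status, counting reasons."""
    with open(path) as f:
        entries = json.load(f)
    summary = defaultdict(Counter)
    for entry in entries.values():
        summary[entry["status"]][entry.get("reason", "n/a")] += 1
    return summary

def print_summary(summary):
    """Print the grouped counts in a buildstatus-like layout."""
    for status, reasons in summary.items():
        print(f"Status: '{status}'")
        print(f"Total Occurrences: {sum(reasons.values())}")
        print("Reasons:")
        for reason, n in reasons.most_common():
            print(f"  - {reason}: {n} occurrences")
```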
AWS-EB requires only minimal configuration (`aws-eb config`), such as setting your own S3 storage bucket. When asked for the "root path", please leave the default (aws), as other options are untested.
$ aws-eb config
Installing rclone ... please wait ... Done!
*** Asking a few questions ***
*** For most you can just hit <Enter> to accept the default. ***
*** Enter your email address: ***
[Default: ec2-user@us-west-2.compute.internal] myuser@domain.edu
*** Please confirm/edit S3 bucket name to be created in all used profiles.: ***
[Default: aws-eb-myuser-domain-edu] my-aws-bucket-name
*** Please confirm/edit the root path inside your S3 bucket: ***
[Default: aws]
*** Please confirm/edit the AWS S3 Storage class: ***
[Default: INTELLIGENT_TIERING]
*** Please confirm/edit the AWS S3 region: ***
[Default: us-west-2]
Verify that bucket 'my-aws-bucket-name' is configured ...
Done!
To change the compiler toolchains and their minimum versions supported by aws-eb, you can edit the JSON structure in min_toolchains under ~/.config/aws-eb/general:
cat ~/.config/aws-eb/general/min_toolchains
{
"system": "system",
"GCC": "11.0",
"GCCcore": "11.0",
"LLVM": "12.0",
"foss": "2022a",
"gfbf": "2022a"
}
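To illustrate how a minimum-version table like this might be applied, the hypothetical sketch below checks a toolchain name/version pair against min_toolchains. The comparison logic is my assumption for illustration only; aws-eb's actual parsing may differ:

```python
# Hypothetical sketch: decide whether a toolchain meets the configured minimum.
MIN_TOOLCHAINS = {
    "system": "system",
    "GCC": "11.0",
    "GCCcore": "11.0",
    "LLVM": "12.0",
    "foss": "2022a",
    "gfbf": "2022a",
}

def version_key(version):
    """Split '11.0' into [(11, '')] parts and '2022a' into [(2022, 'a')]
    so both numeric and year+letter schemes compare sensibly."""
    parts = []
    for chunk in version.replace("-", ".").split("."):
        num = "".join(c for c in chunk if c.isdigit())
        rest = "".join(c for c in chunk if not c.isdigit())
        parts.append((int(num) if num else 0, rest))
    return parts

def toolchain_allowed(name, version):
    minimum = MIN_TOOLCHAINS.get(name)
    if minimum is None:
        return False            # toolchain not supported at all
    if minimum == "system":
        return True             # 'system' has no version ordering
    return version_key(version) >= version_key(minimum)
```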
Now you have two options. You can either start building from scratch, for example if you would like an OS/CPU combination that is currently not provided via `easybuild-cache`, or you can use the binaries that are already in the S3 bucket `easybuild-cache` as a template. As building from scratch takes a long time, building from a template is recommended.
For building on top of existing binaries, you use the `aws-eb launch` sub-command and pick the graviton-3 CPU as a template. If you do not choose otherwise, aws-eb will pick the latest Amazon Linux (currently 2023) by default. We will use the `--skip-sources` option because almost everything is already built and it is not worth preloading 200 GB of source files. You also need to copy all the content of the template bucket `easybuild-cache` into your own bucket. The option `--first-bucket` will do that for you:
$ aws-eb launch --cpu-type graviton-3 --skip-sources --first-bucket easybuild-cache
c7g.xlarge is the cheapest spot instance with at least 4 vcpus / 8 GB mem
Using amazon image id: ami-03bd21ae09xxxxxx
IAM Instance profile: None.
c7g.xlarge in us-west-2b costs $0.1450 as on-demand and $0.0504 as spot.
Launching spot instance i-0ec773e13xxxxxx ... please wait ...
|██████████████████████████████--------------------| 60.0%
Security Group "sg-0dc530211xxxxxxxxx" attached.
Instance IP: 35.167.xx.xxx
Waiting for ssh host to become ready ...
will execute 'aws-eb launch -c graviton-3 -f easybuild-cache --skip-sources --build' on 35.167.xx.xxx ...
Executed bootstrap and build script ... you may have to wait a while ...
but you can already login using "aws-eb ssh"
Sent email "AWS-EB build on EC2" to peterxxx@xxxx.edu!
We see that a spot instance has been launched that costs about 1.25 cents per core-hour. As a reference, mid-size on-premises HPC systems in the US typically cost between 1 and 2 cents per core-hour, depending on what you include when you calculate your TCO. In any case, 1.25 cents is not bad at all.
After waiting a while for the copy of easybuild-cache to your bucket to finish, you can launch a few more configurations with different CPU types such as `epyc-gen-4` or `xeon-gen-4`. Finally, you probably want to monitor the build process; use the `aws-eb ssh` sub-command for that (first use --list or -l to list all running instances):
$ aws-eb ssh -l
Listing machines ... Running EC2 Instances:
35.88.195.44 | i-0c2123d527545469c | m7g.xlarge | al2023 | 00-00:15 | (OK)
34.219.173.164 | i-0703fce0a761d7690 | r7a.xlarge | al2023 | 00-00:02 | (OK)
54.191.181.2 | i-0b920c11fe09e1818 | c7i.xlarge | al2023 | 00-00:01 | (OK)
aws-eb chooses the lowest-cost instance for a specific CPU type. In the AWS spot market this may be an `m7`, `r7`, or `c7` type instance at any given time (m7g = ARM graviton-3, r7a = AMD epyc-gen-4, c7i = Intel xeon-gen-4). Let's log in to our graviton-3 instance and enter the `history` command, which shows us a number of prepared commands we can run simply by selecting them via arrow-up.
$ aws-eb ssh 35.88.195.44
Last login: Tue Jan 2 9:33:09 2024
$ history
1 touch ~/no-terminate && pkill -f aws-eb
2 pkill -f easybuild.main # skip the currently building easyconfig
3 grep -B1 -A1 'chars): Couldn.t find file' ~/out.easybuild.54.191.181.2.txt | grep FAILED:
4 grep -A1 '^== FAILED:' ~/out.easybuild.54.191.181.2.txt
5 grep -A1 '^== COMPLETED:' ~/out.easybuild.54.191.181.2.txt
6 tail -n 100 -f ~/out.easybuild.54.191.181.2.txt
7 tail -n 30 -f ~/out.bootstrap.54.191.181.2.txt
The build process may only take a few seconds if no new EasyConfigs are found, as the script loops through eb-build-status.json and skips all packages for which a build has been attempted previously. To review more details, let's evaluate a build from scratch:
For a clean start, build from scratch without downloading binaries or sources. RHEL has not been built for the latest Epyc CPU yet, and `aws-eb` will not read anything from your bucket, even if you have used the `--first-bucket` option before. It will, however, try to download about 200 GB of source files. To prevent this, use the `--skip-sources` option. Since a build from scratch will take a long time, you want to focus on your research area first. If you are in life sciences you might prefer `--include bio,math` to only build certain module classes, but there are also astro,geo,chem,phys and generic module classes such as ai,cae,compiler,data,debugger,devel,lang,lib,mpi,numlib,perf,system,toolchain,tools,vis. `aws-eb` uses the development branch of EasyBuild by default, but you want to play it safe and use the released version by adding `--eb-release` to the command line. Finally, you decide to double the number of virtual CPUs (`--vcpus`) from the default 4 to 8, as much horsepower is needed initially to build large compiler toolchains. (The larger the number of vcpus and amount of memory, the less likely your AWS instance will hang while using EasyBuild.)
$ aws-eb launch --cpu-type epyc-gen-4 --os rhel --skip-sources --include bio,math --eb-release --vcpus 8
c7a.2xlarge is the cheapest spot instance with at least 8 vcpus / 8 GB mem
Using rhel image id: ami-093bd987f8e53e1f2
IAM Instance profile: None.
c7a.2xlarge in us-west-2c costs $0.4106 as on-demand and $0.1956 as spot.
Launching spot instance i-0ceb79495xxxxxx ... please wait ...
|███████████████████████████████████---------------| 70.0%
Security Group "sg-0dc530211xxxxxxx" attached.
Instance IP: 52.35.44.xxx
Waiting for ssh host to become ready ...
will execute 'aws-eb.py launch -c epyc-gen-4 -o rhel -s -i bio,math -e -v 8 --build' on 52.35.44.xxx ...
Executed bootstrap and build script ... you may have to wait a while ...
but you can already login using "aws-eb ssh"
Sent email "AWS-EB build on EC2" to peterxxx@xxxx.edu!
Now run the `aws-eb ssh` sub-command. If you have multiple instances running, you also need to enter the IP address. Now hit the arrow-up key and Enter to review the output of 'bash bootstrap.sh &', which sets up the basic system. At the end it will show you some details about the CPU.
$ aws-eb ssh
Last login: Tue Jan 2 9:33:09 2024
$ tail -n 30 -f ~/out.bootstrap.52.35.44.xxx.txt
Now run `tail -n 100 -f ~/out.easybuild.52.35.44.xxx.txt` to track the output of EasyBuild. This will run for a while. Hit Ctrl+C and use arrow-up again to review a few of the other grep commands. One of the first issues you will notice with EasyBuild in production: it cannot download many of the source files, so we must search for file-not-found errors:
$ grep -B1 -A1 'chars): Couldn.t find file' ~/out.easybuild.52.35.44.xxx.txt | grep FAILED:
== FAILED: Installation ended unsuccessfully (build directory: /opt/eb/build/Java/1.8.0_66/system-system): build failed (first 300 chars): Couldn't find file jdk-8u66-linux-x64.tar.gz anywhere, and downloading it didn't work either... Paths attempted (in order): /home/rocky/.local/easybuild/easyconfigs/j/Java/j/Java/jdk-8u66-linux-x64.tar.gz, /home/rocky/.local/easybuild/easyconfigs/j/Java/Java/jdk-8u66-linux-x64.tar.gz, /home/rocky/.l (took 0 secs)
== FAILED: Installation ended unsuccessfully (build directory: /opt/eb/build/Perseus/2.0.7.0/GCCcore-11.2.0): build failed (first 300 chars): Couldn't find file Perseus_v2.0.7.0.zip anywhere, please follow the download instructions above, and make the file available in the active source path (/opt/eb/sources) (took 0 secs)
In this example you see that jdk-8u66-linux-x64.tar.gz cannot be found. No surprise: this is Oracle Java, which is not available for download by automated processes. You need to log in to the Oracle website, download that file, and then upload it to your S3 bucket into the sources/j/Java folder:
aws s3 cp jdk-8u66-linux-x64.tar.gz s3://your-bucket/aws/sources/j/Java/
Next time you build, it can be automatically pulled from there, as long as you are not using `--skip-sources` with `aws-eb`.
Below is some additional background information that may lead to a better understanding of `aws-eb`.
The aws-eb script uses `rclone` in the background to download data in parallel. For troubleshooting, and to see what is actually inside the easybuild-cache bucket, you can also use the AWS CLI with the `--request-payer requester` option:
aws s3 ls s3://easybuild-cache/aws/ --request-payer requester
PRE amzn-2023_epyc-gen-4/
PRE amzn-2023_graviton-3/
PRE amzn-2023_xeon-gen-4/
PRE rhel-9_xeon-gen-1/
PRE sources/
PRE ubuntu-22.04_xeon-gen-1/
Note: rclone is also configured to run with "request-payer=requester". The `easybuild-cache` bucket is located in the `us-west-2` region, and if you are downloading the binaries for one OS/CPU (~100 GB) to another region, it will cost you around $5. If you are downloading to on-premises and have neither DirectConnect nor any egress waivers, it may cost you up to $10.
Each OS/CPU combination has its own eb-build-status.json. This file keeps track of all EasyConfigs for which a build was attempted. By default, each EasyConfig is only ever tried once. Why? This design was chosen to allow for quick execution of all new EasyConfigs. The goal is to run aws-eb once a week and compile as much new software as possible within one hour (AWS charges by the hour). If we had to retry many EasyConfigs that were skipped or failed previously, this process would take many hours.
$ aws s3 cp --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json .
download: s3://easybuild-cache/aws/amzn-2023_graviton-3/eb-build-status.json to ./eb-build-status.json
$ tail -n 20 eb-build-status.json
{
},
"gfbf-2022a.eb": {
"status": "success",
"reason": "easyconfig built successfully",
"returncode": 0,
"errorcount": 0,
"trydate": "2023-12-29T11:50:15.631301-08:00",
"buildtime": 3,
"modules": null
},
"Ferret-7.5.0-foss-2019b.eb": {
"status": "skipped",
"reason": "toolchain version too old: foss-2019b",
"returncode": -1,
"errorcount": 0,
"trydate": "2023-12-29T11:51:43.229852-08:00",
"buildtime": 0,
"modules": null
}
}
Note: If you want to retry EasyConfigs that were previously skipped, you need to use the `--check-skipped` option with `aws-eb launch`. If you would like to retry an EasyConfig that previously failed with "status": "error", you need to remove the entire dictionary of that eb file from the JSON file.
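Editing a large JSON file by hand is error-prone, so the retry reset can be scripted. The helper below is a hypothetical sketch (`reset_failed` is not part of aws-eb); it simply deletes the named entries so the next run will attempt them again:

```python
import json

def reset_failed(path, names):
    """Delete the given easyconfig entries from eb-build-status.json
    so aws-eb will attempt to build them again on the next run."""
    with open(path) as f:
        status = json.load(f)
    removed = [n for n in names if status.pop(n, None) is not None]
    with open(path, "w") as f:
        json.dump(status, f, indent=2)
    return removed
```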
You can review individual STDOUT/STDERR logs, for example:
$ aws s3 ls --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/logs/
PRE failed/
2023-12-29 12:26:48 8494 out.bootstrap.34.213.192.111.txt
2023-12-29 12:26:49 30148226 out.easybuild.34.213.192.111.txt
and individual output logs of failed builds:
$ aws s3 ls --request-payer requester s3://easybuild-cache/aws/amzn-2023_graviton-3/logs/failed/
2023-12-29 12:26:48 42724 tensorflow-compression-2.11.0-foss-2022a-CUDA-11.7.0.eb-easybuild-NCCL-2.12.12-20231214.083639.kOQdV.log
2023-12-29 12:26:48 54373599 tensorflow-probability-0.19.0-foss-2022a.eb-easybuild-axqqwyso.log
2023-12-29 12:26:48 1312872 tidymodels-1.1.0-foss-2022b.eb-easybuild-rztecz4d.log
2023-12-29 12:26:48 129476 torchvision-0.13.1-foss-2022a-CUDA-11.7.0.eb-easybuild-magma-2.6.2-20231218.112341.arpdJ.log
This is happening behind the scenes:
- Run `aws-eb launch` on your machine
- Launch the cheapest AWS spot instance that meets the criteria
- Install system software and settings via cloud-init script (_ec2_cloud_init_script)
- Attach a new 750 GB EBS volume and mount it to /opt
- Upload and launch the bootstrap.sh script and other configs
- Install basic software in /home of ec2-user
- Launch the aws-eb script with the same options and args but add the `--build` option
- Download all modules and tarred binaries from S3, unpack them, and optionally download sources
- Loop through all *.eb files (except `__archive__`) and for each eb file:
  - check for allowed toolchains and --include and --exclude
  - install all osdependencies via dnf or apt
  - download software from the shared S3 bucket
  - set all files under sources/generic to executable
  - untar all .eb.tar.gz files to ./software
  - check dependencies of each eb file with `eb --missing-modules`
  - install each dependency with `eb --umask 0002 dependency.eb`
  - run each easyconfig that contains -CUDA- with `eb --ignore-test-failure`
  - true up the install by running `eb --robot --umask 0002 software.eb`
  - tar new software to .eb.tar.gz files
  - upload software to the shared S3 bucket
- Automatically terminate the instance once the build is finished
Tarred binaries and modules are copied to a platform-specific folder (e.g. amzn-2023_epyc-gen-4/software) and sources are copied to a shared folder that all platforms use.
Each CPU family or GPU type is mapped to all AWS instance families that have this CPU family or GPU installed. This allows aws-eb to pick any spot instance that has a compatible hardware configuration.
self.cpu_types = {
"graviton-2": ('c6g', 'c6gd', 'c6gn', 'm6g', 'm6gd', 'r6g', 'r6gd', 't4g' ,'g5g'),
"graviton-3": ('c7g', 'c7gd', 'c7gn', 'm7g', 'm7gd', 'r7g', 'r7gd'),
"graviton-4": ('c8g', 'c8gd', 'c8gn', 'm8g', 'm8gd', 'r8g', 'r8gd'),
"epyc-gen-1": ('t3a',),
"epyc-gen-2": ('c5a', 'm5a', 'r5a', 'g4ad', 'p4', 'inf2', 'g5'),
"epyc-gen-3": ('m6a', 'c6a', 'r6a', 'p5'),
"epyc-gen-4": ('c7a', 'm7a', 'r7a'),
"xeon-gen-1": ('c4', 'm4', 't2', 'r4', 'p3' ,'p2', 'f1', 'g3', 'i3en'),
"xeon-gen-2": ('c5', 'c5n', 'm5', 'm5n', 'm5zn', 'r5', 't3', 't3n', 'dl1', 'inf1', 'g4dn', 'vt1'),
"xeon-gen-3": ('c6i', 'c6in', 'm6i', 'm6in', 'r6i', 'r6id', 'r6idn', 'r6in', 'trn1'),
"xeon-gen-4": ('c7i', 'm7i', 'm7i-flex', 'r7i', 'r7iz'),
"core-i7-mac": ('mac1',)
}
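With this mapping, testing whether a concrete EC2 instance type belongs to a requested CPU type reduces to a family-prefix lookup. A minimal sketch, assuming the dictionary above (only three entries reproduced here; the matching logic is my assumption, not aws-eb's actual code):

```python
# Subset of the cpu_types mapping shown above.
CPU_TYPES = {
    "graviton-3": ("c7g", "c7gd", "c7gn", "m7g", "m7gd", "r7g", "r7gd"),
    "epyc-gen-4": ("c7a", "m7a", "r7a"),
    "xeon-gen-4": ("c7i", "m7i", "m7i-flex", "r7i", "r7iz"),
}

def matches_cpu_type(instance_type, cpu_type):
    """True if an EC2 instance type (family.size) belongs to the CPU type."""
    family = instance_type.split(".")[0]
    return family in CPU_TYPES.get(cpu_type, ())
```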
The most cost-efficient instance type is not clear yet. I started with c7a.xlarge with 4 vcpus and 8 GB RAM. It may not make sense to use a larger instance type, as there are long periods of time where only a single vcpu is busy. At the tail end it installs R packages for hours, which is limited to a single vcpu. Perhaps it is better to just run more instances.
The build instance has this environment:
ec2-user@aws-eb:$ cat ~/.easybuildrc
test -d /usr/share/lmod/lmod/init && source /usr/share/lmod/lmod/init/bash
export MODULEPATH=/opt/eb/modules/all:/opt/eb/modules/lib:/opt/eb/modules/lang:/opt/eb/modules/compiler:/opt/eb/modules/bio
export EASYBUILD_JOB_CORES=8
export EASYBUILD_CUDA_COMPUTE_CAPABILITIES=7.5,8.0,8.6,9.0
# export EASYBUILD_BUILDPATH=/dev/shm/$USER # could run out of space
export EASYBUILD_PREFIX=/opt/eb
export EASYBUILD_JOB_OUTPUT_DIR=$EASYBUILD_PREFIX/batch-output
export EASYBUILD_DEPRECATED=5.0
export EASYBUILD_JOB_BACKEND=Slurm
export EASYBUILD_PARALLEL=16
# export EASYBUILD_GITHUB_USER=$USER
export EASYBUILD_UPDATE_MODULES_TOOL_CACHE=True
export EASYBUILD_ROBOT_PATHS=/home/rocky/.local/easybuild/easyconfigsrocky
These are currently the only allowed toolchains:
min_toolchains = {'system': 'system', 'GCC': '11.0', 'GCCcore' : '11.0',
'LLVM' : '12.0', 'foss' : '2022a', 'gfbf': '2022a'}
You can change this here:
vi ~/.config/aws-eb/general/min_toolchains
{
"system": "system",
"GCC": "11.0",
"GCCcore": "11.0",
"LLVM": "12.0",
"foss": "2022a",
"gfbf": "2022a"
}
$ aws-eb --help
usage: aws-eb [-h] [--debug] [--profile <aws-profile>] [--no-checksums] [--version] {config,cnf,launch,lau,download,dld,buildstatus,sta,ssh,scp} ...
A (mostly) automated build tool for building Sci packages in AWS. The binary packages are stored in an S3 bucket and can be downloaded by anyone.
positional arguments:
{config,cnf,launch,lau,download,dld,buildstatus,sta,ssh,scp}
sub-command help
config (cnf) You will need to answer just a few questions about your cloud setup.
launch (lau) Launch EC2 instance, build new Easybuild packages and upload them to S3
download (dld) Download built eb packages and lmod modules to /opt/eb
buildstatus (sta) Show stats on eb-build-status.json in this S3 folder (including prefix), e.g. 'amzn-2023_graviton-3', 'amzn-2023_epyc-gen-4',
'amzn-2023_xeon-gen-4' rhel-9_xeon-gen-1 or ubuntu-22.04_xeon-gen-1.
ssh (scp) Login to an AWS EC2 build instance
optional arguments:
-h, --help show this help message and exit
--debug, -d verbose output for all commands
--profile <aws-profile>, -p <aws-profile>
which AWS profile in ~/.aws/ should be used. default="aws"
--no-checksums, -u Use --size-only instead of --checksum when using rclone with S3.
--version, -v print AWS-EB and Python version info
Basic configuration
$ aws-eb config --help
usage: aws-eb config [-h] [--list] [--software] [--monitor <email@address.org>]
optional arguments:
-h, --help show this help message and exit
--list, -l List available CPU/GPU types and supported prefixes (OS/CPU)
--software, -s List available Software (Names of Easyconfigs)
--monitor <email@address.org>, -m <email@address.org>
setup aws-eb as a monitoring cronjob on an ec2 instance and notify an email address
Build software on AWS
./aws-eb launch --help
usage: aws-eb launch [-h] [--cpu-type <cpu-type>] [--os OS] [--vcpus <number-of-vcpus>] [--gpu-type <gpu-type>] [--mem <memory-size>]
[--instance-type <aws.instance>] [--az AZ] [--on-demand] [--monitor] [--build] [--first-bucket <your-s3-bucket>] [--skip-sources]
[--eb-release] [--check-skipped] [--include INCLUDE] [--exclude EXCLUDE] [--force-sshkey]
optional arguments:
-h, --help show this help message and exit
--cpu-type <cpu-type>, -c <cpu-type>
run config --list to see available CPU types. (e.g graviton-3)
--os OS, -o OS build operating system, default=amazon (which is an optimized fedora) valid choices are: amazon, rhel, ubuntu and any AMI name including wildcards *
--vcpus <number-of-vcpus>, -v <number-of-vcpus>
Number of vcpus to be allocated for compilations on the target machine. (default=4) On x86-64 there are 2 vcpus per core and on Graviton (Arm) there is one core per vcpu
--gpu-type <gpu-type>, -g <gpu-type>
run --list to see available GPU types
--mem <memory-size>, -m <memory-size>
GB Memory allocated to instance (default=8)
--instance-type <aws.instance>, -t <aws.instance>
The EC2 instance type is auto-selected, but you can pick any other type here
--az AZ, -z AZ Enforce the availability zone, e.g. us-west-2a
--on-demand, -d Enforce on-demand instance instead of using the default spot instance.
--monitor, -n Monitor EC2 server for cost and idle time.
--build, -b Execute the build on the current system instead of launching a new EC2 instance.
--first-bucket <your-s3-bucket>, -f <your-s3-bucket>
use this bucket (e.g. easybuild-cache) to initially load the already built binaries and sources
--skip-sources, -s Do not pre-download sources from build cache, let EB download them.
--eb-release, -e Use official Easybuild release instead of dev repos from Github.
--check-skipped, -k Re-check all previously skipped software packages and build them if possible.
--include INCLUDE, -i INCLUDE
limit builds to certain module classes, e.g "bio" or "bio,lib,tools"
--exclude EXCLUDE, -x EXCLUDE
exclude certain module classes, e.g "lib" or "dev,lib", only works if --include is not set
--force-sshkey, -r This option will overwrite the ssh key pair in AWS with a new one and download it.
Download binaries
usage: aws-eb download [-h] [--cpu-type CPUTYPE] [--prefix <s3_prefix>] [--vcpus VCPUS] [--with-source] [<target_folder>]
positional arguments:
<target_folder> Download to other folder than default
optional arguments:
-h, --help show this help message and exit
--cpu-type CPUTYPE, -c CPUTYPE
run --list to see available CPU types, use --prefix to select OS-version_cpu-type
--prefix <s3_prefix>, -p <s3_prefix>
your prefix, e.g. amzn-2023_graviton-3, ubuntu-22.04_xeon-gen-1
--vcpus VCPUS, -v VCPUS
Number of vcpus to be allocated for compilations on the target machine. (default=4) On x86-64 there are 2 vcpus per core and on Graviton (Arm) there is one core per vcpu
--with-source, -s Also download the source packages
Check the status summary of your builds
usage: aws-eb buildstatus [-h] <s3_prefix>
positional arguments:
<s3_prefix> your prefix, e.g. amzn-2023_graviton-3
optional arguments:
-h, --help show this help message and exit
Login via ssh or copy via scp
usage: aws-eb ssh [-h] [--list] [--terminate <hostname>] [--add-key <private-ssh-key.pem>] [sshargs ...]
positional arguments:
sshargs multiple arguments to ssh/scp such as hostname or user@hostname or folder
optional arguments:
-h, --help show this help message and exit
--list, -l List running AWS-EB EC2 instances
--terminate <hostname>, -t <hostname>
Terminate EC2 instance with this public IP Address.
--add-key <private-ssh-key.pem>, -a <private-ssh-key.pem>
Generate a pub key and add it to a remote authorized_keys file.
After a few days of building, I see this in the life sciences section:
ec2-user@aws-eb:~$ ml ov
------------------------------------------ /opt/eb/modules/bio -------------------------------------------
ADMIXTURE (1) KrakenUniq (1) alleleIntegrator (1)
AGAT (1) KronaTools (1) angsd (1)
ANIcalculator (1) LSD2 (2) anndata (1)
ASCAT (1) LTR_retriever (1) bam-readcount (1)
AUGUSTUS (1) L_RNA_scaffolder (1) bamFilters (1)
AdapterRemoval (1) Lighter (1) bases2fastq (1)
Alfred (1) Longshot (1) bcbio-gff (2)
AlphaFold (1) MACH (1) bcl2fastq2 (1)
AptaSUITE (1) MACS2 (1) biobakery-workflows (1)
Arriba (2) MACS3 (1) biobambam2 (1)
Artemis (1) MAFFT (3) biom-format (2)
BA3-SNPS-autotune (1) MAGMA-gene-analysis (1) breseq (1)
BAMM (1) MAGeCK (1) bwa-meth (1)
BAli-Phy (1) MCL (2) bwakit (1)
BBMap (2) MDAnalysis (2) bx-python (1)
BCFtools (3) MEGACC (1) canu (1)
BEDOPS (1) MEGAN (1) castor (1)
BEDTools (3) MMSEQ (1) cooler (1)
BLAST+ (3) MMseqs2 (1) cromwell (1)
BLAST (2) MRPRESSO (1) cutadapt (2)
BUSCO (1) MSPC (1) cuteSV (1)
BUStools (1) MUMmer (2) dRep (1)
BWA (2) MUSCLE (2) dcm2niix (1)
BXH_XCEDE_TOOLS (1) MView (1) deepTools (1)
BamTools (4) MaSuRCA (1) duplex-tools (1)
Bandage (1) Maq (1) dxpy (1)
BayesAss3-SNPs (1) Mash (2) easel (1)
BayesTraits (1) Mashtree (1) ebGSEA (1)
Beagle (1) MetaBAT (1) edlib (2)
Beast (1) MetaEuk (2) eggnog-mapper (1)
BioPerl (2) MetaGeneAnnotator (1) elprep (1)
Biopython (2) MetaPhlAn (1) epiScanpy (1)
Bismark (1) MethylDackel (1) fastPHASE (1)
Bowtie (1) Mikado (1) fastahack (1)
Bowtie2 (2) MinPath (1) fastml (1)
Bracken (1) Minipolish (1) fastp (1)
CD-HIT (2) MitoHiFi (1) flowFDA (1)
CMSeq (1) MixMHC2pred (1) genomepy (1)
CSBDeep (1) Monocle3 (1) genozip (1)
Canvas (1) NGS (1) gffutils (1)
CapnProto (3) NanoCaller (1) goalign (1)
CellChat (1) NextGenMap (1) gofasta (1)
CellOracle (1) OMA (1) gotree (1)
ChIPseeker (1) Oases (1) gubbins (1)
CheckM (1) OpenMM (1) hic-straw (1)
Clair3 (1) PALEOMIX (1) hifiasm (1)
Cluster-Buster (1) PAML (1) humann (1)
CmdStanR (1) PAUP (1) iced (1)
ColabFold (1) PHASE (1) inferCNV (1)
Coot (1) PICRUSt2 (1) intervaltree-python (1)
CopyKAT (1) PIPITS (1) kb-python (1)
Crumble (1) PIRATE (1) king (1)
Cytoscape (1) PLINK (1) kma (1)
DALI (1) PRANK (1) kneaddata (1)
DBG2OLC (1) PREQUAL (1) lDDT (1)
DIA-NN (1) Phenoflow (1) leafcutter (1)
DIAMOND (2) PhyloPhlAn (1) lifelines (1)
DSRC (1) PsiCLASS (1) loomR (1)
Delly (1) Pysam (2) loompy (1)
DendroPy (2) QIIME2 (1) mandrake (1)
DiffBind (1) QUAST (1) mapDamage (1)
DoubletFinder (1) QuickTree (1) meRanTK (1)
EDirect (1) R-bundle-Bioconductor (2) medaka (1)
EUKulele (1) RAxML-NG (1) mgltools (1)
Exonerate (1) RDP-Classifier (1) miniasm (1)
FASTA (1) RMBlast (1) minimap2 (3)
FASTX-Toolkit (1) RSEM (1) mosdepth (1)
FLASH (1) RTG-Tools (1) mpath (1)
FastANI (2) Racon (2) mrcfile (1)
FastME (1) RagTag (1) msprime (1)
FastQC (2) Raven (1) muMerge (1)
FastQ_Screen (1) Reads2snp (1) multichoose (1)
FastTree (1) RegTools (1) mygene (1)
Flye (1) RepeatMasker (2) nanoget (1)
FragPipe (1) ResistanceGA (1) nanopolish (1)
FreeSurfer (1) Restrander (1) ncbi-vdb (1)
GATK (1) RnBeads (1) nichenetr (1)
GCTA (1) Roary (1) novaSTA (1)
GD (1) SAMtools (5) ntCard (1)
GDGraph (1) SAP (1) olego (1)
GEM (1) SEPP (2) ont-fast5-api (2)
GFF3-toolkit (1) SHAPEIT (1) ont-guppy (1)
GOATOOLS (1) SMAP (1) oxDNA (1)
GTDB-Tk (1) SMC++ (1) parasail (2)
GTOOL (1) SNAP-HMM (1) pftoolsV3 (1)
GapFiller (1) SNAP (1) phyx (1)
GenMap (1) SPAdes (2) picard (2)
GenomeThreader (1) SRA-Toolkit (2) plinkliftover (1)
GetOrganelle (1) SSAHA2 (1) plot1cell (1)
GffCompare (1) STAR (3) pod5-file-format (1)
GimmeMotifs (1) SUPPA (1) pplacer (1)
Giotto-Suite (1) SURVIVOR (1) preseq (1)
Godon (1) SVIM (1) prodigal (2)
HAPGEN2 (1) SVclone (1) pyBigWig (1)
HH-suite (1) Sabre (1) pyGenomeTracks (1)
HISAT2 (1) Salmon (2) pySCENIC (1)
HMMER (2) Satsuma2 (1) pybedtools (2)
HTSeq (1) SeaView (1) pyfaidx (2)
HTSlib (4) Seaborn (2) pyslim (1)
HTSplotter (1) SeqAn (1) python-parasail (2)
Health-GPS (1) SeqKit (1) qnorm (1)
HiC-Pro (1) Seurat (2) rapidNJ (1)
HiCExplorer (1) SeuratDisk (1) scGSVA (1)
HiCMatrix (1) SeuratWrappers (1) scanpy (1)
Hybpiper (1) Sniffles (1) sceasy (1)
IGV (1) SoupX (1) scikit-bio (1)
IMPUTE2 (1) SpatialDE (1) scrublet (1)
IQ-TREE (1) Strainberry (1) seqtk (2)
ITSx (1) StringTie (1) silhouetteRank (1)
IgBLAST (1) Structure (1) smfishHmrf (1)
Inferelator (1) T-Coffee (1) splitRef (1)
Infernal (1) TM-align (1) spoa (1)
InterProScan (1) TRF (1) sradownloader (1)
Iris (1) TRUST4 (1) starparser (1)
IsoQuant (1) TWL-NINJA (1) tabix (1)
IsoSeq (1) TransDecoder (1) tabixpp (1)
IsoformSwitchAnalyzeR (1) Trimmomatic (1) trimAl (1)
Jasmine (1) USEARCH (1) unimap (1)
Jellyfish (1) UniFrac (1) vcflib (1)
KMC (1) VSEARCH (1) velocyto (1)
KMCP (1) WFA2 (1) wtdbg2 (1)
Kalign (1) WhatsHap (1)
Kraken (1) alleleCount (1)
aws-eb will parse {ID}-{VERSION_ID} as the operating system from /etc/os-release:
(base) ec2-user@froster:~$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2023"
ID="amzn"
ID_LIKE="fedora"
VERSION_ID="2023"
PLATFORM_ID="platform:al2023"
PRETTY_NAME="Amazon Linux 2023"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2023"
HOME_URL="https://aws.amazon.com/linux/"
BUG_REPORT_URL="https://github.com/amazonlinux/amazon-linux-2023"
SUPPORT_END="2028-03-01"
dp@r03:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
[dp@node-08-1 ~]$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
dp@grammy:~/gh/dptests$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
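Given outputs like those above, the {ID}-{VERSION_ID} parsing can be sketched as follows (a hypothetical re-implementation for illustration; aws-eb's actual code may differ):

```python
def os_id(os_release_text):
    """Parse /etc/os-release content into an '{ID}-{VERSION_ID}' string,
    e.g. 'amzn-2023' or 'ubuntu-22.04'. Values may or may not be quoted."""
    fields = {}
    for line in os_release_text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip().strip('"')
    return f"{fields['ID']}-{fields['VERSION_ID']}"
```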