This repository contains a workflow for the Biodepot-workflow-builder (Bwb) implementing a benchmarking workflow for CLIJ, a library and extension for the Fiji image processing suite allowing the use of GPUs for accelerated processing (using OpenCL). Our goal is to provide an open-source methodology that allows users to utilize GPUs to process their image data using on-demand cloud computing.
The workflow implemented is the one described in the Supplementary Materials section of the original CLIJ paper (see References below), with some modifications to adapt it to the Bwb platform. Like all Bwb workflows, the benchmarking workflow is containerized, which means it can be deployed on a more powerful cloud server with minimal effort, and does not require installation of anything besides Docker (and video drivers). This repository provides detailed instructions on how to set up and launch CLIJ in the Bwb using Amazon Web Services (AWS).
CITATION for this work: Accelerated and Reproducible Fiji for image processing using GPUs on the cloud. Ling-Hong Hung, Evan Straw, Zachary Colburn, Ka Yee Yeung. Pre-print bioRxiv 10.1101/2022.07.15.500283. doi: https://doi.org/10.1101/2022.07.15.500283
- Requirements
- AWS Setup
- Manual Installation
- Usage
- More on the AMI
- Licensing
- References
The workflow should be run on a machine with an NVIDIA GPU and appropriate drivers installed. (Note that AMD GPUs are not currently supported: Bwb uses the --gpus option in Docker, which does not support AMD GPUs at the time of writing; this is an open issue, see docker/cli#2063.) Use of a cloud server is recommended; the workflow has been tested on Amazon Web Services' g4dn.* instances, which have NVIDIA T4 GPUs. See AWS Setup below.
If using an AWS cloud server, the use of our Amazon Machine Image (AMI) is recommended; this contains an installation of Ubuntu with all the necessary GPU drivers and scripts to run the workflows preinstalled. See Using the AMI below.
Additionally, AWS may impose a limit (often 0!) on the number of vCPUs (not the number of instances) on GPU-enabled instances that new users may create; you may need to request a limit increase if this is the case. The recommended instance type (g4dn.2xlarge) has 8 vCPUs. You can check your current limits at https://us-east-2.console.aws.amazon.com/ec2/home?region=us-east-2#Limits (replace us-east-2 with your desired region) by searching for "Running on-demand G instances". If the limit is too low (or zero), you can select it and press "Request limit increase".
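As a rough illustration of the limit arithmetic above, the following shell sketch computes how many instances of a given size fit under a vCPU limit. The function name max_instances is hypothetical and not part of this repository; the limit itself still has to be read from the AWS console.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
#   max_instances VCPU_LIMIT VCPUS_PER_INSTANCE
# Prints how many instances of the given size fit under the vCPU limit.
max_instances() {
  local limit="$1" per_instance="$2"
  echo $(( limit / per_instance ))
}

# g4dn.2xlarge has 8 vCPUs, so a limit of 8 allows one instance,
# and a limit of 0 allows none.
max_instances 8 8
max_instances 0 8
```

If the result is 0 for the instance type you want, a limit increase is needed before the instance can be launched.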
If you are not using an AWS cloud server or you do not wish to use the AMI, see Manual Installation below.
Bwb requires that Docker be installed (instructions here). Additionally, you will need to have the NVIDIA Container Toolkit installed to make the GPU available to Docker containers.
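A quick preflight check for these requirements might look like the sketch below. The helper missing_tools is hypothetical (it is not shipped with this repository), and the check only looks for commands on PATH; it does not verify that the NVIDIA Container Toolkit is configured for Docker.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of this repository):
# prints the name of every listed command that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# docker and nvidia-smi are needed to run the workflow; git is needed to clone it.
missing="$(missing_tools docker nvidia-smi git)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on one line.
  echo "Not found on PATH:" $missing
else
  echo "docker, nvidia-smi and git were all found."
fi
```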
The following are instructions for launching a virtual machine
instance on AWS to run the workflow. You will need to log in to the
AWS console before continuing; additionally,
if you wish to use the AMI, you must make sure your region (found in
the top-right corner next to your username) is set to us-east-2
(Ohio), since the AMI is stored in that region.
If you are using the AMI, you may skip the steps Launch instance and Choose an OS image below by using this link. Otherwise, continue reading below.
Additionally, these instructions are available in video form here: https://youtu.be/Z_SHw1mU2JM
Go to the EC2 Console (can also be found by searching "EC2" from the main AWS console), and press the orange "Launch Instance" button.
The "Launch an instance" form will come up; first, give your instance a name in the box at the top of the screen.
Next, you will need to choose an operating system image to be installed on the instance. Scroll down to the "Application and OS Images" section on the form, and make sure the "Quick Start" tab is selected.
To select the AMI, click the "Browse more AMIs" button to the right. Then, once the "Choose an Amazon Machine Image" page comes up, select the "Community AMIs" tab, and search for bwb-clij-benchmarking-latest to find the latest version of the AMI. (Alternatively, you can search bwb-clij-benchmarking to bring up specific versions to use.)
Available versions are:
- bwb-clij-benchmarking-latest (AMI ID: ami-0ae86208aad96a06e)
  - NOTE: The "latest" AMI is just a copy of the latest version of the AMI, but under a name that is easier to search; when a new version is released, the "latest" AMI is deleted and replaced with a copy of the new latest version.
- bwb-clij-benchmarking_v0.4_20220926 (AMI ID: ami-0a86efe9d0f29abf6)
- clij-benchmarking_v0.3_2022-07-05 (AMI ID: ami-0e4eb88d9cacd5be9)
When you have found the version of the AMI you want to use, click the orange "Select" button on the right.
If you are doing manual setup, click the "Ubuntu" button in the list of operating systems and then choose Ubuntu Server 20.04 LTS from the drop-down menu.
Next, you will need to choose the type of instance you wish to launch; this will configure the hardware installed on the instance (CPU cores, memory, etc.). Scroll down to the "Instance type" section of the form. Then, click the drop-down menu and search for g4dn; the G4dn instances have NVIDIA GPUs installed, which are currently required for the workflow. You can choose any of the G4dn instance types; they have varying CPU core counts and memory/storage sizes (but larger ones may also cost more per hour). We have tested the workflow on g4dn.2xlarge instances, so we recommend this instance type.
Next, you will need to create an SSH key pair (or choose one that you have previously created) to be installed onto the instance. A key pair consists of a private key, which will be downloaded onto your computer and used to access the instance, and a public key, which will be installed onto the instance. Scroll down to the "Key pair (login)" section in the form.
If you have already done this before (unlikely for new users), and still have access to the key file, select the key pair you previously created from the drop-down menu, and continue to the next step.
Otherwise, click the "Create new key pair" link. A window will pop up where you can give the key pair a name and optionally choose what key type you would like to create. You can leave the key pair type as "RSA"; then, for the private key format, choose ".pem" (unless you wish to use PuTTY; see Connecting below).
Then, click the orange "Create key pair" button when you are finished. The private key file will download in your web browser; for security reasons, you will not be able to download it again, so hold onto it! If you lose access to your private key, you may be unable to access your instance.
Next, you will need to set up a firewall security group, to allow access to the Bwb server that will run on your instance. Scroll down to the "Network settings" section of the form, and press the "Edit" button in the top right.
After doing so, you can optionally give the security group your own name and description (or just use the one already generated for you); then, press "Add security group rule" until there are three security group rules (the form starts with one).
For each of the rules, you will be able to choose whether you would like to allow incoming traffic from any computer ("Anywhere") or only from the public IP address of your network ("My IP"). Anywhere will allow access if your computer changes networks, but will allow anyone with the address to access your Bwb virtual desktop, and possibly files on the instance that are mapped to data volumes in Bwb.
Set up the security group rules like so:
- Description: "SSH access"
- Type: "ssh"
- Source Type: "Anywhere" or "My IP" (see above)
- Description: "HTTP access for Bwb"
- Type: "Custom TCP"
- Port Range: 6080
- Source Type: "Anywhere" or "My IP" (see above)
- Description: "VNC access for Bwb"
- Type: "Custom TCP"
- Port Range: 5900
- Source Type: "Anywhere" or "My IP" (see above)
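For readers who prefer the AWS CLI to the console form, the three rules above could be expressed roughly as follows. This is a sketch under stated assumptions: the function print_ingress_rules is hypothetical, the security group ID and CIDR are placeholders, and the function only prints the aws ec2 authorize-security-group-ingress commands so they can be reviewed before being run by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
#   print_ingress_rules GROUP_ID CIDR
# Prints, without running them, one AWS CLI command per rule:
# port 22 (SSH), 6080 (HTTP access for Bwb) and 5900 (VNC access for Bwb).
print_ingress_rules() {
  local group_id="$1" cidr="$2" port
  for port in 22 6080 5900; do
    echo "aws ec2 authorize-security-group-ingress" \
         "--group-id ${group_id} --protocol tcp --port ${port} --cidr ${cidr}"
  done
}

# Placeholder values: replace with your own security group ID and address.
# A /32 CIDR corresponds to "My IP"; 0.0.0.0/0 corresponds to "Anywhere".
print_ingress_rules "sg-EXAMPLE" "203.0.113.7/32"
```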
Finally, scroll down to the "Configure storage" section of the form, and ensure that the size of the root volume is 100GB, and the type is "gp3". If you are using the AMI, this should already be set correctly; if you are doing manual setup, you will need to set this yourself.
If you are doing manual setup, you don't necessarily need to make the root volume 100GB, but keep in mind that the root volume must hold the operating system, Docker images, and GPU drivers.
Then, when you have verified that all settings are correct and are ready to launch the instance, click the orange "Launch instance" button in the "Summary" column on the side of the screen.
Once your instance is launched, you will need to log in to it and start the Bwb server. You will need a terminal with a Secure Shell (SSH) client installed. On macOS and Linux, the built-in terminal application should work; on Windows, you will need to install a client. There are a few options, such as MobaXterm or Cygwin (you may need to ensure that SSH packages are installed), or PuTTY if you prefer a graphical client.
Once your instance has launched, you can click the instance ID (highlighted as a blue hyperlink) in the success message to go to the instance details screen.
You will then be taken to the "Instances" screen; a filter will be applied so that the only instance listed is the one you just launched. Click on it in the list, and more information should appear in the pane at the bottom. You will want to locate the "Public IPv4 Address" section, and copy the IP address shown there. This is the public IP address of your instance, which we will need later to connect to it. Note that anyone with this address can access the Bwb server on your instance, so don't share it!
If you are connecting from a terminal with a command-line SSH client, first locate the private key file you downloaded in the previous steps (it will have a .pem extension, and will often be located in the "Downloads" folder). Then, open your terminal application.
For Linux and Mac users, if it is the first time connecting to the instance, you will need to change the permissions on the private key; for security, SSH clients often refuse to use private keys if they are accessible by other users on the system. Run the following command, replacing <path to private key> with the file path to the private key file you located; this can often be inserted by dragging and dropping the file onto the terminal window.
# Set permissions of private key to 0400
# (read-only for you, no access for any other users)
chmod 400 <path to private key>
You do not need to do this more than once; if you have used this key already, move on to the next paragraph.
Finally, you can connect using the following command, replacing <path to private key> with the file path to the private key (can often be inserted by dragging and dropping the file onto the terminal window), and <instance IP address> with the instance's public IP address that you located earlier in the AWS console.
# Log into user "ubuntu" on <instance IP address>, using the
# key at <path to private key> for authentication
ssh -i <path to private key> ubuntu@<instance IP address>
You may get a warning saying something like "The authenticity of host 'X.X.X.X' can't be established", and asking if you still want to connect; this happens the first time you connect to a new computer via SSH, since your client has never seen its public key before. Type "yes" and press Enter to continue connecting.
Note that your instance might take a while (often less than 5 minutes) to actually start its operating system, and so it might reject your connection if it is not ready yet. If this happens, just wait a while and retry. In many terminal applications, you can retry a previous command by pressing the up arrow and then Enter.
Once you have connected, you will get a long welcome message with the current system information of your instance, and you are ready to move on to Starting the Bwb server below.
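If the instance is still booting, the wait-and-retry advice above can be automated with a small loop. The retry function below is a hypothetical sketch, not something provided by the AMI or this repository; it reruns any command until it succeeds or a maximum number of attempts is reached.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
#   retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on success, 1 if every attempt failed.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Example (placeholders as in the ssh command above; try up to 10 times,
# waiting 30 seconds between attempts):
#   retry 10 30 ssh -i <path to private key> ubuntu@<instance IP address>
```

Note that this sketch retries on any non-zero exit status, so it will also reconnect if an established SSH session later ends with an error.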
If you are using PuTTY, first locate the private key you downloaded earlier (it will have a .pem or .ppk extension, and will often be located in the "Downloads" folder). If it has a .ppk extension, continue on to the next paragraph; if it has a .pem extension instead, you will need to convert it to the PPK format to use it in PuTTY. Open the PuTTYGen program, press the "Load" button, and find your private key file; you will likely have to change the filter in the file window from "PuTTY Private Key Files (*.ppk)" to "All Files". Then, once you have opened the file, press "Save private key" and choose a location and file name (make sure it ends in the .ppk extension) for the converted private key file.
If you have ensured that your private key is in PPK format, then open the main PuTTY program. You will then need to set the following settings:
- Session > Host Name (or IP address): set to the instance IP address you located earlier in the AWS console.
- Connection > Data > Auto-login username: Set to "ubuntu".
- Connection > SSH > Auth > Private key file for authentication: click "Browse" and locate the PPK private key file.
You may wish to save this session using the "Saved Sessions" section under the "Session" tab, but keep in mind that the IP address may change if you stop your instance and restart it.
After entering these settings, click "Open" to begin connecting. A terminal window will open, where you should get a long welcome message with the current system information of your instance; when this happens, you are ready to move on to Starting the Bwb server below.
If you are using the AMI, once you have logged into the instance, you can perform all necessary setup and start the Bwb server with the following command:
./start_all.sh
The script will output messages describing what it is doing; once you see success messages, you can move on to the Usage section to see how to connect to Bwb and run the workflow.
For more information about what this command does, see General/Setup Scripts below.
If you are not using the AMI, see Start the Bwb server under Manual Installation below.
The following instructions should work for an Ubuntu 20.04 system.
# Update software repositories and perform any necessary upgrades:
sudo apt update && sudo apt upgrade
# Install NVIDIA server driver 510:
sudo apt install nvidia-driver-510-server
# Hold the nvidia driver at the current version to prevent unintentional updates
sudo apt-mark hold nvidia-driver-510-server
You may need to reboot your system to finalize installation of the drivers. Afterwards, check if your GPU is recognized by running
nvidia-smi
When it is run, the workflow will create a directory called benchmark in your current working directory, which will contain all data generated or downloaded by the workflow (including ~40GB of images). Before cloning the workflow repository, choose an appropriate working directory, and navigate to it with cd. For example, if we wanted to store the workflow and all its data in Documents, we would use
cd Documents
On Linux, you may also be able to use the file browser to find an appropriate directory, right-click it, and then select "Open in Terminal" to launch a terminal already pointed to that working directory.
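Since the workflow downloads roughly 40GB of images, it can help to confirm that the chosen directory has enough free space first. The sketch below uses df for this; the helper names free_kb and enough_space are hypothetical, and the 40GB threshold is simply the approximate figure quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# free_kb DIR: prints the free space, in KiB, of the filesystem holding DIR.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space REQUIRED_GIB DIR: succeeds if DIR has at least REQUIRED_GIB free.
enough_space() {
  local need_kb=$(( $1 * 1024 * 1024 ))
  [ "$(free_kb "$2")" -ge "$need_kb" ]
}

if enough_space 40 .; then
  echo "The current directory has at least 40 GiB free."
else
  echo "The current directory has less than 40 GiB free."
fi
```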
Clone this repository with
git clone https://github.com/biodepot/fiji-clij
Navigate into the repository with
cd fiji-clij
Run the command
sudo docker run --rm \
  -p 5900:5900 -p 6080:6080 \
  -v "${PWD}":/data \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  --privileged --group-add root \
  biodepot/bwb:imaging__latest
This will forward ports 5900 (for VNC access) and 6080 (for HTTP access) on your local machine to those same ports within the container, "pass through" access to your machine's Docker daemon and display server, and map your current working directory to the /data directory inside the container. As mentioned earlier, a directory called benchmark will be created inside your current working directory (or /data/benchmark inside the container), containing all data generated by the workflow.
See "Overview: Running Bwb" in the Bwb documentation for more details.
To use Bwb, the user can use a browser (Chrome/Firefox/Safari) or a VNC client (e.g. RealVNC). Instructions are given in the Bwb documentation. In most cases, the browser should be pointed to http://localhost:6080 if the Bwb server was started on a local machine, or to http://<ip of the remote machine>:6080 if it was started on a remote or cloud server. In addition, to connect to Bwb on the cloud, the corresponding ports must be opened so that the browser or VNC client can communicate with Bwb. The exact methodology will depend on the cloud provider; for Amazon Web Services, see the security group rules under AWS Setup above.
- From the Bwb menu bar, select File > Load Workflow.
- Navigate to /data, select clij_benchmarking_workflow, then press "Choose".
- Double click on the "Start" widget and press the blue "Start" button to start the workflow.
The workflow will download image data and benchmarking macros, and then perform the same sequence of image processing operations on the dataset using both the GPU and the CPU, and then compare the runtime and results of the two analyses.
Please note that the CPU benchmark may take a very long time to complete depending on how many image datasets are used - the runtime scales linearly with the number of image datasets used, since image data are sequentially processed.
By default, the workflow uses 300 images from the dataset used in the CLIJ paper for benchmarking; this can be adjusted by double clicking on the "Download Images" widget and modifying the parameters. Each image dataset is ~127MB, so 300 image datasets will occupy ~38GB.
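The storage estimate above is simple arithmetic and can be reproduced for other image counts. In the sketch below, estimate_gb is a hypothetical helper that assumes the approximate figure of 127MB per image dataset given above and uses decimal units (1GB = 1000MB).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
#   estimate_gb IMAGE_COUNT
# Prints the approximate download size in whole GB,
# assuming ~127 MB per image dataset (figure quoted in this README).
estimate_gb() {
  local count="$1" mb_per_image=127
  echo $(( count * mb_per_image / 1000 ))
}

estimate_gb 300   # default number of images
estimate_gb 607   # entire dataset
```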
The entire dataset consists of 607 images, numbered 000000 through 000606. The "Pattern" field of the "Download Images" widget contains a printf-style format string; by default it is the URL of one of these image datasets, with a %06d placeholder that will be replaced by an image number, zero-padded to 6 digits. The number will range from the value in the "Minimum Value" field to the value in the "Maximum Value" field, inclusive, and files will be placed in the directory specified in the "Output directory" field. Additionally, the "Replace Existing Files" checkbox can be used to control whether existing files should be replaced; since the dataset is very large, the user may choose to only download missing files and skip existing ones.
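To see how such a pattern expands, the shell's own printf can be used. This is only an illustration of the %06d placeholder, not the widget's implementation: expand_pattern is a hypothetical helper, and the example.org URL is a made-up stand-in for the real pattern shown in the widget.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
#   expand_pattern PATTERN MIN MAX
# Prints PATTERN once for each number from MIN to MAX (inclusive),
# substituting the number for the printf-style placeholder.
# MIN and MAX should be plain integers without leading zeros.
expand_pattern() {
  local pattern="$1" i="$2" max="$3"
  while [ "$i" -le "$max" ]; do
    # The pattern is intentionally used as the printf format string.
    printf "${pattern}\n" "$i"
    i=$(( i + 1 ))
  done
}

# Made-up URL pattern for illustration; prints the names 000000 to 000002.
expand_pattern 'https://example.org/images/%06d.tif' 0 2
```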
Note that for simplicity, the benchmarking workflows will use all the image datasets in the chosen output directory; if you have already downloaded image data once, and wish to re-run the workflow with a smaller subset of image data, either delete the image data you do not wish to use, or move them to another directory.
The AMI contains an installation of Ubuntu 20.04 with Docker and NVIDIA graphics driver packages preinstalled, as well as several utility scripts for using the workflow, in the home directory of the ubuntu user.
If you are using a g4dn instance (recommended, since an NVIDIA GPU is required for the workflow) then your instance will have an ephemeral "instance store" drive physically attached to the machine; this is where the workflow, image data, and results will be stored. This drive will be mounted at /mnt/data, which is symbolically linked to data in the home directory. You can simply do cd data to access the drive. If you are using an instance that doesn't have an instance store drive, /mnt/data will simply be a regular directory, and is still usable for storing data.
- ./start_all.sh: Recommended for a quick start. Runs all the setup scripts in order, and then starts the Bwb server.
The individual setup scripts are (in recommended execution order):
- ./mount_disks.sh: Formats the instance store drive(s), if any, and then mounts them to /mnt/data. If multiple instance store drives are present, they will be formatted into a RAID-0 array and presented as one large disk. Must be done before any other operations if you wish to use the instance store drive; otherwise, the root volume will be used.
- ./download_workflow.sh: Downloads the latest version of this workflow to /mnt/data via git clone. If another version of the workflow already exists, it will be updated via git pull.
- ./update_bwb.sh: Downloads the latest version of the Bwb Docker image available on DockerHub.
- ./start_bwb.sh: Starts the Bwb server.
The backup scripts are intended to allow you to store a backup of your data/analysis results on Amazon S3, since the instance store drive on which they are stored is deleted when the instance shuts down.
These scripts will not work out of the box, and require AWS credentials to be set up first. If you have not already, you will need to generate an access key for the AWS Command-Line Interface by logging in to the AWS console, clicking on your username at the top-right, and selecting "Security Credentials". Once on that page, click "Create access key" and make sure to save the credentials that you are given, since you won't be able to access them through the console again.
Once you have an access key and an access key secret, run the command
aws configure
and enter the access key and secret when prompted. Afterwards, you should be able to use the scripts.
- ./create_bucket.sh: This script will create an S3 bucket for the backup data (if one does not already exist).
- ./pull_backup1.sh: This script will download the contents of backup1 (i.e. the most recent backup) on S3 to the instance store drive.
- ./sync_data.sh: This script will create a new backup by moving backup1 to backup2 on S3, and then copying the contents of the instance store drive to backup1 on S3.
- ./sync_start_bwb.sh: This script will create a new backup (by calling ./sync_data.sh) first, and then start Bwb (by calling ./start_bwb.sh).
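The two-generation rotation described for the sync script can be sketched with the AWS CLI as below. This is not the contents of the scripts shipped on the AMI, which may differ; the function rotate_backup, the DRY_RUN switch, and the bucket name are all hypothetical. By default the sketch only prints the aws s3 commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (NOT the actual script from the AMI).
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# rotate_backup BUCKET SOURCE_DIR
# 1. Moves the current backup1 to backup2 on S3.
# 2. Copies SOURCE_DIR to backup1 on S3.
rotate_backup() {
  local bucket="$1" src="$2"
  run aws s3 mv "s3://${bucket}/backup1" "s3://${bucket}/backup2" --recursive
  run aws s3 sync "$src" "s3://${bucket}/backup1"
}

# Placeholder bucket name; /mnt/data is the instance store mount point.
rotate_backup "my-example-bucket" "/mnt/data"
```

Setting DRY_RUN=0 would execute the commands, which requires the AWS credentials configured with aws configure as described above.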
This workflow is free software, under the BSD license; see LICENSE.txt for more information.
The ImageJ macros and Jupyter notebooks in clij_files are adapted from those in the original CLIJ benchmarking repository by the authors of CLIJ, and are licensed under the BSD license as well. See clij_files/LICENSE.txt for more information.
- CLIJ: Robert Haase, Loic Alain Royer, Peter Steinbach, Deborah Schmidt, Alexandr Dibrov, Uwe Schmidt, Martin Weigert, Nicola Maghelli, Pavel Tomancak, Florian Jug, Eugene W Myers. CLIJ: GPU-accelerated image processing for everyone. Nat Methods 17, 5-6 (2020). doi:10.1038/s41592-019-0650-1. The workflow is taken from the Supplementary Materials section of this paper.
- Dataset (the one given in the Supplementary Materials section of the CLIJ paper): Haase, R. (2019). Workflow benchmarking data, drosophila lightsheet. doi:10.17617/1.8J
- Fiji: Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., … Cardona, A. (2012). Fiji: an open-source platform for biological-image analysis. Nature Methods, 9(7), 676–682. doi:10.1038/nmeth.2019