The AWS infrastructure for hosting a private instance of Nextflow Tower (see link below) and executing Nextflow workflows is defined in this repository and deployed using CloudFormation via Sceptre.
The Nextflow infrastructure has been vetted by Sage IT to process sensitive or controlled-access (e.g. PHI) data. Notably, only HIPAA-eligible AWS services are deployed.
Click the link below and log in with your @sagebase.org Google account:
➡️ Nextflow Tower @ Sage Bionetworks ⬅️
Follow the Tower User Onboarding instructions below. Access is currently restricted to Sage Bionetworks staff. See below for how to get help if you run into any issues.
Read through the contribution guidelines for more information. Contributions are welcome from anyone!
Message us in the #workflow_users Slack channel or email us at nextflow-admins[at]sagebase[dot]org.
Before you can use Nextflow Tower, you first need to deploy a Tower project, which consists of an encrypted S3 bucket and the IAM resources (i.e. users, roles, and policies) that Tower requires to access the encrypted bucket and execute workflows on AWS Batch. Once these resources exist, they need to be configured in Nextflow Tower, a process that has been automated using CI/CD.
- Create a stack name by following this naming convention: concatenate a project name with the suffix `-project` (e.g. `imcore-project`, `amp-ad-project`, `commonmind-project`). Due to limits imposed by Tower, the stack name cannot contain more than 32 characters.

  N.B.: Anytime that `<stack_name>` appears below with the angle brackets, replace the placeholder with the actual stack name, omitting any angle brackets.
- Create an IT JIRA ticket requesting membership to the following JumpCloud groups for anyone who needs read/write or read-only access to the S3 bucket:

  - `aws-sandbox-developers`
  - `aws-workflow-nextflow-tower-viewer`

  To confirm whether you're already a member of these JumpCloud groups, expand the AWS Account list on this page (after logging in with JumpCloud) and check whether you have `Developer` listed under `org-sagebase-sandbox` and `TowerViewer` listed under both `workflows-nextflow-dev` and `workflows-nextflow-prod`.
- Open a pull request on this repository in which you duplicate `config/projects/example-project.yaml` as `<stack_name>.yaml` in the `projects/` subdirectory and then follow the numbered steps listed in the file. Note that some steps are required whereas others are optional. An illustrative sketch of what such a file might contain appears after these steps.

  N.B.: Here, read/write vs. read-only access refers to the level of access granted to users for the encrypted S3 bucket and for the Tower workspace (more details below). Given that access is granted to the entire bucket, you might want to create more specific Tower projects that provide more granular access control.

  Getting Help: If you are unfamiliar with Git/GitHub or don't know how to open a pull request, see above for how to get help.
- Once the pull request is approved and merged, confirm that your PR was deployed successfully. If so, the following happened on your behalf:

  - Two S3 buckets were created (listed below), and users listed under `S3ReadWriteAccessArns` and `S3ReadOnlyAccessArns` have read/write and read-only access, respectively. The two buckets serve different purposes:

    - `s3://<stack_name>-tower-bucket/`: This bucket is intended for archival purposes, i.e. to store files in the long term. It can also be indexed by Synapse by default. Whenever you specify the `outdir` or `publishDir` parameters for a workflow, they should generally point to an S3 prefix in this bucket (see the example launch parameters at the end of this section).
    - `s3://<stack_name>-tower-scratch/`: This bucket is intended for scratch storage, i.e. to store files in the short term. The important difference is that files in this bucket are automatically deleted after 6 months; this delay can be adjusted with the `ScratchLifecycleExpiration` parameter. This is a convenience feature so users don't have to worry about cleaning up after themselves while still benefitting from caching if the need arises (presumed here to be generally within 6 months). This bucket cannot be indexed by Synapse. It is ideal for storing the Nextflow work directories (configured on each compute environment by default) and for staging files from Synapse, since those files already exist elsewhere.
  - All users listed under `S3ReadWriteAccessArns` and `S3ReadOnlyAccessArns` were added to the Sage Bionetworks organization in Tower.
  - A new Tower workspace called `<stack_name>` was created under this organization.
  - Users listed under `S3ReadWriteAccessArns` were added to a workspace team with the `Maintain` role, which grants the following permissions: the users can launch pipelines and modify pipeline executions (e.g. change the launch compute environment, parameters, pre/post-run scripts, and Nextflow config) and create new pipeline configurations in the Launchpad, but they cannot modify compute environment settings or credentials.
  - Users listed under `S3ReadOnlyAccessArns` were added to a workspace team with the `View` role, which grants the following permissions: the users can access the team's resources in read-only mode.
  - A set of AWS credentials called `<stack_name>` was added under this Tower workspace.
  - An AWS Batch compute environment called `<stack_name> (default)` was created using these credentials, with a default configuration that should satisfy most use cases.

  N.B.: If you have special needs (e.g. more CPUs, on-demand EC2 instances, FSx for Lustre), see above for how to contact the administrators, who can create additional compute environments in your workspace.
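For reference, here is a minimal, illustrative sketch of the kind of Sceptre stack config that `<stack_name>.yaml` provides. Treat `config/projects/example-project.yaml` as the source of truth: the `S3ReadWriteAccessArns`, `S3ReadOnlyAccessArns`, and `ScratchLifecycleExpiration` parameters are the ones described in this README, while the template path, example ARNs, and value formats below are assumptions made purely for illustration.

```yaml
# Illustrative sketch only: copy config/projects/example-project.yaml and follow
# the numbered steps in that file. The template path and values below are
# assumptions, not the repository's actual contents.
template_path: templates/tower-project.yaml  # assumed; keep whatever the example file references
parameters:
  # Granted read/write access to the buckets and the Maintain role in Tower
  S3ReadWriteAccessArns:
    - "arn:aws:iam::123456789012:user/jane.doe@sagebase.org"
  # Granted read-only access to the buckets and the View role in Tower
  S3ReadOnlyAccessArns:
    - "arn:aws:iam::123456789012:user/john.doe@sagebase.org"
  # Optional: how long files live in the scratch bucket before automatic
  # deletion (defaults to roughly 6 months; the value format here is assumed)
  ScratchLifecycleExpiration: "180"
```

Because the config file is named `<stack_name>.yaml`, the stack name itself must follow the `-project` naming convention and 32-character limit described above.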
- Log into Nextflow Tower using the link at the top of this README and open your project workspace. If you were listed under `S3ReadWriteAccessArns`, then you'll be able to add pipelines to your workspace and launch them on your data.
- Check out the Getting Started with Nextflow and Tower wiki page for additional instructions on how to develop workflows in Nextflow and deploy/launch them in Tower.
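To tie the two buckets above to day-to-day usage: a common pattern consistent with this README is to stage inputs and keep Nextflow work directories under the scratch bucket, and to publish final results to the archival bucket. The snippet below is a hypothetical set of launch parameters written in YAML (which the Tower launch form accepts for pipeline parameters); the `input` parameter name and the S3 prefixes are illustrative assumptions, and the actual parameter names depend on the pipeline you launch.

```yaml
# Hypothetical launch parameters; replace <stack_name> and adjust the parameter
# names to match the pipeline you are launching.
input: "s3://<stack_name>-tower-scratch/staging/samplesheet.csv"  # short-lived inputs staged from Synapse
outdir: "s3://<stack_name>-tower-bucket/outputs/my-analysis/"     # long-term results, indexable by Synapse
```

Files under the scratch bucket are removed automatically once the lifecycle period elapses, while anything published to `s3://<stack_name>-tower-bucket/` persists and can be indexed by Synapse.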
This repository is licensed under the Apache License 2.0.
Copyright 2021 Sage Bionetworks