Assembling with Shasta on EC2

FlintMitchell edited this page Oct 29, 2021 · 3 revisions

This page explains how to run Shasta to assemble a genome on AWS.

Shasta on Ubuntu 20.04

Note: On a Linux-based OS, we need to use an x86 architecture. If you ever have questions that aren't answered here, the software's homepage or GitHub repository will usually have documentation on its use. Like wtdbg2, Shasta has great documentation on its GitHub; a link to it is in the resources at the bottom of this page.

If you are using an instance that is already set up to run Shasta, start at step 7 (running the assembly). As of 04/20/21, the current custom EC2 Shasta AMI for GBI has the ID ami-0a5ff45378d20cc4a and the name GBI_ShastaAssembler_Ubuntu_x86_r5.xlarge. To create an instance from it, follow the instructions on the EC2 page.

  1. Start an Ubuntu Server 20.04 instance with a 64-bit (x86) processor.
  2. Log in through the terminal (the default user on Ubuntu AMIs is ubuntu, not ec2-user): ssh -i /path/to/keypairs/keypair.pem ubuntu@ecx-xx-xxx-xxx-xxx.us-west-1.compute.amazonaws.com
  • Ex.: ssh -i /Users/flintmitchell/AWS_keypairs/flints-keypair-1.pem ubuntu@ec2-54-215-243-118.us-west-1.compute.amazonaws.com
  3. If you are using S3, install awscli to gain access to S3 storage buckets: sudo apt install awscli
  4. Make a folder to organize your data and the results of the assembly: mkdir data_folder_name
  • Ex.: mkdir my_genome_assembly
  5. Copy data from your local machine or S3 to your data folder:
  • S3: aws s3 cp s3://[bucket-name]/[desired-file] [path/to/instance/location]
  • SCP: scp -i /path/my-key-pair.pem /local_path/file.filename ubuntu@my-instance-public-dns-name.compute-1.amazonaws.com:ec2_path/destination
  6. Shasta ships with all of its dependencies, so you only need to run the following commands (from the documentation):
  • curl -O -L https://github.com/chanzuckerberg/shasta/releases/download/0.7.0/shasta-Linux-0.7.0
  • chmod ugo+x shasta-Linux-0.7.0
  7. That's all! Now you can run an assembly with the following:
  • ./shasta-Linux-0.7.0 --input [path/to/your-sequencing-data.FASTA]
  8. All the result files will be written to a folder named "ShastaRun".
  9. Download your results.
  • We can once again use the scp command from step 5 (with a slight change) to copy the results to our local storage. We will also use the -r flag, which copies all the files in a given folder recursively. Two single-letter flags can be combined, so -r and -i become -ri. Order matters: -i must come last because it takes the keypair path as its argument, so -ir will not work. The command is scp, -ri, keypair, results-on-EC2-instance, local destination:
  • scp -ri /path/to/keypairs/keypair.pem ubuntu@ecx-xx-xxx-xxx-xxx.us-west-1.compute.amazonaws.com:~/data_folder_name/results_folder_name local/path/to/results_folder
    • Ex.: scp -ri /Users/flintmitchell/Desktop/GBI/AWS_keypairs/flints-keypair-1.pem ubuntu@ec2-44-242-148-188.us-west-2.compute.amazonaws.com:~/ecoli-data/ecoli-ont /Users/flintmitchell/Desktop/GBI/Results
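
The note about flag order in the download step can be checked with a small experiment. The sketch below is not scp itself: parse_scp_flags is a hypothetical helper, made up for illustration, that uses the shell's getopts builtin with the option string "ri:" (r takes no argument, i takes one). It assumes that scp reads its flags in the same getopt style.

```shell
# parse_scp_flags: hypothetical helper (not part of scp) that mimics how a
# getopt-style parser reads a combined -r / -i flag cluster.
parse_scp_flags() {
  OPTIND=1          # reset the parser so the function can be called repeatedly
  recursive=no
  identity=
  while getopts "ri:" opt "$@"; do
    case "$opt" in
      r) recursive=yes ;;        # -r takes no argument
      i) identity=$OPTARG ;;     # -i consumes whatever follows it as its argument
      *) return 1 ;;
    esac
  done
  echo "recursive=$recursive identity=$identity"
}

parse_scp_flags -ri keypair.pem   # prints: recursive=yes identity=keypair.pem
parse_scp_flags -ir keypair.pem   # prints: recursive=no identity=r
```

With -ir, the parser takes the trailing r as the keypair path and never sees a recursive flag, so the real keypair is left over as an ordinary argument. That is presumably why the flags have to be written as -ri.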

Just like with the other assemblers, I will be updating this page with more information on how Shasta actually works and more about the parameters that we can change to optimize our assemblies.


Resources for Shasta:

https://chanzuckerberg.github.io/shasta/QuickStart.html#QuickStartLinux

https://www.biorxiv.org/content/10.1101/715722v1.full.pdf
