SRAnwrp ("Saran Wrap") envelops several SRA-related tools in the warm, polyethylene embrace of a single Ubuntu-based Docker image and some optional assorted workflows. For the sake of simplicity, releases on main follow the same versioning scheme as the Docker image.
The combination of e-direct and sra-tools allows it to do basically anything you can do from SRA's website. The workflows are written in WDL -- more on WDL here.
- Pull paired FASTQs from a list of run accessions (SRR/ERR/DRR)
- Pull paired FASTQs from a list of BioSample accessions (SRS or SAME notation)
- Plus some bonus non-workflow pulling tasks
- Note -- as a pre-3.0.5 version of fasterq-dump is being used, pulling non-Illumina fastqs is not supported.
- Note -- it is recommended you set the disk_size variable to 20x the size of the largest .sra that you want to download.
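
To make the 20x rule of thumb concrete, here is a minimal sketch of the arithmetic. The function name is hypothetical, and it assumes disk_size is given in whole gigabytes, which this README does not state; adjust the unit to whatever the workflow actually expects.

```python
import math


def recommended_disk_size_gb(sra_sizes_bytes, multiplier=20):
    """Return multiplier x the largest .sra size, rounded up to whole GB.

    Assumption: disk_size is expressed in gigabytes (10**9 bytes).
    """
    if not sra_sizes_bytes:
        raise ValueError("need at least one .sra size")
    largest = max(sra_sizes_bytes)
    # Never return less than 1 GB, even for tiny inputs.
    return max(1, math.ceil(largest * multiplier / 10**9))


# Two .sra files of 1.5 GB and 0.4 GB: the largest one drives the estimate.
print(recommended_disk_size_gb([1_500_000_000, 400_000_000]))  # 30
```

Only the largest file matters here because the rule is stated per download, not for the sum of all downloads.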
There are a lot of BioProjects on SRA, and some of them are multi-species. Use this workflow to get a list of all run accessions, and said run accessions' species and TaxIDs, from a list of BioProject accessions. If you instead have a list of BioSamples, use this workflow to get species and TaxID (as well as a list of all run accessions).
If you have a list of run accessions, this workflow will get a list of sample accessions that they cover. Some samples have more than one run -- those samples will only appear in the output once.
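
The "each sample appears once" behaviour can be pictured with a small sketch. This is not the workflow's own code: the function name and the accessions are placeholders, and it assumes you already have (run, sample) pairs in hand.

```python
def group_runs_by_sample(run_sample_pairs):
    """Map each sample accession to its runs, keeping first-seen order.

    A sample with several runs still gets exactly one key.
    """
    runs_by_sample = {}
    for run, sample in run_sample_pairs:
        runs_by_sample.setdefault(sample, []).append(run)
    return runs_by_sample


# Placeholder accessions for illustration only.
pairs = [
    ("SRR0000001", "SAMN00000001"),
    ("SRR0000002", "SAMN00000001"),  # second run of the same sample
    ("SRR0000003", "SAMN00000002"),
]
grouped = group_runs_by_sample(pairs)
print(list(grouped))  # sample accessions, each listed once
```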
Here are some other tasks that can help you convert between data types.
Non-exhaustive list:
- The TB reference genome and a BED of its commonly masked regions
- bash-5.1.16(1)-release
- bedtools-latest
- bc-latest
- bcftools-1.16
- cpan-latest
- curl-latest
- entrez-direct-latest (aka edirect)
- gcc-latest
- git-latest
- htslib-1.16
- make-latest
- Matplotlib-latest
- numpy-latest
- pandas-latest
- pigz-latest
- python-3.12
  - note: must be called with `python3` instead of `python` (and `pip3` instead of `pip`) when running non-interactively
- samtools-1.16
  - mpileup, minimap2, fixmate, etc
- seqtk-latest
- sra-tools-3.0.1 (aka SRAtools, SRA tools, SRA toolkit, etc)
  - align-info, fastq-dump, fasterq-dump, prefetch, sam-dump, sra-pileup, etc
  - fyi: ncbi/ncbi-vdb was merged with sra-tools in sra-tools-3.0.0 and vdb-get was retired in 3.0.1
- sudo-latest
- taxoniumtools-latest
- tree-latest
- vim-latest
- wget-latest
Right now, the image is built and pushed manually. You'll need to include your own copy of the TB reference tarball -- it can be created with clockwork refprep, or downloaded from this Google bucket. MD5s are provided in this repo as a double-check.
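
If you want to script the MD5 double-check, a minimal sketch might look like the following. The function names are hypothetical, and the tarball name and expected digest are yours to fill in from this repo; nothing here is part of the build itself.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected_md5(path, expected_md5):
    """Compare a file's MD5 against an expected digest, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat even for a large reference tarball. Call `matches_expected_md5()` with the path to your tarball and the digest listed in this repo before building.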
- Docker Hub's latest version of staphb/sratoolkit, as of my writing this in October 2022, runs version 2.9.2 (see command 15), which doesn't work at all anymore
- Existing Docker images tend to contain either the SRA toolkit or Entrez Direct, not both
- Building SRA Toolkit on your own, without conda, is not intuitive
- Building SRA Toolkit on your own, with conda, is also not intuitive (you usually end up with v2.10 which only sometimes works)
- No need to run `vdb-config --interactive` or any other interactive process before using anything in this image; SRA Toolkit's config file is generated while building the image