Reorganize repo. Assume use of zibra/zibra Docker image.
trvrb committed Dec 31, 2016
1 parent 640cd9d commit c4c70e7
Showing 18 changed files with 62 additions and 702 deletions.
2 changes: 0 additions & 2 deletions .gitignore
@@ -1,7 +1,5 @@
# For zika-seq repo #
environment*
data/libraries/
scripts/ssw_lib.py

# Jekyll #
##########
71 changes: 42 additions & 29 deletions README.md
@@ -1,55 +1,68 @@
# Experimental protocols and bioinformatic pipelines for Zika genome sequencing

#### Allison Black<sup>1,2</sup>, Barney Potter<sup>2</sup>, Nicholas J. Loman<sup>3</sup>, Trevor Bedford<sup>2</sup>

<sup>1</sup>Department of Epidemiology, University of Washington, Seattle, WA, USA, <sup>2</sup>Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, <sup>3</sup>Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK

## Install

Clone the repo:
Clone the repo and load submodules:

git clone https://github.com/blab/zika-seq.git
git submodule update --init --recursive

Install Python dependencies:
## Data sync

pip install -r requirements.txt
Primary sequencing data lives on the Rhino FHCRC cluster at:

Install [SSW Library](https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library):
/fh/fast/bedford_t/data/

git clone https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library.git
cd Complete-Striped-Smith-Waterman-Library/src/
cp libssw.so <path to zika-seq>/zika-seq/scripts/
cp ssw_lib.py <path to zika-seq>/zika-seq/scripts/
A local copy lives on the Meristem drive at:

Install [marginAlign](https://github.com/benedictpaten/marginAlign):
/Volumes/Meristem/data/

git clone https://github.com/benedictpaten/marginAlign.git
cd marginAlign
git submodule update --init --recursive
make
export PATH=<path to marginAlign>/marginAlign/:$PATH
To update the Meristem copy from Rhino, run:

Install [samtools](https://github.com/samtools/samtools):
rsync -azP tbedford@rhino.fhcrc.org:/fh/fast/bedford_t/data/ /Volumes/Meristem/data/

brew tap homebrew/science
brew install samtools
Replace `tbedford` with your username.

## Data sync
This `data/` directory is assumed to follow [a particular schema](https://github.com/blab/zika-seq/blob/master/data-schema.md).

From `zika-seq` run:
## Bioinformatic pipeline

rsync -azP tbedford@rhino.fhcrc.org:/fh/fast/bedford_t/zika-seq/data/ data/
Here, we use the ZiBRA project bioinformatic pipeline at [zibraproject/zika-pipeline](https://github.com/zibraproject/zika-pipeline/). This pipeline is instantiated in the Docker image [zibra/zibra](https://hub.docker.com/r/zibra/zibra/). Data processing is done using Docker.

Replace `tbedford` with your username.
### Data volume

## Bioinformatic pipeline
Create a named data-volume container that mounts local `data/` at `/data` within the container:

docker create --name zibra-data -v /Volumes/Meristem/data:/data zibra/zibra

This mount gets data into the Docker container. Note that the path to the local directory must be an absolute path.
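Because the `-v` argument needs an absolute local path, a small shell helper can resolve a relative directory before it is passed to `docker create`. The following is a minimal sketch, not part of the repo or the zibra/zibra image: `abs_dir` and `mount_arg` are hypothetical names, and the local directory is assumed to already exist.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of zika-seq or the zibra/zibra image).

# Print the absolute path of an existing directory.
abs_dir() {
  (cd "$1" && pwd)
}

# Build the "-v" mount argument that maps a local directory to /data.
mount_arg() {
  printf '%s:/data' "$(abs_dir "$1")"
}

# Example usage (not executed here):
#   docker create --name zibra-data -v "$(mount_arg data)" zibra/zibra
```

Resolving the path in a subshell leaves the caller's working directory unchanged, and the helper fails if the directory does not exist, so a mistyped path is caught before Docker is invoked.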

Create a named data volume for a single sample:

docker create --name zibra-data-lb01-nb01 -v /Volumes/Meristem/data/usvi-library1-2016-12-10/basecalled_reads/pass_demultiplex/NB01:/data zibra/zibra

### Build volume

Create a named data-volume container that mounts local `build/` at `/build` within the container:

docker create --name zibra-build -v /Volumes/Meristem/build:/build zibra/zibra

This mount gets data out of the Docker container. Note that the path to the local directory must be an absolute path.

Data lives in the [`data/`](data/) directory and is not versioned within the repo. Directory structure described in its [README.md](data/).
### Start

### Base calling
Start a container from the Docker image and open a shell:

Convert raw MinION output to FAST5
docker run -t -i --volumes-from zibra-data --volumes-from zibra-build zibra/zibra /bin/bash

metrichor-cli -a <API KEY> -w 1289 -f - -i <directory_with_fast5s> -o downloads
Run the single-sample script within the container:

### Run pipeline
./scripts/go_single_sample_r94.sh refs/KJ776791.2.fasta NB03 metadata/v2_500.amplicons.ver2.bed

Run poretools, marginAlign, samtools:
## Results

python run.py
* [Initial coverage results from first library are here](depth-coverage/)
20 changes: 20 additions & 0 deletions data-schema.md
@@ -0,0 +1,20 @@
# Data schema

## Nanopore reads

Input data to the Zika pipeline arrives in the `data/` directory. This should be [mounted to `data/` in the Docker container](https://github.com/blab/zika-seq#data-volume).

- `data`
  - `usvi-library1-2016-12-10` - library
    - `raw_reads` - squiggle graphs in fast5 format
      - `pass` - contains `.fast5` files
    - `basecalled_reads` - basecalled with Metrichor
      - `pass_demultiplex` - demultiplexed basecalled reads
        - `NB01` - contains `.fast5` files for NB01 barcode
        - `NB02` - contains `.fast5` files for NB02 barcode
        - etc...
      - `nonNB_demultiplexed` - demultiplexed basecalled reads
        - `BC01` - contains `.fast5` files for BC01 barcode
        - `BC02` - contains `.fast5` files for BC02 barcode
        - etc...
      - `fail` - contains `.fast5` files that weren't demultiplexed
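
Before mounting a library into the container, it can help to check that it follows the layout above. The following is a minimal sketch under that assumption: `list_barcodes` is a hypothetical helper, not part of the repo, and it only looks at `NB*` directories under `basecalled_reads/pass_demultiplex`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zika-seq): list the NB barcode
# directories of one library, assuming the schema described above.
list_barcodes() {
  local data_root="$1" library="$2"
  local demux="${data_root}/${library}/basecalled_reads/pass_demultiplex"
  local dir

  # Fail early if the library does not follow the expected layout.
  if [ ! -d "$demux" ]; then
    echo "missing directory: $demux" >&2
    return 1
  fi

  # Print one barcode name per line (NB01, NB02, ...).
  for dir in "$demux"/NB*/; do
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}

# Example usage (not executed here):
#   list_barcodes /Volumes/Meristem/data usvi-library1-2016-12-10
```

Each printed name corresponds to a directory that could be mounted on its own as a single-sample data volume.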
9 changes: 0 additions & 9 deletions data/README.md

This file was deleted.

File renamed without changes.
File renamed without changes.
2 changes: 0 additions & 2 deletions refs/Zika_Africa.fasta

This file was deleted.
