Illumina data will be downloaded from [SRA](http://www.ncbi.nlm.nih.gov/sra) using the following workflow.

It will require the [SRA-toolkit](http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc) program `fastq-dump` to be installed on your machine and in your path.
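
Before installing anything, it may be worth checking whether `fastq-dump` is already available. The following is a minimal sketch, not part of the original workflow; the helper name `check_tool` is hypothetical and only relies on the standard shell builtin `command -v`.

```shell
# Hypothetical helper (not part of the SRA toolkit): report whether a
# program can be found on the PATH. Returns non-zero if it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Check for fastq-dump; if it is missing, point to the installation cell.
check_tool fastq-dump || echo "fastq-dump is missing, install the SRA toolkit first"
```

If the program is reported as found, the installation cell can be skipped.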

If the program is not yet present on your system, you can install it as follows:


In [None]:
%%bash

mkdir SRA-toolkit
cd SRA-toolkit

wget -q http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.5.7/sratoolkit.2.5.7-ubuntu64.tar.gz

#decompress the archive
tar -xzf sratoolkit.2.5.7-ubuntu64.tar.gz

#add the fastq-dump executable to your PATH (may require sudo)
cp $(pwd)/sratoolkit.2.5.7-ubuntu64/bin/fastq-dump /usr/local/bin

#remove SRA toolkit files that are not required
cd ..
rm -rf SRA-toolkit/

#check if fastq-dump is ok
echo -e "checking if fastq-dump has been set up ok .. should display program usage"
fastq-dump

The following cell contains a simple loop that reads through the file `sample_metadata/Sample_accessions_LW.tsv` (see [here]()) and downloads the relevant raw read files from SRA based on the accessions provided in column 3 of the file. It then renames the files according to the sampleID (column 1) and the marker (column 2).

The full downloading process takes ~10 minutes depending on your connection.


In [None]:
%%bash

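#create the output directory (no error if it already exists)
mkdir -p reads

#skip the header line and join the tab-separated columns with commas
for s in $(grep -v "SRA_Accession" sample_metadata/Sample_accessions_LW.tsv | sed 's/\t/,/g')
do
    id=$(echo "$s" | cut -d "," -f 1)         #column 1: sampleID
    marker=$(echo "$s" | cut -d "," -f 2)     #column 2: marker
    accession=$(echo "$s" | cut -d "," -f 3)  #column 3: SRA accession
    echo -e "$(date)\tdownloading: $id\t$accession\t$marker"
    #download the reads as gzipped fastq, split into separate _1 and _2 files
    fastq-dump -O reads/ --split-files --gzip --defline-seq '@$ac-$sn/$ri' --defline-qual '+' "$accession"
    #rename the files from <accession> to <sampleID>-<marker>
    mv "reads/${accession}_1.fastq.gz" "reads/${id}-${marker}_1.fastq.gz"
    mv "reads/${accession}_2.fastq.gz" "reads/${id}-${marker}_2.fastq.gz"
done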

Double-check that all files are there.

In [None]:
!ls reads/*.fastq.gz
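
Listing the directory shows what is there, but not what is missing. As a rough sketch, the expected file names can be rebuilt from the accession table and checked one by one. The helper name `list_missing_reads` is hypothetical. It assumes the same table layout as the download loop: tab-separated columns sampleID, marker, accession, with no empty fields, and a header line containing `SRA_Accession`.

```shell
# Hypothetical helper: print every expected read file that is missing or empty.
# Arguments: accession table (TSV with a header line), reads directory.
list_missing_reads() {
    table="$1"
    reads_dir="$2"
    tab="$(printf '\t')"
    # Skip the header, then read sampleID, marker and accession from each row.
    grep -v "SRA_Accession" "$table" | while IFS="$tab" read -r id marker accession
    do
        # Ignore blank or incomplete rows.
        [ -n "$accession" ] || continue
        for mate in 1 2
        do
            f="$reads_dir/${id}-${marker}_${mate}.fastq.gz"
            # -s is true only if the file exists and is not empty.
            [ -s "$f" ] || echo "$f"
        done
    done
}

# Run the check only if the accession table is present.
if [ -f sample_metadata/Sample_accessions_LW.tsv ]; then
    list_missing_reads sample_metadata/Sample_accessions_LW.tsv reads
fi
```

No output means that both read files were found for every sample in the table.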

Metadata for the samples has been prepared in the local file `sample_metadata/Sample_metadata.csv`, which can also be viewed [here]().