# BIGSI Test 1

This notebook tests the software `bigsi` <https://bigsi.readme.io/> by following the tutorial <https://bigsi.readme.io/docs/your-first-bigsi>.

# Step 1

Step 1 involves downloading the fastq files and building the kmer graphs. It uses `fasterq-dump` to download the fastq files and `mccortex31` to build a cortex graph of the kmers in each one. The accessions of the genomes to download are listed in the file `genomes.txt`.
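
As a minimal sketch of how the accession list could be read more defensively, the helper below prints one accession per line. It assumes `genomes.txt` holds one accession per line; `read_accessions` is a hypothetical name, and the handling of blank lines, `#` comments, and Windows line endings is an added assumption rather than something the tutorial requires.

```shell
# Hypothetical helper: print the accessions listed in a file, one per line,
# dropping carriage returns, surrounding whitespace, blank lines and '#' comments.
read_accessions() {
    tr -d '\r' < "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -v -e '^$' -e '^#'
}

# Usage sketch (run from the kmers directory):
# read_accessions ../genomes.txt | while IFS= read -r accession; do
#     echo "Working on $accession"
# done
```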

In [1]:
NOTEBOOK_DIR=`git rev-parse --show-toplevel`

In [2]:
cd $NOTEBOOK_DIR
rm -rf kmers
mkdir kmers
pushd kmers

~/workspace/bigsi-examples/kmers ~/workspace/bigsi-examples


In [3]:
kmer_size=31
threads=16

for accession in `cat ../genomes.txt`;
do
    echo "Working on $accession"
    
    fastq_file=${accession}.fastq
    cortex_out=${accession}.ctx
    cortex_log=${accession}_mccortex.log
    
    set -x
    fasterq-dump -s -o ${fastq_file} "$accession"
    mccortex31 build -t ${threads} -k ${kmer_size} -s ${accession} --seqi ${fastq_file} ${cortex_out} 2> ${cortex_log}.err 1> ${cortex_log}
    set +x
done

echo "Done"

Working on SRR9842706
+ fasterq-dump -s -o SRR9842706.fastq SRR9842706
spots read      : 641,190
reads read      : 1,282,380
reads written   : 1,282,380
+ mccortex31 build -t 16 -k 31 -s SRR9842706 --seqi SRR9842706.fastq SRR9842706.ctx
+ set +x
Working on SRR10140545
+ fasterq-dump -s -o SRR10140545.fastq SRR10140545
spots read      : 519,015
reads read      : 1,038,030
reads written   : 1,038,030
+ mccortex31 build -t 16 -k 31 -s SRR10140545 --seqi SRR10140545.fastq SRR10140545.ctx
+ set +x
Working on SRR10140517
+ fasterq-dump -s -o SRR10140517.fastq SRR10140517
spots read      : 566,623
reads read      : 1,133,246
reads written   : 1,133,246
+ mccortex31 build -t 16 -k 31 -s SRR10140517 --seqi SRR10140517.fastq SRR10140517.ctx
+ set +x
Working on SRR10140498
+ fasterq-dump -s -o SRR10140498.fastq SRR10140498
spots read      : 758,152
reads read      : 1,516,304
reads written   : 1,516,304
+ mccortex31 build -t 16 -k 31 -s SRR10140498 --seqi SRR10140498.fastq SRR10140498.ctx
+ set +x

Awesome. We've now got files named like `SRR9842706.ctx` in our directory containing the cortex graph/kmers.
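
A quick sanity check before moving on could look like the sketch below. `check_ctx_files` is a hypothetical helper, and it only tests that each expected `.ctx` file exists and is non-empty; it does not validate the cortex graph itself.

```shell
# Hypothetical helper: verify that every accession in a list has a
# non-empty <accession>.ctx file in the given directory.
check_ctx_files() {
    local list_file=$1 ctx_dir=$2 accession missing=0
    while IFS= read -r accession; do
        [ -n "$accession" ] || continue          # skip blank lines
        if [ ! -s "$ctx_dir/${accession}.ctx" ]; then
            echo "Missing or empty: ${accession}.ctx" >&2
            missing=1
        fi
    done < "$list_file"
    return "$missing"
}

# Usage sketch (from the kmers directory):
# check_ctx_files ../genomes.txt . && echo "All cortex files present"
```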

# Step 2

In step 2, we'll build a bloom filter from each of these kmer files and then combine the bloom filters into a `bigsi` index.

In [4]:
number_hashes=3
m_value=25000000

mkdir bigsi
pushd bigsi

~/workspace/bigsi-examples/kmers/bigsi ~/workspace/bigsi-examples/kmers ~/workspace/bigsi-examples


Let's set up the bigsi index configuration.

In [5]:
cat > berkeleydb.yaml << EOF
## Example config using berkeleyDB
h: ${number_hashes}
k: ${kmer_size}
m: ${m_value}
storage-engine: berkeleydb
storage-config:
  filename: bigsi.db
  flag: "c" ## Change to 'r' for read-only access
EOF

export BIGSI_CONFIG=berkeleydb.yaml
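
To get a feel for how `h` and `m` interact, the sketch below evaluates the standard Bloom filter false-positive approximation p ≈ (1 - e^(-h·n/m))^h, where n is the number of distinct kmers inserted for one sample. This assumes `m` is the bloom filter size in bits and `h` the number of hash functions. `bloom_fpr` is a hypothetical helper, and n = 5,000,000 is an assumed, illustrative value, not a count measured from these samples.

```shell
# Hypothetical helper: approximate per-kmer false-positive rate of a
# Bloom filter with h hash functions, m bits and n inserted kmers.
bloom_fpr() {
    # $1 = h, $2 = m, $3 = n
    awk -v h="$1" -v m="$2" -v n="$3" \
        'BEGIN { printf "%.4f\n", (1 - exp(-h * n / m)) ^ h }'
}

# h and m as configured in this notebook; n is an assumed value.
bloom_fpr 3 25000000 5000000
```

Under this approximation, a larger `m` or a smaller n lowers the rate, which is one way to judge whether the configured `m` suits the genome sizes being indexed.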

Now, let's construct the bloom filters.

In [6]:
# Link all .ctx files into current directory to make running `parallel` easier
ln -s ../*.ctx .

set -x
parallel --jobs 12 -I% bigsi bloom % %.bloom ::: *.ctx
set +x

+ parallel --jobs 12 -I% bigsi bloom % %.bloom ::: SRR10140498.ctx SRR10140517.ctx SRR10140545.ctx SRR9842706.ctx
  config = yaml.load(infile)
  config = yaml.load(infile)
  config = yaml.load(infile)
  config = yaml.load(infile)
+ set +x


In [7]:
ls

berkeleydb.yaml        SRR10140517.ctx        SRR10140545.ctx.bloom
SRR10140498.ctx        SRR10140517.ctx.bloom  SRR9842706.ctx
SRR10140498.ctx.bloom  SRR10140545.ctx        SRR9842706.ctx.bloom


In [8]:
samples=`for i in *.bloom; do echo -n "-s "; basename $i .ctx.bloom; done`
set -x
bigsi build *.bloom ${samples}
set +x

+ bigsi build SRR10140498.ctx.bloom SRR10140517.ctx.bloom SRR10140545.ctx.bloom SRR9842706.ctx.bloom -s SRR10140498 -s SRR10140517 -s SRR10140545 -s SRR9842706
  config = yaml.load(infile)
INFO:bigsi.cmds.build:Building index: 0/1
DEBUG:bigsi.cmds.build:Loading /home/CSCScience.ca/apetkau/workspace/bigsi-examples/kmers/bigsi/SRR10140498.ctx.bloom/SRR10140498.ctx.bloom 
DEBUG:bigsi.cmds.build:Loading /home/CSCScience.ca/apetkau/workspace/bigsi-examples/kmers/bigsi/SRR10140517.ctx.bloom/SRR10140517.ctx.bloom 
DEBUG:bigsi.cmds.build:Loading /home/CSCScience.ca/apetkau/workspace/bigsi-examples/kmers/bigsi/SRR10140545.ctx.bloom/SRR10140545.ctx.bloom 
DEBUG:bigsi.cmds.build:Loading /home/CSCScience.ca/apetkau/workspace/bigsi-examples/kmers/bigsi/SRR9842706.ctx.bloom/SRR9842706.ctx.bloom 
DEBUG:bigsi.graph.bigsi:Insert sample metadata
DEBUG:bigsi.graph.bigsi:Create signature index
DEBUG:bigsi.graph.index:Transpose bitarrays
DEBUG:bigsi.graph.index:Insert rows
DEBUG:bigsi.storage.base:set bita
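
The build command lists the bloom filters first and then one `-s` sample name per filter, in the same order, so the two lists presumably need to stay aligned. The sketch below is a dry run that assembles that command line from a single glob, which keeps the order consistent. `bigsi_build_command` is a hypothetical helper; it only prints the command and does not run `bigsi`.

```shell
# Hypothetical helper: print the `bigsi build` command for a directory of
# *.ctx.bloom entries instead of running it (a dry run).
bigsi_build_command() (
    cd "$1" || exit 1
    set --                                   # start with an empty argument list
    for bloom in *.ctx.bloom; do
        [ -e "$bloom" ] || continue          # no match: the pattern stays literal
        # Sample name = entry name without the .ctx.bloom suffix
        set -- "$@" -s "$(basename "$bloom" .ctx.bloom)"
    done
    [ "$#" -gt 0 ] || { echo "No bloom filters found in $1" >&2; exit 1; }
    echo bigsi build *.ctx.bloom "$@"
)

# Usage sketch: bigsi_build_command .
```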

In [9]:
bigsi search GTTTCGTTCTTCCGGCGCGGGCGGTCAGCACGTTAACACCACCGACTCCGCTATCCGTATTACCCACTTGCCGACCGGCATCTTGGTGGAATGCCAGGACGAGC

  config = yaml.load(infile)
{'query': 'GTTTCGTTCTTCCGGCGCGGGCGGTCAGCACGTTAACACCACCGACTCCGCTATCCGTATTACCCACTTGCCGACCGGCATCTTGGTGGAATGCCAGGACGAGC', 'threshold': 1.0, 'results': [{'percent_kmers_found': 100.0, 'num_kmers': 74, 'num_kmers_found': 74, 'sample_name': 'SRR10140498'}], 'citation': 'http://dx.doi.org/10.1038/s41587-018-0010-1'}
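
The `num_kmers` value in the result is consistent with the query length: a sequence of length L contains L - k + 1 overlapping kmers of size k. The sketch below recomputes that count for the query above and lists the kmers. `list_kmers` is a hypothetical helper, and whether `bigsi` enumerates kmers in exactly this way is an assumption.

```shell
query=GTTTCGTTCTTCCGGCGCGGGCGGTCAGCACGTTAACACCACCGACTCCGCTATCCGTATTACCCACTTGCCGACCGGCATCTTGGTGGAATGCCAGGACGAGC
kmer_size=31

# Number of overlapping kmers in the query: L - k + 1
num_kmers=$(( ${#query} - kmer_size + 1 ))
echo "$num_kmers"

# Hypothetical helper: print every overlapping kmer of a sequence, one per line.
list_kmers() {
    # $1 = sequence, $2 = kmer size
    awk -v q="$1" -v k="$2" \
        'BEGIN { for (i = 1; i <= length(q) - k + 1; i++) print substr(q, i, k) }'
}

# Usage sketch: list_kmers "$query" "$kmer_size" | head -n 3
```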
