# Experiment: BIGSI query

Now that we've build the BIGSI index, let's try to query for some genes.

Let's setup some variables.

In [1]:
kmer_size="9"
bigsi_dir=bigsi/${kmer_size}
query_out_dir=${bigsi_dir}/queries
query_string="GTGCACACCCAGGTACACACGGCCCGCCTGGTCCACACCGCCGATCTTGACAGCGAGACCCGCCAGGACATCCGTCAGATGGTCACCGGCGCGTTTGCCGGTGACTTCACCGAGACCGACTGGGAGCACACGCTGGGTGGGATGCACGCCCTGATCTGGCATCACGGGGCGATCATCGCGCATGCCGCGGTGATCCAGCGGCGACTGATCTACCGCGGCAACGCGCTGCGCTGCGGGTACGTCGAAGGCGTTGCGGTGCGGGCGGACTGGCGGGGCCAACGCCTGGTGAGCGCGCTGTTGGACGCCGTCGAGCAGGTGATGCGCGGCGCTTACCAGCTCGGAGCGCTCAGTTCCTCGGCGCGGGCCCGCAGACTGTACGCCTCACGCGGCTGGCTGCCCTGGCACGGCCCGACATCGGTACTGGCACCAACCGGTCCAGTCCGTACACCCGATGACGACGGAACGGTGTTCGTCCTGCCCATCGACATCAGCCTGGACACCTCGGCGGAGCTGATGTGCGATTGGCGCGCGGGCGACGTCTGGTAA"

PROJECT_DIR=`git rev-parse --show-toplevel`
cd $PROJECT_DIR

The code given below assumes you have the following [conda](https://docs.conda.io/en/latest/) environments setup to install [bigsi](https://github.com/Phelimb/BIGSI). This can be done with.

```bash
conda create --name bigsi_mccortex bigsi
```

Let's verify these commands exist (and verify versions).

In [2]:
conda run --name bigsi_mccortex bigsi bloom --help 2>&1 | grep 'bigsi-v'

usage: [01;31m[Kbigsi-v[m[K0.3.1 bloom [-h] [-c CONFIG] ctx outfile


Great. Now let's setup a bash function to do our query tests.

In [3]:
bigsi_query() {
    type_dir=$1
    output_dir=$2
    
    mkdir ${output_dir}
    
    max_iter=10
    
    export BIGSI_CONFIG=${type_dir}/${bigsi_dir}/berkelydb.yaml
    
    temp_dir=`mktemp -d`
    for iteration in `seq 1 ${max_iter}`
    do
        iteration_out=${temp_dir}/${iteration}
        command="/usr/bin/time -v bigsi search ${query_string} 2> ${iteration_out} 1> /dev/null"
        #echo ${command}
        conda run --name bigsi_mccortex ${command}
    done
    
    max_rss=`grep -h 'Maximum resident set size (kbytes)' ${temp_dir}/* | 
        sed -e 's/^\s*Maximum resident set size (kbytes): //' |
        sort -n |
        tail -n 1`
        
    user_system_times=()
    for iteration in ${temp_dir}/*
    do
        user_time=`grep 'User time (seconds)' ${iteration} | 
            sed -e 's/^\s*User time (seconds): //'`
        system_time=`grep 'System time (seconds)' ${iteration} | 
            sed -e 's/^\s*System time (seconds): //'`
        user_system_time=`echo "${user_time}+${system_time}" | bc`
        
        user_system_times=`printf "${user_system_times}${user_system_time}\t"`
    done
    
    #printf "${user_system_times}\n"
    time_med=`printf "${user_system_times}" | sed 's/\t$//' | datamash transpose | datamash median 1`
    
    printf "${user_system_times}" | sed 's/\t$//' > ${output_dir}/bigsi-search-all-times-kmer-${kmer_size}.txt
       
    (echo -e "median_time_user_system\titerations\tmax_rss_kbytes"
    echo -e "${time_med}\t${max_iter}\t${max_rss}") | 
        tee ${output_dir}/bigsi-search-time-kmer-${kmer_size}.tsv |
        column -t
}

## Microbial query

In [4]:
data_type_dir="microbial"
bigsi_query "${data_type_dir}" "${data_type_dir}/${query_out_dir}"

mkdir: cannot create directory ‘microbial/bigsi/9/queries’: File exists
median_time_user_system  iterations  max_rss_kbytes
6.755                    10          63888


## Metagenomics query

In [5]:
data_type_dir="metagenomics"
bigsi_query "${data_type_dir}" "${data_type_dir}/${query_out_dir}"

median_time_user_system  iterations  max_rss_kbytes
6.73                     10          63948


## Human query

In [6]:
data_type_dir="human"
bigsi_query "${data_type_dir}" "${data_type_dir}/${query_out_dir}"

median_time_user_system  iterations  max_rss_kbytes
6.86                     10          63896
