# Experiment: HowDeSBT query

Now that we've build the HowDeSBT index, let's try to query for some genes.

Let's setup some variables.

In [1]:
kmer_size="17"
howdesbt_dir=howdesbt/${kmer_size}
query_out_dir=queries

PROJECT_DIR=`git rev-parse --show-toplevel`
cd $PROJECT_DIR

query_in_dir=${PROJECT_DIR}/queries
query_file=${query_in_dir}/query.fasta

The code given below assumes you have the following [conda](https://docs.conda.io/en/latest/) environments setup to install [howdesbt](https://github.com/medvedevgroup/HowDeSBT). This can be done with.

```bash
conda create --name howdesbt howdesbt
```

Let's verify these commands exist (and verify versions).

In [2]:
conda run --name howdesbt howdesbt --version

version 2.00.02 20191014


Great. Now let's setup a bash function to do our query tests.

In [3]:
howdesbt_query() {
    type_dir=$1
    output_dir=$2
    
    # Reset ourselves back to main directory
    cd ${PROJECT_DIR}
    cd ${type_dir}/${howdesbt_dir}
    pwd
    
    mkdir ${output_dir}
    
    max_iter=10
        
    temp_dir=`mktemp -d`
    for iteration in `seq 1 ${max_iter}`
    do
        iteration_out=${temp_dir}/${iteration}
        command="/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt ${query_file} \
            2> ${iteration_out} 1> /dev/null"
        echo ${command}
        conda run --name howdesbt ${command}
    done
    
    max_rss=`grep -h 'Maximum resident set size (kbytes)' ${temp_dir}/* | 
        sed -e 's/^\s*Maximum resident set size (kbytes): //' |
        sort -n |
        tail -n 1`
        
    user_system_times=()
    for iteration in ${temp_dir}/*
    do
        user_time=`grep 'User time (seconds)' ${iteration} | 
            sed -e 's/^\s*User time (seconds): //'`
        system_time=`grep 'System time (seconds)' ${iteration} | 
            sed -e 's/^\s*System time (seconds): //'`
        user_system_time=`echo "${user_time}+${system_time}" | bc`
        
        user_system_times=`printf "${user_system_times}${user_system_time}\t"`
    done
    
    #printf "${user_system_times}\n"
    time_med=`printf "${user_system_times}" | sed 's/\t$//' | datamash transpose | datamash median 1`
    
    printf "${user_system_times}" | sed 's/\t$//' > ${output_dir}/howdesbt-search-all-times-kmer-${kmer_size}.txt
       
    (echo -e "median_time_user_system\titerations\tmax_rss_kbytes"
    echo -e "${time_med}\t${max_iter}\t${max_rss}") | 
        tee ${output_dir}/howdesbt-search-time-kmer-${kmer_size}.tsv |
        column -t
        
    cd ${PROJECT_DIR}
}

## Microbial query

Let's now run our search for the microbial dataset, measuring runtime (median) and RSS (max) out of a number of iterations.

In [4]:
cd ${PROJECT_DIR}
data_type_dir="microbial"
howdesbt_query "${data_type_dir}" "${query_out_dir}"

/home/CSCScience.ca/apetkau/workspace/comp7934-project/microbial/howdesbt/17
mkdir: cannot create directory ‘queries’: File exists
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.odGzPtLQQP/1 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.odGzPtLQQP/2 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.odGzPtLQQP/3 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.odGzPtLQQP/4 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-

## Metagenomics query

In [5]:
cd ${PROJECT_DIR}
data_type_dir="metagenomics"
howdesbt_query "${data_type_dir}" "${query_out_dir}"

/home/CSCScience.ca/apetkau/workspace/comp7934-project/metagenomics/howdesbt/17
mkdir: cannot create directory ‘queries’: File exists
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.PZB0tlgTMf/1 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.PZB0tlgTMf/2 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.PZB0tlgTMf/3 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.PZB0tlgTMf/4 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp79

## Human query

In [6]:
cd ${PROJECT_DIR}
data_type_dir="human"
howdesbt_query "${data_type_dir}" "${query_out_dir}"

/home/CSCScience.ca/apetkau/workspace/comp7934-project/human/howdesbt/17
mkdir: cannot create directory ‘queries’: File exists
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.vLmRgkMkvS/1 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.vLmRgkMkvS/2 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.vLmRgkMkvS/3 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-project/queries/query.fasta 2> /tmp/tmp.vLmRgkMkvS/4 1> /dev/null
/usr/bin/time -v howdesbt query --threshold=1.0 --tree=howdesbt.build.sbt /home/CSCScience.ca/apetkau/workspace/comp7934-proj

Awesome. We've gotten all the information about querying.