Testing your own isolates against our model

Nicole Wheeler edited this page Apr 26, 2023 · 9 revisions

There are two ways to run the model on your own isolates: the fastq pipeline and the assembly pipeline, located in the directories fastq_pipeline and assembly_pipeline. The fastq pipeline is recommended because the choice of assembly method can affect the results.

All of the code you will need to run this analysis is located in the appropriate directory. The pipelines require the following to run:

  • HMMER3
  • bowtie2
  • bcftools
  • samtools
  • fastaq (only required for the assembly pipeline; can be installed with pip3 install pyfastaq)
  • BBMap (can be installed with Bioconda)
  • R v. 4.1 or above
  • R package randomForest
  • R package ROCR (conda install -c bioconda r-rocr)
  • R package DMwR, can be installed using: remotes::install_github("cran/DMwR") (requires the R package remotes)
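
Before running either pipeline, it can help to confirm that the command-line dependencies are on your PATH. The following is a minimal sketch, not part of the repository: the function name check_tools is hypothetical, and the executable names (hmmsearch for HMMER3, bbmap.sh for BBMap, Rscript for R) are assumptions about how these tools are typically installed. It does not check R packages.

```shell
# Print each named command that is not found on PATH, one per line.
# Returns non-zero if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them to match your installation.
check_tools hmmsearch bowtie2 bcftools samtools bbmap.sh Rscript ||
    echo "The commands listed above were not found on PATH." >&2
```

If nothing is printed, all of the listed commands were found.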

The fastq pipeline assumes that the paired-end reads for each sample are stored in two files ending in _1.fastq and _2.fastq.

To run the pipeline on multiple samples, create a file listing the path to each sample's read files, one per line and excluding the suffixes, then run the script on each entry:

while read -r i; do ./run_invasiveness_index.sh "$i"; done < samples.txt

Where samples.txt may contain:

path_to_samples/sampleID1
path_to_samples/sampleID2
path_to_samples/sampleID3
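
If all of your reads sit in one directory, samples.txt can be generated rather than written by hand. The sketch below is an illustration under the naming convention described above (_1.fastq and _2.fastq); the function name list_samples is hypothetical and is not part of the repository. It prints one prefix per read pair and warns on stderr about any _1.fastq file that has no matching _2.fastq.

```shell
# Print the path prefix of every *_1.fastq file in a directory
# that also has a matching *_2.fastq file.
list_samples() {
    reads_dir=$1
    for r1 in "$reads_dir"/*_1.fastq; do
        [ -e "$r1" ] || continue        # the glob matched no files
        prefix=${r1%_1.fastq}           # strip the read-1 suffix
        if [ -e "${prefix}_2.fastq" ]; then
            echo "$prefix"
        else
            echo "warning: no _2.fastq mate for $r1" >&2
        fi
    done
}

# Example usage:
#   list_samples path_to_samples > samples.txt
```

Samples with a missing mate file are left out of the list, so they are not passed to the pipeline.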
