BlooMine

A Bloom filter driven read filtering tool for investigating variation of a target locus within read sets.

Quick Install

BlooMine requires python 3.8 or 3.9, cmake and boost. Installing these is pretty straightforward.

Install cmake and boost

sudo apt-get install cmake libboost-all-dev

Compile BlooMine binaries

bash install.sh

Install the BlooMine package

pip install ./

Quick Start

BlooMine ./my_targets.fa ./my_fastq_dir/ -t 10 --suffix _{1,2}.fastq.gz

This runs BlooMine using targets found in my_targets.fa on FASTQ file pairs found in ./my_fastq_dir/ with the pairing suffix _1.fastq.gz and _2.fastq.gz.

BlooMine takes a multifasta containing target read sequences formatted like:

>target_1 | flank1
...TACG...
>target_1 | flank2
...TACG...
>target_2 | flank1
...TACG...
>target_2 | flank2
...TACG...

Sequences must be idenfiable as being flank1 or flank2 of a target sequence.

A directory containing Illumina reads to be screened should be specified, along with a wildcard suffix for pairing read files. For example in the directory

.
└── mydata
    ├── cp1_1.fq.gz
    ├── cp1_2.fq.gz
    ├── cp2_1.fq.gz
    ├── cp2_2.fq.gz
    ├── cp3_1.fq.gz
    └── cp3_2.fq.gz

the suffix would be _{1,2}.fq.gz.

By default, BlooMine runs on physical memory. However, this may require a supercomputing cluster for larger datasets. Consequently, the --on_disk flag partitions read files to temporary files and streams them from disk. This is much slower, but requires substantially less memory.

Usage

usage: BlooMine [-h] [-v] [-k KMER] [-f FALSE_POSITIVE] [-s FP_SIM] [-e SP_ERROR] [-t THREADS]
                [-o OUTDIR] [--suffix SUFFIX] [--on_disk] [--bloomine_exec_bin BLOOMINE_EXEC_BIN]
                target_fasta indir

positional arguments:
  target_fasta          A FASTA/MULTIFASTA file containing target sequence flanks.
  indir                 Directory containing paired FASTQ files to screen for the target sequence.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -k KMER, --kmer KMER  Kmer size. Default=7
  -f FALSE_POSITIVE, --false_positive FALSE_POSITIVE
                        False positive rate for building the Bloom filter. Range=0-1. Default=0.0001
  -s FP_SIM, --FP_sim FP_SIM
                        FP-screen threshold for gene orthology inference as a percentage of kmer array
                        identity. Range=0-100. Default=50.0
  -e SP_ERROR, --SP_error SP_ERROR
                        SP-screen screening error threshold for alignment, where the maximum number of
                        errors to return a read as a hit is 1/n given some n. Default=4.0
  -t THREADS, --threads THREADS
                        Number of threads to use. Must be an even <int>. Default=4.
  -o OUTDIR, --outdir OUTDIR
                        Directory for writing output files. Default=./BlooMine_RUNDIR.
  --suffix SUFFIX       The fastq suffix to use for pairing read files. Default=_{1,2}.fastq.gz
  --on_disk             Write read partitions to disk and stream from them. Low memory usage but
                        slower. Default=False.
  --bloomine_exec_bin BLOOMINE_EXEC_BIN
                        Path to a BlooMine_exec binary for use in this run. By default it will use the
                        executable in $PATH.

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
bloomine		bloomine
src		src
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README		README
README.md		README.md
dependencies		dependencies
install.sh		install.sh
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BlooMine

Quick Install

Quick Start

Usage

About

Releases 1

Packages

Languages

License

ArthurVM/BlooMine

Folders and files

Latest commit

History

Repository files navigation

BlooMine

Quick Install

Quick Start

Usage

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages