Shark

Fast tool for mapping-free gene separation of reads, using Bloom filter.

Dependencies

Shark requires a C++11-compliant compiler and the sdsl-lite v2.1.1 library. For convenience, sdsl-lite is included in the repository.

Download and Installation

To install the tool, run the following steps.

First, clone the repository and move into it.

git clone --recursive https://github.com/AlgoLab/shark.git
cd shark
# If sdsl-lite is not installed run the following commands
# cd sdsl-lite
# ./install.sh ..
# cd ..
make

Usage

Usage: shark -r <references> -1 <sample1> [OPTIONAL ARGUMENTS]

Arguments:
      -r, --reference                   reference sequences in FASTA format (can be gzipped)
      -1, --sample1                     sample in FASTQ (can be gzipped)

Optional arguments:
      -h, --help                        display this help and exit
      -2, --sample2                     second sample in FASTQ (optional, can be gzipped)
      -o, --out1                        first output sample in FASTQ (default: sharked_sample.1)
      -p, --out2                        second output sample in FASTQ (default: sharked_sample.2)
      -k, --kmer-size                   size of the kmers to index (default:17, max:31)
      -c, --confidence                  confidence for associating a read to a gene (default:0.6)
      -b, --bf-size                     bloom filter size in GB (default:1)
      -q, --min-base-quality            minimum base quality (assume FASTQ Illumina 1.8+ Phred scale, default:0, i.e., no filtering)
      -s, --single                      report an association only if a single gene is found
      -t, --threads                     number of threads (default:1)
      -v, --verbose                     verbose mode

Output format

shark outputs to stdout a ssv file reporting associations between reads and genes. The first element of each line is the name of the read whereas the second element is the gene identifier.

Reads in the samples that pass the filter step are stored in the files passed as argument to -o and -p.

Example

A small example is provided in the example directory.

ENSG00000277117.fa is the gene sequence of gene ENSG00000277117 in FASTA format
sample_1.fq and sample_2.fq are a paired-end RNA-Seq sample simulated from genes ENSG00000277117 and ENSG00000275464

To filter out the reads sequenced from gene ENSG00000277117, run shark as follows:

./shark -r example/ENSG00000277117.fa -1 example/sample_1.fq \
                                      -2 example/sample_2.fq \
                                      -o example/sharked.sample_1.fq \
                                      -p example/sharked.sample_2.fq > example/ENSG00000277117.ssv

The results should be equal to: example/ENSG00000277117.truth.ssv, example/sharked.sample_1.truth.fq, and example/sharked.sample_2.truth.fq.

Citation

L. Denti, Y. Pirola, M. Previtali, T. Ceccato, G. Della Vedova, R. Rizzi, P. Bonizzoni,
“Shark: fishing relevant reads in an RNA-Seq sample,”
Bioinformatics, Sep. 2020, doi: 10.1093/bioinformatics/btaa779.

Name		Name	Last commit message	Last commit date
Latest commit History 151 Commits
example		example
sdsl-lite @ 0546faf		sdsl-lite @ 0546faf
.gitignore		.gitignore
.gitmodules		.gitmodules
BloomfilterFiller.hpp		BloomfilterFiller.hpp
FastaSplitter.hpp		FastaSplitter.hpp
FastqSplitter.hpp		FastqSplitter.hpp
KmerBuilder.hpp		KmerBuilder.hpp
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
ReadAnalyzer.hpp		ReadAnalyzer.hpp
ReadOutput.hpp		ReadOutput.hpp
_config.yml		_config.yml
argument_parser.hpp		argument_parser.hpp
bloomfilter.h		bloomfilter.h
common.hpp		common.hpp
kmer_utils.hpp		kmer_utils.hpp
kseq.h		kseq.h
main.cpp		main.cpp
small_vector.hpp		small_vector.hpp
xxhash.hpp		xxhash.hpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Shark

Dependencies

Download and Installation

Usage

Output format

Example

Citation

About

Releases 3

Contributors 4

Languages

License

AlgoLab/shark

Folders and files

Latest commit

History

Repository files navigation

Shark

Dependencies

Download and Installation

Usage

Output format

Example

Citation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 3

Contributors 4

Languages