Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
bin
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
src
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Build Status License: GPL v3

FALCON

FALCON is an ultra-fast method to infer metagenomic composition of sequenced reads. FALCON measures similarity between any FASTQ file (or FASTA), independently from the size, against any multi-FASTA database, such as the entire set of complete genomes from the NCBI. FALCON supports single reads, paired-end reads, and compositions of both. It has been tested in many plataforms, such as Illumina MySeq, HiSeq, Novaseq, IonTorrent.

FALCON is efficient to detect the presence and authenticate a given species in the FASTQ reads. The core of the method is based on relative data compression. FALCON uses variable multi-threading, without multiplying the memory for each thread, being able to run efficiently in a common laptop.

The tool is also able to identify locally where, in each reference sequence, the similarity occur. FALCON provides programs to filter the local results (FALCON-filter) and to visualize the results (FALCON-filter-visual). Also for database inter-similarity analysis (FALCON-inter) and respective visualization (FALCON-inter-visual).

Find additional information here.


Install and Demo Video


1. Installation

1.1 Automatic installation with Conda

conda install -c cobilab falcon --yes

1.2 Manual installation

git clone https://github.com/cobilab/falcon.git
cd falcon/src/
cmake .
make
cp FALCON ../../
cp FALCON-filter ../../
cp FALCON-filter-visual ../../
cp FALCON-inter  ../../
cp FALCON-inter-visual ../../
cd ../../

Cmake is needed for installation.

2. Demo

Search for the top 15 similar viruses in sample reads that we provide in folder test:

cd test
gunzip reads.fq.gz
gunzip VDB.fa.gz
./FALCON -v -F -t 15 -l 47 -x top.txt reads.fq VDB.fa

It will identify Zaire Ebolavirus in the samples (top.txt) according to the following image

Top

3. Building a reference database

3.1 Build the latest NCBI viral database

An example of building a reference database from NCBI:

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/assembly_summary.txt
awk -F '\t' '{if($12=="Complete Genome") print $20}' assembly_summary.txt > ASCG.txt
mkdir -p GB_DB_VIRAL
mkdir -p GB_DB_VIRAL_CDS
mkdir -p GB_DB_VIRAL_RNA
cat ASCG.txt | xargs -I{} -n1 -P8 wget -P GB_DB_VIRAL {}/*_genomic.fna.gz
mv GB_DB_VIRAL/*_cds_from_genomic.fna.gz GB_DB_VIRAL_CDS/
mv GB_DB_VIRAL/*_rna_from_genomic.fna.gz GB_DB_VIRAL_RNA/
zcat GB_DB_VIRAL/*.fna.g > VDB.fa

For building reference databases for multiple domains/kingdoms (bacterial, fungi, protozoa, plant, etc), use:

https://raw.githubusercontent.com/cobilab/gto/master/scripts/gto_build_dbs.sh

3.2 Download an existing database

An already reference viral database is available here. With this example, you only need to uncompress it, namely through: gunzip VDB.fa.gz, and use it in FALCON along with the FASTQ reads.

4. Usage

The FALCON package includes the following binaries:

  • FALCON: metagenomic composition analysis;
  • FALCON-filter: local interations - localization;
  • FALCON-filter-visual: visualization of global and local similarities;
  • FALCON-inter: inter-similarity between database genomes;
  • FALCON-inter-visual: visualization of inter-similarities.

4.1 Metagenomic composition analysis

To see the possible options of FALCON type

./FALCON

or

./FALCON -h

These will print the following options:

Non-mandatory arguments:                                                 
                                                                         
  -h                       give this help,                               
  -F                       force mode (overwrites top file),             
  -V                       display version number,                       
  -v                       verbose mode (more information),              
  -Z                       database local similarity,                    
  -s                       show compression levels,                      
                                                                         
  -l <level>               compression level [1;47],                    
  -p <sample>              subsampling (default: 1),                    
  -t <top>                 top of similarity (default: 20),              
  -n <nThreads>            number of threads (default: 2),              
                                                                         
  -x <FILE>                similarity top filename,                      
  -y <FILE>                local similarities filename,                  
                                                                         
Mandatory arguments:                                                     

  [FILE1]:[FILE2]:...      metagenomic filename (FASTQ),                 
                           Use ":" for splitting files.                
                                                                         
  [FILE]                   database filename (Multi-FASTA).   

4.2 Local detection

For local interactions detection and visualization the FALCON package provides FALCON-filter and FALCON-filter-visual.

4.2.1 Filtering

To see the possible options of FALCON-filter type

./FALCON-filter

or

./FALCON-filter -h

These will print the following options:

Non-mandatory arguments:                                                 
                                                                         
  -h                       give this help,                               
  -F                       force mode (overwrites top file),             
  -V                       display version number,                       
  -v                       verbose mode (more information),              
  -s  <size>               filter window size,                           
  -w  <type>               filter window type,                           
  -x  <sampling>           filter window sampling,                       
  -sl <lower>              similarity lower bound,                       
  -su <upper>              similarity upper bound,                       
  -dl <lower>              size lower bound,                             
  -du <upper>              size upper bound,                             
  -t  <threshold>          threshold [0;2.0],                            
  -o  <FILE>               output filename,                              
                                                                         
Mandatory arguments:                                                     
                                                                         
  [FILE]                   profile filename (from FALCON).

4.2.2 Visualization

To see the possible options of FALCON-filter-visual type

./FALCON-filter-visual

or

./FALCON-filter-visual -h

These will print the following options:

Non-mandatory arguments:                                                 
                                                                         
  -h                       give this help,                               
  -F                       force mode (overwrites top file),             
  -V                       display version number,                       
  -v                       verbose mode (more information),              
  -w  <width>              square width (for each value),                
  -s  <ispace>             square inter-space (between each value),      
  -i  <indexs>             color index start,                            
  -r  <indexr>             color index rotations,                        
  -u  <hue>                color hue,                                    
  -sl <lower>              similarity lower bound,                       
  -su <upper>              similarity upper bound,                       
  -dl <lower>              size lower bound,                             
  -du <upper>              size upper bound,                             
  -bg                      show only the best of group,                  
  -g  <color>              color gamma,                                  
  -e  <size>               enlarge painted regions,                      
  -ss                      do NOT show global scale,                     
  -sn                      do NOT show names,                            
  -o  <FILE>               output image filename,                        
                                                                         
Mandatory arguments:                                                     
                                                                         
  [FILE]                   profile filename (from FALCON-filter).

4.3 Database inter-similarity

4.3.1 Mapping inter-similarity

To see the possible options of FALCON-inter type

./FALCON-inter

or

./FALCON-inter -h

These will print the following options:

Non-mandatory arguments:                                           
                                                                         
  -h                   give this help,                               
  -V                   display version number,                       
  -v                   verbose mode (more information),              
  -s                   how compression levels,                       
  -l <level>           compression level [1;30],                     
  -n <nThreads>        number of threads,                            
  -x <FILE>            similarity matrix filename,                   
  -o <FILE>            labels filename,                              
                                                                         
Mandatory arguments:                                               
                                                                         
  <FILE>:<FILE>:<...>  input files (last arguments).                 
                       Use ":" for file splitting.                 

4.3.2 Inter-similarity visualization

To see the possible options of FALCON-inter type

./FALCON-inter-visual

or

./FALCON-inter-visual -h

These will print the following options:

Non-mandatory arguments:                                           
                                                                         
  -h             give this help,                                     
  -V             display version number,                             
  -v             verbose mode (more information),                    
  -w             square width (for each value),                      
  -a             square inter-space (between each value),            
  -s             index color start,                                  
  -r             index color rotations,                              
  -u             color hue,                                          
  -g             color gamma,                                        
  -l <FILE>      labels filename,                                    
  -x <FILE>      heatmap filename,                                   
                                                                       
Mandatory arguments:                                               
                                                                         
  <FILE>         input matrix file (from FALCON-inter).           

Find additional information here.

5. Common use

Create the following bash script:

#!/bin/bash
./FALCON -v -n 4 -t 200 -F -Z -l 47 -c 20 -y complexity.com $1 $2
./FALCON-filter -v -F -t 0.5 -o positions.pos complexity.com
./FALCON-filter-visual -v -F -o draw.map positions.pos

Name it FALCON-meta.sh and give run access

chmod +x FALCON-meta.sh

The, run FALCON:

./FALCON-meta.sh reads1.fastq:reads2.fastq VDB.fa

reads1.fastq, reads2.fastq and VDB.fa is only an example. For more, see folder examples.

6. Citation

On using this software/method please cite:

D. Pratas, M. Hosseini, G. Grilo, A. J. Pinho, R. M. Silva, T. Caetano, J. Carneiro, F. Pereira. (2018).
Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard. 
Genes, 9(9), 445.

Bibtex:

@article{Pratas-2018a,
  title={Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard},
  author={D. Pratas, M. Hosseini, G. Grilo, A. J. Pinho, R. M. Silva, T. Caetano, J. Carneiro, F. Pereira},
  journal={Genes},
  volume={9},
  number={9},
  pages={445},
  year={2018}
}

In addition, we would like to list your publications that use our software on our website. Please email the reference, the name of your lab, department, and institution to co.bi.laboratory@gmail.com

7. Issues

For any issue let us know at issues link.

8. License

GPL v3.

For more information see LICENSE file or visit

http://www.gnu.org/licenses/gpl-3.0.html