ivartb edited this page Dec 25, 2023 · 10 revisions

Pre-built binaries

Pre-built Linux and Mac binaries are available from the Releases page.

Prerequisites (for building from the source code)

  • Qt 6
  • CMake
  • C++17-compliant compiler

Building from source

mkdir build
cd build
cmake ..
make

New features:

Contigs rotation

Sometimes the graph layout in Bandage is not perfect, and manual correction can be useful. In addition to the standard movement of nodes, node rotation has been implemented. To rotate a contig, hold down the right mouse button and drag one of the ends of the contig. video_3

Hi-C links visualization

Hi-C links between different contigs can be visualized on the de Bruijn graph. Hi-C links are drawn as dotted lines connecting the midpoints of contigs.

Load Hi-C metadata

To load Hi-C metadata in Bandage, choose the "Load Hi-C data" item in the "File" menu. A file with Hi-C data can be loaded only after a de Bruijn graph has been loaded.

image

Each row of the file should contain three fields separated by a tab ('\t'): the IDs of the two connected nodes and the weight (number of Hi-C links). The first row should contain the column names.

Below is an example of a Hi-C metadata TXT file:

v1	v2	hic_w
1268598	831795	6516
1072702	831795	5454
1268598	524477	1548
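The file layout above can be read with a few lines of Python. This is a minimal sketch for checking a file before loading it, not BandageNG's own parser; the function name read_hic_links is hypothetical, and treating the weight as an integer is an assumption based on it being a link count.

```python
import csv
import io


def read_hic_links(handle):
    """Parse a tab-separated Hi-C file: one header row, then node1, node2, weight."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the first row, which holds the column names
    links = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(row)}")
        node_a, node_b, weight = row
        # Assumption: the weight is an integer count of Hi-C links.
        links.append((node_a, node_b, int(weight)))
    return links


# The example file from this page, held in memory instead of on disk.
example = (
    "v1\tv2\thic_w\n"
    "1268598\t831795\t6516\n"
    "1072702\t831795\t5454\n"
    "1268598\t524477\t1548\n"
)
links = read_hic_links(io.StringIO(example))
print(links)
```

For a real file, pass an open file object instead, for example open("hic.txt", newline="").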

Draw graph with Hi-C links

To draw the de Bruijn graph with Hi-C links, click the "Draw graph" button after loading the Hi-C metadata.

image

Filter Hi-C links

You can adjust the filter settings and then click the "Draw graph" button to redraw the graph with the new Hi-C link visualization parameters.

image

  • You can set a minimum Hi-C weight. Hi-C links with a weight below this minimum will not be shown.

  • You can set a minimum contig sequence length. Hi-C links connecting shorter contigs will not be shown.

  • You can choose which Hi-C links are displayed:

    • All edges - all Hi-C links are shown.

    image

    • All edges link groups - all Hi-C links connecting contigs from different connected components of the graph are shown.

    image

    • One edge links groups - only one Hi-C link between different connected components of the graph is shown.

    image
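The filters above can be summarized as a small Python sketch. It illustrates the described behaviour and is not BandageNG's implementation. The function name, the mode names and the toy data are hypothetical. Two details are assumptions: a link is dropped when either of its contigs is shorter than the minimum length, and in the "one edge" mode the heaviest link is kept for each pair of connected components.

```python
def filter_hic_links(links, contig_length, component,
                     min_weight=0, min_length=0, mode="all"):
    """Return the Hi-C links that pass the filters, heaviest first.

    links          list of (node_a, node_b, weight) tuples
    contig_length  dict: node ID -> sequence length
    component      dict: node ID -> connected component ID
    mode           "all", "between_components" or "one_per_component_pair"
    """
    if mode not in ("all", "between_components", "one_per_component_pair"):
        raise ValueError(f"unknown mode: {mode}")
    kept = []
    seen_pairs = set()
    # Visit the heaviest links first so that "one_per_component_pair"
    # keeps the strongest link of each pair (an assumption, see above).
    for node_a, node_b, weight in sorted(links, key=lambda link: -link[2]):
        if weight < min_weight:
            continue
        # Assumption: one short contig is enough to hide the link.
        if contig_length[node_a] < min_length or contig_length[node_b] < min_length:
            continue
        if mode != "all":
            comp_a, comp_b = component[node_a], component[node_b]
            if comp_a == comp_b:
                continue  # link stays inside one connected component
            if mode == "one_per_component_pair":
                pair = frozenset((comp_a, comp_b))
                if pair in seen_pairs:
                    continue
                seen_pairs.add(pair)
        kept.append((node_a, node_b, weight))
    return kept


# Made-up toy data: four contigs in three connected components.
lengths = {"a": 1000, "b": 1000, "c": 1000, "d": 100}
components = {"a": 0, "b": 0, "c": 1, "d": 2}
toy_links = [("a", "b", 10), ("a", "c", 5), ("b", "c", 7), ("a", "d", 1)]

print(filter_hic_links(toy_links, lengths, components, min_weight=2))
print(filter_hic_links(toy_links, lengths, components, mode="one_per_component_pair"))
```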

Predictive model visualization

Visualization of RandomForest, AdaBoost, and Gradient Boosted Decision Trees machine learning models has been implemented. If you use decision tree models based on features extracted from a metagenomic dataset (for example, features extracted by MetaFX), you can visualize the predictive model and the de Bruijn graph simultaneously in BandageNG. BandageNG also supports mapping the features used in the predictive model onto the nodes (contigs) of the de Bruijn graph.

Load ML forest model

To load an ML model in Bandage, choose the "Load features forest" item in the "File" menu.

otchet_2_1

Input file format

All trees should be described in one TXT file. Every tree node in the forest model should have a unique ID. All fields should be separated by a tab character ("\t"). Every row starts with one of the special symbols N, F, C, or S.

There are four types of rows in an input file:

| Row format | Description | Example |
|---|---|---|
| N <Node ID> [<Left child ID>] [<Right child ID>] | Describes a tree node: the node ID and, for an inner node, the IDs of its children | N 1 2 3 |
| F <Node ID> <Feature ID> <Threshold> | Describes a feature: the node ID, the feature ID, and the threshold value (float) used to split the node into children | F 1 f_1 0.25 |
| C <Node ID> <Class> | Describes the class of a node: the node ID and the class of the leaf (for leaves) or the class of the feature (for inner nodes) | C 1 NonIBD |
| S <Feature ID> <Sequence> | Describes one nucleotide sequence of a feature: the feature ID and the nucleotide sequence | S 1 GGAGCG |

Some properties:

  • Every tree node should have exactly one row with prefix "N" and exactly one row with prefix "C" in the input file.
  • Every inner tree node should have exactly one row with prefix "F" in the input file.
  • Every feature can have one or more rows with prefix "S" in the input file. If the nucleotide sequences of a feature are unknown, the feature cannot be matched to a contig in the de Bruijn graph.
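These properties can be checked before loading a file. The sketch below is a hypothetical helper (check_forest_rows is not part of BandageNG) that counts N, C and F rows per node and reports violations. It assumes that a node is an inner node when its N row lists child IDs. The leaf rows and the class name IBD in the example are made up for illustration.

```python
from collections import Counter


def check_forest_rows(lines):
    """Return a list of problems found in forest-model rows (empty if none)."""
    counts = {"N": Counter(), "F": Counter(), "C": Counter()}
    inner_nodes = set()
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.rstrip("\n").split("\t")
        prefix = fields[0]
        if prefix not in ("N", "F", "C", "S"):
            problems.append(f"line {line_no}: unknown prefix {prefix!r}")
            continue
        if len(fields) < 2:
            problems.append(f"line {line_no}: missing ID")
            continue
        if prefix == "S":
            continue  # a feature may have any number of S rows
        node_id = fields[1]
        counts[prefix][node_id] += 1
        if prefix == "N" and len(fields) > 2:
            # Assumption: an N row that lists children marks an inner node.
            inner_nodes.add(node_id)
    all_nodes = set(counts["N"]) | set(counts["C"]) | set(counts["F"])
    for node_id in sorted(all_nodes):
        for prefix in ("N", "C"):
            found = counts[prefix][node_id]
            if found != 1:
                problems.append(f"node {node_id}: expected one {prefix} row, found {found}")
    for node_id in sorted(inner_nodes):
        found = counts["F"][node_id]
        if found != 1:
            problems.append(f"node {node_id}: expected one F row, found {found}")
    return problems


# One small tree; the leaf rows and the class "IBD" are made-up examples.
rows = [
    "N\t1\t2\t3", "F\t1\tf_1\t0.25", "C\t1\tNonIBD",
    "N\t2", "C\t2\tIBD",
    "N\t3", "C\t3\tNonIBD",
    "S\tf_1\tGGAGCG",
]
print(check_forest_rows(rows))
```

To check a file on disk, pass an open file object, since iterating over it yields its lines.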

Input file generation

You can write the TXT file yourself or generate it with the build_model_for_bandage.py script.

To run the script, provide the following parameters:

| Parameter | Description |
|---|---|
| --model-file | Joblib dump of the trained model. Supported models: RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier |
| --res-file | Name of the file in which to save the output |
| --source-dir | Directory that contains, for every class in the ML model, a FASTA file {sourceDir}/contigs_<category>/components.seq.fasta with the nucleotide sequences of the features. Sequence names should be the same as the feature IDs |

build_model_for_bandage.py --model-file <RandomForest.joblib> --res-file <RandomForestModel.txt> --source-dir <source-dir>
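To show what such a conversion involves, here is a minimal sketch that turns one decision tree into N, F and C rows. It is not the build_model_for_bandage.py script. The function tree_to_rows and its parameters are hypothetical. The input arrays mimic the layout of the tree_ attributes of a fitted scikit-learn tree (children_left, children_right, feature, threshold, with -1 marking a leaf), but they are passed as plain lists so the sketch runs without scikit-learn. How the class of each node is derived is left to the caller.

```python
def tree_to_rows(children_left, children_right, feature, threshold,
                 node_class, feature_names, id_offset=0):
    """Describe one tree as N/F/C rows.

    id_offset shifts the node IDs so that IDs stay unique across
    all trees of a forest.
    """
    rows = []
    for i in range(len(children_left)):
        node_id = id_offset + i
        left, right = children_left[i], children_right[i]
        if left == -1:  # scikit-learn marks a leaf with child index -1
            rows.append(f"N\t{node_id}")
        else:
            rows.append(f"N\t{node_id}\t{id_offset + left}\t{id_offset + right}")
            rows.append(f"F\t{node_id}\t{feature_names[feature[i]]}\t{threshold[i]}")
        rows.append(f"C\t{node_id}\t{node_class[i]}")
    return rows


# A made-up three-node tree: one split on feature f_1 at 0.25.
rows = tree_to_rows(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[0, -2, -2],           # -2 is scikit-learn's "undefined" marker on leaves
    threshold=[0.25, -2.0, -2.0],
    node_class=["NonIBD", "IBD", "NonIBD"],  # "IBD" is a hypothetical class name
    feature_names=["f_1"],
    id_offset=1,
)
print("\n".join(rows))
```

For a forest, call the function once per tree and increase id_offset by the number of nodes already written. The S rows would be added separately from the FASTA files in the source directory.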

Mapping forest model on De Bruijn graph

Mapping the forest model onto the de Bruijn graph is based on a colour schema: a tree node and a part of a contig in the de Bruijn graph have the same colour when the nucleotide sequences of the tree node map to that part of the contig. To synchronize the classification model and the de Bruijn graph, click the "Map features to De Bruijn graph" button.

otchet_2_5

Information about tree nodes

  • Labels can be shown for every tree node: node ID, class, or custom notes.
  • When you select a node, all of its information from the forest is shown in the right panel: ID, splitting rule description, class (class of feature or class of leaf), and the set of nucleotide sequences.

otchet_2_13

Colour schemas

You can select one of the following colour schemas:

  1. Uniform colour – all tree nodes have the same colour.
  2. Class colour – tree nodes are coloured according to their classes. Nodes of the same class have the same colour, and nodes of different classes have different colours.
  3. BLAST hits (solid) – available only after mapping features to the de Bruijn graph. Each tree node and the matching parts of contigs are drawn in the same random colour.
  4. BLAST hits (class colours) – available only after mapping features to the de Bruijn graph. Tree nodes and the matching parts of contigs are coloured according to the tree node classes.

otchet_2_4

Multigraph mode

Several de Bruijn graphs (from different files) can be visualized on one screen simultaneously.

Load many graphs

To load multiple graphs in Bandage, choose the "Load graphs from dir" item in the "File" menu. All graph files in the selected directory are loaded recursively. Only files with the *.gfa and *.fastg extensions are added.

image
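To preview which files such a recursive scan would pick up, a short Python sketch follows. It only mirrors the rule described above (recursive search, *.gfa and *.fastg only) and is not BandageNG code. The function name find_graph_files is hypothetical, and exact lowercase extension matching is an assumption.

```python
import tempfile
from pathlib import Path

GRAPH_EXTENSIONS = {".gfa", ".fastg"}


def find_graph_files(root):
    """Return graph files below root as sorted paths relative to root."""
    root = Path(root)
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")  # rglob walks all subdirectories
        if path.is_file() and path.suffix in GRAPH_EXTENSIONS
    )


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.gfa", "sub/b.fastg", "sub/notes.txt"):
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = [p.as_posix() for p in find_graph_files(tmp)]

print(found)
```

The relative paths returned here correspond to the names under which the graphs are shown after drawing.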

Draw multigraph

To draw the de Bruijn graphs, click the "Draw graph" button after loading the graphs. All graphs are shown on one screen. Every graph is named by its relative path within the folder. Different graphs can contain nodes with the same names, so a prefix with the graph ID is added to the name of every node. Every graph has a random unique ID. The new node names can also be used in GFA or FASTA files generated by BandageNG when saving a selected part of the graphs.

image

CSV data

CSV metadata can be used to visualize the taxonomic annotation of graph nodes. In this case, use a CSV table with the columns Superkingdom, Phylum, Class, Order, Family, Genus, Species, Serotype, and Strains. This table can be filled with annotation data from Kraken2 output.

Perform the following steps:

  1. Run taxonomic classification with Kraken2, with names included in the output:
kraken2 --threads 8 --use-names --db ./kraken2/k2_standart ./components.seq.fasta > kraken_class.txt
  2. Convert the Kraken2 output to a CSV file with the custom Python script tax_to_csv.py. The script requires python3 and the ete3 library (pip install ete3):
py tax_to_csv.py --class-file=kraken_class.txt --res-file=graph.csv
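For orientation, the sketch below shows what one line of kraken_class.txt looks like and how it can be parsed. It is not the tax_to_csv.py script and does not resolve the full lineage, which is what ete3 is needed for. It assumes the standard Kraken2 output layout, where with --use-names the third tab-separated column has the form "name (taxid N)". The function name and the sample line are made up.

```python
import re

# With --use-names, column 3 looks like "Escherichia coli (taxid 562)".
TAXON_RE = re.compile(r"^(?P<name>.+) \(taxid (?P<taxid>\d+)\)$")


def parse_kraken_line(line):
    """Extract sequence ID, classification status, taxon name and taxid."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
    status, seq_id, taxon = fields[:3]
    match = TAXON_RE.match(taxon)
    if match is None:
        raise ValueError(f"unexpected taxon column: {taxon!r}")
    return {
        "seq_id": seq_id,
        "classified": status == "C",  # "C" = classified, "U" = unclassified
        "name": match["name"],
        "taxid": int(match["taxid"]),
    }


# Made-up sample line in the Kraken2 per-sequence output layout.
sample = "C\tcontig_1\tEscherichia coli (taxid 562)\t1500\t562:1466"
record = parse_kraken_line(sample)
print(record)
```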

Load CSV data (Single graph):

To load CSV data, choose the "Load CSV data" item in the "File" menu.

Load CSV data (Multiple graphs):

  • CSV data is loaded automatically together with multiple graphs if the CSV file has the same name as a graph file. For example, if folder A contains three files, graph_1.gfa, graph_2.gfa, and graph_2.csv, then the CSV metadata is applied to graph_2.
  • You can also load one CSV file for all graphs, but in this case all node names across all graphs must be unique. To do so, choose the "Load CSV data" item in the "File" menu.

image
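The name-matching rule for multiple graphs can be sketched as follows. This illustrates the rule as described on this page and is not BandageNG code. The function name pair_graphs_with_csv is hypothetical, and requiring the CSV file to sit in the same folder as the graph is an assumption.

```python
from pathlib import PurePosixPath


def pair_graphs_with_csv(file_names):
    """Map each graph file to the CSV file with the same base name, or None."""
    paths = [PurePosixPath(name) for name in file_names]
    # Index CSV files by their path without the extension.
    csv_by_stem = {p.with_suffix(""): p for p in paths if p.suffix == ".csv"}
    pairs = {}
    for p in paths:
        if p.suffix in (".gfa", ".fastg"):
            match = csv_by_stem.get(p.with_suffix(""))
            pairs[str(p)] = str(match) if match is not None else None
    return pairs


# The example from the text: only graph_2 has a matching CSV file.
pairs = pair_graphs_with_csv(["graph_1.gfa", "graph_2.gfa", "graph_2.csv"])
print(pairs)
```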