# Results Tables

Assembly success is evaluated with two sets of metrics: global metrics, derived from statistics inherent to the complete set of assembled sequences, and reference metrics, computed relative to the replicons present in the sample.

The global metrics are computed with custom Python code from the complete set of assembled sequences.

The following metrics are computed for both the complete and the filtered set of assembled sequences, where the filtered set is restricted to contigs longer than a specified minimum size:

- **Contig sizes**
    - **Contigs:** The total number of contigs in the assembly;
    - **Basepairs:** The total number of bases in the assembly;
    - **Maximum sequence length:** The length of the largest contig in the assembly;
    - **Number of ‘N’s:** Number of uncalled bases (N's).
- **Contiguity**
    - **Nx (where 0 < x ⩽ 100):** Length for which the collection of all assembled sequences of that length or longer covers at least x % of the total length of the assembly.
- **Misassembly**
    - **Misassemblies:** Number of aligned contigs that contain a misassembly event.
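
The Nx definition above can be illustrated with a minimal sketch. It is not the pipeline's own code: the function name `nx` and the optional `total_length` argument are assumptions made for illustration. With the default, the target is a percentage of the assembly size (Nx); passing a reference length instead gives a value measured against that reference.

```python
def nx(contig_lengths, x=50, total_length=None):
    """Return the length L such that contigs of length >= L cover at least
    x % of total_length (the summed contig lengths by default)."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths) if total_length is None else total_length
    target = total * x / 100
    covered = 0
    for length in lengths:
        covered += length
        if covered >= target:
            return length
    # The contigs never reach the target, which can happen when
    # total_length is larger than the assembly itself.
    return 0


# Toy contig lengths, made up for illustration (assembly size: 300).
example_lengths = [100, 80, 60, 40, 20]
n50 = nx(example_lengths, 50)                      # target of 150 bases
n90 = nx(example_lengths, 90)                      # target of 270 bases
ng50 = nx(example_lengths, 50, total_length=1000)  # target of 500 bases
```

Here `n50` is 80 and `n90` is 40, while `ng50` is 0 because the toy contigs sum to less than half of the assumed 1000-base reference.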

After filtering, the sequences are mapped to the reference replicons and the metrics are computed with custom Python code. The metrics are calculated, for contigs above the minimum length, for each replicon in the provided input reference file.

- **Contig sizes**
    - **Contigs:** The number of contigs aligned to the replicon;
    - **Basepairs:** The number of bases aligned to the replicon;
    - **Number of ‘N’s:** Number of uncalled bases (N's).
- **COMPASS**
    - **(Breadth of) Coverage:** Ratio of covered sequence on the reference by aligned contigs;
    - **Multiplicity:** Ratio of the length of alignable assembled sequence to covered sequence on the reference;
    - **Validity:** Ratio of the length of the alignable assembled sequence to total basepairs in the aligned contigs;
    - **Parsimony:** Cost of the assembly (multiplicity over validity).
- **Contiguity**
    - **Contiguity:** Longest single alignment between the assembly and the reference, relative to the reference length;
    - **NAx (where 0 < x ⩽ 100):** Length for which the collection of aligned assembled sequences of that length or longer covers at least x % of the total length of the reference replicon;
    - **NGx (where 0 < x ⩽ 100):** Length for which the collection of aligned contigs of that length or longer covers at least x % of the sequence of the reference;
    - **Lx (where 0 < x ⩽ 100):** Minimal number of contigs that cover x % of the sequence of the reference.
- **Identity**
    - **Identity:** Ratio of identical basepairs in all aligned contigs to the reference;
    - **Lowest identity:** Identity of the lowest scoring contig to the reference;
    - **PLS:** Phred-like score per contig, per assembler.
- **Misassembly**
    - **Misassemblies:** Number of aligned contigs that contain a misassembly event.
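
The COMPASS ratios listed above can be sketched from a set of alignments. This is only an illustration under stated assumptions, not the pipeline's implementation: the names `merge_intervals` and `compass_metrics` are hypothetical, alignments are given as half-open reference intervals, the alignable length of each alignment is taken to equal its reference span (no indels), and all ratios are set to 0 when nothing aligns.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def compass_metrics(alignments, contig_lengths, reference_length):
    """alignments: list of (contig_id, ref_start, ref_end) tuples.
    contig_lengths: dict mapping contig_id to the full contig length."""
    # Reference bases covered at least once by an aligned contig.
    covered = sum(end - start
                  for start, end in merge_intervals((s, e) for _, s, e in alignments))
    # Total aligned sequence, counting overlapping alignments repeatedly.
    alignable = sum(e - s for _, s, e in alignments)
    # Total basepairs in the contigs that have at least one alignment.
    assembled = sum(contig_lengths[c] for c in {c for c, _, _ in alignments})

    coverage = covered / reference_length
    multiplicity = alignable / covered if covered else 0.0
    validity = alignable / assembled if assembled else 0.0
    parsimony = multiplicity / validity if validity else 0.0
    return {"coverage": coverage, "multiplicity": multiplicity,
            "validity": validity, "parsimony": parsimony}


# Toy data, made up for illustration: two contigs overlap on a 1000-base
# reference and a third contig does not align at all.
toy_alignments = [("c1", 0, 400), ("c2", 300, 600)]
toy_lengths = {"c1": 400, "c2": 400, "c3": 500}
toy_metrics = compass_metrics(toy_alignments, toy_lengths, 1000)
```

In this toy case 600 reference bases are covered by 700 aligned bases from 800 bases of aligned contigs, so coverage is 0.6, multiplicity is 700/600, validity is 700/800 and parsimony is 800/600.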


## Imports

In [1]:
import sys
from plotly.offline import plot
import glob
import fnmatch
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
import pandas as pd
from itertools import groupby
import csv
import numpy as np

## Global variables

In [2]:
METRICS_COLUMNS_GLOBAL = ['run','sample','assembler', 'contigs', 'basepairs', 'max_contig', 'Ns', 'n50', 
                          'misassemblies', 'mapped_reads', 'filtered_contigs', 'filtered_basepairs',
                          'filtered_n50', 'filtered_Ns', 'type', 'algorythm']
METRICS_COLUMNS_REFERENCE = ['run','sample','assembler', 'reference', 'contiguity', 'breadth_of_coverage', 'multiplicity', 'validity', 
                             'parsimony', 'identity', 'lowest_identity', 'L90', 'contigs', 'NA50', 'NG50', 'basepairs', 'Ns',
                             'misassemblies', 'type', 'algorythm']

ASSEMBLER_PROCESS_LIST = ["BCALM2", "GATBMINIAPIPELINE", "MINIA", "MEGAHIT", "METASPADES", "UNICYCLER", "SPADES",
                          "SKESA", "VELVETOPTIMIZER", "IDBA"]
PROCESS_TO_NAME = {"BCALM2": "BCALM2", 
                   "GATBMINIAPIPELINE": "GATBMiniaPipeline",
                   "MINIA": "MINIA",
                   "MEGAHIT": "MEGAHIT", 
                   "METASPADES": "metaSPAdes", 
                   "UNICYCLER": "Unicycler", 
                   "SPADES": "SPAdes",
                   "SKESA": "SKESA",
                   "VELVETOPTIMIZER": "VelvetOptimizer",
                   "IDBA": "IDBA-UD"}

log_distributed = ['LHS', 'LNN', 'ERR2935805']
even_distribution = ['EMS', 'ENN', 'ERR2984773']

genomic_assemblers = ['BCALM2', 'MINIA', 'SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer']
metagenomic_assemblers = ['GATBMiniaPipeline', 'IDBA-UD', 'MEGAHIT', 'metaSPAdes']
single_kmer = ['BCALM2', 'MINIA']
multiple_kmer = ['SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer', 'GATBMiniaPipeline', 
                 'IDBA-UD', 'MEGAHIT', 'metaSPAdes']

## Global metrics

### Load data

In [3]:
report_glob = glob.glob('../Results/*/report/pipeline_report_tables.json')
global_pipeline_metrics_df = pd.DataFrame(columns=METRICS_COLUMNS_GLOBAL)

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))
    
    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)
        for sample in json_report.keys():
            for line in json_report[sample]['GlobalTable']:
                assembler = line['assembler']
                global_pipeline_metrics_df = global_pipeline_metrics_df.append({'run': stats_run,
                                                                                'sample': sample,
                                                                                'assembler': line['assembler'],
                                                                                'contigs': int(line['original']['contigs']),
                                                                                'basepairs': int(line['original']['basepairs']),
                                                                                'max_contig': int(line['original']['max_contig_size']),
                                                                                'n50': int(line['original']['N50']),
                                                                                'mapped_reads': line['original']['mapped_reads'],
                                                                                'Ns': int(line['original']['Ns']),
                                                                                'misassemblies': line['filtered']['misassembled_contigs'],
                                                                                'filtered_contigs': line['filtered']['contigs'],
                                                                                'filtered_basepairs': line['filtered']['basepairs'],
                                                                                'filtered_n50': line['filtered']['N50'],
                                                                                'filtered_Ns': line['filtered']['Ns']
                                                                               },
                                                                               ignore_index=True)
global_pipeline_metrics_df['distribution'] = np.where(global_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
global_pipeline_metrics_df['type'] = np.where(global_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
global_pipeline_metrics_df['algorythm'] = np.where(global_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
global_pipeline_metrics_df[['contigs','basepairs','max_contig', 'Ns','n50','misassemblies']] = global_pipeline_metrics_df[['contigs','basepairs','max_contig', 'Ns','n50','misassemblies']].apply(pd.to_numeric)

Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run2...
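
The loading cell above grows the table with `DataFrame.append`, which was removed in pandas 2.0. A minimal sketch of an equivalent pattern for newer pandas versions is to collect plain dictionaries in a list and build the DataFrame once. The report content, the run label `run0` and the assembler names below are made-up placeholders that only mimic the `GlobalTable` nesting read above.

```python
import pandas as pd

# Made-up placeholder report; only the nesting mirrors the real JSON.
json_report = {
    "sampleA": {
        "GlobalTable": [
            {"assembler": "assemblerX",
             "original": {"contigs": 12, "basepairs": 3400, "N50": 500},
             "filtered": {"contigs": 8, "basepairs": 3100, "N50": 520}},
            {"assembler": "assemblerY",
             "original": {"contigs": 30, "basepairs": 3900, "N50": 210},
             "filtered": {"contigs": 11, "basepairs": 3000, "N50": 330}},
        ]
    }
}

rows = []
for sample, tables in json_report.items():
    for line in tables["GlobalTable"]:
        # One dictionary per table row; keys become column names.
        rows.append({
            "run": "run0",
            "sample": sample,
            "assembler": line["assembler"],
            "contigs": int(line["original"]["contigs"]),
            "basepairs": int(line["original"]["basepairs"]),
            "n50": int(line["original"]["N50"]),
            "filtered_contigs": line["filtered"]["contigs"],
        })

# Building the frame in one step avoids copying it on every iteration.
example_df = pd.DataFrame(rows)
```

Because the frame is built from typed values in a single step, pandas infers numeric dtypes directly, which may reduce the need for a later `pd.to_numeric` pass.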


### Preview data

In [4]:
global_pipeline_metrics_df.dtypes

run                    object
sample                 object
assembler              object
contigs                 int64
basepairs               int64
max_contig              int64
Ns                      int64
n50                     int64
misassemblies           int64
mapped_reads          float64
filtered_contigs       object
filtered_basepairs     object
filtered_n50           object
filtered_Ns            object
type                   object
algorythm              object
distribution           object
dtype: object

In [5]:
global_pipeline_metrics_df

,run,sample,assembler,contigs,basepairs,max_contig,Ns,n50,misassemblies,mapped_reads,filtered_contigs,filtered_basepairs,filtered_n50,filtered_Ns,type,algorythm,distribution
0,run3,ENN,BCALM2,25258,31040696,161812,0,17766,0,96.832836,3542,28188554,21029,0,Genomic,Single k-mer De Bruijn graph,Even
1,run3,ENN,GATBMiniaPipeline,1008,30445418,632215,0,106963,0,99.480265,727,30282639,106963,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
2,run3,ENN,IDBA-UD,1825,30390287,468605,0,61778,12,99.414283,1279,30136999,62331,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
3,run3,ENN,MEGAHIT,810,30541296,632274,0,170162,9,99.731417,517,30415305,171359,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
4,run3,ENN,metaSPAdes,633,30448137,1036942,0,234919,1,99.596146,350,30376419,234919,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
115,run2,LHS,MINIA,24652,15334650,163543,0,7727,0,91.099517,2171,12920930,10582,0,Genomic,Single k-mer De Bruijn graph,Log
116,run2,LHS,SKESA,36,2957060,632013,0,351027,0,92.228441,21,2950393,351027,0,Genomic,Multiple k-mer De Bruijn graph,Log
117,run2,LHS,SPAdes,31225,16224766,263645,0,46478,1,95.503607,1526,13088051,69361,0,Genomic,Multiple k-mer De Bruijn graph,Log
118,run2,LHS,Unicycler,276,13345058,632062,0,187479,3,96.948570,247,13333597,187479,0,Genomic,Multiple k-mer De Bruijn graph,Log


### Save data

In [6]:
global_pipeline_metrics_df.to_csv("Tables/Results/Global metrics.csv")

### Global statistics per assembler type

In [7]:
stats_per_assembler_type = pd.DataFrame()
for assembly_type in global_pipeline_metrics_df.type.unique():
    row={'Assembler Type': assembly_type}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.type == assembly_type].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    stats_per_assembler_type=stats_per_assembler_type.append(row, ignore_index=True)
stats_per_assembler_type = stats_per_assembler_type.set_index('Assembler Type')
stats_per_assembler_type.to_csv("Tables/Results/Global metrics per assembler type.csv")

In [8]:
stats_per_assembler_type

Assembler Type,Ns,basepairs,contigs,mapped_reads,max_contig,misassemblies,n50
Genomic,632.49(±2696.83),45049830.26(±77416335.87),584820.46(±1927506.27),74.63(±36.42),407391.93(±369952.0),18.17(±94.55),75438.93(±96288.64)
Metagenomic,0.0(±0.0),25547301.35(±10011582.38),8085.02(±13528.83),94.46(±6.1),706300.71(±216738.75),4.06(±4.6),116760.52(±64826.32)
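
The per-group `mean(±std)` summaries in this notebook are built by looping over groups and calling `describe()`. As a hedged alternative, the same kind of table can be sketched with `groupby`. The helper name `mean_std_table` and the toy values are assumptions made for illustration, and the column order of the result may differ from the tables shown here.

```python
import pandas as pd


def mean_std_table(frame, by):
    """Format every numeric column as 'mean(±std)' for each group in `by`."""
    numeric = [c for c in frame.select_dtypes(include="number").columns if c != by]
    grouped = frame.groupby(by)[numeric]
    mean = grouped.mean().round(2)
    std = grouped.std().round(2)  # sample standard deviation, as in describe()
    return mean.astype(str) + "(±" + std.astype(str) + ")"


# Toy values, made up for illustration.
toy_df = pd.DataFrame({
    "type": ["Genomic", "Genomic", "Metagenomic", "Metagenomic"],
    "contigs": [10, 20, 5, 7],
    "n50": [100.0, 300.0, 400.0, 600.0],
})
toy_summary = mean_std_table(toy_df, "type")
```

For the toy data, the `contigs` entry of the `Genomic` row is `15.0(±7.07)`. The same helper could be called with `"algorythm"` or `"assembler"` as the grouping column.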


### Global stats per assembly algorithm

In [9]:
stats_per_assembler_algo = pd.DataFrame()
for assembly_algo in global_pipeline_metrics_df.algorythm.unique():
    row={'Assembler Algorithm': assembly_algo}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    stats_per_assembler_algo=stats_per_assembler_algo.append(row, ignore_index=True)
stats_per_assembler_algo = stats_per_assembler_algo.set_index('Assembler Algorithm')
stats_per_assembler_algo.to_csv("Tables/Results/Global metrics per assembler algorithm.csv")

In [10]:
stats_per_assembler_algo

Assembler Algorithm,Ns,basepairs,contigs,mapped_reads,max_contig,misassemblies,n50
Single k-mer De Bruijn graph,0.0(±0.0),91615808.0(±121102826.02),1691066.88(±3086055.78),68.91(±38.55),108171.83(±74665.4),0.33(±0.76),10206.33(±9056.65)
Multiple k-mer De Bruijn graph,474.36(±2347.62),23657071.38(±12649027.02),19891.14(±86129.94),85.97(±26.66),631651.34(±309648.34),15.57(±81.88),112407.88(±85994.72)


### Global stats per assembler

In [11]:
stats_per_assembler = pd.DataFrame()
for assembler in global_pipeline_metrics_df.assembler.unique():
    row={'Assembler': assembler, 
         'Type':global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].type.unique()[0],
         'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    stats_per_assembler=stats_per_assembler.append(row, ignore_index=True)
stats_per_assembler = stats_per_assembler.set_index('Assembler')
stats_per_assembler.to_csv("Tables/Results/Global metrics per assembler.csv")

In [12]:
stats_per_assembler

Assembler,Algorithm,Ns,Type,basepairs,contigs,mapped_reads,max_contig,misassemblies,n50
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,157403460.17(±145361404.71),3352423.83(±3727041.83),46.33(±44.2),58728.17(±77129.19),0.0(±0.0),6220.67(±8873.1)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,23074937.42(±8661768.88),2629.42(±2589.73),94.65(±6.52),640091.33(±68954.44),1.92(±2.07),102424.0(±26843.99)
IDBA-UD,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,25378779.83(±9698785.87),6832.17(±7856.47),93.22(±6.21),515095.83(±121784.53),6.17(±4.73),67858.42(±30786.39)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,25840461.5(±10055256.9),6528.5(±8264.11),94.98(±6.19),847862.5(±228686.73),6.33(±6.49),147830.83(±47776.33)
metaSPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,27895026.67(±12052231.75),16350.0(±23107.65),94.97(±6.09),822153.17(±218316.11),1.83(±0.94),148928.83(±94815.17)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,25828155.83(±9600619.33),29709.92(±19566.17),91.49(±6.43),157615.5(±19336.61),0.67(±0.98),14192.0(±7623.07)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,18472455.67(±12490360.65),3671.5(±4090.4),90.6(±8.18),390219.83(±263956.92),3.33(±4.46),101495.67(±123485.57)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,28138604.33(±10873664.65),19436.5(±17375.62),94.21(±6.49),697457.83(±218426.88),98.0(±221.9),107840.17(±85499.81)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,21030768.0(±9911805.52),531.33(±330.3),94.07(±6.84),857592.0(±234205.23),1.33(±1.44),212762.33(±22226.12)
VelvetOptimizer,Multiple k-mer De Bruijn graph,3794.92(±5818.15),Genomic,19425537.58(±22071793.34),103149.67(±232595.04),31.06(±45.27),282738.25(±417128.51),5.67(±8.39),10122.75(±14919.83)


### Global stats per distribution

In [13]:
for distribution in global_pipeline_metrics_df.distribution.unique():
    stats_per_distribution = pd.DataFrame()
    for assembler in global_pipeline_metrics_df.assembler.unique():
        row={'Assembler': assembler, 
             'Type':global_pipeline_metrics_df[(global_pipeline_metrics_df.assembler == assembler) & (global_pipeline_metrics_df.distribution == distribution)].type.unique()[0],
             'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        lala=global_pipeline_metrics_df[(global_pipeline_metrics_df.assembler == assembler) & (global_pipeline_metrics_df.distribution == distribution)].describe()
        for column in lala.columns:
            mean = lala.loc['mean',column]
            std = lala.loc['std',column]
            row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
        stats_per_distribution=stats_per_distribution.append(row, ignore_index=True)
    stats_per_distribution = stats_per_distribution.set_index('Assembler')
    stats_per_distribution.to_csv("Tables/Results/{} - Global metrics per assembler.csv".format(distribution))
    print(distribution)
    display(stats_per_distribution)

Even


Assembler,Algorithm,Ns,Type,basepairs,contigs,mapped_reads,max_contig,misassemblies,n50
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,81485532.0(±69082486.61),1286040.67(±1726341.64),57.29(±40.35),61507.0(±78246.58),0.0(±0.0),6186.0(±8975.2)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,31254576.33(±1205888.55),2439.5(±2292.43),93.87(±8.31),677707.0(±70298.6),2.67(±2.73),110537.33(±9542.15)
IDBA-UD,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,33523229.67(±4845847.19),7943.0(±9486.87),90.89(±7.11),413513.0(±85348.16),9.67(±4.41),49048.33(±21762.56)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,34099350.67(±5474162.25),7463.33(±10186.7),94.35(±7.85),868834.0(±268046.42),12.0(±3.9),143276.33(±42581.65)
metaSPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,36624891.67(±9555633.92),21002.33(±31475.51),94.28(±7.71),847661.67(±290003.13),2.0(±0.89),158678.0(±91558.71)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,34158353.67(±5443602.16),33771.5(±26508.92),90.39(±7.82),147189.67(±22652.82),0.0(±0.0),14485.0(±5722.72)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,30285330.33(±115680.54),1219.0(±199.11),92.53(±8.74),540293.0(±83135.61),2.67(±2.07),85120.33(±16926.29)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,35828349.33(±8294764.4),13949.33(±20031.01),94.27(±7.78),813593.33(±52845.57),4.33(±4.41),158629.67(±83121.53)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,30462753.0(±99796.14),833.67(±138.96),93.41(±8.6),1011262.0(±227172.01),1.67(±1.37),215168.67(±26920.81)
VelvetOptimizer,Multiple k-mer De Bruijn graph,7589.83(±6317.33),Genomic,38851075.17(±12888650.42),206299.33(±305753.73),62.13(±46.83),565476.5(±436959.29),11.33(±8.82),20245.5(±15613.92)


Log


Assembler,Algorithm,Ns,Type,basepairs,contigs,mapped_reads,max_contig,misassemblies,n50
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,233321388.33(±166976037.91),5418807.0(±4163140.62),35.37(±48.8),55949.33(±83345.94),0.0(±0.0),6255.33(±9625.66)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,14895298.5(±1740108.73),2819.33(±3068.06),95.43(±4.81),602475.67(±46071.98),1.17(±0.75),94310.67(±36555.23)
IDBA-UD,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,17234330.0(±4926154.48),5721.33(±6544.52),95.56(±4.6),616678.67(±24065.69),2.67(±0.52),86668.5(±27611.4)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,17581572.33(±5363708.41),5593.67(±6662.12),95.61(±4.66),826891.0(±205305.52),0.67(±0.52),152385.33(±56202.22)
metaSPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Metagenomic,19165161.67(±6734121.74),11697.67(±11491.98),95.66(±4.59),796644.67(±138537.99),1.67(±1.03),139179.67(±105672.65)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),Genomic,17497958.0(±2569841.72),25648.33(±9996.14),92.59(±5.2),168041.33(±6968.79),1.33(±1.03),13899.0(±9741.1)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,6659581.0(±2881152.41),6124.0(±4725.89),88.66(±7.85),240146.67(±303837.03),4.0(±6.2),117871.0(±180601.9)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,20448859.33(±7028523.19),24923.67(±13809.57),94.15(±5.68),581322.33(±264197.12),191.67(±295.38),57050.67(±54610.55)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,11598783.0(±1617707.48),229.0(±36.54),94.74(±5.29),703922.0(±111322.71),1.0(±1.55),210356.0(±18659.42)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)


## Reference metrics

### Load data

In [14]:
report_glob = glob.glob('../Results/*/report/pipeline_report_tables.json')
reference_pipeline_metrics_df = pd.DataFrame(columns=METRICS_COLUMNS_REFERENCE)

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))
    
    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)
        for sample in json_report.keys():
            for reference, data in json_report[sample]['ReferenceTables'].items():
                for row in data:
                    for item in row:
                        reference_pipeline_metrics_df = reference_pipeline_metrics_df.append({'run': stats_run,
                                                                                        'sample': sample,
                                                                                        'assembler': item['assembler'],
                                                                                        'reference': reference,
                                                                                        'contiguity': item['contiguity'],
                                                                                        'breadth_of_coverage': item['breadth_of_coverage'],
                                                                                        'multiplicity': item['multiplicity'],
                                                                                        'validity': item['validity'],
                                                                                        'parsimony': item['parsimony'],
                                                                                        'identity': item['identity'],
                                                                                        'lowest_identity': item['lowest_identity'],
                                                                                        'L90': item['L90'],
                                                                                        'contigs': item['aligned_contigs'],
                                                                                        'NA50': item['NA50'],
                                                                                        'NG50': item['NG50'],
                                                                                        'basepairs': item['aligned_basepairs'],
                                                                                        'Ns': item['Ns'],
                                                                                        'misassemblies': item['misassembled_contigs']},
                                                                                       ignore_index=True)

reference_pipeline_metrics_df['distribution'] = np.where(reference_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
reference_pipeline_metrics_df['type'] = np.where(reference_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
reference_pipeline_metrics_df['algorythm'] = np.where(reference_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
reference_pipeline_metrics_df[['contigs','basepairs','L90','Ns','NA50','NG50','misassemblies', 'multiplicity','validity','parsimony','identity','lowest_identity']] = reference_pipeline_metrics_df[['contigs','basepairs','L90','Ns','NA50','NG50','misassemblies','multiplicity','validity','parsimony','identity','lowest_identity']].apply(pd.to_numeric)
display(reference_pipeline_metrics_df)

Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run2...


,run,sample,assembler,reference,contiguity,breadth_of_coverage,multiplicity,validity,parsimony,identity,...,L90,contigs,NA50,NG50,basepairs,Ns,misassemblies,type,algorythm,distribution
0,run3,ENN,BCALM2,Bacillus_subtilis,0.032648,0.976835,1.000279,1.000000,1.000279,1.000000,...,66,116,65138,64934,3951958,0,0,Genomic,Single k-mer De Bruijn graph,Even
1,run3,ENN,GATBMiniaPipeline,Bacillus_subtilis,0.097833,0.983819,1.000000,1.000000,1.000000,0.999966,...,14,29,269695,263366,3980213,0,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
2,run3,ENN,IDBA-UD,Bacillus_subtilis,0.077894,0.982940,1.000048,0.999999,1.000048,0.999558,...,21,38,211169,211169,3976658,0,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
3,run3,ENN,MEGAHIT,Bacillus_subtilis,0.097848,0.985261,1.000217,0.999861,1.000356,0.998712,...,14,28,263587,263587,3986046,0,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
4,run3,ENN,metaSPAdes,Bacillus_subtilis,0.255293,0.985404,1.000000,1.000000,1.000000,0.998464,...,10,20,325068,325068,3986625,0,0,Metagenomic,Multiple k-mer De Bruijn graph,Even
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1435,run2,LHS,MINIA,Staphylococcus_aureus_plasmid3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0,0,0,0,0,0,0,Genomic,Single k-mer De Bruijn graph,Log
1436,run2,LHS,SKESA,Staphylococcus_aureus_plasmid3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0,0,0,0,0,0,0,Genomic,Multiple k-mer De Bruijn graph,Log
1437,run2,LHS,SPAdes,Staphylococcus_aureus_plasmid3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0,0,0,0,0,0,0,Genomic,Multiple k-mer De Bruijn graph,Log
1438,run2,LHS,Unicycler,Staphylococcus_aureus_plasmid3,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0,0,0,0,0,0,0,Genomic,Multiple k-mer De Bruijn graph,Log
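
The reference tables are nested four levels deep (sample, reference, row, item), which is why the loading cell needs four loops. A minimal sketch of the same traversal as a generator is given below. The function name `iter_reference_rows` and the toy report are hypothetical, and the sketch copies every key of each item, whereas the cell above selects and renames specific keys.

```python
def iter_reference_rows(json_report, run):
    """Yield one flat dictionary per assembler entry in 'ReferenceTables'."""
    for sample, tables in json_report.items():
        for reference, data in tables["ReferenceTables"].items():
            for row in data:
                for item in row:
                    # Prefix the entry with the context it was nested under.
                    yield {"run": run, "sample": sample,
                           "reference": reference, **item}


# Made-up placeholder report; only the nesting mirrors the real JSON.
toy_report = {
    "sampleA": {
        "ReferenceTables": {
            "repliconX": [[{"assembler": "assemblerX", "NA50": 10},
                           {"assembler": "assemblerY", "NA50": 20}]],
            "repliconY": [[{"assembler": "assemblerX", "NA50": 0}]],
        }
    }
}
flat_rows = list(iter_reference_rows(toy_report, "run0"))
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame` to obtain one row per sample, reference and assembler.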


In [15]:
reference_pipeline_metrics_df.dtypes

run                     object
sample                  object
assembler               object
reference               object
contiguity             float64
breadth_of_coverage    float64
multiplicity           float64
validity               float64
parsimony              float64
identity               float64
lowest_identity        float64
L90                      int64
contigs                  int64
NA50                     int64
NG50                     int64
basepairs                int64
Ns                       int64
misassemblies            int64
type                    object
algorythm               object
distribution            object
dtype: object

### Save data

In [16]:
reference_pipeline_metrics_df.to_csv("Tables/Results/Reference metrics.csv")

### Reference statistics per assembler type

In [17]:
ref_stats_per_assembler_type = pd.DataFrame()
for assembly_type in reference_pipeline_metrics_df.type.unique():
    row={'Assembler Type': assembly_type}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.type == assembly_type].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    ref_stats_per_assembler_type=ref_stats_per_assembler_type.append(row, ignore_index=True)
ref_stats_per_assembler_type = ref_stats_per_assembler_type.set_index('Assembler Type')
ref_stats_per_assembler_type.to_csv("Tables/Results/Reference metrics per assembler type.csv")

In [18]:
ref_stats_per_assembler_type

Assembler Type,L90,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
Genomic,32.27(±116.14),44541.82(±91876.27),44301.36(±93249.33),81.46(±801.06),1392591.77(±2023755.37),0.42(±0.47),113.05(±296.71),0.08(±0.2),0.5(±0.5),0.47(±0.49),1.55(±27.23),0.5(±0.5),0.51(±0.51),0.5(±0.5)
Metagenomic,24.19(±51.73),82725.76(±123505.97),81989.47(±122524.9),0.0(±0.0),1830606.01(±2236330.94),0.55(±0.49),45.2(±82.46),0.14(±0.26),0.66(±0.47),0.61(±0.46),0.39(±1.08),0.67(±0.48),0.69(±0.56),0.66(±0.47)


### Reference stats per assembly algorithm

In [19]:
ref_stats_per_assembler_algo = pd.DataFrame()
for assembly_algo in reference_pipeline_metrics_df.algorythm.unique():
    row={'Assembler Algorithm': assembly_algo}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    ref_stats_per_assembler_algo=ref_stats_per_assembler_algo.append(row, ignore_index=True)
ref_stats_per_assembler_algo = ref_stats_per_assembler_algo.set_index('Assembler Algorithm')
ref_stats_per_assembler_algo.to_csv("Tables/Results/Reference metrics per assembler algorithm.csv")

In [20]:
ref_stats_per_assembler_algo

Assembler Algorithm,L90,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
Single k-mer De Bruijn graph,59.17(±180.12),8875.39(±15789.06),8406.65(±15616.07),0.0(±0.0),1281528.14(±1944494.79),0.35(±0.44),200.96(±381.74),0.03(±0.13),0.47(±0.5),0.46(±0.5),0.03(±0.25),0.47(±0.5),0.47(±0.5),0.47(±0.5)
Multiple k-mer De Bruijn graph,21.51(±55.64),72550.4(±116246.96),72119.09(±116500.93),61.1(±694.53),1639364.8(±2158330.22),0.5(±0.49),57.14(±174.1),0.12(±0.25),0.59(±0.49),0.54(±0.48),1.34(±23.59),0.59(±0.49),0.6(±0.54),0.59(±0.49)


### Reference stats per assembler

In [21]:
ref_stats_per_assembler = pd.DataFrame()
for assembler in reference_pipeline_metrics_df.assembler.unique():
    row={'Assembler': assembler, 
         'Type':reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].type.unique()[0],
         'Algorithm': reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        std = lala.loc['std',column]
        row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
    ref_stats_per_assembler=ref_stats_per_assembler.append(row, ignore_index=True)
ref_stats_per_assembler = ref_stats_per_assembler.set_index('Assembler')
ref_stats_per_assembler.to_csv("Tables/Results/Reference metrics per assembler.csv")

In [22]:
ref_stats_per_assembler

Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,44.26(±217.6),5388.68(±13681.22),5086.6(±13454.63),0.0(±0.0),Genomic,840447.15(±1650142.0),0.23(±0.38),188.96(±421.77),0.02(±0.12),0.33(±0.47),0.33(±0.47),0.0(±0.0),0.33(±0.47),0.33(±0.47),0.33(±0.47)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,26.53(±60.32),75511.67(±112427.0),70925.78(±103438.3),0.0(±0.0),Metagenomic,1828000.69(±2238597.0),0.56(±0.49),47.59(±88.73),0.14(±0.27),0.62(±0.49),0.61(±0.48),0.17(±0.53),0.63(±0.49),0.63(±0.49),0.62(±0.49)
IDBA-UD,Multiple k-mer De Bruijn graph,43.74(±74.57),52024.85(±91970.26),51105.3(±91977.8),0.0(±0.0),Metagenomic,1821846.85(±2232054.57),0.54(±0.48),74.9(±120.66),0.09(±0.2),0.66(±0.47),0.57(±0.46),0.64(±1.62),0.67(±0.47),0.74(±0.75),0.65(±0.47)
MEGAHIT,Multiple k-mer De Bruijn graph,12.94(±22.37),105537.11(±143830.26),109204.12(±146844.18),0.0(±0.0),Metagenomic,1837438.15(±2249917.86),0.56(±0.49),31.08(±45.0),0.17(±0.3),0.68(±0.46),0.61(±0.45),0.57(±1.17),0.68(±0.47),0.69(±0.47),0.68(±0.47)
metaSPAdes,Multiple k-mer De Bruijn graph,13.56(±20.69),97829.43(±133394.7),96722.68(±132640.77),0.0(±0.0),Metagenomic,1835138.35(±2248100.93),0.56(±0.49),27.24(±38.26),0.16(±0.27),0.68(±0.47),0.65(±0.46),0.17(±0.5),0.69(±0.47),0.69(±0.48),0.68(±0.47)
MINIA,Single k-mer De Bruijn graph,74.08(±131.58),12362.1(±16990.35),11726.7(±16917.51),0.0(±0.0),Genomic,1722609.12(±2115127.99),0.48(±0.47),212.97(±338.05),0.04(±0.13),0.61(±0.49),0.59(±0.49),0.07(±0.35),0.61(±0.49),0.61(±0.49),0.61(±0.49)
SKESA,Multiple k-mer De Bruijn graph,22.76(±44.46),47300.06(±83407.74),46891.18(±83479.08),0.0(±0.0),Genomic,1436160.07(±1961548.25),0.47(±0.47),95.31(±280.69),0.12(±0.26),0.54(±0.5),0.48(±0.48),0.33(±1.5),0.54(±0.5),0.54(±0.5),0.54(±0.5)
SPAdes,Multiple k-mer De Bruijn graph,25.32(±98.93),83133.97(±116866.68),83465.24(±118060.11),0.0(±0.0),Genomic,1808988.64(±2234202.5),0.52(±0.48),72.15(±260.1),0.12(±0.21),0.7(±0.45),0.64(±0.45),8.28(±66.43),0.72(±0.46),0.73(±0.48),0.7(±0.45)
Unicycler,Multiple k-mer De Bruijn graph,11.68(±21.56),104906.65(±137452.28),101808.62(±136868.91),0.0(±0.0),Genomic,1742439.92(±2215832.77),0.54(±0.49),23.51(±35.61),0.16(±0.26),0.56(±0.5),0.54(±0.49),0.12(±0.53),0.56(±0.5),0.56(±0.5),0.55(±0.5)
VelvetOptimizer,Multiple k-mer De Bruijn graph,15.51(±41.88),14159.49(±47604.26),16829.79(±64115.87),488.78(±1916.28),Genomic,804905.71(±1645149.23),0.25(±0.42),85.38(±254.37),0.03(±0.13),0.26(±0.44),0.25(±0.42),0.47(±1.67),0.26(±0.44),0.26(±0.44),0.26(±0.44)
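
The per-assembler table above pairs each mean with its sample standard deviation. As a rough alternative to building the rows one by one, the same kind of summary can be sketched with `groupby`. The data frame below is a toy stand-in with invented values; only the column names `assembler`, `contigs` and `NA50` mirror the ones used in this notebook, and `format_mean_std` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for reference_pipeline_metrics_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "assembler": ["A", "A", "B", "B"],
    "contigs": [10, 20, 5, 5],
    "NA50": [1000.0, 3000.0, 500.0, 1500.0],
})


def format_mean_std(values):
    """Return 'mean(±std)' rounded to two decimals, using the sample standard deviation."""
    return "{}(±{})".format(round(values.mean(), 2), round(values.std(), 2))


# One row per assembler, one formatted cell per numeric metric.
summary = toy_df.groupby("assembler")[["contigs", "NA50"]].agg(format_mean_std)
print(summary)
```

This sketch only covers the numeric columns; descriptive columns such as the assembler type and algorithm would still have to be joined in separately.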


### Reference stats per reference replicon

In [23]:
reference_pipeline_metrics_df.reference.unique()

array(['Bacillus_subtilis', 'Enterococcus_faecalis',
       'Escherichia_coli_plasmid', 'Escherichia_coli',
       'Lactobacillus_fermentum', 'Listeria_monocytogenes',
       'Pseudomonas_aeruginosa', 'Salmonella_enterica',
       'Staphylococcus_aureus', 'Staphylococcus_aureus_plasmid1',
       'Staphylococcus_aureus_plasmid2', 'Staphylococcus_aureus_plasmid3'],
      dtype=object)

In [24]:
# Build one "mean(±std)" summary table per reference replicon.
for reference in reference_pipeline_metrics_df.reference.unique():
    rows = []
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        assembler_df = reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler]
        row = {'Assembler': assembler,
               'Type': assembler_df.type.unique()[0],
               'Algorithm': assembler_df.algorythm.unique()[0]}  # 'algorythm' is the column name as spelled in the data frame
        # Restrict the summary to this assembler's results for the current replicon.
        summary = assembler_df[assembler_df.reference == reference].describe()
        for column in summary.columns:
            mean = summary.loc['mean', column]
            std = summary.loc['std', column]
            row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
        rows.append(row)
    # DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first.
    # Columns are sorted alphabetically to keep the layout of the table output.
    ref_stats_per_reference_per_assembler = pd.DataFrame(rows).set_index('Assembler').sort_index(axis=1)
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(reference))
    print(reference)
    display(ref_stats_per_reference_per_assembler)

Bacillus_subtilis


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,32.5(±51.63),17016.5(±25473.66),16507.83(±25721.82),0.0(±0.0),Genomic,1913231.5(±1818680.2),0.47(±0.45),382.67(±535.65),0.01(±0.01),0.83(±0.39),0.83(±0.39),0.0(±0.0),0.83(±0.39),0.83(±0.39),0.83(±0.39)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,115.25(±166.23),149601.08(±122100.99),147406.25(±120104.85),0.0(±0.0),Metagenomic,3960376.67(±37546.65),0.98(±0.01),165.42(±215.2),0.08(±0.06),1.0(±0.0),0.99(±0.01),0.58(±0.79),1.0(±0.0),1.0(±0.0),1.0(±0.0)
IDBA-UD,Multiple k-mer De Bruijn graph,92.17(±98.39),111161.67(±94220.24),110292.0(±93518.39),0.0(±0.0),Metagenomic,3968675.5(±15538.85),0.98(±0.0),141.5(±139.84),0.05(±0.03),1.0(±0.0),0.98(±0.01),0.83(±1.11),1.0(±0.0),1.0(±0.0),1.0(±0.0)
MEGAHIT,Multiple k-mer De Bruijn graph,38.67(±51.04),205803.83(±101816.32),206013.67(±101682.93),0.0(±0.0),Metagenomic,3983434.17(±6834.76),0.98(±0.0),65.33(±74.49),0.11(±0.06),1.0(±0.01),0.87(±0.21),0.5(±0.8),1.0(±0.0),1.0(±0.0),1.0(±0.0)
metaSPAdes,Multiple k-mer De Bruijn graph,32.83(±22.3),173649.83(±111545.82),170909.5(±109144.16),0.0(±0.0),Metagenomic,3977137.17(±5153.73),0.98(±0.0),60.0(±37.85),0.13(±0.1),1.0(±0.0),0.98(±0.01),0.67(±0.78),1.0(±0.0),1.0(±0.0),1.0(±0.0)
MINIA,Single k-mer De Bruijn graph,193.33(±216.14),40222.25(±22140.45),39606.08(±22072.87),0.0(±0.0),Genomic,3928355.33(±49914.34),0.97(±0.01),278.67(±242.5),0.02(±0.01),1.0(±0.0),1.0(±0.01),0.67(±0.98),1.0(±0.01),1.0(±0.01),1.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,9.33(±10.07),117416.17(±124370.34),116361.33(±122979.89),0.0(±0.0),Genomic,1984359.17(±2072596.78),0.49(±0.51),16.33(±17.45),0.05(±0.05),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,27.17(±23.3),214387.33(±102274.23),222639.33(±109226.61),0.0(±0.0),Genomic,3974502.67(±6095.09),0.98(±0.0),49.33(±34.63),0.13(±0.06),1.0(±0.0),0.99(±0.01),0.33(±0.78),1.0(±0.0),1.0(±0.0),1.0(±0.0)
Unicycler,Multiple k-mer De Bruijn graph,28.83(±52.18),180230.33(±141952.8),162359.67(±156178.13),0.0(±0.0),Genomic,2965228.33(±1564926.8),0.73(±0.39),43.5(±62.9),0.11(±0.08),0.83(±0.39),0.83(±0.39),0.0(±0.0),0.83(±0.39),0.83(±0.39),0.83(±0.39)
VelvetOptimizer,Multiple k-mer De Bruijn graph,20.5(±30.29),22185.33(±32832.41),22034.67(±32594.5),0.0(±0.0),Genomic,1317971.0(±1946772.19),0.33(±0.48),37.17(±54.92),0.02(±0.03),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Enterococcus_faecalis


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,17.5(±40.88),5311.67(±11755.93),5066.58(±11833.0),0.0(±0.0),Genomic,651058.92(±1075258.97),0.23(±0.38),160.42(±303.97),0.01(±0.02),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,9.17(±10.13),83020.17(±89046.92),84170.5(±90589.23),0.0(±0.0),Metagenomic,1400651.25(±1462979.09),0.49(±0.51),19.25(±20.61),0.08(±0.09),0.5(±0.52),0.49(±0.52),0.33(±0.49),0.5(±0.52),0.5(±0.52),0.5(±0.52)
IDBA-UD,Multiple k-mer De Bruijn graph,32.67(±54.62),41193.67(±54299.03),41736.33(±55119.29),0.0(±0.0),Metagenomic,1393959.67(±1455984.8),0.49(±0.51),56.17(±85.33),0.05(±0.06),0.5(±0.52),0.46(±0.49),0.5(±0.8),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MEGAHIT,Multiple k-mer De Bruijn graph,7.0(±7.48),117668.83(±122901.09),118207.17(±123468.38),0.0(±0.0),Metagenomic,1407900.08(±1470537.2),0.49(±0.52),15.92(±16.69),0.11(±0.12),0.5(±0.52),0.48(±0.5),0.33(±0.49),0.5(±0.52),0.5(±0.52),0.5(±0.52)
metaSPAdes,Multiple k-mer De Bruijn graph,7.5(±9.43),103369.67(±112974.61),103369.67(±112974.61),0.0(±0.0),Metagenomic,1396282.67(±1458429.71),0.49(±0.51),15.17(±18.64),0.08(±0.1),0.5(±0.52),0.49(±0.51),0.17(±0.39),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MINIA,Single k-mer De Bruijn graph,59.25(±63.72),15743.5(±12692.73),13404.83(±14499.77),0.0(±0.0),Genomic,1383381.67(±1432831.1),0.49(±0.5),93.33(±96.61),0.02(±0.02),0.83(±0.39),0.83(±0.39),0.0(±0.0),0.83(±0.39),0.83(±0.39),0.83(±0.39)
SKESA,Multiple k-mer De Bruijn graph,15.83(±17.61),49461.0(±52706.44),47018.5(±52133.86),0.0(±0.0),Genomic,1402020.17(±1458916.42),0.49(±0.51),29.83(±29.71),0.06(±0.06),0.83(±0.39),0.69(±0.4),0.33(±0.49),0.83(±0.39),0.84(±0.39),0.83(±0.39)
SPAdes,Multiple k-mer De Bruijn graph,8.0(±9.63),97227.67(±110100.19),98739.33(±111126.93),0.0(±0.0),Genomic,1412023.67(±1464709.19),0.5(±0.51),21.67(±21.59),0.1(±0.1),0.79(±0.38),0.71(±0.44),1.67(±2.74),0.87(±0.41),0.96(±0.53),0.78(±0.38)
Unicycler,Multiple k-mer De Bruijn graph,4.83(±5.2),141763.17(±155926.65),133592.17(±151708.82),0.0(±0.0),Genomic,1396688.5(±1458835.03),0.49(±0.51),11.5(±12.3),0.13(±0.14),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,34.17(±50.48),10149.33(±14991.56),10133.5(±14968.2),0.0(±0.0),Genomic,911895.5(±1346958.97),0.32(±0.47),53.17(±78.55),0.02(±0.02),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Escherichia_coli_plasmid


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.5(±1.17),4473.0(±10446.68),4473.0(±10446.68),0.0(±0.0),Genomic,17516.67(±40910.12),0.16(±0.37),0.83(±1.95),0.06(±0.13),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,1.17(±1.27),18771.83(±21683.73),18771.83(±21683.73),0.0(±0.0),Metagenomic,54458.0(±56881.01),0.5(±0.52),1.75(±1.86),0.23(±0.24),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
IDBA-UD,Multiple k-mer De Bruijn graph,1.67(±1.78),14454.33(±14717.48),14269.83(±14905.13),0.0(±0.0),Metagenomic,54250.67(±56279.71),0.49(±0.51),2.67(±2.67),0.17(±0.18),0.66(±0.49),0.66(±0.49),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
MEGAHIT,Multiple k-mer De Bruijn graph,0.17(±0.39),46160.5(±51369.14),45910.0(±51609.48),0.0(±0.0),Metagenomic,55086.33(±57016.76),0.5(±0.52),0.92(±0.9),0.42(±0.47),0.66(±0.49),0.66(±0.49),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
metaSPAdes,Multiple k-mer De Bruijn graph,0.5(±0.52),32132.83(±33900.9),32132.83(±33900.9),0.0(±0.0),Metagenomic,54693.0(±57128.66),0.5(±0.52),1.0(±1.04),0.29(±0.31),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MINIA,Single k-mer De Bruijn graph,2.0(±2.34),11414.5(±12636.0),11414.5(±12636.0),0.0(±0.0),Genomic,52511.42(±54848.51),0.48(±0.5),3.08(±3.37),0.15(±0.16),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SKESA,Multiple k-mer De Bruijn graph,0.33(±0.78),41292.5(±51587.61),41292.5(±51587.61),0.0(±0.0),Genomic,54529.83(±56958.31),0.5(±0.52),0.83(±1.11),0.41(±0.47),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,1.0(±1.21),21183.33(±25801.18),20931.67(±26016.32),0.0(±0.0),Genomic,54636.83(±56545.69),0.5(±0.51),1.83(±1.75),0.25(±0.27),0.66(±0.49),0.66(±0.49),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
Unicycler,Multiple k-mer De Bruijn graph,0.83(±1.11),25299.83(±29220.18),24956.0(±28847.65),0.0(±0.0),Genomic,54046.83(±56453.02),0.49(±0.51),1.33(±1.56),0.25(±0.27),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,1.0(±1.48),8984.33(±13270.92),8984.33(±13270.92),0.0(±0.0),Genomic,35092.0(±51834.34),0.32(±0.47),1.5(±2.24),0.12(±0.17),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Escherichia_coli


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),1069.17(±1751.88),741.67(±1243.32),0.0(±0.0),Genomic,1098954.5(±1647814.92),0.23(±0.35),438.5(±658.35),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,50.67(±53.18),29829.67(±30833.33),28981.83(±30420.26),0.0(±0.0),Metagenomic,2313731.67(±2415114.11),0.49(±0.51),94.58(±97.59),0.01(±0.01),0.66(±0.49),0.57(±0.46),0.17(±0.39),0.67(±0.49),0.67(±0.5),0.66(±0.49)
IDBA-UD,Multiple k-mer De Bruijn graph,123.42(±130.12),15857.33(±11471.13),10091.67(±10614.96),0.0(±0.0),Metagenomic,2274553.25(±2369552.74),0.48(±0.5),202.25(±205.78),0.01(±0.01),0.95(±0.07),0.44(±0.41),2.83(±2.37),1.0(±0.0),1.8(±1.79),0.84(±0.31)
MEGAHIT,Multiple k-mer De Bruijn graph,24.5(±25.94),42178.67(±43128.04),47202.5(±50787.41),0.0(±0.0),Metagenomic,2330881.5(±2424930.54),0.49(±0.51),70.75(±66.26),0.03(±0.03),0.99(±0.01),0.78(±0.37),1.83(±2.12),1.0(±0.01),1.02(±0.02),0.99(±0.02)
metaSPAdes,Multiple k-mer De Bruijn graph,26.67(±33.8),52767.0(±57732.18),51810.83(±58746.82),0.0(±0.0),Metagenomic,2331363.67(±2427849.41),0.49(±0.51),59.0(±62.64),0.04(±0.05),0.98(±0.03),0.9(±0.18),0.83(±1.11),1.0(±0.0),1.02(±0.03),0.98(±0.03)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),2224.5(±2323.51),1579.33(±1650.13),0.0(±0.0),Genomic,1881469.42(±1965150.53),0.39(±0.41),565.08(±590.26),0.01(±0.01),0.5(±0.52),0.49(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SKESA,Multiple k-mer De Bruijn graph,62.5(±69.02),22329.33(±24731.84),22091.5(±24654.37),0.0(±0.0),Genomic,2281684.5(±2383170.32),0.48(±0.5),113.17(±121.89),0.01(±0.01),0.5(±0.52),0.19(±0.37),0.33(±0.49),0.5(±0.52),0.52(±0.54),0.48(±0.51)
SPAdes,Multiple k-mer De Bruijn graph,29.67(±33.14),43076.33(±44941.99),41704.67(±44965.26),0.0(±0.0),Genomic,2319842.17(±2406797.45),0.49(±0.51),62.17(±53.74),0.04(±0.04),0.97(±0.05),0.77(±0.31),1.17(±1.11),1.0(±0.0),1.03(±0.07),0.98(±0.06)
Unicycler,Multiple k-mer De Bruijn graph,20.83(±23.26),57316.5(±60896.05),57285.67(±61099.9),0.0(±0.0),Genomic,2312584.67(±2415461.01),0.49(±0.51),45.83(±49.13),0.05(±0.05),0.5(±0.52),0.43(±0.47),0.67(±1.15),0.5(±0.52),0.5(±0.53),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),1614.83(±2385.32),1368.17(±2021.42),1679.67(±2481.3),Genomic,1357997.17(±2005942.66),0.28(±0.42),375.67(±554.93),0.0(±0.01),0.33(±0.49),0.28(±0.41),2.17(±3.21),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Lactobacillus_fermentum


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,34.67(±80.96),2276.17(±4211.52),1705.67(±3983.58),0.0(±0.0),Genomic,317681.17(±670990.97),0.17(±0.35),60.17(±94.49),0.01(±0.01),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,27.0(±28.21),20600.83(±20729.94),19628.5(±20520.71),0.0(±0.0),Metagenomic,898812.75(±937432.93),0.47(±0.49),43.83(±45.14),0.03(±0.03),0.67(±0.49),0.66(±0.49),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
IDBA-UD,Multiple k-mer De Bruijn graph,40.5(±49.71),16478.5(±18444.33),15759.67(±18444.43),0.0(±0.0),Metagenomic,893165.42(±931536.72),0.47(±0.49),57.83(±66.49),0.03(±0.03),0.67(±0.49),0.67(±0.49),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
MEGAHIT,Multiple k-mer De Bruijn graph,21.5(±23.06),22554.0(±22845.25),21806.0(±22940.55),0.0(±0.0),Metagenomic,905074.75(±943865.02),0.48(±0.5),42.0(±43.29),0.03(±0.03),0.67(±0.49),0.63(±0.47),1.33(±1.44),0.67(±0.49),0.67(±0.49),0.67(±0.49)
metaSPAdes,Multiple k-mer De Bruijn graph,27.0(±28.78),21254.67(±21457.03),19114.17(±20100.39),0.0(±0.0),Metagenomic,896610.17(±933661.73),0.47(±0.49),43.5(±44.07),0.03(±0.03),0.83(±0.39),0.82(±0.38),0.17(±0.39),0.83(±0.39),0.84(±0.39),0.83(±0.39)
MINIA,Single k-mer De Bruijn graph,112.42(±119.05),5502.33(±5297.92),4980.17(±5281.41),0.0(±0.0),Genomic,874011.83(±911955.09),0.46(±0.48),126.25(±132.08),0.01(±0.02),0.67(±0.49),0.66(±0.48),0.0(±0.0),0.67(±0.49),0.67(±0.49),0.67(±0.49)
SKESA,Multiple k-mer De Bruijn graph,32.17(±33.88),18663.17(±19562.2),17273.83(±18269.01),0.0(±0.0),Genomic,884407.0(±923752.3),0.46(±0.48),44.83(±46.98),0.03(±0.03),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,25.67(±27.03),22543.17(±22095.77),21440.33(±22590.33),0.0(±0.0),Genomic,896103.67(±933069.36),0.47(±0.49),44.33(±45.26),0.03(±0.03),0.98(±0.03),0.98(±0.03),0.83(±1.95),1.0(±0.0),1.01(±0.01),0.99(±0.01)
Unicycler,Multiple k-mer De Bruijn graph,26.33(±27.65),21890.17(±22950.16),21005.33(±22036.18),0.0(±0.0),Genomic,887187.0(±926638.69),0.47(±0.49),39.83(±41.63),0.03(±0.03),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,67.5(±99.74),3881.33(±5733.73),3681.33(±5437.89),0.0(±0.0),Genomic,583673.5(±862142.94),0.31(±0.45),77.17(±114.02),0.01(±0.02),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Listeria_monocytogenes


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,20.42(±31.06),18785.0(±27736.56),17807.67(±26866.29),0.0(±0.0),Genomic,1198544.25(±1361745.87),0.4(±0.46),180.92(±326.85),0.02(±0.03),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,10.83(±0.94),351932.67(±59629.94),302389.5(±78025.69),0.0(±0.0),Metagenomic,2945179.25(±16640.4),0.98(±0.01),21.08(±2.97),0.2(±0.01),1.0(±0.0),0.99(±0.01),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
IDBA-UD,Multiple k-mer De Bruijn graph,27.33(±37.72),269599.5(±140387.79),269315.5(±140923.15),0.0(±0.0),Metagenomic,2947515.5(±15104.41),0.99(±0.01),47.33(±57.82),0.16(±0.06),1.0(±0.0),0.99(±0.01),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
MEGAHIT,Multiple k-mer De Bruijn graph,7.67(±0.98),449821.33(±95277.74),450683.67(±93501.07),0.0(±0.0),Metagenomic,2945895.08(±19057.69),0.98(±0.01),19.08(±3.29),0.25(±0.1),0.99(±0.03),0.82(±0.32),0.33(±0.49),1.01(±0.02),1.01(±0.02),1.0(±0.0)
metaSPAdes,Multiple k-mer De Bruijn graph,9.0(±2.89),372265.0(±126651.34),372265.0(±126651.34),0.0(±0.0),Metagenomic,2953982.17(±14428.14),0.99(±0.0),18.33(±6.34),0.22(±0.07),1.0(±0.0),0.99(±0.0),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
MINIA,Single k-mer De Bruijn graph,130.42(±160.69),32615.25(±23203.96),31617.83(±22863.1),0.0(±0.0),Genomic,2830630.5(±134819.59),0.95(±0.05),339.0(±323.81),0.03(±0.02),1.0(±0.0),0.83(±0.38),0.17(±0.39),1.0(±0.0),1.0(±0.01),1.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,47.17(±86.5),194836.17(±148932.43),194654.83(±149181.14),0.0(±0.0),Genomic,2800686.83(±254111.92),0.94(±0.08),236.83(±374.7),0.13(±0.09),1.0(±0.0),1.0(±0.0),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
SPAdes,Multiple k-mer De Bruijn graph,146.83(±322.89),217159.83(±184259.18),217226.5(±184174.84),0.0(±0.0),Genomic,2663180.83(±416898.87),0.89(±0.14),534.5(±778.37),0.13(±0.1),1.0(±0.0),0.91(±0.19),94.33(±220.32),1.07(±0.15),1.07(±0.15),1.0(±0.0)
Unicycler,Multiple k-mer De Bruijn graph,13.17(±11.65),329234.17(±118689.87),329234.17(±118689.87),0.0(±0.0),Genomic,2944906.5(±14176.66),0.98(±0.0),25.33(±20.48),0.2(±0.03),1.0(±0.0),0.99(±0.01),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
VelvetOptimizer,Multiple k-mer De Bruijn graph,22.5(±33.25),16602.83(±24583.97),15930.5(±23620.45),0.0(±0.0),Genomic,972774.33(±1436882.94),0.33(±0.48),39.5(±58.36),0.02(±0.03),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Pseudomonas_aeruginosa


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,406.83(±665.77),8346.0(±10863.52),8015.67(±10517.7),0.0(±0.0),Genomic,3187529.5(±3332377.59),0.47(±0.49),469.17(±667.83),0.01(±0.01),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,47.0(±13.8),173809.83(±51047.17),172485.0(±52333.37),0.0(±0.0),Metagenomic,6694504.42(±29451.89),0.99(±0.0),112.67(±19.26),0.06(±0.01),1.0(±0.0),0.99(±0.03),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
IDBA-UD,Multiple k-mer De Bruijn graph,65.25(±15.14),117962.67(±32563.68),115056.17(±31878.32),0.0(±0.0),Metagenomic,6699905.75(±4414.15),0.99(±0.0),146.5(±24.85),0.05(±0.01),1.0(±0.0),0.95(±0.03),0.67(±0.78),1.0(±0.0),1.0(±0.0),1.0(±0.0)
MEGAHIT,Multiple k-mer De Bruijn graph,28.33(±2.31),249058.0(±22127.07),250514.0(±24871.89),0.0(±0.0),Metagenomic,6733274.08(±3931.88),0.99(±0.0),79.67(±2.81),0.07(±0.0),1.0(±0.0),0.97(±0.04),0.17(±0.39),1.0(±0.0),1.0(±0.0),1.0(±0.0)
metaSPAdes,Multiple k-mer De Bruijn graph,29.0(±7.77),233006.33(±48648.87),227289.33(±43495.58),0.0(±0.0),Metagenomic,6727853.5(±3715.14),0.99(±0.0),66.0(±16.46),0.11(±0.03),1.0(±0.0),0.98(±0.01),0.0(±0.0),1.0(±0.01),1.0(±0.01),1.0(±0.0)
MINIA,Single k-mer De Bruijn graph,330.83(±40.76),22426.83(±2003.94),21766.17(±2116.41),0.0(±0.0),Genomic,6510048.42(±14555.32),0.96(±0.0),511.0(±47.51),0.02(±0.01),1.0(±0.0),0.99(±0.01),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,25.5(±27.92),70226.5(±74403.23),71537.67(±78367.82),0.0(±0.0),Genomic,4195442.67(±2806139.97),0.62(±0.41),560.67(±731.79),0.03(±0.03),0.83(±0.39),0.77(±0.36),3.0(±4.43),0.83(±0.39),0.83(±0.39),0.83(±0.39)
SPAdes,Multiple k-mer De Bruijn graph,39.17(±13.8),201858.33(±60787.13),198958.83(±58264.4),0.0(±0.0),Genomic,6710119.33(±15195.06),0.99(±0.0),86.17(±25.75),0.09(±0.04),1.0(±0.0),0.98(±0.02),0.17(±0.39),1.0(±0.0),1.0(±0.0),1.0(±0.0)
Unicycler,Multiple k-mer De Bruijn graph,26.67(±3.8),262392.33(±56688.27),262392.33(±56688.27),0.0(±0.0),Genomic,6683616.17(±63040.19),0.98(±0.01),69.0(±9.4),0.11(±0.03),1.0(±0.0),0.97(±0.05),0.5(±1.17),1.0(±0.0),1.0(±0.0),1.0(±0.0)
VelvetOptimizer,Multiple k-mer De Bruijn graph,5.17(±7.64),93246.0(±138021.07),127325.67(±189859.82),3093.67(±5214.96),Genomic,2235285.83(±3301739.97),0.33(±0.49),24.67(±36.84),0.03(±0.05),0.33(±0.49),0.3(±0.45),2.33(±3.65),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Salmonella_enterica


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),1158.83(±1945.24),804.33(±1372.03),0.0(±0.0),Genomic,1112956.58(±1669062.03),0.23(±0.35),437.5(±661.39),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,40.67(±42.77),31951.17(±32820.08),31071.33(±32883.36),0.0(±0.0),Metagenomic,2326651.67(±2428224.81),0.49(±0.51),84.0(±86.42),0.03(±0.03),0.83(±0.39),0.76(±0.39),0.67(±1.15),0.83(±0.39),0.84(±0.39),0.83(±0.39)
IDBA-UD,Multiple k-mer De Bruijn graph,108.92(±115.95),12896.33(±12411.72),12067.75(±12819.8),0.0(±0.0),Metagenomic,2295911.58(±2392852.98),0.48(±0.5),189.58(±197.3),0.01(±0.01),0.97(±0.04),0.6(±0.36),2.83(±3.49),1.0(±0.0),1.03(±0.03),0.98(±0.03)
MEGAHIT,Multiple k-mer De Bruijn graph,20.33(±21.75),55259.0(±57274.72),58692.83(±63264.03),0.0(±0.0),Metagenomic,2342960.08(±2436179.16),0.49(±0.51),56.67(±51.7),0.05(±0.05),0.99(±0.01),0.86(±0.19),1.0(±1.48),1.0(±0.01),1.02(±0.01),0.98(±0.02)
metaSPAdes,Multiple k-mer De Bruijn graph,21.33(±28.64),72896.5(±84848.7),71169.5(±84212.5),0.0(±0.0),Metagenomic,2340048.0(±2431405.11),0.49(±0.51),47.5(±50.45),0.04(±0.05),0.98(±0.03),0.82(±0.35),0.17(±0.39),1.0(±0.0),1.02(±0.06),0.98(±0.06)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),3137.17(±1905.63),1742.83(±1821.61),0.0(±0.0),Genomic,1909982.0(±1993558.04),0.4(±0.42),545.92(±569.2),0.01(±0.01),1.0(±0.0),0.96(±0.08),0.0(±0.0),1.0(±0.0),1.0(±0.0),1.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,59.0(±64.84),23731.83(±25590.8),23076.5(±24728.76),0.0(±0.0),Genomic,2300371.83(±2402686.59),0.48(±0.5),104.5(±112.93),0.02(±0.02),0.5(±0.52),0.31(±0.4),0.33(±0.49),0.5(±0.52),0.5(±0.53),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,17.83(±20.42),68132.5(±73377.61),67903.0(±74527.17),0.0(±0.0),Genomic,2338166.67(±2422256.79),0.49(±0.51),51.0(±42.3),0.05(±0.05),0.97(±0.04),0.72(±0.35),0.67(±0.78),1.0(±0.0),1.04(±0.06),0.97(±0.05)
Unicycler,Multiple k-mer De Bruijn graph,13.67(±15.08),96287.17(±102858.85),96287.17(±102858.85),0.0(±0.0),Genomic,2324824.0(±2428210.58),0.49(±0.51),33.0(±36.27),0.05(±0.05),0.5(±0.52),0.4(±0.47),0.17(±0.39),0.5(±0.52),0.51(±0.53),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),1825.17(±2696.5),1487.83(±2197.9),1092.0(±1641.04),Genomic,1375574.83(±2031862.14),0.29(±0.43),360.0(±531.81),0.0(±0.01),0.33(±0.49),0.26(±0.4),1.17(±1.75),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Staphylococcus_aureus


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,18.67(±43.6),5171.83(±11444.51),4860.83(±11352.46),0.0(±0.0),Genomic,586836.67(±1003940.87),0.22(±0.37),137.17(±247.77),0.01(±0.01),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,16.67(±17.72),42504.0(±45834.0),42085.83(±47098.12),0.0(±0.0),Metagenomic,1337608.58(±1397088.76),0.49(±0.51),27.67(±29.37),0.05(±0.05),0.5(±0.52),0.5(±0.52),0.33(±0.78),0.5(±0.52),0.5(±0.52),0.5(±0.52)
IDBA-UD,Multiple k-mer De Bruijn graph,32.83(±41.97),22172.17(±24955.53),22152.67(±24951.85),0.0(±0.0),Metagenomic,1331546.17(±1390758.02),0.49(±0.51),54.17(±66.16),0.03(±0.03),0.5(±0.52),0.42(±0.46),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MEGAHIT,Multiple k-mer De Bruijn graph,7.17(±8.08),73781.5(±87016.61),107260.0(±136205.58),0.0(±0.0),Metagenomic,1340715.33(±1400331.53),0.49(±0.52),21.83(±23.01),0.06(±0.07),0.5(±0.52),0.42(±0.46),1.33(±1.67),0.5(±0.52),0.5(±0.52),0.5(±0.52)
metaSPAdes,Multiple k-mer De Bruijn graph,8.67(±11.36),108771.83(±150051.03),108771.83(±150051.03),0.0(±0.0),Metagenomic,1339675.67(±1399248.63),0.49(±0.51),15.33(±18.86),0.1(±0.14),0.5(±0.52),0.49(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MINIA,Single k-mer De Bruijn graph,60.67(±64.11),13787.83(±14562.52),13552.67(±14317.83),0.0(±0.0),Genomic,1299647.92(±1357438.92),0.48(±0.5),92.92(±97.68),0.02(±0.02),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SKESA,Multiple k-mer De Bruijn graph,21.33(±22.32),26558.67(±27739.65),26302.17(±27476.61),0.0(±0.0),Genomic,1326801.67(±1385802.73),0.49(±0.51),35.5(±37.22),0.03(±0.03),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,8.5(±11.28),109600.33(±147100.36),109600.33(±147100.36),0.0(±0.0),Genomic,1336849.0(±1396295.96),0.49(±0.51),14.33(±18.7),0.09(±0.09),0.5(±0.52),0.5(±0.52),0.17(±0.39),0.5(±0.52),0.5(±0.52),0.5(±0.52)
Unicycler,Multiple k-mer De Bruijn graph,5.0(±5.91),140460.5(±158318.33),130585.33(±154313.21),0.0(±0.0),Genomic,1336196.33(±1395613.73),0.49(±0.51),12.0(±13.06),0.16(±0.19),0.5(±0.52),0.5(±0.52),0.17(±0.39),0.5(±0.52),0.5(±0.52),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,35.33(±52.2),10368.67(±15332.37),9955.5(±14705.84),0.0(±0.0),Genomic,867548.33(±1281453.9),0.32(±0.47),55.5(±81.99),0.01(±0.02),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)


Staphylococcus_aureus_plasmid1


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),1056.0(±2466.28),1056.0(±2466.28),0.0(±0.0),Genomic,1056.0(±2466.28),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),3195.92(±3338.67),3195.92(±3338.67),0.0(±0.0),Metagenomic,3165.83(±3306.61),0.5(±0.52),0.5(±0.52),0.5(±0.53),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.53),0.5(±0.53),0.5(±0.52)
IDBA-UD,Multiple k-mer De Bruijn graph,0.0(±0.0),1854.83(±2778.99),1854.83(±2778.99),0.0(±0.0),Metagenomic,1849.83(±2773.24),0.29(±0.44),0.33(±0.49),0.29(±0.44),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),3244.5(±3392.54),3244.5(±3392.54),0.0(±0.0),Metagenomic,3168.17(±3309.04),0.5(±0.52),0.5(±0.52),0.5(±0.52),0.49(±0.51),0.49(±0.51),0.0(±0.0),0.5(±0.52),0.51(±0.54),0.49(±0.51)
metaSPAdes,Multiple k-mer De Bruijn graph,0.17(±0.39),2818.17(±3032.84),2818.17(±3032.84),0.0(±0.0),Metagenomic,3146.0(±3286.21),0.5(±0.52),0.67(±0.78),0.44(±0.48),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),1271.0(±2416.47),1056.0(±2466.28),0.0(±0.0),Genomic,1271.0(±2416.47),0.2(±0.38),0.33(±0.49),0.2(±0.38),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),2287.5(±2829.83),2287.5(±2829.83),0.0(±0.0),Genomic,2819.33(±2972.63),0.44(±0.47),0.83(±1.11),0.36(±0.45),0.5(±0.52),0.49(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),2112.0(±3119.63),2112.0(±3119.63),0.0(±0.0),Genomic,2112.0(±3119.63),0.33(±0.49),0.33(±0.49),0.33(±0.49),0.33(±0.49),0.33(±0.49),0.0(±0.0),0.33(±0.49),0.33(±0.49),0.33(±0.49)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),3137.5(±3277.37),3137.5(±3277.37),0.0(±0.0),Genomic,3132.5(±3272.01),0.49(±0.52),0.5(±0.52),0.49(±0.52),0.5(±0.52),0.5(±0.52),0.0(±0.0),0.5(±0.52),0.5(±0.52),0.5(±0.52)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),1056.0(±2466.28),1056.0(±2466.28),0.0(±0.0),Genomic,1056.0(±2466.28),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)


Staphylococcus_aureus_plasmid2


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),397.0(±927.2),397.0(±927.2),0.0(±0.0),Metagenomic,369.33(±862.58),0.17(±0.39),0.17(±0.39),0.18(±0.42),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.18(±0.42),0.18(±0.42),0.17(±0.39)
IDBA-UD,Multiple k-mer De Bruijn graph,0.0(±0.0),385.83(±901.11),385.83(±901.11),0.0(±0.0),Metagenomic,369.33(±862.58),0.17(±0.39),0.17(±0.39),0.17(±0.41),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.41),0.17(±0.41),0.17(±0.39)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),392.83(±917.46),392.83(±917.46),0.0(±0.0),Metagenomic,369.33(±862.58),0.17(±0.39),0.17(±0.39),0.18(±0.41),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.18(±0.41),0.18(±0.41),0.17(±0.39)
metaSPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),378.5(±883.99),378.5(±883.99),0.0(±0.0),Metagenomic,369.33(±862.58),0.17(±0.39),0.17(±0.39),0.17(±0.4),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.4),0.17(±0.4),0.17(±0.39)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),334.17(±780.45),334.17(±780.45),0.0(±0.0),Genomic,334.17(±780.45),0.15(±0.35),0.17(±0.39),0.15(±0.35),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),326.83(±763.32),326.83(±763.32),0.0(±0.0),Genomic,326.83(±763.32),0.15(±0.34),0.17(±0.39),0.15(±0.34),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),369.33(±862.58),369.33(±862.58),0.0(±0.0),Genomic,369.33(±862.58),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)


Staphylococcus_aureus_plasmid3


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0(±0.0),525.83(±1228.08),525.83(±1228.08),0.0(±0.0),Metagenomic,498.83(±1165.02),0.17(±0.39),0.17(±0.39),0.18(±0.41),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.18(±0.41),0.18(±0.41),0.17(±0.39)
IDBA-UD,Multiple k-mer De Bruijn graph,0.17(±0.39),281.33(±657.05),281.33(±657.05),0.0(±0.0),Metagenomic,459.5(±1073.16),0.15(±0.36),0.33(±0.78),0.09(±0.22),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.4),0.17(±0.4),0.17(±0.39)
MEGAHIT,Multiple k-mer De Bruijn graph,0.0(±0.0),522.33(±1219.91),522.33(±1219.91),0.0(±0.0),Metagenomic,498.83(±1165.02),0.17(±0.39),0.17(±0.39),0.17(±0.41),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.41),0.17(±0.41),0.17(±0.39)
metaSPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),642.83(±1501.34),642.83(±1501.34),0.0(±0.0),Metagenomic,498.83(±1165.02),0.17(±0.39),0.17(±0.39),0.21(±0.5),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.21(±0.5),0.21(±0.5),0.17(±0.39)
MINIA,Single k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
SKESA,Multiple k-mer De Bruijn graph,0.0(±0.0),463.67(±1082.89),463.67(±1082.89),0.0(±0.0),Genomic,463.67(±1082.89),0.15(±0.36),0.17(±0.39),0.15(±0.36),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
SPAdes,Multiple k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
Unicycler,Multiple k-mer De Bruijn graph,0.0(±0.0),498.83(±1165.02),498.83(±1165.02),0.0(±0.0),Genomic,498.83(±1165.02),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.17(±0.39),0.0(±0.0),0.17(±0.39),0.17(±0.39),0.17(±0.39)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
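
Each metric cell in the tables above is stored as a `mean(±std)` string, so it has to be split back into numbers before any sorting or plotting. The following is a minimal sketch of one way to do that. The helper name `parse_mean_std` and the regular expression are illustrative assumptions, not part of the original notebook, and the sketch rejects anything that is not a plain decimal pair (for example `nan(±nan)`).

```python
import re

# Hypothetical pattern for cells such as "1056.0(±2466.28)"
CELL_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\(±(-?\d+(?:\.\d+)?)\)\s*$")


def parse_mean_std(cell):
    """Split a 'mean(±std)' string into a (mean, std) pair of floats."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError("not a mean(±std) cell: {!r}".format(cell))
    return float(match.group(1)), float(match.group(2))


# NA50 of BCALM2 for Staphylococcus_aureus_plasmid1, taken from the table above
mean, std = parse_mean_std("1056.0(±2466.28)")
print(mean, std)
```

Applied column by column, a helper like this turns a saved table back into numeric means and standard deviations.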


### Reference stats per distribution

In [25]:
# Summarise the reference metrics as mean(±std) per assembler, one table per distribution
for distribution in reference_pipeline_metrics_df.distribution.unique():
    rows = []
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        assembler_df = reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler]
        row = {'Assembler': assembler,
               'Type': assembler_df.type.unique()[0],
               'Algorithm': assembler_df.algorythm.unique()[0]}
        # describe() keeps only the numeric columns and reports their mean and std
        summary = assembler_df[assembler_df.distribution == distribution].describe()
        for column in summary.columns:
            mean = summary.loc['mean', column]
            std = summary.loc['std', column]
            row[column] = "{}(±{})".format(round(mean, 2), round(std, 2))
        rows.append(row)
    # DataFrame.append was removed in pandas 2.0, so the table is built from the list of rows;
    # sort_index(axis=1) keeps the columns in the alphabetical order of the tables below
    ref_stats_per_reference_per_assembler = pd.DataFrame(rows).set_index('Assembler').sort_index(axis=1)
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(distribution))
    print(distribution)
    display(ref_stats_per_reference_per_assembler)

Even


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,74.93(±300.64),7322.22(±14758.07),6909.82(±14651.4),0.0(±0.0),Genomic,1290698.65(±1827654.8),0.37(±0.42),342.06(±544.94),0.04(±0.17),0.5(±0.5),0.5(±0.5),0.0(±0.0),0.5(±0.5),0.5(±0.5),0.5(±0.5)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,29.53(±32.8),109767.99(±118390.68),100966.54(±103290.32),0.0(±0.0),Metagenomic,2524984.11(±2142615.69),0.87(±0.31),57.58(±62.96),0.25(±0.34),0.89(±0.32),0.86(±0.32),0.25(±0.64),0.89(±0.32),0.89(±0.32),0.89(±0.32)
IDBA-UD,Multiple k-mer De Bruijn graph,71.28(±84.72),56036.0(±73769.04),55774.38(±73629.02),0.0(±0.0),Metagenomic,2508176.47(±2128731.97),0.83(±0.34),120.25(±138.91),0.16(±0.27),0.86(±0.35),0.75(±0.39),1.0(±2.14),0.87(±0.35),0.87(±0.35),0.86(±0.35)
MEGAHIT,Multiple k-mer De Bruijn graph,17.47(±17.88),144475.11(±141019.39),151952.72(±144606.65),0.0(±0.0),Metagenomic,2533624.76(±2150186.44),0.88(±0.31),44.06(±44.47),0.3(±0.37),0.89(±0.32),0.79(±0.35),1.08(±1.47),0.89(±0.32),0.9(±0.32),0.89(±0.32)
metaSPAdes,Multiple k-mer De Bruijn graph,19.92(±22.9),134941.08(±125682.7),133991.47(±125319.33),0.0(±0.0),Metagenomic,2529572.39(±2148555.68),0.87(±0.31),38.94(±42.57),0.28(±0.34),0.89(±0.32),0.88(±0.31),0.17(±0.56),0.9(±0.32),0.9(±0.32),0.89(±0.32)
MINIA,Single k-mer De Bruijn graph,82.97(±110.03),18331.39(±18137.17),17666.15(±17998.82),0.0(±0.0),Genomic,2345476.71(±1982106.28),0.72(±0.38),306.47(±391.64),0.08(±0.18),0.81(±0.4),0.79(±0.39),0.0(±0.0),0.81(±0.4),0.81(±0.4),0.81(±0.4)
SKESA,Multiple k-mer De Bruijn graph,38.81(±44.56),84178.14(±88805.36),83571.72(±89085.63),0.0(±0.0),Genomic,2505379.86(±2127723.39),0.85(±0.31),69.67(±78.58),0.23(±0.34),0.89(±0.32),0.77(±0.38),0.33(±0.75),0.89(±0.32),0.89(±0.32),0.88(±0.32)
SPAdes,Multiple k-mer De Bruijn graph,19.72(±20.8),129438.44(±125555.2),129237.0(±125128.06),0.0(±0.0),Genomic,2524523.92(±2144608.71),0.81(±0.37),39.03(±39.85),0.21(±0.26),0.83(±0.38),0.8(±0.38),0.39(±1.01),0.83(±0.38),0.84(±0.38),0.83(±0.38)
Unicycler,Multiple k-mer De Bruijn graph,15.64(±17.3),161424.75(±141031.02),158207.14(±140318.23),0.0(±0.0),Genomic,2521523.86(±2142658.7),0.87(±0.31),32.81(±33.31),0.29(±0.32),0.89(±0.32),0.86(±0.33),0.17(±0.56),0.89(±0.32),0.89(±0.32),0.89(±0.32)
VelvetOptimizer,Multiple k-mer De Bruijn graph,31.03(±55.18),28318.97(±64479.6),33659.58(±87778.85),977.56(±2628.96),Genomic,1609811.42(±2033999.9),0.5(±0.48),170.75(±339.91),0.07(±0.18),0.53(±0.5),0.5(±0.48),0.94(±2.28),0.53(±0.5),0.53(±0.5),0.53(±0.5)


Log


Assembler,Algorithm,L90,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,contiguity,identity,lowest_identity,misassemblies,multiplicity,parsimony,validity
BCALM2,Single k-mer De Bruijn graph,13.58(±55.44),3455.14(±12312.8),3263.39(±11966.56),0.0(±0.0),Genomic,390195.64(±1316359.92),0.09(±0.27),35.86(±117.43),0.0(±0.01),0.17(±0.38),0.17(±0.37),0.0(±0.0),0.17(±0.38),0.17(±0.38),0.17(±0.38)
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,23.54(±78.96),41255.35(±95190.78),40885.01(±95133.88),0.0(±0.0),Metagenomic,1131017.26(±2125360.46),0.25(±0.43),37.6(±108.12),0.02(±0.06),0.36(±0.48),0.36(±0.48),0.1(±0.38),0.36(±0.48),0.36(±0.48),0.36(±0.48)
IDBA-UD,Multiple k-mer De Bruijn graph,16.21(±49.86),48013.69(±107525.28),46436.22(±107580.39),0.0(±0.0),Metagenomic,1135517.22(±2132472.67),0.25(±0.43),29.56(±76.51),0.02(±0.06),0.46(±0.49),0.4(±0.47),0.28(±0.65),0.47(±0.5),0.61(±0.99),0.44(±0.49)
MEGAHIT,Multiple k-mer De Bruijn graph,8.42(±25.44),66599.11(±136763.09),66455.53(±137159.35),0.0(±0.0),Metagenomic,1141251.53(±2142253.99),0.25(±0.43),18.11(±41.95),0.03(±0.08),0.47(±0.5),0.43(±0.47),0.06(±0.23),0.47(±0.5),0.48(±0.51),0.47(±0.5)
metaSPAdes,Multiple k-mer De Bruijn graph,7.19(±15.98),60717.78(±131337.07),59453.89(±130049.83),0.0(±0.0),Metagenomic,1140704.31(±2141201.79),0.25(±0.43),15.53(±29.28),0.03(±0.07),0.47(±0.5),0.43(±0.48),0.17(±0.44),0.47(±0.5),0.48(±0.51),0.47(±0.5)
MINIA,Single k-mer De Bruijn graph,65.18(±150.35),6392.81(±13423.27),5787.25(±13451.03),0.0(±0.0),Genomic,1099741.54(±2072424.11),0.24(±0.41),119.46(±243.01),0.01(±0.01),0.42(±0.5),0.39(±0.49),0.14(±0.48),0.42(±0.5),0.42(±0.5),0.42(±0.5)
SKESA,Multiple k-mer De Bruijn graph,6.72(±38.38),10421.97(±58025.85),10210.64(±58056.78),0.0(±0.0),Genomic,366940.28(±950611.99),0.1(±0.26),120.94(±388.81),0.01(±0.03),0.19(±0.4),0.19(±0.39),0.33(±1.99),0.19(±0.4),0.19(±0.4),0.19(±0.4)
SPAdes,Multiple k-mer De Bruijn graph,30.92(±138.63),36829.5(±85995.88),37693.47(±90368.89),0.0(±0.0),Genomic,1093453.36(±2101398.69),0.23(±0.41),105.28(±363.93),0.02(±0.05),0.56(±0.48),0.49(±0.47),16.17(±93.6),0.6(±0.52),0.63(±0.55),0.57(±0.48)
Unicycler,Multiple k-mer De Bruijn graph,7.72(±24.6),48388.56(±108092.74),45410.11(±107655.24),0.0(±0.0),Genomic,963355.97(±2016669.48),0.2(±0.39),14.22(±35.63),0.03(±0.06),0.22(±0.42),0.22(±0.41),0.08(±0.5),0.22(±0.42),0.22(±0.42),0.22(±0.42)
VelvetOptimizer,Multiple k-mer De Bruijn graph,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),Genomic,0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0),0.0(±0.0)
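
The per-distribution tables above are built row by row from `describe()`. As a rough sketch of the same idea, the `mean(±std)` summary can also be computed in one pass with `groupby`. The function name `summarise_mean_std` and the toy data are assumptions made for illustration; they are not the notebook's data or code.

```python
import pandas as pd


def summarise_mean_std(metrics_df, group_col, decimals=2):
    """Return one row per group, each numeric column formatted as 'mean(±std)'."""
    numeric = metrics_df.select_dtypes("number")
    grouped = numeric.groupby(metrics_df[group_col])
    means = grouped.mean().round(decimals)
    # Sample standard deviation (ddof=1), the same estimator describe() reports
    stds = grouped.std().round(decimals)
    return means.astype(str) + "(±" + stds.astype(str) + ")"


# Made-up metrics for two hypothetical assemblers, two samples each
toy_metrics = pd.DataFrame({
    "assembler": ["A", "A", "B", "B"],
    "contigs": [10, 14, 3, 5],
    "breadth_of_coverage": [0.5, 1.0, 0.0, 1.0],
})

summary = summarise_mean_std(toy_metrics, "assembler")
print(summary)
```

Non-numeric columns such as the assembler type or algorithm are dropped by `select_dtypes`, so they would need to be joined back separately, as the loop in the notebook does through the `row` dictionary.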
