# 1. Compute Results Overview Tables

The success of an assembly is evaluated by the computation of metrics in two defined ways: globally through statistics inherent to the complete set of sequences that were assembled, and relative to the replicons present in the sample. 

The computation of the global metrics is done through custom python code from the complete set of sequences assembled. 

The following metrics are computed for the complete and filtered set of assembled sequences, restricted to contigs of length above a specified minimum size: 

- **Contig sizes**
    - **Contigs:** The total number of contigs in the assembly;
    - **Basepairs:** The total number of bases in the assembly;
    - **Maximum sequence length:** The length of the largest contig in the assembly.
    - **Number of ‘N’s:** Number of uncalled bases (N's) 
- **Contiguity**
    - **Nx (where 0  < x  ⩽ 100):** Length for which the collection of all assembled sequences of that length or longer in an assembly covers at least a given percentage of the total length of the assembly
- **Misassembly**
    - **Misassemblies** - Number of aligned contigs that contain a misassembly event

After filtering, the sequences are mapped with the reference replicons and the metrics are computed through custom python code. The metrics are calculated, filtered for a minimum length, for each replicon in the file provided input references. 

- **Contig sizes**
    - **Contigs:** The total number of contigs in the assembly;
    - **Basepairs:** The total number of bases in the assembly;
    - **Number of ‘N’s:** Number of uncalled bases (N's) 
- **COMPASS**
    - **(Breadth of) Coverage:** Ratio of covered sequence on the reference by aligned contigs;
    - **Multiplicity:** Ratio of the length of alignable assembled sequence to covered sequence on the reference;
    - **Validity:** Ratio of the length of the alignable assembled sequence to total basepairs in the aligned contigs;
    - **Parsimony:** Cost of the assembly (multiplicity over validity);
- **Contiguity**
    - **Contiguity:** longest single alignment between the assembly and the reference, relative to the reference length;
    - **NAx (where 0  < x  ⩽ 100):** Length for which the collection of aligned assembled sequences of that length or longer in an assembly covers at least a given percentage of the total length of the reference replicon;
    - **NGx (where 0  < x  ⩽ 100):** Length for which the collection of aligned contigs of that length or longer covers at least a given percentage of the sequence of the reference.
    - **Lx (where 0  < x  ⩽ 100):** Minimal number of contigs that cover x % of the sequence of the reference;
- **Identity**
    - **Identity:** Ratio of identical basepairs in all aligned contigs to the reference;
    - **Lowest identity:** Identity of the lowest scoring contig to the reference.
    - **PLS**: Phred-like score per contig, per assembler.
- **Misassembly**
    - **Misassemblies** - Number of aligned contigs that contain a misassembly event


## Imports

In [92]:
import sys
from plotly.offline import plot
import glob
import fnmatch
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
import pandas as pd
from itertools import groupby
import csv
import numpy as np

## Global variables

In [93]:
ASSEMBLER_PROCESS_LIST = ["ABYSS", "BCALM2", "GATBMINIAPIPELINE", "METAHIPMER2", "MINIA", "MEGAHIT", "METASPADES", "UNICYCLER", "SPADES",
                          "SKESA", "VELVETOPTIMIZER", "IDBA"]
PROCESS_TO_NAME = {"ABYSS": "AbYSS",
                   "BCALM2": "BCALM2", 
                   "GATBMINIAPIPELINE": "GATBMiniaPipeline",
                   "METAHIPMER2": "MetaHipMer2",
                   "MINIA": "MINIA",
                   "MEGAHIT": "MEGAHIT", 
                   "METASPADES": "metaSPAdes", 
                   "UNICYCLER": "Unicycler", 
                   "SPADES": "SPAdes",
                    "SKESA": "SKESA",
                   "VELVETOPTIMIZER": "VelvetOptimizer",
                   "IDBA": "IDBA-UD"}

log_distributed = ['LHS', 'LNN', 'ERR2935805']
even_distribution = ['EMS', 'LNN', 'ERR2984773']

genomic_assemblers = ['AbYSS', 'BCALM2', 'MINIA', 'SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer']
metagenomic_assemblers = ['MetaHipMer2','GATBMiniaPipeline', 'IDBA-UD', 'MEGAHIT', 'metaSPAdes']
single_kmer = ['BCALM2', 'MINIA', 'AbYSS']
multiple_kmer = ['SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer', 'GATBMiniaPipeline', 
                 'IDBA-UD', 'MEGAHIT', 'metaSPAdes', 'MetaHipMer2']

## Global metrics

### Load data

In [97]:
report_glob = glob.glob('../Results/*/*/report/pipeline_report_tables.json')
global_pipeline_metrics_df = pd.DataFrame()

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))
    
    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)
        for sample in json_report.keys():
            for line in json_report[sample]['GlobalTable']:
                assembler = line['assembler']
                global_pipeline_metrics_df = global_pipeline_metrics_df.append({'run': stats_run,
                                                                                'sample': sample,
                                                                                'assembler': line['assembler'],
                                                                                'contigs': int(line['original']['contigs']),
                                                                                'basepairs': int(line['original']['basepairs']),
                                                                                'max_contig': int(line['original']['max_contig_size']),
                                                                                'n50': int(line['original']['N50']),
                                                                                'mapped_reads': line['original']['mapped_reads'],
                                                                                'Ns': int(line['original']['Ns']),
                                                                                'misassembled contigs': line['filtered']['misassembled_contigs'],
                                                                                'misassembly events': line['filtered']['misassembly_events'],
                                                                                'filtered_contigs': line['filtered']['contigs'],
                                                                                'filtered_basepairs': line['filtered']['basepairs'],
                                                                                'filtered_n50': line['filtered']['N50'],
                                                                                'filtered_Ns': line['filtered']['Ns'],
                                                                                'filtered_mapped_reads': line['filtered']['mapped_reads'],
                                                                               },
                                                                               ignore_index=True)
global_pipeline_metrics_df['distribution'] = np.where(global_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
global_pipeline_metrics_df['type'] = np.where(global_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
global_pipeline_metrics_df['algorythm'] = np.where(global_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
global_pipeline_metrics_df[['contigs','basepairs','max_contig', 'Ns','n50', 'filtered_n50','misassembled contigs', 'misassembly events']] = global_pipeline_metrics_df[['contigs','basepairs','max_contig', 'Ns','n50','filtered_n50','misassembled contigs', 'misassembly events']].apply(pd.to_numeric)

Processing pipeline_report_tables.json data from run2...
Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run1...


### Preview data

In [98]:
global_pipeline_metrics_df.dtypes

Ns                       float64
assembler                 object
basepairs                float64
contigs                  float64
filtered_Ns              float64
filtered_basepairs       float64
filtered_contigs         float64
filtered_mapped_reads    float64
filtered_n50             float64
mapped_reads             float64
max_contig               float64
misassembled contigs     float64
misassembly events       float64
n50                      float64
run                       object
sample                    object
distribution              object
type                      object
algorythm                 object
dtype: object

In [99]:
global_pipeline_metrics_df

Unnamed: 0,Ns,assembler,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50,run,sample,distribution,type,algorythm
0,147.0,ABySS,91720949.0,497783.0,5.0,30657287.0,1196.0,82.019612,52869.0,92.343671,298093.0,2.0,6.0,139.0,run2,ERR2984773,Even,Metagenomic,Multiple k-mer De Bruijn graph
1,0.0,BCALM2,170402348.0,3508016.0,0.0,0.0,0.0,0.000000,0.0,8.151058,994.0,0.0,0.0,49.0,run2,ERR2984773,Even,Genomic,Single k-mer De Bruijn graph
2,0.0,GATBMiniaPipeline,32809112.0,5397.0,0.0,30564302.0,761.0,82.840039,113162.0,84.068286,768462.0,1.0,2.0,102115.0,run2,ERR2984773,Even,Metagenomic,Multiple k-mer De Bruijn graph
3,0.0,IDBA-UD,39773020.0,20164.0,0.0,31562690.0,2654.0,81.596739,29134.0,84.336773,303329.0,4.0,7.0,20993.0,run2,ERR2984773,Even,Metagenomic,Multiple k-mer De Bruijn graph
4,0.0,MEGAHIT,41166407.0,20614.0,0.0,32021766.0,1447.0,84.138912,149841.0,86.166393,1205839.0,15.0,32.0,88308.0,run2,ERR2984773,Even,Metagenomic,Multiple k-mer De Bruijn graph
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
135,0.0,MINIA,15334799.0,24654.0,0.0,12920975.0,2171.0,81.993164,10582.0,91.155743,163543.0,0.0,0.0,7727.0,run1,LHS,Log,Genomic,Single k-mer De Bruijn graph
136,0.0,SKESA,2957060.0,36.0,0.0,2950393.0,21.0,92.094062,351027.0,92.347172,632013.0,0.0,0.0,351027.0,run1,LHS,Log,Genomic,Multiple k-mer De Bruijn graph
137,0.0,SPAdes,16214591.0,31040.0,0.0,13087777.0,1526.0,77.136681,69361.0,99.314996,263645.0,1.0,2.0,46478.0,run1,LHS,Log,Genomic,Multiple k-mer De Bruijn graph
138,0.0,Unicycler,13324149.0,264.0,0.0,13315624.0,243.0,96.651358,192888.0,97.201237,847572.0,5.0,10.0,192888.0,run1,LHS,Log,Genomic,Multiple k-mer De Bruijn graph


### Save data

In [100]:
global_pipeline_metrics_df.to_csv("Tables/Results/Global metrics.csv")

### Global statistics per assembler type

In [101]:
stats_per_assembler_type = pd.DataFrame()
for assembly_type in global_pipeline_metrics_df.type.unique():
    row={'Assembler Type': assembly_type}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.type == assembly_type].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    stats_per_assembler_type=stats_per_assembler_type.append(row, ignore_index=True)
stats_per_assembler_type = stats_per_assembler_type.set_index('Assembler Type')
stats_per_assembler_type.to_csv("Tables/Results/Global metrics per assembler type.csv")

In [102]:
stats_per_assembler_type

Unnamed: 0_level_0,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
Assembler Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Metagenomic,768.38 [0;8788],32228239.44 [13174826;91720949],61715.1 [640;677860],590.35 [0;8283],24113385.45 [13883;32021766],1228.86 [13;5038],85.24 [0;107],102483.75 [1039;239185],90.7 [1;297],613615.18 [1157;1205839],4.08 [0;18],7.84 [0;36],87849.68 [85;238611]
Genomic,0.0 [0;0],46130651.25 [2957060;364761201],470678.18 [36;8175890],0.0 [0;0],22632362.1 [0;32500826],1715.12 [0;6250],79.7 [0;99],95228.97 [0;351027],85.17 [1;107],486162.63 [994;1303875],11.55 [0;572],23.08 [0;1148],89248.1 [35;351027]


### Global stats per assembly algorithm

In [103]:
stats_per_assembler_algo = pd.DataFrame()
for assembly_algo in global_pipeline_metrics_df.algorythm.unique():
    row={'Assembler Algorithm': assembly_algo}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    stats_per_assembler_algo=stats_per_assembler_algo.append(row, ignore_index=True)
stats_per_assembler_algo = stats_per_assembler_algo.set_index('Assembler Algorithm')
stats_per_assembler_algo.to_csv("Tables/Results/Global metrics per assembler algorithm.csv")

In [104]:
stats_per_assembler_algo

Unnamed: 0_level_0,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
Assembler Algorithm,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
Multiple k-mer De Bruijn graph,529.91 [0;8788],30716745.31 [2957060;91720949],44688.53 [36;677860],407.14 [0;8283],24527772.38 [13883;32500826],1143.22 [13;5038],87.09 [0;107],117011.65 [1039;351027],91.8 [1;297],652239.76 [1157;1303875],8.75 [0;572],17.28 [0;1148],104650.93 [85;351027]
Single k-mer De Bruijn graph,0.0 [0;0],74289823.92 [15334799;364761201],1166417.88 [15001;8175890],0.0 [0;0],18407956.92 [0;28279162],2858.46 [0;6250],62.45 [0;97],14128.62 [0;33550],71.58 [1;99],108298.33 [994;177038],0.17 [0;2],0.29 [0;4],10139.67 [35;26474]


### Global stats per assembler

In [105]:
stats_per_assembler = pd.DataFrame()
for assembler in global_pipeline_metrics_df.assembler.unique():
    row={'Assembler': assembler, 
         'Type':global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].type.unique()[0],
         'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
    lala=global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    stats_per_assembler=stats_per_assembler.append(row, ignore_index=True)
stats_per_assembler = stats_per_assembler.set_index('Assembler')
stats_per_assembler.to_csv("Tables/Results/Global metrics per assembler.csv")

In [106]:
stats_per_assembler

Unnamed: 0_level_0,Algorithm,Ns,Type,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
ABySS,Multiple k-mer De Bruijn graph,1594.1 [9;3884],Metagenomic,48050192.0 [20132631;91720949],182966.7 [4769;677860],923.0 [0;2328],26320647.2 [3150980;30988200],1093.3 [356;1825],98.09 [79;107],100040.5 [18568;192306],122.57 [92;297],691710.5 [88665;952618],3.2 [0;8],5.7 [0;14],64240.5 [101;124246]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,118585717.58 [19635512;364761201],2301095.5 [25377;8175890],0.0 [0;0],12317748.75 [0;28174748],2564.5 [0;6250],38.02 [0;97],8752.33 [0;31436],51.53 [1;98],64194.08 [994;163543],0.0 [0;0],0.0 [0;0],6086.08 [35;18682]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,27162999.67 [13638625;32810830],2565.25 [693;6777],0.0 [0;0],26216699.17 [13507672;30564396],690.0 [219;814],93.95 [82;98],108613.08 [90988;140561],94.99 [84;100],692934.42 [542997;768585],1.5 [0;3],3.0 [0;6],102311.75 [56759;136741]
IDBA-UD,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,29452731.0 [13791669;39779384],7388.92 [1025;20194],0.0 [0;0],26691259.58 [13601126;31562690],1592.92 [261;2725],93.16 [81;99],66708.58 [29134;113853],94.88 [84;100],493485.17 [303329;632213],9.17 [2;18],16.92 [2;36],58013.67 [20993;105808]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,29973124.08 [13843596;41166407],7000.17 [767;20614],0.0 [0;0],27042637.5 [13671964;32021766],917.75 [172;3044],95.43 [84;100],174773.25 [149841;210993],96.57 [86;101],858411.0 [632250;1205839],8.67 [0;15],18.0 [0;32],147653.58 [82249;203548]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,23684949.7 [13174826;30381792],7533.5 [882;19361],0.0 [0;0],19665016.4 [1599587;30264953],1325.7 [712;5038],67.41 [3;98],63175.6 [1175;106637],75.53 [25;99],283385.4 [2964;468375],0.0 [0;0],0.0 [0;0],62261.0 [658;106383]
metaSPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,32266278.08 [14157309;48964620],18684.67 [640;61644],0.0 [0;0],27034872.42 [13672885;31967753],912.83 [139;3392],94.58 [83;100],179333.08 [87983;239185],95.56 [85;100],806147.42 [473277;1036942],1.67 [1;3],4.08 [3;7],157272.92 [3674;238611]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,29993930.25 [15334799;41187744],31740.25 [15001;68012],0.0 [0;0],24498165.08 [12920975;28279162],3152.42 [762;3983],86.87 [74;97],19504.92 [10582;33550],91.63 [80;99],152402.58 [117945;177038],0.33 [0;2],0.58 [0;4],14193.25 [7060;26474]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,24362387.0 [2957060;30430077],2766.75 [36;9514],0.0 [0;0],23629742.75 [2950393;30211740],1227.67 [21;2889],89.47 [62;97],78840.67 [2148;351027],91.75 [78;98],487176.67 [29155;632013],5.75 [0;12],11.17 [0;23],77687.25 [1155;351027]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,32022403.0 [15606257;46535940],17112.17 [1140;39819],0.0 [0;0],27143287.25 [13087777;32500826],1288.92 [227;4999],90.97 [75;99],149985.25 [55465;208991],96.85 [85;107],756751.58 [263645;850176],50.75 [0;572],101.83 [0;1148],130334.75 [1972;206515]


### Global stats per distribution

In [107]:
for distribution in global_pipeline_metrics_df.distribution.unique():
    stats_per_distribution = pd.DataFrame()
    for assembler in global_pipeline_metrics_df.assembler.unique():
        row={'Assembler': assembler, 
             'Type':global_pipeline_metrics_df[(global_pipeline_metrics_df.assembler == assembler) & (global_pipeline_metrics_df.distribution == distribution)].type.unique()[0],
             'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        lala=global_pipeline_metrics_df[(global_pipeline_metrics_df.assembler == assembler) & (global_pipeline_metrics_df.distribution == distribution)].describe()
        for column in lala.columns:
            mean = lala.loc['mean',column]
            minimum = lala.loc['min',column]
            maximum = lala.loc['max',column]
            row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
        stats_per_distribution=stats_per_distribution.append(row, ignore_index=True)
    stats_per_distribution = stats_per_distribution.set_index('Assembler')
    stats_per_distribution.to_csv("Tables/Results/{} - Global metrics per assembler.csv".format(distribution))
    print(distribution)
    display(stats_per_distribution)

Even


Unnamed: 0_level_0,Algorithm,Ns,Type,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
ABySS,Multiple k-mer De Bruijn graph,1823.12 [147;3884],Metagenomic,47705094.62 [31503127;91720949],138398.25 [4769;497797],1103.75 [5;2328],30810219.75 [30632246;30988200],1170.62 [497;1825],100.22 [81;107],98691.38 [52698;138706],103.33 [92;108],747140.75 [297881;952618],4.0 [2;8],7.12 [3;14],79091.88 [139;124246]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,80340494.0 [31047581;170402348],1261858.33 [25377;3508016],0.0 [0;0],14860681.0 [0;28174748],3275.0 [0;6250],39.88 [0;92],7918.67 [0;20700],56.66 [8;96],66942.33 [994;161812],0.0 [0;0],0.0 [0;0],6029.67 [49;17395]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,31253352.33 [30440762;32810830],2480.44 [943;5400],0.0 [0;0],30393672.67 [30273320;30564396],744.0 [657;814],93.59 [82;98],108898.33 [90988;122545],94.75 [84;100],723087.33 [632215;768585],1.67 [1;3],3.0 [1;6],104806.0 [90346;121957]
IDBA-UD,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,33525497.0 [30390566;39779384],7944.44 [1822;20194],0.0 [0;0],30613010.56 [30130413;31562690],1729.0 [1254;2654],92.47 [81;98],52473.33 [29134;65108],94.54 [84;100],452420.67 [303329;585610],11.44 [4;18],21.22 [7;36],48829.89 [20993;63725]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,34103641.33 [30545419;41166407],7469.0 [814;20614],0.0 [0;0],30953572.33 [30418624;32021766],830.0 [518;1447],94.89 [84;100],167083.67 [149841;179870],96.22 [86;101],868917.67 [632274;1205839],11.33 [9;15],23.56 [18;32],146076.33 [88308;178381]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,24646614.11 [13174826;30381792],7041.0 [882;19361],0.0 [0;0],20709564.33 [1599587;30264953],913.22 [712;1310],66.63 [3;98],69954.0 [1175;106637],74.44 [25;99],313232.67 [2964;468375],0.0 [0;0],0.0 [0;0],69010.56 [658;106383]
metaSPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,36633480.0 [30453778;48964620],21015.0 [640;61644],0.0 [0;0],30917818.67 [30383186;31967753],804.33 [359;1689],94.17 [83;100],180893.67 [95101;238611],95.36 [85;100],809315.0 [473277;1036942],1.67 [1;2],4.0 [3;6],163304.0 [42332;238611]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,34159156.67 [30641333;41187744],33769.78 [16570;68012],0.0 [0;0],28234093.56 [28205937;28279162],3714.44 [3574;3983],86.52 [74;92],19098.78 [15817;20782],91.29 [80;97],147189.67 [117945;161812],0.0 [0;0],0.0 [0;0],14292.56 [7060;17967]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,30263322.33 [30110758;30430077],1647.67 [1269;2322],0.0 [0;0],30037221.0 [29835112;30211740],1154.33 [846;1726],91.9 [80;97],65058.67 [43168;77870],92.76 [81;98],569520.0 [446163;632013],6.33 [0;10],12.33 [0;19],64292.67 [42115;76924]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,35881180.33 [30442718;46535940],14529.0 [1140;39819],0.0 [0;0],31043946.33 [30276227;32500826],968.33 [412;2049],93.29 [82;99],170923.67 [99299;208991],95.02 [85;100],815228.0 [745370;850176],4.0 [0;9],8.0 [0;18],154762.67 [53292;206515]


Log


Unnamed: 0_level_0,Algorithm,Ns,Type,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
ABySS,Multiple k-mer De Bruijn graph,678.0 [9;1347],Metagenomic,49430581.5 [20132631;78728532],361240.5 [44621;677860],200.0 [0;400],8362357.0 [3150980;13573734],784.0 [356;1212],89.57 [79;99],105437.0 [18568;192306],199.55 [101;297],469989.5 [88665;851314],0.0 [0;0],0.0 [0;0],4835.0 [101;9569]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,233321388.33 [19635512;364761201],5418807.0 [44825;8175890],0.0 [0;0],4688952.0 [11967;13397815],433.0 [11;784],32.44 [0;97],11253.33 [1062;31436],36.14 [1;98],55949.33 [1212;163543],0.0 [0;0],0.0 [0;0],6255.33 [35;18682]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,14891941.67 [13638625;17126362],2819.67 [693;6777],0.0 [0;0],13685778.67 [13507672;13891328],528.0 [219;762],95.04 [89;98],107757.33 [90989;140561],95.72 [89;100],602475.67 [542997;632215],1.0 [0;2],3.0 [0;6],94829.0 [56759;136741]
IDBA-UD,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,17234433.0 [13791669;23586883],5722.33 [1025;14155],0.0 [0;0],14926006.67 [13601126;17534580],1184.67 [261;2725],95.23 [89;99],109414.33 [106347;113853],95.91 [90;99],616678.67 [585610;632213],2.33 [2;3],4.0 [2;5],85565.0 [51161;105808]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,17581572.33 [13843596;24498566],5593.67 [767;14172],0.0 [0;0],15309833.0 [13671964;18552996],1181.0 [172;3044],97.05 [93;100],197842.0 [171540;210993],97.63 [94;100],826891.0 [632250;1080013],0.67 [0;1],1.33 [0;2],152385.33 [82249;203548]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,15029970.0 [15029970;15029970],11966.0 [11966;11966],0.0 [0;0],10264085.0 [10264085;10264085],5038.0 [5038;5038],74.39 [74;74],2170.0 [2170;2170],85.28 [85;85],14760.0 [14760;14760],0.0 [0;0],0.0 [0;0],1515.0 [1515;1515]
metaSPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Metagenomic,19164672.33 [14157309;27822010],11693.67 [2099;26282],0.0 [0;0],15386033.67 [13672885;18792639],1238.33 [139;3392],95.81 [89;99],174651.33 [87983;239185],96.17 [90;100],796644.67 [622107;917728],1.67 [1;3],4.33 [3;7],139179.67 [3674;220609]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],Genomic,17498251.0 [15334799;20758864],25651.67 [15001;37300],0.0 [0;0],13290379.67 [12920975;13501498],1466.33 [762;2171],87.94 [81;97],20723.33 [10582;33550],92.64 [87;99],168041.33 [163543;177038],1.33 [0;2],2.33 [0;4],13895.33 [7485;26474]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,6659581.0 [2957060;8818742],6124.0 [36;9514],0.0 [0;0],4407308.0 [2950393;5801439],1447.67 [21;2889],82.19 [62;92],120186.67 [2148;351027],88.7 [78;95],240146.67 [29155;632013],4.0 [0;12],7.67 [0;23],117871.0 [1155;351027]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],Genomic,20446071.0 [15606257;29517365],24861.67 [7331;36214],0.0 [0;0],15441310.0 [13087777;19595813],2250.67 [227;4999],84.02 [75;98],87170.0 [55465;136684],102.37 [99;107],581322.33 [263645;847683],191.0 [0;572],383.33 [0;1148],57051.0 [1972;122703]


## Reference-based metrics

### Load data

In [109]:
report_glob = glob.glob('../Results/*/*/report/pipeline_report_tables.json')
reference_pipeline_metrics_df = pd.DataFrame()

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))
    
    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)
        for sample in json_report.keys():
            for reference, data in json_report[sample]['ReferenceTables'].items():
                for row in data:
                    for item in row:
                        reference_pipeline_metrics_df = reference_pipeline_metrics_df.append({'run': stats_run,
                                                                                        'sample': sample,
                                                                                        'assembler': item['assembler'],
                                                                                        'reference': reference,
                                                                                        'LSA': item['contiguity'],
                                                                                        'breadth_of_coverage': item['breadth_of_coverage'],
                                                                                        'multiplicity': item['multiplicity'],
                                                                                        'validity': item['validity'],
                                                                                        'parsimony': item['parsimony'],
                                                                                        'identity': item['identity'],
                                                                                        'lowest_identity': item['lowest_identity'],
                                                                                        'L90': item['L90'],
                                                                                        'contigs': item['aligned_contigs'],
                                                                                        'NA50': item['NA50'],
                                                                                        'NG50': item['NG50'],
                                                                                        'basepairs': item['aligned_basepairs'],
                                                                                        'Ns': item['Ns'],
                                                                                        'misassembled contigs': item['misassembled_contigs'],
                                                                                        'misassembly events': item['misassembly_events']},
                                                                                       ignore_index=True)

reference_pipeline_metrics_df['distribution'] = np.where(reference_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
reference_pipeline_metrics_df['type'] = np.where(reference_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
reference_pipeline_metrics_df['algorythm'] = np.where(reference_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
reference_pipeline_metrics_df[['contigs','basepairs','L90','Ns','NA50','NG50','misassembled contigs', 'misassembly events', 'multiplicity','validity','parsimony','identity','lowest_identity']] = reference_pipeline_metrics_df[['contigs','basepairs','L90','Ns','NA50','NG50','misassembled contigs', 'misassembly events','multiplicity','validity','parsimony','identity','lowest_identity']].apply(pd.to_numeric)
display(reference_pipeline_metrics_df)

Processing pipeline_report_tables.json data from run2...
Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run1...


Unnamed: 0,L90,LSA,NA50,NG50,Ns,assembler,basepairs,breadth_of_coverage,contigs,identity,...,misassembly events,multiplicity,parsimony,reference,run,sample,validity,distribution,type,algorythm
0,39.0,0.073679,98440.0,98440.0,0.0,ABySS,3977886.0,0.983244,66.0,0.999639,...,0.0,1.001060,1.001159,Bacillus_subtilis,run2,ERR2984773,0.999901,Even,Metagenomic,Multiple k-mer De Bruijn graph
1,0.0,0.000000,0.0,0.0,0.0,BCALM2,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,Bacillus_subtilis,run2,ERR2984773,0.000000,Even,Genomic,Single k-mer De Bruijn graph
2,15.0,0.189942,263420.0,257358.0,0.0,GATBMiniaPipeline,3982010.0,0.984263,30.0,0.999809,...,0.0,1.000079,1.000083,Bacillus_subtilis,run2,ERR2984773,0.999995,Even,Metagenomic,Multiple k-mer De Bruijn graph
3,120.0,0.031121,34330.0,34330.0,0.0,IDBA-UD,3976919.0,0.983005,184.0,0.999756,...,0.0,1.003298,1.003311,Bacillus_subtilis,run2,ERR2984773,0.999986,Even,Metagenomic,Multiple k-mer De Bruijn graph
4,14.0,0.103094,294549.0,294549.0,0.0,MEGAHIT,3989518.0,0.986119,31.0,0.978636,...,4.0,0.998972,1.011678,Bacillus_subtilis,run2,ERR2984773,0.987440,Even,Metagenomic,Multiple k-mer De Bruijn graph
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1675,0.0,0.000000,0.0,0.0,0.0,MINIA,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,Staphylococcus_aureus_plasmid3,run1,LHS,0.000000,Log,Genomic,Single k-mer De Bruijn graph
1676,0.0,0.000000,0.0,0.0,0.0,SKESA,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,Staphylococcus_aureus_plasmid3,run1,LHS,0.000000,Log,Genomic,Multiple k-mer De Bruijn graph
1677,0.0,0.000000,0.0,0.0,0.0,SPAdes,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,Staphylococcus_aureus_plasmid3,run1,LHS,0.000000,Log,Genomic,Multiple k-mer De Bruijn graph
1678,0.0,0.000000,0.0,0.0,0.0,Unicycler,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,Staphylococcus_aureus_plasmid3,run1,LHS,0.000000,Log,Genomic,Multiple k-mer De Bruijn graph


In [110]:
reference_pipeline_metrics_df.dtypes

L90                     float64
LSA                     float64
NA50                    float64
NG50                    float64
Ns                      float64
assembler                object
basepairs               float64
breadth_of_coverage     float64
contigs                 float64
identity                float64
lowest_identity         float64
misassembled contigs    float64
misassembly events      float64
multiplicity            float64
parsimony               float64
reference                object
run                      object
sample                   object
validity                float64
distribution             object
type                     object
algorythm                object
dtype: object

### Save data

In [111]:
reference_pipeline_metrics_df.to_csv("Tables/Results/Reference metrics.csv")

### Reference statistics per assembler type

In [112]:
ref_stats_per_assembler_type = pd.DataFrame()
for assembly_type in reference_pipeline_metrics_df.type.unique():
    row={'Assembler Type': assembly_type}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.type == assembly_type].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
    ref_stats_per_assembler_type=ref_stats_per_assembler_type.append(row, ignore_index=True)
ref_stats_per_assembler_type = ref_stats_per_assembler_type.set_index('Assembler Type')
ref_stats_per_assembler_type.to_csv("Tables/Results/Reference metrics per assembler type.csv")

In [113]:
ref_stats_per_assembler_type

Unnamed: 0_level_0,L90,LSA,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Metagenomic,30.94 [0;1033],0.22 [0;1],77775.8 [0;593167],77156.14 [0;593167],70.11 [0;12098],1973596.73 [0;6781605],0.71 [0;1],83.76 [0;2402],0.8 [0;1],0.77 [0;1],0.36 [0;9],0.68 [0;15],0.8 [0;1],0.81 [0;5],0.8 [0;1]
Genomic,40.57 [0;921],0.16 [0;1],62460.97 [0;475179],61338.27 [0;475179],0.0 [0;0],1863265.6 [0;6728888],0.62 [0;1],132.4 [0;1877],0.71 [0;1],0.67 [0;1],1.0 [0;565],1.96 [0;1134],0.71 [0;1],0.71 [0;1],0.71 [0;1]


### Reference stats per assembly algorithm

In [114]:
ref_stats_per_assembler_algo = pd.DataFrame()
for assembly_algo in reference_pipeline_metrics_df.algorythm.unique():
    row={'Assembler Algorithm': assembly_algo}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
    ref_stats_per_assembler_algo=ref_stats_per_assembler_algo.append(row, ignore_index=True)
ref_stats_per_assembler_algo = ref_stats_per_assembler_algo.set_index('Assembler Algorithm')
ref_stats_per_assembler_algo.to_csv("Tables/Results/Reference metrics per assembler algorithm.csv")

In [115]:
ref_stats_per_assembler_algo

Unnamed: 0_level_0,L90,LSA,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler Algorithm,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Multiple k-mer De Bruijn graph,29.85 [0;1033],0.23 [0;1],83811.38 [0;593167],82906.92 [0;593167],48.35 [0;12098],2008465.08 [0;6781605],0.71 [0;1],77.17 [0;2402],0.8 [0;1],0.76 [0;1],0.76 [0;565],1.47 [0;1134],0.81 [0;1],0.81 [0;5],0.8 [0;1]
Single k-mer De Bruijn graph,60.32 [0;921],0.05 [0;0],10316.74 [0;65138],9816.02 [0;64960],0.0 [0;0],1529238.55 [0;6524118],0.46 [0;0],237.24 [0;1495],0.56 [0;1],0.55 [0;1],0.02 [0;2],0.03 [0;4],0.56 [0;1],0.56 [0;1],0.56 [0;1]


### Reference stats per assembler

In [116]:
ref_stats_per_assembler = pd.DataFrame()
for assembler in reference_pipeline_metrics_df.assembler.unique():
    row={'Assembler': assembler, 
         'Type':reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].type.unique()[0],
         'Algorithm': reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
    lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
    ref_stats_per_assembler=ref_stats_per_assembler.append(row, ignore_index=True)
ref_stats_per_assembler = ref_stats_per_assembler.set_index('Assembler')
ref_stats_per_assembler.to_csv("Tables/Results/Reference metrics per assembler.csv")

In [117]:
ref_stats_per_assembler

Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,58.42 [0;1033],0.25 [0;0],86644.12 [0;351511],86507.42 [0;351511],77.04 [0;2170],Metagenomic,2166028.81 [0;6781605],0.76 [0;0],90.72 [0;1404],0.82 [0;1],0.8 [0;1],0.27 [0;8],0.48 [0;14],0.83 [0;1],0.83 [0;1],0.82 [0;1]
BCALM2,Single k-mer De Bruijn graph,41.8 [0;921],0.03 [0;0],6063.03 [0;65138],5720.81 [0;64934],0.0 [0;0],Genomic,1024596.01 [0;6515102],0.3 [0;0],213.69 [0;1495],0.42 [0;1],0.42 [0;1],0.0 [0;0],0.0 [0;0],0.42 [0;1],0.42 [0;1],0.42 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,29.4 [0;459],0.27 [0;1],89885.56 [0;452364],82812.65 [0;351186],0.0 [0;0],Metagenomic,2175625.31 [0;6716522],0.79 [0;1],54.33 [0;604],0.84 [0;1],0.83 [0;1],0.12 [0;3],0.25 [0;6],0.84 [0;1],0.84 [0;1],0.84 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,55.7 [0;292],0.21 [0;1],53880.44 [0;475377],52523.53 [0;475377],0.0 [0;0],Metagenomic,2164508.2 [0;6710005],0.77 [0;1],98.23 [0;449],0.84 [0;1],0.75 [0;1],0.9 [0;9],1.55 [0;15],0.85 [0;1],0.89 [0;5],0.84 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,15.83 [0;147],0.29 [0;1],120904.61 [0;543128],122959.26 [0;543128],0.0 [0;0],Metagenomic,2186131.55 [0;6738936],0.8 [0;1],37.82 [0;223],0.86 [0;1],0.81 [0;1],0.74 [0;4],1.52 [0;7],0.87 [0;1],0.87 [0;1],0.87 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,19.53 [0;110],0.21 [0;0],52708.73 [0;257424],52025.82 [0;257424],0.0 [0;0],Metagenomic,1632657.63 [0;6717767],0.63 [0;0],107.52 [0;2402],0.89 [0;1],0.88 [0;1],0.0 [0;0],0.0 [0;0],0.89 [0;1],0.89 [0;1],0.89 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,16.9 [0;90],0.27 [0;1],115349.42 [0;593167],114559.37 [0;593167],0.0 [0;0],Metagenomic,2182712.19 [0;6733500],0.79 [0;1],33.38 [0;167],0.87 [0;1],0.85 [0;1],0.15 [0;2],0.35 [0;6],0.88 [0;1],0.88 [0;1],0.87 [0;1]
MINIA,Single k-mer De Bruijn graph,78.83 [0;649],0.07 [0;0],14570.44 [0;65138],13911.24 [0;64960],0.0 [0;0],Genomic,2033881.08 [0;6524118],0.62 [0;0],260.78 [0;1147],0.71 [0;1],0.69 [0;1],0.03 [0;2],0.06 [0;4],0.71 [0;1],0.71 [0;1],0.71 [0;1]
SKESA,Multiple k-mer De Bruijn graph,45.12 [0;670],0.2 [0;0],59645.81 [0;351027],58903.85 [0;351027],0.0 [0;0],Genomic,1964267.06 [0;6673601],0.7 [0;0],102.28 [0;1877],0.8 [0;1],0.7 [0;1],0.58 [0;12],1.03 [0;23],0.8 [0;1],0.8 [0;1],0.8 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,23.5 [0;835],0.22 [0;1],97446.0 [0;351104],97219.7 [0;351104],0.0 [0;0],Genomic,2166426.08 [0;6728888],0.74 [0;1],57.19 [0;1839],0.84 [0;1],0.8 [0;1],4.29 [0;565],8.55 [0;1134],0.86 [0;1],0.88 [0;1],0.84 [0;1]


### Reference stats per reference replicon

#### Per assembler for each reference replicon

In [118]:
reference_pipeline_metrics_df.reference.unique()

array(['Bacillus_subtilis', 'Enterococcus_faecalis',
       'Escherichia_coli_plasmid', 'Escherichia_coli',
       'Lactobacillus_fermentum', 'Listeria_monocytogenes',
       'Pseudomonas_aeruginosa', 'Salmonella_enterica',
       'Staphylococcus_aureus', 'Staphylococcus_aureus_plasmid1',
       'Staphylococcus_aureus_plasmid2', 'Staphylococcus_aureus_plasmid3'],
      dtype=object)

In [119]:
for reference in reference_pipeline_metrics_df.reference.unique():
    ref_stats_per_reference_per_assembler = pd.DataFrame()
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        row={'Assembler': assembler, 
             'Type':reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].type.unique()[0],
             'Algorithm': reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        lala=reference_pipeline_metrics_df[(reference_pipeline_metrics_df.assembler == assembler) & (reference_pipeline_metrics_df.reference == reference)].describe()
        for column in lala.columns:
            mean = lala.loc['mean',column]
            minimum = lala.loc['min',column]
            maximum = lala.loc['max',column]
            row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
        ref_stats_per_reference_per_assembler=ref_stats_per_reference_per_assembler.append(row, ignore_index=True)
    ref_stats_per_reference_per_assembler = ref_stats_per_reference_per_assembler.set_index('Assembler')
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(reference))
    print(reference)
    display(ref_stats_per_reference_per_assembler)

Bacillus_subtilis


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,119.7 [0;1033],0.08 [0;0],186473.7 [0;295233],186436.1 [0;295233],54.1 [0;400],Metagenomic,3559593.4 [0;3991152],0.88 [0;0],141.7 [0;1141],0.9 [0;0],0.89 [0;0],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;0]
BCALM2,Single k-mer De Bruijn graph,27.25 [0;129],0.01 [0;0],19702.58 [0;65138],19337.25 [0;64934],0.0 [0;0],Genomic,2083135.33 [0;3951974],0.51 [0;0],455.08 [0;1467],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,64.67 [13;459],0.13 [0;0],207579.58 [8915;269695],204452.42 [8732;263445],0.0 [0;0],Metagenomic,3971062.42 [3882084;3984103],0.98 [0;0],96.5 [27;604],1.0 [0;1],0.99 [0;1],0.25 [0;2],0.75 [0;6],1.0 [1;1],1.0 [1;1],1.0 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,72.58 [18;292],0.06 [0;0],139056.17 [14006;252653],128977.08 [13214;212969],0.0 [0;0],Metagenomic,3971930.75 [3925882;3977845],0.98 [0;0],113.5 [34;418],1.0 [0;0],0.91 [0;0],0.33 [0;2],0.67 [0;5],1.0 [1;1],1.0 [1;1],1.0 [0;0]
MEGAHIT,Multiple k-mer De Bruijn graph,26.17 [13;147],0.12 [0;0],239949.58 [26511;294549],240054.5 [26209;294549],0.0 [0;0],Metagenomic,3986036.5 [3969756;3989518],0.99 [0;0],46.83 [27;223],0.99 [0;0],0.83 [0;0],0.83 [0;2],1.92 [0;4],1.0 [0;1],1.0 [0;1],1.0 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,9.3 [0;16],0.06 [0;0],155049.2 [1145;257424],154676.1 [0;257424],0.0 [0;0],Metagenomic,2800013.9 [205541;3976387],0.69 [0;0],210.3 [27;1409],1.0 [0;1],0.99 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,23.75 [9;66],0.15 [0;0],224502.58 [61490;343843],224268.42 [62046;343843],0.0 [0;0],Metagenomic,3979708.42 [3971865;3987932],0.98 [0;0],45.25 [18;110],1.0 [0;0],0.98 [0;0],0.33 [0;2],1.0 [0;6],1.0 [0;1],1.0 [0;1],1.0 [0;1]
MINIA,Single k-mer De Bruijn graph,135.83 [66;649],0.03 [0;0],49500.25 [6662;65138],48602.83 [6313;64960],0.0 [0;0],Genomic,3939810.5 [3821823;3953200],0.97 [0;0],206.58 [116;783],1.0 [0;1],1.0 [0;1],0.33 [0;2],0.58 [0;4],1.0 [1;1],1.0 [1;1],1.0 [0;1]
SKESA,Multiple k-mer De Bruijn graph,12.75 [0;20],0.07 [0;0],181820.75 [0;269493],180238.5 [0;263164],0.0 [0;0],Genomic,2977377.25 [0;3971432],0.74 [0;0],22.5 [0;34],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,19.67 [11;72],0.16 [0;0],258653.92 [56470;315090],262779.92 [54127;315090],0.0 [0;0],Genomic,3977556.17 [3964395;3982668],0.98 [0;0],37.42 [22;116],1.0 [0;0],0.99 [0;0],0.17 [0;2],0.42 [0;5],1.0 [0;1],1.0 [0;1],1.0 [0;1]


Enterococcus_faecalis


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,18.5 [0;64],0.12 [0;0],153198.2 [0;241090],153198.2 [0;241090],0.0 [0;0],Metagenomic,2256766.3 [0;2828460],0.79 [0;0],33.7 [0;108],0.8 [0;0],0.79 [0;0],0.0 [0;0],0.0 [0;0],0.81 [0;1],0.81 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,26.08 [0;107],0.01 [0;0],7915.25 [0;30453],7605.0 [0;30452],0.0 [0;0],Genomic,745329.0 [0;2768590],0.26 [0;0],86.42 [0;180],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,13.75 [0;24],0.13 [0;0],135710.5 [0;235254],135710.5 [0;235254],0.0 [0;0],Metagenomic,2098979.17 [0;2820351],0.74 [0;0],28.08 [0;46],0.75 [0;1],0.74 [0;1],0.25 [0;1],0.5 [0;2],0.75 [0;1],0.75 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,49.25 [0;147],0.08 [0;0],61181.75 [0;113206],62772.0 [0;118897],0.0 [0;0],Metagenomic,2099448.75 [0;2811602],0.74 [0;0],84.75 [0;231],0.75 [0;0],0.7 [0;0],0.75 [0;3],1.5 [0;6],0.75 [0;1],0.75 [0;1],0.75 [0;0]
MEGAHIT,Multiple k-mer De Bruijn graph,10.75 [0;17],0.17 [0;0],176503.25 [0;235381],177310.75 [0;238546],0.0 [0;0],Metagenomic,2101557.67 [0;2822833],0.74 [0;0],24.33 [0;35],0.75 [0;0],0.72 [0;0],0.5 [0;1],1.0 [0;2],0.75 [0;1],0.75 [0;1],0.75 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,16.2 [0;27],0.08 [0;0],71897.8 [1166;118895],70110.6 [0;118895],0.0 [0;0],Metagenomic,1740712.0 [9974;2811369],0.61 [0;0],80.5 [5;179],1.0 [0;1],1.0 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [1;1]
metaSPAdes,Multiple k-mer De Bruijn graph,11.25 [0;25],0.13 [0;0],155065.75 [0;238611],155065.75 [0;238611],0.0 [0;0],Metagenomic,2104706.5 [0;2818677],0.74 [0;0],23.75 [0;51],0.75 [0;0],0.73 [0;0],0.25 [0;1],0.5 [0;2],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,88.83 [0;149],0.03 [0;0],21664.83 [0;30453],20172.75 [0;30453],0.0 [0;0],Genomic,2070182.92 [0;2770364],0.73 [0;0],138.83 [0;222],0.92 [0;1],0.91 [0;1],0.0 [0;0],0.0 [0;0],0.92 [0;1],0.92 [0;1],0.92 [0;1]
SKESA,Multiple k-mer De Bruijn graph,22.75 [0;42],0.09 [0;0],79782.0 [0;152709],79339.25 [0;152709],0.0 [0;0],Genomic,2102692.5 [0;2810284],0.74 [0;0],41.67 [0;72],0.91 [0;1],0.7 [0;1],0.5 [0;1],0.5 [0;1],0.92 [0;1],0.92 [0;1],0.91 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,12.25 [0;25],0.15 [0;0],145226.0 [0;239100],143125.75 [0;239100],0.0 [0;0],Genomic,2098387.33 [0;2812518],0.74 [0;0],29.0 [0;63],0.9 [0;1],0.87 [0;1],1.25 [0;6],2.17 [0;8],0.94 [0;1],0.98 [0;1],0.89 [0;1]


Escherichia_coli_plasmid


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,0.2 [0;1],0.13 [0;0],12339.8 [0;55639],11519.0 [0;55639],93.6 [0;204],Metagenomic,43179.1 [0;109424],0.39 [0;0],10.2 [0;26],0.79 [0;0],0.78 [0;0],0.0 [0;0],0.0 [0;0],0.8 [0;1],0.8 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,4.5 [0;18],0.05 [0;0],2576.75 [0;7117],2436.75 [0;6638],0.0 [0;0],Genomic,48410.25 [0;102586],0.44 [0;0],13.5 [0;33],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,2.25 [0;7],0.2 [0;0],13154.5 [0;28587],13000.0 [0;28587],0.0 [0;0],Metagenomic,78130.0 [0;109450],0.71 [0;0],9.25 [0;24],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,3.75 [0;8],0.2 [0;0],17077.5 [0;28584],16985.25 [0;28584],0.0 [0;0],Metagenomic,80037.83 [0;108159],0.73 [0;0],5.83 [0;11],0.83 [0;1],0.83 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,2.5 [0;6],0.26 [0;0],24913.5 [0;55277],24075.0 [0;55277],0.0 [0;0],Metagenomic,81337.42 [0;109516],0.74 [0;0],4.08 [0;9],0.83 [0;1],0.83 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,3.0 [0;7],0.17 [0;0],12704.4 [0;28267],12253.5 [0;28267],0.0 [0;0],Metagenomic,64668.4 [0;107495],0.59 [0;0],4.8 [0;10],0.9 [0;1],0.9 [0;1],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,2.5 [0;6],0.26 [0;0],24339.0 [0;55105],23644.0 [0;55105],0.0 [0;0],Metagenomic,80059.5 [0;108144],0.73 [0;0],3.75 [0;9],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,5.75 [0;17],0.11 [0;0],6296.75 [0;14811],6139.5 [0;14811],0.0 [0;0],Genomic,74715.5 [0;104297],0.68 [0;0],15.5 [0;33],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SKESA,Multiple k-mer De Bruijn graph,0.5 [0;2],0.13 [0;0],7802.25 [0;28384],7096.0 [0;28384],0.0 [0;0],Genomic,40787.75 [0;107808],0.37 [0;0],10.25 [0;26],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,2.75 [0;6],0.26 [0;0],17829.83 [0;28541],17008.75 [0;28541],0.0 [0;0],Genomic,81280.42 [0;109262],0.74 [0;0],4.58 [0;9],0.83 [0;1],0.83 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]


Escherichia_coli


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,74.9 [0;133],0.03 [0;0],52249.4 [0;100101],51970.3 [0;100101],13.5 [0;45],Metagenomic,3751078.0 [0;4706885],0.79 [0;0],134.1 [0;234],0.8 [0;0],0.69 [0;0],0.0 [0;0],0.0 [0;0],0.81 [0;1],0.81 [0;1],0.8 [0;0]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],1677.25 [0;4465],1172.25 [0;3200],0.0 [0;0],Genomic,1714793.33 [0;3766804],0.36 [0;0],653.75 [0;1495],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,77.75 [0;112],0.02 [0;0],43740.92 [0;59656],43462.75 [0;59530],0.0 [0;0],Metagenomic,3468765.25 [0;4632389],0.73 [0;0],141.08 [0;199],0.83 [0;0],0.82 [0;0],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.84 [0;1],0.83 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,187.0 [0;283],0.01 [0;0],17916.0 [1146;30548],14932.0 [0;22537],0.0 [0;0],Metagenomic,3403322.83 [2287;4589511],0.71 [0;0],313.08 [2;449],0.97 [0;0],0.51 [0;0],4.5 [0;8],8.08 [0;15],1.0 [0;1],1.41 [1;5],0.92 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,38.0 [0;55],0.04 [0;0],59995.75 [1049;90017],63025.5 [0;102135],0.0 [0;0],Metagenomic,3501717.92 [4237;4673019],0.73 [0;0],104.17 [3;141],0.99 [0;0],0.9 [0;0],2.5 [0;4],4.25 [0;7],1.01 [0;1],1.01 [0;1],1.0 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,65.1 [0;110],0.02 [0;0],35257.7 [1155;59122],33943.8 [0;56794],0.0 [0;0],Metagenomic,2805850.0 [1352;4613132],0.59 [0;0],147.3 [1;193],1.0 [0;1],0.99 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,40.5 [0;90],0.07 [0;0],78366.92 [1088;138793],76893.25 [0;138793],0.0 [0;0],Metagenomic,3491713.33 [4384;4666787],0.73 [0;0],85.25 [3;167],0.99 [0;0],0.94 [0;0],0.83 [0;2],2.33 [0;6],1.0 [0;1],1.01 [0;1],0.99 [0;1]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],0.01 [0;0],3335.42 [0;4469],2371.42 [0;3211],0.0 [0;0],Genomic,2822981.67 [0;3773077],0.59 [0;0],847.75 [0;1147],0.75 [0;0],0.74 [0;0],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SKESA,Multiple k-mer De Bruijn graph,87.75 [0;152],0.02 [0;0],34893.25 [0;62623],35412.25 [0;66235],0.0 [0;0],Genomic,3428896.25 [0;4579934],0.72 [0;0],164.75 [0;270],0.75 [0;0],0.53 [0;0],1.25 [0;3],2.0 [0;4],0.75 [0;1],0.76 [0;1],0.75 [0;0]
SPAdes,Multiple k-mer De Bruijn graph,43.25 [0;79],0.06 [0;0],63116.58 [1094;98812],62796.75 [0;96999],0.0 [0;0],Genomic,3476288.08 [6019;4643850],0.73 [0;0],90.17 [4;143],0.98 [0;0],0.75 [0;0],0.75 [0;3],1.42 [0;5],1.0 [0;1],1.02 [0;1],0.98 [0;0]


Lactobacillus_fermentum


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,46.2 [0;106],0.05 [0;0],29410.2 [0;42368],29164.3 [0;42368],0.0 [0;0],Metagenomic,1474890.7 [0;1873053],0.77 [0;0],70.6 [0;137],0.8 [0;0],0.8 [0;0],0.0 [0;0],0.0 [0;0],0.81 [0;1],0.81 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,52.0 [0;208],0.01 [0;0],2903.17 [0;11209],2558.5 [0;10234],0.0 [0;0],Genomic,437345.17 [0;1748625],0.23 [0;0],59.17 [0;236],0.33 [0;1],0.33 [0;1],0.0 [0;0],0.0 [0;0],0.33 [0;1],0.33 [0;1],0.33 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,40.5 [0;55],0.05 [0;0],30517.08 [0;40488],29442.75 [0;40151],0.0 [0;0],Metagenomic,1347280.58 [0;1798516],0.71 [0;0],65.42 [0;89],0.83 [0;1],0.83 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,61.25 [0;131],0.04 [0;0],24333.25 [0;40364],22633.25 [0;37467],0.0 [0;0],Metagenomic,1339735.58 [0;1790018],0.7 [0;0],87.17 [0;171],0.83 [0;0],0.83 [0;0],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,32.0 [0;53],0.05 [0;0],33379.5 [0;46932],34157.0 [0;52972],0.0 [0;0],Metagenomic,1356161.0 [0;1808544],0.71 [0;0],62.92 [0;88],0.83 [0;0],0.74 [0;0],1.75 [0;3],3.92 [0;6],0.83 [0;1],0.83 [0;1],0.83 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,32.7 [0;55],0.04 [0;0],24570.9 [0;40362],24040.2 [0;40067],0.0 [0;0],Metagenomic,1075233.4 [0;1787622],0.56 [0;0],54.3 [0;87],0.9 [0;1],0.9 [0;1],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,40.0 [0;64],0.04 [0;0],32383.92 [0;46069],29878.75 [0;45077],0.0 [0;0],Metagenomic,1344747.75 [0;1796429],0.71 [0;0],63.67 [0;95],0.92 [0;0],0.91 [0;0],0.25 [0;1],0.25 [0;1],0.92 [0;1],0.92 [0;1],0.92 [0;1]
MINIA,Single k-mer De Bruijn graph,168.83 [0;263],0.02 [0;0],8021.83 [0;11293],7470.25 [0;10836],0.0 [0;0],Genomic,1310547.08 [0;1754083],0.69 [0;0],189.25 [0;282],0.83 [0;1],0.82 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.83 [0;1]
SKESA,Multiple k-mer De Bruijn graph,47.25 [0;69],0.05 [0;0],27994.75 [0;40164],25910.75 [0;39869],0.0 [0;0],Genomic,1327082.75 [0;1776186],0.7 [0;0],66.5 [0;94],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,45.25 [0;76],0.05 [0;0],29460.75 [1408;46123],27911.75 [0;45008],0.0 [0;0],Genomic,1342755.75 [1368;1792828],0.7 [0;0],72.42 [1;109],0.99 [0;0],0.99 [0;0],1.0 [0;4],2.0 [0;8],1.0 [0;1],1.0 [1;1],0.99 [0;1]


Listeria_monocytogenes


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,26.7 [8;147],0.19 [0;0],267347.9 [21261;351511],267239.1 [20173;351511],0.3 [0;1],Metagenomic,2956056.2 [2941416;2961911],0.99 [0;0],42.6 [13;226],1.0 [0;0],1.0 [0;0],0.0 [0;0],0.0 [0;0],1.01 [1;1],1.01 [1;1],1.0 [0;1]
BCALM2,Single k-mer De Bruijn graph,22.58 [0;75],0.02 [0;0],17300.08 [0;64777],16296.0 [0;62493],0.0 [0;0],Genomic,1082146.42 [0;2921297],0.36 [0;0],128.58 [0;359],0.58 [0;1],0.58 [0;1],0.0 [0;0],0.0 [0;0],0.58 [0;1],0.58 [0;1],0.58 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,10.92 [10;12],0.2 [0;0],368416.08 [254631;452364],294101.33 [155105;351186],0.0 [0;0],Metagenomic,2942369.25 [2915611;2957674],0.98 [0;0],20.92 [18;26],1.0 [0;1],0.99 [0;1],0.0 [0;0],0.0 [0;0],1.0 [0;1],1.0 [0;1],1.0 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,36.0 [9;108],0.15 [0;0],224219.25 [27259;475377],223793.25 [25555;475377],0.0 [0;0],Metagenomic,2934925.67 [2915464;2956853],0.98 [0;0],60.75 [19;171],1.0 [0;0],0.99 [0;0],0.0 [0;0],0.0 [0;0],1.0 [0;1],1.0 [0;1],1.0 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,7.5 [6;9],0.27 [0;0],464670.75 [269400;543128],465101.92 [274574;543128],0.0 [0;0],Metagenomic,2949010.67 [2924395;2962451],0.99 [0;0],19.42 [17;26],0.99 [0;0],0.87 [0;0],0.33 [0;1],0.67 [0;2],1.0 [0;1],1.0 [0;1],1.0 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,7.8 [0;13],0.09 [0;0],135213.7 [1221;254627],134822.8 [0;254627],0.0 [0;0],Metagenomic,2156355.7 [450667;2949910],0.72 [0;0],229.1 [21;1074],1.0 [0;1],0.98 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,9.08 [5;13],0.22 [0;0],365322.67 [235115;593167],365322.67 [235115;593167],0.0 [0;0],Metagenomic,2941835.25 [2920851;2964879],0.98 [0;0],18.08 [12;27],1.0 [0;0],0.99 [0;0],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
MINIA,Single k-mer De Bruijn graph,110.08 [0;461],0.04 [0;0],36451.17 [3362;64777],35066.67 [2969;64473],0.0 [0;0],Genomic,2862068.83 [2548083;2924618],0.96 [0;0],242.83 [87;918],1.0 [0;1],0.91 [0;1],0.08 [0;1],0.08 [0;1],1.0 [1;1],1.0 [1;1],1.0 [0;1]
SKESA,Multiple k-mer De Bruijn graph,30.17 [0;232],0.16 [0;0],255132.67 [2534;351027],255042.0 [1926;351027],0.0 [0;0],Genomic,2856512.83 [2259852;2950392],0.95 [0;0],130.58 [19;1001],1.0 [0;1],1.0 [0;1],0.0 [0;0],0.0 [0;0],1.0 [0;1],1.0 [0;1],1.0 [1;1]
SPAdes,Multiple k-mer De Bruijn graph,80.0 [0;835],0.14 [0;0],205423.42 [1964;351104],205456.92 [1651;351104],0.0 [0;0],Genomic,2806158.58 [1899833;2957750],0.94 [0;0],279.92 [13;1839],1.0 [0;1],0.95 [0;1],47.08 [0;565],94.5 [0;1134],1.03 [0;1],1.03 [0;1],1.0 [0;1]


Pseudomonas_aeruginosa


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,352.2 [0;1004],0.03 [0;0],64187.9 [1421;245341],64283.9 [0;245341],740.4 [0;2170],Metagenomic,6039233.0 [179111;6781605],0.89 [0;0],537.0 [58;1404],1.0 [0;0],0.86 [0;0],2.8 [0;8],4.7 [0;14],1.02 [1;1],1.02 [1;1],1.0 [0;0]
BCALM2,Single k-mer De Bruijn graph,341.17 [0;921],0.01 [0;0],9563.08 [0;23566],9106.17 [0;22511],0.0 [0;0],Genomic,3770698.17 [0;6515102],0.56 [0;0],457.08 [0;1142],0.58 [0;1],0.58 [0;1],0.0 [0;0],0.0 [0;0],0.58 [0;1],0.58 [0;1],0.58 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,53.17 [36;78],0.06 [0;0],159539.0 [88774;211558],158995.5 [88774;211019],0.0 [0;0],Metagenomic,6700314.08 [6646876;6716522],0.99 [0;0],122.83 [98;166],1.0 [0;1],0.97 [0;1],0.75 [0;3],1.5 [0;6],1.0 [1;1],1.0 [1;1],1.0 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,72.42 [48;90],0.05 [0;0],101487.33 [84458;171408],99368.25 [83502;171408],0.0 [0;0],Metagenomic,6700196.5 [6694930;6710005],0.99 [0;0],158.58 [119;185],1.0 [0;0],0.92 [0;0],0.83 [0;2],1.33 [0;4],1.0 [1;1],1.0 [1;1],1.0 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,28.08 [24;31],0.07 [0;0],254762.67 [215542;286027],256946.67 [215542;294763],0.0 [0;0],Metagenomic,6733180.08 [6729202;6738936],0.99 [0;0],80.33 [75;82],1.0 [0;0],0.96 [0;0],0.25 [0;1],0.5 [0;2],1.0 [1;1],1.0 [1;1],1.0 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,23.7 [0;42],0.04 [0;0],120278.8 [1126;211554],119727.0 [0;211015],0.0 [0;0],Metagenomic,4435086.0 [93618;6717767],0.65 [0;0],325.3 [81;2402],1.0 [0;1],0.99 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,29.75 [24;44],0.11 [0;0],227527.67 [151807;285156],224893.58 [151807;257869],0.0 [0;0],Metagenomic,6726334.17 [6721688;6733500],0.99 [0;0],68.5 [56;100],1.0 [0;0],0.98 [0;0],0.0 [0;0],0.0 [0;0],1.01 [1;1],1.01 [1;1],1.0 [0;1]
MINIA,Single k-mer De Bruijn graph,345.75 [291;407],0.02 [0;0],21684.0 [18690;24665],21007.5 [17851;24029],0.0 [0;0],Genomic,6509919.5 [6489664;6524118],0.96 [0;0],530.67 [463;598],1.0 [0;0],0.99 [0;0],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
SKESA,Multiple k-mer De Bruijn graph,225.5 [0;670],0.02 [0;0],40350.75 [0;111919],38670.67 [0;106185],0.0 [0;0],Genomic,5399196.33 [0;6673601],0.79 [0;0],588.33 [0;1877],0.92 [0;1],0.51 [0;1],4.25 [0;12],8.67 [0;23],0.92 [0;1],0.92 [0;1],0.92 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,34.33 [26;60],0.11 [0;0],207363.33 [122703;264371],207363.33 [122703;264371],0.0 [0;0],Genomic,6715515.75 [6682762;6728888],0.99 [0;0],78.25 [63;129],1.0 [0;0],0.98 [0;0],0.58 [0;2],0.92 [0;3],1.0 [0;1],1.0 [0;1],1.0 [0;1]


Salmonella_enterica


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,45.7 [0;118],0.05 [0;0],91283.4 [0;157988],91039.3 [0;157988],1.3 [0;2],Metagenomic,3752791.6 [0;4703394],0.79 [0;0],86.3 [0;214],0.8 [0;0],0.74 [0;0],0.2 [0;1],0.6 [0;4],0.81 [0;1],0.81 [0;1],0.8 [0;0]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],1831.5 [0;5009],1262.5 [0;3551],0.0 [0;0],Genomic,1728825.25 [0;3817265],0.36 [0;0],639.5 [0;1479],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,65.75 [0;96],0.05 [0;0],44358.67 [0;64572],42721.25 [0;60374],0.0 [0;0],Metagenomic,3485469.5 [0;4652574],0.73 [0;0],125.83 [0;178],0.92 [0;0],0.81 [0;0],0.25 [0;1],0.25 [0;1],0.92 [0;1],0.92 [0;1],0.92 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,136.67 [0;200],0.02 [0;0],20698.58 [1117;29331],20595.25 [0;28547],0.0 [0;0],Metagenomic,3438939.75 [1463;4618887],0.72 [0;0],272.83 [1;397],0.98 [0;0],0.66 [0;0],4.42 [0;9],7.0 [0;14],1.0 [0;1],1.02 [1;1],0.99 [0;0]
MEGAHIT,Multiple k-mer De Bruijn graph,33.0 [0;47],0.07 [0;0],82471.08 [1086;120785],83154.25 [0;120785],0.0 [0;0],Metagenomic,3505110.83 [4780;4671620],0.74 [0;0],79.5 [2;105],0.99 [0;0],0.94 [0;0],1.0 [0;3],2.0 [0;6],1.0 [0;1],1.01 [1;1],0.99 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,51.6 [0;86],0.04 [0;0],37797.7 [0;64529],35786.1 [0;60329],0.0 [0;0],Metagenomic,2815717.0 [0;4641887],0.59 [0;0],122.7 [0;162],0.9 [0;1],0.87 [0;1],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,32.5 [0;77],0.06 [0;0],108693.33 [1223;198784],106754.75 [0;198784],0.0 [0;0],Metagenomic,3505419.67 [2586;4680545],0.74 [0;0],66.58 [2;145],0.99 [0;0],0.89 [0;0],0.08 [0;1],0.08 [0;1],1.0 [0;1],1.01 [0;1],0.99 [0;1]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],0.01 [0;0],4048.58 [1313;4996],2614.25 [0;3551],0.0 [0;0],Genomic,2863842.92 [1312;3826699],0.6 [0;0],818.25 [1;1106],1.0 [0;1],0.95 [0;1],0.0 [0;0],0.0 [0;0],1.0 [1;1],1.0 [1;1],1.0 [0;1]
SKESA,Multiple k-mer De Bruijn graph,84.0 [0;144],0.03 [0;0],36470.25 [0;60912],36079.25 [0;60131],0.0 [0;0],Genomic,3438815.75 [0;4612070],0.72 [0;0],148.75 [0;245],0.75 [0;0],0.46 [0;0],1.0 [0;2],1.25 [0;3],0.75 [0;1],0.76 [0;1],0.75 [0;0]
SPAdes,Multiple k-mer De Bruijn graph,30.75 [0;50],0.07 [0;0],93997.67 [1251;164404],91912.75 [0;157954],0.0 [0;0],Genomic,3487896.17 [1463;4666050],0.73 [0;0],69.5 [1;110],0.99 [0;0],0.79 [0;0],0.42 [0;2],0.42 [0;2],1.0 [0;1],1.02 [1;1],0.98 [0;0]


Staphylococcus_aureus


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,17.0 [0;48],0.21 [0;0],175648.3 [0;281333],175648.3 [0;281333],21.3 [0;71],Metagenomic,2151166.8 [0;2693907],0.79 [0;0],30.2 [0;80],0.8 [0;0],0.8 [0;0],0.2 [0;1],0.4 [0;2],0.8 [0;1],0.8 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,28.0 [0;112],0.01 [0;0],7702.75 [0;29647],7291.25 [0;29165],0.0 [0;0],Genomic,682885.25 [0;2600423],0.25 [0;0],71.0 [0;175],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,24.0 [0;39],0.08 [0;0],66887.5 [0;112739],63142.5 [0;108199],0.0 [0;0],Metagenomic,2006536.42 [0;2679055],0.74 [0;0],39.75 [0;65],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,49.25 [0;112],0.04 [0;0],33258.25 [0;53248],32888.75 [0;53248],0.0 [0;0],Metagenomic,1997988.5 [0;2665812],0.73 [0;0],80.0 [0;174],0.75 [0;0],0.63 [0;0],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,12.0 [0;19],0.11 [0;0],105373.25 [0;181811],122849.5 [0;213210],0.0 [0;0],Metagenomic,2010815.5 [0;2681291],0.74 [0;0],30.0 [0;43],0.75 [0;0],0.64 [0;0],1.75 [0;3],4.0 [0;7],0.75 [0;1],0.75 [0;1],0.75 [0;0]
MetaHipMer2,Multiple k-mer De Bruijn graph,25.0 [0;42],0.04 [0;0],32334.5 [0;53350],31549.7 [0;53350],0.0 [0;0],Metagenomic,1690855.1 [0;2667635],0.62 [0;0],113.9 [0;247],0.9 [0;1],0.9 [0;1],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,12.75 [0;30],0.16 [0;0],160355.5 [0;360908],160355.5 [0;360908],0.0 [0;0],Metagenomic,2009502.75 [0;2682662],0.74 [0;0],22.75 [0;49],0.75 [0;0],0.74 [0;0],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,90.92 [0;140],0.02 [0;0],20681.75 [0;29647],20329.0 [0;29165],0.0 [0;0],Genomic,1949343.25 [0;2600576],0.72 [0;0],139.17 [0;207],0.75 [0;0],0.75 [0;0],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;0]
SKESA,Multiple k-mer De Bruijn graph,30.75 [0;43],0.05 [0;0],44836.75 [0;73147],42391.25 [0;63469],0.0 [0;0],Genomic,1992379.25 [0;2661968],0.73 [0;0],51.25 [0;73],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,13.75 [0;30],0.13 [0;0],142248.75 [0;280950],142248.75 [0;280950],0.0 [0;0],Genomic,2005500.5 [0;2678585],0.74 [0;0],23.25 [0;48],0.75 [0;0],0.73 [0;0],0.25 [0;1],0.75 [0;3],0.75 [0;1],0.75 [0;1],0.75 [0;1]


Staphylococcus_aureus_plasmid1


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,0.0 [0;0],0.6 [0;0],3800.7 [0;6335],3800.7 [0;6335],0.0 [0;0],Metagenomic,3800.7 [0;6335],0.6 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1],0.0 [0;0],0.0 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],0.25 [0;0],1584.0 [0;6336],1584.0 [0;6336],0.0 [0;0],Genomic,1584.0 [0;6336],0.25 [0;0],0.25 [0;1],0.25 [0;1],0.25 [0;1],0.0 [0;0],0.0 [0;0],0.25 [0;1],0.25 [0;1],0.25 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],0.76 [0;1],4796.58 [0;6522],4796.58 [0;6522],0.0 [0;0],Metagenomic,4751.75 [0;6337],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.76 [0;1],0.76 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,0.0 [0;0],0.69 [0;0],4359.25 [0;6336],4359.25 [0;6336],0.0 [0;0],Metagenomic,4351.75 [0;6336],0.69 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],0.75 [0;1],4859.75 [0;6795],4859.75 [0;6795],0.0 [0;0],Metagenomic,4745.25 [0;6337],0.75 [0;1],0.75 [0;1],0.74 [0;1],0.74 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.77 [0;1],0.74 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],0.6 [0;0],3800.4 [0;6334],3800.4 [0;6334],0.0 [0;0],Metagenomic,3800.4 [0;6334],0.6 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1],0.0 [0;0],0.0 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,0.25 [0;1],0.67 [0;0],4227.25 [0;6336],4227.25 [0;6336],0.0 [0;0],Metagenomic,4719.0 [0;6336],0.74 [0;0],1.0 [0;2],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],0.5 [0;0],3160.75 [0;6336],3160.75 [0;6336],0.0 [0;0],Genomic,3160.75 [0;6336],0.5 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],0.54 [0;0],3420.75 [0;6044],3420.75 [0;6044],0.0 [0;0],Genomic,4218.5 [0;6044],0.67 [0;0],1.25 [0;3],0.75 [0;1],0.74 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],0.5 [0;0],3168.0 [0;6336],3168.0 [0;6336],0.0 [0;0],Genomic,3168.0 [0;6336],0.5 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]


Staphylococcus_aureus_plasmid2


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,0.0 [0;0],0.72 [0;0],1604.0 [0;2215],1604.0 [0;2215],0.0 [0;0],Metagenomic,1604.0 [0;2215],0.72 [0;0],0.8 [0;1],0.8 [0;1],0.8 [0;1],0.0 [0;0],0.0 [0;0],0.8 [0;1],0.8 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],Genomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],0.76 [0;1],1694.0 [0;2397],1694.0 [0;2397],0.0 [0;0],Metagenomic,1651.75 [0;2216],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.77 [0;1],0.77 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,0.0 [0;0],0.51 [0;1],1132.5 [0;2315],1132.5 [0;2315],0.0 [0;0],Metagenomic,1107.75 [0;2216],0.5 [0;1],0.5 [0;1],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.51 [0;1],0.51 [0;1],0.5 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],0.77 [0;1],1696.75 [0;2357],1696.75 [0;2357],0.0 [0;0],Metagenomic,1661.5 [0;2216],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.77 [0;1],0.77 [0;1],0.75 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],0.6 [0;0],1327.8 [0;2213],1327.8 [0;2213],0.0 [0;0],Metagenomic,1327.8 [0;2213],0.6 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1],0.0 [0;0],0.0 [0;0],0.6 [0;1],0.6 [0;1],0.6 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],0.72 [0;1],1605.75 [0;2271],1605.75 [0;2271],0.0 [0;0],Metagenomic,1592.0 [0;2216],0.72 [0;1],0.75 [0;1],0.75 [0;0],0.75 [0;0],0.0 [0;0],0.0 [0;0],0.76 [0;1],0.76 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],Genomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],0.61 [0;0],1343.0 [0;2005],1343.0 [0;2005],0.0 [0;0],Genomic,1343.0 [0;2005],0.61 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],0.65 [0;1],1696.5 [0;3246],1696.5 [0;3246],0.0 [0;0],Genomic,1439.0 [0;2216],0.65 [0;1],0.75 [0;1],0.67 [0;1],0.67 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.87 [0;1],0.67 [0;1]


Staphylococcus_aureus_plasmid3


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,0.0 [0;0],0.73 [0;0],2185.9 [0;2992],2185.9 [0;2992],0.0 [0;0],Metagenomic,2185.9 [0;2992],0.73 [0;0],0.8 [0;1],0.8 [0;1],0.8 [0;1],0.0 [0;0],0.0 [0;0],0.8 [0;1],0.8 [0;1],0.8 [0;1]
BCALM2,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],Genomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,0.0 [0;0],0.75 [0;1],2232.25 [0;3162],2232.25 [0;3162],0.0 [0;0],Metagenomic,2193.5 [0;2993],0.73 [0;1],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.76 [0;1],0.76 [0;1],0.75 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,0.25 [0;1],0.62 [0;0],1845.5 [0;2847],1845.5 [0;2847],0.0 [0;0],Metagenomic,2112.75 [0;2847],0.71 [0;0],1.0 [0;2],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.76 [0;1],0.76 [0;1],0.75 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,0.0 [0;0],0.76 [0;1],2279.5 [0;3134],2279.5 [0;3134],0.0 [0;0],Metagenomic,2244.25 [0;2993],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.76 [0;1],0.76 [0;1],0.75 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],0.76 [0;0],2271.9 [0;2990],2271.9 [0;2990],0.0 [0;0],Metagenomic,2271.9 [0;2990],0.76 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1],0.0 [0;0],0.0 [0;0],0.9 [0;1],0.9 [0;1],0.9 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,0.5 [0;1],0.6 [0;1],1802.75 [0;3857],1802.75 [0;3857],0.0 [0;0],Metagenomic,2208.0 [0;2993],0.74 [0;1],1.25 [0;2],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.83 [0;1],0.83 [0;1],0.75 [0;1]
MINIA,Single k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],Genomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
SKESA,Multiple k-mer De Bruijn graph,0.0 [0;0],0.64 [0;0],1902.5 [0;2782],1902.5 [0;2782],0.0 [0;0],Genomic,1902.5 [0;2782],0.64 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1],0.0 [0;0],0.0 [0;0],0.75 [0;1],0.75 [0;1],0.75 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,0.0 [0;0],0.39 [0;0],1167.25 [0;2992],1167.25 [0;2992],0.0 [0;0],Genomic,1167.25 [0;2992],0.39 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]


#### Per reference replicon for all assemblers - Even only!

In [120]:
ref_stats_per_reference_all_assemblers = pd.DataFrame()
for reference in reference_pipeline_metrics_df.reference.unique():
    row={'Reference': reference}
    lala=reference_pipeline_metrics_df[(reference_pipeline_metrics_df.reference == reference) & (reference_pipeline_metrics_df.distribution == 'Even')].describe()
    for column in lala.columns:
        mean = lala.loc['mean',column]
        minimum = lala.loc['min',column]
        maximum = lala.loc['max',column]
        row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
    ref_stats_per_reference_all_assemblers=ref_stats_per_reference_all_assemblers.append(row, ignore_index=True)
ref_stats_per_reference_all_assemblers = ref_stats_per_reference_all_assemblers.set_index('Reference')
ref_stats_per_reference_all_assemblers.to_csv("Tables/Results/Reference metrics per replicon.csv")
display(ref_stats_per_reference_all_assemblers)

Unnamed: 0_level_0,L90,LSA,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Reference,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Bacillus_subtilis,25.64 [0;120],0.1 [0;0],196509.03 [0;343843],194589.61 [0;343843],1.32 [0;47],3614434.53 [0;3991152],0.89 [0;0],91.85 [0;1467],0.94 [0;1],0.92 [0;1],0.08 [0;2],0.2 [0;4],0.94 [0;1],0.94 [0;1],0.94 [0;1]
Enterococcus_faecalis,36.26 [0;149],0.13 [0;0],134153.57 [0;377104],132615.19 [0;377104],0.0 [0;0],2491484.12 [0;2828460],0.88 [0;0],72.49 [0;231],0.94 [0;1],0.91 [0;1],0.34 [0;3],0.62 [0;6],0.95 [0;1],0.95 [0;1],0.94 [0;1]
Escherichia_coli_plasmid,3.27 [0;18],0.22 [0;0],18003.78 [0;55639],17461.25 [0;55639],37.6 [0;835],82423.2 [0;109516],0.75 [0;0],9.2 [0;33],0.89 [0;1],0.88 [0;1],0.0 [0;0],0.0 [0;0],0.89 [0;1],0.89 [0;1],0.89 [0;1]
Escherichia_coli,70.08 [0;283],0.04 [0;0],51151.78 [0;138793],50851.09 [0;138793],1.26 [0;45],4051935.78 [0;4706885],0.85 [0;0],363.8 [0;1495],0.94 [0;1],0.84 [0;1],1.07 [0;8],1.99 [0;15],0.95 [0;1],0.95 [0;1],0.94 [0;1]
Lactobacillus_fermentum,77.7 [0;263],0.05 [0;0],30397.63 [0;46932],29158.61 [0;52972],0.0 [0;0],1585069.36 [0;1873053],0.83 [0;0],105.8 [0;282],0.92 [0;1],0.9 [0;1],0.34 [0;4],0.69 [0;8],0.92 [0;1],0.92 [0;1],0.92 [0;1]
Listeria_monocytogenes,24.36 [0;122],0.15 [0;0],236971.75 [0;542872],228275.5 [0;542872],0.03 [0;1],2625730.37 [0;2961845],0.88 [0;0],62.41 [0;361],0.94 [0;1],0.93 [0;1],0.03 [0;1],0.06 [0;2],0.95 [0;1],0.95 [0;1],0.94 [0;1]
Pseudomonas_aeruginosa,147.13 [0;1004],0.06 [0;0],141661.78 [0;338923],146649.92 [0;464977],583.01 [0;12098],6111755.96 [0;6780808],0.9 [0;0],240.31 [0;1404],0.94 [0;1],0.85 [0;1],1.27 [0;8],2.26 [0;15],0.95 [0;1],0.95 [0;1],0.94 [0;1]
Salmonella_enterica,54.45 [0;200],0.05 [0;0],71283.69 [0;223960],70301.45 [0;223960],0.12 [0;2],4074863.96 [0;4703394],0.86 [0;0],332.36 [0;1479],0.94 [0;1],0.83 [0;1],0.79 [0;9],1.25 [0;14],0.95 [0;1],0.95 [0;1],0.94 [0;1]
Staphylococcus_aureus,40.08 [0;140],0.12 [0;0],114085.93 [0;394879],113359.04 [0;394879],1.99 [0;71],2373450.96 [0;2693907],0.87 [0;0],76.07 [0;247],0.94 [0;1],0.91 [0;1],0.27 [0;3],0.63 [0;7],0.94 [0;1],0.95 [0;1],0.94 [0;1]
Staphylococcus_aureus_plasmid1,0.03 [0;1],0.77 [0;1],4910.9 [0;6795],4910.9 [0;6795],0.0 [0;0],5035.96 [0;6337],0.79 [0;1],0.9 [0;3],0.81 [0;1],0.81 [0;1],0.0 [0;0],0.0 [0;0],0.82 [0;1],0.82 [0;1],0.81 [0;1]


### Reference stats per distribution

In [121]:
for distribution in reference_pipeline_metrics_df.distribution.unique():
    ref_stats_per_reference_per_assembler = pd.DataFrame()
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        row={'Assembler': assembler, 
             'Type':reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].type.unique()[0],
             'Algorithm': reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        lala=reference_pipeline_metrics_df[(reference_pipeline_metrics_df.assembler == assembler) & (reference_pipeline_metrics_df.distribution == distribution)].describe()
        for column in lala.columns:
            mean = lala.loc['mean',column]
            minimum = lala.loc['min',column]
            maximum = lala.loc['max',column]
            row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
        ref_stats_per_reference_per_assembler=ref_stats_per_reference_per_assembler.append(row, ignore_index=True)
    ref_stats_per_reference_per_assembler = ref_stats_per_reference_per_assembler.set_index('Assembler')
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(distribution))
    print(distribution)
    display(ref_stats_per_reference_per_assembler)

Even


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,60.4 [0;1004],0.3 [0;0],101808.53 [0;351502],101667.72 [0;351502],92.14 [0;2170],Metagenomic,2534947.7 [0;6780808],0.91 [0;0],97.11 [0;1404],0.98 [0;1],0.94 [0;1],0.33 [0;8],0.59 [0;14],0.99 [0;1],0.99 [0;1],0.98 [0;1]
BCALM2,Single k-mer De Bruijn graph,51.2 [0;921],0.04 [0;0],6932.33 [0;65138],6539.94 [0;64934],0.0 [0;0],Genomic,1236062.67 [0;6510588],0.37 [0;0],272.97 [0;1495],0.5 [0;1],0.5 [0;1],0.0 [0;0],0.0 [0;0],0.5 [0;1],0.5 [0;1],0.5 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,31.33 [0;112],0.35 [0;1],106093.76 [2176;452364],96782.73 [2176;351186],0.0 [0;0],Metagenomic,2523829.08 [2176;6716522],0.98 [0;1],59.92 [1;199],1.0 [0;1],0.98 [0;1],0.14 [0;3],0.25 [0;6],1.01 [0;1],1.01 [0;1],1.0 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,68.82 [0;283],0.27 [0;1],55836.03 [0;254629],54554.11 [0;254629],0.0 [0;0],Metagenomic,2507600.54 [0;6710005],0.94 [0;1],121.12 [0;449],0.97 [0;1],0.87 [0;1],1.12 [0;9],1.94 [0;15],0.98 [0;1],0.98 [0;1],0.97 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,18.31 [0;55],0.37 [0;1],138989.33 [2215;542872],141793.83 [2215;542872],0.0 [0;0],Metagenomic,2534425.59 [2215;6737986],0.99 [0;1],44.39 [1;141],1.0 [0;1],0.93 [0;1],0.97 [0;4],1.99 [0;7],1.0 [0;1],1.01 [0;1],1.0 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,21.7 [0;110],0.23 [0;0],58464.9 [0;257424],57747.71 [0;257424],0.0 [0;0],Metagenomic,1722404.01 [0;6717767],0.68 [0;0],74.19 [0;361],0.94 [0;1],0.94 [0;1],0.0 [0;0],0.0 [0;0],0.94 [0;1],0.94 [0;1],0.94 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,20.14 [0;90],0.34 [0;1],133559.97 [1677;475510],132927.86 [1677;475510],0.0 [0;0],Metagenomic,2530047.11 [2076;6726884],0.98 [0;1],39.33 [1;167],1.0 [0;1],0.99 [0;1],0.14 [0;2],0.33 [0;6],1.01 [0;1],1.01 [0;1],1.0 [0;1]
MINIA,Single k-mer De Bruijn graph,83.41 [0;407],0.08 [0;0],17271.52 [0;65138],16622.5 [0;64960],0.0 [0;0],Genomic,2345242.24 [0;6518607],0.74 [0;0],307.84 [0;1147],0.81 [0;1],0.79 [0;1],0.0 [0;0],0.0 [0;0],0.81 [0;1],0.81 [0;1],0.81 [0;1]
SKESA,Multiple k-mer De Bruijn graph,57.92 [0;670],0.27 [0;0],76053.75 [1404;351014],75134.92 [0;351014],0.0 [0;0],Genomic,2496709.31 [1460;6673601],0.9 [0;0],96.06 [1;933],1.0 [0;1],0.87 [0;1],0.67 [0;7],1.17 [0;15],1.0 [0;1],1.0 [0;1],1.0 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,21.06 [0;79],0.29 [0;1],117650.97 [0;315090],117061.69 [0;315090],0.0 [0;0],Genomic,2524122.0 [0;6728888],0.9 [0;1],41.19 [0;143],0.94 [0;1],0.9 [0;1],0.36 [0;4],0.69 [0;8],0.95 [0;1],0.96 [0;1],0.94 [0;1]


Log


Unnamed: 0_level_0,Algorithm,L90,LSA,NA50,NG50,Ns,Type,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Assembler,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
ABySS,Multiple k-mer De Bruijn graph,50.54 [0;1033],0.02 [0;0],25986.46 [0;351511],25866.25 [0;351511],16.67 [0;400],Metagenomic,690353.25 [0;6781605],0.16 [0;0],65.12 [0;1141],0.21 [0;0],0.2 [0;0],0.0 [0;0],0.0 [0;0],0.21 [0;1],0.21 [0;1],0.21 [0;1]
BCALM2,Single k-mer De Bruijn graph,13.58 [0;311],0.0 [0;0],3455.14 [0;64777],3263.39 [0;62493],0.0 [0;0],Genomic,390196.06 [0;6515102],0.09 [0;0],35.86 [0;496],0.17 [0;1],0.17 [0;1],0.0 [0;0],0.0 [0;0],0.17 [0;1],0.17 [0;1],0.17 [0;1]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,23.58 [0;459],0.02 [0;0],41260.94 [0;351149],40902.42 [0;351149],0.0 [0;0],Metagenomic,1131013.97 [0;6715846],0.25 [0;0],37.56 [0;604],0.36 [0;1],0.36 [0;1],0.08 [0;2],0.25 [0;6],0.36 [0;1],0.36 [0;1],0.36 [0;1]
IDBA-UD,Multiple k-mer De Bruijn graph,16.33 [0;292],0.02 [0;0],48013.69 [0;475377],46431.78 [0;475377],0.0 [0;0],Metagenomic,1135231.19 [0;6700232],0.25 [0;0],29.56 [0;418],0.46 [0;0],0.38 [0;0],0.25 [0;2],0.39 [0;5],0.47 [0;1],0.61 [0;5],0.44 [0;1]
MEGAHIT,Multiple k-mer De Bruijn graph,8.42 [0;147],0.03 [0;0],66650.44 [0;543128],66455.53 [0;543128],0.0 [0;0],Metagenomic,1141249.42 [0;6738936],0.25 [0;0],18.11 [0;223],0.46 [0;0],0.42 [0;0],0.06 [0;1],0.11 [0;2],0.47 [0;1],0.48 [0;1],0.47 [0;1]
MetaHipMer2,Multiple k-mer De Bruijn graph,0.0 [0;0],0.0 [0;0],903.25 [0;2986],528.83 [0;2697],0.0 [0;0],Metagenomic,824940.25 [0;3766165],0.19 [0;0],407.58 [0;2402],0.42 [0;1],0.39 [0;1],0.0 [0;0],0.0 [0;0],0.42 [0;1],0.42 [0;1],0.42 [0;1]
metaSPAdes,Multiple k-mer De Bruijn graph,7.19 [0;66],0.03 [0;0],60717.78 [0;593167],59453.89 [0;593167],0.0 [0;0],Metagenomic,1140707.44 [0;6733500],0.25 [0;0],15.53 [0;110],0.47 [0;0],0.43 [0;0],0.17 [0;2],0.39 [0;6],0.47 [0;1],0.48 [0;1],0.47 [0;1]
MINIA,Single k-mer De Bruijn graph,65.11 [0;649],0.01 [0;0],6467.22 [0;64777],5777.47 [0;64473],0.0 [0;0],Genomic,1099797.58 [0;6524118],0.24 [0;0],119.58 [0;918],0.42 [0;1],0.39 [0;1],0.14 [0;2],0.22 [0;4],0.42 [0;1],0.42 [0;1],0.42 [0;1]
SKESA,Multiple k-mer De Bruijn graph,6.72 [0;232],0.01 [0;0],10421.97 [0;351027],10210.64 [0;351027],0.0 [0;0],Genomic,366940.31 [0;3512871],0.1 [0;0],120.94 [0;1877],0.19 [0;1],0.19 [0;1],0.33 [0;12],0.64 [0;23],0.19 [0;1],0.19 [0;1],0.19 [0;1]
SPAdes,Multiple k-mer De Bruijn graph,30.83 [0;835],0.02 [0;0],36831.08 [0;351104],37693.72 [0;351104],0.0 [0;0],Genomic,1093338.33 [0;6710057],0.23 [0;0],105.17 [0;1839],0.56 [0;1],0.49 [0;1],16.08 [0;565],32.11 [0;1134],0.6 [0;1],0.63 [0;1],0.56 [0;1]


### Genomes vs Plasmids

In [122]:
ref_stats_per_replicon_type = pd.DataFrame()
# Genome
row={'Replicon': "Genome"}
lala=reference_pipeline_metrics_df[~reference_pipeline_metrics_df['reference'].str.contains("plasmid")].describe()
for column in lala.columns:
    mean = lala.loc['mean',column]
    std = lala.loc['std',column]
    row[column] = "{}(± {})".format(round(mean, 2), round(std, 2))
ref_stats_per_replicon_type =ref_stats_per_replicon_type .append(row, ignore_index=True)

# Plasmid
row={'Replicon': "Plasmid"}
lala=reference_pipeline_metrics_df[reference_pipeline_metrics_df['reference'].str.contains("plasmid")].describe()
for column in lala.columns:
    mean = lala.loc['mean',column]
    minimum = lala.loc['min',column]
    maximum = lala.loc['max',column]
    row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
    
ref_stats_per_replicon_type =ref_stats_per_replicon_type .append(row, ignore_index=True)
ref_stats_per_replicon_type  = ref_stats_per_replicon_type .set_index('Replicon')
ref_stats_per_replicon_type.to_csv("Tables/Results/Reference metrics per replicon type.csv")
display(ref_stats_per_replicon_type)

Unnamed: 0_level_0,L90,LSA,NA50,NG50,Ns,basepairs,breadth_of_coverage,contigs,identity,lowest_identity,misassembled contigs,misassembly events,multiplicity,parsimony,validity
Replicon,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Genome,52.28(± 106.83),0.07(± 0.08),104300.28(± 121733.48),103102.92(± 121619.06),56.51(± 711.48),2880781.06(± 2043121.66),0.73(± 0.41),155.8(± 296.42),0.84(± 0.36),0.79(± 0.37),0.95(± 16.92),1.84(± 33.95),0.84(± 0.37),0.85(± 0.39),0.84(± 0.36)
Plasmid,0.65 [0;18],0.44 [0;1],5036.33 [0;55639],4925.31 [0;55639],7.18 [0;835],17373.76 [0;109516],0.55 [0;1],2.22 [0;33],0.6 [0;1],0.6 [0;1],0.0 [0;0],0.0 [0;0],0.6 [0;1],0.61 [0;1],0.6 [0;1]
