# 1. Compute Results Overview Tables

Assembly success is evaluated with metrics computed in two ways: globally, from statistics of the complete set of assembled sequences, and per replicon, relative to the reference replicons present in the sample.

The global metrics are computed with custom Python code from the complete set of assembled sequences.

The following metrics are computed for both the complete set of assembled sequences and the filtered set, which is restricted to contigs longer than a specified minimum size:

- **Contig sizes**
    - **Contigs:** The total number of contigs in the assembly;
    - **Basepairs:** The total number of bases in the assembly;
    - **Maximum sequence length:** The length of the largest contig in the assembly;
    - **Number of ‘N’s:** The number of uncalled bases (N's).
- **Contiguity**
    - **Nx (where 0 < x ⩽ 100):** The length for which the collection of all assembled sequences of that length or longer covers at least x % of the total length of the assembly.
- **Misassembly**
    - **Misassemblies:** The number of aligned contigs that contain a misassembly event.
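
The Nx definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline's own implementation: the function name `nx` and the example lengths are hypothetical, and contig lengths are assumed to be available as a plain list of integers.

```python
def nx(contig_lengths, x=50):
    """Return Nx: the contig length L such that contigs of length >= L
    together cover at least x % of the total assembled length."""
    if not 0 < x <= 100:
        raise ValueError("x must satisfy 0 < x <= 100")
    # Walk the contigs from longest to shortest, accumulating their lengths.
    lengths = sorted(contig_lengths, reverse=True)
    threshold = sum(lengths) * x / 100
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= threshold:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (total = 300 bp), for illustration only.
example_lengths = [100, 80, 50, 40, 30]
print(nx(example_lengths, 50))  # 80: 100 + 80 = 180 >= 150
print(nx(example_lengths, 90))  # 40: 100 + 80 + 50 + 40 = 270 >= 270
```

Sorting once and accumulating keeps the computation linear after the sort, so the same pass can be reused for any value of x.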

After filtering, the sequences are mapped to the reference replicons and the metrics are computed with custom Python code. The metrics are calculated for each replicon in the provided reference file, using only contigs above the minimum length.

- **Contig sizes**
    - **Contigs:** The total number of contigs in the assembly;
    - **Basepairs:** The total number of bases in the assembly;
    - **Number of ‘N’s:** The number of uncalled bases (N's).
- **COMPASS**
    - **(Breadth of) Coverage:** Fraction of the reference sequence covered by aligned contigs;
    - **Multiplicity:** Ratio of the length of alignable assembled sequence to covered sequence on the reference;
    - **Validity:** Ratio of the length of the alignable assembled sequence to total basepairs in the aligned contigs;
    - **Parsimony:** Cost of the assembly (multiplicity over validity).
- **Contiguity**
    - **Contiguity:** Longest single alignment between the assembly and the reference, relative to the reference length;
    - **NAx (where 0 < x ⩽ 100):** Length for which the collection of aligned assembled sequences of that length or longer covers at least x % of the total length of the reference replicon;
    - **NGx (where 0 < x ⩽ 100):** Length for which the collection of aligned contigs of that length or longer covers at least x % of the sequence of the reference;
    - **Lx (where 0 < x ⩽ 100):** Minimal number of contigs that cover x % of the sequence of the reference.
- **Identity**
    - **Identity:** Ratio of identical basepairs in all aligned contigs to the reference;
    - **Lowest identity:** Identity of the lowest scoring contig to the reference;
    - **PLS:** Phred-like score per contig, per assembler.
- **Misassembly**
    - **Misassemblies:** Number of aligned contigs that contain a misassembly event.
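
To make the four COMPASS ratios concrete, the sketch below computes them for a single replicon. It is an illustration under assumptions, not the code used to produce the tables: the function name `compass_metrics`, the tuple layout `(contig_id, ref_start, ref_end)` with half-open reference coordinates, and the example values are all hypothetical, and the length of each alignment is approximated by its span on the reference (indels are ignored).

```python
def compass_metrics(alignments, contig_lengths, reference_length):
    """Compute coverage, multiplicity, validity and parsimony for one replicon.

    alignments: list of (contig_id, ref_start, ref_end), half-open coordinates.
    contig_lengths: dict mapping contig_id to contig length in bp.
    """
    # Covered reference bases: size of the union of the aligned intervals.
    covered = 0
    current_end = 0
    for _, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end

    # Alignable assembled sequence: every aligned block counts, overlaps included.
    aligned_bases = sum(end - start for _, start, end in alignments)
    # Total basepairs in the contigs that have at least one alignment.
    aligned_contigs = {contig_id for contig_id, _, _ in alignments}
    contig_bases = sum(contig_lengths[c] for c in aligned_contigs)

    coverage = covered / reference_length
    multiplicity = aligned_bases / covered if covered else 0.0
    validity = aligned_bases / contig_bases if contig_bases else 0.0
    parsimony = multiplicity / validity if validity else 0.0
    return {"coverage": coverage, "multiplicity": multiplicity,
            "validity": validity, "parsimony": parsimony}


# Hypothetical example: two contigs overlapping by 100 bp on a 1000 bp replicon.
example = compass_metrics(
    alignments=[("c1", 0, 400), ("c2", 300, 700)],
    contig_lengths={"c1": 500, "c2": 400},
    reference_length=1000,
)
print(example)
```

In this example 700 reference bases are covered by 800 aligned bases coming from 900 bp of contigs, so multiplicity is above 1 (redundant sequence) and validity is below 1 (unaligned contig sequence); parsimony combines the two into a single cost.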


## Imports

In [21]:
import sys
from plotly.offline import plot
import glob
import fnmatch
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
import pandas as pd
from itertools import groupby
import csv
import numpy as np

## Global variables

In [87]:
ASSEMBLER_PROCESS_LIST = ["ABYSS", "BCALM2", "GATBMINIAPIPELINE", "METAHIPMER2", "MINIA", "MEGAHIT", "METASPADES", "UNICYCLER", "SPADES",
                          "SKESA", "VELVETOPTIMIZER", "IDBA"]
PROCESS_TO_NAME = {"ABYSS": "ABySS",
                   "BCALM2": "BCALM2",
                   "GATBMINIAPIPELINE": "GATBMiniaPipeline",
                   "METAHIPMER2": "MetaHipMer2",
                   "MINIA": "MINIA",
                   "MEGAHIT": "MEGAHIT",
                   "METASPADES": "metaSPAdes",
                   "UNICYCLER": "Unicycler",
                   "SPADES": "SPAdes",
                   "SKESA": "SKESA",
                   "VELVETOPTIMIZER": "VelvetOptimizer",
                   "IDBA": "IDBA-UD"}

log_distributed = ['LHS', 'LNN', 'ERR2935805']
even_distribution = ['EMS', 'LNN', 'ERR2984773']

genomic_assemblers = ['ABySS', 'BCALM2', 'MINIA', 'SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer']
metagenomic_assemblers = ['MetaHipMer2','GATBMiniaPipeline', 'IDBA-UD', 'MEGAHIT', 'metaSPAdes']
single_kmer = ['BCALM2', 'MINIA', 'ABySS']
multiple_kmer = ['SKESA', 'SPAdes', 'Unicycler', 'VelvetOptimizer', 'GATBMiniaPipeline', 
                 'IDBA-UD', 'MEGAHIT', 'metaSPAdes', 'MetaHipMer2']

best_min = ['Ns', 'contigs', 'filtered_Ns', 'filtered_contigs','misassembled contigs','misassembly events']
best_max = ['basepairs','filtered_basepairs','filtered_mapped_reads','filtered_n50','mapped_reads','max_contig','n50']

COLOURS = ['#5876c8', '#58AEC8', '#009392', '#39B185', '#9CCB86', '#E9E29C', '#EEB479', '#E88471', '#CF597E', '#a54765', '#a42a2a', '#835221', 'darkgray']

## Global metrics

### Load data

In [23]:
report_glob = glob.glob('../Results/*/*/report/pipeline_report_tables.json')
global_pipeline_metrics_df = pd.DataFrame()

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))
    
    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)
        for sample in json_report.keys():
            for line in json_report[sample]['GlobalTable']:
                assembler = line['assembler']
                global_pipeline_metrics_df = global_pipeline_metrics_df.append({'run': stats_run,
                                                                                'sample': sample,
                                                                                'assembler': line['assembler'],
                                                                                'contigs': int(line['original']['contigs']),
                                                                                'basepairs': int(line['original']['basepairs']),
                                                                                'max_contig': int(line['original']['max_contig_size']),
                                                                                'n50': int(line['original']['N50']),
                                                                                'mapped_reads': line['original']['mapped_reads'],
                                                                                'Ns': int(line['original']['Ns']),
                                                                                'misassembled contigs': line['filtered']['misassembled_contigs'],
                                                                                'misassembly events': line['filtered']['misassembly_events'],
                                                                                'filtered_contigs': line['filtered']['contigs'],
                                                                                'filtered_basepairs': line['filtered']['basepairs'],
                                                                                'filtered_n50': line['filtered']['N50'],
                                                                                'filtered_Ns': line['filtered']['Ns'],
                                                                                'filtered_mapped_reads': line['filtered']['mapped_reads'],
                                                                               },
                                                                               ignore_index=True)
global_pipeline_metrics_df['distribution'] = np.where(global_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
global_pipeline_metrics_df['type'] = np.where(global_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
global_pipeline_metrics_df['algorythm'] = np.where(global_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
numeric_columns = ['contigs', 'basepairs', 'max_contig', 'Ns', 'n50', 'filtered_n50',
                   'misassembled contigs', 'misassembly events']
global_pipeline_metrics_df[numeric_columns] = global_pipeline_metrics_df[numeric_columns].apply(pd.to_numeric)

Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run2...
Processing pipeline_report_tables.json data from run1...
Processing pipeline_report_tables.json data from run3...
Processing pipeline_report_tables.json data from run2...
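
The loading cell above relies on `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. A possible way to keep the same logic on newer pandas versions is to collect the rows in a list and build the DataFrame once. The sketch below is only an illustration: the helper name `rows_from_report`, the reduced set of columns, and the in-memory `demo_report` with made-up values are assumptions, not part of the original notebook. Note also that building the frame this way keeps the dictionary key order and integer dtypes, so the column order and dtypes would differ from the preview shown in this notebook.

```python
import pandas as pd


def rows_from_report(json_report, run):
    """Flatten one parsed pipeline_report_tables.json into a list of row dicts."""
    rows = []
    for sample, tables in json_report.items():
        for line in tables['GlobalTable']:
            original, filtered = line['original'], line['filtered']
            rows.append({
                'run': run,
                'sample': sample,
                'assembler': line['assembler'],
                'contigs': int(original['contigs']),
                'basepairs': int(original['basepairs']),
                'n50': int(original['N50']),
                'filtered_contigs': filtered['contigs'],
                'filtered_n50': filtered['N50'],
            })
    return rows


# Tiny in-memory stand-in for a parsed report (made-up values, illustration only).
demo_report = {
    'sampleA': {'GlobalTable': [
        {'assembler': 'toolX',
         'original': {'contigs': 10, 'basepairs': 5000, 'N50': 900},
         'filtered': {'contigs': 4, 'N50': 1200}},
    ]},
}

all_rows = []
all_rows.extend(rows_from_report(demo_report, run='run1'))
demo_df = pd.DataFrame(all_rows)  # one construction instead of repeated append
print(demo_df)
```

Building the frame once also avoids copying the whole table on every appended row, which matters as the number of runs and samples grows.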


### Preview data

In [24]:
global_pipeline_metrics_df.dtypes

Ns                       float64
assembler                 object
basepairs                float64
contigs                  float64
filtered_Ns              float64
filtered_basepairs       float64
filtered_contigs         float64
filtered_mapped_reads    float64
filtered_n50             float64
mapped_reads             float64
max_contig               float64
misassembled contigs     float64
misassembly events       float64
n50                      float64
run                       object
sample                    object
distribution              object
type                      object
algorythm                 object
dtype: object

In [25]:
global_pipeline_metrics_df

,Ns,assembler,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50,run,sample,distribution,type,algorythm
0,1345.0,ABySS,20132244.0,44619.0,400.0,13573790.0,1211.0,98.901529,192306.0,99.851184,851314.0,0.0,0.0,9569.0,run3,LNN,Log,Genomic,Single k-mer De Bruijn graph
1,0.0,BCALM2,19635512.0,44825.0,0.0,13397815.0,784.0,97.222124,31436.0,98.306950,163543.0,0.0,0.0,18682.0,run3,LNN,Log,Genomic,Single k-mer De Bruijn graph
2,0.0,GATBMiniaPipeline,13911245.0,693.0,0.0,13658336.0,219.0,98.807198,140561.0,99.719684,632215.0,1.0,3.0,136741.0,run3,LNN,Log,Metagenomic,Multiple k-mer De Bruijn graph
3,0.0,IDBA-UD,14324920.0,1987.0,0.0,13642314.0,261.0,98.951939,108043.0,99.694880,585610.0,2.0,2.0,105808.0,run3,LNN,Log,Metagenomic,Multiple k-mer De Bruijn graph
4,0.0,MEGAHIT,14402555.0,1842.0,0.0,13704539.0,172.0,99.686862,210993.0,99.765145,1080013.0,1.0,2.0,203548.0,run3,LNN,Log,Metagenomic,Multiple k-mer De Bruijn graph
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
211,0.0,MINIA,30648944.0,16748.0,0.0,28218335.0,3574.0,92.157082,20782.0,95.935753,161812.0,0.0,0.0,17967.0,run2,EMS,Even,Genomic,Single k-mer De Bruijn graph
212,0.0,SKESA,30249132.0,1269.0,0.0,30064811.0,891.0,97.237351,74138.0,97.952533,632013.0,9.0,18.0,73839.0,run2,EMS,Even,Genomic,Multiple k-mer De Bruijn graph
213,0.0,SPAdes,30664883.0,2628.0,0.0,30354786.0,412.0,98.283354,208991.0,98.700052,850176.0,0.0,0.0,206515.0,run2,EMS,Even,Genomic,Multiple k-mer De Bruijn graph
214,0.0,Unicycler,30301867.0,656.0,0.0,30188943.0,334.0,97.609864,245050.0,98.457135,1303875.0,0.0,0.0,245050.0,run2,EMS,Even,Genomic,Multiple k-mer De Bruijn graph


### Save data

In [26]:
global_pipeline_metrics_df.to_csv("Tables/Results/Global metrics.csv")

### Global stats per assembler - per sample

#### Values

In [250]:
best_stats_dfs = []

for sample in global_pipeline_metrics_df['sample'].unique():
    print(sample)
    stats_per_assembler = pd.DataFrame()
    best_stats_per_assembler = pd.DataFrame()
    for assembler in global_pipeline_metrics_df['assembler'].unique():
        
        row = {'Assembler': assembler, 
             'Type':global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].type.unique()[0],
             'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        
        best_row = {'Assembler': assembler}
        
        describe_df = global_pipeline_metrics_df[(global_pipeline_metrics_df['assembler'] == assembler) & (global_pipeline_metrics_df['sample'] == sample)].describe()

        for column in describe_df.columns:
            mean = describe_df.loc['mean',column]
            minimum = describe_df.loc['min',column]
            maximum = describe_df.loc['max',column]
            
            # int() truncates the bounds, so fractional columns (e.g. mapped_reads) show truncated min/max
            row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
            
            if "basepairs" in column:
                # Target size in bp: for basepair columns the "best" value is the one closest to it
                target = 30946587
                best_row[column] = min(list(global_pipeline_metrics_df[column][(global_pipeline_metrics_df['assembler'] == assembler) & (global_pipeline_metrics_df['sample'] == sample)]), key=lambda x:abs(x-target))
            elif column in best_min:
                best_row[column] = minimum
            else:
                best_row[column] = maximum
                
        stats_per_assembler=stats_per_assembler.append(row, ignore_index=True)
        best_stats_per_assembler=best_stats_per_assembler.append(best_row, ignore_index=True)
    
    stats_per_assembler = stats_per_assembler.set_index(['Assembler','Algorithm', 'Type'])
    best_stats_per_assembler = best_stats_per_assembler.set_index('Assembler')
    display(stats_per_assembler)
    display(best_stats_per_assembler)
    stats_per_assembler.to_csv("Tables/Results/Global metrics per assembler - {}.csv".format(sample))
    best_stats_per_assembler.to_csv("Tables/Results/Best Global metrics per assembler - {}.csv".format(sample))
    best_stats_dfs.append([sample, best_stats_per_assembler])

LNN


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,1345.67 [1345;1347],20132373.0 [20132244;20132631],44619.67 [44619;44621],400.0 [400;400],13573771.33 [13573734;13573790],1211.33 [1211;1212],98.9 [98;98],192306.0 [192306;192306],99.85 [99;99],851314.0 [851314;851314],0.0 [0;0],0.0 [0;0],9569.0 [9569;9569]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],19635512.0 [19635512;19635512],44825.0 [44825;44825],0.0 [0;0],13397815.0 [13397815;13397815],784.0 [784;784],97.22 [97;97],31436.0 [31436;31436],98.31 [98;98],163543.0 [163543;163543],0.0 [0;0],0.0 [0;0],18682.0 [18682;18682]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],13911109.33 [13910838;13911245],693.0 [693;693],0.0 [0;0],13658336.0 [13658336;13658336],219.0 [219;219],98.81 [98;98],140561.0 [140561;140561],99.72 [99;99],632215.0 [632215;632215],1.0 [1;1],3.0 [3;3],136741.0 [136741;136741]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],14324862.33 [14324747;14324920],1987.0 [1987;1987],0.0 [0;0],13642314.0 [13642314;13642314],261.0 [261;261],98.95 [98;98],108043.0 [108043;108043],99.69 [99;99],585610.0 [585610;585610],2.0 [2;2],2.0 [2;2],105808.0 [105808;105808]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],14402555.0 [14402555;14402555],1842.0 [1842;1842],0.0 [0;0],13704539.0 [13704539;13704539],172.0 [172;172],99.69 [99;99],210993.0 [210993;210993],99.77 [99;99],1080013.0 [1080013;1080013],1.0 [1;1],2.0 [2;2],203548.0 [203548;203548]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],15514698.0 [15514698;15514698],6700.0 [6700;6700],0.0 [0;0],13692577.0 [13692577;13692577],139.0 [139;139],99.59 [99;99],239185.0 [239185;239185],99.8 [99;99],850099.0 [850099;850099],1.0 [1;1],3.0 [3;3],220609.0 [220609;220609]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],16401048.67 [16401028;16401090],15000.33 [15000;15001],0.0 [0;0],13501499.33 [13501498;13501500],762.0 [762;762],97.53 [97;97],33550.0 [33550;33550],99.01 [99;99],177038.0 [177038;177038],2.0 [2;2],4.0 [4;4],26474.0 [26474;26474]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],8202941.0 [8202941;8202941],9514.0 [9514;9514],0.0 [0;0],4470092.0 [4470092;4470092],1433.0 [1433;1433],91.77 [91;91],7385.0 [7385;7385],95.1 [95;95],59272.0 [59272;59272],12.0 [12;12],23.0 [23;23],1155.0 [1155;1155]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],15606257.0 [15606257;15606257],7331.0 [7331;7331],0.0 [0;0],13640340.0 [13640340;13640340],227.0 [227;227],99.59 [99;99],136684.0 [136684;136684],99.71 [99;99],632639.0 [632639;632639],0.0 [0;0],0.0 [0;0],122703.0 [122703;122703]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,1345.0,20132631.0,44619.0,400.0,13573790.0,1211.0,98.901625,192306.0,99.851429,851314.0,0.0,0.0,9569.0
BCALM2,0.0,19635512.0,44825.0,0.0,13397815.0,784.0,97.222124,31436.0,98.307446,163543.0,0.0,0.0,18682.0
GATBMiniaPipeline,0.0,13911245.0,693.0,0.0,13658336.0,219.0,98.807198,140561.0,99.719684,632215.0,1.0,3.0,136741.0
IDBA-UD,0.0,14324920.0,1987.0,0.0,13642314.0,261.0,98.951945,108043.0,99.69488,585610.0,2.0,2.0,105808.0
MEGAHIT,0.0,14402555.0,1842.0,0.0,13704539.0,172.0,99.686868,210993.0,99.765146,1080013.0,1.0,2.0,203548.0
MetaHipMer2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
metaSPAdes,0.0,15514698.0,6700.0,0.0,13692577.0,139.0,99.591855,239185.0,99.798003,850099.0,1.0,3.0,220609.0
MINIA,0.0,16401090.0,15000.0,0.0,13501500.0,762.0,97.535204,33550.0,99.011021,177038.0,2.0,4.0,26474.0
SKESA,0.0,8202941.0,9514.0,0.0,4470092.0,1433.0,91.766425,7385.0,95.10133,59272.0,12.0,23.0,1155.0
SPAdes,0.0,15606257.0,7331.0,0.0,13640340.0,227.0,99.591855,136684.0,99.713909,632639.0,0.0,0.0,122703.0


ERR2935805


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,9.0 [9;9],78727988.67 [78727717;78728532],677855.33 [677853;677860],0.0 [0;0],3150980.0 [3150980;3150980],356.0 [356;356],79.03 [79;79],18568.0 [18568;18568],91.83 [91;91],88665.0 [88665;88665],0.0 [0;0],0.0 [0;0],101.0 [101;101]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],315567452.0 [315567452;315567452],8035706.0 [8035706;8035706],0.0 [0;0],657074.0 [657074;657074],504.0 [504;504],0.1 [0;0],1262.0 [1262;1262],1.43 [1;1],3093.0 [3093;3093],0.0 [0;0],0.0 [0;0],35.0 [35;35]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],17140207.33 [17126362;17147130],6776.33 [6776;6777],0.0 [0;0],13905380.67 [13891328;13912407],603.0 [603;603],88.9 [88;88],90393.0 [90095;90989],89.4 [89;89],542997.0 [542997;542997],0.67 [0;1],1.33 [0;2],55722.33 [55204;56759]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],23587129.0 [23586883;23587252],14155.67 [14155;14156],0.0 [0;0],17534580.0 [17534580;17534580],2725.0 [2725;2725],89.23 [89;89],113853.0 [113853;113853],89.8 [89;89],632213.0 [632213;632213],3.0 [3;3],5.0 [5;5],51161.0 [51161;51161]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],24498566.0 [24498566;24498566],14172.0 [14172;14172],0.0 [0;0],18552996.0 [18552996;18552996],3044.0 [3044;3044],89.5 [89;89],210993.0 [210993;210993],89.76 [89;89],768410.0 [768410;768410],1.0 [1;1],2.0 [2;2],82249.0 [82249;82249]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],15029445.33 [15029183;15029970],11970.67 [11966;11973],0.0 [0;0],10257182.33 [10253731;10264085],5033.33 [5031;5038],73.91 [73;73],2174.67 [2170;2177],84.35 [84;84],14760.0 [14760;14760],0.0 [0;0],0.0 [0;0],1515.67 [1515;1516]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],27822010.0 [27822010;27822010],26282.0 [26282;26282],0.0 [0;0],18792639.0 [18792639;18792639],3392.0 [3392;3392],89.37 [89;89],87983.0 [87983;87983],89.93 [89;89],622107.0 [622107;622107],3.0 [3;3],7.0 [7;7],3674.0 [3674;3674]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],20758237.33 [20757924;20758864],37287.33 [37281;37300],0.0 [0;0],13447286.0 [13446596;13448666],1464.0 [1463;1466],84.24 [84;84],18204.67 [18038;18288],87.68 [87;87],163543.0 [163543;163543],2.0 [2;2],3.0 [3;3],7485.0 [7485;7485]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],8818742.0 [8818742;8818742],8822.0 [8822;8822],0.0 [0;0],5801439.0 [5801439;5801439],2889.0 [2889;2889],62.71 [62;62],2148.0 [2148;2148],78.66 [78;78],29155.0 [29155;29155],0.0 [0;0],0.0 [0;0],1431.0 [1431;1431]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],29517365.0 [29517365;29517365],36214.0 [36214;36214],0.0 [0;0],19595813.0 [19595813;19595813],4999.0 [4999;4999],57.93 [57;57],55465.0 [55465;55465],87.24 [87;87],847683.0 [847683;847683],572.0 [572;572],1148.0 [1148;1148],1972.0 [1972;1972]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,9.0,78727717.0,677853.0,0.0,3150980.0,356.0,79.029655,18568.0,91.834377,88665.0,0.0,0.0,101.0
BCALM2,0.0,315567452.0,8035706.0,0.0,657074.0,504.0,0.103994,1262.0,1.428389,3093.0,0.0,0.0,35.0
GATBMiniaPipeline,0.0,17147130.0,6776.0,0.0,13912407.0,603.0,88.896275,90989.0,89.404632,542997.0,0.0,0.0,56759.0
IDBA-UD,0.0,23587252.0,14155.0,0.0,17534580.0,2725.0,89.225493,113853.0,89.800832,632213.0,3.0,5.0,51161.0
MEGAHIT,0.0,24498566.0,14172.0,0.0,18552996.0,3044.0,89.502391,210993.0,89.764224,768410.0,1.0,2.0,82249.0
MetaHipMer2,0.0,15029970.0,11966.0,0.0,10264085.0,5031.0,73.959495,2177.0,84.356216,14760.0,0.0,0.0,1516.0
metaSPAdes,0.0,27822010.0,26282.0,0.0,18792639.0,3392.0,89.368112,87983.0,89.926254,622107.0,3.0,7.0,3674.0
MINIA,0.0,20758864.0,37281.0,0.0,13448666.0,1463.0,84.244476,18288.0,87.678584,163543.0,2.0,3.0,7485.0
SKESA,0.0,8818742.0,8822.0,0.0,5801439.0,2889.0,62.710033,2148.0,78.662557,29155.0,0.0,0.0,1431.0
SPAdes,0.0,29517365.0,36214.0,0.0,19595813.0,4999.0,57.929408,55465.0,87.244353,847683.0,572.0,1148.0,1972.0


LHS


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],364761201.0 [364761201;364761201],8175890.0 [8175890;8175890],0.0 [0;0],11967.0 [11967;11967],11.0 [11;11],0.0 [0;0],1062.0 [1062;1062],6.39 [6;6],1212.0 [1212;1212],0.0 [0;0],0.0 [0;0],49.0 [49;49]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],13638465.67 [13638386;13638625],988.33 [988;989],0.0 [0;0],13507754.67 [13507672;13507796],762.0 [762;762],96.81 [96;96],91722.0 [91722;91722],97.17 [97;97],632215.0 [632215;632215],2.0 [2;2],6.0 [6;6],90987.0 [90987;90987]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],13791669.67 [13791669;13791670],1025.0 [1025;1025],0.0 [0;0],13601108.67 [13601100;13601126],568.0 [568;568],96.96 [96;96],107465.0 [106347;108024],97.19 [97;97],632213.0 [632213;632213],2.0 [2;2],5.0 [5;5],104140.0 [99726;106347]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],13843596.0 [13843596;13843596],767.0 [767;767],0.0 [0;0],13671964.0 [13671964;13671964],327.0 [327;327],97.15 [97;97],171540.0 [171540;171540],97.29 [97;97],632250.0 [632250;632250],0.0 [0;0],0.0 [0;0],171359.0 [171359;171359]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],14157309.0 [14157309;14157309],2099.0 [2099;2099],0.0 [0;0],13672885.0 [13672885;13672885],184.0 [184;184],97.18 [97;97],196786.0 [196786;196786],97.27 [97;97],917728.0 [917728;917728],1.0 [1;1],3.0 [3;3],193256.0 [193256;193256]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],15335005.0 [15334799;15335108],24660.67 [24654;24664],0.0 [0;0],12921859.0 [12920975;12922301],2171.67 [2171;2172],81.99 [81;81],10582.0 [10582;10582],91.09 [91;91],163543.0 [163543;163543],0.0 [0;0],0.0 [0;0],7727.0 [7727;7727]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],2957060.0 [2957060;2957060],36.0 [36;36],0.0 [0;0],2950393.0 [2950393;2950393],21.0 [21;21],92.0 [92;92],351027.0 [351027;351027],92.23 [92;92],632013.0 [632013;632013],0.0 [0;0],0.0 [0;0],351027.0 [351027;351027]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],16214591.0 [16214591;16214591],31040.0 [31040;31040],0.0 [0;0],13087777.0 [13087777;13087777],1526.0 [1526;1526],77.08 [77;77],69361.0 [69361;69361],95.5 [95;95],263645.0 [263645;263645],1.0 [1;1],2.0 [2;2],46478.0 [46478;46478]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BCALM2,0.0,364761201.0,8175890.0,0.0,11967.0,11.0,0.001875,1062.0,6.388781,1212.0,0.0,0.0,49.0
GATBMiniaPipeline,0.0,13638625.0,988.0,0.0,13507796.0,762.0,96.813085,91722.0,97.172274,632215.0,2.0,6.0,90987.0
IDBA-UD,0.0,13791670.0,1025.0,0.0,13601126.0,568.0,96.957028,108024.0,97.185816,632213.0,2.0,5.0,106347.0
MEGAHIT,0.0,13843596.0,767.0,0.0,13671964.0,327.0,97.147345,171540.0,97.286491,632250.0,0.0,0.0,171359.0
MetaHipMer2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
metaSPAdes,0.0,14157309.0,2099.0,0.0,13672885.0,184.0,97.182958,196786.0,97.267869,917728.0,1.0,3.0,193256.0
MINIA,0.0,15335108.0,24654.0,0.0,12922301.0,2171.0,81.989688,10582.0,91.095955,163543.0,0.0,0.0,7727.0
SKESA,0.0,2957060.0,36.0,0.0,2950393.0,21.0,92.000386,351027.0,92.228441,632013.0,0.0,0.0,351027.0
SPAdes,0.0,16214591.0,31040.0,0.0,13087777.0,1526.0,77.081486,69361.0,95.503474,263645.0,1.0,2.0,46478.0


EMS


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,882.0 [881;883],34563814.33 [34563647;34563949],32429.67 [32428;32431],611.33 [610;612],30988179.33 [30988165;30988200],498.0 [497;499],98.12 [98;98],138706.0 [138706;138706],98.63 [98;98],841100.0 [841100;841100],2.0 [2;2],3.0 [3;3],124246.0 [124246;124246]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],39571553.0 [39571553;39571553],252182.0 [252182;252182],0.0 [0;0],16407295.0 [16407295;16407295],6250.0 [6250;6250],27.01 [27;27],3056.0 [3056;3056],65.26 [65;65],38021.0 [38021;38021],0.0 [0;0],0.0 [0;0],645.0 [645;645]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30509406.0 [30509406;30509406],943.0 [943;943],0.0 [0;0],30343348.0 [30343348;30343348],657.0 [657;657],98.14 [98;98],122545.0 [122545;122545],98.94 [98;98],768585.0 [768585;768585],1.0 [1;1],1.0 [1;1],121957.0 [121957;121957]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30408932.0 [30408932;30408932],1829.0 [1829;1829],0.0 [0;0],30145929.0 [30145929;30145929],1254.0 [1254;1254],97.3 [97;97],63178.0 [63178;63178],98.64 [98;98],468323.0 [468323;468323],13.0 [13;13],22.0 [22;22],61771.67 [61704;61907]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30599098.0 [30599036;30599129],979.0 [979;979],0.0 [0;0],30418686.0 [30418624;30418717],518.0 [518;518],98.65 [98;98],171540.0 [171540;171540],99.05 [99;99],768640.0 [768640;768640],10.0 [10;10],21.0 [21;21],171540.0 [171540;171540]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30381398.67 [30381152;30381792],884.33 [884;885],0.0 [0;0],30262824.0 [30262824;30262824],712.0 [712;712],97.25 [97;97],106637.0 [106637;106637],98.03 [98;98],468375.0 [468375;468375],0.0 [0;0],0.0 [0;0],106383.0 [106383;106383]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30482042.0 [30482042;30482042],761.0 [761;761],0.0 [0;0],30402517.0 [30402517;30402517],365.0 [365;365],98.28 [98;98],208969.0 [208969;208969],98.77 [98;98],917726.0 [917726;917726],1.0 [1;1],3.0 [3;3],208969.0 [208969;208969]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],30649478.33 [30648944;30649850],16748.33 [16748;16749],0.0 [0;0],28218361.0 [28218335;28218378],3574.0 [3574;3574],92.16 [92;92],20782.0 [20782;20782],95.93 [95;95],161812.0 [161812;161812],0.0 [0;0],0.0 [0;0],17967.0 [17967;17967]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],30249132.0 [30249132;30249132],1269.0 [1269;1269],0.0 [0;0],30064811.0 [30064811;30064811],891.0 [891;891],97.24 [97;97],74138.0 [74138;74138],97.95 [97;97],632013.0 [632013;632013],9.0 [9;9],18.0 [18;18],73839.0 [73839;73839]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],30664883.0 [30664883;30664883],2628.0 [2628;2628],0.0 [0;0],30354786.0 [30354786;30354786],412.0 [412;412],98.28 [98;98],208991.0 [208991;208991],98.7 [98;98],850176.0 [850176;850176],0.0 [0;0],0.0 [0;0],206515.0 [206515;206515]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,881.0,34563647.0,32428.0,610.0,30988165.0,497.0,98.120755,138706.0,98.63473,841100.0,2.0,3.0,124246.0
BCALM2,0.0,39571553.0,252182.0,0.0,16407295.0,6250.0,27.009095,3056.0,65.260675,38021.0,0.0,0.0,645.0
GATBMiniaPipeline,0.0,30509406.0,943.0,0.0,30343348.0,657.0,98.141277,122545.0,98.943315,768585.0,1.0,1.0,121957.0
IDBA-UD,0.0,30408932.0,1829.0,0.0,30145929.0,1254.0,97.299088,63178.0,98.642441,468323.0,13.0,22.0,61907.0
MEGAHIT,0.0,30599129.0,979.0,0.0,30418717.0,518.0,98.647341,171540.0,99.053249,768640.0,10.0,21.0,171540.0
MetaHipMer2,0.0,30381792.0,884.0,0.0,30262824.0,712.0,97.251875,106637.0,98.03828,468375.0,0.0,0.0,106383.0
metaSPAdes,0.0,30482042.0,761.0,0.0,30402517.0,365.0,98.283354,208969.0,98.77395,917726.0,1.0,3.0,208969.0
MINIA,0.0,30649850.0,16748.0,0.0,28218378.0,3574.0,92.15723,20782.0,95.935986,161812.0,0.0,0.0,17967.0
SKESA,0.0,30249132.0,1269.0,0.0,30064811.0,891.0,97.237351,74138.0,97.952533,632013.0,9.0,18.0,73839.0
SPAdes,0.0,30664883.0,2628.0,0.0,30354786.0,412.0,98.283354,208991.0,98.700052,850176.0,0.0,0.0,206515.0


ERR2984773


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,98.67 [0;149],61139451.0 [0;91720949],331860.0 [0;497797],4.0 [0;7],20429844.33 [0;30657287],800.33 [0;1205],53.74 [0;80],35189.0 [0;52869],56.42 [0;84],198658.0 [0;298093],1.33 [0;2],3.33 [0;6],92.67 [0;139]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],170402348.0 [170402348;170402348],3508016.0 [3508016;3508016],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],0.0 [0;0],8.15 [8;8],994.0 [994;994],0.0 [0;0],0.0 [0;0],49.0 [49;49]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],32809889.0 [32809112;32810830],5398.33 [5397;5400],0.0 [0;0],30564350.0 [30564302;30564396],761.0 [761;761],82.23 [82;82],113162.0 [113162;113162],83.14 [83;83],768462.0 [768462;768462],1.0 [1;1],2.0 [2;2],102115.0 [102115;102115]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],39776993.0 [39773020;39779384],20182.33 [20164;20194],0.0 [0;0],31562689.67 [31562689;31562690],2654.0 [2654;2654],81.31 [81;81],29134.0 [29134;29134],83.67 [83;83],303329.0 [303329;303329],4.0 [4;4],7.0 [7;7],20993.0 [20993;20993]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],41166407.0 [41166407;41166407],20614.0 [20614;20614],0.0 [0;0],32021766.0 [32021766;32021766],1447.0 [1447;1447],82.83 [82;82],149841.0 [149841;149841],84.22 [84;84],1205839.0 [1205839;1205839],15.0 [15;15],31.67 [31;32],88308.0 [88308;88308]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],13179761.33 [13174826;13184199],19356.0 [19351;19361],0.0 [0;0],1600962.67 [1599587;1601911],1309.0 [1308;1310],3.96 [3;3],1176.0 [1175;1178],25.5 [25;25],2964.0 [2964;2964],0.0 [0;0],0.0 [0;0],658.67 [658;659]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],48964620.0 [48964620;48964620],61644.0 [61644;61644],0.0 [0;0],31967753.0 [31967753;31967753],1689.0 [1689;1689],82.04 [82;82],95101.0 [95101;95101],84.33 [84;84],473277.0 [473277;473277],2.0 [2;2],3.0 [3;3],42332.0 [42332;42332]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],41186649.33 [41185696;41187744],67991.0 [67974;68012],0.0 [0;0],28277982.0 [28276538;28279162],3980.33 [3978;3983],74.36 [74;74],15886.33 [15817;15930],80.65 [80;80],117945.0 [117945;117945],0.0 [0;0],0.0 [0;0],7090.67 [7060;7106]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],30430077.0 [30430077;30430077],1352.0 [1352;1352],0.0 [0;0],30211740.0 [30211740;30211740],846.0 [846;846],80.51 [80;80],77870.0 [77870;77870],81.25 [81;81],446163.0 [446163;446163],0.0 [0;0],0.0 [0;0],76924.0 [76924;76924]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],46535940.0 [46535940;46535940],39819.0 [39819;39819],0.0 [0;0],32500826.0 [32500826;32500826],2049.0 [2049;2049],81.17 [81;81],99299.0 [99299;99299],84.2 [84;84],745370.0 [745370;745370],9.0 [9;9],18.0 [18;18],53292.0 [53292;53292]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.0,0.0,0.0,0.0,30657287.0,0.0,80.631071,52869.0,84.624746,298093.0,0.0,0.0,139.0
BCALM2,0.0,170402348.0,3508016.0,0.0,0.0,0.0,0.0,0.0,8.149572,994.0,0.0,0.0,49.0
GATBMiniaPipeline,0.0,32809112.0,5397.0,0.0,30564396.0,761.0,82.22544,113162.0,83.144413,768462.0,1.0,2.0,102115.0
IDBA-UD,0.0,39773020.0,20164.0,0.0,31562689.0,2654.0,81.309318,29134.0,83.671484,303329.0,4.0,7.0,20993.0
MEGAHIT,0.0,41166407.0,20614.0,0.0,32021766.0,1447.0,82.834929,149841.0,84.223732,1205839.0,15.0,31.0,88308.0
MetaHipMer2,0.0,13184199.0,19351.0,0.0,1601911.0,1308.0,3.970853,1178.0,25.512982,2964.0,0.0,0.0,659.0
metaSPAdes,0.0,48964620.0,61644.0,0.0,31967753.0,1689.0,82.040817,95101.0,84.326108,473277.0,2.0,3.0,42332.0
MINIA,0.0,41185696.0,67974.0,0.0,28279162.0,3978.0,74.366709,15930.0,80.64856,117945.0,0.0,0.0,7106.0
SKESA,0.0,30430077.0,1352.0,0.0,30211740.0,846.0,80.508554,77870.0,81.254167,446163.0,0.0,0.0,76924.0
SPAdes,0.0,46535940.0,39819.0,0.0,32500826.0,2049.0,81.169518,99299.0,84.204433,745370.0,9.0,18.0,53292.0


ENN


Assembler,Algorithm,Type,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,Single k-mer De Bruijn graph,Genomic,3881.0 [3879;3884],31510320.33 [31503127;31513997],4772.33 [4769;4778],2328.0 [2328;2328],30742562.33 [30734590;30746773],1823.33 [1822;1825],98.92 [98;98],89282.0 [89282;89282],99.79 [99;99],952617.33 [952616;952618],7.33 [7;8],12.67 [12;14],86573.0 [86573;86573]
BCALM2,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],31047581.0 [31047581;31047581],25377.0 [25377;25377],0.0 [0;0],28174748.0 [28174748;28174748],3575.0 [3575;3575],92.59 [92;92],20700.0 [20700;20700],96.52 [96;96],161812.0 [161812;161812],0.0 [0;0],0.0 [0;0],17395.0 [17395;17395]
GATBMiniaPipeline,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30440762.0 [30440762;30440762],1100.0 [1100;1100],0.0 [0;0],30273320.0 [30273320;30273320],814.0 [814;814],98.46 [98;98],90988.0 [90988;90988],99.34 [99;99],632215.0 [632215;632215],3.0 [3;3],6.0 [6;6],90346.0 [90346;90346]
IDBA-UD,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30390566.0 [30390566;30390566],1822.0 [1822;1822],0.0 [0;0],30130413.0 [30130413;30130413],1279.0 [1279;1279],97.92 [97;97],65108.0 [65108;65108],99.34 [99;99],585610.0 [585610;585610],17.33 [17;18],34.67 [34;36],63725.0 [63725;63725]
MEGAHIT,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30545419.0 [30545419;30545419],814.0 [814;814],0.0 [0;0],30420265.0 [30420265;30420265],525.0 [525;525],99.39 [99;99],179870.0 [179870;179870],99.69 [99;99],632274.0 [632274;632274],9.0 [9;9],18.0 [18;18],178381.0 [178381;178381]
MetaHipMer2,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30378682.33 [30378611;30378729],882.67 [882;883],0.0 [0;0],30264906.33 [30264835;30264953],718.67 [718;719],97.95 [97;97],102049.0 [102049;102049],98.73 [98;98],468359.0 [468359;468359],0.0 [0;0],0.0 [0;0],99990.0 [99990;99990]
metaSPAdes,Multiple k-mer De Bruijn graph,Metagenomic,0.0 [0;0],30453778.0 [30453778;30453778],640.0 [640;640],0.0 [0;0],30383186.0 [30383186;30383186],359.0 [359;359],98.92 [98;98],238611.0 [238611;238611],99.53 [99;99],1036942.0 [1036942;1036942],2.0 [2;2],6.0 [6;6],238611.0 [238611;238611]
MINIA,Single k-mer De Bruijn graph,Genomic,0.0 [0;0],30641342.33 [30641333;30641357],16570.0 [16570;16570],0.0 [0;0],28205937.67 [28205937;28205939],3589.0 [3589;3589],92.93 [92;92],20628.0 [20628;20628],97.19 [97;97],161812.0 [161812;161812],0.0 [0;0],0.0 [0;0],17820.0 [17820;17820]
SKESA,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],30110758.0 [30110758;30110758],2322.0 [2322;2322],0.0 [0;0],29835112.0 [29835112;29835112],1726.0 [1726;1726],97.42 [97;97],43168.0 [43168;43168],98.31 [98;98],630384.0 [630384;630384],10.0 [10;10],19.0 [19;19],42115.0 [42115;42115]
SPAdes,Multiple k-mer De Bruijn graph,Genomic,0.0 [0;0],30442718.0 [30442718;30442718],1140.0 [1140;1140],0.0 [0;0],30276227.0 [30276227;30276227],444.0 [444;444],98.56 [98;98],204481.0 [204481;204481],99.42 [99;99],850138.0 [850138;850138],3.0 [3;3],6.0 [6;6],204481.0 [204481;204481]


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,3879.0,31503127.0,4769.0,2328.0,30746773.0,1822.0,98.923812,89282.0,99.78769,952618.0,7.0,12.0,86573.0
BCALM2,0.0,31047581.0,25377.0,0.0,28174748.0,3575.0,92.594028,20700.0,96.523967,161812.0,0.0,0.0,17395.0
GATBMiniaPipeline,0.0,30440762.0,1100.0,0.0,30273320.0,814.0,98.462815,90988.0,99.340345,632215.0,3.0,6.0,90346.0
IDBA-UD,0.0,30390566.0,1822.0,0.0,30130413.0,1279.0,97.922883,65108.0,99.343697,585610.0,17.0,34.0,63725.0
MEGAHIT,0.0,30545419.0,814.0,0.0,30420265.0,525.0,99.38811,179870.0,99.693845,632274.0,9.0,18.0,178381.0
MetaHipMer2,0.0,30378729.0,882.0,0.0,30264953.0,718.0,97.946876,102049.0,98.726583,468359.0,0.0,0.0,99990.0
metaSPAdes,0.0,30453778.0,640.0,0.0,30383186.0,359.0,98.91725,238611.0,99.530051,1036942.0,2.0,6.0,238611.0
MINIA,0.0,30641357.0,16570.0,0.0,28205939.0,3589.0,92.933709,20628.0,97.18818,161812.0,0.0,0.0,17820.0
SKESA,0.0,30110758.0,2322.0,0.0,29835112.0,1726.0,97.424467,43168.0,98.313391,630384.0,10.0,19.0,42115.0
SPAdes,0.0,30442718.0,1140.0,0.0,30276227.0,444.0,98.561253,204481.0,99.41944,850138.0,3.0,6.0,204481.0


#### Rank

In [315]:
rank_dfs = []
for sample,df in best_stats_dfs:
    print(sample)
    
    rank_df = pd.DataFrame(index=df.index)
    
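    for column in df.columns:
        rank_list = []
        
        if "basepairs" in column:
            # Score assembly size as a ratio to a fixed target size (in bp)
            target = 30946587
            for item in df[column]:
                diff = item / target
                if 1 < diff < 1.99999:
                    # Larger than the target: keep only the fractional excess
                    diff = diff - 1
                elif diff >= 1.99999:
                    # Roughly twice the target size or more: score 0
                    diff = 0
                rank_list.append(diff)
            rank_df[column] = rank_list
        else:    
            if column in best_min:
                # Lower is better: invert so the column maximum scores 0
                for item in df[column]:
                    diff = 1 - (item / df[column].max())
                    rank_list.append(diff)
            else:
                # Higher is better: scale by the column maximum
                for item in df[column]:
                    diff = (item / df[column].max())
                    rank_list.append(diff)

            rank_df[column] = rank_list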
    
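    # In case an assembly fails (all metrics are 0), set its whole row to 0.
    # .loc is used because .at only supports scalar access.
    for assembler, row in df.iterrows():
        if row.sum() == 0:
            rank_df.loc[assembler, :] = 0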

    rank_df = rank_df.fillna(1) #happens when all values are 0
    display(rank_df) 
    rank_df.to_csv("Tables/Results/Rank Global metrics per assembler - {}.csv".format(sample))
    rank_dfs.append([sample, rank_df])
            

LNN


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.727291,0.650561,0.004596,0.0,0.43862,0.15492,0.992123,0.792403,1.0,0.788244,1.0,1.0,0.039429
BCALM2,1.0,0.634497,0.0,1.0,0.432934,0.452896,0.975275,0.129533,0.984537,0.151427,1.0,1.0,0.07698
GATBMiniaPipeline,1.0,0.449524,0.98454,1.0,0.441352,0.847174,0.991176,0.579186,0.998681,0.585377,0.916667,0.869565,0.563446
IDBA-UD,1.0,0.462892,0.955672,1.0,0.440834,0.817865,0.992628,0.445195,0.998432,0.542225,0.833333,0.913043,0.435985
MEGAHIT,1.0,0.4654,0.958907,1.0,0.442845,0.879972,1.0,0.869404,0.999136,1.0,0.916667,0.913043,0.838726
MetaHipMer2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
metaSPAdes,1.0,0.501338,0.85053,1.0,0.442458,0.903001,0.999047,0.98557,0.999465,0.787119,0.916667,0.869565,0.909027
MINIA,1.0,0.529981,0.665365,1.0,0.436284,0.468248,0.978416,0.138244,0.991583,0.163922,0.833333,0.826087,0.109087
SKESA,1.0,0.265068,0.787752,1.0,0.144445,0.0,0.920547,0.03043,0.952428,0.054881,0.0,0.0,0.004759
SPAdes,1.0,0.504297,0.836453,1.0,0.44077,0.841591,0.999047,0.563211,0.998623,0.58577,1.0,1.0,0.505602


ERR2935805



invalid value encountered in double_scalars



Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.0,0.0,0.915645,1.0,0.10182,0.929239,0.882989,0.072282,1.0,0.104597,1.0,1.0,0.000393
BCALM2,1.0,0.0,0.0,1.0,0.021233,0.899821,0.001162,0.004913,0.015554,0.003649,1.0,1.0,0.000136
GATBMiniaPipeline,1.0,0.554088,0.999157,1.0,0.449562,0.880143,0.993228,0.354203,0.973542,0.640566,1.0,1.0,0.220952
IDBA-UD,1.0,0.762192,0.998238,1.0,0.566608,0.458358,0.996906,0.443208,0.977856,0.745813,0.994755,0.995645,0.19916
MEGAHIT,1.0,0.79164,0.998236,1.0,0.599517,0.394951,1.0,0.821355,0.977458,0.906483,0.998252,0.998258,0.32018
MetaHipMer2,1.0,0.485675,0.998511,1.0,0.331671,0.0,0.826341,0.008475,0.918569,0.017412,1.0,1.0,0.005901
metaSPAdes,1.0,0.899033,0.996729,1.0,0.60726,0.32578,0.9985,0.342501,0.979222,0.733891,0.994755,0.993902,0.014302
MINIA,1.0,0.670797,0.995361,1.0,0.434577,0.709203,0.941254,0.071192,0.954747,0.192929,0.996503,0.997387,0.029138
SKESA,1.0,0.284967,0.998902,1.0,0.187466,0.42576,0.700652,0.008362,0.85657,0.034394,1.0,1.0,0.005571
SPAdes,1.0,0.953816,0.995493,1.0,0.633214,0.006361,0.647239,0.215915,0.950018,1.0,0.0,0.0,0.007677


LHS



invalid value encountered in double_scalars



Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
BCALM2,1.0,0.0,0.0,1.0,0.000387,0.994933,1.9e-05,0.003025,0.06567,0.001321,1.0,1.0,0.00014
GATBMiniaPipeline,1.0,0.440715,0.999879,1.0,0.436487,0.64901,0.996194,0.261296,0.998826,0.688891,0.6,0.4,0.259202
IDBA-UD,1.0,0.44566,0.999875,1.0,0.439503,0.738369,0.997675,0.307737,0.998965,0.688889,0.6,0.5,0.30296
MEGAHIT,1.0,0.447338,0.999906,1.0,0.441792,0.849378,0.999634,0.48868,1.0,0.68893,1.0,1.0,0.488165
MetaHipMer2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
metaSPAdes,1.0,0.457476,0.999743,1.0,0.441822,0.915246,1.0,0.560601,0.999809,1.0,0.8,0.7,0.550545
MINIA,1.0,0.495535,0.996985,1.0,0.417568,0.0,0.843663,0.030146,0.936368,0.178204,1.0,1.0,0.022013
SKESA,1.0,0.095554,0.999996,1.0,0.095338,0.990327,0.946672,1.0,0.948009,0.688671,1.0,1.0,1.0
SPAdes,1.0,0.523954,0.996203,1.0,0.422915,0.297098,0.793158,0.197594,0.981673,0.28728,0.8,0.8,0.132406


EMS


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.798996,0.116881,0.87141,0.860826,0.001344,0.92048,0.994662,0.566031,0.995775,0.645077,0.846154,0.863636,0.507023
BCALM2,1.0,0.278705,0.0,1.0,0.530181,0.0,0.273794,0.012471,0.658844,0.02916,1.0,1.0,0.002632
GATBMiniaPipeline,1.0,0.985873,0.996261,1.0,0.980507,0.89488,0.99487,0.500082,0.99889,0.589462,0.923077,0.954545,0.497682
IDBA-UD,1.0,0.982626,0.992747,1.0,0.974128,0.79936,0.986333,0.257817,0.995853,0.359178,0.0,0.0,0.25263
MEGAHIT,1.0,0.988772,0.996118,1.0,0.982943,0.91712,1.0,0.70002,1.0,0.589504,0.230769,0.045455,0.70002
MetaHipMer2,1.0,0.981749,0.996495,1.0,0.977905,0.88608,0.985854,0.435164,0.989753,0.359218,1.0,1.0,0.434128
metaSPAdes,1.0,0.984989,0.996982,1.0,0.982419,0.9416,0.99631,0.852761,0.99718,0.703845,0.923077,0.863636,0.852761
MINIA,1.0,0.990411,0.933588,1.0,0.911841,0.42816,0.934209,0.084807,0.968529,0.124101,1.0,1.0,0.07332
SKESA,1.0,0.977463,0.994968,1.0,0.971507,0.85744,0.985707,0.302542,0.988888,0.484719,0.307692,0.181818,0.301322
SPAdes,1.0,0.990897,0.989579,1.0,0.980877,0.93408,0.99631,0.85285,0.996434,0.652038,1.0,1.0,0.842746


ERR2984773



invalid value encountered in double_scalars



Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,1.0,0.0,1.0,1.0,0.990652,1.0,0.973395,0.276188,1.0,0.247208,1.0,1.0,0.000726
BCALM2,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.096302,0.000824,1.0,1.0,0.000256
GATBMiniaPipeline,1.0,0.060185,0.998462,1.0,0.98765,0.808698,0.992642,0.591159,0.982507,0.637284,0.933333,0.935484,0.533449
IDBA-UD,1.0,0.285215,0.994252,1.0,0.019909,0.332831,0.981583,0.152196,0.988735,0.25155,0.733333,0.774194,0.109668
MEGAHIT,1.0,0.330241,0.994124,1.0,0.034743,0.636249,1.0,0.78277,0.995261,1.0,0.0,0.0,0.461321
MetaHipMer2,1.0,0.426031,0.994484,1.0,0.051764,0.671192,0.047937,0.006154,0.301484,0.002458,1.0,1.0,0.003443
metaSPAdes,1.0,0.58223,0.982428,1.0,0.032998,0.575415,0.990413,0.496808,0.996471,0.392488,0.866667,0.903226,0.221143
MINIA,1.0,0.330864,0.980623,1.0,0.913806,0.0,0.89777,0.083218,0.953014,0.097812,1.0,1.0,0.037122
SKESA,1.0,0.98331,0.999615,1.0,0.976254,0.78733,0.971916,0.406793,0.96017,0.370002,1.0,1.0,0.401851
SPAdes,1.0,0.50375,0.988649,1.0,0.050223,0.484917,0.979895,0.518739,0.995033,0.618134,0.4,0.419355,0.278398


ENN


Assembler,Ns,basepairs,contigs,filtered_Ns,filtered_basepairs,filtered_contigs,filtered_mapped_reads,filtered_n50,mapped_reads,max_contig,misassembled contigs,misassembly events,n50
ABySS,0.558603,0.017984,0.812074,0.718942,0.993543,0.492338,0.995328,0.374174,1.0,0.91868,0.588235,0.647059,0.362821
BCALM2,1.0,0.003263,0.0,1.0,0.910432,0.003901,0.931641,0.086752,0.967293,0.156047,1.0,1.0,0.072901
GATBMiniaPipeline,1.0,0.983655,0.956654,1.0,0.978244,0.773196,0.99069,0.381324,0.995517,0.609692,0.823529,0.823529,0.378633
IDBA-UD,1.0,0.982033,0.928203,1.0,0.973626,0.643633,0.985258,0.272863,0.995551,0.564747,0.0,0.0,0.267066
MEGAHIT,1.0,0.987037,0.967924,1.0,0.982993,0.85372,1.0,0.753821,0.99906,0.609749,0.470588,0.470588,0.747581
MetaHipMer2,1.0,0.98165,0.965244,1.0,0.977974,0.799944,0.985499,0.427679,0.989366,0.451673,1.0,1.0,0.41905
metaSPAdes,1.0,0.984075,0.97478,1.0,0.981794,0.899972,0.995262,1.0,0.997418,1.0,0.882353,0.823529,1.0
MINIA,1.0,0.990137,0.347047,1.0,0.911439,0.0,0.935059,0.08645,0.97395,0.156047,1.0,1.0,0.074682
SKESA,1.0,0.972991,0.9085,1.0,0.964084,0.519086,0.980243,0.180914,0.985226,0.607926,0.411765,0.441176,0.176501
SPAdes,1.0,0.983718,0.955077,1.0,0.978338,0.876289,0.991681,0.856964,0.99631,0.819851,0.823529,0.823529,0.856964
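
The scaling applied in the rank cell above can also be expressed column-wise. The following is a minimal sketch, not the notebook's own code: the function names `scale_column` and `scale_basepairs` are hypothetical, and the failed-assembly check (rows that sum to 0 are set to 0) is left out. It assumes the same rules as the loop: divide by the column maximum, invert for metrics where lower is better, and treat an all-zero column (0/0, which is likely what triggers the `invalid value encountered in double_scalars` warning) as a score of 1.

```python
import numpy as np
import pandas as pd


def scale_column(values, lower_is_better=False):
    """Scale a metric column by its maximum, following the loop above."""
    values = pd.Series(values, dtype=float)
    scaled = values / values.max()  # an all-zero column gives 0/0 = NaN
    if lower_is_better:
        scaled = 1 - scaled
    # Mirrors rank_df.fillna(1): an all-zero column scores 1 everywhere
    return scaled.fillna(1)


def scale_basepairs(values, target):
    """Apply the basepairs rule from the loop above to a whole column."""
    ratio = pd.Series(values, dtype=float) / target
    scaled = np.where(
        ratio >= 1.99999, 0.0,                  # about twice the target or more
        np.where(ratio > 1, ratio - 1, ratio),  # above target: fractional excess
    )
    return pd.Series(scaled, index=ratio.index)


# Basepairs of ABySS and GATBMiniaPipeline for the EMS sample, taken from
# the table above, with the target used in the rank cell.
example = scale_basepairs([34563647.0, 30509406.0], target=30946587)
print(example.round(6).tolist())
```

For these two inputs the sketch gives values that agree with the `basepairs` column of the EMS rank table (about 0.1169 and 0.9859).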


#### Radar plots

In [325]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

for sample, df in rank_dfs:

    categories = ['contigs','filtered_contigs','basepairs','filtered_basepairs', 'max_contig', 'n50','filtered_n50', 'mapped_reads','filtered_mapped_reads', 'Ns','filtered_Ns','misassembled contigs', 'misassembly events']
    categories_original =  ['contigs','basepairs', 'max_contig', 'n50', 'mapped_reads', 'Ns'] 
    categories_original_renamed =  ['Contigs','Basepairs', 'Largest contig', 'N50', '% Mapped reads', 'Uncalled bases'] 
    categories_filtered = ['filtered_contigs', 'filtered_basepairs', 'filtered_n50', 'filtered_mapped_reads','filtered_Ns', 'misassembled contigs', 'misassembly events']
    categories_filtered_renamed = ['Contigs', 'Basepairs', 'N50', '% Mapped reads','Uncalled bases', 'Misassembled contigs', 'Misassembly events']

    fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'polar'}, {'type': 'polar'}]], subplot_titles=('Original', 'Filtered'))
    
    i= 0
    
    for assembler, row in df.iterrows():
        
        # Collect this assembler's scores in the display order of each panel
        row_original = [row[col] for col in categories_original]
        row_filtered = [row[col] for col in categories_filtered]

        fig.add_trace(go.Scatterpolar(r=row_original,
                                      theta=categories_original_renamed, mode='lines+markers',
                                      marker=dict(color=COLOURS[i], size=12), 
                                      marker_line_color="black", 
                                      marker_line_width=2,
                                      opacity=0.6,
                                      name=assembler, line=dict(color=COLOURS[i]), showlegend=True), col=1, row=1)
        
        fig.add_trace(go.Scatterpolar(r=row_filtered,
                                      theta=categories_filtered_renamed, mode='lines+markers',
                                      marker=dict(color=COLOURS[i], size=12), 
                                      marker_line_color="black", 
                                      marker_line_width=2,
                                      opacity=0.6,
                                      name=assembler, line=dict(color=COLOURS[i]), showlegend=False), col=2, row=1)
        i+=1
            
    fig.update_layout(polar=dict(radialaxis=dict(visible=True,
                                                 range=[0,1],
                                                 linewidth = 2,
                                                 linecolor="black",
                                                 gridcolor = "#DCDCDC"), 
                                 hole=1/12, bgcolor='rgb(255,255,255)',
                                angularaxis=dict(linecolor="black"),
                                angularaxis_categoryarray = categories_original_renamed,
                                radialaxis_angle = -22.5),
                     polar2=dict(radialaxis=dict(visible=True,
                                                 range=[0,1],
                                                 linewidth = 2,
                                                 linecolor="black",
                                                 gridcolor = "#DCDCDC"), 
                                 hole=1/12, bgcolor='rgb(255,255,255)',
                                angularaxis=dict(linecolor="black"),
                                angularaxis_categoryarray = categories_filtered_renamed,
                                radialaxis_angle = -22.5))
    fig.update_layout(title=sample)
    fig.update_layout(legend=dict(
        orientation="h",
        yanchor="bottom",
        y=-0.5,
        xanchor="left",
        x=0
    ))

    fig.show()



In [308]:
import plotly.graph_objects as go

for sample, df in rank_dfs:

    categories = ['contigs','filtered_contigs','basepairs','filtered_basepairs', 'max_contig', 'n50','filtered_n50', 'mapped_reads','filtered_mapped_reads', 'Ns','filtered_Ns','misassembled contigs', 'misassembly events']
    categories_original =  ['contigs','basepairs', 'max_contig', 'n50', 'mapped_reads', 'Ns'] 
    categories_original_renamed =  ['Contigs','Basepairs', 'Largest contig', 'N50', '% Mapped reads', 'Uncalled bases'] 
    categories_filtered = ['filtered_contigs', 'filtered_basepairs', 'filtered_n50', 'filtered_mapped_reads','filtered_Ns', 'misassembled contigs', 'misassembly events']
    categories_filtered_renamed = ['Contigs', 'Basepairs', 'N50', '% Mapped reads','Uncalled bases', 'Misassembled contigs', 'Misassembly events']

    fig = make_subplots(rows=1, cols=2, specs=[[{'type': 'polar'}]*2], subplot_titles=('Original', 'Filtered'))
    
    COLOURS_ORIGINAL = ['#5876c8', '#009392', '#9CCB86', '#EEB479', '#E88471', '#CF597E']
    COLOURS_FILTERED = ['#5876c8', '#009392', '#EEB479', '#E88471', '#CF597E', '#a54765', '#835221']

    i = 0
    j = 0
    assembler_list = list(df.index)
    
    for column in categories:
                
        row_all = list(df[column])
        

        if column in categories_original:
            name = categories_original_renamed[categories_original.index(column)]
            fig.add_trace(go.Scatterpolar(r=row_all,
                                          theta=assembler_list,
                                          name=name, 
                                          mode='markers', 
                                          marker=dict(color=COLOURS_ORIGINAL[i], size=12), 
                                          marker_line_color="black", 
                                          marker_line_width=2,
                                          opacity=0.6,showlegend=True),
                          row=1, col=1) 
            i+=1
            
        elif column in categories_filtered:
            name = categories_filtered_renamed[categories_filtered.index(column)]
            # Only the misassembly traces are new to the legend; the rest are already listed by the left panel
            showlegend = column in ['misassembled contigs', 'misassembly events']
            fig.add_trace(go.Scatterpolar(r=row_all,
                                          theta=assembler_list,
                                          name=name, 
                                          mode='markers', 
                                          marker=dict(color=COLOURS_FILTERED[j], size=12),
                                          marker_line_color="black", 
                                          marker_line_width=2, 
                                          opacity=0.6,showlegend=showlegend),
                         row=1, col=2)
            j+=1
        
    
    fig.update_layout(polar=dict(radialaxis=dict(visible=True,
                                                 range=[0,1],
                                                 linewidth = 2,
                                                 linecolor="black",
                                                 gridcolor = "#DCDCDC"), 
                                 hole=1/12, bgcolor='rgb(255,255,255)',
                                angularaxis=dict(linecolor="black"),
                                angularaxis_categoryarray = assembler_list + [' '],
                                radialaxis_angle = -22.5),
                     polar2=dict(radialaxis=dict(visible=True,
                                                 range=[0,1],
                                                 linewidth = 2,
                                                 linecolor="black",
                                                 gridcolor = "#DCDCDC"), 
                                 hole=1/12, bgcolor='rgb(255,255,255)',
                                angularaxis=dict(linecolor="black"),
                                angularaxis_categoryarray = assembler_list + [' '],
                                radialaxis_angle = -22.5))
    fig.update_layout(title=sample)

    fig.show()



#### Boxplot rank distribution

In [309]:

for sample, df in rank_dfs:
    
    fig = go.Figure()
    
    for assembler in df.index:
        
        fig.add_trace(go.Box(x=df.loc[[assembler]].values[0],
                                 name=assembler, boxpoints='all',jitter=1, pointpos=0,
                                 boxmean=False, fillcolor='#D3D3D3', line=dict(color='#000000'),
                                marker=dict(color='rgba(178,37,34,0.7)', size=6)))
    fig.update_layout(showlegend=False, xaxis_title="Rank",
                          plot_bgcolor='rgb(255,255,255)', xaxis=dict(zeroline=False, gridcolor='#DCDCDC'),
                     title=sample)
    fig.show()


### Global statistics per assembler type

In [None]:
rows = []
for assembly_type in global_pipeline_metrics_df.type.unique():
    row={'Assembler Type': assembly_type}
    # Summary statistics for the multiple k-mer De Bruijn graph assemblers of this type
    summary=global_pipeline_metrics_df[(global_pipeline_metrics_df.type == assembly_type) & (global_pipeline_metrics_df.algorythm == "Multiple k-mer De Bruijn graph")].describe()
    for column in summary.columns:
        mean = summary.loc['mean',column]
        minimum = summary.loc['min',column]
        maximum = summary.loc['max',column]
        # Report each metric as "mean [min;max]"
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# DataFrame.append was removed in pandas 2.0, so the table is built from the collected rows
stats_per_assembler_type = pd.DataFrame(rows).set_index('Assembler Type')
stats_per_assembler_type.to_csv("Tables/Results/Global metrics per assembler type multiple kmer.csv")

In [None]:
stats_per_assembler_type

### Global statistics per assembly algorithm

In [None]:
rows = []
for assembly_algo in global_pipeline_metrics_df.algorythm.unique():
    row={'Assembler Algorithm': assembly_algo}
    # Summary statistics for all assemblers using this algorithm
    summary=global_pipeline_metrics_df[global_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in summary.columns:
        mean = summary.loc['mean',column]
        minimum = summary.loc['min',column]
        maximum = summary.loc['max',column]
        # Report each metric as "mean [min;max]"
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# DataFrame.append was removed in pandas 2.0, so the table is built from the collected rows
stats_per_assembler_algo = pd.DataFrame(rows).set_index('Assembler Algorithm')
stats_per_assembler_algo.to_csv("Tables/Results/Global metrics per assembler algorithm.csv")

In [None]:
stats_per_assembler_algo

### Global statistics per distribution

In [None]:
for distribution in global_pipeline_metrics_df.distribution.unique():
    rows = []
    for assembler in global_pipeline_metrics_df.assembler.unique():
        # Runs of this assembler within the current distribution
        subset = global_pipeline_metrics_df[(global_pipeline_metrics_df.assembler == assembler) & (global_pipeline_metrics_df.distribution == distribution)]
        row={'Assembler': assembler, 
             'Type': subset.type.unique()[0],
             'Algorithm': global_pipeline_metrics_df[global_pipeline_metrics_df.assembler == assembler].algorythm.unique()[0]}
        summary=subset.describe()
        for column in summary.columns:
            mean = summary.loc['mean',column]
            minimum = summary.loc['min',column]
            maximum = summary.loc['max',column]
            # Report each metric as "mean [min;max]"
            row[column] = "{} [{};{}]".format(round(mean, 2),int(minimum), int(maximum))
        rows.append(row)
    # DataFrame.append was removed in pandas 2.0, so the table is built from the collected rows
    stats_per_distribution = pd.DataFrame(rows).set_index('Assembler')
    stats_per_distribution.to_csv("Tables/Results/{} - Global metrics per assembler.csv".format(distribution))
    print(distribution)
    display(stats_per_distribution)

## Reference-based metrics

### Load data

In [None]:
report_glob = glob.glob('../Results/*/*/report/pipeline_report_tables.json')
# Collect one dict per (run, sample, reference, assembler) entry and build the DataFrame once.
# DataFrame.append was removed in pandas 2.0 and is slow when called row by row.
reference_rows = []

for pipeline_report_file in report_glob:
    report_file_name = pipeline_report_file.split('/')[-1]
    stats_run = pipeline_report_file.split('/')[-3]  # run directory name
    print('Processing {0} data from {1}...'.format(report_file_name, stats_run))

    with open(pipeline_report_file) as _fh:
        json_report = json.load(_fh)

    for sample in json_report.keys():
        for reference, data in json_report[sample]['ReferenceTables'].items():
            for row in data:
                for item in row:
                    reference_rows.append({'run': stats_run,
                                           'sample': sample,
                                           'assembler': item['assembler'],
                                           'reference': reference,
                                           'LSA': item['contiguity'],
                                           'breadth_of_coverage': item['breadth_of_coverage'],
                                           'multiplicity': item['multiplicity'],
                                           'validity': item['validity'],
                                           'parsimony': item['parsimony'],
                                           'identity': item['identity'],
                                           'lowest_identity': item['lowest_identity'],
                                           'L90': item['L90'],
                                           'contigs': item['aligned_contigs'],
                                           'NA50': item['NA50'],
                                           'NG50': item['NG50'],
                                           'basepairs': item['aligned_basepairs'],
                                           'Ns': item['Ns'],
                                           'misassembled contigs': item['misassembled_contigs'],
                                           'misassembly events': item['misassembly_events']})

reference_pipeline_metrics_df = pd.DataFrame(reference_rows)

reference_pipeline_metrics_df['distribution'] = np.where(reference_pipeline_metrics_df['sample'].isin(log_distributed), 'Log', 'Even')
reference_pipeline_metrics_df['type'] = np.where(reference_pipeline_metrics_df['assembler'].isin(genomic_assemblers), 'Genomic', 'Metagenomic')
reference_pipeline_metrics_df['algorythm'] = np.where(reference_pipeline_metrics_df['assembler'].isin(single_kmer), 'Single k-mer De Bruijn graph', 'Multiple k-mer De Bruijn graph')
# Columns that must be numeric before summary statistics can be computed
numeric_columns = ['contigs', 'basepairs', 'L90', 'Ns', 'NA50', 'NG50',
                   'misassembled contigs', 'misassembly events',
                   'multiplicity', 'validity', 'parsimony', 'identity', 'lowest_identity']
reference_pipeline_metrics_df[numeric_columns] = reference_pipeline_metrics_df[numeric_columns].apply(pd.to_numeric)
display(reference_pipeline_metrics_df)

In [None]:
reference_pipeline_metrics_df.dtypes
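
The labelling step in the load cell relies on lists defined earlier in the notebook (`log_distributed`, `genomic_assemblers`, `single_kmer`). As a rough, self-contained illustration of how `np.where` with `isin` assigns the labels and how `pd.to_numeric` fixes string-typed metrics, consider the toy example below. The sample and assembler names and the `_toy` lists are invented for the sketch and do not correspond to the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical membership lists, standing in for the ones defined earlier
log_distributed_toy = ["sample_log"]
genomic_assemblers_toy = ["asm_genomic"]

toy_reference_df = pd.DataFrame({
    "sample": ["sample_log", "sample_even"],
    "assembler": ["asm_genomic", "asm_meta"],
    "contigs": ["12", "7"],  # metrics stored as strings, to show the conversion
})

# Rows whose value is in the list get the first label, all others the second
toy_reference_df["distribution"] = np.where(
    toy_reference_df["sample"].isin(log_distributed_toy), "Log", "Even")
toy_reference_df["type"] = np.where(
    toy_reference_df["assembler"].isin(genomic_assemblers_toy), "Genomic", "Metagenomic")

# Convert the metric columns so that describe() includes them
toy_numeric_columns = ["contigs"]
toy_reference_df[toy_numeric_columns] = toy_reference_df[toy_numeric_columns].apply(pd.to_numeric)

print(toy_reference_df.dtypes)
```

Because `np.where` only distinguishes "in the list" from "everything else", any assembler missing from the list is silently labelled with the second value, so the lists need to be kept complete.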

### Save data

In [None]:
reference_pipeline_metrics_df.to_csv("Tables/Results/Reference metrics.csv")

### Reference statistics per assembler type

In [None]:
# Summarise every numeric reference metric as "mean [min;max]" for each assembler type.
# Note: int() truncates the minimum and maximum, so ratio metrics lose their decimals.
rows = []
for assembly_type in reference_pipeline_metrics_df.type.unique():
    row = {'Assembler Type': assembly_type}
    summary = reference_pipeline_metrics_df[reference_pipeline_metrics_df.type == assembly_type].describe()
    for column in summary.columns:
        mean = summary.loc['mean', column]
        minimum = summary.loc['min', column]
        maximum = summary.loc['max', column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
ref_stats_per_assembler_type = pd.DataFrame(rows).set_index('Assembler Type')
ref_stats_per_assembler_type.to_csv("Tables/Results/Reference metrics per assembler type.csv")

In [None]:
ref_stats_per_assembler_type

### Reference stats per assembly algorithm

In [None]:
# Summarise every numeric reference metric as "mean [min;max]" for each assembly algorithm.
rows = []
for assembly_algo in reference_pipeline_metrics_df.algorythm.unique():
    row = {'Assembler Algorithm': assembly_algo}
    summary = reference_pipeline_metrics_df[reference_pipeline_metrics_df.algorythm == assembly_algo].describe()
    for column in summary.columns:
        mean = summary.loc['mean', column]
        minimum = summary.loc['min', column]
        maximum = summary.loc['max', column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
ref_stats_per_assembler_algo = pd.DataFrame(rows).set_index('Assembler Algorithm')
ref_stats_per_assembler_algo.to_csv("Tables/Results/Reference metrics per assembler algorithm.csv")

In [None]:
ref_stats_per_assembler_algo

### Reference stats per assembler

In [None]:
# One "mean [min;max]" row per assembler, across all samples and reference replicons.
rows = []
for assembler in reference_pipeline_metrics_df.assembler.unique():
    assembler_df = reference_pipeline_metrics_df[reference_pipeline_metrics_df.assembler == assembler]
    row = {'Assembler': assembler,
           'Type': assembler_df.type.unique()[0],
           'Algorithm': assembler_df.algorythm.unique()[0]}
    summary = assembler_df.describe()
    for column in summary.columns:
        mean = summary.loc['mean', column]
        minimum = summary.loc['min', column]
        maximum = summary.loc['max', column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
ref_stats_per_assembler = pd.DataFrame(rows).set_index('Assembler')
ref_stats_per_assembler.to_csv("Tables/Results/Reference metrics per assembler.csv")

In [None]:
ref_stats_per_assembler

### Reference stats per reference replicon

#### Per assembler for each reference replicon

In [None]:
reference_pipeline_metrics_df.reference.unique()

In [None]:
# One table per reference replicon, with one "mean [min;max]" row per assembler.
for reference in reference_pipeline_metrics_df.reference.unique():
    is_reference = reference_pipeline_metrics_df.reference == reference
    rows = []
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        is_assembler = reference_pipeline_metrics_df.assembler == assembler
        row = {'Assembler': assembler,
               'Type': reference_pipeline_metrics_df[is_assembler].type.unique()[0],
               'Algorithm': reference_pipeline_metrics_df[is_assembler].algorythm.unique()[0]}
        summary = reference_pipeline_metrics_df[is_assembler & is_reference].describe()
        for column in summary.columns:
            mean = summary.loc['mean', column]
            minimum = summary.loc['min', column]
            maximum = summary.loc['max', column]
            row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
        rows.append(row)
    # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
    ref_stats_per_reference_per_assembler = pd.DataFrame(rows).set_index('Assembler')
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(reference))
    print(reference)
    display(ref_stats_per_reference_per_assembler)

#### Per reference replicon for all assemblers (Even distribution only)

In [None]:
# One "mean [min;max]" row per reference replicon, pooling all assemblers.
# Only samples with an even abundance distribution are included.
rows = []
is_even = reference_pipeline_metrics_df.distribution == 'Even'
for reference in reference_pipeline_metrics_df.reference.unique():
    row = {'Reference': reference}
    summary = reference_pipeline_metrics_df[(reference_pipeline_metrics_df.reference == reference) & is_even].describe()
    for column in summary.columns:
        mean = summary.loc['mean', column]
        minimum = summary.loc['min', column]
        maximum = summary.loc['max', column]
        row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
    rows.append(row)
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
ref_stats_per_reference_all_assemblers = pd.DataFrame(rows).set_index('Reference')
ref_stats_per_reference_all_assemblers.to_csv("Tables/Results/Reference metrics per replicon.csv")
display(ref_stats_per_reference_all_assemblers)

### Reference stats per distribution

In [None]:
# One table per abundance distribution, with one "mean [min;max]" row per assembler.
for distribution in reference_pipeline_metrics_df.distribution.unique():
    in_distribution = reference_pipeline_metrics_df.distribution == distribution
    rows = []
    for assembler in reference_pipeline_metrics_df.assembler.unique():
        is_assembler = reference_pipeline_metrics_df.assembler == assembler
        row = {'Assembler': assembler,
               'Type': reference_pipeline_metrics_df[is_assembler].type.unique()[0],
               'Algorithm': reference_pipeline_metrics_df[is_assembler].algorythm.unique()[0]}
        summary = reference_pipeline_metrics_df[is_assembler & in_distribution].describe()
        for column in summary.columns:
            mean = summary.loc['mean', column]
            minimum = summary.loc['min', column]
            maximum = summary.loc['max', column]
            row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
        rows.append(row)
    # Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
    ref_stats_per_reference_per_assembler = pd.DataFrame(rows).set_index('Assembler')
    ref_stats_per_reference_per_assembler.to_csv("Tables/Results/{} - Reference metrics per assembler.csv".format(distribution))
    print(distribution)
    display(ref_stats_per_reference_per_assembler)

### Genomes vs Plasmids

In [None]:
# Compare chromosomal ("Genome") and plasmid replicons. A reference is treated as a
# plasmid when its name contains "plasmid".
is_plasmid = reference_pipeline_metrics_df['reference'].str.contains("plasmid")
rows = []

# Genome: each metric is reported as mean(± standard deviation)
row = {'Replicon': "Genome"}
summary = reference_pipeline_metrics_df[~is_plasmid].describe()
for column in summary.columns:
    mean = summary.loc['mean', column]
    std = summary.loc['std', column]
    row[column] = "{}(± {})".format(round(mean, 2), round(std, 2))
rows.append(row)

# Plasmid: each metric is reported as mean [min;max], unlike the Genome row above
row = {'Replicon': "Plasmid"}
summary = reference_pipeline_metrics_df[is_plasmid].describe()
for column in summary.columns:
    mean = summary.loc['mean', column]
    minimum = summary.loc['min', column]
    maximum = summary.loc['max', column]
    row[column] = "{} [{};{}]".format(round(mean, 2), int(minimum), int(maximum))
rows.append(row)

# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
ref_stats_per_replicon_type = pd.DataFrame(rows).set_index('Replicon')
ref_stats_per_replicon_type.to_csv("Tables/Results/Reference metrics per replicon type.csv")
display(ref_stats_per_replicon_type)
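
The genome/plasmid split above depends on the reference name containing the substring "plasmid". The sketch below shows the same classification on toy data, using `groupby` with `agg` to obtain the mean and standard deviation for both groups in one step. The reference names and contig counts are invented for illustration, and applying the mean(± std) format to both groups is a choice made for this sketch, not the behaviour of the cell above.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical reference names and contig counts
toy_replicons = pd.DataFrame({
    "reference": ["Species_A_chromosome", "Species_A_plasmid_1",
                  "Species_B_plasmid_2", "Species_B_chromosome"],
    "contigs": [4, 1, 3, 8],
})

# A reference is labelled a plasmid when its name contains "plasmid"
toy_replicons["replicon"] = np.where(
    toy_replicons["reference"].str.contains("plasmid"), "Plasmid", "Genome")

# Mean and sample standard deviation of the metric per replicon type
toy_stats = toy_replicons.groupby("replicon")["contigs"].agg(["mean", "std"])

# Same "mean(± std)" format as the Genome row in the cell above
toy_formatted = toy_stats.apply(
    lambda r: "{}(± {})".format(round(float(r["mean"]), 2), round(float(r["std"]), 2)),
    axis=1,
)
print(toy_formatted)
```

The substring match is case-sensitive, so a reference named with "Plasmid" would be counted as a genome unless `case=False` is passed to `str.contains`.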