# Creating a Big Table with Quality Metrics

The purpose of this notebook is to create one big table per dataset (Kaloevig and Loegten) that combines the bin names with the following quality metrics:
1. Completeness
2. Contamination
3. N50
4. L50
5. \# contigs
6. Largest contig
7. Total length

This table will be useful when we have to prioritize which taxa to look into: for example, bins with higher completeness and lower contamination are better candidates for further analysis.
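
As a rough sketch of that prioritization, the cell below filters and ranks a small toy table. The bin names, metric values, and the two cutoffs (`min_completeness`, `max_contamination`) are assumptions made up for illustration; they are not taken from the Kaloevig or Loegten data, and the appropriate thresholds depend on the analysis.

```python
import pandas as pd

# Toy data, invented for illustration only (not real CheckM output)
toy_quality_table = pd.DataFrame(
    {
        "Bin Id": ["bin_1", "bin_2", "bin_3"],
        "Completeness": [98.2, 75.0, 91.5],
        "Contamination": [1.1, 0.5, 8.3],
    }
)

# Hypothetical cutoffs; adjust to the needs of the project
min_completeness = 90.0
max_contamination = 5.0

# Keep only bins that pass both cutoffs
high_quality = toy_quality_table[
    (toy_quality_table["Completeness"] >= min_completeness)
    & (toy_quality_table["Contamination"] <= max_contamination)
]

# Rank all bins: most complete first, ties broken by lowest contamination
ranked = toy_quality_table.sort_values(
    ["Completeness", "Contamination"], ascending=[False, True]
)

print(high_quality)
print(ranked)
```

The same two steps can be applied to the merged tables built below, since they carry the same `Completeness` and `Contamination` columns.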

In [1]:
import pandas as pd

# Relevant columns
checkm_cols = ["Bin Id", "Completeness", "Contamination"]
quast_cols = ["Assembly", "# contigs", "Largest contig", "Total length", "N50", "L50"]

# Open Kaloevig CheckM and Quast datasets
kaloevig_checkm = pd.read_table(
    "results_final_kaloevig_bins_checkm.tsv", usecols=checkm_cols
)
kaloevig_quast = pd.read_table("report_kaloevig_quast_T.tsv", usecols=quast_cols)

# Open Loegten CheckM and Quast datasets
loegten_checkm = pd.read_table(
    "results_final_loegten_bins_checkm.tsv", usecols=checkm_cols
)
loegten_quast = pd.read_table("report_loegten_quast_T.tsv", usecols=quast_cols)

# Rename Quast Assembly column to Bin Id to match with CheckM datasets
kaloevig_quast = kaloevig_quast.rename(columns={"Assembly": "Bin Id"})
loegten_quast = loegten_quast.rename(columns={"Assembly": "Bin Id"})

In [2]:
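### Merge datasets ###
# Note: merge() defaults to an inner join, so a bin that is missing from
# either the CheckM or the Quast table is dropped from the result.
# Kaloevig
kaloevig_quality_table = kaloevig_checkm.merge(kaloevig_quast, on="Bin Id")

# Loegten
loegten_quality_table = loegten_checkm.merge(loegten_quast, on="Bin Id")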

# Save datasets to csv files
kaloevig_quality_table.to_csv("kaloevig_quality_table.csv", index=False)
loegten_quality_table.to_csv("loegten_quality_table.csv", index=False)
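
Because the merge above keeps only bins present in both inputs, it can be worth checking whether any bin names fail to match between CheckM and Quast (for example, because of a differing file suffix in the name). The following is a minimal sketch on toy data; the frames `checkm_toy` and `quast_toy` and their values are hypothetical, and the check relies on the `how="outer"` and `indicator=True` options of `DataFrame.merge`.

```python
import pandas as pd

# Toy inputs, invented for illustration: bin_3 and bin_4 each appear in only one table
checkm_toy = pd.DataFrame(
    {
        "Bin Id": ["bin_1", "bin_2", "bin_3"],
        "Completeness": [98.2, 75.0, 91.5],
        "Contamination": [1.1, 0.5, 8.3],
    }
)
quast_toy = pd.DataFrame(
    {
        "Bin Id": ["bin_1", "bin_2", "bin_4"],
        "N50": [120000, 45000, 80000],
    }
)

# An outer join keeps every bin; the "_merge" column records where each row came from
outer = checkm_toy.merge(quast_toy, on="Bin Id", how="outer", indicator=True)

# Bins that an inner join would silently drop
only_in_checkm = sorted(outer.loc[outer["_merge"] == "left_only", "Bin Id"])
only_in_quast = sorted(outer.loc[outer["_merge"] == "right_only", "Bin Id"])

print("Only in CheckM:", only_in_checkm)
print("Only in Quast:", only_in_quast)
```

If both lists come back empty for the real tables, the inner join lost nothing and the saved CSV files contain every bin.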