# Compute Bin Abundances for Kalø Vig and Løgten

This notebook computes bin abundances for Kalø Vig and Løgten. Abundance is approximated by contig depth, as reported by `jgi_summarize_bam_contig_depths`, the depth-summary utility that ships with MetaBAT 2. For each bin, the reported abundance is the mean `totalAvgDepth` of all contigs assigned to that bin. The resulting table is then merged with a second table containing quality metrics.
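
As a minimal sketch of the calculation described above, the toy example below averages `totalAvgDepth` over the contigs of each bin. The contig names, bin labels, and depth values are invented for illustration; only the column names `contigName`, `totalAvgDepth`, `Bin Id`, and `AvgDepth` come from this notebook.

```python
import pandas as pd

# Hypothetical depth table (values are made up for illustration)
depths = pd.DataFrame(
    {
        "contigName": ["c1", "c2", "c3", "c4"],
        "totalAvgDepth": [10.0, 20.0, 4.0, 6.0],
    }
)

# Hypothetical bin membership: which contig belongs to which bin
membership = pd.DataFrame(
    {
        "contigName": ["c1", "c2", "c3", "c4"],
        "Bin Id": ["bin.1", "bin.1", "bin.2", "bin.2"],
    }
)

# Unweighted mean depth of the contigs in each bin
toy_abund = (
    membership.merge(depths, on="contigName")
    .groupby("Bin Id", as_index=False)["totalAvgDepth"]
    .mean()
    .rename(columns={"totalAvgDepth": "AvgDepth"})
)
```

Here `bin.1` gets the mean of 10 and 20, and `bin.2` the mean of 4 and 6. Every contig counts equally, regardless of its length.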

In [1]:
import os
import re
from pathlib import Path

import pandas as pd

In [2]:
# Open contig depth files generated by MetaBAT 2
kaloevig_contig_depth = pd.read_table(
    "kaloevig_bins_contig_depths/kaloevig_bam_contig_depths.tsv",
    usecols=["contigName", "totalAvgDepth"],
)
loegten_contig_depth = pd.read_table(
    "loegten_bins_contig_depths/loegten_bam_contig_depths.tsv",
    usecols=["contigName", "totalAvgDepth"],
)

In [3]:
# Drop the leading three characters ("S1C") so contig names match those in the bin files
kaloevig_contig_depth["contigName"] = kaloevig_contig_depth["contigName"].str[3:]
loegten_contig_depth["contigName"] = loegten_contig_depth["contigName"].str[3:]
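
Slicing with `.str[3:]` drops the first three characters of every name, whether or not they are `S1C`. A slightly more defensive variant, sketched below on made-up contig names, is `Series.str.removeprefix`, which strips the prefix only where it is present. This assumes pandas 1.4 or later.

```python
import pandas as pd

# Made-up contig names; the second one lacks the prefix
names = pd.Series(["S1C123", "456"])

# Positional slice: always drops three characters
sliced = names.str[3:]

# Prefix-aware: drops "S1C" only where the name starts with it
stripped = names.str.removeprefix("S1C")
```

With the slice, the unprefixed name `456` is reduced to an empty string, while `removeprefix` leaves it unchanged.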

In [4]:
# Rename .txt files to .tsv in both directories (only the extension changes)
for directory in ("kaloevig_contig_names", "loegten_contig_names"):
    # Materialize the listing first so renaming does not disturb the iteration
    for p in list(Path(directory).glob("*.txt")):
        p.rename(p.with_suffix(".tsv"))

In [5]:
def compute_depth(contigs_path, depth_file):
    """
    Take bin contigs from a path, merge them with a depth file,
    and return a dataframe with bin ids and their average depths (i.e., abundances).

    Parameters
    ----------
    contigs_path : str
        Directory with one tsv file per bin, each listing the contig ids of that bin.
    depth_file : pd.DataFrame
        DataFrame containing contig names and their total average depth.

    Returns
    -------
    pd.DataFrame
        DataFrame with bin ids and their mean contig depths (i.e., abundances).
    """
    # Pattern to extract bin names from filenames (dot escaped to match literally)
    pattern = re.compile(r".+?(?=\.fa_contig_names)")

    # Lists to store values for the dataframe
    bin_ids = []
    avg_depth = []

    # Sort the listing so the output row order is reproducible
    for tsv in sorted(os.listdir(contigs_path)):
        contig_names = pd.read_table(
            os.path.join(contigs_path, tsv), names=["contigName"]
        )

        # Inner merge on contig name keeps only contigs present in the depth table
        merged = contig_names.merge(depth_file, on="contigName")

        # Append values to lists that will be used to populate the dataframe
        bin_ids.append(pattern.match(tsv).group(0))
        avg_depth.append(merged["totalAvgDepth"].mean())

    # DataFrame to save bin names and average depth (i.e., abundance of the bin)
    return pd.DataFrame({"Bin Id": bin_ids, "AvgDepth": avg_depth})
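
To illustrate how the bin id is recovered from a file name, the sketch below applies a lazy match with a lookahead, writing the dot escaped so that `.fa_contig_names` matches literally. The file name `bin.7.fa_contig_names.tsv` is hypothetical, inferred only from the `.fa_contig_names` fragment in the pattern; the real files may be named differently.

```python
import re

# Lazy match of everything before the literal ".fa_contig_names"
pattern = re.compile(r".+?(?=\.fa_contig_names)")

# Hypothetical file name; real names in the contig directories may differ
filename = "bin.7.fa_contig_names.tsv"

# Guard against file names that do not follow the expected scheme
match = pattern.match(filename)
bin_id = match.group(0) if match else None
```

Checking for a failed match avoids an `AttributeError` on `.group(0)` if a stray file that does not follow the naming scheme sits in the directory.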

In [6]:
kaloevig_abund = compute_depth("kaloevig_contig_names/", kaloevig_contig_depth)
loegten_abund = compute_depth("loegten_contig_names/", loegten_contig_depth)
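
The mean used here is unweighted, so a short contig influences the bin abundance as much as a long one. One possible alternative, not used in this notebook, is a length-weighted mean. The sketch below assumes a `contigLen` column, which the MetaBAT 2 depth table provides but which is not loaded above; the lengths and depths are invented.

```python
import pandas as pd

# Hypothetical contigs of a single bin (values are made up)
toy = pd.DataFrame(
    {
        "contigName": ["c1", "c2"],
        "contigLen": [9000, 1000],      # lengths in bp
        "totalAvgDepth": [10.0, 30.0],  # per-contig depth
    }
)

# Unweighted mean, as computed by compute_depth
unweighted = toy["totalAvgDepth"].mean()

# Length-weighted mean: longer contigs contribute proportionally more
weighted = (toy["totalAvgDepth"] * toy["contigLen"]).sum() / toy["contigLen"].sum()
```

In this toy case the short, deeply covered contig pulls the unweighted mean well above the length-weighted one.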

Next, merge the abundance tables with the full quality tables.

In [7]:
# Open quality tables
kaloevig_quality = pd.read_csv(
    "../../../taxonomy/results/2022-04-06/kaloevig_taxa_quality_table.csv"
)
loegten_quality = pd.read_csv(
    "../../../taxonomy/results/2022-04-06/loegten_taxa_quality_table.csv"
)

# Merge on the columns shared by both tables (inner join, the pandas default)
kaloevig_merged = kaloevig_quality.merge(kaloevig_abund)
loegten_merged = loegten_quality.merge(loegten_abund)

# Save to csv files
kaloevig_merged.to_csv("kaloevig_taxa_quality_table.csv", index=False)
loegten_merged.to_csv("loegten_taxa_quality_table.csv", index=False)
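
`DataFrame.merge` performs an inner join by default, so bins that appear in only one of the two tables are dropped without warning. As a hedged sketch, the toy example below uses `how="outer"` with `indicator=True` to list such bins before saving. The tables and the `Completeness` column are invented, and `Bin Id` is assumed to be the column shared by the quality and abundance tables.

```python
import pandas as pd

# Invented stand-ins for a quality table and an abundance table
quality = pd.DataFrame({"Bin Id": ["bin.1", "bin.2"], "Completeness": [95.0, 80.0]})
abund = pd.DataFrame({"Bin Id": ["bin.2", "bin.3"], "AvgDepth": [5.0, 12.0]})

# Outer join with an indicator column recording where each row came from
check = quality.merge(abund, on="Bin Id", how="outer", indicator=True)

# Rows found in only one table would be lost by the default inner join
unmatched = check.loc[check["_merge"] != "both", ["Bin Id", "_merge"]]
```

An empty `unmatched` table would indicate that every bin has both quality metrics and an abundance value.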