# Generate df_stats

In this notebook, I will collect the general statistics reported in the manuscript, from various sources.

Important note: for every % of reads displayed here, the percentage is calculated using the input number of reads for that given process. For example, imagine we start with 1000 reads. Then, 100 reads are filtered out due to some process, leaving 900. The survival rate is 90%. Then, these 900 reads are piped to the next process, which filters out another 200 reads. The survival rate is now 700/900 = 77.77%, and NOT 700/1000 = 70%!  
  
The reason for this choice is simple: It will be easier to isolate and compare a specific percentage between all methods without other filtering steps obfuscating the differences. If one is interested in the total number of reads surviving several steps, it's simply an issue of chaining each process's filtering rates together.

In [1]:
import pandas as pd
import polars as pl
import glob
import numpy as np
import os
import pickle

%load_ext lab_black

In [2]:
verbose = False

In [3]:
downsample_rate = "fixedcells"

I start by initializing a large data frame that will have all the stats.

In [4]:
samples = [
    x.split("/")[-1].split("__")[0]
    for x in sorted(glob.glob("../1_data_repository/libds_fastq/*R1*"))
]

In [5]:
if verbose == True:
    print(samples)

In [6]:
# df = pd.read_csv(samples)  # read the metadata file to know exactly which files were ran in the pipeline
# df_sample_names = df["sample_name"]
# if os.path.exists("fixedcells_general_statistics.tsv"):
#     df_stats = pd.read_csv("fixedcells_general_statistics.tsv", sep="\t", index_col=0)
# else:
df_stats = pd.DataFrame(index=samples)

Initialize some metadata

In [7]:
centre_dict = {
    "BIO": "BioRad",
    "BRO": "Broad",
    "CNA": "CNAG",
    "HAR": "Harvard",
    "MDC": "Max Delbrück Center",
    "SAN": "Wellcome Sanger Institute",
    "STA": "Stanford",
    "TXG": "10x Genomics",
    "UCS": "UCSF",
    "VIB": "VIB",
    "EPF": "EPFL",
    "OHS": "Oregon Health & Science University",
}

sample_id_ultrashort_alias_dict = {
    "BIO_ddseq_1": "ddS Bi1",
    "BIO_ddseq_2": "ddS Bi2",
    "BIO_ddseq_3": "ddS Bi3",
    "BIO_ddseq_4": "ddS Bi4",
    "BRO_mtscatac_1": "mt* Br1",
    "BRO_mtscatac_2": "mt* Br2",
    "CNA_10xmultiome_1": "MO C1",
    "CNA_10xmultiome_2": "MO C2",
    "CNA_10xv11_1": "v1.1 C1",
    "CNA_10xv11_2": "v1.1 C2",
    "CNA_10xv11_3": "v1.1 C3",
    "CNA_10xv11_4": "v1.1c C1",
    "CNA_10xv11_5": "v1.1c C2",
    "CNA_10xv2_1": "v2 C1",
    "CNA_10xv2_2": "v2 C2",
    "CNA_hydrop_1": "Hy C1",
    "CNA_hydrop_2": "Hy C2",
    "CNA_hydrop_3": "Hy C3",
    "CNA_mtscatac_1": "mt C1",
    "CNA_mtscatac_2": "mt C2",
    "EPF_hydrop_1": "Hy E1",
    "EPF_hydrop_2": "Hy E2",
    "EPF_hydrop_3": "Hy E3",
    "EPF_hydrop_4": "Hy E4",
    "HAR_ddseq_1": "ddS H1",
    "HAR_ddseq_2": "ddS H2",
    "MDC_mtscatac_1": "mt M1",
    "MDC_mtscatac_2": "mt M2",
    "OHS_s3atac_1": "s3 O1",
    "OHS_s3atac_2": "s3 O2",
    "SAN_10xmultiome_1": "MO Sa1",
    "SAN_10xmultiome_2": "MO Sa2",
    "STA_10xv11_1": "v1.1 St1",
    "STA_10xv11_2": "v1.1 St2",
    "TXG_10xv11_1": "v1.1 T1",
    "TXG_10xv2_1": "v2 T1",
    "TXG_10xv2_2": "v2 T2",
    "UCS_ddseq_1": "ddS U1",
    "UCS_ddseq_2": "ddS U2",
    "VIB_10xmultiome_1": "MO V1",
    "VIB_10xmultiome_2": "MO V2",
    "VIB_10xv1_1": "v1 V1",
    "VIB_10xv1_2": "v1 V2",
    "VIB_10xv2_1": "v2 V1",
    "VIB_10xv2_2": "v2 V2",
    "VIB_hydrop_1": "Hy V1",
    "VIB_hydrop_2": "Hy V2",
    "VIB_hydrop_11": "Hy V1",
    "VIB_hydrop_12": "Hy V1",
    "VIB_hydrop_21": "Hy V2",
    "VIB_hydrop_22": "Hy V2",
}

In [8]:
tech_dict = {
    "10xmultiome": "10x Multiome",
    "10xv1": "10x v1",
    "10xv11": "10x v1.1",
    "10xv2": "10x v2",
    "ddseq": "ddSEQ SureCell",
    "hydrop": "HyDrop",
    "mtscatac": "mtscATAC-seq",
    "s3atac": "s3-ATAC",
}

tech_alias_dict = {
    "10xmultiome": "10x Multiome",
    "10xv1": "10x v1",
    "10xv11": "10x v1.1",
    "10xv2": "10x v2",
    "ddseq": "Bio-Rad ddSEQ SureCell",
    "hydrop": "HyDrop",
    "mtscatac": "mtscATAC-seq",
    "s3atac": "s3-ATAC",
}

In [9]:
df_stats["short_identifier"] = [
    sample_id_ultrashort_alias_dict[x] for x in df_stats.index
]

In [10]:
df_stats["centre"] = [
    centre_dict[x] for x in np.array([x.split("_") for x in list(df_stats.index)])[:, 0]
]

In [11]:
df_stats["technology"] = [
    tech_alias_dict[x]
    for x in np.array([x.split("_") for x in list(df_stats.index)])[:, 1]
]

## Determine sequencer
I use this script: https://github.com/10XGenomics/supernova/blob/master/tenkit/lib/python/tenkit/illumina_instrument.py  
Very cool! Especially the in-built dict is nice:

```
# dictionary of instrument id regex: [platform(s)]
InstrumentIDs = {"HWI-M[0-9]{4}$" : ["MiSeq"],
        "HWUSI" : ["Genome Analyzer IIx"],
        "M[0-9]{5}$" : ["MiSeq"],
        "HWI-C[0-9]{5}$" : ["HiSeq 1500"],
        "C[0-9]{5}$" : ["HiSeq 1500"],
        "HWI-D[0-9]{5}$" : ["HiSeq 2500"],
        "D[0-9]{5}$" : ["HiSeq 2500"],
        "J[0-9]{5}$" : ["HiSeq 3000"],
        "K[0-9]{5}$" : ["HiSeq 3000","HiSeq 4000"],
        "E[0-9]{5}$" : ["HiSeq X"],
        "NB[0-9]{6}$": ["NextSeq"],
        "NS[0-9]{6}$" : ["NextSeq"],
        "MN[0-9]{5}$" : ["MiniSeq"]}

# dictionary of flow cell id regex: ([platform(s)], flow cell version and yeild)
FCIDs = {"C[A-Z,0-9]{4}ANXX$" : (["HiSeq 1500", "HiSeq 2000", "HiSeq 2500"], "High Output (8-lane) v4 flow cell"),
         "C[A-Z,0-9]{4}ACXX$" : (["HiSeq 1000", "HiSeq 1500", "HiSeq 2000", "HiSeq 2500"], "High Output (8-lane) v3 flow cell"),
         "H[A-Z,0-9]{4}ADXX$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}BCXX$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v2 flow cell"),
         "H[A-Z,0-9]{4}BCXY$" : (["HiSeq 1500", "HiSeq 2500"], "Rapid Run (2-lane) v2 flow cell"),
         "H[A-Z,0-9]{4}BBXX$" : (["HiSeq 4000"], "(8-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}BBXY$" : (["HiSeq 4000"], "(8-lane) v1 flow cell"),
         "H[A-Z,0-9]{4}CCXX$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}CCXY$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}ALXX$" : (["HiSeq X"], "(8-lane) flow cell"),
         "H[A-Z,0-9]{4}BGXX$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}BGXY$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}BGX2$" : (["NextSeq"], "High output flow cell"),
         "H[A-Z,0-9]{4}AFXX$" : (["NextSeq"], "Mid output flow cell"),
         "A[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq flow cell"),
         "B[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq flow cell"),
         "D[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq nano flow cell"),
         "G[A-Z,0-9]{4}$" : (["MiSeq"], "MiSeq micro flow cell"),
         "H[A-Z,0-9]{4}DMXX$" : (["NovaSeq"], "S2 flow cell")}

```

In [12]:
import gzip
import re

# dictionary of instrument id regex: [platform(s)]
InstrumentIDs = {
    "HWI-M[0-9]{4}$": ["MiSeq"],
    "HWUSI": ["Genome Analyzer IIx"],
    "M[0-9]{5}$": ["MiSeq"],
    "HWI-C[0-9]{5}$": ["HiSeq 1500"],
    "C[0-9]{5}$": ["HiSeq 1500"],
    "HWI-D[0-9]{5}$": ["HiSeq 2500"],
    "D[0-9]{5}$": ["HiSeq 2500"],
    "J[0-9]{5}$": ["HiSeq 3000"],
    "K[0-9]{5}$": ["HiSeq 3000", "HiSeq 4000"],
    "E[0-9]{5}$": ["HiSeq X"],
    "NB[0-9]{6}$": ["NextSeq 500/550"],
    "NS[0-9]{6}$": ["NextSeq 500/550"],
    "MN[0-9]{5}$": ["MiniSeq"],
    "N[0-9]{5}$": ["NextSeq 500/550"],  # added since original was outdated
    "A[0-9]{5}$": ["NovaSeq 6000"],  # added since original was outdated
    "V[0-9]{5}$": ["NextSeq 2000"],  # added since original was outdated
    "VH[0-9]{5}$": ["NextSeq 2000"],  # added since original was outdated
}

# dictionary of flow cell id regex: ([platform(s)], flow cell version and yeild)
FCIDs = {
    "C[A-Z,0-9]{4}ANXX$": (
        ["HiSeq 1500", "HiSeq 2000", "HiSeq 2500"],
        "High Output (8-lane) v4 flow cell",
    ),
    "C[A-Z,0-9]{4}ACXX$": (
        ["HiSeq 1000", "HiSeq 1500", "HiSeq 2000", "HiSeq 2500"],
        "High Output (8-lane) v3 flow cell",
    ),
    "H[A-Z,0-9]{4}ADXX$": (
        ["HiSeq 1500", "HiSeq 2500"],
        "Rapid Run (2-lane) v1 flow cell",
    ),
    "H[A-Z,0-9]{4}BCXX$": (
        ["HiSeq 1500", "HiSeq 2500"],
        "Rapid Run (2-lane) v2 flow cell",
    ),
    "H[A-Z,0-9]{4}BCXY$": (
        ["HiSeq 1500", "HiSeq 2500"],
        "Rapid Run (2-lane) v2 flow cell",
    ),
    "H[A-Z,0-9]{4}BBXX$": (["HiSeq 4000"], "(8-lane) v1 flow cell"),
    "H[A-Z,0-9]{4}BBXY$": (["HiSeq 4000"], "(8-lane) v1 flow cell"),
    "H[A-Z,0-9]{4}CCXX$": (["HiSeq X"], "(8-lane) flow cell"),
    "H[A-Z,0-9]{4}CCXY$": (["HiSeq X"], "(8-lane) flow cell"),
    "H[A-Z,0-9]{4}ALXX$": (["HiSeq X"], "(8-lane) flow cell"),
    "H[A-Z,0-9]{4}BGXX$": (["NextSeq"], "High output flow cell"),
    "H[A-Z,0-9]{4}BGXY$": (["NextSeq"], "High output flow cell"),
    "H[A-Z,0-9]{4}BGX2$": (["NextSeq"], "High output flow cell"),
    "H[A-Z,0-9]{4}AFXX$": (["NextSeq"], "Mid output flow cell"),
    "A[A-Z,0-9]{4}$": (["MiSeq"], "MiSeq flow cell"),
    "B[A-Z,0-9]{4}$": (["MiSeq"], "MiSeq flow cell"),
    "D[A-Z,0-9]{4}$": (["MiSeq"], "MiSeq nano flow cell"),
    "G[A-Z,0-9]{4}$": (["MiSeq"], "MiSeq micro flow cell"),
    "H[A-Z,0-9]{4}DMXX$": (["NovaSeq"], "S2 flow cell"),
}


SUPERNOVA_PLATFORM_BLACKLIST = ["HiSeq 3000", "HiSeq 4000", "HiSeq 3000/4000"]

_upgrade_set1 = set(["HiSeq 2000", "HiSeq 2500"])
_upgrade_set2 = set(["HiSeq 1500", "HiSeq 2500"])
_upgrade_set3 = set(["HiSeq 3000", "HiSeq 4000"])
_upgrade_set4 = set(["HiSeq 1000", "HiSeq 1500"])
_upgrade_set5 = set(["HiSeq 1000", "HiSeq 2000"])

fail_msg = "Cannot determine sequencing platform"
success_msg_template = "(likelihood: {})"
null_template = "{}"

# do intersection of lists
def intersect(a, b):
    return list(set(a) & set(b))


def union(a, b):
    return list(set(a) | set(b))


# extract ids from reads
def parse_readhead(head):
    fields = head.strip("\n").split(":")

    # if ill-formatted/modified non-standard header, return cry-face
    if len(fields) < 3:
        return -1, -1
    iid = fields[0][1:]
    fcid = fields[2]
    return iid, fcid


# infer sequencer from ids from single fastq
def infer_sequencer(iid, fcid):
    seq_by_iid = []
    for key in InstrumentIDs:
        if re.search(key, iid):
            seq_by_iid += InstrumentIDs[key]

    seq_by_fcid = []
    for key in FCIDs:
        if re.search(key, fcid):
            seq_by_fcid += FCIDs[key][0]

    sequencers = []

    # if both empty
    if not seq_by_iid and not seq_by_fcid:
        return sequencers, "fail"

    # if one non-empty
    if not seq_by_iid:
        return seq_by_fcid, "likely"
    if not seq_by_fcid:
        return seq_by_iid, "likely"

    # if neither empty
    sequencers = intersect(seq_by_iid, seq_by_fcid)
    if sequencers:
        return sequencers, "high"
    # this should not happen, but if both ids indicate different sequencers..
    else:
        sequencers = union(seq_by_iid, seq_by_fcid)
        return sequencers, "uncertain"


# process the flag and detected sequencer(s) for single fastq
def infer_sequencer_with_message(iid, fcid):
    sequencers, flag = infer_sequencer(iid, fcid)
    if not sequencers:
        return [""], fail_msg

    if flag == "high":
        msg_template = null_template
    else:
        msg_template = success_msg_template

    if set(sequencers) <= _upgrade_set1:
        return ["HiSeq2000/2500"], msg_template.format(flag)
    if set(sequencers) <= _upgrade_set2:
        return ["HiSeq1500/2500"], msg_template.format(flag)
    if set(sequencers) <= _upgrade_set3:
        return ["HiSeq3000/4000"], msg_template.format(flag)
    return sequencers, msg_template.format(flag)


def test_sequencer_detection():
    Samples = [
        "@ST-E00314:132:HLCJTCCXX:6:2206:31213:47966 1:N:0",
        "@D00209:258:CACDKANXX:6:2216:1260:1978 1:N:0:CGCAGTT",
        "@D00209:258:CACDKANXX:6:2216:1586:1970 1:N:0:GAGCAAG",
        "@A00311:74:HMLK5DMXX:1:1101:2013:1000 3:N:0:ACTCAGAC",
    ]

    seqrs = set()
    for head in Samples:
        iid, fcid = parse_readhead(head)
        seqr, msg = infer_sequencer_with_message(iid, fcid)
        for sr in seqr:
            signal = (sr, msg)
        seqrs.add(signal)

    print(seqrs)


def sequencer_detection_message(fastq_files):
    seqrs = set()
    # accumulate (sequencer, status) set
    for fastq in fastq_files:
        with gzip.open(fastq) as f:
            head = str(f.readline())
            # line = str(f.readline()
            # if len(line) > 0:
            #     if line[0] == "@":
            #         head = line
            #     else:
            #         print("Incorrectly formatted first read in FASTQ file: %s" % fastq)
            #         print(line)

        iid, fcid = parse_readhead(head)
        seqr, msg = infer_sequencer_with_message(iid, fcid)
        for sr in seqr:
            signal = (sr, msg)
        seqrs.add(signal)

    # get a list of sequencing platforms
    platforms = set()
    for platform, _ in seqrs:
        platforms.add(platform)
    sequencers = list(platforms)

    # if no sequencer detected at all
    message = ""
    fails = 0
    for platform, status in seqrs:
        if status == fail_msg:
            fails += 1
    if fails == len(seqrs):
        message = "could not detect the sequencing platform(s) used to generate the input FASTQ files"
        return message, sequencers

    # if partial or no detection failures
    if fails > 0:
        message = "could not detect the sequencing platform used to generate some of the input FASTQ files, "
    message += "detected the following sequencing platforms- "
    for platform, status in seqrs:
        if status != fail_msg:
            message += platform + " " + status + ", "
    message = message.strip(", ")
    return message, sequencers

In [13]:
sequencers_dict = {}
for file in sorted(list(glob.glob("../1_data_repository/libds_fastq/*R2*.fastq.gz"))):
    filename = file.split("/")[-1]
    samplename = filename.split(".")[0].split("__")[0]
    if (
        samplename.split("_")[1] == "s3atac"
    ):  # s3atac fastqs have altered readname due to unidex preprocessing, the original fastqs show that they were sequenced on NS2000.
        sequencers = ["NovaSeq 6000"]
    else:
        message, sequencers = sequencer_detection_message([file])
    print(f"{filename}: {sequencers}")

    sequencers_dict[samplename] = sequencers[0]

BIO_ddseq_1__R2.LIBDS.fastq.gz: ['NovaSeq']
BIO_ddseq_2__R2.LIBDS.fastq.gz: ['NovaSeq']
BIO_ddseq_3__R2.LIBDS.fastq.gz: ['NextSeq 500/550']
BIO_ddseq_4__R2.LIBDS.fastq.gz: ['NextSeq 500/550']
BRO_mtscatac_1__R2.LIBDS.fastq.gz: ['NextSeq 500/550']
BRO_mtscatac_2__R2.LIBDS.fastq.gz: ['NextSeq 500/550']
CNA_10xmultiome_1__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xmultiome_2__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv11_1__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv11_2__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv11_3__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv11_4__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv11_5__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv2_1__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_10xv2_2__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_hydrop_1__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_hydrop_2__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_hydrop_3__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_mtscatac_1__R2.LIBDS.fastq.gz: ['NovaSeq 6000']
CNA_mtscatac_2__R2.LIBDS.fastq.gz: ['N

In [14]:
df_stats["sequencing_instrument"] = df_stats.index.map(sequencers_dict)

## Determine read counts

In [15]:
read_count_df = pd.DataFrame()
for readfile in sorted(glob.glob("../1_data_repository/R*_lengths.LIBDS.sorted.txt")):
    read = readfile.split("/")[-1].split("_")[0]
    df = pd.read_csv(readfile, sep="\t", header=None)
    df.index = [x.split("/")[-1].split("__")[0] for x in df[0]]
    df = df[1]
    read_count_df[read] = df

In [16]:
if verbose == True:
    print(read_count_df)

In [17]:
read_dict = {}
for line in read_count_df.index:
    if read_count_df.isna().loc[line]["R3"]:
        if read_count_df.loc[line]["R1"] == read_count_df.loc[line]["R2"]:
            print(f"{line} OK")
            read_dict[line] = read_count_df.loc[line]["R1"]
        else:
            print(f"{line} has read discrepancy!")
            print(f'\t{read_count_df.loc[line]["R1"]}')
            print(f'\t{read_count_df.loc[line]["R2"]}')
    else:
        if (
            read_count_df.loc[line]["R1"]
            == read_count_df.loc[line]["R2"]
            == read_count_df.loc[line]["R3"]
        ):
            print(f"{line} OK")
            read_dict[line] = read_count_df.loc[line]["R1"]
        else:
            print(f"{line} has read discrepancy!")
            print(f'\t{read_count_df.loc[line]["R1"]}')
            print(f'\t{read_count_df.loc[line]["R2"]}')
            print(f'\t{read_count_df.loc[line]["R3"]}')

BIO_ddseq_1 OK
BIO_ddseq_2 OK
BIO_ddseq_3 OK
BIO_ddseq_4 OK
BRO_mtscatac_1 OK
BRO_mtscatac_2 OK
CNA_10xmultiome_1 OK
CNA_10xmultiome_2 OK
CNA_10xv11_1 OK
CNA_10xv11_2 OK
CNA_10xv11_3 OK
CNA_10xv11_4 OK
CNA_10xv11_5 OK
CNA_10xv2_1 OK
CNA_10xv2_2 OK
CNA_hydrop_1 OK
CNA_hydrop_2 OK
CNA_hydrop_3 OK
CNA_mtscatac_1 OK
CNA_mtscatac_2 OK
EPF_hydrop_1 OK
EPF_hydrop_2 OK
EPF_hydrop_3 OK
EPF_hydrop_4 OK
HAR_ddseq_1 OK
HAR_ddseq_2 OK
MDC_mtscatac_1 OK
MDC_mtscatac_2 OK
OHS_s3atac_1 OK
OHS_s3atac_2 OK
SAN_10xmultiome_1 OK
SAN_10xmultiome_2 OK
STA_10xv11_1 OK
STA_10xv11_2 OK
TXG_10xv11_1 OK
TXG_10xv2_1 OK
TXG_10xv2_2 OK
UCS_ddseq_1 OK
UCS_ddseq_2 OK
VIB_10xmultiome_1 OK
VIB_10xmultiome_2 OK
VIB_10xv1_1 OK
VIB_10xv1_2 OK
VIB_10xv2_1 OK
VIB_10xv2_2 OK
VIB_hydrop_11 OK
VIB_hydrop_12 OK
VIB_hydrop_21 OK
VIB_hydrop_22 OK


In [18]:
df_stats["reads"] = df_stats.index.map(read_dict)

In [19]:
if verbose == True:
    print(df_stats)

# Determine cell counts

In [20]:
tsv_list = sorted(
    glob.glob(f"../{downsample_rate}_2_cistopic/selected_barcodes/*.RAW.txt")
)
cell_count_dict = {}
for tsv in tsv_list:
    sample = tsv.split("/")[-1].split(".")[0].split("_metadata_bc_df")[0]
    # print(sample)
    df = pd.read_csv(tsv, sep="\t", index_col=0)
    cell_count_dict[sample] = len(df)

df_stats["cells"] = df_stats.index.map(cell_count_dict)
df_stats["cells"]

BIO_ddseq_1           6359
BIO_ddseq_2           5159
BIO_ddseq_3           2801
BIO_ddseq_4           2649
BRO_mtscatac_1        3575
BRO_mtscatac_2        3398
CNA_10xmultiome_1     3536
CNA_10xmultiome_2     3122
CNA_10xv11_1          2733
CNA_10xv11_2          2785
CNA_10xv11_3          4603
CNA_10xv11_4           803
CNA_10xv11_5          1111
CNA_10xv2_1           4179
CNA_10xv2_2           5994
CNA_hydrop_1          1770
CNA_hydrop_2          2177
CNA_hydrop_3          1417
CNA_mtscatac_1        2633
CNA_mtscatac_2        1169
EPF_hydrop_1          3251
EPF_hydrop_2          3104
EPF_hydrop_3          2863
EPF_hydrop_4          2856
HAR_ddseq_1           4661
HAR_ddseq_2           5079
MDC_mtscatac_1        7990
MDC_mtscatac_2        5628
OHS_s3atac_1          3120
OHS_s3atac_2          1787
SAN_10xmultiome_1     3492
SAN_10xmultiome_2     4269
STA_10xv11_1           934
STA_10xv11_2          1571
TXG_10xv11_1          9933
TXG_10xv2_1           9977
TXG_10xv2_2          10005
U

Now, calculate the RPC:

In [21]:
df_stats["RPC"] = df_stats["reads"] / df_stats["cells"]

## Determine sequence lengths

In [22]:
## Determine average insert length

### Barcode correction stats

In [23]:
directory = "../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/barcode/"
for sample in df_stats.index:
    file = directory + sample + "_____R1.corrected.bc_stats.log"
    if os.path.exists(file):
        # print(f"{sample}: {file}")
        df = pd.read_csv(file, sep="\t\t|\t", engine="python", index_col=0, header=None)
        # print(df)
        if "ddseq" in sample:
            nreads = df.loc["nbr_reads:", 1]
            nbarcodes_total = df.loc[
                "nbr_reads_with_bc1_bc2_bc3_correct_or_correctable", 1
            ]
            percentage_correct_barcodes = nbarcodes_total / nreads * 100
        else:
            nreads = df.loc["nbr_reads:", 1]
            nbarcodes_total = df.loc["total_bc_found", 1]
            percentage_correct_barcodes = nbarcodes_total / nreads * 100

        if verbose == True:
            print(f"nreads: {nreads}")
            print(f"nbarcodes_total: {nbarcodes_total}")
            print(f"percentage_correct_barcodes: {percentage_correct_barcodes}")
            print("-------------------------------------\n")

        df_stats.loc[sample, "nreads"] = int(nreads)
        df_stats.loc[sample, "%_correct_barcodes"] = round(
            percentage_correct_barcodes, 2
        )
    else:
        print(f"{file} does not exist!")

In [24]:
df_stats

Unnamed: 0,short_identifier,centre,technology,sequencing_instrument,reads,cells,RPC,nreads,%_correct_barcodes
BIO_ddseq_1,ddS Bi1,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,259455742.0,6359,40801.343293,259455742.0,96.54
BIO_ddseq_2,ddS Bi2,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,210510517.0,5159,40804.519674,210510517.0,96.76
BIO_ddseq_3,ddS Bi3,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,114310431.0,2801,40810.578722,114310431.0,87.47
BIO_ddseq_4,ddS Bi4,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,108112191.0,2649,40812.454134,108112191.0,88.1
BRO_mtscatac_1,mt* Br1,Broad,mtscATAC-seq,NextSeq 500/550,145886207.0,3575,40807.330629,145886207.0,94.66
BRO_mtscatac_2,mt* Br2,Broad,mtscATAC-seq,NextSeq 500/550,138669154.0,3398,40809.050618,138669154.0,94.79
CNA_10xmultiome_1,MO C1,CNAG,10x Multiome,NovaSeq 6000,144290077.0,3536,40806.017251,144290077.0,98.31
CNA_10xmultiome_2,MO C2,CNAG,10x Multiome,NovaSeq 6000,127397732.0,3122,40806.44843,127397732.0,98.4
CNA_10xv11_1,v1.1 C1,CNAG,10x v1.1,NovaSeq 6000,111532272.0,2733,40809.46652,111532272.0,98.19
CNA_10xv11_2,v1.1 C2,CNAG,10x v1.1,NovaSeq 6000,113657151.0,2785,40810.467145,113657151.0,98.17


## Mapping statistics

In [25]:
directory = (
    "../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/"
)

for sample in df_stats.index:
    file = directory + sample + "_____R1.mapping_stats.tsv"
    if os.path.exists(file):
        print(f"{sample}: {file}")
        df = pd.read_csv(file, sep="\t", engine="python", index_col=0, header=0)
        if verbose == True:
            print(df.astype(int))
            print("\n")

        percent_mapq30 = (
            df.loc["Reads mapped with MAPQ>30:"] / df.loc["raw total sequences:"] * 100
        )
        avg_insert = df.loc["insert size average:"]
        avg_map_quality = df.loc["average quality:"]
        r1_length = df.loc["maximum first fragment length:"]
        r2_length = df.loc["maximum last fragment length:"]

        if verbose == True:

            print(f"read 1 length: {int(r1_length)}")
            print(f"read 2 length: {int(r2_length)}")
            print(f"average map quality: {round(avg_map_quality, 2)}")
            print(f"percent mapq30: {round(percent_mapq30, 2)}")
            print(f"insert size average: {avg_insert}")
            print("-------------------------------------\n")

        df_stats.loc[sample, "r1_length"] = int(r1_length)
        df_stats.loc[sample, "r2_length"] = int(r2_length)
        df_stats.loc[sample, "avg_insert_size"] = int(avg_insert)
        df_stats.loc[sample, "%_mapq30"] = round(percent_mapq30.iloc[0], 2)
        df_stats.loc[sample, "avg_map_quality"] = round(avg_map_quality.iloc[0], 2)
    elif verbose == True:
        print(f"{file}")

BIO_ddseq_1: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BIO_ddseq_1_____R1.mapping_stats.tsv
BIO_ddseq_2: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BIO_ddseq_2_____R1.mapping_stats.tsv
BIO_ddseq_3: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BIO_ddseq_3_____R1.mapping_stats.tsv
BIO_ddseq_4: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BIO_ddseq_4_____R1.mapping_stats.tsv
BRO_mtscatac_1: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BRO_mtscatac_1_____R1.mapping_stats.tsv
BRO_mtscatac_2: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/BRO_mtscatac_2_____R1.mapping_stats.tsv
CNA_10xmultiome_1: ../libds_1_vsn_preprocessing/libds_preprocessing_out/data/reports/mapping_stats/CNA_10xmultiome_1_____R1.mapping_stats.tsv
CNA_10xmultiome_2: ../libds_1_vsn_preprocessing/libds_preprocessin

In [26]:
for hydrop_number in ["VIB_hydrop_1", "VIB_hydrop_2"]:
    df_stats.loc[f"{hydrop_number}"] = df_stats.loc[f"{hydrop_number}1"]

    # additive
    for var in ["reads", "nreads", "cells"]:
        df_stats.at[f"{hydrop_number}", var] = (
            df_stats.loc[f"{hydrop_number}1"][var]
            + df_stats.loc[f"{hydrop_number}2"][var]
        )

    # weighted average
    weight_1 = df_stats.loc[f"{hydrop_number}1"]["reads"] / (
        df_stats.loc[f"{hydrop_number}1"]["reads"]
        + df_stats.loc[f"{hydrop_number}2"]["reads"]
    )
    weight_2 = df_stats.loc[f"{hydrop_number}2"]["reads"] / (
        df_stats.loc[f"{hydrop_number}1"]["reads"]
        + df_stats.loc[f"{hydrop_number}2"]["reads"]
    )

    for var in ["avg_insert_size", "avg_map_quality", "%_correct_barcodes", "%_mapq30"]:
        df_stats.at[f"{hydrop_number}", var] = (
            df_stats.loc[f"{hydrop_number}1"][var] * weight_1
            + df_stats.loc[f"{hydrop_number}2"][var] * weight_2
        )

    # special
    var = "RPC"
    df_stats.at[f"{hydrop_number}", var] = (
        df_stats.loc[f"{hydrop_number}"]["reads"]
        / df_stats.loc[f"{hydrop_number}"]["cells"]
    )

In [27]:
df_stats.columns

Index(['short_identifier', 'centre', 'technology', 'sequencing_instrument',
       'reads', 'cells', 'RPC', 'nreads', '%_correct_barcodes', 'r1_length',
       'r2_length', 'avg_insert_size', '%_mapq30', 'avg_map_quality'],
      dtype='object')

In [28]:
df_stats

Unnamed: 0,short_identifier,centre,technology,sequencing_instrument,reads,cells,RPC,nreads,%_correct_barcodes,r1_length,r2_length,avg_insert_size,%_mapq30,avg_map_quality
BIO_ddseq_1,ddS Bi1,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,259455742.0,6359,40801.343293,259455742.0,96.54,53.0,40.0,172.0,90.5,36.1
BIO_ddseq_2,ddS Bi2,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,210510517.0,5159,40804.519674,210510517.0,96.76,52.0,40.0,167.0,92.14,36.1
BIO_ddseq_3,ddS Bi3,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,114310431.0,2801,40810.578722,114310431.0,87.47,53.0,40.0,129.0,87.21,31.5
BIO_ddseq_4,ddS Bi4,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,108112191.0,2649,40812.454134,108112191.0,88.1,54.0,40.0,129.0,87.25,31.4
BRO_mtscatac_1,mt* Br1,Broad,mtscATAC-seq,NextSeq 500/550,145886207.0,3575,40807.330629,145886207.0,94.66,72.0,72.0,138.0,87.55,33.1
BRO_mtscatac_2,mt* Br2,Broad,mtscATAC-seq,NextSeq 500/550,138669154.0,3398,40809.050618,138669154.0,94.79,72.0,72.0,137.0,87.66,33.1
CNA_10xmultiome_1,MO C1,CNAG,10x Multiome,NovaSeq 6000,144290077.0,3536,40806.017251,144290077.0,98.31,50.0,49.0,160.0,89.65,36.1
CNA_10xmultiome_2,MO C2,CNAG,10x Multiome,NovaSeq 6000,127397732.0,3122,40806.44843,127397732.0,98.4,50.0,49.0,165.0,89.36,36.1
CNA_10xv11_1,v1.1 C1,CNAG,10x v1.1,NovaSeq 6000,111532272.0,2733,40809.46652,111532272.0,98.19,50.0,49.0,161.0,90.03,36.4
CNA_10xv11_2,v1.1 C2,CNAG,10x v1.1,NovaSeq 6000,113657151.0,2785,40810.467145,113657151.0,98.17,50.0,49.0,157.0,90.1,36.4


# single_cell_stats

In [29]:
metadata_path_dict = {
    x.split("/")[-1].split(f"__")[0].split(".")[0]: x
    for x in sorted(
        glob.glob(
            f"../{downsample_rate}_3_cistopic_consensus/cistopic_qc_out_CONSENSUS/*metadata*pkl"
        )
    )
}
if verbose:
    print(metadata_path_dict)

In [30]:
selected_cells_path_dict = {
    x.split("/")[-1].split(f"__")[0].split(".")[0]: x
    for x in sorted(
        glob.glob(f"../{downsample_rate}_3_cistopic_consensus/selected_barcodes/*.pkl")
    )
}
if verbose:
    print(selected_cells_path_dict)

In [31]:
df_merged = pd.DataFrame()
for sample in metadata_path_dict.keys():
    with open(metadata_path_dict[sample], "rb") as f:
        df = pickle.load(f)

    with open(selected_cells_path_dict[sample], "rb") as f:
        selected_barcodes = pickle.load(f)

    if downsample_rate == "fixedcells":
        selected_barcodes = [x.replace("FULL", "FIXEDCELLS") for x in selected_barcodes]

    df = df.loc[selected_barcodes]
    df_median = df.median()
    df_median.index = ["Median_" + x.lower() for x in df_median.index]
    df_median["total_nr_frag_in_selected_barcodes"] = sum(df["Total_nr_frag"])
    df_median["total_nr_unique_frag_in_selected_barcodes"] = sum(df["Unique_nr_frag"])
    df_median["total_nr_unique_frag_in_selected_barcodes_in_regions"] = sum(
        df["Unique_nr_frag_in_regions"]
    )
    df_median["n_barcodes_merged"] = len(
        [x for x in [x.split("__")[0] for x in df.index] if "_" in x]
    )
    df_median["frac_barcodes_merged"] = len(
        [x for x in [x.split("__")[0] for x in df.index] if "_" in x]
    ) / len(df)
    df_merged = pd.concat([df_merged, df_median], axis=1)

  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median = df.median()
  df_median 

In [32]:
df_merged.columns = metadata_path_dict.keys()

df_merged = df_merged.T

In [33]:
df_merged = df_merged[
    [
        "Median_total_nr_frag",
        "Median_unique_nr_frag",
        "Median_dupl_rate",
        "Median_total_nr_frag_in_regions",
        "Median_frip",
        "Median_tss_enrichment",
        "total_nr_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes_in_regions",
        "n_barcodes_merged",
        "frac_barcodes_merged",
    ]
]

In [34]:
if not "total_nr_frag_in_selected_barcodes" in df_stats.columns:
    df_stats = pd.concat([df_stats, df_merged], axis=1)

In [35]:
df_stats = df_stats.loc[:, ~df_stats.columns.duplicated()].copy()

In [36]:
df_stats.columns

Index(['short_identifier', 'centre', 'technology', 'sequencing_instrument',
       'reads', 'cells', 'RPC', 'nreads', '%_correct_barcodes', 'r1_length',
       'r2_length', 'avg_insert_size', '%_mapq30', 'avg_map_quality',
       'Median_total_nr_frag', 'Median_unique_nr_frag', 'Median_dupl_rate',
       'Median_total_nr_frag_in_regions', 'Median_frip',
       'Median_tss_enrichment', 'total_nr_frag_in_selected_barcodes',
       'total_nr_unique_frag_in_selected_barcodes',
       'total_nr_unique_frag_in_selected_barcodes_in_regions',
       'n_barcodes_merged', 'frac_barcodes_merged'],
      dtype='object')

In [37]:
df_stats["efficiency"] = (
    df_stats["total_nr_unique_frag_in_selected_barcodes_in_regions"] / df_stats["reads"]
)

# Count total fragments

In [38]:
fragments_path_dict = {
    x.split("/")[-1].split(f"__")[0].split(".")[0]: x
    for x in sorted(
        glob.glob(f"../1_data_repository/{downsample_rate}_fragments/*.tsv.gz")
    )
}
if verbose:
    print(fragments_path_dict)

In [39]:
chroms_standard = (
    ["chr" + str(x + 1) for x in range(22)] + ["chrX"] + ["chrY"] + ["chrM"]
)
counts_in_standard_chrom_df = pl.DataFrame({"Chromosome": chroms_standard})
count_len_dict = {}
if not "total_fragments" in df_stats.columns:
    if os.path.exists("fixedcells_general_chroms.csv"):
        df_frags = pd.read_csv("fixedcells_general_chroms.csv", index_col=0)
        df_stats = pd.concat([df_stats, df_frags], axis=1)

    else:
        for sample in fragments_path_dict:
            print(sample)

            df = pl.read_csv(
                fragments_path_dict[sample], has_header=False, sep="\t", n_threads=4
            )

            df.columns = ["Chromosome", "Start", "End", "CB", "Count"]
            df_counts_per_chrom = (
                df
                # .filter(pl.col("CB").is_in(selected_CBs)
                .groupby(["Chromosome", "Start", "End", "CB"])
                .agg(
                    [
                        pl.col("Chromosome").first().alias("temp"),
                    ]
                )
                .groupby("Chromosome")
                .agg(pl.count().alias("count"))
                .sort(by="Chromosome")
            )

            total_counts_in_standard_chroms = df_counts_per_chrom.sum()
            total_counts_in_standard_chroms = df_counts_per_chrom.filter(
                pl.col("Chromosome").is_in(chroms_standard)
            ).sum()

            counts_in_standard_chroms = df_counts_per_chrom.filter(
                pl.col("Chromosome").is_in(chroms_standard)
            )
            counts_in_standard_chroms = counts_in_standard_chroms.rename(
                {"count": sample}
            )
            counts_in_standard_chrom_df = counts_in_standard_chrom_df.join(
                counts_in_standard_chroms, on="Chromosome"
            )
            count_len_dict[sample] = len(df)

        df = counts_in_standard_chrom_df.to_pandas().T
        df.columns = df.loc["Chromosome"]
        df = df.drop("Chromosome")

        for sample in df.index:
            df.at[sample, "total"] = count_len_dict[sample]

        df["nonstandard"] = df["total"] - df[df.columns[:-1]].sum(axis=1)

        df = df.div(df["total"], axis=0) * 100

        for sample in df.index:
            df.at[sample, "total_fragments"] = count_len_dict[sample]

        df.drop("total", inplace=True, axis="columns")

        df_stats = pd.concat([df_stats, df], axis=1)

else:
    print("already done")

# add dar strength and count

In [40]:
dar_path_dict = {
    x.split("/")[-1]: x
    for x in sorted(
        glob.glob(
            "../fixedcells_3_cistopic_consensus/downstream_analysis/DARs/*/*DARs.bed"
        )
    )
}
if verbose:
    print(dar_path_dict)

In [41]:
df = pd.DataFrame(index=df_stats.index)
if not "n_dars__B_cell" in df_stats.columns:
    for sample, dar_path in dar_path_dict.items():
        supersample = sample.split(".")[0]
        cell_type = sample.split("__")[1]
        if verbose:
            print(sample)
            print(cell_type)
        col_name = "n_dars__" + cell_type

        df_dars = pd.read_csv(dar_path, sep="\t", header=None)
        df.at[supersample, col_name] = len(df_dars)

        col_name = "top_2000_dars_median_logfc__" + cell_type
        df.at[supersample, col_name] = df_dars[4][0:2000].median()

        col_name = "top_2000_dars_median_fc__" + cell_type
        df.at[supersample, col_name] = 2 ** df_dars[4][0:2000].median()

# add peak strength and count

In [42]:
peak_path_dict = {
    x.split("/")[-2] + "__" + x.split("/")[-1]: x
    for x in sorted(
        glob.glob(
            "../fixedcells_3_cistopic_consensus/final_consensus_peaks/*__SCREEN_consensus_peaks/*narrowPeak"
        )
    )
}
if verbose:
    print(peak_path_dict)

In [43]:
cell_type

'Cytotoxic_T_cell'

In [44]:
for sample, dar_path in peak_path_dict.items():
    if verbose:
        print(sample)
    supersample = sample.split(".")[0]
    cell_type = sample.split("__")[-1].split(".")[0].split("_peaks")[0]
    if verbose:
        print(cell_type)
    col_name = "n_peaks__" + cell_type

    df_peaks = pd.read_csv(dar_path, sep="\t", header=None)
    df.at[supersample, col_name] = len(df_peaks)

    col_name = "top10k_peaks_strength__" + cell_type
    df.at[supersample, col_name] = df_peaks[6][0:10000].median()
    col_name = "top10k_peaks_pval" + cell_type
    df.at[supersample, col_name] = df_peaks[7][0:10000].median()
    col_name = "top10k_peaks_qval" + cell_type
    df.at[supersample, col_name] = df_peaks[8][0:10000].median()

In [45]:
df_peaks

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,chr1,629195,629317,CytotoxicTcell_peak_1,11519,.,337.11000,1157.29000,1151.99000,76
1,chr1,629908,630058,CytotoxicTcell_peak_2,3223,.,112.66600,327.25200,322.39500,37
2,chr1,633254,633421,CytotoxicTcell_peak_3,25,.,3.54853,5.00599,2.52812,44
3,chr1,633941,634096,CytotoxicTcell_peak_4,54565,.,1331.58000,5461.93000,5456.59000,84
4,chr1,778487,779114,CytotoxicTcell_peak_5,1808,.,69.19630,184.95100,180.84000,244
...,...,...,...,...,...,...,...,...,...,...
52310,chrY,19077416,19077584,CytotoxicTcell_peak_43921,73,.,6.20992,10.01870,7.30265,92
52311,chrY,19567084,19567498,CytotoxicTcell_peak_43922a,108,.,7.98419,13.66820,10.84690,95
52312,chrY,19567084,19567498,CytotoxicTcell_peak_43922b,205,.,12.41980,23.52780,20.51960,254
52313,chrY,19567717,19567946,CytotoxicTcell_peak_43923,288,.,15.96840,31.97620,28.85860,116


In [46]:
if not "n_peaks__B_cell" in df_stats.columns:
    df_stats = pd.concat([df_stats, df], axis=1)

# add recovered % DARs and peaks recovered

In [47]:
import pandas as pd

In [48]:
df = pd.read_csv(
    "../fixedcells_8_individual_tech_cistopic_objects/peak_dar_recovery_individual_samples.tsv",
    sep="\t",
    index_col=0,
)

In [49]:
df

Unnamed: 0_level_0,B_cells_bot20peaks_recovery,Naive_T_cells_bot20peaks_recovery,Cytotoxic_T_cells_bot20peaks_recovery,NK_cells_bot20peaks_recovery,CD14+_monocytes_bot20peaks_recovery,CD16+_monocytes_bot20peaks_recovery,Dendritic_cells_bot20peaks_recovery,mean_bot20peaks_recovery,B_cells_top20peaks_recovery,Naive_T_cells_top20peaks_recovery,...,Natural_killer_cell,mean_bot20dars_recovery,B_cell_top20dars_recovery,CD4+_T_cell_top20dars_recovery,Cytotoxic_T_cell_top20dars_recovery,Natural_killer_cell_top20dars_recovery,CD14+_monocyte_top20dars_recovery,CD16+_monocyte_top20dars_recovery,Dendritic_cell_top20dars_recovery,mean_top20dars_recovery
sample,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
BIO_ddseq_1,0.048732,0.05449,0.005072,0.032241,0.073321,,0.033081,0.041156,0.875821,0.922818,...,0.224573,0.280102,0.248372,0.329651,0.282447,0.36041,0.349878,,0.334588,0.317558
BIO_ddseq_2,0.058525,0.106334,0.005103,0.040168,0.059927,,0.021725,0.04863,0.891785,0.938597,...,0.308532,0.291218,0.231163,0.305814,0.239362,0.322184,0.399919,,0.304057,0.300416
BIO_ddseq_3,0.029213,0.024504,0.007033,0.010109,0.087114,0.048462,0.007077,0.030501,0.855937,0.911619,...,0.143345,0.422531,0.334419,0.397093,0.172872,0.108532,0.764849,0.432505,0.554161,0.394919
BIO_ddseq_4,0.033727,0.026697,0.008433,0.014046,0.040793,0.041192,0.011027,0.025131,0.871296,0.917942,...,0.127645,0.350729,0.377674,0.424419,0.137766,0.077816,0.794752,0.443189,0.221455,0.353867
BRO_mtscatac_1,0.046508,0.045302,0.004139,0.076452,0.119403,0.138792,0.080151,0.072964,0.979192,0.994179,...,0.688737,0.640961,0.552093,0.917442,0.864894,0.845051,0.973759,0.961167,0.769971,0.840625
BRO_mtscatac_2,0.020548,0.04205,0.005134,0.058523,0.098036,0.106447,0.029954,0.051527,0.95797,0.99477,...,0.643686,0.648653,0.48186,0.919186,0.87766,0.857338,0.970098,0.981303,0.808239,0.84224
CNA_10xmultiome_1,0.060284,0.118472,0.006784,0.056501,0.072338,,0.037196,0.058596,0.992703,0.998582,...,0.517406,0.56942,0.820465,0.805233,0.851596,0.869625,0.878153,,0.813885,0.839826
CNA_10xmultiome_2,0.030474,0.096578,0.011016,0.043254,0.069308,,0.018378,0.044835,0.946912,0.997902,...,0.337884,0.438058,0.637674,0.768605,0.633511,0.645734,0.910903,,0.846508,0.740489
CNA_10xv11_1,0.017494,0.065154,0.008682,0.073207,0.02041,0.035107,,0.036676,0.956121,0.996247,...,0.604778,0.627695,0.907442,0.913372,0.893617,0.879863,0.888731,0.935895,,0.903153
CNA_10xv11_2,0.010191,0.056797,0.000809,0.062194,0.042892,,0.00576,0.029774,0.951469,0.995834,...,0.486689,0.606913,0.867907,0.90814,0.871809,0.873038,0.890968,,0.869302,0.880194


In [50]:
list(df_stats.columns)

['short_identifier',
 'centre',
 'technology',
 'sequencing_instrument',
 'reads',
 'cells',
 'RPC',
 'nreads',
 '%_correct_barcodes',
 'r1_length',
 'r2_length',
 'avg_insert_size',
 '%_mapq30',
 'avg_map_quality',
 'Median_total_nr_frag',
 'Median_unique_nr_frag',
 'Median_dupl_rate',
 'Median_total_nr_frag_in_regions',
 'Median_frip',
 'Median_tss_enrichment',
 'total_nr_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes_in_regions',
 'n_barcodes_merged',
 'frac_barcodes_merged',
 'efficiency',
 'chr1',
 'chr10',
 'chr11',
 'chr12',
 'chr13',
 'chr14',
 'chr15',
 'chr16',
 'chr17',
 'chr18',
 'chr19',
 'chr2',
 'chr20',
 'chr21',
 'chr22',
 'chr3',
 'chr4',
 'chr5',
 'chr6',
 'chr7',
 'chr8',
 'chr9',
 'chrM',
 'chrX',
 'chrY',
 'nonstandard',
 'total_fragments',
 'n_dars__B_cell',
 'top_2000_dars_median_logfc__B_cell',
 'top_2000_dars_median_fc__B_cell',
 'n_dars__CD14+_monocyte',
 'top_2000_dars_median_logfc__CD14+

In [51]:
if not "B_cells_bot20peaks_recovery" in df_stats.columns:
    df_stats = pd.concat([df_stats, df], axis=1)

# add median peak and dar tss distances

In [52]:
glob.glob("../fixedcells_3_cistopic_consensus/*median*tss_dist.tsv")

['../fixedcells_3_cistopic_consensus/median_dar_tss_dist.tsv',
 '../fixedcells_3_cistopic_consensus/median_frag_len_median_tss_dist.tsv',
 '../fixedcells_3_cistopic_consensus/median_peak_tss_dist.tsv',
 '../fixedcells_3_cistopic_consensus/median_top2kdar_tss_dist.tsv']

In [53]:
path = "../fixedcells_3_cistopic_consensus/median_dar_tss_dist.tsv"
df = pd.read_csv(path, sep="\t", index_col=0)
df.index = [x.split(".")[0] for x in df.index]
# df.index = [
#     x.replace("CNA_10xv11_4", "CNA_10xv11c_1")
#     .replace("CNA_10xv11_5", "CNA_10xv11c_2")
#     .replace("BRO_mtscatac", "BRO_mtscatacfacs")
#     for x in df.index
# ]

df_stats["alldars_median_dar_logfc"] = df["score"]
df_stats["alldars_median_dar_tss_dist"] = df["tss_dist"]

In [54]:
path = "../fixedcells_3_cistopic_consensus/median_peak_tss_dist.tsv"
df = pd.read_csv(path, sep="\t", index_col=0)
df.index = [x.split(".")[0] for x in df.index]
# df.index = [
#     x.replace("CNA_10xv11_4", "CNA_10xv11c_1")
#     .replace("CNA_10xv11_5", "CNA_10xv11c_2")
#     .replace("BRO_mtscatac", "BRO_mtscatacfacs")
#     for x in df.index
# ]
df_stats["allpeaks_median_peak_logfc"] = df["median_peak_score"]
df_stats["allpeaks_median_peak_tss_dist"] = df["median_peak_tss_dist"]

In [55]:
path = "../fixedcells_3_cistopic_consensus/median_top2kdar_tss_dist.tsv"
df = pd.read_csv(path, sep="\t", index_col=0)
df.index = [x.split(".")[0] for x in df.index]
# df.index = [
#     x.replace("CNA_10xv11_4", "CNA_10xv11c_1")
#     .replace("CNA_10xv11_5", "CNA_10xv11c_2")
#     .replace("BRO_mtscatac", "BRO_mtscatacfacs")
#     for x in df.index
# ]
df_stats["top2kdars_median_dar_logfc"] = df["score"]
df_stats["top2kdars_median_dar_tss_dist"] = df["tss_dist"]

# add median frag dist and len

In [56]:
path = "../fixedcells_3_cistopic_consensus/median_frag_len_median_tss_dist.tsv"
df = pd.read_csv(path, sep="\t", index_col=0)
df.index = [x.split(".")[0] for x in df.index]

In [57]:
if not "median_frag_len" in df_stats.columns:
    df_stats = pd.concat([df_stats, df], axis=1)

# add frag len quantiles

In [58]:
path = (
    "../fixedcells_3_cistopic_consensus/median_frag_len_median_tss_dist_quantiles.tsv"
)
df = pd.read_csv(path, sep="\t", index_col=0)
df.index = [x.split(".")[0] for x in df.index]

In [59]:
if not "mononucleosomal_distal" in df_stats.columns:
    df_stats = pd.concat([df_stats, df], axis=1)

# add fmx 

In [146]:
df = pd.read_csv(
    "../fixedcells_3_cistopic_consensus/out_fmx/genotype_concordance_unified.txt",
    sep="\t",
)
df

Unnamed: 0,INT_ID,BARCODE,NUM.SNPS,NUM.READS,DROPLET.TYPE,BEST.GUESS,BEST.LLK,NEXT.GUESS,NEXT.LLK,DIFF.LLK.BEST.NEXT,BEST.POSTERIOR,SNG.POSTERIOR,SNG.BEST.GUESS,SNG.BEST.LLK,SNG.NEXT.GUESS,SNG.NEXT.LLK,SNG.ONLY.POSTERIOR,DBL.BEST.GUESS,DBL.BEST.LLK,DIFF.LLK.SNG.DBL,ubarcode,replicate,sample
CNA_hydrop_2.FIXEDCELLS.1,0,CGACATTACATAGGAGTCAA,182,182,SNG,11,-324.11,10,-343.05,18.94,0.00000,1.0,1,-324.11,0,-373.28,1.0,10,-343.05,18.94,CNA_hydrop_2.FIXEDCELLS#CGACATTACATAGGAGTCAA,CNA_hydrop_2.FIXEDCELLS,sampleB
CNA_hydrop_2.FIXEDCELLS.2,1,GGCAACCTCTGAGCTAGTAA,209,209,SNG,00,-369.10,10,-390.29,21.19,0.00000,1.0,0,-369.10,1,-426.11,1.0,10,-390.29,21.19,CNA_hydrop_2.FIXEDCELLS#GGCAACCTCTGAGCTAGTAA,CNA_hydrop_2.FIXEDCELLS,sampleA
CNA_hydrop_2.FIXEDCELLS.3,2,CAACACCATTTCTCGCACGA,273,273,SNG,00,-492.20,10,-520.48,28.28,0.00000,1.0,0,-492.20,1,-576.63,1.0,10,-520.48,28.28,CNA_hydrop_2.FIXEDCELLS#CAACACCATTTCTCGCACGA,CNA_hydrop_2.FIXEDCELLS,sampleA
CNA_hydrop_2.FIXEDCELLS.4,3,TGCATGAGGTTACGGACGGT,354,354,SNG,11,-636.69,10,-669.83,33.14,0.00000,1.0,1,-636.69,0,-734.25,1.0,10,-669.83,33.14,CNA_hydrop_2.FIXEDCELLS#TGCATGAGGTTACGGACGGT,CNA_hydrop_2.FIXEDCELLS,sampleB
CNA_hydrop_2.FIXEDCELLS.5,4,ACAGTGAAGATCCAGTGTTC,577,577,SNG,00,-1069.99,10,-1093.26,23.27,0.00000,1.0,0,-1069.99,1,-1173.29,1.0,10,-1093.26,23.27,CNA_hydrop_2.FIXEDCELLS#ACAGTGAAGATCCAGTGTTC,CNA_hydrop_2.FIXEDCELLS,sampleA
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
VIB_hydrop_12.FIXEDCELLS.1417,1416,GTCGTTGAGAGTGACCAGTA,126,126,SNG,11,-221.86,10,-240.94,19.08,0.00000,1.0,1,-221.86,0,-283.50,1.0,10,-240.94,19.08,VIB_hydrop_12.FIXEDCELLS#GTCGTTGAGAGTGACCAGTA,VIB_hydrop_12.FIXEDCELLS,sampleB
VIB_hydrop_12.FIXEDCELLS.1418,1417,ACCGAAGGCTTTGCAGTTCT,103,103,SNG,11,-185.75,10,-198.81,13.06,0.00000,1.0,1,-185.75,0,-226.20,1.0,10,-198.81,13.06,VIB_hydrop_12.FIXEDCELLS#ACCGAAGGCTTTGCAGTTCT,VIB_hydrop_12.FIXEDCELLS,sampleB
VIB_hydrop_12.FIXEDCELLS.1419,1418,TAGAGCCTGATTGTGTAGGA,115,115,SNG,11,-207.21,10,-219.27,12.06,-0.00001,1.0,1,-207.21,0,-244.26,1.0,10,-219.27,12.06,VIB_hydrop_12.FIXEDCELLS#TAGAGCCTGATTGTGTAGGA,VIB_hydrop_12.FIXEDCELLS,sampleB
VIB_hydrop_12.FIXEDCELLS.1420,1419,CAATTGGAGACCACACGGAT,101,101,SNG,11,-177.98,10,-188.29,10.31,-0.00007,1.0,1,-177.98,0,-214.34,1.0,10,-188.29,10.31,VIB_hydrop_12.FIXEDCELLS#CAATTGGAGACCACACGGAT,VIB_hydrop_12.FIXEDCELLS,sampleB


In [148]:
list(df.columns)

['INT_ID',
 'BARCODE',
 'NUM.SNPS',
 'NUM.READS',
 'DROPLET.TYPE',
 'BEST.GUESS',
 'BEST.LLK',
 'NEXT.GUESS',
 'NEXT.LLK',
 'DIFF.LLK.BEST.NEXT',
 'BEST.POSTERIOR',
 'SNG.POSTERIOR',
 'SNG.BEST.GUESS',
 'SNG.BEST.LLK',
 'SNG.NEXT.GUESS',
 'SNG.NEXT.LLK',
 'SNG.ONLY.POSTERIOR',
 'DBL.BEST.GUESS',
 'DBL.BEST.LLK',
 'DIFF.LLK.SNG.DBL',
 'ubarcode',
 'replicate',
 'sample']

In [153]:
df["tech"] = [x.split("_")[1] for x in df.index]

In [154]:
df.groupby("tech")["DIFF.LLK.SNG.DBL"].median()

tech
10xmultiome    143.02
10xv1          172.32
10xv11         157.45
10xv2          223.55
ddseq          141.21
hydrop          41.73
mtscatac       286.06
s3atac         215.79
Name: DIFF.LLK.SNG.DBL, dtype: float64

In [155]:
df.groupby("tech")["DIFF.LLK.BEST.NEXT"].median()

tech
10xmultiome    152.460
10xv1          175.975
10xv11         159.710
10xv2          234.100
ddseq          143.520
hydrop          42.150
mtscatac       288.780
s3atac         215.790
Name: DIFF.LLK.BEST.NEXT, dtype: float64

In [61]:
df_median = df.groupby("replicate").median()
df_median.index = [x.split(".")[0] for x in df_median.index]

  df_median = df.groupby("replicate").median()


In [62]:
df_stats["fmx_n_snps"] = df_median["NUM.SNPS"]
df_stats["fmx_best_llk"] = df_median["BEST.LLK"]

# add male/female T cell ratios

In [63]:
df = pd.read_csv(
    "../fixedcells_3_cistopic_consensus/fraction_cd4_to_cd8_t_cells.tsv",
    sep="\t",
    index_col=0,
)
df.index = [x.split(".")[0] for x in df.index]

In [64]:
df_stats["ratio_cd4T_to_cd8T_in_male"] = df["fraction in male"]
df_stats["ratio_cd4T_to_cd8T_in_female"] = df["fraction in female"]
df_stats["ratio_cd4T_to_cd8T_normalized"] = df["fraction in female normalized"]

# add doublet counts

In [65]:
common = glob.glob(
    "../fixedcells_3_cistopic_consensus/cistopic_objects/*common_doublets.txt*"
)
scr_only = glob.glob(
    "../fixedcells_3_cistopic_consensus/cistopic_objects/*scr_doublets_unique.txt*"
)
fmx_only = glob.glob(
    "../fixedcells_3_cistopic_consensus/cistopic_objects/*fmx_doublets_unique.txt*"
)

In [66]:
for path in common:
    n_lines = sum(1 for line in open(path))
    # print(n_lines)
    sample = path.split("/")[-1].split(".")[0]
    df_stats.at[sample, "common_doublets"] = n_lines

df_stats["common_doublets"].replace(0, np.nan, inplace=True)

for path in scr_only:
    n_lines = sum(1 for line in open(path))
    # print(n_lines)
    sample = path.split("/")[-1].split(".")[0]
    df_stats.at[sample, "scr_exclusive_doublets"] = n_lines

for path in fmx_only:
    n_lines = sum(1 for line in open(path))
    # print(n_lines)
    sample = path.split("/")[-1].split(".")[0]
    df_stats.at[sample, "fmx_exclusive_doublets"] = n_lines

In [67]:
for var in [
    "fmx_n_snps",
    "scr_exclusive_doublets",
    "fmx_exclusive_doublets",
    "common_doublets",
]:
    df_stats.at["VIB_hydrop_1", var] = (
        df_stats.at["VIB_hydrop_11", var] + df_stats.at["VIB_hydrop_12", var]
    )
    df_stats.at["VIB_hydrop_2", var] = (
        df_stats.at["VIB_hydrop_21", var] + df_stats.at["VIB_hydrop_22", var]
    )

In [68]:
df_stats["total_doublets"] = (
    df_stats["scr_exclusive_doublets"]
    + df_stats["fmx_exclusive_doublets"]
    - df_stats["common_doublets"]
)

df_stats["total_doublets_pct"] = df_stats["total_doublets"] / df_stats["cells"]
df_stats["scr_exclusive_doublets_pct"] = (
    df_stats["scr_exclusive_doublets"] / df_stats["cells"]
)
df_stats["fmx_exclusive_doublets_pct"] = (
    df_stats["fmx_exclusive_doublets"] / df_stats["cells"]
)
df_stats["common_doublets_pct_of_doublets"] = (
    df_stats["common_doublets"] / df_stats["total_doublets"]
)
df_stats["common_doublets_pct"] = df_stats["common_doublets"] / df_stats["cells"]

# add cell counts

In [69]:
path_dict = {
    x.split("/")[-1].split(".")[0]: x
    for x in sorted(
        glob.glob(
            "../fixedcells_3_cistopic_consensus/cistopic_objects/*consensus.cell_data.tsv"
        )
    )
}
if verbose:
    print(path_dict)

In [70]:
df_merged = pd.DataFrame(index=df_stats.index)
for sample, path in path_dict.items():
    print(sample)
    df = pd.read_csv(path, index_col=0, sep="\t")
    df_stats.at[sample, "seurat_score"] = df["seurat_cell_type_pred_score"].median()
    df_sub = pd.DataFrame(df["seurat_cell_type"].value_counts())

    for cell_type in df_sub.index:
        col_name = "n_seurat_cells__" + cell_type
        df_merged.at[sample, col_name] = df_sub.loc[cell_type][0]

    df_sub = pd.DataFrame(df["seurat_cell_type"].value_counts())
    for cell_type in df_sub.index:
        col_name = "n_consensus_cells__" + cell_type
        df_merged.at[sample, col_name] = df_sub.loc[cell_type][0]

BIO_ddseq_1
BIO_ddseq_2
BIO_ddseq_3
BIO_ddseq_4
BRO_mtscatac_1
BRO_mtscatac_2
CNA_10xmultiome_1
CNA_10xmultiome_2
CNA_10xv11_1
CNA_10xv11_2
CNA_10xv11_3
CNA_10xv11_4
CNA_10xv11_5
CNA_10xv2_1
CNA_10xv2_2
CNA_hydrop_1
CNA_hydrop_2
CNA_hydrop_3
CNA_mtscatac_1
CNA_mtscatac_2
EPF_hydrop_1
EPF_hydrop_2
EPF_hydrop_3
EPF_hydrop_4
HAR_ddseq_1
HAR_ddseq_2
MDC_mtscatac_1
MDC_mtscatac_2
OHS_s3atac_1
OHS_s3atac_2
SAN_10xmultiome_1
SAN_10xmultiome_2
STA_10xv11_1
STA_10xv11_2
TXG_10xv11_1
TXG_10xv2_1
TXG_10xv2_2
UCS_ddseq_1
UCS_ddseq_2
VIB_10xmultiome_1
VIB_10xmultiome_2
VIB_10xv1_1
VIB_10xv1_2
VIB_10xv2_1
VIB_10xv2_2
VIB_hydrop_1
VIB_hydrop_2


In [71]:
df.columns

Index(['cisTopic_log_nr_acc', 'cisTopic_log_nr_frag', 'cisTopic_nr_frag',
       'cisTopic_nr_acc', 'Log_total_nr_frag', 'Log_unique_nr_frag',
       'Total_nr_frag', 'Unique_nr_frag', 'Dupl_nr_frag', 'Dupl_rate',
       'Total_nr_frag_in_regions', 'Unique_nr_frag_in_regions', 'FRIP',
       'TSS_enrichment', 'sample_id', 'barcode', 'Doublet_scores_fragments',
       'Predicted_doublets_fragments', 'fmx_droplet_type', 'fmx_sample',
       'seurat_cell_type', 'seurat_cell_type_pred_score',
       'pycisTopic_leiden_10_3.0', 'consensus_cell_type', 'UMAP_1', 'UMAP_2',
       'tSNE_1', 'tSNE_2'],
      dtype='object')

In [72]:
df_merged_rownorm = df_merged.copy()
df_merged_rownorm = df_merged_rownorm.div(df_merged_rownorm.sum(axis=1) / 2, axis=0)
df_merged_rownorm.columns = [
    x.replace(" ", "_").replace("n_", "pct_") for x in df_merged_rownorm.columns
]

In [73]:
if not "n_seurat_cells__CD4+ T cell" in df_stats.columns:
    df_stats = pd.concat([df_stats, df_merged], axis=1)


if not "pct_consensus_cells__CD4+_T_cell" in df_stats.columns:
    df_stats = pd.concat([df_stats, df_merged_rownorm], axis=1)

In [74]:
df_merged = pd.DataFrame(index=df_stats.index)
for sample, path in path_dict.items():
    print(sample)
    df = pd.read_csv(path, index_col=0, sep="\t")
    df_sub = pd.DataFrame(df["seurat_cell_type"].value_counts())

    df_merged.at[sample, "Median_Unique_nr_frag_in_regions"] = df[
        "Unique_nr_frag_in_regions"
    ].median()
    df_merged.at[sample, "Median_scrublet_doublet_scores_fragments"] = df[
        "Doublet_scores_fragments"
    ].median()

    df_merged.at[sample, "Mean_Unique_nr_frag_in_regions"] = df[
        "Unique_nr_frag_in_regions"
    ].mean()
    df_merged.at[sample, "Mean_scrublet_doublet_scores_fragments"] = df[
        "Doublet_scores_fragments"
    ].mean()

BIO_ddseq_1
BIO_ddseq_2
BIO_ddseq_3
BIO_ddseq_4
BRO_mtscatac_1
BRO_mtscatac_2
CNA_10xmultiome_1
CNA_10xmultiome_2
CNA_10xv11_1
CNA_10xv11_2
CNA_10xv11_3
CNA_10xv11_4
CNA_10xv11_5
CNA_10xv2_1
CNA_10xv2_2
CNA_hydrop_1
CNA_hydrop_2
CNA_hydrop_3
CNA_mtscatac_1
CNA_mtscatac_2
EPF_hydrop_1
EPF_hydrop_2
EPF_hydrop_3
EPF_hydrop_4
HAR_ddseq_1
HAR_ddseq_2
MDC_mtscatac_1
MDC_mtscatac_2
OHS_s3atac_1
OHS_s3atac_2
SAN_10xmultiome_1
SAN_10xmultiome_2
STA_10xv11_1
STA_10xv11_2
TXG_10xv11_1
TXG_10xv2_1
TXG_10xv2_2
UCS_ddseq_1
UCS_ddseq_2
VIB_10xmultiome_1
VIB_10xmultiome_2
VIB_10xv1_1
VIB_10xv1_2
VIB_10xv2_1
VIB_10xv2_2
VIB_hydrop_1
VIB_hydrop_2


In [75]:
df_merged["log_median_unique_nr_frag_in_regions"] = np.log10(
    df_merged["Median_Unique_nr_frag_in_regions"]
)

In [76]:
df_merged

Unnamed: 0,Median_Unique_nr_frag_in_regions,Median_scrublet_doublet_scores_fragments,Mean_Unique_nr_frag_in_regions,Mean_scrublet_doublet_scores_fragments,log_median_unique_nr_frag_in_regions
BIO_ddseq_1,3849.0,0.220779,3787.501305,0.216256,3.585348
BIO_ddseq_2,4972.0,0.190578,4808.57805,0.194312,3.696531
BIO_ddseq_3,4122.0,0.127753,4262.026468,0.137995,3.615108
BIO_ddseq_4,4611.0,0.127753,4752.200614,0.143042,3.663795
BRO_mtscatac_1,7814.0,0.076283,8514.084109,0.094029,3.892873
BRO_mtscatac_2,7623.0,0.086154,8207.707206,0.100562,3.882126
CNA_10xmultiome_1,5558.0,0.10299,6186.908235,0.120977,3.744919
CNA_10xmultiome_2,3784.0,0.132075,6054.972598,0.144116,3.577951
CNA_10xv11_1,4584.0,0.142857,5097.934648,0.15093,3.661245
CNA_10xv11_2,4951.0,0.114754,5439.018889,0.135385,3.694693


In [77]:
if not "Median_Unique_nr_frag_in_regions" in df_stats.columns:
    df_stats = pd.concat([df_stats, df_merged], axis=1)

In [78]:
df_stats.to_csv("fixedcells_general_statistics.tsv", sep="\t")

# calculate fric barplots

total reads > total fragments > in cells > unique > in peaks

In [79]:
df_sub = df_stats[
    [
        "reads",
        "total_nr_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes_in_regions",
    ]
]

In [80]:
df_sub["with_correct_barcode"] = df_sub["reads"] * df_stats["%_correct_barcodes"] / 100

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_sub["with_correct_barcode"] = df_sub["reads"] * df_stats["%_correct_barcodes"] / 100


In [81]:
df_sub["mapped"] = df_sub["with_correct_barcode"] * df_stats["%_mapq30"] / 100

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_sub["mapped"] = df_sub["with_correct_barcode"] * df_stats["%_mapq30"] / 100


In [82]:
df_sub = df_sub.div(df_sub["reads"], axis=0)

In [83]:
df_sub["reads"] = df_sub["reads"] - df_sub["with_correct_barcode"]
df_sub["with_correct_barcode"] = df_sub["with_correct_barcode"] - df_sub["mapped"]
df_sub["mapped"] = df_sub["mapped"] - df_sub["total_nr_frag_in_selected_barcodes"]
df_sub["total_nr_frag_in_selected_barcodes"] = (
    df_sub["total_nr_frag_in_selected_barcodes"]
    - df_sub["total_nr_unique_frag_in_selected_barcodes"]
)
df_sub["total_nr_unique_frag_in_selected_barcodes"] = (
    df_sub["total_nr_unique_frag_in_selected_barcodes"]
    - df_sub["total_nr_unique_frag_in_selected_barcodes_in_regions"]
)

In [84]:
df_sub.columns = [
    "No correct barcode",
    "Duplicate fragments in cells",
    "Unique fragments in cells, not in peaks",
    "Unique fragments in cells and in peaks",
    "Not mapped properly",
    "Fragments in background noise barcodes",
]

In [85]:
df_sub = df_sub[
    [
        "No correct barcode",
        "Not mapped properly",
        "Fragments in background noise barcodes",
        "Duplicate fragments in cells",
        "Unique fragments in cells, not in peaks",
        "Unique fragments in cells and in peaks",
    ]
]

## add normalized

In [86]:
df_sub2 = df_stats[
    [
        "total_nr_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes",
        "total_nr_unique_frag_in_selected_barcodes_in_regions",
    ]
]

In [87]:
# df_sub2 = df_sub2.div(df_sub2["total_nr_frag_in_selected_barcodes"], axis=0)

In [88]:
df_sub2["Duplicate fragments in cells, normalized to fragments in cells"] = (
    df_sub2["total_nr_frag_in_selected_barcodes"]
    - df_sub2["total_nr_unique_frag_in_selected_barcodes"]
) / df_sub2["total_nr_frag_in_selected_barcodes"]


df_sub2[
    "Unique fragments in cells and in peaks, normalized to unique fragments in cells"
] = (
    df_sub2["total_nr_unique_frag_in_selected_barcodes_in_regions"]
    / df_sub2["total_nr_unique_frag_in_selected_barcodes"]
)

df_sub2[
    "Unique fragments in cells, not in peaks, normalized to unique fragments in cells"
] = (
    df_sub2["total_nr_unique_frag_in_selected_barcodes"]
    - df_sub2["total_nr_unique_frag_in_selected_barcodes_in_regions"]
) / df_sub2[
    "total_nr_unique_frag_in_selected_barcodes"
]

df_sub2 = df_sub2[
    [
        "Duplicate fragments in cells, normalized to fragments in cells",
        "Unique fragments in cells, not in peaks, normalized to unique fragments in cells",
        "Unique fragments in cells and in peaks, normalized to unique fragments in cells",
    ]
]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_sub2["Duplicate fragments in cells, normalized to fragments in cells"] = (
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_sub2[
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_sub2[


In [89]:
df_sub2

Unnamed: 0,"Duplicate fragments in cells, normalized to fragments in cells","Unique fragments in cells, not in peaks, normalized to unique fragments in cells","Unique fragments in cells and in peaks, normalized to unique fragments in cells"
BIO_ddseq_1,0.704205,0.45044,0.54956
BIO_ddseq_2,0.724223,0.37834,0.62166
BIO_ddseq_3,0.667613,0.371563,0.628437
BIO_ddseq_4,0.643843,0.370333,0.629667
BRO_mtscatac_1,0.44251,0.510438,0.489562
BRO_mtscatac_2,0.45456,0.516627,0.483373
CNA_10xmultiome_1,0.461778,0.411665,0.588335
CNA_10xmultiome_2,0.291976,0.473382,0.526618
CNA_10xv11_1,0.481966,0.381358,0.618642
CNA_10xv11_2,0.503942,0.360916,0.639084


In [90]:
df_sub = pd.concat([df_sub, df_sub2], axis=1)

In [91]:
df_sub

Unnamed: 0,No correct barcode,Not mapped properly,Fragments in background noise barcodes,Duplicate fragments in cells,"Unique fragments in cells, not in peaks",Unique fragments in cells and in peaks,"Duplicate fragments in cells, normalized to fragments in cells","Unique fragments in cells, not in peaks, normalized to unique fragments in cells","Unique fragments in cells and in peaks, normalized to unique fragments in cells"
BIO_ddseq_1,0.0346,0.091713,0.259337,0.432628,0.081855,0.099867,0.704205,0.45044,0.54956
BIO_ddseq_2,0.0324,0.076053,0.17704,0.517462,0.07455,0.122495,0.724223,0.37834,0.62166
BIO_ddseq_3,0.1253,0.111874,0.256717,0.337885,0.062506,0.105718,0.667613,0.371563,0.628437
BIO_ddseq_4,0.119,0.112328,0.246076,0.33647,0.068929,0.117198,0.643843,0.370333,0.629667
BRO_mtscatac_1,0.0534,0.117852,0.059414,0.340438,0.218925,0.209971,0.44251,0.510438,0.489562
BRO_mtscatac_2,0.0521,0.116971,0.054692,0.352846,0.218735,0.204656,0.45456,0.516627,0.483373
CNA_10xmultiome_1,0.0169,0.101751,0.366111,0.237926,0.11416,0.163153,0.461778,0.411665,0.588335
CNA_10xmultiome_2,0.016,0.104698,0.461293,0.122049,0.140102,0.155858,0.291976,0.473382,0.526618
CNA_10xv11_1,0.0181,0.097895,0.474439,0.197397,0.080912,0.131257,0.481966,0.381358,0.618642
CNA_10xv11_2,0.0183,0.097188,0.443402,0.222294,0.078974,0.139842,0.503942,0.360916,0.639084


In [92]:
df_sub.to_csv("fixedcells_general_losses.csv")

In [93]:
if not "No correct barcode" in df_stats.columns:
    df_stats = pd.concat([df_stats, df_sub], axis=1)

# save

In [94]:
cols = [x for x in list(df_stats.columns) if ".1" not in x]

In [95]:
tech_dict = {
    "10xmultiome": "10x Multiome",
    "10xv1": "10x v1",
    "10xv11": "10x v1.1",
    "10xv2": "10x v2",
    "ddseq": "ddSEQ SureCell",
    "hydrop": "HyDrop",
    "mtscatac": "mtscATAC-seq",
    "s3atac": "s3-ATAC",
}

tech_alias_dict = {
    "10xmultiome": "10x Multiome",
    "10xv1": "10x v1",
    "10xv11": "10x v1.1",
    "10xv11c": "10x v1.1 (control)",
    "10xv2": "10x v2",
    "ddseq": "Bio-Rad ddSEQ SureCell",
    "hydrop": "HyDrop",
    "mtscatac": "mtscATAC-seq",
    "mtscatacfacs": "mtscATAC-seq (FACS)",
    "s3atac": "s3-ATAC",
}

In [96]:
df_stats["centre"] = [
    centre_dict[x] for x in np.array([x.split("_") for x in list(df_stats.index)])[:, 0]
]

In [97]:
df_stats.index = [
    x.replace("CNA_10xv11_4", "CNA_10xv11c_1")
    .replace("CNA_10xv11_5", "CNA_10xv11c_2")
    .replace("BRO_mtscatac_1", "BRO_mtscatacfacs_1")
    .replace("BRO_mtscatac_2", "BRO_mtscatacfacs_2")
    for x in df_stats.index
]

In [98]:
df_stats["technology"] = [
    tech_alias_dict[x]
    for x in np.array([x.split("_") for x in list(df_stats.index)])[:, 1]
]

# check some na

In [99]:
df_stats = df_stats.drop(
    ["VIB_hydrop_11", "VIB_hydrop_12", "VIB_hydrop_21", "VIB_hydrop_22"]
)

In [100]:
pd.DataFrame(df_stats.isna().sum(axis=0))[
    pd.DataFrame(df_stats.isna().sum(axis=0))[0] > 0
]

Unnamed: 0,0
n_dars__B_cell,2
top_2000_dars_median_logfc__B_cell,2
top_2000_dars_median_fc__B_cell,2
n_dars__CD4+_T_cell,1
top_2000_dars_median_logfc__CD4+_T_cell,1
...,...
common_doublets_pct,21
n_seurat_cells__CD16+ monocyte,1
n_consensus_cells__CD16+ monocyte,1
pct_seurat_cells__CD16+_monocyte,1


# save

In [101]:
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.set_option("display.width", 1000)

In [102]:
df_stats

Unnamed: 0,short_identifier,centre,technology,sequencing_instrument,reads,cells,RPC,nreads,%_correct_barcodes,r1_length,r2_length,avg_insert_size,%_mapq30,avg_map_quality,Median_total_nr_frag,Median_unique_nr_frag,Median_dupl_rate,Median_total_nr_frag_in_regions,Median_frip,Median_tss_enrichment,total_nr_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes_in_regions,n_barcodes_merged,frac_barcodes_merged,efficiency,chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrM,chrX,chrY,nonstandard,total_fragments,n_dars__B_cell,top_2000_dars_median_logfc__B_cell,top_2000_dars_median_fc__B_cell,n_dars__CD14+_monocyte,top_2000_dars_median_logfc__CD14+_monocyte,top_2000_dars_median_fc__CD14+_monocyte,n_dars__CD4+_T_cell,top_2000_dars_median_logfc__CD4+_T_cell,top_2000_dars_median_fc__CD4+_T_cell,n_dars__Cytotoxic_T_cell,top_2000_dars_median_logfc__Cytotoxic_T_cell,top_2000_dars_median_fc__Cytotoxic_T_cell,n_dars__Dendritic_cell,top_2000_dars_median_logfc__Dendritic_cell,top_2000_dars_median_fc__Dendritic_cell,n_dars__Natural_killer_cell,top_2000_dars_median_logfc__Natural_killer_cell,top_2000_dars_median_fc__Natural_killer_cell,n_dars__CD16+_monocyte,top_2000_dars_median_logfc__CD16+_monocyte,top_2000_dars_median_fc__CD16+_monocyte,n_peaks__Bcell,top10k_peaks_strength__Bcell,top10k_peaks_pvalBcell,top10k_peaks_qvalBcell,n_peaks__CD14_monocyte,top10k_peaks_strength__CD14_monocyte,top10k_peaks_pvalCD14_monocyte,top10k_peaks_qvalCD14_monocyte,n_peaks__CD4_Tcell,top10k_peaks_strength__CD4_Tcell,top10k_peaks_pvalCD4_Tcell,top10k_peaks_qvalCD4_Tcell,n_peaks__CytotoxicTcell,top10k_peaks_strength__CytotoxicTcell,top10k_peaks_pvalCytotoxicTcell,top10k_peaks_qvalCytotoxicTcell,n_peaks__Dendriticcell,top10k_peaks_strength__Dendriticcell,top10k_peaks_pvalDendriticcell,top10k_peaks_qvalDendriticcell,n_peaks__Naturalkillercell,top10k_peaks_strength__Naturalkillercell,top10k_peaks_pvalNaturalkillercell,top10k_peaks_qvalNaturalkillercell,n_peaks__CD16_monocyte,top10k_peaks_strength__CD16_monocyte,top10k_peaks_pvalCD16_monocyte,top10k_peaks_qvalCD16_monocyte,B_cells_bot20peaks_recovery,Naive_T_cells_bot20peaks_recovery,Cytotoxic_T_cells_bot20peaks_recovery,NK_cells_bot20peaks_recovery,CD14+_monocytes_bot20peaks_recovery,CD16+_monocytes_bot20peaks_recovery,Dendritic_cells_bot20peaks_recovery,mean_bot20peaks_recovery,B_cells_top20peaks_recovery,Naive_T_cells_top20peaks_recovery,Cytotoxic_T_cells_top20peaks_recovery,NK_cells_top20peaks_recovery,CD14+_monocytes_top20peaks_recovery,CD16+_monocytes_top20peaks_recovery,Dendritic_cells_top20peaks_recovery,mean_top20peaks_recovery,B_cell,CD14+_monocyte,CD16+_monocyte,CD4+_T_cell,Cytotoxic_T_cell,Dendritic_cell,Natural_killer_cell,mean_bot20dars_recovery,B_cell_top20dars_recovery,CD4+_T_cell_top20dars_recovery,Cytotoxic_T_cell_top20dars_recovery,Natural_killer_cell_top20dars_recovery,CD14+_monocyte_top20dars_recovery,CD16+_monocyte_top20dars_recovery,Dendritic_cell_top20dars_recovery,mean_top20dars_recovery,alldars_median_dar_logfc,alldars_median_dar_tss_dist,allpeaks_median_peak_logfc,allpeaks_median_peak_tss_dist,top2kdars_median_dar_logfc,top2kdars_median_dar_tss_dist,median_frag_len,median_log10_frag_dist_nearest_tss,median_frag_dist_nearest_tss,nucleosome-free_proximal,mononucleosomal_distal,mononucleosomal_proximal,nucleosome-free_distal,multinucleosomal_distal,multinucleosomal_proximal,nucleosome-free,mononucleosomal,multinucleosomal,proximal,distal,fmx_n_snps,fmx_best_llk,ratio_cd4T_to_cd8T_in_male,ratio_cd4T_to_cd8T_in_female,ratio_cd4T_to_cd8T_normalized,common_doublets,scr_exclusive_doublets,fmx_exclusive_doublets,total_doublets,total_doublets_pct,scr_exclusive_doublets_pct,fmx_exclusive_doublets_pct,common_doublets_pct_of_doublets,common_doublets_pct,seurat_score,n_seurat_cells__CD4+ T cell,n_seurat_cells__Cytotoxic T cell,n_seurat_cells__CD14+ monocyte,n_seurat_cells__B cell,n_seurat_cells__Natural killer cell,n_seurat_cells__CD16+ monocyte,n_seurat_cells__Dendritic cell,n_consensus_cells__CD4+ T cell,n_consensus_cells__Cytotoxic T cell,n_consensus_cells__CD14+ monocyte,n_consensus_cells__B cell,n_consensus_cells__Natural killer cell,n_consensus_cells__CD16+ monocyte,n_consensus_cells__Dendritic cell,pct_seurat_cells__CD4+_T_cell,pct_seurat_cells__Cytotoxic_T_cell,pct_seurat_cells__CD14+_monocyte,pct_seurat_cells__B_cell,pct_seurat_cells__Natural_killer_cell,pct_seurat_cells__CD16+_monocyte,pct_seurat_cells__Dendritic_cell,pct_consensus_cells__CD4+_T_cell,pct_consensus_cells__Cytotoxic_T_cell,pct_consensus_cells__CD14+_monocyte,pct_consensus_cells__B_cell,pct_consensus_cells__Natural_killer_cell,pct_consensus_cells__CD16+_monocyte,pct_consensus_cells__Dendritic_cell,Median_Unique_nr_frag_in_regions,Median_scrublet_doublet_scores_fragments,Mean_Unique_nr_frag_in_regions,Mean_scrublet_doublet_scores_fragments,log_median_unique_nr_frag_in_regions,No correct barcode,Not mapped properly,Fragments in background noise barcodes,Duplicate fragments in cells,"Unique fragments in cells, not in peaks",Unique fragments in cells and in peaks,"Duplicate fragments in cells, normalized to fragments in cells","Unique fragments in cells, not in peaks, normalized to unique fragments in cells","Unique fragments in cells and in peaks, normalized to unique fragments in cells"
BIO_ddseq_1,ddS Bi1,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,259455742.0,6359,40801.343293,259455742.0,96.54,53.0,40.0,172.0,90.5,36.1,24130.5,7156.0,0.701896,13348.0,0.56273,34.318184,159396596.0,47148735.0,25911059.0,5497.0,0.864308,0.099867,9.254443,4.504364,5.334989,4.91139,2.225899,3.339849,3.071638,4.114966,5.321248,1.878399,5.82604,7.292982,2.775776,1.097576,2.269442,5.863884,4.090092,4.892771,5.251748,4.926587,3.968795,4.197078,0.336184,3.121082,0.01256,0.120217,82162568.0,5237.0,2.691359,6.459216,12405.0,4.669139,25.441976,3370.0,1.800491,3.483389,3084.0,1.202235,2.300959,12586.0,2.241948,4.730352,3458.0,1.902389,3.738317,,,,57701.0,8.90682,14.6127,12.1274,87908.0,8.62963,13.8394,11.5665,74153.0,11.2272,22.9222,20.6489,63790.0,10.8579,18.9892,16.6108,41965.0,4.68874,7.99287,5.27685,48188.0,9.73588,16.3752,13.8458,,,,,0.048732,0.05449,0.005072,0.032241,0.073321,,0.033081,0.041156,0.875821,0.922818,0.70704,0.951144,0.852011,,0.766978,0.845968,0.282326,0.451383,,0.219186,0.146809,0.356336,0.224573,0.280102,0.248372,0.329651,0.282447,0.36041,0.349878,,0.334588,0.317558,1.739136,8397.5,3.848679,8214.5,2.236555,13182.0,174.0,2.725095,530.0,0.255515,0.248135,0.22025,0.12725,0.0747,0.07415,0.382765,0.468385,0.14885,0.549915,0.450085,,,,,,,611.0,,,,0.096084,,,,0.71059,2492.0,1418.0,799.0,472.0,362.0,146.0,60.0,2492.0,1418.0,799.0,472.0,362.0,146.0,60.0,0.433467,0.246652,0.138981,0.082101,0.062967,0.025396,0.010437,0.433467,0.246652,0.138981,0.082101,0.062967,0.025396,0.010437,3849.0,0.220779,3787.501305,0.216256,3.585348,0.0346,0.091713,0.259337,0.432628,0.081855,0.099867,0.704205,0.45044,0.54956
BIO_ddseq_2,ddS Bi2,BioRad,Bio-Rad ddSEQ SureCell,NovaSeq,210510517.0,5159,40804.519674,210510517.0,96.76,52.0,40.0,167.0,92.14,36.1,29141.0,8032.5,0.724897,18044.5,0.628184,34.345536,150411233.0,41479939.0,25786404.0,4170.0,0.80814,0.122495,9.595212,4.360995,5.403592,5.070142,2.059217,3.420191,2.954526,4.240905,5.835397,1.713694,6.830936,7.173773,2.702922,1.051003,2.36161,5.629897,3.770908,4.687666,5.317748,4.849987,3.718071,4.073848,0.379139,2.71724,0.006454,0.074927,60676228.0,6525.0,2.978202,7.880035,12781.0,5.566743,47.397641,3574.0,2.081978,4.233873,2973.0,1.304816,2.470523,12880.0,2.271212,4.827284,4331.0,2.187268,4.554423,,,,62368.0,8.90248,14.6007,12.1667,79716.0,9.98831,16.6494,14.316,84139.0,10.3295,20.8821,18.6925,57590.0,12.7619,22.9365,20.51,34108.0,4.77955,8.77622,5.97085,50243.0,10.6175,18.3217,15.8032,,,,,0.058525,0.106334,0.005103,0.040168,0.059927,,0.021725,0.04863,0.891785,0.938597,0.668422,0.953205,0.847092,,0.73599,0.839182,0.294884,0.437958,,0.203488,0.136702,0.365747,0.308532,0.291218,0.231163,0.305814,0.239362,0.322184,0.399919,,0.304057,0.300416,1.812306,8696.5,3.536877,8237.0,2.322224,12514.0,169.0,2.536558,343.0,0.281615,0.192165,0.233715,0.134655,0.07555,0.0823,0.41627,0.42588,0.15785,0.59763,0.40237,,,,,,,349.0,,,,0.067649,,,,0.757492,2347.0,996.0,645.0,347.0,373.0,75.0,28.0,2347.0,996.0,645.0,347.0,373.0,75.0,28.0,0.48784,0.207026,0.134068,0.072126,0.077531,0.015589,0.00582,0.48784,0.207026,0.134068,0.072126,0.077531,0.015589,0.00582,4972.0,0.190578,4808.57805,0.194312,3.696531,0.0324,0.076053,0.17704,0.517462,0.07455,0.122495,0.724223,0.37834,0.62166
BIO_ddseq_3,ddS Bi3,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,114310431.0,2801,40810.578722,114310431.0,87.47,53.0,40.0,129.0,87.21,31.5,19452.5,6535.0,0.667785,11914.5,0.63251,34.950065,57853477.0,19229729.0,12084666.0,2480.0,0.885082,0.105718,9.436095,4.323692,5.349826,4.955942,2.032042,3.332963,2.964943,4.091392,5.537343,1.709889,6.676512,7.119247,2.714765,1.063793,2.345716,5.702627,3.713483,4.712656,5.222003,4.807887,3.714184,4.025692,2.141846,2.094054,0.112847,0.09856,36926843.0,5365.0,3.03524,8.197817,20354.0,3.9835,15.818051,8528.0,1.945128,3.850718,6614.0,1.32317,2.502153,20094.0,1.563074,2.954827,3078.0,1.560351,2.949256,19710.0,1.714335,3.281453,59383.0,5.51777,9.24088,6.67184,118783.0,7.3375,11.1054,8.86074,68448.0,11.6126,20.2385,17.7922,62817.0,6.35149,10.6779,8.11562,24160.0,4.90025,10.5431,7.33999,34775.0,5.77004,11.2693,8.39623,52855.0,4.7329,8.34186,5.67225,0.029213,0.024504,0.007033,0.010109,0.087114,0.048462,0.007077,0.030501,0.855937,0.911619,0.612935,0.855588,0.95053,0.843415,0.584831,0.802122,0.219535,0.559805,0.619889,0.465116,0.360106,0.589921,0.143345,0.422531,0.334419,0.397093,0.172872,0.108532,0.764849,0.432505,0.554161,0.394919,1.425794,11649.0,3.159674,11868.0,1.71814,12377.5,129.0,2.523746,333.0,0.352085,0.153485,0.194645,0.203455,0.042235,0.054095,0.55554,0.34813,0.09633,0.600825,0.399175,,,,,,,44.0,,,,0.015709,,,,0.749472,1165.0,352.0,706.0,263.0,98.0,123.0,51.0,1165.0,352.0,706.0,263.0,98.0,123.0,51.0,0.422408,0.127629,0.255983,0.095359,0.035533,0.044598,0.018492,0.422408,0.127629,0.255983,0.095359,0.035533,0.044598,0.018492,4122.0,0.127753,4262.026468,0.137995,3.615108,0.1253,0.111874,0.256717,0.337885,0.062506,0.105718,0.667613,0.371563,0.628437
BIO_ddseq_4,ddS Bi4,BioRad,Bio-Rad ddSEQ SureCell,NextSeq 500/550,108112191.0,2649,40812.454134,108112191.0,88.1,54.0,40.0,129.0,87.25,31.4,20164.0,7214.0,0.646847,12595.5,0.634784,34.643336,56499013.0,20122534.0,12670502.0,2436.0,0.919245,0.117198,9.38365,4.307199,5.369472,4.962759,2.048674,3.319182,2.977847,4.078061,5.565344,1.702085,6.669436,7.159297,2.700015,1.064107,2.312174,5.742102,3.728216,4.725525,5.239566,4.791919,3.708891,3.965734,2.242906,2.04289,0.103526,0.089423,37191572.0,5557.0,3.233812,9.407504,21271.0,4.684283,25.710451,9010.0,2.369463,5.167487,5469.0,1.144186,2.210214,6833.0,1.555194,2.938733,2427.0,1.319865,2.496428,20125.0,1.854752,3.616897,62871.0,5.48514,9.05721,6.51619,95985.0,10.7025,18.2239,15.8592,69504.0,11.6629,20.3463,17.8968,62970.0,6.35542,10.698,8.13167,25622.0,4.89007,10.3282,7.1919,40088.0,5.72032,10.7401,7.94823,48731.0,4.75946,8.57932,5.86075,0.033727,0.026697,0.008433,0.014046,0.040793,0.041192,0.011027,0.025131,0.871296,0.917942,0.610815,0.891691,0.942954,0.827313,0.59817,0.808597,0.227907,0.573434,0.613109,0.464535,0.330319,0.118151,0.127645,0.350729,0.377674,0.424419,0.137766,0.077816,0.794752,0.443189,0.221455,0.353867,1.42454,11057.0,3.78736,11399.0,1.918273,12071.0,128.0,2.518514,329.0,0.35487,0.1531,0.193885,0.203835,0.04131,0.053,0.558705,0.346985,0.09431,0.601755,0.398245,,,,,,,43.0,,,,0.016233,,,,0.787312,1049.0,343.0,657.0,266.0,133.0,135.0,24.0,1049.0,343.0,657.0,266.0,133.0,135.0,24.0,0.402378,0.131569,0.252014,0.102033,0.051016,0.051784,0.009206,0.402378,0.131569,0.252014,0.102033,0.051016,0.051784,0.009206,4611.0,0.127753,4752.200614,0.143042,3.663795,0.119,0.112328,0.246076,0.33647,0.068929,0.117198,0.643843,0.370333,0.629667
BRO_mtscatacfacs_1,mt* Br1,Broad,mtscATAC-seq (FACS),NextSeq 500/550,145886207.0,3575,40807.330629,145886207.0,94.66,72.0,72.0,138.0,87.55,33.1,28102.0,16066.5,0.416198,13485.0,0.499141,22.565713,112235277.0,62570052.0,30631898.0,27.0,0.00755,0.209971,7.39927,3.573151,3.86345,4.052749,2.009991,2.645604,2.209586,2.47154,3.443458,1.490426,3.524064,6.25278,1.760433,0.849338,1.287644,5.204424,3.735327,4.323057,4.914781,3.960367,3.218435,3.045364,22.558772,2.086616,0.044214,0.075159,68164589.0,11589.0,3.457201,10.983006,35362.0,3.969873,15.669342,15735.0,4.342172,20.282614,14935.0,2.294624,4.90626,25733.0,2.33667,5.051355,11068.0,3.574994,11.917366,32202.0,2.181804,4.537205,71752.0,7.24608,11.1991,8.69879,140588.0,9.19483,17.9608,15.8357,86972.0,10.7487,19.5833,17.2585,69004.0,8.67346,13.9942,11.5372,76971.0,4.58448,7.32543,4.73886,72265.0,8.66292,13.9674,11.5211,82629.0,7.38505,11.5869,9.10192,0.046508,0.045302,0.004139,0.076452,0.119403,0.138792,0.080151,0.072964,0.979192,0.994179,0.841267,0.998514,0.999604,0.995566,0.962912,0.967319,0.486512,0.730269,0.608588,0.794767,0.725532,0.452321,0.688737,0.640961,0.552093,0.917442,0.864894,0.845051,0.973759,0.961167,0.769971,0.840625,1.549457,17235.0,3.471522,14120.0,3.158546,20975.5,124.0,2.701568,502.0,0.33411,0.167075,0.173855,0.22776,0.04616,0.05104,0.56187,0.34093,0.0972,0.559005,0.440995,2321.5,-4132.53,1.584615,6.795181,4.288221,17.0,112.0,11.0,106.0,0.02965,0.031329,0.003077,0.160377,0.004755,0.886356,1159.0,387.0,1118.0,203.0,329.0,187.0,53.0,1159.0,387.0,1118.0,203.0,329.0,187.0,53.0,0.337311,0.112631,0.325378,0.05908,0.095751,0.054424,0.015425,0.337311,0.112631,0.325378,0.05908,0.095751,0.054424,0.015425,7814.0,0.076283,8514.084109,0.094029,3.892873,0.0534,0.117852,0.059414,0.340438,0.218925,0.209971,0.44251,0.510438,0.489562
BRO_mtscatacfacs_2,mt* Br2,Broad,mtscATAC-seq (FACS),NextSeq 500/550,138669154.0,3398,40809.050618,138669154.0,94.79,72.0,72.0,137.0,87.66,33.1,28097.0,15757.0,0.427015,13319.0,0.492458,22.60085,107640170.0,58711278.0,28379466.0,78.0,0.022948,0.204656,7.344691,3.495138,3.852343,4.073935,1.994657,2.66218,2.212414,2.465401,3.476962,1.487936,3.556427,6.263986,1.757498,0.836482,1.276383,5.15612,3.777046,4.281815,4.837845,3.952008,3.191981,3.017753,22.829009,2.080539,0.045054,0.074398,63590628.0,10205.0,3.685264,12.86397,34555.0,4.131158,17.522756,13696.0,5.430364,43.12236,13689.0,2.527599,5.766114,29381.0,2.282333,4.864638,9957.0,3.520353,11.47445,33478.0,2.115445,4.333237,58383.0,7.55611,12.1033,9.47154,134299.0,9.49346,18.4562,16.2998,87388.0,10.2697,18.3362,16.0152,72358.0,8.54835,13.6819,11.2411,50757.0,4.74212,8.42162,5.49876,67250.0,8.87158,14.516,12.0178,75278.0,7.56703,12.1379,9.58385,0.020548,0.04205,0.005134,0.058523,0.098036,0.106447,0.029954,0.051527,0.95797,0.99477,0.853577,0.997842,0.999604,0.993155,0.87438,0.953042,0.460465,0.73271,0.657489,0.736047,0.68617,0.624007,0.643686,0.648653,0.48186,0.919186,0.87766,0.857338,0.970098,0.981303,0.808239,0.84224,1.611469,17844.0,4.315528,13509.0,3.091603,22052.5,122.0,2.682145,480.0,0.341345,0.166895,0.17324,0.226265,0.04427,0.047985,0.56761,0.340135,0.092255,0.56257,0.43743,2271.0,-4021.92,1.418803,5.838095,4.114802,14.0,86.0,10.0,82.0,0.024132,0.025309,0.002943,0.170732,0.00412,0.877486,1152.0,458.0,1009.0,159.0,313.0,167.0,31.0,1152.0,458.0,1009.0,159.0,313.0,167.0,31.0,0.350258,0.139252,0.30678,0.048343,0.095166,0.050775,0.009425,0.350258,0.139252,0.30678,0.048343,0.095166,0.050775,0.009425,7623.0,0.086154,8207.707206,0.100562,3.882126,0.0521,0.116971,0.054692,0.352846,0.218735,0.204656,0.45456,0.516627,0.483373
CNA_10xmultiome_1,MO C1,CNAG,10x Multiome,NovaSeq 6000,144290077.0,3536,40806.017251,144290077.0,98.31,50.0,49.0,160.0,89.65,36.1,17525.0,9436.0,0.465085,10776.0,0.614667,26.790498,74343792.0,40013462.0,23541327.0,1.0,0.000283,0.163153,8.811367,4.568622,5.09691,4.791199,2.524914,3.36358,2.898304,3.676041,4.575629,2.078515,5.036825,7.670104,2.514806,1.117847,1.889046,6.134186,4.710056,5.071444,5.783579,5.056858,4.362314,4.01354,0.898299,3.149826,0.073079,0.133109,69495709.0,10153.0,4.480955,22.330679,26677.0,4.855123,28.942603,9882.0,2.793322,6.932243,9977.0,2.247478,4.748519,26793.0,1.86273,3.636953,7935.0,2.390883,5.244782,,,,74017.0,8.0911,12.9251,10.4722,110893.0,8.42797,13.4657,11.2234,105530.0,7.494595,11.69675,9.50254,79137.0,7.71749,11.967,9.62277,53436.0,4.72091,8.242,5.46733,63518.0,8.31053,13.5587,11.061,,,,,0.060284,0.118472,0.006784,0.056501,0.072338,,0.037196,0.058596,0.992703,0.998582,0.885925,0.997938,0.995174,,0.920943,0.965211,0.423256,0.688161,,0.601163,0.494149,0.692388,0.517406,0.56942,0.820465,0.805233,0.851596,0.869625,0.878153,,0.813885,0.839826,1.63516,16416.0,4.309478,11159.5,2.468669,22459.5,140.0,3.022016,1051.0,0.297725,0.20144,0.13916,0.22539,0.077245,0.05904,0.523115,0.3406,0.136285,0.495925,0.504075,1342.0,-2381.54,1.137592,2.346491,2.062682,37.0,83.0,126.0,172.0,0.048643,0.023473,0.035633,0.215116,0.010464,0.798464,1007.0,594.0,683.0,398.0,338.0,224.0,47.0,1007.0,594.0,683.0,398.0,338.0,224.0,47.0,0.305986,0.180492,0.207536,0.120936,0.102704,0.068064,0.014281,0.305986,0.180492,0.207536,0.120936,0.102704,0.068064,0.014281,5558.0,0.10299,6186.908235,0.120977,3.744919,0.0169,0.101751,0.366111,0.237926,0.11416,0.163153,0.461778,0.411665,0.588335
CNA_10xmultiome_2,MO C2,CNAG,10x Multiome,NovaSeq 6000,127397732.0,3122,40806.44843,127397732.0,98.4,50.0,49.0,165.0,89.36,36.1,9206.0,6513.0,0.29556,5540.0,0.598629,27.339666,53253408.0,37704678.0,19855965.0,0.0,0.0,0.155858,8.710613,4.522078,5.038773,4.787221,2.617213,3.289445,2.853234,3.514668,4.181309,2.259914,4.376614,7.852133,2.410743,1.134083,1.769664,6.418751,5.32577,5.456497,5.7227,5.088039,4.33506,4.016296,0.604701,3.473639,0.081215,0.159627,79907472.0,8305.0,3.367345,10.319811,27881.0,3.613343,12.238397,9858.0,2.139097,4.404862,6654.0,1.305278,2.471314,29568.0,1.841265,3.583241,7041.0,1.581533,2.992877,,,,56656.0,7.77314,12.8376,10.2322,109120.0,6.64137,9.74473,7.48082,107732.0,8.19668,14.8984,12.7007,75938.0,5.28575,8.12313,5.68933,39575.0,4.77589,8.73914,5.7548,62306.0,5.47667,9.01131,6.44149,,,,,0.030474,0.096578,0.011016,0.043254,0.069308,,0.018378,0.044835,0.946912,0.997902,0.783637,0.992089,0.990675,,0.843554,0.925795,0.228372,0.62144,,0.525581,0.223936,0.691133,0.337884,0.438058,0.637674,0.768605,0.633511,0.645734,0.910903,,0.846508,0.740489,1.54407,15800.0,4.046183,11224.0,1.956868,19126.0,141.0,3.17667,1501.0,0.283755,0.22261,0.12899,0.24033,0.07318,0.051135,0.524085,0.3516,0.124315,0.46388,0.53612,934.0,-1678.21,1.66782,3.095506,1.856019,1.0,11.0,155.0,165.0,0.052851,0.003523,0.049648,0.006061,0.00032,0.696506,875.0,560.0,670.0,282.0,278.0,233.0,58.0,875.0,560.0,670.0,282.0,278.0,233.0,58.0,0.296008,0.189445,0.226658,0.095399,0.094046,0.078823,0.019621,0.296008,0.189445,0.226658,0.095399,0.094046,0.078823,0.019621,3784.0,0.132075,6054.972598,0.144116,3.577951,0.016,0.104698,0.461293,0.122049,0.140102,0.155858,0.291976,0.473382,0.526618
CNA_10xv11_1,v1.1 C1,CNAG,10x v1.1,NovaSeq 6000,111532272.0,2733,40809.46652,111532272.0,98.19,50.0,49.0,161.0,90.03,36.4,13290.5,7207.0,0.460257,8731.0,0.652302,25.328764,45679754.0,23663663.0,14639346.0,37.0,0.013533,0.131257,8.863023,4.405321,4.914165,4.791968,2.870076,3.384918,2.94705,3.383294,3.830397,2.319741,3.715964,7.869801,2.442625,1.064677,1.749885,6.543615,5.370477,5.710715,5.674754,5.297088,4.430019,3.78455,0.591716,3.762033,0.086792,0.195333,56405087.0,11780.0,6.351004,81.628668,28396.0,4.089363,17.022401,14176.0,3.831668,14.237935,14288.0,2.056409,4.159497,,,,10381.0,3.512798,11.414518,28288.0,2.045707,4.128755,57249.0,9.44093,16.2705,13.5276,87076.0,8.89731,14.5865,12.0342,97618.0,8.78533,14.0663,11.6919,75004.0,6.23972,10.148,7.60316,,,,,77612.0,6.22181,10.0697,7.5056,47002.0,5.71907,10.728,7.74022,0.017494,0.065154,0.008682,0.073207,0.02041,0.035107,,0.036676,0.956121,0.996247,0.81081,0.995685,0.975895,0.911396,,0.941026,0.442791,0.659886,0.626464,0.722674,0.709574,,0.604778,0.627695,0.907442,0.913372,0.893617,0.879863,0.888731,0.935895,,0.903153,1.789991,16763.0,4.857912,13074.0,3.66062,22501.0,116.0,3.282396,1915.0,0.301795,0.210765,0.103135,0.273525,0.071325,0.039455,0.57532,0.3139,0.11078,0.444385,0.555615,1029.5,-1823.595,1.902913,6.191781,3.253844,12.0,110.0,26.0,124.0,0.045371,0.040249,0.009513,0.096774,0.004391,0.83296,848.0,329.0,673.0,330.0,263.0,124.0,19.0,848.0,329.0,673.0,330.0,263.0,124.0,19.0,0.32792,0.127224,0.260247,0.12761,0.101701,0.047951,0.007347,0.32792,0.127224,0.260247,0.12761,0.101701,0.047951,0.007347,4584.0,0.142857,5097.934648,0.15093,3.661245,0.0181,0.097895,0.474439,0.197397,0.080912,0.131257,0.481966,0.381358,0.618642
CNA_10xv11_2,v1.1 C2,CNAG,10x v1.1,NovaSeq 6000,113657151.0,2785,40810.467145,113657151.0,98.17,50.0,49.0,157.0,90.1,36.4,15031.5,7728.0,0.473173,9942.0,0.66365,25.734662,50135257.0,24869977.0,15894004.0,127.0,0.045585,0.139842,9.099497,4.504297,5.3833,4.759605,2.64951,3.233059,3.040234,3.425203,3.94579,2.222134,4.009687,7.535437,2.48467,1.091846,1.706855,6.580228,4.870331,5.768286,5.861875,5.03351,4.4616,3.87712,0.479816,3.718857,0.072027,0.185228,56102783.0,11251.0,5.074503,33.695952,28716.0,6.241048,75.638481,13879.0,4.240351,18.900483,12133.0,1.978538,3.940935,31150.0,1.728645,3.314163,8388.0,3.27836,9.702522,,,,52223.0,9.54195,16.6364,13.8729,108533.0,9.1104,14.7918,12.3496,94633.0,9.61892,15.8388,13.4112,56613.0,10.3933,18.4719,15.7667,22730.0,4.90307,10.6065,7.11743,72609.0,6.2737,10.3012,7.71425,,,,,0.010191,0.056797,0.000809,0.062194,0.042892,,0.00576,0.029774,0.951469,0.995834,0.740277,0.995205,0.991468,,0.603741,0.879666,0.420465,0.671277,,0.717442,0.618085,0.72752,0.486689,0.606913,0.867907,0.90814,0.871809,0.873038,0.890968,,0.869302,0.880194,1.695207,16899.0,4.219575,13165.0,3.615945,22782.5,114.0,3.263518,1833.5,0.30945,0.210295,0.101035,0.273765,0.06764,0.037815,0.583215,0.31133,0.105455,0.4483,0.5517,1093.0,-1934.45,1.503906,5.21978,3.470815,9.0,96.0,34.0,121.0,0.043447,0.03447,0.012208,0.07438,0.003232,0.813782,894.0,351.0,695.0,303.0,262.0,113.0,29.0,894.0,351.0,695.0,303.0,262.0,113.0,29.0,0.337741,0.132603,0.262561,0.114469,0.09898,0.04269,0.010956,0.337741,0.132603,0.262561,0.114469,0.09898,0.04269,0.010956,4951.0,0.114754,5439.018889,0.135385,3.694693,0.0183,0.097188,0.443402,0.222294,0.078974,0.139842,0.503942,0.360916,0.639084


In [103]:
df_stats[cols].to_csv("fixedcells_general_statistics.tsv", sep="\t")

# write smaller df to excel:

In [104]:
list(df_stats.columns)

['short_identifier',
 'centre',
 'technology',
 'sequencing_instrument',
 'reads',
 'cells',
 'RPC',
 'nreads',
 '%_correct_barcodes',
 'r1_length',
 'r2_length',
 'avg_insert_size',
 '%_mapq30',
 'avg_map_quality',
 'Median_total_nr_frag',
 'Median_unique_nr_frag',
 'Median_dupl_rate',
 'Median_total_nr_frag_in_regions',
 'Median_frip',
 'Median_tss_enrichment',
 'total_nr_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes_in_regions',
 'n_barcodes_merged',
 'frac_barcodes_merged',
 'efficiency',
 'chr1',
 'chr10',
 'chr11',
 'chr12',
 'chr13',
 'chr14',
 'chr15',
 'chr16',
 'chr17',
 'chr18',
 'chr19',
 'chr2',
 'chr20',
 'chr21',
 'chr22',
 'chr3',
 'chr4',
 'chr5',
 'chr6',
 'chr7',
 'chr8',
 'chr9',
 'chrM',
 'chrX',
 'chrY',
 'nonstandard',
 'total_fragments',
 'n_dars__B_cell',
 'top_2000_dars_median_logfc__B_cell',
 'top_2000_dars_median_fc__B_cell',
 'n_dars__CD14+_monocyte',
 'top_2000_dars_median_logfc__CD14+

In [105]:
cols_sorted = [
    "short_identifier",
    "centre",
    "technology",
    "sequencing_instrument",
    "reads",
    "cells",
    "RPC",
    "nreads",
    "%_correct_barcodes",
    "r1_length",
    "r2_length",
    "avg_insert_size",
    "%_mapq30",
    "avg_map_quality",
    "Median_total_nr_frag",
    "Median_unique_nr_frag",
    "Median_dupl_rate",
    "Median_total_nr_frag_in_regions",
    "Median_frip",
    "Median_tss_enrichment",
    "total_nr_frag_in_selected_barcodes",
    "total_nr_unique_frag_in_selected_barcodes",
    "total_nr_unique_frag_in_selected_barcodes_in_regions",
    "n_barcodes_merged",
    "frac_barcodes_merged",
    "No correct barcode",
    "Not mapped properly",
    "Fragments in background noise barcodes",
    "Duplicate fragments in cells",
    "Unique fragments in cells, not in peaks",
    "Unique fragments in cells and in peaks",
    "Duplicate fragments in cells, normalized to fragments in cells",
    "Unique fragments in cells, not in peaks, normalized to unique fragments in cells",
    "Unique fragments in cells and in peaks, normalized to unique fragments in cells",
    "Mean_scrublet_doublet_scores_fragments",
    "scr_exclusive_doublets",
    "fmx_exclusive_doublets",
    "common_doublets",
    "total_doublets",
    "total_doublets_pct",
    "scr_exclusive_doublets_pct",
    "fmx_exclusive_doublets_pct",
    "common_doublets_pct_of_doublets",
    "common_doublets_pct",
    "Median_Unique_nr_frag_in_regions",
    "Median_scrublet_doublet_scores_fragments",
    "fmx_n_snps",
    "fmx_best_llk",
    "ratio_cd4T_to_cd8T_in_male",
    "ratio_cd4T_to_cd8T_in_female",
    "ratio_cd4T_to_cd8T_normalized",
    "total_fragments",
    "chr1",
    "chr10",
    "chr11",
    "chr12",
    "chr13",
    "chr14",
    "chr15",
    "chr16",
    "chr17",
    "chr18",
    "chr19",
    "chr2",
    "chr20",
    "chr21",
    "chr22",
    "chr3",
    "chr4",
    "chr5",
    "chr6",
    "chr7",
    "chr8",
    "chr9",
    "chrM",
    "chrX",
    "chrY",
    "nonstandard",
    "nucleosome-free_proximal",
    "mononucleosomal_distal",
    "mononucleosomal_proximal",
    "nucleosome-free_distal",
    "multinucleosomal_distal",
    "multinucleosomal_proximal",
    "nucleosome-free",
    "mononucleosomal",
    "multinucleosomal",
    "proximal",
    "distal",
    "seurat_score",
    "n_seurat_cells__CD4+ T cell",
    "n_seurat_cells__Cytotoxic T cell",
    "n_seurat_cells__CD14+ monocyte",
    "n_seurat_cells__B cell",
    "n_seurat_cells__Natural killer cell",
    "n_seurat_cells__CD16+ monocyte",
    "n_seurat_cells__Dendritic cell",
    "n_consensus_cells__CD4+ T cell",
    "n_consensus_cells__Cytotoxic T cell",
    "n_consensus_cells__CD14+ monocyte",
    "n_consensus_cells__B cell",
    "n_consensus_cells__Natural killer cell",
    "n_consensus_cells__CD16+ monocyte",
    "n_consensus_cells__Dendritic cell",
    "pct_seurat_cells__CD4+_T_cell",
    "pct_seurat_cells__Cytotoxic_T_cell",
    "pct_seurat_cells__CD14+_monocyte",
    "pct_seurat_cells__B_cell",
    "pct_seurat_cells__Natural_killer_cell",
    "pct_seurat_cells__CD16+_monocyte",
    "pct_seurat_cells__Dendritic_cell",
    "pct_consensus_cells__CD4+_T_cell",
    "pct_consensus_cells__Cytotoxic_T_cell",
    "pct_consensus_cells__CD14+_monocyte",
    "pct_consensus_cells__B_cell",
    "pct_consensus_cells__Natural_killer_cell",
    "pct_consensus_cells__CD16+_monocyte",
    "pct_consensus_cells__Dendritic_cell",
    "alldars_median_dar_logfc",
    "alldars_median_dar_tss_dist",
    "allpeaks_median_peak_logfc",
    "allpeaks_median_peak_tss_dist",
    "top2kdars_median_dar_logfc",
    "top2kdars_median_dar_tss_dist",
    "median_frag_len",
    "median_log10_frag_dist_nearest_tss",
    "median_frag_dist_nearest_tss",
    "n_dars__B_cell",
    "top_2000_dars_median_logfc__B_cell",
    "top_2000_dars_median_fc__B_cell",
    "n_dars__CD14+_monocyte",
    "top_2000_dars_median_logfc__CD14+_monocyte",
    "top_2000_dars_median_fc__CD14+_monocyte",
    "n_dars__CD4+_T_cell",
    "top_2000_dars_median_logfc__CD4+_T_cell",
    "top_2000_dars_median_fc__CD4+_T_cell",
    "n_dars__Cytotoxic_T_cell",
    "top_2000_dars_median_logfc__Cytotoxic_T_cell",
    "top_2000_dars_median_fc__Cytotoxic_T_cell",
    "n_dars__Dendritic_cell",
    "top_2000_dars_median_logfc__Dendritic_cell",
    "top_2000_dars_median_fc__Dendritic_cell",
    "n_dars__Natural_killer_cell",
    "top_2000_dars_median_logfc__Natural_killer_cell",
    "top_2000_dars_median_fc__Natural_killer_cell",
    "n_dars__CD16+_monocyte",
    "top_2000_dars_median_logfc__CD16+_monocyte",
    "top_2000_dars_median_fc__CD16+_monocyte",
    "n_peaks__Bcell",
    "top10k_peaks_strength__Bcell",
    "n_peaks__CD14_monocyte",
    "top10k_peaks_strength__CD14_monocyte",
    "n_peaks__CD4_Tcell",
    "top10k_peaks_strength__CD4_Tcell",
    "n_peaks__CytotoxicTcell",
    "top10k_peaks_strength__CytotoxicTcell",
    "n_peaks__Dendriticcell",
    "top10k_peaks_strength__Dendriticcell",
    "n_peaks__Naturalkillercell",
    "top10k_peaks_strength__Naturalkillercell",
    "n_peaks__CD16_monocyte",
    "top10k_peaks_strength__CD16_monocyte",
]
rows_sorted = [
    "VIB_10xv1_1",
    "VIB_10xv1_2",
    "CNA_10xv11_1",
    "CNA_10xv11_2",
    "CNA_10xv11_3",
    "CNA_10xv11c_1",
    "CNA_10xv11c_2",
    "STA_10xv11_1",
    "STA_10xv11_2",
    "TXG_10xv11_1",
    "CNA_10xv2_1",
    "CNA_10xv2_2",
    "TXG_10xv2_1",
    "TXG_10xv2_2",
    "VIB_10xv2_1",
    "VIB_10xv2_2",
    "CNA_10xmultiome_1",
    "CNA_10xmultiome_2",
    "SAN_10xmultiome_1",
    "SAN_10xmultiome_2",
    "VIB_10xmultiome_1",
    "VIB_10xmultiome_2",
    "BRO_mtscatacfacs_1",
    "BRO_mtscatacfacs_2",
    "CNA_mtscatac_1",
    "CNA_mtscatac_2",
    "MDC_mtscatac_1",
    "MDC_mtscatac_2",
    "BIO_ddseq_1",
    "BIO_ddseq_2",
    "BIO_ddseq_3",
    "BIO_ddseq_4",
    "HAR_ddseq_1",
    "HAR_ddseq_2",
    "UCS_ddseq_1",
    "UCS_ddseq_2",
    "OHS_s3atac_1",
    "OHS_s3atac_2",
    "CNA_hydrop_1",
    "CNA_hydrop_2",
    "CNA_hydrop_3",
    "EPF_hydrop_1",
    "EPF_hydrop_2",
    "EPF_hydrop_3",
    "EPF_hydrop_4",
    "VIB_hydrop_1",
    "VIB_hydrop_2",
]

In [106]:
df_stats.loc[rows_sorted][cols_sorted].to_csv(
    "fixedcells_general_statistics_table.tsv", sep="\t", header=True
)

# check some things for text

In [107]:
df_stats = pd.read_csv("fixedcells_general_statistics.tsv", sep="\t", index_col=0)

In [108]:
df_stats.groupby("technology")["Median_frip"].median()

technology
10x Multiome              0.606648
10x v1                    0.574628
10x v1.1                  0.573633
10x v1.1 (control)        0.616573
10x v2                    0.619329
Bio-Rad ddSEQ SureCell    0.606599
HyDrop                    0.412306
mtscATAC-seq              0.390855
mtscATAC-seq (FACS)       0.495800
s3-ATAC                   0.189464
Name: Median_frip, dtype: float64

In [109]:
df_medians = df_stats.groupby("technology").median()
df_means = df_stats.groupby("technology").mean()

  df_medians = df_stats.groupby("technology").median()
  df_means = df_stats.groupby("technology").mean()


In [110]:
series = df_medians["No correct barcode"] + df_medians["Not mapped properly"]
series.sort_values()

technology
10x v2                    0.104695
10x Multiome              0.118274
10x v1.1                  0.120092
10x v1                    0.142169
mtscATAC-seq              0.142807
Bio-Rad ddSEQ SureCell    0.143326
10x v1.1 (control)        0.145505
s3-ATAC                   0.147050
mtscATAC-seq (FACS)       0.170161
HyDrop                    0.226784
dtype: float64

In [111]:
df_means["Fragments in background noise barcodes"].sort_values()

technology
mtscATAC-seq (FACS)       0.057053
10x v2                    0.068884
10x v1.1 (control)        0.178743
Bio-Rad ddSEQ SureCell    0.339764
mtscATAC-seq              0.361478
HyDrop                    0.396965
10x v1.1                  0.440424
10x Multiome              0.475981
10x v1                    0.493503
s3-ATAC                   0.624330
Name: Fragments in background noise barcodes, dtype: float64

In [112]:
df_means["Duplicate fragments in cells"].sort_values()

technology
s3-ATAC                   0.011772
10x v1                    0.090516
10x Multiome              0.121836
10x v1.1                  0.158550
mtscATAC-seq              0.189433
10x v1.1 (control)        0.260713
HyDrop                    0.289868
Bio-Rad ddSEQ SureCell    0.307599
mtscATAC-seq (FACS)       0.346642
10x v2                    0.376667
Name: Duplicate fragments in cells, dtype: float64

In [113]:
df_means["Duplicate fragments in cells, normalized to fragments in cells"].sort_values()

technology
s3-ATAC                   0.051093
10x v1                    0.247749
10x Multiome              0.314283
10x v1.1                  0.360317
mtscATAC-seq              0.368760
10x v1.1 (control)        0.385801
mtscATAC-seq (FACS)       0.448535
10x v2                    0.457507
Bio-Rad ddSEQ SureCell    0.579556
HyDrop                    0.716042
Name: Duplicate fragments in cells, normalized to fragments in cells, dtype: float64

In [114]:
df_means

Unnamed: 0_level_0,reads,cells,RPC,nreads,%_correct_barcodes,r1_length,r2_length,avg_insert_size,%_mapq30,avg_map_quality,Median_total_nr_frag,Median_unique_nr_frag,Median_dupl_rate,Median_total_nr_frag_in_regions,Median_frip,Median_tss_enrichment,total_nr_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes_in_regions,n_barcodes_merged,frac_barcodes_merged,efficiency,chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrM,chrX,chrY,nonstandard,total_fragments,n_dars__B_cell,top_2000_dars_median_logfc__B_cell,top_2000_dars_median_fc__B_cell,n_dars__CD14+_monocyte,top_2000_dars_median_logfc__CD14+_monocyte,top_2000_dars_median_fc__CD14+_monocyte,n_dars__CD4+_T_cell,top_2000_dars_median_logfc__CD4+_T_cell,top_2000_dars_median_fc__CD4+_T_cell,n_dars__Cytotoxic_T_cell,top_2000_dars_median_logfc__Cytotoxic_T_cell,top_2000_dars_median_fc__Cytotoxic_T_cell,n_dars__Dendritic_cell,top_2000_dars_median_logfc__Dendritic_cell,top_2000_dars_median_fc__Dendritic_cell,n_dars__Natural_killer_cell,top_2000_dars_median_logfc__Natural_killer_cell,top_2000_dars_median_fc__Natural_killer_cell,n_dars__CD16+_monocyte,top_2000_dars_median_logfc__CD16+_monocyte,top_2000_dars_median_fc__CD16+_monocyte,n_peaks__Bcell,top10k_peaks_strength__Bcell,top10k_peaks_pvalBcell,top10k_peaks_qvalBcell,n_peaks__CD14_monocyte,top10k_peaks_strength__CD14_monocyte,top10k_peaks_pvalCD14_monocyte,top10k_peaks_qvalCD14_monocyte,n_peaks__CD4_Tcell,top10k_peaks_strength__CD4_Tcell,top10k_peaks_pvalCD4_Tcell,top10k_peaks_qvalCD4_Tcell,n_peaks__CytotoxicTcell,top10k_peaks_strength__CytotoxicTcell,top10k_peaks_pvalCytotoxicTcell,top10k_peaks_qvalCytotoxicTcell,n_peaks__Dendriticcell,top10k_peaks_strength__Dendriticcell,top10k_peaks_pvalDendriticcell,top10k_peaks_qvalDendriticcell,n_peaks__Naturalkillercell,top10k_peaks_strength__Naturalkillercell,top10k_peaks_pvalNaturalkillercell,top10k_peaks_qvalNaturalkillercell,n_peaks__CD16_monocyte,top10k_peaks_strength__CD16_monocyte,top10k_peaks_pvalCD16_monocyte,top10k_peaks_qvalCD16_monocyte,B_cells_bot20peaks_recovery,Naive_T_cells_bot20peaks_recovery,Cytotoxic_T_cells_bot20peaks_recovery,NK_cells_bot20peaks_recovery,CD14+_monocytes_bot20peaks_recovery,CD16+_monocytes_bot20peaks_recovery,Dendritic_cells_bot20peaks_recovery,mean_bot20peaks_recovery,B_cells_top20peaks_recovery,Naive_T_cells_top20peaks_recovery,Cytotoxic_T_cells_top20peaks_recovery,NK_cells_top20peaks_recovery,CD14+_monocytes_top20peaks_recovery,CD16+_monocytes_top20peaks_recovery,Dendritic_cells_top20peaks_recovery,mean_top20peaks_recovery,B_cell,CD14+_monocyte,CD16+_monocyte,CD4+_T_cell,Cytotoxic_T_cell,Dendritic_cell,Natural_killer_cell,mean_bot20dars_recovery,B_cell_top20dars_recovery,CD4+_T_cell_top20dars_recovery,Cytotoxic_T_cell_top20dars_recovery,Natural_killer_cell_top20dars_recovery,CD14+_monocyte_top20dars_recovery,CD16+_monocyte_top20dars_recovery,Dendritic_cell_top20dars_recovery,mean_top20dars_recovery,alldars_median_dar_logfc,alldars_median_dar_tss_dist,allpeaks_median_peak_logfc,allpeaks_median_peak_tss_dist,top2kdars_median_dar_logfc,top2kdars_median_dar_tss_dist,median_frag_len,median_log10_frag_dist_nearest_tss,median_frag_dist_nearest_tss,nucleosome-free_proximal,mononucleosomal_distal,mononucleosomal_proximal,nucleosome-free_distal,multinucleosomal_distal,multinucleosomal_proximal,nucleosome-free,mononucleosomal,multinucleosomal,proximal,distal,fmx_n_snps,fmx_best_llk,ratio_cd4T_to_cd8T_in_male,ratio_cd4T_to_cd8T_in_female,ratio_cd4T_to_cd8T_normalized,common_doublets,scr_exclusive_doublets,fmx_exclusive_doublets,total_doublets,total_doublets_pct,scr_exclusive_doublets_pct,fmx_exclusive_doublets_pct,common_doublets_pct_of_doublets,common_doublets_pct,seurat_score,n_seurat_cells__CD4+ T cell,n_seurat_cells__Cytotoxic T cell,n_seurat_cells__CD14+ monocyte,n_seurat_cells__B cell,n_seurat_cells__Natural killer cell,n_seurat_cells__CD16+ monocyte,n_seurat_cells__Dendritic cell,n_consensus_cells__CD4+ T cell,n_consensus_cells__Cytotoxic T cell,n_consensus_cells__CD14+ monocyte,n_consensus_cells__B cell,n_consensus_cells__Natural killer cell,n_consensus_cells__CD16+ monocyte,n_consensus_cells__Dendritic cell,pct_seurat_cells__CD4+_T_cell,pct_seurat_cells__Cytotoxic_T_cell,pct_seurat_cells__CD14+_monocyte,pct_seurat_cells__B_cell,pct_seurat_cells__Natural_killer_cell,pct_seurat_cells__CD16+_monocyte,pct_seurat_cells__Dendritic_cell,pct_consensus_cells__CD4+_T_cell,pct_consensus_cells__Cytotoxic_T_cell,pct_consensus_cells__CD14+_monocyte,pct_consensus_cells__B_cell,pct_consensus_cells__Natural_killer_cell,pct_consensus_cells__CD16+_monocyte,pct_consensus_cells__Dendritic_cell,Median_Unique_nr_frag_in_regions,Median_scrublet_doublet_scores_fragments,Mean_Unique_nr_frag_in_regions,Mean_scrublet_doublet_scores_fragments,log_median_unique_nr_frag_in_regions,No correct barcode,Not mapped properly,Fragments in background noise barcodes,Duplicate fragments in cells,"Unique fragments in cells, not in peaks",Unique fragments in cells and in peaks,"Duplicate fragments in cells, normalized to fragments in cells","Unique fragments in cells, not in peaks, normalized to unique fragments in cells","Unique fragments in cells and in peaks, normalized to unique fragments in cells"
technology,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1
10x Multiome,119959700.0,2939.5,40812.778572,119959700.0,98.34,50.0,49.333333,154.166667,88.906667,35.266667,12007.916667,8253.25,0.317008,7445.0,0.627268,27.420777,51842920.0,36736810.0,20144260.0,2.0,0.001,0.155331,8.679054,4.559346,5.023091,4.80649,2.679279,3.273391,2.87592,3.504152,4.235366,2.196235,4.345046,7.896188,2.456925,1.133086,1.805338,6.350941,5.149256,5.400382,5.753036,5.144264,4.427587,4.008075,0.518056,3.568133,0.058083,0.15328,75215080.0,8778.833333,3.650628,15.24877,27118.5,5.196613,57.755976,10293.0,2.45614,5.853249,8663.333333,1.941707,4.23222,27543.0,1.783172,3.4496,7780.166667,2.060109,4.431792,28507.0,1.548363,2.925856,66500.0,6.923773,11.6085,9.005018,120946.333333,6.915557,11.000442,8.743733,100733.833333,7.309692,11.963488,9.665245,73837.0,6.53134,10.454973,7.993813,52368.333333,3.826087,7.114133,4.20813,55271.5,7.032808,12.268852,9.57782,79899.5,4.13136,6.588815,3.97199,0.065418,0.121888,0.008692,0.042021,0.107869,0.110983,0.029113,0.066623,0.917065,0.978045,0.7644,0.963226,0.989828,0.919253,0.813068,0.910363,0.326279,0.649512,0.651633,0.57093,0.435816,0.700404,0.464619,0.51496,0.610543,0.72655,0.700709,0.692833,0.924871,0.913088,0.781612,0.739728,1.529713,14954.5,4.867215,10528.166667,2.34758,20062.916667,133.666667,3.238684,1811.666667,0.284242,0.232417,0.121244,0.260032,0.061126,0.040939,0.544274,0.353661,0.102065,0.446425,0.553575,1143.416667,-2035.123333,1.454155,3.214377,2.19107,17.833333,69.0,117.0,168.166667,0.051107,0.022644,0.033801,0.108565,0.005339,0.775975,838.166667,474.666667,697.833333,309.166667,238.833333,145.5,32.5,838.166667,474.666667,697.833333,309.166667,238.833333,145.5,32.5,0.311136,0.177626,0.250905,0.113351,0.084779,0.051156,0.011047,0.311136,0.177626,0.250905,0.113351,0.084779,0.051156,0.011047,4911.5,0.133535,5942.001174,0.143526,3.671993,0.0166,0.109044,0.475981,0.121836,0.121209,0.155331,0.314283,0.418831,0.581169
10x v1,223355800.0,5474.0,40804.996912,223355800.0,97.47,49.5,48.5,142.0,88.01,35.75,13606.5,10437.0,0.228808,7781.5,0.574628,27.586278,80929280.0,63175630.0,33088310.0,167.0,0.036866,0.153879,8.699992,4.678451,4.996829,4.818669,2.659091,3.245977,3.121626,3.445122,4.079677,2.215567,3.826054,7.805889,2.591135,1.117223,1.84437,6.455723,5.083834,5.499495,5.690889,5.152362,4.542071,4.075172,0.173021,3.973948,0.043061,0.164751,156721300.0,9495.5,4.528985,24.11531,26612.0,3.88061,14.733206,12429.5,3.999683,16.179691,10560.0,2.06706,4.195917,25008.0,1.878451,3.802616,8454.0,3.263179,9.613974,26039.5,1.94212,3.843975,68851.0,10.13578,17.3695,14.82055,131095.0,8.25383,14.46515,12.23205,99405.0,10.724035,19.9774,17.6763,66772.0,11.2153,19.55815,17.0215,38584.5,4.794975,9.09405,5.968105,72282.0,8.923835,15.1763,12.67285,67340.0,6.588055,10.873375,8.23375,0.023785,0.055058,0.001291,0.056741,0.098023,0.081117,0.018515,0.04779,0.989596,0.997902,0.825731,0.998705,0.996177,0.987665,0.808662,0.943491,0.457209,0.685313,0.644237,0.722965,0.626064,0.634463,0.633447,0.6291,0.734186,0.900291,0.742287,0.773038,0.932974,0.905897,0.805625,0.827757,1.51541,15272.5,4.363642,11854.5,3.16844,20687.5,121.5,3.417839,2752.25,0.263508,0.245655,0.104038,0.310085,0.04675,0.029965,0.573592,0.349692,0.076715,0.39751,0.60249,1428.5,-2533.575,1.51754,6.125321,4.107177,55.0,247.0,68.0,260.0,0.039601,0.037543,0.010103,0.192364,0.008044,0.84179,1850.0,633.5,1487.5,471.5,388.0,225.5,49.0,1850.0,633.5,1487.5,471.5,388.0,225.5,49.0,0.355397,0.123772,0.288594,0.095664,0.081611,0.0442,0.010761,0.355397,0.123772,0.288594,0.095664,0.081611,0.0442,0.010761,5730.0,0.121331,6017.274345,0.132137,3.756657,0.0253,0.116869,0.493503,0.090516,0.119933,0.153879,0.247749,0.432239,0.567761
10x v1.1,153426700.0,3759.833333,40814.022457,153426700.0,97.32,44.666667,44.0,163.833333,88.46,35.666667,15049.583333,9664.083333,0.344837,8764.166667,0.591613,21.689454,74304870.0,45174320.0,25079990.0,114.833333,0.035725,0.146895,8.798268,4.613747,5.019644,4.880559,2.790413,3.302268,2.963668,3.341668,3.93447,2.271415,3.777765,7.921172,2.473747,1.118841,1.715561,6.570233,5.233488,5.679963,5.826984,5.121418,4.534233,3.937809,0.375163,3.550759,0.08904,0.157703,85368670.0,10738.0,4.543963,31.448677,31179.666667,4.408332,30.494062,15249.166667,3.279669,10.965191,9982.4,1.670591,3.385995,29746.0,2.043578,4.18013,10813.166667,2.62252,7.217645,28572.75,1.614652,3.211501,61242.0,7.553733,12.95782,10.215747,122481.833333,7.881062,13.384065,11.06665,96314.0,7.983357,14.192347,11.827277,74845.0,7.32982,12.650296,10.11098,48022.0,5.411287,9.702973,6.710627,73407.333333,6.754925,11.240293,8.677978,59066.75,5.477965,9.522527,6.671683,0.04232,0.108105,0.012005,0.088317,0.126551,0.106052,0.040432,0.072359,0.893655,0.977001,0.758396,0.985121,0.983782,0.907048,0.805978,0.911415,0.402636,0.630119,0.526865,0.689922,0.495426,0.707305,0.52025,0.555271,0.752636,0.836822,0.685638,0.719681,0.92399,0.872149,0.854594,0.801631,1.578221,16142.583333,5.917894,12405.0,3.022299,20743.333333,132.166667,3.372941,2495.75,0.260889,0.255446,0.109631,0.271418,0.06369,0.038927,0.532307,0.365077,0.102617,0.409447,0.590553,1149.6,-2033.678,1.773031,4.819369,2.92591,15.333333,165.833333,29.0,166.0,0.047928,0.042437,0.008038,0.08999,0.004351,0.78901,1327.833333,517.833333,950.0,293.0,300.166667,134.5,39.833333,1327.833333,517.833333,950.0,293.0,300.166667,134.5,39.833333,0.324939,0.142168,0.297511,0.09107,0.08818,0.045004,0.011128,0.324939,0.142168,0.297511,0.09107,0.08818,0.045004,0.011128,5398.5,0.135307,5738.575914,0.146583,3.72684,0.0268,0.112383,0.440424,0.15855,0.114948,0.146895,0.360317,0.435052,0.564948
10x v1.1 (control),39089160.0,957.0,40846.35355,39089160.0,94.67,50.0,49.0,147.5,90.26,35.7,24495.0,15136.75,0.377602,14847.25,0.616573,22.161807,26416480.0,16127790.0,9387216.0,29.0,0.030213,0.240739,8.991017,4.5481,5.088905,5.084235,2.510873,3.256398,2.886746,3.479445,4.558585,2.022032,4.949176,7.554881,2.392723,1.093087,1.881531,6.37156,4.638388,5.288181,5.853258,4.999443,4.305488,3.896829,1.034045,3.103833,0.082166,0.129075,21463020.0,12018.5,5.067743,33.678152,30726.0,4.864324,30.426475,14301.0,3.000636,8.237899,11939.5,1.800729,3.530351,32320.0,1.611463,3.055616,10565.5,3.000771,8.14019,28707.0,1.90009,3.732365,77012.0,4.99277,8.056325,5.426745,77871.0,7.45402,11.8306,9.20289,80217.5,8.117335,13.00755,10.4958,69093.0,5.502,9.19067,6.521755,12753.0,3.94475,8.79978,5.00697,88134.5,4.90659,7.57198,5.07607,25111.0,3.88635,7.52627,4.2766,0.053081,0.031783,0.007593,0.085151,0.018337,0.011777,0.002359,0.033835,0.936476,0.975652,0.710505,0.992664,0.9278,0.612003,0.331883,0.83584,0.415349,0.643714,0.567495,0.649128,0.457979,0.681723,0.496246,0.547837,0.85186,0.852907,0.856383,0.881911,0.879373,0.912061,0.829569,0.865542,1.667947,16801.25,5.029629,13246.25,2.999511,22296.75,89.5,2.887544,771.25,0.37365,0.163942,0.111943,0.263868,0.05083,0.035768,0.637518,0.275885,0.086597,0.52136,0.47864,2053.0,-3627.5525,1.616225,4.70872,3.078501,1.0,40.5,1.0,43.0,0.038704,0.043186,0.0009,0.023256,0.0009,0.806329,301.5,147.5,208.0,95.5,108.0,42.0,13.5,301.5,147.5,208.0,95.5,108.0,42.0,13.5,0.329186,0.161177,0.225463,0.103378,0.120272,0.046392,0.014132,0.329186,0.161177,0.225463,0.103378,0.120272,0.046392,0.014132,8917.0,0.115414,9548.32934,0.128977,3.950152,0.0533,0.092205,0.178743,0.260713,0.1743,0.240739,0.385801,0.419502,0.580498
10x v2,248004000.0,6078.166667,40804.06631,248004000.0,96.425,50.0,49.333333,125.5,92.876667,35.083333,30602.416667,16875.083333,0.446616,18540.25,0.606718,25.653297,205711900.0,112076400.0,66827720.0,58.5,0.009874,0.274071,9.688992,4.313576,5.238689,5.160749,2.113559,3.452086,3.017191,3.950236,5.616973,1.661556,6.486593,7.164204,2.623328,1.085721,2.249386,5.814137,3.825912,4.694604,5.657141,4.74436,3.726129,3.987428,1.297318,2.303388,0.068897,0.057847,124248700.0,10563.5,4.599764,30.768687,28683.5,4.04954,17.572366,12573.166667,3.843324,14.790813,9587.166667,2.182843,4.585667,22641.666667,2.42649,5.456198,8858.333333,2.921503,7.773719,28610.833333,2.322634,5.045191,119690.333333,8.924248,14.56145,12.188167,185376.333333,8.383427,15.57585,13.488583,177057.0,7.969137,14.867883,12.781233,117301.0,8.996418,15.361883,13.033717,94038.833333,5.37708,8.865525,6.241183,109011.666667,8.43563,13.648617,11.268963,109858.166667,7.123042,11.43594,8.979295,0.143357,0.290004,0.018308,0.156062,0.243075,0.23208,0.104263,0.169593,0.997626,0.998335,0.929532,0.998745,0.997638,0.99403,0.978508,0.984916,0.547597,0.729692,0.71728,0.699419,0.529078,0.518228,0.625825,0.623874,0.698837,0.913663,0.753812,0.74471,0.951383,0.950414,0.760456,0.824754,1.695574,15651.083333,2.608754,12339.583333,2.898188,21483.75,91.833333,2.805643,659.0,0.399779,0.152575,0.116372,0.273668,0.032205,0.0254,0.673447,0.268948,0.057605,0.541552,0.458448,1935.125,-3435.41875,1.528606,5.190643,3.314806,27.75,215.5,136.75,174.25,0.036801,0.027677,0.027995,0.380708,0.006888,0.856857,2388.166667,839.0,1353.666667,457.166667,441.166667,198.833333,76.0,2388.166667,839.0,1353.666667,457.166667,441.166667,198.833333,76.0,0.396333,0.143401,0.231665,0.091644,0.081699,0.039867,0.01539,0.396333,0.143401,0.231665,0.091644,0.081699,0.039867,0.01539,10021.333333,0.091341,10685.281356,0.105738,3.985101,0.03575,0.068689,0.068884,0.376667,0.17594,0.274071,0.457507,0.391402,0.608598
Bio-Rad ddSEQ SureCell,186814600.0,4578.25,40805.712,186814600.0,94.18875,53.375,40.0,166.625,89.23625,33.9125,19774.625,7640.875,0.572865,11312.1875,0.573795,32.584607,94242020.0,36381940.0,20485730.0,3489.625,0.774647,0.110428,9.315665,4.500101,5.288669,4.962542,2.27181,3.345344,3.052085,3.951298,5.149566,1.921824,5.641178,7.417292,2.72999,1.09709,2.190199,5.961255,4.169971,5.010121,5.407558,4.922782,4.002801,4.083384,0.772149,2.645661,0.078546,0.111116,75154490.0,6480.875,3.433588,11.392936,15885.625,4.859652,30.798422,6734.875,2.325093,5.186835,4348.5,1.397993,2.660669,13375.2,1.922344,3.879387,4809.0,2.0775,4.386577,17211.5,2.11318,4.443513,68688.125,7.454151,11.948374,9.491049,92092.375,8.969455,14.518413,12.177371,87693.75,9.925846,18.21375,15.94295,66279.75,8.888894,14.873563,12.417736,33289.8,4.805418,9.261222,6.325484,51912.125,7.807655,13.37498,10.783463,49427.5,4.935975,8.6788,5.992275,0.063413,0.085654,0.00676,0.034216,0.054177,0.052308,0.020968,0.046492,0.905097,0.943535,0.689486,0.934482,0.897718,0.822706,0.687214,0.854685,0.318779,0.492448,0.599188,0.366642,0.194082,0.371435,0.301962,0.359935,0.380872,0.448765,0.319215,0.357338,0.555965,0.335422,0.361188,0.408367,1.682191,10419.0,3.195962,10550.0625,2.363849,14437.625,157.125,2.852111,998.875,0.284198,0.231882,0.184087,0.169538,0.072237,0.058057,0.453736,0.415969,0.130294,0.526343,0.473658,1138.0,-2029.1125,1.792266,7.669374,4.257123,40.5,214.5,101.0,235.0,0.047514,0.041426,0.020716,0.226444,0.008424,0.736218,1845.625,915.25,748.75,403.75,280.5,101.75,33.75,1845.625,915.25,748.75,403.75,280.5,101.75,33.75,0.424209,0.201462,0.182137,0.094605,0.062839,0.026353,0.008394,0.424209,0.201462,0.182137,0.094605,0.062839,0.026353,0.008394,4228.5,0.1528,4369.095741,0.162605,3.616789,0.058112,0.100893,0.339764,0.307599,0.083202,0.110428,0.579556,0.429532,0.570468
HyDrop,104307900.0,2555.555556,40816.978115,104307900.0,92.19579,49.111111,48.555556,137.46336,84.764735,33.959095,11024.611111,3066.277778,0.666506,4058.222222,0.384974,25.740686,43758290.0,10329490.0,4066965.0,95.888889,0.044394,0.036071,7.7814,4.048376,4.478108,4.206186,2.091933,2.818381,2.538492,3.084351,3.75558,1.763169,3.956397,6.691957,2.210403,0.90008,1.61463,5.397582,3.989725,4.597443,4.806988,4.347181,3.674746,3.406324,14.785973,2.847923,0.078112,0.128559,32612200.0,9613.0,2.80797,7.650951,26574.555556,4.108777,44.899319,8274.333333,1.677867,3.412092,8992.444444,1.663342,3.530948,27322.0,1.271719,2.414491,2892.0,1.10034,2.144053,25409.5,1.500935,2.831509,35080.75,5.403668,10.552339,7.427826,79288.111111,7.313942,11.923731,9.250444,45683.777778,6.910654,12.322019,9.456214,52135.111111,5.763943,9.805294,6.992193,19411.0,4.9219,11.0822,7.57322,27240.0,5.84317,12.2946,9.1344,37302.0,3.43808,7.42607,4.272265,0.005365,0.004332,0.003993,0.003192,0.011501,0.015525,0.002524,0.006539,0.617253,0.713673,0.470098,0.813684,0.835707,0.683967,0.543432,0.646307,0.118339,0.498034,0.510787,0.329134,0.205437,0.551025,0.18157,0.295682,0.333555,0.17261,0.135343,0.049147,0.781778,0.740497,0.654747,0.357308,1.888254,17917.222222,5.440017,16608.555556,2.160944,20360.333333,120.777778,2.941691,1672.0,0.356249,0.212629,0.135764,0.25593,0.033966,0.016726,0.612179,0.348394,0.039427,0.505022,0.494978,445.611111,-623.355,0.982315,1.190614,1.376194,10.75,50.714286,21.428571,98.75,0.03301,0.018424,0.007063,0.110511,0.003607,0.56868,688.444444,666.555556,730.222222,246.555556,66.333333,61.5,19.888889,688.444444,666.555556,730.222222,246.555556,66.333333,61.5,19.888889,0.277346,0.281232,0.285612,0.102214,0.025334,0.023422,0.007443,0.277346,0.281232,0.285612,0.102214,0.025334,0.023422,0.007443,1180.166667,0.146521,1443.605021,0.158922,3.016767,0.078042,0.140686,0.396965,0.289868,0.058368,0.036071,0.716042,0.636084,0.363916
mtscATAC-seq,177704600.0,4355.0,40811.68799,177704600.0,96.225,100.0,99.5,170.5,89.24,35.975,16599.0,10353.875,0.352933,6205.25,0.373141,19.331241,102721100.0,61598910.0,22453040.0,82.5,0.024461,0.112336,7.514587,4.185708,4.250147,4.449735,2.770506,2.806026,2.278383,2.401299,2.752144,2.178676,2.424287,7.611365,1.818823,1.005108,1.084544,6.338271,5.426231,5.521675,5.615435,4.811245,4.304458,3.439603,11.515116,3.220574,0.0902,0.185851,96621140.0,12687.75,4.259815,19.527287,27147.25,5.707271,58.942093,10094.75,2.991618,9.05656,9856.5,3.148026,10.486524,27082.333333,2.213617,4.729327,10386.666667,2.378752,5.284084,31223.0,2.203352,4.605482,63104.5,7.41245,11.95533,9.263045,108605.5,6.474642,11.486273,8.99536,85361.25,7.969597,14.29605,11.798745,71548.0,8.565455,14.662125,12.09445,56039.666667,5.475337,8.680517,5.754423,58512.666667,5.976707,9.290627,6.61578,47819.0,6.16848,9.84575,6.91705,0.039528,0.060626,0.006574,0.054604,0.078018,0.040009,0.052227,0.042666,0.880083,0.921104,0.790265,0.921401,0.863137,0.92668,0.798916,0.857044,0.477907,0.627034,0.716252,0.505959,0.316223,0.51192,0.512856,0.492314,0.839884,0.815407,0.791489,0.823891,0.725641,0.94247,0.717273,0.787138,1.909305,18471.5,5.750097,16857.875,3.246071,23759.0,140.75,3.324885,2244.625,0.273729,0.263266,0.121665,0.237581,0.073674,0.030085,0.51131,0.384931,0.103759,0.425479,0.574521,1859.625,-3303.81875,1.192683,2.316332,1.986525,21.5,229.75,34.75,429.0,0.060783,0.049416,0.004375,0.037566,0.002717,0.7019,1613.75,847.75,768.5,433.75,295.0,59.5,62.5,1613.75,847.75,768.5,433.75,295.0,59.5,62.5,0.392487,0.238976,0.161285,0.110653,0.071285,0.010709,0.014605,0.392487,0.238976,0.161285,0.110653,0.071285,0.010709,0.014605,3775.75,0.136579,4390.938478,0.1488,3.556659,0.03775,0.103585,0.361478,0.189433,0.195419,0.112336,0.36876,0.635342,0.364658
mtscATAC-seq (FACS),142277700.0,3486.5,40808.190624,142277700.0,94.725,72.0,72.0,137.5,87.605,33.1,28099.5,15911.75,0.421607,13402.0,0.4958,22.583282,109937700.0,60640660.0,29505680.0,52.5,0.015249,0.207314,7.37198,3.534145,3.857897,4.063342,2.002324,2.653892,2.211,2.46847,3.46021,1.489181,3.540245,6.258383,1.758966,0.84291,1.282013,5.180272,3.756186,4.302436,4.876313,3.956188,3.205208,3.031559,22.69389,2.083577,0.044634,0.074779,65877610.0,10897.0,3.571233,11.923488,34958.5,4.050515,16.596049,14715.5,4.886268,31.702487,14312.0,2.411112,5.336187,27557.0,2.309501,4.957997,10512.5,3.547673,11.695908,32840.0,2.148624,4.435221,65067.5,7.401095,11.6512,9.085165,137443.5,9.344145,18.2085,16.06775,87180.0,10.5092,18.95975,16.63685,70681.0,8.610905,13.83805,11.38915,63864.0,4.6633,7.873525,5.11881,69757.5,8.76725,14.2417,11.76945,78953.5,7.47604,11.8624,9.342885,0.033528,0.043676,0.004637,0.067488,0.108719,0.122619,0.055053,0.062246,0.968581,0.994474,0.847422,0.998178,0.999604,0.99436,0.918646,0.960181,0.473488,0.731489,0.633039,0.765407,0.705851,0.538164,0.666212,0.644807,0.516977,0.918314,0.871277,0.851195,0.971928,0.971235,0.789105,0.841433,1.580463,17539.5,3.893525,13814.5,3.125075,21514.0,123.0,2.691857,491.0,0.337728,0.166985,0.173548,0.227013,0.045215,0.049513,0.56474,0.340532,0.094727,0.560787,0.439212,2296.25,-4077.225,1.501709,6.316638,4.201511,15.5,99.0,10.5,94.0,0.026891,0.028319,0.00301,0.165555,0.004438,0.881921,1155.5,422.5,1063.5,181.0,321.0,177.0,42.0,1155.5,422.5,1063.5,181.0,321.0,177.0,42.0,0.343785,0.125942,0.316079,0.053712,0.095458,0.0526,0.012425,0.343785,0.125942,0.316079,0.053712,0.095458,0.0526,0.012425,7718.5,0.081218,8360.895658,0.097296,3.8875,0.05275,0.117411,0.057053,0.346642,0.21883,0.207314,0.448535,0.513533,0.486467
s3-ATAC,100124900.0,2453.5,40810.477724,100124900.0,100.0,87.5,78.5,175.5,85.295,33.65,6161.5,5874.0,0.046748,1278.25,0.189464,10.506113,24103070.0,22849650.0,4709004.0,0.0,0.0,0.041354,6.8942,4.506719,4.514061,4.565762,3.768505,2.83016,2.016727,2.217868,1.917898,2.771693,1.479768,8.96015,1.654076,1.174311,0.795536,7.694018,7.813656,6.823699,6.394886,5.746424,5.434452,3.667189,1.746234,4.054592,0.174887,0.382526,74480110.0,9810.5,1.822969,4.479852,17559.0,2.287631,5.882499,10733.0,1.553125,2.934521,6903.0,1.113593,2.163839,,,,4660.0,1.096677,2.138615,,,,42178.0,4.478005,6.11959,2.92378,54664.5,4.646555,6.5045,3.48333,42687.0,5.04531,7.146555,4.10779,43763.0,5.06938,6.97267,4.101455,,,,,38592.0,5.05578,7.29903,4.27823,,,,,0.048334,0.020666,0.015543,0.039849,0.060432,,,0.033531,0.459993,0.51702,0.39868,0.788225,0.520235,,,0.474503,0.106977,0.193145,,0.362791,0.204787,,0.161092,0.137721,0.36907,0.445349,0.315957,0.233447,0.410496,,,0.25759,1.097237,207255.5,11.438046,100589.5,1.316102,175706.75,146.0,3.770083,5999.5,0.161385,0.309318,0.07275,0.333902,0.098998,0.023648,0.495287,0.382068,0.122645,0.257783,0.742217,1700.0,-2999.05,2.682657,5.330435,1.986998,,27.5,0.0,,,0.012879,0.0,,,0.541532,807.0,609.0,463.0,414.0,129.0,2.0,3.0,807.0,609.0,463.0,414.0,129.0,2.0,3.0,0.314906,0.274923,0.17906,0.180458,0.048544,0.000893,0.001215,0.314906,0.274923,0.17906,0.180458,0.048544,0.000893,0.001215,1203.0,0.182909,1656.364481,0.183973,3.025496,0.0,0.14705,0.62433,0.011772,0.175493,0.041354,0.051093,0.821242,0.178758


In [115]:
df_means[
    "Unique fragments in cells, not in peaks, normalized to unique fragments in cells"
].sort_values()

technology
10x v2                    0.391402
10x Multiome              0.418831
10x v1.1 (control)        0.419502
Bio-Rad ddSEQ SureCell    0.429532
10x v1                    0.432239
10x v1.1                  0.435052
mtscATAC-seq (FACS)       0.513533
mtscATAC-seq              0.635342
HyDrop                    0.636084
s3-ATAC                   0.821242
Name: Unique fragments in cells, not in peaks, normalized to unique fragments in cells, dtype: float64

In [116]:
df_means

Unnamed: 0_level_0,reads,cells,RPC,nreads,%_correct_barcodes,r1_length,r2_length,avg_insert_size,%_mapq30,avg_map_quality,Median_total_nr_frag,Median_unique_nr_frag,Median_dupl_rate,Median_total_nr_frag_in_regions,Median_frip,Median_tss_enrichment,total_nr_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes,total_nr_unique_frag_in_selected_barcodes_in_regions,n_barcodes_merged,frac_barcodes_merged,efficiency,chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrM,chrX,chrY,nonstandard,total_fragments,n_dars__B_cell,top_2000_dars_median_logfc__B_cell,top_2000_dars_median_fc__B_cell,n_dars__CD14+_monocyte,top_2000_dars_median_logfc__CD14+_monocyte,top_2000_dars_median_fc__CD14+_monocyte,n_dars__CD4+_T_cell,top_2000_dars_median_logfc__CD4+_T_cell,top_2000_dars_median_fc__CD4+_T_cell,n_dars__Cytotoxic_T_cell,top_2000_dars_median_logfc__Cytotoxic_T_cell,top_2000_dars_median_fc__Cytotoxic_T_cell,n_dars__Dendritic_cell,top_2000_dars_median_logfc__Dendritic_cell,top_2000_dars_median_fc__Dendritic_cell,n_dars__Natural_killer_cell,top_2000_dars_median_logfc__Natural_killer_cell,top_2000_dars_median_fc__Natural_killer_cell,n_dars__CD16+_monocyte,top_2000_dars_median_logfc__CD16+_monocyte,top_2000_dars_median_fc__CD16+_monocyte,n_peaks__Bcell,top10k_peaks_strength__Bcell,top10k_peaks_pvalBcell,top10k_peaks_qvalBcell,n_peaks__CD14_monocyte,top10k_peaks_strength__CD14_monocyte,top10k_peaks_pvalCD14_monocyte,top10k_peaks_qvalCD14_monocyte,n_peaks__CD4_Tcell,top10k_peaks_strength__CD4_Tcell,top10k_peaks_pvalCD4_Tcell,top10k_peaks_qvalCD4_Tcell,n_peaks__CytotoxicTcell,top10k_peaks_strength__CytotoxicTcell,top10k_peaks_pvalCytotoxicTcell,top10k_peaks_qvalCytotoxicTcell,n_peaks__Dendriticcell,top10k_peaks_strength__Dendriticcell,top10k_peaks_pvalDendriticcell,top10k_peaks_qvalDendriticcell,n_peaks__Naturalkillercell,top10k_peaks_strength__Naturalkillercell,top10k_peaks_pvalNaturalkillercell,top10k_peaks_qvalNaturalkillercell,n_peaks__CD16_monocyte,top10k_peaks_strength__CD16_monocyte,top10k_peaks_pvalCD16_monocyte,top10k_peaks_qvalCD16_monocyte,B_cells_bot20peaks_recovery,Naive_T_cells_bot20peaks_recovery,Cytotoxic_T_cells_bot20peaks_recovery,NK_cells_bot20peaks_recovery,CD14+_monocytes_bot20peaks_recovery,CD16+_monocytes_bot20peaks_recovery,Dendritic_cells_bot20peaks_recovery,mean_bot20peaks_recovery,B_cells_top20peaks_recovery,Naive_T_cells_top20peaks_recovery,Cytotoxic_T_cells_top20peaks_recovery,NK_cells_top20peaks_recovery,CD14+_monocytes_top20peaks_recovery,CD16+_monocytes_top20peaks_recovery,Dendritic_cells_top20peaks_recovery,mean_top20peaks_recovery,B_cell,CD14+_monocyte,CD16+_monocyte,CD4+_T_cell,Cytotoxic_T_cell,Dendritic_cell,Natural_killer_cell,mean_bot20dars_recovery,B_cell_top20dars_recovery,CD4+_T_cell_top20dars_recovery,Cytotoxic_T_cell_top20dars_recovery,Natural_killer_cell_top20dars_recovery,CD14+_monocyte_top20dars_recovery,CD16+_monocyte_top20dars_recovery,Dendritic_cell_top20dars_recovery,mean_top20dars_recovery,alldars_median_dar_logfc,alldars_median_dar_tss_dist,allpeaks_median_peak_logfc,allpeaks_median_peak_tss_dist,top2kdars_median_dar_logfc,top2kdars_median_dar_tss_dist,median_frag_len,median_log10_frag_dist_nearest_tss,median_frag_dist_nearest_tss,nucleosome-free_proximal,mononucleosomal_distal,mononucleosomal_proximal,nucleosome-free_distal,multinucleosomal_distal,multinucleosomal_proximal,nucleosome-free,mononucleosomal,multinucleosomal,proximal,distal,fmx_n_snps,fmx_best_llk,ratio_cd4T_to_cd8T_in_male,ratio_cd4T_to_cd8T_in_female,ratio_cd4T_to_cd8T_normalized,common_doublets,scr_exclusive_doublets,fmx_exclusive_doublets,total_doublets,total_doublets_pct,scr_exclusive_doublets_pct,fmx_exclusive_doublets_pct,common_doublets_pct_of_doublets,common_doublets_pct,seurat_score,n_seurat_cells__CD4+ T cell,n_seurat_cells__Cytotoxic T cell,n_seurat_cells__CD14+ monocyte,n_seurat_cells__B cell,n_seurat_cells__Natural killer cell,n_seurat_cells__CD16+ monocyte,n_seurat_cells__Dendritic cell,n_consensus_cells__CD4+ T cell,n_consensus_cells__Cytotoxic T cell,n_consensus_cells__CD14+ monocyte,n_consensus_cells__B cell,n_consensus_cells__Natural killer cell,n_consensus_cells__CD16+ monocyte,n_consensus_cells__Dendritic cell,pct_seurat_cells__CD4+_T_cell,pct_seurat_cells__Cytotoxic_T_cell,pct_seurat_cells__CD14+_monocyte,pct_seurat_cells__B_cell,pct_seurat_cells__Natural_killer_cell,pct_seurat_cells__CD16+_monocyte,pct_seurat_cells__Dendritic_cell,pct_consensus_cells__CD4+_T_cell,pct_consensus_cells__Cytotoxic_T_cell,pct_consensus_cells__CD14+_monocyte,pct_consensus_cells__B_cell,pct_consensus_cells__Natural_killer_cell,pct_consensus_cells__CD16+_monocyte,pct_consensus_cells__Dendritic_cell,Median_Unique_nr_frag_in_regions,Median_scrublet_doublet_scores_fragments,Mean_Unique_nr_frag_in_regions,Mean_scrublet_doublet_scores_fragments,log_median_unique_nr_frag_in_regions,No correct barcode,Not mapped properly,Fragments in background noise barcodes,Duplicate fragments in cells,"Unique fragments in cells, not in peaks",Unique fragments in cells and in peaks,"Duplicate fragments in cells, normalized to fragments in cells","Unique fragments in cells, not in peaks, normalized to unique fragments in cells","Unique fragments in cells and in peaks, normalized to unique fragments in cells"
technology,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1
10x Multiome,119959700.0,2939.5,40812.778572,119959700.0,98.34,50.0,49.333333,154.166667,88.906667,35.266667,12007.916667,8253.25,0.317008,7445.0,0.627268,27.420777,51842920.0,36736810.0,20144260.0,2.0,0.001,0.155331,8.679054,4.559346,5.023091,4.80649,2.679279,3.273391,2.87592,3.504152,4.235366,2.196235,4.345046,7.896188,2.456925,1.133086,1.805338,6.350941,5.149256,5.400382,5.753036,5.144264,4.427587,4.008075,0.518056,3.568133,0.058083,0.15328,75215080.0,8778.833333,3.650628,15.24877,27118.5,5.196613,57.755976,10293.0,2.45614,5.853249,8663.333333,1.941707,4.23222,27543.0,1.783172,3.4496,7780.166667,2.060109,4.431792,28507.0,1.548363,2.925856,66500.0,6.923773,11.6085,9.005018,120946.333333,6.915557,11.000442,8.743733,100733.833333,7.309692,11.963488,9.665245,73837.0,6.53134,10.454973,7.993813,52368.333333,3.826087,7.114133,4.20813,55271.5,7.032808,12.268852,9.57782,79899.5,4.13136,6.588815,3.97199,0.065418,0.121888,0.008692,0.042021,0.107869,0.110983,0.029113,0.066623,0.917065,0.978045,0.7644,0.963226,0.989828,0.919253,0.813068,0.910363,0.326279,0.649512,0.651633,0.57093,0.435816,0.700404,0.464619,0.51496,0.610543,0.72655,0.700709,0.692833,0.924871,0.913088,0.781612,0.739728,1.529713,14954.5,4.867215,10528.166667,2.34758,20062.916667,133.666667,3.238684,1811.666667,0.284242,0.232417,0.121244,0.260032,0.061126,0.040939,0.544274,0.353661,0.102065,0.446425,0.553575,1143.416667,-2035.123333,1.454155,3.214377,2.19107,17.833333,69.0,117.0,168.166667,0.051107,0.022644,0.033801,0.108565,0.005339,0.775975,838.166667,474.666667,697.833333,309.166667,238.833333,145.5,32.5,838.166667,474.666667,697.833333,309.166667,238.833333,145.5,32.5,0.311136,0.177626,0.250905,0.113351,0.084779,0.051156,0.011047,0.311136,0.177626,0.250905,0.113351,0.084779,0.051156,0.011047,4911.5,0.133535,5942.001174,0.143526,3.671993,0.0166,0.109044,0.475981,0.121836,0.121209,0.155331,0.314283,0.418831,0.581169
10x v1,223355800.0,5474.0,40804.996912,223355800.0,97.47,49.5,48.5,142.0,88.01,35.75,13606.5,10437.0,0.228808,7781.5,0.574628,27.586278,80929280.0,63175630.0,33088310.0,167.0,0.036866,0.153879,8.699992,4.678451,4.996829,4.818669,2.659091,3.245977,3.121626,3.445122,4.079677,2.215567,3.826054,7.805889,2.591135,1.117223,1.84437,6.455723,5.083834,5.499495,5.690889,5.152362,4.542071,4.075172,0.173021,3.973948,0.043061,0.164751,156721300.0,9495.5,4.528985,24.11531,26612.0,3.88061,14.733206,12429.5,3.999683,16.179691,10560.0,2.06706,4.195917,25008.0,1.878451,3.802616,8454.0,3.263179,9.613974,26039.5,1.94212,3.843975,68851.0,10.13578,17.3695,14.82055,131095.0,8.25383,14.46515,12.23205,99405.0,10.724035,19.9774,17.6763,66772.0,11.2153,19.55815,17.0215,38584.5,4.794975,9.09405,5.968105,72282.0,8.923835,15.1763,12.67285,67340.0,6.588055,10.873375,8.23375,0.023785,0.055058,0.001291,0.056741,0.098023,0.081117,0.018515,0.04779,0.989596,0.997902,0.825731,0.998705,0.996177,0.987665,0.808662,0.943491,0.457209,0.685313,0.644237,0.722965,0.626064,0.634463,0.633447,0.6291,0.734186,0.900291,0.742287,0.773038,0.932974,0.905897,0.805625,0.827757,1.51541,15272.5,4.363642,11854.5,3.16844,20687.5,121.5,3.417839,2752.25,0.263508,0.245655,0.104038,0.310085,0.04675,0.029965,0.573592,0.349692,0.076715,0.39751,0.60249,1428.5,-2533.575,1.51754,6.125321,4.107177,55.0,247.0,68.0,260.0,0.039601,0.037543,0.010103,0.192364,0.008044,0.84179,1850.0,633.5,1487.5,471.5,388.0,225.5,49.0,1850.0,633.5,1487.5,471.5,388.0,225.5,49.0,0.355397,0.123772,0.288594,0.095664,0.081611,0.0442,0.010761,0.355397,0.123772,0.288594,0.095664,0.081611,0.0442,0.010761,5730.0,0.121331,6017.274345,0.132137,3.756657,0.0253,0.116869,0.493503,0.090516,0.119933,0.153879,0.247749,0.432239,0.567761
10x v1.1,153426700.0,3759.833333,40814.022457,153426700.0,97.32,44.666667,44.0,163.833333,88.46,35.666667,15049.583333,9664.083333,0.344837,8764.166667,0.591613,21.689454,74304870.0,45174320.0,25079990.0,114.833333,0.035725,0.146895,8.798268,4.613747,5.019644,4.880559,2.790413,3.302268,2.963668,3.341668,3.93447,2.271415,3.777765,7.921172,2.473747,1.118841,1.715561,6.570233,5.233488,5.679963,5.826984,5.121418,4.534233,3.937809,0.375163,3.550759,0.08904,0.157703,85368670.0,10738.0,4.543963,31.448677,31179.666667,4.408332,30.494062,15249.166667,3.279669,10.965191,9982.4,1.670591,3.385995,29746.0,2.043578,4.18013,10813.166667,2.62252,7.217645,28572.75,1.614652,3.211501,61242.0,7.553733,12.95782,10.215747,122481.833333,7.881062,13.384065,11.06665,96314.0,7.983357,14.192347,11.827277,74845.0,7.32982,12.650296,10.11098,48022.0,5.411287,9.702973,6.710627,73407.333333,6.754925,11.240293,8.677978,59066.75,5.477965,9.522527,6.671683,0.04232,0.108105,0.012005,0.088317,0.126551,0.106052,0.040432,0.072359,0.893655,0.977001,0.758396,0.985121,0.983782,0.907048,0.805978,0.911415,0.402636,0.630119,0.526865,0.689922,0.495426,0.707305,0.52025,0.555271,0.752636,0.836822,0.685638,0.719681,0.92399,0.872149,0.854594,0.801631,1.578221,16142.583333,5.917894,12405.0,3.022299,20743.333333,132.166667,3.372941,2495.75,0.260889,0.255446,0.109631,0.271418,0.06369,0.038927,0.532307,0.365077,0.102617,0.409447,0.590553,1149.6,-2033.678,1.773031,4.819369,2.92591,15.333333,165.833333,29.0,166.0,0.047928,0.042437,0.008038,0.08999,0.004351,0.78901,1327.833333,517.833333,950.0,293.0,300.166667,134.5,39.833333,1327.833333,517.833333,950.0,293.0,300.166667,134.5,39.833333,0.324939,0.142168,0.297511,0.09107,0.08818,0.045004,0.011128,0.324939,0.142168,0.297511,0.09107,0.08818,0.045004,0.011128,5398.5,0.135307,5738.575914,0.146583,3.72684,0.0268,0.112383,0.440424,0.15855,0.114948,0.146895,0.360317,0.435052,0.564948
10x v1.1 (control),39089160.0,957.0,40846.35355,39089160.0,94.67,50.0,49.0,147.5,90.26,35.7,24495.0,15136.75,0.377602,14847.25,0.616573,22.161807,26416480.0,16127790.0,9387216.0,29.0,0.030213,0.240739,8.991017,4.5481,5.088905,5.084235,2.510873,3.256398,2.886746,3.479445,4.558585,2.022032,4.949176,7.554881,2.392723,1.093087,1.881531,6.37156,4.638388,5.288181,5.853258,4.999443,4.305488,3.896829,1.034045,3.103833,0.082166,0.129075,21463020.0,12018.5,5.067743,33.678152,30726.0,4.864324,30.426475,14301.0,3.000636,8.237899,11939.5,1.800729,3.530351,32320.0,1.611463,3.055616,10565.5,3.000771,8.14019,28707.0,1.90009,3.732365,77012.0,4.99277,8.056325,5.426745,77871.0,7.45402,11.8306,9.20289,80217.5,8.117335,13.00755,10.4958,69093.0,5.502,9.19067,6.521755,12753.0,3.94475,8.79978,5.00697,88134.5,4.90659,7.57198,5.07607,25111.0,3.88635,7.52627,4.2766,0.053081,0.031783,0.007593,0.085151,0.018337,0.011777,0.002359,0.033835,0.936476,0.975652,0.710505,0.992664,0.9278,0.612003,0.331883,0.83584,0.415349,0.643714,0.567495,0.649128,0.457979,0.681723,0.496246,0.547837,0.85186,0.852907,0.856383,0.881911,0.879373,0.912061,0.829569,0.865542,1.667947,16801.25,5.029629,13246.25,2.999511,22296.75,89.5,2.887544,771.25,0.37365,0.163942,0.111943,0.263868,0.05083,0.035768,0.637518,0.275885,0.086597,0.52136,0.47864,2053.0,-3627.5525,1.616225,4.70872,3.078501,1.0,40.5,1.0,43.0,0.038704,0.043186,0.0009,0.023256,0.0009,0.806329,301.5,147.5,208.0,95.5,108.0,42.0,13.5,301.5,147.5,208.0,95.5,108.0,42.0,13.5,0.329186,0.161177,0.225463,0.103378,0.120272,0.046392,0.014132,0.329186,0.161177,0.225463,0.103378,0.120272,0.046392,0.014132,8917.0,0.115414,9548.32934,0.128977,3.950152,0.0533,0.092205,0.178743,0.260713,0.1743,0.240739,0.385801,0.419502,0.580498
10x v2,248004000.0,6078.166667,40804.06631,248004000.0,96.425,50.0,49.333333,125.5,92.876667,35.083333,30602.416667,16875.083333,0.446616,18540.25,0.606718,25.653297,205711900.0,112076400.0,66827720.0,58.5,0.009874,0.274071,9.688992,4.313576,5.238689,5.160749,2.113559,3.452086,3.017191,3.950236,5.616973,1.661556,6.486593,7.164204,2.623328,1.085721,2.249386,5.814137,3.825912,4.694604,5.657141,4.74436,3.726129,3.987428,1.297318,2.303388,0.068897,0.057847,124248700.0,10563.5,4.599764,30.768687,28683.5,4.04954,17.572366,12573.166667,3.843324,14.790813,9587.166667,2.182843,4.585667,22641.666667,2.42649,5.456198,8858.333333,2.921503,7.773719,28610.833333,2.322634,5.045191,119690.333333,8.924248,14.56145,12.188167,185376.333333,8.383427,15.57585,13.488583,177057.0,7.969137,14.867883,12.781233,117301.0,8.996418,15.361883,13.033717,94038.833333,5.37708,8.865525,6.241183,109011.666667,8.43563,13.648617,11.268963,109858.166667,7.123042,11.43594,8.979295,0.143357,0.290004,0.018308,0.156062,0.243075,0.23208,0.104263,0.169593,0.997626,0.998335,0.929532,0.998745,0.997638,0.99403,0.978508,0.984916,0.547597,0.729692,0.71728,0.699419,0.529078,0.518228,0.625825,0.623874,0.698837,0.913663,0.753812,0.74471,0.951383,0.950414,0.760456,0.824754,1.695574,15651.083333,2.608754,12339.583333,2.898188,21483.75,91.833333,2.805643,659.0,0.399779,0.152575,0.116372,0.273668,0.032205,0.0254,0.673447,0.268948,0.057605,0.541552,0.458448,1935.125,-3435.41875,1.528606,5.190643,3.314806,27.75,215.5,136.75,174.25,0.036801,0.027677,0.027995,0.380708,0.006888,0.856857,2388.166667,839.0,1353.666667,457.166667,441.166667,198.833333,76.0,2388.166667,839.0,1353.666667,457.166667,441.166667,198.833333,76.0,0.396333,0.143401,0.231665,0.091644,0.081699,0.039867,0.01539,0.396333,0.143401,0.231665,0.091644,0.081699,0.039867,0.01539,10021.333333,0.091341,10685.281356,0.105738,3.985101,0.03575,0.068689,0.068884,0.376667,0.17594,0.274071,0.457507,0.391402,0.608598
Bio-Rad ddSEQ SureCell,186814600.0,4578.25,40805.712,186814600.0,94.18875,53.375,40.0,166.625,89.23625,33.9125,19774.625,7640.875,0.572865,11312.1875,0.573795,32.584607,94242020.0,36381940.0,20485730.0,3489.625,0.774647,0.110428,9.315665,4.500101,5.288669,4.962542,2.27181,3.345344,3.052085,3.951298,5.149566,1.921824,5.641178,7.417292,2.72999,1.09709,2.190199,5.961255,4.169971,5.010121,5.407558,4.922782,4.002801,4.083384,0.772149,2.645661,0.078546,0.111116,75154490.0,6480.875,3.433588,11.392936,15885.625,4.859652,30.798422,6734.875,2.325093,5.186835,4348.5,1.397993,2.660669,13375.2,1.922344,3.879387,4809.0,2.0775,4.386577,17211.5,2.11318,4.443513,68688.125,7.454151,11.948374,9.491049,92092.375,8.969455,14.518413,12.177371,87693.75,9.925846,18.21375,15.94295,66279.75,8.888894,14.873563,12.417736,33289.8,4.805418,9.261222,6.325484,51912.125,7.807655,13.37498,10.783463,49427.5,4.935975,8.6788,5.992275,0.063413,0.085654,0.00676,0.034216,0.054177,0.052308,0.020968,0.046492,0.905097,0.943535,0.689486,0.934482,0.897718,0.822706,0.687214,0.854685,0.318779,0.492448,0.599188,0.366642,0.194082,0.371435,0.301962,0.359935,0.380872,0.448765,0.319215,0.357338,0.555965,0.335422,0.361188,0.408367,1.682191,10419.0,3.195962,10550.0625,2.363849,14437.625,157.125,2.852111,998.875,0.284198,0.231882,0.184087,0.169538,0.072237,0.058057,0.453736,0.415969,0.130294,0.526343,0.473658,1138.0,-2029.1125,1.792266,7.669374,4.257123,40.5,214.5,101.0,235.0,0.047514,0.041426,0.020716,0.226444,0.008424,0.736218,1845.625,915.25,748.75,403.75,280.5,101.75,33.75,1845.625,915.25,748.75,403.75,280.5,101.75,33.75,0.424209,0.201462,0.182137,0.094605,0.062839,0.026353,0.008394,0.424209,0.201462,0.182137,0.094605,0.062839,0.026353,0.008394,4228.5,0.1528,4369.095741,0.162605,3.616789,0.058112,0.100893,0.339764,0.307599,0.083202,0.110428,0.579556,0.429532,0.570468
HyDrop,104307900.0,2555.555556,40816.978115,104307900.0,92.19579,49.111111,48.555556,137.46336,84.764735,33.959095,11024.611111,3066.277778,0.666506,4058.222222,0.384974,25.740686,43758290.0,10329490.0,4066965.0,95.888889,0.044394,0.036071,7.7814,4.048376,4.478108,4.206186,2.091933,2.818381,2.538492,3.084351,3.75558,1.763169,3.956397,6.691957,2.210403,0.90008,1.61463,5.397582,3.989725,4.597443,4.806988,4.347181,3.674746,3.406324,14.785973,2.847923,0.078112,0.128559,32612200.0,9613.0,2.80797,7.650951,26574.555556,4.108777,44.899319,8274.333333,1.677867,3.412092,8992.444444,1.663342,3.530948,27322.0,1.271719,2.414491,2892.0,1.10034,2.144053,25409.5,1.500935,2.831509,35080.75,5.403668,10.552339,7.427826,79288.111111,7.313942,11.923731,9.250444,45683.777778,6.910654,12.322019,9.456214,52135.111111,5.763943,9.805294,6.992193,19411.0,4.9219,11.0822,7.57322,27240.0,5.84317,12.2946,9.1344,37302.0,3.43808,7.42607,4.272265,0.005365,0.004332,0.003993,0.003192,0.011501,0.015525,0.002524,0.006539,0.617253,0.713673,0.470098,0.813684,0.835707,0.683967,0.543432,0.646307,0.118339,0.498034,0.510787,0.329134,0.205437,0.551025,0.18157,0.295682,0.333555,0.17261,0.135343,0.049147,0.781778,0.740497,0.654747,0.357308,1.888254,17917.222222,5.440017,16608.555556,2.160944,20360.333333,120.777778,2.941691,1672.0,0.356249,0.212629,0.135764,0.25593,0.033966,0.016726,0.612179,0.348394,0.039427,0.505022,0.494978,445.611111,-623.355,0.982315,1.190614,1.376194,10.75,50.714286,21.428571,98.75,0.03301,0.018424,0.007063,0.110511,0.003607,0.56868,688.444444,666.555556,730.222222,246.555556,66.333333,61.5,19.888889,688.444444,666.555556,730.222222,246.555556,66.333333,61.5,19.888889,0.277346,0.281232,0.285612,0.102214,0.025334,0.023422,0.007443,0.277346,0.281232,0.285612,0.102214,0.025334,0.023422,0.007443,1180.166667,0.146521,1443.605021,0.158922,3.016767,0.078042,0.140686,0.396965,0.289868,0.058368,0.036071,0.716042,0.636084,0.363916
mtscATAC-seq,177704600.0,4355.0,40811.68799,177704600.0,96.225,100.0,99.5,170.5,89.24,35.975,16599.0,10353.875,0.352933,6205.25,0.373141,19.331241,102721100.0,61598910.0,22453040.0,82.5,0.024461,0.112336,7.514587,4.185708,4.250147,4.449735,2.770506,2.806026,2.278383,2.401299,2.752144,2.178676,2.424287,7.611365,1.818823,1.005108,1.084544,6.338271,5.426231,5.521675,5.615435,4.811245,4.304458,3.439603,11.515116,3.220574,0.0902,0.185851,96621140.0,12687.75,4.259815,19.527287,27147.25,5.707271,58.942093,10094.75,2.991618,9.05656,9856.5,3.148026,10.486524,27082.333333,2.213617,4.729327,10386.666667,2.378752,5.284084,31223.0,2.203352,4.605482,63104.5,7.41245,11.95533,9.263045,108605.5,6.474642,11.486273,8.99536,85361.25,7.969597,14.29605,11.798745,71548.0,8.565455,14.662125,12.09445,56039.666667,5.475337,8.680517,5.754423,58512.666667,5.976707,9.290627,6.61578,47819.0,6.16848,9.84575,6.91705,0.039528,0.060626,0.006574,0.054604,0.078018,0.040009,0.052227,0.042666,0.880083,0.921104,0.790265,0.921401,0.863137,0.92668,0.798916,0.857044,0.477907,0.627034,0.716252,0.505959,0.316223,0.51192,0.512856,0.492314,0.839884,0.815407,0.791489,0.823891,0.725641,0.94247,0.717273,0.787138,1.909305,18471.5,5.750097,16857.875,3.246071,23759.0,140.75,3.324885,2244.625,0.273729,0.263266,0.121665,0.237581,0.073674,0.030085,0.51131,0.384931,0.103759,0.425479,0.574521,1859.625,-3303.81875,1.192683,2.316332,1.986525,21.5,229.75,34.75,429.0,0.060783,0.049416,0.004375,0.037566,0.002717,0.7019,1613.75,847.75,768.5,433.75,295.0,59.5,62.5,1613.75,847.75,768.5,433.75,295.0,59.5,62.5,0.392487,0.238976,0.161285,0.110653,0.071285,0.010709,0.014605,0.392487,0.238976,0.161285,0.110653,0.071285,0.010709,0.014605,3775.75,0.136579,4390.938478,0.1488,3.556659,0.03775,0.103585,0.361478,0.189433,0.195419,0.112336,0.36876,0.635342,0.364658
mtscATAC-seq (FACS),142277700.0,3486.5,40808.190624,142277700.0,94.725,72.0,72.0,137.5,87.605,33.1,28099.5,15911.75,0.421607,13402.0,0.4958,22.583282,109937700.0,60640660.0,29505680.0,52.5,0.015249,0.207314,7.37198,3.534145,3.857897,4.063342,2.002324,2.653892,2.211,2.46847,3.46021,1.489181,3.540245,6.258383,1.758966,0.84291,1.282013,5.180272,3.756186,4.302436,4.876313,3.956188,3.205208,3.031559,22.69389,2.083577,0.044634,0.074779,65877610.0,10897.0,3.571233,11.923488,34958.5,4.050515,16.596049,14715.5,4.886268,31.702487,14312.0,2.411112,5.336187,27557.0,2.309501,4.957997,10512.5,3.547673,11.695908,32840.0,2.148624,4.435221,65067.5,7.401095,11.6512,9.085165,137443.5,9.344145,18.2085,16.06775,87180.0,10.5092,18.95975,16.63685,70681.0,8.610905,13.83805,11.38915,63864.0,4.6633,7.873525,5.11881,69757.5,8.76725,14.2417,11.76945,78953.5,7.47604,11.8624,9.342885,0.033528,0.043676,0.004637,0.067488,0.108719,0.122619,0.055053,0.062246,0.968581,0.994474,0.847422,0.998178,0.999604,0.99436,0.918646,0.960181,0.473488,0.731489,0.633039,0.765407,0.705851,0.538164,0.666212,0.644807,0.516977,0.918314,0.871277,0.851195,0.971928,0.971235,0.789105,0.841433,1.580463,17539.5,3.893525,13814.5,3.125075,21514.0,123.0,2.691857,491.0,0.337728,0.166985,0.173548,0.227013,0.045215,0.049513,0.56474,0.340532,0.094727,0.560787,0.439212,2296.25,-4077.225,1.501709,6.316638,4.201511,15.5,99.0,10.5,94.0,0.026891,0.028319,0.00301,0.165555,0.004438,0.881921,1155.5,422.5,1063.5,181.0,321.0,177.0,42.0,1155.5,422.5,1063.5,181.0,321.0,177.0,42.0,0.343785,0.125942,0.316079,0.053712,0.095458,0.0526,0.012425,0.343785,0.125942,0.316079,0.053712,0.095458,0.0526,0.012425,7718.5,0.081218,8360.895658,0.097296,3.8875,0.05275,0.117411,0.057053,0.346642,0.21883,0.207314,0.448535,0.513533,0.486467
s3-ATAC,100124900.0,2453.5,40810.477724,100124900.0,100.0,87.5,78.5,175.5,85.295,33.65,6161.5,5874.0,0.046748,1278.25,0.189464,10.506113,24103070.0,22849650.0,4709004.0,0.0,0.0,0.041354,6.8942,4.506719,4.514061,4.565762,3.768505,2.83016,2.016727,2.217868,1.917898,2.771693,1.479768,8.96015,1.654076,1.174311,0.795536,7.694018,7.813656,6.823699,6.394886,5.746424,5.434452,3.667189,1.746234,4.054592,0.174887,0.382526,74480110.0,9810.5,1.822969,4.479852,17559.0,2.287631,5.882499,10733.0,1.553125,2.934521,6903.0,1.113593,2.163839,,,,4660.0,1.096677,2.138615,,,,42178.0,4.478005,6.11959,2.92378,54664.5,4.646555,6.5045,3.48333,42687.0,5.04531,7.146555,4.10779,43763.0,5.06938,6.97267,4.101455,,,,,38592.0,5.05578,7.29903,4.27823,,,,,0.048334,0.020666,0.015543,0.039849,0.060432,,,0.033531,0.459993,0.51702,0.39868,0.788225,0.520235,,,0.474503,0.106977,0.193145,,0.362791,0.204787,,0.161092,0.137721,0.36907,0.445349,0.315957,0.233447,0.410496,,,0.25759,1.097237,207255.5,11.438046,100589.5,1.316102,175706.75,146.0,3.770083,5999.5,0.161385,0.309318,0.07275,0.333902,0.098998,0.023648,0.495287,0.382068,0.122645,0.257783,0.742217,1700.0,-2999.05,2.682657,5.330435,1.986998,,27.5,0.0,,,0.012879,0.0,,,0.541532,807.0,609.0,463.0,414.0,129.0,2.0,3.0,807.0,609.0,463.0,414.0,129.0,2.0,3.0,0.314906,0.274923,0.17906,0.180458,0.048544,0.000893,0.001215,0.314906,0.274923,0.17906,0.180458,0.048544,0.000893,0.001215,1203.0,0.182909,1656.364481,0.183973,3.025496,0.0,0.14705,0.62433,0.011772,0.175493,0.041354,0.051093,0.821242,0.178758


In [117]:
df_means[
    "Unique fragments in cells and in peaks, normalized to unique fragments in cells"
].sort_values()

technology
s3-ATAC                   0.178758
HyDrop                    0.363916
mtscATAC-seq              0.364658
mtscATAC-seq (FACS)       0.486467
10x v1.1                  0.564948
10x v1                    0.567761
Bio-Rad ddSEQ SureCell    0.570468
10x v1.1 (control)        0.580498
10x Multiome              0.581169
10x v2                    0.608598
Name: Unique fragments in cells and in peaks, normalized to unique fragments in cells, dtype: float64

In [118]:
df_medians["Unique fragments in cells and in peaks"].sort_values()

technology
s3-ATAC                   0.041354
HyDrop                    0.043792
mtscATAC-seq              0.103005
Bio-Rad ddSEQ SureCell    0.111458
10x v1.1                  0.137735
10x v1                    0.153879
10x Multiome              0.159505
mtscATAC-seq (FACS)       0.207314
10x v1.1 (control)        0.240739
10x v2                    0.279082
Name: Unique fragments in cells and in peaks, dtype: float64

In [119]:
df_stats["n_peaks__Bcell"]

BIO_ddseq_1            57701.0
BIO_ddseq_2            62368.0
BIO_ddseq_3            59383.0
BIO_ddseq_4            62871.0
BRO_mtscatacfacs_1     71752.0
BRO_mtscatacfacs_2     58383.0
CNA_10xmultiome_1      74017.0
CNA_10xmultiome_2      56656.0
CNA_10xv11_1           57249.0
CNA_10xv11_2           52223.0
CNA_10xv11_3          109204.0
CNA_10xv11c_1          60050.0
CNA_10xv11c_2          93974.0
CNA_10xv2_1           130659.0
CNA_10xv2_2           145489.0
CNA_hydrop_1               NaN
CNA_hydrop_2           43706.0
CNA_hydrop_3            8785.0
CNA_mtscatac_1         53662.0
CNA_mtscatac_2         32664.0
EPF_hydrop_1           45063.0
EPF_hydrop_2           26942.0
EPF_hydrop_3           33573.0
EPF_hydrop_4           33886.0
HAR_ddseq_1            82924.0
HAR_ddseq_2            80476.0
MDC_mtscatac_1         97089.0
MDC_mtscatac_2         69003.0
OHS_s3atac_1           64256.0
OHS_s3atac_2           20100.0
SAN_10xmultiome_1      88741.0
SAN_10xmultiome_2     108238.0
STA_10xv

In [120]:
list(df_stats.columns)

['short_identifier',
 'centre',
 'technology',
 'sequencing_instrument',
 'reads',
 'cells',
 'RPC',
 'nreads',
 '%_correct_barcodes',
 'r1_length',
 'r2_length',
 'avg_insert_size',
 '%_mapq30',
 'avg_map_quality',
 'Median_total_nr_frag',
 'Median_unique_nr_frag',
 'Median_dupl_rate',
 'Median_total_nr_frag_in_regions',
 'Median_frip',
 'Median_tss_enrichment',
 'total_nr_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes',
 'total_nr_unique_frag_in_selected_barcodes_in_regions',
 'n_barcodes_merged',
 'frac_barcodes_merged',
 'efficiency',
 'chr1',
 'chr10',
 'chr11',
 'chr12',
 'chr13',
 'chr14',
 'chr15',
 'chr16',
 'chr17',
 'chr18',
 'chr19',
 'chr2',
 'chr20',
 'chr21',
 'chr22',
 'chr3',
 'chr4',
 'chr5',
 'chr6',
 'chr7',
 'chr8',
 'chr9',
 'chrM',
 'chrX',
 'chrY',
 'nonstandard',
 'total_fragments',
 'n_dars__B_cell',
 'top_2000_dars_median_logfc__B_cell',
 'top_2000_dars_median_fc__B_cell',
 'n_dars__CD14+_monocyte',
 'top_2000_dars_median_logfc__CD14+

In [138]:
cell_data_path_dict = {
    os.path.basename(x).split(".")[0]: x
    for x in sorted(
        glob.glob(
            "../fixedcells_3_cistopic_consensus/cistopic_objects/*dimreduc.consensus*cell_data.tsv"
        )
    )
}

In [143]:
df_merged = pd.DataFrame()
for sample, path in cell_data_path_dict.items():
    print(sample)
    df = pd.read_csv(path, sep="\t", index_col=0)

    df_merged = pd.concat([df_merged, df])

BIO_ddseq_1
BIO_ddseq_2
BIO_ddseq_3
BIO_ddseq_4
BRO_mtscatac_1
BRO_mtscatac_2
CNA_10xmultiome_1
CNA_10xmultiome_2
CNA_10xv11_1
CNA_10xv11_2
CNA_10xv11_3
CNA_10xv11_4
CNA_10xv11_5
CNA_10xv2_1
CNA_10xv2_2
CNA_hydrop_1
CNA_hydrop_2
CNA_hydrop_3
CNA_mtscatac_1
CNA_mtscatac_2
EPF_hydrop_1
EPF_hydrop_2
EPF_hydrop_3
EPF_hydrop_4
HAR_ddseq_1
HAR_ddseq_2
MDC_mtscatac_1
MDC_mtscatac_2
OHS_s3atac_1
OHS_s3atac_2
SAN_10xmultiome_1
SAN_10xmultiome_2
STA_10xv11_1
STA_10xv11_2
TXG_10xv11_1
TXG_10xv2_1
TXG_10xv2_2
UCS_ddseq_1
UCS_ddseq_2
VIB_10xmultiome_1
VIB_10xmultiome_2
VIB_10xv1_1
VIB_10xv1_2
VIB_10xv2_1
VIB_10xv2_2
VIB_hydrop_1
VIB_hydrop_2


Unnamed: 0,cisTopic_log_nr_acc,cisTopic_log_nr_frag,cisTopic_nr_frag,cisTopic_nr_acc,Log_total_nr_frag,Log_unique_nr_frag,Total_nr_frag,Unique_nr_frag,Dupl_nr_frag,Dupl_rate,Total_nr_frag_in_regions,Unique_nr_frag_in_regions,FRIP,TSS_enrichment,sample_id,barcode,Doublet_scores_fragments,Predicted_doublets_fragments,seurat_cell_type,seurat_cell_type_pred_score,pycisTopic_leiden_10_3.0,consensus_cell_type,UMAP_1,UMAP_2,tSNE_1,tSNE_2,fmx_droplet_type,fmx_sample
CAGGCGGATGAATAAAGTGCG_CGCGGCGACCTACCGCAGTGT___BIO_ddseq_1.FIXEDCELLS,3.684217,3.775756,5967,4833,4.495669,3.925415,31309,8422,22887,0.731004,16299,4660,0.553313,38.887733,BIO_ddseq_1.FIXEDCELLS,CAGGCGGATGAATAAAGTGCG_CGCGGCGACCTACCGCAGTGT,0.170213,False,CD4+ T cell,0.794101,18,CD4+ T cell,-1.676865,-1.709572,-22.865466,-16.789552,,
ACACGCGATATAACATTCGTT_TTCCTCTTCGTTCTGCTAATT___BIO_ddseq_1.FIXEDCELLS,3.597366,3.681241,4800,3957,4.434553,3.819215,27199,6595,20604,0.757528,16613,3955,0.599697,40.580825,BIO_ddseq_1.FIXEDCELLS,ACACGCGATATAACATTCGTT_TTCCTCTTCGTTCTGCTAATT,0.163511,False,Cytotoxic T cell,0.581627,7,Cytotoxic T cell,-0.056257,5.810438,-13.460642,31.882443,,
AACGGTGGAGAGGTTAGTGTT_TTGTAAGCGTTTGATGAGGAG___BIO_ddseq_1.FIXEDCELLS,3.850524,3.967782,9285,7088,4.727330,4.125741,53374,13358,40016,0.749728,34227,8510,0.637071,29.227156,BIO_ddseq_1.FIXEDCELLS,AACGGTGGAGAGGTTAGTGTT_TTGTAAGCGTTTGATGAGGAG,0.283820,False,CD4+ T cell,0.680369,12,Cytotoxic T cell,-1.336349,1.076014,-10.645339,3.344298,,
ATAGTTGTGAGATTGAATCAA___BIO_ddseq_1.FIXEDCELLS,3.683317,3.776411,5976,4823,4.381927,3.901295,24095,7967,16128,0.669350,14783,4961,0.622694,29.081667,BIO_ddseq_1.FIXEDCELLS,ATAGTTGTGAGATTGAATCAA,0.300000,False,B cell,1.000000,3,B cell,9.337917,-1.678898,15.481422,-39.594394,,
TACGCATTCTGAACGAGCGTG___BIO_ddseq_1.FIXEDCELLS,3.658107,3.740600,5503,4551,4.313952,3.876102,20604,7518,13086,0.635119,10367,3903,0.519154,33.464056,BIO_ddseq_1.FIXEDCELLS,TACGCATTCTGAACGAGCGTG,0.283820,False,CD4+ T cell,0.916906,18,CD4+ T cell,-1.358059,-1.996953,-24.730488,-19.812250,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
GACGAGGTAAGATGGCCAAC___VIB_hydrop_22.FIXEDCELLS,2.828015,2.883093,764,673,3.219846,3.011147,1659,1026,633,0.381555,863,541,0.527290,35.755000,VIB_hydrop_22.FIXEDCELLS,GACGAGGTAAGATGGCCAAC,0.061224,False,Cytotoxic T cell,0.677033,5,Cytotoxic T cell,-5.841873,10.412942,-25.110930,-22.789226,SNG,sampleA
AGGTTGCATTATCCGAGTAT___VIB_hydrop_22.FIXEDCELLS,2.781037,2.961895,916,604,3.386321,3.064083,2434,1159,1275,0.523829,1212,590,0.509060,23.245000,VIB_hydrop_22.FIXEDCELLS,AGGTTGCATTATCCGAGTAT,0.114754,False,Cytotoxic T cell,0.517525,16,Cytotoxic T cell,-5.213475,8.696330,-22.000815,-9.940833,SNG,sampleA
TCAAGAGGCGAGGACGTTCG___VIB_hydrop_22.FIXEDCELLS,2.986772,3.149835,1412,970,3.632457,3.265996,4290,1845,2445,0.569930,2452,1044,0.565854,43.741158,VIB_hydrop_22.FIXEDCELLS,TCAAGAGGCGAGGACGTTCG,0.114754,False,Cytotoxic T cell,0.538525,8,Cytotoxic T cell,-6.212889,8.207357,-33.741268,-4.946876,SNG,sampleB
GGAGTATTCTCAAGACGTCT___VIB_hydrop_22.FIXEDCELLS,3.048442,3.171726,1485,1118,3.788239,3.291369,6141,1956,4185,0.681485,3585,1106,0.565440,41.785295,VIB_hydrop_22.FIXEDCELLS,GGAGTATTCTCAAGACGTCT,0.097345,False,Cytotoxic T cell,0.440256,4,CD4+ T cell,-4.861959,6.946391,-26.550515,6.269226,SNG,sampleB
