# Clonotype Verification

In the data provided by 10x, clonotypes are assigned for each donor individually. However, "public" TCRs may be shared across donors.

In [1]:
import pandas as pd

Load the TRA and TRB CDR3 sequences of each donor, keeping only cells with exactly one TRA and one TRB chain.

In [42]:
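def extract_tcrs(data):
    """Collect the TRA and TRB CDR3 sequences of all cells that pass the filter."""
    alphas = []
    betas = []
    for row in data['cell_clono_cdr3_aa']:
        # Keep only cells with exactly one TRA and one TRB chain.
        # Caveat: count() matches substrings, so a CDR3 sequence that
        # contains the residues 'TRA' is counted as well.
        if row.count('TRB') != 1 or row.count('TRA') != 1:
            continue
        # Each entry has the form 'label:sequence', separated by ';'.
        for chain in row.split(';'):
            label, sequence = chain.split(':')
            if label == 'TRA':
                alphas.append(sequence)
            elif label == 'TRB':
                betas.append(sequence)
    return alphas, betas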
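
Because `row.count('TRA')` counts substrings, it also matches a CDR3 sequence that contains the residues T, R and A in a row, which may explain why the TRA and TRB counts printed below differ slightly. A minimal sketch of a stricter filter that counts chain labels instead is given here. `parse_chains` and `has_single_pair` are hypothetical helpers that are not part of the original analysis, the example rows are made up, and the sketch assumes the same `label:sequence` entries separated by `;` as the code above.

```python
from collections import defaultdict


def parse_chains(row):
    """Map each chain label (e.g. 'TRA', 'TRB') to its list of CDR3 sequences."""
    chains = defaultdict(list)
    for chain in row.split(';'):
        label, sequence = chain.split(':')
        chains[label].append(sequence)
    return chains


def has_single_pair(row):
    """True if the row holds exactly one TRA and exactly one TRB chain."""
    chains = parse_chains(row)
    return len(chains['TRA']) == 1 and len(chains['TRB']) == 1


# Made-up example rows, not taken from the data set.
print(has_single_pair('TRA:CAVRDSNYQLIW;TRB:CASSTRAGF'))  # True
print(has_single_pair('TRB:CASSTRAGF'))                   # False
```

The substring check behaves the other way round on these two rows: it rejects the first (two matches for `'TRA'`) and accepts the second, although the latter has no alpha chain.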

In [43]:
path_base = '../data/10x_CD8TC/'

cdr3s_alpha_by_donor = []
cdr3s_beta_by_donor = []
for donor_idx in range(1, 5):
    path_tcr = path_base + f'patient_{donor_idx}/vdj_v1_hs_aggregated_donor{donor_idx}_binarized_matrix.csv'
    data_tmp = pd.read_csv(path_tcr)
    cdr3s_alpha, cdr3s_beta = extract_tcrs(data_tmp)
    print(f'donor {donor_idx}: {len(cdr3s_alpha)} TRAs, {len(cdr3s_beta)} TRBs')
    cdr3s_alpha_by_donor.append(cdr3s_alpha)
    cdr3s_beta_by_donor.append(cdr3s_beta)

donor 1: 32906 TRAs, 32920 TRBs
donor 2: 54518 TRAs, 54524 TRBs
donor 3: 29904 TRAs, 29907 TRBs
donor 4: 22027 TRAs, 22027 TRBs


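Calculate the pairwise overlap between donors, i.e. the number of unique CDR3 sequences shared by two donors. Note that the output below numbers the donors from 0 to 3 (corresponding to donors 1 to 4 above) and lists each pair in both orders.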

In [52]:
def size_overlap(list_1, list_2):
    # Number of unique sequences that occur in both lists.
    return len(set(list_1) & set(list_2))

In [53]:
def check_for_overlap(cdrs, name):
    for idx_1 in range(4):
        list_1 = cdrs[idx_1]
        for idx_2 in range(4):
            if idx_1 == idx_2:
                continue
            list_2 = cdrs[idx_2]
            n = size_overlap(list_1, list_2)
            print(f'{n} overlapping {name}s between donors {idx_1} and {idx_2}')
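
Since set intersection is symmetric, each donor pair only needs to be reported once. The sketch below uses `itertools.combinations` for this and numbers the donors from 1 to match the loading loop. `pairwise_overlap` is a hypothetical helper that is not part of the original analysis, and the toy sequences are made up for illustration.

```python
from itertools import combinations


def pairwise_overlap(cdrs_by_donor):
    """Return {(donor_a, donor_b): n_shared} for each unordered donor pair (1-based)."""
    sets = [set(cdrs) for cdrs in cdrs_by_donor]
    return {
        (i + 1, j + 1): len(sets[i] & sets[j])
        for i, j in combinations(range(len(sets)), 2)
    }


# Toy input with made-up sequences.
toy = [['CAAA', 'CGGG'], ['CGGG', 'CLLL'], ['CVVV']]
for (a, b), n in pairwise_overlap(toy).items():
    print(f'{n} shared sequences between donors {a} and {b}')
```

Converting each list to a set once, rather than inside every comparison, also avoids rebuilding the sets for every pair.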

In [54]:
check_for_overlap(cdr3s_alpha_by_donor, 'TRA')

1151 overlapping TRAs between donors 0 and 1
1069 overlapping TRAs between donors 0 and 2
1025 overlapping TRAs between donors 0 and 3
1151 overlapping TRAs between donors 1 and 0
510 overlapping TRAs between donors 1 and 2
501 overlapping TRAs between donors 1 and 3
1069 overlapping TRAs between donors 2 and 0
510 overlapping TRAs between donors 2 and 1
629 overlapping TRAs between donors 2 and 3
1025 overlapping TRAs between donors 3 and 0
501 overlapping TRAs between donors 3 and 1
629 overlapping TRAs between donors 3 and 2


In [56]:
check_for_overlap(cdr3s_beta_by_donor, 'TRB')

398 overlapping TRBs between donors 0 and 1
173 overlapping TRBs between donors 0 and 2
210 overlapping TRBs between donors 0 and 3
398 overlapping TRBs between donors 1 and 0
90 overlapping TRBs between donors 1 and 2
104 overlapping TRBs between donors 1 and 3
173 overlapping TRBs between donors 2 and 0
90 overlapping TRBs between donors 2 and 1
106 overlapping TRBs between donors 2 and 3
210 overlapping TRBs between donors 3 and 0
104 overlapping TRBs between donors 3 and 1
106 overlapping TRBs between donors 3 and 2


Load the paired TRA and TRB sequences per donor, so that each alpha-beta combination is treated as one TCR.

In [60]:
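def extract_joint_tcrs(data):
    """Collect one 'TRA:<alpha>;TRB:<beta>' string per cell that passes the filter."""
    cdrs = []
    for row in data['cell_clono_cdr3_aa']:
        # Keep only cells with exactly one TRA and one TRB chain.
        # Caveat: count() matches substrings of the CDR3 sequences as well.
        if row.count('TRB') != 1 or row.count('TRA') != 1:
            continue
        # Caveat: alpha and beta are not reset per row, so a row that passes
        # the filter without one of the chains reuses the previous value.
        for chain in row.split(';'):
            label, sequence = chain.split(':')
            if label == 'TRA':
                alpha = sequence
            elif label == 'TRB':
                beta = sequence
        cdrs.append(f'TRA:{alpha};TRB:{beta}')
    return cdrs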
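
In `extract_joint_tcrs`, `alpha` and `beta` keep their values from one row to the next, so a row that passes the filter without one of the two chains would be paired with a sequence from an earlier cell. A minimal sketch that resets both chains for every row and returns `(alpha, beta)` tuples is shown below. `extract_pairs` is a hypothetical helper that is not part of the original analysis, and the example rows are made up.

```python
def extract_pairs(rows):
    """Return (alpha, beta) CDR3 tuples for rows with exactly one TRA and one TRB."""
    pairs = []
    for row in rows:
        alphas, betas = [], []  # reset for every row
        for chain in row.split(';'):
            label, sequence = chain.split(':')
            if label == 'TRA':
                alphas.append(sequence)
            elif label == 'TRB':
                betas.append(sequence)
        # Decide on the parsed labels, not on substrings of the raw row.
        if len(alphas) == 1 and len(betas) == 1:
            pairs.append((alphas[0], betas[0]))
    return pairs


# Made-up example rows: the second one has no alpha chain and is skipped.
rows = ['TRA:CAVRDSNYQLIW;TRB:CASSLGQAYEQYF', 'TRB:CASSTRAGF']
print(extract_pairs(rows))
```

Tuples are hashable, so the result can be passed to a set-based overlap calculation in the same way as the joined strings.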

In [61]:
path_base = '../data/10x_CD8TC/'

cdr3s_by_donor = []
for donor_idx in range(1, 5):
    path_tcr = path_base + f'patient_{donor_idx}/vdj_v1_hs_aggregated_donor{donor_idx}_binarized_matrix.csv'
    data_tmp = pd.read_csv(path_tcr)
    cdr3s = extract_joint_tcrs(data_tmp)
    print(f'donor {donor_idx}: {len(cdr3s)} TRs')
    cdr3s_by_donor.append(cdr3s)

donor 1: 32920 TRs
donor 2: 54524 TRs
donor 3: 29907 TRs
donor 4: 22027 TRs


Calculate the overlap of the paired TRA and TRB sequences between donors.

In [62]:
check_for_overlap(cdr3s_by_donor, 'TR')

83 overlapping TRs between donors 0 and 1
1 overlapping TRs between donors 0 and 2
0 overlapping TRs between donors 0 and 3
83 overlapping TRs between donors 1 and 0
3 overlapping TRs between donors 1 and 2
1 overlapping TRs between donors 1 and 3
1 overlapping TRs between donors 2 and 0
3 overlapping TRs between donors 2 and 1
0 overlapping TRs between donors 2 and 3
0 overlapping TRs between donors 3 and 0
1 overlapping TRs between donors 3 and 1
0 overlapping TRs between donors 3 and 2


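Regardless of whether clonotypes are defined by the TRA, the TRB, or the paired TRA and TRB CDR3 sequences, some sequences are shared between donors. The clonotypes assigned by 10x for each donor individually therefore cannot be applied across donors.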
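
One possible way to obtain clonotype labels that are consistent across donors is to give identical CDR3 strings the same identifier, independent of the donor. The following is only a sketch of that idea, not the approach used in the original analysis. `assign_global_clonotypes` is a hypothetical helper, the toy sequences are made up, and the sketch assumes the `TRA:...;TRB:...` strings built above as input.

```python
def assign_global_clonotypes(cdr3s_by_donor):
    """Give identical CDR3 strings the same integer id, regardless of donor."""
    ids = {}
    labelled = []
    for cdr3s in cdr3s_by_donor:
        # setdefault returns the existing id or stores the next free one.
        labelled.append([ids.setdefault(cdr3, len(ids)) for cdr3 in cdr3s])
    return labelled, ids


# Toy input with made-up sequences: the first TCR occurs in both donors.
toy_donors = [
    ['TRA:CAAA;TRB:CGGG', 'TRA:CLLL;TRB:CVVV'],
    ['TRA:CAAA;TRB:CGGG'],
]
labelled, ids = assign_global_clonotypes(toy_donors)
print(labelled)
```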