# Dual blocker therapy (DBT) plasma proteome paper: proteome quantification

This Jupyter notebook (Python 3 kernel) contains the code for MaxLFQ-based proteome quantification of the DBT cohort and the independent validation cohort.

Input files:  
* Merged DIA-NN output files: DBT_DIANN_output_merge.txt, validation_DIANN_output_merge.txt

Output files:  
* MaxLFQ quantification matrices for each cohort: protein-group-level TSV files in ../documents/MaxLFQ and gene-level CSV files

# MaxLFQ quantification

In [None]:
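import os
from rpy2.robjects import r
from rpy2.robjects.packages import importr

iq = importr("iq")  # load the R package iq, which implements MaxLFQ

# process_long_format() writes its result here, so make sure the folder exists
os.makedirs("../documents/MaxLFQ", exist_ok=True)

for name in ['DBT', 'validation']:
    # Keep precursors with precursor- and protein-level q-value below 0.01,
    # then run MaxLFQ and write one protein-group matrix per cohort
    rscript = """
    process_long_format("../documents/{}_DIANN_output_merge.txt",
                        output_filename = "../documents/MaxLFQ/{}-qvalues-0.01.tsv",
                        annotation_col = c("Protein.Names", "Genes"),
                        filter_double_less = c("Q.Value" = "0.01", "Protein.Q.Value" = "0.01"))
    """.format(name, name.lower())
    r(rscript)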
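MaxLFQ itself is computed by the iq package in the cell above. To illustrate what the algorithm does for a single protein, here is a minimal NumPy sketch of its core idea: take the median peptide log-ratio between every pair of samples, then fit per-sample log2 abundances by least squares so that their differences match those ratios. This is not iq's implementation. The function name `maxlfq_sketch`, the choice to fix the overall level by pinning the mean of the estimates to the mean observed log2 intensity, and the assumption that all samples are connected through shared peptides are simplifications made for this sketch.

```python
import itertools

import numpy as np


def maxlfq_sketch(log2_intensity: np.ndarray) -> np.ndarray:
    """Estimate per-sample log2 abundances for one protein.

    log2_intensity: peptides x samples array of log2 intensities, NaN = missing.
    Assumes every sample shares peptides with others so that the graph of
    pairwise ratios is connected (hypothetical simplification).
    """
    n_samples = log2_intensity.shape[1]
    rows, rhs = [], []
    for j, k in itertools.combinations(range(n_samples), 2):
        # Peptides quantified in both sample j and sample k
        shared = ~np.isnan(log2_intensity[:, j]) & ~np.isnan(log2_intensity[:, k])
        if not shared.any():
            continue
        # Pairwise protein ratio: median of the shared peptides' log-ratios
        ratio = np.median(log2_intensity[shared, j] - log2_intensity[shared, k])
        row = np.zeros(n_samples)
        row[j], row[k] = 1.0, -1.0  # encodes x_j - x_k = ratio
        rows.append(row)
        rhs.append(ratio)
    # Ratios only determine differences; add one equation pinning the mean of
    # the estimates to the mean observed log2 intensity (a simplifying choice)
    rows.append(np.ones(n_samples) / n_samples)
    rhs.append(np.nanmean(log2_intensity))
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.asarray(rhs), rcond=None)
    return x
```

For example, with a toy three-peptide by three-sample matrix in which each sample is 1 log2 unit above the previous one and two values are missing, `maxlfq_sketch(np.array([[10.0, 11.0, 12.0], [8.0, 9.0, np.nan], [np.nan, 5.0, 6.0]]))` returns estimates that differ by exactly 1 between consecutive samples. Using medians of peptide ratios rather than summed intensities is what makes the estimate robust to peptides that are missing in some samples.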

## MaxLFQ quantification preprocessing

In [None]:
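import os
import pandas as pd

path = '../documents/MaxLFQ'
for f in os.listdir(path):
    if f.endswith('.tsv'):
        cohort, *_ = f.split('-')  # e.g. 'dbt-qvalues-0.01.tsv' -> 'dbt'
        df = pd.read_table(os.path.join(path, f))
        # Keep only the leading gene of multi-gene entries ('GENE1;GENE2' -> 'GENE1')
        df['Genes'] = df['Genes'].str.split(';').str[0]
        # Drop the first two columns (protein-group ID, Protein.Names), keep Genes
        # plus the sample columns, and average protein groups sharing a leading gene
        df = df.iloc[:, 2:].groupby('Genes').mean()
        # Shorten sample names to the text before the first underscore
        df.columns = df.columns.map(lambda x: x.split('_')[0])
        df.to_csv(os.path.join('../documents', '{}.csv'.format(cohort)))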
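To make the reshaping steps concrete, the sketch below applies the same pandas operations to a small hypothetical table; all protein, gene, and sample names are made up, and the column layout (protein-group ID, Protein.Names, Genes, then samples) is the one the loop above assumes. Two protein groups whose leading gene is GENE1 collapse into a single averaged row, and sample columns are shortened to the text before the first underscore.

```python
import pandas as pd

# Hypothetical toy matrix in the layout assumed by the preprocessing loop
toy = pd.DataFrame({
    'Protein.Group': ['P1', 'P2', 'P3'],
    'Protein.Names': ['A_HUMAN', 'B_HUMAN', 'C_HUMAN'],
    'Genes': ['GENE1;GENE2', 'GENE1', 'GENE3'],
    'S01_run1': [20.0, 22.0, 18.0],
    'S02_run1': [21.0, 23.0, 17.0],
})

# Same steps as the loop: leading gene, drop the first two columns, average per gene
toy['Genes'] = toy['Genes'].str.split(';').str[0]
collapsed = toy.iloc[:, 2:].groupby('Genes').mean()
collapsed.columns = collapsed.columns.map(lambda x: x.split('_')[0])
print(collapsed)
```

Here GENE1 gets 21.0 in S01 and 22.0 in S02 (the mean of P1 and P2), while GENE3 keeps its single protein group's values. Assuming the matrix holds log2 values, this arithmetic mean corresponds to a geometric mean of the underlying intensities.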