# Applying MaxMass to iBAQ proteomics

1.  Given molar percentage measurements from step N, run `MaxMass` (our modified version of `MinGenome`) on those molar percentages to predict which genes should be knocked out in step N+1. Compare the `MaxMass` predictions with the genes actually knocked out in step N+1 and with the `MinGenome` prediction to see how similar they are.
![MinGenome](MinGenome.png "MinGenome workflow")

2.  Take the iBAQ measurements for step N, remove all genes that were knocked out in actual step N+1, and recalculate the molar percentages. Compare with the actual molar percentages in step N+1 using the [KL divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
$$D_{KL}(P\|Q) = -\sum_iP(i)\log\frac{Q(i)}{P(i)}$$

or, even more simply, the squared Euclidean distance

$$D_2(P,Q) = \sum_i\|P(i)-Q(i)\|^2$$

where $P$ is the actual molar percentages and $Q$ is the predicted molar percentages. This gives us a measure of how much protein expression changed as a result of the knockouts. If $D_{KL}=0$, the prediction is an exact match. The greater the $D_{KL}$, the greater the divergence between predicted and actual. This doesn't tell us how much protein capacity we reclaimed, because if all proteins increased expression by the same percentage we wouldn't detect it with iBAQ. It will, however, give us an idea of how valid our assumptions are for using molar percentage to choose which genes to knock out.
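
The comparison in step 2 can be sketched in plain Python. This is a minimal illustration rather than the notebook's own code: the protein names, the toy iBAQ values, and the helpers `renormalize`, `kl_divergence`, and `squared_distance` are all hypothetical. It also assumes every protein with non-zero actual abundance has non-zero predicted abundance, since otherwise $D_{KL}$ is infinite.

```python
import math

def renormalize(ibaq, knocked_out):
    """Drop knocked-out proteins and rescale the rest so they sum to 1."""
    kept = {p: v for p, v in ibaq.items() if p not in knocked_out}
    total = sum(kept.values())
    return {p: v / total for p, v in kept.items()}

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)); terms with P(i) == 0 add 0."""
    return sum(pi * math.log(pi / q[k]) for k, pi in p.items() if pi > 0)

def squared_distance(p, q):
    """D_2(P, Q) = sum_i (P(i) - Q(i))^2."""
    return sum((p[k] - q[k]) ** 2 for k in p)

# Hypothetical iBAQ intensities (arbitrary units)
step_n = {"protA": 50.0, "protB": 30.0, "protC": 20.0}
step_n_plus_1 = {"protA": 70.0, "protB": 30.0}  # protC was knocked out

predicted = renormalize(step_n, knocked_out={"protC"})  # Q
actual = renormalize(step_n_plus_1, knocked_out=set())  # P

d_kl = kl_divergence(actual, predicted)
d_2 = squared_distance(actual, predicted)
```

In this toy case the prediction keeps the 50:30 ratio of the surviving proteins, while the measured ratio is 70:30, so both measures come out small but non-zero.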
 

In [16]:
import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value=''):
    """Expand list-valued cells in `lst_cols` into one row per list element."""
    # make sure `lst_cols` is a list
    if lst_cols and not isinstance(lst_cols, list):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)

    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    # repeat the non-list columns once per list element, then flatten the lists
    exploded = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }).assign(**{col: np.concatenate(df[col].values) for col in lst_cols})

    if not (lens > 0).all():
        # at least one list in cells is empty: re-attach those rows and fill the gaps
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
        exploded = pd.concat([exploded, df.loc[lens == 0, idx_cols]]).fillna(fill_value)

    return exploded.loc[:, df.columns]

## Preliminaries to map W3110 genes to Blattner genes

In [149]:
uniprot2blattner = pd.read_table('../data/Ecoli/blatter-to-uniprot.tab')


K12toW3110 = pd.read_table('E_coli_K12_and_E_coli_W3110_BBH.tab')
K12toW3110['Uniprot'] = K12toW3110['E_coli_K12'].str.split('|').str.get(1)
K12toW3110 = K12toW3110.join(uniprot2blattner.set_index('Uniprot'), on='Uniprot')
ecoli_ko = pd.read_table('E.coli_kos.tab',index_col='locus')
ecoli_ko = ecoli_ko\
            .join( K12toW3110\
                      .set_index('Blattner')
                 )[['gene',
                    'Step',
                    'E_coli_W3110']]
ecoli_ko[ecoli_ko['E_coli_W3110'].isnull()].to_csv('missing_mapping.tab',sep='\t')

#            .dropna()\
#            .reset_index()\
#            .set_index( 'E_coli_W3110' )
ecoli_ko

locus,gene,Step,E_coli_W3110
b0061,araD,29,W3110_lambdaRed.CDS.57
b0062,araA,29,W3110_lambdaRed.CDS.58
b0063,araB,29,W3110_lambdaRed.CDS.59
b0064,araC,29,W3110_lambdaRed.CDS.60
b0065,yabI,29,W3110_lambdaRed.CDS.61
b0066,thiQ,29,W3110_lambdaRed.CDS.62
b0067,thiP,29,W3110_lambdaRed.CDS.63
b0068,tbpA,29,W3110_lambdaRed.CDS.64
b0069,sgrR,29,W3110_lambdaRed.CDS.65
b0070,setA,29,W3110_lambdaRed.CDS.67


# Preliminary Mol %

In [201]:
cols = ['protein_ID','iBAQ_Step04_1', 'iBAQ_Step04_2', 'iBAQ_Step04_3',
       'iBAQ_Step05_1', 'iBAQ_Step05_2', 'iBAQ_Step05_3', 'iBAQ_Step09_1',
       'iBAQ_Step09_2', 'iBAQ_Step09_3', 'iBAQ_Step10_1', 'iBAQ_Step10_2',
       'iBAQ_Step10_3', 'iBAQ_W3110_1', 'iBAQ_W3110_2', 'iBAQ_W3110_3']
ibaq = pd.read_table('E_coli_data_frame.txt',index_col='protein_ID',usecols=cols)
ibaq
norm_ibaq = ibaq/ibaq.sum(axis=0)
norm_ibaq

protein_ID,iBAQ_Step04_1,iBAQ_Step04_2,iBAQ_Step04_3,iBAQ_Step05_1,iBAQ_Step05_2,iBAQ_Step05_3,iBAQ_Step09_1,iBAQ_Step09_2,iBAQ_Step09_3,iBAQ_Step10_1,iBAQ_Step10_2,iBAQ_Step10_3,iBAQ_W3110_1,iBAQ_W3110_2,iBAQ_W3110_3
W3110_lambdaRed.CDS.1,1.010359e-03,0.000981,0.001047,0.001899,0.001885,0.001793,1.964024e-03,0.001509,0.001674,0.001911,0.001961,0.001944,9.809605e-04,0.000925,0.000834
W3110_lambdaRed.CDS.100,3.481408e-05,0.000023,0.000025,0.000025,0.000028,0.000026,3.911826e-05,0.000033,0.000009,0.000015,0.000015,0.000012,3.135649e-05,0.000011,0.000018
W3110_lambdaRed.CDS.1007,3.077660e-05,0.000045,0.000052,0.000037,0.000061,0.000027,2.146177e-05,0.000012,0.000028,0.000015,0.000026,0.000030,3.381265e-05,0.000087,0.000056
W3110_lambdaRed.CDS.101,4.043584e-04,0.000375,0.000534,0.000603,0.000613,0.000591,5.217002e-04,0.000566,0.000755,0.000517,0.000490,0.000623,5.659019e-04,0.000564,0.000594
W3110_lambdaRed.CDS.1011,3.695590e-04,0.000345,0.000396,0.000298,0.000211,0.000186,3.437747e-04,0.000341,0.000235,0.000373,0.000360,0.000251,4.027966e-04,0.000355,0.000314
W3110_lambdaRed.CDS.1012,3.639415e-05,0.000032,0.000036,0.000043,0.000064,0.000104,4.309663e-05,0.000043,0.000036,0.000048,0.000051,0.000035,2.988162e-05,0.000045,0.000037
W3110_lambdaRed.CDS.1014,5.254829e-04,0.001421,0.000819,0.000652,0.000735,0.000622,1.948164e-04,0.000074,0.000220,0.000275,0.000193,0.000234,8.165195e-04,0.000845,0.000420
W3110_lambdaRed.CDS.1017,4.976897e-06,0.000009,0.000005,0.000012,0.000002,0.000003,8.379408e-06,0.000007,0.000009,0.000010,0.000007,0.000004,7.410636e-06,0.000011,0.000018
W3110_lambdaRed.CDS.1018,1.532792e-04,0.000135,0.000159,0.000143,0.000097,0.000130,1.180548e-04,0.000130,0.000133,0.000145,0.000162,0.000151,1.972265e-04,0.000169,0.000185
W3110_lambdaRed.CDS.1022,9.637780e-05,0.000138,0.000150,0.000124,0.000129,0.000104,1.177609e-04,0.000132,0.000134,0.000140,0.000142,0.000143,1.057341e-04,0.000107,0.000129


## Melted iBAQ separates replicate from strain

In [202]:
melted_ibaq = norm_ibaq.reset_index().melt(id_vars=['protein_ID'],value_name='iBAQ')
melted_ibaq['Replicate'] = melted_ibaq['variable'].str.split('_').str.get(-1)
melted_ibaq['Strain'] = melted_ibaq['variable'].str.rsplit('_',n=1).str.get(0)
melted_ibaq

,protein_ID,variable,iBAQ,Replicate,Strain
0,W3110_lambdaRed.CDS.1,iBAQ_Step04_1,1.010359e-03,1,iBAQ_Step04
1,W3110_lambdaRed.CDS.100,iBAQ_Step04_1,3.481408e-05,1,iBAQ_Step04
2,W3110_lambdaRed.CDS.1007,iBAQ_Step04_1,3.077660e-05,1,iBAQ_Step04
3,W3110_lambdaRed.CDS.101,iBAQ_Step04_1,4.043584e-04,1,iBAQ_Step04
4,W3110_lambdaRed.CDS.1011,iBAQ_Step04_1,3.695590e-04,1,iBAQ_Step04
5,W3110_lambdaRed.CDS.1012,iBAQ_Step04_1,3.639415e-05,1,iBAQ_Step04
6,W3110_lambdaRed.CDS.1014,iBAQ_Step04_1,5.254829e-04,1,iBAQ_Step04
7,W3110_lambdaRed.CDS.1017,iBAQ_Step04_1,4.976897e-06,1,iBAQ_Step04
8,W3110_lambdaRed.CDS.1018,iBAQ_Step04_1,1.532792e-04,1,iBAQ_Step04
9,W3110_lambdaRed.CDS.1022,iBAQ_Step04_1,9.637780e-05,1,iBAQ_Step04
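
The column-name parsing above depends on the replicate number being the last `_`-separated field, with everything before it naming the strain. A minimal plain-Python sketch of the same split, followed by a per-strain mean and sample standard deviation, is shown here. The helper `split_sample` and the toy values are hypothetical. `statistics.stdev` is used because, like the pandas default, it is the sample (n-1) standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev

def split_sample(column):
    """Split 'iBAQ_Step04_1' into ('iBAQ_Step04', '1') at the last underscore."""
    strain, replicate = column.rsplit("_", 1)
    return strain, replicate

# Hypothetical normalized iBAQ values for one protein across replicates
values = {
    "iBAQ_Step04_1": 0.0010,
    "iBAQ_Step04_2": 0.0012,
    "iBAQ_Step04_3": 0.0011,
    "iBAQ_W3110_1": 0.0009,
    "iBAQ_W3110_2": 0.0011,
    "iBAQ_W3110_3": 0.0010,
}

parsed = [split_sample(c) for c in values]

# Group replicate values by strain, then summarize each group
by_strain = defaultdict(list)
for column, value in values.items():
    strain, _ = split_sample(column)
    by_strain[strain].append(value)

summary = {s: (mean(v), stdev(v)) for s, v in by_strain.items()}
```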


# Mean and std of iBAQ across replicates for each step


In [212]:
mean_ibaq = melted_ibaq.groupby( by=['protein_ID','Strain'])['iBAQ'].mean().unstack()
std_ibaq  = melted_ibaq.groupby( by=['protein_ID','Strain'])['iBAQ'].std().unstack() 
blattner_mean_ibaq = mean_ibaq.join(K12toW3110.set_index('E_coli_W3110')['Blattner']).reset_index().dropna().set_index('Blattner')
blattner_mean_ibaq_steps = blattner_mean_ibaq.join(ecoli_ko)
for i in [5,10]:
    blattner_mean_ibaq_steps['Step{:02d}_predicted_from_Step{:02d}'.format(i,i-1)] = blattner_mean_ibaq_steps['iBAQ_Step{:02d}'.format(i-1)]
    blattner_mean_ibaq_steps['Step{:02d}_predicted_from_WT'.format(i)] = blattner_mean_ibaq_steps['iBAQ_W3110']
    blattner_mean_ibaq_steps['Step{:02d}_predicted_from_WT'.format(i-1)] = blattner_mean_ibaq_steps['iBAQ_W3110']
    idx = blattner_mean_ibaq_steps[blattner_mean_ibaq_steps['Step'] <= i].index
    blattner_mean_ibaq_steps.loc[idx,'Step{:02d}_predicted_from_Step{:02d}'.format(i,i-1)] = 0
    blattner_mean_ibaq_steps.loc[idx,'Step{:02d}_predicted_from_WT'.format(i)] = 0
    blattner_mean_ibaq_steps.loc[idx,'iBAQ_Step{:02d}'.format(i)] = 0
    idx2 = blattner_mean_ibaq_steps[blattner_mean_ibaq_steps['Step'] <= (i-1)].index
    blattner_mean_ibaq_steps.loc[idx2,'Step{:02d}_predicted_from_WT'.format(i-1)] = 0
    blattner_mean_ibaq_steps.loc[idx2,'iBAQ_Step{:02d}'.format(i-1)] = 0

#blattner_mean_ibaq_steps[blattner_mean_ibaq_steps['Predicted_iBAQ_Step04'] = blattner_mean_ibaq_steps[]
sorted_mean_ibaq_steps = blattner_mean_ibaq_steps.sort_values(by=['Step']).reset_index().set_index(['level_0','index','gene', 'Step','E_coli_W3110'])
sorted_mean_ibaq_steps.index = sorted_mean_ibaq_steps.index.droplevel(-1)
sorted_mean_ibaq_steps

level_0,index,gene,Step,iBAQ_Step04,iBAQ_Step05,iBAQ_Step09,iBAQ_Step10,iBAQ_W3110,Step05_predicted_from_Step04,Step05_predicted_from_WT,Step04_predicted_from_WT,Step10_predicted_from_Step09,Step10_predicted_from_WT,Step09_predicted_from_WT
b2977,W3110_lambdaRed.CDS.2976,glcG,2.0,0.000000,0.000000,0.000000,0.000000,0.000035,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b2976,W3110_lambdaRed.CDS.2975,glcB,2.0,0.000000,0.000000,0.000000,0.000000,0.000103,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b4034,W3110_lambdaRed.CDS.4048,malE,3.0,0.000000,0.000000,0.000000,0.000000,0.000018,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b4035,W3110_lambdaRed.CDS.4049,malK,3.0,0.000000,0.000000,0.000000,0.000000,0.000029,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b4037,W3110_lambdaRed.CDS.4051,malM,3.0,0.000000,0.000000,0.000000,0.000000,0.000087,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b1152,W3110_lambdaRed.CDS.1115,ymfP,4.0,0.000000,0.000000,0.000000,0.000000,0.000055,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b1166,W3110_lambdaRed.CDS.1127,ymgB,4.0,0.000000,0.000000,0.000000,0.000000,0.000009,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
b1550,W3110_lambdaRed.CDS.1537,gnsB,5.0,0.000825,0.000000,0.000000,0.000000,0.000878,0.000000,0.000000,0.000878,0.000000,0.000000,0.000000
b1557,W3110_lambdaRed.CDS.1543,cspB,5.0,0.000020,0.000000,0.000000,0.000000,0.000019,0.000000,0.000000,0.000019,0.000000,0.000000,0.000000
b1564,W3110_lambdaRed.CDS.1549,relB,5.0,0.000071,0.000000,0.000000,0.000000,0.000083,0.000000,0.000000,0.000083,0.000000,0.000000,0.000000


## Compute % protein and KL divergence

In [215]:
from scipy.special import kl_div
prot_pct = sorted_mean_ibaq_steps/sorted_mean_ibaq_steps.sum(axis=0).sort_values(ascending=False)
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step05}\|\text{Step05_predicted_from_Step04}\right)$'] = kl_div(prot_pct['iBAQ_Step05'], 
                                                                                      prot_pct['Step05_predicted_from_Step04'])
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step10}\|\text{Step10_predicted_from_Step09}\right)$'] = kl_div(prot_pct['iBAQ_Step10'], 
                                                                                      prot_pct['Step10_predicted_from_Step09'])
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step05}\|\text{Step05_predicted_from_WT}\right)$'] = kl_div(prot_pct['iBAQ_Step05'], 
                                                                                      prot_pct['Step05_predicted_from_WT'])
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step10}\|\text{Step10_predicted_from_WT}\right)$'] = kl_div(prot_pct['iBAQ_Step10'], 
                                                                                      prot_pct['Step10_predicted_from_WT'])
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step04}\|\text{Step04_predicted_from_WT}\right)$'] = kl_div(prot_pct['iBAQ_Step04'], 
                                                                                      prot_pct['Step04_predicted_from_WT'])
prot_pct[r'$D_{KL}\left(\text{iBAQ_Step09}\|\text{Step09_predicted_from_WT}\right)$'] = kl_div(prot_pct['iBAQ_Step09'], 
                                                                                      prot_pct['Step09_predicted_from_WT'])

display(prot_pct.sum(axis=0).to_frame('Sum'))
prot_pct.index = prot_pct.index.droplevel(level=['index','gene','Step'])

,Sum
Step04_predicted_from_WT,1.0
Step05_predicted_from_Step04,1.0
Step05_predicted_from_WT,1.0
Step09_predicted_from_WT,1.0
Step10_predicted_from_Step09,1.0
Step10_predicted_from_WT,1.0
iBAQ_Step04,1.0
iBAQ_Step05,1.0
iBAQ_Step09,1.0
iBAQ_Step10,1.0
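
A note on why summing the element-wise output of `kl_div` recovers $D_{KL}$: `scipy.special.kl_div(x, y)` returns $x\log(x/y) - x + y$ for each element, not just $x\log(x/y)$. Assuming both columns are normalized to sum to 1, as the column sums above indicate, the extra terms cancel when summed over all proteins:

$$\sum_i\left[P(i)\log\frac{P(i)}{Q(i)} - P(i) + Q(i)\right] = D_{KL}(P\|Q) - \sum_iP(i) + \sum_iQ(i) = D_{KL}(P\|Q)$$

Any protein with $P(i)>0$ but $Q(i)=0$ contributes an infinite term, so a single protein detected in the actual data but zeroed in the prediction makes the whole sum infinite.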


# MaxMass applied to wild-type iBAQ

In [245]:
prot_pct['iBAQ_W3110'].to_csv('../iBAQ_W3110.tab')

abundance = pd.read_table('../data/Ecoli/A14.07036_cumulative_mass.tab', index_col='gene_or_promoter')

abundance = abundance.join(prot_pct['iBAQ_W3110']).sort_values('start',ascending=True)
abundance['iBAQ_W3110'] = abundance['iBAQ_W3110'].fillna(abundance['iBAQ_W3110'].min())
abundance['iBAQ_W3110'] = abundance['iBAQ_W3110']/abundance['iBAQ_W3110'].sum()
abundance['iBAQ_W3110'] = abundance['iBAQ_W3110'].cumsum()
abundance.to_csv('../data/Ecoli/A14.07036_cumulative_mass_w_ibaq.tab',sep='\t',index_label='gene_or_promoter')
abundance

gene_or_promoter,start,end,strand,class,genes_in_TU,start_if_select_as_start,cannot_as_start,Uniprot,Description,Gene,Cellular protein location (according to www.uniprot.org),A14.07036,A14.07037,A14.07038,cumulativeMass,iBAQ_W3110
PM00249,148,189,1,promoter,"[b0001, b0002, b0003, b0004]",148,0,,,,,0.000000,,,0.000000,8.125507e-07
b0001,190,255,1,gene,,190,0,P0AD86,,,,0.017912,,,0.017912,1.625101e-06
b0002,337,2799,1,gene,,337,0,P00561,,,,0.017912,,,0.035824,9.942986e-04
b0003,2801,3733,1,gene,,2801,0,P00547,Homoserine kinase OS=Escherichia coli (strain ...,thrB,Cytoplasm,0.033131,0.032556,0.033302,0.068954,1.167583e-03
b0004,3734,5020,1,gene,,3734,0,P00934,,,,0.017912,,,0.086866,2.736426e-03
b0005,5234,5530,1,gene,,5234,0,P75616,,,,0.017912,,,0.104778,2.737239e-03
b0006,5683,6459,-1,gene,,5683,0,P0A8I3,,,,0.017912,,,0.122690,2.798535e-03
b0007,6529,7959,-1,gene,,6529,0,P30143,,,,0.017912,,,0.140602,2.799348e-03
PM0-9956,8191,8237,1,promoter,[b0008],8191,0,,,,,0.000000,,,0.140602,2.800160e-03
b0008,8238,9191,1,gene,,8238,0,P0A870,Transaldolase B OS=Escherichia coli (strain K1...,talB,Cytoplasm,0.620909,0.622423,0.610794,0.761511,5.339384e-03


In [253]:
import io
solutionstr = """
,end,start,status
0,b0971,b0957,Optimal
1,b4433,b1823,Optimal
2,b3461,b3460,Optimal
3,b0024,b0023,Optimal
4,b4698,b1480,Optimal
5,b4674,b1324,Optimal
6,PM0-9375,b0755,Optimal
7,b4677,b1857,Optimal
8,b4595,b1237,Optimal
9,b4471,b3098,Optimal
10,PM0-10230,b1739,Optimal
"""

solution = pd.read_csv('../out/local_result_essential.csv',index_col=0)
solution['start'] = solution['start'].str.split('_').str.get(-1)
solution['end'] = solution['end'].str.split('_').str.get(-1)
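# cumulative iBAQ fraction at the end and start of each proposed deletion
end_cum = abundance.loc[solution['end'], 'iBAQ_W3110'].reset_index()['iBAQ_W3110']
start_cum = abundance.loc[solution['start'], 'iBAQ_W3110'].reset_index()['iBAQ_W3110']
solution['CCO'] = (end_cum - start_cum).round(3)*100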
solution[['start','end','CCO']]

,start,end,CCO
0,b4493,b1636,2.6
1,PM0-7163,b0628,1.9
2,b0955,b0971,1.8
3,b1813,b1852,1.7
4,b3465,b3559,1.6
5,b3635,b3636,1.5
6,b1913,PM0-7141,1.4
7,b3998,b4005,1.3
8,b0433,b0452,1.2
9,PM00577,b1779,1.2


In [241]:
abundance.loc['b0957':'b0971','iBAQ_W3110']

b0957    0.017035
b3636    0.031715
b4000    0.044674
b0605    0.056993
b1779    0.069052
b2414    0.081034
b2779    0.092381
b1823    0.103069
b3315    0.113712
b3312    0.124162
b2609    0.134340
b3460    0.143978
b3495    0.153475
b0023    0.162821
b3829    0.171653
b3341    0.180351
b4200    0.188965
b1480    0.196625
b3313    0.204063
b3307    0.211304
b3301    0.218331
b1136    0.225290
b3305    0.232170
b3303    0.239038
b3314    0.245692
b3298    0.252334
b1324    0.258933
b2925    0.265402
b0755    0.271786
b3317    0.278131
           ...   
b0900    0.998260
b0904    0.998261
b0906    0.998262
b0909    0.998263
b0913    0.998264
b0915    0.998264
b0916    0.998265
b0919    0.998266
b0926    0.998267
b0933    0.998268
b0934    0.998268
b0935    0.998269
b0936    0.998270
b0937    0.998271
b0938    0.998272
b0939    0.998273
b0940    0.998273
b0941    0.998274
b0942    0.998275
b0943    0.998276
b0944    0.998277
b0946    0.998277
b0950    0.998278
b0953    0.998279
b0958    0

# iBAQ Excel

In [31]:
ibaq = pd.read_excel('CCO_iBAQ_MolPercentage.xlsx',
                       sheet_name='iBAQ_MolPerc', 
                       header=[0,1])\
            .xs('iBAQ',axis=1)
ibaq.columns

Index(['iBAQ DGF-298_22', 'iBAQ DGF-298_23', 'iBAQ DGF-298_24',
       'iBAQ MGF-01_10', 'iBAQ MGF-01_11', 'iBAQ MGF-01_12', 'iBAQ MGF-02_16',
       'iBAQ MGF-02_17', 'iBAQ MGF-02_18', 'iBAQ Step04_07', 'iBAQ Step04_08',
       'iBAQ Step04_09', 'iBAQ Step05_13', 'iBAQ Step05_14', 'iBAQ Step05_15',
       'iBAQ Step09_19', 'iBAQ Step09_20', 'iBAQ Step09_21', 'iBAQ Step10_04',
       'iBAQ Step10_05', 'iBAQ Step10_06', 'iBAQ W3110_01', 'iBAQ W3110_02',
       'iBAQ W3110_03'],
      dtype='object', name='Protein IDs')

In [33]:
melted_ibaq = ibaq.reset_index().melt(id_vars=['index'],value_name='iBAQ')
melted_ibaq['Replicate'] = melted_ibaq['Protein IDs'].str.split('_').str.get(-1)
melted_ibaq['Strain'] = melted_ibaq['Protein IDs'].str.split('_').str.get(0).str.split(' ').str.get(-1)
melted_ibaq

,index,Protein IDs,iBAQ,Replicate,Strain
0,W3110_lambdaRed.CDS.1007,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
1,W3110_lambdaRed.CDS.1014,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
2,W3110_lambdaRed.CDS.1034,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
3,W3110_lambdaRed.CDS.1035,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
4,W3110_lambdaRed.CDS.1036,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
5,W3110_lambdaRed.CDS.1038,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
6,W3110_lambdaRed.CDS.1041,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
7,W3110_lambdaRed.CDS.1042,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
8,W3110_lambdaRed.CDS.1043,iBAQ DGF-298_22,0.000000e+00,22,DGF-298
9,W3110_lambdaRed.CDS.1127,iBAQ DGF-298_22,0.000000e+00,22,DGF-298


In [28]:
melted_molpct.groupby(by=['index','Strain']).mean()

index,Strain,mol %
W3110_lambdaRed.CDS.1,DGF-298,0.179089
W3110_lambdaRed.CDS.1,MGF-01,0.093845
W3110_lambdaRed.CDS.1,MGF-02,0.109474
W3110_lambdaRed.CDS.1,Step04,0.101340
W3110_lambdaRed.CDS.1,Step05,0.186276
W3110_lambdaRed.CDS.1,Step09,0.171936
W3110_lambdaRed.CDS.1,Step10,0.194278
W3110_lambdaRed.CDS.1,W3110,0.091442
W3110_lambdaRed.CDS.100,DGF-298,0.005234
W3110_lambdaRed.CDS.100,MGF-01,0.003249
