**The spin systems (aka frames) in this experiment are defined by w1 and w2, which correspond to a covalently bonded C-H pair.**

> Remove all protons except HN, HD, HE, HZ (all amide protons, backbone and side chain) from w3 before computing the per-frame intensity ranks and relative intensities, then plot the distribution of:
* Intensity rank of CA(i)-HA(i)-HN(i) wrt any CA(i)-HA(i)-HN([0-9]+)
* Intensity rank of CA(i)-HA(i)-HN(i-1) wrt any CA(i)-HA(i)-HN([0-9]+)
* Intensity rank of CA(i)-HA(i)-HN(i+1) wrt any CA(i)-HA(i)-HN([0-9]+)
* Relative intensity of CA(i)-HA(i)-HN(i) wrt any CA(i)-HA(i)-HN([0-9]+)
* Relative intensity of CA(i)-HA(i)-HN(i-1) wrt any CA(i)-HA(i)-HN([0-9]+)
* Relative intensity of CA(i)-HA(i)-HN(i+1) wrt any CA(i)-HA(i)-HN([0-9]+)
* Do all of the above for Gly only, which has characteristic CA-HA shifts and no side-chain protons to diffuse the magnetization.
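
The per-frame rank and relative intensity listed above can be sketched on a toy peak list. This is only an illustration under assumptions: the column names mirror the table printed later in this notebook, the `XXXX` entry and all heights are made up, and a frame is taken to be one (`pdb_id`, `res`) pair.

```python
import pandas as pd

# Toy peak list; values are invented for illustration only.
peaks = pd.DataFrame({
    "pdb_id":        ["XXXX"] * 5,
    "res":           ["A5", "A5", "A5", "G6", "G6"],
    "atom_type_pos": ["H_i", "H_i+1", "H_i-1", "H_i", "H_i+1"],
    "height":        [1200.0, 4800.0, 300.0, 900.0, 450.0],
})

# Assumption: one frame = one residue of one protein.
by_frame = ["pdb_id", "res"]

# Rank 1 is the most intense peak of the frame.
peaks["rank"] = (
    peaks.groupby(by_frame)["height"]
    .rank(ascending=False, method="min")
    .astype(int)
)

# Height relative to the strongest peak of the same frame.
peaks["rel_height"] = peaks["height"] / peaks.groupby(by_frame)["height"].transform("max")

print(peaks)
```

The distributions can then be read off by filtering on `atom_type_pos` (for example `H_i`, `H_i-1`, `H_i+1`) and plotting the `rank` or `rel_height` column.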

> **For every amino acid type individually**, ~~remove all HN, HD, HE (all amide protons, bb and sc)~~, all HA and all aromatic protons from w3, and count how many times the most intense peak in the frame belongs to residue i and how many it doesn't.
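
A minimal sketch of the count described above, assuming the unwanted protons have already been filtered out and that `res_diff == 0` marks a peak belonging to residue i (as in the table shown later). The toy data and the `strongest` column are hypothetical.

```python
import pandas as pd

# Toy frames: two Leu and one Lys; heights are invented.
peaks = pd.DataFrame({
    "pdb_id":   ["XXXX"] * 6,
    "res":      ["L3", "L3", "L7", "L7", "K4", "K4"],
    "res_diff": [0, -1, 0, 1, 0, -1],
    "height":   [500.0, 900.0, 800.0, 200.0, 700.0, 100.0],
})

# Row label of the most intense peak in each frame.
top_idx = peaks.groupby(["pdb_id", "res"])["height"].idxmax()
top = peaks.loc[top_idx].copy()

# Amino acid type is the first letter of the residue label.
top["aa"] = top["res"].str[0]
# Does the strongest peak belong to residue i or to another residue?
top["strongest"] = (top["res_diff"] == 0).map({True: "residue_i", False: "other"})

# One row per amino acid type, one column per outcome.
counts = pd.crosstab(top["aa"], top["strongest"])
print(counts)
```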

In [1]:
import pandas as pd
import plotly.express as px
import plotly.io as pio

import warnings
warnings.simplefilter('ignore')

from functions import *

pio.templates.default = "plotly_dark"
# pio.renderers.default = 'svg'

In [2]:
aa_sidechain_protons = {
    'R': 'HH',
    'N': 'HD',
    'Q': 'HE',
    'H': 'HD',
    'K': 'HZ',
    'W': 'HE',
}

In [3]:
pdb_ids = ['2K52', '2KD0', '2LTM', '2LF2', '2LTM', '2LX7']
heteronucleus = '13CALI'

In [10]:
df = concat_protein_data(pdb_ids=pdb_ids, heteronucleus=heteronucleus)
df

Unnamed: 0,pdb_id,res,noe,X,Hn,H,height,noe_res,inter,resnum,noe_resnum,res_diff,atom_type,atom_type_pos
0,2K52,M1CA,HA,55.325,4.091,4.091,120663,M1CA,False,1,1,0,HA,HA_i
1,2K52,M1CA,HB2,55.325,4.091,2.189,12652,M1CA,False,1,1,0,HB,HB_i
2,2K52,M1CA,HB3,55.325,4.091,2.076,10144,M1CA,False,1,1,0,HB,HB_i
3,2K52,M1CA,HE,55.325,4.091,2.041,11026,M1CA,False,1,1,0,HE,HE_i
4,2K52,M1CA,HG2,55.325,4.091,2.524,9269,M1CA,False,1,1,0,HG,HG_i
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3349,2LX7,L60CG,HB2,27.900,1.510,1.560,57996,L60CG,False,60,60,0,HB,HB_i
3350,2LX7,L60CG,HB3,27.900,1.510,1.450,39606,L60CG,False,60,60,0,HB,HB_i
3351,2LX7,L60CG,HD1,27.900,1.510,0.760,2322,L60CG,False,60,60,0,HD,HD_i
3352,2LX7,L60CG,HD2,27.900,1.510,0.790,2330,L60CG,False,60,60,0,HD,HD_i


We are interested only in $H^A$-$C^A$ frames, so we drop the frames defined on side-chain carbons

In [11]:
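# Keep only CA frames; copy so the assignment below does not act on a view of the original frame
df = df[df['res'].str.endswith('CA')].copy()
df['res'] = df['res'].str.removesuffix('CA')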

In [12]:
df

Unnamed: 0,pdb_id,res,noe,X,Hn,H,height,noe_res,inter,resnum,noe_resnum,res_diff,atom_type,atom_type_pos
0,2K52,M1,HA,55.325,4.091,4.091,120663,M1CA,False,1,1,0,HA,HA_i
1,2K52,M1,HB2,55.325,4.091,2.189,12652,M1CA,False,1,1,0,HB,HB_i
2,2K52,M1,HB3,55.325,4.091,2.076,10144,M1CA,False,1,1,0,HB,HB_i
3,2K52,M1,HE,55.325,4.091,2.041,11026,M1CA,False,1,1,0,HE,HE_i
4,2K52,M1,HG2,55.325,4.091,2.524,9269,M1CA,False,1,1,0,HG,HG_i
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3278,2LX7,L60,HB2,56.700,4.210,1.560,2014,L60CA,False,60,60,0,HB,HB_i
3279,2LX7,L60,HB3,56.700,4.210,1.450,1173,L60CA,False,60,60,0,HB,HB_i
3280,2LX7,L60,HD1,56.700,4.210,0.760,1994,L60CA,False,60,60,0,HD,HD_i
3281,2LX7,L60,HD2,56.700,4.210,0.790,3072,L60CA,False,60,60,0,HD,HD_i


Simplifying the NOE contact categories: everything that's more than 1 residue away is now "far"

In [13]:
df.loc[df.res_diff.abs() > 1, "atom_type_pos"] = df.loc[df.res_diff.abs() > 1, "atom_type"] + "_far"
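
The effect of this relabelling can be checked on a toy frame. The labels used here for distant contacts (`HB_i-4`, `H_i+7`) are an assumed format; only the `_i`, `_i+1` and `_i-1` forms appear in the tables of this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "res_diff":      [0, -1, 1, 4, -7],
    "atom_type":     ["H", "H", "HA", "HB", "H"],
    # Labels for |res_diff| > 1 are hypothetical placeholders.
    "atom_type_pos": ["H_i", "H_i+1", "HA_i-1", "HB_i-4", "H_i+7"],
})

# Build the mask once and reuse it on both sides of the assignment.
far = toy["res_diff"].abs() > 1
toy.loc[far, "atom_type_pos"] = toy.loc[far, "atom_type"] + "_far"

print(toy["atom_type_pos"].tolist())
```

Contacts within one residue keep their positional label, while everything further away collapses into a single `<atom_type>_far` category.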

**Calculating the relative intensities for the frames**

Removing the non-amide protons

In [14]:
sc_amide_mask = df.apply(lambda row: is_sc_amide(row, aa_sidechain_protons), axis=1)
df_hn = df[sc_amide_mask | (df.atom_type == 'H')].copy()  # copy: a column is added to df_hn below
df_hn

Unnamed: 0,pdb_id,res,noe,X,Hn,H,height,noe_res,inter,resnum,noe_resnum,res_diff,atom_type,atom_type_pos
6,2K52,M1,D2H,55.325,4.091,8.938,28325,D2,True,1,2,-1,H,H_i+1
72,2K52,D2,H,53.964,4.796,8.938,-1876,D2CA,False,2,2,0,H,H_i
76,2K52,D2,V3H,53.964,4.796,8.139,1497,V3,True,2,3,-1,H,H_i+1
94,2K52,V3,H,62.100,3.957,8.139,1901,V3CA,False,3,3,0,H,H_i
99,2K52,V3,E4H,62.100,3.957,8.578,7316,E4,True,3,4,-1,H,H_i+1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3195,2LX7,L59,Q58H,56.700,4.300,8.770,369,Q58,True,59,58,1,H,H_i-1
3198,2LX7,L59,H,56.700,4.300,9.000,698,L59CA,False,59,59,0,H,H_i
3205,2LX7,L59,L60H,56.700,4.300,8.000,5718,L60,True,59,60,-1,H,H_i+1
3272,2LX7,L60,L59H,56.700,4.210,9.000,461,L59,True,60,59,1,H,H_i-1
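
`is_sc_amide` comes from the local `functions` module and its implementation is not shown here. A hypothetical stand-in, assuming it compares the NOE partner's residue type with the `aa_sidechain_protons` mapping, could look like this (the name `is_sc_amide_sketch` and the toy rows are made up):

```python
import pandas as pd

aa_sidechain_protons = {"R": "HH", "N": "HD", "Q": "HE", "H": "HD", "K": "HZ", "W": "HE"}

def is_sc_amide_sketch(row, sc_protons):
    """Hypothetical helper: True if the w3 proton matches the side-chain
    N-H proton type listed for the residue type of the NOE partner."""
    aa = row["noe_res"][0]  # first letter of e.g. 'N12' or 'L60CA'
    return sc_protons.get(aa) == row["atom_type"]

toy = pd.DataFrame({
    "noe_res":   ["N12", "L60CA", "K8", "D2"],
    "atom_type": ["HD", "HD", "HZ", "H"],
})

mask = toy.apply(lambda row: is_sc_amide_sketch(row, aa_sidechain_protons), axis=1)
print(mask.tolist())
```

Backbone `H` is deliberately not matched by this sketch, since the cell above adds it through the separate `df.atom_type == 'H'` condition. A test on `atom_type` alone cannot separate N-H from C-H protons that share a label, such as His HD1 and HD2.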


Calculating the intensities

In [15]:
# Group by protein as well as residue: labels such as 'M1' can recur across PDB entries
df_hn.insert(7, 'rel_height', df_hn['height'] / df_hn.groupby(['pdb_id', 'res'])['height'].transform('max'))

In [16]:
df_hn

Unnamed: 0,pdb_id,res,noe,X,Hn,H,height,rel_height,noe_res,inter,resnum,noe_resnum,res_diff,atom_type,atom_type_pos
6,2K52,M1,D2H,55.325,4.091,8.938,28325,1.000000,D2,True,1,2,-1,H,H_i+1
72,2K52,D2,H,53.964,4.796,8.938,-1876,-1.253173,D2CA,False,2,2,0,H,H_i
76,2K52,D2,V3H,53.964,4.796,8.139,1497,1.000000,V3,True,2,3,-1,H,H_i+1
94,2K52,V3,H,62.100,3.957,8.139,1901,0.259841,V3CA,False,3,3,0,H,H_i
99,2K52,V3,E4H,62.100,3.957,8.578,7316,1.000000,E4,True,3,4,-1,H,H_i+1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3195,2LX7,L59,Q58H,56.700,4.300,8.770,369,0.064533,Q58,True,59,58,1,H,H_i-1
3198,2LX7,L59,H,56.700,4.300,9.000,698,0.122071,L59CA,False,59,59,0,H,H_i
3205,2LX7,L59,L60H,56.700,4.300,8.000,5718,1.000000,L60,True,59,60,-1,H,H_i+1
3272,2LX7,L60,L59H,56.700,4.210,9.000,461,0.471370,L59,True,60,59,1,H,H_i-1
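
Negative peaks give relative heights outside the 0 to 1 range, as in the D2 frame above (-1.253173). If the sign of a peak is not meaningful for the analysis, which is an assumption and not something this notebook states, one option is to normalise by the largest absolute height instead. A sketch using the two D2 heights shown above:

```python
import pandas as pd

frame = pd.DataFrame({
    "pdb_id": ["2K52", "2K52"],
    "res":    ["D2", "D2"],
    "height": [-1876.0, 1497.0],
})
key = ["pdb_id", "res"]

# Signed normalisation, as in the cell above: the negative peak ends up below -1.
signed = frame["height"] / frame.groupby(key)["height"].transform("max")

# Alternative: compare magnitudes only.
frame["abs_height"] = frame["height"].abs()
by_abs = frame["abs_height"] / frame.groupby(key)["abs_height"].transform("max")

print(signed.round(6).tolist())
print(by_abs.round(6).tolist())
```

With the magnitude-based version the strongest peak of every frame has a relative height of exactly 1, whatever its sign.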


Leaving only Gly as residues $i$ (i.e. in w1 and w2 dimensions)

In [None]:
df_gly = df[df.res.str.match(r'^G')]

In [None]:
df_gly

## Calculating atom ranks

### NOEs → $H^N$

In [None]:
px.histogram(df, x='height')