# Checking the quantitative reliability of NOE intensities

## To do: test the following hypotheses
- H1: the most intense peak of each spin system is HA(i)
- H2: the second most intense peak is HA(i-1), and the third is HN(i-1)
- H3: the most intense inter-residue HN peak of each spin system is HN(i-1)
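All three hypotheses need to know where each peak comes from relative to its own spin system. Below is a minimal sketch of such a classifier. It assumes the label conventions visible in the tables of this notebook: spin systems are named like Y2, intra-residue peaks carry a bare atom name (HA, HB2, H), and inter-residue peaks are prefixed by the partner residue (S1HA, E10H). The helper `classify_peak` is hypothetical and is not part of the `functions` module.

```python
import re

# Assumed label conventions (inferred from the tables in this notebook):
#   spin system:          one-letter code + residue number, e.g. "Y2"
#   intra-residue peak:   bare atom name, e.g. "HA", "HB2", "H"
#   inter-residue peak:   partner residue + atom name, e.g. "S1HA", "E10H"
RES_RE = re.compile(r'^[A-Z](\d+)$')
NOE_RE = re.compile(r'^(?:[A-Z](\d+))?(H[A-Z0-9]*)$')


def classify_peak(res_label: str, noe_label: str) -> str:
    """Hypothetical helper: return 'HA(i)', 'HA(i-1)', 'HN(i-1)' or 'other'."""
    own_match = RES_RE.match(res_label)
    noe_match = NOE_RE.match(noe_label)
    if own_match is None or noe_match is None:
        return 'other'  # e.g. side-chain spin systems or unexpected labels
    own = int(own_match.group(1))
    # No residue prefix means the peak belongs to the spin system itself
    partner = int(noe_match.group(1)) if noe_match.group(1) else own
    atom = noe_match.group(2)
    offset = own - partner
    if atom.startswith('HA') and offset == 0:
        return 'HA(i)'
    if atom.startswith('HA') and offset == 1:
        return 'HA(i-1)'
    if atom == 'H' and offset == 1:
        return 'HN(i-1)'
    return 'other'


print(classify_peak('S1', 'HA'))     # HA(i)
print(classify_peak('Y2', 'S1HA'))   # HA(i-1)
print(classify_peak('R93', 'G92H'))  # HN(i-1)
print(classify_peak('D8', 'E10H'))   # other
```

Applied row-wise, for example with `df.apply(lambda r: classify_peak(r.res, r.noe), axis=1)`, it would attach one category to every peak, which makes H1 to H3 simple counts.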

In [1]:
import pandas as pd
import numpy as np

from functions import *  # local helper module (tidy_list, get_n_anomalies, compare_strongest_noes)

In [2]:
pdb_ids = ['2LEA', '2K52', '2LTM', '2KD0', '2LF2']
pdb_id = pdb_ids[0]  # this run analyses 2LEA

## Reading the individual 3D $^{15}N$-NOESY peak lists

In [3]:
path = f'~/Sparky/Lists/{pdb_id}.list'  # adjust to the location of your Sparky peak lists

# Reading the data (whitespace-separated columns)
df = pd.read_csv(path, header=0, index_col=None, sep=r'\s+')
df = tidy_list(df)

# Why are some NOEs negative? The phase is irrelevant for this analysis,
# so only the magnitude of the peak height is kept
df['height'] = np.abs(df.height)

df

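| | res | noe | N | Hn | H | height | noe_res | inter | resnum | noe_resnum |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S1 | H | 116.240 | 8.111 | 8.111 | 1571 | S1 | False | 1 | 1 |
| 1 | S1 | HA | 116.240 | 8.111 | 4.389 | 1756 | S1 | False | 1 | 1 |
| 2 | S1 | HB2 | 116.240 | 8.111 | 3.750 | 2457 | S1 | False | 1 | 1 |
| 3 | S1 | HB3 | 116.240 | 8.111 | 3.750 | 2457 | S1 | False | 1 | 1 |
| 4 | S1 | Y2H | 116.240 | 8.111 | 8.062 | 554 | Y2 | True | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1789 | S100 | H99HB3 | 123.154 | 8.119 | 3.245 | 1350 | S100 | False | 100 | 100 |
| 1790 | S100 | H | 123.154 | 8.119 | 8.119 | 69967 | S100 | False | 100 | 100 |
| 1791 | S100 | HA | 123.154 | 8.119 | 4.258 | 2956 | S100 | False | 100 | 100 |
| 1792 | S100 | HB2 | 123.154 | 8.119 | 3.849 | 3318 | S100 | False | 100 | 100 |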
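The reading cell relies on two small details: a regular-expression separator and the removal of the sign. A raw string (`r'\s+'`) avoids the invalid-escape warning that a plain `'\s+'` literal triggers in recent Python versions. The sketch below runs both steps on a toy, already-tidied list (two rows adapted from the 2LEA table above, with one height sign flipped to mimic a negative peak); the real Sparky list additionally goes through `tidy_list`, whose code is not shown here.

```python
import io

import numpy as np
import pandas as pd

# Toy, already-tidied peak list: rows adapted from the 2LEA table,
# with one sign flipped to mimic a negative peak
text = """res noe N Hn H height
S1 HA 116.240 8.111 4.389 1756
Y2 S1HA 121.776 8.062 4.389 -13351
"""

# r'\s+' splits on any run of whitespace; the raw string keeps the backslash
toy = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Keep only the magnitude of the peak height
toy['height'] = np.abs(toy['height'])
print(toy)
```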


Removing the side-chain NH spin systems (labels containing ND or NE)

In [4]:
df = df.loc[~(df.res.str.contains('ND') | df.res.str.contains('NE'))]

In [5]:
df.shape

(1715, 10)

## How many $H^{\alpha}_{i-k}$ peaks are stronger than $H^{\alpha}_{i}$?

In [6]:
df_intra_Ha = df[~df.inter & df.noe.str.contains('HA')]
df_inter_Ha = df[df.inter & df.noe.str.contains('HA')]

In [16]:
df_intra_Ha

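| | res | noe | N | Hn | H | height | noe_res | inter | resnum | noe_resnum |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S1 | HA | 116.240 | 8.111 | 4.389 | 1756 | S1 | False | 1 | 1 |
| 10 | Y2 | HA | 121.776 | 8.062 | 4.545 | 5887 | Y2 | False | 2 | 2 |
| 22 | G3 | HA2 | 110.102 | 8.221 | 3.860 | 11348 | G3 | False | 3 | 3 |
| 23 | G3 | HA3 | 110.102 | 8.221 | 3.860 | 11348 | G3 | False | 3 | 3 |
| 29 | R4 | HA | 121.307 | 7.967 | 4.582 | 8560 | R4 | False | 4 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1757 | G92 | HA2 | 110.707 | 7.969 | 3.642 | 848 | G92 | False | 92 | 92 |
| 1758 | G92 | HA3 | 110.707 | 7.969 | 3.782 | 457 | G92 | False | 92 | 92 |
| 1764 | R93 | HA | 121.504 | 7.930 | 4.465 | 7823 | R93 | False | 93 | 93 |
| 1776 | S97 | HA | 115.594 | 8.128 | 4.348 | 2611 | S97 | False | 97 | 97 |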


In [7]:
n_anomalies = get_n_anomalies(df_strong=df_intra_Ha,
                              df_weak=df_inter_Ha)

45


For 2LEA, there are 45 cases in which an inter-residue $H^{\alpha}$ NOE is stronger than the intra-residue $H^{\alpha}$ peak of the same spin system.
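The code of `get_n_anomalies` is not shown in this notebook. A plausible reading of the count, consistent with the comparison table that follows, is: for every intra-residue Hα peak, take the strongest inter-residue Hα peak of the same spin system and count how often it is the larger one. The function `count_anomalies` below is a hypothetical sketch of that logic, not the module's actual implementation; the toy heights are adapted from the 2LEA tables, except the weak 500 peak, which is invented.

```python
import pandas as pd


def count_anomalies(df_intra: pd.DataFrame, df_inter: pd.DataFrame) -> int:
    """Hypothetical re-implementation: count intra-residue HA peaks that are
    weaker than the strongest inter-residue HA peak of the same spin system."""
    # Strongest inter-residue HA peak per spin system (keyed by resnum)
    strongest_inter = df_inter.groupby('resnum')['height'].max()
    # Align it to each intra-residue HA peak; spin systems without an
    # inter-residue HA peak get 0, so they never count as anomalies
    inter_heights = df_intra['resnum'].map(strongest_inter).fillna(0)
    return int((inter_heights > df_intra['height']).sum())


# Toy example: only residue 2 has a stronger inter-residue HA peak
intra = pd.DataFrame({'resnum': [1, 2, 3], 'height': [1756, 5887, 11348]})
inter = pd.DataFrame({'resnum': [2, 3, 3], 'height': [13351, 7299, 500]})
print(count_anomalies(intra, inter))  # 1
```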

In [8]:
compare_strongest_noes(df_intra_Ha, df_inter_Ha)

| resnum | height_intra | height_inter | noe_resnum |
|---|---|---|---|
| 1 | 1756 | 0 | 0 |
| 2 | 5887 | 13351 | 1 |
| 3 | 11348 | 7299 | 2 |
| 4 | 8560 | 28898 | 3 |
| 8 | 14410 | 32052 | 7 |
| ... | ... | ... | ... |
| 92 | 848 | 631 | 91 |
| 92 | 457 | 631 | 91 |
| 93 | 7823 | 13397 | 92 |
| 97 | 2611 | 0 | 0 |
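`compare_strongest_noes` also lives in the local `functions` module. The table it prints has one row per intra-residue Hα peak, the height of the strongest inter-residue Hα peak in the same spin system, that peak's residue number, and zeros where no inter-residue peak exists. A similar table can be built with a groupby plus a left merge; `compare_table` is a hypothetical sketch of that idea, and its toy inputs are adapted from the rows above (the 900 peak is invented).

```python
import pandas as pd


def compare_table(df_intra: pd.DataFrame, df_inter: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: each intra-residue HA peak next to the strongest
    inter-residue HA peak of its spin system (0 where there is none)."""
    # Row label of the strongest inter-residue HA peak in each spin system
    idx = df_inter.groupby('resnum')['height'].idxmax().to_numpy()
    strongest = (df_inter.loc[idx, ['resnum', 'height', 'noe_resnum']]
                 .rename(columns={'height': 'height_inter'}))
    # A left join keeps every intra-residue peak, even without a partner
    out = (df_intra[['resnum', 'height']]
           .rename(columns={'height': 'height_intra'})
           .merge(strongest, on='resnum', how='left')
           .fillna(0))
    cols = ['height_inter', 'noe_resnum']
    out[cols] = out[cols].astype(int)  # NaN forced floats; restore integers
    return out.set_index('resnum')


intra = pd.DataFrame({'resnum': [1, 2, 92, 92],
                      'height': [1756, 5887, 848, 457]})
inter = pd.DataFrame({'resnum': [2, 2, 92],
                      'height': [13351, 900, 631],
                      'noe_resnum': [1, 3, 91]})
print(compare_table(intra, inter))
```

Counting the rows where `height_inter > height_intra` in such a table gives the number of anomalies.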


## Is the most intense peak of a spin system $H^{\alpha}_{i}$, $H^{\alpha}_{i-1}$, or $H^{N}_{i-1}$?

In [9]:
df.set_index('res')

| res | noe | N | Hn | H | height | noe_res | inter | resnum | noe_resnum |
|---|---|---|---|---|---|---|---|---|---|
| S1 | H | 116.240 | 8.111 | 8.111 | 1571 | S1 | False | 1 | 1 |
| S1 | HA | 116.240 | 8.111 | 4.389 | 1756 | S1 | False | 1 | 1 |
| S1 | HB2 | 116.240 | 8.111 | 3.750 | 2457 | S1 | False | 1 | 1 |
| S1 | HB3 | 116.240 | 8.111 | 3.750 | 2457 | S1 | False | 1 | 1 |
| S1 | Y2H | 116.240 | 8.111 | 8.062 | 554 | Y2 | True | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| S100 | H99HB3 | 123.154 | 8.119 | 3.245 | 1350 | S100 | False | 100 | 100 |
| S100 | H | 123.154 | 8.119 | 8.119 | 69967 | S100 | False | 100 | 100 |
| S100 | HA | 123.154 | 8.119 | 4.258 | 2956 | S100 | False | 100 | 100 |
| S100 | HB2 | 123.154 | 8.119 | 3.849 | 3318 | S100 | False | 100 | 100 |


Removing the diagonal peaks (Hn-N-Hn), i.e. peaks whose two proton shifts are identical

In [10]:
df = df.query('Hn != H')

In [11]:
df.shape

(1624, 10)
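The shift-based filter `Hn != H` works because a diagonal peak repeats the amide shift in the indirect proton dimension, but it would also drop a genuine cross peak whose partner proton happens to have exactly the same shift. The sketch below contrasts it with a label-based alternative on a toy list; it assumes that every diagonal peak is labelled `H` within its own spin system, as in the tables above, and the degenerate E10H row is invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'noe':   ['H',   'HA',  'Y2H', 'E10H'],
    'inter': [False, False, True,  True],
    'Hn':    [8.111, 8.111, 8.111, 8.111],
    'H':     [8.111, 4.389, 8.062, 8.111],  # E10H: made-up degenerate shift
})

# Shift-based criterion used in the notebook: also drops the E10H cross peak
by_shift = toy.query('Hn != H')

# Label-based alternative (assumes the diagonal is always labelled 'H'
# inside its own spin system)
is_diagonal = toy['noe'].eq('H') & ~toy['inter']
by_label = toy.loc[~is_diagonal]

print(by_shift['noe'].tolist())  # ['HA', 'Y2H']
print(by_label['noe'].tolist())  # ['HA', 'Y2H', 'E10H']
```

Comparing the row counts of both filters on the real list would show whether any degenerate cross peaks were lost.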

In [14]:
# Row label of the most intense peak in each spin system
idx_strongest_in_spinsys = df.groupby('res')['height'].idxmax().to_list()

df_sss = df.loc[idx_strongest_in_spinsys].sort_values('resnum')
df_sss

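| | res | noe | N | Hn | H | height | noe_res | inter | resnum | noe_resnum |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | S1 | HB2 | 116.240 | 8.111 | 3.750 | 2457 | S1 | False | 1 | 1 |
| 6 | Y2 | S1HA | 121.776 | 8.062 | 4.389 | 13351 | S1 | True | 2 | 1 |
| 22 | G3 | HA2 | 110.102 | 8.221 | 3.860 | 11348 | G3 | False | 3 | 3 |
| 26 | R4 | G3HA2 | 121.307 | 7.967 | 3.860 | 28898 | G3 | True | 4 | 3 |
| 46 | D8 | E10H | 119.234 | 8.260 | 8.304 | 32275 | E10 | True | 8 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1760 | R93 | G92H | 121.504 | 7.930 | 7.969 | 27619 | G92 | True | 93 | 92 |
| 1769 | D96 | HB2 | 120.004 | 8.289 | 2.627 | 19180 | D96 | False | 96 | 96 |
| 1776 | S97 | HA | 115.594 | 8.128 | 4.348 | 2611 | S97 | False | 97 | 97 |
| 1784 | H99 | HB2 | 120.371 | 7.251 | 3.168 | 227 | H99 | False | 99 | 99 |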


How many of the strongest peaks per spin system come from residue $i-1$?

In [15]:
# True when the partner of the strongest peak is residue i-1
df_sss['im1'] = (df_sss.resnum - df_sss.noe_resnum) == 1

print("Residues whose strongest peak is inter-residue:", df_sss['inter'].sum())
print()
print("Residues whose strongest peak comes from residue i-1:", df_sss['im1'].sum())

Residues whose strongest peak is inter-residue: 57

Residues whose strongest peak comes from residue i-1: 43
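Hypothesis H2 from the to-do list also needs the second and third strongest peaks of each spin system, not just the maximum. A minimal sketch, assuming the column names used in this notebook: sort by height within each spin system, number the peaks with `cumcount`, and keep the top `n`. The helper `top_n_per_spin_system` is hypothetical; in the toy data the HA and S1HA heights come from the 2LEA tables, while the S1H and HB2 heights are invented.

```python
import pandas as pd


def top_n_per_spin_system(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: the n most intense peaks of every spin system,
    with a 'rank' column (1 = strongest)."""
    ranked = df.sort_values(['resnum', 'height'], ascending=[True, False])
    ranked['rank'] = ranked.groupby('resnum').cumcount() + 1
    return ranked[ranked['rank'] <= n]


toy = pd.DataFrame({
    'res': ['Y2'] * 4,
    'noe': ['HA', 'S1HA', 'S1H', 'HB2'],
    'resnum': [2] * 4,
    'noe_resnum': [2, 1, 1, 2],
    'height': [5887, 13351, 3000, 2500],
})
top = top_n_per_spin_system(toy)

# One row per spin system: which peak label holds ranks 1, 2 and 3
wide = top.pivot(index='res', columns='rank', values='noe')
print(wide)
```

For H2 to hold, each row of `wide` would read HA, then an HA of residue i-1, then the HN of residue i-1; the toy Y2 row instead starts with S1HA.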
