Notebook for reading and manipulating data provided in the paper *PDZ Domain Selectivity is optimized across the mouse proteome* by Stiffler et al. The MDSM model is also implemented. 

In [1]:
import os
os.chdir(r'E:\Ecole\Year 3\Projet 3A')  # raw string so the backslashes are taken literally

In [2]:
import pandas as pd 
import numpy as np 

The data is provided as a multi-dimensional array. We write two classes. The first, called Domain, holds the data specific to a single domain, namely the values of the $\theta_{i,p,q}$ and the threshold values.

The second, called Data, holds all the data provided in the Excel file. It contains a list of Domain objects as well as the list of amino acids.

These two classes make the data easy to read and manipulate. To test the model proposed in the paper, we need to extract the $\theta_{p,q}$ matrix for each domain, and this is where the Domain class is useful.

In [3]:
class Domain:
    def __init__(self, name):
        self.name = name
        self.thresholds = None
        self.thetas = None

In [4]:
class Data:
    def __init__(self, filename):
        self.filename = filename
        temp_df = pd.read_excel(self.filename)
        self.aminoacids = [acid.encode('utf-8') for acid in list(temp_df.columns[:20])]
        self.df = temp_df.T
        self.domains = [Domain(domain.encode('utf-8')) for domain in list(self.df.columns)]
        self.names = [domain.name for domain in self.domains]
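    def create_domains(self):
        for domain in self.domains:
            # The first 100 entries of a column are the theta values:
            # 5 peptide positions x 20 amino acids.
            domain.thetas = self.df[domain.name][:100]
            domain.thetas = np.asarray(domain.thetas)
            domain.thetas = domain.thetas.reshape(5,20)
            # The remaining entries are the binding thresholds.
            domain.thresholds = np.asarray(self.df[domain.name][100:])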
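
The slicing done in create_domains can be illustrated without the Excel file. The following is a minimal, dependency-free sketch, assuming (as the code above does) that each domain column holds 100 theta values in row-major order followed by the thresholds. The helper name `split_column` and the toy values are hypothetical and are not part of the data set.

```python
def split_column(column, n_positions=5, n_acids=20):
    """Split one flat domain column into a theta matrix and thresholds.

    Assumes row-major layout: the first n_acids values belong to the
    first position, the next n_acids to the second, and so on.
    """
    n_theta = n_positions * n_acids
    flat = list(column[:n_theta])
    thetas = [flat[p * n_acids:(p + 1) * n_acids] for p in range(n_positions)]
    thresholds = list(column[n_theta:])
    return thetas, thresholds

# Toy column: 100 placeholder theta values followed by 3 placeholder thresholds.
toy_column = [float(k) for k in range(100)] + [1.0, 2.0, 3.0]
thetas, thresholds = split_column(toy_column)
```

With this layout, `thetas[p][q]` is the entry at flat offset `20 * p + q`, which matches what a row-major `reshape(5, 20)` produces.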

In [5]:
PDZ_Data = Data('Data_PDZ/MDSM_01_stiffler_bis.xls')
PDZ_Data.create_domains()

Let us now browse the data using these two classes, starting with the amino acids.

In [6]:
print PDZ_Data.aminoacids

['G', 'A', 'V', 'L', 'I', 'M', 'P', 'F', 'W', 'S', 'T', 'N', 'Q', 'Y', 'C', 'K', 'R', 'H', 'D', 'E']


Let us now list the PDZ domains present in the data.

In [7]:
print PDZ_Data.names

['Cipp (03/10)', 'Cipp (05/10)', 'Cipp (08/10)', 'Cipp (09/10)', 'Cipp (10/10)', 'D930005D10Rik (1/1)', 'Dlgh3 (1/1)', 'Dvl1 (1/1)', 'Dvl2 (1/1)', 'Dvl3 (1/1)', 'Erbin (1/1)', 'Gm1582 (2/3)', 'GRASP55 (1/1)', 'Grip1 (6/7)', 'Grip2 (5/7)', 'Harmonin (2/3)', 'HtrA1 (1/1)', 'HtrA3 (1/1)', 'Interleukin 16 (1/4)', 'LARG (1/1)', 'LIN-7A (1/1)', 'Lin7c (1/1)', 'Lnx1 (2/4)', 'Lrrc7 (1/1)', 'Magi-1 (2/6)', 'Magi-1 (4/6)', 'Magi-1 (6/6)', 'Magi-2 (5/6)', 'Magi-2 (6/6)', 'Magi-3 (2/5)', 'Magi-3 (5/5)', 'Mpp7 (1/1)', 'MUPP1 (01/13)', 'MUPP1 (05/13)', 'MUPP1 (10/13)', 'MUPP1 (11/13)', 'MUPP1 (12/13)', 'MUPP1 (13/13)', 'NHERF-1 (1/2)', 'NHERF-2 (2/2)', 'nNOS (1/1)', 'PAR-3 (3/3)', 'PAR3B (1/3)', 'PAR6B (1/1)', 'Pdlim5 (1/1)', 'Pdzk1 (1/4)', 'Pdzk1 (3/4)', 'Pdzk3 (1/1)', 'Pdzk3 (2/2)', 'Pdzk11 (1/1)', 'PDZ-RGS3 (1/1)', 'PSD95 (1/3)', 'PTP-BL (2/5)', 'SAP97 (1/3)', 'SAP97 (3/3)', 'SAP102 (3/3)', 'Scrb1 (1/4)', 'Scrb1 (2/4)', 'Scrb1 (3/4)', 'Semcap3 (1/2)', 'Shank1 (1/1)', 'Shank3 (1/1)', 'Shroom (1/1)

Let us now explore the data for a given PDZ domain, for example the one at index 20 (the 21st domain, since indexing starts at 0). Domains are accessed by their index, and the domain at index 20 is *LIN-7A (1/1)*.

In [8]:
PDZ_Data.domains[20]
print PDZ_Data.domains[20].name

LIN-7A (1/1)


In [9]:
test_domain = PDZ_Data.domains[20]
print test_domain.name

LIN-7A (1/1)


Each domain has two attributes: thetas and thresholds. The thetas are the $\theta_{i,p,q}$ mentioned in the paper, whereas the thresholds are the values used to determine whether the PDZ domain binds a given peptide.

In [10]:
print test_domain.thresholds

[ 7.1429  7.1429  7.3827]


The thetas form a $5 \times 20$ matrix: 5 positions at the C-terminus of the peptide by 20 amino acids. The data for each position can be accessed by index: position -4 in the peptide corresponds to index 0, position -3 to index 1, and so on.
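
The position-to-index convention just described can be written as a small helper. This is only a sketch: the name `position_to_index` is hypothetical, and it assumes the five positions run from -4 to 0, which follows from five consecutive positions starting at -4.

```python
N_POSITIONS = 5  # C-terminal positions -4, -3, -2, -1, 0

def position_to_index(position):
    """Map a peptide position (-4 .. 0) to a row index (0 .. 4) of thetas."""
    if not -(N_POSITIONS - 1) <= position <= 0:
        raise ValueError("position must lie between -4 and 0")
    return position + N_POSITIONS - 1

# Row indices for all five positions, from -4 up to 0.
rows = [position_to_index(p) for p in range(-4, 1)]
```

Under this convention, a call such as `test_domain.thetas[position_to_index(-4)]` would return the same row as `test_domain.thetas[0]`.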

In [11]:
print test_domain.thetas[0]

[ 0.       -0.019589  0.062421  0.68493   0.49776  -0.47603   0.2779
 -1.1723   -0.44707  -0.06063  -0.23115  -0.1164   -0.1734   -0.73216
 -0.45431   0.71784   0.95649   0.37813  -0.3057   -0.28117 ]


In [12]:
test_domain.thetas.shape

(5L, 20L)

Now we move on to the peptides. The experiment considered 217 peptides, each of which was tested and modelled against the 74 PDZ domains treated earlier. Once the peptide data is loaded, we simply attach it to the PDZ_Data object as additional attributes and we are done.

In [13]:
pep_seqs = []
pep_names = []
with open('Data_PDZ/peptides.free') as f:
    for line in f:
        x = line.split()
        pep_seqs.append(x[1])
        pep_names.append(x[0])
        

In [46]:
PDZ_Data.pep_seqs = pep_seqs
PDZ_Data.pep_names = pep_names
PDZ_Data.pep_names

['AN2',
 'APC',
 'Aquaporin4',
 'ASIC2',
 'Caspr2',
 'Cav2.2',
 'Cftr',
 'c-KIT',
 'Claudin1',
 'Cnksr2',
 'Connexin43',
 'CRIPT',
 'CtBP1',
 'Dlgap123',
 'EphA71',
 'EphB2',
 'EphrinB12',
 'ErbB4',
 'Frizzled',
 'GluR1',
 'GluR2_1',
 'GluR5_1',
 'GlycphrinC',
 'GRK6',
 'Htr2c',
 'JAM-1',
 'KIF17',
 'KIF1B',
 'Kir2.1',
 'Kv1.4',
 'Lgltminase',
 'Liprin2',
 'Megalin',
 'Mel1a/b',
 'mGluR3',
 'ctransprtr',
 'Nav1.4',
 'Nav1.5',
 'Neurxin1/2',
 'NMDAR2A',
 'NMDAR2B',
 'P2Y1',
 'Parkin',
 'PDGFR',
 'PFK-M',
 'PIX',
 'PKC',
 'PMCA1',
 'Ril',
 'Sapk3',
 'SSTR2',
 'Stargazin',
 'Syndecan1',
 'Syndecan2',
 'TAZ',
 'Trip6',
 'TRPC4',
 'AcvR1',
 'AcvR2',
 'AcvR2b',
 'Cacna1a',
 'Cav1.2',
 'Cav2.3',
 'Cav3.2',
 'ITPR3',
 'RYR2',
 'SERCA1',
 'SERCA2A',
 'SERCA3',
 'TPC1',
 'Claudin2',
 'Claudin3',
 'Claudin4',
 'Claudin5',
 'Claudin6',
 'Claudin7',
 'Claudin8',
 'Claudin9',
 'Claudin10',
 'Claudin11',
 'Claudin13',
 'Claudin14',
 'Claudin15',
 'Claudin16',
 'Claudin18',
 'Claudin19',
 'Claudin22',

Now that the peptides have been entered, we create one-hot representations of the last 5 positions of each peptide. The model can then be evaluated with a simple matrix product between the one-hot representation and the transpose of the $\theta$ matrix. The trace of the resulting matrix is the binding score $\phi_{i}$, which we compare with the threshold for the PDZ domain.
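
To see why the trace gives the score, note that the diagonal entry for position $p$ of the product is $\sum_q X_{p,q}\,\theta_{p,q}$, which for a one-hot row $X_p$ is simply the $\theta$ entry of the amino acid found at that position. The sketch below checks this on toy data; the three-letter alphabet, the `THETA` values and the function names are illustrative assumptions, not values from the paper.

```python
ALPHABET = ['G', 'A', 'V']          # toy alphabet (the real one has 20 letters)
THETA = [[0.5, -1.0, 2.0],          # toy theta: one row per position,
         [1.5,  0.0, -0.5]]         # one column per amino acid

def one_hot(peptide):
    """One row per residue, with a single 1 in the column of its amino acid."""
    return [[1 if acid == a else 0 for a in ALPHABET] for acid in peptide]

def score_by_trace(peptide):
    x = one_hot(peptide)
    # (X . THETA^T)[p][p2] = sum_q X[p][q] * THETA[p2][q]; the trace keeps p2 == p.
    return sum(sum(x[p][q] * THETA[p][q] for q in range(len(ALPHABET)))
               for p in range(len(peptide)))

def score_by_lookup(peptide):
    # Same quantity, reading the relevant theta entries directly.
    return sum(THETA[p][ALPHABET.index(acid)] for p, acid in enumerate(peptide))

trace_score = score_by_trace("VG")    # THETA[0][2] + THETA[1][0]
lookup_score = score_by_lookup("VG")
```

Both functions return the same number, so the one-hot product can be replaced by a direct lookup whenever speed matters.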

In [15]:
test_pep = list(PDZ_Data.pep_seqs[0])
rel_pep = test_pep[5:]
pep_matrix = np.zeros((5,20), dtype=int)
for i in range(len(rel_pep)):
    j = PDZ_Data.aminoacids.index(rel_pep[i])
    pep_matrix[i,j] = 1
score = np.trace(np.dot(pep_matrix, test_domain.thetas.T))

In [16]:
print score

-1.88001


In [17]:
for thres in test_domain.thresholds:
    print score > thres

False
False
False


We now compare this prediction with the data provided by Stiffler et al. The PDZ domain is *LIN-7A (1/1)*, and the first peptide in the list is, according to Table S2 of the Supplementary Material to the paper, derived from the protein *AN2*. The data confirm that *LIN-7A (1/1)* does not bind *AN2*.

To facilitate further data analysis, we create another object called Peptide which will store the name and sequence of each of the peptides in the data set. Once created, this peptide object will be added to the Data class. Thus all our data will stay in the Data class, while we can do other manipulations using the Domain or Peptide classes.

**UPDATE**:

Rather than creating a separate class for the peptides, we keep the current structure, in which the peptide sequences and names are stored in the PDZ_Data object. The Monte Carlo simulation does not need a peptide class, since all it requires is a random peptide sequence.

Furthermore, we make a slight change to the model evaluation function: the one-hot representation is too heavy for a Monte Carlo simulation, especially if we run many thousands of trials for each domain.

In [18]:
def evaluate_model_old(domain, peptide):
    test_pep = list(peptide)
    rel_pep = test_pep[5:]
    pep_matrix = np.zeros((5,20), dtype=int)
    for i in range(len(rel_pep)):
        j = PDZ_Data.aminoacids.index(rel_pep[i])
        pep_matrix[i,j] = 1
    score = np.trace(np.dot(pep_matrix, domain.thetas.T))
    print score
    for thres in domain.thresholds:
        print score > thres

In [19]:
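def evaluate_model(domain, peptide):
    # Score only the last five (C-terminal) residues; shorter peptides are used whole.
    rel_pep = list(peptide)[-5:]
    score = 0.0
    for i in range(len(rel_pep)):
        # Read the theta entry directly instead of building a one-hot matrix.
        j = PDZ_Data.aminoacids.index(rel_pep[i])
        score += domain.thetas[i,j]
    # A positive return value means the score exceeds the first threshold.
    return score - domain.thresholds[0]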
    

In [20]:
print PDZ_Data.domains[16].name
print PDZ_Data.pep_names[9]
evaluate_model_old(PDZ_Data.domains[16], PDZ_Data.pep_seqs[9])
evaluate_model(PDZ_Data.domains[16], PDZ_Data.pep_seqs[9])

HtrA1 (1/1)
Cnksr2
15.88935
True
True
True


10.72625

## Monte-Carlo Simulation

Before starting a full-blown Monte Carlo simulation, we draw sequences at random from the space of all possible sequences and check whether they bind to the domain.

In [21]:
print np.random.randint(0,20,5)

[ 2  4  4 19  8]


In [22]:
def convert2seq(seq_int):
    return [PDZ_Data.aminoacids[i] for i in seq_int]

In [23]:
def create_batch(nb_runs = 10000):
    return [np.random.randint(0,20,5) for i in range(nb_runs)]

In [24]:
ran_seq = create_batch()

In [25]:
def sigmoid(x, a=1):
    return 1.0/(1+np.exp(-1.0*a*x))
def run_sim(domain):
    scores = []
    sigs = [] 
    for i in range(len(ran_seq)):
        score = 0.0
        for j in range(5):
            score += domain.thetas[j, ran_seq[i][j]]
        scores.append(score-domain.thresholds[0])
        sigs.append(sigmoid(score-domain.thresholds[0]))
    return scores, sigs
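
The sigmoid above is adequate for the score ranges seen in this notebook. If much larger magnitudes were ever passed in (for example with a large slope `a`), the exponential could overflow. The following is a hedged, dependency-free sketch of a numerically stable variant; the name `stable_sigmoid` is hypothetical and the function is not used elsewhere in the notebook.

```python
import math

def stable_sigmoid(x, a=1.0):
    """Logistic function 1 / (1 + exp(-a*x)), evaluated without overflow."""
    z = a * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form exp(z) / (1 + exp(z)),
    # so the argument passed to exp() is never large and positive.
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

The identity `stable_sigmoid(-x) == 1 - stable_sigmoid(x)` holds up to rounding, which is the property relied on when the sign of the input is flipped.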
    

In [26]:
domain_for_mc = PDZ_Data.domains[3]
print domain_for_mc.name    

Cipp (09/10)


In [27]:
scores, sigs = run_sim(domain_for_mc)

In [28]:
non_zero_scores = []
for i in range(len(scores)):
    if scores[i] > 0.0:
        non_zero_scores.append(ran_seq[i])
print np.argmax(scores)
print ran_seq[np.argmax(scores)]
print scores[np.argmax(scores)]
convert2seq(ran_seq[np.argmax(scores)])

1344
[18  8  4 12  2]
11.09275


['D', 'W', 'I', 'Q', 'V']

In [29]:
len(non_zero_scores)

674
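
The count above can be turned into an estimate of the fraction of sequence space that the model predicts to bind this domain. The sketch below assumes the samples are independent and uniform, so that the count is binomial; the helper name `binder_fraction` is hypothetical, and the count 674 is specific to the random batch drawn in this run.

```python
import math

def binder_fraction(n_positive, n_total):
    """Return the observed fraction and its binomial standard error."""
    p = float(n_positive) / n_total
    stderr = math.sqrt(p * (1.0 - p) / n_total)
    return p, stderr

# Counts taken from the run above: 674 positive scores out of 10000 samples.
fraction, stderr = binder_fraction(674, 10000)

# Size of the full sequence space: 20 amino acids at each of 5 positions.
space_size = 20 ** 5
```

Here the fraction is about 6.7% with a standard error of roughly 0.25 percentage points, and the 10000 samples cover only a small part of the 3.2 million possible sequences.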

In [30]:
for i in range(len(PDZ_Data.pep_seqs)):
    x = evaluate_model(domain_for_mc, PDZ_Data.pep_seqs[i])
    if x > 0:
        print PDZ_Data.pep_names[i], x, list(PDZ_Data.pep_seqs[i])[5:]

CtBP1 5.33433 ['T', 'S', 'D', 'Q', 'L']
EphA71 9.1073 ['T', 'G', 'I', 'Q', 'V']
EphB2 10.4813 ['Q', 'S', 'V', 'E', 'V']
GluR2_1 3.00243 ['E', 'S', 'V', 'K', 'I']
Parkin 5.39562 ['H', 'W', 'F', 'D', 'V']
SSTR2 1.19969 ['I', 'I', 'A', 'W', 'V']
Claudin8 3.5778 ['K', 'S', 'Q', 'Y', 'V']
GluR3 3.00243 ['E', 'S', 'V', 'K', 'I']
EphA3 1.1592 ['G', 'P', 'V', 'P', 'V']
EphB3 9.723 ['L', 'P', 'V', 'Q', 'V']
EphB4 0.15674 ['P', 'A', 'Q', 'Q', 'F']
EphB6_1 7.4186 ['G', 'S', 'V', 'E', 'V']
FGFR2 2.75448 ['G', 'S', 'V', 'K', 'T']
ROR1 5.10049 ['I', 'S', 'A', 'E', 'V']


Let us now introduce the experimentally determined interaction matrix for the 74 PDZ domains and the 217 peptide sequences.

In [31]:
fp_interaction_matrix = pd.read_excel('Data_PDZ/fp_interaction_matrix.xlsx')
for column in fp_interaction_matrix.columns:
    fp_interaction_matrix.loc[fp_interaction_matrix[column] == 0.0, column] = -1.0

In [32]:
test = fp_interaction_matrix.T
new_columns = []
for name in list(test.columns):
    new_columns.append(name.encode('utf-8'))
test.columns = new_columns

PDZ_Data.int_matrix = test

In [33]:
PDZ_Data.int_matrix

Unnamed: 0,Cipp (03/10),Cipp (05/10),Cipp (08/10),Cipp (09/10),Cipp (10/10),D930005D10Rik (1/1),Dlgh3 (1/1),Dvl1 (1/1),Dvl2 (1/1),Dvl3 (1/1),...,b1-syntrophin (1/1),g2-syntrophin (1/1),SLIM (1/1),Tiam2 (1/1),TIP-1 (1/1),Whirlin (3/3),ZO-1 (1/3),ZO-1 (2/3),ZO-2 (1/3),ZO-3 (1/3)
AcvR1,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
AcvR2,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
AcvR2b,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
AN2,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
APC,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
Aquaporin 4,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
ASIC2,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
AXL,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
Cacna1a,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1
Caspr2,4162.31648,-1.00000,-1.00000,-1.00000,-1.00000,-1.00000,8186.37689,-1.00000,-1.00000,-1.00000,...,-1.00000,-1.00000,-1.00000,-1.00000,-1,-1.00000,-1.00000,-1,-1.00000,-1


We have attached the interaction matrix to the PDZ_Data object as an attribute. Now we calculate the scores and sigmoid values of each PDZ domain for each of the given peptides. We shall further use the interaction matrix to change the sign of the input to the sigmoid function.
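
Reading interaction values by position is only meaningful if the rows of the interaction matrix are in the same order as pep_names. The table above starts with AcvR1 whereas pep_names starts with AN2, and some labels differ in spacing (Aquaporin 4 versus Aquaporin4), so it may be worth checking the alignment. The sketch below looks values up by name instead; `normalise` and `align_by_name` are hypothetical helpers, the normalisation rule is an assumption, and the data are toy values.

```python
def normalise(name):
    """Hypothetical normalisation: drop spaces and ignore case."""
    return name.replace(' ', '').lower()

def align_by_name(pep_names, matrix_rows):
    """Return one interaction value per peptide, in pep_names order.

    matrix_rows maps a row label to its value for one domain. Peptides
    without a matching row get None instead of a silently wrong value.
    """
    lookup = dict((normalise(label), value)
                  for label, value in matrix_rows.items())
    return [lookup.get(normalise(name)) for name in pep_names]

# Toy data: the row order and spelling differ from the peptide list.
toy_rows = {'AcvR1': -1.0, 'AN2': -1.0, 'Aquaporin 4': 250.0}
toy_names = ['AN2', 'Aquaporin4', 'AcvR1', 'Caspr2']
aligned = align_by_name(toy_names, toy_rows)
```

Any `None` in the result flags a peptide whose name has no counterpart in the matrix and needs to be resolved by hand.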

In [34]:
test_interaction_matrix = PDZ_Data.int_matrix[PDZ_Data.domains[17].name]
for name in PDZ_Data.pep_names:
    print name

AN2
APC
Aquaporin4
ASIC2
Caspr2
Cav2.2
Cftr
c-KIT
Claudin1
Cnksr2
Connexin43
CRIPT
CtBP1
Dlgap123
EphA71
EphB2
EphrinB12
ErbB4
Frizzled
GluR1
GluR2_1
GluR5_1
GlycphrinC
GRK6
Htr2c
JAM-1
KIF17
KIF1B
Kir2.1
Kv1.4
Lgltminase
Liprin2
Megalin
Mel1a/b
mGluR3
ctransprtr
Nav1.4
Nav1.5
Neurxin1/2
NMDAR2A
NMDAR2B
P2Y1
Parkin
PDGFR
PFK-M
PIX
PKC
PMCA1
Ril
Sapk3
SSTR2
Stargazin
Syndecan1
Syndecan2
TAZ
Trip6
TRPC4
AcvR1
AcvR2
AcvR2b
Cacna1a
Cav1.2
Cav2.3
Cav3.2
ITPR3
RYR2
SERCA1
SERCA2A
SERCA3
TPC1
Claudin2
Claudin3
Claudin4
Claudin5
Claudin6
Claudin7
Claudin8
Claudin9
Claudin10
Claudin11
Claudin13
Claudin14
Claudin15
Claudin16
Claudin18
Claudin19
Claudin22
Claudin23
EphrinB3
GluR2_2
GluR2_3
GluR3
GluR5_2
GluRdelta1
GluRdelta2
KA-2
mGluR1
NMDAR1
NMDAR2C
NMDAR2D
NMDAR3B
Nuroligin1
Nuroligin2
Nuroligin3
Caspr4
Neurexin3
Neurexin4
CNGA2
CNGA3
KCNAB2
KCNE1
KCNE4_1
KCNE4_2
KCNH1
KCNK3
KCNK4_1
KCNK4_2
KCNK5
KCNK6
KCNK7
KCNQ2
KCNQ3
Kir2.2
Kir3.1
Kir3.2_1
Kir3.2_2
Kir3.2_3
Kir3.3
Kir3.4
Kir4.1
Kir4.2
Kir5.

In [35]:
for domain in PDZ_Data.domains:
    domain.int_matrix = PDZ_Data.int_matrix[domain.name]
    domain.scores = []
    domain.sigs = []
    domain.mut_scores = []
    domain.mut_sigs = []

In [36]:
def calc_scores_true(domain):
    for i in range(len(PDZ_Data.pep_seqs)):
        x = evaluate_model(domain, PDZ_Data.pep_seqs[i])  
        domain.scores.append(x)
        domain.sigs.append(sigmoid(x))
        ##if domain.int_matrix[i] > 0:
          ##domain.sigs.append(sigmoid(x))
        ##else:
            ##domain.sigs.append(sigmoid(x, -1.0))
for domain in PDZ_Data.domains:
    calc_scores_true(domain)

In [37]:
for domain in PDZ_Data.domains:
    scores, sigs = run_sim(domain)
    domain.mut_scores = scores
    domain.mut_sigs = sigs
    

So far, we have computed the binding scores for 10000 randomly generated peptide sequences (stored as mut_scores and mut_sigs). The ones that interest us are those for which the predicted score is positive.

Let us take one concrete example, using **HtrA3** as the domain.

In [38]:
domain_for_demo = PDZ_Data.domains[17]

In [39]:
print domain_for_demo.name

HtrA3 (1/1)


For this PDZ domain, we first look at the real peptides used in the experiments that both have a positive score and bind experimentally.

In [40]:
for i in range(len(PDZ_Data.pep_seqs)):
    if domain_for_demo.scores[i] > 0.0 and domain_for_demo.int_matrix[i] > 0.0:
        print i, PDZ_Data.pep_names[i], domain_for_demo.scores[i], domain_for_demo.int_matrix[i], list(PDZ_Data.pep_seqs[i])[5:]

8 Claudin1 48.3743 584.78748 ['G', 'K', 'D', 'Y', 'V']
22 GlycphrinC 24.18891 14676.921 ['K', 'E', 'Y', 'F', 'I']
24 Htr2c 2.7819 5886.19426 ['R', 'I', 'S', 'S', 'V']
25 JAM-1 3.718422 19150.72762 ['S', 'S', 'F', 'L', 'V']
33 Mel1a/b 19.2856 15260.25685 ['K', 'V', 'D', 'S', 'V']
43 PDGFR 9.36292 40164.1071 ['E', 'D', 'S', 'F', 'L']
190 Sema4b 0.21989 11185.79968 ['R', 'D', 'S', 'V', 'V']


In [41]:
max_ix = np.argmax(domain_for_demo.mut_scores)
print max_ix
print np.max(domain_for_demo.mut_scores)

9732
52.5624


In [42]:
print ran_seq[max_ix]

[15 15 12 13  0]


In [43]:
print convert2seq(ran_seq[max_ix])

['K', 'K', 'Q', 'Y', 'G']


In [44]:
for i in range(len(PDZ_Data.pep_seqs)):
    if (domain_for_demo.scores[i] > 0.0):
        print i, PDZ_Data.pep_names[i], domain_for_demo.scores[i], domain_for_demo.int_matrix[i], list(PDZ_Data.pep_seqs[i])[5:]

0 AN2 11.6355 -1.0 ['G', 'Q', 'Y', 'W', 'V']
4 Caspr2 10.569272 -1.0 ['K', 'E', 'W', 'L', 'I']
5 Cav2.2 2.27912 -1.0 ['Q', 'D', 'H', 'W', 'C']
8 Claudin1 48.3743 584.78748 ['G', 'K', 'D', 'Y', 'V']
22 GlycphrinC 24.18891 14676.921 ['K', 'E', 'Y', 'F', 'I']
24 Htr2c 2.7819 5886.19426 ['R', 'I', 'S', 'S', 'V']
25 JAM-1 3.718422 19150.72762 ['S', 'S', 'F', 'L', 'V']
33 Mel1a/b 19.2856 15260.25685 ['K', 'V', 'D', 'S', 'V']
36 Nav1.4 3.315262 -1.0 ['K', 'E', 'S', 'L', 'V']
38 Neurxin1/2 40.274 -1.0 ['K', 'E', 'Y', 'Y', 'V']
42 Parkin 0.61527 -1.0 ['H', 'W', 'F', 'D', 'V']
43 PDGFR 9.36292 40164.1071 ['E', 'D', 'S', 'F', 'L']
48 Ril 18.326162 -1.0 ['K', 'V', 'E', 'L', 'V']
50 SSTR2 21.7628 -1.0 ['I', 'I', 'A', 'W', 'V']
52 Syndecan1 33.2415 -1.0 ['E', 'E', 'F', 'Y', 'A']
53 Syndecan2 40.7557 -1.0 ['K', 'E', 'F', 'Y', 'A']
54 TAZ 5.0302 -1.0 ['F', 'L', 'T', 'W', 'L']
60 Cacna1a 19.94962 -1.0 ['E', 'D', 'D', 'W', 'C']
70 Claudin2 24.989964 -1.0 ['L', 'T', 'G', 'Y', 'V']
71 Claudin3 56.423 -1.0

In [45]:
domain_for_demo.int_matrix[71]

-1.0
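
The listing above mixes peptides that bind experimentally with peptides marked -1.0. One way to summarise the agreement between model and experiment is a confusion-matrix tally. This is only a sketch: it assumes, as the filters above do, that a positive score means a predicted binder and a positive interaction value means an observed binder. The function name `tally` is hypothetical and the numbers are toy values, not taken from the data set.

```python
def tally(scores, interactions):
    """Count agreements between predicted and observed binding."""
    counts = {'TP': 0, 'FP': 0, 'FN': 0, 'TN': 0}
    for score, observed in zip(scores, interactions):
        predicted_bind = score > 0.0
        observed_bind = observed > 0.0
        if predicted_bind and observed_bind:
            counts['TP'] += 1      # predicted and observed
        elif predicted_bind:
            counts['FP'] += 1      # predicted but not observed
        elif observed_bind:
            counts['FN'] += 1      # observed but not predicted
        else:
            counts['TN'] += 1      # neither
    return counts

# Toy values only.
toy_scores = [2.0, 1.0, -3.0, -0.5]
toy_interactions = [100.0, -1.0, 50.0, -1.0]
toy_counts = tally(toy_scores, toy_interactions)
```

Applied to a real domain, the same function could take domain.scores and the corresponding interaction values, provided the two lists refer to the peptides in the same order.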