# BINN - Biologically Informed Neural Network

This notebook demonstrates how a BINN can be created.

First, read some test data. This requires an input file and a pathway file, which correspond to the first (input) layer and the intermediary (hidden) layers of the model. We also include the option of a translation file, which maps the input to the intermediary layers.

In this example, the input layer consists of proteins identified by UniProt IDs, and the intermediary layers consist of biological pathways identified by Reactome IDs. The translation file maps the UniProt IDs to the Reactome IDs.

In [2]:
import pandas as pd

input_data = pd.read_csv("../data/test_data.tsv", sep="\t")
translation = pd.read_csv("../data/translation.tsv", sep="\t")
pathways = pd.read_csv("../data/pathways.tsv", sep="\t")


In [3]:
input_data.head()

Unnamed: 0,PeptideSequence,Charge,Decoy,Protein,CK_P1912_146,CK_P1912_147,CK_P1912_148,CK_P1912_150,CK_P1912_151,CK_P1912_152,...,TM_M2012_191,TM_M2012_192,TM_M2012_196,TM_M2012_197,TM_M2012_198,TM_M2012_199,TM_M2012_200,TM_M2012_202,TM_M2012_203,RetentionTime
0,VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR,3,False,P00746,7238870.0,,,,,,...,,,,,,,,,,3749.82
1,VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR,4,False,P00746,2681940.0,2634110.0,2297470.0,1935300.0,2181160.0,2615960.0,...,,519698.0,,,,,,2221730.0,,3593.61
2,VDTVDPPYPR,2,False,P04004,28535800.0,34874600.0,34586900.0,25820800.0,24657400.0,30830100.0,...,12486000.0,11995900.0,24003800.0,9802000.0,6933130.0,7297560.0,4328240.0,13002400.0,4716600.0,2502.15
3,AVTEQGAELSNEER,2,False,P27348,,,,,,,...,,,,340523.0,336960.0,435119.0,257422.0,,,1790.84
4,VDVIPVNLPGEHGQR,2,False,P02751,652100.0,,,,,,...,,,,,,,,,,3158.43


In [4]:
pathways.head()

Unnamed: 0,parent,child
0,R-HSA-109581,R-HSA-109606
1,R-HSA-109581,R-HSA-169911
2,R-HSA-109581,R-HSA-5357769
3,R-HSA-109581,R-HSA-75153
4,R-HSA-109582,R-HSA-140877


In [5]:
translation.head()

Unnamed: 0.1,Unnamed: 0,input,translation
0,1323,A0A075B6P5,R-HSA-166663
1,1324,A0A075B6P5,R-HSA-173623
2,1325,A0A075B6P5,R-HSA-198933
3,1326,A0A075B6P5,R-HSA-202733
4,1327,A0A075B6P5,R-HSA-2029481
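Before building the network, it can help to check how the three tables link together. The following is a minimal sketch, not part of the BINN package: it uses the column names shown above (`Protein`, `input`, `translation`, `parent`, `child`), but the toy frames and all IDs in them (`PROT_A`, `PATH_1`, and so on) are hypothetical placeholders rather than values from the test data.

```python
import pandas as pd

# Toy stand-ins for the three files; every ID here is a hypothetical placeholder.
toy_input = pd.DataFrame({"Protein": ["PROT_A", "PROT_A", "PROT_B", "PROT_C"]})
toy_translation = pd.DataFrame(
    {
        "input": ["PROT_A", "PROT_A", "PROT_B"],
        "translation": ["PATH_1", "PATH_2", "PATH_2"],
    }
)
toy_pathways = pd.DataFrame(
    {"parent": ["PATH_ROOT", "PATH_ROOT"], "child": ["PATH_1", "PATH_2"]}
)

# Several peptide rows can share one protein, so reduce to unique proteins first.
proteins = toy_input[["Protein"]].drop_duplicates()

# Left-join proteins onto the translation table; the indicator column
# marks proteins that have no entry in the translation table.
mapped = proteins.merge(
    toy_translation, left_on="Protein", right_on="input", how="left", indicator=True
)
unmapped = sorted(mapped.loc[mapped["_merge"] == "left_only", "Protein"])

# Pathways reached directly from proteins, and whether each one
# also appears somewhere in the parent/child hierarchy.
reached = set(mapped["translation"].dropna())
known = set(toy_pathways["parent"]) | set(toy_pathways["child"])
missing_from_hierarchy = sorted(reached - known)

print("Proteins without a translation:", unmapped)
print("Translated pathways absent from the hierarchy:", missing_from_hierarchy)
```

Replacing the toy frames with `input_data`, `translation` and `pathways` should give the same kind of summary for the real files, assuming they keep the column names shown in the outputs above.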


The first step is to create the network as described above.

In [8]:
from binn import Network

network = Network(
    input_data=input_data,
    pathways=pathways,
    mapping=translation,
    input_data_column="Protein",  # specify the column for entities in input data
    source_column="child",  # defined by our pathways-file
    target_column="parent",
)
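To build intuition for what `source_column="child"` and `target_column="parent"` describe, the sketch below treats each row of a pathway table as a directed edge from child to parent and groups pathways by their distance from the top of the hierarchy. It illustrates the idea only and is not how `Network` is implemented; the edge list and the `height_above` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical child -> parent edges, mirroring the "child" and "parent" columns.
edges = [
    ("PATH_1", "PATH_TOP"),
    ("PATH_2", "PATH_TOP"),
    ("PATH_3", "PATH_1"),
    ("PATH_4", "PATH_1"),
]

# Map each child to the list of its parents.
parents = defaultdict(list)
for child, parent in edges:
    parents[child].append(parent)


def height_above(node):
    """Number of edges on the longest path from node up to a top-level pathway."""
    node_parents = parents.get(node, [])
    if not node_parents:
        return 0
    return 1 + max(height_above(p) for p in node_parents)


# Group every pathway by its distance from the top of the hierarchy.
nodes = {node for edge in edges for node in edge}
levels = defaultdict(list)
for node in sorted(nodes):
    levels[height_above(node)].append(node)

for distance in sorted(levels):
    print(distance, levels[distance])
```

In this toy hierarchy, pathways further from `PATH_TOP` are more specific, which is one way to picture how a hierarchy can be sliced into successive layers.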

Next, we create the BINN model. The BINN is implemented in PyTorch Lightning and takes the network as an argument, along with other arguments such as the number of layers (`n_layers`) and the dropout rate (`dropout`).

In [10]:
from binn import BINN

binn = BINN(
    network=network,
    n_layers=4,
    dropout=0.2,
    validate=False,
)
binn.layers

Sequential(
  (Layer_0): Linear(in_features=449, out_features=443, bias=True)
  (BatchNorm_0): BatchNorm1d(443, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_0): Dropout(p=0.2, inplace=False)
  (Tanh 0): Tanh()
  (Layer_1): Linear(in_features=443, out_features=285, bias=True)
  (BatchNorm_1): BatchNorm1d(285, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_1): Dropout(p=0.2, inplace=False)
  (Tanh 1): Tanh()
  (Layer_2): Linear(in_features=285, out_features=116, bias=True)
  (BatchNorm_2): BatchNorm1d(116, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_2): Dropout(p=0.2, inplace=False)
  (Tanh 2): Tanh()
  (Layer_3): Linear(in_features=116, out_features=28, bias=True)
  (BatchNorm_3): BatchNorm1d(28, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_3): Dropout(p=0.2, inplace=False)
  (Tanh 3): Tanh()
  (Output layer): Linear(in_features=28, out_features=2, bias=True)
)
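The printed structure can be read as a chain of layer widths: 449 inputs, four hidden layers of 443, 285, 116 and 28 nodes, and 2 outputs. As a rough worked example, the sketch below counts the parameters these shapes imply. It assumes every weight in each printed `Linear` layer is trainable; the printout alone does not show whether some connections are masked, so treat the result as an upper bound.

```python
# Layer widths read off the printed Sequential: input, four hidden layers, output.
widths = [449, 443, 285, 116, 28, 2]

# A Linear layer with bias has in_features * out_features weights
# plus out_features biases.
linear_params = [n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:])]

# Each BatchNorm1d with affine=True has one scale and one shift per feature;
# only the four hidden layers are followed by batch normalisation.
batchnorm_params = [2 * n for n in widths[1:-1]]

total = sum(linear_params) + sum(batchnorm_params)

print("Linear parameters per layer:", linear_params)
print("BatchNorm parameters per layer:", batchnorm_params)
print("Total:", total)
```

Most of the parameters sit in the first two `Linear` layers, because the layers narrow sharply towards the output.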

Looking at the layer names, we see that they correspond to the input and intermediary layers in the model. The first entry, `layers[0]`, lists the UniProt IDs of the input proteins.

In [12]:
layers = binn.layer_names
layers[0]

['A0M8Q6',
 'O00194',
 'O00391',
 'O14786',
 'O14791',
 'O15145',
 'O43707',
 'O75369',
 'O75594',
 'O75636',
 'O75874',
 'O95399',
 'O95445',
 'O95497',
 'O95633',
 'O95678',
 'O96013',
 'P00325',
 'P00338',
 'P00352',
 'P00367',
 'P00450',
 'P00480',
 'P00488',
 'P00558',
 'P00734',
 'P00736',
 'P00738',
 'P00739',
 'P00740',
 'P00742',
 'P00746',
 'P00747',
 'P00748',
 'P00751',
 'P00915',
 'P00918',
 'P00966',
 'P01008',
 'P01009',
 'P01011',
 'P01019',
 'P01023',
 'P01024',
 'P01031',
 'P01033',
 'P01034',
 'P01042',
 'P01344',
 'P01591',
 'P01593',
 'P01601',
 'P01602',
 'P01611',
 'P01614',
 'P01619',
 'P01700',
 'P01714',
 'P01717',
 'P01742',
 'P01743',
 'P01763',
 'P01764',
 'P01766',
 'P01767',
 'P01780',
 'P01833',
 'P01834',
 'P01857',
 'P01859',
 'P01860',
 'P01861',
 'P01871',
 'P01876',
 'P01877',
 'P01880',
 'P01891',
 'P02144',
 'P02452',
 'P02533',
 'P02538',
 'P02647',
 'P02649',
 'P02652',
 'P02654',
 'P02655',
 'P02656',
 'P02671',
 'P02675',
 'P02679',
 'P02730',