# BINN - Biologically Informed Neural Network

This notebook demonstrates how a BINN can be created.

First, read some test data. This requires an input file and a pathway file, which correspond to the first (input) layer and the intermediary (hidden) layers of the model. An optional translation file maps the input to the intermediary layers.

In this example, the input layer consists of proteins with UniProt IDs and the intermediary layers consist of biological pathways with Reactome IDs. The translation file maps the UniProt IDs to the Reactome IDs.
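
To make the roles of the three inputs concrete, here is a minimal sketch in plain Python with made-up IDs. The tuple orders used here, (protein, pathway) for the mapping and (parent, child) for the pathway relations, are assumptions for illustration, and this is not how the `binn` package builds its network internally.

```python
# Toy stand-ins for the three inputs; all IDs below are made up.
input_data = ["P1", "P2", "P3"]

# Assumed tuple order: (protein ID, pathway ID).
mapping = [("P1", "R-A"), ("P2", "R-A"), ("P3", "R-B"), ("P9", "R-C")]

# Assumed tuple order: (parent pathway, child pathway).
pathways = [("R-ROOT", "R-A"), ("R-ROOT", "R-B"), ("R-ROOT", "R-C")]

# Keep only mapping rows whose protein is actually in the input layer.
inputs = set(input_data)
protein_edges = [(prot, path) for prot, path in mapping if prot in inputs]

# Pathways reached directly from the inputs form the first hidden layer.
first_hidden = sorted({path for _, path in protein_edges})

# Parents of those pathways form the next layer up.
parent_of = {}
for parent, child in pathways:
    parent_of.setdefault(child, set()).add(parent)
second_hidden = sorted({p for path in first_hidden for p in parent_of.get(path, ())})

print(first_hidden)
print(second_hidden)
```

In this toy case the protein `P9` is not in the input list, so its pathway `R-C` never enters the first hidden layer.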

In [1]:
import pandas as pd

input_data = pd.read_csv("../data/test_qm.csv")
mapping = pd.read_csv("../data/uniprot_2_reactome_2025_01_14.txt", sep="\t")
pathways = pd.read_csv("../data/reactome_pathways_relation_2025_01_14.txt", sep="\t")

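# Keep only the protein IDs from the input table.
input_data = input_data["Protein"].tolist()
# Convert the pathway and mapping tables to lists of plain row tuples.
pathways = list(pathways.itertuples(index=False, name=None))
mapping = list(mapping.itertuples(index=False, name=None))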


The first step is to create the network as described above.

In [3]:
from binn import PathwayNetwork
network = PathwayNetwork(
    input_data=input_data,
    pathways=pathways,
    mapping=mapping,
)

Next, we create the BINN (the model). The BINN is implemented in PyTorch Lightning and takes the network as an argument, along with a few others.

In [4]:
from binn import BINN

binn = BINN(
    network=network,
    n_layers=5,
    dropout=0.2,
    validate=False,
)
binn.layers


BINN is on the device: cpu


Sequential(
  (Layer_0): Linear(in_features=288, out_features=270, bias=True)
  (BatchNorm_0): BatchNorm1d(270, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_0): Dropout(p=0.2, inplace=False)
  (Tanh 0): Tanh()
  (Layer_1): Linear(in_features=270, out_features=265, bias=True)
  (BatchNorm_1): BatchNorm1d(265, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_1): Dropout(p=0.2, inplace=False)
  (Tanh 1): Tanh()
  (Layer_2): Linear(in_features=265, out_features=254, bias=True)
  (BatchNorm_2): BatchNorm1d(254, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_2): Dropout(p=0.2, inplace=False)
  (Tanh 2): Tanh()
  (Layer_3): Linear(in_features=254, out_features=211, bias=True)
  (BatchNorm_3): BatchNorm1d(211, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_3): Dropout(p=0.2, inplace=False)
  (Tanh 3): Tanh()
  (Layer_4): Linear(in_features=211, out_features=96, bias=True)
  (BatchNor

In [5]:
binn.trainable_params

4079
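
For a sense of scale, the sketch below counts the weights and biases that fully connected layers of the printed widths would have. It covers only the five `Linear` layers visible in the truncated output above and ignores the BatchNorm parameters. How `trainable_params` itself is computed is not shown in this notebook, so the comparison is only indicative.

```python
# Layer widths read off the printed Sequential above (Layer_0 .. Layer_4).
layer_sizes = [288, 270, 265, 254, 211, 96]


def dense_linear_params(sizes):
    """Weights plus biases for fully connected layers between consecutive sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


dense_total = dense_linear_params(layer_sizes)
reported = 4079  # value of binn.trainable_params shown above

print(dense_total)
print(f"{reported / dense_total:.2%}")
```

The reported count is a small fraction of the dense total, which is what one would expect if only the connections present in the pathway network are trained.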

The layer names correspond to the input and intermediary layers of the model. The first entry of the first layer, for example, is a UniProt ID.

In [6]:
layers = binn.layer_names
layers[0][0]

'A0M8Q6'