# Scikit-learn API

For users who are not familiar with PyTorch, we provide a scikit-learn-style wrapper that exposes the familiar `fit` and `predict` methods.

In [1]:
from binn import BINNClassifier, Network, SuperLogger
import pandas as pd




As before, we load the data and create a network. This time, however, we pass the network to a `BINNClassifier` object, which is the scikit-learn wrapper class.

In [13]:
pathways = pd.read_csv("../data/pathways.tsv", sep="\t")
translation = pd.read_csv("../data/translation.tsv", sep="\t")
input_data = pd.read_csv("../data/test_qm.csv")
design_matrix = pd.read_csv("../data/design_matrix.tsv", sep="\t")

network = Network(
    input_data=input_data,
    pathways=pathways,
    mapping=translation,
)

binn = BINNClassifier(
    pathways=network,
    n_layers=4,
    dropout=0.2,
    epochs=3,
    threads=10,
    logger=SuperLogger("logs/test")
)
binn.clf.features

Index(['A0M8Q6', 'O00194', 'O00391', 'O14786', 'O14791', 'O15145', 'O43707',
       'O75369', 'O75594', 'O75636',
       ...
       'Q9UBE0', 'Q9UBQ7', 'Q9UBR2', 'Q9UBX5', 'Q9UGM3', 'Q9UK55', 'Q9UNW1',
       'Q9Y490', 'Q9Y4L1', 'Q9Y6Z7'],
      dtype='object', length=449)

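The data matrix must match the input layer of the BINN, so we first align it with the network's features (`binn.clf.features`). We then hold out the first 10 samples as a test set and fit the BINN on the remaining samples.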

In [11]:
from util_for_examples import generate_data, fit_data_matrix_to_network_input

X = fit_data_matrix_to_network_input(input_data, features=binn.clf.features)

X, y = generate_data(X, design_matrix)

X_test = X[:10]
X_train = X[10:]
y_test = y[:10]
y_train = y[10:]

binn.fit(X_train, y_train)

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.

  | Name   | Type             | Params
--------------------------------------------
0 | layers | Sequential       | 364 K 
1 | loss   | CrossEntropyLoss | 0     
--------------------------------------------
364 K     Trainable params
0         Non-trainable params
364 K     Total params
1.457     Total estimated model params size (MB)


Epoch 0:   0%|          | 0/24 [01:17<?, ?it/s]


Experiment logs directory logs/test/lightning_logs/version_1 exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
The number of training batches (24) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.



`Trainer.fit` stopped: `max_epochs=3` reached.


Epoch 2: 100%|██████████| 24/24 [00:03<00:00,  6.25it/s, loss=0.909, v_num=1, train_loss=0.901, train_acc=0.545]
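The split in the cell above takes the first 10 rows as the test set without shuffling. If the rows happen to be ordered by group, such a test set could contain a single class. The following is a minimal sketch of a shuffled split on row indices, using only the standard library. The helper name `shuffled_split` and the sample count of 100 are hypothetical and not part of the BINN package.

```python
import random


def shuffled_split(n_samples, n_test, seed=0):
    """Return (train_indices, test_indices) drawn from a shuffled 0..n_samples-1.

    Hypothetical helper for illustration; not part of the BINN package.
    """
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Assumed sample count, for illustration only.
train_idx, test_idx = shuffled_split(n_samples=100, n_test=10)
print(len(train_idx), len(test_idx))
```

Assuming `X` and `y` support indexing with a list of integers (as NumPy arrays and PyTorch tensors do), the split would then be `X_train, X_test = X[train_idx], X[test_idx]`, and likewise for `y`.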


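We can now predict on the held-out instances. The result has one row per sample and one column per class. Judging by the values, these are raw scores rather than class labels.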

In [12]:
binn.predict(X_test)

tensor([[ 0.9221,  0.6372],
        [-0.1756, -0.8618],
        [-0.7503, -0.4109],
        [ 0.9660, -0.8077],
        [ 2.0756, -1.2263],
        [ 1.2871, -0.4783],
        [-0.4326, -0.5821],
        [ 1.5961,  0.7129],
        [-2.0181,  2.4639],
        [-1.5397,  1.0532]])
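The tensor above holds two unnormalized scores per sample. Assuming these are logits (the training log lists `CrossEntropyLoss`, which operates on logits), a softmax turns each row into class probabilities, and the index of the largest score gives the predicted class. The sketch below uses plain Python on three rows copied from the output so that it runs without extra dependencies.

```python
import math


def softmax(row):
    """Convert a list of raw scores into probabilities that sum to 1."""
    highest = max(row)
    # Subtracting the maximum avoids overflow in exp() for large scores.
    exps = [math.exp(value - highest) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


# First, second and ninth rows of the prediction output above.
scores = [
    [0.9221, 0.6372],
    [-0.1756, -0.8618],
    [-2.0181, 2.4639],
]

probabilities = [softmax(row) for row in scores]
# The predicted class is the column index with the highest score.
labels = [row.index(max(row)) for row in scores]
print(labels)  # [0, 0, 1]
```

On the full tensor, the same result can be obtained in PyTorch with `torch.softmax(output, dim=1)` and `output.argmax(dim=1)`.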