# Basic usage

*`skorch`* is designed to maximize interoperability between `sklearn` and `pytorch`. The aim is to keep 99% of the flexibility of `pytorch` while being able to leverage most features of `sklearn`. This notebook shows the basic usage of `skorch` and how it can be combined with `sklearn`.

### Table of contents

* [Definition of the pytorch module](#Definition-of-the-pytorch-module)
* [Training a classifier](#Training-a-classifier-and-making-predictions)
  * [Dataset](#A-toy-binary-classification-task)
  * [pytorch module](#Definition-of-the-pytorch-classification-module)
  * [Model training](#Defining-and-training-the-neural-net-classifier)
  * [Inference](#Making-predictions,-classification)
* [Training a regressor](#Training-a-regressor)
  * [Dataset](#A-toy-regression-task)
  * [pytorch module](#Definition-of-the-pytorch-regression-module)
  * [Model training](#Defining-and-training-the-neural-net-regressor)
  * [Inference](#Making-predictions,-regression)
* [Saving and loading a model](#Saving-and-loading-a-model)
  * [Whole model](#Saving-the-whole-model)
  * [Only parameters](#Saving-only-the-model-parameters)
* [Usage with an sklearn Pipeline](#Usage-with-an-sklearn-Pipeline)
* [Callbacks](#Callbacks)
* [Grid search](#Usage-with-sklearn-GridSearchCV)
  * [Special prefixes](#Special-prefixes)
  * [Performing a grid search](#Performing-a-grid-search)

In [1]:
import torch
from torch import nn
import torch.nn.functional as F

In [2]:
torch.manual_seed(0);

## Training a classifier and making predictions

### A toy binary classification task

We load a toy classification task from `sklearn`.

In [3]:
import numpy as np
from sklearn.datasets import make_classification

In [4]:
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)

In [5]:
X.shape, y.shape, y.mean()

((1000, 20), (1000,), 0.5)

### Definition of the `pytorch` classification `module`

We define a vanilla neural network with two hidden layers. The output layer should have 2 output units since there are two classes. In addition, it should have a softmax nonlinearity, because later, when calling `predict_proba`, the output from the `forward` call will be used.

In [6]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super(ClassifierModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        # softmax over the class dimension so that each row sums to 1
        X = F.softmax(self.output(X), dim=-1)
        return X
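
To make the role of the softmax concrete, here is a minimal sketch in plain Python (not the `pytorch` implementation) of what the nonlinearity does to the two raw outputs of the last layer. The `logits` values are hypothetical.

```python
import math

def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.4, -0.3]  # hypothetical raw outputs of the final linear layer
probs = softmax(logits)
print(probs, sum(probs))
```

Because every row is normalized this way, the output of `forward` can be read directly as class probabilities.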

### Defining and training the neural net classifier

We use `NeuralNetClassifier` because we're dealing with a classification task. The first argument should be the `pytorch module`. As additional arguments, we pass the number of epochs and the learning rate (`lr`), but those are optional.

*Note*: To use the cuda backend, pass `use_cuda=True` as an additional argument.

In [7]:
from skorch.net import NeuralNetClassifier

In [8]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    # use_cuda=True,  # uncomment this to train with CUDA
)

As in `sklearn`, we call `fit` passing the input data `X` and the targets `y`. By default, `NeuralNetClassifier` makes a `StratifiedKFold` split on the data (80/20) to track the validation loss. This is shown, as well as the train loss and the accuracy on the validation set.
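
The idea behind a stratified 80/20 split can be sketched in plain Python. This is only an illustration of the concept, not the code `skorch` or `sklearn` runs; the helper `stratified_holdout` and the labels `y_toy` are hypothetical.

```python
from collections import defaultdict

def stratified_holdout(y, n_splits=5):
    """Hold out 1/n_splits of each class for validation (illustrative only)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    valid_idx = []
    for indices in by_class.values():
        n_valid = len(indices) // n_splits  # same fraction from every class
        valid_idx.extend(indices[:n_valid])

    valid_set = set(valid_idx)
    train_idx = [i for i in range(len(y)) if i not in valid_set]
    return train_idx, sorted(valid_idx)

y_toy = [0, 1] * 500  # hypothetical balanced labels, like the toy task above
train_idx, valid_idx = stratified_holdout(y_toy)
valid_share_of_ones = sum(y_toy[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), valid_share_of_ones)
```

With 1000 samples this leaves 800 for training and 200 for validation, and the class balance in the validation part mirrors the full data set.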

In [9]:
net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.7111       0.5100        0.6894  0.1407
      2        0.6928       0.5500        0.6803  0.0775
      3        0.6833       0.5650        0.6741  0.0726
      4        0.6763       0.5850        0.6674  0.0670
      5        0.6727       0.6450        0.6616  0.0656
      6        0.6606       0.6600        0.6536  0.0969
      7        0.6560       0.6600        0.6443  0.0625
      8        0.6427       0.6650        0.6354  0.0646
      9        0.6300       0.6800        0.6264  0.0758
     10        0.6289       0.6800        0.6189  0.1337
     11        0.6241       0.7150

<skorch.net.NeuralNetClassifier at 0x7ff019ec0908>

Also, as in `sklearn`, you may call `predict` or `predict_proba` on the fitted model.

### Making predictions, classification

In [10]:
y_pred = net.predict(X[:5])
y_pred

array([1, 0, 0, 0, 0])

In [11]:
y_proba = net.predict_proba(X[:5])
y_proba

array([[ 0.33409804,  0.66590196],
       [ 0.65906334,  0.34093666],
       [ 0.70409262,  0.29590738],
       [ 0.70345545,  0.29654452],
       [ 0.65079051,  0.34920952]], dtype=float32)
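
The two outputs above are consistent with `predict` returning, for each row, the class with the highest probability. A small plain-Python check, under that assumption, using the probabilities printed above:

```python
# Probabilities copied from the predict_proba output shown above.
y_proba_shown = [
    [0.33409804, 0.66590196],
    [0.65906334, 0.34093666],
    [0.70409262, 0.29590738],
    [0.70345545, 0.29654452],
    [0.65079051, 0.34920952],
]

# For each row, pick the index (class) of the largest probability.
y_pred_from_proba = [
    max(range(len(row)), key=row.__getitem__) for row in y_proba_shown
]
print(y_pred_from_proba)
```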

## Training a regressor

### A toy regression task

In [12]:
from sklearn.datasets import make_regression

In [13]:
X_regr, y_regr = make_regression(1000, 20, n_informative=10, random_state=0)
X_regr = X_regr.astype(np.float32)
y_regr = y_regr.astype(np.float32) / 100
y_regr = y_regr.reshape(-1, 1)

In [14]:
X_regr.shape, y_regr.shape, y_regr.min(), y_regr.max()

((1000, 20), (1000, 1), -6.4901485, 6.1545048)

*Note*: Regression currently requires the target to be 2-dimensional, hence the need to reshape. This should be fixed with an upcoming version of pytorch.

### Definition of the `pytorch` regression `module`

Again, define a vanilla neural network with two hidden layers. The main difference is that the output layer only has one unit and does not apply a softmax nonlinearity.

In [15]:
class RegressorModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
    ):
        super(RegressorModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = F.relu(self.dense1(X))
        X = self.output(X)
        return X

### Defining and training the neural net regressor

Training a regressor is almost the same as training a classifier. Mainly, we use `NeuralNetRegressor` instead of `NeuralNetClassifier` (this is the same terminology as in `sklearn`).

In [16]:
from skorch.net import NeuralNetRegressor

In [17]:
net_regr = NeuralNetRegressor(
    RegressorModule,
    max_epochs=20,
    lr=0.1,
    # use_cuda=True,  # uncomment this to train with CUDA
)

In [18]:
net_regr.fit(X_regr, y_regr)

  epoch    train_loss    valid_loss     dur
-------  ------------  ------------  ------
      1        4.6084        3.8757  0.0557
      2        3.9371        2.2167  0.0525
      3        1.0512        0.2644  0.0349
      4        0.2770        0.3933  0.0363
      5        0.3052        0.1762  0.0328
      6        0.1650        0.1601  0.0324
      7        0.1115        0.0990  0.0327
      8        0.1067        0.1418  0.0387
      9        0.0907        0.0828  0.0373
     10        0.0792        0.0760  0.0381
     11        0.0470        0.0529  0.0389
     12        0.0500        0.0426  0.0387
     13        0.0266        0.0365  0.0362
     14        0.0346        0.0255  0.0398
     15        0.0170        0.0269  0.0374
     16        0.0253

<skorch.net.NeuralNetRegressor at 0x7ff018084518>

### Making predictions, regression

You may call `predict` or `predict_proba` on the fitted model. For regressions, both methods return the same value.

In [19]:
y_pred = net_regr.predict(X_regr[:5])
y_pred

array([[ 0.70368975],
       [-1.37799883],
       [-0.60438287],
       [ 0.0090515 ],
       [-0.52674961]], dtype=float32)

## Saving and loading a model

You can save and load either the whole model, using `pickle`, or just the learned model parameters, by calling `save_params` and `load_params`.

### Saving the whole model

In [20]:
import pickle

In [21]:
file_name = '/tmp/mymodel.pkl'

In [22]:
with open(file_name, 'wb') as f:
    pickle.dump(net, f)

In [23]:
with open(file_name, 'rb') as f:
    new_net = pickle.load(f)

### Saving only the model parameters

This saves and loads only the parameters of the `module`, meaning that hyperparameters such as `lr` and `max_epochs` are not saved. Therefore, to load the parameters, we have to re-initialize the model beforehand.

In [24]:
net.save_params(file_name)  # an open file object also works

In [25]:
# first initialize the model
new_net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
).initialize()

In [26]:
new_net.load_params(file_name)
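
The difference between the two approaches can be illustrated without `skorch` at all. The sketch below uses a plain dictionary as a hypothetical stand-in for a model (it is not how `skorch` stores anything) and a temporary directory instead of a fixed path.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: hyperparameters plus weights.
model = {"lr": 0.05, "max_epochs": 5, "weights": [1.5, -2.0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Whole model: hyperparameters and weights travel together.
    whole_path = os.path.join(tmp_dir, "model.pkl")
    with open(whole_path, "wb") as f:
        pickle.dump(model, f)
    with open(whole_path, "rb") as f:
        restored = pickle.load(f)

    # Parameters only: the file holds just the weights ...
    params_path = os.path.join(tmp_dir, "params.pkl")
    with open(params_path, "wb") as f:
        pickle.dump(model["weights"], f)

    # ... so a fresh model must be set up first, with hyperparameters
    # chosen again by hand, before the weights are loaded into it.
    fresh = {"lr": 0.1, "max_epochs": 20, "weights": None}
    with open(params_path, "rb") as f:
        fresh["weights"] = pickle.load(f)

print(restored == model, fresh)
```

In the second case the weights match the original, but the hyperparameters are whatever was passed when the new object was created.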

## Usage with an `sklearn Pipeline`

It is possible to put the `NeuralNetClassifier` inside an `sklearn Pipeline`, as you would with any `sklearn` classifier.

In [27]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [28]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])

In [29]:
pipe.fit(X, y)

Re-initializing module!
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6982       0.4950        0.6960  0.1439
      2        0.6943       0.5300        0.6943  0.1277
      3        0.6934       0.5100        0.6927  0.0856
      4        0.6924       0.5150        0.6912  0.0682
      5        0.6931       0.5100        0.6899  0.0555
      6        0.6897       0.5350        0.6891  0.0640
      7        0.6884       0.5450        0.6877  0.0596
      8        0.6821       0.5650        0.6856  0.0535
      9        0.6854       0.5800        0.6835  0.0585
     10        0.6801       0.6000        0.6813  0.0571
     11        0.6781       0.6150

Pipeline(memory=None,
     steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('net', <skorch.net.NeuralNetClassifier object at 0x7ff019ec0908>)])

In [30]:
y_proba = pipe.predict_proba(X[:5])
y_proba

array([[ 0.52606189,  0.47393814],
       [ 0.56090653,  0.43909347],
       [ 0.58122021,  0.41877982],
       [ 0.58913922,  0.41086078],
       [ 0.58016819,  0.41983184]], dtype=float32)

To save the whole pipeline, including the pytorch module, use `pickle`.

## Callbacks

Adding a new callback to the model is straightforward. Below we show how to add a new callback that computes the area under the ROC curve (AUC).

In [31]:
from skorch.callbacks import Scoring
from skorch.utils import to_numpy

There is a scoring callback in skorch, `Scoring`, which we use for this. We need to specify a `name` of the score, as well as which score to calculate. Here we just pass a string, `'roc_auc_score'`, as score. For a list of all existing scores, look [here](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics). We could also pass a function with the signature `func(model, X, y) -> score`, or `None`, in which case the `score` method of the model is used. Note that this is exactly the same behavior as in `sklearn`.

In [32]:
auc = Scoring(
    name='AUC',
    scoring='roc_auc_score',
    lower_is_better=False,
    pred_extractor=lambda y_proba: to_numpy(y_proba)[:, 1],
)

Furthermore, we should tell the callback that higher scores are better (to get the correct colors printed below), and how to extract the data from the prediction. The latter must be specified because sometimes we need the class predictions, sometimes the class probabilities, and sometimes, as in this example, only the probability of class `1`. Moreover, we must convert the data from `torch` tensors to `numpy` arrays (using the helper function `to_numpy`).
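
To see why only the second probability column is needed, here is a minimal plain-Python sketch of the AUC as the share of (positive, negative) pairs that are ranked correctly. It is an illustration of the metric, not the `sklearn` implementation; `pairwise_auc` and the toy data are hypothetical.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical class probabilities (columns: class 0, class 1) and labels.
y_proba_toy = [[0.3, 0.7], [0.6, 0.4], [0.65, 0.35], [0.9, 0.1]]
y_true_toy = [1, 0, 1, 0]

# Same idea as the pred_extractor: keep only the probability of class 1.
scores = [row[1] for row in y_proba_toy]
auc_toy = pairwise_auc(y_true_toy, scores)
print(auc_toy)
```

The metric only needs one score per sample, which is why the extractor selects column `1`.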

In [33]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    callbacks=[auc],
)

Finally, we pass the scoring callback to the `callbacks` parameter as a list and then call `fit`. Notice that we get the printed scores and color highlighting for free.

In [34]:
net.fit(X, y)

  epoch     AUC    train_loss    valid_acc    valid_loss     dur
-------  ------  ------------  -----------  ------------  ------
      1  0.5558        0.7150       0.4950        0.6935  0.0745
      2  0.6040        0.6811       0.5100        0.6756  0.0589
      3  0.6170        0.6694       0.5350        0.6661  0.0555
      4  0.6295        0.6505       0.5950        0.6557  0.0568
      5  0.6478        0.6447       0.6250        0.6457  0.0577
      6  0.6578        0.6389       0.6300        0.6359  0.0550
      7  0.6692        0.6282       0.6350        0.6277  0.0557
      8  0.6897        0.6159       0.6600        0.6172  0.0674
      9  0.7035        0.6085

<skorch.net.NeuralNetClassifier at 0x7ff01805e5c0>

For information on how to write custom callbacks, have a look at the *Advanced_Usage* notebook.

## Usage with sklearn `GridSearchCV`

### Special prefixes

The `NeuralNet` class allows you to directly access parameters of the `pytorch module` by using the `module__` prefix. So if, for example, you defined the `module` to have a `num_units` parameter, you can set it via the `module__num_units` argument. This is exactly the same logic that gives access to estimator parameters in `sklearn Pipeline`s and `FeatureUnion`s.

This feature is useful in several ways. For one, it allows you to set those parameters in the model definition. Furthermore, it allows you to set parameters in an `sklearn GridSearchCV` as shown below.

In addition to the parameters prefixed by `module__`, you may access a couple of other attributes, such as those of the optimizer by using the `optimizer__` prefix (again, see below). All those special prefixes are stored in the `prefixes_` attribute:

In [35]:
print(', '.join(net.prefixes_))

module, iterator_train, iterator_test, optim, criterion, callbacks
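
The double-underscore convention can be pictured as a simple routing step. The sketch below is a plain-Python illustration under the assumption that a key is split at its first `__`; `split_prefixed` is a hypothetical helper, not part of `skorch` or `sklearn`.

```python
def split_prefixed(params, prefixes):
    """Route 'prefix__name' keys to their component, keep the rest as-is."""
    routed = {prefix: {} for prefix in prefixes}
    plain = {}
    for key, value in params.items():
        prefix, sep, rest = key.partition("__")
        if sep and prefix in routed:
            routed[prefix][rest] = value  # e.g. module__num_units -> num_units
        else:
            plain[key] = value  # e.g. lr stays a parameter of the net itself
    return plain, routed

params_example = {
    "lr": 0.1,
    "module__num_units": 20,
    "optimizer__momentum": 0.9,
}
plain, routed = split_prefixed(params_example, ("module", "optimizer"))
print(plain)
print(routed)
```

Each component then receives only the arguments meant for it, with the prefix stripped.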


### Performing a grid search

Below we show how to perform a grid search over the learning rate (`lr`), the module's number of hidden units (`module__num_units`), the module's dropout rate (`module__dropout`), and whether the SGD optimizer should use Nesterov momentum or not (`optimizer__nesterov`).

In [36]:
from sklearn.model_selection import GridSearchCV

In [37]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    verbose=0,
    optimizer__momentum=0.9,
)

In [38]:
params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}

In [39]:
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)

In [40]:
gs.fit(X, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   1.0s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s


[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   1.0s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   1.2s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   1.7s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   2.2s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   1.1s
[CV] lr=0.05, module__dropout=0, module__num_units=20, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=20, optim__nesterov=False, total=   0.8s
[CV] lr=0.05, module__dropout=0, module__num_units=20, opt

[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:   45.6s finished


GridSearchCV(cv=3, error_score='raise',
       estimator=<skorch.net.NeuralNetClassifier object at 0x7ff018061208>,
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'lr': [0.05, 0.1], 'module__num_units': [10, 20], 'module__dropout': [0, 0.5], 'optim__nesterov': [False, True]},
       pre_dispatch='2*n_jobs', refit=False, return_train_score=True,
       scoring='accuracy', verbose=2)

In [41]:
print(gs.best_score_, gs.best_params_)

0.851 {'lr': 0.1, 'module__dropout': 0, 'module__num_units': 20, 'optim__nesterov': True}
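
The counts reported at the start of the log (16 candidates, 48 fits) follow directly from the grid: every combination of values is one candidate, and each candidate is fitted once per fold. A short sketch with `itertools.product` reproduces the arithmetic; the names `grid`, `n_folds`, and `candidates` are chosen for illustration.

```python
from itertools import product

grid = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optimizer__nesterov': [False, True],
}
n_folds = 3  # matches cv=3 above

names = sorted(grid)
# One candidate per combination of values, 2 * 2 * 2 * 2 in total.
candidates = [
    dict(zip(names, combo))
    for combo in product(*(grid[name] for name in names))
]
n_fits = len(candidates) * n_folds
print(len(candidates), n_fits)
```

Adding one more two-valued parameter would double both numbers, which is worth keeping in mind given the per-fit times shown in the log.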


Of course, we could further nest the `NeuralNetClassifier` within an `sklearn Pipeline`, in which case we just prefix the parameter by the name of the net (e.g. `net__module__num_units`).
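
As a small sketch of that renaming, assuming the pipeline step is called `'net'` as in the `Pipeline` defined earlier, an existing grid can be converted by prepending the step name to every key. The names `grid_for_net` and `pipe_grid` are hypothetical.

```python
grid_for_net = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
}

step_name = 'net'  # name of the pipeline step that holds the net
# Prefix every key so the pipeline can route it to the right step.
pipe_grid = {
    step_name + '__' + key: values for key, values in grid_for_net.items()
}
print(pipe_grid)
```

The resulting dictionary is what would be passed as the parameter grid when the pipeline, rather than the net, is the estimator being searched.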