# Basic usage

In [1]:
import torch
from torch import nn
import torch.nn.functional as F

In [2]:
torch.manual_seed(0);

*`skorch`* is designed to maximize interoperability between `sklearn` and `pytorch`. The aim is to keep 99% of the flexibility of `pytorch` while being able to leverage most features of `sklearn`. Below, we show the basic usage of `skorch` and how it can be combined with `sklearn`.

### Table of contents

* [Definition of the pytorch module](#Definition-of-the-pytorch-module)
* [Training a model and predicting](#Training-a-model-and-predicting)
* [Saving and loading a model](#Saving-and-loading-a-model)
* [Usage with an sklearn Pipeline](#Usage-with-an-sklearn-Pipeline)
* [Callbacks](#Callbacks)
* [Grid search](#Usage-with-sklearn-GridSearchCV)

## Toy dataset

This is a toy binary classification task.

In [3]:
import numpy as np
from sklearn.datasets import make_classification

In [4]:
X, y = make_classification(1000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)

In [5]:
X.shape, y.shape, y.mean()

((1000, 20), (1000,), 0.5)

## Definition of the `pytorch module`

We define a vanilla neural network with two hidden layers. The output layer should have 2 output units since there are two classes. In addition, it should have a softmax nonlinearity, because `predict_proba` later uses the output of the `forward` call directly, so that output must already be a probability distribution over the classes.

In [6]:
class MyModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=F.relu,
            dropout=0.5,
    ):
        super(MyModule, self).__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, 10)
        self.output = nn.Linear(10, 2)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = F.relu(self.dense1(X))
        # softmax over the class dimension, so each row sums to 1
        X = F.softmax(self.output(X), dim=-1)
        return X
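
To see why the softmax matters for `predict_proba`, here is a minimal sketch using only `torch`. The `logits` tensor is made up for illustration and stands in for the raw output of a final linear layer.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up raw scores ("logits") for 3 samples and 2 classes.
logits = torch.randn(3, 2)

# Softmax over the class dimension turns each row into a probability distribution.
probs = F.softmax(logits, dim=-1)

# Each row is non-negative and sums to 1, which is what predict_proba should return.
row_sums = probs.sum(dim=-1)
print(probs)
print(row_sums)
```

Without the softmax, the rows would be unbounded scores rather than probabilities.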

## Training a model and predicting

We use `NeuralNetClassifier` because we're dealing with a classification task. The first argument should be the `pytorch module`. As additional arguments, we pass the number of epochs and the learning rate (`lr`), but those are optional.

*Note*: To use the cuda backend, pass `use_cuda=True` as an additional argument.

In [7]:
from skorch.net import NeuralNetClassifier

In [8]:
net = NeuralNetClassifier(
    MyModule,
    max_epochs=20,
    lr=0.1,
)

As in `sklearn`, we call `fit` passing the input data `X` and the targets `y`. By default, `NeuralNetClassifier` makes a `StratifiedKFold` split on the data (80/20) to track the validation loss. This is shown, as well as the train loss and the accuracy on the validation set.

In [10]:
net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.7111       0.5100        0.6894  0.1245
      2        0.6928       0.5500        0.6803  0.0601
      3        0.6833       0.5650        0.6741  0.0546
      4        0.6763       0.5850        0.6674  0.0865
      5        0.6727       0.6450        0.6616  0.0726
      6        0.6606       0.6600        0.6536  0.0697
      7        0.6560       0.6600        0.6443  0.0777
      8        0.6427       0.6650        0.6354  0.0789
      9        0.6300       0.6800        0.6264  0.0956
     10        0.6289       0.6800        0.6189  0.0537
     11        0.6241       0.7150

<skorch.net.NeuralNetClassifier at 0x7fb8d31230f0>
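
The 80/20 stratified validation split mentioned above can be approximated with `sklearn` alone. This is only a sketch, not `skorch`'s internal code: it assumes the split corresponds to the first fold of a 5-fold `StratifiedKFold`, and `X_demo` and `y_demo` are made-up stand-ins for the data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical balanced labels standing in for the toy targets.
y_demo = np.array([0, 1] * 500)
# The feature values do not influence the split, only the labels do.
X_demo = np.zeros((len(y_demo), 1), dtype=np.float32)

# With 5 folds, each fold holds out 20% of the data; take only the first fold.
splitter = StratifiedKFold(n_splits=5)
train_idx, valid_idx = next(splitter.split(X_demo, y_demo))

print(len(train_idx), len(valid_idx))
# Stratification keeps the class ratio the same in both parts.
print(y_demo[train_idx].mean(), y_demo[valid_idx].mean())
```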

Also, as in `sklearn`, you may call `predict` or `predict_proba` on the fitted model.

In [11]:
y_pred = net.predict(X[:5])
y_pred

array([1, 0, 0, 0, 0])

In [12]:
y_proba = net.predict_proba(X[:5])
y_proba

array([[ 0.33409804,  0.66590196],
       [ 0.65906334,  0.34093666],
       [ 0.70409262,  0.29590738],
       [ 0.70345545,  0.29654452],
       [ 0.65079051,  0.34920952]], dtype=float32)
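
The two outputs are consistent with each other: taking the most probable class per row of `y_proba` gives back `y_pred`. The sketch below copies the probabilities printed above into a `numpy` array and assumes that `predict` is simply the argmax over the class probabilities.

```python
import numpy as np

# Probabilities copied from the predict_proba output above.
y_proba = np.array([
    [0.33409804, 0.66590196],
    [0.65906334, 0.34093666],
    [0.70409262, 0.29590738],
    [0.70345545, 0.29654452],
    [0.65079051, 0.34920952],
], dtype=np.float32)

# Index of the largest probability in each row, i.e. the predicted class.
y_pred = y_proba.argmax(axis=1)
print(y_pred)
```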

## Saving and loading a model

Save and load either the whole model by using pickle or just the learned model parameters by calling `save_params` and `load_params`.

### The whole model

In [13]:
import pickle

In [14]:
file_name = '/tmp/mymodel.pkl'

In [15]:
with open(file_name, 'wb') as f:
    pickle.dump(net, f)

In [16]:
with open(file_name, 'rb') as f:
    new_net = pickle.load(f)

### Only the parameters

This saves and loads only the learned parameters of the `module`; hyperparameters such as `lr` and `max_epochs` are not saved. Therefore, we have to initialize a new model before loading the parameters into it.

In [17]:
net.save_params(file_name)  # a file handler also works

In [18]:
# first initialize the model
new_net = NeuralNetClassifier(
    MyModule,
    max_epochs=20,
    lr=0.1,
).initialize()

In [19]:
new_net.load_params(file_name)
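
The need to initialize first is easier to see in plain `pytorch`. The sketch below is an analogy and not `skorch`'s own code: it assumes that saving only the parameters amounts to storing the module's `state_dict`, and it uses a small `nn.Linear` layer as a stand-in module.

```python
import os
import tempfile

import torch
from torch import nn

# A small stand-in module; any nn.Module behaves the same way.
module = nn.Linear(20, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'params.pt')
    # Only the tensors (weights and biases) are written, not the Python object.
    torch.save(module.state_dict(), path)

    # A fresh module must exist before the saved tensors can be loaded into it.
    new_module = nn.Linear(20, 2)
    new_module.load_state_dict(torch.load(path))

# After loading, both modules hold identical parameters.
same = all(
    torch.equal(p, q)
    for p, q in zip(module.parameters(), new_module.parameters())
)
print(same)
```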

## Usage with an `sklearn Pipeline`

It is possible to put the `NeuralNetClassifier` inside an `sklearn Pipeline`, as you would with any `sklearn` classifier.

In [20]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [21]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])

In [22]:
pipe.fit(X, y)

Re-initializing module!
  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.7102       0.5050        0.6991  0.0688
      2        0.6971       0.5100        0.6940  0.0633
      3        0.6885       0.5350        0.6896  0.0443
      4        0.6878       0.5450        0.6863  0.0516
      5        0.6821       0.5750        0.6833  0.0962
      6        0.6845       0.5850        0.6802  0.0668
      7        0.6751       0.6450        0.6760  0.0601
      8        0.6716       0.6600        0.6719  0.0644
      9        0.6676       0.6700        0.6669  0.0650
     10        0.6575       0.6800        0.6591  0.0552
     11        0.6510

Pipeline(memory=None,
     steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('net', <skorch.net.NeuralNetClassifier object at 0x7fb8d31230f0>)])

In [23]:
y_proba = pipe.predict_proba(X[:5])
y_proba

array([[ 0.38140485,  0.61859512],
       [ 0.67595905,  0.32404095],
       [ 0.58000326,  0.41999674],
       [ 0.65313405,  0.34686592],
       [ 0.65428513,  0.34571484]], dtype=float32)

To save the whole pipeline, including the pytorch module, use `pickle`.

## Callbacks

Adding a new callback to the model is straightforward. Below we show how to compute an additional score on the validation set.

### Add area under the ROC (AUC) score

In [24]:
from skorch.callbacks import Scoring
from skorch.utils import to_numpy

There is a scoring callback in skorch, `Scoring`, which we use for this. We need to specify a `name` of the score, as well as which score to calculate. Here we just pass a string, `'roc_auc_score'`, as score. For a list of all existing scores, look [here](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics). We could also pass a function with the signature `func(model, X, y) -> score`, or `None`, in which case the `score` method of the model is used. Note that this is exactly the same behavior as in `sklearn`.

In [25]:
auc = Scoring(
    name='AUC',
    scoring='roc_auc_score',
    lower_is_better=False,
    pred_extractor=lambda y_proba: to_numpy(y_proba)[:, 1],
)

Furthermore, we should tell the callback that higher scores are better (to get the correct colors printed below), and how to extract the data from the prediction. The latter must be specified because sometimes we need the class predictions, sometimes the class probabilities, and sometimes, as in this example, only the probability of class `1`. Moreover, we must convert the data from `torch` tensors to `numpy` arrays (using the helper function `to_numpy`).
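
As an illustration of the `func(model, X, y) -> score` signature, here is a minimal sketch of such a scoring function written against `sklearn` only. The name `auc_scorer`, the data `X_demo` and `y_demo`, and the `LogisticRegression` stand-in model are assumptions made for this example. It also shows the same extraction of the class `1` column that `pred_extractor` performs above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def auc_scorer(model, X, y):
    """Hypothetical scorer with the signature func(model, X, y) -> score."""
    # Keep only the probability of class 1, which ROC AUC expects
    # for a binary problem.
    proba_class_1 = model.predict_proba(X)[:, 1]
    return roc_auc_score(y, proba_class_1)


X_demo, y_demo = make_classification(200, 20, n_informative=10, random_state=0)
# Stand-in classifier; any fitted model with predict_proba would do.
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

score = auc_scorer(clf, X_demo, y_demo)
print(score)
```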

In [26]:
net = NeuralNetClassifier(
    MyModule,
    max_epochs=20,
    lr=0.1,
    callbacks=[auc],
)

Finally, we pass the scoring callback to the `callbacks` parameter as a list and then call `fit`. Notice that we get the printed scores and color highlighting for free.

In [27]:
net.fit(X, y)

  epoch     AUC    train_loss    valid_acc    valid_loss     dur
-------  ------  ------------  -----------  ------------  ------
      1  0.6722        0.6730       0.6100        0.6730  0.0969
      2  0.6990        0.6469       0.6650        0.6621  0.0831
      3  0.7084        0.6423       0.6500        0.6548  0.0896
      4  0.7171        0.6257       0.6700        0.6482  0.0780
      5  0.7255        0.6308       0.6750        0.6402  0.0748
      6  0.7358        0.6043       0.6650        0.6330  0.0882
      7  0.7402        0.5999       0.6950        0.6277  0.0806
      8  0.7443        0.5935       0.7100        0.6238  0.0734
      9  0.7527        0.5866       0.7000        0.6095  0.08

<skorch.net.NeuralNetClassifier at 0x7fb8c81b7630>

### Writing your own callbacks

Writing your own callbacks is also straightforward. Just remember these rules:
* They should inherit from `skorch.callbacks.Callback`.
* They should implement at least one of the `on_`-methods provided by the parent class (e.g. `on_batch_begin` or `on_epoch_end`).
* As arguments, the `on_`-methods first get the `NeuralNet` instance, and, where appropriate, the local data (e.g. the data from the current batch). The method should also have `**kwargs` in the signature for potentially unused arguments.
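
The role of `**kwargs` in these rules can be sketched without `skorch` at all. Everything in the block below is a stand-in written for illustration: the `Callback` base class, the `EpochCounter` callback and the `notify` dispatcher are hypothetical and only mimic the calling convention described in the list above.

```python
class Callback:
    """Stand-in base class (not skorch's): every hook is a no-op by default."""

    def on_epoch_begin(self, net, **kwargs):
        pass

    def on_epoch_end(self, net, **kwargs):
        pass


class EpochCounter(Callback):
    """Hypothetical callback that counts finished epochs."""

    def __init__(self):
        self.count = 0

    # **kwargs swallows extra arguments this callback does not care about.
    def on_epoch_end(self, net, **kwargs):
        self.count += 1


def notify(callbacks, method_name, net, **kwargs):
    """Stand-in dispatcher: call the named hook on every callback."""
    for cb in callbacks:
        getattr(cb, method_name)(net, **kwargs)


counter = EpochCounter()
for epoch in range(3):
    notify([counter], 'on_epoch_begin', net=None)
    # Extra keyword arguments are passed along but ignored by EpochCounter.
    notify([counter], 'on_epoch_end', net=None, X=None, y=None)

print(counter.count)
```

Because every hook accepts `**kwargs`, the caller can pass additional data later without breaking existing callbacks.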

Here is an example of a callback that saves the model whenever the validation accuracy has improved.

In [28]:
from skorch.callbacks import Callback


class Checkpoint(Callback):
    def __init__(self, file_name):
        self.file_name = file_name

    def on_epoch_end(self, net, **kwargs):
        # check if valid accuracy of most recent epoch is the best so far
        if net.history[-1, 'valid_acc_best']:
            print("Save model to {}.".format(self.file_name))
            net.save_params(self.file_name)

In [29]:
net = NeuralNetClassifier(
    MyModule,
    max_epochs=10,
    callbacks=[auc, Checkpoint(file_name)],
)

In [30]:
net.fit(X, y)

Save model to /tmp/mymodel.pkl.
  epoch     AUC    train_loss    valid_acc    valid_loss     dur
-------  ------  ------------  -----------  ------------  ------
      1  0.5598        0.6962       0.5500        0.6894  0.0941
Save model to /tmp/mymodel.pkl.
      2  0.5648        0.6905       0.5550        0.6889  0.0619
Save model to /tmp/mymodel.pkl.
      3  0.5716        0.6910       0.5600        0.6883  0.1134
Save model to /tmp/mymodel.pkl.
      4  0.5773        0.6904       0.5650        0.6878  0.0817
      5  0.5810        0.6920       0.5650        0.6873  0.0709
Save model to /tmp/mymodel.pkl.
      6  0.5880        0.6889       0.5700        0.6867  0.1115
      7  0.5904        0.6923       0.5600        0.6862  0.0659
Save model to /tmp/mymodel.pkl.
      8  0.

<skorch.net.NeuralNetClassifier at 0x7fb938c157f0>

## Usage with sklearn `GridSearchCV`

### Special prefixes

The `NeuralNet` class lets you directly access parameters of the `pytorch module` by using the `module__` prefix. For example, if you defined the `module` to have a `num_units` parameter, you can set it via the `module__num_units` argument. This is exactly the same logic that gives access to estimator parameters in `sklearn Pipeline`s and `FeatureUnion`s.

This feature is useful in several ways. For one, it lets you set those parameters in the model definition. Furthermore, it lets you set parameters in an `sklearn GridSearchCV` as shown below.

In addition to the parameters prefixed by `module__`, you may access a couple of other attributes, such as those of the optimizer by using the `optim__` prefix (again, see below). All those special prefixes are stored in the `prefixes_` attribute:

In [31]:
print(', '.join(net.prefixes_))

module, iterator_train, iterator_test, optim, criterion, callbacks
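
For comparison, this is how the same double-underscore convention looks in a plain `sklearn Pipeline`. The pipeline `pipe_demo` and its step names `scale` and `clf` are chosen only for this sketch.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression()),
])

# '<step name>__<parameter>' reaches into the step with that name.
pipe_demo.set_params(clf__C=10.0, scale__with_mean=False)

print(pipe_demo.named_steps['clf'].C)
print(pipe_demo.named_steps['scale'].with_mean)
```

The `module__` and `optim__` prefixes follow the same pattern, with the prefix playing the role of the step name.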


### Performing a grid search

Below we show how to perform a grid search over the learning rate (`lr`), the module's number of hidden units (`module__num_units`), the module's dropout rate (`module__dropout`), and whether the SGD optimizer should use Nesterov momentum or not (`optim__nesterov`).

In [32]:
from sklearn.model_selection import GridSearchCV

In [33]:
net = NeuralNetClassifier(
    MyModule,
    max_epochs=20,
    lr=0.1,
    verbose=0,
    optim__momentum=0.9,
)

In [34]:
params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optim__nesterov': [False, True],
}

In [35]:
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', verbose=2)

In [36]:
gs.fit(X, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   1.1s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s


[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=False, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True 
[CV]  lr=0.05, module__dropout=0, module__num_units=10, optim__nesterov=True, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=20, optim__nesterov=False 
[CV]  lr=0.05, module__dropout=0, module__num_units=20, optim__nesterov=False, total=   0.9s
[CV] lr=0.05, module__dropout=0, module__num_units=20, opt

[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:   45.8s finished


GridSearchCV(cv=3, error_score='raise',
       estimator=<skorch.net.NeuralNetClassifier object at 0x7fb8c8176860>,
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'lr': [0.05, 0.1], 'module__num_units': [10, 20], 'module__dropout': [0, 0.5], 'optim__nesterov': [False, True]},
       pre_dispatch='2*n_jobs', refit=False, return_train_score=True,
       scoring='accuracy', verbose=2)

In [37]:
print(gs.best_score_, gs.best_params_)

0.856 {'lr': 0.1, 'module__dropout': 0, 'module__num_units': 20, 'optim__nesterov': True}
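
The number of fits reported in the log follows directly from the grid. As a quick check, `sklearn`'s `ParameterGrid` can enumerate the candidates; the variable names `n_candidates` and `n_fits` are introduced just for this sketch.

```python
from sklearn.model_selection import ParameterGrid

params = {
    'lr': [0.05, 0.1],
    'module__num_units': [10, 20],
    'module__dropout': [0, 0.5],
    'optim__nesterov': [False, True],
}

# Every combination of the listed values is one candidate: 2 * 2 * 2 * 2.
grid = ParameterGrid(params)
n_candidates = len(grid)

# Each candidate is fitted once per cross-validation fold (cv=3 above).
n_fits = n_candidates * 3

print(n_candidates, n_fits)
```

Adding a further parameter with two values would double both numbers, so grids grow quickly.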


Of course, we could further nest the `NeuralNetClassifier` within an `sklearn Pipeline`, in which case we just prefix the parameter by the name of the net (e.g. `net__module__num_units`).