# Showcase some of the features of skorch

This notebook introduces you to some of the nice features offered by [skorch](https://github.com/skorch-dev/skorch)

It is a companion notebook the PyCon/PyData Berlin 2019 presentation that can be found [here](https://github.com/BenjaminBossan/public-presentations/blob/master/20191010-pycon-pydata/presentation.org).

## Basic setup

### Imports

In [1]:
import numpy as np
from sklearn.datasets import make_classification
import torch
from torch import nn
import torch.nn.functional as F

### Seeds and constants

In [2]:
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0);

In [3]:
DEVICE = 'cpu'  # choose 'cuda' or 'cpu'

### A toy binary classification task

In [4]:
X, y = make_classification(10000, 20, n_informative=10, random_state=0)
X = X.astype(np.float32)

In [5]:
X.shape, y.shape, y.mean()

((10000, 20), (10000,), 0.5003)

### Definition of the PyTorch `module`

We define a vanilla neural network with one hidden layer. The output layer should have 2 output units since there are two classes. In addition, it should have a softmax nonlinearity, because later, when calling `predict_proba`, the output from the `forward` call will be used.

In [6]:
class MyModule(nn.Module):
    def __init__(self, num_units=10, dropout=0.5):
        super().__init__()

        self.dense = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.output = nn.Linear(num_units, 2)

    def forward(self, X, **kwargs):
        X = F.relu(self.dense(X))
        X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X

## Reduction of boilerplate code

### Pure PyTorch implementation

Below we show a basic training loop implemented with just PyTorch.

In [7]:
import time
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from torch.utils.data import TensorDataset, DataLoader

In [8]:
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

In [9]:
ds_train = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
loader_train = DataLoader(ds_train, batch_size=256, shuffle=True)
ds_valid = TensorDataset(torch.from_numpy(X_valid), torch.from_numpy(y_valid))
loader_valid = DataLoader(ds_valid, batch_size=256)
module = MyModule(num_units=50).to(DEVICE)
optimizer = torch.optim.SGD(module.parameters(), lr=0.02)
criterion = nn.NLLLoss()
template = "epoch: {} | loss train: {:.4f} | loss valid: {:.4f} | acc valid: {:.4f} | dur: {:.3f}"

In [10]:
for epoch in range(20):
    tic = time.time()
    losses_train = []
    for Xb, yb in loader_train:
        Xb, yb = Xb.to(DEVICE), yb.to(DEVICE)
        y_proba = module(Xb)
        loss = criterion(torch.log(y_proba), yb)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        losses_train.append(loss.item())
        
    losses_valid = []
    accuracy_valid = []
    for Xb, yb in loader_valid:
        Xb, yb = Xb.to(DEVICE), yb.to(DEVICE)
        y_proba = module(Xb)
        loss = criterion(torch.log(y_proba), yb)
        optimizer.step()
        optimizer.zero_grad()
        losses_valid.append(loss.item())
        accuracy_valid.append(accuracy_score(yb.cpu().numpy(), y_proba.argmax(1).cpu().numpy()))
        
    toc = time.time() - tic
    print(template.format(
        epoch + 1, np.mean(losses_train), np.mean(losses_valid), np.mean(accuracy_valid), toc))

epoch: 1 | loss train: 0.7184 | loss valid: 0.6442 | acc valid: 0.6205 | dur: 0.124
epoch: 2 | loss train: 0.6249 | loss valid: 0.6044 | acc valid: 0.6877 | dur: 0.138
epoch: 3 | loss train: 0.5889 | loss valid: 0.5691 | acc valid: 0.7178 | dur: 0.127
epoch: 4 | loss train: 0.5610 | loss valid: 0.5460 | acc valid: 0.7309 | dur: 0.117
epoch: 5 | loss train: 0.5437 | loss valid: 0.5190 | acc valid: 0.7558 | dur: 0.110
epoch: 6 | loss train: 0.5186 | loss valid: 0.5070 | acc valid: 0.7565 | dur: 0.120
epoch: 7 | loss train: 0.5023 | loss valid: 0.4884 | acc valid: 0.7801 | dur: 0.121
epoch: 8 | loss train: 0.4923 | loss valid: 0.4761 | acc valid: 0.7767 | dur: 0.122
epoch: 9 | loss train: 0.4779 | loss valid: 0.4555 | acc valid: 0.7919 | dur: 0.118
epoch: 10 | loss train: 0.4776 | loss valid: 0.4525 | acc valid: 0.7893 | dur: 0.116
epoch: 11 | loss train: 0.4591 | loss valid: 0.4543 | acc valid: 0.7902 | dur: 0.110
epoch: 12 | loss train: 0.4423 | loss valid: 0.4296 | acc valid: 0.8195 | 

### Training with skorch

Now we show how to achieve the same outcome with skorch. Note how we don't need to make any adjustments to the `module`.

In [11]:
from skorch import NeuralNetClassifier

In [12]:
net = NeuralNetClassifier(
    MyModule,
    module__num_units=50,
    max_epochs=20,
    lr=0.02,
    batch_size=256,
    iterator_train__shuffle=True,
    device=DEVICE,
)

In [13]:
net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        [36m0.7102[0m       [32m0.7066[0m        [35m0.6139[0m  0.1345
      2        [36m0.6294[0m       [32m0.7446[0m        [35m0.5722[0m  0.1276
      3        [36m0.5893[0m       [32m0.7656[0m        [35m0.5414[0m  0.1256
      4        [36m0.5662[0m       [32m0.7746[0m        [35m0.5166[0m  0.1284
      5        [36m0.5473[0m       [32m0.7921[0m        [35m0.4957[0m  0.1253
      6        [36m0.5242[0m       [32m0.8036[0m        [35m0.4752[0m  0.1270
      7        [36m0.5048[0m       [32m0.8121[0m        [35m0.4569[0m  0.1282
      8        [36m0.4934[0m       [32m0.8186[0m        [35m0.4410[0m  0.1349
      9        [36m0.4894[0m       [32m0.8266[0m        [35m0.4268[0m  0.1368
     10        [36m0.4623[0m       [32m0.8311[0m        [35m0.4141[0m  0.1339
     11        [36m0.4557[0m       [32m0.83

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

## Compatibility with sklearn API

### Monitor sklearn metrics during training

In [14]:
from skorch.callbacks import EpochScoring
from sklearn.metrics import roc_auc_score

In [15]:
auc = EpochScoring(
    scoring=roc_auc_score,  # <-- just passing 'roc_auc' would also work
    lower_is_better=False,
)

In [16]:
net = NeuralNetClassifier(
    MyModule,
    module__num_units=50,
    max_epochs=20,
    lr=0.02,
    batch_size=256,
    iterator_train__shuffle=True,
    device=DEVICE,
    callbacks=[auc],
)

In [17]:
net.fit(X, y)

  epoch    roc_auc_score    train_loss    valid_acc    valid_loss     dur
-------  ---------------  ------------  -----------  ------------  ------
      1           [36m0.6882[0m        [32m0.6891[0m       [35m0.6882[0m        [31m0.6065[0m  0.3627
      2           [36m0.7326[0m        [32m0.6141[0m       [35m0.7326[0m        [31m0.5624[0m  0.5033
      3           [36m0.7586[0m        [32m0.5755[0m       [35m0.7586[0m        [31m0.5306[0m  0.4027
      4           [36m0.7786[0m        [32m0.5533[0m       [35m0.7786[0m        [31m0.5038[0m  0.4762
      5           [36m0.7926[0m        [32m0.5285[0m       [35m0.7926[0m        [31m0.4809[0m  0.4843
      6           [36m0.8086[0m        [32m0.5108[0m       [35m0.8086[0m        [31m0.4603[0m  0.4090
      7           [36m0.8226[0m        [32m0.4872[0m       [35m0.8226[0m        [31m0.4411[0m  0.3477
      8           [36m0.8311[0m        [32m0.4737[0m       [35m0.8311[0m    

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

### Support for the basic methods

In [18]:
from sklearn.base import clone
from sklearn.model_selection import cross_validate

In [19]:
y_pred = net.predict(X[:5])
y_pred

array([0, 1, 1, 1, 1])

In [20]:
y_proba = net.predict_proba(X[:5])
y_proba

array([[0.54133016, 0.4586698 ],
       [0.13347742, 0.86652255],
       [0.2590016 , 0.7409983 ],
       [0.47930065, 0.5206993 ],
       [0.32075438, 0.6792456 ]], dtype=float32)

In [21]:
net.partial_fit(X, y)

     21           0.8905        [32m0.3754[0m       0.8906        [31m0.3052[0m  0.7788
     22           [36m0.8955[0m        [32m0.3678[0m       [35m0.8956[0m        [31m0.2995[0m  0.3332
     23           [36m0.8995[0m        [32m0.3596[0m       [35m0.8996[0m        [31m0.2936[0m  0.4569
     24           [36m0.9025[0m        0.3622       [35m0.9025[0m        [31m0.2874[0m  0.5510
     25           [36m0.9075[0m        [32m0.3546[0m       [35m0.9075[0m        [31m0.2829[0m  0.3548
     26           [36m0.9080[0m        [32m0.3536[0m       [35m0.9080[0m        [31m0.2788[0m  0.4301
     27           0.9070        0.3545       0.9070        [31m0.2753[0m  0.2874
     28           [36m0.9080[0m        [32m0.3441[0m       0.9080        [31m0.2715[0m  0.2989
     29           [36m0.9115[0m        [32m0.3428[0m       [35m0.9115[0m        [31m0.2670[0m  0.1904
     30           [36m0.9120[0m        [32m0.3359[0m       [35m0.912

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

In [22]:
net.get_params();

In [23]:
net.set_params(verbose=0)

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

In [24]:
_ = clone(net)

In [25]:
cross_validate(net, X, y, cv=3)

{'fit_time': array([5.74459791, 6.21544385, 6.15612483]),
 'score_time': array([0.13037705, 0.07822895, 0.05389428]),
 'test_score': array([0.87762448, 0.86082783, 0.8817527 ])}

### Use inside an sklearn `Pipeline`

In [25]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [26]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('net', net),
])

pipe.fit(X, y)

Pipeline(memory=None,
         steps=[('scale',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('net',
                 <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
))],
         verbose=False)

In [27]:
pipe.predict(X[:5])

array([0, 1, 1, 0, 1])

In [28]:
pipe.predict_proba(X[:5])

array([[0.58262163, 0.4173784 ],
       [0.29107776, 0.7089222 ],
       [0.2788414 , 0.7211586 ],
       [0.51545745, 0.48454258],
       [0.37120497, 0.628795  ]], dtype=float32)

### Pickle the whole pipeline

In [29]:
import pickle

Saves the whole pipeline, including preprocessing and the neural net.

In [30]:
with open('my_pipeline.pickle', 'wb') as f:
    pickle.dump(pipe, f)

  "type " + obj.__name__ + ". It won't be checked "


### GridSearchCV

No special adjustments need to be made to perform a hyperparameter search on the net parameters. We can even search on the `__init__` parameters of our `module` by using the `'module__'` prefix.

In [31]:
from sklearn.model_selection import GridSearchCV

In [32]:
params = {
    'max_epochs': [10, 20],
    'optimizer__momentum': [0.0, 0.9],
    'module__num_units': [10, 50],  # <-- just works
    'module__dropout': [0, 0.5],  # <-- just works
}

In [33]:
%time search = GridSearchCV(net, params, verbose=2, cv=3).fit(X, y)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0 
[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0, total=   3.2s
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    3.2s remaining:    0.0s


[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0, total=   3.6s
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0 
[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.0, total=   2.4s
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9 
[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9, total=   2.8s
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9 
[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9, total=   5.4s
[CV] max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9 
[CV]  max_epochs=10, module__dropout=0, module__num_units=10, optimizer__momentum=0.9, total=   3.7s
[CV] max_epochs=10, module__dropout=0, module__num_units=50, optimizer__momentum=0.0 
[CV]  max_epochs=10, module__dropout=0, module__num_units=50, opt

[CV]  max_epochs=20, module__dropout=0.5, module__num_units=50, optimizer__momentum=0.9, total=   6.3s
[CV] max_epochs=20, module__dropout=0.5, module__num_units=50, optimizer__momentum=0.9 
[CV]  max_epochs=20, module__dropout=0.5, module__num_units=50, optimizer__momentum=0.9, total=   7.1s
[CV] max_epochs=20, module__dropout=0.5, module__num_units=50, optimizer__momentum=0.9 
[CV]  max_epochs=20, module__dropout=0.5, module__num_units=50, optimizer__momentum=0.9, total=   6.6s


[Parallel(n_jobs=1)]: Done  48 out of  48 | elapsed:  4.6min finished


CPU times: user 29min 19s, sys: 17.6 s, total: 29min 37s
Wall time: 4min 46s


In [34]:
search.best_score_, search.best_params_

(0.956,
 {'max_epochs': 20,
  'module__dropout': 0,
  'module__num_units': 50,
  'optimizer__momentum': 0.9})

#### Grid search everything!

You can grid search the parameters of almost everything:

- NeuralNet
- module
- optimizer
- criterion
- DataLoader
- callbacks

Just use the `__` notation known from sklearn, e.g. `optimizer__momentum` to set the `momentum` parameter of the optimizer. To make a search on callback parameters, give the parameter a name by passing a tuple of name and callback (like in an sklearn `Pipeline`). skorch uses the name, e.g. `'mycb'`, to dispatch to the callback. E.g.:

```
net = NeuralNetClassifier(..., callbacks=[('mycb', MyCallback(foo=1))])
params = {'callbacks__mycb__foo': [1, 2, 3]}
```

### Swap skorch net for any other sklearn estimator

Since skorch's estimators work like any other sklearn estimator, you can swap them out to see which one leads to the best results.

Here we compare our neural network with a logistic regression and a KNN classifier.

In [35]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

In [36]:
net.set_params(**search.best_params_)  # use the best parameters from grid search

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

In [37]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('model', net),
])
params = {'model': [net, LogisticRegression(), KNeighborsClassifier()]}
search = GridSearchCV(pipe, params, verbose=2, cv=3)

In [38]:
%time search.fit(X, y)

Fitting 3 folds for each of 3 candidates, totalling 9 fits
[CV] model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
) 


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV]  model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
), total=   8.2s
[CV] model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
) 


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    8.2s remaining:    0.0s


[CV]  model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
), total=   6.4s
[CV] model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
) 
[CV]  model=<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
), total=   5.4s
[CV] model=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10



[CV]  model=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform'), total=   0.8s
[CV] model=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform') 
[CV]  model=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform'), total=   0.8s
[CV] model=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform') 
[CV]  model=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
               

[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   22.4s finished


CPU times: user 5min 12s, sys: 4.01 s, total: 5min 16s
Wall time: 32.2 s


GridSearchCV(cv=3, error_score='raise-deprecating',
             estimator=Pipeline(memory=None,
                                steps=[('scale',
                                        StandardScaler(copy=True,
                                                       with_mean=True,
                                                       with_std=True)),
                                       ('model',
                                        <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0, inplace=False)
    (output): Linear(in_...
                                                      max_iter=100,
                                                      multi_class='warn',
                                                      n_jobs=None, penalty='l2',
                                                      random_state=None,
                                           

In [39]:
search.best_score_, search.best_params_

(0.9497,
 {'model': <class 'skorch.classifier.NeuralNetClassifier'>[initialized](
    module_=MyModule(
      (dense): Linear(in_features=20, out_features=50, bias=True)
      (dropout): Dropout(p=0, inplace=False)
      (output): Linear(in_features=50, out_features=2, bias=True)
    ),
  )})

### distributed `GridSearchCV` with dask

To run a distributed hyperparameter search, you need `dask` and `dask.distributed`:

`$ pip install dask distributed`

Setup your dask workers as described [here](https://docs.dask.org/en/latest/setup.html).

Then run the following lines:

```
from dask.distributed import Client
from joblib import parallel_backend

client = Client('127.0.0.1:8786')

search = GridSearchCV(net, params, verbose=2, cv=3)

with parallel_backend('dask'):
    search.fit(X, y)
```

## More additions

### Save the `state_dict`

If we just want to save the `state_dict` of our module (and maybe our optimizer), we can either use the `Checkpoint` callback or call the `save_params` method. Use `load_params` to load the `state_dict` later on.

In [40]:
from skorch.callbacks import Checkpoint

In [41]:
cp = Checkpoint(monitor='valid_loss_best', dirname='exp1')
net = NeuralNetClassifier(
    MyModule,
    module__num_units=50,
    max_epochs=20,
    lr=0.02,
    batch_size=256,
    iterator_train__shuffle=True,
    device=DEVICE,
    callbacks=[cp],
)

In [42]:
net.fit(X, y)  # Checkpoint saves each time valid lost improves

  epoch    train_loss    valid_acc    valid_loss    cp     dur
-------  ------------  -----------  ------------  ----  ------
      1        [36m0.7194[0m       [32m0.6932[0m        [35m0.6178[0m     +  0.4781
      2        [36m0.6290[0m       [32m0.7341[0m        [35m0.5726[0m     +  0.3285
      3        [36m0.5901[0m       [32m0.7606[0m        [35m0.5401[0m     +  0.4104
      4        [36m0.5637[0m       [32m0.7716[0m        [35m0.5159[0m     +  1.2257
      5        [36m0.5448[0m       [32m0.7931[0m        [35m0.4921[0m     +  0.6553
      6        [36m0.5252[0m       [32m0.8061[0m        [35m0.4709[0m     +  0.7063
      7        [36m0.5090[0m       [32m0.8156[0m        [35m0.4534[0m     +  0.6670
      8        [36m0.5042[0m       [32m0.8261[0m        [35m0.4392[0m     +  0.7381
      9        [36m0.4866[0m       [32m0.8321[0m        [35m0.4250[0m     +  0.5423
     10        [36m0.4735[0m       [32m0.8421[0m        [35

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=50, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=50, out_features=2, bias=True)
  ),
)

In [43]:
net.save_params(
    f_params='exp1/mynet.pt',  # <- state dict of module
    f_optimizer='exp1/myoptimizer.pt',  # <- state dict of optimizer
)

### Handling of different data formats

By default, skorch handles the most common data formats, even more complex ones like dictionaries. If this doesn't fit your need, just define your own `Dataset`.

- numpy arrays
- PyTorch Datasets (most)
- dict or list of arrays
- pandas DataFrames
- scipy sparse CSR matrices

### Callbacks

skorch comes packaged with a few useful callbacks:

In [26]:
from skorch.callbacks import GradientNormClipping
from skorch.callbacks import LRScheduler
from skorch.callbacks import EpochScoring, BatchScoring
from skorch.callbacks import Checkpoint, TrainEndCheckpoint, LoadInitState
from skorch.callbacks import Freezer
from skorch.callbacks import TensorBoard

### CLI

With the help of skorch and Google's fire library, it is exceedingly easy to transform your training script into a nice CLI. This is what skorch and fire will automatically take care of:

* help for the CLI usage
* show docstrings in CLI help
* set __all__ possible parameters from the command line without any manuel argument parsing

First install fire and numpydoc:
    
`$ pip install fire numpydoc`

It requires only a few lines of code to turn your script into a nice CLI:

```
def main(..., **kwargs):
    model = ...  # put model definition here

    model = parse_args(kwargs)(model)  # <-- add this line
    
    model.fit(X, y)


if __name__ == '__main__':
    fire.Fire(main)
```

Here is the complete train.py script. Note the few lines that needed to be added:

In [27]:
!cat train.py

"""Simple training script for a MLP classifier.

See accompanying `pycon_showcase_skorch.ipynb` for details.

"""

import pickle

import fire
import numpy as np
from sklearn.datasets import make_classification
from skorch import NeuralNetClassifier
import torch
from torch import nn

from skorch.helper import parse_args


np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)


# number of input features
N_FEATURES = 20

# number of classes
N_CLASSES = 2

# custom defaults for net
DEFAULTS = {
    'batch_size': 256,
    'module__hidden_units': 30,
}


class MLPClassifier(nn.Module):
    """A simple multi-layer perceptron module.

    This can be adapted for usage in different contexts, e.g. binary
    and multi-class classification, regression, etc.

    Note: This docstring is used to create the help for the CLI.

    Parameters
    ----------
    hidden_units : int (default=10)
      Number of units in hidden layers.

    num_

General help:

In [28]:
!python train.py -- --help

[1mNAME[0m
    train.py - Train an MLP classifier on synthetic data.

[1mSYNOPSIS[0m
    train.py <flags>

[1mDESCRIPTION[0m
    n_samples : int (default=100)
      Number of training samples

    output_file : str (default=None)
      If not None, file name used to save the model.

    kwargs : dict
      Additional model parameters.

[1mFLAGS[0m
    --n_samples=[4mN_SAMPLES[0m
    --output_file=[4mOUTPUT_FILE[0m
    Additional flags are accepted.


Model-specific help:

In [29]:
!python train.py --help

This is the help for the model-specific parameters.
To invoke help for the remaining options, run:
python train.py -- --help

<NeuralNetClassifier> options:
  --module : torch module (class or instance)
    A PyTorch :class:`~torch.nn.Module`. In general, the
    uninstantiated class should be passed, although instantiated
    modules will also work.
  --criterion : torch criterion (class, default=torch.nn.NLLLoss)
    Negative log likelihood loss. Note that the module should return
    probabilities, the log is applied during ``get_loss``.
  --optimizer : torch optim (class, default=torch.optim.SGD)
    The uninitialized optimizer (update rule) used to optimize the
    module
  --lr : float (default=0.01)
    Learning rate passed to the optimizer. You may use ``lr`` instead
    of using ``optimizer__lr``, which would result in the same outcome.
  --max_epochs : int (default=10)
    The number of epochs to train for each ``fit`` call. Note that you
    may keyboard-

This is how you can call the script from the command line:

In [30]:
!python train.py --n_samples 1000 --output_file 'exp1/model.pkl' --lr 0.1 --max_epochs 5 \
  --device 'cuda' --module__hidden_units 50 --module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)'\
  --callbacks__valid_acc__on_train --callbacks__valid_acc__name train_acc

Training MLP classifier
  epoch    train_acc    train_loss    valid_loss     dur
-------  -----------  ------------  ------------  ------
      1       [36m0.7872[0m        [32m0.5813[0m        [35m0.5011[0m  0.0157
      2       [36m0.9049[0m        [32m0.4876[0m        [35m0.4309[0m  0.0150
      3       [36m0.9262[0m        [32m0.4191[0m        [35m0.3783[0m  0.0150
      4       [36m0.9312[0m        [32m0.3663[0m        [35m0.3381[0m  0.0153
      5       0.9312        [32m0.3258[0m        [35m0.3076[0m  0.0150
Saved model to file 'exp1/model.pkl'.


Note how you can even pass Python objects as arguments like `--module__nonlin 'torch.nn.RReLU(0.1, upper=0.4)'`.

## Easily hackable

We made sure that skorch is as hackable as possible. On the neural net classes, look out for methods that start with `get_`, such as `get_loss`, or override the `train_step` itself. On the callbacks, look for methods that start with `on_`, such as `on_train_begin`. They always receive the associated `net` instance as the first parameter.

### Custom callbacks

In [49]:
from skorch.callbacks import Callback

In [50]:
def send_tweet(msg):
    print("*tweet* {}".format(msg))


class TweetAccuracy(Callback):
    def __init__(self, min_accuracy=0.99):
        self.min_accuracy = min_accuracy

    def on_train_end(self, net, **kwargs):
        best_accuracy = max(net.history[:, 'valid_acc'])
        if best_accuracy >= self.min_accuracy:
            msg = "Reached an accuracy of {:.4f}!!!".format(best_accuracy)
            send_tweet(msg)

### Implement gradient accumulation

In [51]:
class GradAccNet(NeuralNetClassifier):
    def __init__(self, *args, acc_steps=2, **kwargs):
        super().__init__(*args, **kwargs)
        self.acc_steps = acc_steps

    def get_loss(self, *args, **kwargs):
        loss = super().get_loss(*args, **kwargs)
        return loss / self.acc_steps  # normalize loss

    def train_step(self, Xi, yi, **fit_params):
        n_train_batches = len(self.history[-1, 'batches'])
        step = self.train_step_single(Xi, yi, **fit_params)

        if n_train_batches % self.acc_steps == 0:
            self.optimizer_.step()
            self.optimizer_.zero_grad()
        return step

#### Putting it together

In [52]:
grad_acc_net = GradAccNet(MyModule, callbacks=[TweetAccuracy(min_accuracy=0.7)])

In [53]:
grad_acc_net.fit(X, y)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        [36m0.4108[0m       [32m0.5532[0m        [35m0.3549[0m  0.1539
      2        [36m0.3719[0m       [32m0.5912[0m        [35m0.3325[0m  0.1642
      3        [36m0.3486[0m       [32m0.6227[0m        [35m0.3195[0m  0.5162
      4        [36m0.3343[0m       [32m0.6537[0m        [35m0.3099[0m  0.8270
      5        [36m0.3246[0m       [32m0.6722[0m        [35m0.3026[0m  0.1768
      6        [36m0.3193[0m       [32m0.6907[0m        [35m0.2969[0m  0.2796
      7        [36m0.3101[0m       [32m0.7046[0m        [35m0.2915[0m  0.3925
      8        [36m0.3075[0m       [32m0.7161[0m        [35m0.2868[0m  0.1689
      9        [36m0.3014[0m       [32m0.7276[0m        [35m0.2823[0m  0.1214
     10        [36m0.2982[0m       [32m0.7336[0m        [35m0.2781[0m  0.5428
*tweet* Reached an accuracy of 0.7336!!!


<class '__main__.GradAccNet'>[initialized](
  module_=MyModule(
    (dense): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (output): Linear(in_features=10, out_features=2, bias=True)
  ),
)