[![Run in Colab](https://www.tensorflow.org/images/colab_logo_32px.png)](https://colab.research.google.com/github/adriangb/scikeras/blob/master/docs/source/notebooks/Basic_Usage.ipynb) Run in Colab

# Basic usage

`SciKeras` is designed to maximize interoperability between `sklearn` and `Keras/TensorFlow`. The aim is to keep 99% of the flexibility of `Keras` while being able to leverage most features of `sklearn`. This notebook shows the basic usage of `SciKeras` and how it can be combined with `sklearn`.

### Table of contents

* [Definition of the Keras Model](#Definition-of-the-keras-model)
* [Training a classifier](#Training-a-classifier-and-making-predictions)
  * [Dataset](#A-toy-binary-classification-task)
  * [Keras Model](#Definition-of-the-classification-model)
  * [Model training](#Training-the-neural-net-classifier)
  * [Inference](#Making-predictions,-classification)
* [Training a regressor](#Training-a-regressor)
  * [Dataset](#A-toy-regression-task)
  * [Keras Model](#Definition-of-the-regression-model)
  * [Model training](#Training-the-neural-net-regressor)
  * [Inference](#Making-predictions,-regression)
* [Saving and loading a model](#Saving-and-loading-a-model)
  * [Whole model](#Saving-the-whole-model)
  * [Only parameters](#Saving-only-the-model-parameters)
* [Usage with an sklearn Pipeline](#Usage-with-an-sklearn-Pipeline)
* [Callbacks](#Callbacks)
* [Grid search](#Usage-with-sklearn-GridSearchCV)
  * [Special prefixes](#Special-prefixes)
  * [Performing a grid search](#Performing-a-grid-search)

Install SciKeras

In [None]:
try:
    import scikeras
except ImportError:
    !python -m pip install scikeras

Silence TensorFlow logging to keep output succinct.

In [None]:
from tensorflow import get_logger
get_logger().setLevel('ERROR')

In [None]:
from scikeras.wrappers import KerasClassifier, KerasRegressor
from tensorflow import keras

## Training a classifier and making predictions

### A toy binary classification task

We generate a toy binary classification dataset with `sklearn`.

In [None]:
import numpy as np
from sklearn.datasets import make_classification

In [None]:
X, y = make_classification(1000, 20, n_informative=10, random_state=0)

In [None]:
X.shape, y.shape, y.mean()

### Definition of the `Keras` classification `Model`

We define a vanilla feed-forward neural network with a configurable number of hidden layers, each followed by dropout.

Because we are dealing with 2 classes, the output layer can be constructed in
two different ways:
1. Single unit with a `"sigmoid"` nonlinearity. The loss must be `"binary_crossentropy"`.
2. Two units (one for each class) and a `"softmax"` nonlinearity. The loss must be `"sparse_categorical_crossentropy"`.

In this example, we choose the first option, which is what you would usually
do for binary classification. The second option is usually reserved for when
you have >2 classes.
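
For two classes, the two parameterizations describe the same family of models: a sigmoid applied to a single logit `z` equals the second component of a softmax over the logits `[0, z]`. The sketch below checks this numerically using only the standard library. The helper names `sigmoid` and `softmax` are written for this illustration and are not part of Keras or SciKeras.

```python
import math


def sigmoid(z):
    """Logistic function: probability of the positive class from one logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Normalized exponentials, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


for z in (-2.0, 0.0, 0.5, 3.0):
    p_sigmoid = sigmoid(z)
    # Fix the logit of class 0 at zero; class 1 gets the logit z.
    p_softmax = softmax([0.0, z])[1]
    assert math.isclose(p_sigmoid, p_softmax, rel_tol=1e-12)
    print(f"z={z:+.1f}  sigmoid={p_sigmoid:.6f}  softmax[1]={p_softmax:.6f}")
```

The single-unit form simply has fewer redundant weights, which is one reason it is the usual choice for binary problems.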

In [None]:
from tensorflow import keras

In [None]:
def get_clf(meta, hidden_layer_sizes, dropout):
    n_features_in_ = meta["n_features_in_"]
    n_classes_ = meta["n_classes_"]
    model = keras.models.Sequential()
    model.add(keras.layers.Input(shape=(n_features_in_,)))
    for hidden_layer_size in hidden_layer_sizes:
        model.add(keras.layers.Dense(hidden_layer_size, activation="relu"))
        model.add(keras.layers.Dropout(dropout))
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    return model

### Defining and training the neural net classifier

We use `KerasClassifier` because we're dealing with a classification task. The first argument should be a callable returning a `keras.Model`, in this case `get_clf`. As additional arguments, we pass the loss function (required) and the optimizer (optional). We must also pass every argument of `get_clf` that has no default value as a keyword argument to `KerasClassifier`. Note that an argument you do not pass to `KerasClassifier` will not be available for hyperparameter tuning. You can also pass `random_state=0` for reproducible results.

In [None]:
from scikeras.wrappers import KerasClassifier

In [None]:
clf = KerasClassifier(
    model=get_clf,
    loss="binary_crossentropy",
    optimizer="adam",
    hidden_layer_sizes=(100,),
    dropout=0.5,
)

As in `sklearn`, we call `fit` passing the input data `X` and the targets `y`.

In [None]:
clf.fit(X, y)

Also, as in `sklearn`, you may call `predict` or `predict_proba` on the fitted model.

### Making predictions, classification

In [None]:
y_pred = clf.predict(X[:5])
y_pred

In [None]:
y_proba = clf.predict_proba(X[:5])
y_proba
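
As a rough illustration of how the two outputs relate, assume the usual `sklearn` convention that `predict_proba` returns one column per class and `predict` returns the most probable class. Under that assumption, a single sigmoid output `p` can be expanded to `[1 - p, p]` and the label taken as the index of the larger entry. The helper names and the sample values below are hypothetical, and this is not SciKeras's internal code.

```python
def to_two_columns(p_positive):
    """Expand P(class 1) into rows of [P(class 0), P(class 1)]."""
    return [[1.0 - p, p] for p in p_positive]


def to_labels(proba):
    """Pick the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


# Made-up sigmoid outputs for four samples.
sigmoid_outputs = [0.10, 0.80, 0.55, 0.30]
proba = to_two_columns(sigmoid_outputs)
labels = to_labels(proba)
print(proba)
print(labels)
```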

## Training a regressor

### A toy regression task

In [None]:
from sklearn.datasets import make_regression

In [None]:
X_regr, y_regr = make_regression(1000, 20, n_informative=10, random_state=0)

In [None]:
X_regr.shape, y_regr.shape, y_regr.min(), y_regr.max()

### Definition of the `Keras` regression `Model`

Again, define a vanilla neural network. The main difference is that the output layer always has a single unit and does not apply any nonlinearity.

In [None]:
def get_reg(meta, hidden_layer_sizes, dropout):
    n_features_in_ = meta["n_features_in_"]
    model = keras.models.Sequential()
    model.add(keras.layers.Input(shape=(n_features_in_,)))
    for hidden_layer_size in hidden_layer_sizes:
        model.add(keras.layers.Dense(hidden_layer_size, activation="relu"))
        model.add(keras.layers.Dropout(dropout))
    model.add(keras.layers.Dense(1))
    return model

### Defining and training the neural net regressor

Training a regressor is almost the same as training a classifier. Mainly, we use `KerasRegressor` instead of `KerasClassifier` (this is the same terminology as in `sklearn`). We also change the loss function to mean squared error (`"mse"`) and track `KerasRegressor.r_squared` as a metric. SciKeras provides this function because most of the `sklearn` ecosystem uses `R^2` as the default regression score, but Keras does not have a default implementation. Since `R^2` is a score where higher is better, it is monitored as a metric rather than minimized as a loss.

In [None]:
from scikeras.wrappers import KerasRegressor

In [None]:
reg = KerasRegressor(
    model=get_reg,
    loss="mse",
    metrics=[KerasRegressor.r_squared],
    optimizer="adam",
    hidden_layer_sizes=(100,),
    dropout=0.5,
)

In [None]:
reg.fit(X_regr, y_regr)
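
For reference, the coefficient of determination `R^2` compares the model's squared error with that of a constant predictor that always outputs the mean target: `R^2 = 1 - SS_res / SS_tot`. A perfect fit gives 1 and always predicting the mean gives 0, so higher is better. The pure-Python sketch below illustrates the formula. The function `r_squared` here is a stand-in written for this example, not the SciKeras implementation, and the sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
print(r_squared(y_true, y_true))                 # perfect predictions
print(r_squared(y_true, [2.875] * 4))            # always predicting the mean
print(r_squared(y_true, [2.5, 0.0, 2.0, 8.0]))   # imperfect predictions
```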

### Making predictions, regression

As with `sklearn` regressors, call `predict` on the fitted model to obtain predictions.

In [None]:
y_pred = reg.predict(X_regr[:5])
y_pred

## Saving and loading a model

You can save and load the whole wrapper with `pickle`, or use Keras' specialized save methods on the `model_` attribute (`KerasClassifier.model_` or `KerasRegressor.model_`) that is created after fitting. You will want to use Keras' model saving utilities if any of the following apply:
1. You wish to save only the weights or only the training configuration of your model.
2. You wish to share your model with collaborators. Pickle is a relatively unsafe protocol, and sharing or loading pickled objects publicly is not recommended.
3. You care about performance, especially if doing in-memory serialization.

For more information, see Keras' [saving documentation](https://www.tensorflow.org/guide/keras/save_and_serialize).
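
The safety concern in point 2 comes from how `pickle` works: a pickle stream can name any importable callable together with its arguments, and `pickle.loads` will call it. The deliberately harmless sketch below shows code running during loading, which is why pickles from untrusted sources should not be loaded. The class name `Payload` is made up for this example.

```python
import pickle


class Payload:
    """Object whose pickle tells the loader to call eval("6 * 7")."""

    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) on load.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
result = pickle.loads(data)

# The loader did not rebuild a Payload; it ran eval and returned its value.
print(result)
print(type(result).__name__)
```

A real attack would substitute a harmful callable for `eval("6 * 7")`, so the same mechanism applies to any pickled estimator or pipeline received from someone else.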

### Saving the whole model

In [None]:
import pickle

In [None]:
bytes_model = pickle.dumps(reg)

In [None]:
new_reg = pickle.loads(bytes_model)
new_reg

### Saving using Keras' saving methods

This efficiently and safely saves the model to disk, including trained weights.


In [None]:
# Save to disk
reg.model_.save("/tmp/my_model")

In [None]:
# Load the model back into memory
new_reg_model = keras.models.load_model("/tmp/my_model")
# Now we need to instantiate a new SciKeras object with this model.
# Note that we no longer pass parameters like hidden_layer_sizes;
# those are now fixed by the loaded model.
reg = KerasRegressor(
    new_reg_model,
    loss="mse",
    metrics=[KerasRegressor.r_squared],
    optimizer="adam",
)
reg.fit(X_regr, y_regr)
reg.predict(X_regr[:5])

## Usage with an `sklearn Pipeline`

It is possible to put the `KerasClassifier` inside an `sklearn Pipeline`, as you would with any `sklearn` classifier.

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', clf),
])

In [None]:
pipe.fit(X, y)

In [None]:
y_proba = pipe.predict_proba(X[:5])
y_proba

To save the whole pipeline, including the Keras model, use `pickle`.

## Callbacks

Adding a new callback to the model is straightforward. Below we show how to add an `EarlyStopping` callback to prevent overfitting.

In [None]:
es = keras.callbacks.EarlyStopping(monitor='val_binary_accuracy', mode='max', patience=200, verbose=1)
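
To make the `patience` and `mode='max'` arguments concrete, here is a simplified pure-Python model of the stopping rule: training stops once the monitored value has failed to improve on its best value for `patience` consecutive epochs. This sketch ignores options such as `min_delta` and `restore_best_weights`. The function name `stopping_epoch` and the accuracy values are hypothetical, and this is not the Keras implementation.

```python
def stopping_epoch(history, patience):
    """Return the 0-based epoch at which training would stop.

    If the rule never triggers, return the index of the last epoch.
    """
    best = float("-inf")
    wait = 0  # epochs since the last improvement
    for epoch, value in enumerate(history):
        if value > best:  # mode='max': larger is better
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(history) - 1


# Made-up validation accuracies, one per epoch.
val_accuracy = [0.60, 0.70, 0.80, 0.80, 0.79, 0.80, 0.85]
print(stopping_epoch(val_accuracy, patience=3))
print(stopping_epoch(val_accuracy, patience=10))
```

With `patience=3` the rule fires before the late improvement to 0.85 is seen, which illustrates the trade-off: a small `patience` saves time but can stop too early, while a large one (such as the 200 used above) is more conservative.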

We now generate a toy dataset using `sklearn.datasets.make_moons`. This dataset was chosen specifically to trigger early stopping.

In [None]:
from sklearn.datasets import make_moons

In [None]:
X, y = make_moons(n_samples=100, noise=0.2, random_state=0)
X.shape, y.shape

We will first check fitting without the callback and then with. We will compare the training time and final accuracy.

In [None]:
import time

In [None]:
# First test without the callback
clf = KerasClassifier(
    model=get_clf,
    loss="binary_crossentropy",
    optimizer="adam",
    hidden_layer_sizes=(500,),
    dropout=0.5,
    metrics=["binary_accuracy"],
    fit__validation_split=0.2,
    epochs=500,
    verbose=False,
)
start = time.time()
clf.fit(X, y)
print(f"Training time: {time.time() - start}")
print(f"Final accuracy: {clf.history_['val_binary_accuracy'][-1]}")  # get last value of last fit/partial_fit call

In [None]:
# Test with the callback
clf = KerasClassifier(
    model=get_clf,
    loss="binary_crossentropy",
    optimizer="adam",
    hidden_layer_sizes=(500,),
    dropout=0.5,
    metrics=["binary_accuracy"],
    fit__validation_split=0.2,
    epochs=500,
    verbose=False,
    callbacks=[es]
)
start = time.time()
clf.fit(X, y)
print(f"Training time: {time.time() - start}")
print(f"Final accuracy: {clf.history_['val_binary_accuracy'][-1]}")  # get last value of last fit/partial_fit call

For information on how to write custom callbacks, have a look at the [Advanced_Usage](https://nbviewer.jupyter.org/github/adriangb/scikeras/blob/master/notebooks/Advanced_Usage.ipynb) notebook.

## Usage with sklearn `GridSearchCV`

### Special prefixes

SciKeras allows direct access to all parameters passed to the wrapper constructors, including deeply nested routed parameters. This allows tuning of
parameters like `hidden_layer_sizes` as well as `optimizer__learning_rate`.

This is exactly the same logic that allows access to estimator parameters in `sklearn Pipeline`s and `FeatureUnion`s.

This feature is useful in several ways. For one, it allows you to set those parameters in the model definition. Furthermore, it allows you to set parameters in an `sklearn GridSearchCV` as shown below.

To differentiate parameters like `callbacks`, which are accepted by both `tf.keras.Model.fit` and `tf.keras.Model.predict`, you can add a `fit__` or `predict__` routing prefix respectively. Similarly, the `model__` prefix may be used to specify that a parameter is destined only for `get_clf`/`get_reg` (or whatever callable you pass as your `model` argument).
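
The prefix convention can be pictured as a simple filter on parameter names: the part before the first `__` names the destination, and the remainder is the argument name that gets passed on. The sketch below only illustrates that idea in plain Python. `route_params` is a hypothetical helper, not a SciKeras function, and the actual routing rules may differ in detail.

```python
def route_params(params, destination):
    """Select parameters addressed to `destination` and strip the prefix."""
    prefix = destination + "__"
    return {
        name[len(prefix):]: value
        for name, value in params.items()
        if name.startswith(prefix)
    }


params = {
    "epochs": 500,                      # no prefix: not routed by this helper
    "fit__validation_split": 0.2,       # only for fit
    "optimizer__learning_rate": 0.1,    # only for the optimizer
    "model__hidden_layer_sizes": (100,),
    "model__dropout": 0.5,
}

print(route_params(params, "fit"))
print(route_params(params, "model"))
print(route_params(params, "optimizer"))
```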

For more information on parameter routing with special prefixes, see the [Advanced Usage Docs](https://scikeras.org.readthedocs.build/en/latest/advanced.html#routed-parameters).

### Performing a grid search

Below we show how to perform a grid search over the learning rate (`optimizer__learning_rate`), the model's hidden layer sizes (`model__hidden_layer_sizes`), and the model's dropout rate (`model__dropout`).

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
clf = KerasClassifier(
    model=get_clf,
    loss="binary_crossentropy",
    optimizer="adam",
    optimizer__learning_rate=0.1,
    model__hidden_layer_sizes=(100,),
    model__dropout=0.5,
    verbose=False,
)

*Note*: We set the verbosity level to zero (`verbose=False`) to prevent too much print output from being shown.

In [None]:
params = {
    'optimizer__learning_rate': [0.05, 0.1],
    'model__hidden_layer_sizes': [(100, ), (50, 50, ), (33, 33, 33, )],
    'model__dropout': [0, 0.5],
}
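
A grid search tries every combination of the listed values, so the cost grows multiplicatively. The sketch below enumerates a grid of the same shape (two learning rates, three layer configurations, two dropout rates) with `itertools.product`. The unprefixed key names are placeholders for this example. With `GridSearchCV`'s default 5-fold cross-validation, each candidate is fitted five times, plus one final refit of the best candidate.

```python
from itertools import product

# Placeholder grid with the same shape as the search space above.
grid = {
    "learning_rate": [0.05, 0.1],
    "hidden_layer_sizes": [(100,), (50, 50), (33, 33, 33)],
    "dropout": [0, 0.5],
}

names = list(grid)
# One dict per combination of values, in the order product() yields them.
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

n_folds = 5  # default number of cross-validation folds in GridSearchCV
print(len(candidates))             # number of parameter combinations
print(len(candidates) * n_folds)   # cross-validation fits, excluding the refit
print(candidates[0])
```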

In [None]:
gs = GridSearchCV(clf, params, scoring='accuracy', n_jobs=-1, verbose=True)

In [None]:
gs.fit(X, y)

In [None]:
print(gs.best_score_, gs.best_params_)

Of course, we could further nest the `KerasClassifier` within an `sklearn Pipeline`, in which case we just prefix each parameter with the name of the pipeline step (e.g. `clf__model__hidden_layer_sizes`).
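
Assuming the classifier is the pipeline step named `clf`, as in the pipeline section earlier, the grid keys only need the step name prepended. A small helper can do that mechanically. `prefix_grid` is a hypothetical name written for this example.

```python
def prefix_grid(grid, step_name):
    """Prepend '<step_name>__' to every key of a parameter grid."""
    return {f"{step_name}__{name}": values for name, values in grid.items()}


params = {
    "model__hidden_layer_sizes": [(100,), (50, 50), (33, 33, 33)],
    "model__dropout": [0, 0.5],
}

pipe_params = prefix_grid(params, "clf")
for name in pipe_params:
    print(name)
```

The resulting dictionary could then serve as the parameter grid when the estimator handed to `GridSearchCV` is the pipeline rather than the bare classifier.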