PicklingError on compute with HyperbandSearchCV #549

fonnesbeck · 2019-10-01T23:19:23Z

I'm attempting to do a hyperparameter search using HyperbandSearchCV on a PyTorch model that has been wrapped with skorch, but am running into a failure when I call fit:

Exception: PicklingError("Can't pickle <class '__main__.DNNRegressor'>: it's not the same object as __main__.DNNRegressor")

The exception does not seem to make sense.

My model is a subclass of torch.nn.Module that is just a deep neural network regressor, and this has been wrapped by a skorch NeuralNetRegressor as follows

dnnr = NeuralNetRegressor(
    module=DNNRegressor,
    module__n_feature=len(NUMERIC_COLUMNS),
    module__n_hidden=128,
    module__n_output=1,
    module__dropout_rate=0.5,
    criterion=torch.nn.MSELoss,
    device=device
)

Any obvious reason for this to be happening?

Running dask_ml 1.0.0, skorch 0.6.0 and pytorch 1.1.0 on a GCS instance.


stsievert · 2019-10-02T00:43:03Z

That looks like an error caused by ambiguous class definitions: Python doesn't know which object to pickle. Here's a good SO question: https://stackoverflow.com/questions/1412787/picklingerror-cant-pickle-class-decimal-decimal-its-not-the-same-object

Basically, either remove the %load_ext autoreload and %autoreload 2 lines from the notebook, or put the definition of DNNRegressor in a separate module/Python file.

I'd be surprised if this is an issue with skorch: https://skorch.readthedocs.io/en/stable/user/save_load.html
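The "not the same object" failure can be reproduced with the standard library alone. The sketch below is only an illustration under one assumption: that something (a re-run notebook cell, or autoreload) has rebound the class name while an older object still points at the original class. The names stale_cls and can_pickle are made up for this example.

```python
import pickle


class DNNRegressor:  # first definition, e.g. the first run of a notebook cell
    pass


stale_cls = DNNRegressor  # some existing object still refers to this class


class DNNRegressor:  # re-running the cell rebinds the name to a brand-new class
    pass


def can_pickle(obj):
    """Return True if pickle.dumps succeeds, False if it raises PicklingError."""
    try:
        pickle.dumps(obj)
        return True
    except pickle.PicklingError:
        return False


# pickle looks the class up again by module and name; the lookup no longer
# returns the object it was asked to serialize, so it refuses to pickle it.
stale_ok = can_pickle(stale_cls)
```

Moving the class into an importable file sidesteps this because the name is then bound once, at import time, rather than each time a cell runs.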

TomAugspurger · 2019-10-16T19:01:13Z

@fonnesbeck have you had a chance to look into this again? I was recently able to use HyperbandSearchCV with Skorch.

fonnesbeck · 2019-10-16T19:18:15Z

Yes, removing my model class from the notebook and putting it into a Python file did the trick. The error message will continue to confuse users, though.

TomAugspurger · 2019-10-16T19:23:25Z

I suspect there's not much we can do about it, since it's an error from Python about a different package. We just happen to hit it here since dask needs to pickle things to move them around :/
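For context on why the location of the class definition matters at all: the standard pickle module stores a class as a reference (module name plus class name), not as a copy of its code. A small standard-library sketch, using collections.OrderedDict purely as a convenient importable class:

```python
import pickle
import pickletools
from collections import OrderedDict

# Pickle the class object itself (not an instance), using protocol 4.
payload = pickle.dumps(OrderedDict, protocol=4)

# The payload holds the module and class names plus an opcode that tells the
# unpickler to import the module and look the name up again.
opcodes = [op.name for op, arg, pos in pickletools.genops(payload)]

restored = pickle.loads(payload)
```

The receiving process therefore has to be able to import the same name and get an equivalent class back, which is not guaranteed for classes defined interactively in __main__.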

mrocklin · 2020-08-05T21:20:11Z

Would this be a good place for us to build custom serialization? Is there an obvious subclass for all of these and a clean way of serializing them?

(I also ran into this)

TomAugspurger · 2020-08-05T21:41:06Z

Does anyone have a reproducible example? This doesn't trigger it:

from distributed import Client
import torch
import skorch
import pickle


client = Client()


class DNNRegressor(torch.nn.Module):
    pass

dnnr = skorch.NeuralNetRegressor(
    module=DNNRegressor,
    module__n_feature=128,
    module__n_hidden=128,
    module__n_output=1,
    module__dropout_rate=0.5,
    criterion=torch.nn.MSELoss,
)

pickle.loads(pickle.dumps(dnnr))

client.scatter([dnnr], broadcast=True)

Do I need Hyperband to reproduce the problem?

jrbourbeau · 2020-08-05T22:06:35Z

Here's a reproducer

from distributed import Client
from dask_ml.model_selection import HyperbandSearchCV
from dask_ml.datasets import make_classification
import torch
import skorch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from skorch import NeuralNetRegressor
from scipy.stats import loguniform, uniform


client = Client()

X, y = make_classification(chunks=(10, -1))


class HiddenLayerNet(nn.Module):
    def __init__(self, n_features=10, n_outputs=1, n_hidden=100, activation="relu"):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_outputs)
        self.activation = getattr(F, activation)

    def forward(self, x, **kwargs):
        return self.fc2(self.activation(self.fc1(x)))


niceties = {
    "callbacks": False,
    "warm_start": True,
    "train_split": None,
    "max_epochs": 1,
}

model = NeuralNetRegressor(
    module=HiddenLayerNet,
    module__n_features=X.shape[1],
    optimizer=optim.SGD,
    criterion=nn.MSELoss,
    lr=0.0001,
    **niceties,
)


params = {
    "module__activation": ["relu", "elu", "softsign", "leaky_relu", "rrelu"],
    "batch_size": [32, 64, 128, 256],
    "optimizer__lr": loguniform(1e-4, 1e-3),
    "optimizer__weight_decay": loguniform(1e-6, 1e-3),
    "optimizer__momentum": uniform(0, 1),
    "optimizer__nesterov": [True],
}

search = HyperbandSearchCV(model, params, random_state=2, verbose=True, max_iter=2)
search.fit(X, y)

mrocklin · 2020-08-05T22:51:58Z

It is odd that it requires HyperbandSearchCV though. We might try various combinations of scatter/submit


TomAugspurger · 2020-08-06T12:00:43Z

It's almost like the class is being mutated, by hyperband or someone else? I'll look a bit today.

TomAugspurger · 2020-08-06T13:24:50Z

Nothing on the pickling yet, but a couple of updates to James' reproducer, based on using Client(processes=False):

- We're using skorch.NeuralNetRegressor, so the data should come from make_regression().
- Something in torch wants int32 / float32, so astype to those.
- I think torch wants y to be (n_samples, 1), so reshape to that.

from distributed import Client
from dask_ml.model_selection import HyperbandSearchCV
from dask_ml.datasets import make_classification, make_regression
import torch
import skorch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from skorch import NeuralNetRegressor
from scipy.stats import loguniform, uniform


client = Client(processes=True)

X, y = make_regression(chunks=(10, -1))
y = y.reshape(-1, 1).astype("float32")
X = X.astype("float32")


class HiddenLayerNet(nn.Module):
    def __init__(self, n_features=10, n_outputs=1, n_hidden=100, activation="relu"):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_outputs)
        self.activation = getattr(F, activation)

    def forward(self, x, **kwargs):
        return self.fc2(self.activation(self.fc1(x)))


niceties = {
    "callbacks": False,
    "warm_start": True,
    "train_split": None,
    "max_epochs": 1,
}

model = NeuralNetRegressor(
    module=HiddenLayerNet,
    module__n_features=X.shape[1],
    optimizer=optim.SGD,
    criterion=nn.MSELoss,
    lr=0.0001,
    **niceties,
)


params = {
    "module__activation": ["relu", "elu", "softsign", "leaky_relu", "rrelu"],
    "batch_size": [32, 64, 128, 256],
    "optimizer__lr": loguniform(1e-4, 1e-3),
    "optimizer__weight_decay": loguniform(1e-6, 1e-3),
    "optimizer__momentum": uniform(0, 1),
    "optimizer__nesterov": [True],
}

search = HyperbandSearchCV(model, params, random_state=2, verbose=True, max_iter=2)
search.fit(X, y)

But still seeing the

_pickle.PicklingError: Can't pickle <class '__main__.HiddenLayerNet'>: attribute lookup HiddenLayerNet on __main__ failed

with that.

TomAugspurger · 2020-08-07T14:47:11Z

This rabbit hole keeps on going. I don't fully understand the issue, but the original exception came from trying to pickle model.module_. That's set when model.initialize() is called, and does the equivalent of model.module().to("cpu").

>>> model.module().to("cpu")
HiddenLayerNet(
  (fc1): Linear(in_features=10, out_features=100, bias=True)
  (fc2): Linear(in_features=100, out_features=1, bias=True)
)

That's an instance of the interactively defined class. Apparently something (cloudpickle? Dask?) has trouble serializing those when they're attributes of another object.

Anyway, we can get around that by serializing it separately

import cloudpickle

import skorch
# The registries are importable from distributed.protocol outside the package
from distributed.protocol import dask_serialize, dask_deserialize


@dask_serialize.register(skorch.NeuralNet)
def serialize_skorch(x):
    has_module = hasattr(x, "module_")
    headers = {"has_module": has_module}
    if has_module:
        module = x.__dict__.pop("module_")
        try:
            frames = [cloudpickle.dumps(x), cloudpickle.dumps(module)]
        finally:
            x.__dict__["module_"] = module
    else:
        frames = [cloudpickle.dumps(x)]

    return headers, frames


@dask_deserialize.register(skorch.NeuralNet)
def deserialize_skorch(header, frames):
    model = cloudpickle.loads(frames[0])
    if header["has_module"]:
        module = cloudpickle.loads(frames[1])
        model.module_ = module
    return model

But now we face a trickier problem. Hyperband calls copy.deepcopy(model), which invokes torch.save, which eventually tries to pickle the interactively defined HiddenLayerNet, which pickle can't serialize (though cloudpickle can).
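The deepcopy path can be imitated without torch or skorch. This is a rough sketch, not skorch's actual code: it assumes the pickling happens inside the estimator's __getstate__ (the thread only establishes that deepcopy ends up in torch.save), and uses pickle.dumps as a stand-in for torch.save. Wrapper, make_local_instance, deepcopy_failed and copied are all hypothetical names.

```python
import copy
import pickle


class Wrapper:
    """Hypothetical stand-in for an estimator that pickles a sub-object in __getstate__."""

    def __init__(self, inner):
        self.inner = inner

    def __getstate__(self):
        state = self.__dict__.copy()
        # Stand-in for torch.save: plain pickle, which needs an importable class.
        state["inner"] = pickle.dumps(state["inner"])
        return state

    def __setstate__(self, state):
        state = dict(state)
        state["inner"] = pickle.loads(state["inner"])
        self.__dict__.update(state)


def make_local_instance():
    class Local:  # defined inside a function, so pickle cannot find it by name
        pass

    return Local()


# deepcopy calls __reduce_ex__, which calls __getstate__, which pickles `inner`.
try:
    copy.deepcopy(Wrapper(make_local_instance()))
    deepcopy_failed = False
except (pickle.PicklingError, AttributeError):
    # The exact exception type for local classes varies across Python versions.
    deepcopy_failed = True

# With a plainly picklable attribute the same deepcopy goes through.
copied = copy.deepcopy(Wrapper([1, 2, 3]))
```

So even though deepcopy itself never pickles anything, an object whose state hook serializes internally inherits pickle's restrictions.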

I'd hoped that the line

model = deepcopy(model)

(dask-ml/dask_ml/model_selection/_incremental.py, line 101 in 6eac8a0) could be changed to sklearn.base.clone, but that's failing some tests. Will need to look more later.

VibhuJawa mentioned this issue Feb 2, 2022

[WIP] Enable Skorch+Dask-ML dask/distributed#5748

