No apparent way to pickle a model from vowpalwabbit.sklearn_vw.VWClassifier #1040

Closed
dakami opened this issue Jul 16, 2016 · 6 comments

@dakami
dakami commented Jul 16, 2016

Pickling a fitted classifier gives you:

RuntimeError: Pickling of "vowpalwabbit.pyvw.vw" instances is not enabled (http://www.boost.org/libs/python/doc/v2/pickle.html)

This means I basically need to rebuild the classifier at runtime every time I want to run predictions (or, I suppose, use CRIU).

@pommedeterresautee
Contributor

pommedeterresautee commented Jul 19, 2016

I am not using the Scikit wrapper, just the Python one.
I made the same mistake as you.
You need to provide the model path as an option (-f) in the constructor, as you would with the C++ client. Check the Python examples to see where to put it.
You perform your learning, and when it is finished, you call the finish() method of your vw instance.
At that moment your model is written to disk.

To be clear, pickle and similar serializers don't work.
Lots of documentation is still missing for the Python part :(

Hope it helps.

@gramhagen
Contributor

#1052 adds save and load methods for the sklearn interface as an alternative to the -f option.

I was thinking it might be helpful to use sphinx to autogenerate api docs for the python build. Is that the kind of documentation that would be helpful?

@JohnLangford
Member

#1052 is in, so closing for now. Better documentation is very welcome from anyone who can contribute.

@lfleck
lfleck commented Jul 1, 2020

Is there any update on using pickle?
I'm defining multiple contextual bandits and trying to train them in multiple Python processes like this:

from vowpalwabbit import pyvw
from concurrent.futures import ProcessPoolExecutor

def dummy_train(model):
    pass

vw1 = pyvw.vw("--cb_explore 5 --interactions UUA --quiet --cover 4 -f test4.model")
vw2 = pyvw.vw("--cb_explore 5 --interactions UUA --quiet --cover 5 -f test5.model")
models = [vw1, vw2]

with ProcessPoolExecutor(max_workers=2) as executor:
    executor.map(dummy_train, models)

However, even with the "-f" option mentioned above, the original 'Pickling of "vowpalwabbit.pyvw.vw" instances is not enabled' error appears. The same holds for cloudpickle, which joblib, for example, uses for serialization.

Are there any suggested ways to run VowpalWabbit models in multiple python processes?

@jackgerrits
Member

@gramhagen did you enable pickling in the PR you just submitted? (#2368)

@gramhagen
Contributor

Yes, you can pickle the sklearn model. Under the hood this is accomplished using

pyvw.vw.save(file)
pyvw.vw(initial_regressor=file)

But the problem will persist here, since this code passes the pyvw.vw object itself (which is not picklable) to each new process. You need to encapsulate model creation in your dummy_train function and pass only the parameters you want to vary.

Example:

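from multiprocessing import Pool

from vowpalwabbit.pyvw import vw


def dummy_train(params):
    # Build the model inside the worker so that only the plain
    # parameter dict has to be pickled and sent to the process.
    model = vw(**params)
    ec = model.example("1 | a b c")
    model.learn(ec)
    # finish() writes the model to the path given by "f".
    model.finish()


if __name__ == "__main__":
    variants = (
        {"f": "model_0.vw", "quiet": True},
        {"f": "model_1.vw", "quiet": True},
    )
    # Each worker receives one parameter dict, not a vw object.
    with Pool(processes=2) as pool:
        pool.map(dummy_train, variants)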
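
For anyone who wants to pickle a wrapper of their own, the save-then-reload idea described above can be sketched with `__getstate__` and `__setstate__`. This is only an illustration under stated assumptions: `NativeModel` is a hypothetical stand-in for an unpicklable native handle (it is not the vowpalwabbit API), and `PicklableModel` is an invented name. The stand-in borrows only the two ideas mentioned in this thread, a `save(file)` method and an `initial_regressor=file` constructor argument.

```python
import os
import pickle
import tempfile


class NativeModel:
    """Hypothetical stand-in for a native handle; NOT the real pyvw.vw."""

    def __init__(self, initial_regressor=None):
        self.weights = {}
        if initial_regressor is not None:
            # Reload weights from a previously saved model file.
            with open(initial_regressor) as fh:
                for line in fh:
                    name, value = line.split()
                    self.weights[name] = float(value)

    def learn(self, feature, value):
        self.weights[feature] = self.weights.get(feature, 0.0) + value

    def save(self, path):
        with open(path, "w") as fh:
            for name, value in self.weights.items():
                fh.write(f"{name} {value}\n")

    def __reduce__(self):
        # Mimic a native object that refuses to be pickled.
        raise RuntimeError("Pickling of NativeModel instances is not enabled")


class PicklableModel:
    """Wrapper that pickles by saving the model to a file and reloading it."""

    def __init__(self):
        self.native = NativeModel()

    def __getstate__(self):
        # Save to a temporary file and ship its bytes instead of the handle.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            self.native.save(path)
            with open(path, "rb") as fh:
                return {"model_bytes": fh.read()}
        finally:
            os.remove(path)

    def __setstate__(self, state):
        # Write the bytes back out and rebuild the handle from that file.
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            with open(path, "wb") as fh:
                fh.write(state["model_bytes"])
            self.native = NativeModel(initial_regressor=path)
        finally:
            os.remove(path)
```

With a wrapper like this, `pickle.loads(pickle.dumps(wrapper))` yields a copy backed by a freshly loaded handle, while pickling the bare handle still raises. Whether this is preferable to building the model inside each worker depends on how large the saved model is, since its bytes are copied on every pickle.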
