Python: Saving and Loading a Vowpal Wabbit model with --save_resume and --cb_explore fails with RuntimeError: Model content is corrupted #3062
Comments
Hi @alxbar75, I was able to repro this on 8.10.1 also. But it seems as though this is fixed on master. Can you please try installing a wheel file produced by CI? See here for instructions: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Python#bleeding-edge-latest-commit-on-master
Hi @jackgerrits, I will give it a shot.
So I realized now why that helps. There's a new codepath in master (where cb is converted to cb_adf) that means this bug isn't hit, but it still exists there. I've found the bug and will put out a fix soon. I'll patch 8.10 and release 8.10.2 with the fix also.
@alxbar75 8.10.2 has now been released on PyPI
Tested vowpalwabbit-8.10.2. Model converged with "--save_resume" properly. |
Having this issue with 9.6.0. |
Oh, I see I was trying to use
Describe the bug
When initializing a vw model with --cb_explore and --save_resume, an Exception is thrown when saving with vw.save("model.vw") (this only happens if vw.learn("") was called beforehand). Not using "--save_resume" prevents the Exception, but model performance is not as good.
To Reproduce
Code example to reproduce:
Expected behavior
No Exception, model loaded and training can be continued.
Observed Behavior
Environment
vowpalwabbit==8.10.1
Python 3.8.5
Ubuntu 20.04.1
Additional context
I would like to perform online training with many saves and loads in a distributed environment (ideally without saving to a file on disk). Preferably, I would like to serialize the model as a binary string that can be sent around. However, to my understanding, the Python bindings only allow saving directly to a file, which is why I have the model written to a named temp file that I read back afterwards to retrieve a binary string.
Models that I serialized that way still converge, but the performance is not as good as without serialization. Using "--save_resume" leads to the aforementioned exception.
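The temp-file round trip described above could be sketched roughly as follows. This is not the original reproduction code; the helper names dump_to_bytes and load_from_bytes are hypothetical, and the helpers only assume a saver that takes a file path (such as vw.save) and a loader that takes a file path.

```python
import os
import tempfile
from typing import Any, Callable


def dump_to_bytes(save_fn: Callable[[str], None]) -> bytes:
    """Call save_fn(path) on a temporary file and return that file's bytes.

    save_fn is assumed to be something like vw.save (hypothetical usage).
    """
    fd, path = tempfile.mkstemp(suffix=".vw")
    os.close(fd)  # the saver opens the path on its own
    try:
        save_fn(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)  # never leave the model file behind


def load_from_bytes(blob: bytes, load_fn: Callable[[str], Any]) -> Any:
    """Write blob to a temporary file and return load_fn(path)."""
    fd, path = tempfile.mkstemp(suffix=".vw")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        return load_fn(path)
    finally:
        os.remove(path)
```

With the 8.10.x bindings, usage would presumably look like blob = dump_to_bytes(vw.save) on the sending side and load_from_bytes(blob, lambda p: pyvw.vw("--save_resume -i " + p)) on the receiving side; the exact argument string is an assumption and depends on how the model was originally configured.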