Model loading from byte buffer #50
So... I have been working off main and apparently that branch is very far from up-to-date. Sadly.
@abhi-agg wants the ability to integrate multiple translators. My understanding is that this means more than marian (maybe another C++ translator or whatever). Rename TranslationModel -> MarianIntegrationModel : AbstractTranslationModel and its existence could be justified, I guess.
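For illustration, the rename could look roughly like the sketch below. Only the two class names come from this thread; the `translate` signature and the `makeMarianModel` helper are hypothetical, and the body is a placeholder rather than a real call into marian.

```cpp
#include <memory>
#include <string>

// Decoder-independent interface. The method signature is an assumption
// made for this sketch, not the project's actual API.
class AbstractTranslationModel {
 public:
  virtual ~AbstractTranslationModel() = default;
  virtual std::string translate(const std::string& source) = 0;
};

// Marian-backed implementation of the interface.
class MarianIntegrationModel : public AbstractTranslationModel {
 public:
  std::string translate(const std::string& source) override {
    // Placeholder: a real implementation would call into marian here.
    return source;
  }
};

// Hypothetical factory so callers only ever see the abstract type.
inline std::unique_ptr<AbstractTranslationModel> makeMarianModel() {
  return std::unique_ptr<AbstractTranslationModel>(new MarianIntegrationModel());
}
```

Whether this extra layer pays for itself depends on a second decoder actually being integrated.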
Critical bit: It is
ServiceBase has common elements between the single-threaded and multi-threaded versions. They're not really clean per se, and are getting adjusted to more meaningful abstractions in the concurrent-queueing branch (don't try to edit this branch, it's a work in progress).
I put all of marian's translation code which I grabbed from mts in
Model loading from bytes removes all pretense of being decoder independent and I told Mozilla that. A decoder-independent interface design can only be tested properly if two decoders are being integrated. Otherwise we're just gallivanting overengineers and should stop this now.
@kpu IIUC, if we want to integrate some other decoder in the future, we can accept their model files as bytes and the new interface (accepting bytes instead of files) would still work. Right? Am I missing something?
At least two other toolkits, Sockeye and fairseq, are in Python. Your C++ abstraction layer would need substantial refactoring to support them. And it's not clear anybody will want a C++ abstraction layer so they can do JavaScript -> C++ API -> Python -> C++ toolkit backend (i.e. MXNet). Or for that matter, if you use a toolkit written in JavaScript, you're not going to do JavaScript -> C++ API -> node running on top of WASM. Toolkits would need some refactoring to efficiently accept binary files. And they are unlikely to agree on what files. Sockeye for instance normally treats models as directories with more files inside. It even has a separate file for the version number. Please pick one path:
There is no middle. Currently you have an API that pretends to support multiple toolkits but won't, which is costing time in extra implementation doodads. I agree we shouldn't be coupling Marian classes like History in directly. But too many levels of abstraction, without hard evidence that the abstraction adds value, is killing productivity.
@andrenatal @abhi-agg an example of how to load a binary model through a byte array can be found here for the bergamot version of the app:
And here for the marian-translation-service version of the app:
It requires this branch of marian, which does some small fixes for loading intgemm models: https://github.com/browsermt/marian-dev/tree/binaryload_enable Note that the current functionality does not reduce memory usage compared with the old approach. If you want the translator app to re-use the byte array memory that has been provided, you also need to change the binary format to SSSE3 only, as shown in this pull request. (And that would also mean regenerating all binary files that we distribute.) What is the call?
Check for alignment and die if the buffer isn't aligned.
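A minimal sketch of such a check, assuming a 64-byte requirement (the real value depends on the binary format and the SIMD width in use). The names `isAligned`, `requireAligned` and `kAssumedAlignment` are made up for this example and are not part of marian or bergamot-translator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Assumed alignment requirement in bytes; hypothetical, adjust to the format.
constexpr std::size_t kAssumedAlignment = 64;

// True when ptr is a multiple of alignment. alignment must be a power of two.
inline bool isAligned(const void* ptr, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Abort with a message if the caller-provided model buffer is misaligned.
inline void requireAligned(const void* ptr,
                           std::size_t alignment = kAssumedAlignment) {
  if (!isAligned(ptr, alignment)) {
    std::fprintf(stderr, "Model buffer %p is not aligned to %zu bytes\n",
                 ptr, alignment);
    std::abort();
  }
}
```

Calling `requireAligned(buffer)` once, right before handing the bytes to the loader, keeps the failure loud and early instead of surfacing later as a crash inside SIMD code.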
@XapaJIaMnu
For model load I see modifications to Service and BatchTranslator. Service is currently subclassed as a NonThreaded implementation and a multithreaded one. Calls to initialize BatchTranslator happen in the respective constructors.
The model loading actually happens in BatchTranslator.
bergamot-translator/src/translator/batch_translator.cpp
Lines 26 to 39 in f17f02a
So I'll need createScorers (L32) from marian to accept a byte buffer, which I pass all the way down from whichever implementation of Service requires it. The concurrent-queuing implementation is a bit ahead and both messier and cleaner depending on places (imo), but I'm taking responsibility for bringing it in sync once you have integrated your changes in main.
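Roughly, the plumbing could look like the sketch below. The `Service` and `BatchTranslator` classes here are stripped-down stand-ins for the real ones, and `ModelBytes`, `Scorer` and `createScorersFromBuffer` are hypothetical names: the actual signature of marian's createScorers is not assumed.

```cpp
#include <cstddef>
#include <vector>

// Non-owning view over model bytes supplied by the caller (hypothetical type).
struct ModelBytes {
  const void* data;
  std::size_t size;
};

// Stand-in for a loaded scorer; only remembers how many bytes it was built from.
struct Scorer {
  std::size_t modelSize;
};

// Stand-in for the decoder-side loader. NOT marian's real createScorers.
inline std::vector<Scorer> createScorersFromBuffer(ModelBytes model) {
  std::vector<Scorer> scorers;
  if (model.data != nullptr && model.size > 0) {
    scorers.push_back(Scorer{model.size});
  }
  return scorers;
}

// The model is actually loaded here, from the bytes handed down by Service.
class BatchTranslator {
 public:
  explicit BatchTranslator(ModelBytes model)
      : scorers_(createScorersFromBuffer(model)) {}
  std::size_t numScorers() const { return scorers_.size(); }

 private:
  std::vector<Scorer> scorers_;
};

// Service only forwards the buffer; threaded and non-threaded variants
// would do the same in their respective constructors.
class Service {
 public:
  explicit Service(ModelBytes model) : translator_(model) {}
  const BatchTranslator& translator() const { return translator_; }

 private:
  BatchTranslator translator_;
};
```

Because the view is non-owning, the caller has to keep the byte array alive for as long as the translator uses it.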