
Model loading from byte buffer #50

Closed
jerinphilip opened this issue Mar 9, 2021 · 7 comments · Fixed by #55

@jerinphilip
Contributor

@XapaJIaMnu

For model load I see modifications to Service and BatchTranslator. Service is currently subclassed into a non-threaded implementation and a multithreaded one. Calls to initialize BatchTranslator happen in the respective constructors.

The model loading actually happens in BatchTranslator:

```cpp
graph_ = New<ExpressionGraph>(true);  // always optimize
auto prec = options_->get<std::vector<std::string>>("precision", {"float32"});
graph_->setDefaultElementType(typeFromString(prec[0]));
graph_->setDevice(device_);
graph_->getBackend()->configureDevice(options_);
graph_->reserveWorkspaceMB(options_->get<size_t>("workspace"));
scorers_ = createScorers(options_);
for (auto scorer : scorers_) {
  scorer->init(graph_);
  if (slgen_) {
    scorer->setShortlistGenerator(slgen_);
  }
}
graph_->forward();
```

So I'll need a createScorers (L32) from marian that accepts a byte buffer, which I pass all the way down from whichever implementation of Service requires it. The concurrent-queuing implementation is a bit ahead, and both messier and cleaner depending on the place (imo), but I'm taking responsibility for bringing it into sync once you have integrated your changes into main.

@XapaJIaMnu
Collaborator

XapaJIaMnu commented Mar 9, 2021

So... I have been working off main, and apparently that branch is very far from up to date. Sadly.
If you check https://github.com/browsermt/bergamot-translator/tree/load_as_byte_arr, this exposes the byte-array loading all the way to the top. Which branch do I need to merge this with so that I can write some example code?
I hope we are doing something about the redundant abstraction: AbstractTranslationModel can be merged with TranslationModel. I also didn't understand the purpose of the separate Service and ServiceBase, or of ServiceBase and BatchTranslator.

@jerinphilip
Contributor Author

jerinphilip commented Mar 10, 2021

I hope we are doing something about the redundant abstraction. AbstractTranslationModel can be merged with TranslationModel.

@abhi-agg wants the capability to integrate multiple translators. My understanding is this means more than marian (maybe another C++ translator, whatever). Rename TranslationModel -> MarianIntegrationModel : AbstractTranslationModel and its existence could be justified, I guess.

Which branch do I need to merge this with so that I can write an example code?

Critical bit: it is main. You're probably missing -DUSE_WASM_COMPATIBLE_SOURCES=off, without which everything is turned off; bergamot-translator runs the WASM build by default. I settled for the flag and the ability to switch modules to compile against the full code in my development work. Have a look at:

I also didn't understand the purpose of separate Service and ServiceBase

ServiceBase holds the elements common to the single-threaded and multi-threaded versions. They're not really clean per se, and are being adjusted into more meaningful abstractions in the concurrent-queuing branch (don't try to edit this branch; it's a work in progress).

ServiceBase and BatchTranslator

I put all of marian's translation code, which I grabbed from mts, in BatchTranslator. I'm not sure why you don't want them separate; I'd like to hear the alternative. I need to load and store a graph (data), so I need a class (not a function). Some helpers that could go in there work fine as free functions.

@kpu
Member

kpu commented Mar 10, 2021

I hope we are doing something about the redundant abstraction. AbstractTranslationModel can be merged with TranslationModel.

@abhi-agg wants the capability to integrate multiple translators. My understanding is this means more than marian (maybe another C++ translator, whatever). Rename TranslationModel -> MarianIntegrationModel : AbstractTranslationModel and its existence could be justified, I guess.

Model loading from bytes removes all pretense of being decoder independent and I told Mozilla that. A decoder-independent interface design can only be tested properly if two decoders are being integrated. Otherwise we're just gallivanting overengineers and should stop this now.

@abhi-agg
Contributor

Model loading from bytes removes all pretense of being decoder independent

@kpu IIUIC, if we want to integrate some other decoder in future we can accept their model files as bytes and the new interface (accepting bytes instead of files) would still work. Right? Am I missing something?

@kpu
Member

kpu commented Mar 10, 2021

At least two other toolkits, Sockeye and fairseq, are in Python. Your C++ abstraction layer would need substantial refactoring to support them. And it's not clear anybody will want a C++ abstraction layer so they can do JavaScript -> C++ API -> Python -> C++ toolkit backend (i.e. MXNet). Or, for that matter, if you use JoeyNMT written in JavaScript, you're not going to do JavaScript -> C++ API -> node running on top of WASM.

Toolkits would need some refactoring to accept binary files efficiently. And they are unlikely to agree on what those files are. Sockeye, for instance, normally treats models as directories with more files inside; it even has a separate file for the version number.

Please pick one path:

  1. Integrate one toolkit. Stop having two translation results.
  2. Integrate multiple toolkits and test your API design by showing that it scales to at least two of them.

There is no middle. Currently you have an API that pretends to support multiple toolkits but won't, and that is costing time in extra implementation doodads. I agree we shouldn't be coupling Marian classes like History in directly. But too many levels of abstraction, without hard evidence that the abstraction adds value, are killing productivity.

@XapaJIaMnu
Collaborator

@andrenatal @abhi-agg an example of how to load a binary model through a byte array can be found here for the bergamot version of the app:

void * model_bytes = bergamot::getBinaryModelFromConfig(options);

And here for the marian-translation-service version of the app:

void * model_bytes = bergamot::getBinaryModelFromConfig(options);

It requires this branch of marian which does some small fixes for loading intgemm models: https://github.com/browsermt/marian-dev/tree/binaryload_enable

Now, the current functionality does not reduce memory usage compared to the old approach. If you want the translator app to reuse the byte-array memory it has been provided, you also need to change the binary format to SSSE3 only, as shown in this pull request. (That would also mean regenerating all the binary files we distribute.) What is the call?

@kpu
Member

kpu commented Mar 18, 2021

Check for alignment and die if it isn't aligned.

@jerinphilip jerinphilip added the high-priority Needs to be done at the earliest label Mar 19, 2021
@kpu kpu closed this as completed in #55 Mar 22, 2021
@abhi-agg abhi-agg added this to the milestone 3 milestone May 12, 2021