
Model loading from byte buffer #50

Closed
jerinphilip opened this issue Mar 9, 2021 · 7 comments · Fixed by #55

@jerinphilip
Contributor

@XapaJIaMnu

For model load I see modifications to Service and BatchTranslator. Service is currently subclassed into a non-threaded implementation and a multithreaded one. Calls to initialize BatchTranslator happen in the respective constructors.

The model loading actually happens in BatchTranslator:

```cpp
graph_ = New<ExpressionGraph>(true);  // always optimize
auto prec = options_->get<std::vector<std::string>>("precision", {"float32"});
graph_->setDefaultElementType(typeFromString(prec[0]));
graph_->setDevice(device_);
graph_->getBackend()->configureDevice(options_);
graph_->reserveWorkspaceMB(options_->get<size_t>("workspace"));
scorers_ = createScorers(options_);
for (auto scorer : scorers_) {
  scorer->init(graph_);
  if (slgen_) {
    scorer->setShortlistGenerator(slgen_);
  }
}
graph_->forward();
```

So I'll need a createScorers (L32) from marian that accepts a byte buffer, which I pass all the way down from whichever implementation of Service requires it. The concurrent-queuing implementation is a bit ahead, and both messier and cleaner depending on the place (imo), but I'm taking responsibility for bringing it into sync once you have integrated your changes into main.

@XapaJIaMnu
Collaborator

XapaJIaMnu commented Mar 9, 2021

So... I have been working off main, and apparently that branch is very far from up to date. Sadly.
If you check https://github.com/browsermt/bergamot-translator/tree/load_as_byte_arr, this exposes the byte-array loading all the way to the top. Which branch do I need to merge this with so that I can write some example code?
I hope we are doing something about the redundant abstraction: AbstractTranslationModel can be merged with TranslationModel. I also didn't understand the purpose of the separate Service and ServiceBase, or of ServiceBase and BatchTranslator.

@jerinphilip
Contributor Author

jerinphilip commented Mar 10, 2021

I hope we are doing something about the redundant abstraction. AbstractTranslationModel can be merged with TranslationModel.

@abhi-agg wants the capability to integrate multiple translators. My understanding is this means more than marian (maybe another C++ translator, whatever). Rename TranslationModel -> MarianIntegrationModel : AbstractTranslationModel and its existence could be justified, I guess.

Which branch do I need to merge this with so that I can write an example code?

Critical bit: it is main. You're probably missing -DUSE_WASM_COMPATIBLE_SOURCES=off, without which everything is turned off; bergamot-translator runs the WASM build by default. I settled for the flag and the ability to switch modules to compile against the full code in my development work. Have a look at:

I also didn't understand the purpose of separate Service and ServiceBase

ServiceBase holds the elements common to the single-threaded and multi-threaded versions. They're not really clean per se, and are being adjusted into more meaningful abstractions in the concurrent-queuing branch (don't try to edit this branch; it's a work in progress).

ServiceBase and BatchTranslator

I put all of marian's translation code, which I grabbed from mts, in BatchTranslator. I'm not sure why you don't want them separate; I'd like to hear the alternative. I need to load and store a graph (data), so I need a class (not a function). Some helpers that could go in there work fine as free functions.

@kpu
Member

kpu commented Mar 10, 2021

I hope we are doing something about the redundant abstraction. AbstractTranslationModel can be merged with TranslationModel.

@abhi-agg wants the capability to integrate multiple translators. My understanding is this means more than marian (maybe another C++ translator, whatever). Rename TranslationModel -> MarianIntegrationModel : AbstractTranslationModel and its existence could be justified, I guess.

Model loading from bytes removes all pretense of being decoder independent and I told Mozilla that. A decoder-independent interface design can only be tested properly if two decoders are being integrated. Otherwise we're just gallivanting overengineers and should stop this now.

@abhi-agg
Contributor

Model loading from bytes removes all pretense of being decoder independent

@kpu IIUIC, if we want to integrate some other decoder in future we can accept their model files as bytes and the new interface (accepting bytes instead of files) would still work. Right? Am I missing something?

@kpu
Member

kpu commented Mar 10, 2021

At least two other toolkits, Sockeye and fairseq, are in Python. Your C++ abstraction layer would need substantial refactoring to support them. And it's not clear anybody will want a C++ abstraction layer so they can do JavaScript -> C++ API -> Python -> C++ toolkit backend (i.e. MXNet). Or, for that matter, if you use JoeyNMT written in JavaScript, you're not going to do JavaScript -> C++ API -> node running on top of WASM.

Toolkits would need some refactoring to accept binary files efficiently. And they are unlikely to agree on what those files are. Sockeye, for instance, normally treats models as directories with more files inside; it even has a separate file for the version number.

Please pick one path:

  1. Integrate one toolkit. Stop having two translation results.
  2. Integrate multiple toolkits and test your API design by showing that it scales to at least two of them.

There is no middle. Currently you have an API that pretends to support multiple toolkits but won't, and that is costing time in extra implementation doodads. I agree we shouldn't be coupling Marian classes like History in directly. But too many levels of abstraction, without hard evidence that the abstraction adds value, are killing productivity.

@XapaJIaMnu
Collaborator

@andrenatal @abhi-agg an example of how to load a binary model through a byte array can be found here for the bergamot version of the app:

void * model_bytes = bergamot::getBinaryModelFromConfig(options);

And here for the marian-translation-service version of the app:

void * model_bytes = bergamot::getBinaryModelFromConfig(options);

It requires this branch of marian which does some small fixes for loading intgemm models: https://github.com/browsermt/marian-dev/tree/binaryload_enable

Now, the current functionality does not reduce memory usage compared to the old approach. If you want the translator app to reuse the byte-array memory it has been provided, you also need to change the binary format to SSSE3 only, as shown in this pull request. (That would also mean regenerating all the binary files we distribute.) What is the call?

@kpu
Member

kpu commented Mar 18, 2021

Check for alignment and die if it isn't aligned.

@jerinphilip jerinphilip added the high-priority Needs to be done at the earliest label Mar 19, 2021
@kpu kpu closed this as completed in #55 Mar 22, 2021
@abhi-agg abhi-agg added this to the milestone 3 milestone May 12, 2021