
Initial native message iface #93

Merged: 67 commits into master from nativemsg_cli on Apr 13, 2022

Conversation

XapaJIaMnu
Owner

I will develop a more complete native message interface, with json communication that will accept commands from the browser. For now though it is a simple stdin listener...

To test, you can use the following Python code:

import ctypes
def encode_msg(message: str) -> bytearray:
	msglen = len(message.encode('utf-8'))
	len_in_bytes = bytes(ctypes.c_int(msglen))
	msg_in_bytes = bytes(message.encode('utf-8'))
	return len_in_bytes + msg_in_bytes

In [1]: encode_msg("Hello, world")
Out[1]: b'\r\x00\x00\x00Hello, world'

Then you get to experience some buffer overflow, because it's 1:30 AM and I will fix it tomorrow...

% echo "\x0c\x00\x00\x00Hello, world" | ./translateLocally -p
Loading model: English-German tiny
Native msg interface accepting messages...
Received message size of: 12
Received message size of: 12 with content: Hello, world
Hallo, Welt
Received message size of: 10
Received message size of: 10 with content: ���

Received message size of: 10
Received message size of: 10 with content: ����

Needless to say this is an early draft and has a lot of space for improvements.
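For reading replies back in a tester, the counterpart to encode_msg would look roughly like this. This is only a sketch, assuming replies carry the same 4-byte native-endian length prefix that requests use (which is what the browsers' native messaging protocol expects):

import struct
import sys

def read_msg(stream=sys.stdin.buffer) -> str:
	# Read the 4-byte length prefix (native byte order, same as ctypes.c_int above),
	# then exactly that many bytes of UTF-8 payload.
	raw_len = stream.read(4)
	if len(raw_len) < 4:
		raise EOFError("stream closed before a full length prefix arrived")
	(msglen,) = struct.unpack("=I", raw_len)
	return stream.read(msglen).decode('utf-8')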

I propose that this client will accept the following json encoded commands:

  1. What models do you have?
  2. What remote models do you have?
  3. Load model X.
  4. Download and load model Y.
  5. Translate.
  6. Die.

It will also respond to all of them with some well-formatted JSON that we can read in the browser.
@kpu @jelmervdl @jerinphilip please comment on the JSON.

@jerinphilip

jerinphilip commented Feb 11, 2022

I really appreciate the initiative and the first draft here, which gives us something to begin with, discuss, and synchronize ideas around. I have tried to respond in a more general sense below, as this is still an early stage.

This will need to be coordinated with at least @jelmervdl. There needs to be parity with the existing WebAssembly extension code (we can have both with minimal overhead, we don't need to sunset one then try other stuff). I'm certain we can drop the remote models feature in the first iteration.

A desired feature in translateLocally is translations by language code (mentioned by @kpu already). The extension will most likely communicate in terms of language codes (@jelmervdl has expressed interest in doing so on Slack, if I remember correctly; if I were writing the JS, this is how I'd go). A "query by language to obtain model" feature with a wider scope is not strictly required when focusing on the extension, but it is perhaps the one that fits best (browsermt/bergamot-translator#320; we can pool our efforts to make something query-able and better).

On message content, we need to discuss both directions. For an MVP, I think we only need to translate and translateLocally should do the rest (model loading, going from lang-code to model, etc). Note that we will still need the download, etc you've proposed in code, just that we're not allowing extension client interaction with those verbs.

Input

{
  "src": "en", 
  "tgt": "de",
  "id": "<id>",
  "input": "The quick brown fox jumps over the lazy dog.",
  "options": {
    "html": false,
    "quality": true,
    "alignments": true
  }
}

Output: on success

{
  "status": "ok",
  "id": "<id>",
  "response": "<toJSON(Response)>"
}

Output: on failure

{
  "status": "error",
  "id": "<id>",
  "message": "<err.what()>"
}

Edit: Included id following comment below.
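Putting such a request on the wire is then just the UTF-8 JSON behind the usual length prefix. A hypothetical snippet, reusing encode_msg from the description (proc stands for a subprocess.Popen of translateLocally; the flush matters so the message actually reaches the other end):

import json

request = {
	"src": "en",
	"tgt": "de",
	"id": "1",
	"input": "The quick brown fox jumps over the lazy dog.",
	"options": {"html": False, "quality": True, "alignments": True}
}
proc.stdin.write(encode_msg(json.dumps(request)))
proc.stdin.flush()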

std::cin.read(input.get(), ilen);
std::cerr << "Received message size of: " << ilen << " with content: " << input.get() << std::endl;
translator_->translate(QString(input.get()));
eventLoop_.exec(); // Do not read any input before this translation is done

So we're doing serial? Perhaps we will allow multiple inputs and opts in this case then?

Owner Author


I was imagining JS doing the batching, but as you and Jelmer pointed out, it's probably best to do it on the C++ side. Will change.

@jelmervdl
Collaborator

In my extension I skipped the "load model" and "download & load model" messages because that's state that would need to be synced between both sides that are talking. However, it is useful to have these to "pre-load" the model before you start sending translation input. So let's keep them, but make sure that multiple calls to them, i.e. when the model is already loaded, don't cause trouble. Same for calling a download & load model when that specific model (and version) is already downloaded.

Some message for getting a list of available models is great. If that doesn't scale (because there are too many models once you include remote repositories, maybe?) you could limit it by searching for source and target languages (lists for each side, though, because the extension often has a list of languages it thinks the page could be in, and the same for languages the user might like).

I don't think there is need for a die message. I'd say closing stdin is enough of a signal.

Let's not use MarianInterface here. That one is specifically optimised for latency, and cancels all intermediate translation requests (and cannot batch multiple translation requests together), which is not ideal for plugin-type usage.

I would also like to propose making all messages async. I.e. each message received by this endpoint has a message id, and the response will also have that same message id, but there is no guarantee that the order of requests and responses is the same. This way we can

  1. make much better use of AsyncService etc.,
  2. do better batching on the C++ side instead of the javascript side,
  3. have translation responses returned as fast as they're available (instead of having a BlockingService-like API),
  4. implement translation request cancellation at some point.
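To make the id matching concrete from the client's point of view, a minimal sketch (send_json and on_response are hypothetical stand-ins for whatever framing and field names the extension ends up using):

import itertools

next_id = itertools.count()
pending = {}  # id -> callback waiting for the matching response

def send_request(text, model, callback):
	request_id = next(next_id)
	pending[request_id] = callback
	send_json({"id": request_id, "text": text, "model": model})  # hypothetical field names

def on_response(message):
	# Responses may arrive in any order; the id ties each one back to its request.
	pending.pop(message["id"])(message)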

@kpu
Collaborator

kpu commented Feb 11, 2022

Yes, async is better.

While there should be a managed default per language pair, language codes shouldn't be the sole identifier of a model. We already have speed variants and there will soon be OPUS overlap too. Search by pair makes sense here: that way pivoting and multilingual models can be hidden from the client rather than enumerating a long list of possibilities.

A load message is good to reduce latency when typing happens. I recommend combining the "load model" and "download & load model" messages so they look identical from the client's perspective. And the message should be idempotent, as Jelmer suggests. The response could be download progress, loaded, or error.
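As a sketch of what that combined message and its replies could look like (field names here are purely hypothetical, not a settled protocol):

# Ask for a model by language pair; translateLocally downloads it if needed,
# loads it if not already loaded, and does nothing if it already is (idempotent).
load_request = {"command": "load", "id": 7, "src": "en", "tgt": "de"}

# Possible replies, all carrying the same id:
#   {"id": 7, "status": "progress", "downloaded": 0.42}
#   {"id": 7, "status": "loaded"}
#   {"id": 7, "status": "error", "message": "no model available for en -> de"}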

@XapaJIaMnu
Owner Author

Suggestions heard, stand by later today for async variants. @jelmervdl I will keep using marianIface for now and replace it at a later point.

The python tester allows quickly testing some changes. Extra work to be done. Pending.
The old translation method will not work and I am not sure why. We need to create a new service anyway, but this is a problem for the Nick of tomorrow.
…n requests. Lock on stdout writing via callback

Early versions. Several issues. Number one is termination. Is there a way of knowing when all service work is done, so that we can safely die without dropping any queued translations?
@XapaJIaMnu
Owner Author

First MVP-able version. It reads and writes JSON, including a message id (although I think I don't set the id yet, but that's easy to fix).

What is missing:

  1. Termination condition. Is there a way of knowing when the service has finished all of its work and doesn't have any pending threads? I remember @jelmervdl had similar concerns when implementing "quit immediately". At the moment, if a termination signal is sent, all threads are killed immediately and any in-flight translations will never arrive.

  2. Constructing models. Currently everything is hardcoded. Need to implement a model change, which is all conditioned on 1).

  3. What json do we actually send back. Response is a pretty big object, do we need all of its entries?

@jelmervdl
Collaborator

jelmervdl commented Feb 14, 2022

Is there a way of knowing when service has finished all of its work and doesn't have any pending threads?

AsyncService's destructor will block until all translation requests are processed and all workers have finished their tasks. So you can just let it go out of scope (or reset a unique ptr or something, anything that causes its destructor to be called) and after that you're safe.

Think:

int main() {
  {
    AsyncService service;

    // models: some map from model name to a loaded model, kept alive by shared_ptrs
    Message message;
    while (getMessage(std::cin, message)) {
      if (!models.count(message.model))
        models[message.model] = loadModel(message.model);

      // Capture by value: the callback runs later, on a worker thread.
      service.translate(models[message.model], message.source, [id = message.id](Response &&response) {
        static std::mutex mutex;           // serialise concurrent writes to stdout
        std::scoped_lock lock(mutex);
        std::cout << toJSON(id, response);
      });
    }

    // now service will be deallocated, which will block until it finished.
    // At that point all requests are handled, all callbacks called.
  }

  std::cerr << "Done!\n";
  return 0;
}

Constructing models. Currently everything is hardcoded. Need to implement a model change, which is all conditioned on 1).

With AsyncService you don't need to wait for the service to finish. You can just load up another model, and when queueing new translation requests use that one. If you want to release models that are no longer used you can release its shared pointer (all pending requests that still need it will have a shared_ptr to it, so it will be released once that's safe to do.) (but how do you know that the next request won't need them)

What json do we actually send back. Response is a pretty big object, do we need all of its entries?

For practical purposes, I'd only need {id: Number, target: String} at the moment. But if you want to use alignment info, you'd also want source and alignments obviously. But if you expose id + target text, I can try to plug the thing into the firefox extension as an experiment.

@jerinphilip

jerinphilip commented Feb 14, 2022

I have the same inputs as @jelmervdl. However, I want to do something "incremental" for the following:

What json do we actually send back. Response is a pretty big object, do we need all of its entries?

For practical purposes, I'd only need {id: Number, target: String} at the moment.

I gave this some thought when initially designing Response, referring to https://github.com/mozilla/bergamot-translator/wiki/Unified-API-Example, keeping a JSON export in mind, and preparing https://github.com/jerinphilip/lemonade/blob/7eca7e425b027b445a0047b1191d2989f9b0494c/lemonade/lib/json_interop.h#L45-L87. I propose going with the following for toJSON(Response) in the first iteration:

{
	"source": {"text": "..."},
	"target": {"text": "..."}
}

We may drop the "source" key in the first iteration but the structure needs to incrementally grow and mimic C++ Response, staying consistent across JSON, C++, WebAssembly and Python bindings.

I have also opened browsermt/bergamot-translator#350 to make WebAssembly consistent with the JSON so both can operate easily.

@XapaJIaMnu
Owner Author

@jerinphilip , we had a discussion offline with @jelmervdl about the stdout buffer size: https://stackoverflow.com/questions/10904067/in-c-whats-the-size-of-stdout-buffer
It's much smaller than I thought, and it is also difficult to judge what the size is going to be on different platforms and different browsers. Currently, each worker executes a callback that writes to stdout before picking up a new translation. A bad client that sends many small requests but doesn't read stdout before all data is sent will cause our translation threads to stall until the buffer is cleared.

This is why I am questioning what strictly needs to be in the json. It is otherwise relatively trivial for me to put extra things in there.

@jelmervdl
Collaborator

jelmervdl commented Feb 14, 2022

Oh but it is more fun than that! I just noticed:

The maximum size of a single message from the application is 1 MB. The maximum size of a message sent to the application is 4 GB.

Source: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging#app_side

At least Firefox seems to be doing concurrent reading & writing using async IO, so I don't expect a deadlock of the "it can't read from stdin while it is still trying to write to stdout" kind on our side. (I haven't looked at other browsers.)

this.readPromise = this.proc.stdout
      .readUint32()
      .then(len => {
        if (len > NativeApp.maxRead) {
          throw new ExtensionError(
            `Native application tried to send a message of ${len} bytes, which exceeds the limit of ${NativeApp.maxRead} bytes.`
          );
        }
        return this.proc.stdout.readJSON(len);
      })
      .then()

Source: https://searchfox.org/mozilla-central/source/toolkit/components/extensions/NativeMessaging.jsm#165.

So it seems that long term we might need to add some sort of multipart response messaging, which I'm really dreading. How are you going to break up a large alignments table that in itself exceeds 1 MB when encoded as JSON? Because the messages still have to each be valid JSON 😭

For an MVP I doubt that a Firefox extension will hit that limit, it wants small chunks of translated text anyway to make it feel more like loading a progressive JPEG.

Edit: to clarify, for an MVP I think we can get away with locking/writing to stdout from the callback directly. Long term we'll need to move to some queue-based thing where there is a thread that reads the queue, encodes the responses into one or more messages, etc. We can then decide whether we want a bounded or unbounded queue (bounded -> stop translating if nobody reads stdout, unbounded -> continue translating, hoping that someone starts reading stdout before we run out of memory).
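A rough sketch of the bounded variant, in Python only for brevity (the real thing would live on the C++ side): translation callbacks push framed responses into a fixed-size queue, a single writer thread drains it to stdout, and when nobody reads stdout the queue fills up and translation naturally stalls instead of buffering responses in memory.

import queue
import sys
import threading

responses = queue.Queue(maxsize=16)  # bounded: producers block when it is full

def writer():
	while True:
		msg = responses.get()   # framed response bytes, e.g. from encode_msg()
		if msg is None:         # sentinel to shut the writer down
			break
		sys.stdout.buffer.write(msg)
		sys.stdout.buffer.flush()

threading.Thread(target=writer, daemon=True).start()

# A translation callback would then call responses.put(framed_reply),
# blocking (and therefore throttling translation) once the queue is full.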

Edit 2: I do agree with Jerin that it would be good to have a matching output across most APIs. I don't see any problems with basically following bergamot-translator's Response object. But it looks like the native messaging API might need an extra layer of protocol on top of it to deal with the message size limits. I'll also have a look around how others deal with this issue.

@jerinphilip

jerinphilip commented Feb 14, 2022

How are you going to break up a large alignments table that in itself exceeds 1mb when encoded as JSON?

To be clear, I am ideally not looking at exporting the hard-alignments table. But life is simpler if I have a JSON representation of the struct, so I quickly know what to edit. Alignments are in the Response for HTML consumption; we're just choosing not to clear them out because the WebAssembly VM would already have allocated them.

The simpler version of what I'm trying to say is: keep the keys you want from the existing composition of the Response structure, and leave the remaining ones out for later addition if possible.

target: String vs {"target": {"text": "..."}} is not going to make much of a difference w.r.t. size or stdout issues, but the consistency will at least help me develop easily.

@XapaJIaMnu
Owner Author

If we want the NativeInterface response JSON to 100% resemble Response itself, the proper thing to do would be to add a toJson() method to Response. That way a change in the API would not require code changes in multiple places.

@jelmervdl
Collaborator

If we want the NativeInterface response JSON to 100% resemble Response itself, the proper thing to do would be to add a toJson() method to Response. That way a change in the API would not require code changes in multiple places.

True, except that ideally we don't want to deal with JSON strings in most interface implementations. The Python bindings convert a response to native python objects. WASM hides the serialisation/deserialisation in the emscripten bindings (which does a lot on demand instead of up front). The part where we communicate with a JSON string is specific to this interface. Just the names of things, and how they are nested, is the same across all interfaces.

@XapaJIaMnu
Owner Author

@jelmervdl I see. The Response struct has multiple sub-structs, and some of them have sub-structs of their own. Is there one place where I can view the whole schema of the structure?

@jelmervdl
Collaborator

I don't have one 😅 Maybe @jerinphilip does. Things like the Annotation part of an AnnotatedText I have never used outside C++, so I don't know what the interface to that would be. Internally it works with a sort of byte offset array, but I'm not sure we would want to expose that implementation detail via JSON… I suppose figuring this out is worth an issue on its own.

For now, really the only important bit is {"target": {"text": String}}. That matches what is in Response, and is the only thing necessary to test implementations.

@XapaJIaMnu
Owner Author

@jelmervdl currently it looks like:

./native_client.py                                                                                                                                                               
(None, None)
b'/\x00\x00\x00{"text": "Hello world!", "id": 0, "die": false}e\x00\x00\x00{"text": "Sticks and stones may break my bones but words WILL NEVER HURT ME!", "id": 1, "die": false}?\x00\x00\x00{"text": "Why does it not work well...", "id": 2, "die": false}'
Loading model: English-German tiny
{
"id": 2,
"target": {
"text": "Warum funktioniert es nicht gut..."
}
}

{
"id": 0,
"target": {
"text": "Hallo Welt!"
}
}

{
"id": 1,
"target": {
"text": "Stöcke und Steine können mir die Knochen brechen, aber Worte werden NIEMALS HURT ME!"
}
}

Tomorrow, I will add model loading functionality.

Removed service shared pointer and put it inside the thread. Clean shutdown. Removed synchronisation with C-style iostream. Modified the JSON.
@jelmervdl
Collaborator

Few tweaks, but I'm getting somewhere with this!
[screenshot]

(Don't worry about that à situation, that might be because I'm giving it HTML to translate without ResponseOptions.HTML=true)

…ifest.json extension-id

No way to change that. So easy hack for now: if you see the extension-id as an argument, switch to nativeMessaging.
And flush, otherwise the message won't make it directly to Firefox
I need HTML in the extension because there is no non-HTML mode anymore
@jelmervdl
Collaborator

I've pushed some additions that were necessary to get it to work with Firefox, but this now works with https://github.com/jelmervdl/firefox-translations/tree/native-messaging (given you go to a page that matches the model that translateLocally selected)

Wrapped one string in tr, removed unneeded headers and forward defines and used constexpr if template instead of overloaded functions
@jelmervdl
Collaborator

This code should now work with the latest version of my Firefox plugin. Be sure to switch to TranslateLocally in the extension settings.

(I have only tested it on macOS so far, but will test it with Windows and Ubuntu when I have the time for those VMs to boot up, install Firefox in them, etc.)

@jelmervdl
Collaborator

Native messaging looks broken on Windows. If I try the python test script there, it does not read the full input. So either the length at the start of the message is incorrect, or it writes more bytes than the reported length. I don't think it is an issue on the Python side since Firefox also reports not being able to parse the returned JSON data.

std::cout doesn't automatically try to convert newlines or anything, right?

@XapaJIaMnu
Owner Author

Might be an issue due to Windows using UTF-16 by default?

@jelmervdl
Collaborator

Apparently Windows does do automatic line-ending conversion on files opened in text mode, and stdin/stdout/stderr are all opened in text mode by default. See also https://stackoverflow.com/questions/16888339/what-is-the-simplest-way-to-write-to-stdout-in-binary-mode#16888382
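For reference, the usual guard on the Python side of a test script looks something like this (a sketch; the C++ side needs the equivalent switch of its own stdin/stdout to binary mode):

import os
import sys

if sys.platform == "win32":
	import msvcrt
	# Stop Windows from rewriting '\n' as '\r\n' on the pipes.
	msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
	msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)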

@jelmervdl
Collaborator

jelmervdl commented Mar 31, 2022

I fixed the binary/text mode conversion on Windows. The python script now runs through its tests without issue 🥳

The debug build on Windows runs into another issue though, one that I have no clue about… I'll wait for GitHub CI to build a release build to see whether that one does work. But if anyone has seen anything like this before, let me know:
[screenshot]

Edit: my wild guess is somewhere std::isspace is called with a (partial) unicode char. But the main TranslateLocally GUI does not complain when I translate stuff with umlauts, neither in the input nor output.

Edit 2: it is somewhere in the HTML bit, which is why it doesn't trigger in TranslateLocally GUI mode

@jelmervdl
Collaborator

Okay, found that as well. Waiting for browsermt/bergamot-translator#396 to be merged, and then I'll pull that into one final commit to get browser translations on Windows to work out of the box.

Documentation is explicit about only calling it with unsigned char, and Windows' C++ runtime is checking for that.
@jelmervdl
Collaborator

I've confirmed that the native client now works with Firefox and the browser extension out of the box (well, you have to start TranslateLocally once) on Windows 10, macOS 12.3 and Ubuntu 20.10. 🎊

[screenshot]

@XapaJIaMnu
Owner Author

We should add the native message configuration to the README.

@XapaJIaMnu
Owner Author

@jelmervdl, could you please add instructions on how to use the native message interface to the README.md, and then we can merge this.

@jelmervdl
Collaborator

I added a brief description of the native messaging functionality. It is mostly just pointers to where to find the information you'd need if you want to develop for it. Let me know whether you think anything is missing.

@XapaJIaMnu
Owner Author

I think this is fine for now. We can do further PRs, like the one that would fix #99, later on. I'm in favour of merging it.

@XapaJIaMnu XapaJIaMnu merged commit 50db857 into master Apr 13, 2022
@XapaJIaMnu XapaJIaMnu mentioned this pull request Apr 13, 2022
@jelmervdl jelmervdl deleted the nativemsg_cli branch July 4, 2022 19:05