-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collapse draft API and actual implementation #77
Comments
On Feasibility of Collapse
Reasons to maybe retain Unified API So, unified API, while it was told this was library to combine multiple translator apps - the true intentions I can understand now to be a JS conversion layer, given that's where WASM bindings are set up. The classes are envisioned in JavaScript and translated without much thought for people on the other side to C++ (lacking constructors and missing constructs). So the API sits both in C++ and JS at the same time. So long as WASM direction for the project exist, there will be shackles on the classes exposed to WASM, and I'd rather these be on UnifiedAPI and not Because:
In general my worries are (1) WASM constraining the classes I am developing on, (2) Lack of timely feedback from Mozilla despite being a stakeholder (like what happened with Alignments PR). If there can be resolutions in both, I'm happy to collapse. |
@jerinphilip The primary purpose of this repo is integration. Get on with it. If you're just coding stuff for fun and expecting Mozilla to do all the integration, then there is almost zero value added from your code. |
I am in for merging the API class and the implemented class and I propose following:
Please correct me if I am wrong anywhere above. I am open for discussion on these points 👍 @jerinphilip Above comments are by no means a criticism of your work. I just want to list here the concerns from the perspective of the API consumer so that both of us can work together and resolve them if we go ahead with this task 👍 Please leave Alignments and Quality estimation out of the whole conversation while discussing this issue because (IIUC) we want to make this change before having these features in the |
I should be let finish #66 following the alignments PR (#46).
Umm.. No, at least for
You need to, for incoming features. That's the whole point. With this suggestion I run around incomplete types and we are back to square 1. This is the problem which gets worse as we go forward. Extension is using isAlignmentSupported, TranslationRequest at places despite not having a concrete object which is passed around. This is what needs to stop soon.
I'd claim otherwise - marian::bergamot, Translation*.
No, I wait another month or maybe more and do more work overall while you still work with incomplete types? Merge it in, it's harmless (doesn't break anything since it operates in marian layer), use the Response with complete concrete types in your efforts to establish feature-parity between |
As I said above, I am flexible and open for proposals on how we should go about this change. Just want to agree on 1 strategy and execute this task to save everyone's time and energy. |
The following needs to happen before a collapse to save everyone's time and energy.
I suggest we do the collapse in a 3 step approach:
@abhi-agg It needs to be acceptable to you that software can do a bunch of iterations until it becomes stable. If you remember, I salvaged the existing |
Edit: The following was my proposition for a collapse, in topological sorted order based on dependencies. I'm collapsing it to not bloat this thread. Click to expand
Preliminary cleanup and provision of concrete classes
Establish equivalencyTranslationModel - Service
I hope the above resolves TranslationResult - Response
Collapse
|
Good.
I agree it would be less work to implement the alignment and quality estimating passing once rather than twice.
The changes should be focused. i think by "single PR" you're really getting at it shouldn't have other stuff changed in them. Multiple steps towards this i.e. removing Vocab from Service as a focused change could happen, yes?
Agreed. I think removing those should be a first step.
I agree it should just be what the client needs. The consumer shouldn't be providing all that stuff in the constructor; the Aside: I don't know why it's
This is a merger. Let's take the language of dominance out of this. I don't care if the result of the merger begins with
I am embarrassed by the documentation of
There will be changes to the API. After all, Mozilla asked for binary model loading which exists in C++ code and you want alignments. That said, I won't want to come in with a wrecking ball for the sake of it and want to be stable. |
Just to be sure, it means we make this change and merge it before we merge Alignment+QE PR. Right?
Yes and yes.
I wrote in my comment before stating all the points that it is just a proposal and I am flexible and open for discussion on these points. Using the language of dominance wasn't my intention at all. I am sorry if it came out that way.
I am fine with removing the Translation* classes and let the
As you also said that these changes need to happen, I see following tasks at hand:
3rd is clearly my responsibility (but it can happen only after 1st and 2nd are done). I can do 1st as well. I would like to know if I can contribute in some way for 2nd. If I missed any task in this list then please feel free to add it here 👍 |
Towards #77. Response previously required marian-deep objects like histories and vocabs for construction. It has been decided that this structure needs to be exposed and should not have marian-internals. This commit is the final nail in the coffin of Response as a marian object concept. The construction responsiblity of Response is now moved to a ResponseBuilder class.
Towards #77. Reduces Request to require only (Id, Segments, ResponseBuilder). Request is an internal structure for Service and *will not be* exposed. However, the constructor was in fact dreadful and can now be cleaned up to simplify that ResponseBuilder nicely abstracts away the transition process from Request -> Response.
Towards #77. We improve doxygen compatible documentation for the external API exposed by Service. There are constructors using marian internal structures (like Options). But there is also a provision of a string (current Unified API subsitute for Options) based constructor and should suffice for a client who wants to not know of marian internals.
Towards #77. Brings in the concept of ResponseOptions, which should be the intended outcome of TranslationRequest, providing additional options along with the source string describing how to construct Response. It's currently typedef-ed to Ptr<marian::Options>, and subject to change if and when UnifiedAPI can bring it's own proper equivalent. Through options, ResponseBuilder now has if-guards to construct QualityScores or Alignments making it a no-op if the options are unset, bringing in the desired feature, albeit with a reinvented wheel structure which is not improper.
Towards #77. - What was earlier translateWithOptions is now renamed to translate, making it a method overload based on arguments. - A new translateMultiple is provided, which is blocking. This allows the browser to sent separate vector<text> which is queued as a whole, but each text maps to a Request each for which there is a Response. This is closer to what exists in TranslationModel as of now, except there are speed gains from properly batching. translateMultiple accepts TranslationRequest as well, although it does nothing with it.
Towards #77. This updates TranslationModel to use the new Service::translateMultiple(...) API to efficiently pack batches by compiling multiple Requests and using only using one translate-dispatch call for all Requests. Due to changes in batching/ordering and consequently difference in approximations after quantization, the outputs are now different from those earlier (breaking bergamot-translator-app regression-tests) but correct nonetheless (regression-tests output now need updating).
Towards #77. This updates TranslationModel to use the new Service::translateMultiple(...) API to efficiently pack batches by compiling multiple Requests and using only using one translate-dispatch call for all Requests. Due to changes in batching/ordering and consequently difference in approximations after quantization, the outputs are now different from those earlier (breaking bergamot-translator-app regression-tests) but correct nonetheless (regression-tests output now need updating).
Towards #77. This function is added back to provide a state of the source during collapse where nothing is expected to break at the extension, dropping the UnifiedAPI from the loop. It is certain that extension cannot be using it anywhere despite it's existence due to the functionality missing.
@kpu @jerinphilip Please let me know which PRs belong to this issue so that I can review them 👍 On a side note, I noticed that you already merged the Alignment+QE to main branch although we agreed that we should intercept that change after this issue is resolved. If you are planing to do things differently than what we agreed here, a heads up would be nice 👍 |
In order:
|
Do I need to review this one (it was created 6 days ago and I am not a reviewer there)? In general (not just specific to this PR), if you want me to review any of the PRs then it would be great if you can add me as a reviewer explicitly. It would be more time efficient 👍
Both of these PRs are marked as draft. I take it as a WIP from your side. So not ready for review? |
Towards #77. Response previously required marian-deep objects like histories and vocabs for construction. It has been decided that this structure needs to be exposed and should not have marian-internals. This commit is the final nail in the coffin of Response as a marian object concept. The construction responsiblity of Response is now moved to a ResponseBuilder class.
Towards #77. Reduces Request to require only (Id, Segments, ResponseBuilder). Request is an internal structure for Service and *will not be* exposed. However, the constructor was in fact dreadful and can now be cleaned up to simplify that ResponseBuilder nicely abstracts away the transition process from Request -> Response.
Towards #77. We improve doxygen compatible documentation for the external API exposed by Service. There are constructors using marian internal structures (like Options). But there is also a provision of a string (current Unified API subsitute for Options) based constructor and should suffice for a client who wants to not know of marian internals.
Towards #77. Brings in the concept of ResponseOptions, which should be the intended outcome of TranslationRequest, providing additional options along with the source string describing how to construct Response. It's currently typedef-ed to Ptr<marian::Options>, and subject to change if and when UnifiedAPI can bring it's own proper equivalent. Through options, ResponseBuilder now has if-guards to construct QualityScores or Alignments making it a no-op if the options are unset, bringing in the desired feature, albeit with a reinvented wheel structure which is not improper.
Towards #77. - What was earlier translateWithOptions is now renamed to translate, making it a method overload based on arguments. - A new translateMultiple is provided, which is blocking. This allows the browser to sent separate vector<text> which is queued as a whole, but each text maps to a Request each for which there is a Response. This is closer to what exists in TranslationModel as of now, except there are speed gains from properly batching. translateMultiple accepts TranslationRequest as well, although it does nothing with it.
Towards #77. This updates TranslationModel to use the new Service::translateMultiple(...) API to efficiently pack batches by compiling multiple Requests and using only using one translate-dispatch call for all Requests. Due to changes in batching/ordering and consequently difference in approximations after quantization, the outputs are now different from those earlier (breaking bergamot-translator-app regression-tests) but correct nonetheless (regression-tests output now need updating).
The following class pairs should collapse into one class.
Service
andTranslationModel
Response
andTranslationResult
Request
andTranslationRequest
My understanding is the
Translation
classes came from the "Unified API" which was a draft of what the API could look like. During that exercise, we agreed that the API would change with actual implementation. It always does. An API written without implementation inevitably does not capture the full complexity of the task. It also adds unnecessary complexity with functionality that neither the backend nor front end actually uses. Moreover, it does not consider what is efficient to pass and there's no point to multiple format conversions.Ideally a skeleton API class should be subsumed into the implemented class. There will be no converter between them, which is just a source of bugs.
My understanding is nobody is calling alignments and quality estimation, so it's easiest to merge there.
@jerinphilip Let's avoid arguing about casing for file names and try to avoid reinvention where possible.
@abhi-agg A working implementation should not be taxed because its design differs.
@motin @mlopatka @andrenatal Happy to discuss.
The text was updated successfully, but these errors were encountered: