Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Histories from Response #53

Closed
jerinphilip opened this issue Mar 10, 2021 · 2 comments · Fixed by #80
Closed

Remove Histories from Response #53

jerinphilip opened this issue Mar 10, 2021 · 2 comments · Fixed by #80
Assignees
Labels
cleanup Something that can be refactored or better organized mod: marian Changes affecting marian-dev component

Comments

@jerinphilip
Copy link
Contributor

Moving from #50.

Given alignments and quality-scores have been extracted in #46, there is potentially no further use for keeping histories_. (@kpu let know if we need anything more from histories) This allows removing histories_, any lazy constructions I was keen on keeping before and the remaining data-members are not strictly marian internal anymore.

This could pave way for replacing TranslationResult (and thus not having two translation results) mentioned in #50. However, TranslationResult remains constrained by WASM limitations (like https://github.com/mozilla/bergamot-translator-old/issues/14), which I'm not sure if we want Response to be constrained by as well. Response is the intended output struct for marian-server replacement.

Assigning task to self.

/cc @kpu, please advise on the WASM situation with TranslationResult.

@jerinphilip jerinphilip self-assigned this Mar 10, 2021
@kpu
Copy link
Member

kpu commented Mar 10, 2021

I'm happy about the removal of Histories and collapse of result objects!

Until they respond to https://github.com/mozilla/bergamot-translator-old/issues/14 just use absl::string_view for the time being.

jerinphilip pushed a commit that referenced this issue Mar 18, 2021
Helps #53.

In preparation , the data-export types for Quality and Alignment are
pushed down to Response from TranslationResult and computed during
construction. This brings TranslationResult closer to Response, paving
way to avoid having two TranslationResults.

histories_ only remain for marian-decoder replacement usage, which can
be removed in a separate PR.
@jerinphilip
Copy link
Contributor Author

jerinphilip commented Mar 18, 2021

Given they've responded, and above commit for #46 is in, part of solving this issue is covered.

Skeleton of remaining work:

  • Move construction logic of Response outside (to maybe a Factory) to be triggered when Request completes. Response can after be reduced to a struct, just carrying data: source, target and respective annotations, Quality and Alignments. This will remove vocabs, and histories_ dependency in Response constructor. (Being tracked in Callback on Request completion to construct Response #65)
  • Not having histories_ will break functionality at replacement-decoder for benchmarks and running speed tests. Adjust the marian-decoder replacements to consume processed from history data (translation + annotation). Remove OutputCollector dependency. (Being tracked in Make marian-decoder-new consume Response instead of Histories #66)

Probably after this, there is nothing offending in Response class (no vocabs, no histories) to be even directly exported to WASM (through bindings). The removal of histories and moving construction out make Response equivalent to (desired) TranslationResult.

@jerinphilip jerinphilip added cleanup Something that can be refactored or better organized mod: marian Changes affecting marian-dev component labels Mar 27, 2021
kpu pushed a commit that referenced this issue Mar 31, 2021
* Draft adjustments to API

* Adjustments to docs

* Let's call the word + sentence ranges annotations

* Editing confusing comment on size()

* Fixing compilation for template adjustments for SentenceRanges

* string_view template hacks

This commit shifts AnnotatedBlob into a templated type and gets the
troubled part to compile. All to manage absl::string_view and
std::string_view.

Objective: marian::bergamot stays C++ 11 to pluck and put in marian
code, bergamot-translator somehow flexes C++17. Simplify development in
one place.

* Fixing the wiring: Gets source to build

Runtime errors exist, but AnnotatedBlobs are consistent.

* Bugfix: Matching old-state after factoring AnnotatedBlob in

* Removing vocabs_ from Response.

(For the umpteenth time).

* Alignment API ready in marian::bergamot::Response

* Wiring alignments upto TranslationResult

* Adjustment to get alignments; bergamot-translator-app has alignments available

* Accessing words instead of Ids

This code sets up access of word string_views from annotations instead
of printing Ids. However, we have segfault. This is likely due to
targetRanges not being set, pending from
#25.

Could also be a rogue EOS token which we're filtering for in string_view
annotations, but not so in alignments.

* Switching to browsermt/marian-dev@jp/decode-string-view for targetTokenRanges

* Target word byte range annotations available

Issues corresponding to #25 should be resolved. There is still a
segfault. Could be due to EOS. Pending investigation.

* Bugfix: Tokens for alignments are now through.

Was not EOS.

* browsermt/marian-dev@master

ByteRange changes work downstream and has been merged to master.
Updating submodule to point to master.

* Style and documentation enhancements: response.cpp

* Style and documentation enhancements: TranslationResult.h

* Descriptions for SentenceRanges templating

* Switching to marian-dev@wasm-sync

* AnnotatedBlob can be copy-ctord/copy-assigned

* TranslationResult: Empty ctor + WASM Bindings

Allows empty construction of TranslationResult. Using this empty
constructor, WASM bindings are adjusted. Unsure of the results, maybe
@abhi-agg can test.

* Cosmetic: SentenceRangesT -> Annotation

- SentenceRangesT is renamed to AnnotationT;
- Further comments to explain heavily templated files.

* Response: Cleaning up unused members and adding docs

* Adding quality scores - attempt

* Stub QualityScores

This adjustment adds capability to get "scores", which should
potentially indicate how confident (at least relative in a
target-sentence) should be. This enables writing the code forward for
TranslationResult, and an example quality-score people can be pointed
at.

- These are not between [0,1] yet.
- In addition, guards to check out-of-bounds access have been placed so
  illegal accesses are caught early on during development.

* Removing token debug statements

* Reworking Annotation without templates

mozilla#8 provides
ByteRanges.

- This ByteRange data-type is used in Annotation and converted
  to marian::string_view(=absl::string-view) on demand.
- Since Annotation[using ByteRange] is not bound to anything else, it
  can be unit tested. A unit test is added (originally to test
  independently for integration after).
- Annotation with ByteRange is now propogated across marian::bergamot
  and functionality matched to how it was previously working.

This eliminates the string-view conversion and template code.

* Nit: Removing std::endl flushes

* Bring TranslationResult and Response closer

Helps #53.

In preparation , the data-export types for Quality and Alignment are
pushed down to Response from TranslationResult and computed during
construction. This brings TranslationResult closer to Response, paving
way to avoid having two TranslationResults.

histories_ only remain for marian-decoder replacement usage, which can
be removed in a separate PR.

* Clean up hacks originally added for a unit-test to compile

* Moving Annotation functions to cpp and documenting header file

* Shifting alignments, qualityScore testing capability into main-mts

* Restore Unified API files to previous state

* Adaptations to fix Response with Quality, Alignments to connect to old Unified API

* Missing reset on TranslationResultBindings

* Cleaning up Response documentation to reflect newer code

* Minor adjustments to get build back after main sync

* Marian seems to make available Catch somehow

* Disable COMPILE_BERGAMOT_TESTS for WASM

* Add COMPILE_BERGAMOT_TESTS as a CMakeDependent option

* Use the COMPILE_TESTS flag instead to skip macos.yml

* Trigger unit-tests on GitHub runners for Annotation

* Reordering enable_testing() to before inclusion of test directory

* doc constructs required to operate with alignments

Documents with doxygen compatible documentation for Response,
AnnotatedBlob, Annotation, ByteRange.

Incorporates doxygen compatible documentation for

* Updates ByteRange consistent with general C++

Also little documentation enhancements in the process.

* Updating marian-dev@9337105

* Copy-paste documentation because lazy

* Turn off autoformat and manually edit to fix style changes

* AnnotatedBlob -> AnnotatedText; blob -> text

* text.text in test app renamed

* text of text -> blob of text in places of documentation
@jerinphilip jerinphilip linked a pull request Apr 4, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Something that can be refactored or better organized mod: marian Changes affecting marian-dev component
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants