Removes vocabs and propagates fixes for breaks #79

jerinphilip · 2021-03-31T14:55:15Z

This PR removes sourceVocab() and targetVocab() exposed by Service, as requested by Mozilla, to move the interface towards a cleaner API.

sourceVocab() is unused and harmless to remove. Removing targetVocab() breaks the existing marian-decoder-new and, consequently, the regression tests. Further changes in the PR propagate the vocab change to provide Response-based output printing in the regression-test-app. For the time being we have lost the ability to benchmark completely against marian-decoder, as the new OutputCollector equivalent is a dumbed-down version which only prints lines to stdout.
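A minimal sketch of what Response-based printing can look like from the caller's side. The `MockResponse` type and the `printLines` helper are hypothetical stand-ins for illustration, not the bergamot-translator API; the only point is that a caller holding already-decoded text needs no vocabulary from Service.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a Response: it already carries decoded target
// sentences, so the caller never has to decode with a vocabulary.
struct MockResponse {
  std::vector<std::string> targetSentences;
};

// Writes one translated sentence per line, the way a simple line-printing
// collector would. Returns the number of lines written.
std::size_t printLines(const MockResponse &response, std::ostream &out) {
  std::size_t count = 0;
  for (const std::string &sentence : response.targetSentences) {
    out << sentence << '\n';
    ++count;
  }
  return count;
}
```

Passing `std::cout` as `out` gives the line-printing behaviour described above; passing a `std::ostringstream` makes the same function easy to check in a test.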

Keeping the changes minimal to reduce review load (edit: oops, sorry, accidentally removed a bit of marian::Histories in the process). Marking as ready for review.

Regression tests (2/4) are failing for bytearray loads for bergamot-translator-app and service. This is already known and will be fixed with a submodule update.

Regression tests for service-cli, which now tests the preliminary alignment and annotations, will succeed once this PR merges. bergamot-translator-app stays intact, so no harm done.

Checklist

Prettify diff to not run over unnecessary lines.
Regression-tests - basic: [1/4]
Regression-tests - speed: [1/1]

jerinphilip · 2021-04-01T04:46:37Z

There is a potential bug in main. If we attempt to insert a sentence corresponding to an empty std::vector<string_view>, we end up with an invalid sentence in the flat container of string_views, where I additionally mark sentence boundaries. This breaks on line 1695 of WNGT20 sources.shuf, where a sentence of invalid characters is provided and the source vocab eats it. The resulting target is empty, with no string_view, which causes issues with word counts and also with sentence boundaries.

For now, I have pushed in the string_view corresponding to EOS to get WNGT20 to complete, which seems the simplest fix. It looks harmless for the time being. I have strengthened the unit tests in the debugging process (these are the few additional commits).

This is handling invalid user-input - not something strictly wrong with the container (Annotation).
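The failure mode and the hotfix can be sketched as follows. This is a simplified illustration under stated assumptions: `FlatAnnotation`, `addSentence`, and the `eosView` parameter are hypothetical names, not the actual Annotation class. Sentence boundaries are stored as end indices into one flat vector, and an empty sentence gets the caller-supplied EOS view so it still occupies a slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, simplified flat container: the tokens of all sentences live in
// one vector, and sentenceEnds_[i] is the index one past sentence i's last token.
class FlatAnnotation {
 public:
  // Adds a sentence. If it has no tokens (for example, invalid input that the
  // vocabulary swallowed), store eosView instead so that the sentence still
  // has a well-defined, non-empty range in the flat vector.
  void addSentence(const std::vector<std::string_view> &tokens,
                   std::string_view eosView) {
    if (tokens.empty()) {
      flat_.push_back(eosView);
    } else {
      flat_.insert(flat_.end(), tokens.begin(), tokens.end());
    }
    sentenceEnds_.push_back(flat_.size());
  }

  std::size_t numSentences() const { return sentenceEnds_.size(); }

  std::size_t numWords(std::size_t sentenceIdx) const {
    return sentenceEnds_[sentenceIdx] - sentenceBegin(sentenceIdx);
  }

  std::string_view word(std::size_t sentenceIdx, std::size_t wordIdx) const {
    return flat_[sentenceBegin(sentenceIdx) + wordIdx];
  }

 private:
  // A sentence begins where the previous one ended; the first begins at 0.
  std::size_t sentenceBegin(std::size_t sentenceIdx) const {
    return sentenceIdx == 0 ? 0 : sentenceEnds_[sentenceIdx - 1];
  }

  std::vector<std::string_view> flat_;
  std::vector<std::size_t> sentenceEnds_;
};
```

With this layout an empty input still counts as one sentence holding a single (possibly empty) view, so word counts and boundary lookups stay defined for the sentences that follow it.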

kpu · 2021-04-06T12:42:26Z

Focused PRs please. This appears to be both fixing annotations and removing vocabs. Separate them.

jerinphilip · 2021-04-06T12:52:29Z

@kpu #85. I can only test the changes here, which is why they're in source. Can you review that PR first?

(I discovered the bug here and cherry-picked the commits, separating them into a different PR at #85. Once they're in source, this diff will simplify.)

kpu · 2021-04-06T16:53:44Z

By ordering these you've made it much harder to get in. You can simplify the diff here now.

jerinphilip · 2021-04-06T19:32:19Z

> By ordering these you've made it much harder to get in. You can simplify the diff here now.

I have decoupled the changes to simplify the diff. This source breaks marian-decoder-new and the speed tests bound to it.

abhi-agg · 2021-04-07T10:13:45Z

@jerinphilip Does it make sense to merge main to this PR? It seems to be out of date with main. Although, I don't know whether it will impact the review or not 👍

kpu · 2021-04-07T10:19:36Z

@abhi-agg I hit the magic merge button, please review

jerinphilip · 2021-04-07T10:41:01Z

> Does it make sense to merge main to this PR?

Well, main has no bugs beyond those that already exist with this in, so... It just breaks marian-decoder-new on invalid inputs, thanks to sentencepiece, that's all. The fix is in #85, which after hours of struggle I believe cannot be improved much further.

jerinphilip · 2021-04-07T11:15:52Z

I'll take advantage of this situation and work the other way: I'll pull this source into the annotation-bugfix branch through main.

Jerin Philip added 5 commits March 31, 2021 14:45

Removes vocabs and propagates fixes for breaks

138a032

Merge branch 'main' into jp/marian-decoder-new-output-collector

457fc69

Conflicts:
src/translator/response.cpp -> Bringing to sync with ByteArray updates
src/translator/response.h -> Bringing to sync with ByteArray updates

Prettify diff: Undoing comment shuffles due to merge conflict edits

28eb098

20% of time actual work, 80% prettifying diff

2a6b5b2

Histories members -> poof!

63da9bd

We however have Histories in constructor, which we will remove out of the way soon.

jerinphilip requested review from kpu and abhi-agg March 31, 2021 18:41

Jerin Philip added 3 commits April 1, 2021 04:14

Changing Annotation to adhere to [begin, end)

ba03154

Stronger unit tests on sentences + num words, num sentences

a315dc0

Hotfix with empty string view from EOS

0239a5b

Jerin Philip added 4 commits April 1, 2021 10:29

No more absolving empty-sentence; Added tests now defined behaviour

4970a62

Uncommenting important section in unit test

7e2b664

Ensure empty string view default, beginning at end so marker points

d8d89f3

Further strengthen and comment unit-tests, mark exactly where empty s…

f10d6d2

…entence is happening

jerinphilip added the cleanup (Something that can be refactored or better organized) label Apr 1, 2021

jerinphilip marked this pull request as draft April 1, 2021 20:31

Merge branch 'main' into jp/marian-decoder-new-output-collector

8385bd0

jerinphilip marked this pull request as ready for review April 2, 2021 20:24

jerinphilip mentioned this pull request Apr 3, 2021

Cleanup API: Refactor request on-complete transition #80

Merged

jerinphilip changed the base branch from main to jp/strengthen-annotation April 3, 2021 20:29

jerinphilip changed the base branch from jp/strengthen-annotation to main April 3, 2021 20:31

jerinphilip linked an issue Apr 4, 2021 that may be closed by this pull request

Make marian-decoder-new consume Response instead of Histories #66

Closed

jerinphilip added the mod: marian (Changes affecting marian-dev component) label Apr 4, 2021

jerinphilip mentioned this pull request Apr 6, 2021

Collapse draft API and actual implementation #77

Closed

Jerin Philip added 2 commits April 6, 2021 19:27

Revert "Further strengthen and comment unit-tests, mark exactly where…

3449f03

… empty sentence is happening" This reverts commit f10d6d2.

Revert "Ensure empty string view default, beginning at end so marker …

3c88dc1

…points" This reverts commit d8d89f3.

Jerin Philip added 6 commits April 6, 2021 19:28

Revert "Uncommenting important section in unit test"

4d7ef75

This reverts commit 7e2b664.

Revert "No more absolving empty-sentence; Added tests now defined beh…

3599bed

…aviour" This reverts commit 4970a62.

Revert "Hotfix with empty string view from EOS"

4b952c9

This reverts commit 0239a5b.

Revert "Changing Annotation to adhere to [begin, end)"

6ad5f78

This reverts commit ba03154.

Merge branch 'jp/marian-decoder-new-output-collector' of github.com:b…

3e4dff6

…rowsermt/bergamot-translator into jp/marian-decoder-new-output-collector

Revert "Stronger unit tests on sentences + num words, num sentences"

e1b4719

This reverts commit a315dc0.

Merge branch 'main' into jp/marian-decoder-new-output-collector

e5bc34d

kpu approved these changes Apr 7, 2021

abhi-agg approved these changes Apr 7, 2021

jerinphilip merged commit b71b3a1 into main Apr 7, 2021

jerinphilip deleted the jp/marian-decoder-new-output-collector branch April 7, 2021 23:11

jerinphilip mentioned this pull request Apr 29, 2021

WASM Bindings collapse #87

Merged

4 tasks
