Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed quality-scores as HTML tag attributes #358

Merged
merged 14 commits into from
Feb 25, 2022
Merged

Embed quality-scores as HTML tag attributes #358

merged 14 commits into from
Feb 25, 2022

Conversation

jelmervdl
Copy link
Member

Extended fix for #355.

This is #357 but with the quality scores also exposed as <font x-bergamot-sentence-score=""> and <font x-bergamot-word-score=""> tags in the HTML output. It makes the HTML output itself look pretty horrific, but the resulting rendered HTML can easily be styled to show the scores.

Example: Have you seen my <strong>car</strong> recently? I lost it.

Italian: <font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.273196">Hai</font> <font x-bergamot-word-score="-0.0245442">visto</font> <font x-bergamot-word-score="-0.193084">la</font> <font x-bergamot-word-score="-0.00968757">mia</font></font> <strong><font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.131326">auto</font></font></strong> <font x-bergamot-sentence-score="-0.112859"><font x-bergamot-word-score="-0.157974">di</font> <font x-bergamot-word-score="-0.000199795">recente?</font></font> <font x-bergamot-sentence-score="-0.18783"><font x-bergamot-word-score="-0.224375">L&apos;ho</font> <font x-bergamot-word-score="-0.151286">perso.</font></font>

With Javascript or CSS you could pretty easily have some interface based on those extra attributes. Or we could change the attributes to something like style="--x-bergamot-word-score: -0.4" and you can use CSS to directly have it control a background colour or something. But that would be a bit harder to parse with Javascript.

Example:
image

The bindings still expose the byteranges as they previously did, because otherwise I would also have to expose AnnotatedText etc. which is apparently not something we're doing right now. Because the bindings create a new score vector on the fly, you'll have to delete it when you no longer use it, unfortunately.
Wraps sentences and words in `<font>` tags to annotate them with x-bergamot-sentence-score and  x-bergamot-word-score
@jelmervdl jelmervdl added the experimental Experimental stuff, might make it in might not label Feb 17, 2022
@jelmervdl jelmervdl linked an issue Feb 18, 2022 that may be closed by this pull request
@jelmervdl
Copy link
Member Author

I'm using <font> here as an element (even though span would be the semantically correct one) because I'd expect nobody in their right mind would have added rules to their stylesheet to style those by default. But I'm open for better suggestions. Also regarding the attribute names.

@jelmervdl jelmervdl self-assigned this Feb 18, 2022
@jelmervdl
Copy link
Member Author

jelmervdl commented Feb 23, 2022

I was experimenting a bit with this. If you're planning on making a popup highlight the full word or sentence on hover, i.e. like Google Translate does, it's difficult to find all <font> tags that together form a word or sentence. They have the same score, but there is no unique ID or anything to find them. That's something that can be added though.

It's not an issue if you just want to highlight good or sentences or words, non-interactively. Then just the scores are sufficient information. All the mock-ups I've seen show this type of output.

@abhi-agg
Copy link
Contributor

Extended fix for #355.

This is #357 but with the quality scores also exposed as <font x-bergamot-sentence-score=""> and <font x-bergamot-word-score=""> tags in the HTML output.

@jelmervdl Just confirming as I had missed it before. Does this PR build on top of #357? Meaning, apart from exposing quality scores for "word" byte ranges for html text, this PR additionally provides quality scores embedded into the translated result (using font attribute) as well?

@jelmervdl
Copy link
Member Author

@abhi-agg yes, when processing a translation request in HTML mode, the scores are available both in the HTML itself as well as through byte offsets. (And when not in HTML mode byte offsets are the only way they're available.)

@abhi-agg
Copy link
Contributor

Thanks for the clarification 👍🏾

Copy link
Contributor

@jerinphilip jerinphilip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As was the general agreement in the plenary, I believe the current proposal (based on HTML) is different but solves the problem better reducing work at the extension.

Leaving the below review based on what I understood from the plenary. These are towards not creating an unnecessary failure mode (broken HTML as byte-range).

wasm/bindings/response_bindings.cpp Outdated Show resolved Hide resolved
wasm/test_page/js/worker.js Outdated Show resolved Hide resolved
src/translator/response.h Show resolved Hide resolved
src/translator/response.cpp Show resolved Hide resolved
src/translator/quality_estimator.cpp Show resolved Hide resolved
@abhi-agg
Copy link
Contributor

abhi-agg commented Feb 25, 2022

As was the general agreement in the plenary, I believe the current proposal (based on HTML) is different but solves the problem better reducing work at the extension.

Correction. There was no final agreement from Mozilla side in the Plenary regarding which of the 2 approaches (#357 or #358) will the extension use. We did say that we are evaluating #358 option as this looks promising.

However, @andrenatal confirmed few hours later via email that the extension will use this approach.

In response to #358 (comment): I agree about the maintenance overhead but even though we have a proof of concept available from @jelmervdl, we could have deferred this removal once QE integration is complete in the extension. However, I am not adamant about it.

@jelmervdl This PR is in draft state. Is it up for review?

@jerinphilip jerinphilip marked this pull request as ready for review February 25, 2022 09:41
@jelmervdl
Copy link
Member Author

@abhi-agg I kept it in a draft state waiting for your and @andrenatal's approval to take this approach. It should be ready for review now.

@abhi-agg
Copy link
Contributor

abhi-agg commented Feb 25, 2022

@jelmervdl I did a quick test and overall it looks good to me. I can approve it after you rebase to latest main and resolve the #358 (comment). Thanks for the work 👍🏾

EDIT:
x-bergamot-sentence-score and x-bergamot-word-score strings are too big. Do we need x- in the beginning of each one of them? Plus, how about getting rid of bergamot all together or using some shorter string for it?

@jerinphilip
Copy link
Contributor

x-bergamot-sentence-score and x-bergamot-word-score strings are too big.

Is this a variable naming nit or are there performance implications of this somewhere?

@jelmervdl
Copy link
Member Author

x-bergamot-sentence-score and x-bergamot-word-score strings are too big.

What are they too long for? I intentionally made them this verbose to make sure they'd never overlap with any attribute that exists (or will exist) in a website. I think with just sentence-score and word-score that might occur. (It's a common string on Github.) Adding the x-bergamot makes it pretty unique. The x- prefix comes from https://stackoverflow.com/a/17902387 (the SO post summarises it better than the docs do)

@abhi-agg
Copy link
Contributor

abhi-agg commented Feb 25, 2022

x-bergamot-sentence-score and x-bergamot-word-score strings are too big.

Is this a variable naming nit or are there performance implications of this somewhere?

What are they too long for? I intentionally made them this verbose to make sure they'd never overlap with any attribute that exists (or will exist) in a website. I think with just sentence-score and word-score that might occur. (It's a common string on Github.) Adding the x-bergamot makes it pretty unique. The x- prefix comes from https://stackoverflow.com/a/17902387 (the SO post summarises it better than the docs do)

An obvious implication is the increased memory footprint which is evident in the example used in the description of this PR. Plus, all of this data will be copied from wasm to JS side during runtime. However, please feel free to ignore it as it was just a suggestion after reading #358 (comment).

I can approve the PR as soon as the rest of the comments are resolved 👍🏾

@abhi-agg abhi-agg self-requested a review February 25, 2022 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
experimental Experimental stuff, might make it in might not
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QE for HTML input doesn't work
3 participants