Make the glossary regex more deterministic #1801

Merged
merged 2 commits into from Feb 28, 2024

@akirk commented Feb 28, 2024

Problem

@amieiro noticed that with the Spanish glossary, a term that existed in the glossary didn't match. Portuguese, on the other hand, which had the same competing terms, got it right:
Spanish translations with the wrong glossary match for Add-On

Portuguese translations with the right glossary match for Add-On

The problem arises because we now group the regex by suffixes: for each English term, the suffix is determined, the term is sorted into that bin, and the regex is built from those bins.
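A minimal Python sketch of the binning step follows. It assumes that each glossary term arrives together with the set of English suffixes it may take. The suffix sets come from this PR's description, but how GlotPress derives them is not shown here, and the function name is hypothetical. Bins are keyed by suffix set and stay in the order in which they were first created:

```python
# Hypothetical input: (English term, suffixes it may take), in glossary order.
SPANISH_ORDER = [
    ("troubleshooting", ("s", "ed", "ing")),
    ("customization", ("s",)),
    ("add", ("s", "ed", "ing")),
    ("add-on", ("s",)),
]


def bin_by_suffixes(terms):
    """Group terms into bins keyed by their suffix set.

    Python dicts keep insertion order, so a bin's position depends on
    which of its terms happens to be processed first.
    """
    bins = {}
    for term, suffixes in terms:
        key = "|".join(suffixes)
        bins.setdefault(key, []).append(term)
    return bins


print(bin_by_suffixes(SPANISH_ORDER))
# {'s|ed|ing': ['troubleshooting', 'add'], 's': ['customization', 'add-on']}
```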

Add-on and Add fall into different bins (the first takes only an "s" suffix, the second the possible suffixes "s", "ed", or "ing"). Because regex alternation tries alternatives from left to right, the bin that comes first in the pattern wins when both terms could match at the same position. In Spanish, the "s", "ed", "ing" bin is created first because, in the sequence of glossary terms, "troubleshooting", which starts that bin, is processed before "customization", which creates the "s" bin. In Portuguese the order is reversed, more or less by chance, because "customization" is processed before "downgrading":
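The effect of bin order can be reproduced with two hand-written patterns that have one group per bin. These are hypothetical simplifications of the generated regex, not GlotPress's actual output. The same input matches "Add" or "Add-on" depending only on which bin is listed first:

```python
import re

# One group per bin: the bin's terms, then their optional suffixes.
BIN_S_ED_ING = r"(?:troubleshooting|add)(?:s|ed|ing)?"
BIN_S = r"(?:customization|add-on)(?:s)?"


def first_match(bins, text):
    """Join the bins in the given order and return the first glossary match."""
    pattern = r"\b(?:" + "|".join(bins) + r")\b"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group() if match else None


text = "Install the Add-on"
print(first_match([BIN_S_ED_ING, BIN_S], text))  # "s|ed|ing" bin first (as in Spanish): Add
print(first_match([BIN_S, BIN_S_ED_ING], text))  # "s" bin first (as in Portuguese): Add-on
```

In the first order, "add" matches at the start of "Add-on" and `\b` is satisfied before the hyphen, so the alternation never reaches "add-on".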

Screenshots: addon-spanish and addon-portuguese

Solution

The proposed solution makes the regex more deterministic, so that the result no longer differs between languages. In this case it fixes the problem, but there could be other occurrences where a krsort would make it work. We need to
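Here is a minimal Python sketch of the general idea. It is not the PR's actual change, and the names are hypothetical. It assumes that bins are keyed by their suffix set joined with "|". Sorting the bin keys before the regex is assembled makes the bin order independent of glossary order. With an ascending, ksort-like sort, "s" sorts before "s|ed|ing", so "Add-on" wins for both glossary orders:

```python
import re


def build_regex(terms):
    """Bin (term, suffixes) pairs by suffix set and join the bins in sorted key order."""
    bins = {}
    for term, suffixes in terms:
        bins.setdefault("|".join(suffixes), []).append(term)
    groups = []
    # sorted() plays the role of a ksort: bin order no longer depends on
    # which term happened to create each bin first.
    for key in sorted(bins):
        alternatives = "|".join(re.escape(word) for word in bins[key])
        groups.append(f"(?:{alternatives})(?:{key})?")
    return re.compile(r"\b(?:" + "|".join(groups) + r")\b", re.IGNORECASE)


# The same four terms, processed in two different glossary orders.
ORDER_A = [("troubleshooting", ("s", "ed", "ing")), ("customization", ("s",)),
           ("add", ("s", "ed", "ing")), ("add-on", ("s",))]
ORDER_B = [("customization", ("s",)), ("troubleshooting", ("s", "ed", "ing")),
           ("add", ("s", "ed", "ing")), ("add-on", ("s",))]

for order in (ORDER_A, ORDER_B):
    print(build_regex(order).search("Install the Add-on").group())  # Add-on both times
```

This sketch fixes only the order of the bins. Terms inside a bin still follow glossary order. A reverse (krsort-like) sort would put the "s|ed|ing" bin first and bring back the "Add" match, which is why the sort direction can matter for other pairs of terms.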

Testing Instructions

Create a language with a glossary that contains the words "troubleshooting", "customization", "add", and "add-on", and an original that contains the word "Add-on". Before this PR, only "add" is matched.

@amieiro merged commit 7d75933 into develop on Feb 28, 2024
11 checks passed
@amieiro deleted the glossary-deterministic branch on February 28, 2024 12:05
@pedro-mendonca commented:

It works here, thanks :)
