fix: standardize_lang_tag macro=True preserves region (restore langcodes semantics)#377

Merged: JarbasAl merged 1 commit into dev from fix/standardize-lang-tag-macro-semantics on May 25, 2026

@JarbasAl
Member

The spec-tools migration of standardize_lang_tag changed the macro argument from its historic meaning to a different operation. Restore the original semantics.

What broke

Historic body (pre-migration, commit 9baa615):

from langcodes import standardize_tag as std
return str(std(lang_code, macro=macro))

langcodes.standardize_tag(..., macro=True) performs macrolanguage substitution — it swaps a sublanguage for its macrolanguage (cmn -> zh, nb -> no). It does not strip the region. en-US round-trips unchanged.

Current body (post-migration, commit 19f6fba):

tag = standardize_lang(lang_code)             # spec-tools — preserves region
return tag.split('-')[0] if macro else tag   # strips region

This conflates macro with "keep only the primary subtag", so en-US becomes en. Every caller passing the default macro=True (ovos_bus_client.session:316, transformer plugins, …) now gets region-stripped output.
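To make the difference between the two operations concrete, here is a minimal, stdlib-only sketch. The two-entry MACROLANGUAGES dict is a hypothetical stand-in for the much larger table that langcodes ships, and both function names are invented for illustration; neither is part of ovos-utils or langcodes.

```python
# Hypothetical two-entry stand-in for the macrolanguage table in langcodes.
MACROLANGUAGES = {"cmn": "zh", "nb": "no"}


def strip_region(tag: str) -> str:
    """What the post-migration body did for macro=True: keep the primary subtag only."""
    return tag.split("-")[0]


def substitute_macrolanguage(tag: str) -> str:
    """What macro=True historically meant: swap the primary subtag for its
    macrolanguage (if it has one) and leave every other subtag untouched."""
    primary, sep, rest = tag.partition("-")
    return MACROLANGUAGES.get(primary, primary) + sep + rest


if __name__ == "__main__":
    for tag in ("en-US", "cmn", "nb"):
        print(tag, "->", strip_region(tag), "vs", substitute_macrolanguage(tag))
```

With this toy table, strip_region("en-US") gives "en", while substitute_macrolanguage("en-US") leaves the tag as "en-US" and only rewrites tags such as "cmn".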

Observed downstream symptoms:

  • ovos_bus_client.SessionManager silently rewrites session.lang from en-US to en on every message.
  • ovoscope's test_final_session equality fails because the inbound Session(lang='en-US') round-trips back as lang='en'.
  • Locale resource lookups that key on the region (e.g. closest_lang('en', [...])) can fail to resolve because they are now handed a region-less tag.
  • ovos-core PR #763 had to ship a local get_message_lang workaround for this.
  • ovos-utterance-plugin-cancel PR #32 ("Fix/package workflow") had to drop final_session from its ovoscope cases to work around it.

Fix

Delegate macro back to langcodes.standardize_tag as it did pre-migration. When langcodes is unavailable, fall back to spec-tools' standardize_lang (also region-preserving) and treat macro as a no-op (macrolanguage tables live in langcodes itself).

Tests

Updated test/unittests/test_lang.py to pin the corrected behaviour:

  • macro=True preserves the region (en-US -> en-US).
  • macro=True substitutes macrolanguages (cmn -> zh).
  • macro=False preserves both region and sublanguage.
  • Fallback without langcodes still region-preserves.
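As an illustration of what pinning this behaviour can look like, here is a sketch of a unittest case that checks the langcodes semantics the fix relies on. It is not the contents of test/unittests/test_lang.py; the class name is made up, it exercises langcodes.standardize_tag directly rather than the ovos-utils wrapper, and it skips itself when langcodes is not installed.

```python
import unittest

try:
    from langcodes import standardize_tag
except ImportError:  # langcodes is treated as optional in this sketch
    standardize_tag = None


@unittest.skipIf(standardize_tag is None, "langcodes is not installed")
class MacroSemanticsSketch(unittest.TestCase):
    def test_macro_true_preserves_region(self):
        self.assertEqual(standardize_tag("en-US", macro=True), "en-US")

    def test_macro_true_substitutes_macrolanguage(self):
        self.assertEqual(standardize_tag("cmn", macro=True), "zh")

    def test_macro_false_keeps_region_and_sublanguage(self):
        self.assertEqual(standardize_tag("en-US", macro=False), "en-US")
        self.assertEqual(standardize_tag("cmn", macro=False), "cmn")
```

Saved to a file, this can be run with python -m unittest; the expected values are the ones stated in this PR description.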

11/11 tests in test_lang.py pass; full suite green except an unrelated deltachat plugin import quirk.

Downstream unblocked

This fix lets us drop:

  • The local get_message_lang workaround in ovos-core PR #763.
  • The final_session skip in ovos-utterance-plugin-cancel PR #32 ("Fix/package workflow").

🤖 Generated with Claude Code

…des semantics)

The spec-tools migration of `standardize_lang_tag` changed the
`macro` argument from its historic meaning (langcodes-defined
**macrolanguage substitution** — `cmn` -> `zh`, `nb` -> `no`)
to **"strip the region"** (`en-US` -> `en`) via
`tag.split('-')[0]`. Every caller that passes the default
`macro=True` (`ovos_bus_client.session`, many transformer
plugins, …) now gets region-stripped output — including OVOS's
`SessionManager`, which silently rewrites `session.lang` from
`en-US` to `en` on every message and breaks downstream consumers
that key on region (locale resource lookups, regional dialog,
ovoscope final-session assertions).

Restore the historic semantics by delegating to
`langcodes.standardize_tag(lang_code, macro=macro)` — which is what
the pre-migration body did (commit 9baa615). When langcodes is
unavailable, fall back to spec-tools' `standardize_lang` (also
region-preserving) and treat `macro` as a no-op.

Tests pin all three:
- macro=True preserves the region (`en-US` round-trips)
- macro=True substitutes macrolanguages (`cmn` -> `zh`)
- macro=False keeps both the region and the sublanguage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 25, 2026

Warning

Review limit reached

@JarbasAl, we couldn't start this review because you've used your available PR reviews for now.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eefa6568-67a6-47b4-825b-c94356508ce0

📥 Commits

Reviewing files that changed from the base of the PR and between 6903f95 and 2255e32.

📒 Files selected for processing (2)
  • ovos_utils/lang/__init__.py
  • test/unittests/test_lang.py

@github-actions github-actions Bot added the fix label May 25, 2026
@github-actions
Contributor

github-actions Bot commented May 25, 2026

I've completed my sweep! Here's the situation. 🧹

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

The automated checks have finished their work. 🏁

ruff: issues found — see job log

⚖️ License Check

Verifying the legal status of all dependencies. 📜

✅ No license violations found (28 packages).

License distribution: 8× MIT License, 5× MIT, 3× Apache Software License, 2× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× Apache Software License; BSD License, 1× Apache-2.0 OR BSD-2-Clause, +4 more

Full breakdown — 28 packages
| Package | Version | License | URL |
|---|---|---|---|
| build | 1.5.0 | MIT | link |
| certifi | 2026.5.20 | Mozilla Public License 2.0 (MPL 2.0) | link |
| charset-normalizer | 3.4.7 | MIT | link |
| click | 8.4.1 | BSD-3-Clause | link |
| combo_lock | 0.3.1 | Apache-2.0 | link |
| filelock | 3.29.0 | MIT | link |
| idna | 3.16 | BSD-3-Clause | link |
| json-database | 0.10.1 | MIT | link |
| kthread | 0.2.3 | MIT License | link |
| markdown-it-py | 4.2.0 | MIT License | link |
| mdurl | 0.1.2 | MIT License | link |
| memory-tempfile | 2.2.3 | MIT License | link |
| ovos-spec-tools | 0.5.1a1 | Apache Software License | link |
| ovos-utils | 0.11.0a1 | Apache-2.0 | link |
| packaging | 26.2 | Apache-2.0 OR BSD-2-Clause | link |
| pexpect | 4.9.0 | ISC License (ISCL) | link |
| ptyprocess | 0.7.0 | ISC License (ISCL) | link |
| pyee | 13.0.1 | MIT License | link |
| Pygments | 2.20.0 | BSD-2-Clause | link |
| pyproject_hooks | 1.2.0 | MIT License | link |
| python-dateutil | 2.9.0.post0 | Apache Software License; BSD License | link |
| requests | 2.34.2 | Apache Software License | link |
| rich | 13.9.4 | MIT License | link |
| rich-click | 1.9.7 | MIT License | link |
| six | 1.17.0 | MIT License | link |
| typing_extensions | 4.15.0 | PSF-2.0 | link |
| urllib3 | 2.7.0 | MIT | link |
| watchdog | 6.0.0 | Apache Software License | link |

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

📋 Repo Health

I've checked the repo's posture (aka architectural alignment). 🧘

✅ All required files present.

Latest Version: 0.11.0a1

ovos_utils/version.py — Version file
README.md — README
LICENSE — License file
pyproject.toml — pyproject.toml
⚠️ setup.py — setup.py
CHANGELOG.md — Changelog
ovos_utils/version.py has valid version block markers

🏷️ Release Preview

I've checked the 'Legal' section for the release. ⚖️

Current: 0.11.0a1 · Next: 0.11.1a1

| Signal | Value |
|---|---|
| Label | (none) |
| PR title | fix: standardize_lang_tag macro=True preserves region (restore langcodes semantics) |
| Bump | build |

✅ PR title follows conventional commit format.


🚀 Release Channel Compatibility

Predicted next version: 0.11.1a1

| Channel | Status | Note | Current Constraint |
|---|---|---|---|
| Stable | | Too new (must be <0.9.0) | ovos-utils>=0.8.1,<0.9.0 |
| Testing | | Too new (must be <0.8.5) | ovos-utils>=0.8.4,<0.8.5 |
| Alpha | | Compatible | ovos-utils>=0.11.0a1 |

🔒 Security (pip-audit)

Scanning for any insecure random number generators. 🎲

✅ No known vulnerabilities found (46 packages scanned).

📊 Coverage

Measuring the reach of our test cases. 📏

85.2% total coverage

Files below 80% coverage (5 files)
| File | Coverage | Missing lines |
|---|---|---|
| ovos_utils/log_parser.py | 48.4% | 225 |
| ovos_utils/__init__.py | 63.6% | 16 |
| ovos_utils/file_utils.py | 73.3% | 55 |
| ovos_utils/thread_utils.py | 76.9% | 12 |
| ovos_utils/geolocation.py | 78.4% | 22 |

Full report: download the coverage-report artifact.

🔨 Build Tests

Checking if the gears are still turning smoothly... ⚙️

✅ All versions pass

Python versions checked (Build, Install, Tests): 3.10, 3.11, 3.12, 3.13, 3.14

May your merges be conflict-free! 🕊️

@JarbasAl JarbasAl merged commit 5291187 into dev May 25, 2026
14 checks passed
@JarbasAl JarbasAl deleted the fix/standardize-lang-tag-macro-semantics branch May 25, 2026 14:18