fix: standardize_lang_tag macro=True preserves region (restore langcodes semantics) #377
The spec-tools migration of `standardize_lang_tag` changed the
`macro` argument from its historic meaning (langcodes-defined
**macrolanguage substitution** — `cmn` -> `zh`, `nb` -> `no`)
to **"strip the region"** (`en-US` -> `en`) via
`tag.split('-')[0]`. Every caller that passes the default
`macro=True` (`ovos_bus_client.session`, many transformer
plugins, …) now gets region-stripped output — including OVOS's
`SessionManager`, which silently rewrites `session.lang` from
`en-US` to `en` on every message and breaks downstream consumers
that key on region (locale resource lookups, regional dialog,
ovoscope final-session assertions).
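
To make the regression concrete, here is a minimal sketch of what the region-stripping expression quoted above does to a tag that round-trips through a session. The names `strip_region` and `FakeSession` are hypothetical stand-ins for illustration only; they are not the actual OVOS function or class.

```python
from dataclasses import dataclass


def strip_region(tag: str) -> str:
    """Mimic the post-migration macro=True path: keep only the primary subtag."""
    return tag.split('-')[0]


@dataclass
class FakeSession:
    """Hypothetical stand-in for a session object that carries a language tag."""
    lang: str


inbound = FakeSession(lang="en-US")
# Simulate a round-trip that re-standardizes the tag on every message.
outbound = FakeSession(lang=strip_region(inbound.lang))

print(outbound.lang)         # en
print(inbound == outbound)   # False

# No macrolanguage substitution happens on this path: `cmn` stays `cmn`.
print(strip_region("cmn"))   # cmn
```

An equality assertion on the final session, or any lookup keyed on `en-US`, fails after such a round-trip even though the caller never asked for the region to be removed.
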
Restore the historic semantics by delegating to
`langcodes.standardize_tag(lang_code, macro=macro)` — which is what
the pre-migration body did (commit 9baa615). When langcodes is
unavailable, fall back to spec-tools' `standardize_lang` (also
region-preserving) and treat `macro` as a no-op.
Tests pin all three:
- macro=True preserves the region (`en-US` round-trips)
- macro=True substitutes macrolanguages (`cmn` -> `zh`)
- macro=False keeps both the region and the sublanguage
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Automated check results for this PR:
- Lint: ❌ ruff: issues found (see job log).
- License check: ✅ No license violations found (28 packages). License distribution: 8× MIT License, 5× MIT, 3× Apache Software License, 2× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 1× Apache Software License; BSD License, 1× Apache-2.0 OR BSD-2-Clause, +4 more. Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.
- Repo health: ✅ All required files present. Latest Version: ✅
- Release preview: ✅ PR title follows conventional commit format.
- Security (pip-audit): ✅ No known vulnerabilities found (46 packages scanned).
- Coverage: ✅ 85.2% total coverage. Files below 80% coverage: 5 files.
- Build tests: ✅ All versions pass.
The spec-tools migration of `standardize_lang_tag` changed the `macro` argument from its historic meaning to a different operation. Restore the original semantics.

What broke

The historic body (pre-migration, commit `9baa615`) delegated to `langcodes.standardize_tag(..., macro=True)`, which performs macrolanguage substitution: it swaps a sublanguage for its macrolanguage (`cmn` -> `zh`, `nb` -> `no`). It does not strip the region; `en-US` round-trips unchanged.

The current body (post-migration, commit `19f6fba`) conflates `macro` with "primary subtag only": `en-US` -> `en`. Every caller passing the default `macro=True` (`ovos_bus_client.session:316`, transformer plugins, …) now gets region-stripped output.

Observed downstream symptoms:

- `ovos_bus_client.SessionManager` silently rewrites `session.lang` from `en-US` to `en` on every message.
- `test_final_session` equality fails because the inbound `Session(lang='en-US')` round-trips back as `lang='en'`.
- Locale resource lookups (`closest_lang('en', [...])`) sometimes fail to resolve when they should.
- ovos-core PR #763 added a `get_message_lang` workaround for this.
- ovos-utterance-plugin-cancel PR #32 dropped `final_session` from its ovoscope cases to work around it.

Fix

Delegate `macro` back to `langcodes.standardize_tag`, as the function did pre-migration. When langcodes is unavailable, fall back to spec-tools' `standardize_lang` (also region-preserving) and treat `macro` as a no-op, since the macrolanguage tables live in langcodes itself.

Tests

Updated `test/unittests/test_lang.py` to pin the corrected behaviour:

- `macro=True` preserves the region (`en-US` -> `en-US`).
- `macro=True` substitutes macrolanguages (`cmn` -> `zh`).
- `macro=False` preserves both region and sublanguage.

11/11 tests in `test_lang.py` pass; the full suite is green except for an unrelated `deltachat` plugin import quirk.

Downstream unblocked

This fix lets us drop:

- The `get_message_lang` workaround in ovos-core PR #763.
- The `final_session` skip in ovos-utterance-plugin-cancel PR #32.

🤖 Generated with Claude Code