fix: update locale lookups and tests after BCP-47 folder rename#395
Conversation
The recent rename of locale folders from short codes (en-us, it, es) to canonical BCP-47 tags (en-US, it-IT, es-ES) broke two things: 1. _get_word() stripped the region tag and looked for locale/it/ which no longer exists. Fix: try the full tag first, then the short code, then scan for any folder matching the language prefix. 2. test_base.py built expected file paths using the old lowercase folder names (en-us, uk-ua). Fix: update to en-US / uk-UA to match the renamed directories. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughReplaced manual filesystem lookups and path-based locale fallbacks with CoreResources loaders across dialog, vocabulary, word connectors, and noise-word files; updated internal language keying to use full tags and standardized locale casing in tests; added comprehensive locale lookup tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greetings! I've analyzed your changes and have some results to share. 🖖I've aggregated the results of the automated checks for this PR below. 🔍 LintBeep boop! Standard processing sub-routine complete. 🦾 ❌ ruff: issues found — see job log 🔒 Security (pip-audit)The security scan is now complete. 🏁 ✅ No known vulnerabilities found (66 packages scanned). 📋 Repo HealthKeeping the project in tip-top shape! 🏃 ✅ All required files present. Latest Version: ✅ ⚖️ License CheckVerifying the source of all binary files. 💾 ✅ No license violations found (47 packages). License distribution: 12× MIT License, 8× MIT, 6× Apache Software License, 6× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 2× PSF-2.0, 2× Python Software Foundation License, +7 more Full breakdown — 47 packages
Copyright (c) 2022 Phil Ewels Permission is hereby granted, free of charge, to any person obtaining a copy The above copyright notice and this permission notice shall be included in all THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed. 🔨 Build TestsRunning the final assembly check. 🔧 ✅ All versions pass
An automated hug for your code 🤗 |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ovos_workshop/skills/ovos.py`:
- Around line 2463-2468: The loop that reads locale files in _get_word currently
calls os.listdir(locale_dir) without checking that locale_dir exists, which can
raise and prevent the fallback; update _get_word to first guard with
os.path.isdir(locale_dir) (or os.path.exists) and return the default ", "
immediately if missing, then inside the loop ensure res_file exists, wrap
json.load and the connector access in a try/except (catching
JSONDecodeError/KeyError/IOError) and on any failure continue to the next folder
or return the default ", " so missing locale directory or malformed files do not
crash the function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 84939266-be5b-4a85-9e6a-1a247e5ceb48
📒 Files selected for processing (2)
ovos_workshop/skills/ovos.pytest/unittests/skills/test_base.py
Replace ad-hoc lang string splitting and manual locale path construction
with ovos_utils.lang.get_language_dir(), which uses langcodes.tag_distance()
to find the best matching folder regardless of casing or regional variants.
- _get_dialog(): no longer strips region tag before building path
- _get_word(): replaces multi-step fallback scan with single get_language_dir() call
- CommonQuerySkill.__init__(): uses get_language_dir() instead of lang.split("-")[0]
- CommonQuerySkill translated_noise_words property/setter: use full lang tag as dict key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validates the refactored get_language_dir()-based lookups:
- TestGetWord: canonical BCP-47, short code ('it', 'es', 'en'),
lowercase tags ('en-us'), missing lang fallback, and a structural
check that every locale folder's word_connectors.json has 'and'/'or'
- TestGetDialog: resolves bundled .dialog files by canonical, lowercase,
and short-code lang tags; verifies context rendering; verifies fallback
for missing file and unknown lang
- TestJoinWordList: end-to-end for English, Italian (euphony e→ed, o→od),
Spanish (euphony y→e, o→u), German, French; also single-item and
empty-list edge cases; short-code equivalence checks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_dialog() and _get_word() were still building paths manually even after the get_language_dir() step. CoreResources / SkillResources in resource_files.py already handle the full lookup chain (locale/, dialog/, vocab/ subdirectories, user overrides, tag distance matching) — there is no reason to duplicate that logic. - _get_dialog(): delegate to CoreResources(lang).load_dialog_file() - _get_word(): delegate to CoreResources(lang).load_json_file() - CommonQuerySkill.__init__: delegate to CoreResources(lang).load_list_file() - Remove manual path construction, os.listdir scans, resolve_resource_file calls and the get_language_dir import that were introduced in earlier steps — resource_files.py is the canonical place for this Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…alog ResourceFile._locate() already falls back to workshop_directory (the matching lang dir in ovos_workshop/locale/) so self.resources will find both skill-local AND workshop-bundled files. There is no reason for code inside the skill class to call the module-level _get_dialog() helper — that function is only needed outside a skill instance context (e.g. in join_word_list which has no self). _get_dialog / _get_word / CoreResources are still the right tool for module-level utilities. Inside the skill, self.resources is canonical. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
self.resources already searches workshop_directory (ovos_workshop/locale/{lang}/)
as a fallback in ResourceFile._locate(), so the explicit CoreResources(lang)
fallback was redundant. CoreResources now only appears in module-level
helpers (_get_dialog, _get_word) that have no skill instance.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
self.resources already searches workshop_directory as a fallback, so CoreResources was redundant here too. CoreResources is now only used in module-level free functions (_get_dialog, _get_word) that have no self. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_dialog no longer exists as a public symbol; game_skill.py was importing it. speak_dialog(key) uses self.resources (which already falls back to workshop_directory) and is the correct API for skills. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ovos_workshop/skills/ovos.py (1)
2028-2032:⚠️ Potential issue | 🟠 MajorLoad vocabulary for the requested locale, not just
self.lang.Line 2028 builds the cache key from the explicit
langargument, but Line 2032 now always readsself.resources, which is bound to the currentself.lang. During startup,register_intent_file()iteratesself.native_langs, so secondary-language blacklists can be populated from the core-language vocab and then cached under the wrong locale.Suggested fix
- vocab = self.resources.load_vocabulary_file(voc_filename) + vocab = self.load_lang(self.res_dir, lang).load_vocabulary_file( + voc_filename + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_workshop/skills/ovos.py` around lines 2028 - 2032, The cache key uses the explicit lang argument (lang + voc_filename) but the code always calls self.resources.load_vocabulary_file which is bound to self.lang; change the call to load the vocabulary for the requested locale instead of self.lang — e.g., obtain the Resource/loader for the standardized lang variable and call its load_vocabulary_file(voc_filename) so entries are cached under the correct key in self._voc_cache; ensure this change is applied in the same block that builds cache_key and affects register_intent_file / iterations over self.native_langs so blacklists are populated for the correct locale.
♻️ Duplicate comments (1)
ovos_workshop/skills/ovos.py (1)
2447-2451:⚠️ Potential issue | 🔴 CriticalGuard missing connector resources before testing membership.
This fallback path still assumes
load_json_file("word_connectors")returned a mapping. If the locale has noword_connectors.json,connector in dataraises instead of returning", ", which defeats the missing-locale behavior this PR is trying to restore.Suggested fix
- data = CoreResources(lang).load_json_file("word_connectors") + data = CoreResources(lang).load_json_file("word_connectors") or {} if connector in data: return data[connector]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ovos_workshop/skills/ovos.py` around lines 2447 - 2451, The code assumes CoreResources(lang).load_json_file("word_connectors") returns a mapping and uses "if connector in data" directly; guard that result before membership testing by checking that the returned value (variable data from CoreResources.load_json_file) is a mapping/dict (or truthy mapping) and if not, log a warning and return the default ", " immediately; update the block around the call to CoreResources(lang).load_json_file and the membership test of connector so you only do "connector in data" when data is a dict/mapping (or supports membership).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ovos_workshop/skills/common_query_skill.py`:
- Around line 75-79: The cache is being written using the full BCP-47 locale tag
(lang) but remove_noise() looks up using only the base subtag, so entries in
_translated_noise_words never match; update remove_noise() to use the same
standardized full BCP-47 tag when reading the cache (instead of stripping to the
base subtag) so lookups align with where _translated_noise_words and the other
cache writes (e.g., the block that calls
CoreResources(lang).load_list_file("noise_words") and the similar block at lines
88-94) store entries; ensure you reference and use the same lang normalization
logic everywhere (use the full tag, not the base) and adjust any locale
normalization helper used by remove_noise() accordingly.
---
Outside diff comments:
In `@ovos_workshop/skills/ovos.py`:
- Around line 2028-2032: The cache key uses the explicit lang argument (lang +
voc_filename) but the code always calls self.resources.load_vocabulary_file
which is bound to self.lang; change the call to load the vocabulary for the
requested locale instead of self.lang — e.g., obtain the Resource/loader for the
standardized lang variable and call its load_vocabulary_file(voc_filename) so
entries are cached under the correct key in self._voc_cache; ensure this change
is applied in the same block that builds cache_key and affects
register_intent_file / iterations over self.native_langs so blacklists are
populated for the correct locale.
---
Duplicate comments:
In `@ovos_workshop/skills/ovos.py`:
- Around line 2447-2451: The code assumes
CoreResources(lang).load_json_file("word_connectors") returns a mapping and uses
"if connector in data" directly; guard that result before membership testing by
checking that the returned value (variable data from
CoreResources.load_json_file) is a mapping/dict (or truthy mapping) and if not,
log a warning and return the default ", " immediately; update the block around
the call to CoreResources(lang).load_json_file and the membership test of
connector so you only do "connector in data" when data is a dict/mapping (or
supports membership).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b78dd1e1-ed04-4fb7-b9b1-83b0daf35134
📒 Files selected for processing (3)
ovos_workshop/skills/common_query_skill.pyovos_workshop/skills/ovos.pytest/unittests/test_locale_lookup.py
Cache is written with the full BCP-47 tag (e.g. "en-US") but
remove_noise() stripped to the base subtag ("en") before lookup,
so the dict lookup always missed. Use the full tag consistently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Follow-up to the BCP-47 locale folder rename (en-us → en-US, it → it-IT, es → es-ES, uk-ua → uk-UA). Fixes the broken lookups and cleans up the wider locale-handling mess exposed in the process.
Bugs fixed
_get_word()/_get_dialog()— were stripping the region tag and building paths manually, solocale/it/lookups silently failed after the renametest_base.py— expected file paths still used old lowercase folder names (en-us,uk-ua)remove_noise()inCommonQuerySkill— cache written with full BCP-47 tag (en-US) but lookup stripped to base subtag (en), guaranteed cache miss every timegame_skill.py— imported the now-removed_get_dialogfree functionRefactors
resource_files.py—_get_word()and_get_dialog()delegate toCoreResources, which usesSkillResources/locate_lang_directories()/tag_distance()for robust BCP-47 matchingself.resourcesis the single canonical path — removed allCoreResources(lang)calls from within skill methods (voc_list,_on_event_error,CommonQuerySkill.__init__);self.resourcesalready falls back toworkshop_directoryso workshop-bundled files are found automaticallygame_skill.py— replaced_get_dialog()calls withself.speak_dialog()Tests added
test/unittests/test_locale_lookup.py— 32 tests covering:_get_word(): canonical BCP-47, short codes (it,es,en), lowercase tags (en-us), missing lang fallback, schema check across all 17 locale folders_get_dialog(): canonical/lowercase/short-code resolution, context rendering, missing file/lang fallbacksjoin_word_list(): English, Italian (euphonye→ed,o→od), Spanish (euphonyy→e,o→u), German, French; edge cases; short-code equivalence🤖 Generated with Claude Code