fix: update locale lookups and tests after BCP-47 folder rename #395

Merged
JarbasAl merged 12 commits into dev from fix/locale-rename-followup
Apr 8, 2026

Conversation

Member

@JarbasAl JarbasAl commented Apr 8, 2026

Summary

Follow-up to the BCP-47 locale folder rename (en-us → en-US, it → it-IT, es → es-ES, uk-ua → uk-UA). Fixes the broken lookups and cleans up the wider locale-handling mess exposed in the process.

Bugs fixed

  • _get_word() / _get_dialog() — were stripping the region tag and building paths manually, so locale/it/ lookups silently failed after the rename
  • test_base.py — expected file paths still used old lowercase folder names (en-us, uk-ua)
  • remove_noise() in CommonQuerySkill — cache written with full BCP-47 tag (en-US) but lookup stripped to base subtag (en), guaranteed cache miss every time
  • game_skill.py — imported the now-removed _get_dialog free function

Refactors

  • All locale lookups now go through resource_files.py: _get_word() and _get_dialog() delegate to CoreResources, which uses SkillResources / locate_lang_directories() / tag_distance() for robust BCP-47 matching
  • Inside the skill class, self.resources is the single canonical path — removed all CoreResources(lang) calls from within skill methods (voc_list, _on_event_error, CommonQuerySkill.__init__); self.resources already falls back to workshop_directory so workshop-bundled files are found automatically
  • game_skill.py — replaced _get_dialog() calls with self.speak_dialog()

Tests added

test/unittests/test_locale_lookup.py — 32 tests covering:

  • _get_word(): canonical BCP-47, short codes (it, es, en), lowercase tags (en-us), missing lang fallback, schema check across all 17 locale folders
  • _get_dialog(): canonical/lowercase/short-code resolution, context rendering, missing file/lang fallbacks
  • join_word_list(): English, Italian (euphony e→ed, o→od), Spanish (euphony y→e, o→u), German, French; edge cases; short-code equivalence

🤖 Generated with Claude Code

The recent rename of locale folders from short codes (en-us, it, es) to
canonical BCP-47 tags (en-US, it-IT, es-ES) broke two things:

1. _get_word() stripped the region tag and looked for locale/it/ which
   no longer exists. Fix: try the full tag first, then the short code,
   then scan for any folder matching the language prefix.

2. test_base.py built expected file paths using the old lowercase folder
   names (en-us, uk-ua). Fix: update to en-US / uk-UA to match the
   renamed directories.
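
The three-step fallback in item 1 can be sketched roughly as below. `find_locale_dir()` is a hypothetical helper written for this note, not the code in ovos.py, and the case-insensitive comparison in step 1 is an assumption.

```python
import os
from typing import Optional


def find_locale_dir(locale_root: str, lang: str) -> Optional[str]:
    """Return the folder for `lang`, trying progressively looser matches."""
    if not os.path.isdir(locale_root):
        return None
    folders = sorted(
        name for name in os.listdir(locale_root)
        if os.path.isdir(os.path.join(locale_root, name))
    )
    base = lang.split("-")[0].lower()
    # 1. Full tag, compared case-insensitively ("en-us" matches "en-US").
    for name in folders:
        if name.lower() == lang.lower():
            return os.path.join(locale_root, name)
    # 2. Short code ("it" matches a folder literally named "it").
    for name in folders:
        if name.lower() == base:
            return os.path.join(locale_root, name)
    # 3. Any folder whose language prefix matches ("it" matches "it-IT").
    for name in folders:
        if name.lower().split("-")[0] == base:
            return os.path.join(locale_root, name)
    return None
```

Given a locale root containing en-US, it-IT and uk-UA, a request for "it" falls through to step 3 and resolves to the it-IT folder.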

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the fix label Apr 8, 2026
Contributor

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough

Replaced manual filesystem lookups and path-based locale fallbacks with CoreResources loaders across dialog, vocabulary, word connectors, and noise-word files; updated internal language keying to use full tags and standardized locale casing in tests; added comprehensive locale lookup tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core resource loading changes: ovos_workshop/skills/ovos.py | Rewrote dialog, vocabulary and connector loading to use CoreResources(...).load_*_file(...) instead of manual path construction, mustache rendering, or direct file reads; _get_dialog, _get_word, voc_list, and _on_event_error now delegate to resource loaders and return fallbacks when missing. |
| Noise words locale handling: ovos_workshop/skills/common_query_skill.py | Replaced manual noise_words.list path resolution with CoreResources(lang).load_list_file("noise_words"); changed internal indexing to use full self.lang tags (no short-tag truncation). |
| Locale tests and casing: test/unittests/skills/test_base.py, test/unittests/test_locale_lookup.py | Standardized locale casing in existing tests (en-US, uk-UA); added test_locale_lookup.py covering _get_word, _get_dialog, join_word_list behavior across locales and a filesystem consistency check for word_connectors.json in locale folders. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble through locales, hopping light and spry,
Replacing paths with loaders, oh my my my!
Full tags held tight, fallbacks standing by,
Tests now sing in proper case beneath the sky.
Hooray — resources tidy, I give a joyful cry! 🥕🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 39.53% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: updating locale lookups and tests following a BCP-47 folder rename, which is the core focus of all modifications across multiple files. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Contributor

github-actions Bot commented Apr 8, 2026

Greetings! I've analyzed your changes and have some results to share. 🖖

I've aggregated the results of the automated checks for this PR below.

🔍 Lint

Beep boop! Standard processing sub-routine complete. 🦾

ruff: issues found — see job log

🔒 Security (pip-audit)

The security scan is now complete. 🏁

✅ No known vulnerabilities found (66 packages scanned).

📋 Repo Health

Keeping the project in tip-top shape! 🏃

✅ All required files present.

Latest Version: 8.0.3a1

ovos_workshop/version.py — Version file
README.md — README
LICENSE — License file
pyproject.toml — pyproject.toml
CHANGELOG.md — Changelog
⚠️ requirements.txt — Requirements
ovos_workshop/version.py has valid version block markers

⚖️ License Check

Verifying the source of all binary files. 💾

✅ No license violations found (47 packages).

License distribution: 12× MIT License, 8× MIT, 6× Apache Software License, 6× Apache-2.0, 2× BSD-3-Clause, 2× ISC License (ISCL), 2× PSF-2.0, 2× Python Software Foundation License, +7 more

Full breakdown — 47 packages

| Package | Version | License | URL |
| --- | --- | --- | --- |
| audioop-lts | 0.2.2 | PSF-2.0 | link |
| build | 1.4.2 | MIT | link |
| certifi | 2026.2.25 | Mozilla Public License 2.0 (MPL 2.0) | link |
| charset-normalizer | 3.4.7 | MIT | link |
| click | 8.3.2 | BSD-3-Clause | link |
| combo_lock | 0.3.1 | Apache-2.0 | link |
| filelock | 3.25.2 | MIT | link |
| idna | 3.11 | BSD-3-Clause | link |
| importlib_metadata | 9.0.0 | Apache-2.0 | link |
| json-database | 0.10.1 | MIT | link |
| kthread | 0.2.3 | MIT License | link |
| langcodes | 3.5.1 | MIT License | link |
| markdown-it-py | 4.0.0 | MIT License | link |
| mdurl | 0.1.2 | MIT License | link |
| memory-tempfile | 2.2.3 | MIT License | link |
| ovos-config | 2.1.1 | Apache-2.0 | link |
| ovos-number-parser | 0.5.1 | Apache Software License | link |
| ovos-plugin-manager | 2.2.0 | Apache-2.0 | link |
| ovos-solver-yes-no-plugin | 0.2.8 | MIT | link |
| ovos-utils | 0.8.5 | Apache-2.0 | link |
| ovos-workshop | 8.0.3a1 | Apache-2.0 | link |
| ovos_bus_client | 1.5.0 | Apache Software License | link |
| packaging | 26.0 | Apache-2.0 OR BSD-2-Clause | link |
| padacioso | 1.0.0 | apache-2.0 | link |
| pexpect | 4.9.0 | ISC License (ISCL) | link |
| ptyprocess | 0.7.0 | ISC License (ISCL) | link |
| pyee | 12.1.1 | MIT License | link |
| Pygments | 2.20.0 | BSD-2-Clause | link |
| pyproject_hooks | 1.2.0 | MIT License | link |
| python-dateutil | 2.9.0.post0 | Apache Software License; BSD License | link |
| PyYAML | 6.0.3 | MIT License | link |
| quebra-frases | 0.3.7 | Apache Software License | link |
| RapidFuzz | 3.14.5 | MIT | link |
| regex | 2026.4.4 | Apache-2.0 AND CNRI-Python | link |
| requests | 2.33.1 | Apache Software License | link |
| rich | 13.9.4 | MIT License | link |
| rich-click | 1.9.7 | MIT License, Copyright (c) 2022 Phil Ewels | link |
| simplematch | 1.4 | MIT License | link |
| six | 1.17.0 | MIT License | link |
| standard-aifc | 3.13.0 | Python Software Foundation License | link |
| standard-chunk | 3.13.0 | Python Software Foundation License | link |
| typing_extensions | 4.15.0 | PSF-2.0 | link |
| unicode-rbnf | 2.4.0 | MIT License | |
| urllib3 | 2.6.3 | MIT | link |
| watchdog | 6.0.0 | Apache Software License | link |
| websocket-client | 1.9.0 | Apache Software License | link |
| zipp | 3.23.0 | MIT | link |

Policy: Apache 2.0 (universal donor). StrongCopyleft / NetworkCopyleft / WeakCopyleft / Other / Error categories fail. MPL allowed.

🔨 Build Tests

Running the final assembly check. 🔧

✅ All versions pass

Python versions checked (Build, Install, Tests): 3.10, 3.11, 3.12, 3.13, 3.14

An automated hug for your code 🤗

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ovos_workshop/skills/ovos.py`:
- Around line 2463-2468: The loop that reads locale files in _get_word currently
calls os.listdir(locale_dir) without checking that locale_dir exists, which can
raise and prevent the fallback; update _get_word to first guard with
os.path.isdir(locale_dir) (or os.path.exists) and return the default ", "
immediately if missing, then inside the loop ensure res_file exists, wrap
json.load and the connector access in a try/except (catching
JSONDecodeError/KeyError/IOError) and on any failure continue to the next folder
or return the default ", " so missing locale directory or malformed files do not
crash the function.
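
A minimal sketch of the guarded lookup this comment asks for, assuming a hypothetical `read_connector()` helper and a prefix-based folder match; the actual _get_word() signature and matching logic may differ.

```python
import json
import os

DEFAULT_CONNECTOR = ", "


def read_connector(locale_dir: str, lang: str, connector: str) -> str:
    """Look up `connector` in <locale_dir>/<folder>/word_connectors.json."""
    # Guard first: os.listdir() raises if the directory is missing.
    if not os.path.isdir(locale_dir):
        return DEFAULT_CONNECTOR
    base = lang.split("-")[0].lower()
    for folder in sorted(os.listdir(locale_dir)):
        if folder.lower().split("-")[0] != base:
            continue
        res_file = os.path.join(locale_dir, folder, "word_connectors.json")
        if not os.path.isfile(res_file):
            continue
        try:
            with open(res_file, encoding="utf-8") as f:
                return json.load(f)[connector]
        except (ValueError, KeyError, TypeError, OSError):
            # json.JSONDecodeError is a ValueError; skip malformed or
            # incomplete files and try the next matching folder.
            continue
    return DEFAULT_CONNECTOR
```

Every failure mode (missing directory, missing file, malformed JSON, missing key) ends at the same default, so callers never need their own try/except.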

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 84939266-be5b-4a85-9e6a-1a247e5ceb48

📥 Commits

Reviewing files that changed from the base of the PR and between ec9ff79 and 21e0057.

📒 Files selected for processing (2)
  • ovos_workshop/skills/ovos.py
  • test/unittests/skills/test_base.py

Comment thread ovos_workshop/skills/ovos.py Outdated
JarbasAl and others added 5 commits April 8, 2026 14:57
Replace ad-hoc lang string splitting and manual locale path construction
with ovos_utils.lang.get_language_dir(), which uses langcodes.tag_distance()
to find the best matching folder regardless of casing or regional variants.

- _get_dialog(): no longer strips region tag before building path
- _get_word(): replaces multi-step fallback scan with single get_language_dir() call
- CommonQuerySkill.__init__(): uses get_language_dir() instead of lang.split("-")[0]
- CommonQuerySkill translated_noise_words property/setter: use full lang tag as dict key
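
To illustrate the idea of distance-based folder matching, here is a toy sketch. `toy_distance()` is a stand-in written for this note with arbitrary scores; it is not langcodes.tag_distance() and does not reproduce its values, and `best_folder()` is a hypothetical helper rather than get_language_dir().

```python
from typing import Iterable, Optional, Tuple


def split_tag(tag: str) -> Tuple[str, str]:
    """Split 'en-US' or 'en_us' into a lowercase (language, region) pair."""
    parts = tag.replace("_", "-").lower().split("-")
    return parts[0], parts[1] if len(parts) > 1 else ""


def toy_distance(desired: str, supported: str) -> int:
    """Toy stand-in for a tag distance: 0 is identical, larger is worse."""
    d_lang, d_region = split_tag(desired)
    s_lang, s_region = split_tag(supported)
    if d_lang != s_lang:
        return 100  # different language: never a match
    if d_region == s_region:
        return 0    # same language and region, casing ignored
    if not d_region or not s_region:
        return 3    # one side has no region ("it" vs "it-IT")
    return 5        # same language, different region


def best_folder(lang: str, folders: Iterable[str],
                max_distance: int = 10) -> Optional[str]:
    """Pick the folder name closest to `lang`, or None if nothing is close enough."""
    scored = sorted((toy_distance(lang, name), name) for name in folders)
    if scored and scored[0][0] < max_distance:
        return scored[0][1]
    return None
```

Scoring every candidate and taking the minimum replaces the ordered chain of special cases: casing differences and missing regions become small distances instead of separate code paths.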

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validates the refactored get_language_dir()-based lookups:

- TestGetWord: canonical BCP-47, short code ('it', 'es', 'en'),
  lowercase tags ('en-us'), missing lang fallback, and a structural
  check that every locale folder's word_connectors.json has 'and'/'or'
- TestGetDialog: resolves bundled .dialog files by canonical, lowercase,
  and short-code lang tags; verifies context rendering; verifies fallback
  for missing file and unknown lang
- TestJoinWordList: end-to-end for English, Italian (euphony e→ed, o→od),
  Spanish (euphony y→e, o→u), German, French; also single-item and
  empty-list edge cases; short-code equivalence checks
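
The structural check on word_connectors.json could look something like the sketch below. `folders_missing_connectors()` is a hypothetical helper, not the code in test_locale_lookup.py; a test would assert that it returns an empty list for the bundled locale directory.

```python
import json
import os
from typing import List, Sequence


def folders_missing_connectors(locale_root: str,
                               required: Sequence[str] = ("and", "or")) -> List[str]:
    """Return locale folders whose word_connectors.json is absent or incomplete."""
    problems = []
    for folder in sorted(os.listdir(locale_root)):
        path = os.path.join(locale_root, folder)
        if not os.path.isdir(path):
            continue
        res_file = os.path.join(path, "word_connectors.json")
        try:
            with open(res_file, encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, ValueError):
            # Missing file or malformed JSON both count as a problem.
            problems.append(folder)
            continue
        if not isinstance(data, dict) or any(key not in data for key in required):
            problems.append(folder)
    return problems
```

Returning the offending folder names, rather than a boolean, keeps the failure message useful when a new locale is added without its connectors.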

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_dialog() and _get_word() were still building paths manually even
after the get_language_dir() step. CoreResources / SkillResources in
resource_files.py already handle the full lookup chain (locale/,
dialog/, vocab/ subdirectories, user overrides, tag distance matching)
— there is no reason to duplicate that logic.

- _get_dialog(): delegate to CoreResources(lang).load_dialog_file()
- _get_word(): delegate to CoreResources(lang).load_json_file()
- CommonQuerySkill.__init__: delegate to CoreResources(lang).load_list_file()
- Remove manual path construction, os.listdir scans, resolve_resource_file
  calls and the get_language_dir import that were introduced in earlier
  steps — resource_files.py is the canonical place for this

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…alog

ResourceFile._locate() already falls back to workshop_directory (the
matching lang dir in ovos_workshop/locale/) so self.resources will find
both skill-local AND workshop-bundled files. There is no reason for code
inside the skill class to call the module-level _get_dialog() helper —
that function is only needed outside a skill instance context (e.g. in
join_word_list which has no self).

_get_dialog / _get_word / CoreResources are still the right tool for
module-level utilities. Inside the skill, self.resources is canonical.
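
The fallback order described here (skill-local files first, then the workshop-bundled directory) amounts to a first-match search over an ordered list of directories. A minimal sketch, with `locate_resource()` as a hypothetical stand-in for what ResourceFile._locate() does, not its actual code:

```python
import os
from typing import Optional, Sequence


def locate_resource(file_name: str, search_dirs: Sequence[str]) -> Optional[str]:
    """Return the first existing path for `file_name`, checking dirs in priority order."""
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Called as `locate_resource("cancel.dialog", [skill_locale_dir, workshop_locale_dir])`, where both directory names are placeholders, a skill-local file wins when present and the workshop copy is used otherwise.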

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
self.resources already searches workshop_directory (ovos_workshop/locale/{lang}/)
as a fallback in ResourceFile._locate(), so the explicit CoreResources(lang)
fallback was redundant. CoreResources now only appears in module-level
helpers (_get_dialog, _get_word) that have no skill instance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added fix and removed fix labels Apr 8, 2026
JarbasAl and others added 4 commits April 8, 2026 15:18
self.resources already searches workshop_directory as a fallback, so
CoreResources was redundant here too. CoreResources is now only used in
module-level free functions (_get_dialog, _get_word) that have no self.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_get_dialog no longer exists as a public symbol; game_skill.py was
importing it. speak_dialog(key) uses self.resources (which already
falls back to workshop_directory) and is the correct API for skills.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ovos_workshop/skills/ovos.py (1)

2028-2032: ⚠️ Potential issue | 🟠 Major

Load vocabulary for the requested locale, not just self.lang.

Line 2028 builds the cache key from the explicit lang argument, but Line 2032 now always reads self.resources, which is bound to the current self.lang. During startup, register_intent_file() iterates self.native_langs, so secondary-language blacklists can be populated from the core-language vocab and then cached under the wrong locale.

Suggested fix
-            vocab = self.resources.load_vocabulary_file(voc_filename)
+            vocab = self.load_lang(self.res_dir, lang).load_vocabulary_file(
+                voc_filename
+            )
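
The underlying rule is that the language in the cache key and the language used for loading must be the same value. A standalone sketch of that invariant, where `VocabCache` and the `load_for_lang` callable are hypothetical and stand in for the skill's cache and per-language resource loader:

```python
from typing import Callable, Dict, List


class VocabCache:
    """Cache vocabulary per (language, file), loading from that language's own loader."""

    def __init__(self, load_for_lang: Callable[[str, str], List[str]]):
        # load_for_lang(lang, voc_filename) is a hypothetical per-language loader.
        self._load_for_lang = load_for_lang
        self._cache: Dict[str, List[str]] = {}

    def voc_list(self, voc_filename: str, lang: str) -> List[str]:
        cache_key = lang + voc_filename
        if cache_key not in self._cache:
            # Load with the same `lang` used in the key, not a fixed default language.
            self._cache[cache_key] = self._load_for_lang(lang, voc_filename)
        return self._cache[cache_key]
```

If the loader ignored `lang`, the first language to be requested would silently populate entries for every other language, which is the failure the comment describes.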
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ovos_workshop/skills/ovos.py` around lines 2028 - 2032, The cache key uses
the explicit lang argument (lang + voc_filename) but the code always calls
self.resources.load_vocabulary_file which is bound to self.lang; change the call
to load the vocabulary for the requested locale instead of self.lang — e.g.,
obtain the Resource/loader for the standardized lang variable and call its
load_vocabulary_file(voc_filename) so entries are cached under the correct key
in self._voc_cache; ensure this change is applied in the same block that builds
cache_key and affects register_intent_file / iterations over self.native_langs
so blacklists are populated for the correct locale.
♻️ Duplicate comments (1)
ovos_workshop/skills/ovos.py (1)

2447-2451: ⚠️ Potential issue | 🔴 Critical

Guard missing connector resources before testing membership.

This fallback path still assumes load_json_file("word_connectors") returned a mapping. If the locale has no word_connectors.json, connector in data raises instead of returning ", ", which defeats the missing-locale behavior this PR is trying to restore.

Suggested fix
-    data = CoreResources(lang).load_json_file("word_connectors")
+    data = CoreResources(lang).load_json_file("word_connectors") or {}
     if connector in data:
         return data[connector]
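
In isolation, the guarded pattern behaves as in this sketch. `get_word()` and the `load_json` callable are hypothetical stand-ins, and the assumption that the loader can return None for a missing file is taken from this comment rather than verified against CoreResources.

```python
from typing import Callable, Optional


def get_word(lang: str, connector: str,
             load_json: Callable[[str, str], Optional[dict]]) -> str:
    """Return the localized connector, or ", " when the resource is missing."""
    # load_json(lang, name) is a hypothetical loader that may return None.
    data = load_json(lang, "word_connectors") or {}
    if connector in data:
        return data[connector]
    return ", "
```

The `or {}` turns a missing resource into an empty mapping, so the membership test is always safe and the default is reached without raising.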
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ovos_workshop/skills/ovos.py` around lines 2447 - 2451, The code assumes
CoreResources(lang).load_json_file("word_connectors") returns a mapping and uses
"if connector in data" directly; guard that result before membership testing by
checking that the returned value (variable data from
CoreResources.load_json_file) is a mapping/dict (or truthy mapping) and if not,
log a warning and return the default ", " immediately; update the block around
the call to CoreResources(lang).load_json_file and the membership test of
connector so you only do "connector in data" when data is a dict/mapping (or
supports membership).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ovos_workshop/skills/common_query_skill.py`:
- Around line 75-79: The cache is being written using the full BCP-47 locale tag
(lang) but remove_noise() looks up using only the base subtag, so entries in
_translated_noise_words never match; update remove_noise() to use the same
standardized full BCP-47 tag when reading the cache (instead of stripping to the
base subtag) so lookups align with where _translated_noise_words and the other
cache writes (e.g., the block that calls
CoreResources(lang).load_list_file("noise_words") and the similar block at lines
88-94) store entries; ensure you reference and use the same lang normalization
logic everywhere (use the full tag, not the base) and adjust any locale
normalization helper used by remove_noise() accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b78dd1e1-ed04-4fb7-b9b1-83b0daf35134

📥 Commits

Reviewing files that changed from the base of the PR and between 21e0057 and 78b796c.

📒 Files selected for processing (3)
  • ovos_workshop/skills/common_query_skill.py
  • ovos_workshop/skills/ovos.py
  • test/unittests/test_locale_lookup.py

Comment thread ovos_workshop/skills/common_query_skill.py
JarbasAl and others added 2 commits April 8, 2026 15:30
Cache is written with the full BCP-47 tag (e.g. "en-US") but
remove_noise() stripped to the base subtag ("en") before lookup,
so the dict lookup always missed. Use the full tag consistently.
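
The mismatch is easy to reproduce in a few lines. This is a simplified model with hypothetical function names and a naive word filter, not the CommonQuerySkill code:

```python
from typing import Dict, List

noise_cache: Dict[str, List[str]] = {}


def store_noise_words(lang: str, words: List[str]) -> None:
    """Write side: entries are keyed by the full tag, e.g. "en-US"."""
    noise_cache[lang] = words


def remove_noise_buggy(phrase: str, lang: str) -> str:
    """Old read side: strips the region, so "en" never matches the "en-US" key."""
    words = noise_cache.get(lang.split("-")[0], [])
    return " ".join(w for w in phrase.split() if w not in words)


def remove_noise_fixed(phrase: str, lang: str) -> str:
    """Fixed read side: uses the same full tag as the write side."""
    words = noise_cache.get(lang, [])
    return " ".join(w for w in phrase.split() if w not in words)
```

After `store_noise_words("en-US", ["the", "a"])`, the buggy version returns the phrase unchanged while the fixed version drops the noise words.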

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added fix and removed fix labels Apr 8, 2026
@JarbasAl JarbasAl merged commit acbd438 into dev Apr 8, 2026
18 checks passed
@JarbasAl JarbasAl deleted the fix/locale-rename-followup branch April 8, 2026 14:41