#4: 1. #3 : Develop a script to generate Weblate translation projects/components from the documentations of Boost library repositories. 2. #2 : Develop a Weblate Django API service triggered by the boost-documentation-translations CI workflow.

AuraMindNest merged 14 commits into dev from …
Conversation
@coderabbitai Review this PR
✅ Actions performed: Review triggered.
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
📝 Walkthrough

Adds a new Django app weblate.boost_endpoint with REST endpoints and an in-memory service to discover/create/update Weblate projects/components from Boost repositories; introduces multiple automation scripts and Boost metadata files; updates several existing scripts for UTF-8 handling, formatting, and CLI options; and deletes Setup-ongoing.md.
Sequence Diagram(s)

sequenceDiagram
participant Client as Client
participant View as AddOrUpdateView
participant Service as BoostComponentService
participant GitHub as Git Repository
participant Models as Weblate Models
Client->>View: POST /boost-endpoint/add-or-update (organization, submodules, version, extensions)
View->>View: validate payload (AddOrUpdateRequestSerializer)
View->>Service: instantiate(org, lang_code, version, extensions)
View->>Service: process_all(submodules, user, request)
loop each submodule
Service->>GitHub: clone_repository(submodule)
GitHub-->>Service: repo_dir
Service->>GitHub: scan_documentation_files(repo_dir)
GitHub-->>Service: list_of_docs
loop each doc file
Service->>Service: generate_component_config(file, ext)
Service-->>Service: in-memory component config
end
Service->>Models: get_or_create_project(submodule)
Models-->>Service: Project
loop each component config
Service->>Models: create_or_update_component(Project, config)
Models-->>Service: Component, created_flag
Service->>Models: add_language_to_component(Component)
Models-->>Service: language_added_result
end
end
Service-->>View: aggregated_results
View-->>Client: 200 OK {results}
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 17
🧹 Nitpick comments (15)
scripts/auto/create_component.py (1)
645-645: Dead API call: get_statistics result is discarded.

This makes a network request whose response is thrown away. Either remove the call entirely or restore the display of statistics.
- creator.get_statistics(project_slug, component_slug)
+ # Statistics display removed; remove the API call as well

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` at line 645, The call to creator.get_statistics(project_slug, component_slug) is making a network request whose result is discarded; either remove the call or consume its return value—replace the bare call with a captured result (e.g., stats = creator.get_statistics(project_slug, component_slug)) and then use that data where appropriate (log/print it, add it to the created component output, or return it from the function), or simply delete the line if statistics are not needed; locate the invocation of creator.get_statistics near the component creation flow and update the surrounding logic (e.g., the function that calls creator.get_statistics and any callers) to handle the stats variable or remove the unused call.

scripts/auto/collect_boost_libraries_extensions.py (4)
96-102: No retry logic for transient HTTP failures.
fetch_url is called many times per run (for commits, trees, libraries.json, .gitmodules). A single transient 5xx or network blip will propagate as an exception. For a script making 300+ API calls, even a simple single retry with brief backoff would improve robustness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 96 - 102, fetch_url currently makes a single HTTP request and will raise on transient network or 5xx errors; add retry-with-backoff inside fetch_url to make it robust: catch transient exceptions (urllib.error.HTTPError with status >=500, urllib.error.URLError, socket.timeout) and retry up to a small max (e.g., 3-5) with exponential backoff + jitter between attempts, preserving the same headers/timeout and only returning the decoded body on success, and re-raising the last exception if all attempts fail; reference the fetch_url function and ensure Authorization header logic remains unchanged.
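As an illustration only, a retry wrapper along these lines would satisfy the suggestion; the function name and signature here are hypothetical, not the script's actual fetch_url:

import random
import socket
import time
import urllib.error
import urllib.request

def fetch_url_with_retry(url: str, headers: dict, attempts: int = 3) -> str:
    """Fetch a URL, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError as exc:
            # 4xx responses are permanent; only retry server-side (5xx) errors.
            if exc.code < 500 or attempt == attempts - 1:
                raise
        except (urllib.error.URLError, socket.timeout):
            if attempt == attempts - 1:
                raise
        # Exponential backoff with jitter: roughly 1s, 2s, 4s between attempts.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError("unreachable")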
355-375: No rate-limit warning when running without a GitHub token.

The extension-fetching phase makes ~2 API calls per library (~340+ for the full Boost set). The unauthenticated GitHub API limit is 60 requests/hour. Consider emitting a warning when token is None and there are many libraries to process, or documenting this prominently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 355 - 375, In fetch_extensions_for_libraries, detect when token is None and n_libs is large (e.g., >50) and emit a clear stderr warning before starting the loop (use the same print(..., file=sys.stderr) pattern) that unauthenticated GitHub API calls are rate-limited (~60/hr), that this run may hit rate limits given ~2 calls per library, and suggest providing a GITHUB token or reducing the set; keep the rest of the function unchanged and only add this pre-loop check/print so users see the warning early.
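A minimal sketch of that warning, assuming the token and libraries names used in the review:

import sys

if token is None and len(libraries) > 50:
    print(
        f"[WARNING] No GitHub token provided; ~{2 * len(libraries)} API calls "
        "may exceed the unauthenticated limit of 60 requests/hour.",
        file=sys.stderr,
    )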
48-83: Custom .env loader: "" escape convention is non-standard.

Line 73 replaces "" → " inside double-quoted values. Most .env implementations use \" for escaping. This won't cause issues unless a user expects standard dotenv behavior. Consider using the python-dotenv package if available, or documenting the custom escape behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 48 - 83, The loader load_dotenv_script_dir currently treats double-quote escaping as ""->" which is non-standard; change the double-quoted value handling so it unescapes backslash escapes (e.g., after extracting value = value[1:-1] run a standard unescape such as using Python's unicode_escape decoding or replace(r'\"','"') and replace(r'\\','\') to handle \" and \\ correctly) instead of value.replace('""','"'); keep single-quote handling as-is; optionally, if python-dotenv is available prefer delegating to that library (fall back to this corrected parsing) — update the parsing logic inside load_dotenv_script_dir where the code checks value.startswith('"') and value.endswith('"') and where it sets os.environ.setdefault.
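For illustration, a standard backslash-based unescape of the double-quoted value could look like this; value is assumed to be the raw token including its surrounding quotes:

if value.startswith('"') and value.endswith('"'):
    value = value[1:-1]
    # Replace escaped backslashes via a placeholder first, so they are not
    # re-interpreted when the escaped quote is unescaped.
    value = value.replace('\\\\', '\x00').replace('\\"', '"').replace('\x00', '\\')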
462-492: all_libraries is reassigned between incompatible tuple sizes (3 → 4 elements).

On line 478, all_libraries holds 3-tuples (str, str, str). After lines 485–489, it's reassigned to 4-tuples (str, str, str, str). While this works at runtime, it's fragile and defeats static type checking. Consider using a separate variable name for the 4-tuple list.

Proposed fix
- include_extensions = not args.no_extensions
- if include_extensions:
-     all_libraries = fetch_extensions_for_libraries(
-         all_libraries, libs_ref, args.token
-     )
- else:
-     all_libraries = [(f, u, s, "") for f, u, s in all_libraries]
-
- write_library_output(all_libraries, out_path, libs_ref, include_extensions)
- write_submodule_output(all_libraries, lib_submodules, out_path, libs_ref)
+ include_extensions = not args.no_extensions
+ if include_extensions:
+     all_libraries_ext = fetch_extensions_for_libraries(
+         all_libraries, libs_ref, args.token
+     )
+ else:
+     all_libraries_ext = [(f, u, s, "") for f, u, s in all_libraries]
+
+ write_library_output(all_libraries_ext, out_path, libs_ref, include_extensions)
+ write_submodule_output(all_libraries_ext, lib_submodules, out_path, libs_ref)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 462 - 492, The function main reassigns all_libraries from 3-tuples (returned by collect_all_libraries) to 4-tuples (returned/constructed for extensions), breaking static typing; keep all_libraries as the original 3-tuple list and introduce a new variable (e.g., libraries_with_extensions) to hold the 4-tuple list produced by fetch_extensions_for_libraries or by mapping [(f,u,s,"")...], then pass libraries_with_extensions (not all_libraries) into write_library_output and write_submodule_output where the 4-tuple shape is required; reference symbols: main, all_libraries, collect_all_libraries, fetch_extensions_for_libraries, write_library_output, write_submodule_output, args.no_extensions.

scripts/auto/generate_boost_project_configs.py (2)
139-139: Hardcoded org replacement: "boostorg" → "CppDigest" is fragile and environment-specific.

https_to_git_ssh(repo_url).replace("boostorg", "CppDigest") performs a naive global string replace. This hardcodes the fork organization and would break if any repo name happened to contain "boostorg". Consider making the target organization a CLI argument or config parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` at line 139, The repo assignment currently does a global string replace which is fragile; make the target org configurable (e.g., add a CLI arg or config flag like --org with default "CppDigest") and update the assignment that uses https_to_git_ssh(repo_url) so it only replaces the Git organization path segment rather than doing a global .replace. Concretely: add a parameter (e.g., org_name) to the script/arg parser, call https_to_git_ssh(repo_url) as before, parse the resulting SSH URL into its components (host and path), split the path into org/repo parts, substitute the org part with org_name (or replace only when it equals "boostorg"), then reassemble the SSH URL and assign to repo; reference the existing repo variable and https_to_git_ssh(repo_url) call to locate where to change.
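A sketch of segment-level replacement; the helper name and default are illustrative, not part of the script:

def replace_org(ssh_url: str, new_org: str = "CppDigest") -> str:
    """Swap only the organization segment of git@host:org/repo.git URLs."""
    host, _, path = ssh_url.partition(":")
    org, _, repo = path.partition("/")
    if org == "boostorg":
        org = new_org
    return f"{host}:{org}/{repo}"

For example, replace_org("git@github.com:boostorg/filesystem.git") yields "git@github.com:CppDigest/filesystem.git" without touching a repository name that happens to contain "boostorg".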
115-124:" and ".join(names)produces awkward phrasing for 3+ extensions.For
[".html", ".xml", ".qbk"], this yields"HTML and XML and Quickbook". Consider using comma-separated with "and" before the last item for better readability.Proposed fix
- return " and ".join(names) + if len(names) <= 2: + return " and ".join(names) + return ", ".join(names[:-1]) + ", and " + names[-1]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` around lines 115 - 124, The function extensions_to_fullname builds a human-readable list but currently uses " and ".join(names) which yields awkward repetition for 3+ items; update extensions_to_fullname to format names using comma separation with a final " and " (e.g. "A, B and C"): keep the existing dedup logic using EXTENSION_FULLNAMES and names, then if len(names) == 0 return "", if len(names) == 1 return names[0], if len(names) == 2 return " and ".join(names), else return ", ".join(names[:-1]) + " and " + names[-1] so three-or-more lists read naturally.

scripts/auto/upload_translations.py (1)
176-182: Misleading indentation: continuation lines are less indented than the print( call.

The print() arguments on lines 178-181 are indented at column ~20, while the print( itself starts at column ~28 inside the else block. This is syntactically valid but visually confusing and inconsistent with the surrounding code.

Fix indentation
  else:
-     print(
-     f" ✗ Failed: Accepted={accepted}, Skipped={skipped}, "
-     f"Not found={not_found}",
-     flush=True,
-     )
+     print(
+         f" ✗ Failed: Accepted={accepted}, "
+         f"Skipped={skipped}, Not found={not_found}",
+         flush=True,
+     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/upload_translations.py` around lines 176 - 182, The continuation lines of the print call inside the else block are misaligned and visually confusing; re-indent the wrapped f-string arguments so they line up with the opening parenthesis of print() (or align under a consistent indent level) in the else block containing the print and return False statements in upload_translations.py to match surrounding code style and improve readability.

weblate/boost_endpoint/urls.py (1)
9-11: URL names "info" and "add-or-update" are not namespaced.

Without an app_name in this module or a namespace in the include() call, these generic names may collide with names in other apps. Consider adding app_name = "boost_endpoint" to this module so the routes can be referenced as boost_endpoint:info, etc.

Proposed fix
  from weblate.boost_endpoint.views import AddOrUpdateView, BoostEndpointInfo

+ app_name = "boost_endpoint"
+
  urlpatterns = [
      path("", BoostEndpointInfo.as_view(), name="info"),
      path("add-or-update/", AddOrUpdateView.as_view(), name="add-or-update"),
  ]

And in weblate/urls.py:

- path("boost-endpoint/", include("weblate.boost_endpoint.urls")),
+ path("boost-endpoint/", include(("weblate.boost_endpoint.urls", "boost_endpoint"))),

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/urls.py` around lines 9 - 11, The URL patterns for BoostEndpointInfo and AddOrUpdateView are missing an app namespace; add a module-level app_name = "boost_endpoint" above urlpatterns so these routes can be referenced as boost_endpoint:info and boost_endpoint:add-or-update, and ensure the include(...) that imports this urlpatterns uses that namespace when wiring into the project URLs.

scripts/auto/add_translation.py (1)
273-274: Dead code: if block with only pass.

This conditional does nothing at runtime. Either remove it entirely or implement the logging the comment hints at.
Option A: Remove dead code
- if "total" in translation: - pass # Optional: log translation.get("translated")/translation.get("total")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add_translation.py` around lines 273 - 274, The conditional check in add_translation.py: the if "total" in translation: block contains only pass and should be removed or implemented; either delete that dead code or replace the pass with a log statement that reports translation.get("translated") and translation.get("total") (using the module's logger or existing logging pattern in the file) so the code has observable behavior; locate the if "total" in translation: line and apply one of the two fixes consistently with surrounding logging conventions.

scripts/auto/add-or-update.sh (1)
38-44: Shell injection risk: variables interpolated directly into JSON string.
ORGANIZATION, LANG_CODE, VERSION are interpolated into the JSON body without escaping. While currently hardcoded, if these are ever parameterized (e.g., from CI inputs), special characters could break the JSON or allow injection. Consider using jq to construct the payload safely.

Safer payload construction with jq
- curl -X POST "${WEBLATE_URL}/boost-endpoint/add-or-update/" \
-   -H "Authorization: Bearer ${TOKEN}" \
-   -H "Content-Type: application/json" \
-   -d "{
-     \"organization\": \"${ORGANIZATION}\",
-     \"submodules\": ${SUBMODULES},
-     \"lang_code\": \"${LANG_CODE}\",
-     \"version\": \"${VERSION}\",
-     \"extensions\": ${EXTENSIONS}
-   }" \
-   --max-time 5 \
-   || true
+ PAYLOAD=$(jq -n \
+   --arg org "${ORGANIZATION}" \
+   --argjson subs "${SUBMODULES}" \
+   --arg lang "${LANG_CODE}" \
+   --arg ver "${VERSION}" \
+   --argjson exts "${EXTENSIONS}" \
+   '{organization: $org, submodules: $subs, lang_code: $lang, version: $ver, extensions: $exts}')
+
+ curl -X POST "${WEBLATE_URL}/boost-endpoint/add-or-update/" \
+   -H "Authorization: Token ${TOKEN}" \
+   -H "Content-Type: application/json" \
+   -d "${PAYLOAD}" \
+   --max-time 5 \
+   || true

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 38 - 44, The JSON payload in add-or-update.sh is built by directly interpolating variables into the -d string (ORGANIZATION, SUBMODULES, LANG_CODE, VERSION, EXTENSIONS), which risks JSON breakage or shell injection; change this to construct the body with jq (or similar) using --arg for strings and --argjson for JSON arrays/objects to safely escape values and emit valid JSON, then pipe that output into curl (e.g., pass via --data `@-`); update the code paths that reference the inline -d block so they use the jq-built payload instead.

weblate/urls.py (1)
921-922: Move boost_endpoint URL registration inside a conditional check matching other optional integrations.

The boost_endpoint is unconditionally wired into real_patterns at line 922, while all other optional integrations (billing, gitexport, legal, debug_toolbar, wlhosted.integrations, djangosaml2idp) are conditionally included based on INSTALLED_APPS membership. Since boost_endpoint is not in the default package configuration (absent from pyproject.toml and setup.py), unconditional inclusion risks an ImportError if the app is not installed and unnecessarily exposes the endpoint regardless of whether Boost functionality is needed.

Proposed conditional inclusion pattern
-     # Boost documentation translation endpoint
-     path("boost-endpoint/", include("weblate.boost_endpoint.urls")),
  ]
+
+ # Boost documentation translation endpoint
+ if "weblate.boost_endpoint" in settings.INSTALLED_APPS:
+     real_patterns.append(
+         path("boost-endpoint/", include("weblate.boost_endpoint.urls")),
+     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/urls.py` around lines 921 - 922, The boost_endpoint URL is being always registered (path("boost-endpoint/", include("weblate.boost_endpoint.urls"))) which can raise ImportError when the package isn't installed; update the registration to match other optional integrations by wrapping it in the same INSTALLED_APPS conditional: check if "weblate.boost_endpoint" (or the correct app label) is in INSTALLED_APPS before adding the path to real_patterns (the same list that holds other optional includes), so only include the boost endpoint when the app is present.

weblate/boost_endpoint/services.py (2)
340-340: Consider using a self-documenting never-match regex or adding a comment.
"a^"is a valid never-matching regex, but it's not immediately obvious. A brief inline comment would help:"language_regex": f"^{self.lang_code}$" if self.lang_code else "a^", # never-match🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` at line 340, The fallback never-match pattern used for the language_regex key is not self-explanatory; update the value to use a clearer never-matching regex (for example the atomic negative lookahead "(?!)") or add an inline comment explaining "a^" so future readers understand it's intentional, referencing the language_regex assignment that uses self.lang_code to decide the pattern.
510-607: _delete_component_and_commit_removal pushes commits to the remote — ensure this is gated by proper authorization.

This method deletes a component, removes translation files from disk, commits, and pushes to the remote repository. This is a significant destructive operation. The caller (process_submodule) gates on project.edit permission (line 648), but there's no additional safeguard within this method itself.

Consider adding a log at WARNING or INFO level before the push (line 581) that includes the target remote and branch, so destructive pushes are auditable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 510 - 607, The _delete_component_and_commit_removal method performs destructive git pushes without internal audit logging; before invoking the git push (the block that uses push_url and push_branch inside _delete_component_and_commit_removal), add a WARNING-level log entry that records the target remote and branch (include push_url and push_branch) and any available actor/context (e.g., current request user or project identifier if accessible), then proceed with the existing push; keep the existing permission check in process_submodule but add this explicit pre-push audit log so pushes are auditable.
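A sketch of that audit entry, using the LOGGER, push_url, and push_branch names quoted in the review; the user lookup is illustrative:

LOGGER.warning(
    "boost_endpoint: pushing component removal to %s (branch %s) on behalf of %s",
    push_url,
    push_branch,
    getattr(user, "username", "unknown"),
)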
scripts/auto/create_component_and_add_translation.py (1)

83-89: is_format_supported treats "auto" as unsupported — verify this is intentional.

Line 88 explicitly returns False for file_format == "auto". If Weblate's auto-detection should be allowed to proceed (the component might resolve to a supported format at runtime), this early rejection would silently skip valid components.

If this is intentional (i.e., you want only explicitly declared formats), a brief code comment explaining the rationale would help future readers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component_and_add_translation.py` around lines 83 - 89, The function is_format_supported currently treats a file_format of "auto" as unsupported by returning False; decide whether "auto" should be allowed: if Weblate's auto-detection should be accepted, remove the explicit file_format == "auto" check (or change it to return True) so the membership test against supported_formats can proceed; if rejecting "auto" is intentional, add a short inline comment in is_format_supported explaining why "auto" is excluded (e.g., auto-detected formats are resolved later and should not be validated here) and keep the current check.
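If the rejection is kept, the documenting comment could look like this; the config access path shown is an assumption about the script's structure, not its confirmed shape:

def is_format_supported(config: dict, supported_formats: set) -> bool:
    """Return True if the component's file_format is in the supported set."""
    file_format = config.get("component", {}).get("file_format", "")
    # "auto" is rejected deliberately: automatic detection is not a concrete
    # Weblate format identifier, so it cannot be validated against the set.
    if not file_format or file_format == "auto":
        return False
    return file_format in supported_formats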
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Line 57: The new .gitignore entry "gettextization.failed.po" is unanchored and
will ignore that filename anywhere in the repo; verify there are no legitimate
files named gettextization.failed.po that must be tracked (previously the
removed "/scripts/auto/pofiles/" entry only ignored that directory). If you need
to keep a specific copy tracked, either re-anchor the ignore (e.g., add the
explicit path for files to ignore) or add a negative pattern to whitelist the
tracked path (using !<path>/gettextization.failed.po); update the .gitignore
accordingly and ensure the change targets the previous entry context (look for
the removed "/scripts/auto/pofiles/" reference) so other occurrences aren't
accidentally ignored.
In `@scripts/auto/add-or-update.sh`:
- Around line 35-46: The curl request in add-or-update.sh is using the wrong
auth scheme and is silently swallowing failures; change the Authorization header
in the curl call from "Authorization: Bearer ${TOKEN}" to "Authorization: Token
${TOKEN}" (matching add_translation.py usage and DRF TokenAuthentication) and
remove the trailing "|| true" (or replace with curl --fail) so HTTP auth
failures are not ignored; update the curl invocation that posts to
"${WEBLATE_URL}/boost-endpoint/add-or-update/" accordingly.
- Around line 17-22: The WEBLATE_URL extracted into the WEBLATE_URL variable
from WEB_JSON lacks validation and can be empty or the string "null"; add the
same defensive check used for TOKEN to validate WEBLATE_URL (variable
WEBLATE_URL, source WEB_JSON) and if it's empty or "null" print an error to
stderr and exit 1 so subsequent curl calls don’t use an invalid URL. Ensure the
check mirrors the existing TOKEN check (test for -z or == "null") and uses a
clear error message mentioning WEBLATE_URL and WEB_JSON.
In `@scripts/auto/generate_boost_project_configs.py`:
- Line 24: Add "from __future__ import annotations" at the top of the file to
match project conventions and ensure consistent use of postponed evaluation of
annotations; modify the module containing the function parse_list_row to include
that import line before any other code so the return type annotation "tuple[str,
str, str, str, str, str] | None" follows the project's established style.
In `@scripts/auto/setup_project.py`:
- Around line 788-793: The loop over config_files treats any non-zero return
from run_one_config as a failure, but run_one_config uses return code 2 to
indicate a gracefully skipped clone (see docstring); update the loop in the
block that calls run_one_config so that result==2 is treated as a skip (do not
set exit_code) while other non-zero results still set exit_code=1. Reference the
run_one_config call and the exit_code variable in your change: only mark failure
when result is non-zero and not equal to 2 (i.e., keep exit_code assignment for
genuine errors, not for skip returns).
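A sketch of the adjusted loop, using the run_one_config and exit_code names from the item above:

exit_code = 0
for config_file in config_files:
    result = run_one_config(config_file)
    if result not in (0, 2):  # 2 marks a gracefully skipped clone, not a failure
        exit_code = 1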
In `@weblate/boost_endpoint/__init__.py`:
- Line 1: Fix the spelling typo "Orgnaization" to "Organization" in the
top-level copyright/header string found in the module headers (the exact token
"Orgnaization" appears in the file headers); update that string in both modules
where it appears so the copyright reads "Copyright © Boost Organization
<boost@boost.org>".
In `@weblate/boost_endpoint/serializers.py`:
- Around line 15-18: The ListField definition for submodules currently allows an
empty list which lets POSTs with "submodules": [] pass validation and become a
silent no-op; update the submodules field (the serializers.ListField named
submodules) to reject empty lists by adding either min_length=1 or
allow_empty=False (and adjust any tests/clients expecting empty lists), so
validation fails with a clear error when no submodules are provided.
- Around line 33-43: The ListField named extensions in
weblate/boost_endpoint/serializers.py currently uses a child CharField with
allow_blank=True which permits empty-string entries like [""] — change the child
CharField to disallow blank strings (set allow_blank=False or remove the
allow_blank option) so that empty-string extensions fail validation;
alternatively add a custom validator on the extensions field to reject any
empty-string elements, keeping allow_null=True and default=None as-is.
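The two field tightenings above could be sketched as follows; only the affected fields of AddOrUpdateRequestSerializer are shown, and the rest of the serializer is assumed:

from rest_framework import serializers

class AddOrUpdateRequestSerializer(serializers.Serializer):
    submodules = serializers.ListField(
        child=serializers.CharField(),
        allow_empty=False,  # reject "submodules": []
    )
    extensions = serializers.ListField(
        child=serializers.CharField(allow_blank=False),  # reject [""]
        allow_null=True,
        default=None,
    )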
In `@weblate/boost_endpoint/services.py`:
- Around line 47-48: SYNC_SETTLE_SECONDS is currently set to 60 and is being
awaited after every component sync, causing N×60s delays for N docs; change this
by either reducing the default value, making SYNC_SETTLE_SECONDS configurable
(env/config var) so callers can tune it, or move the single sleep out of the
per-component loop and perform one settle sleep after the batch completes (i.e.,
remove per-iteration await where the loop in the component-sync routine calls
SYNC_SETTLE_SECONDS and instead call a single sleep using SYNC_SETTLE_SECONDS
after finishing all components); update references to SYNC_SETTLE_SECONDS and
the component sync loop (the function/method that iterates component syncs)
accordingly.
- Around line 663-675: The deletion loop is iterating
project.component_set.all() too broadly; restrict it to components whose slugs
start with the computed prefix so only boost-managed components are considered.
Change the loop to iterate a filtered queryset (use the existing prefix
variable) instead of project.component_set.all(), e.g., filter by
slug__startswith=prefix, then keep the existing logic that builds wanted_slugs
and calls _delete_component_and_commit_removal for components not in
wanted_slugs and not component.is_glossary while recording errors into result.
- Around line 609-630: process_submodule assumes self.temp_dir is set and will
TypeError on os.path.join if it's None; add a lazy-init guard at the start of
process_submodule: if self.temp_dir is None, assign a temporary directory (e.g.
via tempfile.mkdtemp()) to self.temp_dir, then continue to build
temp_submodule_dir and os.makedirs(temp_submodule_dir, exist_ok=True) as before
so process_submodule can be called independently of process_all; ensure tempfile
is imported and that behavior remains compatible with clone_repository and any
existing cleanup logic.
- Around line 347-384: The transaction.atomic() block currently wraps
Component.objects.get_or_create and the heavy I/O steps
(_sync_component_for_translation and add_language_to_component), causing
long-held DB locks; refactor so only the DB-mutating parts stay inside the
atomic block: perform get_or_create, set acting_user, call component.post_create
(if created) and component.save(update_fields=...) inside transaction.atomic(),
capture and return component and created, then after the atomic block call
self._sync_component_for_translation(component, request, created=created) and,
if self.lang_code is not None, self.add_language_to_component(component,
request); ensure behavior for created vs existing (post_create only when
created) and preserve logging (LOGGER.info) and branch/push_branch updates.
- Around line 621-624: The code is vulnerable to path traversal when
constructing temp_submodule_dir = os.path.join(self.temp_dir, submodule);
sanitize and validate the submodule name (e.g., via the existing
_submodule_slug() or a strict slugify) and then compute the absolute/resolved
path for the directory (use os.path.realpath/os.path.abspath) and verify it
stays inside self.temp_dir using os.path.commonpath; if the resolved path
escapes self.temp_dir, raise/return an error and do not call os.makedirs or run
git clone. Update usages of temp_submodule_dir, submodule and any clone steps to
use the validated slug/path.
- Around line 283-284: The constructed component_slug
("boost-{slug}-documentation-{config['component_slug']}") can exceed the model's
slug max_length (100) for deeply nested paths; use the existing
truncate_component_name helper to truncate the component portion (or the entire
composed slug) before saving: update the component_slug creation in the block
that calls _submodule_slug(submodule) so that you pass config['component_slug']
through truncate_component_name (or apply it to the full
f"boost-{slug}-documentation-{...}") ensuring the final component_slug length is
clipped to the model limit so saving won't raise a DataError.
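As an illustration of the path-containment check suggested above for process_submodule; the names follow the review's description rather than the exact code:

import os

slug = _submodule_slug(submodule)
candidate = os.path.realpath(os.path.join(self.temp_dir, slug))
root = os.path.realpath(self.temp_dir)
if os.path.commonpath([root, candidate]) != root:
    raise ValueError(f"submodule path escapes temp dir: {submodule!r}")
os.makedirs(candidate, exist_ok=True)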
In `@weblate/boost_endpoint/urls.py`:
- Line 1: Fix the typo in the copyright header by changing "Orgnaization" to
"Organization" in the module-level copyright strings found in the
weblate.boost_endpoint.urls and weblate.boost_endpoint.views files (top of the
files).
In `@weblate/boost_endpoint/views.py`:
- Around line 67-72: The handler currently calls service.process_all(submodules,
user=request.user, request=request) synchronously and returns Response(results,
status=status.HTTP_200_OK), which can block and surface raw 500s; wrap the call
in a try/except that catches broad exceptions from service.process_all and
return a structured error Response (e.g.,
status=status.HTTP_500_INTERNAL_SERVER_ERROR with a JSON body containing an
error message and details), and as a better fix consider offloading the
long-running work to an async job (e.g., enqueue a Celery task and return
status=status.HTTP_202_ACCEPTED with a task id) instead of calling
service.process_all directly so the view no longer blocks the request thread.
- Around line 29-32: The AddOrUpdateView currently uses permission_classes =
(IsAuthenticated,) which allows any logged-in user to perform repository cloning
and project/component creation; change this to a stricter permission such as
permission_classes = (IsAdminUser,) or implement and reference a custom
permission class (e.g., IsWeblateAdmin or HasWeblateManagementPermission) and
use that in AddOrUpdateView.permission_classes to ensure only admins or users
with explicit Weblate management privileges can access these operations.
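A sketch of the error-handling variant of AddOrUpdateView.post; the serializer and service names follow the review, and the service construction helper is hypothetical:

def post(self, request):
    serializer = AddOrUpdateRequestSerializer(data=request.data)
    serializer.is_valid(raise_exception=True)
    service = self.build_service(serializer.validated_data)  # hypothetical helper
    try:
        results = service.process_all(
            serializer.validated_data["submodules"],
            user=request.user,
            request=request,
        )
    except Exception as exc:  # surface a structured error instead of a raw 500
        return Response(
            {"error": "processing failed", "detail": str(exc)},
            status=status.HTTP_500_INTERNAL_SERVER_ERROR,
        )
    return Response(results, status=status.HTTP_200_OK)

Offloading the same call to a background task and returning HTTP 202 with a task id would avoid blocking the request thread entirely.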
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/auto/add_translation.py (1)
471-474: ⚠️ Potential issue | 🟡 Minor

Inconsistent flush=True in error exit path.

These two print() calls are missing flush=True, unlike every other error/info print in this file. Without it, output may not appear before sys.exit(1) in buffered environments (e.g., CI pipes).

🛠 Proposed fix
- print("[ERROR] API token required") - print("[INFO] Provide via --token, --web-config, or WEBLATE_TOKEN") + print("[ERROR] API token required", flush=True) + print("[INFO] Provide via --token, --web-config, or WEBLATE_TOKEN", flush=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add_translation.py` around lines 471 - 474, The two prints in the token-check exit path (the if not token block in add_translation.py) are missing flush=True; update the two print calls that output "[ERROR] API token required" and "[INFO] Provide via --token, --web-config, or WEBLATE_TOKEN" to include flush=True so their output is flushed before sys.exit(1), keeping behavior consistent with other error/info prints in this script.

scripts/auto/setup_project.py (1)
139-170: ⚠️ Potential issue | 🟠 Major

Component slug has no length bound — will fail Weblate's API constraint.

The Weblate Component.slug field has a maximum length of 100 characters. However, generate_component_slug() produces unbounded slugs, and when prefixed with project_slug at line 224 (e.g., "boost-filesystem-doc-modules-root-pages-reference-algorithms-containers-basic-string-overview"), the combined slug easily exceeds this limit. The dedup logic at lines 710–713 further extends the slug by appending counters, pushing it even further over the constraint and causing component creation to fail.

Add a MAX_COMPONENT_SLUG_LENGTH = 100 constant and truncate the slug analogously to truncate_component_name() — preserve the tail to maintain uniqueness when directory paths share common prefixes.
MAX_COMPONENT_SLUG_LENGTH = 100constant and truncate the slug analogously totruncate_component_name()— preserve the tail to maintain uniqueness when directory paths share common prefixes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 139 - 170, generate_component_slug currently returns unbounded slugs which can exceed Weblate's 100-char Component.slug limit when combined with project prefixes or dedup counters; add a constant MAX_COMPONENT_SLUG_LENGTH = 100 and enforce it in generate_component_slug by truncating the generated slug similar to truncate_component_name(): if slug length exceeds MAX_COMPONENT_SLUG_LENGTH, preserve the tail segment (rightmost characters) so directory-unique parts remain, and ensure any appended dedup counters (used elsewhere) will still fit by leaving room for a small suffix; update generate_component_slug to apply this truncation before returning.
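A sketch of the suggested truncation; the helper name is hypothetical, by analogy with truncate_component_name:

MAX_COMPONENT_SLUG_LENGTH = 100

def truncate_slug(slug: str, limit: int = MAX_COMPONENT_SLUG_LENGTH) -> str:
    """Clip a slug to Weblate's limit, keeping the tail for uniqueness."""
    if len(slug) <= limit:
        return slug
    # Keep the rightmost characters: slugs sharing a long common prefix
    # (deep directory paths) differ at the end, not the beginning.
    return slug[-limit:].lstrip("-")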
🧹 Nitpick comments (16)
scripts/auto/upload_translations.py (1)
177-181: Fix indentation of print() arguments — visually misleading, not idiomatic.

The continuation lines (lines 178–181) are dedented relative to the print( on line 177, which makes them look as if they fall outside the else block. While Python accepts any indentation inside an open parenthesis, most formatters (black/autopep8) and readers expect continuation lines to be indented at least to the opening-paren column.

♻️ Proposed fix
- print(
- f" ✗ Failed: Accepted={accepted}, Skipped={skipped}, "
- f"Not found={not_found}",
- flush=True,
- )
+ print(
+     f" ✗ Failed: Accepted={accepted}, "
+     f"Skipped={skipped}, Not found={not_found}",
+     flush=True,
+ )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/upload_translations.py` around lines 177 - 181, The multi-line print call that starts with print( in upload_translations.py has its continuation lines dedented, which is visually misleading; re-indent the continuation lines (the f-string lines and flush=True,) to align under the opening parenthesis of the print call so they are at least in-column with print( — i.e., move the lines beginning with f" ✗ Failed..." and f"Not found=..." and the flush=True, argument to the same indentation level as the first line after the opening paren for the print call to match typical formatter/PEP8 expectations.

scripts/auto/create_component.py (3)
645-645: get_statistics return value is discarded — unnecessary API call.

This makes a network request to fetch statistics but does nothing with the result. Either remove the call or use its return value to display something meaningful.
Option A: Remove the dead call
- creator.get_statistics(project_slug, component_slug)

Option B: Print basic stats
- creator.get_statistics(project_slug, component_slug)
+ stats = creator.get_statistics(project_slug, component_slug)
+ if stats:
+     total = stats.get('total', 0)
+     translated = stats.get('translated', 0)
+     print(f"\n[INFO] Statistics: {translated}/{total} translated", flush=True)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` at line 645, The call to creator.get_statistics(project_slug, component_slug) currently discards its return value causing an unnecessary API request; either remove that call from create_component.py or capture its return and surface useful information (e.g., store stats = creator.get_statistics(project_slug, component_slug) and print or log summary fields) so the network call has a purpose; update references to project_slug and component_slug accordingly and ensure any printed output is concise and user-friendly.
496-506: Recursive retry on required fields could be replaced with a loop.

If a user repeatedly submits empty input on a required field, this recurses until hitting Python's default recursion limit (~1000). A while loop would be more robust, though this is unlikely to matter in practice.

Suggested loop-based approach
  def _prompt(message: str, default: str = "", required: bool = False) -> str:
      """Prompt user for input with optional default."""
      if default:
          value = input(f"{message} (default: {default}): ").strip()
          return value or default
      else:
-         value = input(f"{message}: ").strip()
-         if required and not value:
-             print("[ERROR] This field is required", flush=True)
-             return _prompt(message, default, required)
-         return value
+         while True:
+             value = input(f"{message}: ").strip()
+             if value or not required:
+                 return value
+             print("[ERROR] This field is required", flush=True)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` around lines 496 - 506, The _prompt function currently retries required-field input by recursively calling itself which can hit Python's recursion limit; replace the recursive retry in _prompt with an iterative loop: keep prompting inside a while True loop (using the same input prompts and default handling) and break/return only when a non-empty value is provided (or default is accepted), preserving the existing message/default/required semantics and references to _prompt, message, default, and required.
746-751: Missing validation for project and component keys.

validate_config only checks for weblate_url/api_token, but setup_project_and_component accesses config['project'] and config['component'] directly (lines 593, 598). Missing keys produce a KeyError caught by the broad except in main(), yielding a cryptic message like Failed to create project/component: 'project'.

Proposed fix
  def validate_config(config: Dict[str, Any]) -> bool:
      """Validate required configuration fields."""
      if not config.get('weblate_url') or not config.get('api_token'):
          print("[ERROR] Missing Weblate URL or API token", flush=True)
          return False
+     if 'project' not in config or 'component' not in config:
+         print("[ERROR] Missing 'project' or 'component' configuration", flush=True)
+         return False
      return True

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` around lines 746 - 751, validate_config currently only checks weblate_url and api_token but not the 'project' and 'component' keys used later in setup_project_and_component; update validate_config to also verify config.get('project') and config.get('component') are present and non-empty, print a clear error message including which key is missing (e.g., "[ERROR] Missing 'project' in config"), and return False on missing values so main() does not proceed and trigger a KeyError in setup_project_and_component.

scripts/auto/create_component_and_add_translation.py (3)
87-88: Silent skip for file_format == "auto" is undocumented behavior.

When the component config carries "file_format": "auto" (Weblate's automatic detection), is_format_supported returns False, causing main to exit 0 with a "Skipping" warning. This is a reasonable design choice, but it's invisible to callers and untested. At a minimum, document the intent in the docstring.

📝 Suggested docstring addition
  def is_format_supported(config: Dict[str, Any], supported_formats: set) -> bool:
-     """Return True if the component's file_format is in the supported set."""
+     """Return True if the component's file_format is in the supported set.
+
+     Returns False (i.e., skip) for missing, empty, or "auto" file_format values,
+     since automatic detection is not a concrete Weblate format identifier.
+     """

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component_and_add_translation.py` around lines 87 - 88, Document that is_format_supported treats file_format == "auto" as intentionally unsupported and that main will skip and exit 0 when detection is automatic; update the docstring for the is_format_supported function (and/or the module-level docstring in create_component_and_add_translation.py) to state this behavior and rationale, and add/adjust a unit test to assert is_format_supported("auto") returns False and that main emits the "Skipping" warning and exits without error when file_format is "auto".
64-69: Hardcoded fallback set will silently diverge from weblate_supported_formats.json over time.

default_formats is a manually maintained copy of every file_format value in the JSON file. Any future addition or removal in the JSON won't automatically propagate to this fallback, so callers who happen to use a missing/unreadable config path will see different behavior than those who do.

A simpler and maintenance-free approach is to embed the path resolution inside the module and always attempt to load it rather than duplicating the set:
♻️ Proposed refactor
- default_formats = {
-     "asciidoc", "po", "json", "yaml", "xliff", "markdown", "txt", "html",
-     "odf", "csv", "idml", "ini", "php", "properties", "resx", "ts", "dtd",
-     "flatxml", "fluent", "stringsdict", "tbx", "xlsx", "ruby-yaml",
-     "webextension", "arb", "gwt", "winrc", "subtitle",
- }
- if not config_path or not os.path.isfile(config_path):
-     return default_formats
+ if not config_path or not os.path.isfile(config_path):
+     # Last-resort: return empty set so all formats pass through rather than
+     # silently using a stale hardcoded list.
+     return set()
  try:
      with open(config_path, "r", encoding="utf-8") as f:
          data = json.load(f)
  except (OSError, json.JSONDecodeError) as e:
      print(f"[WARNING] Could not load formats config {config_path}: {e}", flush=True)
-     return default_formats
+     return set()

Alternatively, keep the fallback but add a unit test that asserts the set exactly matches the keys in the JSON file so CI catches any divergence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component_and_add_translation.py` around lines 64 - 69, Replace the hardcoded default_formats set with a runtime load from the weblate_supported_formats.json file: at module import attempt to open and parse that JSON and build default_formats from its file_format values (falling back to the existing hardcoded set only if the load/parse fails), and add a warning log when falling back; keep the symbol name default_formats so callers are unchanged. Alternatively, instead of changing runtime behavior, add a unit test that loads weblate_supported_formats.json and asserts its file_format keys exactly match the hardcoded default_formats to catch drift in CI.
382-393: Exit code 0 for a skipped (unsupported format) run is indistinguishable from a successful run in CI.

When is_format_supported returns False, the script exits 0. CI orchestrators and callers relying on the exit code cannot tell whether the component was actually created or silently skipped. This is especially problematic if this script is invoked in a loop across many components — a misconfigured format will appear as a success.

Consider a dedicated exit code (e.g., 2) for "skipped" scenarios, or have the caller opt-in to skip-as-success via a --allow-unsupported flag.

💡 Minimal change to signal "skipped" distinctly

- return 0
+ return 2  # Skipped — not a failure, but not a successful creation either

If keeping 0 is intentional (e.g., to prevent CI from failing on unsupported formats), add a comment explaining the rationale.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component_and_add_translation.py` around lines 382 - 393, The current branch in create_component_and_add_translation.py that handles unsupported formats (when is_format_supported(config, supported_formats) is False) returns 0 which hides skipped runs from callers; change this to return a distinct non-zero exit code (e.g., return 2) to indicate "skipped/unsupported format" so CI and callers can distinguish success from skip, or alternatively add a CLI flag (--allow-unsupported) that, when set, preserves the existing return 0 while default behavior returns 2; update the block that references config, comp.get("file_format") and prints formats_path to use the new exit behavior and document the choice with a brief comment.scripts/auto/add_translation.py (1)
272-274: Remove dead no-op block.

The if "total" in translation: pass block is a no-op. Either implement the optional logging or remove it entirely.

♻️ Proposed cleanup
- # Show statistics if available
- if "total" in translation:
-     pass  # Optional: log translation.get("translated")/translation.get("total")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add_translation.py` around lines 272 - 274, The no-op conditional checking "if 'total' in translation: pass" should be removed or turned into meaningful logging: locate the block in add_translation.py (the function that processes the translation dict where the variable translation is used) and either delete the entire if "total" in translation: pass stanza, or replace it with a concise log statement that reports translation.get("translated") and translation.get("total") (using the existing logger or print mechanism in that function) so the branch is no longer empty.scripts/auto/add-or-update.sh (1)
46-52: Prefer jq for JSON payload construction over shell string interpolation.

ORGANIZATION, VERSION, and LANG_CODE are hardcoded constants now, but if they ever become parameters (e.g., from env vars or script args), a value containing " or \ will produce malformed JSON. Since jq is already a declared dependency, use it to build the payload safely.

♻️ Example using jq
-curl -X POST "${WEBLATE_URL}/boost-endpoint/add-or-update/" \ - -H "Authorization: Token ${TOKEN}" \ - -H "Content-Type: application/json" \ - -d "{ - \"organization\": \"${ORGANIZATION}\", - \"submodules\": ${SUBMODULES}, - ${LANG_CODE_JSON}, - \"version\": \"${VERSION}\", - \"extensions\": ${EXTENSIONS} - }" \ - --max-time 5 \ - || true +PAYLOAD="$(jq -n \ + --arg org "${ORGANIZATION}" \ + --argjson submodules "${SUBMODULES}" \ + --arg lang "${LANG_CODE}" \ + --arg version "${VERSION}" \ + --argjson extensions "${EXTENSIONS}" \ + '{"organization":$org,"submodules":$submodules,"lang_code":($lang|if .=="" then null else . end),"version":$version,"extensions":$extensions}' +)" +curl -X POST "${WEBLATE_URL}/boost-endpoint/add-or-update/" \ + -H "Authorization: Token ${TOKEN}" \ + -H "Content-Type: application/json" \ + -d "${PAYLOAD}" \ + --max-time 5 \ + || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 46 - 52, The current curl -d JSON payload in add-or-update.sh is built with shell string interpolation and will break if ORGANIZATION, VERSION, LANG_CODE (LANG_CODE_JSON) or other variables contain quotes/backslashes; replace the literal -d "..." block with a jq-based construction: use jq -n with --arg for ORGANIZATION and VERSION and --argjson for SUBMODULES and EXTENSIONS (and include LANG_CODE if needed) to safely assemble the object, then pass its output to curl (e.g., curl --data-binary `@-` ...); update the payload creation that references ORGANIZATION, SUBMODULES, LANG_CODE_JSON, VERSION, and EXTENSIONS to use jq so JSON is correctly escaped and robust.
scripts/auto/collect_boost_libraries_extensions.py (2)
29-29: Old-style `typing` imports on a Python 3.11+ codebase.
`Dict`, `List`, `Optional`, `Set`, `Tuple` can be replaced with their built-in equivalents (`dict`, `list`, `set`, `tuple`, `X | None`) since the project requires Python 3.11+. This is consistent with `services.py`, which uses modern syntax.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` at line 29, Replace the old-style typing imports by removing "Dict, List, Optional, Set, Tuple" from the from typing import line and update all usages to use built-in generics and union syntax (dict, list, set, tuple, and X | None) — search for occurrences of Dict, List, Optional, Set, Tuple in this module and change e.g. Dict[K, V] -> dict[K, V], List[T] -> list[T], Set[T] -> set[T], Tuple[...] -> tuple[...], and Optional[T] -> T | None; also remove the now-unused names from the import to avoid linter errors.
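A quick before/after sketch of the suggested modernization; the function signature here is invented purely for illustration:

```python
# Before: typing-module generics (pre-3.9 style)
from typing import Dict, List, Optional

def index_docs(paths: List[str]) -> Dict[str, Optional[str]]: ...

# After: built-in generics and union syntax (Python 3.9+/3.10+)
def index_docs(paths: list[str]) -> dict[str, str | None]: ...
```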
139-176: No rate-limit handling for GitHub API calls.
`get_doc_extensions` makes 2 API calls per library (commit + tree). With ~160 Boost libraries, that's ~320 requests. The unauthenticated GitHub API rate limit is 60 requests/hour; authenticated is 5,000/hour. Without a token, this will almost certainly fail partway through. Consider adding a brief note in the help text or a warning when `--token` is not provided and `--no-extensions` is not set.
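A minimal sketch of such a guard; the `token` / `no_extensions` parameter names are assumptions based on the flags mentioned above, and the GitHub `/rate_limit` endpoint is a documented, unauthenticated way to read the remaining quota:

```python
import json
import sys
import urllib.request

def warn_if_unauthenticated(token: str | None, no_extensions: bool) -> None:
    """Warn when ~320 tree/commit requests would hit the 60/hour anonymous limit."""
    if token or no_extensions:
        return
    req = urllib.request.Request("https://api.github.com/rate_limit")
    with urllib.request.urlopen(req) as resp:
        remaining = json.load(resp)["resources"]["core"]["remaining"]
    print(
        f"[WARN] No --token given; {remaining} anonymous requests left this hour. "
        "Scanning all Boost libraries needs roughly 320 requests.",
        file=sys.stderr,
    )
```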
weblate/urls.py (1)
921-922: Consider gating behind `INSTALLED_APPS` or adding a namespace, consistent with other optional features.
Other feature modules (billing, gitexport, legal) are conditionally included based on `settings.INSTALLED_APPS`. The boost endpoint is added unconditionally. Additionally, the URL name `"info"` (from `boost_endpoint/urls.py`) is generic and could collide with future routes.
Consider either:
- Wrapping in an `if "weblate.boost_endpoint" in settings.INSTALLED_APPS:` guard, or
- Adding a namespace: `include(("weblate.boost_endpoint.urls", "weblate.boost_endpoint"), namespace="boost-endpoint")`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/urls.py` around lines 921 - 922, The boost endpoint is currently always registered with path("boost-endpoint/", include("weblate.boost_endpoint.urls")), which can cause unwanted exposure and name collisions for the generic "info" route; either wrap that path registration in a conditional guard checking if "weblate.boost_endpoint" is present in settings.INSTALLED_APPS (i.e., only call path(...) when "weblate.boost_endpoint" in settings.INSTALLED_APPS) or change the include to pass the app tuple and a namespace (include(("weblate.boost_endpoint.urls", "weblate.boost_endpoint"), namespace="boost-endpoint")) so URL names in weblate.boost_endpoint.urls are isolated—apply one of these changes where path("boost-endpoint/", include(...)) is declared.
scripts/auto/generate_boost_project_configs.py (3)
115-124:" and ".joinproduces awkward phrasing for 3+ extensions.For a list like
[".html", ".xml", ".rst"], this yields"HTML and XML and reStructuredText"instead of something like"HTML, XML, and reStructuredText". If these strings are user-facing (used in projectinstructions), consider comma-separated formatting with "and" only before the last item.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` around lines 115 - 124, The current extensions_to_fullname function produces repetitive " and " for 3+ items; update it to produce comma-separated names with a single "and" before the last item (Oxford comma style) by building the deduplicated list using EXTENSION_FULLNAMES (as already done), then: if the list has 0 items return empty string, if 1 return that name, if 2 return "A and B", otherwise join all but the last with ", " and append ", and " + last; keep the existing deduplication logic using seen and names.
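A sketch of the comma-separated formatting the comment suggests; the helper name is illustrative and intentionally does not mirror the project's `extensions_to_fullname`:

```python
def join_names(names: list[str]) -> str:
    """Join names as natural English: 'A', 'A and B', 'A, B, and C'."""
    if not names:
        return ""
    if len(names) == 1:
        return names[0]
    if len(names) == 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]

assert join_names(["HTML", "XML", "reStructuredText"]) == "HTML, XML, and reStructuredText"
```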
174-174: Hardcoded language regex `^zh_Hans$` limits reuse.
The language regex and `languages` list (line 176) are hardcoded for Simplified Chinese. If this script will be used for other languages in the future, consider parameterizing these via a CLI argument (similar to `--branch`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` at line 174, The hardcoded "language_regex" value "^zh_Hans$" and the associated "languages" list should be made configurable via a CLI option; add a new argument (e.g., --language or --languages) to the script's argument parser (where args like --branch are defined), default it to "zh_Hans", and use that value to construct the "language_regex" (e.g., wrap the provided language in anchors or accept a raw regex flag) and to populate the "languages" list before building the config dict that contains "language_regex" and "languages"; update any references that currently expect the fixed values to read from the parsed argument instead.
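One possible shape for that option, sketched with argparse; the `--language` flag name is an invention for illustration, and only `--branch` is known to exist in the script:

```python
import argparse
import re

parser = argparse.ArgumentParser()
parser.add_argument("--branch")  # existing option mentioned in the review
parser.add_argument(
    "--language",
    default="zh_Hans",
    help="Weblate language code to target (default: zh_Hans)",
)
args = parser.parse_args()

config = {
    "language_regex": f"^{re.escape(args.language)}$",
    "languages": [args.language],
}
```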
139-139: Fragile string replacement for organization name.
`replace("boostorg", "CppDigest")` is a global substring replacement, so any occurrence of `"boostorg"` in the URL (e.g., a hypothetical repo named `"boostorg-utils"`) would be corrupted. Since this is an SSH URL with a known structure (`git@github.com:boostorg/...`), a targeted replacement is safer.
Suggested fix
- repo = https_to_git_ssh(repo_url).replace("boostorg", "CppDigest")
+ repo = https_to_git_ssh(repo_url).replace("github.com:boostorg/", "github.com:CppDigest/")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` at line 139, The current assignment to repo uses a global substring replacement repo = https_to_git_ssh(repo_url).replace("boostorg", "CppDigest") which can corrupt repository names that contain "boostorg" elsewhere; update the logic in the code that computes repo (after calling https_to_git_ssh) to only replace the organization path segment (e.g., replace the "boostorg/" path component or the "git@github.com:boostorg/" prefix) rather than doing a global .replace; use a targeted string or regex substitution that matches the SSH URL structure produced by https_to_git_ssh (for example match "git@github.com:boostorg/" or "/boostorg/" and swap that segment to "CppDigest/") so only the org name is changed and other occurrences remain untouched.
scripts/auto/setup_project.py (1)
727-732: `project_slug=None` is hardcoded at the only call site, making `save_component_config`'s `project_slug` parameter dead code.
`generate_component_config` already embeds the `project_slug` into `component_slug` (lines 222-224), so the separate `project_slug` parameter in `save_component_config` is never exercised. Either remove the parameter from `save_component_config` and document that the caller must pre-compose the slug, or pass the actual `project_slug` value here and have `generate_component_config` stop prefixing it, whichever is the intended design. The current split is confusing, and the new function signature misleads future callers into thinking there is a distinct naming path that is never taken.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 727 - 732, The project_slug parameter on save_component_config is dead because callers (e.g., the call passing project_slug=None) already have project_slug embedded into component_slug by generate_component_config; remove the unused parameter from save_component_config (and its signature, docstring, and any internal references), update all callers to pass only (component_config, output_dir, component_slug) and document in save_component_config that the caller must provide a pre-composed component_slug that includes the project prefix; alternatively, if you prefer the other design, instead pass the real project_slug from the caller here and remove prefixing in generate_component_config—pick one approach and make consistent changes to generate_component_config and all call sites so the parameter is actually used or eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/auto/create_component_and_add_translation.py`:
- Line 80: load_supported_formats builds the supported-format set from raw JSON
values but is_format_supported lowercases the lookup key, causing mismatches for
mixed-case entries; update load_supported_formats (the set comprehension that
returns {item.get("file_format") ...}) to normalize entries (strip and lower) so
the returned set contains normalized values (e.g., use .strip().lower() on
item.get("file_format")), ensuring consistency with is_format_supported's
normalization and preventing case-related misses.
- Around line 59-80: Update the load_supported_formats docstring to reflect the
actual JSON schema: mention that config is a list of objects with "file_format"
and "file_extensions" (not "extensions"), and note that this function only
consumes "file_format" (ignores file_extensions); update the example in the
docstring and any explanatory text inside the load_supported_formats function to
avoid misleading callers.
In `@scripts/auto/create_component.py`:
- Around line 53-54: The code uses urljoin(self.base_url, "/api/") which drops
any path prefix in base_url; change the api URL construction in the class where
base_url and api_url are set to build a relative path instead of using urljoin —
e.g. replace urljoin(self.base_url, "/api/") with a safe concatenation like
self.base_url.rstrip("/") + "/api/" (keep the existing self.base_url.rstrip("/")
usage) so that base_url paths (e.g., http://host/weblate) correctly produce
http://host/weblate/api/; update the assignment to api_url accordingly and
remove the urljoin call.
In `@scripts/auto/setup_project.py`:
- Around line 50-51: The subprocess.run call in clone_repository currently can
hang; add an explicit timeout (e.g., timeout=180) to the subprocess.run
invocation (the call that builds cmd and assigns result) and update error
handling to catch subprocess.TimeoutExpired and treat it as a clone failure
(log/raise the same failure path you use for non-zero results). Ensure you
propagate the TimeoutExpired exception or raise a consistent clone-failure
exception so CI will fail fast.
- Around line 198-202: truncate_component_name currently always uses fixed
TRUNCATE_NAME_HEAD/TRUNCATE_NAME_TAIL/TRUNCATE_NAME_SEP sizes so the truncated
result can exceed the provided max_len; update truncate_component_name to
compute head and tail lengths from the passed max_len (subtract
len(TRUNCATE_NAME_SEP) and allocate remaining chars between head and tail, e.g.
keep TRUNCATE_NAME_TAIL as a minimum and set head = max_len - sep_len - tail,
with a safe fallback if head < 0), then return name[:head] + TRUNCATE_NAME_SEP +
name[-tail:] so the returned string never exceeds max_len; adjust the function
docstring accordingly and reference TRUNCATE_NAME_HEAD, TRUNCATE_NAME_TAIL, and
TRUNCATE_NAME_SEP in the implementation.
In `@scripts/auto/upload_translations.py`:
- Around line 482-486: The printed message unconditionally appends "..." even
when all matching components are shown; update the print logic around the
computed avail (from sorted(matches.keys())[:5]) to only append "..." when
len(matches) > 5 — i.e., check len(matches) and construct the message
(referencing avail, matches and args.component) so the ellipsis is included only
if there are more than five matches.
In `@scripts/auto/weblate_supported_formats.json`:
- Around line 4-5: The JSON list contains duplicate file extensions across
formats (e.g., ".yml" used by "yaml" and "ruby-yaml", ".json" by "json" and
"webextension", ".properties" by "properties" and "gwt"), so update those
entries to disambiguate by adding a contextual field (for example
"requires_source": true/false or "context": "webextension" / "ruby" or a "note"
string) to each conflicting object (e.g., adjust the objects for "yaml",
"ruby-yaml", "json", "webextension", "properties", and "gwt") so consumers can
distinguish formats by extension plus context; ensure the new field is
consistently present for all conflicting entries and documented in the file top
comment or a brief header entry.
In `@weblate/boost_endpoint/services.py`:
- Line 646: The code calls self.clone_repository(submodule, temp_submodule_dir,
"master") which hardcodes "master" and breaks repos using "main"; update the
call/site and clone_repository to accept an optional branch (e.g., branch=None)
and implement fallback logic: first try cloning with the configured/default
branch if provided, otherwise attempt "master", then "main", and finally clone
without -b to let the remote default prevail; ensure the call at this location
passes None or the resolved branch and that clone_repository handles the
attempted fallbacks and returns a clear success/failure result for
temp_submodule_dir and submodule.
- Line 523: The call to time.sleep(ADD_TRANSLATION_SECONDS) inside the
per-component flow blocks the request thread after each successful
add_new_language and causes huge cumulative delays; remove the in-loop sleep and
instead either (a) move a single sleep of ADD_TRANSLATION_SECONDS to run once
after processing all components (batch-settle), (b) make the wait configurable
via a setting/argument so callers can set it to 0 in fast paths, or (c) delegate
the settle to a background thread/task (so add_new_language returns immediately)
— update the code around the add_new_language caller and the time.sleep
invocation to implement one of these approaches and ensure tests/flow that
expect eventual consistency still run the settle behavior exactly once per
batch.
- Around line 236-238: The URL for project_web is built with the hyphenated slug
but Boost.org paths use the original submodule name (with underscores), so
change the construction of project_web to use the original submodule identifier
(the variable submodule) instead of slug (or otherwise restore underscores),
leaving project_slug as-is; update the assignment of project_web so it
references submodule (or submodule.replace('-', '_') if you must start from
slug) to produce .../libs/<submodule>/doc/html/ and keep
project_name/project_slug logic unchanged.
---
Outside diff comments:
In `@scripts/auto/add_translation.py`:
- Around line 471-474: The two prints in the token-check exit path (the if not
token block in add_translation.py) are missing flush=True; update the two print
calls that output "[ERROR] API token required" and "[INFO] Provide via --token,
--web-config, or WEBLATE_TOKEN" to include flush=True so their output is flushed
before sys.exit(1), keeping behavior consistent with other error/info prints in
this script.
In `@scripts/auto/setup_project.py`:
- Around line 139-170: generate_component_slug currently returns unbounded slugs
which can exceed Weblate's 100-char Component.slug limit when combined with
project prefixes or dedup counters; add a constant MAX_COMPONENT_SLUG_LENGTH =
100 and enforce it in generate_component_slug by truncating the generated slug
similar to truncate_component_name(): if slug length exceeds
MAX_COMPONENT_SLUG_LENGTH, preserve the tail segment (rightmost characters) so
directory-unique parts remain, and ensure any appended dedup counters (used
elsewhere) will still fit by leaving room for a small suffix; update
generate_component_slug to apply this truncation before returning.
---
Duplicate comments:
In `@scripts/auto/add-or-update.sh`:
- Around line 17-22: The WEBLATE_URL extracted into the WEBLATE_URL variable is
not validated like TOKEN, so add the same presence check after WEBLATE_URL="$(jq
-r '.weblate_url' "${WEB_JSON}")": verify WEBLATE_URL is not empty and not
"null" (e.g., test [[ -z "${WEBLATE_URL}" || "${WEBLATE_URL}" == "null" ]]) and
if it fails print an error mentioning web.json and exit non‑zero; update the
error message to clearly state the missing/invalid weblate_url and reuse the
same exit behavior as the TOKEN check.
- Around line 43-54: The curl call in add-or-update.sh is using the wrong auth
scheme and is swallowing failures; change the header from "Authorization: Bearer
${TOKEN}" to "Authorization: Token ${TOKEN}" and remove the trailing "|| true"
so curl will return a non-zero exit status on failure (do not silence errors),
keeping the existing variables (WEBLATE_URL, TOKEN, ORGANIZATION, SUBMODULES,
LANG_CODE_JSON, VERSION, EXTENSIONS) as-is and matching the Token scheme used in
add_translation.py.
In `@scripts/auto/setup_project.py`:
- Around line 791-793: The current logic treats any non-zero return from
run_one_config as a failure; change it so that only hard failures set exit_code:
check the result from run_one_config and if result == 2 (graceful clone-skip per
run_one_config docstring) do nothing, otherwise if result != 0 set exit_code =
1; update the block referencing run_one_config and exit_code accordingly so
skipped configs don't mark the script as failed.
In `@weblate/boost_endpoint/serializers.py`:
- Around line 34-35: The serializer currently defines extensions =
serializers.ListField(child=serializers.CharField(allow_blank=True)), which
allows empty strings like [""] to pass; change the child CharField to disallow
blank entries (e.g., set allow_blank=False or remove the allow_blank option) or
add a non-empty validator so the ListField rejects "" values at the serializer
boundary (update the extensions ListField / child serializers.CharField
accordingly).
In `@weblate/boost_endpoint/services.py`:
- Around line 628-648: process_submodule currently assumes self.temp_dir is set
and will TypeError when self.temp_dir is None; add a lazy init/check at the
start of process_submodule to ensure self.temp_dir is a valid path (create a
temp dir if None) before calling os.path.join — for example, check self.temp_dir
and assign/create a directory (and set self.temp_dir) or raise a clear error;
update references to temp_submodule_dir (the os.path.join call) so it uses the
guaranteed directory and ensure os.makedirs(temp_submodule_dir, exist_ok=True)
still runs.
- Around line 684-696: The loop is iterating all project components via
project.component_set.all() causing unrelated components to be considered;
restrict iteration to only components created by this boost scan by filtering on
the computed prefix (the variable prefix) before checking wanted_slugs. Update
the loop source from project.component_set.all() to query only components whose
slug starts with prefix (e.g., filter by slug__startswith using prefix) and then
keep the existing check against wanted_slugs and component.is_glossary before
calling _delete_component_and_commit_removal; use
truncate_component_slug/wanted_slugs as-is so deletion logic still matches
truncated slugs.
- Around line 640-643: The temp_submodule_dir construction in the method that
creates submodule temp dirs (using temp_submodule_dir =
os.path.join(self.temp_dir, submodule) and os.makedirs(...)) is vulnerable to
path traversal; before creating the directory resolve and validate the path:
compute the absolute/real resolved path for both self.temp_dir and the candidate
temp_submodule_dir (e.g., via pathlib.Path(self.temp_dir).resolve() and
.resolve() on the joined path), ensure the resolved candidate startswith the
resolved temp_dir (or is a subpath), and reject or sanitize the input (raise an
exception) if it does not, then only call os.makedirs on the validated path.
Ensure you also handle absolute submodule values and symlinks by using resolve()
rather than simple normpath.
- Around line 355-393: The transaction block still encloses network/disk I/O:
move calls to _sync_component_for_translation out of the transaction.atomic()
scope to avoid holding DB locks during long I/O. Keep only the minimal DB writes
inside the atomic block (the get_or_create call, setting acting_user and
push_branch/save, and any post_create DB-only ops), set a local flag (e.g.,
need_sync = True) or capture (component, created) before exiting the atomic
block, then after the atomic block call
self._sync_component_for_translation(component, request, created=created);
likewise ensure add_language_to_component remains outside as already done.
Ensure component.post_create(user, origin="boost_endpoint") is only invoked if
it is safe inside the transaction or else defer it similarly.
In `@weblate/boost_endpoint/views.py`:
- Around line 29-32: The AddOrUpdateView currently uses permission_classes =
(IsAuthenticated,) which allows any logged-in user to perform privileged
operations; change this to a stricter permission such as permission_classes =
(IsAdminUser,) or a custom permission (e.g., IsBoostAdmin) that enforces
admin/owner rights; update imports to bring in
rest_framework.permissions.IsAdminUser or add/implement the custom permission
class and reference it in AddOrUpdateView so only authorized users can
create/modify projects and components.
- Around line 67-72: The synchronous call to service.process_all(...) in the
view risks long blocking, gateway timeouts and raw 500s; wrap the call in a
try/except to return a structured error Response (log the exception via
logger.exception and return a JSON error with status 500) and—preferably—offload
the long-running work to a Celery task (create a task like process_all_async in
tasks.py that invokes service.process_all with the same arguments and have the
view enqueue that task and return Response({}, status=202)); update the view to
use service.process_all (or the new task) and ensure any calls to
do_update/add_new_language are executed inside the task rather than the HTTP
request.
---
Nitpick comments:
In `@scripts/auto/add_translation.py`:
- Around line 272-274: The no-op conditional checking "if 'total' in
translation: pass" should be removed or turned into meaningful logging: locate
the block in add_translation.py (the function that processes the translation
dict where the variable translation is used) and either delete the entire if
"total" in translation: pass stanza, or replace it with a concise log statement
that reports translation.get("translated") and translation.get("total") (using
the existing logger or print mechanism in that function) so the branch is no
longer empty.
In `@scripts/auto/add-or-update.sh`:
- Around line 46-52: The current curl -d JSON payload in add-or-update.sh is
built with shell string interpolation and will break if ORGANIZATION, VERSION,
LANG_CODE (LANG_CODE_JSON) or other variables contain quotes/backslashes;
replace the literal -d "..." block with a jq-based construction: use jq -n with
--arg for ORGANIZATION and VERSION and --argjson for SUBMODULES and EXTENSIONS
(and include LANG_CODE if needed) to safely assemble the object, then pass its
output to curl (e.g., curl --data-binary `@-` ...); update the payload creation
that references ORGANIZATION, SUBMODULES, LANG_CODE_JSON, VERSION, and
EXTENSIONS to use jq so JSON is correctly escaped and robust.
In `@scripts/auto/collect_boost_libraries_extensions.py`:
- Line 29: Replace the old-style typing imports by removing "Dict, List,
Optional, Set, Tuple" from the from typing import line and update all usages to
use built-in generics and union syntax (dict, list, set, tuple, and X | None) —
search for occurrences of Dict, List, Optional, Set, Tuple in this module and
change e.g. Dict[K, V] -> dict[K, V], List[T] -> list[T], Set[T] -> set[T],
Tuple[...] -> tuple[...], and Optional[T] -> T | None; also remove the
now-unused names from the import to avoid linter errors.
In `@scripts/auto/create_component_and_add_translation.py`:
- Around line 87-88: Document that is_format_supported treats file_format ==
"auto" as intentionally unsupported and that main will skip and exit 0 when
detection is automatic; update the docstring for the is_format_supported
function (and/or the module-level docstring in
create_component_and_add_translation.py) to state this behavior and rationale,
and add/adjust a unit test to assert is_format_supported("auto") returns False
and that main emits the "Skipping" warning and exits without error when
file_format is "auto".
- Around line 64-69: Replace the hardcoded default_formats set with a runtime
load from the weblate_supported_formats.json file: at module import attempt to
open and parse that JSON and build default_formats from its file_format values
(falling back to the existing hardcoded set only if the load/parse fails), and
add a warning log when falling back; keep the symbol name default_formats so
callers are unchanged. Alternatively, instead of changing runtime behavior, add
a unit test that loads weblate_supported_formats.json and asserts its
file_format keys exactly match the hardcoded default_formats to catch drift in
CI.
- Around line 382-393: The current branch in
create_component_and_add_translation.py that handles unsupported formats (when
is_format_supported(config, supported_formats) is False) returns 0 which hides
skipped runs from callers; change this to return a distinct non-zero exit code
(e.g., return 2) to indicate "skipped/unsupported format" so CI and callers can
distinguish success from skip, or alternatively add a CLI flag
(--allow-unsupported) that, when set, preserves the existing return 0 while
default behavior returns 2; update the block that references config,
comp.get("file_format") and prints formats_path to use the new exit behavior and
document the choice with a brief comment.
In `@scripts/auto/create_component.py`:
- Line 645: The call to creator.get_statistics(project_slug, component_slug)
currently discards its return value causing an unnecessary API request; either
remove that call from create_component.py or capture its return and surface
useful information (e.g., store stats = creator.get_statistics(project_slug,
component_slug) and print or log summary fields) so the network call has a
purpose; update references to project_slug and component_slug accordingly and
ensure any printed output is concise and user-friendly.
- Around line 496-506: The _prompt function currently retries required-field
input by recursively calling itself which can hit Python's recursion limit;
replace the recursive retry in _prompt with an iterative loop: keep prompting
inside a while True loop (using the same input prompts and default handling) and
break/return only when a non-empty value is provided (or default is accepted),
preserving the existing message/default/required semantics and references to
_prompt, message, default, and required.
- Around line 746-751: validate_config currently only checks weblate_url and
api_token but not the 'project' and 'component' keys used later in
setup_project_and_component; update validate_config to also verify
config.get('project') and config.get('component') are present and non-empty,
print a clear error message including which key is missing (e.g., "[ERROR]
Missing 'project' in config"), and return False on missing values so main() does
not proceed and trigger a KeyError in setup_project_and_component.
In `@scripts/auto/generate_boost_project_configs.py`:
- Around line 115-124: The current extensions_to_fullname function produces
repetitive " and " for 3+ items; update it to produce comma-separated names with
a single "and" before the last item (Oxford comma style) by building the
deduplicated list using EXTENSION_FULLNAMES (as already done), then: if the list
has 0 items return empty string, if 1 return that name, if 2 return "A and B",
otherwise join all but the last with ", " and append ", and " + last; keep the
existing deduplication logic using seen and names.
- Line 174: The hardcoded "language_regex" value "^zh_Hans$" and the associated
"languages" list should be made configurable via a CLI option; add a new
argument (e.g., --language or --languages) to the script's argument parser
(where args like --branch are defined), default it to "zh_Hans", and use that
value to construct the "language_regex" (e.g., wrap the provided language in
anchors or accept a raw regex flag) and to populate the "languages" list before
building the config dict that contains "language_regex" and "languages"; update
any references that currently expect the fixed values to read from the parsed
argument instead.
- Line 139: The current assignment to repo uses a global substring replacement
repo = https_to_git_ssh(repo_url).replace("boostorg", "CppDigest") which can
corrupt repository names that contain "boostorg" elsewhere; update the logic in
the code that computes repo (after calling https_to_git_ssh) to only replace the
organization path segment (e.g., replace the "boostorg/" path component or the
"git@github.com:boostorg/" prefix) rather than doing a global .replace; use a
targeted string or regex substitution that matches the SSH URL structure
produced by https_to_git_ssh (for example match "git@github.com:boostorg/" or
"/boostorg/" and swap that segment to "CppDigest/") so only the org name is
changed and other occurrences remain untouched.
In `@scripts/auto/setup_project.py`:
- Around line 727-732: The project_slug parameter on save_component_config is
dead because callers (e.g., the call passing project_slug=None) already have
project_slug embedded into component_slug by generate_component_config; remove
the unused parameter from save_component_config (and its signature, docstring,
and any internal references), update all callers to pass only (component_config,
output_dir, component_slug) and document in save_component_config that the
caller must provide a pre-composed component_slug that includes the project
prefix; alternatively, if you prefer the other design, instead pass the real
project_slug from the caller here and remove prefixing in
generate_component_config—pick one approach and make consistent changes to
generate_component_config and all call sites so the parameter is actually used
or eliminated.
In `@scripts/auto/upload_translations.py`:
- Around line 177-181: The multi-line print call that starts with print( in
upload_translations.py has its continuation lines dedented, which is visually
misleading; re-indent the continuation lines (the f-string lines and
flush=True,) to align under the opening parenthesis of the print call so they
are at least in-column with print( — i.e., move the lines beginning with f" ✗
Failed..." and f"Not found=..." and the flush=True, argument to the same
indentation level as the first line after the opening paren for the print call
to match typical formatter/PEP8 expectations.
In `@weblate/urls.py`:
- Around line 921-922: The boost endpoint is currently always registered with
path("boost-endpoint/", include("weblate.boost_endpoint.urls")), which can cause
unwanted exposure and name collisions for the generic "info" route; either wrap
that path registration in a conditional guard checking if
"weblate.boost_endpoint" is present in settings.INSTALLED_APPS (i.e., only call
path(...) when "weblate.boost_endpoint" in settings.INSTALLED_APPS) or change
the include to pass the app tuple and a namespace
(include(("weblate.boost_endpoint.urls", "weblate.boost_endpoint"),
namespace="boost-endpoint")) so URL names in weblate.boost_endpoint.urls are
isolated—apply one of these changes where path("boost-endpoint/", include(...))
is declared.
@coderabbitai full review
✅ Actions performed
Full review triggered.
Actionable comments posted: 6
♻️ Duplicate comments (12)
.gitignore (1)
57-57: Global ignore pattern for `gettextization.failed.po`, previously confirmed intentional.
This unanchored entry suppresses `gettextization.failed.po` at any depth in the repository, which was flagged in a prior review and explicitly addressed/confirmed by the team.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore at line 57, The unanchored .gitignore entry "gettextization.failed.po" currently ignores that file at any depth; to make the intent explicit either anchor it to the repo root by replacing the line with "/gettextization.failed.po" (so only root-level files are ignored) or keep the global pattern but add a nearby comment clarifying that the global ignore is intentional (e.g., "# intentionally ignore gettextization.failed.po at any depth - confirmed by team"); update the .gitignore entry accordingly.
scripts/auto/setup_project.py (1)
791-796: Skipped configs (return code 2) are still treated as failures in batch mode.
`run_one_config` returns `2` for gracefully skipped clones (line 574), yet line 795 treats any non-zero result as a failure. This contradicts the intent of `skip_on_clone_failure=True`.
Proposed fix
 result = run_one_config(args, config_path_str, skip_on_clone_failure=True)
-if result != 0:
+if result == 1:
     exit_code = 1
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 791 - 796, The loop treating any non-zero return from run_one_config as failure is incorrect because run_one_config can return 2 to indicate a graceful skip when skip_on_clone_failure=True; update the check in the loop that iterates over config_files to only set exit_code = 1 when result != 0 and result != 2 (or explicitly treat result == 2 as a skipped config), leaving exit_code unchanged on skips so batch mode does not mark skipped clones as failures; reference run_one_config and the loop variables i/config_path_str to locate the offending logic to adjust.
scripts/auto/add-or-update.sh (2)
43-44: ⚠️ Potential issue | 🔴 Critical
`Authorization: Bearer` will be rejected: Weblate uses the `Token` scheme.
DRF's `TokenAuthentication` (used by every other script in this repo, e.g., `add_translation.py` line 91 and `delete_all_components.py` line 52) expects `Authorization: Token <key>`. Sending `Bearer` results in a 401/403 that is silently swallowed by `|| true`.
🐛 Proposed fix
- -H "Authorization: Bearer ${TOKEN}" \
+ -H "Authorization: Token ${TOKEN}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 43 - 44, The curl call in add-or-update.sh uses the wrong auth scheme and may hide failures; change the header from "Authorization: Bearer ${TOKEN}" to "Authorization: Token ${TOKEN}" so it matches DRF TokenAuthentication (like other scripts), and ensure the curl failure isn't silently ignored (remove or avoid the trailing "|| true" around the curl invocation) so non-2xx responses return a non-zero exit status; look for the curl invocation that references WEBLATE_URL and TOKEN to apply these changes.
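To make the expected scheme concrete, a small request sketch against a DRF token-authenticated endpoint; the URL and token values are placeholders:

```python
import requests

WEBLATE_URL = "https://weblate.example.org"  # placeholder
TOKEN = "xxxxxxxx"                            # placeholder API key

# DRF TokenAuthentication: the scheme word must be "Token", not "Bearer".
resp = requests.post(
    f"{WEBLATE_URL}/boost-endpoint/add-or-update/",
    headers={"Authorization": f"Token {TOKEN}"},
    json={"organization": "boostorg", "submodules": ["json"]},
    timeout=30,
)
resp.raise_for_status()  # surface 401/403 instead of swallowing them
```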
17-22: ⚠️ Potential issue | 🟡 Minor
`WEBLATE_URL` is not validated after extraction.
`TOKEN` is checked for empty/`"null"` but `WEBLATE_URL` is not, so a missing or `null` `weblate_url` in `web.json` silently passes the guard and produces an invalid URL in the `curl` call.
🛡️ Proposed fix
 WEBLATE_URL="$(jq -r '.weblate_url' "${WEB_JSON}")"
 TOKEN="$(jq -r '.api_token' "${WEB_JSON}")"
+if [[ -z "${WEBLATE_URL}" || "${WEBLATE_URL}" == "null" ]]; then
+  echo "Error: weblate_url not set in ${WEB_JSON}." >&2
+  exit 1
+fi
 if [[ -z "${TOKEN}" || "${TOKEN}" == "null" ]]; then
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 17 - 22, The script extracts WEBLATE_URL from WEB_JSON but never validates it, so a missing or "null" weblate_url will produce a bad curl URL; add the same validation applied to TOKEN: after WEBLATE_URL="$(jq -r '.weblate_url' "${WEB_JSON}")" check if WEBLATE_URL is empty or equals "null" (e.g. [[ -z "${WEBLATE_URL}" || "${WEBLATE_URL}" == "null" ]]), print an error to stderr referencing WEB_JSON and exit 1, ensuring subsequent uses (the curl call) always have a valid URL.
scripts/auto/create_component.py (1)
53-54: `urljoin` with an absolute path still drops the base URL's path prefix.
This issue was flagged in a previous review and remains unresolved. `urljoin(self.base_url, "/api/")` replaces the entire path component of `base_url`: a deployment at `http://host/weblate` would resolve to `http://host/api/` instead of `http://host/weblate/api/`.
🐛 Proposed fix
- self.api_url = urljoin(self.base_url, "/api/")
+ self.api_url = self.base_url + "/api/"
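A quick interpreter check of why the absolute-path form loses the prefix; this is documented standard-library behaviour and easy to verify:

```python
from urllib.parse import urljoin

base = "http://host/weblate"
print(urljoin(base, "/api/"))      # -> http://host/api/   (prefix dropped)
print(base.rstrip("/") + "/api/")  # -> http://host/weblate/api/
```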
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` around lines 53 - 54, The current use of urljoin(self.base_url, "/api/") discards any path prefix in self.base_url (e.g. "http://host/weblate" becomes "http://host/api/"); change the construction of self.api_url to preserve base path by avoiding a leading slash and ensuring exactly one separator — e.g. compute self.api_url by concatenating a normalized base (self.base_url.rstrip("/")) with "/api/" or call urljoin(self.base_url.rstrip("/") + "/", "api/") so that self.base_url and its path prefix are retained when creating self.api_url.
weblate/boost_endpoint/services.py (4)
628-643: Two previously reported critical/major issues remain unresolved in `process_submodule`.
`TypeError` when called directly (past critical): `self.temp_dir` is `None` by default; `os.path.join(None, submodule)` at line 642 raises `TypeError` if `process_submodule` is called outside `process_all`. A lazy-init guard is still missing.
Path traversal via untrusted `submodule` name (past major): `submodule` originates from external API input and is used directly in `os.path.join(self.temp_dir, submodule)` at line 642. A name like `"../../../etc"` escapes `temp_dir`. No bounds check is present.
🛡️ Proposed combined fix
 def process_submodule(self, submodule: str, user=None, request=None) -> dict[str, Any]:
     """Process a single submodule: clone, scan, create/update components."""
+    if self.temp_dir is None:
+        self.temp_dir = tempfile.mkdtemp(prefix="boost_endpoint_")
+
     result = {
         "submodule": submodule,
         "success": False,
         "components_created": 0,
         "components_updated": 0,
         "components_deleted": 0,
         "errors": [],
     }
     # Create temp directory for this submodule
     temp_submodule_dir = os.path.join(self.temp_dir, submodule)
+    # Guard against path traversal
+    if not os.path.realpath(temp_submodule_dir).startswith(
+        os.path.realpath(self.temp_dir) + os.sep
+    ):
+        result["errors"].append(f"Invalid submodule name: {submodule!r}")
+        return result
     os.makedirs(temp_submodule_dir, exist_ok=True)
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 628 - 643, process_submodule is vulnerable because self.temp_dir can be None and submodule is used unchecked in os.path.join, allowing TypeError or path traversal; fix by lazily initializing self.temp_dir if None (e.g., set to a secure temp dir) before building temp_submodule_dir, sanitize/normalize the incoming submodule (e.g., reject path separators or use os.path.basename) and construct the candidate path with os.path.abspath(os.path.join(self.temp_dir, sanitized_submodule)); then verify the result is inside self.temp_dir (starts with the abs temp_dir path) before calling os.makedirs; update the flow around temp_submodule_dir, self.temp_dir, and os.path.join to perform these checks in process_submodule.
359-393: Network I/O (`do_update`, `create_translations_immediate`) still runs inside `transaction.atomic()`, holding DB locks.
`_sync_component_for_translation` (called at lines 375 and 388) performs git operations and long-running DB reads while the `transaction.atomic()` block is open. Although `time.sleep(SYNC_SETTLE_SECONDS)` was removed (good), the git I/O itself can take tens of seconds for large repos, keeping the connection/lock open and risking connection pool exhaustion.
The `add_language_to_component` call at lines 390-391 is already correctly placed outside the atomic block. The sync calls at lines 375 and 388 should move there as well.
🐛 Proposed restructure
 with transaction.atomic():
     component, created = Component.objects.get_or_create(...)
     if user:
         component.acting_user = user
     if created:
         LOGGER.info("Created component: %s", component.name)
         if user:
             component.post_create(user, origin="boost_endpoint")
-        self._sync_component_for_translation(component, request, created=True)
     else:
         LOGGER.info("Component exists: %s", component.name)
         update_fields = []
         if component.push_branch != push_branch:
             component.push_branch = push_branch
             update_fields.append("push_branch")
         if update_fields:
             component.save(update_fields=update_fields)
-        self._sync_component_for_translation(component, request, created=False)
 # Add language when lang_code is set; skip when None.
+self._sync_component_for_translation(component, request, created=created)
 if self.lang_code is not None:
     self.add_language_to_component(component, request)
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 359 - 393, The component synchronization (_sync_component_for_translation) is doing long-running git/IO while inside transaction.atomic(), holding DB locks; move both calls to _sync_component_for_translation (the one executed when created and the one executed when existing) to after the transaction.atomic() block—just like add_language_to_component is already outside—so the DB create/get and any component.save/update happen inside the atomic block but the long-running _sync_component_for_translation calls run after the transaction completes; ensure component.post_create(user, origin="boost_endpoint") still runs inside the atomic if needed, but call _sync_component_for_translation(component, request, created=...) and any related network/git operations only after leaving the atomic context.
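A generic Django shape for that restructure, using `django.db.transaction.on_commit`; the model class and the slow `sync` callable are stand-ins injected by the caller, not the service's actual API:

```python
from django.db import transaction

def upsert_and_sync(model, lookup: dict, user, sync) -> None:
    """Minimal sketch: DB writes inside atomic, heavy I/O after commit.

    `model` is a Django model class (Component in this context) and
    `sync` is the long-running git/network callable.
    """
    with transaction.atomic():
        obj, created = model.objects.get_or_create(**lookup)
        if user:
            obj.acting_user = user
        # Defer the git sync until the transaction has committed,
        # so no DB lock is held during network/disk I/O.
        transaction.on_commit(lambda: sync(obj, created))
```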
682-696: Component deletion scope is still too broad: an unfiltered `project.component_set.all()` deletes any non-glossary component not in the current scan.
`prefix` is computed at line 684 but never used to filter the queryset. Any manually created component, or a component from a different submodule accidentally sharing the same project, will be deleted if its slug is not in `wanted_slugs`.
🐛 Proposed fix
 prefix = f"boost-{_submodule_slug(submodule)}-documentation-"
 wanted_slugs = {
     truncate_component_slug(f"{prefix}{c['component_slug']}") for c in configs
 }
-for component in project.component_set.all():
+for component in project.component_set.filter(slug__startswith=prefix):
     if component.slug not in wanted_slugs and not component.is_glossary:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 682 - 696, The deletion loop is iterating over project.component_set.all(), which risks deleting unrelated components; restrict the queryset to only components matching the computed prefix so only boost submodule components are considered. Change the iteration to filter project.component_set by slug starting with prefix (e.g., use slug__startswith or check component.slug.startswith(prefix) on a filtered queryset) and still skip component.is_glossary; keep the same delete call _delete_component_and_commit_removal and the existing error handling/result updates.
379-385: ⚠️ Potential issue | 🟡 Minor
The code comment promises `branch = "local"` enforcement but the check is absent.
The inline comment reads "Ensure branch is 'local' (avoid 'fatal: no such branch: 'master'' when remote has no master/main)" yet the `update_fields` block only conditionally updates `push_branch`. If an existing component was manually set to `branch = "master"`, it will not be corrected, causing git pull failures.
🐛 Proposed fix
 update_fields = []
+if component.branch != "local":
+    component.branch = "local"
+    update_fields.append("branch")
 if component.push_branch != push_branch:
     component.push_branch = push_branch
     update_fields.append("push_branch")
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 379 - 385, The comment says we must enforce branch = "local" but only push_branch is being updated; modify the logic around component to also check component.branch (the attribute named branch) and if it's not "local" set component.branch = "local" and append "branch" to update_fields so component.save(update_fields=update_fields) persists both changes; keep the existing push_branch check (component.push_branch != push_branch) and ensure update_fields contains all changed field names before calling component.save.
weblate/boost_endpoint/views.py (2)
29-32: `IsAuthenticated` still allows any logged-in user to create/update projects and components.
This was flagged in a previous review and remains unresolved. Any authenticated user can trigger repository cloning and Weblate project/component creation: operations that should require admin or manager-level privileges.
🔒 Proposed fix
🔒 Proposed fix
-from rest_framework.permissions import IsAuthenticated +from rest_framework.permissions import IsAdminUser class AddOrUpdateView(APIView): - permission_classes = (IsAuthenticated,) + permission_classes = (IsAdminUser,)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/views.py` around lines 29 - 32, AddOrUpdateView currently uses permission_classes = (IsAuthenticated,) which permits any logged-in user to perform sensitive operations; change this to enforce admin/manager-level access by replacing or augmenting the permission with a stricter permission (e.g., use IsAdminUser or a custom permission like IsManagerOrAdmin) and apply it to AddOrUpdateView (and any other views that perform cloning/project/component creation). Ensure the new permission class is imported and, if creating a custom permission, implement it (e.g., class IsManagerOrAdmin) to check user.is_staff or the relevant manager flag, then set permission_classes = (IsManagerOrAdmin,) on AddOrUpdateView so only admins/managers can execute these actions.
67-72: Synchronousprocess_all()on the request thread — no error handling.This was flagged in a previous review and remains unresolved.
service.process_all()clones repositories, scans files, and creates/updates components; running it synchronously risks HTTP/proxy timeouts and exposes any unhandled exception as a raw 500.🛡️ Minimum fix: wrap in try/except
- results = service.process_all( - submodules, user=request.user, request=request - ) - - return Response(results, status=status.HTTP_200_OK) + try: + results = service.process_all( + submodules, user=request.user, request=request + ) + except Exception as exc: + return Response( + {"error": str(exc)}, + status=status.HTTP_500_INTERNAL_SERVER_ERROR, + ) + + return Response(results, status=status.HTTP_200_OK)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/views.py` around lines 67 - 72, Wrap the synchronous call to service.process_all(...) in a try/except block inside the view: call service.process_all(submodules, user=request.user, request=request) in try, return Response(results, status=HTTP_200_OK) on success, and in except Exception as e log the exception (use logger.exception or similar) and return a safe error Response (e.g., {"detail":"processing error"} with status HTTP_500_INTERNAL_SERVER_ERROR) so unhandled exceptions don't propagate as raw 500s; keep references to service.process_all, request.user, and Response to locate the code.weblate/boost_endpoint/serializers.py (1)
34-43:allow_blank=Trueonextensionschild still permits empty-string entries.This was flagged in a previous review and remains unresolved.
[""]passes validation and would produce unexpected behaviour in the file-scanning logic.🛡️ Proposed fix
extensions = serializers.ListField( - child=serializers.CharField(allow_blank=True), + child=serializers.CharField(), required=False,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/serializers.py` around lines 34 - 43, The ListField "extensions" currently uses child=serializers.CharField(allow_blank=True) which permits empty-string entries like [""] — change the child to disallow empty/blank values (set allow_blank=False) and add a small validator on the child that rejects empty or whitespace-only strings (and optionally enforces a minimal format such as starting with '.' if desired) so invalid entries are rejected during serializer validation; refer to the "extensions" ListField and its child CharField in serializers.py to implement this fix.
🧹 Nitpick comments (11)
scripts/auto/generate_boost_project_configs.py (1)
115-124: Consider a comma-separated join for 3+ extension names.When
ext_listhas three or more distinct names," and ".join(names)produces awkward output like"HTML and XML and Markdown". A conventional English list would be"HTML, XML, and Markdown".✏️ Suggested improvement
- return " and ".join(names) + if len(names) <= 2: + return " and ".join(names) + return ", ".join(names[:-1]) + ", and " + names[-1]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` around lines 115 - 124, The function extensions_to_fullname should emit a conventional English list (use commas and an Oxford comma for 3+ items) instead of always joining with " and "; update extensions_to_fullname to build names from EXTENSION_FULLNAMES as it does now, then: return the single name if names has length 1, return " and ".join(names) for length 2, and for length >=3 return ", ".join(names[:-1]) + ", and " + names[-1]; keep the existing uniqueness logic over ext_list and seen.scripts/auto/setup_project.py (1)
86-91: Substring-based exclude matching may over-exclude directories.
pattern in rel_rootis a plain substring check, so a pattern like"test"would also exclude directories named"testing","latest", or"contest". Consider usingfnmatchor matching against individual path components instead of the whole relative path string.Proposed fix (match per path component)
- rel_root = os.path.relpath(root, directory) - skip = False - for pattern in exclude_patterns: - if pattern in rel_root or rel_root.startswith(pattern): - skip = True - break + rel_root = os.path.relpath(root, directory) + rel_parts = Path(rel_root).parts + skip = any(pattern in rel_parts for pattern in exclude_patterns)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 86 - 91, The current exclusion loop uses a substring check ("pattern in rel_root") which over-matches; change the logic in the loop that iterates exclude_patterns to either use fnmatch.fnmatchcase()/fnmatch.fnmatch on rel_root or split rel_root into path components (e.g., rel_root.split(os.sep)) and test equality/startswith against each component so patterns only match whole path segments; update the loop that sets skip (the variables rel_root, exclude_patterns, pattern, and skip) to use one of these approaches and ensure tests for prefix matching use component-based startswith or proper glob matching.scripts/auto/upload_translations.py (1)
177-182: Indentation misalignment inside theprint()call.
print(is at column 28 but its arguments and closing)are at columns 20 and 16, making the closing paren visually align with the enclosingwith open(...)block rather than with theprint(that owns it.♻️ Realign continuation lines
else: - print( - f" ✗ Failed: Accepted={accepted}, Skipped={skipped}, " - f"Not found={not_found}", - flush=True, - ) + print( + f" ✗ Failed: Accepted={accepted}, Skipped={skipped}, " + f"Not found={not_found}", + flush=True, + ) return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/upload_translations.py` around lines 177 - 182, The print() continuation lines inside upload_translations.py are misaligned: the opening "print(" is indented differently than its argument lines and the closing ")"; realign the multi-line print so its arguments and the closing parenthesis line up under the print( token (i.e., make the f-string lines and the closing parenthesis use the same indentation column as "print(") and ensure the subsequent "return False" keeps its original block-level indentation; look for the print block around the failing-reporting logic (the f" ✗ Failed: Accepted={accepted}, Skipped={skipped}, f"Not found={not_found}" lines) and adjust spacing so the continuation lines visually belong to the print call.weblate/boost_endpoint/urls.py (1)
9-12: Consider addingapp_namefor URL namespace isolation.Without a namespace, the names
"info"and"add-or-update"are global and could silently collide with future URL names elsewhere in the project.♻️ Add `app_name` to the URL module
from weblate.boost_endpoint.views import AddOrUpdateView, BoostEndpointInfo +app_name = "boost_endpoint" + urlpatterns = [ path("", BoostEndpointInfo.as_view(), name="info"), path("add-or-update/", AddOrUpdateView.as_view(), name="add-or-update"), ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/urls.py` around lines 9 - 12, Add an application namespace by declaring an app_name (e.g., app_name = "boost_endpoint") at the top of this urls module so the URL names for BoostEndpointInfo.as_view() ("info") and AddOrUpdateView.as_view() ("add-or-update") are not global; update any reverse()/url references elsewhere to use the new namespace (e.g., "boost_endpoint:info" or "boost_endpoint:add-or-update").scripts/auto/add_translation.py (1)
273-274: Remove no-opifblock (dead code).The
passwith a comment stating the logic is "optional" contributes nothing. Remove the block; the comment alone in the function body (if needed at all) is sufficient.♻️ Remove dead code
- if "total" in translation: - pass # Optional: log translation.get("translated")/translation.get("total")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add_translation.py` around lines 273 - 274, Remove the no-op if block in add_translation.py that checks "if \"total\" in translation:" and only contains a pass; delete the entire if block (you can keep or move the comment elsewhere if you want a note) so the function no longer contains dead code related to the translation["total"] check in that spot (search for the if "total" in translation: statement to locate it).scripts/auto/add-or-update.sh (1)
40-54: Fire-and-forget via--max-time 5provides no delivery guarantee.
AddOrUpdateView.post()is synchronous — it processes the request and returns a response. If processing exceeds 5 s, the client disconnects before seeing any result (error or success), and the 401/validation failure from theBearerissue above is also invisible. Consider either:
- Waiting for an initial "accepted" HTTP 2xx before timing out (using a longer or no
--max-time), or- Moving the long-running work to a background task and returning 202 Accepted immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 40 - 54, The script currently uses curl --max-time 5 which can cut off the synchronous AddOrUpdateView.post() before it returns, hiding auth/validation errors; either remove/extend --max-time and let curl wait for the controller response so the caller sees success/failure, or refactor AddOrUpdateView.post() to enqueue the long-running work to a background worker (e.g., Celery/RQ/async task) and immediately return an HTTP 202 Accepted; update the script to POST and check the 2xx/202 response (or remove the forced timeout) and update AddOrUpdateView.post() to call the enqueue function (unique symbol: AddOrUpdateView.post) and return 202 if you choose the background approach.weblate/boost_endpoint/services.py (1)
weblate/boost_endpoint/services.py (1)

62-73: `truncate_component_name` and `truncate_component_slug` ignore their `max_len` parameter in the truncation formula.

Both functions use the `max_len` parameter only for the initial `len(name) <= max_len` guard, but then fall back to hardcoded constants (`TRUNCATE_NAME_HEAD`, `TRUNCATE_NAME_TAIL`, etc.) for the actual truncation. If either is called with a `max_len` smaller than the constant sum (e.g., `truncate_component_slug("...", max_len=50)` returns 100 chars), the result silently violates the contract.

In practice, both are always invoked without `max_len` in this file, so there is no active bug — but the misleading signature is a maintenance hazard.

♻️ Proposed fix (slug shown, apply same pattern to name)

```diff
 def truncate_component_slug(slug: str, max_len: int = MAX_COMPONENT_SLUG_LENGTH) -> str:
     """Truncate component slug to max_len. If over limit: first 64 + '-' + last 35."""
     if len(slug) <= max_len:
         return slug
-    return slug[:TRUNCATE_SLUG_HEAD] + TRUNCATE_SLUG_SEP + slug[-TRUNCATE_SLUG_TAIL:]
+    # Keep head + sep + tail within max_len
+    tail_len = min(TRUNCATE_SLUG_TAIL, max_len - TRUNCATE_SLUG_HEAD - len(TRUNCATE_SLUG_SEP))
+    if tail_len <= 0:
+        return slug[:max_len]
+    head_len = max_len - len(TRUNCATE_SLUG_SEP) - tail_len
+    return slug[:head_len] + TRUNCATE_SLUG_SEP + slug[-tail_len:]
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 62 - 73, The truncate functions (truncate_component_name and truncate_component_slug) currently ignore the max_len when computing head/tail slices and always use hardcoded TRUNCATE_* constants; update both functions to compute head_len and tail_len from the passed max_len minus the separator length (TRUNCATE_NAME_SEP / TRUNCATE_SLUG_SEP) so the returned string length never exceeds max_len, ensuring head_len and tail_len are non-negative (floor to 0) and, if needed, favor the head when distributing odd remainders; keep using TRUNCATE_*_SEP and existing constants only as defaults for their original values but derive slice sizes from max_len inside each function (truncate_component_name, truncate_component_slug).
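For the name variant, the same pattern applies. A sketch, with the individual `TRUNCATE_NAME_*` values assumed (only their sum, 94, is stated elsewhere in this review):

```python
# Sketch of the name-side fix; the constant values below are assumptions
# standing in for the real ones in services.py (their sum is 94).
TRUNCATE_NAME_HEAD = 64
TRUNCATE_NAME_SEP = "-"
TRUNCATE_NAME_TAIL = 29
MAX_COMPONENT_NAME_LENGTH = 94


def truncate_component_name(name: str, max_len: int = MAX_COMPONENT_NAME_LENGTH) -> str:
    if len(name) <= max_len:
        return name
    tail_len = min(TRUNCATE_NAME_TAIL, max_len - TRUNCATE_NAME_HEAD - len(TRUNCATE_NAME_SEP))
    if tail_len <= 0:
        return name[:max_len]
    head_len = max_len - len(TRUNCATE_NAME_SEP) - tail_len
    return name[:head_len] + TRUNCATE_NAME_SEP + name[-tail_len:]
```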
scripts/auto/create_component.py (1)

645-645: `get_statistics` result is discarded — unnecessary API call.

`_print_translation_info` calls `creator.get_statistics(...)` but does not use the return value. This makes a live API round-trip with no observable effect.

♻️ Proposed fix

```diff
-    creator.get_statistics(project_slug, component_slug)
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` at line 645, In _print_translation_info remove the unused API call to creator.get_statistics(project_slug, component_slug) (or if statistics are intended to be shown, capture its return into a variable and use it) so we don't perform a live round-trip whose result is discarded; update _print_translation_info to either delete the creator.get_statistics(...) invocation or assign its return (e.g., stats = creator.get_statistics(...)) and use stats in the output.
scripts/auto/collect_boost_libraries_extensions.py (3)

252-310: Side-effect in `parse_args`: the `load_dotenv_script_dir()` call is non-obvious.

Calling `load_dotenv_script_dir()` inside `parse_args` (line 254) is done so `GITHUB_TOKEN` is available for the `--token` default (line 283). This coupling is implicit — a caller reading `parse_args` wouldn't expect environment mutation. Consider moving the `load_dotenv_script_dir()` call to `main()` before `parse_args()` for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 252 - 310, The parse_args function currently calls load_dotenv_script_dir(), causing an implicit environment mutation; move that call out of parse_args and into the program initialization (e.g., at the top of main before calling parse_args) so parse_args only handles argument parsing. Update calls to parse_args() to ensure load_dotenv_script_dir() runs first, leaving parse_args, its default token arg (os.environ.get("GITHUB_TOKEN")), and behavior unchanged otherwise.
386-397: Prefer tuple unpacking over index access for readability.

Lines 388-395 use `row[0]`, `row[1]`, `row[2]`, `row[3]` instead of tuple unpacking, which is inconsistent with the rest of the script (e.g., line 364 uses `for i, (first_col, repo_url, subpath) in ...`).

Proposed fix

```diff
     with open(out_path, "w", encoding="utf-8") as f:
         for row in all_libraries:
-            first_col, repo_url, subpath = row[0], row[1], row[2]
+            first_col, repo_url, subpath, ext_str = row
             subpath_display = format_subpath_display(subpath)
             line = (
                 f"{quoted(first_col)}, {quoted(repo_url)}, "
                 f"{quoted(ref)}, {quoted(subpath_display)}"
             )
             if include_extensions:
-                line += f", {quoted(row[3])}"
+                line += f", {quoted(ext_str)}"
             line += "\n"
             f.write(line)
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 386 - 397, Replace index-based access to the row tuple with tuple unpacking when writing out all_libraries: in the with-open block using out_path iterate rows as e.g. for first_col, repo_url, subpath, *rest in all_libraries (or for first_col, repo_url, subpath, ext in ...) and use those local names with format_subpath_display and quoted; when include_extensions is true, reference the unpacked extension variable instead of row[3]; keep use of ref, format_subpath_display, quoted, and include_extensions intact so the output line construction and f.write remain unchanged.
411-413: `str.replace` for deriving the submodule output path is fragile.

`out_path.replace(".txt", "_submodules.txt")` replaces every occurrence of `.txt` anywhere in the path (e.g., `output.txt.bak` → `output_submodules.txt.bak`). Using `pathlib` or `os.path.splitext` would be more robust.

Proposed fix

```diff
+    import pathlib
-    submodule_out_path = out_path.replace(".txt", "_submodules.txt")
-    if submodule_out_path == out_path:
-        submodule_out_path = out_path + "_submodules"
+    p = pathlib.Path(out_path)
+    submodule_out_path = str(p.with_stem(p.stem + "_submodules"))
```

Or alternatively with `os.path.splitext`:

```python
base, ext = os.path.splitext(out_path)
submodule_out_path = base + "_submodules" + (ext or ".txt")
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 411 - 413, The current derivation of submodule_out_path using out_path.replace(".txt", "_submodules.txt") is fragile; change it to use a path-splitting approach (pathlib.Path or os.path.splitext) to reliably append "_submodules" before the file extension of out_path and preserve a default ".txt" when no extension exists; update the logic around the submodule_out_path and out_path variables (the lines that set submodule_out_path and the fallback that checks equality) so the new computation produces base + "_submodules" + extension and removes the brittle .replace behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/auto/add-or-update.sh`:
- Around line 46-52: The JSON payload currently interpolates shell vars directly
(ORGANIZATION, VERSION, LANG_CODE/ LANG_CODE_JSON, SUBMODULES, EXTENSIONS) into
a double-quoted string which can produce malformed JSON if values contain quotes
or backslashes; replace the inline string construction with a safe jq -n builder
that takes these shell variables as inputs (e.g., pass ORGANIZATION, VERSION,
SUBMODULES, EXTENSIONS and LANG_CODE as jq variables) and constructs the object
(.organization = $org | .submodules = $submodules | .version = $version |
.extensions = $extensions), and use an if $lang != "" then .lang_code = $lang
else . end expression to conditionally include lang_code when LANG_CODE is
non-empty so the HTTP body is always valid JSON.
In `@scripts/auto/collect_boost_libraries_extensions.py`:
- Around line 96-102: The fetch_url function lacks retry/backoff handling for
GitHub rate-limit responses (429/403), which causes unhandled HTTPError in
fetch_gitmodules and missing data in get_doc_extensions; update fetch_url to
catch HTTPError/URLError, detect 429 or 403 responses and read Retry-After and
X-RateLimit-Reset headers, then perform a small exponential backoff loop (e.g.,
3-5 retries) using the header values when present (sleep until Retry-After or
computed from X-RateLimit-Reset, otherwise use increasing delays), and only
rethrow the error after retries are exhausted so callers like fetch_gitmodules
and get_doc_extensions won't crash or silently miss data.
- Around line 69-78: The parsing currently skips environment entries with empty
string values because of the `if key and value` check; adjust the logic so empty
strings are preserved by only checking the key existence before setting the env
var. In the block that processes the regex match (variables m, key, value)
replace `if key and value:` with `if key:` and then call
`os.environ.setdefault(key, value)` so entries like KEY="" result in an
environment variable set to the empty string.
In `@scripts/auto/generate_boost_project_configs.py`:
- Line 139: The code currently does a global string replace on repo =
https_to_git_ssh(repo_url).replace("boostorg", "CppDigest") which can alter the
repository name if it contains "boostorg"; instead parse the SSH/HTTP repo URL
returned by https_to_git_ssh(repo_url), isolate the owner segment (the path
component before the repo name), replace only that owner token with "CppDigest",
and reassemble the URL; update the logic around the repo variable (referencing
https_to_git_ssh and repo) so only the owner portion is changed rather than
doing a greedy replace on the whole string.
In `@scripts/auto/setup_project.py`:
- Around line 224-227: After prefixing the component slug with the project slug,
apply the existing truncate_component_slug function to enforce the Weblate
max_length; specifically, import truncate_component_slug from
weblate.boost_endpoint.services and call truncate_component_slug(component_slug)
right after the block that sets component_slug =
f"{project_slug}-{component_slug}" so the combined project_slug + generated slug
is truncated to the allowed length; ensure generate_component_slug and
project_config usage remains unchanged.
In `@weblate/urls.py`:
- Line 922: The URL for the boost endpoint is included unconditionally via the
path("boost-endpoint/", include("weblate.boost_endpoint.urls")) entry, but the
app is not listed in settings.INSTALLED_APPS; either add
"weblate.boost_endpoint" to the INSTALLED_APPS arrays in the relevant settings
files (settings_example.py, settings_docker.py, settings_test.py) so AppConfig
and signals run, or guard the URL include by checking if
"weblate.boost_endpoint" is in settings.INSTALLED_APPS before adding the path to
real_patterns (use settings.INSTALLED_APPS as the conditional).
---
Duplicate comments:
In @.gitignore:
- Line 57: The unanchored .gitignore entry "gettextization.failed.po" currently
ignores that file at any depth; to make the intent explicit either anchor it to
the repo root by replacing the line with "/gettextization.failed.po" (so only
root-level files are ignored) or keep the global pattern but add a nearby
comment clarifying that the global ignore is intentional (e.g., "# intentionally
ignore gettextization.failed.po at any depth - confirmed by team"); update the
.gitignore entry accordingly.
In `@scripts/auto/add-or-update.sh`:
- Around line 43-44: The curl call in add-or-update.sh uses the wrong auth
scheme and may hide failures; change the header from "Authorization: Bearer
${TOKEN}" to "Authorization: Token ${TOKEN}" so it matches DRF
TokenAuthentication (like other scripts), and ensure the curl failure isn't
silently ignored (remove or avoid the trailing "|| true" around the curl
invocation) so non-2xx responses return a non-zero exit status; look for the
curl invocation that references WEBLATE_URL and TOKEN to apply these changes.
- Around line 17-22: The script extracts WEBLATE_URL from WEB_JSON but never
validates it, so a missing or "null" weblate_url will produce a bad curl URL;
add the same validation applied to TOKEN: after WEBLATE_URL="$(jq -r
'.weblate_url' "${WEB_JSON}")" check if WEBLATE_URL is empty or equals "null"
(e.g. [[ -z "${WEBLATE_URL}" || "${WEBLATE_URL}" == "null" ]]), print an error
to stderr referencing WEB_JSON and exit 1, ensuring subsequent uses (the curl
call) always have a valid URL.
In `@scripts/auto/create_component.py`:
- Around line 53-54: The current use of urljoin(self.base_url, "/api/") discards
any path prefix in self.base_url (e.g. "http://host/weblate" becomes
"http://host/api/"); change the construction of self.api_url to preserve base
path by avoiding a leading slash and ensuring exactly one separator — e.g.
compute self.api_url by concatenating a normalized base
(self.base_url.rstrip("/")) with "/api/" or call
urljoin(self.base_url.rstrip("/") + "/", "api/") so that self.base_url and its
path prefix are retained when creating self.api_url.
In `@scripts/auto/setup_project.py`:
- Around line 791-796: The loop treating any non-zero return from run_one_config
as failure is incorrect because run_one_config can return 2 to indicate a
graceful skip when skip_on_clone_failure=True; update the check in the loop that
iterates over config_files to only set exit_code = 1 when result != 0 and result
!= 2 (or explicitly treat result == 2 as a skipped config), leaving exit_code
unchanged on skips so batch mode does not mark skipped clones as failures;
reference run_one_config and the loop variables i/config_path_str to locate the
offending logic to adjust.
In `@weblate/boost_endpoint/serializers.py`:
- Around line 34-43: The ListField "extensions" currently uses
child=serializers.CharField(allow_blank=True) which permits empty-string entries
like [""] — change the child to disallow empty/blank values (set
allow_blank=False) and add a small validator on the child that rejects empty or
whitespace-only strings (and optionally enforces a minimal format such as
starting with '.' if desired) so invalid entries are rejected during serializer
validation; refer to the "extensions" ListField and its child CharField in
serializers.py to implement this fix.
In `@weblate/boost_endpoint/services.py`:
- Around line 628-643: process_submodule is vulnerable because self.temp_dir can
be None and submodule is used unchecked in os.path.join, allowing TypeError or
path traversal; fix by lazily initializing self.temp_dir if None (e.g., set to a
secure temp dir) before building temp_submodule_dir, sanitize/normalize the
incoming submodule (e.g., reject path separators or use os.path.basename) and
construct the candidate path with os.path.abspath(os.path.join(self.temp_dir,
sanitized_submodule)); then verify the result is inside self.temp_dir (starts
with the abs temp_dir path) before calling os.makedirs; update the flow around
temp_submodule_dir, self.temp_dir, and os.path.join to perform these checks in
process_submodule.
- Around line 359-393: The component synchronization
(_sync_component_for_translation) is doing long-running git/IO while inside
transaction.atomic(), holding DB locks; move both calls to
_sync_component_for_translation (the one executed when created and the one
executed when existing) to after the transaction.atomic() block—just like
add_language_to_component is already outside—so the DB create/get and any
component.save/update happen inside the atomic block but the long-running
_sync_component_for_translation calls run after the transaction completes;
ensure component.post_create(user, origin="boost_endpoint") still runs inside
the atomic if needed, but call _sync_component_for_translation(component,
request, created=...) and any related network/git operations only after leaving
the atomic context.
- Around line 682-696: The deletion loop is iterating over
project.component_set.all(), which risks deleting unrelated components; restrict
the queryset to only components matching the computed prefix so only boost
submodule components are considered. Change the iteration to filter
project.component_set by slug starting with prefix (e.g., use slug__startswith
or check component.slug.startswith(prefix) on a filtered queryset) and still
skip component.is_glossary; keep the same delete call
_delete_component_and_commit_removal and the existing error handling/result
updates.
- Around line 379-385: The comment says we must enforce branch = "local" but
only push_branch is being updated; modify the logic around component to also
check component.branch (the attribute named branch) and if it's not "local" set
component.branch = "local" and append "branch" to update_fields so
component.save(update_fields=update_fields) persists both changes; keep the
existing push_branch check (component.push_branch != push_branch) and ensure
update_fields contains all changed field names before calling component.save.
In `@weblate/boost_endpoint/views.py`:
- Around line 29-32: AddOrUpdateView currently uses permission_classes =
(IsAuthenticated,) which permits any logged-in user to perform sensitive
operations; change this to enforce admin/manager-level access by replacing or
augmenting the permission with a stricter permission (e.g., use IsAdminUser or a
custom permission like IsManagerOrAdmin) and apply it to AddOrUpdateView (and
any other views that perform cloning/project/component creation). Ensure the new
permission class is imported and, if creating a custom permission, implement it
(e.g., class IsManagerOrAdmin) to check user.is_staff or the relevant manager
flag, then set permission_classes = (IsManagerOrAdmin,) on AddOrUpdateView so
only admins/managers can execute these actions.
- Around line 67-72: Wrap the synchronous call to service.process_all(...) in a
try/except block inside the view: call service.process_all(submodules,
user=request.user, request=request) in try, return Response(results,
status=HTTP_200_OK) on success, and in except Exception as e log the exception
(use logger.exception or similar) and return a safe error Response (e.g.,
{"detail":"processing error"} with status HTTP_500_INTERNAL_SERVER_ERROR) so
unhandled exceptions don't propagate as raw 500s; keep references to
service.process_all, request.user, and Response to locate the code.
---
Nitpick comments:
In `@scripts/auto/add_translation.py`:
- Around line 273-274: Remove the no-op if block in add_translation.py that
checks "if \"total\" in translation:" and only contains a pass; delete the
entire if block (you can keep or move the comment elsewhere if you want a note)
so the function no longer contains dead code related to the translation["total"]
check in that spot (search for the if "total" in translation: statement to
locate it).
In `@scripts/auto/add-or-update.sh`:
- Around line 40-54: The script currently uses curl --max-time 5 which can cut
off the synchronous AddOrUpdateView.post() before it returns, hiding
auth/validation errors; either remove/extend --max-time and let curl wait for
the controller response so the caller sees success/failure, or refactor
AddOrUpdateView.post() to enqueue the long-running work to a background worker
(e.g., Celery/RQ/async task) and immediately return an HTTP 202 Accepted; update
the script to POST and check the 2xx/202 response (or remove the forced timeout)
and update AddOrUpdateView.post() to call the enqueue function (unique symbol:
AddOrUpdateView.post) and return 202 if you choose the background approach.
In `@scripts/auto/collect_boost_libraries_extensions.py`:
- Around line 252-310: The parse_args function currently calls
load_dotenv_script_dir(), causing an implicit environment mutation; move that
call out of parse_args and into the program initialization (e.g., at the top of
main before calling parse_args) so parse_args only handles argument parsing.
Update calls to parse_args() to ensure load_dotenv_script_dir() runs first,
leaving parse_args, its default token arg (os.environ.get("GITHUB_TOKEN")), and
behavior unchanged otherwise.
- Around line 386-397: Replace index-based access to the row tuple with tuple
unpacking when writing out all_libraries: in the with-open block using out_path
iterate rows as e.g. for first_col, repo_url, subpath, *rest in all_libraries
(or for first_col, repo_url, subpath, ext in ...) and use those local names with
format_subpath_display and quoted; when include_extensions is true, reference
the unpacked extension variable instead of row[3]; keep use of ref,
format_subpath_display, quoted, and include_extensions intact so the output line
construction and f.write remain unchanged.
- Around line 411-413: The current derivation of submodule_out_path using
out_path.replace(".txt", "_submodules.txt") is fragile; change it to use a
path-splitting approach (pathlib.Path or os.path.splitext) to reliably append
"_submodules" before the file extension of out_path and preserve a default
".txt" when no extension exists; update the logic around the submodule_out_path
and out_path variables (the lines that set submodule_out_path and the fallback
that checks equality) so the new computation produces base + "_submodules" +
extension and removes the brittle .replace behavior.
In `@scripts/auto/create_component.py`:
- Line 645: In _print_translation_info remove the unused API call to
creator.get_statistics(project_slug, component_slug) (or if statistics are
intended to be shown, capture its return into a variable and use it) so we don't
perform a live round-trip whose result is discarded; update
_print_translation_info to either delete the creator.get_statistics(...)
invocation or assign its return (e.g., stats = creator.get_statistics(...)) and
use stats in the output.
In `@scripts/auto/generate_boost_project_configs.py`:
- Around line 115-124: The function extensions_to_fullname should emit a
conventional English list (use commas and an Oxford comma for 3+ items) instead
of always joining with " and "; update extensions_to_fullname to build names
from EXTENSION_FULLNAMES as it does now, then: return the single name if names
has length 1, return " and ".join(names) for length 2, and for length >=3 return
", ".join(names[:-1]) + ", and " + names[-1]; keep the existing uniqueness logic
over ext_list and seen.
In `@scripts/auto/setup_project.py`:
- Around line 86-91: The current exclusion loop uses a substring check ("pattern
in rel_root") which over-matches; change the logic in the loop that iterates
exclude_patterns to either use fnmatch.fnmatchcase()/fnmatch.fnmatch on rel_root
or split rel_root into path components (e.g., rel_root.split(os.sep)) and test
equality/startswith against each component so patterns only match whole path
segments; update the loop that sets skip (the variables rel_root,
exclude_patterns, pattern, and skip) to use one of these approaches and ensure
tests for prefix matching use component-based startswith or proper glob
matching.
In `@scripts/auto/upload_translations.py`:
- Around line 177-182: The print() continuation lines inside
upload_translations.py are misaligned: the opening "print(" is indented
differently than its argument lines and the closing ")"; realign the multi-line
print so its arguments and the closing parenthesis line up under the print(
token (i.e., make the f-string lines and the closing parenthesis use the same
indentation column as "print(") and ensure the subsequent "return False" keeps
its original block-level indentation; look for the print block around the
failing-reporting logic (the f" ✗ Failed: Accepted={accepted},
Skipped={skipped}, f"Not found={not_found}" lines) and adjust spacing so the
continuation lines visually belong to the print call.
In `@weblate/boost_endpoint/services.py`:
- Around line 62-73: The truncate functions (truncate_component_name and
truncate_component_slug) currently ignore the max_len when computing head/tail
slices and always use hardcoded TRUNCATE_* constants; update both functions to
compute head_len and tail_len from the passed max_len minus the separator length
(TRUNCATE_NAME_SEP / TRUNCATE_SLUG_SEP) so the returned string length never
exceeds max_len, ensuring head_len and tail_len are non-negative (floor to 0)
and, if needed, favor the head when distributing odd remainders; keep using
TRUNCATE_*_SEP and existing constants only as defaults for their original values
but derive slice sizes from max_len inside each function
(truncate_component_name, truncate_component_slug).
In `@weblate/boost_endpoint/urls.py`:
- Around line 9-12: Add an application namespace by declaring an app_name (e.g.,
app_name = "boost_endpoint") at the top of this urls module so the URL names for
BoostEndpointInfo.as_view() ("info") and AddOrUpdateView.as_view()
("add-or-update") are not global; update any reverse()/url references elsewhere
to use the new namespace (e.g., "boost_endpoint:info" or
"boost_endpoint:add-or-update").
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- .gitignore
- Setup-ongoing.md
- scripts/auto/add-or-update.sh
- scripts/auto/add_translation.py
- scripts/auto/boost-1.90.0_libraries_list.txt
- scripts/auto/boost-1.90.0_libraries_list_submodules.txt
- scripts/auto/collect_boost_libraries_extensions.py
- scripts/auto/create_component.py
- scripts/auto/create_component_and_add_translation.py
- scripts/auto/delete_all_components.py
- scripts/auto/generate_boost_project_configs.py
- scripts/auto/setup_project.py
- scripts/auto/upload_translations.py
- scripts/auto/weblate_supported_formats.json
- weblate/boost_endpoint/__init__.py
- weblate/boost_endpoint/apps.py
- weblate/boost_endpoint/serializers.py
- weblate/boost_endpoint/services.py
- weblate/boost_endpoint/urls.py
- weblate/boost_endpoint/views.py
- weblate/urls.py
💤 Files with no reviewable changes (1)
- Setup-ongoing.md
@coderabbitai full review

✅ Actions performed

Full review triggered.
Actionable comments posted: 6
> [!CAUTION]
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)

scripts/auto/delete_all_components.py (2)
132-133: ⚠️ Potential issue | 🟡 Minor

`tuple[int, int]` requires Python 3.9+, inconsistent with the `typing.Dict`/`List` usage elsewhere.

The file imports `Dict`, `List` from `typing` (3.8-style), yet the return annotation on line 132 uses the lowercase built-in generic form (3.9+), which raises `TypeError` on Python 3.8 at class definition time.

🛡️ Proposed fix

```diff
-    def delete_all_components_in_project(self, project_slug: str) -> tuple[int, int]:
+    def delete_all_components_in_project(self, project_slug: str) -> Tuple[int, int]:
```

And add `Tuple` to the existing `typing` import:

```diff
-from typing import Any, Dict, List
+from typing import Any, Dict, List, Tuple
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/delete_all_components.py` around lines 132 - 133, The return annotation on delete_all_components_in_project uses Python 3.9+ built-in generics (tuple[int, int]) which is inconsistent with the file's typing imports and breaks on Python 3.8; change the signature to use typing.Tuple[int, int] and add Tuple to the existing from typing import ... list (alongside Dict and List) so the annotation is Tuple[int, int] and matches the other typing usage.
46-47: ⚠️ Potential issue | 🟠 Major

`urljoin` strips the subpath from `base_url` when the second argument starts with `/`.

`urljoin("http://host/weblate/", "/api/")` resolves to `"http://host/api/"`, silently discarding the `/weblate/` prefix. Every subsequent API call will hit the wrong path. Since `self.base_url` is already `.rstrip("/")`-normalised on the line above, direct concatenation is both simpler and correct.

🐛 Proposed fix

```diff
-        self.api_url = urljoin(self.base_url, "/api/")
+        self.api_url = self.base_url + "/api/"
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/delete_all_components.py` around lines 46 - 47, The code uses urljoin to build self.api_url which drops the base subpath when the second argument starts with '/', so replace the urljoin usage in the class initializer: construct self.api_url by concatenating the already-normalised self.base_url with the "/api/" suffix (e.g., self.api_url = self.base_url + "/api/") instead of urljoin to preserve any subpath; update any related tests or callers expecting self.api_url if necessary.
♻️ Duplicate comments (13)
.gitignore (1)
57-57: Previously flagged unanchored pattern — confirmed addressed.

The global (unanchored) `gettextization.failed.po` pattern was reviewed and accepted in the prior commit cycle. No further action needed unless a specific path must be tracked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.gitignore at line 57, Leave the unanchored pattern "gettextization.failed.po" in .gitignore as-is (no change required); if you do need to track a specific path later, replace that entry with an anchored/path-specific pattern (e.g., prepend a directory or use a glob) to explicitly include the intended file location instead of the global pattern.
weblate/boost_endpoint/services.py (4)

359-393: `_sync_component_for_translation` (git pull + translation load) still executes inside `transaction.atomic()`.

Lines 375 and 388 call `_sync_component_for_translation` while the `with transaction.atomic():` block at Line 359 is still open. `do_update` triggers a VCS network operation and `create_translations_immediate` is a DB-heavy scan — both hold the transaction lock. While removing the 60-second sleep reduced the worst case, a slow clone or large translation scan still holds locks that can starve other writers.

Move both `_sync_component_for_translation` calls to outside the `with` block, keeping only the `get_or_create`, `acting_user`, `post_create`, and field-update saves inside the transaction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 359 - 393, The _sync_component_for_translation calls are executed inside the with transaction.atomic() block; move both calls (the ones invoked when created=True and created=False) to after the atomic block so that only Component.objects.get_or_create(...), setting component.acting_user, component.post_create(user, origin="boost_endpoint") and any component.save(update_fields=...) remain inside the transaction; ensure you still call self._sync_component_for_translation(component, request, created=True/False) immediately after the with block returns and keep the conditional self.add_language_to_component(component, request) behavior intact.
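The shape of that split, as a schematic sketch; `Component` and `_sync_component_for_translation` are the existing symbols, while the helper name `create_and_sync` and the `get_or_create` arguments are illustrative:

```python
# Schematic sketch, not a drop-in patch: slug, project, user, request and
# _sync_component_for_translation come from the surrounding service method.
from django.db import transaction


def create_and_sync(slug, project, user, request):
    with transaction.atomic():
        component, created = Component.objects.get_or_create(
            slug=slug, project=project
        )
        component.acting_user = user
        if created:
            component.post_create(user, origin="boost_endpoint")
    # Long-running VCS and translation work runs after the commit,
    # so no DB locks are held during network I/O.
    _sync_component_for_translation(component, request, created=created)
    return component
```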
641-643: ⚠️ Potential issue | 🔴 Critical

Path traversal via untrusted `submodule` name — `"../../../etc"` escapes the temp directory.

`submodule` originates from the external API request (validated only as a non-empty string by the serializer). `os.path.join(self.temp_dir, submodule)` constructs the clone target without verifying the resolved path stays within `self.temp_dir`. A value like `"../../../tmp/attack"` would place the git clone outside the intended sandbox.

🛡️ Proposed fix — containment check before makedirs

```diff
         # Create temp directory for this submodule
         temp_submodule_dir = os.path.join(self.temp_dir, submodule)
+        # Guard against path traversal from untrusted submodule names
+        if not os.path.realpath(temp_submodule_dir).startswith(
+            os.path.realpath(self.temp_dir) + os.sep
+        ):
+            result["errors"].append(f"Invalid submodule name: {submodule!r}")
+            return result
         os.makedirs(temp_submodule_dir, exist_ok=True)
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 641 - 643, The code trusts the external submodule string when creating temp_submodule_dir (temp_submodule_dir = os.path.join(self.temp_dir, submodule)) which allows path traversal; fix by resolving and validating the resulting path before creating directories: compute the absolute/real path for temp_submodule_dir (using os.path.abspath or os.path.realpath) and ensure it is inside self.temp_dir (e.g., startswith the resolved self.temp_dir plus os.sep) or reject/raise a validation error for bad submodule values; alternatively sanitize submodule to disallow path separators before calling os.path.join and then call os.makedirs(temp_submodule_dir, exist_ok=True).
684-696: ⚠️ Potential issue | 🔴 Critical

Deletion scope is too broad — iterating `project.component_set.all()` will delete manually created or cross-submodule components.

`prefix` is computed but never used to narrow the queryset. Any non-glossary component whose slug isn't in the current scan's `wanted_slugs` — even one manually added by a user or belonging to a different submodule that shares the project — will be deleted.

🛠️ Proposed fix — restrict to slug prefix

```diff
-        for component in project.component_set.all():
+        for component in project.component_set.filter(slug__startswith=prefix):
             if component.slug not in wanted_slugs and not component.is_glossary:
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 684 - 696, The deletion loop currently iterates project.component_set.all() and removes any non-glossary component whose slug is not in wanted_slugs; restrict the scope by filtering components to only those matching the computed prefix so you don't touch manually created or cross-submodule components: use the existing prefix and iterate only over project.component_set.filter(slug__startswith=prefix) (or equivalent slug starts-with check) and then keep the same check against wanted_slugs and use _delete_component_and_commit_removal to remove unmatched items; continue to append warnings and errors to result as before.
628-648: ⚠️ Potential issue | 🟠 Major

`process_submodule` will `TypeError` if called directly — `self.temp_dir` is `None` by default.

Line 642: `os.path.join(self.temp_dir, submodule)` raises `TypeError` when `self.temp_dir` is `None`. Only `process_all` (Line 706) sets `temp_dir` before calling `process_submodule`, but the method is public.

🛠️ Proposed fix — lazy initialisation at entry

```diff
     def process_submodule(
         self, submodule: str, user=None, request=None
     ) -> dict[str, Any]:
         """Process a single submodule: clone, scan, create/update components."""
+        if self.temp_dir is None:
+            self.temp_dir = tempfile.mkdtemp(prefix="boost_endpoint_")
+            LOGGER.info("Using temp directory: %s", self.temp_dir)
+
         result = {
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/services.py` around lines 628 - 648, process_submodule currently assumes self.temp_dir is set and will TypeError if None; fix by lazily initializing self.temp_dir at the start of process_submodule (before computing temp_submodule_dir) when it is falsy — e.g. set self.temp_dir = tempfile.mkdtemp(...) or create a TemporaryDirectory and store its path; then continue to compute temp_submodule_dir = os.path.join(self.temp_dir, submodule) and proceed (this ensures calling process_submodule directly works and keeps existing behavior with process_all). Ensure tempfile is imported and choose the same cleanup approach used elsewhere if any.scripts/auto/add-or-update.sh (2)
17-22: ⚠️ Potential issue | 🟡 Minor

`WEBLATE_URL` is not validated — a missing or `"null"` value will silently send to an invalid URL.

`TOKEN` is checked for null/empty but `WEBLATE_URL` receives no equivalent guard. If `weblate_url` is absent from `web.json`, curl will fire against an empty/malformed URL.

🛡️ Proposed fix

```diff
 WEBLATE_URL="$(jq -r '.weblate_url' "${WEB_JSON}")"
 TOKEN="$(jq -r '.api_token' "${WEB_JSON}")"
+if [[ -z "${WEBLATE_URL}" || "${WEBLATE_URL}" == "null" ]]; then
+  echo "Error: weblate_url not set in ${WEB_JSON}." >&2
+  exit 1
+fi
 if [[ -z "${TOKEN}" || "${TOKEN}" == "null" ]]; then
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 17 - 22, The script assigns WEBLATE_URL from WEB_JSON but never validates it; add the same guard used for TOKEN to check WEBLATE_URL (the WEBLATE_URL variable read via jq from WEB_JSON) for empty or "null" and, if invalid, print an error to stderr and exit non‑zero. Update the add-or-update.sh logic immediately after setting WEBLATE_URL to validate the value (similar to the existing TOKEN check) so curl is never invoked with an empty or malformed URL.
46-50: ⚠️ Potential issue | 🔴 Critical

Wrong auth scheme — every request will receive a 401, silently swallowed by `|| true`.

The Weblate REST API uses `Authorization: Token TOKEN`, matching DRF's `TokenAuthentication`. To use a different keyword such as `Bearer`, `TokenAuthentication` would need to be explicitly subclassed and reconfigured. Using `Bearer` here produces a 401 on every POST, which `|| true` then silently ignores.

🔒 Proposed fix

```diff
 curl -X POST "${WEBLATE_URL}/boost-endpoint/add-or-update/" \
-  -H "Authorization: Bearer ${TOKEN}" \
+  -H "Authorization: Token ${TOKEN}" \
   -H "Content-Type: application/json" \
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add-or-update.sh` around lines 46 - 50, The curl call in add-or-update.sh is using the wrong auth scheme and silently swallowing failures: replace the header "Authorization: Bearer ${TOKEN}" with "Authorization: Token ${TOKEN}" so Weblate/DRF TokenAuthentication accepts it, and remove or handle the trailing "|| true" on the curl invocation so a 401/other error is not silently ignored (e.g., allow curl to return non-zero or check the HTTP status after the request). Ensure you update the invocation that builds PAYLOAD and uses TOKEN in the same script.weblate/boost_endpoint/serializers.py (1)
34-44: ⚠️ Potential issue | 🟡 Minor

`allow_blank=True` permits `""` in the `extensions` list without validation error.

A payload with `"extensions": [""]` passes serializer validation. While `get_supported_extensions()` would silently drop the empty string, an early validation failure is cleaner.

🛡️ Proposed fix

```diff
     extensions = serializers.ListField(
-        child=serializers.CharField(allow_blank=True),
+        child=serializers.CharField(),
         required=False,
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/serializers.py` around lines 34 - 44, The ListField `extensions` currently uses child=serializers.CharField(allow_blank=True) which permits empty strings (e.g. [""]) to pass validation; change the child to serializers.CharField(allow_blank=False) (or add a validate_extensions(self, value) method to raise a ValidationError if any item == "") so the serializer rejects empty-extension strings early; update the `extensions` field definition (the ListField/child CharField) accordingly to ensure empty strings are invalid.scripts/auto/create_component.py (1)
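If a dedicated validator is preferred over dropping `allow_blank`, a sketch on the serializer the endpoint already uses; the whitespace-stripping step is an added assumption, and only the `extensions` field is shown:

```python
# Sketch: reject empty or whitespace-only extension entries early.
from rest_framework import serializers


class AddOrUpdateRequestSerializer(serializers.Serializer):
    extensions = serializers.ListField(
        child=serializers.CharField(), required=False
    )

    def validate_extensions(self, value):
        cleaned = [ext.strip() for ext in value]
        if any(not ext for ext in cleaned):
            raise serializers.ValidationError(
                "extensions must not contain empty entries"
            )
        return cleaned
```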
53-54: ⚠️ Potential issue | 🟡 Minor

`urljoin` with an absolute path discards the base URL's path prefix.

`urljoin("http://host/weblate", "/api/")` resolves to `http://host/api/`, dropping `/weblate`. This breaks any subpath-installed Weblate instance.

🛠️ Proposed fix

```diff
         self.base_url = base_url.rstrip("/")
-        self.api_url = urljoin(self.base_url, "/api/")
+        self.api_url = self.base_url + "/api/"
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` around lines 53 - 54, The code uses urljoin(self.base_url, "/api/") which drops any path portion of self.base_url (e.g., "/weblate"); change the construction of self.api_url to preserve subpaths by ensuring the base ends with a slash and by using a relative join (e.g., build self.api_url from self.base_url + '/' and a relative "api/" or use urljoin(self.base_url + '/', 'api/')) so that self.base_url, self.api_url and the urljoin call are updated to produce "http://host/weblate/api/" instead of "http://host/api/".weblate/boost_endpoint/views.py (2)
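The behaviour is easy to confirm in a REPL; this uses only the standard library, no project code:

```python
from urllib.parse import urljoin

# An absolute second argument replaces the entire path of the base URL.
print(urljoin("http://host/weblate/", "/api/"))  # http://host/api/
# A relative second argument resolves under the base path instead.
print(urljoin("http://host/weblate/", "api/"))   # http://host/weblate/api/
# Concatenating a normalised base also preserves the prefix.
print("http://host/weblate" + "/api/")           # http://host/weblate/api/
```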
29-32: ⚠️ Potential issue | 🟠 Major

`IsAuthenticated` is too permissive for project/component mutation.

This endpoint clones repos, creates/updates projects and components — admin-level operations. Any logged-in user can trigger them. Use `IsAdminUser` or a custom permission.

Proposed fix

```diff
-from rest_framework.permissions import IsAuthenticated
+from rest_framework.permissions import IsAdminUser

 class AddOrUpdateView(APIView):
     """Add or update Boost documentation components."""

-    permission_classes = (IsAuthenticated,)
+    permission_classes = (IsAdminUser,)
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/views.py` around lines 29 - 32, The AddOrUpdateView currently uses IsAuthenticated which is too permissive for admin-level operations; update the view's permission_classes on AddOrUpdateView to use IsAdminUser (or a project-admin custom permission) instead of IsAuthenticated and add/update the corresponding import from rest_framework.permissions so only admins (or the designated custom permission class) can invoke the clone/create/update actions.
67-72: ⚠️ Potential issue | 🟠 Major

Unhandled exceptions from `process_all` will surface as raw 500 errors.

`service.process_all()` clones repos, scans files, and creates components — any failure (git timeout, disk full, DB error) becomes an opaque 500. At minimum, wrap in try/except to return a structured error response.

Minimum fix

```diff
         # Process all submodules (pass request so do_update and add_new_language work)
-        results = service.process_all(
-            submodules, user=request.user, request=request
-        )
-
-        return Response(results, status=status.HTTP_200_OK)
+        try:
+            results = service.process_all(
+                submodules, user=request.user, request=request
+            )
+        except Exception as exc:
+            return Response(
+                {"error": str(exc)},
+                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            )
+
+        return Response(results, status=status.HTTP_200_OK)
```
Verify each finding against the current code and only fix it if needed. In `@weblate/boost_endpoint/views.py` around lines 67 - 72, service.process_all can raise various exceptions (git, IO, DB) which currently bubble up as raw 500s; wrap the call to service.process_all(...) in a try/except, catch Exception, log the full traceback (e.g. logger.exception) and return a structured DRF Response containing an error message and optionally the exception string with status=status.HTTP_500_INTERNAL_SERVER_ERROR instead of returning the raw exception; ensure Response and status are used consistently (replacing the current direct return of results and status.HTTP_200_OK on error path) and add logger import if not present.scripts/auto/setup_project.py (1)
804-809: ⚠️ Potential issue | 🟡 Minor

Skipped configs (return code 2) still treated as failures in batch mode.

`run_one_config` returns `2` for clone-skips (documented on Line 587), but `result != 0` treats any non-zero as a failure. If gracefully skipping unreachable repos in batch mode is intended, skips shouldn't fail the batch.

Proposed fix

```diff
         result = run_one_config(args, config_path_str, skip_on_clone_failure=True)
-        if result != 0:
+        if result == 1:
             exit_code = 1
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 804 - 809, The loop over config_files treats any non-zero result from run_one_config as failure, but run_one_config uses return code 2 to indicate a graceful skip (clone failure); update the logic in the loop that calls run_one_config (the block that assigns result and sets exit_code) so that it only marks failure (set exit_code = 1) when result is not 0 and not 2; for result == 2 log or print a clear "skipped" message and continue without changing exit_code. Ensure references to run_one_config, result, and exit_code are adjusted accordingly.scripts/auto/generate_boost_project_configs.py (1)
24-24: Missing `from __future__ import annotations` for consistency.

The codebase widely uses this import. While Python 3.11+ supports `tuple[str, ...] | None` natively, adding the import aligns with project conventions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/generate_boost_project_configs.py` at line 24, Add the missing top-of-file directive "from __future__ import annotations" to this module to match project conventions so the forward-reference type syntax used in the parse_list_row function signature (parse_list_row) is consistent with the rest of the codebase; place the import as the first non-shebang line of scripts/auto/generate_boost_project_configs.py before any other imports or definitions.
🧹 Nitpick comments (8)
.gitignore (1)
47-47: `.cursor/` is unanchored — inconsistent with `.vscode/*` on line 16.

`.cursor/` without a leading `/` will match any directory named `.cursor` at any depth, whereas `.vscode/*` is root-anchored. If the intent is to ignore only the root-level Cursor IDE directory (same scope as `.vscode/*`), prepend `/`.

✏️ Proposed fix

```diff
-.cursor/
+/.cursor/
```
Verify each finding against the current code and only fix it if needed. In @.gitignore at line 47, The .cursor/ entry is unanchored and will match .cursor directories at any depth; change it to be root-anchored to match the style of .vscode/* by prepending a leading slash (i.e., make the entry "/.cursor/") so the .gitignore only ignores the repo-root Cursor IDE directory, consistent with .vscode/*.
62-70: Missing `timeout` on all HTTP calls — script will hang indefinitely on an unresponsive server.

None of the `session.get()` or `session.delete()` calls pass a `timeout` argument. The simplest fix is a per-call timeout; alternatively, mount a custom `HTTPAdapter` on the session to set a default globally (a sketch follows the agent prompt below).

♻️ Proposed fix (per-call, minimal change)

```diff
-        response = self.session.get(full_url)
+        response = self.session.get(full_url, timeout=30)
```

Apply the same `timeout=30` to the `session.get()` call in `list_components` and both `session.delete()` calls in `delete_project`/`delete_component`.

Also applies to: 79-90, 92-110, 112-130

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/delete_all_components.py` around lines 62 - 70, The HTTP calls currently risk hanging because no timeout is set; update the requests in list_components (where self.session.get(...) is called) to include a timeout (e.g., timeout=30) and likewise add timeout=30 to both self.session.delete(...) calls used in delete_project and delete_component; you can apply the per-call timeout to each request to fix the issue immediately.scripts/auto/upload_translations.py (1)
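For the session-wide alternative, a sketch of the adapter approach; the subclass name `TimeoutHTTPAdapter` and the 30-second default are illustrative, not part of requests itself:

```python
# Sketch: inject a default timeout into every request on the session.
import requests
from requests.adapters import HTTPAdapter


class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, *args, timeout=30, **kwargs):
        self._timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send always forwards timeout (possibly None),
        # so a plain setdefault would not catch the default case.
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self._timeout
        return super().send(request, **kwargs)


session = requests.Session()
for prefix in ("http://", "https://"):
    session.mount(prefix, TimeoutHTTPAdapter(timeout=30))
```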
177-181: Misaligned continuation indentation inside `print()`.

The f-string arguments and closing `)` are indented at ~20/16 spaces while the `print(` call sits at ~28 spaces. Python allows this, but it visually implies those lines belong to the enclosing `try` block rather than to the `print()` call.

♻️ Proposed fix

```diff
-                            print(
-                    f"    ✗ Failed: Accepted={accepted}, Skipped={skipped}, "
-                    f"Not found={not_found}",
-                flush=True,
-            )
+                            print(
+                                f"    ✗ Failed: Accepted={accepted}, Skipped={skipped}, "
+                                f"Not found={not_found}",
+                                flush=True,
+                            )
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/upload_translations.py` around lines 177 - 181, The multi-line print call in upload_translations.py is visually misaligned: move the f-string lines and the closing parenthesis to align under the opening "print(" so the continuation lines belong to the print call (e.g., ensure the f-string lines start at the same column as the first argument and the closing ")" aligns with "print("). Locate the offending print(...) that references Accepted={accepted}, Skipped={skipped}, Not found={not_found} and adjust indentation so the f-strings and closing parenthesis are indented consistently with the print call.scripts/auto/add_translation.py (1)
273-274: Remove the no-op `if` block — dead code.

The `if "total" in translation: pass` block does nothing and the comment describes intent that was never implemented. Either remove both lines or emit the stats.

♻️ Proposed fix

```diff
-        # Show statistics if available
-        if "total" in translation:
-            pass  # Optional: log translation.get("translated")/translation.get("total")
+        # Show statistics if available
+        if "total" in translation:
+            translated = translation.get("translated", 0)
+            total = translation.get("total", 0)
+            print(f"    Progress: {translated}/{total}", flush=True)
```

Or simply delete lines 272-274 if logging stats is not desired.
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/add_translation.py` around lines 273 - 274, Remove the dead no-op block that checks if "total" in translation and does nothing; either delete the two lines containing `if "total" in translation: pass` (and its placeholder comment) or replace them with an actual log/emission of stats using `translation.get("translated")` and `translation.get("total")` (e.g., call the existing logger or emit a metric) so the intent in the comment is implemented; locate the check around the `translation` variable in add_translation.py and remove or implement it accordingly.scripts/auto/create_component.py (1)
645-645: `get_statistics` return value is discarded — unnecessary API call.

The result is thrown away after the request is made, yet `_make_request` still prints `[API] GET .../statistics/` to stdout. Either remove the call or use the result.

♻️ Proposed fix (remove the dead call)

```diff
-    creator.get_statistics(project_slug, component_slug)
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/create_component.py` at line 645, The call to creator.get_statistics(project_slug, component_slug) is unused and causes an unnecessary API request (and log) via _make_request; remove this dead call from the create_component flow (delete the line invoking get_statistics) or, if statistics are required, assign its return to a variable and use it (e.g., stats = creator.get_statistics(...)) wherever needed; locate the invocation by the symbol get_statistics in the create_component script and remove or consume its result accordingly.scripts/auto/collect_boost_libraries_extensions.py (2)
424-479: `write_submodule_output` derives output path via string replacement — fragile edge case.

Line 431: `submodule_out_path = out_path.replace(".txt", "_submodules.txt")` will replace every `.txt` occurrence in the path. If the output directory contains `.txt` (e.g., `outputs.txt_dir/libraries.txt`), the replacement rewrites the directory name as well as the filename.

A safer approach would be to use `pathlib` to manipulate only the stem/suffix:

Proposed fix

```diff
-    submodule_out_path = out_path.replace(".txt", "_submodules.txt")
-    if submodule_out_path == out_path:
-        submodule_out_path = out_path + "_submodules"
+    p = Path(out_path)
+    if p.suffix == ".txt":
+        submodule_out_path = str(p.with_stem(p.stem + "_submodules"))
+    else:
+        submodule_out_path = out_path + "_submodules"
```
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 424 - 479, The code in write_submodule_output builds submodule_out_path by calling out_path.replace(".txt", "_submodules.txt"), which incorrectly replaces the first ".txt" anywhere in the path; change this to use pathlib to operate on the filename only: create a Path from out_path, derive a new name by appending "_submodules" to the stem (or replace the suffix) and then convert back to string (update the submodule_out_path creation logic that currently references out_path and submodule_out_path). Ensure behavior when out_path has no suffix still appends "_submodules" and preserve encoding/writing logic unchanged.
375-395: Type annotation mismatch in `fetch_extensions_for_libraries` signature.

The parameter `all_libraries` is annotated as `List[Tuple[str, str, str]]` (3-tuple), but the function returns `List[Tuple[str, str, str, str]]` (4-tuple). The annotation is correct for the input at call-time from `main()`, but the asymmetry between input and output types is worth noting; the return type annotation on Line 379 is good.
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/collect_boost_libraries_extensions.py` around lines 375 - 395, The input/return tuple asymmetry should be made explicit with type aliases: define a Library = Tuple[str, str, str] (or Optional[str] if subpath can be None) and an ExtensionRow = Tuple[str, str, str, str], then update the function signature of fetch_extensions_for_libraries to accept all_libraries: List[Library] and return List[ExtensionRow]; also update local variable rows_with_ext: List[ExtensionRow] to use the alias so the types for all_libraries, rows_with_ext and the return value (in fetch_extensions_for_libraries) are clear and consistent.scripts/auto/setup_project.py (1)
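A sketch of the aliases the prompt above describes; the column meanings in the comments are inferred from the script, not documented:

```python
# Sketch: explicit aliases for the 3-tuple input and 4-tuple output.
from typing import List, Tuple

Library = Tuple[str, str, str]            # (first_col, repo_url, subpath)
ExtensionRow = Tuple[str, str, str, str]  # same columns plus extension string


def fetch_extensions_for_libraries(
    all_libraries: List[Library],
) -> List[ExtensionRow]:
    rows_with_ext: List[ExtensionRow] = []
    for first_col, repo_url, subpath in all_libraries:
        ext_str = ""  # placeholder; the real code queries the repository
        rows_with_ext.append((first_col, repo_url, subpath, ext_str))
    return rows_with_ext
```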
206-217: `max_len` parameter is accepted but not used in the truncation logic.

Both `truncate_component_name` and `truncate_component_slug` accept `max_len` but the truncation body uses fixed constants (`TRUNCATE_NAME_HEAD + TRUNCATE_NAME_SEP + TRUNCATE_NAME_TAIL = 94` and `TRUNCATE_SLUG_HEAD + TRUNCATE_SLUG_SEP + TRUNCATE_SLUG_TAIL = 100`). This works for the default `max_len=100` but callers passing a smaller `max_len` would get a string exceeding their requested limit.

Consider either removing the `max_len` parameter (since it's unused) or incorporating it into the formula.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/auto/setup_project.py` around lines 206 - 217, truncate_component_name and truncate_component_slug accept max_len but ignore it; update both functions (truncate_component_name, truncate_component_slug) to compute head and tail slice lengths from the provided max_len rather than always using TRUNCATE_*_HEAD/Tail constants. Specifically: if len(input) <= max_len return as before; otherwise compute sep = TRUNCATE_NAME_SEP or TRUNCATE_SLUG_SEP and available = max_len - len(sep), then choose head = min(CONST_HEAD, available - CONST_TAIL) and tail = available - head (or distribute proportionally but ensure head+tail+len(sep)==max_len and neither is negative), and return input[:head] + sep + input[-tail:]; apply same logic for slug using TRUNCATE_SLUG_* constants so callers who pass a smaller max_len get output <= max_len.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/auto/create_component.py`:
- Around line 534-535: The monolingual_formats list in _gather_component_config
(variable monolingual_formats) omits "txt" causing interactive mode to skip
template prompting for txt files; update the monolingual_formats list to include
"txt" so it matches the sister script (create_component_and_add_translation.py)
and ensure the conditional if config["file_format"] in monolingual_formats
triggers for "txt" as expected.
In `@scripts/auto/setup_project.py`:
- Around line 724-731: The uniquification loop for component_slug appends
suffixes like "-2" to an already-truncated slug and can exceed Weblate's
MAX_COMPONENT_SLUG_LENGTH; after you set component_slug =
f"{original_slug}-{counter}" in the while loop, re-truncate the resulting
component_slug to the same max length used by generate_component_config (or
MAX_COMPONENT_SLUG_LENGTH) so the final slug never exceeds the limit, and then
assign component_config['component']['slug'] = component_slug; also ensure
truncation happens each iteration (or reserve space for the suffix before
initial truncation) to avoid producing a too-long slug and reduce collision
risk.
In `@weblate/boost_endpoint/services.py`:
- Around line 196-231: The slug collision occurs because component_slug is built
from path parts only, so files like doc/intro.adoc and doc/intro.md both become
"doc-intro"; update the slug generation in the code that builds
slug_parts/component_slug to include the file extension (use
extension.lstrip(".").lower()) before joining, e.g., append extension without
the dot to slug_parts so component_slug becomes unique per format (affects
variable names component_slug, slug_parts and the behavior of
create_or_update_component).
- Around line 80-90: The helper _build_extension_to_format currently falls back
to format_cls.name when format_id is falsy, which captures a human-readable
label instead of a real API format ID; change the logic in
_build_extension_to_format to read format_id = getattr(format_cls, "format_id",
None) and if format_id is falsy (None or empty) skip this format (continue)
instead of using format_cls.name, while still checking format_cls.autoload and
mapping extensions from each pattern into result[ext] = format_id; ensure you
reference the format_id, format_cls.autoload, and the _build_extension_to_format
function when making this change.
- Around line 659-669: The code currently calls get_or_create_project(...)
before enforcing permissions, which can create a persisted Project even if
user.has_perm("project.edit", project) later fails; to fix, check permissions
first (using request and user and the has_perm("project.edit", project) call)
and only call get_or_create_project when the permission check passes, or
alternatively modify get_or_create_project/process_submodule to defer committing
the Project (e.g., wrap creation in a transaction or return an unsaved instance)
until after has_perm is confirmed; update the logic around
get_or_create_project, user.has_perm, and process_submodule so no Project row is
committed when permission is denied. See the sketch after this list.
- Around line 563-566: The current block uses subprocess.run([... "git", "-C",
base_path, "add", "-u"], ...) which will stage all tracked changes in base_path;
instead iterate only over the translation files that were removed and stage
those explicitly (e.g. call subprocess.run(["git", "-C", base_path, "rm",
"--quiet", "--", removed_path]) for each removed path or use ["git", "-C",
base_path, "add", "--", removed_path] for deletions you recreated), passing each
file as a separate arg and keeping the existing base_path and removed_any check;
replace the single "git add -u" invocation with this per-file staging loop to
avoid sweeping unrelated modifications.
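For the staging item directly above, a sketch of the per-file loop; `removed_paths` is a hypothetical list of the translation files that were deleted:

```python
import subprocess

def stage_removed_translations(base_path: str, removed_paths: list) -> None:
    """Stage only the recorded deletions instead of a blanket `git add -u`."""
    for removed_path in removed_paths:
        # `git rm --quiet -- <path>` stages the removal of this one file;
        # --ignore-unmatch keeps the loop idempotent if the path is already
        # gone from the index, and unrelated tracked changes stay unstaged.
        subprocess.run(
            ["git", "-C", base_path, "rm", "--quiet", "--ignore-unmatch", "--", removed_path],
            check=True,
        )
```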
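For the slug-collision item above, the extension simply becomes the last slug part; the helper name and inputs here are illustrative:

```python
def build_component_slug(path_parts: list, extension: str) -> str:
    """doc/intro.adoc and doc/intro.md must map to different slugs."""
    slug_parts = list(path_parts) + [extension.lstrip(".").lower()]
    return "-".join(slug_parts)

assert build_component_slug(["doc", "intro"], ".adoc") == "doc-intro-adoc"
assert build_component_slug(["doc", "intro"], ".md") == "doc-intro-md"
```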
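And for the permission-ordering item, a method-shaped sketch of the transactional variant: raising inside `transaction.atomic()` rolls the insert back, so a denied request persists nothing. Only `has_perm("project.edit", project)` comes from the code under review; the error-handling shape is an assumption:

```python
from django.db import transaction

def process_submodule(self, submodule, user, request):
    try:
        with transaction.atomic():
            project = self.get_or_create_project(submodule)
            if not user.has_perm("project.edit", project):
                # Raising here aborts the transaction, so the Project row is
                # rolled back if get_or_create_project had just created it.
                raise PermissionError(f"permission denied for {project.slug}")
    except PermissionError as exc:
        return {"error": str(exc)}
    # ... proceed with component creation for the now-authorised project ...
    return {"project": project.slug}
```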
---
Outside diff comments:
In `@scripts/auto/delete_all_components.py`:
- Around line 132-133: The return annotation on delete_all_components_in_project
uses Python 3.9+ built-in generics (tuple[int, int]) which is inconsistent with
the file's typing imports and breaks on Python 3.8; change the signature to use
typing.Tuple[int, int] and add Tuple to the existing from typing import ... list
(alongside Dict and List) so the annotation is Tuple[int, int] and matches the
other typing usage.
- Around line 46-47: The code uses urljoin to build self.api_url which drops the
base subpath when the second argument starts with '/', so replace the urljoin
usage in the class initializer: construct self.api_url by concatenating the
already-normalised self.base_url with the "/api/" suffix (e.g., self.api_url =
self.base_url + "/api/") instead of urljoin to preserve any subpath; update any
related tests or callers expecting self.api_url if necessary.
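For the urljoin item directly above, a quick demonstration of why the leading slash discards the subpath, and the concatenation that preserves it:

```python
from urllib.parse import urljoin

base_url = "http://host/weblate"  # already normalised, no trailing slash

# A second argument starting with "/" is treated as an absolute path,
# so urljoin silently drops the "/weblate" subpath:
assert urljoin(base_url, "/api/") == "http://host/api/"

# Plain concatenation (or a relative join) keeps it:
assert base_url + "/api/" == "http://host/weblate/api/"
assert urljoin(base_url + "/", "api/") == "http://host/weblate/api/"
```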
---
Duplicate comments:
In @.gitignore:
- Line 57: Leave the unanchored pattern "gettextization.failed.po" in .gitignore
as-is (no change required); if you do need to track a specific path later,
replace that entry with an anchored/path-specific pattern (e.g., prepend a
directory or use a glob) to explicitly include the intended file location
instead of the global pattern.
In `@scripts/auto/add-or-update.sh`:
- Around line 17-22: The script assigns WEBLATE_URL from WEB_JSON but never
validates it; add the same guard used for TOKEN to check WEBLATE_URL (the
WEBLATE_URL variable read via jq from WEB_JSON) for empty or "null" and, if
invalid, print an error to stderr and exit non‑zero. Update the add-or-update.sh
logic immediately after setting WEBLATE_URL to validate the value (similar to
the existing TOKEN check) so curl is never invoked with an empty or malformed
URL.
- Around line 46-50: The curl call in add-or-update.sh is using the wrong auth
scheme and silently swallowing failures: replace the header "Authorization:
Bearer ${TOKEN}" with "Authorization: Token ${TOKEN}" so Weblate/DRF
TokenAuthentication accepts it, and remove or handle the trailing "|| true" on
the curl invocation so a 401/other error is not silently ignored (e.g., allow
curl to return non-zero or check the HTTP status after the request). Ensure you
update the invocation that builds PAYLOAD and uses TOKEN in the same script.
In `@scripts/auto/create_component.py`:
- Around line 53-54: The code uses urljoin(self.base_url, "/api/") which drops
any path portion of self.base_url (e.g., "/weblate"); change the construction of
self.api_url to preserve subpaths by ensuring the base ends with a slash and by
using a relative join (e.g., build self.api_url from self.base_url + '/' and a
relative "api/" or use urljoin(self.base_url + '/', 'api/')) so that
self.base_url, self.api_url and the urljoin call are updated to produce
"http://host/weblate/api/" instead of "http://host/api/".
In `@scripts/auto/generate_boost_project_configs.py`:
- Line 24: Add the missing top-of-file directive "from __future__ import
annotations" to this module to match project conventions so the
forward-reference type syntax used in the parse_list_row function signature
(parse_list_row) is consistent with the rest of the codebase; place the import
as the first non-shebang line of scripts/auto/generate_boost_project_configs.py
before any other imports or definitions.
In `@scripts/auto/setup_project.py`:
- Around line 804-809: The loop over config_files treats any non-zero result
from run_one_config as failure, but run_one_config uses return code 2 to
indicate a graceful skip (clone failure); update the logic in the loop that
calls run_one_config (the block that assigns result and sets exit_code) so that
it only marks failure (set exit_code = 1) when result is not 0 and not 2; for
result == 2 log or print a clear "skipped" message and continue without changing
exit_code. Ensure references to run_one_config, result, and exit_code are
adjusted accordingly.
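A sketch of the adjusted loop; `run_one_config` is stubbed here only to show the contract (0 = success, 2 = graceful skip, anything else = failure), and the config paths are hypothetical:

```python
def run_one_config(path: str) -> int:
    """Stub standing in for the real runner: 0 ok, 2 graceful skip, 1 failure."""
    return 0

config_files = ["configs/align.json", "configs/asio.json"]  # hypothetical paths

exit_code = 0
for config_file in config_files:
    result = run_one_config(config_file)
    if result == 2:
        # A clone failure is a deliberate skip, not an error: report it
        # clearly and leave the overall exit code untouched.
        print(f"Skipped (clone failed): {config_file}")
        continue
    if result != 0:
        exit_code = 1
```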
In `@weblate/boost_endpoint/serializers.py`:
- Around line 34-44: The ListField `extensions` currently uses
child=serializers.CharField(allow_blank=True) which permits empty strings (e.g.
[""]) to pass validation; change the child to
serializers.CharField(allow_blank=False) (or add a validate_extensions(self,
value) method to raise a ValidationError if any item == "") so the serializer
rejects empty-extension strings early; update the `extensions` field definition
(the ListField/child CharField) accordingly to ensure empty strings are invalid.
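A sketch of both variants mentioned above, using DRF; with `allow_blank=False` in place the explicit hook is redundant and is shown only as the alternative spelling:

```python
from rest_framework import serializers

class AddOrUpdateRequestSerializer(serializers.Serializer):
    extensions = serializers.ListField(
        # allow_blank=False makes items like "" fail field validation early.
        child=serializers.CharField(allow_blank=False),
    )

    def validate_extensions(self, value):
        # Alternative: the same rule as an explicit per-field hook.
        if any(item == "" for item in value):
            raise serializers.ValidationError("extensions must not contain empty strings")
        return value
```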
In `@weblate/boost_endpoint/services.py`:
- Around line 359-393: The _sync_component_for_translation calls are executed
inside the with transaction.atomic() block; move both calls (the ones invoked
when created=True and created=False) to after the atomic block so that only
Component.objects.get_or_create(...), setting component.acting_user,
component.post_create(user, origin="boost_endpoint") and any
component.save(update_fields=...) remain inside the transaction; ensure you
still call self._sync_component_for_translation(component, request,
created=True/False) immediately after the with block returns and keep the
conditional self.add_language_to_component(component, request) behavior intact. See the sketch after this list.
- Around line 641-643: The code trusts the external submodule string when
creating temp_submodule_dir (temp_submodule_dir = os.path.join(self.temp_dir,
submodule)) which allows path traversal; fix by resolving and validating the
resulting path before creating directories: compute the absolute/real path for
temp_submodule_dir (using os.path.abspath or os.path.realpath) and ensure it is
inside self.temp_dir (e.g., startswith the resolved self.temp_dir plus os.sep)
or reject/raise a validation error for bad submodule values; alternatively
sanitize submodule to disallow path separators before calling os.path.join and
then call os.makedirs(temp_submodule_dir, exist_ok=True). See the sketch after this list.
- Around line 684-696: The deletion loop currently iterates
project.component_set.all() and removes any non-glossary component whose slug is
not in wanted_slugs; restrict the scope by filtering components to only those
matching the computed prefix so you don't touch manually created or
cross-submodule components: use the existing prefix and iterate only over
project.component_set.filter(slug__startswith=prefix) (or equivalent slug
starts-with check) and then keep the same check against wanted_slugs and use
_delete_component_and_commit_removal to remove unmatched items; continue to
append warnings and errors to result as before.
- Around line 628-648: process_submodule currently assumes self.temp_dir is set
and will TypeError if None; fix by lazily initializing self.temp_dir at the
start of process_submodule (before computing temp_submodule_dir) when it is
falsy — e.g. set self.temp_dir = tempfile.mkdtemp(...) or create a
TemporaryDirectory and store its path; then continue to compute
temp_submodule_dir = os.path.join(self.temp_dir, submodule) and proceed (this
ensures calling process_submodule directly works and keeps existing behavior
with process_all). Ensure tempfile is imported and choose the same cleanup
approach used elsewhere if any.
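For the lazy-initialisation item directly above, a method-shaped sketch (the prefix string is an arbitrary choice):

```python
import os
import tempfile

def process_submodule(self, submodule: str):
    # Initialise the scratch directory on first use, so calling
    # process_submodule directly (outside process_all) cannot hit
    # os.path.join(None, ...) with a TypeError.
    if not self.temp_dir:
        self.temp_dir = tempfile.mkdtemp(prefix="boost_endpoint_")
    temp_submodule_dir = os.path.join(self.temp_dir, submodule)
    os.makedirs(temp_submodule_dir, exist_ok=True)
    # ... clone and scan as before ...
```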
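For the path-traversal item above, a sketch of the resolve-then-verify check; the helper name is hypothetical:

```python
import os

def safe_submodule_dir(temp_dir: str, submodule: str) -> str:
    """Join and resolve the target, refusing anything that escapes temp_dir."""
    root = os.path.realpath(temp_dir)
    candidate = os.path.realpath(os.path.join(temp_dir, submodule))
    # A traversal like "../../etc" resolves outside root and is rejected;
    # the os.sep suffix also rejects sibling paths such as "/tmp/foo-evil".
    if not candidate.startswith(root + os.sep):
        raise ValueError(f"invalid submodule name: {submodule!r}")
    os.makedirs(candidate, exist_ok=True)
    return candidate
```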
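And for the transaction item, a sketch of moving the sync calls past the atomic block; the import path for `Component`, the `get_or_create` arguments, and the `if created` guard are assumptions standing in for the service's real logic:

```python
from django.db import transaction

from weblate.trans.models import Component  # assumed import path

def create_or_update_component(self, project, config, user, request):
    with transaction.atomic():
        component, created = Component.objects.get_or_create(
            project=project, slug=config["slug"], defaults=config,
        )
        component.acting_user = user
        if created:
            component.post_create(user, origin="boost_endpoint")
    # Now outside the transaction: the row is committed, so the repository
    # sync cannot hold the transaction open or see an uncommitted component.
    self._sync_component_for_translation(component, request, created=created)
    if created:
        self.add_language_to_component(component, request)
    return component, created
```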
In `@weblate/boost_endpoint/views.py`:
- Around line 29-32: The AddOrUpdateView currently uses IsAuthenticated which is
too permissive for admin-level operations; update the view's permission_classes
on AddOrUpdateView to use IsAdminUser (or a project-admin custom permission)
instead of IsAuthenticated and add/update the corresponding import from
rest_framework.permissions so only admins (or the designated custom permission
class) can invoke the clone/create/update actions. See the sketch after this list.
- Around line 67-72: service.process_all can raise various exceptions (git, IO,
DB) which currently bubble up as raw 500s; wrap the call to
service.process_all(...) in a try/except, catch Exception, log the full
traceback (e.g. logger.exception) and return a structured DRF Response
containing an error message and optionally the exception string with
status=status.HTTP_500_INTERNAL_SERVER_ERROR instead of returning the raw
exception; ensure Response and status are used consistently (replacing the
current direct return of results and status.HTTP_200_OK on error path) and add
logger import if not present.
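A combined sketch for both view items; `AddOrUpdateRequestSerializer` is this PR's serializer, while `build_service` is a hypothetical stand-in for however the view constructs `BoostComponentService`:

```python
import logging

from rest_framework import status
from rest_framework.permissions import IsAdminUser
from rest_framework.response import Response
from rest_framework.views import APIView

from weblate.boost_endpoint.serializers import AddOrUpdateRequestSerializer

logger = logging.getLogger(__name__)

class AddOrUpdateView(APIView):
    # Clone/create/update actions are admin-level operations, so the broad
    # IsAuthenticated is replaced with IsAdminUser (a custom project-admin
    # permission class would fit here equally well).
    permission_classes = [IsAdminUser]

    def post(self, request):
        serializer = AddOrUpdateRequestSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        data = serializer.validated_data
        service = build_service(data)  # hypothetical factory
        try:
            results = service.process_all(data["submodules"], request.user, request)
        except Exception:
            # Log the full traceback server-side and return a structured body
            # instead of letting git/IO/DB errors bubble up as raw 500s.
            logger.exception("boost_endpoint add-or-update failed")
            return Response(
                {"error": "processing failed; see server logs for details"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )
        return Response(results, status=status.HTTP_200_OK)
```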
---
Nitpick comments:
In @.gitignore:
- Line 47: The .cursor/ entry is unanchored and will match .cursor directories
at any depth; root-anchor it to match the style of .vscode/* by prepending a
leading slash, i.e., change the .cursor/ entry to "/.cursor/" in the .gitignore
so that only the repo-root Cursor IDE directory is ignored, consistent with
.vscode/*.
In `@scripts/auto/add_translation.py`:
- Around line 273-274: Remove the dead no-op block that checks if "total" in
translation and does nothing; either delete the two lines containing `if "total"
in translation: pass` (and its placeholder comment) or replace them with an
actual log/emission of stats using `translation.get("translated")` and
`translation.get("total")` (e.g., call the existing logger or emit a metric) so
the intent in the comment is implemented; locate the check around the
`translation` variable in add_translation.py and remove or implement it
accordingly.
In `@scripts/auto/collect_boost_libraries_extensions.py`:
- Around line 424-479: The code in write_submodule_output builds
submodule_out_path by calling out_path.replace(".txt", "_submodules.txt"), which
incorrectly replaces the first ".txt" anywhere in the path; change this to use
pathlib to operate on the filename only: create a Path from out_path, derive a
new name by appending "_submodules" to the stem (or replace the suffix) and then
convert back to string (update the submodule_out_path creation logic that
currently references out_path and submodule_out_path). Ensure behavior when
out_path has no suffix still appends "_submodules" and preserve encoding/writing
logic unchanged. See the sketch after this list.
- Around line 375-395: The input/return tuple asymmetry should be made explicit
with type aliases: define a Library = Tuple[str, str, str] (or Optional[str] if
subpath can be None) and an ExtensionRow = Tuple[str, str, str, str], then
update the function signature of fetch_extensions_for_libraries to accept
all_libraries: List[Library] and return List[ExtensionRow]; also update local
variable rows_with_ext: List[ExtensionRow] to use the alias so the types for
all_libraries, rows_with_ext and the return value (in
fetch_extensions_for_libraries) are clear and consistent.
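For the type-alias item directly above, a runnable sketch; the tuple field meanings and the placeholder lookup are assumptions read off the alias shapes:

```python
from typing import List, Tuple

Library = Tuple[str, str, str]            # e.g. (name, repo_url, subpath)
ExtensionRow = Tuple[str, str, str, str]  # the same fields plus an extension

def fetch_extensions_for_libraries(all_libraries: List[Library]) -> List[ExtensionRow]:
    rows_with_ext: List[ExtensionRow] = []
    for name, repo_url, subpath in all_libraries:
        rows_with_ext.append((name, repo_url, subpath, "adoc"))  # placeholder lookup
    return rows_with_ext
```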
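And for the pathlib item, a sketch that renames only the filename; the trailing comments show the behavior the comment asks for, including the suffix-less case:

```python
from pathlib import Path

def submodule_output_path(out_path: str) -> str:
    """Append `_submodules` to the file name, never to a `.txt` elsewhere in the path."""
    p = Path(out_path)
    # with_name keeps parent directories intact; the (possibly empty)
    # suffix is re-attached after extending the stem.
    return str(p.with_name(p.stem + "_submodules" + p.suffix))

# submodule_output_path("lists/boost.txt") -> "lists/boost_submodules.txt"
# submodule_output_path("lists.txt/boost") -> "lists.txt/boost_submodules"
# submodule_output_path("boost")           -> "boost_submodules"
```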
In `@scripts/auto/create_component.py`:
- Line 645: The call to creator.get_statistics(project_slug, component_slug) is
unused and causes an unnecessary API request (and log) via _make_request; remove
this dead call from the create_component flow (delete the line invoking
get_statistics) or, if statistics are required, assign its return to a variable
and use it (e.g., stats = creator.get_statistics(...)) wherever needed; locate
the invocation by the symbol get_statistics in the create_component script and
remove or consume its result accordingly.
In `@scripts/auto/delete_all_components.py`:
- Around line 62-70: The HTTP calls currently risk hanging because no timeout is
set; update the requests in list_components (where self.session.get(...) is
called) to include a timeout (e.g., timeout=30) and likewise add timeout=30 to
both self.session.delete(...) calls used in delete_project and delete_component;
you can apply the per-call timeout to each request to fix the issue immediately.
In `@scripts/auto/setup_project.py`:
- Around line 206-217: truncate_component_name and truncate_component_slug
accept max_len but ignore it; update both functions (truncate_component_name,
truncate_component_slug) to compute head and tail slice lengths from the
provided max_len rather than always using TRUNCATE_*_HEAD/TAIL constants.
Specifically: if len(input) <= max_len return as before; otherwise compute sep =
TRUNCATE_NAME_SEP or TRUNCATE_SLUG_SEP and available = max_len - len(sep), then
choose head = min(CONST_HEAD, available - CONST_TAIL) and tail = available -
head (or distribute proportionally but ensure head+tail+len(sep)==max_len and
neither is negative), and return input[:head] + sep + input[-tail:]; apply same
logic for slug using TRUNCATE_SLUG_* constants so callers who pass a smaller
max_len get output <= max_len.
In `@scripts/auto/upload_translations.py`:
- Around line 177-181: The multi-line print call in upload_translations.py is
visually misaligned: move the f-string lines and the closing parenthesis to
align under the opening "print(" so the continuation lines belong to the print
call (e.g., ensure the f-string lines start at the same column as the first
argument and the closing ")" aligns with "print("). Locate the offending
print(...) that references Accepted={accepted}, Skipped={skipped}, Not
found={not_found} and adjust indentation so the f-strings and closing
parenthesis are indented consistently with the print call.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- .gitignore
- Setup-ongoing.md
- scripts/auto/add-or-update.sh
- scripts/auto/add_translation.py
- scripts/auto/boost-1.90.0_libraries_list.txt
- scripts/auto/boost-1.90.0_libraries_list_submodules.txt
- scripts/auto/collect_boost_libraries_extensions.py
- scripts/auto/create_component.py
- scripts/auto/create_component_and_add_translation.py
- scripts/auto/delete_all_components.py
- scripts/auto/generate_boost_project_configs.py
- scripts/auto/setup_project.py
- scripts/auto/upload_translations.py
- scripts/auto/weblate_supported_formats.json
- weblate/boost_endpoint/__init__.py
- weblate/boost_endpoint/apps.py
- weblate/boost_endpoint/serializers.py
- weblate/boost_endpoint/services.py
- weblate/boost_endpoint/urls.py
- weblate/boost_endpoint/views.py
- weblate/urls.py
💤 Files with no reviewable changes (1)
- Setup-ongoing.md
wpak-ai left a comment:
- Add a README.md under scripts/auto that describes what the files in that directory do (testing only, development purposes only, or if used in production, specify which).
- In addition, address the following comments: #4 (comment), #4 (comment)
```bash
#!/bin/bash
# Add or update Boost documentation components via Weblate endpoint
# Usage: ./add-or-update.sh
# Reads weblate_url and api_token from web.json (same directory). Requires jq.
```
Mention the purpose of this script at the top.
This is clearly for testing purposes only.
Otherwise, there shouldn't be any hard-coded text.