feat(telemetry): add app_name and build_sha with remote provenance to context #2190
jackfish212 merged 4 commits into MoonshotAI:main
    from kimi_cli.utils.pyinstaller import datas, hiddenimports

    # Inject build SHA (with remote origin) for telemetry provenance before Analysis runs.
    runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))
🟡 runpy.run_path in kimi.spec never calls main() due to __name__ guard
The runpy.run_path(...) call on line 10 of kimi.spec is intended to inject the build SHA before PyInstaller's Analysis runs (per the comment on line 9). However, runpy.run_path sets __name__ to '<run_path>' by default, so the if __name__ == "__main__": guard at scripts/inject_build_sha.py:100 never fires, and main() is never called. The _build_info.py file is not written by this call.
This is currently masked because the Makefile targets (build-bin at line 140, build-bin-onedir at line 147) explicitly run the script before invoking PyInstaller. But if anyone runs pyinstaller kimi.spec directly (e.g. in local development or a custom CI step), the build SHA injection will silently not happen.
Suggested change:
-   runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))
+   runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"), run_name="__main__")
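The effect of run_name can be checked in isolation. The sketch below uses a throwaway stub script (a hypothetical stand-in, not the real scripts/inject_build_sha.py) and compares the default call with the suggested one.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for scripts/inject_build_sha.py: it only acts under the guard.
SCRIPT = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''


def run_script(run_name=None):
    """Run the stub and return (its __name__, whether main() ran)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "inject_stub.py"
        path.write_text(SCRIPT)
        # run_name=None is the documented default and becomes "<run_path>".
        namespace = runpy.run_path(str(path), run_name=run_name)
    return namespace["__name__"], namespace["ran_main"]


default_result = run_script()
main_result = run_script("__main__")
print(default_result)  # ('<run_path>', False)
print(main_result)     # ('__main__', True)
```

Only the second call passes the guard, which matches the behaviour described in the comment above.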
    build_id = _assemble(remote, sha)

    target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
    target.write_text(f'BUILD_SHA = "{build_id}"\n')
🔴 Unescaped string interpolation in generated _build_info.py enables code injection via crafted git remote URL
In scripts/inject_build_sha.py:95, build_id (which includes a normalized git remote URL) is interpolated directly into a Python source string without escaping:
    target.write_text(f'BUILD_SHA = "{build_id}"\n')
If the git remote URL contains " characters (e.g. git@evil.com:";__import__('os').system('evil')#.git), the generated _build_info.py will contain syntactically valid Python that executes arbitrary code when imported. While the attack vector requires controlling the git remote URL of the build repo, using repr() would eliminate the risk entirely.
Suggested change:
-   target.write_text(f'BUILD_SHA = "{build_id}"\n')
+   target.write_text(f'BUILD_SHA = {build_id!r}\n')
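To see the difference without executing anything, the two renderings can be parsed with ast and compared. This is a minimal sketch: render_unsafe, render_safe, count_statements, and read_back are hypothetical helper names, and the hostile value is an already-normalized form of the example remote with an assumed SHA appended.

```python
import ast


def render_unsafe(build_id: str) -> str:
    # Mirrors the line under review: raw interpolation into a string literal.
    return f'BUILD_SHA = "{build_id}"\n'


def render_safe(build_id: str) -> str:
    # repr() emits a valid Python string literal with quotes and backslashes escaped.
    return f"BUILD_SHA = {build_id!r}\n"


def count_statements(source: str) -> int:
    """Parse (never execute) the generated source and count top-level statements."""
    return len(ast.parse(source).body)


def read_back(source: str) -> str:
    """Recover the assigned value from a one-assignment module without running it."""
    assignment = ast.parse(source).body[0]
    return ast.literal_eval(assignment.value)


hostile = "evil.com/\";__import__('os').system('evil')#@abc123"

unsafe_count = count_statements(render_unsafe(hostile))  # assignment plus injected call
safe_count = count_statements(render_safe(hostile))      # assignment only
round_trip = read_back(render_safe(hostile))
print(unsafe_count, safe_count, round_trip == hostile)
```

With raw interpolation the module gains a second top-level statement; with repr() the value stays inside a single string literal and reads back unchanged.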
    if remote and sha:
        return f"{remote}@{sha}"
    return sha
🟡 Runtime get_build_sha() discards remote when SHA is empty, inconsistent with build-time _assemble()
The build-time function scripts/inject_build_sha.py:86 uses return sha or remote, which preserves the remote URL even when SHA is empty. But the runtime fallback in src/kimi_cli/constant.py:95 uses return sha, which discards a successfully detected remote when SHA detection fails. This means the two code paths produce different results for the same (remote available, SHA unavailable) scenario. The runtime path should match the build-time behavior and fall back to remote when sha is empty.
Suggested change:
-   return sha
+   return sha or remote
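One way to keep the two paths from drifting is a single assembly helper used by both. The function below is a sketch with the fallback the comment asks for; the name assemble is hypothetical and only loosely modelled on the _assemble() mentioned above.

```python
def assemble(remote: str, sha: str) -> str:
    """Combine remote and SHA, keeping whichever part is available."""
    if remote and sha:
        return f"{remote}@{sha}"
    # Fall back to the SHA alone, then to the remote alone, then to "".
    return sha or remote


both = assemble("github.com/MoonshotAI/kimi-cli", "d5b10f4f")
sha_only = assemble("", "d5b10f4f")
remote_only = assemble("github.com/MoonshotAI/kimi-cli", "")
neither = assemble("", "")
print(both, sha_only, remote_only, repr(neither))
```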
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5b10f4fb6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
    for prefix in ("git@", "https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix) :]
    # Remove trailing .git
Redact credentials from origin URL before telemetry tagging
The remote normalization keeps URL userinfo intact, so an origin like https://user:token@github.com/org/repo.git becomes part of BUILD_SHA and is later sent in telemetry context as build_sha. This leaks secrets from developer/CI remotes to telemetry whenever credentials are embedded in origin; strip user/password (userinfo) before composing the build identifier.
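A minimal sketch of the redaction step, assuming it runs on the raw origin URL before any prefix stripping. strip_userinfo is a hypothetical helper, not a function from this PR; it relies only on urllib.parse from the standard library.

```python
from urllib.parse import urlsplit


def strip_userinfo(url: str) -> str:
    """Drop any user[:password]@ part from a scheme://host URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return url  # not a scheme://host URL; left untouched in this sketch
    # Everything after the last "@" in the authority is host[:port].
    host = parts.netloc.rpartition("@")[2]
    return parts._replace(netloc=host).geturl()


with_token = strip_userinfo("https://user:token@github.com/org/repo.git")
plain = strip_userinfo("https://github.com/org/repo.git")
with_port = strip_userinfo("ssh://git@host:7999/org/repo.git")
print(with_token, plain, with_port)
```

scp-style remotes such as git@github.com:org/repo.git have no scheme and pass through unchanged here, so they would still need their own handling.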
        from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
        return cast(str, BUILD_SHA)
    except ImportError:
Avoid stale _build_info overriding current SHA detection
get_build_sha() returns kimi_cli._build_info.BUILD_SHA immediately, before checking KIMI_BUILD_SHA or live git state. Because _build_info.py is generated and gitignored, it can persist across checkouts and cause telemetry to report an old commit/remote indefinitely in local/dev runs, while also preventing env-based overrides from taking effect.
💡 Codex Review
Reviewed commit: 138d495f23
    try:
        from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
        return cast(str, BUILD_SHA)
    except ImportError:
Avoid stale injected SHA in git working-tree runs
When _build_info.py exists (it is generated by make build* and left untracked), this branch returns that embedded value before checking live git state, so later runs from the same repo can report an old commit after new commits are made. This makes telemetry provenance inaccurate for normal developer workflows unless the file is manually regenerated/removed each time; prefer live git metadata when a .git workspace is available, and only fall back to _build_info.py for packaged/frozen installs.
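One possible ordering that addresses both staleness comments is: environment override first, then live git metadata, then the embedded file. The sketch below is an assumption about how that could look, not the PR's implementation; resolve_build_sha and its injected callables are hypothetical, chosen so the ordering can be exercised without a real repository.

```python
from typing import Callable, Mapping


def resolve_build_sha(
    env: Mapping[str, str],
    read_live_git: Callable[[], str],
    read_embedded: Callable[[], str],
) -> str:
    """Return the first non-empty build id: env override, live git, embedded file."""
    override = env.get("KIMI_BUILD_SHA", "")
    if override:
        return override
    live = read_live_git()  # expected to return "" outside a git working tree
    if live:
        return live
    return read_embedded()  # expected to return "" when _build_info.py is absent


# A stale embedded value loses to live git, and both lose to the env override.
dev_run = resolve_build_sha({}, lambda: "live@new", lambda: "embedded@old")
packaged = resolve_build_sha({}, lambda: "", lambda: "embedded@old")
ci_run = resolve_build_sha({"KIMI_BUILD_SHA": "ci123"}, lambda: "live@new", lambda: "embedded@old")
print(dev_run, packaged, ci_run)
```

In a frozen or wheel install there is no working tree, so read_live_git yields nothing and the embedded value is still used.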
    if ":" in url and "/" not in url.split(":", 1)[0]:
        url = url.replace(":", "/", 1)
Preserve remote port during URL normalization
This colon-to-slash rewrite also triggers for remotes that include an explicit port (for example ssh://git@host:7999/org/repo.git), producing host/7999/org/repo instead of preserving host:7999/.... That changes the repository identity and can cause incorrect provenance grouping/collisions for self-hosted Git servers that rely on non-default ports.
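A sketch of a normalizer that keeps an explicit port by treating URI-style and scp-style remotes separately. normalize_remote here is a hypothetical rewrite, not the PR's _normalize_remote(), and it assumes a network remote.

```python
from urllib.parse import urlsplit


def normalize_remote(url: str) -> str:
    """Normalize a network git remote to host[:port]/path, or "" if unrecognized."""
    url = url.strip()
    if "://" in url:
        # URI form, e.g. ssh://git@host:7999/org/repo.git
        parts = urlsplit(url)
        host = parts.netloc.rpartition("@")[2]  # drop userinfo, keep any :port
        path = parts.path
    else:
        # scp-like form, e.g. git@github.com:user/repo.git
        host_part, sep, path = url.partition(":")
        if not sep:
            return ""  # no host:path separator; treated as unsupported here
        host = host_part.rpartition("@")[2]
        path = "/" + path.lstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}{path}"


with_port = normalize_remote("ssh://git@host:7999/org/repo.git")
scp_style = normalize_remote("git@github.com:user/repo.git")
https_style = normalize_remote("https://github.com/MoonshotAI/kimi-cli.git")
print(with_port, scp_style, https_style)
```

Because the colon is only interpreted as a path separator in the scp-like branch, host:7999 survives in the URI branch.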
… context
- Hardcode build identifier (remote@sha) into _build_info.py at build time
- Inject app_name (Kimi Code CLI) and build_sha into every telemetry event context
- Support wheel, PyInstaller, and dev mode detection via git origin remote
- Add scripts/inject_build_sha.py for build-time SHA injection
- Update Makefile and kimi.spec to inject build info before packaging
💡 Codex Review
Reviewed commit: d077ca1654
    build_id = _assemble(remote, sha)

    target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
    target.write_text(f'BUILD_SHA = "{build_id}"\n')
Escape build_id before writing _build_info.py
target.write_text(f'BUILD_SHA = "{build_id}"\n') writes raw remote text into a Python string literal, so valid remotes containing backslashes or quotes (e.g. a Windows local-path origin like C:\Users\me\repo) generate syntactically invalid _build_info.py. At runtime get_build_sha() imports this module and only catches ImportError (not SyntaxError), which can crash startup when telemetry initializes; serialize/escape the value (e.g. via !r/json.dumps) before writing.
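Independently of how the file is written, the reader can be made tolerant of a malformed module. The sketch below loads a generated file by path and treats a missing or unparsable file as "no build info"; load_build_sha and the module name _build_info_sketch are hypothetical, and the real code imports kimi_cli._build_info instead.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_build_sha(module_path: Path) -> str:
    """Return BUILD_SHA from a generated module, or "" if it is missing or malformed."""
    spec = importlib.util.spec_from_file_location("_build_info_sketch", module_path)
    if spec is None or spec.loader is None:
        return ""
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except (ImportError, OSError, SyntaxError):
        # SyntaxError covers a badly escaped literal; OSError covers a missing file.
        return ""
    return str(getattr(module, "BUILD_SHA", ""))


with tempfile.TemporaryDirectory() as tmp:
    raw = "C:\\Users\\me\\repo@abc123"  # the Windows-path example from the comment
    bad = Path(tmp) / "bad_info.py"
    bad.write_text(f'BUILD_SHA = "{raw}"\n')   # raw interpolation: "\U" is an invalid escape
    good = Path(tmp) / "good_info.py"
    good.write_text(f"BUILD_SHA = {raw!r}\n")  # repr() escapes the backslashes
    bad_result = load_build_sha(bad)
    good_result = load_build_sha(good)

print(repr(bad_result), good_result == raw)
```

Catching SyntaxError at the import site keeps a bad generated file from crashing startup, while escaping at write time avoids producing the bad file in the first place.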
    if ":" in url and "/" not in url.split(":", 1)[0]:
        url = url.replace(":", "/", 1)
Handle ssh:// remotes before colon rewriting
The colon rewrite assumes SCP-style remotes, but for URI-style SSH remotes without userinfo (for example ssh://host/org/repo.git) it rewrites the scheme separator and produces ssh///host/org/repo, corrupting repository identity in telemetry. This regression affects runtime provenance (src/kimi_cli/constant.py) and build-time injection (scripts/inject_build_sha.py) because both use the same normalization logic.
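The corruption can be reproduced with the reviewed condition alone. In the sketch below, rewrite_colon_reviewed copies the two lines quoted above, and rewrite_colon_guarded adds an assumed guard that skips URI-style remotes; both function names are hypothetical.

```python
def rewrite_colon_reviewed(url: str) -> str:
    # Same condition and rewrite as the snippet under review.
    if ":" in url and "/" not in url.split(":", 1)[0]:
        url = url.replace(":", "/", 1)
    return url


def rewrite_colon_guarded(url: str) -> str:
    # Assumed fix: only scp-like remotes (no "://") use ":" as a path separator.
    if "://" in url:
        return url
    return rewrite_colon_reviewed(url)


broken = rewrite_colon_reviewed("ssh://host/org/repo.git")
kept = rewrite_colon_guarded("ssh://host/org/repo.git")
scp_style = rewrite_colon_guarded("github.com:user/repo.git")
print(broken)     # ssh///host/org/repo.git
print(kept)       # ssh://host/org/repo.git
print(scp_style)  # github.com/user/repo.git
```

The guarded version leaves the scheme intact, so a later step would still need to strip ssh:// the same way the other prefixes are stripped.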
💡 Codex Review
Reviewed commit: 87c1b742d1
    for prefix in ("git@", "https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix) :]
    # Remove userinfo (e.g. user:pass@)
    if "@" in url:
Exclude local-path remotes from telemetry build IDs
The remote normalization only handles git@, http://, and https://, so origins configured as local filesystem paths (for example /home/alice/repo.git or file:///opt/mirror/repo.git) are preserved and then embedded into BUILD_SHA. Because build_sha is attached to every telemetry event, this can leak developer or CI host path details (including usernames/internal directory structure) to telemetry backends. Please treat non-network remotes as unsupported (empty) or sanitize them before composing the build identifier.
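A sketch of the "treat non-network remotes as unsupported" option. The scheme allow-list and the scp-like pattern are assumptions made for illustration, and is_network_remote and remote_for_telemetry are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Assumed allow-list of transports that name a network host.
NETWORK_SCHEMES = {"http", "https", "ssh", "git"}

# scp-like remotes: [user@]host:path. The host must be at least two characters,
# so a Windows drive letter such as "C:" is not mistaken for a host.
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?[\w.-]{2,}:(?!//)")


def is_network_remote(url: str) -> bool:
    """Return True only for remotes that point at a network host."""
    if "://" in url:
        return urlsplit(url).scheme.lower() in NETWORK_SCHEMES
    return _SCP_LIKE.match(url) is not None


def remote_for_telemetry(url: str) -> str:
    """Return the remote unchanged if it is a network remote, else ""."""
    return url if is_network_remote(url) else ""


results = {
    url: remote_for_telemetry(url)
    for url in (
        "git@github.com:MoonshotAI/kimi-cli.git",
        "https://github.com/MoonshotAI/kimi-cli.git",
        "/home/alice/repo.git",
        "file:///opt/mirror/repo.git",
        "C:\\Users\\me\\repo",
    )
}
for url, kept in results.items():
    print(f"{url!r} -> {kept!r}")
```

Returning an empty remote means the build identifier degrades to the bare SHA for local-path origins instead of carrying host paths.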
Summary
This PR improves telemetry observability in two areas: compaction trigger attribution and build provenance tracking.
1. Distinguish manual and manual-with-prompt compaction triggers
Problem: Previously, both /compact without arguments and /compact <instruction> emitted trigger_type=manual. This made it impossible to tell from telemetry whether a compaction was a simple manual cleanup or a user-guided summary with custom instructions.
What was done:
- Added a manual: bool = False keyword argument to KimiSoul.compact_context() (src/kimi_cli/soul/kimisoul.py).
- trigger_type now takes one of three values:
  - auto: system-triggered compaction (token limit reached)
  - manual: user ran /compact without instructions (manual=True, empty custom_instruction)
  - manual-with-prompt: user ran /compact <custom-instruction> (manual=True, non-empty custom_instruction)
- Updated the /compact slash-command handler (src/kimi_cli/soul/slash.py) to pass manual=True.
- Updated tests (tests/telemetry/test_instrumentation.py):
  - Renamed test_manual_compaction_success_emits_event to test_manual_compaction_without_prompt_emits_event.
  - Added test_manual_compaction_with_prompt_emits_event asserting trigger_type=manual-with-prompt.
2. Add app_name and build_sha with remote provenance to telemetry context
Problem: Telemetry events currently carry version but no build identifier. When investigating crashes or behavioural anomalies, it is hard to tell which build produced a given event.
Additionally, because forks and secondary development builds send telemetry using the same app_name and event schema, there is no reliable way to tell whether a telemetry stream originates from the official MoonshotAI/kimi-cli repository or from a forked/derivative build. This makes it impossible to filter out non-official traffic when analysing product metrics or debugging issues reported by users.
What was done:
- scripts/inject_build_sha.py:
  - Reads the git origin remote URL and normalizes it (git@github.com:user/repo.git → github.com/user/repo).
  - Resolves the commit SHA (git rev-parse HEAD or KIMI_BUILD_SHA env var).
  - Writes src/kimi_cli/_build_info.py containing BUILD_SHA = "remote@sha".
- src/kimi_cli/constant.py: get_build_sha() with a tiered resolution strategy:
  1. BUILD_SHA from _build_info.py (wheel / PyInstaller)
  2. KIMI_BUILD_SHA environment variable (CI override)
  3. git remote get-url origin + git rev-parse HEAD (dev mode)
  - _normalize_remote() helper to strip protocols and .git suffixes.
- src/kimi_cli/telemetry/sink.py: injects app_name (= "Kimi Code CLI") and build_sha (from get_build_sha()) into the static event context dictionary.
- src/kimi_cli/utils/pyinstaller.py: adds "kimi_cli._build_info" to hiddenimports so PyInstaller bundles the generated module.
- Makefile: build-kimi-cli, build-bin, and build-bin-onedir targets now run scripts/inject_build_sha.py before packaging.
- kimi.spec: runs scripts/inject_build_sha.py during Analysis so one-file / one-dir binaries also carry provenance.
- .gitignore: ignores src/kimi_cli/_build_info.py so the generated file never leaks into git.
Outcome: Every telemetry event now carries a build_sha field in remote@sha form (e.g. github.com/MoonshotAI/kimi-cli@d5b10f4f). This allows the analytics pipeline to unambiguously identify official upstream builds and filter out or segment fork/derivative traffic.
Commits
- feat(compaction): distinguish manual and manual-with-prompt triggers in telemetry
- feat(telemetry): add app_name and build_sha with remote provenance to context
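For the compaction commit, the mapping from call arguments to trigger_type described in the summary can be sketched as a small pure function. The function name compaction_trigger_type is hypothetical; only the three values and the manual / custom_instruction inputs come from the summary.

```python
def compaction_trigger_type(manual: bool, custom_instruction: str = "") -> str:
    """Map compact_context() arguments to the telemetry trigger_type value."""
    if not manual:
        return "auto"  # system-triggered compaction
    # Manual compaction: distinguish whether the user supplied instructions.
    return "manual-with-prompt" if custom_instruction else "manual"


auto = compaction_trigger_type(manual=False)
bare = compaction_trigger_type(manual=True)
prompted = compaction_trigger_type(manual=True, custom_instruction="keep the API notes")
print(auto, bare, prompted)
```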