feat(telemetry): add app_name and build_sha with remote provenance to context by jackfish212 · Pull Request #2190 · MoonshotAI/kimi-cli

jackfish212 · 2026-05-08T12:19:26Z

Summary

This PR improves telemetry observability in two areas: compaction trigger attribution and build provenance tracking.

1. Distinguish manual and manual-with-prompt compaction triggers

Problem: Previously, both /compact without arguments and /compact <instruction> emitted trigger_type=manual. This made it impossible to tell from telemetry whether a compaction was a simple manual cleanup or a user-guided summary with custom instructions.

What was done:

Added an explicit manual: bool = False keyword argument to KimiSoul.compact_context() (src/kimi_cli/soul/kimisoul.py).
Updated the trigger-reason logic to produce three distinct values:
- auto — system-triggered compaction (token limit reached)
- manual — user ran /compact without instructions (manual=True, empty custom_instruction)
- manual-with-prompt — user ran /compact <custom-instruction> (manual=True, non-empty custom_instruction)
Updated the /compact slash-command handler (src/kimi_cli/soul/slash.py) to pass manual=True.
Updated telemetry tests (tests/telemetry/test_instrumentation.py):
- Renamed the old test_manual_compaction_success_emits_event to test_manual_compaction_without_prompt_emits_event.
- Added a new test_manual_compaction_with_prompt_emits_event asserting trigger_type=manual-with-prompt.
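The three-way mapping described above can be sketched as a small pure function. This is an illustration only: the helper name resolve_trigger_type is hypothetical, and treating a whitespace-only instruction as empty is an assumption the PR description does not state.

```python
def resolve_trigger_type(manual: bool = False, custom_instruction: str = "") -> str:
    """Map compaction arguments to a telemetry trigger_type value.

    Hypothetical helper; the real logic lives inside KimiSoul.compact_context().
    """
    if not manual:
        # System-triggered compaction (for example, token limit reached).
        return "auto"
    # Assumption: a whitespace-only instruction counts as "no instruction".
    if custom_instruction.strip():
        return "manual-with-prompt"
    return "manual"
```

Keeping the decision in one function makes each of the three values easy to cover with a separate test case.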

2. Add app_name and build_sha with remote provenance to telemetry context

Problem: Telemetry events currently carry version but no build identifier. When investigating crashes or behavioural anomalies, it is hard to distinguish:

- Official releases vs. development installs
- Upstream builds vs. fork builds
- Clean builds vs. dirty working trees

Additionally, because forks and secondary development builds send telemetry using the same app_name and event schema, there is no reliable way to tell whether a telemetry stream originates from the official MoonshotAI/kimi-cli repository or from a forked/derivative build. This makes it impossible to filter out non-official traffic when analysing product metrics or debugging issues reported by users.

What was done:

New script: scripts/inject_build_sha.py
- Detects the git origin remote URL and normalizes it (git@github.com:user/repo.git → github.com/user/repo).
- Detects the current commit SHA (via git rev-parse HEAD or KIMI_BUILD_SHA env var).
- Writes src/kimi_cli/_build_info.py containing BUILD_SHA = "remote@sha".
- Called at build time before wheel and PyInstaller packaging.
src/kimi_cli/constant.py:
- Added get_build_sha() with a tiered resolution strategy:
  1. Hardcoded BUILD_SHA from _build_info.py (wheel / PyInstaller)
  2. KIMI_BUILD_SHA environment variable (CI override)
  3. Live git remote get-url origin + git rev-parse HEAD (dev mode)
  4. Empty string fallback
- Added _normalize_remote() helper to strip protocols and .git suffixes.
src/kimi_cli/telemetry/sink.py:
- Injected app_name (="Kimi Code CLI") and build_sha (from get_build_sha()) into the static event context dictionary.
src/kimi_cli/utils/pyinstaller.py:
- Added "kimi_cli._build_info" to hiddenimports so PyInstaller bundles the generated module.
Makefile:
- build-kimi-cli, build-bin, and build-bin-onedir targets now run scripts/inject_build_sha.py before packaging.
kimi.spec:
- Runs scripts/inject_build_sha.py during Analysis so one-file / one-dir binaries also carry provenance.
.gitignore:
- Ignores src/kimi_cli/_build_info.py so the generated file never leaks into git.
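The tiered resolution listed for get_build_sha() might look roughly like the sketch below. It follows the order given in this description, not the merged code. The function name resolve_build_sha, its injectable parameters, and the _git helper are assumptions made so the sketch can be exercised without a real repository, and remote normalization is left out.

```python
import os
import subprocess
from importlib import import_module
from typing import Callable, Mapping


def _git(*args: str) -> str:
    """Run a git command and return its stripped stdout, or "" on any failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, timeout=5, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    return result.stdout.strip()


def resolve_build_sha(
    module_name: str = "kimi_cli._build_info",
    env: Mapping[str, str] = os.environ,
    git: Callable[..., str] = _git,
) -> str:
    # Tier 1: value embedded at build time (wheel / PyInstaller).
    try:
        return str(import_module(module_name).BUILD_SHA)
    except (ImportError, AttributeError):
        pass
    # Tier 2: explicit override from the environment (CI).
    env_value = env.get("KIMI_BUILD_SHA", "")
    if env_value:
        return env_value
    # Tier 3: live git metadata (dev mode); normalization omitted in this sketch.
    remote = git("remote", "get-url", "origin")
    sha = git("rev-parse", "HEAD")
    if remote and sha:
        return f"{remote}@{sha}"
    # Tier 4: whatever is left, possibly the empty string.
    return sha
```

Passing env and git as parameters is purely a testing convenience here; the described implementation reads them directly.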

Outcome: Every telemetry event now carries a build_sha field in remote@sha form (e.g. github.com/MoonshotAI/kimi-cli@d5b10f4f). This allows the analytics pipeline to unambiguously identify official upstream builds and filter out or segment fork/derivative traffic.
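On the consuming side, a pipeline could split the remote@sha value and compare the remote part against the upstream repository. The sketch below is an assumption about how such a filter might be written; parse_build_sha, is_official, and BuildId are hypothetical names, and the hex check used for values without "@" is a heuristic.

```python
import string
from typing import NamedTuple

# Taken from the example value in the PR description.
OFFICIAL_REMOTE = "github.com/MoonshotAI/kimi-cli"
_HEX_DIGITS = set(string.hexdigits)


class BuildId(NamedTuple):
    remote: str
    sha: str


def parse_build_sha(value: str) -> BuildId:
    """Split a build_sha value into (remote, sha)."""
    # Split on the last "@" so an "@" inside the remote does not confuse parsing.
    remote, sep, sha = value.rpartition("@")
    if sep:
        return BuildId(remote, sha)
    # No "@": heuristically treat an all-hex value as a bare SHA, else a bare remote.
    if value and all(ch in _HEX_DIGITS for ch in value):
        return BuildId("", value)
    return BuildId(value, "")


def is_official(value: str) -> bool:
    return parse_build_sha(value).remote.lower() == OFFICIAL_REMOTE.lower()
```

Because the field is self-reported (and can be overridden through KIMI_BUILD_SHA), this kind of check works as a segmentation filter rather than as proof of origin.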

Commits

feat(compaction): distinguish manual and manual-with-prompt triggers in telemetry
feat(telemetry): add app_name and build_sha with remote provenance to context


devin-ai-integration

Devin Review found 3 potential issues.


devin-ai-integration · 2026-05-08T12:24:00Z

 from kimi_cli.utils.pyinstaller import datas, hiddenimports

+# Inject build SHA (with remote origin) for telemetry provenance before Analysis runs.
+runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))


🟡 runpy.run_path in kimi.spec never calls main() due to __name__ guard

The runpy.run_path(...) call on line 10 of kimi.spec is intended to inject the build SHA before PyInstaller's Analysis runs (per the comment on line 9). However, runpy.run_path sets __name__ to '<run_path>' by default, so the if __name__ == "__main__": guard at scripts/inject_build_sha.py:100 never fires, and main() is never called. The _build_info.py file is not written by this call.

This is currently masked because the Makefile targets (build-bin at line 140, build-bin-onedir at line 147) explicitly run the script before invoking PyInstaller. But if anyone runs pyinstaller kimi.spec directly (e.g. in local development or a custom CI step), the build SHA injection will silently not happen.

Suggested change

-runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))
+runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"), run_name="__main__")
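The behaviour behind this finding can be reproduced with a throwaway script. The sketch below writes a small stand-in for inject_build_sha.py (the script body and the run_script helper are invented for the demonstration) and runs it with and without run_name.

```python
import runpy
import tempfile
from pathlib import Path

# Stand-in script: records whether main() ran behind the usual __name__ guard.
SCRIPT = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''


def run_script(run_name=None):
    """Run the stand-in script and return (its __name__, whether main() ran)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "inject_demo.py"
        path.write_text(SCRIPT, encoding="utf-8")
        if run_name is None:
            namespace = runpy.run_path(str(path))
        else:
            namespace = runpy.run_path(str(path), run_name=run_name)
        return namespace["__name__"], namespace["ran_main"]


print(run_script())            # ('<run_path>', False): the guard does not fire
print(run_script("__main__"))  # ('__main__', True): main() is called
```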


devin-ai-integration · 2026-05-08T12:24:02Z

+    build_id = _assemble(remote, sha)
+
+    target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
+    target.write_text(f'BUILD_SHA = "{build_id}"\n')


🔴 Unescaped string interpolation in generated _build_info.py enables code injection via crafted git remote URL

In scripts/inject_build_sha.py:95, build_id (which includes a normalized git remote URL) is interpolated directly into a Python source string without escaping:

target.write_text(f'BUILD_SHA = "{build_id}"\n')

If the git remote URL contains " characters (e.g. git@evil.com:";__import__('os').system('evil')#.git), the generated _build_info.py will contain syntactically valid Python that executes arbitrary code when imported. While the attack vector requires controlling the git remote URL of the build repo, using repr() would eliminate the risk entirely.

Suggested change

-target.write_text(f'BUILD_SHA = "{build_id}"\n')
+target.write_text(f'BUILD_SHA = {build_id!r}\n')
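To see why !r matters, the two rendering styles can be compared by parsing (not executing) the generated source. This is a self-contained sketch. The helper names render_unsafe, render_safe, and read_build_sha are hypothetical, and the sample values stand in for a hostile remote and a Windows local-path remote.

```python
import ast


def render_unsafe(build_id: str) -> str:
    # Mirrors the original line: raw interpolation inside double quotes.
    return f'BUILD_SHA = "{build_id}"\n'


def render_safe(build_id: str) -> str:
    # Mirrors the suggested change: repr() emits a correctly escaped literal.
    return f"BUILD_SHA = {build_id!r}\n"


def read_build_sha(source: str) -> str:
    """Parse generated source without executing it and return the literal value."""
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        raise ValueError("expected exactly one assignment")
    return ast.literal_eval(tree.body[0].value)


hostile = 'evil.example/";import os#'
# Raw interpolation yields two statements: the assignment plus an injected import.
print(len(ast.parse(render_unsafe(hostile)).body))
# With repr() the value round-trips as a single string literal.
print(read_build_sha(render_safe(hostile)) == hostile)
```

The same round-trip holds for values containing backslashes, which raw interpolation can turn into invalid escape sequences.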


devin-ai-integration · 2026-05-08T12:24:03Z

+
+    if remote and sha:
+        return f"{remote}@{sha}"
+    return sha


🟡 Runtime get_build_sha() discards remote when SHA is empty, inconsistent with build-time _assemble()

The build-time function scripts/inject_build_sha.py:86 uses return sha or remote, which preserves the remote URL even when SHA is empty. But the runtime fallback in src/kimi_cli/constant.py:95 uses return sha, which discards a successfully detected remote when SHA detection fails. This means the two code paths produce different results for the same (remote available, SHA unavailable) scenario. The runtime path should match the build-time behavior and fall back to remote when sha is empty.

Suggested change

-return sha
+return sha or remote
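One way to keep the two code paths from drifting is to route both through a single assembly function. The sketch below shows the behaviour the suggestion asks for; assemble_build_id is a hypothetical name, not the function in the PR.

```python
def assemble_build_id(remote: str, sha: str) -> str:
    """Combine remote and SHA, keeping whichever part is available."""
    if remote and sha:
        return f"{remote}@{sha}"
    # Fall back to the SHA alone, then the remote alone, then "".
    return sha or remote
```

The four input combinations then produce "remote@sha", the bare SHA, the bare remote, or an empty string, identically at build time and at runtime.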


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5b10f4fb6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-08T12:24:14Z

+    for prefix in ("git@", "https://", "http://"):
+        if url.startswith(prefix):
+            url = url[len(prefix) :]
+    # Remove trailing .git


Redact credentials from origin URL before telemetry tagging

The remote normalization keeps URL userinfo intact, so an origin like https://user:token@github.com/org/repo.git becomes part of BUILD_SHA and is later sent in telemetry context as build_sha. This leaks secrets from developer/CI remotes to telemetry whenever credentials are embedded in origin; strip user/password (userinfo) before composing the build identifier.
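A possible way to drop userinfo before any further normalization is sketched below, using urllib.parse for URI-style remotes and a simple split for SCP-style ones. The function name redact_userinfo is hypothetical and the sketch leaves the rest of the URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_userinfo(url: str) -> str:
    """Return the remote URL with any user[:password]@ component removed."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc:
        # netloc is [userinfo@]host[:port]; keep only what follows the last "@".
        netloc = parts.netloc.rpartition("@")[2]
        return urlunsplit(parts._replace(netloc=netloc))
    # SCP-style remote: [user@]host:path (no "/" before the first colon).
    head, sep, tail = url.partition(":")
    if sep and "/" not in head:
        return head.rpartition("@")[2] + sep + tail
    # Anything else (for example a local path) is returned unchanged.
    return url
```

Splitting on the last "@" in the authority also covers passwords that themselves contain an "@".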


chatgpt-codex-connector · 2026-05-08T12:24:14Z

+        from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
+        return cast(str, BUILD_SHA)
+    except ImportError:


Avoid stale _build_info overriding current SHA detection

get_build_sha() returns kimi_cli._build_info.BUILD_SHA immediately, before checking KIMI_BUILD_SHA or live git state. Because _build_info.py is generated and gitignored, it can persist across checkouts and cause telemetry to report an old commit/remote indefinitely in local/dev runs, while also preventing env-based overrides from taking effect.


jackfish212 · 2026-05-09T04:06:09Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 138d495f23


chatgpt-codex-connector · 2026-05-09T04:09:59Z

+    try:
+        from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
+        return cast(str, BUILD_SHA)
+    except ImportError:


Avoid stale injected SHA in git working-tree runs

When _build_info.py exists (it is generated by make build* and left untracked), this branch returns that embedded value before checking live git state, so later runs from the same repo can report an old commit after new commits are made. This makes telemetry provenance inaccurate for normal developer workflows unless the file is manually regenerated/removed each time; prefer live git metadata when a .git workspace is available, and only fall back to _build_info.py for packaged/frozen installs.
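One ordering that would address this comment is sketched below: an explicit override first, then live git metadata when running from a working tree, and the embedded value only as a fallback. The ordering, the function names, and the .git lookup are all assumptions for illustration, not the merged fix.

```python
from pathlib import Path
from typing import Callable


def is_packaged(package_dir: Path, frozen: bool) -> bool:
    """Guess whether the code runs from a packaged install instead of a git checkout."""
    if frozen:
        # PyInstaller sets sys.frozen; callers can pass getattr(sys, "frozen", False).
        return True
    # Assumption: a .git entry in the package directory or any parent means dev mode.
    candidates = [package_dir, *package_dir.parents]
    return not any((parent / ".git").exists() for parent in candidates)


def choose_build_sha(
    embedded: str, env_value: str, live: Callable[[], str], packaged: bool
) -> str:
    # An explicit environment override always wins.
    if env_value:
        return env_value
    # In a working tree, prefer live git state over a possibly stale embedded value.
    if not packaged:
        value = live()
        if value:
            return value
    # Packaged installs, or checkouts where git detection failed, use the embedded value.
    return embedded
```

The live callback is only invoked in dev mode, so packaged installs never spawn git.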


chatgpt-codex-connector · 2026-05-09T04:09:59Z

+    if ":" in url and "/" not in url.split(":", 1)[0]:
+        url = url.replace(":", "/", 1)


Preserve remote port during URL normalization

This colon-to-slash rewrite also triggers for remotes that include an explicit port (for example ssh://git@host:7999/org/repo.git), producing host/7999/org/repo instead of preserving host:7999/.... That changes the repository identity and can cause incorrect provenance grouping/collisions for self-hosted Git servers that rely on non-default ports.


… context
- Hardcode build identifier (remote@sha) into _build_info.py at build time
- Inject app_name (Kimi Code CLI) and build_sha into every telemetry event context
- Support wheel, PyInstaller, and dev mode detection via git origin remote
- Add scripts/inject_build_sha.py for build-time SHA injection
- Update Makefile and kimi.spec to inject build info before packaging


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d077ca1654


chatgpt-codex-connector · 2026-05-09T12:13:04Z

+    build_id = _assemble(remote, sha)
+
+    target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
+    target.write_text(f'BUILD_SHA = "{build_id}"\n')


Escape build_id before writing _build_info.py

target.write_text(f'BUILD_SHA = "{build_id}"\n') writes raw remote text into a Python string literal, so valid remotes containing backslashes or quotes (e.g. a Windows local-path origin like C:\Users\me\repo) generate syntactically invalid _build_info.py. At runtime get_build_sha() imports this module and only catches ImportError (not SyntaxError), which can crash startup when telemetry initializes; serialize/escape the value (e.g. via !r/json.dumps) before writing.


chatgpt-codex-connector · 2026-05-09T12:13:04Z

+    if ":" in url and "/" not in url.split(":", 1)[0]:
+        url = url.replace(":", "/", 1)


Handle ssh:// remotes before colon rewriting

The colon rewrite assumes SCP-style remotes, but for URI-style SSH remotes without userinfo (for example ssh://host/org/repo.git) it rewrites the scheme separator and produces ssh///host/org/repo, corrupting repository identity in telemetry. This regression affects runtime provenance (src/kimi_cli/constant.py) and build-time injection (scripts/inject_build_sha.py) because both use the same normalization logic.
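A normalizer that branches on the remote's syntax first avoids both the scheme-separator and the port problems. The sketch below is one possible shape, not the PR's _normalize_remote(): the name normalize_remote, the regular expression, and the choice to return an empty string for anything unrecognized are assumptions.

```python
import re
from urllib.parse import urlsplit

# SCP-style: [user@]host:path. Hosts need two or more characters so that a
# Windows drive letter such as "C:" is not mistaken for a host.
_SCP_STYLE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/\\]{2,}):(?P<path>.+)$")


def normalize_remote(url: str) -> str:
    """Normalize a git remote to host[:port]/path, or "" if it is not recognized."""
    url = url.strip()
    if "://" in url:
        # URI-style (https://, ssh://, ...): let urlsplit separate the pieces.
        parts = urlsplit(url)
        host = parts.hostname or ""  # hostname excludes userinfo and port
        if host and parts.port is not None:
            host = f"{host}:{parts.port}"  # keep an explicit port
        path = parts.path
    else:
        match = _SCP_STYLE.match(url)
        if match is None:
            return ""
        host = match.group("host")
        path = "/" + match.group("path").lstrip("/")
    if not host:
        return ""
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}{path}"


print(normalize_remote("git@github.com:user/repo.git"))      # github.com/user/repo
print(normalize_remote("ssh://git@host:7999/org/repo.git"))  # host:7999/org/repo
print(normalize_remote("ssh://host/org/repo.git"))           # host/org/repo
```

Because the colon-to-slash conversion only happens in the SCP-style branch, URI-style remotes never have their scheme separator or port rewritten.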


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87c1b742d1


chatgpt-codex-connector · 2026-05-09T12:41:37Z

+    for prefix in ("git@", "https://", "http://"):
+        if url.startswith(prefix):
+            url = url[len(prefix) :]
+    # Remove userinfo (e.g. user:pass@)
+    if "@" in url:


Exclude local-path remotes from telemetry build IDs

The remote normalization only handles git@, http://, and https://, so origins configured as local filesystem paths (for example /home/alice/repo.git or file:///opt/mirror/repo.git) are preserved and then embedded into BUILD_SHA. Because build_sha is attached to every telemetry event, this can leak developer or CI host path details (including usernames/internal directory structure) to telemetry backends. Please treat non-network remotes as unsupported (empty) or sanitize them before composing the build identifier.
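Treating non-network remotes as unsupported could start from a small classifier like the one below. The allowlist of schemes and the function name is_network_remote are assumptions; a caller would emit an empty remote whenever it returns False.

```python
from urllib.parse import urlsplit

# Assumption: only these transports identify a network-hosted repository.
NETWORK_SCHEMES = frozenset({"https", "http", "ssh", "git"})


def is_network_remote(url: str) -> bool:
    """Return True for URI-style or SCP-style network remotes, False for local paths."""
    url = url.strip()
    if "://" in url:
        parts = urlsplit(url)
        # file:// and other non-network schemes are rejected, as are host-less URLs.
        return parts.scheme in NETWORK_SCHEMES and bool(parts.hostname)
    # SCP-style [user@]host:path. Path separators before the colon mean a local
    # path, and a single character before the colon is a Windows drive letter.
    head, sep, _ = url.partition(":")
    return bool(sep) and "/" not in head and "\\" not in head and len(head) > 1
```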


feat(compaction): distinguish manual and manual-with-prompt triggers in telemetry

10831e5

devin-ai-integration Bot reviewed May 8, 2026


chatgpt-codex-connector Bot reviewed May 8, 2026


jackfish212 force-pushed the main branch from d5b10f4 to 138d495 · May 8, 2026 13:51

github-actions Bot mentioned this pull request May 9, 2026

📊 AI CLI Tools Community Daily Report 2026-05-09 gsscsd/big_model_radar#315

Open

chatgpt-codex-connector Bot reviewed May 9, 2026


jackfish212 force-pushed the main branch from 138d495 to dd54539 · May 9, 2026 11:46

test(utils): add kimi_cli._build_info to pyinstaller hiddenimports snapshot

d077ca1

chatgpt-codex-connector Bot reviewed May 9, 2026


Merge branch 'main' into main

87c1b74

chatgpt-codex-connector Bot reviewed May 9, 2026


jackfish212 merged commit b36825b into MoonshotAI:main May 9, 2026
14 checks passed

This was referenced May 10, 2026

📊 AI CLI Tools Community Daily Report 2026-05-10 gsscsd/big_model_radar#320

Open

📊 AI CLI Tools Community Daily Report 2026-05-10 ivanweng2077/big_model_radar#20

Open
