feat(telemetry): add app_name and build_sha with remote provenance to context #2190

Merged
jackfish212 merged 4 commits into MoonshotAI:main from jackfish212:main
May 9, 2026

@jackfish212 jackfish212 commented May 8, 2026

Summary

This PR improves telemetry observability in two areas: compaction trigger attribution and build provenance tracking.


1. Distinguish manual and manual-with-prompt compaction triggers

Problem: Previously, both /compact without arguments and /compact <instruction> emitted trigger_type=manual. This made it impossible to tell from telemetry whether a compaction was a simple manual cleanup or a user-guided summary with custom instructions.

What was done:

  • Added an explicit manual: bool = False keyword argument to KimiSoul.compact_context() (src/kimi_cli/soul/kimisoul.py).
  • Updated the trigger-reason logic to produce three distinct values:
    • auto — system-triggered compaction (token limit reached)
    • manual — user ran /compact without instructions (manual=True, empty custom_instruction)
    • manual-with-prompt — user ran /compact <custom-instruction> (manual=True, non-empty custom_instruction)
  • Updated the /compact slash-command handler (src/kimi_cli/soul/slash.py) to pass manual=True.
  • Updated telemetry tests (tests/telemetry/test_instrumentation.py):
    • Renamed the old test_manual_compaction_success_emits_event to test_manual_compaction_without_prompt_emits_event.
    • Added a new test_manual_compaction_with_prompt_emits_event asserting trigger_type=manual-with-prompt.
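The three-way mapping described above can be sketched as a small pure function. This is only an illustration: `resolve_trigger_type` is a hypothetical name, not the code in `KimiSoul.compact_context()`, and treating whitespace-only instructions as empty is an assumption.

```python
def resolve_trigger_type(manual: bool = False, custom_instruction: str = "") -> str:
    """Map compaction inputs to one of the three telemetry trigger types."""
    if not manual:
        # System-triggered compaction, e.g. the token limit was reached.
        return "auto"
    # Assumption: an instruction made only of whitespace counts as empty.
    if custom_instruction.strip():
        return "manual-with-prompt"
    return "manual"


print(resolve_trigger_type())                                        # auto
print(resolve_trigger_type(manual=True))                             # manual
print(resolve_trigger_type(manual=True, custom_instruction="tests")) # manual-with-prompt
```

Keeping `manual` as an explicit keyword argument, rather than inferring it from the presence of an instruction, means an automatic compaction can never be mislabelled as user-initiated.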

2. Add app_name and build_sha with remote provenance to telemetry context

Problem: Telemetry events currently carry version but no build identifier. When investigating crashes or behavioural anomalies, it is hard to distinguish:

  • Official releases vs. development installs
  • Upstream builds vs. fork builds
  • Clean builds vs. dirty working trees

Additionally, because forks and secondary development builds send telemetry using the same app_name and event schema, there is no reliable way to tell whether a telemetry stream originates from the official MoonshotAI/kimi-cli repository or from a forked/derivative build. This makes it impossible to filter out non-official traffic when analysing product metrics or debugging issues reported by users.

What was done:

  • New script: scripts/inject_build_sha.py
    • Detects the git origin remote URL and normalizes it (git@github.com:user/repo.git → github.com/user/repo).
    • Detects the current commit SHA (via git rev-parse HEAD or KIMI_BUILD_SHA env var).
    • Writes src/kimi_cli/_build_info.py containing BUILD_SHA = "remote@sha".
    • Called at build time before wheel and PyInstaller packaging.
  • src/kimi_cli/constant.py:
    • Added get_build_sha() with a tiered resolution strategy:
      1. Hardcoded BUILD_SHA from _build_info.py (wheel / PyInstaller)
      2. KIMI_BUILD_SHA environment variable (CI override)
      3. Live git remote get-url origin + git rev-parse HEAD (dev mode)
      4. Empty string fallback
    • Added _normalize_remote() helper to strip protocols and .git suffixes.
  • src/kimi_cli/telemetry/sink.py:
    • Injected app_name (="Kimi Code CLI") and build_sha (from get_build_sha()) into the static event context dictionary.
  • src/kimi_cli/utils/pyinstaller.py:
    • Added "kimi_cli._build_info" to hiddenimports so PyInstaller bundles the generated module.
  • Makefile:
    • build-kimi-cli, build-bin, and build-bin-onedir targets now run scripts/inject_build_sha.py before packaging.
  • kimi.spec:
    • Runs scripts/inject_build_sha.py during Analysis so one-file / one-dir binaries also carry provenance.
  • .gitignore:
    • Ignores src/kimi_cli/_build_info.py so the generated file never leaks into git.

Outcome: Every telemetry event now carries a build_sha field in remote@sha form (e.g. github.com/MoonshotAI/kimi-cli@d5b10f4f). This allows the analytics pipeline to unambiguously identify official upstream builds and filter out or segment fork/derivative traffic.
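A rough sketch of the tiered resolution order listed for get_build_sha() follows. It is not the PR's implementation: `resolve_build_sha`, its parameters, and the `_git` helper are hypothetical names chosen so the tiers can be exercised without a real build, and remote normalization is left out for brevity.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional


def _git(*args: str) -> str:
    """Run a git command and return stripped stdout, or "" on any failure."""
    try:
        out = subprocess.run(
            ["git", *args], capture_output=True, text=True, timeout=5, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return ""
    return out.stdout.strip() if out.returncode == 0 else ""


def resolve_build_sha(
    embedded: Optional[str],
    env: Mapping[str, str] = os.environ,
    git: Callable[..., str] = _git,
) -> str:
    # Tier 1: value baked into _build_info.py at build time (wheel / PyInstaller).
    if embedded:
        return embedded
    # Tier 2: CI override via environment variable.
    override = env.get("KIMI_BUILD_SHA", "")
    if override:
        return override
    # Tier 3: live git metadata (dev mode). Normalization is omitted in this sketch.
    remote = git("remote", "get-url", "origin")
    sha = git("rev-parse", "HEAD")
    if remote and sha:
        return f"{remote}@{sha}"
    # Tier 4: falls through to "" when nothing could be detected.
    return sha
```

Passing the git probe and the environment as arguments keeps each tier testable in isolation; a packaged install simply supplies the embedded value and never reaches the subprocess calls.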


Commits

  • feat(compaction): distinguish manual and manual-with-prompt triggers in telemetry
  • feat(telemetry): add app_name and build_sha with remote provenance to context

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.

View 5 additional findings in Devin Review.

Comment thread kimi.spec Outdated
from kimi_cli.utils.pyinstaller import datas, hiddenimports

# Inject build SHA (with remote origin) for telemetry provenance before Analysis runs.
runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))

🟡 runpy.run_path in kimi.spec never calls main() due to __name__ guard

The runpy.run_path(...) call on line 10 of kimi.spec is intended to inject the build SHA before PyInstaller's Analysis runs (per the comment on line 9). However, runpy.run_path sets __name__ to '<run_path>' by default, so the if __name__ == "__main__": guard at scripts/inject_build_sha.py:100 never fires, and main() is never called. The _build_info.py file is not written by this call.

This is currently masked because the Makefile targets (build-bin at line 140, build-bin-onedir at line 147) explicitly run the script before invoking PyInstaller. But if anyone runs pyinstaller kimi.spec directly (e.g. in local development or a custom CI step), the build SHA injection will silently not happen.
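The `__name__` behaviour described here can be reproduced in isolation. The sketch below writes a throwaway script with the same guard pattern and runs it both ways; the script contents and the `run_script` helper are made up for the demonstration and are not taken from scripts/inject_build_sha.py.

```python
import runpy
import tempfile
from pathlib import Path

# Stand-in for a script that only does its work under the usual __main__ guard.
SCRIPT = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''


def run_script(run_name=None):
    """Run the stand-in script via runpy and report (__name__, whether main() ran)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "inject_demo.py"
        path.write_text(SCRIPT)
        namespace = runpy.run_path(str(path), run_name=run_name)
        return namespace["__name__"], namespace["ran_main"]


print(run_script())            # ('<run_path>', False): the guard does not fire
print(run_script("__main__"))  # ('__main__', True): main() is called
```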

Suggested change
- runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"))
+ runpy.run_path(str(Path(__file__).resolve().parent / "scripts" / "inject_build_sha.py"), run_name="__main__")

build_id = _assemble(remote, sha)

target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
target.write_text(f'BUILD_SHA = "{build_id}"\n')

🔴 Unescaped string interpolation in generated _build_info.py enables code injection via crafted git remote URL

In scripts/inject_build_sha.py:95, build_id (which includes a normalized git remote URL) is interpolated directly into a Python source string without escaping:

target.write_text(f'BUILD_SHA = "{build_id}"\n')

If the git remote URL contains " characters (e.g. git@evil.com:";__import__('os').system('evil')#.git), the generated _build_info.py will contain syntactically valid Python that executes arbitrary code when imported. While the attack vector requires controlling the git remote URL of the build repo, using repr() would eliminate the risk entirely.
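To illustrate why repr() closes this hole, the sketch below renders the generated module with `!r` and then reads it back with `ast` instead of importing it. `render_build_info` and `read_build_sha` are hypothetical helpers written for this example; the hostile value is only ever parsed, never executed.

```python
import ast


def render_build_info(build_id: str) -> str:
    """Render the generated module, letting repr() do all quoting and escaping."""
    return f"BUILD_SHA = {build_id!r}\n"


def read_build_sha(source: str) -> str:
    """Parse generated source without executing it.

    Rejects anything other than a single assignment of a string literal.
    """
    module = ast.parse(source)
    if len(module.body) != 1 or not isinstance(module.body[0], ast.Assign):
        raise ValueError("expected exactly one assignment statement")
    value = ast.literal_eval(module.body[0].value)
    if not isinstance(value, str):
        raise ValueError("BUILD_SHA must be a string literal")
    return value


hostile = 'evil.com/";__import__("os").system("evil")#'
windows = "C:\\Users\\me\\repo@abc123"

# With repr(), both values survive a round trip as plain data.
assert read_build_sha(render_build_info(hostile)) == hostile
assert read_build_sha(render_build_info(windows)) == windows
```

With the unescaped f-string form, the first value would parse as two statements and the second would fail to parse at all because of the `\U` escape, which is the failure mode the review comment describes.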

Suggested change
- target.write_text(f'BUILD_SHA = "{build_id}"\n')
+ target.write_text(f'BUILD_SHA = {build_id!r}\n')

Comment thread src/kimi_cli/constant.py

if remote and sha:
    return f"{remote}@{sha}"
return sha

🟡 Runtime get_build_sha() discards remote when SHA is empty, inconsistent with build-time _assemble()

The build-time function scripts/inject_build_sha.py:86 uses return sha or remote, which preserves the remote URL even when SHA is empty. But the runtime fallback in src/kimi_cli/constant.py:95 uses return sha, which discards a successfully detected remote when SHA detection fails. This means the two code paths produce different results for the same (remote available, SHA unavailable) scenario. The runtime path should match the build-time behavior and fall back to remote when sha is empty.
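For reference, the build-time behaviour the comment describes amounts to the following truth table. The function name `assemble` is a stand-in for the script's `_assemble()`, whose full body is not shown in this thread, so treat this as a sketch of the described behaviour only.

```python
def assemble(remote: str, sha: str) -> str:
    """Combine remote and SHA, keeping whichever part is available."""
    if remote and sha:
        return f"{remote}@{sha}"
    # Falls back to the SHA alone, then the remote alone, then "".
    return sha or remote


print(assemble("github.com/org/repo", "abc123"))  # github.com/org/repo@abc123
print(assemble("", "abc123"))                     # abc123
print(assemble("github.com/org/repo", ""))        # github.com/org/repo
print(repr(assemble("", "")))                     # ''
```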

Suggested change
- return sha
+ return sha or remote

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5b10f4fb6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +32 to +35
for prefix in ("git@", "https://", "http://"):
    if url.startswith(prefix):
        url = url[len(prefix) :]
# Remove trailing .git

P1 Badge Redact credentials from origin URL before telemetry tagging

The remote normalization keeps URL userinfo intact, so an origin like https://user:token@github.com/org/repo.git becomes part of BUILD_SHA and is later sent in telemetry context as build_sha. This leaks secrets from developer/CI remotes to telemetry whenever credentials are embedded in origin; strip user/password (userinfo) before composing the build identifier.
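One possible way to drop userinfo is to let `urllib.parse.urlsplit` separate the credentials from the host rather than slicing prefixes by hand. The helper name `redact_userinfo` is hypothetical, and leaving SCP-style remotes (`user@host:path`) untouched is an assumption made here because that form cannot carry a password.

```python
from urllib.parse import urlsplit


def redact_userinfo(url: str) -> str:
    """Return the remote URL with any user:password@ component removed."""
    if "://" not in url:
        # SCP-style remote: no password can be embedded, so leave it alone.
        return url
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return f"{parts.scheme}://{host}{parts.path}"


print(redact_userinfo("https://user:token@github.com/org/repo.git"))
# https://github.com/org/repo.git
```

Query strings and fragments are deliberately discarded as well, since they have no place in a repository identity.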

Comment thread src/kimi_cli/constant.py
Comment on lines +57 to +59
    from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
    return cast(str, BUILD_SHA)
except ImportError:

P2 Badge Avoid stale _build_info overriding current SHA detection

get_build_sha() returns kimi_cli._build_info.BUILD_SHA immediately, before checking KIMI_BUILD_SHA or live git state. Because _build_info.py is generated and gitignored, it can persist across checkouts and cause telemetry to report an old commit/remote indefinitely in local/dev runs, while also preventing env-based overrides from taking effect.

@jackfish212

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 138d495f23

Comment thread src/kimi_cli/constant.py
Comment on lines +61 to +64
try:
    from kimi_cli._build_info import BUILD_SHA  # type: ignore[reportMissingImports]
    return cast(str, BUILD_SHA)
except ImportError:

P2 Badge Avoid stale injected SHA in git working-tree runs

When _build_info.py exists (it is generated by make build* and left untracked), this branch returns that embedded value before checking live git state, so later runs from the same repo can report an old commit after new commits are made. This makes telemetry provenance inaccurate for normal developer workflows unless the file is manually regenerated/removed each time; prefer live git metadata when a .git workspace is available, and only fall back to _build_info.py for packaged/frozen installs.
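The preference order suggested here (live git metadata in a working tree, embedded value otherwise) could look roughly like this. `choose_build_sha`, `has_git_workspace`, and the `live_probe` callback are hypothetical names for the sketch, and checking only the project root for `.git` is a simplifying assumption.

```python
from pathlib import Path
from typing import Callable


def has_git_workspace(project_root: Path) -> bool:
    """True when the project root looks like a git checkout.

    .git is a directory in a normal clone and a file in a linked worktree,
    so exists() covers both.
    """
    return (project_root / ".git").exists()


def choose_build_sha(
    project_root: Path, embedded: str, live_probe: Callable[[], str]
) -> str:
    """Prefer live git state in a checkout; use the embedded value otherwise."""
    if has_git_workspace(project_root):
        live = live_probe()
        if live:
            return live
    # Packaged or frozen installs have no .git, so the embedded value wins.
    return embedded
```

A stale `_build_info.py` left behind by an earlier `make build*` run then only matters when git itself cannot be queried.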

Comment thread src/kimi_cli/constant.py
Comment on lines +41 to +42
if ":" in url and "/" not in url.split(":", 1)[0]:
    url = url.replace(":", "/", 1)

P2 Badge Preserve remote port during URL normalization

This colon-to-slash rewrite also triggers for remotes that include an explicit port (for example ssh://git@host:7999/org/repo.git), producing host/7999/org/repo instead of preserving host:7999/.... That changes the repository identity and can cause incorrect provenance grouping/collisions for self-hosted Git servers that rely on non-default ports.
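One way to avoid rewriting the wrong colon is to branch on the remote's form first: parse URI-style remotes with `urlsplit`, and apply the colon-to-slash rewrite only to SCP-style ones. The sketch below is an assumption about how that might look, not the merged `_normalize_remote()`; local filesystem remotes are out of scope for it.

```python
from urllib.parse import urlsplit


def normalize_remote(url: str) -> str:
    """Normalize a network git remote to host[:port]/path without a .git suffix."""
    url = url.strip()
    if "://" in url:
        # URI-style (https://, ssh://, ...): let urlsplit find host and port.
        parts = urlsplit(url)
        host = parts.hostname or ""
        if parts.port is not None:
            host = f"{host}:{parts.port}"
        path = parts.path
    else:
        # SCP-style (user@host:path): the first colon separates host from path.
        host_part, _, path = url.partition(":")
        host = host_part.rsplit("@", 1)[-1]
        path = "/" + path.lstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}{path}"


print(normalize_remote("git@github.com:user/repo.git"))      # github.com/user/repo
print(normalize_remote("ssh://git@host:7999/org/repo.git"))  # host:7999/org/repo
print(normalize_remote("ssh://host/org/repo.git"))           # host/org/repo
```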

… context

- Hardcode build identifier (remote@sha) into _build_info.py at build time
- Inject app_name (Kimi Code CLI) and build_sha into every telemetry event context
- Support wheel, PyInstaller, and dev mode detection via git origin remote
- Add scripts/inject_build_sha.py for build-time SHA injection
- Update Makefile and kimi.spec to inject build info before packaging

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d077ca1654

build_id = _assemble(remote, sha)

target = _resolve_project_root() / "src" / "kimi_cli" / "_build_info.py"
target.write_text(f'BUILD_SHA = "{build_id}"\n')

P1 Badge Escape build_id before writing _build_info.py

target.write_text(f'BUILD_SHA = "{build_id}"\n') writes raw remote text into a Python string literal, so valid remotes containing backslashes or quotes (e.g. a Windows local-path origin like C:\Users\me\repo) generate syntactically invalid _build_info.py. At runtime get_build_sha() imports this module and only catches ImportError (not SyntaxError), which can crash startup when telemetry initializes; serialize/escape the value (e.g. via !r/json.dumps) before writing.

Comment thread src/kimi_cli/constant.py
Comment on lines +41 to +42
if ":" in url and "/" not in url.split(":", 1)[0]:
    url = url.replace(":", "/", 1)

P2 Badge Handle ssh:// remotes before colon rewriting

The colon rewrite assumes SCP-style remotes, but for URI-style SSH remotes without userinfo (for example ssh://host/org/repo.git) it rewrites the scheme separator and produces ssh///host/org/repo, corrupting repository identity in telemetry. This regression affects runtime provenance (src/kimi_cli/constant.py) and build-time injection (scripts/inject_build_sha.py) because both use the same normalization logic.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87c1b742d1

Comment on lines +34 to +38
for prefix in ("git@", "https://", "http://"):
    if url.startswith(prefix):
        url = url[len(prefix) :]
# Remove userinfo (e.g. user:pass@)
if "@" in url:

P1 Badge Exclude local-path remotes from telemetry build IDs

The remote normalization only handles git@, http://, and https://, so origins configured as local filesystem paths (for example /home/alice/repo.git or file:///opt/mirror/repo.git) are preserved and then embedded into BUILD_SHA. Because build_sha is attached to every telemetry event, this can leak developer or CI host path details (including usernames/internal directory structure) to telemetry backends. Please treat non-network remotes as unsupported (empty) or sanitize them before composing the build identifier.
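A minimal sketch of the "treat non-network remotes as empty" option follows. The scheme allow-list, the helper names, and the rule that a single-letter host is a Windows drive letter are all assumptions made for this example.

```python
from urllib.parse import urlsplit

# Assumed allow-list of schemes that identify a network-hosted repository.
NETWORK_SCHEMES = {"http", "https", "ssh", "git"}


def is_network_remote(url: str) -> bool:
    """Heuristically decide whether a git remote points at a network host."""
    if "://" in url:
        return urlsplit(url).scheme.lower() in NETWORK_SCHEMES
    # SCP-style user@host:path has a colon before any slash.
    host_part, sep, _ = url.partition(":")
    if not sep or "/" in host_part or "\\" in host_part:
        return False
    # Assumption: a one-character "host" is a drive letter such as C:.
    return len(host_part) > 1


def safe_remote(url: str) -> str:
    """Return the remote only if it is network-hosted, else an empty string."""
    return url if is_network_remote(url) else ""


print(repr(safe_remote("/home/alice/repo.git")))        # ''
print(repr(safe_remote("file:///opt/mirror/repo.git"))) # ''
print(safe_remote("git@github.com:org/repo.git"))       # git@github.com:org/repo.git
```

Returning an empty string lets the existing assembly logic fall back to the bare SHA, so local-path builds still report a commit without exposing host paths.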

@jackfish212 jackfish212 merged commit b36825b into MoonshotAI:main May 9, 2026
14 checks passed