fix(context): cap list_directory to 500 entries (GH-1809)#1827
…en-limit blowup (#1809) When the working directory contains thousands of files (e.g. node_modules), list_directory() would enumerate every entry and inject the entire listing into the system prompt via KIMI_WORK_DIR_LS, easily exceeding the model's 262,144-token context window. Add _LIST_DIR_MAX_ENTRIES=500 cap with a trailing "... and N more entries" hint, consistent with existing truncation patterns (glob MAX_MATCHES=1000, git_context _MAX_DIRTY_FILES=20, ToolResultBuilder max_chars=50000).
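A minimal sketch of the capping behaviour described in this commit, assuming the listing is already available as a list of names. The function name `cap_listing` and its signature are hypothetical; only the 500-entry cap and the "... and N more entries" wording come from the commit message.

```python
LIST_DIR_MAX_ENTRIES = 500  # value taken from the commit message


def cap_listing(names: list[str], max_entries: int = LIST_DIR_MAX_ENTRIES) -> str:
    """Join at most max_entries names and append a hint line when truncated.

    Hypothetical helper for illustration; not the project's actual function.
    """
    shown = names[:max_entries]
    remaining = len(names) - len(shown)
    if remaining > 0:
        # Tell the reader of the listing that more entries exist.
        shown = shown + [f"... and {remaining} more entries"]
    return "\n".join(shown)
```

With 600 input names this yields 500 entry lines plus one hint line, which matches the shape the test plan describes.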
Only top-level (single-depth) entries are listed, so a cap of 200 entries is sufficient for a project overview while saving ~3.5K tokens of context budget.
Replace flat single-level ls output with a compact 2-level tree:
- Root level: up to 30 entries (dirs first, alphabetical)
- Child level: up to 10 entries per subdirectory
- Truncated levels show "... and N more" so the LLM knows more exists
- System prompt updated to explain tree is partial

This gives the model a much better project structure overview while using far fewer tokens than a flat listing of hundreds of entries. Worst-case output: ~240 lines ≈ 1.5K tokens (vs old 200-line flat listing ≈ 2.4K tokens with less structural information).
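The 2-level tree can be sketched roughly as follows. This version renders an in-memory mapping instead of reading the filesystem so that it runs anywhere; the names `render_tree`, `_render_level`, and the `Tree` type are hypothetical, and the indent strings ("│   " and four spaces) are an assumption based on the usual `tree` layout. Only the widths (30 and 10), the dirs-first ordering, and the "... and N more" hint come from the commit message.

```python
from typing import Optional

ROOT_WIDTH = 30   # root-level cap from the commit message
CHILD_WIDTH = 10  # per-subdirectory cap from the commit message

# name -> dict of children for a directory, None for a file (illustrative model)
Tree = dict[str, Optional[dict]]


def _ordered(tree: Tree) -> list[tuple[str, Optional[dict]]]:
    # Directories first, then alphabetical within each group.
    return sorted(tree.items(), key=lambda kv: (kv[1] is None, kv[0]))


def _render_level(tree: Tree, width: int, prefix: str, expand: bool) -> list[str]:
    items = _ordered(tree)
    shown = items[:width]
    remaining = max(len(items) - width, 0)
    lines: list[str] = []
    for i, (name, children) in enumerate(shown):
        # The last shown entry only gets the closing connector if nothing was cut.
        is_last = i == len(shown) - 1 and remaining == 0
        connector = "└── " if is_last else "├── "
        suffix = "/" if children is not None else ""
        lines.append(f"{prefix}{connector}{name}{suffix}")
        if children is not None and expand:
            child_prefix = prefix + ("    " if is_last else "│   ")
            # Children are listed but never expanded further (depth stays at 2).
            lines += _render_level(children, CHILD_WIDTH, child_prefix, expand=False)
    if remaining:
        lines.append(f"{prefix}└── ... and {remaining} more")
    return lines


def render_tree(tree: Tree) -> str:
    return "\n".join(_render_level(tree, ROOT_WIDTH, "", expand=True))
```

For example, `render_tree({"regular.txt": None, "adir": {"inside.txt": None}, "emptydir": {}})` lists `adir/` and `emptydir/` before `regular.txt`, with `inside.txt` nested under `adir/`.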
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e129743420
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
if is_dir:
    lines.append(f"{connector}{name}/")
    child_prefix = " " if is_last else "│ "
    try:
        child_entries, child_total = await _collect_entries(
Prevent recursing into symlinked directories
This new tree expansion traverses any root entry marked as a directory, but entry.stat() in _collect_entries follows symlinks. If a workspace contains a top-level symlink to an external directory (for example /etc or another large path), list_directory() will now enumerate and inject out-of-workspace filenames into KIMI_WORK_DIR_LS, which is a data-scope regression compared to the previous non-recursive behavior. Guard recursion by skipping symlink dirs (or by checking canonical paths remain within work_dir) before calling _collect_entries on children.
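One way the suggested guard could look, as a sketch only. The helper name `should_expand` is hypothetical and is not part of the code under review; it combines both options from the comment (skip symlinks, and check that the resolved path stays inside the working directory) using standard `pathlib` calls. `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def should_expand(entry: Path, work_dir: Path) -> bool:
    """Decide whether a root-level entry may be expanded into child entries.

    Hypothetical helper for illustration.
    """
    # Option 1: never descend through a symlink, whatever it points at.
    if entry.is_symlink():
        return False
    if not entry.is_dir():
        return False
    # Option 2: the canonical path must still live under the workspace root.
    return entry.resolve().is_relative_to(work_dir.resolve())
```

Checking `is_symlink()` first matters because `is_dir()` follows symlinks and would report a symlinked directory as a directory.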
Won't fix. I analyzed this and the impact is negligible:
- Depth is capped at 1 level, with a width limit of 10 entries per subdirectory. A symlinked directory only exposes child names, not contents, and no further recursion occurs.
- No sandbox to bypass: the file tools (ReadFile, WriteFile, etc.) use canonical(), which does not resolve symlinks, and is_within_workspace() is pure string matching. More importantly, these tools already allow reading and writing arbitrary files via absolute paths, so there is no security boundary that symlink expansion could weaken.
- LLM self-corrects: before modifying any file, the LLM will call ReadFile/Glob first, at which point it sees the content is unrelated to the project and moves on.
The P2 (sort before truncation) has already been fixed in the merged code.
src/kimi_cli/utils/path.py
Outdated
total += 1
if len(entries) >= max_width:
    continue
Sort entries before applying width truncation
_collect_entries enforces max_width while iterating and only sorts afterward, so the returned subset depends on raw iterdir() order instead of the documented dirs-first/alphabetical order. In directories with more than the cap, entries that should sort earlier (especially directories) can be dropped entirely if they appear later in filesystem iteration, making the prompt context unstable and misleading. Apply truncation after ordering (or maintain a sorted top-k) to keep selection deterministic.
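A small sketch of both fixes the comment proposes, written over plain `(name, is_dir)` pairs so it does not depend on the real `_collect_entries` signature. The function names are hypothetical. The first variant sorts everything and then slices; the second keeps only the top `max_width` items with `heapq.nsmallest`, which the standard library documents as equivalent to `sorted(iterable, key=key)[:n]`.

```python
import heapq

Entry = tuple[str, bool]  # (name, is_dir); illustrative shape, not the project's type


def _sort_key(entry: Entry) -> tuple[bool, str]:
    name, is_dir = entry
    # False sorts before True, so directories come first, then alphabetical.
    return (not is_dir, name)


def select_sorted(entries: list[Entry], max_width: int) -> tuple[list[Entry], int]:
    """Sort first, truncate second, so the subset is independent of input order."""
    return sorted(entries, key=_sort_key)[:max_width], len(entries)


def select_topk(entries: list[Entry], max_width: int) -> tuple[list[Entry], int]:
    """Same result, but only max_width items are kept while selecting."""
    return heapq.nsmallest(max_width, entries, key=_sort_key), len(entries)
```

Either variant returns the same subset for any permutation of the input, which is the determinism the comment asks for.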
Add [drwxr-xr-x] / [-rw-r--r--] / [?] permission prefixes to each entry so the LLM can distinguish executable scripts, read-only files, and stat-failed entries. Extract _format_mode() helper using bit-shift loop for cleaner permission string generation.
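A bit-shift loop of the kind this commit describes might look like the sketch below. The name `format_mode` and the exact loop are assumptions, not the helper from the PR, and the "[?]" fallback for stat failures is left out. The standard library's `stat.filemode()` produces the same kind of string and would be an alternative.

```python
import stat


def format_mode(mode: int) -> str:
    """Render st_mode as an ls-style string such as 'drwxr-xr-x'.

    Hypothetical reconstruction for illustration.
    """
    kind = "d" if stat.S_ISDIR(mode) else "-"
    perms = []
    # Walk the nine permission bits from owner-read (bit 8) to other-execute (bit 0).
    for shift in range(8, -1, -1):
        flag = "rwx"[(8 - shift) % 3]
        perms.append(flag if (mode >> shift) & 1 else "-")
    return kind + "".join(perms)
```

For a directory with mode `0o755` this gives `drwxr-xr-x`, and for a regular file with mode `0o644` it gives `-rw-r--r--`.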
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a85e1cb867
tests/utils/test_list_directory.py
Outdated
├── adir/
│ └── inside.txt
├── emptydir/
├── regular.txt\
Restore mode prefixes in Windows tree snapshot
The updated Windows snapshot omits the [mode] prefix on every line, but list_directory() now always emits each entry as [{mode}] name (including on Windows via src/kimi_cli/utils/path.py in the root and child formatting paths). As written, this test will fail whenever the Windows job runs, so the PR can pass on Linux/macOS while still breaking cross-platform CI coverage.
assert out == snapshot(
    """\
├── adir/
│ └── inside.txt
├── emptydir/
├── regular.txt\
"""
)
🟡 Windows test snapshot missing [mode] formatting and has wrong last-entry connector
The test_list_directory_tree_windows expected snapshot is incorrect. list_directory unconditionally formats every entry with [{mode}] (see src/kimi_cli/utils/path.py:125, :139, :143), and uses └── for the last entry when remaining == 0 (src/kimi_cli/utils/path.py:121-122). The Windows test expects entries without [mode] and uses ├── for the final entry regular.txt instead of └──.
Expected vs actual output on Windows
Expected in test:
├── adir/
│ └── inside.txt
├── emptydir/
├── regular.txt
Actual code would produce:
├── [drwxrwxrwx] adir/
│ └── [-rw-rw-rw-] inside.txt
├── [drwxrwxrwx] emptydir/
└── [-rw-rw-rw-] regular.txt
This test will always fail when run on a Windows platform.
- assert out == snapshot(
-     """\
- ├── adir/
- │ └── inside.txt
- ├── emptydir/
- ├── regular.txt\
- """
- )
+ assert out == snapshot(
+     """\
+ ├── [drwxrwxrwx] adir/
+ │ └── [-rw-rw-rw-] inside.txt
+ ├── [drwxrwxrwx] emptydir/
+ └── [-rw-rw-rw-] regular.txt\
+ """
+ )
✅ Resolved: The [mode] formatting was removed entirely in a subsequent commit ('fix: remove permission display from tree listing'). The current code no longer includes permission strings, and the Windows test snapshot at lines 143-149 now correctly shows entries without [mode] and uses └── for the last entry. Bug is resolved.
The comparison is as follows:
Old version (flat ls, no depth, no upper limit)
New version (2-level tree + depth/width limits)
Summary of key differences
Drop [drwxr-xr-x] prefixes — directory entries already end with / which is sufficient for the LLM to distinguish files from directories.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b1a5b4d13
src/kimi_cli/utils/path.py
Outdated
if len(entries) >= max_width:
    continue
Stop scanning full directories after width cap is hit
_collect_entries keeps iterating through every remaining entry even after max_width is reached, and list_directory() invokes it for each expanded top-level directory. In workspaces where early directories are very large (for example node_modules/vendored trees), prompt generation still does full O(n) directory scans and can significantly delay startup despite truncating displayed output; this is a performance regression from the previous top-level-only listing. Consider short-circuiting once overflow is detected (e.g., max_width + 1) instead of computing an exact remainder count.
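The short-circuit idea can be sketched with `itertools.islice`: read at most `max_width + 1` names, and treat the extra one purely as an overflow signal. The function name `collect_bounded` is hypothetical, and the input is modelled as any iterable of names rather than the real directory iterator.

```python
from collections.abc import Iterable
from itertools import islice


def collect_bounded(names: Iterable[str], max_width: int) -> tuple[list[str], bool]:
    """Consume at most max_width + 1 names and report whether more exist.

    Hypothetical helper for illustration.
    """
    head = list(islice(names, max_width + 1))
    overflow = len(head) > max_width
    # Only the first max_width names in iteration order are kept and sorted.
    return sorted(head[:max_width]), overflow
```

The trade-off is visible in the return type: the exact "N more" count is no longer known, and the shown subset depends on iteration order again, which is in tension with the sort-before-truncation fix requested elsewhere in this review.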
… tests
- Return "(empty directory)" instead of "" for empty directories
- Add token budget comment to _LIST_DIR_ROOT_WIDTH constant
- Fix test_truncates_child_width: assert exact prefix instead of .strip()
- Add tests: empty dir, unreadable subdir, last-entry-is-dir prefix
_collect_entries now stats and sorts ALL entries before slicing to max_width, ensuring the displayed subset is deterministic regardless of filesystem enumeration order. Previously the first max_width entries by iterdir order were taken then sorted, which could drop directories that happen to appear late in enumeration. Also fix Windows test snapshot: regular.txt is last entry → └── not ├──.
Summary
- list_directory() previously had no cap on the number of entries. When kimi was started in a directory containing thousands of files (such as node_modules), the entire file listing was injected into the system prompt via KIMI_WORK_DIR_LS, easily exceeding the model's 262,144-token limit.
- Add a _LIST_DIR_MAX_ENTRIES = 500 truncation threshold; when it is exceeded, append a "... and N more entries (use Glob or Shell to explore)" hint.
- Consistent with existing truncation patterns (glob MAX_MATCHES=1000, git_context _MAX_DIRTY_FILES=20, ToolResultBuilder max_chars=50000).

Closes #1809

Test plan
- test_list_directory_truncates_large_dirs: create 600 files and verify that the output is truncated to 500 lines plus 1 hint line.
- test_list_directory_unix / test_list_directory_windows are not affected (the number of files is well below the threshold).