feat(v0.6.0): AI-native reposition — remove syntect, add --mode=ast and directory index #120

Open
docdyhr wants to merge 5 commits into main from feat/v0.6.0

docdyhr (Owner) commented Apr 12, 2026

Summary

v0.6.0 release — pivots batless from a highlighting tool to a pure AI-native structured output tool. Four phases of work:

  • Remove syntax highlighting: Drop syntect + wizard, remove --mode=highlight, --theme, and related flags. Default mode is now plain.
  • Formatter consolidation: All 5 output modes (Plain, Json, Summary, Index, Ast) implement the Formatter trait in src/formatters/. formatter.rs is a thin dispatcher.
  • Multi-file index mode: batless --mode=index <dir> walks directories, emitting one compact NDJSON line per file.
  • --mode=ast: Raw tree-sitter parse tree as JSON (Rust/Python/JS/TS/TSX; null root for others).
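
As an illustration of the formatter consolidation described above, here is a minimal sketch of a `Formatter` trait behind a thin dispatcher. It is not the code in this PR: `FileInfo` is reduced to two fields, only two of the five modes are shown, the `config` parameter is omitted, and the `Result<String, String>` return type stands in for the crate's own result type. All of those simplifications are assumptions made to keep the example self-contained.

```rust
// Illustrative sketch only: simplified stand-ins, not the types from this PR.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    Plain,
    Summary,
}

/// Reduced stand-in for the real file metadata struct (assumption).
pub struct FileInfo {
    pub lines: Vec<String>,
    pub total_lines: usize,
}

/// Each output mode implements this one trait.
pub trait Formatter {
    fn format(&self, file_info: &FileInfo, file_path: &str) -> Result<String, String>;
    fn output_mode(&self) -> OutputMode;
}

pub struct PlainFormatter;
pub struct SummaryFormatter;

impl Formatter for PlainFormatter {
    fn format(&self, file_info: &FileInfo, _file_path: &str) -> Result<String, String> {
        // Plain mode: emit the lines unchanged.
        Ok(file_info.lines.join("\n"))
    }
    fn output_mode(&self) -> OutputMode {
        OutputMode::Plain
    }
}

impl Formatter for SummaryFormatter {
    fn format(&self, file_info: &FileInfo, file_path: &str) -> Result<String, String> {
        // Hypothetical one-line summary, chosen only for the example.
        Ok(format!("{}: {} lines", file_path, file_info.total_lines))
    }
    fn output_mode(&self) -> OutputMode {
        OutputMode::Summary
    }
}

/// Thin dispatcher: select the formatter for a mode and delegate to it.
pub fn format_output(
    file_info: &FileInfo,
    file_path: &str,
    mode: OutputMode,
) -> Result<String, String> {
    let formatter: Box<dyn Formatter> = match mode {
        OutputMode::Plain => Box::new(PlainFormatter),
        OutputMode::Summary => Box::new(SummaryFormatter),
    };
    formatter.format(file_info, file_path)
}
```

With this shape, adding a mode means adding one `impl Formatter` and one match arm; the dispatcher itself holds no formatting logic.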

Breaking changes

  • --mode=highlight removed (use bat for terminal highlighting)
  • --theme, --list-themes, --configure, --list-profiles, --edit-profile removed
  • Default output mode: highlight → plain
  • syntect 5 and strip-ansi-escapes 0.2 removed from dependencies

New features

# Raw AST output
batless --mode=ast src/lib.rs | jq '.root.type'

# Multi-file index (NDJSON, one line per file)
batless --mode=index src/ | jq -c 'select(.symbol_count > 0) | {file, symbol_count}'

Test plan

  • 365 tests passing (lib: 225, integration: 140), zero failures
  • cargo fmt and cargo clippy clean
  • CI green on this branch

Closes #118

🤖 Generated with Claude Code

Summary by Sourcery

Pivot batless from a syntax-highlighting code viewer to an AI-native structured code analysis tool, removing highlighting/theming and adding new AST and directory index capabilities while consolidating formatters and updating docs and config accordingly.

New Features:

  • Introduce an ast output mode that emits tree-sitter parse trees as structured JSON for supported languages.
  • Add multi-file index support so --mode=index can walk directories and output one NDJSON object per file.
  • Provide usage-tracking helper scripts and statistics tooling for analysing real-world batless invocations.

Enhancements:

  • Replace syntect-based language detection with a static extension map and simplify language utilities.
  • Refactor output formatting so all modes (plain, json, summary, index, ast) implement a common Formatter trait behind a thin dispatcher.
  • Simplify error handling and error codes by removing highlighting/theme-related variants and aligning tests.
  • Tighten configuration and profile handling by dropping theme-related settings and making plain the default output mode.
  • Update documentation (README, ROADMAP, CLAUDE guide, usage docs) to emphasize AI-native structured analysis use cases and document new modes.

Build:

  • Bump crate version to 0.6.0 and update Cargo dependencies, removing syntect and strip-ansi-escapes.

Documentation:

  • Revise README, ROADMAP, and CLAUDE integration docs to describe the AI-focused positioning, new AST and directory index modes, and updated CLI flags.
  • Add developer-facing usage tracking documentation describing how to use the new logging and stats scripts.

Tests:

  • Adjust and extend unit, integration, CLI, and property tests to cover the new output modes, language detection changes, removed highlighting/theme features, and directory index handling.

Chores:

  • Remove the interactive configuration wizard and related CLI flags that conflicted with the automation-first philosophy.

docdyhr and others added 4 commits April 13, 2026 00:00
Adds a transparent shell wrapper (batless-logger) that intercepts every
batless invocation, writes structured NDJSON to ~/.batless/stats/, and
delegates to the real binary unchanged. Errors are caught, reported to
stderr with a GitHub issue link, and appended as separate log entries.

Also adds batless-stats, a Python analyser that reads the NDJSON logs
and reports mode breakdown, AI profile usage, flag frequency, file
extension distribution, hourly patterns, and unique command signatures.
Supports --all, --date, --session, --errors, --commands, and --json flags.

Useful for validating that CLAUDE.md protocol instructions are being
followed in practice (e.g. --profile=claude coverage, escalation path
usage) and for identifying batless usage patterns across AI sessions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dirs v6 updates dirs-sys from 0.4 to 0.5, which drops the old
redox_users 0.4 chain that was pulling in getrandom 0.2.17 alongside
newer versions. All public API calls (home_dir, config_dir) are
unchanged.

The two remaining duplicate getrandom versions (0.3.4 via proptest,
0.4.1 via tempfile) are dev-only deps — harmless and not fixable
without upstream changes.

Closes #117

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
batless's unique value is the structured AI output that built-in Read/Grep/Glob
tools cannot produce: symbol indexes, token-estimated context, semantic chunks,
and content hashes. Syntax highlighting serves human terminal users — a use case
where bat is the better tool.

- ROADMAP: rewrite vision statement; v0.6.0 now explicitly removes syntax
  highlighting (syntect), ThemeManager, interactive wizard, and dead code;
  add "What is NOT on the Roadmap" entry for highlighting/themes
- README: replace "Ultimate Non-Blocking Code Viewer" headline with
  "Machine-Readable Code Analysis for AI and Automation"; rewrite Why section
  to lead with the 5 unique AI output features; update feature comparison
  table, quick start examples, core capabilities, and philosophy blurb to
  reflect the new focus; mark syntax highlighting as deprecated in v0.6.0
- CLAUDE.md: add "AI Assistant Integration: When to Use batless" section
  immediately after Overview, directing AI assistants to use built-in tools
  for routine operations and batless only for the 5 structured output cases

Refs #118

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd directory index

Phase 1 — Remove syntax highlighting:
- Delete src/highlighter.rs (SyntaxHighlighter) and src/wizard.rs
  (ConfigurationWizard, 799 lines)
- Drop syntect 5 and strip-ansi-escapes 0.2 from dependencies
- Remove OutputMode::Highlight, --theme, --list-themes, --configure,
  --list-profiles, --edit-profile CLI flags
- Replace syntect language detection with static extension map (39 langs)
- Default output mode: Highlight → Plain
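
For illustration, a static extension map of the kind Phase 1 describes could look roughly like the sketch below. The table is a hypothetical six-entry subset (the commit mentions 39 languages), the case-insensitive lookup is an assumption, and the return types are simplified to `&'static str`; none of this is taken from the actual `src/language.rs`.

```rust
use std::collections::BTreeSet;
use std::path::Path;

// Hypothetical subset of the extension table; entries are illustrative only.
const EXTENSION_TABLE: &[(&str, &str)] = &[
    ("rs", "Rust"),
    ("py", "Python"),
    ("js", "JavaScript"),
    ("mjs", "JavaScript"),
    ("ts", "TypeScript"),
    ("tsx", "TSX"),
];

/// Look up a language by extension (case-insensitive by assumption).
pub fn extension_to_language(extension: &str) -> Option<&'static str> {
    let ext = extension.to_ascii_lowercase();
    EXTENSION_TABLE
        .iter()
        .find(|(known, _)| *known == ext)
        .map(|(_, language)| *language)
}

/// Detect a language from the extension of a path, if it has one.
pub fn detect_language(file_path: &str) -> Option<&'static str> {
    Path::new(file_path)
        .extension()
        .and_then(|ext| ext.to_str())
        .and_then(|ext| extension_to_language(ext))
}

/// Sorted, de-duplicated language names derived from the same table,
/// so detection and listing cannot drift apart.
pub fn list_languages() -> Vec<&'static str> {
    let unique: BTreeSet<&'static str> =
        EXTENSION_TABLE.iter().map(|(_, language)| *language).collect();
    unique.into_iter().collect()
}
```

Because both functions read the same table, a new extension only needs to be added in one place.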

Phase 2 — Formatter trait consolidation:
- Create src/formatters/{plain,json,summary,ast}_formatter.rs
- All 5 output modes implement the Formatter trait
- formatter.rs is now a thin dispatcher

Phase 3 — Multi-file index mode:
- batless --mode=index <dir> walks directory recursively
- Emits one compact NDJSON line per file
- Hidden directories skipped; sorted order; per-file errors stay valid NDJSON
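
A rough sketch of the Phase 3 walk, using only the standard library. The function name `collect_files_recursive` follows the name used elsewhere in this PR, but the body is an assumption, as are the choices to keep hidden files (only hidden directories are skipped), to ignore symlink cycles, and the hand-rolled `error_line` helper, which is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect files under `dir`, skipping hidden directories and
/// visiting entries in sorted order so the output is deterministic.
pub fn collect_files_recursive(dir: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    let mut entries = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<PathBuf>>>()?;
    entries.sort();

    for path in entries {
        let hidden = path
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| name.starts_with('.'));
        if path.is_dir() {
            if !hidden {
                collect_files_recursive(&path, files)?;
            }
        } else {
            files.push(path);
        }
    }
    Ok(())
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: one NDJSON line for a file that failed, so the
/// stream stays valid line-delimited JSON even when a file errors.
pub fn error_line(file: &str, error: &str) -> String {
    format!(
        "{{\"file\":\"{}\",\"error\":\"{}\"}}",
        json_escape(file),
        json_escape(error)
    )
}
```

A caller would iterate the collected paths, print the index JSON for each file on one line, and print `error_line(...)` instead whenever processing a file fails.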

Phase 4 — --mode=ast:
- Emits raw tree-sitter parse tree as pretty JSON
- Supported: Rust, Python, JavaScript, TypeScript, TSX
- Leaf nodes include text (≤256 chars); max depth 64
- Unsupported: "parser": "none", "root": null
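
To make the two bounds above concrete (leaf text ≤256 chars, max depth 64), here is a sketch of a bounded tree-to-JSON conversion. It deliberately does not use tree-sitter: `Node` is a hypothetical stand-in struct, the JSON is built by hand, and the `"truncated": true` marker emitted at the depth limit is an assumption, since the commit message does not say what the real formatter emits there.

```rust
const MAX_DEPTH: usize = 64; // depth bound stated in the commit message
const MAX_TEXT_CHARS: usize = 256; // leaf text bound stated in the commit message

/// Hypothetical stand-in for a parse-tree node (not tree-sitter's `Node`).
pub struct Node {
    pub kind: String,
    pub text: String,
    pub children: Vec<Node>,
}

/// Truncate on a character boundary so multi-byte text is never split.
pub fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_index, _)) => &text[..byte_index],
        None => text,
    }
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize a node: leaves carry truncated text, inner nodes recurse
/// until MAX_DEPTH, after which children are replaced by a marker.
pub fn node_to_json(node: &Node, depth: usize) -> String {
    let mut out = format!("{{\"type\":\"{}\"", json_escape(&node.kind));
    if node.children.is_empty() {
        let text = truncate_chars(&node.text, MAX_TEXT_CHARS);
        out.push_str(&format!(",\"text\":\"{}\"", json_escape(text)));
    } else if depth < MAX_DEPTH {
        let children: Vec<String> = node
            .children
            .iter()
            .map(|child| node_to_json(child, depth + 1))
            .collect();
        out.push_str(&format!(",\"children\":[{}]", children.join(",")));
    } else {
        out.push_str(",\"truncated\":true"); // assumed marker, see lead-in
    }
    out.push('}');
    out
}
```

Bounding both depth and leaf text keeps the output size predictable for pathological inputs such as deeply nested expressions or very long string literals.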

Tests: 365 total (lib: 225, integration: 140), zero failures
Docs: CHANGELOG v0.6.0 entry, README updated, version bumped 0.5.0 → 0.6.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 12, 2026 22:01

sourcery-ai bot commented Apr 12, 2026

Reviewer's Guide

Refactors batless into an AI-native, structured-output tool by removing syntect-based highlighting and theme/config wizard features, consolidating all output modes behind a Formatter trait, adding an AST JSON mode and directory-walking index mode, simplifying language detection, and updating configuration, errors, docs, and tests accordingly.

Sequence diagram for new directory index NDJSON flow

sequenceDiagram
    actor User
    participant CLI as main_run
    participant CM as ConfigManager
    participant DirIdx as handle_directory_index
    participant FS as collect_files_recursive
    participant Core as process_file
    participant Fmt as OutputFormatter
    participant Stdout as stdout

    User->>CLI: batless --mode=index src/
    CLI->>CM: ConfigManager::from_env()
    CM-->>CLI: manager (config, output_mode=Index)
    CLI->>CM: file_path()
    CM-->>CLI: "src/"
    CLI->>CLI: output_mode == Index and file_path is_dir
    CLI->>DirIdx: handle_directory_index("src/", &manager)

    DirIdx->>CM: config()
    CM-->>DirIdx: &BatlessConfig
    DirIdx->>FS: collect_files_recursive(Path("src/"), &mut files)
    FS-->>DirIdx: files Vec<PathBuf> (sorted)

    loop for each file in files
        DirIdx->>Core: process_file(path_str, config)
        alt Ok(FileInfo)
            Core-->>DirIdx: file_info
            DirIdx->>Fmt: format_output(&file_info, path_str, config, OutputMode::Index)
            alt Ok(String)
                Fmt-->>DirIdx: pretty_json
                DirIdx->>DirIdx: compact = serde_json::from_str(pretty_json)
                DirIdx-->>Stdout: writeln!(compact)
            else Err(BatlessError)
                Fmt-->>DirIdx: Err(e)
                DirIdx->>DirIdx: err_obj = {file, error}
                DirIdx-->>Stdout: writeln!(err_obj as JSON)
            end
        else Err(BatlessError)
            Core-->>DirIdx: Err(e)
            DirIdx->>DirIdx: err_obj = {file, error}
            DirIdx-->>Stdout: writeln!(err_obj as JSON)
        end
    end

    DirIdx-->>CLI: Ok(())
    CLI-->>User: NDJSON stream (one line per file)

Class diagram for consolidated Formatter trait and output modes

classDiagram
    class OutputFormatter {
        +format_output(file_info FileInfo, file_path String, config BatlessConfig, output_mode OutputMode) BatlessResult~String~
        +format_line(line String, line_number usize, file_path String, config BatlessConfig, output_mode OutputMode) BatlessResult~String~
        +format_error(error BatlessError, file_path String, output_mode OutputMode) String
        +error_type_name(error BatlessError) String
    }

    class Formatter {
        <<interface>> Formatter
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
    }

    class PlainFormatter {
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
    }

    class JsonFormatter {
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
    }

    class SummaryFormatter {
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
    }

    class IndexFormatter {
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
    }

    class AstFormatter {
        +format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
        +output_mode() OutputMode
        -node_to_json(node Node, source u8[], depth usize) Value
        -parse_to_tree(content String, language String) Tree~String~
    }

    class OutputMode {
        <<enumeration>>
        Plain
        Json
        Summary
        Index
        Ast
        +parse_mode(s String) OutputMode
        +all() OutputMode[]
        +as_str() String
    }

    class FileInfo {
        +lines String[]
        +original_lines Option~String[]~
        +language Option~String~
        +encoding String
        +total_lines usize
        +total_lines_exact bool
        +total_bytes usize
        +truncated bool
        +truncated_by_lines bool
        +truncated_by_bytes bool
        +truncated_by_context bool
        +tokens Option~String[]~
        +summary_lines Option~SummaryLine[]~
        +file_hash Option~String~
        +estimated_llm_tokens Option~usize~
        +token_model Option~String~
        +compression_ratio Option~f64~
        +syntax_errors String[]
        +processed_lines() usize
        +token_count() usize
        +tokens_truncated() bool
        +truncation_reason() Option~String~
    }

    class BatlessConfig {
        +max_lines usize
        +max_bytes Option~usize~
        +language Option~String~
        +strip_ansi bool
        +use_color bool
        +include_tokens bool
        +include_identifiers bool
        +summary_mode bool
        +summary_level SummaryLevel
        +pretty_json bool
        +json_line_numbers bool
        +streaming_json bool
        +streaming_chunk_size usize
        +streaming_chunk_strategy String
        +validate() BatlessResult~()~
        +with_language(language Option~String~) BatlessConfig
        +with_strip_ansi(strip_ansi bool) BatlessConfig
        +with_use_color(use_color bool) BatlessConfig
        +with_max_lines(max_lines usize) BatlessConfig
        +with_max_bytes(max_bytes Option~usize~) BatlessConfig
        +merge_with(other BatlessConfig) BatlessConfig
    }

    class ConfigManager {
        -args Args
        -config BatlessConfig
        -output_mode OutputMode
        +from_env() BatlessResult~ConfigManager~
        +from_args(args Args) BatlessResult~ConfigManager~
        +file_path() BatlessResult~String~
        +config() &BatlessConfig
        +args() &Args
        +output_mode() OutputMode
        -load_and_apply_config() BatlessResult~()~
        -determine_output_mode() BatlessResult~()~
        -validate_language() BatlessResult~()~
    }

    class LanguageDetector {
        +detect_language(file_path String) Option~String~
        +detect_language_with_fallback(file_path String) Option~String~
        +extension_to_language(extension String) Option~String~
        +list_languages() String[]
        +validate_language(language String) BatlessResult~()~
        +find_language(name String) Option~String~
    }

    class BatlessError {
        <<enum>> BatlessError
        ConfigurationError
        FileNotFound
        FileReadError
        PermissionDenied
        LanguageNotFound
        LanguageDetectionError
        EncodingError
        ProcessingError
        IoError
        JsonSerializationError
        InvalidSchema
        +error_code() ErrorCode
        +file_not_found_with_suggestions(path String, available String[]) BatlessError
        +language_not_found_with_suggestions(language String, available String[]) BatlessError
        +language_detection_error(path String, details String) BatlessError
        +config_error_with_help(message String, help Option~String~) BatlessError
    }

    Formatter <|.. PlainFormatter
    Formatter <|.. JsonFormatter
    Formatter <|.. SummaryFormatter
    Formatter <|.. IndexFormatter
    Formatter <|.. AstFormatter

    OutputFormatter --> OutputMode
    OutputFormatter --> FileInfo
    OutputFormatter --> BatlessConfig
    OutputFormatter --> BatlessError
    OutputFormatter ..> Formatter

    JsonFormatter --> BatlessConfig
    JsonFormatter --> FileInfo
    SummaryFormatter --> FileInfo
    PlainFormatter --> FileInfo
    AstFormatter --> FileInfo

    ConfigManager --> BatlessConfig
    ConfigManager --> OutputMode
    ConfigManager --> LanguageDetector

    LanguageDetector --> BatlessError

    BatlessError --> ErrorCode

File-Level Changes

Change Details Files
Replace syntect-based language/theme system with a static extension-based language detector and drop theme-related APIs.
  • Remove syntect SyntaxSet/ThemeSet caching and theme APIs, keeping only extension-based language detection
  • Make detect_language_with_fallback a thin alias over extension-based detection and expose extension_to_language publicly
  • Generate the known-language list from a static extension table via BTreeSet for stable, sorted output
  • Delete ThemeManager, syntax/theme accessors, and all theme-related tests, updating remaining tests for new behavior
src/language.rs
Refactor output formatting to a trait-based formatter stack, remove highlight mode, and introduce a new AST output mode.
  • Strip inline plain/json/summary implementations and highlight handling from OutputFormatter, delegating to dedicated Formatter implementations for each mode
  • Add OutputMode::Ast, wire it through parsing, listing, and string conversion, and ensure non-streaming behavior in format_line
  • Adjust error_type_name mappings and tests to reflect removal of highlight/theme error variants and presence of ast mode
src/formatter.rs
Align CLI/config surface with the AI-native pivot by removing theme/wizard options, changing the default mode to plain, and supporting the new AST mode.
  • Remove CLI args for theme, list-themes, configure, list-profiles, and edit-profile, as well as associated handling logic in main/ConfigManager
  • Change default OutputMode from Highlight to Plain in ConfigManager construction and mode derivation logic
  • Introduce CliOutputMode::Ast and map CLI/FromStr parsing to the new OutputMode set (plain/json/summary/index/ast), with updated validation error messages
  • Simplify validation to only check language (no theme), and update tests for new defaults and valid mode set
src/main.rs
src/config_manager.rs
src/config_validation.rs
Remove theme/highlighting configuration from core config/profile types and validation, fully decoupling batless from terminal color theming.
  • Drop the theme field and with_theme API from BatlessConfig, adjust Default, merge, file/json (de)serialization, and profile application behavior
  • Update config-related tests and sample TOML/JSON snippets to no longer reference theme and to assert new defaults
  • Remove theme validation paths and associated tests from config_validation
src/config.rs
src/profile.rs
src/config_validation.rs
Simplify error model by removing highlight and theme error variants and codes, keeping only language/processing-related errors.
  • Delete HighlightError and ThemeNotFound variants from BatlessError and the corresponding ErrorCode values and string mappings
  • Remove helper constructors for theme_not_found_with_suggestions and highlight_error variants, and update error_code dispatch
  • Prune tests that referenced theme/highlight errors and update remaining tests to assert the new LanguageNotFound codes and messages
src/error.rs
Update library exports and tests to reflect removal of the highlighter and wizard modules and theme-listing APIs.
  • Remove highlighter and wizard modules from lib.rs exports, along with highlight_content and list_themes convenience functions
  • Delete tests that exercised highlight_content, list_themes, and default theme expectations, keeping only language-focused tests
  • Ensure public API still exposes process_file, format_output, and language utilities consistent with the new scope
src/lib.rs
Add a tree-sitter based AST formatter that emits structured JSON parse trees for supported languages.
  • Implement AstFormatter as a Formatter that uses tree-sitter parsers for Rust, Python, JavaScript, TypeScript, and TSX to build a JSON tree with node type, positions, flags, optional text, and bounded depth
  • Return a top-level JSON object containing file metadata, parser identifier, and root node (or null if unsupported) in pretty-printed form
  • Add unit tests for Rust, Python, and unsupported-language cases to validate parser selection and output schema
src/formatters/ast_formatter.rs
Consolidate plain, JSON, and summary formatting into dedicated Formatter implementations under src/formatters.
  • Create PlainFormatter that respects line-numbering flags and emits either raw content or numbered lines
  • Create JsonFormatter that reproduces the prior JSON schema (lines, counts, truncation flags, identifiers, summary, hash, token estimates, compression) and honors pretty_json
  • Create SummaryFormatter that emits file metadata, structural summary or content, token samples, and syntax errors similarly to the previous inline summary logic
src/formatters/plain_formatter.rs
src/formatters/json_formatter.rs
src/formatters/summary_formatter.rs
src/formatters/error_formatter.rs
src/formatters/mod.rs
Implement directory-wide index processing for --mode=index, emitting per-file NDJSON and handling errors per entry.
  • Add a recursive directory walker that skips hidden directories, sorts entries for deterministic order, and collects file paths
  • Implement handle_directory_index to process each file with process_file and Index formatter, compact pretty JSON to single-line NDJSON, and emit structured error objects on failures
  • Wire directory handling into run(): when mode==Index and the input path is a directory, route to the new handler instead of single-file processing
src/main.rs
Align documentation, roadmap, and CLI tests with the AI-native repositioning, new modes, and removed features.
  • Rewrite README, ROADMAP, and CLAUDE.md to emphasize machine-readable analysis, AST/index modes, and removal of highlighting/themes/wizard, including updated examples and feature matrices
  • Add USAGE_TRACKING.md describing optional batless-logger and batless-stats tooling for local usage telemetry (scripts referenced but not fully shown here)
  • Update integration, CLI documentation, and property tests to drop highlight/theme and wizard scenarios, add coverage for plain default behavior, AST/index modes, and new error semantics
README.md
ROADMAP.md
CLAUDE.md
docs/USAGE_TRACKING.md
tests/integration_tests.rs
tests/cli_coverage_tests.rs
tests/cli_documentation_tests.rs
tests/property_tests.rs
Version and dependency updates to reflect the 0.6.0 release and syntect removal.
  • Bump crate version from 0.5.0 to 0.6.0 in Cargo.toml and update dirs dependency to v6
  • Remove syntect and strip-ansi-escapes from dependencies, keeping tree-sitter and related libraries for AST functionality
  • Regenerate Cargo.lock accordingly (not fully shown)
Cargo.toml
Cargo.lock
CHANGELOG.md

Assessment against linked issues

Issue Objective Addressed Explanation
#118 Remove the syntax highlighting/theme subsystem and interactive wizard (highlighter.rs, ThemeManager, wizard.rs), drop related dependencies and flags, and make plain text the default output mode while keeping deprecated highlight/theme flags usable with warnings in v0.6.0. The PR fully removes the syntax highlighting and theme system (deletes highlighter.rs and ThemeManager, removes wizard.rs, drops syntect and strip-ansi-escapes, removes theme-related config and errors, and updates README/CHANGELOG). It also switches the default mode to plain (ConfigManager default OutputMode::Plain, tests updated). However, the issue specifies that in v0.6.0 --mode=highlight and --theme should be deprecated with warnings and fall back to plain, to be removed in v0.7.0. In this PR, OutputMode::Highlight and the --theme/--list-themes/--configure/--list-profiles/--edit-profile flags are removed outright, and OutputMode::parse_mode("highlight") now errors instead of warning and falling back. The is-terminal and termcolor dependencies also remain. Thus, the removal/cleanup is done, but the planned deprecation-with-warning behavior and full dependency removal are not implemented as described.
#118 Add a new --mode=ast that outputs raw tree-sitter parse trees as JSON for supported languages.
#118 Add multi-file directory index support for --mode=index and consolidate the formatter system into trait-based formatters under src/formatters/.

Possibly linked issues

  • #N/A: The PR fulfills the v0.6.0 issue: removes highlighting/wizard, cleans dead code, adds --mode=ast and directory index.

codecov bot commented Apr 12, 2026

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • In LanguageDetector::list_languages, the hard-coded all_extensions array risks diverging from extension_to_language; consider deriving the set of languages directly from a single mapping source so new extensions automatically stay in sync.
  • handle_directory_index builds a full Vec<PathBuf> via collect_files_recursive before processing, which can be memory-heavy on large trees; you might want to stream entries (e.g., process as you recurse) instead of collecting them all first.
  • The AST formatter currently keys off specific language display names (e.g., "Rust", "Python"); to avoid fragile coupling with detection, consider normalizing on extensions or a small internal enum so adding/updating language names doesn’t silently disable AST parsing.
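
One way the third point could be approached is sketched below: a small internal enum that is the only thing the AST formatter matches on, with conversions from both extensions and display names. The enum name `AstLanguage`, its methods, and the display names other than "Rust" and "Python" are hypothetical and not part of this PR.

```rust
/// Hypothetical internal enum: the languages that have an AST parser.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AstLanguage {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Tsx,
}

impl AstLanguage {
    /// Every variant, so lookups iterate one list instead of repeating names.
    pub const ALL: [AstLanguage; 5] = [
        AstLanguage::Rust,
        AstLanguage::Python,
        AstLanguage::JavaScript,
        AstLanguage::TypeScript,
        AstLanguage::Tsx,
    ];

    /// Display name (values other than "Rust" and "Python" are assumed).
    pub fn display_name(self) -> &'static str {
        match self {
            AstLanguage::Rust => "Rust",
            AstLanguage::Python => "Python",
            AstLanguage::JavaScript => "JavaScript",
            AstLanguage::TypeScript => "TypeScript",
            AstLanguage::Tsx => "TSX",
        }
    }

    /// Resolve from a file extension, ignoring ASCII case.
    pub fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "rs" => Some(AstLanguage::Rust),
            "py" => Some(AstLanguage::Python),
            "js" => Some(AstLanguage::JavaScript),
            "ts" => Some(AstLanguage::TypeScript),
            "tsx" => Some(AstLanguage::Tsx),
            _ => None,
        }
    }

    /// Resolve from a display name, ignoring ASCII case, so a renamed or
    /// re-cased label does not silently disable AST parsing.
    pub fn from_display_name(name: &str) -> Option<Self> {
        Self::ALL
            .iter()
            .copied()
            .find(|language| language.display_name().eq_ignore_ascii_case(name))
    }
}
```

With an enum in the middle, the match that selects a parser is exhaustive, so adding a variant without wiring up its parser becomes a compile error rather than a silent `"parser": "none"`.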
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location path="src/formatters/ast_formatter.rs" line_range="123-132" />
<code_context>
+            None => (Value::Null, "none"),
+        };
+
+        let output = json!({
+            "file": file_path,
+            "language": language,
+            "mode": "ast",
+            "parser": parser_name,
+            "total_lines": file_info.total_lines,
+            "total_bytes": file_info.total_bytes,
+            "root": root_value,
+        });
+
+        Ok(serde_json::to_string_pretty(&output)?)
+    }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** AST formatter always pretty-prints JSON and ignores the `pretty_json` configuration flag.

`JsonFormatter` respects `BatlessConfig.pretty_json`, but `AstFormatter` always uses `to_string_pretty`. For consistency (and for NDJSON/tooling use cases), this should branch on `config.pretty_json` and use `to_string` when it's false.

Suggested implementation:

```rust
        let output = json!({
            "file": file_path,
            "language": language,
            "mode": "ast",
            "parser": parser_name,
            "total_lines": file_info.total_lines,
            "total_bytes": file_info.total_bytes,
            "root": root_value,
        });

        let json = if config.pretty_json {
            serde_json::to_string_pretty(&output)?
        } else {
            serde_json::to_string(&output)?
        };

        Ok(json)
    }

```

This change assumes that a `config: &BatlessConfig` (or similar) is already in scope in this function, as it is in `JsonFormatter`. If not, you’ll need to:
1. Add a `config: &BatlessConfig` (or `&self.config`) reference to this method’s parameters or use an existing `self.config`.
2. Ensure the call sites for this formatter pass the appropriate `BatlessConfig` instance, mirroring how `JsonFormatter` is wired up.
</issue_to_address>

### Comment 2
<location path="src/language.rs" line_range="90-99" />
<code_context>
+        // Derive unique sorted list from all extension mappings
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The hard-coded `all_extensions` list can drift out of sync with `extension_to_language`’s match arms.

This duplication means new extensions added to `extension_to_language` may not appear in `list_languages`, causing silent omissions. It would be safer to have both detection and listing derive from a single shared mapping (e.g., a static table) as the source of truth.

Suggested implementation:

```rust
    /// Get sorted list of all known languages
    ///
    /// This derives the list from the same extension→language mapping that
    /// `extension_to_language` uses, so that both stay in sync.
    pub fn list_languages() -> Vec<String> {
        let mut languages: Vec<String> = EXTENSION_TO_LANGUAGE
            .values()
            .cloned()
            .collect();

        languages.sort();
        languages.dedup();
        languages

```

To fully realize the “single source of truth” approach and avoid drift:

1. Define a shared mapping in `src/language.rs`, for example near the top of the file:
   ```rust
   use std::collections::HashMap;
   use once_cell::sync::Lazy;

   pub static EXTENSION_TO_LANGUAGE: Lazy<HashMap<&'static str, String>> = Lazy::new(|| {
       let mut m = HashMap::new();
       m.insert("rs", "Rust".to_string());
       m.insert("py", "Python".to_string());
       m.insert("js", "JavaScript".to_string());
       m.insert("ts", "TypeScript".to_string());
       m.insert("go", "Go".to_string());
       m.insert("java", "Java".to_string());
       m.insert("cpp", "C++".to_string());
       m.insert("c", "C".to_string());
       m.insert("rb", "Ruby".to_string());
       // …add all the other extensions/languages currently handled in `extension_to_language`
       m
   });
   ```
   Adjust the exact mapping values to match the `extension_to_language` semantics already in this file, and reuse existing crates (e.g., if you already use `lazy_static` instead of `once_cell`, prefer that).

2. Refactor `extension_to_language` to use this mapping instead of hard-coded `match` arms, for example:
   ```rust
   pub fn extension_to_language(ext: &str) -> Option<&str> {
       EXTENSION_TO_LANGUAGE.get(ext).map(|s| s.as_str())
   }
   ```
   This ensures that adding a new extension only requires updating `EXTENSION_TO_LANGUAGE`, and both detection and listing will automatically stay in sync.

3. Remove or update any remaining hard-coded extension lists in this file (or others) that duplicate this mapping, so all consumers rely on `EXTENSION_TO_LANGUAGE`.
</issue_to_address>

### Comment 3
<location path="tests/property_tests.rs" line_range="31" />
<code_context>
     }

-    #[test]
-    fn test_highlight_content_deterministic(content in ".*") {
-        let config = BatlessConfig::default();
-
</code_context>
<issue_to_address>
**suggestion (testing):** Replace removed highlight property test with a property test for a remaining formatter (e.g. JSON or AST)

With `test_highlight_content_deterministic` removed, we’ve lost a useful property-based check for deterministic formatting. To retain that coverage under the new architecture, please add a proptest that, for random `content`, builds a `FileInfo` and verifies determinism for one of the remaining modes, e.g.:
- JSON mode: run the JSON formatter (or `batless --mode=json`) twice on the same input and assert identical JSON output (or identical `serde_json::Value` after parsing).
- AST mode (for a supported language): run twice and assert identical JSON, ignoring any known non-deterministic fields.
This keeps the guarantee that formatting is stable across runs, which is important for AI-facing structured outputs.

Suggested implementation:

```rust
use batless::{process_file, BatlessConfig};
use proptest::prelude::*;
use std::io::Write;
use tempfile::NamedTempFile;
use serde_json::Value;

// Property test: JSON-mode output should be deterministic for arbitrary input.
proptest! {
    #[test]
    fn json_mode_output_is_deterministic(content in ".*") {
        // Arrange: write randomized content into a temporary file
        let mut file = NamedTempFile::new().expect("failed to create temp file");
        write!(file, "{}", content).expect("failed to write to temp file");
        file.flush().expect("failed to flush temp file");

        let path = file.path().to_path_buf();

        // Arrange: configure batless for JSON/structured output
        let mut config = BatlessConfig::default();
        // NOTE: you need to set the mode/flags on `config` so that `process_file`
        // produces JSON output. See <additional_changes> for details.

        // Act: process the same file twice with the same configuration
        let out1 = process_file(&path, &config).expect("first run failed");
        let out2 = process_file(&path, &config).expect("second run failed");

        // Assert: parsed JSON structures are identical
        let v1: Value = serde_json::from_str(&out1).expect("first output is not valid JSON");
        let v2: Value = serde_json::from_str(&out2).expect("second output is not valid JSON");

        prop_assert_eq!(v1, v2);
    }
}

```

</file_operation>
</file_operations>

<additional_changes>
1. Configure JSON mode on `BatlessConfig` inside `json_mode_output_is_deterministic`:
   - If you have a mode enum, import it and set it, e.g.:
     - `use batless::{process_file, BatlessConfig, Mode};`
     - `config.mode = Mode::Json;`
   - Or, if JSON is enabled via flags, set them accordingly, e.g.:
     - `config.json = true;` or `config.output_format = OutputFormat::Json;`
2. Adjust the `process_file` call signature if it differs in your codebase:
   - If it expects a `FileInfo` instead of a `PathBuf`, construct one from `path` and pass `&file_info`.
   - If it is fallible in another way (e.g. returns `Result<String, Error>` with a different type), adapt the `expect(...)` calls.
3. Ensure `serde_json` is available in `dev-dependencies` in `Cargo.toml` (or regular `dependencies` if already used elsewhere):
   - Under `[dev-dependencies]`: `serde_json = "1"`
4. If there is already a surrounding `proptest! { ... }` block in this file, you may instead want to move the new `json_mode_output_is_deterministic` test into that existing block to keep style consistent and avoid nested `proptest!` macros.
</issue_to_address>


Comment on lines +90 to +99
    // Derive unique sorted list from all extension mappings
    let all_extensions = [
        "rs",
        "py",
        "js",
        "ts",
        "go",
        "java",
        "cpp",
        "c",
suggestion (bug_risk): The hard-coded all_extensions list can drift out of sync with extension_to_language’s match arms.

This duplication means new extensions added to extension_to_language may not appear in list_languages, causing silent omissions. It would be safer to have both detection and listing derive from a single shared mapping (e.g., a static table) as the source of truth.


}

#[test]
fn test_highlight_content_deterministic(content in ".*") {
suggestion (testing): Replace removed highlight property test with a property test for a remaining formatter (e.g. JSON or AST)

With test_highlight_content_deterministic removed, we’ve lost a useful property-based check for deterministic formatting. To retain that coverage under the new architecture, please add a proptest that, for random content, builds a FileInfo and verifies determinism for one of the remaining modes, e.g.:

  • JSON mode: run the JSON formatter (or batless --mode=json) twice on the same input and assert identical JSON output (or identical serde_json::Value after parsing).
  • AST mode (for a supported language): run twice and assert identical JSON, ignoring any known non-deterministic fields.
    This keeps the guarantee that formatting is stable across runs, which is important for AI-facing structured outputs.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

batless/src/language.rs

Lines 33 to 35 in 30ce343

"js" => "JavaScript",
"ts" => "TypeScript",
"go" => "Go",

P1: Map .tsx/.jsx extensions before AST parsing

AST mode advertises TSX/JSX support, but language detection never classifies .tsx or .jsx files because the extension map only covers js/ts. In those files file_info.language stays None, so AstFormatter::parse_to_tree falls back to parser: "none" and root: null instead of returning a parse tree. Adding tsx/jsx mappings here is needed for the new AST feature to work on React code.
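
A minimal sketch of the mapping fix, assuming the `match`-based shape of `extension_to_language` quoted elsewhere in this review. The arm list is abbreviated, and the language names `"TSX"` and `"JSX"` are an assumption about what `AstFormatter` matches on; adjust them to whatever the formatter actually expects.

```rust
/// Map a file extension to a language name (case-insensitive).
/// Sketch only: the arms are abbreviated, and the "TSX"/"JSX" names are
/// assumed, not taken from the actual formatter code.
pub fn extension_to_language(extension: &str) -> Option<String> {
    let language_name = match extension.to_lowercase().as_str() {
        "rs" => "Rust",
        "py" => "Python",
        "js" => "JavaScript",
        "jsx" => "JSX",
        "ts" => "TypeScript",
        "tsx" => "TSX",
        "go" => "Go",
        // Unknown extensions stay undetected
        _ => return None,
    };
    Some(language_name.to_string())
}
```

If `list_languages` is still driven by a separate hard-coded list, the two new extensions would need to be added there as well so that `--language` validation accepts them.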

src/language.rs Outdated
Comment on lines +18 to +20
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(Self::extension_to_language)
P2: Handle extensionless filenames in language detection

The new detector only checks Path::extension(), so extensionless-but-language-specific filenames (for example Dockerfile and Makefile) are now never detected. This is a regression from the previous syntect-based path detection and causes downstream features (language-aware summaries/indexing/comment stripping) to run as unknown language on those common files.
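
One possible shape for the fallback, sketched under assumptions: the helper name `filename_to_language` follows the follow-up commit message on this PR, while the two-entry tables and the language names returned for `Dockerfile` and `Makefile` are illustrative only.

```rust
use std::path::Path;

/// Abbreviated stand-in for the real extension table.
fn extension_to_language(extension: &str) -> Option<String> {
    match extension.to_lowercase().as_str() {
        "rs" => Some("Rust".to_string()),
        "py" => Some("Python".to_string()),
        _ => None,
    }
}

/// Match well-known extensionless filenames (names here are assumptions).
fn filename_to_language(file_name: &str) -> Option<String> {
    match file_name.to_lowercase().as_str() {
        "dockerfile" => Some("Dockerfile".to_string()),
        "makefile" => Some("Makefile".to_string()),
        _ => None,
    }
}

/// Try the extension first, then fall back to the bare file name.
pub fn detect_language(file_path: &str) -> Option<String> {
    let path = Path::new(file_path);
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(extension_to_language)
        .or_else(|| {
            path.file_name()
                .and_then(|n| n.to_str())
                .and_then(filename_to_language)
        })
}
```

Checking the extension first keeps the existing behaviour for ordinary source files; the filename lookup only runs when that lookup yields nothing.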

            {
                continue;
            }
            collect_files_recursive(&path, out);
P2: Prevent symlink cycles in directory index traversal

The recursive directory walker calls itself for every directory path without checking for symlinks or already-visited directories. If the indexed tree contains a symlink cycle (or repeated links), --mode=index <dir> can recurse indefinitely and eventually stack-overflow/crash. Add a symlink guard (symlink_metadata) or visited inode/path tracking before recurring.
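
A sketch of the symlink guard, assuming the walker keeps the `collect_files_recursive(dir, out)` signature shown in the diff. Skipping symlinks entirely (rather than tracking visited inodes) and sorting the entries are choices made here for simplicity, not necessarily what the PR does.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `dir`, skipping symlinks and
/// hidden directories. Errors are reported on stderr, not swallowed.
pub fn collect_files_recursive(dir: &Path, out: &mut Vec<PathBuf>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(error) => {
            eprintln!("warning: cannot read '{}': {error}", dir.display());
            return;
        }
    };
    let mut paths: Vec<PathBuf> = entries.flatten().map(|e| e.path()).collect();
    paths.sort(); // deterministic order (an assumption about the desired output)

    for path in paths {
        // symlink_metadata inspects the link itself and never follows it
        let file_type = match fs::symlink_metadata(&path) {
            Ok(meta) => meta.file_type(),
            Err(error) => {
                eprintln!("warning: cannot stat '{}': {error}", path.display());
                continue;
            }
        };
        if file_type.is_symlink() {
            continue; // a skipped link cannot introduce a cycle
        }
        if file_type.is_dir() {
            let hidden = path
                .file_name()
                .and_then(|n| n.to_str())
                .map(|n| n.starts_with('.'))
                .unwrap_or(false);
            if !hidden {
                collect_files_recursive(&path, out);
            }
        } else if file_type.is_file() {
            out.push(path);
        }
    }
}
```

Skipping links also keeps the walk inside the requested tree; following links safely would instead require tracking canonical paths that have already been visited.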

Copilot AI left a comment

Pull request overview

v0.6.0 release that pivots batless from terminal syntax highlighting toward AI-native, structured outputs (index / AST / JSON / summary), removing highlighting/theme/wizard functionality and consolidating formatting behind a shared Formatter trait.

Changes:

  • Removed syntect-based highlighting, theme management, and the interactive configuration wizard; default output mode becomes plain.
  • Consolidated all output modes behind src/formatters/* with src/formatter.rs acting as a dispatcher, and introduced --mode=ast.
  • Added directory-walking behavior for --mode=index <dir> producing NDJSON (one compact JSON object per file).

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tests/property_tests.rs | Removes property tests tied to removed highlighting API. |
| tests/integration_tests.rs | Updates integration coverage to plain mode; removes theme/highlight tests. |
| tests/cli_documentation_tests.rs | Updates CLI behavior expectations after default mode change; removes theme docs tests. |
| tests/cli_coverage_tests.rs | Removes wizard/profile/theme CLI coverage tests that no longer apply. |
| src/wizard.rs | Removes the interactive configuration wizard module entirely. |
| src/profile.rs | Removes theme from custom profiles and stops applying theme to config. |
| src/main.rs | Adds directory index handling for --mode=index and removes theme/wizard special commands. |
| src/lib.rs | Removes highlighter/wizard exports and APIs; trims theme-related tests/exports. |
| src/language.rs | Replaces syntect-based detection with a static extension→language map; removes ThemeManager. |
| src/highlighter.rs | Removes syntect-based SyntaxHighlighter implementation entirely. |
| src/formatters/summary_formatter.rs | Adds trait-based summary formatter implementation. |
| src/formatters/plain_formatter.rs | Adds trait-based plain formatter implementation (including line numbering). |
| src/formatters/mod.rs | Registers all formatter modules (plain/json/summary/index/ast/error). |
| src/formatters/json_formatter.rs | Adds trait-based JSON formatter implementation. |
| src/formatters/error_formatter.rs | Removes highlight/theme error labels from error formatting. |
| src/formatters/ast_formatter.rs | Adds AST mode formatter (tree-sitter parse tree serialized as JSON). |
| src/formatter.rs | Replaces inline formatting logic with dispatcher to formatter modules; adds OutputMode::Ast. |
| src/error.rs | Removes highlight/theme error variants and codes. |
| src/config.rs | Removes theme from config, related defaults, merging, and tests. |
| src/config_validation.rs | Removes theme validation. |
| src/config_manager.rs | Removes theme/wizard/profile CLI flags; default output mode becomes plain; adds AST mode. |
| scripts/batless-stats | Adds a Python tool to analyze NDJSON usage logs (developer instrumentation). |
| scripts/batless-logger | Adds a bash wrapper to log batless invocations as NDJSON (developer instrumentation). |
| ROADMAP.md | Updates v0.6.0 roadmap to reflect AI-native pivot and removal of highlighting/wizard. |
| README.md | Repositions product messaging and documents new index/ast behaviors. |
| docs/USAGE_TRACKING.md | Documents the new usage logging and stats scripts. |
| CLAUDE.md | Adds guidance for AI assistants on when batless provides value vs built-in file tools. |
| CHANGELOG.md | Adds v0.6.0 changelog entry describing breaking changes and new features. |
| Cargo.toml | Bumps version to 0.6.0; removes syntect/strip-ansi-escapes; updates dirs dependency. |
| Cargo.lock | Updates lockfile accordingly (removes syntect dependency graph; bumps dirs). |

Comments suppressed due to low confidence (1)

src/language.rs:36

  • LanguageDetector::extension_to_language doesn’t map tsx/jsx, but --mode=ast and AstFormatter expect languages TSX/JSX. As a result, .tsx/.jsx files won’t be auto-detected and users can’t even force --language TSX/JSX because validate_language() is derived from list_languages(). Add extension mappings for tsx/jsx (and include them in list_languages) so AST/index support for TSX/JSX is reachable.
    /// Map file extensions to language names
    pub fn extension_to_language(extension: &str) -> Option<String> {
        let language_name = match extension.to_lowercase().as_str() {
            "rs" => "Rust",
            "py" => "Python",
            "js" => "JavaScript",
            "ts" => "TypeScript",
            "go" => "Go",
            "java" => "Java",

src/language.rs Outdated
Comment on lines 15 to 21
     /// Detect the programming language from a file path (extension-based)
     pub fn detect_language(file_path: &str) -> Option<String> {
         let path = Path::new(file_path);

-        get_syntax_set_internal()
-            .find_syntax_for_file(path)
-            .ok()
-            .flatten()
-            .map(|syntax| syntax.name.clone())
+        path.extension()
+            .and_then(|e| e.to_str())
+            .and_then(Self::extension_to_language)
     }

Copilot AI Apr 12, 2026

detect_language() only checks path.extension(), but the mapping includes dockerfile/makefile which are typically filenames without extensions (e.g. Dockerfile, Makefile). Those will currently return None even though they appear “supported” via list_languages(). Consider special-casing path.file_name() for these names (and/or removing them from the extension map if you don’t intend to support filename-based detection).

Comment on lines +111 to +131
    let language = file_info.language.as_deref();
    let content = file_info.lines.join("\n");
    let source = content.as_bytes();

    let (root_value, parser_name) = match Self::parse_to_tree(&content, language) {
        Some((tree, name)) => {
            let root = tree.root_node();
            (Self::node_to_json(root, source, 0), name)
        }
        None => (Value::Null, "none"),
    };

    let output = json!({
        "file": file_path,
        "language": language,
        "mode": "ast",
        "parser": parser_name,
        "total_lines": file_info.total_lines,
        "total_bytes": file_info.total_bytes,
        "root": root_value,
    });

Copilot AI Apr 12, 2026

AstFormatter builds content from file_info.lines, which may have been transformed by summary extraction and/or --strip-comments / --strip-blank-lines in FileProcessor::apply_post_processing. That means --mode=ast is not a raw parse tree of the original file, and the reported total_lines/total_bytes can diverge from the parsed source. Consider parsing from the original, unmodified lines (store them on FileInfo whenever post-processing mutates lines) and/or emitting both original vs parsed/processed metadata so the AST output is self-consistent.
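
To illustrate the "store the original lines" option, here is a hypothetical, trimmed-down stand-in for `FileInfo`. The `original_lines` field and the `source_for_parsing` helper do not exist in the codebase as far as this review shows; they are invented for the sketch.

```rust
/// Hypothetical, reduced version of batless's `FileInfo`.
pub struct FileInfo {
    /// Lines after post-processing (comment stripping, summary extraction, ...).
    pub lines: Vec<String>,
    /// Unmodified lines, stored only when post-processing changed `lines`.
    pub original_lines: Option<Vec<String>>,
}

impl FileInfo {
    /// Text a parser should see: the original lines when they were kept,
    /// otherwise the (unmodified) `lines`.
    pub fn source_for_parsing(&self) -> String {
        self.original_lines
            .as_ref()
            .unwrap_or(&self.lines)
            .join("\n")
    }
}
```

With something like this, the AST formatter could parse `source_for_parsing()` instead of `lines.join("\n")`, so the tree and the reported totals would describe the same text.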

src/main.rs Outdated
Comment on lines +206 to +209
    let Ok(entries) = std::fs::read_dir(dir) else {
        return;
    };
    let mut entries: Vec<_> = entries.flatten().collect();

Copilot AI Apr 12, 2026

collect_files_recursive silently returns on read_dir errors, which can produce incomplete directory indexes without any indication to the caller (e.g. permission issues). Since index mode is meant for automation, it’s safer to propagate the error (or emit a structured per-directory error object) so consumers can detect partial results.

Suggested change
-    let Ok(entries) = std::fs::read_dir(dir) else {
-        return;
-    };
-    let mut entries: Vec<_> = entries.flatten().collect();
+    let entries = std::fs::read_dir(dir).unwrap_or_else(|error| {
+        panic!("Failed to read directory '{}': {error}", dir.display())
+    });
+    let mut entries: Vec<_> = entries
+        .map(|entry| {
+            entry.unwrap_or_else(|error| {
+                panic!(
+                    "Failed to read directory entry in '{}': {error}",
+                    dir.display()
+                )
+            })
+        })
+        .collect();
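
The suggested change above panics, whereas the comment text asks for the error to be propagated. A non-panicking variant could look like the following sketch; `read_dir_sorted` is a hypothetical helper name, and returning `io::Result` assumes the caller is prepared to handle or report the error.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// List the entries of `dir` in sorted order, returning the first I/O error
/// (unreadable directory or unreadable entry) instead of dropping it.
pub fn read_dir_sorted(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<Vec<_>>>()?;
    paths.sort();
    Ok(paths)
}
```

A caller in index mode could then decide whether to abort or to emit a structured per-directory error object and continue.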

Comment on lines +211 to +226
    for entry in entries {
        let path = entry.path();
        if path.is_dir() {
            // Skip hidden directories
            if path
                .file_name()
                .and_then(|n| n.to_str())
                .map(|n| n.starts_with('.'))
                .unwrap_or(false)
            {
                continue;
            }
            collect_files_recursive(&path, out);
        } else if path.is_file() {
            out.push(path);
        }

Copilot AI Apr 12, 2026

collect_files_recursive uses path.is_dir() / path.is_file() which follow symlinks. A symlinked directory can introduce cycles (infinite recursion) or unexpectedly traverse outside the requested tree. Consider using entry.file_type() / symlink_metadata to detect and skip symlinks (or track visited canonical paths/inodes) to make directory indexing safe and predictable.

Comment on lines +243 to +248
    Ok(pretty) => {
        // Compact the pretty JSON to a single line for NDJSON
        serde_json::from_str::<serde_json::Value>(&pretty)
            .and_then(|v| serde_json::to_string(&v))
            .unwrap_or(pretty)
    }

Copilot AI Apr 12, 2026

Directory index mode compacts NDJSON by parsing the already-generated JSON string (from_str -> to_string) for every file. This adds avoidable CPU/alloc overhead for large trees. A cleaner approach is to have IndexFormatter emit compact JSON directly (or return a serde_json::Value that can be serialized once), and only pretty-print for single-file index output when desired.

Suggested change
-    Ok(pretty) => {
-        // Compact the pretty JSON to a single line for NDJSON
-        serde_json::from_str::<serde_json::Value>(&pretty)
-            .and_then(|v| serde_json::to_string(&v))
-            .unwrap_or(pretty)
-    }
+    Ok(compact) => compact,

Comment on lines 34 to 38
 [dependencies]
 clap = { version = "4", features = ["derive"] }
 clap_complete = "4.6"
-syntect = { version = "5", default-features = false, features = [
-    "parsing",
-    "default-syntaxes",
-    "default-themes",
-    "html",
-    "dump-load",
-    "regex-onig",
-] }
 is-terminal = "0.4"
 termcolor = "1.4"

Copilot AI Apr 12, 2026

PR description / issue scope says is-terminal and termcolor were removed, but both are still listed as dependencies. Also, is-terminal appears to be used only for stdout().is_terminal()-style checks, which can be done via std::io::IsTerminal on Rust 1.85. Either update the PR description to match reality, or remove is-terminal (and potentially termcolor) and switch to the std API / plain stderr output accordingly.
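
For reference, the std-only check mentioned above can be sketched as follows. `std::io::IsTerminal` is part of the standard library (stable since Rust 1.70); the wrapper function name is hypothetical.

```rust
use std::io::{self, IsTerminal};

/// Report whether stdout is attached to a terminal, using only std.
/// This is the kind of call that could replace the `is-terminal` crate.
pub fn stdout_is_terminal() -> bool {
    io::stdout().is_terminal()
}
```

The same trait is implemented for `io::stderr()` and `io::stdin()`, so other call sites should be able to migrate the same way.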

Comment on lines +56 to +58
    --chunk-strategy=*|--language=*|--theme=*|--ai-model=*)
      EXTRA_FLAGS+=("${arg%%=*}") ;;
    --version|--version-json|--list-languages|--list-themes)

Copilot AI Apr 12, 2026

batless-logger still special-cases removed flags like --theme=* and --list-themes. With v0.6.0 these invocations will always fail, and keeping them in the “known flags” set can skew usage stats and signatures. Consider removing them (or explicitly tagging them as legacy/deprecated) so the logger reflects the current CLI surface.

Suggested change
-    --chunk-strategy=*|--language=*|--theme=*|--ai-model=*)
-      EXTRA_FLAGS+=("${arg%%=*}") ;;
-    --version|--version-json|--list-languages|--list-themes)
+    --chunk-strategy=*|--language=*|--ai-model=*)
+      EXTRA_FLAGS+=("${arg%%=*}") ;;
+    --version|--version-json|--list-languages)

- language: detect extensionless files (Dockerfile, Makefile, Vagrantfile,
  Justfile) via filename_to_language() fallback after extension lookup
- main: prevent symlink cycles in collect_files_recursive() by using
  symlink_metadata() and skipping symlinks entirely; log read_dir/stat
  errors via eprintln! instead of silently swallowing them
- ast_formatter: respect config.pretty_json flag (was always pretty-printing)
- batless-logger: remove stale --theme=* and --list-themes references
  removed in v0.6.0 syntect pivot

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

v0.6.0: Deprecate and remove syntax highlighting (mode 1 cleanup)

2 participants