feat(v0.6.0): AI-native reposition — remove syntect, add --mode=ast and directory index#120
Adds a transparent shell wrapper (batless-logger) that intercepts every batless invocation, writes structured NDJSON to ~/.batless/stats/, and delegates to the real binary unchanged. Errors are caught, reported to stderr with a GitHub issue link, and appended as separate log entries.
Also adds batless-stats, a Python analyser that reads the NDJSON logs and reports mode breakdown, AI profile usage, flag frequency, file extension distribution, hourly patterns, and unique command signatures. Supports --all, --date, --session, --errors, --commands, and --json flags.
Useful for validating that CLAUDE.md protocol instructions are being followed in practice (e.g. --profile=claude coverage, escalation path usage) and for identifying batless usage patterns across AI sessions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dirs v6 updates dirs-sys from 0.4 to 0.5, which drops the old redox_users 0.4 chain that was pulling in getrandom 0.2.17 alongside newer versions. All public API calls (home_dir, config_dir) are unchanged.
The two remaining duplicate getrandom versions (0.3.4 via proptest, 0.4.1 via tempfile) are dev-only deps: harmless and not fixable without upstream changes.
Closes #117
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
batless's unique value is the structured AI output that built-in Read/Grep/Glob tools cannot produce: symbol indexes, token-estimated context, semantic chunks, and content hashes. Syntax highlighting serves human terminal users, a use case where bat is the better tool.
- ROADMAP: rewrite vision statement; v0.6.0 now explicitly removes syntax highlighting (syntect), ThemeManager, interactive wizard, and dead code; add "What is NOT on the Roadmap" entry for highlighting/themes
- README: replace "Ultimate Non-Blocking Code Viewer" headline with "Machine-Readable Code Analysis for AI and Automation"; rewrite Why section to lead with the 5 unique AI output features; update feature comparison table, quick start examples, core capabilities, and philosophy blurb to reflect the new focus; mark syntax highlighting as deprecated in v0.6.0
- CLAUDE.md: add "AI Assistant Integration: When to Use batless" section immediately after Overview, directing AI assistants to use built-in tools for routine operations and batless only for the 5 structured output cases
Refs #118
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nd directory index
Phase 1 — Remove syntax highlighting:
- Delete src/highlighter.rs (SyntaxHighlighter) and src/wizard.rs
(ConfigurationWizard, 799 lines)
- Drop syntect 5 and strip-ansi-escapes 0.2 from dependencies
- Remove OutputMode::Highlight, --theme, --list-themes, --configure,
--list-profiles, --edit-profile CLI flags
- Replace syntect language detection with static extension map (39 langs)
- Default output mode: Highlight → Plain
Phase 2 — Formatter trait consolidation:
- Create src/formatters/{plain,json,summary,ast}_formatter.rs
- All 5 output modes implement the Formatter trait
- formatter.rs is now a thin dispatcher
Phase 3 — Multi-file index mode:
- batless --mode=index <dir> walks directory recursively
- Emits one compact NDJSON line per file
- Hidden directories skipped; sorted order; per-file errors stay valid NDJSON
Phase 4 — --mode=ast:
- Emits raw tree-sitter parse tree as pretty JSON
- Supported: Rust, Python, JavaScript, TypeScript, TSX
- Leaf nodes include text (≤256 chars); max depth 64
- Unsupported: "parser": "none", "root": null
Tests: 365 total (lib: 225, integration: 140), zero failures
Docs: CHANGELOG v0.6.0 entry, README updated, version bumped 0.5.0 → 0.6.0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer's Guide
Refactors batless into an AI-native, structured-output tool by removing syntect-based highlighting and theme/config wizard features, consolidating all output modes behind a Formatter trait, adding an AST JSON mode and directory-walking index mode, simplifying language detection, and updating configuration, errors, docs, and tests accordingly.
Sequence diagram for new directory index NDJSON flow
sequenceDiagram
actor User
participant CLI as main_run
participant CM as ConfigManager
participant DirIdx as handle_directory_index
participant FS as collect_files_recursive
participant Core as process_file
participant Fmt as OutputFormatter
participant Stdout as stdout
User->>CLI: batless --mode=index src/
CLI->>CM: ConfigManager::from_env()
CM-->>CLI: manager (config, output_mode=Index)
CLI->>CM: file_path()
CM-->>CLI: "src/"
CLI->>CLI: output_mode == Index and file_path is_dir
CLI->>DirIdx: handle_directory_index("src/", &manager)
DirIdx->>CM: config()
CM-->>DirIdx: &BatlessConfig
DirIdx->>FS: collect_files_recursive(Path("src/"), &mut files)
FS-->>DirIdx: files Vec<PathBuf> (sorted)
loop for each file in files
DirIdx->>Core: process_file(path_str, config)
alt Ok(FileInfo)
Core-->>DirIdx: file_info
DirIdx->>Fmt: format_output(&file_info, path_str, config, OutputMode::Index)
alt Ok(String)
Fmt-->>DirIdx: pretty_json
DirIdx->>DirIdx: compact = serde_json::from_str(pretty_json)
DirIdx-->>Stdout: writeln!(compact)
else Err(BatlessError)
Fmt-->>DirIdx: Err(e)
DirIdx->>DirIdx: err_obj = {file, error}
DirIdx-->>Stdout: writeln!(err_obj as JSON)
end
else Err(BatlessError)
Core-->>DirIdx: Err(e)
DirIdx->>DirIdx: err_obj = {file, error}
DirIdx-->>Stdout: writeln!(err_obj as JSON)
end
end
DirIdx-->>CLI: Ok(())
CLI-->>User: NDJSON stream (one line per file)
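The flow in the diagram above can be approximated with the standard library alone. This is a rough sketch, not the PR's code: `write_index` and `json_escape` are hypothetical names, the per-file payload (`total_lines` only) stands in for the real `process_file` + `format_output` result, and the hand-rolled escaping replaces `serde_json`. Unlike the walker under review, this version does not descend into symlinked directories.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Collect regular files under `dir`, skipping hidden directories.
/// `DirEntry::file_type` does not follow symlinks, so links are ignored here.
fn collect_files_recursive(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            if entry.file_name().to_string_lossy().starts_with('.') {
                continue; // hidden directory
            }
            collect_files_recursive(&path, out)?;
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}

/// Minimal JSON string escaping (stand-in for serde_json in this sketch).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Write one compact JSON line per file, in sorted order.
/// A per-file failure becomes a `{file, error}` object so the stream stays valid NDJSON.
fn write_index<W: Write>(root: &Path, w: &mut W) -> io::Result<()> {
    let mut files = Vec::new();
    collect_files_recursive(root, &mut files)?;
    files.sort();
    for path in &files {
        let name = json_escape(&path.to_string_lossy());
        match fs::read_to_string(path) {
            Ok(text) => writeln!(
                w,
                "{{\"file\":\"{}\",\"total_lines\":{}}}",
                name,
                text.lines().count()
            )?,
            Err(e) => writeln!(
                w,
                "{{\"file\":\"{}\",\"error\":\"{}\"}}",
                name,
                json_escape(&e.to_string())
            )?,
        }
    }
    Ok(())
}
```

Calling `write_index(Path::new("src/"), &mut std::io::stdout().lock())` would print one line per file, which mirrors the "NDJSON stream (one line per file)" step at the end of the diagram.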
Class diagram for consolidated Formatter trait and output modes
classDiagram
class OutputFormatter {
+format_output(file_info FileInfo, file_path String, config BatlessConfig, output_mode OutputMode) BatlessResult~String~
+format_line(line String, line_number usize, file_path String, config BatlessConfig, output_mode OutputMode) BatlessResult~String~
+format_error(error BatlessError, file_path String, output_mode OutputMode) String
+error_type_name(error BatlessError) String
}
class Formatter {
<<interface>> Formatter
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
}
class PlainFormatter {
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
}
class JsonFormatter {
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
}
class SummaryFormatter {
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
}
class IndexFormatter {
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
}
class AstFormatter {
+format(file_info FileInfo, file_path String, config BatlessConfig) BatlessResult~String~
+output_mode() OutputMode
-node_to_json(node Node, source u8[], depth usize) Value
-parse_to_tree(content String, language String) Tree~String~
}
class OutputMode {
<<enumeration>>
Plain
Json
Summary
Index
Ast
+parse_mode(s String) OutputMode
+all() OutputMode[]
+as_str() String
}
class FileInfo {
+lines String[]
+original_lines Option~String[]~
+language Option~String~
+encoding String
+total_lines usize
+total_lines_exact bool
+total_bytes usize
+truncated bool
+truncated_by_lines bool
+truncated_by_bytes bool
+truncated_by_context bool
+tokens Option~String[]~
+summary_lines Option~SummaryLine[]~
+file_hash Option~String~
+estimated_llm_tokens Option~usize~
+token_model Option~String~
+compression_ratio Option~f64~
+syntax_errors String[]
+processed_lines() usize
+token_count() usize
+tokens_truncated() bool
+truncation_reason() Option~String~
}
class BatlessConfig {
+max_lines usize
+max_bytes Option~usize~
+language Option~String~
+strip_ansi bool
+use_color bool
+include_tokens bool
+include_identifiers bool
+summary_mode bool
+summary_level SummaryLevel
+pretty_json bool
+json_line_numbers bool
+streaming_json bool
+streaming_chunk_size usize
+streaming_chunk_strategy String
+validate() BatlessResult~()~
+with_language(language Option~String~) BatlessConfig
+with_strip_ansi(strip_ansi bool) BatlessConfig
+with_use_color(use_color bool) BatlessConfig
+with_max_lines(max_lines usize) BatlessConfig
+with_max_bytes(max_bytes Option~usize~) BatlessConfig
+merge_with(other BatlessConfig) BatlessConfig
}
class ConfigManager {
-args Args
-config BatlessConfig
-output_mode OutputMode
+from_env() BatlessResult~ConfigManager~
+from_args(args Args) BatlessResult~ConfigManager~
+file_path() BatlessResult~String~
+config() &BatlessConfig
+args() &Args
+output_mode() OutputMode
-load_and_apply_config() BatlessResult~()~
-determine_output_mode() BatlessResult~()~
-validate_language() BatlessResult~()~
}
class LanguageDetector {
+detect_language(file_path String) Option~String~
+detect_language_with_fallback(file_path String) Option~String~
+extension_to_language(extension String) Option~String~
+list_languages() String[]
+validate_language(language String) BatlessResult~()~
+find_language(name String) Option~String~
}
class BatlessError {
<<enum>> BatlessError
ConfigurationError
FileNotFound
FileReadError
PermissionDenied
LanguageNotFound
LanguageDetectionError
EncodingError
ProcessingError
IoError
JsonSerializationError
InvalidSchema
+error_code() ErrorCode
+file_not_found_with_suggestions(path String, available String[]) BatlessError
+language_not_found_with_suggestions(language String, available String[]) BatlessError
+language_detection_error(path String, details String) BatlessError
+config_error_with_help(message String, help Option~String~) BatlessError
}
Formatter <|.. PlainFormatter
Formatter <|.. JsonFormatter
Formatter <|.. SummaryFormatter
Formatter <|.. IndexFormatter
Formatter <|.. AstFormatter
OutputFormatter --> OutputMode
OutputFormatter --> FileInfo
OutputFormatter --> BatlessConfig
OutputFormatter --> BatlessError
OutputFormatter ..> Formatter
JsonFormatter --> BatlessConfig
JsonFormatter --> FileInfo
SummaryFormatter --> FileInfo
PlainFormatter --> FileInfo
AstFormatter --> FileInfo
ConfigManager --> BatlessConfig
ConfigManager --> OutputMode
ConfigManager --> LanguageDetector
LanguageDetector --> BatlessError
BatlessError --> ErrorCode
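To make the trait-plus-dispatcher relationship in the class diagram concrete, here is a reduced sketch. It is an illustration under assumptions, not the PR's source: `FileInfo` is trimmed to two fields, the `config` parameter is dropped, errors are plain `String`s instead of `BatlessError`, `formatter_for` is a hypothetical helper name, and only two of the five formatters are filled in.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    Plain,
    Json,
    Summary,
    Index,
    Ast,
}

impl OutputMode {
    fn as_str(self) -> &'static str {
        match self {
            OutputMode::Plain => "plain",
            OutputMode::Json => "json",
            OutputMode::Summary => "summary",
            OutputMode::Index => "index",
            OutputMode::Ast => "ast",
        }
    }
}

/// Trimmed-down stand-in for the `FileInfo` in the diagram.
struct FileInfo {
    lines: Vec<String>,
    total_lines: usize,
}

/// Every output mode implements the same two methods.
trait Formatter {
    fn format(&self, info: &FileInfo, file_path: &str) -> Result<String, String>;
    fn output_mode(&self) -> OutputMode;
}

struct PlainFormatter;

impl Formatter for PlainFormatter {
    fn format(&self, info: &FileInfo, _file_path: &str) -> Result<String, String> {
        Ok(info.lines.join("\n"))
    }
    fn output_mode(&self) -> OutputMode {
        OutputMode::Plain
    }
}

struct SummaryFormatter;

impl Formatter for SummaryFormatter {
    fn format(&self, info: &FileInfo, file_path: &str) -> Result<String, String> {
        Ok(format!("{}: {} lines", file_path, info.total_lines))
    }
    fn output_mode(&self) -> OutputMode {
        OutputMode::Summary
    }
}

/// Pick the formatter for a mode; modes omitted from this sketch return None.
fn formatter_for(mode: OutputMode) -> Option<Box<dyn Formatter>> {
    match mode {
        OutputMode::Plain => Some(Box::new(PlainFormatter)),
        OutputMode::Summary => Some(Box::new(SummaryFormatter)),
        OutputMode::Json | OutputMode::Index | OutputMode::Ast => None,
    }
}

/// Thin dispatcher: no formatting logic of its own, only selection and delegation.
fn format_output(info: &FileInfo, file_path: &str, mode: OutputMode) -> Result<String, String> {
    let formatter = formatter_for(mode)
        .ok_or_else(|| format!("mode '{}' is not part of this sketch", mode.as_str()))?;
    formatter.format(info, file_path)
}
```

With this shape, adding a mode means adding one `impl Formatter` and one match arm, which is presumably the motivation for keeping `formatter.rs` a thin dispatcher.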
Hey - I've found 3 issues, and left some high level feedback:
- In `LanguageDetector::list_languages`, the hard-coded `all_extensions` array risks diverging from `extension_to_language`; consider deriving the set of languages directly from a single mapping source so new extensions automatically stay in sync.
- `handle_directory_index` builds a full `Vec<PathBuf>` via `collect_files_recursive` before processing, which can be memory-heavy on large trees; you might want to stream entries (e.g., process as you recurse) instead of collecting them all first.
- The AST formatter currently keys off specific language display names (e.g., "Rust", "Python"); to avoid fragile coupling with detection, consider normalizing on extensions or a small internal enum so adding/updating language names doesn't silently disable AST parsing.
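The "small internal enum" idea from the last point could look roughly like the following. This is only a sketch of the suggestion: `AstLanguage`, `from_extension`, and `display_name` are hypothetical names that do not appear in the PR, and the variant list simply mirrors the languages the commit message names as supported by `--mode=ast`.

```rust
/// Hypothetical internal key for AST parsing, decoupled from display names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AstLanguage {
    Rust,
    Python,
    JavaScript,
    TypeScript,
    Tsx,
}

impl AstLanguage {
    /// Resolve from a file extension (case-insensitive); None means "no parser".
    fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "rs" => Some(Self::Rust),
            "py" => Some(Self::Python),
            "js" => Some(Self::JavaScript),
            "ts" => Some(Self::TypeScript),
            "tsx" => Some(Self::Tsx),
            _ => None,
        }
    }

    /// Display name used only for output, never for selecting a parser.
    fn display_name(self) -> &'static str {
        match self {
            Self::Rust => "Rust",
            Self::Python => "Python",
            Self::JavaScript => "JavaScript",
            Self::TypeScript => "TypeScript",
            Self::Tsx => "TSX",
        }
    }
}
```

Because parser selection would match on the enum, renaming a display string could not silently turn a supported language into the `"parser": "none"` case.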
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `LanguageDetector::list_languages`, the hard-coded `all_extensions` array risks diverging from `extension_to_language`; consider deriving the set of languages directly from a single mapping source so new extensions automatically stay in sync.
- `handle_directory_index` builds a full `Vec<PathBuf>` via `collect_files_recursive` before processing, which can be memory-heavy on large trees; you might want to stream entries (e.g., process as you recurse) instead of collecting them all first.
- The AST formatter currently keys off specific language display names (e.g., "Rust", "Python"); to avoid fragile coupling with detection, consider normalizing on extensions or a small internal enum so adding/updating language names doesn’t silently disable AST parsing.
## Individual Comments
### Comment 1
<location path="src/formatters/ast_formatter.rs" line_range="123-132" />
<code_context>
+ None => (Value::Null, "none"),
+ };
+
+ let output = json!({
+ "file": file_path,
+ "language": language,
+ "mode": "ast",
+ "parser": parser_name,
+ "total_lines": file_info.total_lines,
+ "total_bytes": file_info.total_bytes,
+ "root": root_value,
+ });
+
+ Ok(serde_json::to_string_pretty(&output)?)
+ }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** AST formatter always pretty-prints JSON and ignores the `pretty_json` configuration flag.
`JsonFormatter` respects `BatlessConfig.pretty_json`, but `AstFormatter` always uses `to_string_pretty`. For consistency (and for NDJSON/tooling use cases), this should branch on `config.pretty_json` and use `to_string` when it's false.
Suggested implementation:
```rust
let output = json!({
"file": file_path,
"language": language,
"mode": "ast",
"parser": parser_name,
"total_lines": file_info.total_lines,
"total_bytes": file_info.total_bytes,
"root": root_value,
});
let json = if config.pretty_json {
serde_json::to_string_pretty(&output)?
} else {
serde_json::to_string(&output)?
};
Ok(json)
}
```
This change assumes that a `config: &BatlessConfig` (or similar) is already in scope in this function, as it is in `JsonFormatter`. If not, you’ll need to:
1. Add a `config: &BatlessConfig` (or `&self.config`) reference to this method’s parameters or use an existing `self.config`.
2. Ensure the call sites for this formatter pass the appropriate `BatlessConfig` instance, mirroring how `JsonFormatter` is wired up.
</issue_to_address>
### Comment 2
<location path="src/language.rs" line_range="90-99" />
<code_context>
+ // Derive unique sorted list from all extension mappings
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The hard-coded `all_extensions` list can drift out of sync with `extension_to_language`’s match arms.
This duplication means new extensions added to `extension_to_language` may not appear in `list_languages`, causing silent omissions. It would be safer to have both detection and listing derive from a single shared mapping (e.g., a static table) as the source of truth.
Suggested implementation:
```rust
/// Get sorted list of all known languages
///
/// This derives the list from the same extension→language mapping that
/// `extension_to_language` uses, so that both stay in sync.
pub fn list_languages() -> Vec<String> {
let mut languages: Vec<String> = EXTENSION_TO_LANGUAGE
.values()
.cloned()
.collect();
languages.sort();
languages.dedup();
languages
```
To fully realize the “single source of truth” approach and avoid drift:
1. Define a shared mapping in `src/language.rs`, for example near the top of the file:
```rust
use std::collections::HashMap;
use once_cell::sync::Lazy;
pub static EXTENSION_TO_LANGUAGE: Lazy<HashMap<&'static str, String>> = Lazy::new(|| {
let mut m = HashMap::new();
m.insert("rs", "Rust".to_string());
m.insert("py", "Python".to_string());
m.insert("js", "JavaScript".to_string());
m.insert("ts", "TypeScript".to_string());
m.insert("go", "Go".to_string());
m.insert("java", "Java".to_string());
m.insert("cpp", "C++".to_string());
m.insert("c", "C".to_string());
m.insert("rb", "Ruby".to_string());
// …add all the other extensions/languages currently handled in `extension_to_language`
m
});
```
Adjust the exact mapping values to match the `extension_to_language` semantics already in this file, and reuse existing crates (e.g., if you already use `lazy_static` instead of `once_cell`, prefer that).
2. Refactor `extension_to_language` to use this mapping instead of hard-coded `match` arms, for example:
```rust
pub fn extension_to_language(ext: &str) -> Option<&str> {
EXTENSION_TO_LANGUAGE.get(ext).map(|s| s.as_str())
}
```
This ensures that adding a new extension only requires updating `EXTENSION_TO_LANGUAGE`, and both detection and listing will automatically stay in sync.
3. Remove or update any remaining hard-coded extension lists in this file (or others) that duplicate this mapping, so all consumers rely on `EXTENSION_TO_LANGUAGE`.
</issue_to_address>
### Comment 3
<location path="tests/property_tests.rs" line_range="31" />
<code_context>
}
- #[test]
- fn test_highlight_content_deterministic(content in ".*") {
- let config = BatlessConfig::default();
-
</code_context>
<issue_to_address>
**suggestion (testing):** Replace removed highlight property test with a property test for a remaining formatter (e.g. JSON or AST)
With `test_highlight_content_deterministic` removed, we’ve lost a useful property-based check for deterministic formatting. To retain that coverage under the new architecture, please add a proptest that, for random `content`, builds a `FileInfo` and verifies determinism for one of the remaining modes, e.g.:
- JSON mode: run the JSON formatter (or `batless --mode=json`) twice on the same input and assert identical JSON output (or identical `serde_json::Value` after parsing).
- AST mode (for a supported language): run twice and assert identical JSON, ignoring any known non-deterministic fields.
This keeps the guarantee that formatting is stable across runs, which is important for AI-facing structured outputs.
Suggested implementation:
```rust
use batless::{process_file, BatlessConfig};
use proptest::prelude::*;
use std::io::Write;
use tempfile::NamedTempFile;
use serde_json::Value;
// Property test: JSON-mode output should be deterministic for arbitrary input.
proptest! {
#[test]
fn json_mode_output_is_deterministic(content in ".*") {
// Arrange: write randomized content into a temporary file
let mut file = NamedTempFile::new().expect("failed to create temp file");
write!(file, "{}", content).expect("failed to write to temp file");
file.flush().expect("failed to flush temp file");
let path = file.path().to_path_buf();
// Arrange: configure batless for JSON/structured output
let mut config = BatlessConfig::default();
// NOTE: you need to set the mode/flags on `config` so that `process_file`
// produces JSON output. See <additional_changes> for details.
// Act: process the same file twice with the same configuration
let out1 = process_file(&path, &config).expect("first run failed");
let out2 = process_file(&path, &config).expect("second run failed");
// Assert: parsed JSON structures are identical
let v1: Value = serde_json::from_str(&out1).expect("first output is not valid JSON");
let v2: Value = serde_json::from_str(&out2).expect("second output is not valid JSON");
prop_assert_eq!(v1, v2);
}
}
```
<additional_changes>
1. Configure JSON mode on `BatlessConfig` inside `json_mode_output_is_deterministic`:
- If you have a mode enum, import it and set it, e.g.:
- `use batless::{process_file, BatlessConfig, Mode};`
- `config.mode = Mode::Json;`
- Or, if JSON is enabled via flags, set them accordingly, e.g.:
- `config.json = true;` or `config.output_format = OutputFormat::Json;`
2. Adjust the `process_file` call signature if it differs in your codebase:
- If it expects a `FileInfo` instead of a `PathBuf`, construct one from `path` and pass `&file_info`.
- If it is fallible in another way (e.g. returns `Result<String, Error>` with a different type), adapt the `expect(...)` calls.
3. Ensure `serde_json` is available in `dev-dependencies` in `Cargo.toml` (or regular `dependencies` if already used elsewhere):
- Under `[dev-dependencies]`: `serde_json = "1"`
4. If there is already a surrounding `proptest! { ... }` block in this file, you may instead want to move the new `json_mode_output_is_deterministic` test into that existing block to keep style consistent and avoid nested `proptest!` macros.
</issue_to_address>
💡 Codex Review
Lines 33 to 35 in 30ce343
AST mode advertises TSX/JSX support, but language detection never classifies .tsx or .jsx files because the extension map only covers js/ts. In those files file_info.language stays None, so AstFormatter::parse_to_tree falls back to parser: "none" and root: null instead of returning a parse tree. Adding tsx/jsx mappings here is needed for the new AST feature to work on React code.
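A possible shape for the fix, combined with the single-table idea raised earlier in the review, is sketched below. The table name `EXTENSION_TABLE` and the exact display strings are assumptions for illustration (the real map covers 39 languages and returns `Option<String>`); only a handful of entries are shown.

```rust
/// Hypothetical single source of truth shared by detection and listing.
const EXTENSION_TABLE: &[(&str, &str)] = &[
    ("rs", "Rust"),
    ("py", "Python"),
    ("js", "JavaScript"),
    ("jsx", "JSX"),
    ("ts", "TypeScript"),
    ("tsx", "TSX"),
    ("go", "Go"),
    ("java", "Java"),
];

/// Map a file extension to a language name (case-insensitive).
fn extension_to_language(extension: &str) -> Option<&'static str> {
    let ext = extension.to_lowercase();
    EXTENSION_TABLE
        .iter()
        .find(|(e, _)| *e == ext.as_str())
        .map(|(_, lang)| *lang)
}

/// Sorted, de-duplicated language list derived from the same table,
/// so a new extension cannot be detected but missing from the list.
fn list_languages() -> Vec<&'static str> {
    let mut langs: Vec<&'static str> = EXTENSION_TABLE.iter().map(|(_, l)| *l).collect();
    langs.sort_unstable();
    langs.dedup();
    langs
}
```

With `tsx` and `jsx` present in the table, both auto-detection and any validation derived from `list_languages` would accept them.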
src/language.rs
Outdated
path.extension()
    .and_then(|e| e.to_str())
    .and_then(Self::extension_to_language)
Handle extensionless filenames in language detection
The new detector only checks Path::extension(), so extensionless-but-language-specific filenames (for example Dockerfile and Makefile) are now never detected. This is a regression from the previous syntect-based path detection and causes downstream features (language-aware summaries/indexing/comment stripping) to run as unknown language on those common files.
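One way to address this is to fall back to the whole file name when the extension lookup finds nothing. The sketch below assumes a hypothetical `filename_to_language` helper, and the display names "Dockerfile" and "Makefile" are guesses rather than the names the previous syntect-based detection returned; the extension map is cut down to two entries.

```rust
use std::path::Path;

/// Hypothetical lookup for well-known extensionless file names.
fn filename_to_language(name: &str) -> Option<&'static str> {
    match name {
        "Dockerfile" => Some("Dockerfile"),
        "Makefile" | "makefile" | "GNUmakefile" => Some("Makefile"),
        _ => None,
    }
}

/// Cut-down extension map, enough to show the fallback order.
fn extension_to_language(ext: &str) -> Option<&'static str> {
    match ext.to_lowercase().as_str() {
        "rs" => Some("Rust"),
        "py" => Some("Python"),
        _ => None,
    }
}

/// Try the extension first, then the bare file name.
fn detect_language(file_path: &str) -> Option<&'static str> {
    let path = Path::new(file_path);
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(extension_to_language)
        .or_else(|| {
            path.file_name()
                .and_then(|n| n.to_str())
                .and_then(filename_to_language)
        })
}
```

For example, `detect_language("build/Makefile")` takes the fallback path because `Path::extension` returns `None` for that name.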
{
    continue;
}
collect_files_recursive(&path, out);
Prevent symlink cycles in directory index traversal
The recursive directory walker calls itself for every directory path without checking for symlinks or already-visited directories. If the indexed tree contains a symlink cycle (or repeated links), --mode=index <dir> can recurse indefinitely and eventually stack-overflow/crash. Add a symlink guard (symlink_metadata) or visited inode/path tracking before recurring.
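The `symlink_metadata` guard mentioned above might be sketched as follows. This is the simpler of the two suggested options (skip every symlink rather than track visited directories), and `collect_files_no_symlinks` is a hypothetical name, not the function in the PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files without ever following a symlink.
fn collect_files_no_symlinks(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // symlink_metadata describes the link itself rather than its target.
        let file_type = fs::symlink_metadata(&path)?.file_type();
        if file_type.is_symlink() {
            continue; // links are never entered, so a cycle cannot cause unbounded recursion
        }
        if file_type.is_dir() {
            collect_files_no_symlinks(&path, out)?;
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

The trade-off is that files reachable only through a symlink are left out of the index; tracking visited canonical paths would keep them at the cost of extra bookkeeping.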
Pull request overview
v0.6.0 release that pivots batless from terminal syntax highlighting toward AI-native, structured outputs (index / AST / JSON / summary), removing highlighting/theme/wizard functionality and consolidating formatting behind a shared Formatter trait.
Changes:
- Removed syntect-based highlighting, theme management, and the interactive configuration wizard; default output mode becomes plain.
- Consolidated all output modes behind src/formatters/* with src/formatter.rs acting as a dispatcher, and introduced --mode=ast.
- Added directory-walking behavior for --mode=index <dir> producing NDJSON (one compact JSON object per file).
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/property_tests.rs | Removes property tests tied to removed highlighting API. |
| tests/integration_tests.rs | Updates integration coverage to plain mode; removes theme/highlight tests. |
| tests/cli_documentation_tests.rs | Updates CLI behavior expectations after default mode change; removes theme docs tests. |
| tests/cli_coverage_tests.rs | Removes wizard/profile/theme CLI coverage tests that no longer apply. |
| src/wizard.rs | Removes the interactive configuration wizard module entirely. |
| src/profile.rs | Removes theme from custom profiles and stops applying theme to config. |
| src/main.rs | Adds directory index handling for --mode=index and removes theme/wizard special commands. |
| src/lib.rs | Removes highlighter/wizard exports and APIs; trims theme-related tests/exports. |
| src/language.rs | Replaces syntect-based detection with a static extension→language map; removes ThemeManager. |
| src/highlighter.rs | Removes syntect-based SyntaxHighlighter implementation entirely. |
| src/formatters/summary_formatter.rs | Adds trait-based summary formatter implementation. |
| src/formatters/plain_formatter.rs | Adds trait-based plain formatter implementation (including line numbering). |
| src/formatters/mod.rs | Registers all formatter modules (plain/json/summary/index/ast/error). |
| src/formatters/json_formatter.rs | Adds trait-based JSON formatter implementation. |
| src/formatters/error_formatter.rs | Removes highlight/theme error labels from error formatting. |
| src/formatters/ast_formatter.rs | Adds AST mode formatter (tree-sitter parse tree serialized as JSON). |
| src/formatter.rs | Replaces inline formatting logic with dispatcher to formatter modules; adds OutputMode::Ast. |
| src/error.rs | Removes highlight/theme error variants and codes. |
| src/config.rs | Removes theme from config, related defaults, merging, and tests. |
| src/config_validation.rs | Removes theme validation. |
| src/config_manager.rs | Removes theme/wizard/profile CLI flags; default output mode becomes plain; adds AST mode. |
| scripts/batless-stats | Adds a Python tool to analyze NDJSON usage logs (developer instrumentation). |
| scripts/batless-logger | Adds a bash wrapper to log batless invocations as NDJSON (developer instrumentation). |
| ROADMAP.md | Updates v0.6.0 roadmap to reflect AI-native pivot and removal of highlighting/wizard. |
| README.md | Repositions product messaging and documents new index/ast behaviors. |
| docs/USAGE_TRACKING.md | Documents the new usage logging and stats scripts. |
| CLAUDE.md | Adds guidance for AI assistants on when batless provides value vs built-in file tools. |
| CHANGELOG.md | Adds v0.6.0 changelog entry describing breaking changes and new features. |
| Cargo.toml | Bumps version to 0.6.0; removes syntect/strip-ansi-escapes; updates dirs dependency. |
| Cargo.lock | Updates lockfile accordingly (removes syntect dependency graph; bumps dirs). |
Comments suppressed due to low confidence (1)
src/language.rs:36
LanguageDetector::extension_to_language doesn't map tsx/jsx, but --mode=ast and AstFormatter expect languages TSX/JSX. As a result, .tsx/.jsx files won't be auto-detected and users can't even force --language TSX/JSX because validate_language() is derived from list_languages(). Add extension mappings for tsx/jsx (and include them in list_languages) so AST/index support for TSX/JSX is reachable.
/// Map file extensions to language names
pub fn extension_to_language(extension: &str) -> Option<String> {
let language_name = match extension.to_lowercase().as_str() {
"rs" => "Rust",
"py" => "Python",
"js" => "JavaScript",
"ts" => "TypeScript",
"go" => "Go",
"java" => "Java",
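A minimal sketch of the TSX/JSX mapping suggested in the suppressed comment on src/language.rs. It assumes a single shared table drives both lookup and validation; EXTENSION_TABLE and the bodies of list_languages and validate_language are hypothetical and not taken from the batless source. Only the names TSX and JSX come from the review comment.

```rust
// Hypothetical shared table: one source of truth for detection and validation.
// Only a handful of entries are shown; the real map is larger.
const EXTENSION_TABLE: &[(&str, &str)] = &[
    ("rs", "Rust"),
    ("py", "Python"),
    ("js", "JavaScript"),
    ("jsx", "JSX"),
    ("ts", "TypeScript"),
    ("tsx", "TSX"),
    ("go", "Go"),
    ("java", "Java"),
];

/// Map a file extension to a language name (case-insensitive).
pub fn extension_to_language(extension: &str) -> Option<String> {
    let ext = extension.to_lowercase();
    EXTENSION_TABLE
        .iter()
        .find(|(e, _)| *e == ext.as_str())
        .map(|(_, lang)| lang.to_string())
}

/// Every language name reachable through the table, sorted and deduplicated.
pub fn list_languages() -> Vec<String> {
    let mut langs: Vec<String> = EXTENSION_TABLE
        .iter()
        .map(|(_, lang)| lang.to_string())
        .collect();
    langs.sort();
    langs.dedup();
    langs
}

/// Accept a user-supplied language name only if the table can produce it.
pub fn validate_language(name: &str) -> bool {
    list_languages().iter().any(|l| l.eq_ignore_ascii_case(name))
}
```

Because validation is derived from the same table, adding an extension automatically makes the matching language name acceptable as an explicit override.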
src/language.rs
Outdated
| /// Detect the programming language from a file path (extension-based) | ||
| pub fn detect_language(file_path: &str) -> Option<String> { | ||
| let path = Path::new(file_path); | ||
|
|
||
| get_syntax_set_internal() | ||
| .find_syntax_for_file(path) | ||
| .ok() | ||
| .flatten() | ||
| .map(|syntax| syntax.name.clone()) | ||
| path.extension() | ||
| .and_then(|e| e.to_str()) | ||
| .and_then(Self::extension_to_language) | ||
| } |
detect_language() only checks path.extension(), but the mapping includes dockerfile/makefile which are typically filenames without extensions (e.g. Dockerfile, Makefile). Those will currently return None even though they appear “supported” via list_languages(). Consider special-casing path.file_name() for these names (and/or removing them from the extension map if you don’t intend to support filename-based detection).
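The filename special-casing described above could look roughly like the sketch below. The two lookup tables are trimmed stand-ins, and the exact language names returned for Dockerfile and Makefile are assumptions; only the idea of falling back to path.file_name() when the extension lookup fails comes from the comment.

```rust
use std::path::Path;

/// Trimmed stand-in for the real extension map.
fn extension_to_language(extension: &str) -> Option<String> {
    let name = match extension.to_lowercase().as_str() {
        "rs" => "Rust",
        "py" => "Python",
        _ => return None,
    };
    Some(name.to_string())
}

/// Whole-filename lookup for files that conventionally have no extension.
/// The returned names are assumptions for illustration.
fn filename_to_language(file_name: &str) -> Option<String> {
    let name = match file_name.to_lowercase().as_str() {
        "dockerfile" => "Dockerfile",
        "makefile" => "Makefile",
        _ => return None,
    };
    Some(name.to_string())
}

/// Try the extension first, then fall back to the bare filename.
pub fn detect_language(file_path: &str) -> Option<String> {
    let path = Path::new(file_path);
    path.extension()
        .and_then(|e| e.to_str())
        .and_then(extension_to_language)
        .or_else(|| {
            path.file_name()
                .and_then(|n| n.to_str())
                .and_then(filename_to_language)
        })
}
```

Checking the extension first keeps the common case cheap and means a file such as Makefile.rs would still be treated as Rust.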
| let language = file_info.language.as_deref(); | ||
| let content = file_info.lines.join("\n"); | ||
| let source = content.as_bytes(); | ||
|
|
||
| let (root_value, parser_name) = match Self::parse_to_tree(&content, language) { | ||
| Some((tree, name)) => { | ||
| let root = tree.root_node(); | ||
| (Self::node_to_json(root, source, 0), name) | ||
| } | ||
| None => (Value::Null, "none"), | ||
| }; | ||
|
|
||
| let output = json!({ | ||
| "file": file_path, | ||
| "language": language, | ||
| "mode": "ast", | ||
| "parser": parser_name, | ||
| "total_lines": file_info.total_lines, | ||
| "total_bytes": file_info.total_bytes, | ||
| "root": root_value, | ||
| }); |
AstFormatter builds content from file_info.lines, which may have been transformed by summary extraction and/or --strip-comments / --strip-blank-lines in FileProcessor::apply_post_processing. That means --mode=ast is not a raw parse tree of the original file, and the reported total_lines/total_bytes can diverge from the parsed source. Consider parsing from the original, unmodified lines (store them on FileInfo whenever post-processing mutates lines) and/or emitting both original vs parsed/processed metadata so the AST output is self-consistent.
src/main.rs
Outdated
| let Ok(entries) = std::fs::read_dir(dir) else { | ||
| return; | ||
| }; | ||
| let mut entries: Vec<_> = entries.flatten().collect(); |
collect_files_recursive silently returns on read_dir errors, which can produce incomplete directory indexes without any indication to the caller (e.g. permission issues). Since index mode is meant for automation, it’s safer to propagate the error (or emit a structured per-directory error object) so consumers can detect partial results.
| let Ok(entries) = std::fs::read_dir(dir) else { | |
| return; | |
| }; | |
| let mut entries: Vec<_> = entries.flatten().collect(); | |
| let entries = std::fs::read_dir(dir).unwrap_or_else(|error| { | |
| panic!("Failed to read directory '{}': {error}", dir.display()) | |
| }); | |
| let mut entries: Vec<_> = entries | |
| .map(|entry| { | |
| entry.unwrap_or_else(|error| { | |
| panic!( | |
| "Failed to read directory entry in '{}': {error}", | |
| dir.display() | |
| ) | |
| }) | |
| }) | |
| .collect(); |
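The suggested change above surfaces errors by panicking. An alternative that lets the caller decide is to return io::Result, sketched below. The signature change and the sort step are assumptions for illustration; the hidden-directory skip mirrors the snippet under review.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `dir`, propagating I/O errors
/// to the caller instead of silently returning or panicking.
pub fn collect_files_recursive(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    // Fail on the first unreadable directory or entry.
    let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
    // Sort for deterministic output (assumed to match the original intent).
    entries.sort_by_key(|e| e.path());

    for entry in entries {
        let path = entry.path();
        if path.is_dir() {
            // Skip hidden directories, as in the reviewed code.
            let hidden = path
                .file_name()
                .and_then(|n| n.to_str())
                .map(|n| n.starts_with('.'))
                .unwrap_or(false);
            if hidden {
                continue;
            }
            collect_files_recursive(&path, out)?;
        } else if path.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

A caller in index mode could then map the error to a non-zero exit code or a structured error line, so consumers can tell a partial index from a complete one.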
| for entry in entries { | ||
| let path = entry.path(); | ||
| if path.is_dir() { | ||
| // Skip hidden directories | ||
| if path | ||
| .file_name() | ||
| .and_then(|n| n.to_str()) | ||
| .map(|n| n.starts_with('.')) | ||
| .unwrap_or(false) | ||
| { | ||
| continue; | ||
| } | ||
| collect_files_recursive(&path, out); | ||
| } else if path.is_file() { | ||
| out.push(path); | ||
| } |
collect_files_recursive uses path.is_dir() / path.is_file() which follow symlinks. A symlinked directory can introduce cycles (infinite recursion) or unexpectedly traverse outside the requested tree. Consider using entry.file_type() / symlink_metadata to detect and skip symlinks (or track visited canonical paths/inodes) to make directory indexing safe and predictable.
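As a sketch of the symlink check described above, the helper below classifies a path with std::fs::symlink_metadata, which inspects the link itself rather than its target. EntryKind and classify are hypothetical names introduced for illustration; a walker could call classify and skip anything reported as Symlink.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What a directory entry is, without following symlinks.
#[derive(Debug, PartialEq, Eq)]
pub enum EntryKind {
    Dir,
    File,
    Symlink,
    Other,
}

/// Classify `path` using metadata of the entry itself, so a symlink to a
/// directory is reported as `Symlink`, never as `Dir`.
pub fn classify(path: &Path) -> io::Result<EntryKind> {
    let file_type = fs::symlink_metadata(path)?.file_type();
    Ok(if file_type.is_symlink() {
        EntryKind::Symlink
    } else if file_type.is_dir() {
        EntryKind::Dir
    } else if file_type.is_file() {
        EntryKind::File
    } else {
        EntryKind::Other
    })
}
```

Skipping symlinks entirely is the simplest way to rule out cycles; tracking visited canonical paths would allow following links but costs extra bookkeeping.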
| Ok(pretty) => { | ||
| // Compact the pretty JSON to a single line for NDJSON | ||
| serde_json::from_str::<serde_json::Value>(&pretty) | ||
| .and_then(|v| serde_json::to_string(&v)) | ||
| .unwrap_or(pretty) | ||
| } |
Directory index mode compacts NDJSON by parsing the already-generated JSON string (from_str -> to_string) for every file. This adds avoidable CPU/alloc overhead for large trees. A cleaner approach is to have IndexFormatter emit compact JSON directly (or return a serde_json::Value that can be serialized once), and only pretty-print for single-file index output when desired.
| Ok(pretty) => { | |
| // Compact the pretty JSON to a single line for NDJSON | |
| serde_json::from_str::<serde_json::Value>(&pretty) | |
| .and_then(|v| serde_json::to_string(&v)) | |
| .unwrap_or(pretty) | |
| } | |
| Ok(compact) => compact, |
| [dependencies] | ||
| clap = { version = "4", features = ["derive"] } | ||
| clap_complete = "4.6" | ||
| syntect = { version = "5", default-features = false, features = [ | ||
| "parsing", | ||
| "default-syntaxes", | ||
| "default-themes", | ||
| "html", | ||
| "dump-load", | ||
| "regex-onig", | ||
| ] } | ||
| is-terminal = "0.4" | ||
| termcolor = "1.4" |
PR description / issue scope says is-terminal and termcolor were removed, but both are still listed as dependencies. Also, is-terminal appears to be used only for stdout().is_terminal()-style checks, which can be done via std::io::IsTerminal on Rust 1.85. Either update the PR description to match reality, or remove is-terminal (and potentially termcolor) and switch to the std API / plain stderr output accordingly.
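For reference, the standard-library replacement mentioned above is the std::io::IsTerminal trait, stable since Rust 1.70. The sketch below shows the call shape; stdout_is_interactive and describe are hypothetical helper names, not functions from batless.

```rust
use std::io::{self, IsTerminal};

/// True when stdout is attached to a terminal rather than a pipe or file.
pub fn stdout_is_interactive() -> bool {
    io::stdout().is_terminal()
}

/// Works for any std handle that implements `IsTerminal`
/// (Stdout, Stderr, Stdin, File, ...).
pub fn describe<T: IsTerminal>(stream: &T) -> &'static str {
    if stream.is_terminal() {
        "terminal"
    } else {
        "pipe or file"
    }
}
```

With this in place the is-terminal crate would no longer be needed for simple stdout checks; whether termcolor can also go depends on how stderr output is styled, which the comment leaves open.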
scripts/batless-logger
Outdated
| --chunk-strategy=*|--language=*|--theme=*|--ai-model=*) | ||
| EXTRA_FLAGS+=("${arg%%=*}") ;; | ||
| --version|--version-json|--list-languages|--list-themes) |
batless-logger still special-cases removed flags like --theme=* and --list-themes. With v0.6.0 these invocations will always fail, and keeping them in the “known flags” set can skew usage stats and signatures. Consider removing them (or explicitly tagging them as legacy/deprecated) so the logger reflects the current CLI surface.
| --chunk-strategy=*|--language=*|--theme=*|--ai-model=*) | |
| EXTRA_FLAGS+=("${arg%%=*}") ;; | |
| --version|--version-json|--list-languages|--list-themes) | |
| --chunk-strategy=*|--language=*|--ai-model=*) | |
| EXTRA_FLAGS+=("${arg%%=*}") ;; | |
| --version|--version-json|--list-languages) |
- language: detect extensionless files (Dockerfile, Makefile, Vagrantfile, Justfile) via filename_to_language() fallback after extension lookup
- main: prevent symlink cycles in collect_files_recursive() by using symlink_metadata() and skipping symlinks entirely; log read_dir/stat errors via eprintln! instead of silently swallowing them
- ast_formatter: respect config.pretty_json flag (was always pretty-printing)
- batless-logger: remove stale --theme=* and --list-themes references removed in v0.6.0 syntect pivot
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
v0.6.0 release — pivots batless from a highlighting tool to a pure AI-native structured output tool. Four phases of work:
- --mode=highlight, --theme, and related flags. Default mode is now plain.
- Formatter trait in src/formatters/. formatter.rs is a thin dispatcher.
- batless --mode=index <dir> walks directories, emitting one compact NDJSON line per file.
- --mode=ast: Raw tree-sitter parse tree as JSON (Rust/Python/JS/TS/TSX; null root for others).
Breaking changes
- --mode=highlight removed (use bat for terminal highlighting)
- --theme, --list-themes, --configure, --list-profiles, --edit-profile removed
- highlight → plain
- syntect 5 and strip-ansi-escapes 0.2 removed from dependencies
New features
Test plan
- cargo fmt and cargo clippy clean
Closes #118
🤖 Generated with Claude Code
Summary by Sourcery
Pivot batless from a syntax-highlighting code viewer to an AI-native structured code analysis tool, removing highlighting/theming and adding new AST and directory index capabilities while consolidating formatters and updating docs and config accordingly.
New Features:
- An ast output mode that emits tree-sitter parse trees as structured JSON for supported languages.
- --mode=index can walk directories and output one NDJSON object per file.
Enhancements:
- Formatters consolidated under a Formatter trait behind a thin dispatcher.
- plain is now the default output mode.
Build:
Documentation:
Tests:
Chores: