@caplab/read

A read-only workspace access package for LLM agents and developer tooling with deterministic path handling, budget-aware file operations, and encoding-safe text processing.

Why This Package Exists

AI agents and developer tools need safe, predictable file system access. Traditional file APIs can leak outside workspace boundaries, produce unbounded output, or fail silently on encoding issues. @caplab/read provides a controlled, read-only interface with explicit budgets and deterministic behavior.

Problems It Solves

  • Workspace boundary enforcement: Prevents reading files outside the designated workspace root
  • Output budgeting: Limits bytes, lines, and file counts to prevent unbounded responses
  • Encoding safety: Auto-detects and handles UTF-8, UTF-8 BOM, UTF-16LE, UTF-16BE without silent failures
  • Binary detection: Identifies binary files and handles them predictably
  • Deterministic ordering: Same inputs produce the same outputs across runs
  • Symlink safety: Resolves symlinks to prevent boundary escape attacks
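As a rough illustration of the boundary check, the sketch below tests whether a candidate path stays inside a workspace root. `isInsideWorkspace` is a hypothetical helper, not part of @caplab/read, and the check is purely lexical: a real implementation would presumably resolve symlinks first (for example with `fs.realpath`) before comparing paths.

```typescript
import * as path from "node:path";

// Hypothetical helper, not part of @caplab/read: returns true when `candidate`
// resolves to the workspace root itself or to a location beneath it.
export function isInsideWorkspace(workspaceRoot: string, candidate: string): boolean {
	const root = path.resolve(workspaceRoot);
	// Relative candidates are resolved against the workspace root.
	const target = path.resolve(root, candidate);
	const rel = path.relative(root, target);
	if (rel === "") return true; // the root itself
	// A leading ".." segment, or an absolute result (another drive on Windows),
	// means the target lies outside the root.
	return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}
```

Comparing with `path.relative` rather than a plain string prefix avoids treating a sibling directory such as `/ws-other` as part of `/ws`.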

What It Intentionally Does Not Do

  • Write, modify, or otherwise mutate files (strictly read-only)
  • Execute code or commands
  • Support path traversal outside the workspace root
  • Handle encodings outside the UTF-8/UTF-16 family

Installation

npm install @caplab/read

ESM-Only Usage

This package is ESM-only. Use import syntax:

import { createWorkspaceReader } from "@caplab/read";

Quick Start

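import path from "node:path";
import { createWorkspaceReader } from "@caplab/read";

const reader = await createWorkspaceReader({
	// workspaceRoot is documented as a required absolute path
	workspaceRoot: path.resolve("./my-project"),
});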

// List directory
const dirResult = await reader.listDirectory("src");
console.log(dirResult);

// Read file
const fileResult = await reader.readFile("src/index.ts");
console.log(fileResult);

// Search files
for await (const match of reader.fileSearch("function")) {
	console.log(match);
}

API Overview

createWorkspaceReader(options)

Creates a configured workspace reader instance. Throws an error if workspaceRoot is invalid (does not exist or is not a directory).

interface WorkspaceReaderOptions {
	workspaceRoot: string; // Required absolute path
	maxBytes?: number; // Default: 262144 (256 KiB)
	maxLines?: number; // Default: 2000
	maxFiles?: number; // Default: 20
	maxTotalBytes?: number; // Default: 1048576 (1 MiB)
	maxEntries?: number; // Default: 1000
	maxSearchResults?: number; // Default: 200
	maxDepth?: number; // Default: 10
}
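One way to picture how the optional budgets combine with their defaults is the sketch below. The default values are copied from the interface above; `resolveBudgets` and its validation rule (positive integers only) are assumptions made for illustration, not the package's documented behavior.

```typescript
// Budget fields and defaults as listed in WorkspaceReaderOptions.
interface Budgets {
	maxBytes: number;
	maxLines: number;
	maxFiles: number;
	maxTotalBytes: number;
	maxEntries: number;
	maxSearchResults: number;
	maxDepth: number;
}

const DEFAULT_BUDGETS: Budgets = {
	maxBytes: 262144,
	maxLines: 2000,
	maxFiles: 20,
	maxTotalBytes: 1048576,
	maxEntries: 1000,
	maxSearchResults: 200,
	maxDepth: 10,
};

// Hypothetical helper: overlays caller-supplied budgets on the defaults.
// Rejecting non-positive or non-integer values is an assumption of this sketch.
export function resolveBudgets(overrides: Partial<Budgets> = {}): Budgets {
	const resolved: Budgets = { ...DEFAULT_BUDGETS };
	for (const key of Object.keys(DEFAULT_BUDGETS) as (keyof Budgets)[]) {
		const value = overrides[key];
		if (value === undefined) continue; // keep the default
		if (!Number.isInteger(value) || value <= 0) {
			throw new RangeError(`${key} must be a positive integer, got ${value}`);
		}
		resolved[key] = value;
	}
	return resolved;
}
```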

listDirectory(path, options)

Lists files and directories with deterministic ordering. Returns absolute path and normalized relativePath for each entry.

interface ListDirectoryOptions {
	maxDepth?: number; // Default: unlimited
	includeHidden?: boolean; // Default: false
	include?: string[]; // Glob patterns
	exclude?: string[]; // Glob patterns
	maxEntries?: number; // Default: 1000
}
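Deterministic ordering usually means sorting with a comparison that does not depend on the host locale or on the order the file system returns entries. The sketch below shows one such approach, comparing `relativePath` by UTF-16 code units; the `Entry` shape and `sortEntries` are hypothetical, and the package's actual sort key may differ.

```typescript
// Hypothetical entry shape; the real result type may carry more fields.
interface Entry {
	relativePath: string;
	kind: "file" | "directory";
}

// Compares by UTF-16 code units, so the result is independent of locale settings.
function compareCodeUnits(a: string, b: string): number {
	return a < b ? -1 : a > b ? 1 : 0;
}

export function sortEntries(entries: readonly Entry[]): Entry[] {
	// Copy first so the caller's array is not mutated.
	return [...entries].sort((x, y) => compareCodeUnits(x.relativePath, y.relativePath));
}
```

Avoiding `localeCompare` is the point of the design: its output can vary between machines with different locale data.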

readFile(path, options)

Reads a single file with byte and line budgets. Returns absolute path and normalized relativePath. Supported encodings: UTF-8, UTF-8 BOM, UTF-16LE, UTF-16BE.

interface ReadFileOptions {
	maxBytes?: number; // Default: 262144
	maxLines?: number; // Default: 2000
	encoding?: "utf-8" | "utf-16le" | "utf-16be" | "auto"; // Default: "auto"
}
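For the "auto" setting, a common detection strategy is to inspect the byte order mark (BOM) at the start of the file. The sketch below illustrates that idea only; `detectEncodingFromBom` and the `"utf-8-bom"` label are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption rather than the package's confirmed behavior.

```typescript
type DetectedEncoding = "utf-8" | "utf-8-bom" | "utf-16le" | "utf-16be";

// Hypothetical helper: classifies a buffer by its leading byte order mark.
export function detectEncodingFromBom(bytes: Uint8Array): DetectedEncoding {
	// UTF-8 BOM: EF BB BF
	if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
		return "utf-8-bom";
	}
	// UTF-16 little-endian BOM: FF FE
	if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
		return "utf-16le";
	}
	// UTF-16 big-endian BOM: FE FF
	if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
		return "utf-16be";
	}
	// No BOM: this sketch assumes UTF-8.
	return "utf-8";
}
```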

readMultipleFiles(paths, options)

Reads multiple files with bounded parallelism and deterministic admission. Returns results with absolute path and normalized relativePath for each file.

interface ReadMultipleFilesOptions {
	maxFiles?: number; // Default: 20
	maxBytes?: number; // Default: 262144 (per-file)
	maxTotalBytes?: number; // Default: 1048576
	maxConcurrency?: number; // Default: 5
}
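"Deterministic admission" can be read as: decide which files are read in input order, before any parallel I/O starts, so the outcome does not depend on which read finishes first. The sketch below models that reading. `admitFiles`, the `Candidate` shape, and the rule that an over-budget file is skipped while later, smaller files may still be admitted are all assumptions; only the two status codes come from the error model in this README.

```typescript
type Admission =
	| { path: string; status: "admitted" }
	| { path: string; status: "skipped_due_to_max_files" }
	| { path: string; status: "skipped_due_to_budget" };

// Hypothetical input: a path plus the number of bytes reading it would consume.
interface Candidate {
	path: string;
	size: number;
}

export function admitFiles(
	candidates: readonly Candidate[],
	maxFiles = 20,
	maxTotalBytes = 1048576,
): Admission[] {
	let admitted = 0;
	let usedBytes = 0;
	// Walk candidates strictly in input order so the decision is reproducible.
	return candidates.map((c): Admission => {
		if (admitted >= maxFiles) {
			return { path: c.path, status: "skipped_due_to_max_files" };
		}
		if (usedBytes + c.size > maxTotalBytes) {
			return { path: c.path, status: "skipped_due_to_budget" };
		}
		admitted += 1;
		usedBytes += c.size;
		return { path: c.path, status: "admitted" };
	});
}
```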

fileSearch(query, options)

Searches files through an adapter over @caplab/grep-search. It is not a second search engine: it delegates search behavior to @caplab/grep-search and adds workspace boundary enforcement and budget limits.

interface FileSearchOptions {
	regex?: boolean;
	wholeWord?: boolean;
	caseSensitive?: boolean;
	multiline?: boolean;
	extensions?: string[];
	ignore?: string[];
	maxDepth?: number;
	beforeContext?: number;
	afterContext?: number;
	maxResults?: number; // Default: 200
}

Default Budgets and Limits

  • readFile.maxBytes: 262144 (256 KiB)
  • readFile.maxLines: 2000
  • readMultipleFiles.maxFiles: 20
  • readMultipleFiles.maxTotalBytes: 1048576 (1 MiB)
  • listDirectory.maxEntries: 1000
  • fileSearch.maxResults: 200
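To make the byte and line budgets concrete, here is a minimal sketch of how such limits could be applied to file content. Both helpers are hypothetical; in particular, backing up to a UTF-8 character boundary and reporting a `truncated` flag are assumptions about reasonable behavior, not a description of the package's internals.

```typescript
interface TruncatedText {
	text: string;
	truncated: boolean;
}

// Keeps at most `maxLines` lines of the text.
export function applyLineBudget(text: string, maxLines = 2000): TruncatedText {
	const lines = text.split("\n");
	if (lines.length <= maxLines) return { text, truncated: false };
	return { text: lines.slice(0, maxLines).join("\n"), truncated: true };
}

// Keeps at most `maxBytes` bytes of UTF-8 data without splitting a character.
export function applyByteBudget(bytes: Uint8Array, maxBytes = 262144): Uint8Array {
	if (bytes.length <= maxBytes) return bytes;
	let end = maxBytes;
	// If the first excluded byte is a continuation byte (10xxxxxx), the cut would
	// land inside a multi-byte sequence, so back up to the start of that sequence.
	while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end -= 1;
	return bytes.subarray(0, end);
}
```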

Error Model

Expected operational failures return structured result objects with explicit error codes:

  • file_not_found: File does not exist
  • permission_denied: Insufficient permissions
  • binary_file: File detected as binary
  • unsupported_encoding: Encoding not supported
  • path_outside_workspace: Path outside workspace root
  • skipped_due_to_budget: Skipped due to maxTotalBytes exhaustion
  • skipped_due_to_max_files: Skipped due to maxFiles limit

Only invalid API usage or invalid configuration throws exceptions.
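Because expected failures arrive as result objects rather than exceptions, callers can branch on the error code. The sketch below shows that pattern with a discriminated union. The error codes are the ones listed above, but the `ReadResult` shape (`ok`, `content`, `error`) is hypothetical and the real field names may differ.

```typescript
// Error codes as listed in the error model.
type ReadErrorCode =
	| "file_not_found"
	| "permission_denied"
	| "binary_file"
	| "unsupported_encoding"
	| "path_outside_workspace"
	| "skipped_due_to_budget"
	| "skipped_due_to_max_files";

// Hypothetical result shape used only for this illustration.
type ReadResult =
	| { ok: true; relativePath: string; content: string }
	| { ok: false; relativePath: string; error: ReadErrorCode };

export function summarize(result: ReadResult): string {
	if (result.ok) {
		return `${result.relativePath}: ${result.content.length} chars`;
	}
	switch (result.error) {
		case "skipped_due_to_budget":
		case "skipped_due_to_max_files":
			// Budget skips are recoverable: the caller can retry with larger limits.
			return `${result.relativePath}: skipped (${result.error})`;
		case "path_outside_workspace":
			return `${result.relativePath}: rejected, path is outside the workspace`;
		default:
			return `${result.relativePath}: failed (${result.error})`;
	}
}
```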

Hard Guarantees / Invariants

  1. Read-only guarantee: The package never mutates workspace state
  2. Deterministic ordering: Same input produces same output across runs
  3. Fail-closed boundaries: Symlink resolution prevents workspace escape
  4. Encoding safety: Text decoding is explicit and predictable
  5. Output budgeting: Large files and directory trees don't create unbounded output
  6. Binary safety: Binary files are detected and handled predictably
  7. Path consistency: All operations return absolute path and normalized relativePath
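For the binary-safety invariant, one widely used heuristic is to scan the first few kilobytes of a file for a NUL byte. The sketch below shows that heuristic only; `looksBinary` and the 8192-byte sample size are assumptions, and the package may use a different test. Note that UTF-16 text legitimately contains NUL bytes, so a check like this would need to run after encoding detection.

```typescript
// Hypothetical helper: treats a buffer as binary if a NUL byte appears
// within the first `sampleSize` bytes.
export function looksBinary(bytes: Uint8Array, sampleSize = 8192): boolean {
	const limit = Math.min(bytes.length, sampleSize);
	for (let i = 0; i < limit; i++) {
		if (bytes[i] === 0) return true;
	}
	return false;
}
```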

Relationship to @caplab/grep-search

fileSearch is an adapter over @caplab/grep-search. It:

  • Delegates search behavior to @caplab/grep-search
  • Enforces the workspace boundary via the cwd parameter
  • Applies default maxResults budget
  • Forwards search options (regex, caseSensitive, etc.) via allowlist
  • Does not implement a separate search engine
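Allowlist forwarding can be sketched as copying only known option keys and dropping everything else, so a caller cannot override something like `cwd`. In the sketch below, the key list mirrors the fields of FileSearchOptions; `pickSearchOptions` itself and the exact contents of the allowlist are assumptions, not the adapter's actual code.

```typescript
// Keys mirrored from FileSearchOptions; the adapter's real allowlist may differ.
const FORWARDED_KEYS = [
	"regex",
	"wholeWord",
	"caseSensitive",
	"multiline",
	"extensions",
	"ignore",
	"maxDepth",
	"beforeContext",
	"afterContext",
	"maxResults",
] as const;

type ForwardedKey = (typeof FORWARDED_KEYS)[number];

// Hypothetical helper: copies allowlisted options and applies the default maxResults.
export function pickSearchOptions(
	input: Record<string, unknown>,
	defaultMaxResults = 200,
): Partial<Record<ForwardedKey, unknown>> {
	const out: Partial<Record<ForwardedKey, unknown>> = {};
	for (const key of FORWARDED_KEYS) {
		if (input[key] !== undefined) out[key] = input[key];
	}
	// Any key not in the allowlist (for example "cwd") is silently dropped.
	if (out.maxResults === undefined) out.maxResults = defaultMaxResults;
	return out;
}
```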

Node Version Requirement

Node.js >= 20.0.0

License

MIT License - see LICENSE file for details
