Land real file source #23

Merged: simonsmallchua merged 3 commits into main from work/file-source on May 6, 2026

@simonsmallchua simonsmallchua commented May 6, 2026

Summary

  • The file source becomes a real implementation: reads a single text file from disk and yields its lines, with an optional encoding knob (default "utf-8", undecodable bytes replaced with U+FFFD).
  • Required option: path. Existence is checked at capture time (not config load) so log files that come and go under rotation don't block startup.
  • Supports format / format_keys with the same conflict rules as flyctl. Pairs naturally with the regex format presets shipped in Wire regex format presets into sources #21.
  • Cursor-filter compatibility documented honestly: still keys on a leading ISO-8601 timestamp, so files with non-leading-TS line shapes (Apache combined, nginx default, RFC 5424) are best run through one-shot capture + paperbark analyse rather than the long-running paperbark monitor loop. Format-aware cursor mode added to the v0.2+ shortlist.

Second item from the v0.2 shortlist in docs/ROADMAP.md.
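The cursor-filter caveat in the summary can be illustrated with a minimal sketch. The regex and the `leading_timestamp` helper below are hypothetical stand-ins, not paperbark's actual cursor code; they only show why a filter that keys on a leading ISO-8601 timestamp finds nothing to key on in Apache combined or RFC 5424 lines.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern: an ISO-8601 date-time at the very start of the line.
_LEADING_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def leading_timestamp(line: str) -> Optional[datetime]:
    """Return the leading timestamp, or None if the line does not start with one."""
    match = _LEADING_TS.match(line)
    if match is None:
        return None
    return datetime.fromisoformat(match.group(1))


# Sample lines (invented for illustration).
ISO_LINE = "2026-05-06T10:15:00Z service started"
APACHE_LINE = '127.0.0.1 - - [06/May/2026:10:15:00 +0000] "GET / HTTP/1.1" 200 512'
RFC5424_LINE = "<34>1 2026-05-06T10:15:00Z host app - - - service started"
```

Only `ISO_LINE` yields a timestamp here. The other two shapes carry their timestamp mid-line, which is the situation the summary suggests handling with one-shot capture plus `paperbark analyse`.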

Changes

  • src/paperbark/sources/file.py: real FileSource (was a NotImplementedError stub). Constructor: path, encoding, format_keys, line_format. Stateless across capture() calls.
  • src/paperbark/dispatcher.py: build_source file branch now validates path (required) + encoding, threads format / format_keys with the conflict check, attaches line_format to the source instance.
  • Docs: status table flipped, full file option section added in docs/SOURCES.md; matching #### file options block in docs/CONFIG.md; v0.2 roadmap shortlist updated; CHANGELOG.md Unreleased entry.
  • Tests: 6 new FileSource tests (yield, protocol, validation, missing file, encoding, undecodable bytes), 7 new dispatcher tests (path required/empty/non-string, unknown option, encoding threading, format preset attachment, format/format_keys conflict, end-to-end capture_iteration). Existing stub-test parametrise lists trimmed to drop file.
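As a rough picture of the behaviour described for `FileSource`, here is a minimal sketch. The class name `FileSourceSketch` is hypothetical, it omits `format_keys` and `line_format`, and stripping the trailing newline is an assumption; the real implementation lives in src/paperbark/sources/file.py and may differ.

```python
from collections.abc import Iterator


class FileSourceSketch:
    """Illustrative stand-in for a file source; not paperbark's actual class."""

    def __init__(self, *, path: str, encoding: str = "utf-8") -> None:
        # No existence check here: the file may appear later (e.g. log rotation).
        self.path = path
        self.encoding = encoding

    def capture(self) -> Iterator[str]:
        # Re-open the file on every call so no state carries across captures.
        # A missing file raises FileNotFoundError once iteration starts.
        # errors="replace" turns undecodable bytes into U+FFFD.
        with open(self.path, encoding=self.encoding, errors="replace") as fh:
            for line in fh:
                # Assumption: callers want lines without the trailing newline.
                yield line.rstrip("\n")
```

Because `capture()` is a generator, constructing the source and even calling `capture()` succeed for a missing path; the error only surfaces when the caller starts iterating, which matches the "checked at capture time" intent.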

Test plan

  • uv run ruff check src/ tests/
  • uv run ruff format --check src/ tests/
  • uv run mypy src/ tests/
  • uv run pytest -q (382 passed, +14 over previous PR)
  • uv run --with pip-audit pip-audit (no vulnerabilities)
  • uv run pre-commit run --files <touched>

Summary by CodeRabbit

  • New Features

    • File source: read a single log file from disk with configurable path and encoding (UTF‑8 default) and validated options.
    • Format presets for line parsing: json, apache-combined, nginx-default, syslog-rfc5424.
  • Documentation

    • Config, sources and roadmap docs updated with file-source options, examples and guidance.
  • Tests

    • Expanded unit and end‑to‑end tests covering file‑source behaviour, option validation and cursor handling.

coderabbitai Bot commented May 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 545b979f-32e8-4f30-be8a-23860e0c7710

📥 Commits

Reviewing files that changed from the base of the PR and between ddf9c37 and bee123a.

📒 Files selected for processing (1)
  • CHANGELOG.md

Walkthrough

Adds a concrete FileSource that reads a single on-disk text file (configurable path, encoding, optional format presets and format_keys), updates dispatcher.build_source to validate and construct FileSource with format/format_keys rules, and adds docs and tests covering the new source and config.

Changes

File Source Implementation & Integration

| Layer | File(s) | Summary |
|---|---|---|
| Public API / Types | src/paperbark/sources/file.py | Adds FileSource class with `init(*, path: str |
| Core Implementation | src/paperbark/sources/file.py | Implements capture() to re-open the file each call and yield lines; includes module docstring and stateless capture semantics aligned with cursor/dedup contract. |
| Dispatcher Integration & Validation | src/paperbark/dispatcher.py | build_source file-type branch now requires and validates path (non-empty string), threads encoding (default "utf-8"), resolves format presets to line_format, enforces format_keys only for JSON formats, and constructs FileSource(...). |
| Configuration Documentation | docs/CONFIG.md, docs/SOURCES.md | Adds a file options subsection documenting path, encoding, format, format_keys, example TOML, cursor-filter notes; removes old file-tail docs and updates built-in sources list and stubs. |
| Roadmap & Changelog | docs/ROADMAP.md, CHANGELOG.md | Documents the new real file source in Unreleased and notes format-aware cursor mode and v0.2 file-source landing. |
| Tests | tests/test_dispatcher.py, tests/test_sources.py | Adds tests validating required/typed path, empty-path rejection, unknown-option rejection, encoding threading, format preset attachment, forbid format+format_keys, and end-to-end capture processing; removes FileSource from stub parameterisations and groups file tests under v0.2 header. |
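The dispatcher checks listed in the walkthrough can be sketched roughly as follows. The function name `validate_file_options`, the error messages, and the exact conflict rule are assumptions made for illustration: this sketch rejects `format_keys` whenever a non-JSON preset is selected, while the real rule is whatever `build_source` in src/paperbark/dispatcher.py enforces. The preset names are taken from the summary above.

```python
from typing import Any

# Preset names as listed in the PR summary.
KNOWN_FORMATS = {"json", "apache-combined", "nginx-default", "syslog-rfc5424"}
ALLOWED_OPTIONS = {"path", "encoding", "format", "format_keys"}


def validate_file_options(options: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical validator mirroring the checks described for the file branch."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown file option(s): {sorted(unknown)}")

    path = options.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("file source requires a non-empty string 'path'")

    encoding = options.get("encoding", "utf-8")
    if not isinstance(encoding, str):
        raise ValueError("'encoding' must be a string")

    fmt = options.get("format")
    if fmt is not None and fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown format preset: {fmt!r}")

    format_keys = options.get("format_keys")
    # Assumed rule: format_keys only makes sense for JSON-shaped lines.
    if format_keys is not None and fmt not in (None, "json"):
        raise ValueError("'format_keys' cannot be combined with a non-JSON 'format'")

    return {"path": path, "encoding": encoding, "format": fmt, "format_keys": format_keys}
```

For example, `validate_file_options({"path": "/var/log/app.log"})` fills in the `"utf-8"` default, while omitting `path` or pairing `format_keys` with `"nginx-default"` raises `ValueError`.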

Possibly related PRs

  • Good-Native/paperbark#23: Implements the same real FileSource, dispatcher updates, docs and tests — directly related.
  • Good-Native/paperbark#21: Modifies dispatcher format-handling and line_format propagation with non-JSON format constraints.
  • Good-Native/paperbark#12: Adds generic unknown-option rejection helper to dispatcher that complements per-source option validation.
🚥 Pre-merge checks: ✅ 4 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Land real file source' clearly and concisely summarizes the main change: implementing a real file source to replace the previous stub. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 8-23: The Unreleased changelog entry uses raw bullets but should
follow the project's Keep-a-Changelog subsection pattern; insert a "### Added"
header above the existing bullet list in the "## [Unreleased]" section so the
new `file` source details (options like `path`, `encoding`, supported `format` /
`format_keys`, and cursor behavior) appear under an "Added" subsection
consistent with released entries (see other version entries for style).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b51c4a44-1145-4249-a18a-4f51f60cfab6

📥 Commits

Reviewing files that changed from the base of the PR and between 01c51f3 and ddf9c37.

📒 Files selected for processing (1)
  • CHANGELOG.md

Comment thread CHANGELOG.md
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@simonsmallchua simonsmallchua merged commit dffe794 into main May 6, 2026
4 of 5 checks passed
@simonsmallchua simonsmallchua mentioned this pull request May 6, 2026
5 tasks
@coderabbitai coderabbitai Bot mentioned this pull request May 9, 2026
6 tasks