Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -132,7 +132,7 @@ These are enforced by CI. Non-negotiable.
- **One concern per module.** If you're adding >50 lines of new logic to an existing file, it belongs in its own module. cycle.py handles cycle logic — not scoring. cli.py handles CLI — not business logic.
- **No hardcoded data in logic files.** Regex patterns, score maps, category weights, thresholds — these go in `constants.py` or a dedicated `*_patterns.py`. Logic files import them.
- **New module checklist:** create the `.py` file, add to `__init__.py` re-exports, add to `scripts/install.sh` PACKAGE_FILES, add to this file's structure tree.
- **Follow the dependency flow:** `types -> constants -> errors -> shell -> config/state -> worktree -> cycle -> scoring -> costs -> cleanup -> multi -> profiler -> planner -> decomposer -> subagent -> integrator -> feature -> cli`. New modules slot into this chain. No circular imports. (`multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
- **Follow the dependency flow:** `types -> constants -> errors -> shell -> config/state -> worktree -> cycle -> scoring -> costs -> cleanup/compact -> multi -> profiler -> planner -> decomposer -> subagent -> integrator -> feature -> cli`. New modules slot into this chain. No circular imports. (`multi.py` uses a late import of `run_nightshift` from `cli.py` to avoid circular deps.)
- **Functions over inline code.** If a block of code does one thing and is >10 lines, extract it into a named function. The function name documents the intent.
- **Config over magic numbers.** If a value might change (thresholds, limits, timeouts), put it in `DEFAULT_CONFIG` and `types.py`, not inline.

2 changes: 2 additions & 0 deletions docs/changelog/v0.0.7.md
@@ -22,6 +22,7 @@ Security hardening for running against untrusted target repositories.
- **[feat]** Interactive daemon setup: running any daemon script with no arguments now prompts for agent choice (claude/codex) and duration (2h/4h/6h/8h/unlimited), with a confirmation step before starting. Duration converts to a max-sessions count (hours * 2, assuming 30-min sessions). Passing arguments directly (e.g., `daemon.sh claude 60 10`) skips prompts for backward compatibility. Shared functions `interactive_setup()` and `interactive_setup_strategist()` live in `lib-agent.sh`. All 4 daemon scripts updated. (task #0034)
- **[feat]** Cost tracking and budget ceiling for daemon sessions. After each session, the daemon parses token usage from the stream-json log (input, cache creation, cache read, output), calculates USD cost using model-specific pricing, and appends to a cumulative ledger (`docs/sessions/costs.json`). Session index now includes a cost column. Set `NIGHTSHIFT_BUDGET=50` to auto-stop the daemon when cumulative spend exceeds the limit. Interactive setup includes a budget prompt ($25/$50/$100/unlimited). Supports Claude Opus, Sonnet, and Haiku pricing. New Python module `nightshift/costs.py` with 32 tests. All 3 looping daemon scripts instrumented. (task #0026)
- **[feat]** Daemon log rotation and orphan branch pruning. At the start of each daemon cycle, old session logs (older than 7 days) are automatically deleted, and remote branches created by nightshift that have no open PR are pruned. The log retention period is configurable via the `NIGHTSHIFT_KEEP_LOGS` env var (default: 7 days). Only branches matching daemon prefixes (`feat/`, `fix/`, `docs/`, `refactor/`, `release/`, `test/`) are pruned; protected branches (`main`, `master`, `develop`) are never touched. New Python module `nightshift/cleanup.py` with 31 tests. All 3 looping daemon scripts instrumented. (task #0027)
- **[feat]** Automated handoff compaction in daemon. At the start of each daemon cycle, numbered handoff files (`docs/handoffs/NNNN.md`) are counted. When 7+ files exist, they are automatically compacted into a weekly summary in `docs/handoffs/weekly/` and the originals are deleted. Prevents context bloat from accumulating handoffs. New Python module `nightshift/compact.py` with 14 tests. All 3 looping daemon scripts instrumented. (task #0030)

- **[security]** Instruction file size cap: `read_repo_instructions()` now enforces a per-file limit (`MAX_INSTRUCTION_FILE_BYTES`, 10 KB) and a total limit across all files (`MAX_INSTRUCTION_TOTAL_BYTES`, 30 KB). Files exceeding the per-file limit are truncated; files that would exceed the total budget are either partially truncated or skipped entirely. Truncation warnings are injected into the output so the agent knows content was cut. Prevents malicious repos from flooding the agent's context window with oversized instruction files. (`nightshift/constants.py`, `nightshift/cycle.py`; task #0036)
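
The per-file and total caps in the `[security]` entry above can be pictured with a minimal sketch. This is not the code in `nightshift/cycle.py`: the helper name `apply_size_caps`, its return shape, and the reading of 10 KB / 30 KB as multiples of 1024 bytes are all assumptions made for illustration, and the truncation warnings the changelog mentions are left out.

```python
# Illustrative sketch only; the real read_repo_instructions() may differ.
# Assumption: "10 KB" and "30 KB" mean multiples of 1024 bytes.
MAX_INSTRUCTION_FILE_BYTES = 10 * 1024
MAX_INSTRUCTION_TOTAL_BYTES = 30 * 1024


def apply_size_caps(
    files: list[tuple[str, bytes]],
    per_file: int = MAX_INSTRUCTION_FILE_BYTES,
    total: int = MAX_INSTRUCTION_TOTAL_BYTES,
) -> list[tuple[str, bytes, str]]:
    """Hypothetical helper: return (name, kept_bytes, status) per file.

    status is "full", "truncated", or "skipped".
    """
    remaining = total
    result: list[tuple[str, bytes, str]] = []
    for name, data in files:
        if remaining <= 0:
            # The total budget is spent, so this file is dropped entirely.
            result.append((name, b"", "skipped"))
            continue
        # A file may use at most the per-file cap and at most what is left.
        allowed = min(per_file, remaining)
        kept = data[:allowed]
        status = "full" if len(kept) == len(data) else "truncated"
        remaining -= len(kept)
        result.append((name, kept, status))
    return result
```

With the default limits, three oversized files each contribute 10 KB and a fourth file is skipped, which matches the "partially truncated or skipped entirely" behavior described in the entry.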

@@ -44,3 +45,4 @@ Security hardening for running against untrusted target repositories.
- 4 new tests for prompt guard new-file detection (bash subprocess tests): new file in existing directory, new file when directory was empty, no false positives when unchanged, new directory created during cycle
- 19 new tests covering `read_repo_instructions()` (empty dir, single file, multiple files, missing files, empty files, content preservation, nested paths), `wrap_repo_instructions()` (empty input, whitespace, preamble/suffix ordering, behavioral warnings), and prompt injection protection in `build_prompt()` (omitted when empty, wrapped when present, old instruction removed, shift log preserved, ordering, adversarial content handling)
- 20 new tests for Codex/OpenAI cost tracking: `parse_session_tokens()` Codex format (turn.completed parsing, model_hint fallback, no-cached tokens, missing usage, mixed events, missing file with hint), `calculate_cost()` for gpt-5.4/mini/nano pricing (input, output, cache read, mixed), `record_session()` with Codex agent (default model resolution, non-zero cost), constants (pricing entries, AGENT_DEFAULT_MODELS)
- 14 new tests for handoff compaction: below threshold no-op, at/above threshold compacts, weekly summary format, decisions/issues preservation, ignores non-numbered files, duplicate weekly suffix, nonexistent/empty dir, custom threshold, weekly dir auto-creation, session range ordering, ISO week in filename
47 changes: 47 additions & 0 deletions docs/handoffs/0027.md
@@ -0,0 +1,47 @@
# Handoff #0027
**Date**: 2026-04-04
**Version**: v0.0.7 in progress

## What I Built
- **Task #0030** (Automated handoff compaction in daemon): New Python module `nightshift/compact.py` with `compact_handoffs()` function that auto-compacts numbered handoff files into weekly summaries when 7+ accumulate. Shell wrapper `compact_handoffs()` in `lib-agent.sh` called by all 3 looping daemon scripts (builder, reviewer, overseer) before each cycle. Parses handoff markdown to extract session numbers, dates, versions, what was built, decisions, known issues, and state. Generates weekly summaries following the exact format in `docs/handoffs/README.md`. Handles duplicate weekly filenames with letter suffixes.
- Also merged PR #42 (README rewrite) from previous session.
- Files created: `nightshift/compact.py`
- Files modified: `nightshift/types.py`, `nightshift/constants.py`, `nightshift/__init__.py`, `scripts/lib-agent.sh`, `scripts/daemon.sh`, `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh`, `scripts/install.sh`, `CLAUDE.md`, `tests/test_nightshift.py`
- Tests: +14 new, 650 total passing

## Decisions Made
- Compaction logic in Python (not bash) for testability and markdown parsing robustness
- `compact.py` at same dependency level as `cleanup.py` (depends only on constants and types)
- Threshold is a constant (`HANDOFF_COMPACTION_THRESHOLD = 7`) in constants.py
- Weekly filename collision handled with letter suffixes (b, c, d...) matching existing pattern (week-2026-W14b.md exists)
- Summary uses "Start/End of batch" format for progress section to preserve both endpoints
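
The threshold check and the letter-suffix rule from the decisions above can be sketched roughly as follows. The function names, the `^\d{4}\.md$` reading of `NNNN.md`, and the unsuffixed base name `week-YYYY-Www.md` are assumptions for illustration; only the threshold value of 7 and the `week-2026-W14b.md` example come from this handoff, and the actual `compact.py` may be organized differently.

```python
import re
from pathlib import Path

HANDOFF_COMPACTION_THRESHOLD = 7  # value stated in this handoff
# Assumed reading of the NNNN.md naming pattern: exactly four digits.
NUMBERED_HANDOFF = re.compile(r"^\d{4}\.md$")


def numbered_handoffs(handoff_dir: Path) -> list[Path]:
    """Hypothetical helper: list numbered handoff files, ignoring others."""
    if not handoff_dir.is_dir():
        return []
    return sorted(
        p for p in handoff_dir.iterdir()
        if p.is_file() and NUMBERED_HANDOFF.match(p.name)
    )


def needs_compaction(
    handoff_dir: Path, threshold: int = HANDOFF_COMPACTION_THRESHOLD
) -> bool:
    """True once the number of numbered handoffs reaches the threshold."""
    return len(numbered_handoffs(handoff_dir)) >= threshold


def weekly_summary_path(weekly_dir: Path, year: int, week: int) -> Path:
    """Pick an unused weekly filename, adding b, c, d... on collision."""
    base = f"week-{year}-W{week:02d}"
    candidate = weekly_dir / f"{base}.md"
    if not candidate.exists():
        return candidate
    for suffix in "bcdefghijklmnopqrstuvwxyz":
        candidate = weekly_dir / f"{base}{suffix}.md"
        if not candidate.exists():
            return candidate
    raise FileExistsError(f"no free weekly filename for {base}")
```

Files such as `LATEST.md` or `README.md` do not match the numbered pattern, so they never count toward the threshold in this sketch.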

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
- v0.0.6 release not yet tagged
- Codex `.git/` sandbox issue untested
- OpenAI pricing should be re-verified periodically; rates change

## Current State
- Loop 1: 100% (22/22)
- Loop 2: 63% (7/11) -- unchanged
- Self-Maintaining: 54% (7/13) -- unchanged
- Meta-Prompt: 57% (4/7) -- unchanged
- Overall: 76% (weighted) -- unchanged
- Version: v0.0.7 in progress
- Test count: 650

## Evaluate
Run evaluation against Phractal for the changes merged this session.

## Next Session Should
Tasks: #0029, #0031, #0033
1. **Task #0029** (normal) -- Pin Phractal to a known-good commit for eval stability
2. **Task #0031** (normal) -- Task queue vision-alignment check
3. **Task #0033** (normal) -- Learnings verification in status report

## Where to Look
- `nightshift/compact.py` -- entire module, compaction logic
- `scripts/lib-agent.sh` lines 230-255 -- compact_handoffs shell wrapper
- `scripts/daemon.sh` line 125 -- compaction call in daemon loop
- `nightshift/constants.py` line 633 -- HANDOFF_COMPACTION_THRESHOLD
40 changes: 20 additions & 20 deletions docs/handoffs/LATEST.md
@@ -1,20 +1,20 @@
# Handoff #0026
# Handoff #0027
**Date**: 2026-04-04
**Version**: v0.0.7 in progress

## What I Built
- **Task #0039** (Codex/OpenAI model pricing in cost tracker): Three changes:
1. Added pricing for gpt-5.4, gpt-5.4-mini, gpt-5.4-nano to `MODEL_PRICING` in constants.py. OpenAI cached input is ~10% of full input price (vs Claude's ~10% for cache read). `cache_creation` mirrors `input` since OpenAI has no separate cache-creation concept.
2. Updated `parse_session_tokens()` in costs.py to handle Codex `turn.completed` events. Key difference: OpenAI `input_tokens` includes cached tokens (Claude separates them). Parser subtracts `cached_input_tokens` from `input_tokens` so our data model stores non-cached input at full rate.
3. Added `AGENT_DEFAULT_MODELS` mapping and `model_hint` parameter. Codex logs don't include model identifiers, so `record_session()` uses the agent name to look up a default model. No daemon script changes needed.
- Files modified: `nightshift/constants.py`, `nightshift/costs.py`, `nightshift/__init__.py`, `tests/test_nightshift.py`, `docs/changelog/v0.0.7.md`
- Tests: +20 new, 636 total passing
- **Task #0030** (Automated handoff compaction in daemon): New Python module `nightshift/compact.py` with `compact_handoffs()` function that auto-compacts numbered handoff files into weekly summaries when 7+ accumulate. Shell wrapper `compact_handoffs()` in `lib-agent.sh` called by all 3 looping daemon scripts (builder, reviewer, overseer) before each cycle. Parses handoff markdown to extract session numbers, dates, versions, what was built, decisions, known issues, and state. Generates weekly summaries following the exact format in `docs/handoffs/README.md`. Handles duplicate weekly filenames with letter suffixes.
- Also merged PR #42 (README rewrite) from previous session.
- Files created: `nightshift/compact.py`
- Files modified: `nightshift/types.py`, `nightshift/constants.py`, `nightshift/__init__.py`, `scripts/lib-agent.sh`, `scripts/daemon.sh`, `scripts/daemon-review.sh`, `scripts/daemon-overseer.sh`, `scripts/install.sh`, `CLAUDE.md`, `tests/test_nightshift.py`
- Tests: +14 new, 650 total passing

## Decisions Made
- Priced gpt-5.4 at $2.50 in / $0.25 cached / $15.00 out per MTok (from OpenAI pricing page, April 2026)
- `cache_creation` rate set equal to `input` for OpenAI models -- it's never used (Codex logs have no cache-creation tokens) but the field must exist for `calculate_cost()` to work
- `AGENT_DEFAULT_MODELS` kept as a simple dict in constants.py rather than deriving from `DEFAULT_CONFIG` -- more explicit, less coupling
- Used `max(0, raw_input - cached)` to guard against malformed Codex logs where cached > total
- Compaction logic in Python (not bash) for testability and markdown parsing robustness
- `compact.py` at same dependency level as `cleanup.py` (depends only on constants and types)
- Threshold is a constant (`HANDOFF_COMPACTION_THRESHOLD = 7`) in constants.py
- Weekly filename collision handled with letter suffixes (b, c, d...) matching existing pattern (week-2026-W14b.md exists)
- Summary uses "Start/End of batch" format for progress section to preserve both endpoints

## Known Issues
- Task #0012 (Phractal re-validation) still pending -- needs API access
@@ -29,19 +29,19 @@
- Meta-Prompt: 57% (4/7) -- unchanged
- Overall: 76% (weighted) -- unchanged
- Version: v0.0.7 in progress
- Test count: 636
- Test count: 650

## Evaluate
Run evaluation against Phractal for the changes merged this session.

## Next Session Should
Tasks: #0038, #0029, #0030
1. **Task #0038** (low) -- Clean up non-ASCII box-drawing chars in shell scripts
2. **Task #0029** (normal) -- Pin Phractal to a known-good commit for eval stability
3. **Task #0030** (normal) -- Enforce handoff compaction in daemon
Tasks: #0029, #0031, #0033
1. **Task #0029** (normal) -- Pin Phractal to a known-good commit for eval stability
2. **Task #0031** (normal) -- Task queue vision-alignment check
3. **Task #0033** (normal) -- Learnings verification in status report

## Where to Look
- `nightshift/constants.py` lines 575-616 -- MODEL_PRICING with new OpenAI entries + AGENT_DEFAULT_MODELS
- `nightshift/costs.py` lines 13-95 -- parse_session_tokens() with Codex turn.completed handling
- `nightshift/costs.py` lines 137-138 -- record_session() model_hint integration
- `tests/test_nightshift.py` -- TestParseCodexSessionTokens, TestCalculateCostOpenAI, TestRecordSessionCodex
- `nightshift/compact.py` -- entire module, compaction logic
- `scripts/lib-agent.sh` lines 230-255 -- compact_handoffs shell wrapper
- `scripts/daemon.sh` line 125 -- compaction call in daemon loop
- `nightshift/constants.py` line 633 -- HANDOFF_COMPACTION_THRESHOLD
11 changes: 11 additions & 0 deletions docs/learnings/2026-04-04-ruff-autofix-import-sorting.md
@@ -0,0 +1,11 @@
# Learning: Let ruff auto-fix import sorting in __init__.py

**Date**: 2026-04-04
**Session**: 0027
**Type**: optimization

## What Happened
When adding a new module (`compact.py`) to `__init__.py`, manually inserting imports and `__all__` entries in alphabetical order required 5+ edits and I still got the order wrong (e.g., `CompactionResult` before `Baseline`, `HANDOFF_COMPACTION_THRESHOLD` before `FRONTEND_EXTENSIONS`). The ruff I001 import-sort rule has very specific ordering expectations.

## Lesson
After adding new imports and `__all__` entries to `__init__.py`, run `python3 -m ruff check --fix nightshift/__init__.py` to let ruff auto-sort everything. This is faster and more reliable than manually maintaining alphabetical order in a 370-line file. Do the manual insertion approximately right, then let ruff fix it.
1 change: 1 addition & 0 deletions docs/ops/OPERATIONS.md
@@ -273,6 +273,7 @@ The Python package that IS Nightshift. The overnight hardening runner.
| `scoring.py` | Post-cycle diff scoring | `score_diff()`, `diff_line_score()`, `has_test_files()` |
| `costs.py` | Session cost tracking | `record_session()`, `parse_session_tokens()`, `calculate_cost()`, `read_ledger()`, `write_ledger()`, `total_cost()`, `format_session_cost()`, `default_ledger_path()` |
| `cleanup.py` | Daemon housekeeping | `rotate_logs()`, `prune_orphan_branches()` |
| `compact.py` | Handoff compaction | `compact_handoffs()` |
| `multi.py` | Multi-repo orchestration | `run_multi_shift()`, `validate_repos()`, `format_multi_summary()` |
| `profiler.py` | Repo analysis for Loop 2 | `profile_repo()` |
| `planner.py` | Feature planning for Loop 2 | `build_plan_prompt()`, `validate_plan()`, `parse_plan()`, `execution_order()`, `format_plan()`, `scope_check()` |
14 changes: 8 additions & 6 deletions docs/tasks/0030.md
@@ -1,9 +1,9 @@
---
status: pending
status: done
priority: normal
target: v0.0.7
created: 2026-04-03
completed:
completed: 2026-04-04
---

# Enforce handoff compaction in daemon
@@ -13,7 +13,9 @@ Compaction triggers at 7 handoff files but it's just an instruction to the agent
Fix: add a pre-session step in daemon.sh that counts handoff files. If 7+ exist, auto-compact them before spawning the agent (concatenate, extract key info, write weekly summary, delete originals).

## Acceptance Criteria
- [ ] daemon.sh counts handoff files before each cycle
- [ ] Auto-compaction runs when 7+ numbered files exist
- [ ] Weekly summary written to docs/handoffs/weekly/
- [ ] Original files deleted after compaction

- [x] daemon.sh counts handoff files before each cycle
- [x] Auto-compaction runs when 7+ numbered files exist
- [x] Weekly summary written to docs/handoffs/weekly/
- [x] Original files deleted after compaction

6 changes: 3 additions & 3 deletions docs/tasks/0043.md
@@ -1,10 +1,10 @@
---
status: pending

## status: pending
priority: normal
target: v0.0.7
created: 2026-04-04
completed:
---

# Create CONTRIBUTING.md — agent-to-agent collaboration protocol

@@ -15,7 +15,6 @@ This repo is built by autonomous agents (the Nightshift daemons). When an extern
Two audiences, both agents:

1. **The resident agent** (Nightshift daemon) — needs to understand that an external contribution is happening, not get confused by unfamiliar branches/PRs, and know how to review and integrate external work without breaking its own workflow.

2. **The contributing agent** (someone else's Claude/Codex/etc.) — needs to understand the repo's conventions, quality gates, and workflow well enough to produce a PR that the resident agent can process without human intervention.

The human's role is oversight: they review, approve, merge. But the agents on both sides need to be able to do the actual work.
@@ -57,3 +56,4 @@ The agent building this decides the structure. Don't copy-paste from CLAUDE.md
- Structured for fast LLM parsing
- Does not duplicate CLAUDE.md content
- Reviewed and merged via standard PR workflow

19 changes: 19 additions & 0 deletions docs/tasks/0044.md
@@ -0,0 +1,19 @@
---
status: pending
priority: low
target: v0.0.8
created: 2026-04-04
completed:
---

# Add _ParsedHandoff TypedDict for compact.py internal type

Flagged by code review on PR #47.

`compact.py:_parse_handoff()` returns `dict[str, str]` with a fixed schema (session, date, version, built, decisions, known_issues, state). Project convention requires TypedDicts for all data structures. Since this is internal-only, severity is low.

## Acceptance Criteria
- [ ] `_ParsedHandoff` TypedDict defined (either in types.py or locally in compact.py)
- [ ] `_parse_handoff()` return type annotated as `_ParsedHandoff`
- [ ] `_build_weekly_summary()` parameter type updated to use `_ParsedHandoff`
- [ ] mypy passes
19 changes: 19 additions & 0 deletions docs/tasks/0045.md
@@ -0,0 +1,19 @@
---
status: pending
priority: low
target: v0.0.8
created: 2026-04-04
completed:
---

# Fix shell injection in cleanup_old_logs and cleanup_orphan_branches

Flagged by code review on PR #47 (pattern also exists in cleanup functions).

`lib-agent.sh` functions `cleanup_old_logs()` and `cleanup_orphan_branches()` interpolate shell variables directly into Python `-c` strings. While the values come from controlled sources (REPO_DIR), this is a code quality issue. The `compact_handoffs()` function was fixed to use heredoc with `sys.argv`, but the existing cleanup functions still use the old pattern.

## Acceptance Criteria
- [ ] `cleanup_old_logs()` uses heredoc with sys.argv instead of shell interpolation
- [ ] `cleanup_orphan_branches()` uses heredoc with sys.argv instead of shell interpolation
- [ ] Shell script syntax passes (`bash -n`)
- [ ] Daemon still works (test by running `make daemon` briefly)
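
The principle behind this task, a fixed script that reads its inputs from `sys.argv` rather than having values spliced into the `-c` string, can be illustrated in Python. This sketch does not show the `lib-agent.sh` change itself; `ECHO_SCRIPT` and `run_with_argv` are hypothetical names, and the child process simply echoes its argument to show that the value arrives as data rather than as code.

```python
import subprocess
import sys

# The script text is a fixed literal; no caller-supplied value is spliced in.
ECHO_SCRIPT = "import sys; sys.stdout.write(sys.argv[1])"


def run_with_argv(value: str) -> str:
    """Hypothetical helper: pass `value` as an argument, not as source code."""
    result = subprocess.run(
        # With `python -c SCRIPT ARG`, the child sees sys.argv == ["-c", ARG].
        [sys.executable, "-c", ECHO_SCRIPT, value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A value containing quotes or statement separators is returned unchanged, because the child interpreter never parses it as Python.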
6 changes: 6 additions & 0 deletions nightshift/__init__.py
@@ -13,6 +13,7 @@
summarize,
verify_cycle_cli,
)
from nightshift.compact import compact_handoffs
from nightshift.config import (
infer_install_command,
infer_lint_command,
@@ -41,6 +42,7 @@
FORBIDDEN_CYCLE_COMMANDS,
FRONTEND_DIR_NAMES,
FRONTEND_EXTENSIONS,
HANDOFF_COMPACTION_THRESHOLD,
INTEGRATOR_MAX_FIX_ATTEMPTS,
INTEGRATOR_TEST_TIMEOUT,
MODEL_PRICING,
@@ -155,6 +157,7 @@
ArchitectureDoc,
Baseline,
BranchPruneResult,
CompactionResult,
CostLedger,
Counters,
CycleEntry,
@@ -217,6 +220,7 @@
"FORBIDDEN_CYCLE_COMMANDS",
"FRONTEND_DIR_NAMES",
"FRONTEND_EXTENSIONS",
"HANDOFF_COMPACTION_THRESHOLD",
"INTEGRATOR_MAX_FIX_ATTEMPTS",
"INTEGRATOR_TEST_TIMEOUT",
"MODEL_PRICING",
@@ -234,6 +238,7 @@
"ArchitectureDoc",
"Baseline",
"BranchPruneResult",
"CompactionResult",
"CostLedger",
"Counters",
"CycleEntry",
@@ -280,6 +285,7 @@
"collect_wave_files",
"command_exists",
"command_for_agent",
"compact_handoffs",
"confirm_feature_build",
"decompose_plan",
"default_ledger_path",