Remove Ollama/LLM text cleanup, keep deterministic cleanup by arankine9 · Pull Request #1 · arankine9/Wave

arankine9 · 2026-06-01T22:21:07Z

What & why

Wave's cleanup was actually two stacked layers:

DeterministicCleanup — a pure-Swift pass (disfluency removal, spoken-symbol substitution for code, identifier reassembly, spacing/casing) that runs on every dictation with no network.
Ollama/LLM polish — an optional stage layered on top that required a local Ollama install.

Since Ollama was never set up, every dictation already ran on the deterministic pass alone. This PR rips out the entire LLM/Ollama layer and all of its UI surfaces, while keeping the deterministic pass running silently — so pasted output is unchanged.
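For readers unfamiliar with what a deterministic pass of this kind does, here is a minimal sketch of two of the steps named above (disfluency removal and spoken-symbol substitution). It is not Wave's actual `DeterministicCleanup`: the `CleanupSketch` type, the word lists, and the `isCode` parameter are assumptions made for illustration only.

```swift
// Hypothetical sketch; NOT the real DeterministicCleanup implementation.
enum CleanupSketch {
    // Assumed filler words; the real list is not shown in this PR.
    static let disfluencies: Set<String> = ["um", "uh", "er"]

    // Assumed spoken-symbol table, applied to code dictation only.
    static let spokenSymbols: [String: String] = [
        "equals": "=",
        "plus": "+",
        "underscore": "_",
    ]

    // Pure function: no network, no model, same input gives same output.
    static func transform(_ raw: String, isCode: Bool) -> String {
        // Naive whitespace tokenization; punctuation attached to a word is not handled.
        let words = raw.split(separator: " ").map(String.init)
        let kept = words.filter { !disfluencies.contains($0.lowercased()) }
        guard isCode else {
            // Prose path: only fix the casing of the first letter.
            return capitalizingFirst(kept.joined(separator: " "))
        }
        // Code path: swap spoken symbol names for the symbols themselves.
        return kept.map { spokenSymbols[$0.lowercased()] ?? $0 }
            .joined(separator: " ")
    }

    static func capitalizingFirst(_ s: String) -> String {
        guard let first = s.first else { return s }
        return first.uppercased() + String(s.dropFirst())
    }
}
```

Because every step is a plain string transformation, a pass like this can run on every dictation without any external dependency, which is why removing the optional LLM stage leaves the output untouched.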

Removed

Core: OllamaClient, OllamaHealth/OllamaHealthProbe, CleanupClient, CleanupPipeline, CleanupResult, CleanupError, IdentityCache, SystemPrompt.
Prefs/env: CleanupMode, cleanupModel, ollamaURL, and the WAVE_CLEANUP_MODEL / WAVE_OLLAMA_URL / WAVE_CLEANUP_MODE env vars.
Dead gate: SkipGate.shouldSkipCleanup (the LLM-skip gate) and AppPaths.identityCacheURL. SkipGate.looksLikeCodeDictation is kept — it still routes prose vs. code in DeterministicCleanup.
UI: Settings → Cleanup section (mode picker, Ollama model/URL fields, health probe); History panel "Show original" before/after toggle and the Cleaned/Raw badge; onboarding "cleaned text" page reworded to plain paste.
Tooling/tests: scripts/judge.sh, the cleanup-pairs / identifier-spelling / hallucination-audit fixtures, and the LLM-only tests (CleanupPipelineTests, CleanupModeTests, OllamaHealthProbeTests, SystemPromptTests, TokenBudgetBenchmarkTests, HallucinationFixtureTests, IdentityCacheTests, SkipGateTests, FixtureGateTests).

Changed

DictationOrchestrator calls DeterministicCleanup.transform directly (the .cleaning status and timing are preserved).
DictationTrace / HistoryLine drop the path, inputTokens, and outputTokens fields. Existing history.jsonl files still decode, because JSONDecoder ignores keys that the decoded type no longer declares.
Remaining tests rewritten to drive deterministic cleanup with no stub LLM client.
README / CHANGELOG / SHIPPING / TODO and the install/bench scripts updated to drop Ollama setup and the LLM gates.
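The backward-compatibility claim for history.jsonl rests on how Swift's synthesized `Decodable` conformance treats unknown keys. A minimal sketch of that behavior follows; `HistoryLineSketch`, its `text` field, and the sample JSON values are hypothetical stand-ins, not the real `HistoryLine` layout.

```swift
import Foundation

// Hypothetical, trimmed-down stand-in for HistoryLine; the real fields may differ.
struct HistoryLineSketch: Codable {
    let rawTranscript: String
    let text: String
}

// A line as an older build might have written it, still carrying the dropped
// fields. The values are made up for illustration.
let oldLine = """
{"rawTranscript":"um hello world","text":"Hello world","path":"example","inputTokens":12,"outputTokens":9}
"""

// Synthesized Decodable only reads the keys the struct declares, so the extra
// "path", "inputTokens", and "outputTokens" keys are skipped without error.
func decodeHistoryLine(_ json: String) throws -> HistoryLineSketch {
    try JSONDecoder().decode(HistoryLineSketch.self, from: Data(json.utf8))
}
```

Note that this tolerance only runs one way: a type that declares a non-optional field will fail to decode a line that lacks that key, so dropping fields is safe for old logs while adding required ones would not be.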

Notes / judgment calls

Output is unchanged for plain prose and spoken code — the deterministic pass was already doing all the work.
Onboarding step kept, not deleted: I reframed the paste step (removed the "Ollama cleans your transcript" copy) rather than removing the whole page, since it still usefully teaches the core "your words land in the focused app" behavior. Happy to drop the step entirely if you'd prefer.
rawTranscript is retained in the history log but no longer shown in the UI; it still powers the HistoryGoldenTests replay tool.
Historical entries in CHANGELOG.md / TODO.md describing the old LLM build are left as a record, with a note pointing at the removal.
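To illustrate why keeping rawTranscript matters for replay-style testing, here is a rough sketch of the idea: re-run cleanup on each stored raw transcript and flag entries whose output no longer matches what was pasted. This is an assumption about how such a tool could work, not a description of HistoryGoldenTests; `GoldenEntry`, `pastedText`, and `replayMismatches` are hypothetical names.

```swift
// Hypothetical record shape; the real history entry is not shown in this PR.
struct GoldenEntry {
    let rawTranscript: String   // what the recognizer produced
    let pastedText: String      // what cleanup produced at the time
}

// Re-run a cleanup function over the stored raw transcripts and return the
// entries whose output differs from what was originally pasted.
func replayMismatches(
    _ entries: [GoldenEntry],
    transform: (String) -> String
) -> [GoldenEntry] {
    entries.filter { transform($0.rawTranscript) != $0.pastedText }
}
```

Without the raw transcript in the log there would be nothing to feed back through the cleanup function, so this kind of regression check would not be possible.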

⚠️ I couldn't compile here (Linux container; this is a macOS/AppKit project). Edits were made against a full read of the code and verified with a repo-wide symbol sweep, but CI (swift test on macOS) is the real check.

Generated by Claude Code

Wave's cleanup had two layers: a pure-Swift DeterministicCleanup pass (disfluency removal, spoken-symbol substitution for code, identifier reassembly, spacing/casing) and an optional Ollama/LLM polish stage on top. The LLM stage required a local Ollama install that was never set up, so in practice every dictation already ran on the deterministic pass alone.

This removes the entire LLM/Ollama layer and all of its UI while keeping the deterministic pass running silently, so pasted output is unchanged.

Removed:
- OllamaClient, OllamaHealth/OllamaHealthProbe, CleanupClient, CleanupPipeline, CleanupResult, CleanupError, IdentityCache, SystemPrompt.
- CleanupMode preference + the cleanupModel/ollamaURL prefs and the WAVE_CLEANUP_MODEL / WAVE_OLLAMA_URL / WAVE_CLEANUP_MODE env vars.
- SkipGate.shouldSkipCleanup (LLM-skip gate) and AppPaths.identityCacheURL.
- Settings -> Cleanup section (mode picker, Ollama model/URL, health probe).
- History before/after "Show original" toggle and the Cleaned/Raw badge.
- Onboarding "cleaned text" framing reworded to plain paste.
- scripts/judge.sh, the cleanup-pairs / identifier-spelling / hallucination-audit fixtures, and the LLM-only tests.

Changed:
- DictationOrchestrator now calls DeterministicCleanup.transform directly.
- DictationTrace / HistoryLine drop path/inputTokens/outputTokens (existing history.jsonl still decodes; extra keys are ignored).
- SkipGate keeps looksLikeCodeDictation (used to route prose vs. code).
- Docs/scripts updated to drop Ollama setup and the LLM gates.

…eaning-KWTXe

# Conflicts:
#   Tests/WaveCoreTests/CleanupModeTests.swift
#   Tests/WaveCoreTests/CleanupPipelineTests.swift
#   Tests/WaveCoreTests/EndToEndIntegrationTests.swift
#   Tests/WaveCoreTests/FixtureGateTests.swift
#   Tests/WaveCoreTests/HallucinationFixtureTests.swift
#   Tests/WaveCoreTests/IdentityCacheTests.swift
#   Tests/WaveCoreTests/LatencyBudgetBenchmarkTests.swift
#   Tests/WaveCoreTests/OllamaHealthProbeTests.swift
#   Tests/WaveCoreTests/SkipGateTests.swift
#   Tests/WaveCoreTests/SystemPromptTests.swift
#   Tests/WaveCoreTests/TokenBudgetBenchmarkTests.swift

claude and others added 2 commits June 1, 2026 22:20

arankine9 merged commit f7b2c2b into main Jun 1, 2026
1 check passed

arankine9 deleted the claude/remove-text-cleaning-KWTXe branch June 1, 2026 23:08

arankine9 merged 2 commits into main from claude/remove-text-cleaning-KWTXe


2 participants
