Remove Ollama/LLM text cleanup, keep deterministic cleanup #1

Merged
arankine9 merged 2 commits into main from claude/remove-text-cleaning-KWTXe on Jun 1, 2026

@arankine9
Owner

What & why

Wave's cleanup was actually two stacked layers:

  1. DeterministicCleanup — a pure-Swift pass (disfluency removal, spoken-symbol substitution for code, identifier reassembly, spacing/casing) that runs on every dictation with no network.
  2. Ollama/LLM polish — an optional stage layered on top that required a local Ollama install.

Since Ollama was never set up, every dictation already ran on the deterministic pass alone. This PR rips out the entire LLM/Ollama layer and all of its UI surfaces, while keeping the deterministic pass running silently — so pasted output is unchanged.
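
For readers unfamiliar with what a deterministic pass looks like, here is a toy sketch of the idea: drop filler words, then either substitute spoken symbols (code) or fix casing (prose). Everything in it is an assumption for illustration. ToyDeterministicCleanup, its word lists, and the isCode parameter are hypothetical and are not Wave's actual DeterministicCleanup, whose rules are not shown in this PR.

```swift
import Foundation

// Toy stand-in for a deterministic cleanup pass. All names and rules are
// hypothetical; this is not Wave's DeterministicCleanup.
enum ToyDeterministicCleanup {
    // Filler words to drop (illustrative list only).
    static let disfluencies: Set<String> = ["um", "uh", "er"]

    // Spoken symbol names mapped to the characters they stand for (illustrative).
    static let spokenSymbols: [String: String] = [
        "equals": "=",
        "plus": "+",
        "underscore": "_",
    ]

    // Pure function: same input always yields the same output, no network.
    static func transform(_ raw: String, isCode: Bool) -> String {
        var words = raw.split(separator: " ").map(String.init)

        // 1. Disfluency removal.
        words.removeAll { disfluencies.contains($0.lowercased()) }

        if isCode {
            // 2. Spoken-symbol substitution, applied only to code dictation.
            words = words.map { spokenSymbols[$0.lowercased()] ?? $0 }
            return words.joined(separator: " ")
        }

        // 3. Prose: capitalize the first letter.
        let text = words.joined(separator: " ")
        guard let first = text.first else { return text }
        return first.uppercased() + text.dropFirst()
    }
}
```

Because the pass is a pure function of its input, it can be tested with plain string assertions and no stub client, which is the property the rewritten tests in this PR rely on.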

Removed

  • Core: OllamaClient, OllamaHealth/OllamaHealthProbe, CleanupClient, CleanupPipeline, CleanupResult, CleanupError, IdentityCache, SystemPrompt.
  • Prefs/env: CleanupMode, cleanupModel, ollamaURL, and the WAVE_CLEANUP_MODEL / WAVE_OLLAMA_URL / WAVE_CLEANUP_MODE env vars.
  • Dead gate: SkipGate.shouldSkipCleanup (the LLM-skip gate) and AppPaths.identityCacheURL. SkipGate.looksLikeCodeDictation is kept — it still routes prose vs. code in DeterministicCleanup.
  • UI: Settings → Cleanup section (mode picker, Ollama model/URL fields, health probe); History panel "Show original" before/after toggle and the Cleaned/Raw badge; onboarding "cleaned text" page reworded to plain paste.
  • Tooling/tests: scripts/judge.sh, the cleanup-pairs / identifier-spelling / hallucination-audit fixtures, and the LLM-only tests (CleanupPipelineTests, CleanupModeTests, OllamaHealthProbeTests, SystemPromptTests, TokenBudgetBenchmarkTests, HallucinationFixtureTests, IdentityCacheTests, SkipGateTests, FixtureGateTests).

Changed

  • DictationOrchestrator calls DeterministicCleanup.transform directly (the .cleaning status and timing are preserved).
  • DictationTrace / HistoryLine drop the path, inputTokens, and outputTokens fields. Existing history.jsonl still decodes, because Swift's synthesized Decodable conformance ignores JSON keys the type no longer declares.
  • Remaining tests rewritten to drive deterministic cleanup with no stub LLM client.
  • README / CHANGELOG / SHIPPING / TODO and the install/bench scripts updated to drop Ollama setup and the LLM gates.
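
The backward-compatibility claim for history.jsonl in the list above can be checked with a small sketch. The struct here is a hypothetical, trimmed-down HistoryLine (only rawTranscript is confirmed by this PR; the text field and the sample values are made up), and decodeHistoryLine is an illustrative helper, not a function from the repo.

```swift
import Foundation

// Hypothetical, trimmed-down HistoryLine. The real struct's fields are not
// shown in this PR; `text` is an assumed name for the pasted output.
struct HistoryLine: Codable, Equatable {
    let rawTranscript: String
    let text: String
}

// A line as an older build might have written it, still carrying the
// removed path / inputTokens / outputTokens keys (values are invented).
let oldLine = """
{"rawTranscript":"um hello","text":"Hello","path":"llm","inputTokens":12,"outputTokens":3}
"""

// Decodes one JSONL line. Keys that HistoryLine does not declare are skipped
// by the synthesized init(from:), so the old line decodes without error.
func decodeHistoryLine(_ json: String) throws -> HistoryLine {
    try JSONDecoder().decode(HistoryLine.self, from: Data(json.utf8))
}
```

This only works in the removal direction: dropping fields is safe, whereas adding a new non-optional field would make old lines fail to decode unless it has a default or is declared optional.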

Notes / judgment calls

  • Output is unchanged for plain prose and spoken code — the deterministic pass was already doing all the work.
  • Onboarding step kept, not deleted: I reframed the paste step (removed the "Ollama cleans your transcript" copy) rather than removing the whole page, since it still usefully teaches the core "your words land in the focused app" behavior. Happy to drop the step entirely if you'd prefer.
  • rawTranscript is retained in the history log but no longer shown in the UI; it still powers the HistoryGoldenTests replay tool.
  • Historical entries in CHANGELOG.md / TODO.md describing the old LLM build are left as a record, with a note pointing at the removal.

⚠️ I couldn't compile here (Linux container; this is a macOS/AppKit project). Edits were made against a full read of the code and verified with a repo-wide symbol sweep, but CI (swift test on macOS) is the real check.


Generated by Claude Code

claude and others added 2 commits June 1, 2026 22:20
Wave's cleanup had two layers: a pure-Swift DeterministicCleanup pass
(disfluency removal, spoken-symbol substitution for code, identifier
reassembly, spacing/casing) and an optional Ollama/LLM polish stage on
top. The LLM stage required a local Ollama install that was never set up,
so in practice every dictation already ran on the deterministic pass
alone. This removes the entire LLM/Ollama layer and all of its UI while
keeping the deterministic pass running silently — pasted output is
unchanged.

Removed:
- OllamaClient, OllamaHealth/OllamaHealthProbe, CleanupClient,
  CleanupPipeline, CleanupResult, CleanupError, IdentityCache,
  SystemPrompt.
- CleanupMode preference + the cleanupModel/ollamaURL prefs and the
  WAVE_CLEANUP_MODEL / WAVE_OLLAMA_URL / WAVE_CLEANUP_MODE env vars.
- SkipGate.shouldSkipCleanup (LLM-skip gate) and AppPaths.identityCacheURL.
- Settings -> Cleanup section (mode picker, Ollama model/URL, health probe).
- History before/after "Show original" toggle and the Cleaned/Raw badge.
- Onboarding "cleaned text" framing reworded to plain paste.
- scripts/judge.sh, the cleanup-pairs / identifier-spelling /
  hallucination-audit fixtures, and the LLM-only tests.

Changed:
- DictationOrchestrator now calls DeterministicCleanup.transform directly.
- DictationTrace / HistoryLine drop path/inputTokens/outputTokens
  (existing history.jsonl still decodes; extra keys are ignored).
- SkipGate keeps looksLikeCodeDictation (used to route prose vs. code).
- Docs/scripts updated to drop Ollama setup and the LLM gates.
…eaning-KWTXe

# Conflicts:
#	Tests/WaveCoreTests/CleanupModeTests.swift
#	Tests/WaveCoreTests/CleanupPipelineTests.swift
#	Tests/WaveCoreTests/EndToEndIntegrationTests.swift
#	Tests/WaveCoreTests/FixtureGateTests.swift
#	Tests/WaveCoreTests/HallucinationFixtureTests.swift
#	Tests/WaveCoreTests/IdentityCacheTests.swift
#	Tests/WaveCoreTests/LatencyBudgetBenchmarkTests.swift
#	Tests/WaveCoreTests/OllamaHealthProbeTests.swift
#	Tests/WaveCoreTests/SkipGateTests.swift
#	Tests/WaveCoreTests/SystemPromptTests.swift
#	Tests/WaveCoreTests/TokenBudgetBenchmarkTests.swift
@arankine9 arankine9 merged commit f7b2c2b into main Jun 1, 2026
1 check passed
@arankine9 arankine9 deleted the claude/remove-text-cleaning-KWTXe branch June 1, 2026 23:08

2 participants