Skip to content

DesiredPathConsulting/clear-copy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

clear-copy

A prose linter, not a humanizer.

Every commercial alternative in this space (Undetectable, Humbot, StealthGPT, WriteHuman) and every public Claude Code skill (blader/humanizer, Aboudjem, voice-humanizer, conorbronsdon) is a rewriter. You paste in your prose, you get back rewritten prose. The writer learns nothing and pays per word forever.

This tool scores prose against a calibrated reference register, flags AI-tells and mechanical issues, and leaves the rewrite to the writer. The writer learns the calibration.

What it checks

For each draft, the tool computes:

  • Sentence-length burstiness (standard deviation of sentence length, sometimes called the coefficient of variation in detection literature). The load-bearing engagement metric and the load-bearing detection signal.
  • Mean sentence length with register-specific bounds.
  • Flesch reading ease with register-specific floors.
  • Abstract-noun ratio per 100 words.
  • Paragraph-length variance.
  • Hedge phrases from a curated blocklist ("In today's X landscape," "When it comes to," "At the end of the day," etc.).
  • AI vocabulary tells (delve, tapestry, pivotal, leverage as verb, etc.) plus copula avoidance, vague attributions, knowledge-cutoff disclaimers, persuasive-authority tropes, signposting, and curly quotes.
  • Mechanical flags: long sentences (>35 words), prepositional chains, repeated-shape runs, opener variance.

Output is a transparent stats block with a 0-100 score and a per-rule deduction breakdown. The full scoring formula is in skills/clear-copy/references/stats-compute.md. No black box.

Why register matters

Burstiness-based detection has been criticized (Pangram's own essay, 2024; multiple false-positive studies showing 15-45% FPR on formal registers and non-native English). Bar-journal prose runs long, citation-dense, and low-Flesch by convention; landing-page prose runs short, concrete, and high-Flesch. Scoring both against one threshold produces noise.

This tool ships two registers out of the box:

  • legal — bar-journal and legal-analysis prose. Tolerates mean sentences up to 22 words, Flesch as low as 30, dense abstract-noun usage.
  • marketing — landing-page, founder-letter, and consultant prose. Caps mean sentences at 18, expects Flesch above 40, low abstract-noun load.

Add a third register (operator voice, technical documentation, conversational newsletter, etc.) by following skills/clear-copy/references/corpus-construction.md. Copy a threshold block, retune against samples representative of your target voice, register the entry.

Install

git clone https://github.com/DesiredPathConsulting/clear-copy.git
cd clear-copy
# No dependencies. Node 20+ recommended.
node tools/prose-check.mjs --register=marketing path/to/draft.md

To wire into a project's CI, copy tools/prose-check/ and tools/prose-check.mjs into the target project's scripts/ directory. The ci-enforcement.md runbook covers the rest.

To use with Claude Code, copy skills/clear-copy/ and skills/clear-copy-marketing/ into your ~/.claude/skills/ directory. Both skills auto-trigger on relevant prose-editing tasks.

Quick example

$ node tools/prose-check.mjs --register=marketing example.md

=== WARN [score: 64] example.md ===
Stats for this draft:
- Score: 64 / 100  (WARN)
- Sentence-length stdev / burstiness: 4.8  (somewhat uniform)
- Mean sentence length: 19.2 words  (borderline; target <= 18)
- Flesch reading ease: 51
- Abstract-noun ratio: 5.2 / 100 words  (target <= 4)

Mechanical flags (-13 pts):
- 2 long sentences (>35 words)
- 3 prepositional chains
- Opener variance: 3 of first 5 sentences are DET+NOUN

Vocab and phrasing (-8 pts):
- 8 vocab/phrase AI tells:
  - vocab (6): "pivotal", "leverage", "robust", "seamless", "elevate", "unlock"
  - copula (2): "serves as a", "stands as a"

The output tells the writer what to fix. The writer does the fixing.

CLI modes

Flag Use
Default Full stats block per file. Right for one or two files.
--triage One line per file, score + top 3 deductions, sorted worst-first. Right for batch reviews of 5+ files.
--diff Per-file offending spans only. Right when fixing issues a triage pass surfaced.
--html Score built HTML in dist/ instead of markdown source. Right for .astro / .tsx projects authored with prose in component props.
--register=legal|marketing Pick a register. Default legal.
WARN_ONLY=1 env Cap exit code at 1 during calibration soak.
INCLUDE_CORPUS=1 env Append corpus comparison sidebar (requires a corpus; see corpus-construction.md).

Prior art and how this differs

Each individual metric has documented prior art. The contribution is the combination: deterministic CLI + AI-tell lexicon + corpus-calibrated thresholds + transparent scoring formula, in one pass.

  • Wikipedia's "Signs of AI writing" (May 2026, WikiProject AI Cleanup): catalogs vocabulary tells and stylistic patterns. Does not cover sentence-length variance, perplexity, or burstiness. This tool adds the statistical layer.
  • blader/humanizer, Aboudjem/humanizer-skill, conorbronsdon/avoid-ai-writing, dannwaneri/voice-humanizer: prompt-based Claude Code skills covering the lexical axis. None ship a deterministic CLI with reproducible numbers. The Aboudjem skill emits a 0-100 score with undisclosed formula. This tool publishes the formula.
  • proselint (Suchow and Griffiths, 2016): rule-based style linter using expert-codified rules with fixed thresholds. No corpus calibration, no AI-tell coverage. This tool adds both.
  • GPTZero, Pangram, Copyleaks, DetectGPT, Binoculars: detectors targeting the same signals (perplexity, burstiness, n-gram repetition). Closed; flag rather than explain at writer-actionable granularity. This tool exposes the signals in writer-facing output.
  • Stylometric corpus comparison (Burrows's Delta, 2002, and the genre-aware variants): score a candidate text against a reference distribution. This tool inverts the typical detection use case: a domain corpus as a positive style target for a human writer, not as detector training data. Closest published analog; the inversion is the contribution.

Methodology, not magic

Everything that produces a number in this tool is in the source. Open tools/prose-check/decision.mjs to see the deduction logic. Open tools/prose-check/thresholds.mjs to see the register cutoffs. Open tools/prose-check/aivocab.mjs to see the AI-vocabulary blocklist and add to it.

The vocabulary list will drift as AI writing drifts. The mechanical rules are stable. The thresholds need retuning every six months. The corpus needs rebuilding every six to twelve. None of this is automated. That is the methodology.

Contributing

See CONTRIBUTING.md. The most valuable contributions are:

  1. New AI-tell entries for the vocabulary blocklist, backed by recent examples.
  2. New registers (with the threshold-calibration evidence from corpus-construction.md).
  3. False-positive reports with a representative passage and the score it incorrectly produced. The skill is calibrated against a corpus; if your register isn't covered, the right fix is usually a new register, not a rule change.

License

Apache-2.0. See LICENSE.

Background

Originated in the legal-ai-governance project and forked here for community use. The methodology section is the why-this-exists; the contributing guide is how to extend it.

If this saved you time, GitHub Sponsors is appreciated but never expected.

About

A prose linter, not a humanizer. Corpus-calibrated, register-aware AI-tell detection with a transparent scoring formula. Ships with Claude Code skills.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors