Release Clause-Driven-LLM v0.1.0 · AnwarDebes/Clause-Driven-LLM

The control layer in front of an LLM agent (which expert to route to, whether to answer at all) is usually an opaque neural classifier. This release makes it a transparent one: a Coalesced Tsetlin Machine learns a propositional policy over frozen LLM embeddings and drives the agent, with a signed, SAT-checkable receipt behind every decision and clauses an operator can read and edit by hand.
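To make the idea of a propositional policy concrete, here is a minimal sketch of clause voting with a margin-based escalation rule. It assumes a shared pool of conjunctive clauses with one integer weight per class, as in a Coalesced Tsetlin Machine, and it assumes the embedding has already been binarized into a list of Booleans. The clause contents, weights, `min_margin` threshold, and the definition of margin as top vote minus runner-up are all hypothetical choices for illustration, not the cdllm API.

```python
from typing import Dict, List, Tuple

Literal = Tuple[int, bool]   # (feature index, required value)
Clause = List[Literal]       # a conjunction of literals


def clause_fires(clause: Clause, bits: List[bool]) -> bool:
    """A clause fires only if every one of its literals matches the input."""
    return all(bits[i] == want for i, want in clause)


def class_votes(clauses: List[Clause],
                weights: Dict[str, List[int]],
                bits: List[bool]) -> Dict[str, int]:
    """Sum each class's signed weights over the clauses that fired."""
    fired = [clause_fires(c, bits) for c in clauses]
    return {label: sum(w for w, f in zip(ws, fired) if f)
            for label, ws in weights.items()}


def route(clauses, weights, bits, min_margin: int):
    """Pick the top class, or escalate when the vote margin is too small."""
    votes = class_votes(clauses, weights, bits)
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    margin = top - runner_up  # assumed definition of the vote margin
    return (best if margin >= min_margin else "escalate"), margin


# Hypothetical policy: three shared clauses, two intents.
CLAUSES = [
    [(0, True), (1, False)],  # x0 AND NOT x1
    [(2, True)],              # x2
    [(3, True)],              # x3
]
WEIGHTS = {"play_music": [3, 1, -1], "set_alarm": [-2, 1, 2]}

print(route(CLAUSES, WEIGHTS, [True, False, True, False], min_margin=2))
# ('play_music', 5)
print(route(CLAUSES, WEIGHTS, [False, False, True, False], min_margin=2))
# ('escalate', 0)
```

Because every clause is a plain conjunction, the reason for a decision is the list of clauses that fired, which an operator can read directly.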

Headline

CLINC150: 94.2% routing accuracy across 150 intents, 1.5 points behind a logistic regression on the same frozen embedding, and 94.0% out-of-scope-escalation AUROC straight from the clause vote margin (no extra calibration head).
Banking77: 92.1% across 77 fine-grained intents.
Receipts: every decision verifies in 5.6 ms at the median (HMAC signature check plus a Glucose 4 SAT replay), with 200 of 200 decisions verified.
Editing: a single added clause lifts a targeted intent's held-out recall (for example play_music from 80% to 90%) at no measurable cost to global accuracy.
Why the LLM features matter: a dependency-light lexical token-presence backend reaches only 82.8% on CLINC150.
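The receipt check in the list above can be pictured with a small standard-library sketch. It signs a decision record with HMAC-SHA256 and verifies it by recomputing the tag and then replaying the recorded clauses against the recorded input bits. The record layout, field names, and key are hypothetical, and the replay here is a direct Boolean evaluation standing in for the Glucose 4 SAT replay that the release uses.

```python
import hashlib
import hmac
import json


def canonical(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(payload: dict, key: bytes) -> dict:
    tag = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "hmac": tag}


def replay(payload: dict) -> bool:
    """Check that every recorded clause really is satisfied by the recorded bits.

    Stand-in for a SAT replay: the real system is described as using Glucose 4.
    """
    bits = payload["bits"]
    return all(all(bits[i] == want for i, want in clause)
               for clause in payload["fired_clauses"])


def verify_receipt(receipt: dict, key: bytes) -> bool:
    expected = hmac.new(key, canonical(receipt["payload"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the position of a mismatch through timing.
    if not hmac.compare_digest(expected, receipt["hmac"]):
        return False
    return replay(receipt["payload"])


KEY = b"demo-key-not-for-production"  # hypothetical key
PAYLOAD = {
    "decision": "play_music",
    "bits": [True, False, True],
    "fired_clauses": [[[0, True], [1, False]], [[2, True]]],
}

receipt = sign_receipt(PAYLOAD, KEY)
print(verify_receipt(receipt, KEY))  # True

tampered = {"payload": dict(PAYLOAD, decision="set_alarm"), "hmac": receipt["hmac"]}
print(verify_receipt(tampered, KEY))  # False
```

The signature ties the record to the key holder, while the replay checks that the recorded clauses are consistent with the recorded input; neither step says anything about inputs that were never recorded.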

What is inside

The cdllm package with GPU (torch) and CPU (numba) training backends, the controller, the human-editable policy, and the SAT-receipt machinery.
The full five-seed experiment pipeline, opaque baselines (logistic regression, linear SVM, k-NN), the negative-sampling and feature ablations, six figures, 30 passing tests, and the compiled paper at paper/report.pdf.
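As a rough picture of what a human-editable policy means in practice, the sketch below shows an operator appending one hand-written clause that votes for a single intent. The `Policy` class, the named Boolean features, and the weights are hypothetical and chosen for readability; the actual policy in the cdllm package is learned over binarized embedding features and may be represented differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Policy:
    # Each clause is a conjunction of (feature name, required value) literals.
    clauses: List[List[Tuple[str, bool]]] = field(default_factory=list)
    # One integer weight per clause for every intent label.
    weights: Dict[str, List[int]] = field(default_factory=dict)

    def add_clause(self, literals, label: str, weight: int) -> None:
        """Append a hand-written clause that votes only for `label`."""
        self.clauses.append(list(literals))
        for name, ws in self.weights.items():
            ws.append(weight if name == label else 0)

    def votes(self, features: Dict[str, bool]) -> Dict[str, int]:
        """Sum each label's weights over the clauses satisfied by `features`."""
        fired = [all(features.get(n, False) == want for n, want in clause)
                 for clause in self.clauses]
        return {label: sum(w for w, f in zip(ws, fired) if f)
                for label, ws in self.weights.items()}


policy = Policy(
    clauses=[[("mentions_time", True)]],
    weights={"play_music": [0], "set_alarm": [2]},
)
query = {"mentions_song": True, "mentions_time": False}

print(policy.votes(query))  # {'play_music': 0, 'set_alarm': 0}

# Operator edit: "mentions a song and no time" should vote for play_music.
policy.add_clause([("mentions_song", True), ("mentions_time", False)],
                  label="play_music", weight=3)

print(policy.votes(query))  # {'play_music': 3, 'set_alarm': 0}
```

The added clause gets weight zero for every other intent, so it can only change votes on inputs where it fires.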

Honest limitations

The clause policy trails a neural router by one to two accuracy points. It trades that for an auditable and editable policy, not for a higher number.
A receipt certifies the recorded decision, not the model's full Boolean function; bounded-perturbation robustness verification is the natural next step.
English only, and the embeddings inherit the encoder's biases.
The out-of-scope study uses the CLINC pool; an open-world deployment needs a larger and more adversarial escalation set.
