server: add tool-safe directional steering policy by audreyt · Pull Request #148 · antirez/ds4

audreyt · 2026-05-14T21:26:47Z

Summary

This adds server-side directional steering policies for tool-aware deployments:

--dir-steering-policy final-answer   # default
--dir-steering-policy decoding
--dir-steering-policy always
--dir-steering-policy off

final-answer is now the default policy, following maintainer feedback on antirez/ds4#148. It keeps prompt prefill, thinking tokens, and DSML/tool-call grammar unsteered, then re-enables steering once generation has clearly entered final natural-language answer text.

decoding is the middle-ground policy requested in review: prompt/prefill is unsteered, but every generated token is steered, including thinking and tool-call syntax. always restores the previous always-on behavior, and off disables steering at the server policy layer.
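The four policies can be read as a table of which generation phases are steered. The following is a minimal Python sketch of that table, not the server's actual code: the `Phase` enum, `STEERED_PHASES`, and `steering_enabled` are hypothetical names introduced here for illustration, and the phase boundaries are assumed to match the descriptions above.

```python
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"      # prompt tokens being evaluated
    THINKING = "thinking"    # generated hidden reasoning tokens
    TOOL_CALL = "tool_call"  # generated DSML/tool-call syntax
    ANSWER = "answer"        # generated final natural-language text


# Phases in which each policy leaves steering on, per the PR description.
STEERED_PHASES = {
    "final-answer": {Phase.ANSWER},
    "decoding": {Phase.THINKING, Phase.TOOL_CALL, Phase.ANSWER},
    "always": set(Phase),
    "off": set(),
}


def steering_enabled(policy: str, phase: Phase) -> bool:
    """Return True if `policy` steers tokens processed in `phase`."""
    if policy not in STEERED_PHASES:
        raise ValueError(f"unknown steering policy: {policy!r}")
    return phase in STEERED_PHASES[policy]
```

For example, `steering_enabled("decoding", Phase.PREFILL)` is false while `steering_enabled("decoding", Phase.TOOL_CALL)` is true, which is the difference between decoding and final-answer.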

Why

Directional steering is useful for behavior/style/topic control, but applying it while the model is emitting tool-call syntax can perturb DSML grammar, tool arguments, or Responses/Anthropic tool protocol structure.

For tool-using agents, the safer default is:

no steering during prefill
no steering during hidden thinking
no steering inside DSML/tool-call syntax
steering only for final visible prose

This lets deployments use steering for final-answer behavior without making tool calls less reliable.

Changes

Adds per-session directional steering overrides so the server can toggle steering scales dynamically.
Defaults ds4-server directional steering policy to final-answer.
Adds --dir-steering-policy decoding, which disables steering only during prompt/prefill.
Keeps always available for the original behavior and off for policy-layer disabling.
Tracks thinking state and DSML decode state during generation.
Avoids steering partial tool-call starts and tool-call bodies in final-answer mode.
Uses non-MTP eval while dynamic final-answer steering is active, so draft tokens do not cross steering-state boundaries.
Documents the policies in README.md and dir-steering/README.md.
Adds server unit coverage for default policy selection, decoding, and final-answer/tool-safe behavior.
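The state tracking and partial tool-call handling listed above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the code in this PR: the marker strings are placeholders (the real thinking and DSML delimiters are not given in this description), and `FinalAnswerTracker` and `_partial_start` are hypothetical names. The idea shown is that steering stays off inside a thinking or tool-call span, and also while the generated text ends with what might be the beginning of an opening marker.

```python
THINK_OPEN = "<think>"       # placeholder marker, not taken from ds4
THINK_CLOSE = "</think>"     # placeholder marker, not taken from ds4
TOOL_OPEN = "<tool_call>"    # placeholder marker, not taken from ds4
TOOL_CLOSE = "</tool_call>"  # placeholder marker, not taken from ds4


def _partial_start(text: str, marker: str) -> bool:
    """True if `text` ends with a non-empty proper prefix of `marker`."""
    return any(text.endswith(marker[:n]) for n in range(1, len(marker)))


class FinalAnswerTracker:
    """Tracks thinking/tool-call state across generated text pieces."""

    def __init__(self) -> None:
        self.in_thinking = False
        self.in_tool_call = False
        self._buf = ""  # generated text not yet resolved into a state change

    def feed(self, piece: str) -> bool:
        """Consume new text; return True if the next token may be steered."""
        self._buf += piece
        while True:
            if self.in_thinking or self.in_tool_call:
                close = THINK_CLOSE if self.in_thinking else TOOL_CLOSE
                end = self._buf.find(close)
                if end < 0:
                    # Keep just enough text to recognise a split closing marker.
                    self._buf = self._buf[-(len(close) - 1):]
                    return False
                self._buf = self._buf[end + len(close):]
                self.in_thinking = self.in_tool_call = False
                continue
            hits = [(self._buf.find(m), m) for m in (THINK_OPEN, TOOL_OPEN)]
            hits = [(i, m) for i, m in hits if i >= 0]
            if not hits:
                break
            start, marker = min(hits)  # earliest opening marker wins
            self._buf = self._buf[start + len(marker):]
            self.in_thinking = marker == THINK_OPEN
            self.in_tool_call = marker == TOOL_OPEN
        # Plain text: stay unsteered while a marker may be starting.
        if _partial_start(self._buf, THINK_OPEN) or _partial_start(self._buf, TOOL_OPEN):
            return False
        self._buf = ""  # no pending marker prefix, nothing to remember
        return True
```

The sketch is deliberately conservative: a lone `<` in ordinary prose turns steering off for one step, until the following text shows that no marker is starting.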

Compatibility

The server default changes from always to final-answer. Existing deployments that want exact previous behavior can pass:

--dir-steering-policy always

The core steering scales and file format are unchanged.
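As an illustration of the default change, here is a small Python sketch of how a flag with these four values and the new default resolves. It only mimics the option described in this PR using `argparse`; `parse_policy` and the `ds4-server-sketch` program name are hypothetical, and the real server's option parsing is not shown here.

```python
import argparse

POLICIES = ("final-answer", "decoding", "always", "off")


def parse_policy(argv: list[str]) -> str:
    """Resolve the steering policy from command-line style arguments."""
    parser = argparse.ArgumentParser(prog="ds4-server-sketch")
    parser.add_argument(
        "--dir-steering-policy",
        choices=POLICIES,
        default="final-answer",  # new default described in this PR
    )
    return parser.parse_args(argv).dir_steering_policy
```

With no flag the sketch returns `final-answer`; passing `--dir-steering-policy always` selects the previous always-on behavior.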

Testing

make ds4_test
./ds4_test --server
make ds4-server
./ds4-server --help | rg -n "dir-steering-policy|final-answer|decoding|always|off"
DS4_TEST_MODEL=/Users/au/w/ds4/gguf/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf make test

The first full-suite attempt used my local abliterated/aligned ds4flash.gguf symlink and failed the official logprob-vector fixture, as expected for a different GGUF. Re-running with the non-abliterated imatrix GGUF used for the official vectors passes.

antirez · 2026-05-14T21:32:50Z

That's brilliant! Thank you so much. Going to merge ASAP.

antirez · 2026-05-14T21:34:27Z

@audreyt since I guess you already tested it, are we sure we don't want the final policy to be the default?

antirez · 2026-05-14T21:37:37Z

Also, what about an additional decoding policy that disables steering only during prefill? Sorry for the many comments.

audreyt · 2026-05-14T22:25:16Z

> Also, what about an additional decoding policy that disables steering only during prefill? Sorry for the many comments.

Done, thank you for the nudge!

I made final-answer the default, and added the decoding policy too:

final-answer is now the default: no steering during prefill, thinking, or tool-call grammar, then steering for final visible prose.
decoding disables steering only during prefill, then steers all generated tokens.
always is still available for the previous behavior.
off remains the policy-layer disable switch.

I also rebased onto current main, marked the PR ready for review, and updated the PR description with the compatibility note and test results. It's now working beautifully in my OpenClaw instance (@jdd-kami).

audreyt marked this pull request as ready for review May 14, 2026 22:00

audreyt added 2 commits May 14, 2026 18:00

server: make directional steering tool-safe

b7c2305

server: default steering to final-answer policy

7f966fb

audreyt force-pushed the codex/tool-safe-steering branch from ccc093e to 7f966fb May 14, 2026 22:07

audreyt wants to merge 2 commits into antirez:main from audreyt:codex/tool-safe-steering
