server: add tool-safe directional steering policy #148

Open
audreyt wants to merge 2 commits into antirez:main from audreyt:codex/tool-safe-steering

audreyt (Contributor) commented May 14, 2026

Summary

This adds server-side directional steering policies for tool-aware deployments:

--dir-steering-policy final-answer   # default
--dir-steering-policy decoding
--dir-steering-policy always
--dir-steering-policy off

final-answer is now the default policy, following maintainer feedback on antirez/ds4#148. It keeps prompt prefill, thinking tokens, and DSML/tool-call grammar unsteered, then re-enables steering once generation has clearly entered final natural-language answer text.

decoding is the middle-ground policy requested in review: prompt/prefill is unsteered, but every generated token is steered, including thinking and tool-call syntax. always restores the previous always-on behavior, and off disables steering at the server policy layer.
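The four policies can be read as a table from generation phase to "steer or not". The sketch below is illustrative only and is not the ds4-server implementation: the `Phase` enum, the `STEERED_PHASES` table, and `should_steer` are hypothetical names, and the phase split (prefill, thinking, tool call, final answer) is taken from the description above.

```python
from enum import Enum


class Phase(Enum):
    """Generation phases named in the PR description (hypothetical enum)."""
    PREFILL = "prefill"
    THINKING = "thinking"
    TOOL_CALL = "tool_call"
    FINAL_ANSWER = "final_answer"


# Phases in which each --dir-steering-policy value applies steering.
# Assumption: this table mirrors the prose above, not the server source.
STEERED_PHASES = {
    "final-answer": {Phase.FINAL_ANSWER},
    "decoding": {Phase.THINKING, Phase.TOOL_CALL, Phase.FINAL_ANSWER},
    "always": set(Phase),
    "off": set(),
}


def should_steer(policy: str, phase: Phase) -> bool:
    """Return True if the given policy steers tokens in the given phase."""
    if policy not in STEERED_PHASES:
        raise ValueError(f"unknown steering policy: {policy!r}")
    return phase in STEERED_PHASES[policy]
```

Under this reading, `decoding` and `always` differ only in prefill, and `final-answer` differs from `decoding` in thinking and tool-call tokens.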

Why

Directional steering is useful for behavior/style/topic control, but applying it while the model is emitting tool-call syntax can perturb DSML grammar, tool arguments, or Responses/Anthropic tool protocol structure.

For tool-using agents, the safer default is:

  • no steering during prefill
  • no steering during hidden thinking
  • no steering inside DSML/tool-call syntax
  • steering only for final visible prose

This lets deployments use steering for final-answer behavior without making tool calls less reliable.

Changes

  • Adds per-session directional steering overrides so the server can toggle steering scales dynamically.
  • Defaults ds4-server directional steering policy to final-answer.
  • Adds --dir-steering-policy decoding, which disables steering only during prompt/prefill.
  • Keeps always available for the original behavior and off for policy-layer disabling.
  • Tracks thinking state and DSML decode state during generation.
  • Avoids steering partial tool-call starts and tool-call bodies in final-answer mode.
  • Uses non-MTP eval while dynamic final-answer steering is active, so draft tokens do not cross steering-state boundaries.
  • Documents the policies in README.md and dir-steering/README.md.
  • Adds server unit coverage for default policy selection, decoding, and final-answer/tool-safe behavior.
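One way to picture "avoids steering partial tool-call starts and tool-call bodies" is a small tracker that scans generated text for tool-call markers and holds steering back while the tail of the output could still turn into a marker. This is a rough sketch under stated assumptions, not the PR's code: `TOOL_OPEN` and `TOOL_CLOSE` are placeholder markers rather than real DSML syntax, `ToolSafeTracker` and `partial_marker_len` are hypothetical names, and thinking-state tracking is left out for brevity.

```python
TOOL_OPEN = "<tool_call>"    # placeholder marker, NOT the real DSML syntax
TOOL_CLOSE = "</tool_call>"  # placeholder marker, NOT the real DSML syntax


def partial_marker_len(text: str, marker: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of marker."""
    for n in range(min(len(text), len(marker) - 1), 0, -1):
        if text.endswith(marker[:n]):
            return n
    return 0


class ToolSafeTracker:
    """Tracks whether the next generated token may be steered (hypothetical)."""

    def __init__(self) -> None:
        self.in_tool_call = False
        self.buf = ""  # tail of the output that may be the start of a marker

    def _next_marker(self) -> str:
        return TOOL_CLOSE if self.in_tool_call else TOOL_OPEN

    def feed(self, piece: str) -> None:
        """Consume one decoded text piece and update the tool-call state."""
        self.buf += piece
        # Toggle state for every complete marker found in the buffer.
        while True:
            marker = self._next_marker()
            idx = self.buf.find(marker)
            if idx < 0:
                break
            self.buf = self.buf[idx + len(marker):]
            self.in_tool_call = not self.in_tool_call
        # Keep only the tail that could still grow into the next marker.
        keep = partial_marker_len(self.buf, self._next_marker())
        self.buf = self.buf[len(self.buf) - keep:] if keep else ""

    def steer_next(self) -> bool:
        """Steer only outside tool calls and when no marker start is pending."""
        return not self.in_tool_call and self.buf == ""
```

Because the decision can change after any single token, a scheme like this has to re-check the state before each token is evaluated, which is consistent with the PR's choice of non-MTP eval while dynamic final-answer steering is active.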

Compatibility

The server default changes from always to final-answer. Existing deployments that want exact previous behavior can pass:

--dir-steering-policy always

The core steering scales and file format are unchanged.

Testing

make ds4_test
./ds4_test --server
make ds4-server
./ds4-server --help | rg -n "dir-steering-policy|final-answer|decoding|always|off"
DS4_TEST_MODEL=/Users/au/w/ds4/gguf/DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf make test

The first full-suite attempt used my local abliterated/aligned ds4flash.gguf symlink and failed the official logprob-vector fixture, as expected for a different GGUF. Re-running with the non-abliterated imatrix GGUF used for the official vectors passes.

antirez (Owner) commented May 14, 2026

That's brilliant! Thank you so much. Going to merge ASAP.

antirez (Owner) commented May 14, 2026

@audreyt since I guess you already tested it, are we sure we don't want the final policy to be the default?

antirez (Owner) commented May 14, 2026

Also, what about an additional decoding policy that disables steering only during prefill? Sorry for the many comments.

audreyt marked this pull request as ready for review May 14, 2026 22:00
audreyt force-pushed the codex/tool-safe-steering branch from ccc093e to 7f966fb May 14, 2026 22:07

audreyt (Contributor, Author) commented May 14, 2026

> Also, what about an additional decoding policy that disables steering only during prefill? Sorry for the many comments.

Done, thank you for the nudge!

I made final-answer the default, and added the decoding policy too:

  • final-answer is now the default: no steering during prefill, thinking, or tool-call grammar, then steering for final visible prose.
  • decoding disables steering only during prefill, then steers all generated tokens.
  • always is still available for the previous behavior.
  • off remains the policy-layer disable switch.

I also rebased onto current main, marked the PR ready for review, and updated the PR description with the compatibility note and test results. It's now working beautifully in my OpenClaw instance (@jdd-kami).
