@corbt corbt commented Jan 23, 2026

Summary

  • Add a separate TinkerNativeBackend that implements the native Tinker loss/checkpoint flow with renderer-based data conversion and an in-process OpenAI-compatible server.
  • Define a new tinker dependency group (fastapi/uvicorn/tinker/tinker-cookbook) and conditionally export the Tinker backends to keep base installs light (see the sketch after this list).
  • Align the LocalBackend model base_path with the backend path and add an integration test for the Tinker native flow.
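
For context, the conditional export mentioned above is the standard optional-dependency pattern. A minimal sketch, assuming a backends `__init__.py` and a `tinker_native` module name (both hypothetical; the PR's actual module layout may differ):

```python
# Hypothetical backends __init__.py -- names are illustrative, not from the diff.
from .local import LocalBackend

__all__ = ["LocalBackend"]

try:
    # Only importable when the `tinker` dependency group
    # (fastapi/uvicorn/tinker/tinker-cookbook) is installed.
    from .tinker_native import TinkerNativeBackend

    __all__.append("TinkerNativeBackend")
except ImportError:
    # Base install: swallow the error so `import art` keeps working
    # without the Tinker extras.
    pass
```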

This is an alternative to #523. That PR tried to combine native Tinker behavior and ART-native behavior in a single backend, which made the code too complex to reason about. Here we split the native behavior into a separate backend. Downside: the constructor arguments are less compatible with LocalBackend's. Upside: we can take advantage of Tinker's functionality more explicitly.
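
To make the tradeoff concrete, here is a rough usage comparison. Everything Tinker-specific below is an assumption: the import path, the parameter names (`base_model`, `renderer_name`), and the model string are illustrative, not the PR's actual signature.

```python
import art
from art.backends import TinkerNativeBackend  # hypothetical import path

# ART-native flow: LocalBackend configuration stays as before.
local = art.LocalBackend()

# Native Tinker flow: the backend takes Tinker-specific arguments
# (names below are invented for illustration), so configs don't
# transfer 1:1 from LocalBackend -- the "downside" noted above.
tinker = TinkerNativeBackend(
    base_model="Qwen/Qwen3-8B",  # assumed parameter name
    renderer_name="qwen3",       # drives renderer-based data conversion
)
```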

Experimental: not yet tested with multi-turn rollouts or tool calls, but the yes-no-maybe flow converges (avg reward ~0.955 by step 4).

Test plan

  • uv run pytest tests/integration/test_tinker_native_backend.py -v -s
  • Manual yes-no-maybe style loop (16 rollouts/prompt, converged by step 4).

Cursor Bot added 3 commits January 23, 2026 07:57
  • Separate native Tinker training/inference from LocalBackend to keep the API clear while enabling explicit loss/checkpoint behavior and config.
  • Align tinker native types with OpenAI tooling and update tests to avoid invalid type expressions under pyright.
  • Use merge_state for backend persistence to avoid clobbering model state, and fail fast on trajectories without Choice objects to prevent no-op training (see the guard sketch below). Expose policy version fields on trajectories for off-policy tracking.
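
A minimal sketch of the fail-fast guard from the last commit, assuming ART's `Trajectory.messages_and_choices` shape; the function name and error message are invented, not the code from the diff:

```python
from openai.types.chat.chat_completion import Choice

import art


def ensure_has_choices(trajectory: art.Trajectory) -> art.Trajectory:
    """Hypothetical guard illustrating the fail-fast commit above.

    A trajectory whose assistant turns are plain message dicts (rather
    than Choice objects carrying logprobs/token ids) gives the trainer
    nothing to compute a loss on, so training would silently be a no-op.
    """
    if not any(
        isinstance(item, Choice) for item in trajectory.messages_and_choices
    ):
        raise ValueError(
            "Trajectory has no Choice objects; refusing to run a "
            "no-op training step."
        )
    return trajectory
```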