Skip to content

Revert "Upgrade LiteLLM to 1.84.1 (#709)"#729

Draft
juanmichelini wants to merge 1 commit into
mainfrom
revert-litellm-1.84.1-709
Draft

Revert "Upgrade LiteLLM to 1.84.1 (#709)"#729
juanmichelini wants to merge 1 commit into
mainfrom
revert-litellm-1.84.1-709

Conversation

@juanmichelini
Copy link
Copy Markdown
Collaborator

Reverts #709 (commit 27cb0cf0).

Why

The litellm 1.84.1 upgrade correlates with a regression for litellm_proxy/anthropic/claude-opus-4-8. Every conversation against that model now fails with:

litellm.UnsupportedParamsError: anthropic does not support parameters:
['reasoning_effort', 'thinking'], for model=claude-opus-4-8.
To drop these, set `litellm.drop_params=True` or for proxy:
    litellm_settings:
      drop_params: true
…
LiteLLM Retried: 3 times

This bubbles up as code: LLMBadRequestError on the conversation, the worker logs Remote conversation ended with error, the instance is retried 4×, all 4 fail, and the result lands in output_errors.jsonl with patch_len=0. Because get_default_on_result_writer routes errored results away from output.jsonl, the eval-monitor dashboard shows 0% critic acceptance even though the critic never actually evaluated anything.

Evidence

Same SDK commit (45697f5) and identical LLM payload (drop_params: true, reasoning_effort: high, extended_thinking_budget: 200000) on both sides of the boundary:

Time (UTC) Eval run Result
2026‑05‑28 23:38–23:48 smoke 26608744731 ✅ 1/1, FinishAction + non-empty patch
2026‑05‑29 02:14 → … full 26609410334 ❌ 380/380 errored (output.jsonl=0)
2026‑05‑29 04:49 → 05:11 smoke 26618647016 ❌ 1/1, UnsupportedParamsError × 4 attempts

The failing conversation event (conversations/scikit-learn__scikit-learn-13439/events/event-00005-*.json in 26618647016/results.tar.gz) contains the proxy's LLMBadRequestError verbatim.

Caveats

  • The flag the error suggests (drop_params: true) is already set on the SDK side. With litellm_proxy/... as the provider, the SDK forwards reasoning_effort / thinking to the proxy as OpenAI params, and the proxy's litellm is the one rejecting them. So a complete fix also needs litellm_settings.drop_params: true (or an equivalent per-model config) on llm-proxy.eval.all-hands.dev.
  • This revert restores the previously pinned litellm in the benchmarks inference runtime so client + proxy are once again on matched behavior. It does not address whatever changed inside the proxy itself.

Suggested follow-ups

  1. Pin/configure the proxy with drop_params: true (or register a model entry for anthropic/claude-opus-4-8 that explicitly accepts thinking / reasoning_effort).
  2. Re-attempt the litellm 1.84.1 bump once the proxy is updated.
  3. Eval-monitor UX: when output.jsonl == 0 and output_errors.jsonl > 0, surface "all completed instances errored" instead of "0% critic acceptance" so this failure mode isn't mistaken for a critic problem.

This PR was opened by an AI agent (OpenHands) on behalf of @juanmichelini.

@juanmichelini can click here to continue refining the PR

This reverts commit 27cb0cf.

The litellm 1.84.1 upgrade correlates with a regression for
`litellm_proxy/anthropic/claude-opus-4-8`: every conversation now
fails with

    litellm.UnsupportedParamsError: anthropic does not support
    parameters: ['reasoning_effort', 'thinking'], for model=claude-opus-4-8

Smoke run 26608744731 (before the upgrade reached the proxy) succeeded
1/1; smoke run 26618647016 and full run 26609410334 (after) errored on
every instance, leaving `output.jsonl=0` and all completed instances
in `output_errors.jsonl`.

Reverting in benchmarks restores the prior pinned litellm in the
inference runtime; the proxy will also need to be brought back in line
(or have `litellm_settings.drop_params: true` configured) for the fix
to take effect end-to-end.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file investigation priority:high

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants