Add extra_body to custom providers + multi-batch label_communities for 16k-context models by EirikWolf · Pull Request #1197 · Graphify-Labs/graphify

EirikWolf · 2026-06-08T16:58:30Z

Summary

Two related fixes that together enable self-hosted reasoning models (Qwen3, Llama 3.1 8B-Instruct, etc.) to drive graphify label.

cfg.get("extra_body") from providers.json now reaches the OpenAI-compatible client on both the extraction path (_call_openai_compat via extract_files_direct) and the labeling path (_call_llm). Without this, a custom provider pointing at a vLLM endpoint serving Qwen3 had no way to set chat_template_kwargs.enable_thinking=false. The model then emits a chain-of-thought preamble instead of the JSON the parser expects, and the call fails. An explicit extra_body also bypasses the ollama num_ctx auto-derive: a custom provider that opts in is declaring that it owns the request shape.
label_communities now batches communities in chunks of 100 (configurable via batch_size=) instead of making a single call hard-capped at 200 communities. With _LABEL_TOP_K=12 sampled node labels per community, the old single call could carry up to 200 × 12 = 2,400 labels, which routinely overflowed the 16k context window of self-hosted reasoning models and dropped the entire pass to placeholders, even on graphs with only a couple hundred communities. The default max_communities is now None (label every community); explicit integer caps still work for back-compat. Partial batch failures no longer kill the whole pass: successful batches still contribute real labels, and only the failed batch's cids stay as Community N.
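
The precedence described in the first fix can be sketched roughly as below. This is not the code from the PR: `build_extra_body`, its `provider_default` and `derived_num_ctx` arguments, the `base_url` key, and the shape of the ollama payload are all hypothetical names chosen for illustration. Only `cfg.get("extra_body")` and the `chat_template_kwargs.enable_thinking` setting come from the description above.

```python
from typing import Any, Optional


def build_extra_body(
    cfg: dict[str, Any],
    provider_default: Optional[dict[str, Any]] = None,
    derived_num_ctx: Optional[int] = None,
) -> Optional[dict[str, Any]]:
    """Pick the extra_body to send with an OpenAI-compatible request.

    Hypothetical helper: the names and the ollama payload shape are assumptions.
    """
    explicit = cfg.get("extra_body")
    if explicit is not None:
        # An explicit extra_body wins outright: no built-in provider default
        # is used and no num_ctx is auto-derived.
        return dict(explicit)
    if provider_default is not None:
        return dict(provider_default)
    if derived_num_ctx is not None:
        # Assumed shape for an auto-derived ollama context size.
        return {"options": {"num_ctx": derived_num_ctx}}
    return None


# Hypothetical providers.json entry for a vLLM endpoint serving Qwen3.
qwen_cfg = {
    "base_url": "http://localhost:8000/v1",
    "extra_body": {"chat_template_kwargs": {"enable_thinking": False}},
}

chosen = build_extra_body(qwen_cfg, derived_num_ctx=16384)
```

With the official openai Python client, a dict like `chosen` would be passed through the `extra_body=` keyword of `client.chat.completions.create(...)`, which forwards the extra fields in the request body.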

Why one PR for two changes: they're separable but neither alone unlocks the self-hosted-Qwen3 workflow that motivated them, and a reviewer who pulls this down to verify will want both in one shot. Squash-merging the two as a single feature also keeps the changelog readable.

Tested

Locally against 5 repos (200–525 communities each) using vLLM serving Qwen3.6-27B INT4-AutoRound on a 24 GB RTX 3090. Label coverage went from 0–44% (when the call returned at all) to 99.8% after both changes.

CI suite (uv run pytest) — added 7 new tests:

tests/test_labeling.py:

test_label_communities_batches_when_over_batch_size — 250 communities ÷ batch_size 100 → 3 calls of (100, 100, 50), all real names
test_label_communities_partial_batch_failure_keeps_successful_batches — middle batch raises; surviving batches still produce real names
test_label_communities_all_batches_fail_raises — propagates so generate_community_labels can degrade
test_label_communities_max_communities_caps_total — explicit max_communities=40 still caps total cids sent
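
The batch arithmetic these tests check (250 communities at batch_size 100 giving calls of 100, 100 and 50, and max_communities=40 capping the total) can be illustrated with a minimal sketch. `iter_batches` is a hypothetical helper, not a function from the PR, and it assumes the cap keeps the first N community ids.

```python
from typing import Iterator, Optional, Sequence


def iter_batches(
    cids: Sequence[int],
    batch_size: int = 100,
    max_communities: Optional[int] = None,
) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size community ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if max_communities is not None:
        # Assumption: an explicit cap keeps the first N ids.
        cids = cids[:max_communities]
    for start in range(0, len(cids), batch_size):
        yield cids[start:start + batch_size]


sizes = [len(b) for b in iter_batches(list(range(250)))]
capped = [len(b) for b in iter_batches(list(range(250)), max_communities=40)]
print(sizes, capped)  # [100, 100, 50] [40]
```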

tests/test_llm_backends.py:

test_call_openai_compat_uses_explicit_extra_body
test_call_openai_compat_extra_body_wins_over_moonshot_default
test_call_openai_compat_explicit_extra_body_skips_ollama_auto_derive

Existing labeling and backend tests are untouched and continue to pass — back-compat verified.

Notes for reviewers

_LABEL_MAX_COMMUNITIES = 200 is kept as a module-level constant (now reframed as a legacy soft-cap for callers that pin it explicitly). The default behavior change is to label every community across N batches rather than truncating at 200 in a single call.
Default max_communities flipped from 200 → None. Callers passing nothing previously got the first 200 communities labeled in one shot; they now get all communities labeled across however many batches it takes. The old behavior is recoverable by passing max_communities=200 explicitly.
Partial-batch error policy: errors print to stderr (matches existing generate_community_labels warning style) and the loop continues. Only if every batch fails do we re-raise — keeps the "raises on backend failure" contract intact for the no-labels-written case.
No new dependencies, no schema changes, no skill regeneration needed.
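
The partial-batch error policy in these notes can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `label_in_batches`, `label_batch`, and `fake_backend` are hypothetical names, and filling placeholders inside the same function is a simplification.

```python
import sys
from typing import Callable, Optional, Sequence


def label_in_batches(
    batches: Sequence[Sequence[int]],
    label_batch: Callable[[Sequence[int]], dict[int, str]],
) -> dict[int, str]:
    """Label each batch; tolerate failures unless every batch fails."""
    labels: dict[int, str] = {}
    last_error: Optional[Exception] = None
    failures = 0
    for index, batch in enumerate(batches):
        try:
            labels.update(label_batch(batch))
        except Exception as exc:
            # Warn on stderr and keep going so other batches still count.
            failures += 1
            last_error = exc
            print(f"warning: label batch {index} failed: {exc}", file=sys.stderr)
    if last_error is not None and failures == len(batches):
        # Nothing was labeled, so surface the backend failure to the caller.
        raise last_error
    for batch in batches:
        for cid in batch:
            # Ids from failed batches keep the placeholder name.
            labels.setdefault(cid, f"Community {cid}")
    return labels


def fake_backend(batch: Sequence[int]) -> dict[int, str]:
    """Stand-in for an LLM call; fails on the middle batch."""
    if batch[0] == 100:
        raise RuntimeError("context overflow")
    return {cid: f"name-{cid}" for cid in batch}


batches = [range(0, 100), range(100, 200), range(200, 250)]
result = label_in_batches(batches, fake_backend)
```

Here `result` holds real names for the first and last batch and `Community N` placeholders for ids 100 through 199, mirroring the middle-batch-fails test case.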

Test plan

uv run pytest tests/test_labeling.py tests/test_llm_backends.py — 48 pass
uv run ruff check on touched files — clean
Full uv run pytest — all green except a pre-existing test_cpp_preprocess_passes_absolute_path failure on Windows (the test asserts the path starts with /, fails for C:\...; unrelated to this PR — same failure on stock upstream/v8)
End-to-end against vLLM Qwen3.6-27B: 1,694 / 1,697 = 99.8% real LLM-generated names across 5 repos


safishamsi merged commit 7477b46 into Graphify-Labs:v8 Jun 8, 2026

safishamsi merged 1 commit into Graphify-Labs:v8 from EirikWolf:feat/extra-body-and-multi-batch-label
