fix: guard label/text normalizers against None node labels (#1194) #1195

Merged
safishamsi merged 1 commit into Graphify-Labs:v8 from freiit:fix/none-label-normalize-guards
Jun 8, 2026
@freiit freiit commented Jun 8, 2026

What & why

Graph nodes can carry label: None — OpenAI-compatible LLM backends occasionally emit a null label during semantic extraction. The callers fetch the label with node.get("label", node.get("id", "")) / node.get("label", ""), but dict.get(key, default) returns the default only when the key is absent; for an explicit "label": None it returns None. That None flows into helpers that call unicodedata.normalize(...), crashing the entire extract pipeline with:

TypeError: normalize() argument 2 must be str, not None

at whichever normalizer runs first (dedup → build → export). The four affected helpers:

  • dedup._norm
  • build._norm_label
  • export._strip_diacritics
  • serve._strip_diacritics
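
The failure mode described above can be reproduced in isolation. The following is a minimal sketch using a hypothetical node dict (not taken from the project); the `"NFKD"` form is an assumption, since the description does not say which normalization form the helpers use.

```python
import unicodedata

# Hypothetical node, shaped like one a backend might emit with a null label.
node = {"id": "n1", "label": None}

# The "label" key is present, so the default is never consulted.
label = node.get("label", node.get("id", ""))
print(label)  # None

# The fallback only applies when the key is missing entirely.
print({"id": "n1"}.get("label", "fallback"))  # fallback

try:
    unicodedata.normalize("NFKD", label)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The `try` block prints the same kind of `TypeError` quoted in the description, which suggests the problem lies in how the label is fetched rather than in the normalizers' own logic.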

Because semantic extraction results are cached before the build step, the crash recurs on every subsequent extract until the cache is wiped, and there's no --no-dedup escape hatch. Same bug class as #454 (sanitize_label crashing on a None source_file).

The other unicodedata.normalize call sites (extract._make_id, build._normalize_id, symbol_resolution._bash_make_id, mcp_ingest) build their input via "_".join(p for p in parts if p), so they're always str — not affected.
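
As a rough illustration of why the join-based call sites are safe, the sketch below uses a hypothetical parts list. The `if p` filter drops `None` and empty strings before joining, so the result is a `str` even when no part is usable. The `"NFKD"` form is again an assumption.

```python
import unicodedata

# Hypothetical ID parts; None and "" are dropped by the `if p` filter.
parts = ["module", None, "", "Café"]
joined = "_".join(p for p in parts if p)
print(joined)  # module_Café

# With nothing usable, join returns "" rather than None.
empty = "_".join(p for p in [None, ""] if p)
print(repr(empty))  # ''

# Both values are str, so normalize accepts them without raising.
normalized = unicodedata.normalize("NFKD", joined)
normalized_empty = unicodedata.normalize("NFKD", empty)
```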

Fix

Coerce non-str input to "" at each chokepoint (and widen the type hint to str | None). A null/empty label then normalizes to "", which the surrounding if key: guards already skip — so the offending node simply isn't considered for merging (it stays in the graph) instead of aborting the run.
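
The guard can be sketched roughly as follows. This is an illustrative stand-in, not the project's code: `norm_label_sketch`, the normalization steps (NFKD, dropping combining marks, casefolding), and the sample nodes are all assumptions. Only the non-str to `""` coercion and the `if key:` skip come from the description. `Optional[str]` is used here in place of `str | None` so the sketch also runs on older Python versions.

```python
import unicodedata
from typing import Dict, List, Optional


def norm_label_sketch(label: Optional[str]) -> str:
    """Illustrative guarded normalizer (hypothetical, not the real helper)."""
    # Guard: anything that is not a str, None included, becomes "".
    if not isinstance(label, str):
        return ""
    # Assumed normalization: decompose, drop combining marks, casefold, trim.
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


# Hypothetical nodes; the second one carries an explicit null label.
nodes = [
    {"id": "a", "label": "Café"},
    {"id": "b", "label": None},
    {"id": "c", "label": "cafe"},
]

groups: Dict[str, List[str]] = {}
for node in nodes:
    key = norm_label_sketch(node.get("label", ""))
    if key:  # an empty key is skipped, so node "b" is never a merge candidate
        groups.setdefault(key, []).append(node["id"])

print(groups)  # {'cafe': ['a', 'c']}
```

In this sketch, node `b` is left out of the merge groups but is not removed from `nodes`, mirroring the behavior described: the null-labelled node stays in the graph and the run continues.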

Verification

Reproduced on a 6,253-file Markdown corpus via a vLLM / gpt-oss-120b OpenAI-compatible backend: every extract crashed — first at dedup._norm, then (after guarding that) at export._strip_diacritics. With all four guarded, the same corpus builds cleanly:

[graphify extract] wrote /data/graphify-out/graph.json: 16715 nodes, 16136 edges, 4272 communities
[graphify] Done

py_compile passes on the changed modules. Unit check: _norm(None) == "", _norm_label(None) == "", and both _strip_diacritics variants (export and serve) return "" for None; normal strings are unchanged.

Fixes #1194

Nodes can carry label=None (OpenAI-compatible LLM backends emit null labels
during semantic extraction). Callers use dict.get("label", fallback), which
returns None for an explicit null value (the fallback only applies when the key
is absent). That None reaches helpers calling unicodedata.normalize(...),
crashing the whole extract pipeline with:

    TypeError: normalize() argument 2 must be str, not None

at whichever normalizer runs first (dedup -> build -> export):
  - dedup._norm
  - build._norm_label
  - export._strip_diacritics
  - serve._strip_diacritics

Extraction is cached before the build step, so the crash recurs on every
re-run until the cache is wiped, with no --no-dedup escape hatch. Coerce
non-str input to "" at each chokepoint; a null label then normalizes to ""
(already skipped by surrounding 'if key:' guards). Same class as Graphify-Labs#454.

Fixes Graphify-Labs#1194
@safishamsi safishamsi merged commit 3602c80 into Graphify-Labs:v8 Jun 8, 2026
@freiit freiit deleted the fix/none-label-normalize-guards branch June 9, 2026 05:36
Development

Successfully merging this pull request may close these issues.

dedup._norm() crashes with TypeError: normalize() argument 2 must be str, not None on a node with label: null — aborts the whole build