fix: guard label/text normalizers against None node labels (#1194) by freiit · Pull Request #1195 · Graphify-Labs/graphify

freiit · 2026-06-08T14:23:01Z

What & why

Graph nodes can carry label: None — OpenAI-compatible LLM backends occasionally emit a null label during semantic extraction. The callers fetch the label with node.get("label", node.get("id", "")) / node.get("label", ""), but dict.get(key, default) returns the default only when the key is absent; for an explicit "label": None it returns None. That None flows into helpers that call unicodedata.normalize(...), crashing the entire extract pipeline with:

TypeError: normalize() argument 2 must be str, not None

at whichever normalizer runs first (dedup → build → export). The four affected helpers:

dedup._norm
build._norm_label
export._strip_diacritics
serve._strip_diacritics
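
The failure mode described above can be reproduced in isolation. The snippet below is a minimal sketch, not code from the repository: the node dict and the "NFKD" normalization form are assumptions for illustration, since this description does not say which form the helpers use.

```python
import unicodedata

# Hypothetical node, shaped like one an LLM backend might emit with a null label.
node = {"id": "n1", "label": None}

# The key "label" is present, so dict.get never consults the fallback.
label = node.get("label", node.get("id", ""))

# Passing that None to unicodedata.normalize raises TypeError.
try:
    unicodedata.normalize("NFKD", label)  # "NFKD" is an assumed form
    error = None
except TypeError as exc:
    error = str(exc)

print(repr(label))
print(error)
```

The fallback is only consulted when the key is missing, which is why the nested `node.get("id", "")` never runs here.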

Because semantic extraction results are cached before the build step, the crash recurs on every subsequent extract until the cache is wiped, and there's no --no-dedup escape hatch. Same bug class as #454 (sanitize_label crashing on a None source_file).

The other unicodedata.normalize call sites (extract._make_id, build._normalize_id, symbol_resolution._bash_make_id, mcp_ingest) build their input via "_".join(p for p in parts if p), so they're always str — not affected.
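
A short sketch of why the join-based call sites are safe. The `parts` list and the "NFKD" form are made up for illustration. The point is only that the `if p` filter drops None and empty strings before joining, so the result is always a str, possibly an empty one.

```python
import unicodedata

# Hypothetical ID parts, including a None and an empty string.
parts = ["src", None, "", "module"]

# Falsy parts are filtered out, so join only ever sees non-empty strings.
joined = "_".join(p for p in parts if p)

# Even when every part is falsy, the result is "" rather than None.
all_falsy = "_".join(p for p in [None, ""] if p)

# Both values are str, so normalize accepts them.
normalized = unicodedata.normalize("NFKD", joined)  # "NFKD" is an assumed form
normalized_empty = unicodedata.normalize("NFKD", all_falsy)
```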

Fix

Coerce non-str input to "" at each chokepoint (and widen the type hint to str | None). A null/empty label then normalizes to "", which the surrounding if key: guards already skip — so the offending node simply isn't considered for merging (it stays in the graph) instead of aborting the run.
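
The shape of the guard can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged diff: `strip_diacritics`, the "NFKD" form, the lowercasing, and the grouping loop are all hypothetical stand-ins for the real helpers and their callers. `Optional[str]` is used as the equivalent of `str | None` so the sketch also runs on Python versions before 3.10.

```python
import unicodedata
from typing import Optional


def strip_diacritics(text: Optional[str]) -> str:
    """Hypothetical normalizer: return "" for any non-str input."""
    if not isinstance(text, str):
        return ""
    decomposed = unicodedata.normalize("NFKD", text)  # assumed form
    # Drop combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical nodes: one carries an explicit null label.
nodes = [
    {"id": "a", "label": "Caf\u00e9"},
    {"id": "b", "label": None},
    {"id": "c", "label": "Cafe"},
]

groups = {}
for node in nodes:
    key = strip_diacritics(node.get("label", "")).lower()
    if key:  # an empty key is skipped, so node "b" is never a merge candidate
        groups.setdefault(key, []).append(node["id"])

print(groups)
```

Node "b" stays in `nodes` untouched. It just never enters a merge group, which matches the behavior described above.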

Verification

Reproduced on a 6,253-file Markdown corpus via a vLLM / gpt-oss-120b OpenAI-compatible backend: every extract crashed — first at dedup._norm, then (after guarding that) at export._strip_diacritics. With all four guarded, the same corpus builds cleanly:

[graphify extract] wrote /data/graphify-out/graph.json: 16715 nodes, 16136 edges, 4272 communities
[graphify] Done

py_compile clean. Unit check: _norm(None) == "", _norm_label(None) == "", both _strip_diacritics(None) == ""; normal strings unchanged.

Fixes #1194

Nodes can carry label=None (OpenAI-compatible LLM backends emit null labels during semantic extraction). Callers use dict.get("label", fallback), which returns None for an explicit null value (the fallback only applies when the key is absent). That None reaches helpers calling unicodedata.normalize(...), crashing the whole extract pipeline with:

TypeError: normalize() argument 2 must be str, not None

at whichever normalizer runs first (dedup -> build -> export):

- dedup._norm
- build._norm_label
- export._strip_diacritics
- serve._strip_diacritics

Extraction is cached before the build step, so the crash recurs on every re-run until the cache is wiped, with no --no-dedup escape hatch.

Coerce non-str input to "" at each chokepoint; a null label then normalizes to "" (already skipped by surrounding 'if key:' guards).

Same class as Graphify-Labs#454.

Fixes Graphify-Labs#1194

safishamsi merged commit 3602c80 into Graphify-Labs:v8 Jun 8, 2026

freiit deleted the fix/none-label-normalize-guards branch June 9, 2026 05:36
