fix(enrich): pipe-delim COPY staging so JSON commas don't break Kuzu #150

Merged: aksOps merged 1 commit into main from fix/enrich-csv-escape on May 13, 2026

@aksOps aksOps commented May 13, 2026

Summary

Kuzu's CSV parser doesn't honor RFC-4180 quoting and counts commas inside JSON property values as field separators. Observed real-world abort on `~/projects/` enrich:

```
Copy exception: error in file <kuzu-tmp-IMPORTS-...csv>, on line 4:
expected 6 values per row, but got more
```

Triggered by Markdown depends_on edges and Python imports whose Properties JSON includes commas (e.g. `{"language":"python","module":"glob"}`).

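Switch the staging file delimiter from comma to pipe `|`. Unlike the comma, `|` is not part of the JSON syntax that Go's `json.Marshal` emits; it can appear only inside a string key or value, so the separator is unambiguous for the property payloads seen here. Both `copyNodeBatch` and `copyEdgeBatch` flip together.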

Test plan

  • `cd go && CGO_ENABLED=1 go test ./internal/graph/... -count=1` — 46 passed
  • New regression test `TestBulkLoadEdgesCommaInProperties` (fails on main, passes here)
  • New regression test `TestBulkLoadNodesCommaInProperties` (fails on main, passes here)
  • Re-run `codeiq enrich ~/projects/` once #150 (duplicate-PK) and this PR land, then verify there is no comma-related COPY abort

🤖 Generated with Claude Code

Kuzu's CSV parser doesn't respect RFC-4180 quoting and counts commas
inside JSON property values as field separators. On real-world inputs
this aborted BulkLoadEdges with "Copy exception: expected 6 values
per row, but got more" — observed on Markdown depends_on edges and
Python imports whose properties include {"language":"python",
"module":"glob"}.

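Switch the staging file delimiter from comma to pipe '|'. Unlike the
comma, '|' is not part of the JSON syntax that Go's json.Marshal emits
(it can appear only inside a string key or value), so the separator is
unambiguous for these payloads. Both copyNodeBatch and copyEdgeBatch
flip together.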

Adds two regression tests with comma-bearing Properties JSON
(TestBulkLoadEdgesCommaInProperties + TestBulkLoadNodesCommaInProperties)
that fail on main and pass after the fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aksOps aksOps merged commit 7867a79 into main May 13, 2026
13 checks passed
@aksOps aksOps deleted the fix/enrich-csv-escape branch May 13, 2026 15:19