fix(enrich): explicit QUOTE/ESCAPE so Kuzu COPY honors RFC-4180 #153

Merged
aksOps merged 1 commit into main from fix/enrich-csv-quote-honor
May 13, 2026
@aksOps aksOps commented May 13, 2026

Summary

#150 switched the staging-file delimiter to `|` to avoid comma collisions inside JSON property values. That fix works for commas but breaks when an ID itself contains a literal `|`. Istio's EDS cluster names are exactly this shape:

```
json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none
```

Go's `encoding/csv` writer DOES wrap such fields in `"` per RFC-4180. But Kuzu's CSV reader defaults to backslash escaping, not the RFC-4180 doubled-quote form Go produces. With the default Kuzu escape rule, the pipe-bearing quoted field is parsed as multiple fields and COPY aborts:

```
Copy exception: Error in file /tmp/codeiq-edges-3435630223.csv on line 7319:
expected 6 values per row, but got more.
```
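The writer-side behavior described above can be reproduced without Kuzu. The following is a minimal sketch, assuming the staging file is produced by `encoding/csv` with `Comma` set to `|`; the helper name `encodeRow` is hypothetical and not taken from the repository.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeRow writes one record with '|' as the delimiter, mirroring the
// staging-file shape described in the summary. The name is hypothetical.
func encodeRow(fields []string) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|'
	if err := w.Write(fields); err != nil {
		return "", err
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	row, err := encodeRow([]string{
		"json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none",
		`{"k":"v"}`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(row)
	// Prints:
	// "json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none"|"{""k"":""v""}"
}
```

The pipe-bearing ID is wrapped in quotes, and embedded quotes are doubled rather than backslash-escaped, which is the form a reader has to be told to expect.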

Fix

Pass `QUOTE='"', ESCAPE='"'` explicitly to the Kuzu COPY FROM clause for both `copyNodeBatch` and `copyEdgeBatch`. Kuzu now reads the RFC-4180 form Go writes.
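As a rough illustration of the resulting statement, here is a sketch of how such a COPY string might be assembled. The helper `buildCopyStmt`, the table name `Edge`, the file path, and the `HEADER=false` and `DELIM='|'` options are assumptions for illustration; only the `QUOTE='"', ESCAPE='"'` pair comes from this PR, and the actual `copyNodeBatch` / `copyEdgeBatch` code may be structured differently.

```go
package main

import "fmt"

// buildCopyStmt is a hypothetical helper, not the repository's code.
// It appends the explicit QUOTE/ESCAPE options this PR introduces so the
// reader treats a doubled quote inside a quoted field as a literal quote.
// HEADER and DELIM are assumed here to match a headerless, pipe-delimited file.
func buildCopyStmt(table, csvPath string) string {
	return fmt.Sprintf(
		`COPY %s FROM "%s" (HEADER=false, DELIM='|', QUOTE='"', ESCAPE='"')`,
		table, csvPath,
	)
}

func main() {
	fmt.Println(buildCopyStmt("Edge", "/tmp/edges.csv"))
	// Prints:
	// COPY Edge FROM "/tmp/edges.csv" (HEADER=false, DELIM='|', QUOTE='"', ESCAPE='"')
}
```

The sketch does no escaping of the table name or path, so it is only suitable for values the program generates itself, such as temp-file paths.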

Test plan

  • `cd go && CGO_ENABLED=1 go test ./... -count=1` — 881 passed
  • New `TestBulkLoadEdgesPipeInTargetID` exercises the exact Istio cluster-name shape (fails on main, passes here)
  • End-to-end on polyglot-bench/istio — `codeiq enrich .` now exits 0 (was exit 2 pre-fix): 36k nodes, 55k edges, 20 services
  • After this and #152 (fix(parser): unquote TOML keys and section headers) land: `codeiq enrich ~/projects/` end-to-end with exit 0
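The invariant behind the regression test can be sketched without a database. This is not `TestBulkLoadEdgesPipeInTargetID` itself: the six column values are hypothetical, and the naive `strings.Split` is only a stand-in for a reader that does not honor the quoting, not Kuzu's actual parser. It shows why a row can appear to carry more than 6 values, while an RFC-4180-aware reader recovers exactly 6.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strings"
)

// roundTrip writes one pipe-delimited record, then reports how many fields a
// quote-unaware split sees and what a quote-aware csv.Reader recovers.
func roundTrip(record []string) (naive int, parsed []string, err error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = '|'
	if err = w.Write(record); err != nil {
		return 0, nil, err
	}
	w.Flush()
	if err = w.Error(); err != nil {
		return 0, nil, err
	}

	// Quote-unaware view: every '|' is treated as a field boundary.
	line := strings.TrimRight(buf.String(), "\n")
	naive = len(strings.Split(line, "|"))

	// Quote-aware view: pipes inside a quoted field stay in that field.
	r := csv.NewReader(strings.NewReader(buf.String()))
	r.Comma = '|'
	parsed, err = r.Read()
	return naive, parsed, err
}

func main() {
	// Hypothetical 6-column edge row; only the target ID shape is from the PR.
	edge := []string{
		"e1", "src",
		"json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none",
		"DEPENDS_ON", "1", "{}",
	}
	naive, parsed, err := roundTrip(edge)
	if err != nil {
		panic(err)
	}
	fmt.Println(naive, len(parsed), parsed[2] == edge[2])
	// Prints: 9 6 true
}
```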

🤖 Generated with Claude Code

PR #150 switched the staging file delimiter to '|' to avoid JSON-
property comma collisions. That fixes comma-bearing values but
breaks when an ID itself contains a literal '|' — Istio's EDS
cluster names are exactly this shape:

    json:istio/none_cds.json:inbound|7070|tcplocal|s1tcp.none

Go's encoding/csv writer DOES wrap such fields in '"' per RFC-4180.
But Kuzu's CSV reader defaults to BACKSLASH escaping, not the
RFC-4180 doubled-quote form Go produces. With the default Kuzu
escape rule, the pipe-bearing quoted field is parsed as multiple
fields and the COPY aborts:

    Copy exception: Error in file ... expected 6 values per row,
    but got more.

Fix: pass `QUOTE='"', ESCAPE='"'` explicitly so Kuzu interprets
the RFC-4180 form Go writes. Applies to both copyNodeBatch and
copyEdgeBatch.

End-to-end: `codeiq enrich ~/projects/polyglot-bench/istio` now
exits 0 (was exit 2 pre-fix): 36k nodes, 55k edges, 20 services.

Regression test TestBulkLoadEdgesPipeInTargetID covers the exact
Istio cluster-name shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@aksOps aksOps merged commit a83b768 into main May 13, 2026
13 checks passed
@aksOps aksOps deleted the fix/enrich-csv-quote-honor branch May 13, 2026 16:23
aksOps added a commit that referenced this pull request May 14, 2026
Stale doc references after Phase 6 (Java deletion, #132) and the Kuzu
0.7.1 → 0.11.3 bump (#155 + #159).

- CLAUDE.md / PROJECT_SUMMARY.md: bump Kuzu 0.7.1 → 0.11.3,
  go-sqlite3 1.14.22 → 1.14.44, cobra to 1.10.2; note native FTS.
- AGENTS.md: rewrite "What this repo is" (no more "REST API");
  flip `mvn -B -ntp clean verify` → `go test ./...`; clarify that
  REST + React SPA were deleted in Phase 6 and won't return.
- SECURITY.md: rewrite scope. Drop the dead JAR / serve / REST API /
  React UI / H2 / Neo4j Embedded references. New in-scope list covers
  every codeiq subcommand, the 10 MCP tools (with `run_cypher` mutation
  gate called out), `.codeiq/cache/` (SQLite) + `.codeiq/graph/`
  (Kuzu), and `read_file` path sandboxing. Add the security CI
  workflows (CodeQL, Semgrep, OSV-Scanner, Trivy, Gitleaks, SBOM,
  Socket Security) + perf-gate to the hardening references.
- CHANGELOG.md: populate [Unreleased] with the OOM-fix saga
  (PRs #145-#148), the five correctness fixes (#149-#153), the
  Kuzu 0.7.1 → 0.11.3 bump (#155-#158), the FTS migration (#159),
  the Dependabot config rewrite (#154), and the enrich CLI knobs.

No code changes.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>