Enhancement: native bulk-load / fast-ingest mode for replay scenarios #386

## Enhancement request

LadybugDB would benefit from a native **bulk-load / fast-ingest mode** optimized for scenarios where durability-per-write is not required, such as:

- Replaying a write-ahead log into a fresh database
- Initial data loading from an external source of truth
- Migration/recovery operations

## Motivation

We use LadybugDB as the embedded graph database for Graphiti (a knowledge graph) in the Liminis notes application. Our Graphiti integration keeps its own JSONL WAL (separate from LadybugDB's db.wal) so that we can recover the database by replaying mutations. Our production workspace has 16,996 WAL files containing more than 200K mutations.
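For context, a minimal sketch of how such a JSONL WAL can be read back for replay. The record shape (`cypher` and `params` keys), the `*.jsonl` file pattern, and the assumption that filenames sort into replay order are illustrative, not a description of our actual on-disk format:

```python
import json
from pathlib import Path
from typing import Iterator


def iter_wal_mutations(wal_dir: Path) -> Iterator[dict]:
    """Yield one mutation dict per non-blank line, across all WAL files.

    Assumes (hypothetically) that lexicographic filename order equals
    replay order and that each line is a standalone JSON object.
    """
    for wal_file in sorted(wal_dir.glob("*.jsonl")):
        with wal_file.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    # Point at the exact file and line that failed to parse.
                    raise ValueError(
                        f"{wal_file.name}:{line_no}: bad WAL record"
                    ) from exc
```

Because this is a generator, a replay loop can stream all the files without holding every mutation in memory at once.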

Replay takes 25+ minutes, even though the machine is neither CPU-bound nor I/O-throughput-bound. The dominant cost appears to be per-write commit durability (presumably an fsync on db.wal for each transaction).
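To illustrate the general pattern we suspect (this does not measure LadybugDB itself), here is a small stdlib-only sketch that writes the same records to a plain file twice: once with an fsync after every record, once with a single fsync at the end. The record size and count are arbitrary, and `write_records` is a hypothetical helper:

```python
import os
import tempfile
import time
from typing import List


def write_records(path: str, records: List[bytes], fsync_each: bool) -> float:
    """Write records to path and return the elapsed time in seconds."""
    start = time.perf_counter()
    with open(path, "wb") as fh:
        for record in records:
            fh.write(record)
            if fsync_each:
                fh.flush()
                os.fsync(fh.fileno())  # durability point after every record
        fh.flush()
        os.fsync(fh.fileno())  # one final durability point in both modes
    return time.perf_counter() - start


if __name__ == "__main__":
    records = [b"x" * 256 + b"\n" for _ in range(2000)]
    with tempfile.TemporaryDirectory() as tmp:
        per_write = write_records(os.path.join(tmp, "a.log"), records, True)
        single = write_records(os.path.join(tmp, "b.log"), records, False)
    print(f"fsync per record: {per_write:.3f}s, single fsync: {single:.3f}s")
```

Both runs produce identical files. Only the number of durability points differs, so any gap between the two timings is latency spent waiting on the disk rather than CPU or throughput.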

## Key insight

During replay, durability-per-write is **not valuable**. The source of truth is the external WAL file. If replay crashes mid-way, the correct recovery is to discard the partial DB and restart from scratch. Calling fsync after each write slows replay down without adding any real safety.
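The discard-and-restart recovery model can be sketched as follows. This is an illustration under stated assumptions, not our driver code: `rebuild_database` and the `apply_mutation` callback are hypothetical names, and the database is assumed to live in a directory on the same filesystem as the scratch copy:

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable


def rebuild_database(
    target: str,
    mutations: Iterable[dict],
    apply_mutation: Callable[[str, dict], None],
) -> None:
    """Replay into a scratch directory; publish it only if replay completes."""
    parent = os.path.dirname(os.path.abspath(target))
    scratch = tempfile.mkdtemp(prefix="replay-", dir=parent)
    try:
        for mutation in mutations:
            apply_mutation(scratch, mutation)
    except BaseException:
        # A partial database has no value: the external WAL is the truth.
        shutil.rmtree(scratch, ignore_errors=True)
        raise
    if os.path.exists(target):
        shutil.rmtree(target)  # drop the old database before publishing
    os.replace(scratch, target)  # rename within the same filesystem
```

A hard kill can still leave a `replay-*` scratch directory behind. The next run can simply delete it, which is why no intermediate write in the loop needs to be durable.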

## What we'd like

A mode where:
- All mutations go into a single logical transaction (or batch)
- fsync is deferred to a single call at the end
- Indices are built on final state (we already do this)
- Optionally: checksums skipped, checkpoints deferred

Potential shapes:

1. **Database-level flag** at open time:
   ```python
   kuzu.Database(path, bulk_load_mode=True)
   ```

2. **Explicit transactions** exposed via the Python `Connection` API:
   ```python
   conn.execute("BEGIN TRANSACTION")
   for mutation in wal:
       conn.execute(mutation.cypher, mutation.params)
   conn.execute("COMMIT")
   ```
   (We see BEGIN/COMMIT in Kuzu's Cypher grammar but haven't verified it works via the Python `execute()` call.)

3. **Native `COPY FROM` bulk loader** like Kuzu upstream, but this requires the data in a specific format (CSV, Parquet) and doesn't fit our MERGE-heavy mutation stream.
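If option 2 turns out to work, a batched variant might look like the sketch below. It is written against a stand-in `Executes` protocol rather than the real `Connection` class, because we have not verified that `BEGIN TRANSACTION`, `COMMIT`, and `ROLLBACK` are accepted through the Python `execute()` call. The `replay_in_batches` helper, the `batch_size` default, and the mutation dict shape are all hypothetical:

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional, Protocol


class Executes(Protocol):
    """Anything with an execute(query, parameters) method (assumed shape)."""

    def execute(self, query: str, parameters: Optional[Dict[str, Any]] = None) -> Any:
        ...


def chunks(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def replay_in_batches(
    conn: Executes, mutations: Iterable[dict], batch_size: int = 1000
) -> int:
    """Apply mutations with one transaction per batch; return the count applied."""
    applied = 0
    for batch in chunks(mutations, batch_size):
        conn.execute("BEGIN TRANSACTION")
        try:
            for m in batch:
                conn.execute(m["cypher"], m.get("params") or {})
            conn.execute("COMMIT")  # one commit (and durability point) per batch
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied += len(batch)
    return applied
```

Batching rather than using one giant transaction keeps the amount of uncommitted state bounded. A single logical transaction is the limiting case where `batch_size` exceeds the number of mutations.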

## Current workaround options we're considering

- `auto_checkpoint=False` at Database construction, deferring checkpointing to a single final checkpoint
- `enable_checksums=False`, for a small speedup
- Wrapping replay in `BEGIN TRANSACTION ... COMMIT`, if that works via the Python API

## Context

- Our Graphiti fork: https://github.com/verveguy/graphiti (liminis branch)
- Related Graphiti issue for the driver-level changes: https://github.com/verveguy/graphiti/issues/33
- We'd happily help test experimental modes against a 200K-mutation replay corpus

## Expected impact

If bulk-load mode can eliminate the per-write fsync, we estimate a **10-100x speedup on replay**. At 50x, for example, a 25-minute replay would take about 30 seconds, making the WAL a much more ergonomic recovery tool.
