Architecture

ABCrimson edited this page Mar 6, 2026 · 6 revisions

modern-xlsx is a hybrid Rust WASM + TypeScript library for reading and writing XLSX files.

Layer Diagram

  ┌──────────────────────────────────────────┐
  │  TypeScript API                          │
  │  Workbook · Worksheet · Cell             │
  │  StyleBuilder · RichTextBuilder          │
  │  Utilities (dates, cell refs, CSV, JSON) │
  └────────────────┬─────────────────────────┘
                   │ JSON string
  ┌────────────────┴─────────────────────────┐
  │  WASM Bridge (wasm-bindgen)              │
  │  read() · write() · readJson()           │
  └────────────────┬─────────────────────────┘
                   │
  ┌────────────────┴─────────────────────────┐
  │  Rust Core (modern-xlsx-core)            │
  │  ZIP (deflate) · XML (quick-xml SAX)     │
  │  SharedStringTable · Style resolution    │
  │  Content types · Relationships           │
  └──────────────────────────────────────────┘

Data Flow

Reader Path

Uint8Array → WASM read()
  → ZIP decompress (zip crate)
  → Parse XML parts (quick-xml SAX)
  → Build WorkbookData struct
  → Serialize to JSON (serde_json)
  → JSON.parse() in JS
  → Workbook class wraps raw data

Writer Path

Workbook.toBuffer()
  → Serialize to JSON (JSON.stringify)
  → WASM write()
  → Build SST from shared string cells
  → Generate XML parts (quick-xml Writer)
  → ZIP compress (zip crate, deflate)
  → Return Uint8Array

Why JSON Bridge?

Data crosses the WASM boundary as JSON strings rather than through serde_wasm_bindgen. Benchmarks show the JSON route is 8–13x faster for large workbooks, for three reasons:

  1. serde_json serialization in Rust is heavily optimized (itoa, ryu for numbers)
  2. JSON.parse() is one of the fastest built-in V8/SpiderMonkey operations
  3. A single WASM boundary crossing replaces thousands of individual wasm_bindgen calls

Rust Core Modules

Module             Purpose
lib.rs             WorkbookData, SheetData, CellData — shared types
reader.rs          Read orchestrator (ZIP → parse → WorkbookData)
writer.rs          Write orchestrator (WorkbookData → XML → ZIP)
streaming.rs       Streaming reader/writer for large files
ooxml/             Individual OOXML part parsers
number_format.rs   Excel number format code parser
errors.rs          ModernXlsxError error type
validate.rs        OOXML validation and repair engine
ole2/              OLE2 compound document read/write (feature-gated: encryption)

OOXML Parsers

Parser                 OOXML Part
shared_strings.rs      xl/sharedStrings.xml
styles.rs              xl/styles.xml
worksheet.rs           xl/worksheets/sheet*.xml
workbook.rs            xl/workbook.xml
relationships.rs       *.rels files
content_types.rs       [Content_Types].xml
doc_props.rs           docProps/core.xml, docProps/app.xml
comments.rs            xl/comments*.xml
theme.rs               xl/theme/theme1.xml
calc_chain.rs          xl/calcChain.xml
validate.rs            Validation rules and auto-repair
pivot_table.rs         xl/pivotTables/, xl/pivotCache/
threaded_comments.rs   xl/threadedComments/, xl/persons/
slicers.rs             xl/slicers/, xl/slicerCaches/
timelines.rs           xl/timelines/, xl/timelineCache/

Performance Patterns

  • SAX parsing — quick-xml streams events, never builds a DOM tree
  • Vec::with_capacity() — pre-allocated parse buffers on all XML parsers
  • push_entity() — zero-allocation XML entity resolution (writes into caller's buffer)
  • from_utf8().unwrap_or_default() — avoids lossy conversion + allocation
  • entries.remove() — moves data out of HashMap instead of cloning
  • drain() — moves preserved entries instead of cloning large blobs
  • Binary search insertion — rows inserted in sorted order
  • itoa::Buffer — zero-allocation integer formatting in XML
  • cold_path() — Rust 1.95 compiler hint on all error branches for optimal icache layout
  • #[inline] — on hot-path streaming JSON helpers and chart enum methods
  • .find() iterators — single-attribute XML parsing avoids full loop iteration
  • zip() iteration — parallel vector attachment without bounds checks
  • Buffer pre-allocation — XML writer String::with_capacity() based on data size estimates
  • Feature gating — #[cfg(feature = "encryption")] makes crypto deps optional for smaller WASM
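
Two of the patterns above, sorted insertion by binary search and moving data out of a HashMap with remove(), can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: the Row type, insert_row_sorted(), and take_entry() are hypothetical names, and replacing a row that shares an index with a new one is an assumed policy.

```rust
use std::collections::HashMap;

/// Hypothetical row type (assumption): this page does not show the real
/// SheetData layout.
#[derive(Debug, PartialEq)]
struct Row {
    index: u32,
    cells: Vec<String>,
}

/// Insert `row` so that `rows` stays sorted by `index`.
/// The position is found with a binary search rather than a linear scan.
/// If a row with the same index already exists, it is replaced (assumed policy).
fn insert_row_sorted(rows: &mut Vec<Row>, row: Row) {
    match rows.binary_search_by_key(&row.index, |r| r.index) {
        Ok(pos) => rows[pos] = row,
        Err(pos) => rows.insert(pos, row),
    }
}

/// Move an entry's bytes out of the map instead of cloning them.
/// After this call the map no longer holds the entry.
fn take_entry(entries: &mut HashMap<String, Vec<u8>>, name: &str) -> Option<Vec<u8>> {
    entries.remove(name)
}
```

A caller that knows roughly how many rows to expect can pair this with Vec::with_capacity() so the vector does not reallocate while rows are inserted.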

Bundle Size

Component     Size
WASM binary   ~939 KB
JS wrapper    ~55 KB
Total         ~994 KB

Browser Build Pipeline

The TypeScript package produces three build outputs via tsdown (rolldown):

src/index.ts         → dist/index.mjs             (ESM, 55 KB)
src/browser-entry.ts → dist/modern-xlsx.min.js    (IIFE, 29 KB, minified)
src/worker.ts        → dist/modern-xlsx.worker.js (ESM, 6 KB, minified)

IIFE Bundle

The IIFE bundle exposes the full API on window.ModernXlsx. It inlines all TypeScript source but keeps the WASM binary external. The detectWasmUrl() function auto-resolves the WASM path from document.currentScript.src.

Web Worker

The worker entry point runs in a DedicatedWorkerGlobalScope. It:

  1. Receives messages with {type: 'read'|'write', data?, json?}
  2. Auto-initializes WASM on first use
  3. Returns results with transferable ArrayBuffers (zero-copy)

Source Maps

All outputs include source maps (.js.map) for debugging in browser DevTools.

Module Splits (v0.9.1)

Large source files were split into focused submodules for maintainability:

Rust: worksheet.rs → worksheet/

The 7,173-line worksheet.rs was split into four files:

File                  Responsibility
worksheet/mod.rs      Types, re-exports
worksheet/parser.rs   SAX XML parsing (xl/worksheets/sheet*.xml)
worksheet/writer.rs   XML generation
worksheet/json.rs     Streaming JSON serialization for WASM bridge

Rust: charts.rs → charts/

The 4,862-line charts.rs was split into four files:

File               Responsibility
charts/mod.rs      Re-exports, chart resolution logic
charts/types.rs    ChartData, enums, serde types
charts/parser.rs   SAX XML parsing (xl/charts/chart*.xml)
charts/writer.rs   XML generation (ChartData::to_xml())

TypeScript: barcode.ts → barcode/

The 1,828-line barcode.ts was split into 11 tree-shakeable modules — one per barcode codec (Code128, EAN-13, QR, etc.) plus a shared utilities module and barrel index.ts.

Performance Patterns (v0.9.1)

  • ryu crate — 2-6x faster f64-to-string formatting in hot paths (worksheet JSON, chart values)
  • itoa::Buffer — make_rid() helper eliminates 21 format!("rId{}", n) heap allocations
  • Cow<'static, str> — Relationship fields use borrowed static strings for well-known constants (e.g., OOXML namespace URIs), avoiding allocation entirely
  • Byte-level JSON escaping — batch memcpy of safe spans instead of char-by-char iteration
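
Two of these patterns, byte-level JSON escaping and Cow<'static, str> fields, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: escape_json_into(), the Relationship struct, its fields, and its constructors are hypothetical names. The escaping function copies each run of safe bytes with one push_str() call, which is the batch-copy idea described above.

```rust
use std::borrow::Cow;
use std::fmt::Write;

/// Append `s` to `out` as the body of a JSON string (no surrounding quotes).
/// Runs of bytes that need no escaping are copied with a single `push_str`.
fn escape_json_into(out: &mut String, s: &str) {
    let mut start = 0; // first byte of the current safe run
    for (i, &b) in s.as_bytes().iter().enumerate() {
        let escaped = match b {
            b'"' => "\\\"",
            b'\\' => "\\\\",
            b'\n' => "\\n",
            b'\r' => "\\r",
            b'\t' => "\\t",
            0x00..=0x1F => "", // remaining control bytes: \u00XX, written below
            _ => continue,     // safe byte (includes all non-ASCII UTF-8 bytes)
        };
        // Flush the safe run that ended just before this byte. `start` and `i`
        // always sit next to an ASCII byte, so both are valid char boundaries.
        out.push_str(&s[start..i]);
        if escaped.is_empty() {
            // Writing into a String cannot fail.
            let _ = write!(out, "\\u{:04x}", b);
        } else {
            out.push_str(escaped);
        }
        start = i + 1;
    }
    out.push_str(&s[start..]);
}

/// Well-known OOXML relationship type URI for worksheets.
const REL_TYPE_WORKSHEET: &str =
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet";

/// Hypothetical relationship record (assumption): field names are illustrative.
struct Relationship {
    id: String,
    rel_type: Cow<'static, str>,
}

impl Relationship {
    /// Well-known type: borrows the constant, so `rel_type` allocates nothing.
    fn worksheet(id: String) -> Self {
        Relationship { id, rel_type: Cow::Borrowed(REL_TYPE_WORKSHEET) }
    }

    /// Type string read from a file: the record owns its own copy.
    fn custom(id: String, rel_type: String) -> Self {
        Relationship { id, rel_type: Cow::Owned(rel_type) }
    }
}
```

Both variants of the Cow dereference to &str, so code that writes the relationship out does not need to know whether the value was borrowed or owned.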
