Skip to content

Step 1: ie-dom — Complete DOM data structures #4

@thomasnemer

Description

@thomasnemer

Parent: #2

Goal

Complete the arena-allocated DOM tree with node creation, tree mutation, traversal iterators, and query methods. This is a leaf dependency with zero external crate deps beyond std — everything else builds on it.

Current State

crates/ie-dom/src/lib.rs has basic type definitions:

  • NodeId = usize
  • Document { nodes: Vec<Node>, root: NodeId } with new() and Default
  • Node { kind, parent, children, attributes }
  • NodeKind { Document, Element(String), Text(String), Comment(String) }

No manipulation methods, no traversal, no queries, no error handling.

File Changes

Split src/lib.rs into modules:

  • src/lib.rs — re-exports only
  • src/node.rsNode, NodeKind, NodeId types with serde derives
  • src/document.rsDocument struct and all impl methods
  • src/traversal.rsDescendantsIter, AncestorsIter
  • src/error.rsDomError enum

Modify crates/ie-dom/Cargo.toml — add serde and serde_json dependencies.

Implementation

Error type (error.rs)

  • DomError::NodeNotFound(NodeId) — operation references a node that doesn't exist in the arena
  • DomError::CycleDetectedappend_child/insert_before would make a node its own ancestor
  • DomError::NotAChildremove_child called with nodes that aren't in a parent-child relationship
  • Derive thiserror::Error for Display/Error impls

Node types (node.rs)

  • Move NodeId, Node, NodeKind here
  • Add #[derive(Serialize, Deserialize)] on NodeKind
  • Add #[derive(Serialize, Deserialize)] on Node
  • Node::is_element(&self) -> bool
  • Node::is_text(&self) -> bool
  • Node::element_name(&self) -> Option<&str> — returns tag name if Element
  • Node::text_content(&self) -> Option<&str> — returns text if Text or Comment

Node creation methods on Document (document.rs)

  • create_element(tag: &str) -> NodeId:
    • Push a new Node { kind: Element(tag.to_string()), parent: None, children: vec![], attributes: HashMap::new() } to self.nodes
    • Return the index
  • create_text(text: &str) -> NodeId — same pattern with Text kind
  • create_comment(text: &str) -> NodeId — same pattern with Comment kind

Tree mutation methods on Document (document.rs)

  • append_child(parent: NodeId, child: NodeId) -> Result<(), DomError>:
    1. Validate both parent and child exist → NodeNotFound if not
    2. Check child != parentCycleDetected if self-loop
    3. Walk ancestors of parent — if child is found among them → CycleDetected
    4. If child already has a parent, remove it from old parent's children vec
    5. Set child.parent = Some(parent)
    6. Push child to parent.children
  • remove_child(parent: NodeId, child: NodeId) -> Result<(), DomError>:
    1. Validate both exist → NodeNotFound
    2. Verify child.parent == Some(parent)NotAChild if not
    3. Remove child from parent.children vec
    4. Set child.parent = None
  • insert_before(parent: NodeId, new_child: NodeId, reference: NodeId) -> Result<(), DomError>:
    1. Validate all three exist → NodeNotFound
    2. Verify reference.parent == Some(parent)NotAChild if reference is not a child of parent
    3. Same cycle detection as append_child
    4. If new_child has existing parent, detach
    5. Find index of reference in parent.children, insert new_child at that index
    6. Set new_child.parent = Some(parent)

Accessor methods on Document (document.rs)

  • node(&self, id: NodeId) -> Option<&Node>self.nodes.get(id)
  • node_mut(&mut self, id: NodeId) -> Option<&mut Node>self.nodes.get_mut(id)
  • parent(&self, id: NodeId) -> Option<NodeId>self.node(id)?.parent
  • children(&self, id: NodeId) -> &[NodeId] — return &node.children or &[] if not found
  • get_attribute(&self, id: NodeId, name: &str) -> Option<&str> — look up in node's attributes HashMap
  • set_attribute(&mut self, id: NodeId, name: &str, value: &str) — insert into attributes HashMap (no-op if node not found)

Traversal iterators (traversal.rs)

  • DescendantsIter<'a>:
    • Holds &'a Document and a Vec<NodeId> stack (explicit, no recursion)
    • Initialized with children of the starting node pushed in reverse order
    • next(): pop from stack, push current node's children in reverse order, return current
    • Yields depth-first pre-order
    • Does NOT yield the starting node itself (only descendants)
  • AncestorsIter<'a>:
    • Holds &'a Document and current: Option<NodeId>
    • Initialized with node.parent of the starting node
    • next(): return current, advance to current's parent
    • Does NOT yield the starting node itself (only ancestors)
  • Document::descendants(&self, id: NodeId) -> DescendantsIter
  • Document::ancestors(&self, id: NodeId) -> AncestorsIter

Query methods on Document (document.rs)

  • get_elements_by_tag_name(&self, root: NodeId, tag: &str) -> Vec<NodeId>:
    • Use descendants(root) iterator
    • Filter: node.kind is Element(name) where name == tag
    • Collect and return
  • get_element_by_id(&self, root: NodeId, id: &str) -> Option<NodeId>:
    • Use descendants(root) iterator
    • Find first node where attributes.get("id") == Some(id)
    • Return Some(node_id) or None

Known limitation: arena memory

remove_child detaches nodes from the tree but does not free their arena slots. NodeIds are indices into Vec<Node>, and removed nodes remain allocated in the vector. Their parent is set to None and they are removed from their parent's children vec, but the Node struct itself stays in self.nodes.

This is acceptable for Phase 1. Document this as a known limitation in code comments.

Future optimization: Use slotmap or a generational arena with a free list for slot reuse. This would allow remove_child to actually reclaim memory and prevent NodeId reuse bugs (generational indices detect use-after-free).

  • Document::node_count(&self) -> usize — returns self.nodes.len() (total allocated slots, including detached nodes)
  • Document::live_node_count(&self) -> usize — counts nodes that have at least one reference: nodes that have a parent OR are the root node. This gives the count of nodes actually in the tree, excluding orphaned/detached nodes.

Tests

All in crates/ie-dom/src/ as #[cfg(test)] mod tests blocks in the relevant modules.

Document & creation tests

  • new() creates doc with root node of kind Document
  • create_element("div") → node exists, kind is Element("div")
  • create_text("hello") → kind is Text("hello")
  • create_comment("note") → kind is Comment("note")
  • Created nodes have no parent and no children

Tree mutation tests

  • append_child: parent's children contains child, child's parent is parent
  • append_child twice to same parent: both are children, in order
  • append_child reparenting: child with existing parent moves — removed from old parent's children, old parent has fewer children
  • remove_child: child removed from parent's children, child's parent is None
  • remove_child with wrong parent → NotAChild error
  • remove_child with non-existent ids → NodeNotFound
  • insert_before: new_child appears before reference in parent's children
  • insert_before with reference not a child of parent → NotAChild
  • insert_before reparenting: detaches from old parent first

Cycle detection tests

  • append_child(node, node) (self-loop) → CycleDetected
  • Build chain A → B → C, then append_child(C, A)CycleDetected
  • append_child(root, root)CycleDetected
  • Deep chain: 10 nodes deep, try to append root as child of deepest → CycleDetected

Accessor tests

  • node() returns Some for valid id, None for out-of-bounds
  • set_attribute then get_attribute round-trips
  • get_attribute on non-existent attribute → None
  • children() returns correct slice
  • children() on non-existent node → empty slice

Traversal tests

  • Build tree: root → [A, B], A → [C, D], B → [E]
  • descendants(root) yields A, C, D, B, E (depth-first pre-order)
  • descendants on leaf node → empty iterator
  • descendants on node with one child → yields just that child (and its subtree)
  • ancestors(E) yields B, root
  • ancestors(root) → empty iterator

Query tests

  • get_elements_by_tag_name(root, "div") finds all divs, ignores spans
  • get_elements_by_tag_name with no matches → empty vec
  • get_element_by_id(root, "main") finds node with id="main" attribute
  • get_element_by_id with no match → None
  • get_element_by_id with multiple matches → returns first in tree order

Arena memory tests

  • node_count() after creating 5 nodes equals 6 (root + 5)
  • live_node_count() after attaching 3 of 5 nodes to root equals 4 (root + 3)
  • After remove_child: node_count() unchanged, live_node_count() decremented

Serialization tests

  • Serialize Document to JSON via serde_json, deserialize back, verify tree structure matches
  • Round-trip preserves node kinds, attributes, parent-child relationships

Acceptance Criteria

  • cargo test -p ie-dom — all tests pass
  • cargo clippy -p ie-dom -- -D warnings — no warnings
  • cargo fmt -p ie-dom --check — formatted

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions