Step 3: CSS tokenizer, parser, selectors, and cascade #22

@thomasnemer

Parent: #19

Goal

Implement a CSS parser following CSS Syntax Module Level 3, a selector matching engine, the cascade algorithm, and expand ComputedStyle to cover ~40 properties needed for layout. Also create the user-agent stylesheet.

Prerequisites

None for parsing: it can be developed in parallel with Step 1 (HTML tokenizer).
The ie-dom crate is needed for selector matching tests.

File Changes

  • crates/ie-css/src/tokenizer.rs — new file, CSS tokenizer
  • crates/ie-css/src/parser.rs — rewrite, full CSS parser
  • crates/ie-css/src/selector.rs — major expansion
  • crates/ie-css/src/specificity.rs — new file
  • crates/ie-css/src/cascade.rs — new file
  • crates/ie-css/src/style.rs — major expansion (~40 properties)
  • crates/ie-css/src/values.rs — new file, extended value types
  • crates/ie-css/src/inherited.rs — new file, property inheritance classification
  • crates/ie-css/data/ua.css — new file, user-agent stylesheet
  • crates/ie-css/src/lib.rs — update module declarations

Implementation

CSS Tokenizer (tokenizer.rs)

  • Per CSS Syntax Level 3 spec
  • Iterator-based: impl Iterator<Item = CssToken>
  • CssToken enum:
    pub enum CssToken {
        Ident(String), Function(String), AtKeyword(String), Hash(String, HashType),
        String(String), BadString, Url(String), BadUrl,
        Delim(char), Number(f64, NumType), Percentage(f64), Dimension(f64, String),
        Whitespace, Cdo, Cdc, Colon, Semicolon, Comma,
        SquareBracketOpen, SquareBracketClose,
        ParenOpen, ParenClose,
        CurlyBracketOpen, CurlyBracketClose, Eof,
    }
  • HashType: Id vs Unrestricted
  • NumType: Integer vs Number
  • Handle escape sequences, URL tokens, string tokens with escapes
  • Comments consumed and discarded (not tokens)
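
A minimal sketch of the iterator-based design, assuming a reduced token set. The names `Tok` and `Tokenizer` are hypothetical stand-ins for `CssToken` and the real tokenizer, and the sketch skips escapes, strings, URLs, signed numbers, and most of the CSS Syntax Level 3 rules:

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Reduced, hypothetical token set; the full `CssToken` enum has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Ident(String),
    Number(f64),
    Percentage(f64),
    Dimension(f64, String),
    Whitespace,
    Colon,
    Semicolon,
    CurlyBracketOpen,
    CurlyBracketClose,
    Delim(char),
}

pub struct Tokenizer<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Tokenizer<'a> {
    pub fn new(input: &'a str) -> Self {
        Tokenizer { chars: input.chars().peekable() }
    }

    /// Consumes and returns characters for as long as `pred` holds.
    fn consume_while(&mut self, pred: impl Fn(char) -> bool) -> String {
        let mut out = String::new();
        while let Some(&c) = self.chars.peek() {
            if !pred(c) {
                break;
            }
            out.push(c);
            self.chars.next();
        }
        out
    }

    /// Skips input up to and including the closing `*/` of a comment.
    fn skip_comment_body(&mut self) {
        let mut prev = '\0';
        for c in self.chars.by_ref() {
            if prev == '*' && c == '/' {
                break;
            }
            prev = c;
        }
    }
}

fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '_'
}

impl<'a> Iterator for Tokenizer<'a> {
    type Item = Tok;

    fn next(&mut self) -> Option<Tok> {
        let c = *self.chars.peek()?;
        if c.is_whitespace() {
            self.consume_while(char::is_whitespace);
            return Some(Tok::Whitespace);
        }
        if c.is_ascii_digit() {
            // Simplification: digits and dots only; malformed input becomes 0.0.
            let text = self.consume_while(|c| c.is_ascii_digit() || c == '.');
            let value: f64 = text.parse().unwrap_or(0.0);
            return Some(match self.chars.peek().copied() {
                Some('%') => {
                    self.chars.next();
                    Tok::Percentage(value)
                }
                Some(u) if u.is_ascii_alphabetic() => {
                    Tok::Dimension(value, self.consume_while(is_ident_char))
                }
                _ => Tok::Number(value),
            });
        }
        if c.is_ascii_alphabetic() || c == '_' {
            return Some(Tok::Ident(self.consume_while(is_ident_char)));
        }
        self.chars.next(); // consume the single punctuation character
        Some(match c {
            ':' => Tok::Colon,
            ';' => Tok::Semicolon,
            '{' => Tok::CurlyBracketOpen,
            '}' => Tok::CurlyBracketClose,
            '/' if self.chars.peek() == Some(&'*') => {
                // Comments are discarded: skip the body and yield the next token.
                self.chars.next();
                self.skip_comment_body();
                return self.next();
            }
            other => Tok::Delim(other),
        })
    }
}
```

Because the tokenizer is an ordinary iterator, a parser can wrap it in `Peekable` for one-token lookahead, and tests can simply `collect()` the tokens into a `Vec`.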

CSS Parser (parser.rs)

  • Parse a stylesheet into Stylesheet { rules: Vec<Rule> }
  • Parse qualified rules: selector list + declaration block
  • Parse at-rules (Phase 2 subset):
    • @import url("...") — record URL for fetching
    • @media (condition) { ... } — parse but defer evaluation to Phase 3
  • Parse declaration blocks: property: value; pairs
  • Parse values into structured Value types
  • Handle shorthand properties: margin, padding, border, background, font, flex
    • Expand shorthands into individual properties during parsing
  • !important flag on declarations
  • Parse inline style attributes (declaration block without selectors)
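
As an illustration of shorthand expansion, the sketch below applies the standard 1-to-4 value rule for `margin`. The `Length` type and the `expand_margin` function are hypothetical names, not part of the planned API:

```rust
/// Hypothetical, reduced value type used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Length {
    Px(f64),
    Auto,
}

/// Expands one to four `margin` values into `[top, right, bottom, left]`.
/// Returns `None` for zero or more than four values.
pub fn expand_margin(values: &[Length]) -> Option<[Length; 4]> {
    match values {
        // One value: all four sides.
        [all] => Some([*all; 4]),
        // Two values: vertical, then horizontal.
        [v, h] => Some([*v, *h, *v, *h]),
        // Three values: top, horizontal, bottom.
        [t, h, b] => Some([*t, *h, *b, *h]),
        // Four values: top, right, bottom, left.
        [t, r, b, l] => Some([*t, *r, *b, *l]),
        _ => None,
    }
}
```

The same pattern would apply to `padding`; `border`, `background`, `font`, and `flex` need per-component parsing because their values have different types.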

Selectors (selector.rs)

  • Selector as a list of CompoundSelector connected by Combinator:
    pub struct Selector {
        pub compounds: Vec<(CompoundSelector, Option<Combinator>)>,
    }
    pub enum Combinator { Descendant, Child, NextSibling, SubsequentSibling }
    pub struct CompoundSelector {
        pub type_selector: Option<String>,  // tag name or *
        pub id: Option<String>,
        pub classes: Vec<String>,
        pub attributes: Vec<AttributeSelector>,
        pub pseudo_classes: Vec<PseudoClass>,
        pub pseudo_element: Option<PseudoElement>,
    }
  • Attribute selectors:
    pub struct AttributeSelector {
        pub name: String,
        pub op: Option<AttributeOp>,
        pub value: Option<String>,
        pub case_insensitive: bool,
    }
    pub enum AttributeOp { Equals, Includes, DashMatch, Prefix, Suffix, Substring }
  • Pseudo-classes (Phase 2):
    pub enum PseudoClass {
        Hover, Focus, Active, Visited, Link,
        FirstChild, LastChild, NthChild(NthExpr), NthLastChild(NthExpr),
        OnlyChild, Root, Empty,
        Not(Box<Selector>),
    }
    pub struct NthExpr { pub a: i32, pub b: i32 }  // an+b
  • Pseudo-elements: ::before, ::after, ::first-line, ::first-letter (parse but don't generate content in Phase 2)
  • Selector parsing from string: parse_selector(input: &str) -> Result<Selector>
  • Selector list parsing: parse_selector_list(input: &str) -> Result<Vec<Selector>>
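
For the matching side of `NthExpr`, one possible approach is sketched below. The `matches` method is a hypothetical addition; it checks whether some non-negative integer n satisfies a*n + b = index, where `index` is the element's 1-based position among its siblings:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NthExpr {
    pub a: i32,
    pub b: i32,
}

impl NthExpr {
    /// Returns true if `index` (1-based) equals `a*n + b` for some integer n >= 0.
    pub fn matches(&self, index: i32) -> bool {
        let diff = index - self.b;
        if self.a == 0 {
            // With a == 0 the expression selects exactly one position.
            diff == 0
        } else {
            // n = diff / a must be a whole, non-negative number.
            diff % self.a == 0 && diff / self.a >= 0
        }
    }
}
```

For example, `2n+1` (odd) matches positions 1, 3, 5, and `-n+3` matches only the first three children.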

Specificity (specificity.rs)

  • Specificity(u32, u32, u32) — (a, b, c) per CSS spec:
    • a: count of ID selectors
    • b: count of class selectors, attribute selectors, pseudo-classes
    • c: count of type selectors, pseudo-elements
  • fn specificity(selector: &Selector) -> Specificity
  • impl Ord for Specificity — comparison for cascade
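
The counting rules above can be sketched as follows. `Compound` is a hypothetical, pared-down stand-in for `CompoundSelector` that stores counts instead of full attribute and pseudo-class lists, and the sketch does not handle `:not()`, whose specificity comes from its argument. Deriving `Ord` on the tuple struct gives the lexicographic (a, b, c) comparison the cascade needs:

```rust
/// (a, b, c): derived ordering compares fields left to right.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Specificity(pub u32, pub u32, pub u32);

/// Hypothetical, simplified stand-in for `CompoundSelector`.
#[derive(Debug, Default)]
pub struct Compound {
    pub type_selector: Option<String>, // tag name or "*"
    pub id: Option<String>,
    pub classes: Vec<String>,
    pub attribute_count: u32,
    pub pseudo_class_count: u32,
    pub has_pseudo_element: bool,
}

/// Sums the specificity contributions of every compound in a selector.
pub fn specificity(compounds: &[Compound]) -> Specificity {
    let mut s = Specificity(0, 0, 0);
    for c in compounds {
        if c.id.is_some() {
            s.0 += 1;
        }
        s.1 += c.classes.len() as u32 + c.attribute_count + c.pseudo_class_count;
        // The universal selector `*` contributes nothing.
        if matches!(c.type_selector.as_deref(), Some(t) if t != "*") {
            s.2 += 1;
        }
        if c.has_pseudo_element {
            s.2 += 1;
        }
    }
    s
}
```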

Cascade (cascade.rs)

  • fn cascade(rules: &[(Selector, Vec<Declaration>, Origin)], node: NodeId, doc: &Document) -> PropertyMap:
    • Collect all declarations that match the node
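    • Sort ascending by origin (with importance folded into Origin), then specificity, then source order
    • The last declaration for each property wins, so an !important declaration overrides a non-important one regardless of specificity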
  • Origin enum: UserAgent, Author, AuthorImportant, UserAgentImportant
  • PropertyMap: HashMap<PropertyId, CascadedValue> — the raw cascaded values before inheritance/computation
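
A minimal sketch of the sort-then-overwrite step, assuming selector matching has already produced a flat list of declarations. `Matched` and `cascade_matched` are hypothetical names, and property names and values are plain strings here rather than `PropertyId` and `CascadedValue`:

```rust
use std::collections::HashMap;

/// Variant order is the cascade order, lowest priority first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Origin {
    UserAgent,
    Author,
    AuthorImportant,
    UserAgentImportant,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Specificity(pub u32, pub u32, pub u32);

/// Hypothetical record for one declaration whose selector matched the node.
#[derive(Debug, Clone)]
pub struct Matched {
    pub property: &'static str,
    pub value: String,
    pub origin: Origin,
    pub specificity: Specificity,
    pub source_order: usize,
}

/// Sorts by (origin, specificity, source order) and lets later entries win.
pub fn cascade_matched(mut decls: Vec<Matched>) -> HashMap<&'static str, String> {
    decls.sort_by_key(|d| (d.origin, d.specificity, d.source_order));
    let mut out = HashMap::new();
    for d in decls {
        // A later (higher-priority) declaration overwrites an earlier one.
        out.insert(d.property, d.value);
    }
    out
}
```

Encoding importance in `Origin` keeps the comparison a single tuple sort; the cost is that each declaration must be tagged with the right variant when it is collected.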

Extended ComputedStyle (style.rs)

  • Expand from current ~5 properties to ~40:

    Box model:

    • display: block, inline, inline-block, flex, grid, none, contents
    • width, height, min-width, max-width, min-height, max-height
    • margin-top/right/bottom/left
    • padding-top/right/bottom/left
    • border-top/right/bottom/left-width
    • border-top/right/bottom/left-style
    • border-top/right/bottom/left-color
    • box-sizing: content-box, border-box

    Typography:

    • font-family: list of family names
    • font-size: absolute (px) after computation
    • font-weight: numeric (100-900)
    • font-style: normal, italic, oblique
    • line-height: normal or number/length
    • text-align: left, right, center, justify
    • text-decoration: none, underline, overline, line-through
    • color: RGBA
    • white-space: normal, nowrap, pre, pre-wrap, pre-line

    Positioning:

    • position: static, relative, absolute, fixed, sticky
    • top, right, bottom, left
    • z-index: auto or integer

    Flexbox:

    • flex-direction, flex-wrap, justify-content, align-items, align-self
    • flex-grow, flex-shrink, flex-basis
    • gap (row-gap, column-gap)

    Visual:

    • background-color
    • overflow, overflow-x, overflow-y
    • visibility: visible, hidden, collapse
    • opacity: 0.0 to 1.0

Extended Values (values.rs)

  • Length units: px, em, rem, %, vw, vh, vmin, vmax, ch, ex
  • Colors: #rgb, #rrggbb, #rgba, #rrggbbaa, rgb(), rgba(), hsl(), hsla(), named colors (CSS Color Level 4 named colors list)
  • auto keyword (for margin, width, height)
  • inherit, initial, unset keywords
  • calc(): basic arithmetic (add, subtract, multiply, divide) with mixed units where valid
  • PropertyId enum: one variant per CSS property (for efficient map keys)
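
For the hex color forms listed above, a parsing sketch could look like this. `parse_hex_color` is a hypothetical helper; the `Color` layout follows the `Color { r, g, b, a }` shape used in the tests section of this issue:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub a: u8,
}

/// Parses `#rgb`, `#rgba`, `#rrggbb`, and `#rrggbbaa`. Alpha defaults to 255.
pub fn parse_hex_color(input: &str) -> Option<Color> {
    let hex = input.strip_prefix('#')?;
    if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Every character is ASCII at this point, so byte-index slicing is safe.
    let digit = |i: usize| u8::from_str_radix(&hex[i..i + 1], 16).ok();
    let pair = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    match hex.len() {
        3 | 4 => {
            // A single digit `x` stands for `xx`, which is x * 17.
            let a = if hex.len() == 4 { digit(3)? * 17 } else { 255 };
            Some(Color { r: digit(0)? * 17, g: digit(1)? * 17, b: digit(2)? * 17, a })
        }
        6 | 8 => {
            let a = if hex.len() == 8 { pair(6)? } else { 255 };
            Some(Color { r: pair(0)?, g: pair(2)?, b: pair(4)?, a })
        }
        _ => None,
    }
}
```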

Property inheritance classification (inherited.rs)

  • fn is_inherited(property: PropertyId) -> bool
  • Inherited: color, font-*, line-height, text-align, visibility, white-space, cursor
  • Not inherited: display, width, height, margin-*, padding-*, border-*, position, top/right/bottom/left, background-*, overflow, opacity, flex-*, z-index, text-decoration (decorations propagate to descendant boxes when painting, not through inheritance)
  • Initial values per property: fn initial_value(property: PropertyId) -> Value
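
The sketch below shows how `is_inherited` might feed into value resolution, assuming a reduced `PropertyId` with only a handful of variants. `Cascaded` and `resolve` are hypothetical names; the rule they encode is that a missing value or `unset` inherits for inherited properties and falls back to the initial value otherwise:

```rust
/// Reduced, hypothetical subset of the real `PropertyId` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum PropertyId {
    Color,
    FontSize,
    LineHeight,
    TextAlign,
    Visibility,
    WhiteSpace,
    Display,
    Width,
    Opacity,
}

pub fn is_inherited(property: PropertyId) -> bool {
    use PropertyId::*;
    matches!(property, Color | FontSize | LineHeight | TextAlign | Visibility | WhiteSpace)
}

/// What the cascade produced for one property (hypothetical type).
pub enum Cascaded<V> {
    Value(V),
    Inherit,
    Initial,
    Unset,
}

/// Picks the value for `property` from the cascade result, the parent's
/// value (absent on the root element), and the property's initial value.
pub fn resolve<V: Clone>(
    property: PropertyId,
    cascaded: Option<Cascaded<V>>,
    parent: Option<&V>,
    initial: V,
) -> V {
    // The root has no parent, so inheriting there yields the initial value.
    let from_parent = |initial: V| parent.cloned().unwrap_or(initial);
    match cascaded {
        Some(Cascaded::Value(v)) => v,
        Some(Cascaded::Inherit) => from_parent(initial),
        Some(Cascaded::Initial) => initial,
        Some(Cascaded::Unset) | None => {
            if is_inherited(property) {
                from_parent(initial)
            } else {
                initial
            }
        }
    }
}
```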

User-Agent Stylesheet (data/ua.css)

  • Based on the UA stylesheet given in the Rendering section of the WHATWG HTML Standard
  • Defines default display values for HTML elements:
    html, body, div, section, article, aside, nav, main, header, footer, h1, h2, h3, h4, h5, h6, p, ul, ol, li, pre, blockquote, figure, figcaption, form, fieldset, legend, details, summary, dialog { display: block; }
    head, link, meta, style, script, title, base { display: none; }
    /* ... margins, paddings, font sizes for headings, lists, etc. */
  • Parsed once at startup, stored as Stylesheet
  • Embedded via include_str! or const

Tests

CSS tokenizer tests

  • Tokenize div { color: red; } → correct token sequence
  • Tokenize string with escape: "hello\"world" → single String token
  • URL token: url(https://example.com) → Url token
  • Dimension: 16px → Dimension(16.0, "px")
  • Percentage: 50% → Percentage(50.0)
  • Function: rgb( → Function("rgb")

CSS parser tests

  • Parse simple rule: p { color: red; } → one rule, one selector, one declaration
  • Parse multiple rules
  • Parse shorthand: margin: 10px 20px; → four individual margin properties
  • Parse !important
  • Parse selector list: h1, h2, h3 { ... } → three selectors
  • Parse @import at-rule

Selector tests

  • Parse div > .class#id[attr=val]:hover::before → correct compound selector structure
  • Specificity: #id → (1,0,0), .class → (0,1,0), div → (0,0,1)
  • Specificity: #id .class div → (1,1,1)
  • Parse :nth-child(2n+1) → NthExpr { a: 2, b: 1 }
  • Parse :not(.foo) → Not containing class selector

Cascade tests

  • Two rules with different specificity: higher specificity wins
  • Same specificity: later rule wins
  • !important overrides non-important regardless of specificity
  • UA stylesheet provides defaults, author stylesheet overrides

Property and value tests

  • Parse color: #ff0000 → Color { r: 255, g: 0, b: 0, a: 255 }
  • Parse color: rgb(255, 0, 0) → same
  • Parse margin: 10px → all four margins set to 10px
  • Parse font-size: 1.5em → Em(1.5)
  • Named colors: red, blue, transparent, etc.
  • calc(100% - 20px) parsed correctly

UA stylesheet tests

  • After applying UA stylesheet, <div> has display: block
  • <span> has display: inline
  • <head> has display: none
  • <h1> has larger font-size than <p>

Acceptance Criteria

  • cargo test -p ie-css — all tests pass
  • Can parse CSS from real websites (extract CSS from a few popular sites, parse without crashes)
  • UA stylesheet provides correct defaults for all common HTML elements
  • cargo clippy -p ie-css -- -D warnings — no warnings
