Optimize prerender perf: eliminate URL() construction in dependency tracking hot path #4223

Merged. habdelra merged 3 commits into main from cs-10473-prerender-perf-url-and-dep-tracking on Mar 21, 2026.

@habdelra habdelra commented Mar 20, 2026

Summary

  • Eliminate expensive new URL() construction from the runtime dependency tracking hot path
  • Cache module dependency graph traversal results
  • Skip redundant dependency tracking for already-tracked modules

Results

Measured on staging with the SystemCard (23 linksToMany → linksTo chain, clearCache=true):

| Metric | Before | After |
|---|---|---|
| Loading → Ready | ~247s | ~17s |
| Speedup | | ~14.5x |

Problem

Flame chart profiling of the SystemCard prerender revealed 22 identical ~9-second long tasks, each with 98% of active CPU in the dependency tracking system:

| Function | % per frame | ~ms/frame |
|---|---|---|
| URL() constructor | 52.8% | ~4,900ms |
| s (minified; also calls URL()) | 31.0% | ~2,880ms |
| trimModuleIdentifier | 1.3% | ~120ms |
| collectKnownModuleDependencies | 1.1% | ~105ms |

The root cause: every linksTo field getter access triggers trackRuntimeRelationshipModuleDependencies, which calls loader.getKnownConsumedModules(), which walks the entire module dependency graph calling getModule() → trimModuleIdentifier() → new URL() on every node. With 22 cards each triggering this on render, that is 22 full graph walks and hundreds of thousands of URL constructions.

Changes

1. trimModuleIdentifier → string ops + cache (loader.ts)

Replace trimExecutableExtension(new URL(moduleIdentifier)).href with string.slice() + a Map cache. Module identifiers are already valid URL strings — extension trimming only needs string operations.
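
A minimal sketch of this idea, not the actual loader.ts code. The extension list, the cache name, and the function name `trimModuleIdentifierSketch` are assumptions made for illustration, and the sketch assumes identifiers carry no query string or fragment:

```typescript
// Assumed set of executable extensions; the real list lives in the codebase.
const EXECUTABLE_EXTENSIONS = ['.gts', '.gjs', '.ts', '.js'];

// Hypothetical cache: raw identifier -> trimmed identifier.
const trimmedIdentifierCache = new Map<string, string>();

function trimModuleIdentifierSketch(moduleIdentifier: string): string {
  const cached = trimmedIdentifierCache.get(moduleIdentifier);
  if (cached !== undefined) {
    return cached; // cache hit: no parsing and no new string
  }
  let trimmed = moduleIdentifier;
  for (const ext of EXECUTABLE_EXTENSIONS) {
    if (moduleIdentifier.endsWith(ext)) {
      // Drop the extension with a plain slice instead of building a URL object.
      trimmed = moduleIdentifier.slice(0, -ext.length);
      break;
    }
  }
  trimmedIdentifierCache.set(moduleIdentifier, trimmed);
  return trimmed;
}
```

The trade-off is that string operations skip the validation and normalization that `new URL()` performs, so this only holds while the inputs really are well-formed URL strings, as the description states.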

2. Cache collectKnownModuleDependencies results (loader.ts)

The flattened dependency set for an evaluated module is immutable. Cache it so 22 identical graph walks become 1, with subtree cache hits for shared dependencies.
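
The caching described above can be sketched as a memoized graph walk. This is an illustration under stated assumptions rather than the loader's real data structures: the graph is modeled as a plain `Map` from module identifier to direct dependencies, it is assumed to be acyclic, and `collectDepsSketch`, `flattenedCache`, and `walkCount` are hypothetical names:

```typescript
type ModuleGraph = Map<string, string[]>;

// Hypothetical cache: module identifier -> flattened set of itself plus all transitive deps.
const flattenedCache = new Map<string, ReadonlySet<string>>();

// Counts uncached walks so the effect of the cache is observable.
let walkCount = 0;

// Assumes the graph has no cycles; a real loader would need cycle handling.
function collectDepsSketch(graph: ModuleGraph, id: string): ReadonlySet<string> {
  const cached = flattenedCache.get(id);
  if (cached !== undefined) {
    return cached; // subtree cache hit for shared dependencies
  }
  walkCount++;
  const result = new Set<string>([id]);
  for (const dep of graph.get(id) ?? []) {
    collectDepsSketch(graph, dep).forEach((d) => result.add(d));
  }
  flattenedCache.set(id, result);
  return result;
}

// Example graph: card -> [field, util], field -> [util].
const exampleGraph: ModuleGraph = new Map([
  ['card', ['field', 'util']],
  ['field', ['util']],
  ['util', []],
]);
```

In this sketch the first call for `card` walks three modules, and every later call for `card`, `field`, or `util` returns the cached set without walking anything. Returning a `ReadonlySet` is one way to discourage callers from mutating the shared cached value.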

3. String-based normalization in dependency-tracker.ts

Replace new URL() calls in canonicalURL, normalizeModuleURL, and normalizeInstanceURL with string operations.
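
As a rough illustration of string-based normalization: the PR does not spell out what each of these functions normalizes, so this sketch assumes the goal is to drop a query string and fragment. `stripQueryAndFragmentSketch` is a hypothetical name, not a function from dependency-tracker.ts:

```typescript
// Assumption: "normalizing" here means removing "?query" and "#fragment".
// Only index lookups and one slice; no URL object is allocated.
function stripQueryAndFragmentSketch(url: string): string {
  const queryIndex = url.indexOf('?');
  const hashIndex = url.indexOf('#');
  let end = url.length;
  if (queryIndex !== -1) {
    end = Math.min(end, queryIndex);
  }
  if (hashIndex !== -1) {
    end = Math.min(end, hashIndex);
  }
  return url.slice(0, end);
}
```

Unlike the URL constructor, this does not lowercase the host, resolve relative segments, or reject malformed input, so it fits only where the inputs are already canonical URL strings.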

Test plan

  • Lint passes for @cardstack/runtime-common and @cardstack/base
  • CI: existing prerendering, indexing, and card-api tests verify correctness
  • Verified on staging: SystemCard prerender dropped from ~247s to ~17s

Closes CS-10473

🤖 Generated with Claude Code

habdelra and others added 2 commits March 20, 2026 13:21
…dency tracking hot path

Flame chart profiling revealed that 98% of active CPU per frame during card
prerendering was spent in the runtime dependency tracking system, primarily
constructing URL objects. The render produced 22 identical ~9-second long
tasks (one per card deserialization), totaling ~200 seconds of blocked main
thread for a card with 23 linksToMany relationships.

Three optimizations applied:

1. trimModuleIdentifier (loader.ts): Replace `new URL(id).href` with string
   slice operations + a Map cache. Module identifiers are already full URL
   strings, so extension trimming only needs string ops. This was the single
   largest CPU consumer at 52.8% of active time (~5s per card).

2. collectKnownModuleDependencies (loader.ts): Cache the flattened dependency
   set per module identifier. Once a module is evaluated its consumedModules
   never change, so repeated graph walks for the same module return the cached
   result. This turns O(cards × modules) into O(modules).

3. trackRuntimeRelationshipModuleDependencies (card-api.gts): Track which
   modules have already had their full dep trees tracked and skip redundant
   getKnownConsumedModules() calls. This function was called on every linksTo
   field getter access during rendering, each time walking the full module
   dependency graph.

Additionally, normalizeModuleURL/normalizeInstanceURL/canonicalURL in
dependency-tracker.ts now use string operations instead of URL construction,
eliminating another hot source of URL() calls in the tracking pipeline.

Closes CS-10473

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e215f9dc4

Copilot AI left a comment

Pull request overview

This PR targets prerender performance bottlenecks in runtime dependency tracking by removing repeated new URL() construction in hot paths and caching module dependency graph traversals, aiming to drastically reduce SystemCard prerender time.

Changes:

  • Replace URL-construction-based executable-extension trimming with string operations + caching.
  • Cache flattened dependency sets for evaluated modules to avoid repeated full dependency graph walks.
  • Add a fast path to skip redundant relationship module dependency tracking, and optimize URL normalization in the dependency tracker.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| packages/runtime-common/loader.ts | Adds caching for known module dependency traversal and replaces URL-based trimming with string ops + cache. |
| packages/runtime-common/dependency-tracker.ts | Replaces URL-constructor-based canonicalization/normalization with string operations. |
| packages/base/card-api.gts | Skips repeated relationship module dependency tracking via a module-level cache. |

@github-actions
github-actions bot commented Mar 20, 2026

Host Test Results

    1 files  ±    0      1 suites  ±0   4h 16m 23s ⏱️ + 1h 54m 12s
2 030 tests ±    0  2 014 ✅  -     1  15 💤 ± 0  0 ❌ ±0  1 🔥 +1 
4 090 runs  +2 045  4 058 ✅ +2 028  30 💤 +15  1 ❌ +1  1 🔥 +1 

For more details on these errors, see this check.

Results for commit 2754d6f. ± Comparison against base commit 505f18d.

♻️ This comment has been updated with latest results.

Address review feedback:
- getKnownConsumedModules: filter instead of delete to avoid mutating the
  cached Set returned by collectKnownModuleDependencies
- Remove trackedRelationshipModules skip cache from card-api.gts — it was
  process-global and not cleared between dependency tracking sessions,
  which could cause subsequent renders to under-report module deps. The
  Loader-level caching in collectKnownModuleDependencies already makes
  getKnownConsumedModules fast enough without a caller-side skip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
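
The two issues addressed in this commit can be illustrated with a small sketch. All names here are hypothetical stand-ins, and it is an assumption that the deleted entry was the module's own identifier. The first half shows why deleting from a cached Set leaks into later calls; the second shows how a process-global skip set makes a fresh tracking session under-report:

```typescript
// Hypothetical stand-in for the Loader-level cache of flattened dependency sets.
const depCache = new Map<string, Set<string>>([
  ['card', new Set(['card', 'dep'])],
]);

// Safer variant: derive a new array and leave the cached Set untouched.
function consumedModulesFiltering(id: string): string[] {
  const deps = depCache.get(id) ?? new Set<string>();
  return Array.from(deps).filter((dep) => dep !== id);
}

// Problematic variant: delete() mutates the Set that the cache still holds.
function consumedModulesMutating(id: string): string[] {
  const deps = depCache.get(id) ?? new Set<string>();
  deps.delete(id);
  return Array.from(deps);
}

// Hypothetical process-global skip set that is never cleared between sessions.
const alreadyTracked = new Set<string>();

class TrackingSessionSketch {
  readonly deps = new Set<string>();
}

function trackWithGlobalSkip(
  session: TrackingSessionSketch,
  moduleId: string,
  deps: string[],
): void {
  if (alreadyTracked.has(moduleId)) {
    return; // also skipped for a brand-new session, which then records nothing
  }
  alreadyTracked.add(moduleId);
  deps.forEach((dep) => session.deps.add(dep));
}
```

In the sketch, a second session that tracks the same module ends up with an empty dependency set, which mirrors the under-reporting concern described above.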
@habdelra habdelra requested a review from a team March 20, 2026 18:14
@backspace backspace left a comment

Have you already tried this?

  • Verify on staging that SystemCard prerender time drops significantly

@habdelra
Staging verification after deploy

Measured the SystemCard prerender on staging (clearCache=true, fresh nonce):

| Metric | Before (pre-fix) | After (post-fix) |
|---|---|---|
| Loading → Ready | ~247s | ~17s |
| Speedup | | ~14.5x |

Timeline:

  • t=5.6s: prerender element found, status="loading"
  • t=16.9s: status changed to "ready"
  • 11.3 seconds of actual render work after the element appeared

Previously this card was consistently timing out at the 90-second prerender limit on staging.

@habdelra
habdelra commented Mar 21, 2026

Post-fix flame chart analysis

After deploying this PR, a new flame chart of the same SystemCard render shows that the bottleneck landscape has completely changed:

| Function | % of active CPU | What it is |
|---|---|---|
| Garbage collector | 57.7% | GC pressure from object allocation |
| Qn (minified, Glimmer/Ember) | 13.5% | Rendering internals |
| get | 2.7% | Property access / Glimmer tracking |
| fetchModule | 1.9% | Module loading |
| visitQueue/visit/shouldVisit | ~2.5% | Babel AST traversal (module compilation) |
| fetch | 1.1% | Network requests |

What changed

  • URL() constructor is completely gone from the hot list (was 52.8% before)
  • s (minified dep tracking fn) is gone (was 31.0% before)
  • Longest task: 1.7s (down from ~9s per card before)
  • No single JS function dominates — CPU is spread across many small operations

Interpretation

The render is now network + GC bound rather than CPU-bound on a single hot path. The ~17s is spent on: fetching ~137 resources, compiling .gts modules via Babel/content-tag WASM, and GC from object allocations during card deserialization. There is no single obvious optimization target remaining — it is a healthy distribution.

@habdelra habdelra merged commit f03e5fa into main Mar 21, 2026
146 of 150 checks passed
habdelra added a commit that referenced this pull request Mar 23, 2026
…racking hot path (#4223)
