fix: prevent iframe expansion failure on pages with Trusted Types CSP #3590

Merged
nikitachapovskii-dev merged 2 commits into master from fix/parse-with-cheerio-trusted-types-csp on Apr 22, 2026

@nikitachapovskii-dev
Contributor

Pages enforcing a Trusted Types Content Security Policy (e.g. Google Sheets) block browser-side assignment of raw HTML strings, including innerHTML and DOMParser.parseFromString. The iframe expansion in parseWithCheerio used frame.evaluate() to inject iframe content into the browser DOM. The CSP silently blocked that injection, so iframe content was dropped without any visible error.

The fix moves HTML assembly out of the browser entirely. page.content() is called first, loaded into Cheerio on the Node.js side, and iframe content (fetched via Playwright's Node.js API, which is unaffected by CSP) is substituted directly in the Cheerio tree.
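
For readers who want to see the shape of this approach, here is a minimal sketch rather than the merged implementation. The helper name expandIframesInNode, the structural PageLike / IframeHandleLike / FrameLike types, and the iframe-content wrapper class are all assumptions made for illustration. Only cheerio.load, replaceWith, and the Playwright-style content(), $$() and contentFrame() calls are real APIs.

```typescript
import * as cheerio from 'cheerio';

// Minimal structural types covering only the Playwright calls used below
// (assumed shapes, so the sketch can be exercised without a real browser).
interface FrameLike {
    content(): Promise<string>;
}
interface IframeHandleLike {
    contentFrame(): Promise<FrameLike | null>;
}
interface PageLike {
    content(): Promise<string>;
    $$(selector: string): Promise<IframeHandleLike[]>;
}

// Hypothetical helper: assembles the expanded HTML on the Node.js side only.
async function expandIframesInNode(page: PageLike): Promise<string> {
    // Serialize the page first; no HTML string is assigned inside the browser.
    const $ = cheerio.load(await page.content());
    // Snapshot the iframe nodes up front so indices stay stable while replacing.
    const cheerioIframes = $('iframe').toArray();
    const handles = await page.$$('iframe');

    for (let index = 0; index < handles.length; index++) {
        const frame = await handles[index].contentFrame();
        const target = cheerioIframes[index];
        if (!frame || !target) continue;

        // The frame's HTML is read through the Node.js API, then parsed by Cheerio.
        const inner = cheerio.load(await frame.content());
        const body = inner('body').html() ?? '';
        // The wrapper class name is an assumption for this sketch.
        $(target).replaceWith(`<div class="iframe-content">${body}</div>`);
    }
    return $.html();
}
```

Because the page and frame parameters are typed structurally, a real Playwright Page should satisfy PageLike, and a plain object with the same three methods is enough for a unit test.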

Closes #3588

@nikitachapovskii-dev nikitachapovskii-dev self-assigned this Apr 22, 2026
@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 22, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 22, 2026
Member

@barjin barjin left a comment

Thank you @nikitachapovskii-dev ! The changes seem alright to me.

I have one idea regarding better logging in suspicious cases... but it's not imo strictly necessary ⬇️

  const iframe = await frame.contentFrame();

- if (iframe) {
+ if (iframe && cheerioIframes[index]) {
Member

Regarding the Playwright -> Cheerio mapping, I'm thinking we could also somehow use the srcdoc / src attributes.

Image

Perhaps we could just compare the src(doc) from Playwright and from Cheerio in each step and log a warning if these differ? Wdyt?

Member

Alternatively, we could just log a warning if the iframe arrays are of different lengths

Contributor Author

Thanks for the suggestions!
The second option is probably better as it catches the realistic failure mode (page dynamically adds/removes an iframe) with zero overhead.

The src/srcdoc is more complex and requires an extra getAttribute call per iframe on every crawled page, while the diagnostic would almost never fire 😄
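
To make the length-check option concrete, here is a rough sketch of how such a warning could look. The helper name pairIframesByIndex, its injectable warn callback, and the message text are hypothetical. This thread does not show whether or how the merged commits log the mismatch.

```typescript
// Hypothetical helper: pairs Playwright iframe handles with Cheerio iframe nodes
// by index and warns when the two snapshots disagree in length.
function pairIframesByIndex<H, N>(
    handles: H[],
    cheerioIframes: N[],
    warn: (message: string) => void = console.warn,
): Array<[H, N]> {
    if (handles.length !== cheerioIframes.length) {
        // Likely cause: the page added or removed an iframe between the two snapshots.
        warn(
            `iframe count mismatch: Playwright found ${handles.length}, ` +
                `Cheerio found ${cheerioIframes.length}; some iframe content may be missing or misplaced.`,
        );
    }
    // Only pair up to the shorter list so no index is ever out of range.
    const count = Math.min(handles.length, cheerioIframes.length);
    return handles.slice(0, count).map((handle, i) => [handle, cheerioIframes[i]] as [H, N]);
}
```

The check is a single length comparison per page, which matches the "zero overhead" argument above. A src/srcdoc comparison would instead need an extra attribute lookup for every iframe.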

@nikitachapovskii-dev nikitachapovskii-dev merged commit c0b9b50 into master Apr 22, 2026
9 checks passed
@nikitachapovskii-dev nikitachapovskii-dev deleted the fix/parse-with-cheerio-trusted-types-csp branch April 22, 2026 14:38
Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Development

Successfully merging this pull request may close these issues.

parseWithCheerio iframe expansion fails on pages with Trusted Types CSP

3 participants