CSS: Fix string token value on special string escapes#227

Open
sirreal wants to merge 9 commits into trunk from fix/css-string-token-backslash-newline-misparse

sirreal (Member) commented Apr 8, 2026

Summary

Note that the CSS test file diverged from the originals in these cases. For example, see string/0005, where the expected token value is foo, as in this PR.

Also see #226, where bad-string is updated to use a null token value. These changes will conflict and will need an update, assuming both are expected to land.

Test plan

CI passes. Visually confirm that updated and new tests are correct.

 			$this->token_starts_at,
-			$this->token_length
+			$this->token_length,
+			self::TOKEN_STRING === $this->token_type || self::TOKEN_BAD_STRING === $this->token_type

sirreal (Member, Author) Apr 8, 2026

This diverges from #226 and will need the following update if both land:

-			self::TOKEN_STRING === $this->token_type || self::TOKEN_BAD_STRING === $this->token_type
+			self::TOKEN_STRING === $this->token_type

sirreal requested review from adamziel and Copilot and removed the request for Copilot (April 8, 2026 12:26)
Copilot AI left a comment

Pull request overview

This PR updates the CSS tokenizer’s escape decoding to follow the CSS Syntax spec’s special handling for string tokens, specifically fixing how \ + newline and \ + EOF contribute to decoded string values, and aligns the test corpus accordingly.

Changes:

  • Implement special string-token escape handling (\-newline → discard both, \-EOF → discard backslash) via decode_escapes(..., $string_escapes=true).
  • Update the JSON test corpus expected normalized/value fields for affected cases.
  • Add/extend PHPUnit coverage for string/URL/ident backslash-newline and backslash-EOF behaviors, and introduce a css PHPUnit group.
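
The string-specific rules listed above can be illustrated with a minimal standalone sketch. It is not the PR's implementation: the function names (`sketch_decode_string_escapes` and its helpers) are hypothetical, and it assumes the input is the raw text between the quotes of a string token. It follows the CSS Syntax rules for string tokens, where a backslash at EOF is dropped, a backslash-newline pair is dropped, and any other backslash starts an ordinary escape.

```php
<?php
// Hypothetical helper: byte length of a CSS newline at $at, or 0 if none.
function sketch_newline_length( string $raw, int $at ): int {
	if ( "\r\n" === substr( $raw, $at, 2 ) ) {
		return 2;
	}
	return ( $at < strlen( $raw ) && false !== strpos( "\n\r\f", $raw[ $at ] ) ) ? 1 : 0;
}

// Hypothetical helper: encode a code point as UTF-8. Zero, surrogates, and
// out-of-range values become U+FFFD, as the CSS Syntax spec requires.
function sketch_encode_code_point( int $cp ): string {
	if ( 0 === $cp || $cp > 0x10FFFF || ( $cp >= 0xD800 && $cp <= 0xDFFF ) ) {
		return "\u{FFFD}";
	}
	if ( $cp < 0x80 ) {
		return chr( $cp );
	}
	if ( $cp < 0x800 ) {
		return chr( 0xC0 | ( $cp >> 6 ) ) . chr( 0x80 | ( $cp & 0x3F ) );
	}
	if ( $cp < 0x10000 ) {
		return chr( 0xE0 | ( $cp >> 12 ) ) . chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) )
			. chr( 0x80 | ( $cp & 0x3F ) );
	}
	return chr( 0xF0 | ( $cp >> 18 ) ) . chr( 0x80 | ( ( $cp >> 12 ) & 0x3F ) )
		. chr( 0x80 | ( ( $cp >> 6 ) & 0x3F ) ) . chr( 0x80 | ( $cp & 0x3F ) );
}

// Hypothetical sketch: decode escapes in the body of a CSS string token.
function sketch_decode_string_escapes( string $raw ): string {
	$decoded = '';
	$at      = 0;
	$end     = strlen( $raw );

	while ( $at < $end ) {
		// Copy everything up to the next backslash unchanged.
		$run      = strcspn( $raw, '\\', $at );
		$decoded .= substr( $raw, $at, $run );
		$at      += $run;
		if ( $at >= $end ) {
			break;
		}
		++$at; // Step over the backslash.

		// Backslash + EOF: discard the backslash.
		if ( $at >= $end ) {
			break;
		}

		// Backslash + newline: discard both.
		$skip = sketch_newline_length( $raw, $at );
		if ( $skip > 0 ) {
			$at += $skip;
			continue;
		}

		// Hex escape: 1 to 6 hex digits, then one optional whitespace.
		$hex_length = min( 6, strspn( $raw, '0123456789abcdefABCDEF', $at ) );
		if ( $hex_length > 0 ) {
			$decoded .= sketch_encode_code_point( (int) hexdec( substr( $raw, $at, $hex_length ) ) );
			$at      += $hex_length;
			$skip     = sketch_newline_length( $raw, $at );
			if ( 0 === $skip && $at < $end && ( ' ' === $raw[ $at ] || "\t" === $raw[ $at ] ) ) {
				$skip = 1;
			}
			$at += $skip;
			continue;
		}

		// Any other escaped character stands for itself.
		$decoded .= $raw[ $at ];
		++$at;
	}

	return $decoded;
}

var_dump( sketch_decode_string_escapes( "foo\\\nbar" ) ); // string(6) "foobar"
var_dump( sketch_decode_string_escapes( "foo\\" ) );      // string(3) "foo"
var_dump( sketch_decode_string_escapes( "\\66 oo" ) );    // string(3) "foo"
```

The sketch works on bytes and scans from one backslash to the next, which is safe for UTF-8 input because no multi-byte sequence contains the byte 0x5C. It does not scrub invalid UTF-8 or normalize CR, FF, or NUL outside escapes.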

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| components/DataLiberation/CSS/class-cssprocessor.php | Renames and extends escape decoding to support string-token-specific escape rules; updates call sites. |
| components/DataLiberation/Tests/CSSProcessorTest.php | Adds targeted regression tests for backslash-newline and backslash-EOF across string/URL/ident contexts; adds @group css. |
| components/DataLiberation/Tests/css-test-cases.json | Fixes expected normalized/value outputs for string tokens containing backslash-newline and backslash-EOF. |
| components/DataLiberation/Tests/CSSUrlProcessorTest.php | Adds @group css to allow running CSS-related tests via a PHPUnit group. |


Comment on lines 1635 to 1637
if ( $this->is_valid_escape( $at ) ) {
++$at;
$decoded .= $this->decode_escape_at( $at, $bytes_consumed );

Copilot AI Apr 8, 2026

In the decode_range() slow path, both the normal segments appended via substr($this->css, ...) earlier in the loop and the decode_escape_at() return value here are not passed through wp_scrub_utf8(). This makes output inconsistent with the fast path (which scrubs) and can leak invalid UTF-8 if the range contains escapes/CR/FF/NUL plus invalid bytes. Consider scrubbing the appended segments (including the decoded escape output) and adding a regression test that combines an invalid byte with a backslash escape.
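
As a rough illustration of the consistency being asked for, the sketch below scrubs on both paths. Everything in it is an assumption rather than the PR's code: `sketch_scrub_utf8` stands in for `wp_scrub_utf8()` using `mb_scrub()` (which requires the mbstring extension), and `sketch_decode_range` reduces escape handling to "drop the backslash, keep the next byte" so the scrubbing step stays visible.

```php
<?php
// Stand-in for wp_scrub_utf8(): replaces invalid UTF-8 byte sequences.
// Assumes the mbstring extension is available.
function sketch_scrub_utf8( string $bytes ): string {
	return mb_scrub( $bytes, 'UTF-8' );
}

// Hypothetical, heavily simplified decode_range().
function sketch_decode_range( string $css ): string {
	// Fast path: nothing to decode, so scrub and return.
	if ( false === strpos( $css, '\\' ) ) {
		return sketch_scrub_utf8( $css );
	}

	// Slow path (simplified): unescape byte-wise, then scrub the result so
	// the output matches what the fast path guarantees.
	$decoded = (string) preg_replace( '/\\\\(.)/s', '$1', $css );

	return sketch_scrub_utf8( $decoded );
}

// Regression-style check: an invalid byte combined with a backslash escape.
$output = sketch_decode_range( "a\xFF\\\"b" );
var_dump( mb_check_encoding( $output, 'UTF-8' ) ); // bool(true)
```

Scrubbing once at the end is only one option. Scrubbing each appended segment, as the comment suggests, gives the same guarantee of valid output.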

Development

Successfully merging this pull request may close these issues.

CSS Processor string token values mishandle backslash-newline