CSS: Fix string token value on special string escapes #227
  $this->token_starts_at,
- $this->token_length
+ $this->token_length,
+ self::TOKEN_STRING === $this->token_type || self::TOKEN_BAD_STRING === $this->token_type
This diverges from #226 and will need the following update:
- self::TOKEN_STRING === $this->token_type || self::TOKEN_BAD_STRING === $this->token_type
+ self::TOKEN_STRING === $this->token_type
Pull request overview
This PR updates the CSS tokenizer’s escape decoding to follow the CSS Syntax spec’s special handling for string tokens, specifically fixing how \ + newline and \ + EOF contribute to decoded string values, and aligns the test corpus accordingly.
Changes:
- Implement special string-token escape handling (\-newline → discard both, \-EOF → discard backslash) via decode_escapes(..., $string_escapes=true).
- Update the JSON test corpus expected normalized/value fields for affected cases.
- Add/extend PHPUnit coverage for string/URL/ident backslash-newline and backslash-EOF behaviors, and introduce a css PHPUnit group.
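The string-token rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's decode_escapes(): the function name sketch_decode_string_escapes is hypothetical, the input is assumed to be the string contents with the quotes already stripped and newlines already normalized to "\n", and mbstring is assumed to be available for mb_chr().

```php
<?php
/**
 * Hypothetical helper (not the PR's decode_escapes()): decodes escapes in
 * the contents of a CSS string token. Assumes quotes are stripped and
 * newlines are already normalized to "\n".
 */
function sketch_decode_string_escapes( string $raw ): string {
	$decoded = '';
	$length  = strlen( $raw );
	$at      = 0;

	while ( $at < $length ) {
		// Copy the run of bytes up to the next backslash verbatim.
		$run      = strcspn( $raw, '\\', $at );
		$decoded .= substr( $raw, $at, $run );
		$at      += $run;
		if ( $at >= $length ) {
			break;
		}

		++$at; // Step over the backslash.

		// Backslash followed by EOF: only the backslash is discarded.
		if ( $at >= $length ) {
			break;
		}

		// Backslash followed by a newline: both are discarded.
		if ( "\n" === $raw[ $at ] ) {
			++$at;
			continue;
		}

		// Hex escape: 1 to 6 hex digits, then at most one whitespace character.
		$hex_length = min( strspn( $raw, '0123456789abcdefABCDEF', $at ), 6 );
		if ( $hex_length > 0 ) {
			$code_point = hexdec( substr( $raw, $at, $hex_length ) );
			$at        += $hex_length;
			if ( $at < $length && false !== strpos( " \t\n", $raw[ $at ] ) ) {
				++$at;
			}
			// Zero, surrogates, and out-of-range values become U+FFFD.
			$is_invalid = 0 === $code_point
				|| $code_point > 0x10FFFF
				|| ( $code_point >= 0xD800 && $code_point <= 0xDFFF );
			$decoded   .= $is_invalid ? "\u{FFFD}" : mb_chr( $code_point, 'UTF-8' );
			continue;
		}

		// Any other escaped character stands for itself. Appending its first
		// byte is enough: trailing UTF-8 bytes are copied by the next run.
		$decoded .= $raw[ $at ];
		++$at;
	}

	return $decoded;
}
```

Under these assumptions, foo\ followed by a newline and bar decodes to foobar, and foo\ at the end of input decodes to foo. Ident and URL tokens would not take the two early branches, which is presumably what the $string_escapes flag toggles.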
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| components/DataLiberation/CSS/class-cssprocessor.php | Renames and extends escape decoding to support string-token-specific escape rules; updates call sites. |
| components/DataLiberation/Tests/CSSProcessorTest.php | Adds targeted regression tests for backslash-newline and backslash-EOF across string/URL/ident contexts; adds @group css. |
| components/DataLiberation/Tests/css-test-cases.json | Fixes expected normalized/value outputs for string tokens containing backslash-newline and backslash-EOF. |
| components/DataLiberation/Tests/CSSUrlProcessorTest.php | Adds @group css to allow running CSS-related tests via a PHPUnit group. |
if ( $this->is_valid_escape( $at ) ) {
	++$at;
	$decoded .= $this->decode_escape_at( $at, $bytes_consumed );
In the decode_range() slow path, both the normal segments appended via substr($this->css, ...) earlier in the loop and the decode_escape_at() return value here are not passed through wp_scrub_utf8(). This makes output inconsistent with the fast path (which scrubs) and can leak invalid UTF-8 if the range contains escapes/CR/FF/NUL plus invalid bytes. Consider scrubbing the appended segments (including the decoded escape output) and adding a regression test that combines an invalid byte with a backslash escape.
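One way the suggestion could look, as a rough sketch only: the function below is hypothetical and is not the PR's decode_range(). mb_scrub() is used as a stand-in for wp_scrub_utf8() (its replacement character may differ), and escape decoding is reduced to "drop the backslash, keep the next byte" so the scrubbing of each appended piece stays visible.

```php
<?php
/**
 * Hypothetical slow-path decoder in which every appended piece is scrubbed.
 * mb_scrub() stands in for wp_scrub_utf8(); escapes are simplified to
 * "drop the backslash, keep the next byte".
 */
function sketch_decode_range_scrubbed( string $css, int $start, int $length ): string {
	$end     = min( $start + $length, strlen( $css ) );
	$decoded = '';
	$at      = $start;

	while ( $at < $end ) {
		// Normal segment: bytes up to the next backslash, scrubbed before appending.
		$run      = strcspn( $css, '\\', $at, $end - $at );
		$decoded .= mb_scrub( substr( $css, $at, $run ), 'UTF-8' );
		$at      += $run;

		// No backslash left, or a backslash with nothing after it in the range.
		if ( $at + 1 >= $end ) {
			break;
		}

		// Simplified escape: skip the backslash and scrub the escaped byte too.
		$decoded .= mb_scrub( $css[ $at + 1 ], 'UTF-8' );
		$at      += 2;
	}

	return $decoded;
}
```

A regression test along the lines proposed above could feed an input such as "a\xFF\\\"b" (an invalid byte next to a backslash escape) and assert that the result passes mb_check_encoding( $result, 'UTF-8' ).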
Summary
- Fix \-newline and \-EOF handling in CSS string token value decoding (fixes CSS Processor string token values mishandle backslash-newline #222, CSS Processor string token values mishandle backslash-EOF #223)
- Rename decode_string_or_url → decode_escapes with a new $string_escapes parameter for CSS string token rules (§4.3.5): \-newline discards both characters (line continuation), \-EOF discards the backslash
- Add css test group to allow vendor/bin/phpunit --group css

Note that the css test file diverged from the originals in these cases. For example, see string/0005, where the expected token value is foo, like in this PR.

Also see #226, where bad-string is updated to use a null token value. These changes will conflict and require an update, assuming both are expected to land.

Test plan
CI passes. Visually confirm that updated and new tests are correct.