@adamziel
Collaborator

@adamziel adamziel commented Oct 21, 2025

Description

Adds support for migrating URLs within CSS syntax in the style HTML attribute during WXR imports. For example, this markup:

<div style="background-image:url(https://oldsite.com/image.png)">

Would be rewritten as:

<div style="background-image:url(https://newsite.com/image.png)">

Motivation

When importing WordPress sites via WXR, URLs embedded in CSS (like background: url("/old-site.com/image.jpg")) need to be migrated to the new site. Previously, these URLs were missed, leading to broken images and assets after import.

Cover blocks are a good example. Without this PR, the background image in this cover block would not be rewritten:

<!-- wp:cover {"url":"http://localhost:8881/wp-content/image.jpg"} -->
<div style="background-position:50% 50%;background-image:url(http://localhost:8881/wp-content/uploads/2025/09/image-2-766x1024.jpeg)">

Implementation

The implementation introduces a new CSSUrlProcessor class that parses CSS url() functions, handles CSS escape sequences, and efficiently skips over large data URIs. It follows the same design principles as WP_HTML_Tag_Processor: a simple state-machine API, no regexps, and minimal allocations. The CSSUrlProcessor is integrated with BlockMarkupUrlProcessor and can be used as follows:

$markup = '<div style="background: url(&quot;/old.jpg&quot;)">Content</div>';
$processor = new BlockMarkupUrlProcessor( $markup, 'https://new-site.com' );

while ( $processor->next_url() ) {
    // Finds "/old.jpg" in the style attribute
    $processor->set_raw_url( '/new.jpg' );
}

echo $processor->get_updated_html();
// Output: <div style="background: url(&quot;/new.jpg&quot;)">Content</div>
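To make the "no regexps" idea concrete, here is a minimal sketch of scanning a style value for url() with plain string functions. It is not the CSSUrlProcessor from this PR: the function name `sketch_find_css_urls` is hypothetical, and the sketch assumes away CSS escape decoding, comments, and url( appearing inside strings or longer identifiers.

```php
<?php
/**
 * Hypothetical helper for illustration only (not part of php-toolkit).
 * Finds url(...) values by scanning bytes, without regular expressions.
 * Simplifications: no escape decoding, no comment or string awareness.
 */
function sketch_find_css_urls( string $css ): array {
	$found  = array();
	$offset = 0;
	$end    = strlen( $css );

	while ( $offset < $end ) {
		// CSS function names are case-insensitive, so match url( loosely.
		$at = stripos( $css, 'url(', $offset );
		if ( false === $at ) {
			break;
		}

		// Skip optional whitespace after the opening parenthesis.
		$cursor  = $at + 4;
		$cursor += strspn( $css, " \t\n\r\f", $cursor );
		if ( $cursor >= $end ) {
			break;
		}

		$quote = $css[ $cursor ];
		if ( '"' === $quote || "'" === $quote ) {
			// Quoted URL: runs until the matching quote.
			$start = $cursor + 1;
			$close = strpos( $css, $quote, $start );
			if ( false === $close ) {
				break;
			}
			$length = $close - $start;
		} else {
			// Unquoted URL: runs until whitespace or the closing parenthesis.
			$start  = $cursor;
			$length = strcspn( $css, " \t\n\r\f)", $start );
		}

		$url = substr( $css, $start, $length );

		// data: URIs carry their payload inline and never need migrating.
		if ( 0 !== stripos( $url, 'data:' ) ) {
			$found[] = array(
				'start'  => $start,
				'length' => $length,
				'url'    => $url,
			);
		}

		$offset = $start + $length;
	}

	return $found;
}

// Usage: rewrite the host, replacing spans last-to-first so offsets stay valid.
$css = 'background-image:url(https://oldsite.com/image.png)';
foreach ( array_reverse( sketch_find_css_urls( $css ) ) as $span ) {
	$new = str_replace( 'https://oldsite.com', 'https://newsite.com', $span['url'] );
	$css = substr_replace( $css, $new, $span['start'], $span['length'] );
}
echo $css . "\n";
// Prints: background-image:url(https://newsite.com/image.png)
```

Recording byte offsets instead of building new strings during the scan is what keeps allocations low; the actual replacement happens once per URL at the end.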

Testing instructions

  • Review thoroughly
  • Confirm the CI tests pass

@adamziel adamziel added the enhancement New feature or request label Oct 21, 2025
@adamziel adamziel marked this pull request as ready for review October 22, 2025 23:39
@adamziel
Collaborator Author

This seems to be looking good.

@adamziel
Collaborator Author

@dmsnell would you mind taking a look at this one?

Member

@dmsnell dmsnell left a comment

Thanks for the ping, @adamziel. I started reviewing, but it quickly turned into constant guessing about which code was intentional and which was generated, making for a dizzying review experience. I hope I haven’t misjudged, but there are a number of fairly evident questions about the code.

It would help me to know how much effort you put into reviewing this so I can better assess the level of attention to give it.

It’s nice having the URLs inside of style attribute values remapped.

@adamziel
Collaborator Author

adamziel commented Oct 23, 2025

Thank you @dmsnell! I've spent some time simplifying the initial LLM implementation, but it seems like I haven't gone deeply enough. You've asked some great questions that I should have caught earlier. Let me do another pass the slow, methodical way before asking you for another review.

@adamziel
Collaborator Author

adamziel commented Nov 2, 2025

With the dedicated CSSProcessor class, this PR became much simpler. It's now a pretty natural extension of the BlockMarkupURLProcessor that merely adds another specialized handler for another subsyntax. Thank you @dmsnell for reviewing and giving me that extra push!

@adamziel adamziel merged commit 5303dfb into trunk Nov 3, 2025
22 checks passed
adamziel added a commit to WordPress/wordpress-importer that referenced this pull request Nov 4, 2025
Adds support for rewriting URLs inside CSS syntax, e.g. here:

```html
<div style="background-image:url(/wp-content/uploads/2025/09/image-2-766x1024.jpeg)">
```

Before this PR, the `style` attributes in, e.g., the cover block were skipped by the URL rewriter and continued pointing to the old site.

Fixes #223

## Implementation details

This PR backports `CSSProcessor`, `CSSURLProcessor`, and a few related PRs around Unicode handling from the WordPress/php-toolkit repo:

* WordPress/php-toolkit#197
* WordPress/php-toolkit#195
* WordPress/php-toolkit#199
* WordPress/php-toolkit#200
* WordPress/php-toolkit#201
* WordPress/php-toolkit#202

Note that `CSSProcessor` and `CSSURLProcessor` are tested against 300 test cases covering various tricky inputs: quoted and unquoted URLs, strings, comments, Unicode escape sequences, and more.
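For readers unfamiliar with CSS escape sequences, the sketch below shows why they matter for URL rewriting: a URL can be written as `\68ttps://...` and still mean `https://...`. The function name `sketch_decode_css_escapes` is hypothetical and this is not the code backported here. It follows the escape rules of CSS Syntax Level 3 in simplified form and assumes the mbstring extension is available for `mb_chr()`.

```php
<?php
/**
 * Hypothetical helper for illustration only (not the backported code).
 * Decodes CSS escapes: a backslash followed by 1-6 hex digits (plus one
 * optional whitespace character) is a code point; a backslash followed
 * by any other character stands for that character.
 * Assumes ext-mbstring for mb_chr().
 */
function sketch_decode_css_escapes( string $value ): string {
	$out = '';
	$i   = 0;
	$end = strlen( $value );

	while ( $i < $end ) {
		// Copy everything up to the next backslash verbatim.
		$plain = strcspn( $value, '\\', $i );
		$out  .= substr( $value, $i, $plain );
		$i    += $plain;
		if ( $i >= $end ) {
			break;
		}

		$i++; // Skip the backslash itself.
		if ( $i >= $end ) {
			// A trailing backslash decodes to the replacement character.
			$out .= "\u{FFFD}";
			break;
		}

		$hex_len = min( 6, strspn( $value, '0123456789abcdefABCDEF', $i ) );
		if ( $hex_len > 0 ) {
			$code = hexdec( substr( $value, $i, $hex_len ) );
			$i   += $hex_len;

			// A single whitespace character after the digits is consumed.
			if ( $i < $end && false !== strpos( " \t\n\r\f", $value[ $i ] ) ) {
				$i++;
			}

			// Zero, surrogates, and out-of-range values become U+FFFD.
			$invalid = 0 === $code
				|| $code > 0x10FFFF
				|| ( $code >= 0xD800 && $code <= 0xDFFF );
			$out    .= $invalid ? "\u{FFFD}" : mb_chr( $code, 'UTF-8' );
		} else {
			// Any other escaped byte is copied as-is.
			$out .= $value[ $i ];
			$i++;
		}
	}

	return $out;
}

echo sketch_decode_css_escapes( '\\68ttps://oldsite.com/a\\(1\\).png' ) . "\n";
// Prints: https://oldsite.com/a(1).png
```

A rewriter that compared raw bytes against the old site URL would miss the escaped form, which is why decoding has to happen before matching.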

## Testing instructions

This PR comes with a new test case specifically for various tricky CSS inputs. You're also welcome to try and import a WXR file that contains an inline background-image reference and confirm the URL is correctly rewritten.