Improve parsing and retrieve additional data in REST url-details endpoint #31763
Size Change: +234 kB (+14%) Total Size: 1.86 MB
Code seems good to me. I'm -1 on spoofing a user agent though. If a site owner wants to block WordPress from making that API call, that should be their prerogative.
@TimothyBJacobs Fair point. I was finding that a couple of sites seemed to be blocking requests, so I assumed that if the request looked more like a typical/common user agent then it would be ok. We can revert, as the functionality can still be achieved via a filter if folks want to use my method in their own plugin/code.
@getdave The above 3 suggestions take care of the remaining issues. Once committed, the PR should be ready for merge.
Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
@hellofromtonya Thanks for these. All committed 🥳 Let's get a final 👍 from @TimothyBJacobs so I can merge this.
Hey @TimothyBJacobs,
@hellofromtonya One for a followup, but I've found an edge case where the icon is provided as a data URI, eg: <link href='' rel='icon' type='image/png'>. We may need to account for situations such as these.
@getdave Aw yes, the data URL. Most browsers block these for top-level navigation due to security concerns. But it seems okay for a favicon. Testing in Firefox and Chrome, both rendered this example: when directly loading the URL into the browser. UPDATE:
@getdave Addressed the icon data URL in PR #32276 and cross-version PHP tested it https://3v4l.org/aaDc9. |
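For readers following along, here is a minimal sketch of the "if data URL, skip relative-to-absolute conversion" idea. It is written in Python purely for illustration and is not the PHP code from PR #32276; the function name `resolve_icon_url` and the example URLs are assumptions.

```python
from urllib.parse import urljoin


def resolve_icon_url(icon_href: str, page_url: str) -> str:
    """Return an absolute icon URL, leaving data URLs untouched (hypothetical helper)."""
    href = icon_href.strip()
    # A data URL already embeds the image, so there is nothing to resolve.
    if href.lower().startswith("data:"):
        return href
    # urljoin handles root-relative, path-relative and already-absolute hrefs.
    return urljoin(page_url, href)


if __name__ == "__main__":
    print(resolve_icon_url("/favicon.ico", "https://example.com/blog/post"))
    print(resolve_icon_url("data:image/png;base64,AAAA", "https://example.com/"))
```

The explicit `data:` check comes first so that the embedded payload is never passed through URL resolution at all, which mirrors the behaviour described in the comment above.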
Code is looking good to me. I haven't dived deep into the regexes, but the tests look good. The schema needs to be updated to account for the new data we are returning. |
…oint (#31763)
* Add basic regex to grab site icon
* Retrieve meta description
* Ensure cleanup
* Improve title regex to account for possible attributes on title
* Retrieve OG Image
* Fix linting
* Fix tests to assert on array subset
* Enhance fixture data with more edge cases
* Add tests to ensure new properties are captured for icon, description and image.
* Add more specific yet flexible test for title
* Handle relative resource URLs for icon and image
* Use random user agent string to avoid being blocked by certain websites.
* Account for open graph image property variations
* Add unit test for get_title
* Add tests (including some failing) for get_icon
* Fix method invocation to remove unused args
* Wrap test HTML string in a basic HTML doc.
* Parse the head section and use for comparison
* Fix broken cache test
* Refine wrap method
* Add get_image tests
* Handle relative URLs when target url has a path
* Improves title and icon parsing for PR 31763 (#32021)
* Title: removes malformed opening tag pattern and adds tests.
* Icon: Allows for different ordering of attribute. Adds happy and unhappy test data.
* Icon: allow for any order or combination of attributes. How? Get the icon link element first. Then grab its href. Benefits:
  - Not dependent upon the order of attributes
  - Allows for optional or custom attributes
* Icon: allows for single, double, or no quotes around attributes.
* Update for WPCS standard.
* Seek head but fallback to body.
* Improves metadata parsing for PR 31763 (#32067)
* Description: uses regex instead of tmp file.
* Adding test to check for like tag before and after target.
* Description: changes regex strategy. Why? Lookahead was not constrained with each element and thus picked up <meta from one and then if not a match, grabbed the name and content from another upstream. The new strategy parses all meta elements with a content attribute. Then loops through them to find the description element. Why this order? The content attribute can contain HTML tags. The > or /> symbol is matched as the end of the meta element (it's closing symbol). If this happens, the content is truncated. Boo. Switching the parsing order solves this problem. Bonus: allows for pre-parsing of all meta elements. Performance boost.
* Refactors getting meta with content elements for reuse.
* Improves getting <head>..</head> element.
  - Isolates to the only the <head>..</head> element by stripping all content before the opening tag and ensuring it includes a closing </head> tag.
  - Performance improvements:
    - Bails out early if no opening tag is found.
    - Uses native string functions instead of regex.
* Image: use same parsing strategy as description.
* Refactor to reuse the process for getting the metadata from the list of meta elements.
* Convert description HTML entities into HTML.
* Improves PR 31763 for the URL Details Controller (#32162)
* Code standards and consistency.
* Removed unused data provider.
* More formatting and standards.
* Title: converts entities.
* Fixes asserts: removes deprecated array subset, uses assertSame, and makes consistent.
* Fixes method return signatures.
* Remove HTML and convert non-HTML entities.
* Removes type check from set_cache as data will be string type..
* Update lib/class-wp-rest-url-details-controller.php Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
* Update lib/class-wp-rest-url-details-controller.php Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
* Update lib/class-wp-rest-url-details-controller.php Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
* Icon: if data url, skip relative-to-absolute conversion (#32276)
* Fix failing test due to extra character in expected string.
* Updates schema for new data items.
* Changes icon and image type to uri.
* Schema: icon & image: reverts type back to string and adds format of uri.
Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
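The metadata strategy in the commit notes (collect every meta element that has a content attribute first, then loop to find the one you want) can be sketched roughly as follows. This is an illustrative Python sketch, not the controller's PHP; the pattern, the helper name `get_meta_content`, and the sample markup are all assumptions.

```python
import html
import re

# Match one <meta ...> element that carries a quoted content attribute. The quoted
# value is consumed as a unit, so a ">" inside it does not end the element early,
# and [^>] keeps the search from spilling into a neighbouring element.
META_WITH_CONTENT = re.compile(
    r"""<meta\b[^>]*?\scontent\s*=\s*(["'])(?P<content>.*?)\1[^>]*>""",
    re.IGNORECASE | re.DOTALL,
)


def get_meta_content(head_html, attr, value):
    """Return the decoded content of the first meta element whose attr equals value."""
    wanted = re.compile(
        r"\s" + re.escape(attr) + r"""\s*=\s*["']?""" + re.escape(value) + r"""["']?[\s/>]""",
        re.IGNORECASE,
    )
    for match in META_WITH_CONTENT.finditer(head_html):
        element = match.group(0)
        offset = match.start()
        # Blank out the content value so text inside it cannot be mistaken for attributes.
        rest = element[: match.start("content") - offset] + element[match.end("content") - offset:]
        if wanted.search(rest):
            return html.unescape(match.group("content")).strip()
    return None
```

For example, `get_meta_content(head, "name", "description")` and `get_meta_content(head, "property", "og:image")` would reuse the same pre-parsed pass. This sketch only decodes entities; it does not strip HTML from the value.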
This PR is part of the effort to achieve the vision outlined in #31466, namely the ability to show metadata about a remote URL when inserting a hyperlink using the block editor.
Following input from @hellofromtonya in #28791 (comment), this PR takes a regex-based route, building on the original implementation from #18042 to add a better mechanism for parsing the information from the remote URL's response body HTML.
This helps to address feedback such as #27762 and also sets things up nicely for parsing additional detail from the remote URL (eg: favicon, Open Graph meta, etc.).
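As an illustration of the favicon case, the commit notes describe finding the icon link element first and only then reading its href, so attribute order and quoting style do not matter. A rough Python sketch of that two-step idea follows; it is not the PR's PHP, and the patterns and the name `get_icon_href` are assumptions.

```python
import re

# Step 1: find a <link> element whose rel is "icon" or "shortcut icon",
# with single, double, or no quotes, wherever rel appears among the attributes.
LINK_ICON = re.compile(
    r"""<link\b[^>]*?\srel\s*=\s*["']?(?:shortcut\s+)?icon["']?(?=[\s/>])[^>]*>""",
    re.IGNORECASE,
)

# Step 2: read href from that element only (double-quoted, single-quoted, or bare).
HREF = re.compile(
    r"""\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""",
    re.IGNORECASE,
)


def get_icon_href(head_html):
    """Return the raw href of the first icon link element, or "" if none is found."""
    link = LINK_ICON.search(head_html)
    if link is None:
        return ""
    href = HREF.search(link.group(0))
    if href is None:
        return ""
    # Exactly one of the three alternatives captured a value.
    return next(group for group in href.groups() if group is not None)
```

Because the href is searched only inside the matched element, a stylesheet link that appears earlier in the markup cannot contribute its href by accident.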
This PR aims to:
It should also:
<head> tag.
<title> tag (eg: <title data-rh="true">BBC - Home</title>).
Note the Open Graph spec.
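To make the title case concrete, here is a small sketch of a pattern that tolerates attributes on the opening title tag, as in the BBC example above, and decodes entities in the result. It is an illustrative Python sketch under assumed names (`TITLE`, `get_title`), not the regex used in the PR.

```python
import html
import re

# [^>]* allows arbitrary attributes on the opening tag, eg: data-rh="true".
TITLE = re.compile(
    r"<title\b[^>]*>(?P<title>.*?)</title\s*>",
    re.IGNORECASE | re.DOTALL,
)


def get_title(head_html):
    """Return the decoded, trimmed title text, or "" when no title element exists."""
    match = TITLE.search(head_html)
    if match is None:
        return ""
    return html.unescape(match.group("title")).strip()


if __name__ == "__main__":
    print(get_title('<head><title data-rh="true">BBC - Home</title></head>'))
```

The non-greedy `.*?` stops at the first closing tag, so markup later in the document cannot extend the captured title.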
Closes #27762
Automated Testing Instructions
Run the test suite with
The tests themselves reside in `phpunit/class-wp-rest-url-details-controller-test.php`.
Manual Testing Instructions
`npm run wp-env start` to boot the local testing env.
`http://localhost:8889/wp-admin/` and log in.
Option 1
`console` enter:
...also try some non-English websites:
You should see a valid REST API response containing
`wordpress.org` (ie: `"Blog Tool, Publishing Platform, and CMS \u2014 WordPress.org"`).
Option 2
`console` enter `wpApiSettings.nonce`. You should see a valid nonce returned as a string.
`http://localhost:8889/wp-json/__experimental/url-details/?url=https://wordpress.org&_wpnonce=%YOUR_NONCE_HERE%` - be sure to replace the nonce value with that copied from the previous step.
`wordpress.org` (ie: `"Blog Tool, Publishing Platform, and CMS \u2014 WordPress.org"`).