Improve parsing and retrieve additional data in REST url-details endpoint #31763

Merged
getdave merged 34 commits into trunk from try/retrieve-more-data-from-url-details-api on May 29, 2021

Conversation

getdave
Contributor

@getdave getdave commented May 12, 2021

This PR is part of the effort to achieve the vision outlined in #31466: the ability to show metadata about a remote URL when inserting a hyperlink in the block editor.

Following input from @hellofromtonya in #28791 (comment), this PR takes a regex-based route, building on the original implementation from #18042 to add a better mechanism for parsing information from the HTML of the remote URL's response body.

This helps to address feedback such as #27762 and also sets things up nicely for parsing additional details from the remote URL (e.g. favicon, Open Graph metadata, etc.).

This PR aims to:

  • Parse the site icon from the HTML in order to retrieve a usable site icon.
  • Parse a representative image from the HTML to use as a preview.
  • Parse a description from the HTML to serve as a longer description.

It should also:

  • Improve how we parse with regex, for example by looking only inside the <head> tag.
  • Improve parsing of a title to account for edge cases where an attribute is added to the <title> tag (eg: <title data-rh="true">BBC - Home</title>).
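
To illustrate the two improvements above, here is a minimal JavaScript sketch. The endpoint itself is implemented in PHP, so this is an analogy rather than the PR's code, and the helper names `getHeadSection` and `getTitle` and both regexes are assumptions made for this example. It restricts the search to the <head> section and tolerates attributes on the <title> tag.

```javascript
// Hypothetical helper: return only the contents of <head>...</head>,
// or an empty string when the document has no head section.
function getHeadSection( html ) {
	const match = /<head\b[^>]*>([\s\S]*?)<\/head\s*>/i.exec( html );
	return match ? match[ 1 ] : '';
}

// Hypothetical helper: read the <title> text from the head section only,
// allowing arbitrary attributes on the opening tag.
function getTitle( html ) {
	const match = /<title\b[^>]*>([\s\S]*?)<\/title\s*>/i.exec( getHeadSection( html ) );
	return match ? match[ 1 ].trim() : '';
}

// A <title> inside an inline SVG in the body is ignored because only the head is searched.
const sampleHtml =
	'<html><head><title data-rh="true">BBC - Home</title></head>' +
	'<body><svg><title>Ignored</title></svg></body></html>';

console.log( getTitle( sampleHtml ) ); // "BBC - Home"
```

Searching only the head keeps unrelated <title> elements (for example inside inline SVGs) from being picked up.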

Note the Open Graph spec.

Closes #27762

Automated Testing Instructions

Run the test suite with

wp-env run phpunit 'phpunit -c /var/www/html/wp-content/plugins/gutenberg/phpunit.xml.dist --verbose --filter WP_REST_URL_Details_Controller_Test'

The tests themselves reside in phpunit/class-wp-rest-url-details-controller-test.php.

Manual Testing Instructions

  • Check out this PR.
  • Run npm run wp-env start to boot the local testing env.
  • Open the Gutenberg local testing env at http://localhost:8889/wp-admin/ and log in.
  • Create a new Post.
  • Choose one of the following options:

Option 1

  • Open devtools and in the console enter:
wp.apiFetch( { path: '/__experimental/url-details/?url=https://wordpress.org' } ).then( data => {
    console.log( data );
} );

...also try some non-English websites:

wp.apiFetch( { path: '/__experimental/url-details/?url=http://www.baidu.com' } ).then( data => {
    console.log( data );
} );

You should see a valid REST API response containing

  • the contents of the title tag from wordpress.org (ie: "Blog Tool, Publishing Platform, and CMS \u2014 WordPress.org").
  • the favicon URL from WordPress.org

Option 2

  • Open devtools and in the console enter wpApiSettings.nonce. You should see a valid nonce returned as a string.
  • Copy the nonce.
  • In your browser go to: http://localhost:8889/wp-json/__experimental/url-details/?url=https://wordpress.org&_wpnonce=%YOUR_NONCE_HERE% - be sure to replace the nonce value with the one copied in the previous step.
  • You should see a valid REST API response containing
    • the contents of the title tag from wordpress.org (ie: "Blog Tool, Publishing Platform, and CMS \u2014 WordPress.org").
    • the favicon URL from WordPress.org
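
For Option 2, the request URL can also be assembled programmatically instead of by hand. The following is only a sketch: `buildUrlDetailsRequest` is a hypothetical helper, and the REST root and nonce are whatever your local environment provides.

```javascript
// Hypothetical helper: build the Option 2 request URL.
// `restRoot` is the site's REST root (e.g. http://localhost:8889/wp-json/),
// `targetUrl` is the remote URL to inspect, and `nonce` is the value
// copied from wpApiSettings.nonce.
function buildUrlDetailsRequest( restRoot, targetUrl, nonce ) {
	const requestUrl = new URL( '__experimental/url-details/', restRoot );
	// searchParams percent-encodes the values, so the target URL survives intact.
	requestUrl.searchParams.set( 'url', targetUrl );
	requestUrl.searchParams.set( '_wpnonce', nonce );
	return requestUrl.toString();
}

// Placeholder nonce for illustration only; use the real one from the console.
console.log(
	buildUrlDetailsRequest( 'http://localhost:8889/wp-json/', 'https://wordpress.org', 'YOUR_NONCE_HERE' )
);
```

Letting `URL` and `searchParams` do the encoding avoids problems when the target URL itself contains a query string.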

@github-actions

github-actions bot commented May 12, 2021

Size Change: +234 kB (+14%) ⚠️

Total Size: 1.86 MB

Filename Size Change
build/annotations/index.js 2.93 kB +2 B (0%)
build/api-fetch/index.js 2.42 kB +1 B (0%)
build/block-editor/index.js 119 kB +3 B (0%)
build/block-editor/style-rtl.css 12.9 kB +83 B (+1%)
build/block-editor/style.css 12.9 kB +83 B (+1%)
build/block-library/blocks/navigation/style-rtl.css 1.8 kB -4 B (0%)
build/block-library/blocks/navigation/style.css 1.8 kB -3 B (0%)
build/block-library/blocks/post-featured-image/style-rtl.css 141 B +22 B (+18%) ⚠️
build/block-library/blocks/post-featured-image/style.css 141 B +22 B (+18%) ⚠️
build/block-library/index.js 147 kB -109 B (0%)
build/block-library/style-rtl.css 10.3 kB +33 B (0%)
build/block-library/style.css 10.3 kB +34 B (0%)
build/blocks/index.js 47.2 kB +13 B (0%)
build/components/index.js 189 kB +780 B (0%)
build/compose/index.js 10 kB +77 B (+1%)
build/core-data/index.js 12.1 kB +3 B (0%)
build/customize-widgets/index.js 43.1 kB -9 B (0%)
build/customize-widgets/style-rtl.css 1.42 kB +38 B (+3%)
build/customize-widgets/style.css 1.42 kB +38 B (+3%)
build/data/index.js 7.23 kB -1 B (0%)
build/edit-navigation/index.js 13.8 kB -123 B (-1%)
build/edit-post/index.js 571 kB +236 kB (+70%) 🆘
build/edit-site/index.js 25.7 kB -61 B (0%)
build/edit-site/style-rtl.css 4.75 kB -4 B (0%)
build/edit-site/style.css 4.75 kB -2 B (0%)
build/edit-widgets/index.js 292 kB -37 B (0%)
build/editor/index.js 38.3 kB -34 B (0%)
build/format-library/index.js 5.67 kB +3 B (0%)
build/keyboard-shortcuts/index.js 1.65 kB +2 B (0%)
build/navigation/index.js 0 B -2.85 kB (removed) 🏆
build/nux/index.js 2.31 kB -1 B (0%)
build/plugins/index.js 1.99 kB -1 B (0%)
build/reusable-blocks/index.js 2.53 kB -9 B (0%)
build/rich-text/index.js 10.6 kB -22 B (0%)
build/server-side-render/index.js 1.63 kB -2 B (0%)
build/widgets/index.js 1.66 kB +1 B (0%)
ℹ️ View Unchanged
Filename Size Change
build/a11y/index.js 1.12 kB 0 B
build/autop/index.js 2.28 kB 0 B
build/blob/index.js 673 B 0 B
build/block-directory/index.js 6.61 kB 0 B
build/block-directory/style-rtl.css 989 B 0 B
build/block-directory/style.css 990 B 0 B
build/block-library/blocks/archives/editor-rtl.css 61 B 0 B
build/block-library/blocks/archives/editor.css 60 B 0 B
build/block-library/blocks/archives/style-rtl.css 65 B 0 B
build/block-library/blocks/archives/style.css 65 B 0 B
build/block-library/blocks/audio/editor-rtl.css 58 B 0 B
build/block-library/blocks/audio/editor.css 58 B 0 B
build/block-library/blocks/audio/style-rtl.css 112 B 0 B
build/block-library/blocks/audio/style.css 112 B 0 B
build/block-library/blocks/block/editor-rtl.css 161 B 0 B
build/block-library/blocks/block/editor.css 161 B 0 B
build/block-library/blocks/button/editor-rtl.css 475 B 0 B
build/block-library/blocks/button/editor.css 474 B 0 B
build/block-library/blocks/button/style-rtl.css 603 B 0 B
build/block-library/blocks/button/style.css 602 B 0 B
build/block-library/blocks/buttons/editor-rtl.css 315 B 0 B
build/block-library/blocks/buttons/editor.css 315 B 0 B
build/block-library/blocks/buttons/style-rtl.css 375 B 0 B
build/block-library/blocks/buttons/style.css 375 B 0 B
build/block-library/blocks/calendar/style-rtl.css 208 B 0 B
build/block-library/blocks/calendar/style.css 208 B 0 B
build/block-library/blocks/categories/editor-rtl.css 84 B 0 B
build/block-library/blocks/categories/editor.css 83 B 0 B
build/block-library/blocks/categories/style-rtl.css 79 B 0 B
build/block-library/blocks/categories/style.css 79 B 0 B
build/block-library/blocks/code/style-rtl.css 90 B 0 B
build/block-library/blocks/code/style.css 90 B 0 B
build/block-library/blocks/columns/editor-rtl.css 190 B 0 B
build/block-library/blocks/columns/editor.css 190 B 0 B
build/block-library/blocks/columns/style-rtl.css 422 B 0 B
build/block-library/blocks/columns/style.css 422 B 0 B
build/block-library/blocks/cover/editor-rtl.css 644 B 0 B
build/block-library/blocks/cover/editor.css 646 B 0 B
build/block-library/blocks/cover/style-rtl.css 1.22 kB 0 B
build/block-library/blocks/cover/style.css 1.23 kB 0 B
build/block-library/blocks/embed/editor-rtl.css 486 B 0 B
build/block-library/blocks/embed/editor.css 486 B 0 B
build/block-library/blocks/embed/style-rtl.css 401 B 0 B
build/block-library/blocks/embed/style.css 400 B 0 B
build/block-library/blocks/file/editor-rtl.css 301 B 0 B
build/block-library/blocks/file/editor.css 300 B 0 B
build/block-library/blocks/file/frontend.js 771 B 0 B
build/block-library/blocks/file/style-rtl.css 255 B 0 B
build/block-library/blocks/file/style.css 255 B 0 B
build/block-library/blocks/freeform/editor-rtl.css 2.44 kB 0 B
build/block-library/blocks/freeform/editor.css 2.44 kB 0 B
build/block-library/blocks/gallery/editor-rtl.css 704 B 0 B
build/block-library/blocks/gallery/editor.css 705 B 0 B
build/block-library/blocks/gallery/style-rtl.css 1.06 kB 0 B
build/block-library/blocks/gallery/style.css 1.06 kB 0 B
build/block-library/blocks/group/editor-rtl.css 160 B 0 B
build/block-library/blocks/group/editor.css 160 B 0 B
build/block-library/blocks/group/style-rtl.css 57 B 0 B
build/block-library/blocks/group/style.css 57 B 0 B
build/block-library/blocks/heading/editor-rtl.css 129 B 0 B
build/block-library/blocks/heading/editor.css 129 B 0 B
build/block-library/blocks/heading/style-rtl.css 76 B 0 B
build/block-library/blocks/heading/style.css 76 B 0 B
build/block-library/blocks/home-link/style-rtl.css 259 B 0 B
build/block-library/blocks/home-link/style.css 259 B 0 B
build/block-library/blocks/html/editor-rtl.css 281 B 0 B
build/block-library/blocks/html/editor.css 281 B 0 B
build/block-library/blocks/image/editor-rtl.css 717 B 0 B
build/block-library/blocks/image/editor.css 716 B 0 B
build/block-library/blocks/image/style-rtl.css 481 B 0 B
build/block-library/blocks/image/style.css 485 B 0 B
build/block-library/blocks/latest-comments/style-rtl.css 281 B 0 B
build/block-library/blocks/latest-comments/style.css 282 B 0 B
build/block-library/blocks/latest-posts/editor-rtl.css 137 B 0 B
build/block-library/blocks/latest-posts/editor.css 137 B 0 B
build/block-library/blocks/latest-posts/style-rtl.css 523 B 0 B
build/block-library/blocks/latest-posts/style.css 522 B 0 B
build/block-library/blocks/legacy-widget/editor-rtl.css 557 B 0 B
build/block-library/blocks/legacy-widget/editor.css 557 B 0 B
build/block-library/blocks/list/style-rtl.css 63 B 0 B
build/block-library/blocks/list/style.css 63 B 0 B
build/block-library/blocks/media-text/editor-rtl.css 176 B 0 B
build/block-library/blocks/media-text/editor.css 176 B 0 B
build/block-library/blocks/media-text/style-rtl.css 492 B 0 B
build/block-library/blocks/media-text/style.css 489 B 0 B
build/block-library/blocks/more/editor-rtl.css 434 B 0 B
build/block-library/blocks/more/editor.css 434 B 0 B
build/block-library/blocks/navigation-link/editor-rtl.css 633 B 0 B
build/block-library/blocks/navigation-link/editor.css 634 B 0 B
build/block-library/blocks/navigation-link/style-rtl.css 94 B 0 B
build/block-library/blocks/navigation-link/style.css 94 B 0 B
build/block-library/blocks/navigation/editor-rtl.css 1.54 kB 0 B
build/block-library/blocks/navigation/editor.css 1.54 kB 0 B
build/block-library/blocks/navigation/frontend.js 2.85 kB 0 B
build/block-library/blocks/nextpage/editor-rtl.css 395 B 0 B
build/block-library/blocks/nextpage/editor.css 395 B 0 B
build/block-library/blocks/page-list/editor-rtl.css 310 B 0 B
build/block-library/blocks/page-list/editor.css 311 B 0 B
build/block-library/blocks/page-list/style-rtl.css 233 B 0 B
build/block-library/blocks/page-list/style.css 233 B 0 B
build/block-library/blocks/paragraph/editor-rtl.css 157 B 0 B
build/block-library/blocks/paragraph/editor.css 157 B 0 B
build/block-library/blocks/paragraph/style-rtl.css 247 B 0 B
build/block-library/blocks/paragraph/style.css 248 B 0 B
build/block-library/blocks/post-author/editor-rtl.css 209 B 0 B
build/block-library/blocks/post-author/editor.css 209 B 0 B
build/block-library/blocks/post-author/style-rtl.css 183 B 0 B
build/block-library/blocks/post-author/style.css 184 B 0 B
build/block-library/blocks/post-comments-form/style-rtl.css 140 B 0 B
build/block-library/blocks/post-comments-form/style.css 140 B 0 B
build/block-library/blocks/post-comments/style-rtl.css 360 B 0 B
build/block-library/blocks/post-comments/style.css 359 B 0 B
build/block-library/blocks/post-content/editor-rtl.css 139 B 0 B
build/block-library/blocks/post-content/editor.css 139 B 0 B
build/block-library/blocks/post-excerpt/editor-rtl.css 73 B 0 B
build/block-library/blocks/post-excerpt/editor.css 73 B 0 B
build/block-library/blocks/post-excerpt/style-rtl.css 69 B 0 B
build/block-library/blocks/post-excerpt/style.css 69 B 0 B
build/block-library/blocks/post-featured-image/editor-rtl.css 338 B 0 B
build/block-library/blocks/post-featured-image/editor.css 338 B 0 B
build/block-library/blocks/post-title/style-rtl.css 60 B 0 B
build/block-library/blocks/post-title/style.css 60 B 0 B
build/block-library/blocks/preformatted/style-rtl.css 103 B 0 B
build/block-library/blocks/preformatted/style.css 103 B 0 B
build/block-library/blocks/pullquote/editor-rtl.css 183 B 0 B
build/block-library/blocks/pullquote/editor.css 183 B 0 B
build/block-library/blocks/pullquote/style-rtl.css 318 B 0 B
build/block-library/blocks/pullquote/style.css 318 B 0 B
build/block-library/blocks/query-loop/editor-rtl.css 98 B 0 B
build/block-library/blocks/query-loop/editor.css 97 B 0 B
build/block-library/blocks/query-loop/style-rtl.css 315 B 0 B
build/block-library/blocks/query-loop/style.css 317 B 0 B
build/block-library/blocks/query-pagination-numbers/editor-rtl.css 122 B 0 B
build/block-library/blocks/query-pagination-numbers/editor.css 121 B 0 B
build/block-library/blocks/query-pagination/editor-rtl.css 270 B 0 B
build/block-library/blocks/query-pagination/editor.css 262 B 0 B
build/block-library/blocks/query-pagination/style-rtl.css 168 B 0 B
build/block-library/blocks/query-pagination/style.css 168 B 0 B
build/block-library/blocks/query-title/editor-rtl.css 86 B 0 B
build/block-library/blocks/query-title/editor.css 86 B 0 B
build/block-library/blocks/query/editor-rtl.css 131 B 0 B
build/block-library/blocks/query/editor.css 132 B 0 B
build/block-library/blocks/quote/style-rtl.css 169 B 0 B
build/block-library/blocks/quote/style.css 169 B 0 B
build/block-library/blocks/rss/editor-rtl.css 201 B 0 B
build/block-library/blocks/rss/editor.css 202 B 0 B
build/block-library/blocks/rss/style-rtl.css 290 B 0 B
build/block-library/blocks/rss/style.css 290 B 0 B
build/block-library/blocks/search/editor-rtl.css 189 B 0 B
build/block-library/blocks/search/editor.css 189 B 0 B
build/block-library/blocks/search/style-rtl.css 359 B 0 B
build/block-library/blocks/search/style.css 362 B 0 B
build/block-library/blocks/separator/editor-rtl.css 99 B 0 B
build/block-library/blocks/separator/editor.css 99 B 0 B
build/block-library/blocks/separator/style-rtl.css 251 B 0 B
build/block-library/blocks/separator/style.css 251 B 0 B
build/block-library/blocks/shortcode/editor-rtl.css 512 B 0 B
build/block-library/blocks/shortcode/editor.css 512 B 0 B
build/block-library/blocks/site-logo/editor-rtl.css 440 B 0 B
build/block-library/blocks/site-logo/editor.css 441 B 0 B
build/block-library/blocks/site-logo/style-rtl.css 154 B 0 B
build/block-library/blocks/site-logo/style.css 154 B 0 B
build/block-library/blocks/social-link/editor-rtl.css 164 B 0 B
build/block-library/blocks/social-link/editor.css 165 B 0 B
build/block-library/blocks/social-links/editor-rtl.css 800 B 0 B
build/block-library/blocks/social-links/editor.css 799 B 0 B
build/block-library/blocks/social-links/style-rtl.css 1.32 kB 0 B
build/block-library/blocks/social-links/style.css 1.33 kB 0 B
build/block-library/blocks/spacer/editor-rtl.css 308 B 0 B
build/block-library/blocks/spacer/editor.css 308 B 0 B
build/block-library/blocks/spacer/style-rtl.css 48 B 0 B
build/block-library/blocks/spacer/style.css 48 B 0 B
build/block-library/blocks/table/editor-rtl.css 478 B 0 B
build/block-library/blocks/table/editor.css 478 B 0 B
build/block-library/blocks/table/style-rtl.css 480 B 0 B
build/block-library/blocks/table/style.css 480 B 0 B
build/block-library/blocks/tag-cloud/editor-rtl.css 118 B 0 B
build/block-library/blocks/tag-cloud/editor.css 118 B 0 B
build/block-library/blocks/tag-cloud/style-rtl.css 94 B 0 B
build/block-library/blocks/tag-cloud/style.css 94 B 0 B
build/block-library/blocks/template-part/editor-rtl.css 551 B 0 B
build/block-library/blocks/template-part/editor.css 550 B 0 B
build/block-library/blocks/term-description/editor-rtl.css 90 B 0 B
build/block-library/blocks/term-description/editor.css 90 B 0 B
build/block-library/blocks/text-columns/editor-rtl.css 95 B 0 B
build/block-library/blocks/text-columns/editor.css 95 B 0 B
build/block-library/blocks/text-columns/style-rtl.css 166 B 0 B
build/block-library/blocks/text-columns/style.css 166 B 0 B
build/block-library/blocks/verse/style-rtl.css 87 B 0 B
build/block-library/blocks/verse/style.css 87 B 0 B
build/block-library/blocks/video/editor-rtl.css 569 B 0 B
build/block-library/blocks/video/editor.css 570 B 0 B
build/block-library/blocks/video/style-rtl.css 173 B 0 B
build/block-library/blocks/video/style.css 173 B 0 B
build/block-library/common-rtl.css 1.26 kB 0 B
build/block-library/common.css 1.26 kB 0 B
build/block-library/editor-rtl.css 9.93 kB 0 B
build/block-library/editor.css 9.92 kB 0 B
build/block-library/reset-rtl.css 506 B 0 B
build/block-library/reset.css 507 B 0 B
build/block-library/theme-rtl.css 692 B 0 B
build/block-library/theme.css 693 B 0 B
build/block-serialization-default-parser/index.js 1.29 kB 0 B
build/block-serialization-spec-parser/index.js 3.06 kB 0 B
build/components/style-rtl.css 16.2 kB 0 B
build/components/style.css 16.2 kB 0 B
build/data-controls/index.js 829 B 0 B
build/date/index.js 31.8 kB 0 B
build/deprecated/index.js 739 B 0 B
build/dom-ready/index.js 577 B 0 B
build/dom/index.js 4.62 kB 0 B
build/edit-navigation/style-rtl.css 3.08 kB 0 B
build/edit-navigation/style.css 3.08 kB 0 B
build/edit-post/classic-rtl.css 454 B 0 B
build/edit-post/classic.css 454 B 0 B
build/edit-post/style-rtl.css 6.81 kB 0 B
build/edit-post/style.css 6.8 kB 0 B
build/edit-widgets/style-rtl.css 3.46 kB 0 B
build/edit-widgets/style.css 3.47 kB 0 B
build/editor/style-rtl.css 3.92 kB 0 B
build/editor/style.css 3.91 kB 0 B
build/element/index.js 3.44 kB 0 B
build/escape-html/index.js 739 B 0 B
build/format-library/style-rtl.css 637 B 0 B
build/format-library/style.css 639 B 0 B
build/hooks/index.js 1.76 kB 0 B
build/html-entities/index.js 627 B 0 B
build/i18n/index.js 3.73 kB 0 B
build/is-shallow-equal/index.js 710 B 0 B
build/keycodes/index.js 1.43 kB 0 B
build/list-reusable-blocks/index.js 2.06 kB 0 B
build/list-reusable-blocks/style-rtl.css 629 B 0 B
build/list-reusable-blocks/style.css 628 B 0 B
build/media-utils/index.js 3.08 kB 0 B
build/notices/index.js 1.07 kB 0 B
build/nux/style-rtl.css 718 B 0 B
build/nux/style.css 716 B 0 B
build/primitives/index.js 1.03 kB 0 B
build/priority-queue/index.js 791 B 0 B
build/react-i18n/index.js 923 B 0 B
build/redux-routine/index.js 2.82 kB 0 B
build/reusable-blocks/style-rtl.css 225 B 0 B
build/reusable-blocks/style.css 225 B 0 B
build/shortcode/index.js 1.68 kB 0 B
build/token-list/index.js 846 B 0 B
build/url/index.js 1.95 kB 0 B
build/viewport/index.js 1.28 kB 0 B
build/warning/index.js 1.13 kB 0 B
build/wordcount/index.js 1.24 kB 0 B

compressed-size-action

@getdave getdave changed the title from "Add basic regex to grab site icon" to "Improve parsing and retrieve additional data in REST url-details endpoint" May 12, 2021
@getdave getdave force-pushed the try/retrieve-more-data-from-url-details-api branch 2 times, most recently from ef0fab7 to 791a621 Compare May 19, 2021 10:50
@getdave getdave marked this pull request as ready for review May 19, 2021 14:07
@TimothyBJacobs
Member

Code seems good to me. I'm -1 on spoofing a user agent though. If a site owner wants to block WordPress from making that API call, that should be their prerogative.

@getdave
Contributor Author

getdave commented May 25, 2021

> I'm -1 on spoofing a user agent though. If a site owner wants to block WordPress from making that API call, that should be their prerogative.

@TimothyBJacobs Fair point. I was finding that a couple of sites seemed to be blocking requests so I assumed if the request looked more like a typical/common user agent then it would be ok.

We can revert as the functionality can still be achieved via a filter if folks want to use my method in their own Plugin/code.

@hellofromtonya
Contributor

@getdave The above 3 suggestions take care of the remaining issues. Once committed, the PR should be ready for merge.

getdave and others added 3 commits May 26, 2021 15:32
Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
@getdave
Contributor Author

getdave commented May 26, 2021

@hellofromtonya Thanks for these. All committed 🥳

Let's get a final 👍 from @TimothyBJacobs so I can merge this.

@hellofromtonya
Contributor

Hey @TimothyBJacobs, user-agent force code is now removed. Appreciate a code review and, if 👍 , approval.

@getdave
Contributor Author

getdave commented May 27, 2021

@hellofromtonya One for a followup, but I've found an edge case where the icon is provided as a data URI:

eg:

<link href='data:image/png;base64,iVBORw0KGgo=' rel='icon' type='image/png'>

We may need to account for situations such as these.
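
To make the edge case concrete, here is a rough JavaScript sketch of icon extraction that finds the link element first and then reads its href. The PR's parser is PHP, and `parseTagAttributes` and `getIconHref` are hypothetical names for this illustration only. With this approach the data URI comes back as the href, so any later URL handling has to cope with it.

```javascript
// Hypothetical helper: collect a tag's attributes into a lowercase-keyed object.
// Values may be wrapped in double quotes, single quotes, or no quotes at all.
function parseTagAttributes( tag ) {
	const attributes = {};
	const pattern = /([^\s"'<>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
	for ( const match of tag.matchAll( pattern ) ) {
		const value = [ match[ 2 ], match[ 3 ], match[ 4 ] ].find( ( v ) => v !== undefined );
		attributes[ match[ 1 ].toLowerCase() ] = value;
	}
	return attributes;
}

// Hypothetical helper: step 1 finds <link> elements whose rel includes "icon";
// step 2 reads the href from that element, whatever the attribute order.
// Simplification: a ">" inside an attribute value would cut the tag short.
function getIconHref( html ) {
	const linkTags = html.match( /<link\b[^>]*>/gi ) || [];
	for ( const tag of linkTags ) {
		const { rel = '', href } = parseTagAttributes( tag );
		if ( href !== undefined && rel.toLowerCase().split( /\s+/ ).includes( 'icon' ) ) {
			return href;
		}
	}
	return null;
}

const dataUriIcon = "<link href='data:image/png;base64,iVBORw0KGgo=' rel='icon' type='image/png'>";
console.log( getIconHref( dataUriIcon ) ); // "data:image/png;base64,iVBORw0KGgo="
```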

@hellofromtonya
Contributor

hellofromtonya commented May 27, 2021

> We may need to account for situations such as these.

@getdave Aw yes, the data URL. What do you foresee we need to account for, i.e. special handling needs?

Most browsers block these for top-level navigation due to security concerns. But it seems okay for a favicon. Testing in Firefox and Chrome, both rendered this example:

data:image/x-icon;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQEAYAAABPYyMiAAAABmJLR0T///////8JWPfcAAAACXBIWXMAAABIAAAASABGyWs+AAAAF0lEQVRIx2NgGAWjYBSMglEwCkbBSAcACBAAAeaR9cIAAAAASUVORK5CYII=

when directly loading the URL into the browser.

UPDATE:
I see. We need to skip the relative-to-absolute URL conversion code for data URLs. On it.
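
As a sketch of that guard (in JavaScript for illustration; the controller is PHP, and `resolveResourceUrl` is a hypothetical name, not the code that was later committed): a data URL is returned untouched, and everything else is resolved against the page URL.

```javascript
// Hypothetical helper: make an icon or image URL absolute relative to the
// page it was found on, leaving data URLs exactly as they are.
function resolveResourceUrl( resourceUrl, pageUrl ) {
	// Data URLs are already complete; converting them would corrupt them.
	if ( /^data:/i.test( resourceUrl.trim() ) ) {
		return resourceUrl;
	}
	try {
		return new URL( resourceUrl, pageUrl ).toString();
	} catch ( error ) {
		// Not resolvable (e.g. an invalid page URL): hand the value back unchanged.
		return resourceUrl;
	}
}

console.log( resolveResourceUrl( '/favicon.ico', 'https://example.com/blog/post' ) );
console.log( resolveResourceUrl( 'data:image/png;base64,iVBORw0KGgo=', 'https://example.com/blog/post' ) );
```

In JavaScript the `URL` constructor already treats `data:` as an absolute scheme, so the explicit check is mostly there to make the intent visible; a hand-rolled converter that prepends the scheme and host would need it.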

@hellofromtonya
Contributor

@getdave Addressed the icon data URL in PR #32276 and tested it across PHP versions: https://3v4l.org/aaDc9.

@TimothyBJacobs
Member

Code is looking good to me. I haven't dived deep into the regexes, but the tests look good.

The schema needs to be updated to account for the new data we are returning.
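
For orientation, the shape of such a schema might look roughly like the sketch below, written as a plain JavaScript object. The real schema lives in the PHP controller; the property names here are inferred from the fields this PR adds (icon, description, image) alongside the existing title, and the object name `urlDetailsSchemaSketch` is an assumption for this example.

```javascript
// Hypothetical sketch of the response schema, not the controller's actual code.
// Icon and image are strings carrying a `uri` format.
const urlDetailsSchemaSketch = {
	type: 'object',
	properties: {
		title: { type: 'string' },
		icon: { type: 'string', format: 'uri' },
		description: { type: 'string' },
		image: { type: 'string', format: 'uri' },
	},
};

// List the fields a client should treat as URLs.
const uriFields = Object.keys( urlDetailsSchemaSketch.properties ).filter(
	( name ) => urlDetailsSchemaSketch.properties[ name ].format === 'uri'
);
console.log( uriFields ); // [ 'icon', 'image' ]
```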

@getdave getdave merged commit 40f058b into trunk May 29, 2021
@getdave getdave deleted the try/retrieve-more-data-from-url-details-api branch May 29, 2021 10:39
@github-actions github-actions bot added this to the Gutenberg 10.8 milestone May 29, 2021
vcanales pushed a commit that referenced this pull request Jul 7, 2021
…oint (#31763)

* Add basic regex to grab site icon

* Retrieve meta description

* Ensure cleanup

* Improve title regex to account for possible attributes on title

* Retrieve OG Image

* Fix linting

* Fix tests to assert on array subset

* Enhance fixture data with more edge cases

* Add tests to ensure new properties are captured for icon, description and image.

* Add more specific yet flexible test for title

* Handle relative resource URLs for icon and image

* Use random user agent string to avoid being blocked by certain websites.

* Account for open graph image property variations

* Add unit test for get_title

* Add tests (including some failing) for get_icon

* Fix method invocation to remove unused args

* Wrap test HTML string in a basic HTML doc.

* Parse the head section and use for comparison

* Fix broken cache test

* Refine wrap method

* Add get_image tests

* Handle relative URLs when target url has a path

* Improves title and icon parsing for PR 31763 (#32021)

* Title: removes malformed opening tag pattern and adds tests.

* Icon: Allows for different ordering of attribute. Adds happy
and unhappy test data.

* Icon: allow for any order or combination of attributes.

How?

Get the icon link element first.
Then grab its href.

Benefits:
- Not dependent upon the order of attributes
- Allows for optional or custom attributes

* Icon: allows for single, double, or no quotes around attributes.

* Update for WPCS standard.

* Seek head but fallback to body.

* Improves metadata parsing for PR 31763 (#32067)

* Description: uses regex instead of tmp file.

* Adding test to check for like tag before and after target.

* Description: changes regex strategy.

Why?

Lookahead was not constrained with each element and thus picked up
<meta from one and then if not a match, grabbed the name and content
from another upstream.

The new strategy parses all meta elements with a content
attribute. Then loops through them to find the description element.

Why this order?
The content attribute can contain HTML tags. The > or /> symbol is
matched as the end of the meta element (its closing symbol). If
this happens, the content is truncated. Boo.

Switching the parsing order solves this problem.

Bonus: allows for pre-parsing of all meta elements. Performance boost.

* Refactors getting meta with content elements for reuse.

* Improves getting <head>..</head> element.

- Isolates only the <head>..</head> element by stripping
  all content before the opening tag and ensuring it includes a closing
  </head> tag.
- Performance improvements:
   - Bails out early if no opening tag is found.
   - Uses native string functions instead of regex.

* Image: use same parsing strategy as description.

* Refactor to reuse the process for getting the metadata from the list of meta elements.

* Convert description HTML entities into HTML.

* Improves PR 31763 for the URL Details Controller (#32162)

* Code standards and consistency.

* Removed unused data provider.

* More formatting and standards.

* Title: converts entities.

* Fixes asserts: removes deprecated array subset, uses assertSame, and makes consistent.

* Fixes method return signatures.

* Remove HTML and convert non-HTML entities.

* Removes type check from set_cache as data will be string type.

* Update lib/class-wp-rest-url-details-controller.php

Co-authored-by: Tonya Mork <hello@hellofromtonya.com>

* Update lib/class-wp-rest-url-details-controller.php

Co-authored-by: Tonya Mork <hello@hellofromtonya.com>

* Update lib/class-wp-rest-url-details-controller.php

Co-authored-by: Tonya Mork <hello@hellofromtonya.com>

* Icon: if data url, skip relative-to-absolute conversion (#32276)

* Fix failing test due to extra character in expected string.

* Updates schema for new data items.

* Changes icon and image type to uri.

* Schema: icon & image: reverts type back to string and adds format of uri.

Co-authored-by: Tonya Mork <hello@hellofromtonya.com>
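
The description strategy outlined in the commit message above (collect every meta element that has a content attribute first, then loop through them to find the wanted one) can be sketched as follows. This is an illustrative JavaScript analogy under stated assumptions, not the PHP that was merged: `parseMetaAttributes`, `getMetaElementsWithContent`, and `getMetaValue` are hypothetical names, and the regexes are this example's own.

```javascript
// Hypothetical helper: collect a tag's attributes into a lowercase-keyed object.
// Quoted values are consumed whole, so a ">" inside content="..." is harmless.
function parseMetaAttributes( tag ) {
	const attributes = {};
	const pattern = /([^\s"'<>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
	for ( const match of tag.matchAll( pattern ) ) {
		const value = [ match[ 2 ], match[ 3 ], match[ 4 ] ].find( ( v ) => v !== undefined );
		attributes[ match[ 1 ].toLowerCase() ] = value;
	}
	return attributes;
}

// Step 1: pre-parse all <meta> elements that carry a content attribute.
// The tag pattern skips over quoted values so HTML inside them does not end the tag early.
function getMetaElementsWithContent( html ) {
	const tags = html.match( /<meta\b(?:[^>"']|"[^"]*"|'[^']*')*>/gi ) || [];
	return tags
		.map( parseMetaAttributes )
		.filter( ( attrs ) => attrs.content !== undefined && ( attrs.name || attrs.property ) )
		.map( ( attrs ) => ( { key: ( attrs.name || attrs.property ).toLowerCase(), content: attrs.content } ) );
}

// Step 2: loop through the pre-parsed elements to find the first wanted key.
function getMetaValue( metaElements, keys ) {
	const found = metaElements.find( ( meta ) => keys.includes( meta.key ) );
	return found ? found.content : '';
}

const sampleHead =
	'<meta charset="utf-8">' +
	'<meta property="og:image" content="/images/preview.png">' +
	'<meta content="Fast & <em>simple</em> publishing" name="description">';

const metaElements = getMetaElementsWithContent( sampleHead );
console.log( getMetaValue( metaElements, [ 'description', 'og:description' ] ) );
console.log( getMetaValue( metaElements, [ 'og:image' ] ) );
```

Because the elements are parsed once and then reused, the same list serves both the description and the image lookups.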
Labels
REST API Interaction Related to REST API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

URL Details endpoint follow up - allow for arbitrary attributes on the <title>?
3 participants