Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve URL handling in Optimization Detective #1043

Merged
merged 5 commits into from Mar 19, 2024

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Mar 10, 2024

  • Compute the current URL server-side instead of letting client specify the URL in JavaScript. Previously there was no validation that the supplied URL is actually valid or that it corresponds to the server-computed slug from the normalized query vars. True the URL is not used except as metadata, but it was not ideal that proper validation was not being performed. Now the validity of the supplied URL is guaranteed as it is used alongside the slug in the computation of the nonce.
  • Each stored URL metric now includes the original URL for which the data was gathered, being stored alongside the viewport, timestamp, and elements. The URL is also stored as the post_title of the od-url-metrics post so there is apparently duplication with the URLs stored in the post_content. However, this may not be the case since the URL metrics being stored the post are grouped because they share the same normalized query vars. Each of the URL metrics have have a slightly different actual URL (although they should share the same canonical URL). Storing the original URL with each URL metric can assist with debugging issues with query var normalization.
  • Removes JSON pretty-printing of post_content for URL Metrics posts.
  • Removed TODO for adding a version identifier for the stored URL metrics since strict schema validation now achieves this by dropping entries which no longer match the schema.

Checklist

  • PR has either [Focus] or Infrastructure label.
  • PR has a [Type] label.
  • PR has a milestone or the no milestone label.

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Focus] Images Issues related to the Images focus area no milestone PRs that do not have a defined milestone for release [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Mar 10, 2024
Copy link

github-actions bot commented Mar 10, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <westonruter@git.wordpress.org>
Co-authored-by: felixarntz <flixos90@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Base automatically changed from update/ilo-to-optimization-detective to feature/image-loading-optimization March 18, 2024 21:54
@felixarntz felixarntz self-requested a review as a code owner March 18, 2024 21:54
*
* @return string Current URL.
*/
function od_get_current_url(): string {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter LGTM, just a few small questions.

Comment on lines +119 to +122
if ( isset( $_SERVER['REQUEST_URI'] ) ) {
// phpcs:ignore WordPress.Security.ValidatedSanitizedInput.InputNotSanitized
$current_url .= ltrim( wp_unslash( $_SERVER['REQUEST_URI'] ), '/' );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about my favorite edge-case of sites within a subdirectory, this works as expected, right? I guess since $_SERVER['REQUEST_URI'] would include the full path, including the part that's potentially part of home_url(), this should be fine. Just want to make sure since the function doesn't otherwise consider a path that may be part of home_url() already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, REQUEST_URI should include the full path including the subdirectory root path. Otherwise, this function here only uses home_url() to obtain the default scheme, host, and port. The path is not used.

// phpcs:ignore WordPress.Security.ValidatedSanitizedInput.InputNotSanitized
$current_url .= ltrim( wp_unslash( $_SERVER['REQUEST_URI'] ), '/' );
}
return esc_url_raw( $current_url );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this use esc_url_raw()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, humm, since potentially malicious data is present in REQUEST_URI. Perhaps this is overkill, but it has worked well in the AMP plugin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think this is already handled by $wpdb->prepare() when storing the value. But in any case, doesn't hurt I guess.

@@ -190,7 +190,7 @@ static function ( OD_URL_Metric $url_metric ): array {
},
$group_collection->get_flattened_url_metrics()
),
JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES // TODO: No need for pretty-printing.
JSON_UNESCAPED_SLASHES // No need for escaped slashes since not printed to frontend.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely unrelated question: Why is this relevant? Do frontend slashes have to be escaped? I'm asking since the Speculation Rules plugin JSON output contains backslashes which Domenic suggested to remove in my Speculation Rules post draft code example. I'm confused about whether those should or shouldn't be there. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The slashes should always be present when the JSON is inline in HTML, as this prevents malicious strings from breaking out of the containing script. For example, if someone somehow got </script> into the data being serialized to JSON, this would prematurely terminate the script element, whereas if slashes are escaped, then <\/script> would not.

In the context of post_content here where the JSON is never being printed to HTML, there is no need for this escaping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, so that should probably be restored to Speculation Rules 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for explaining. I believe Speculation Rules already does this correctly then, feel free to check. I was referring to the Make Core post draft I've been working on.

@westonruter westonruter merged commit 18d36fe into feature/image-loading-optimization Mar 19, 2024
48 checks passed
@westonruter westonruter deleted the update/url-handling branch March 19, 2024 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Focus] Images Issues related to the Images focus area no milestone PRs that do not have a defined milestone for release [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants