-
Notifications
You must be signed in to change notification settings - Fork 3k
49956: Spammer comment links. #291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
49956: Spammer comment links. #291
Conversation
1133434
to
d363c5c
Compare
$show_pending_links = isset( $commenter['comment_author'] ) && $commenter['comment_author']; | ||
|
||
if ( '0' == $comment->comment_approved && ! $show_pending_links ) { | ||
return wp_kses( $comment_text, array() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider: wp_kses_allowed_html()
for comments context and unsetting links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, we only want to remove links
// Otherwise we match against email addresses. | ||
if ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) { | ||
// Only include requested comment. | ||
$approved_clauses[] = $wpdb->prepare( "( comment_author_email = %s AND comment_approved = '0' AND comment_ID = %d )", $unapproved_identifier, (int) $_GET['unapproved'] ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edge case: prevents replies to own unmoderated comments from displaying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed 80475ec
src/wp-includes/class-wp.php
Outdated
@@ -404,6 +404,9 @@ public function send_headers() { | |||
|
|||
if ( is_user_logged_in() ) { | |||
$headers = array_merge( $headers, wp_get_nocache_headers() ); | |||
} elseif ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) { | |||
// Unmoderated comments are only visible for one minute via the moderation hash. | |||
$headers['Expires'] = gmdate( 'D, d M Y H:i:s', time() + MINUTE_IN_SECONDS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big Host and CDN providers may not love this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wp_get_nocache_headers()
returns an Expires
header too, in addition to Cache-Control
. I think we wouldn't need this, because those two headers are already turning cache off (with no-cache
directives and a 1984 Expires
header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we could emit an X-Robots-Tag
header to absolutely make sure nobody can submit the URL and get it crawled within the 1 minute Window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch re robots
tags. In ee3fe6d, I've added a noindex
instruction but extended the existing code for replytocom
links rather than add the header. The effect should be the same.
wp_get_nocache_headers()
returns anExpires
header too, in addition toCache-Control
...
In the current release, there isn't a no-cacheing instruction for CDNs so I've added this in. I decided to allow the page to be cached for one minute as that's the life of the page.
There's a narrow window for exploiting the CDN cache, with the comment potentially being visible up to two minutes after it was posted but I think that's a safe compromise, especially with your noindex
suggestion.
Please let me know if you think I am missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
CDNs/browsers will pick the Cache-Control
header over the Expires
header is both are present. Pretty much every client that supports HTTP/1.1 should be preferring the Cache-Control
header, which supports max-age=XYZ
pattern to limit the cache duration. Do you think we need the Cache-Control
header updated too, if we were to allow caching for one minute?
I think the intention is to prevent browsers and proxies/CDNs, etc from caching at all, so I believe an expires date with a 1984 expiration date, combined with a Cache-Control
header that forces all clients/CDNs to re-validate, and not to store the pages is the more appropriate one. These pages will never be cached, even for the one minute duration, but I doubt there are significant gains to be had with a CDN/browser caching the response for one minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache-Control header added in f8bbb95 with max-age=60, must-revalidate
.
src/wp-includes/class-wp.php
Outdated
@@ -404,6 +404,9 @@ public function send_headers() { | |||
|
|||
if ( is_user_logged_in() ) { | |||
$headers = array_merge( $headers, wp_get_nocache_headers() ); | |||
} elseif ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) { | |||
// Unmoderated comments are only visible for one minute via the moderation hash. | |||
$headers['Expires'] = gmdate( 'D, d M Y H:i:s', time() + MINUTE_IN_SECONDS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wp_get_nocache_headers()
returns an Expires
header too, in addition to Cache-Control
. I think we wouldn't need this, because those two headers are already turning cache off (with no-cache
directives and a 1984 Expires
header.
@@ -1852,7 +1852,12 @@ function wp_get_unapproved_comment_author_email() { | |||
$comment = get_comment( $comment_id ); | |||
|
|||
if ( $comment && hash_equals( $_GET['moderation-hash'], wp_hash( $comment->comment_date_gmt ) ) ) { | |||
$commenter_email = $comment->comment_author_email; | |||
// The comment will only be viewable by the comment author for 1 minute. | |||
$comment_preview_expires = strtotime( $comment->comment_date_gmt . '+1 minute' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just $comment->comment_date_gmt + 60
because it's already a UNIX timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$comment->comment_date_gmt
is mysql formatted, eg 2020-05-24 04:21:27
, so strtotime()
is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, sorry about the noise here.
src/wp-includes/class-wp.php
Outdated
@@ -404,6 +404,9 @@ public function send_headers() { | |||
|
|||
if ( is_user_logged_in() ) { | |||
$headers = array_merge( $headers, wp_get_nocache_headers() ); | |||
} elseif ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) { | |||
// Unmoderated comments are only visible for one minute via the moderation hash. | |||
$headers['Expires'] = gmdate( 'D, d M Y H:i:s', time() + MINUTE_IN_SECONDS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, we could emit an X-Robots-Tag
header to absolutely make sure nobody can submit the URL and get it crawled within the 1 minute Window.
src/wp-includes/default-filters.php
Outdated
if ( isset( $_GET['replytocom'] ) ) { | ||
if ( | ||
isset( $_GET['replytocom'] ) || | ||
( isset( $_GET['unapproved'] ) && isset( $_GET['moderation-hash'] ) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof I'm sorry for the minor nitpick: I think we can simplify this to isset( $_GET['replytocom']] ) || isset( $_GET['unapproved'], $_GET['moderation-hash'] )
, because isset()
accepts any number of arguments, and returns true if all of them are set. We don't have to call isset()
multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in two minds about blocking robots on these pages, so asked @jono-alderson from Yoast who frequently contributes SEO recommendations to the WordPress project for some advice.
I've been told the canonical
meta tag pointing to the post's main page should be adequate for these pages so I'll revert the relevant commit.
I'm sorry for the minor nitpick
No need to be sorry, I'd forgotten isset()
allowed this and given the number of sites using WordPress, getting the code to it's best possible state is important. :)
src/wp-includes/class-wp.php
Outdated
@@ -404,6 +404,9 @@ public function send_headers() { | |||
|
|||
if ( is_user_logged_in() ) { | |||
$headers = array_merge( $headers, wp_get_nocache_headers() ); | |||
} elseif ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) { | |||
// Unmoderated comments are only visible for one minute via the moderation hash. | |||
$headers['Expires'] = gmdate( 'D, d M Y H:i:s', time() + MINUTE_IN_SECONDS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
CDNs/browsers will pick the Cache-Control
header over the Expires
header is both are present. Pretty much every client that supports HTTP/1.1 should be preferring the Cache-Control
header, which supports max-age=XYZ
pattern to limit the cache duration. Do you think we need the Cache-Control
header updated too, if we were to allow caching for one minute?
I think the intention is to prevent browsers and proxies/CDNs, etc from caching at all, so I believe an expires date with a 1984 expiration date, combined with a Cache-Control
header that forces all clients/CDNs to re-validate, and not to store the pages is the more appropriate one. These pages will never be cached, even for the one minute duration, but I doubt there are significant gains to be had with a CDN/browser caching the response for one minute.
Hi @peterwilsoncc - can we take a look at the robots tag again please? It looks like Google has indeed indexed URLs containing "moderation-hash", and if we were to backport this to earlier versions, having a more strict robots tag would help to eventually de-index those URLs. |
@jono-alderson are you able to help out with @Ayesh's question? |
Assuming there's a valid canonical URL tag in place, and that the site/page doesn't suffer from any significant SEO issues, then Google shouldn't index the variant version. I'm still nervous about adding robots controls (specifically, a
|
80475ec
to
1f7434d
Compare
noindex unapproved comment previews. Revert "noindex unapproved comment previews." This reverts commit ee3fe6d. Include a cache control header too. Hide reply link on unapproved comments using mod hash.
1f7434d
to
1717e3e
Compare
$show_pending_links = isset( $commenter['comment_author'] ) && $commenter['comment_author']; | ||
|
||
if ( '0' == $comment->comment_approved && ! $show_pending_links ) { | ||
return wp_kses( $comment_text, array() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, we only want to remove links
Merged |
https://core.trac.wordpress.org/ticket/49956