
49956: Spammer comment links. #291

Closed

peterwilsoncc
Contributor

@peterwilsoncc force-pushed the 49956-spam-comment-links branch from 1133434 to d363c5c on May 24, 2020 03:10
$show_pending_links = isset( $commenter['comment_author'] ) && $commenter['comment_author'];

if ( '0' == $comment->comment_approved && ! $show_pending_links ) {
return wp_kses( $comment_text, array() );
Contributor Author

Consider: wp_kses_allowed_html() for comments context and unsetting links.

Contributor

+1, we only want to remove links
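A minimal plain-PHP sketch of the suggestion above, assuming a kses-style allow-list: take the comment tag list, unset `a`, and strip everything else, so pending comments keep their formatting but lose their links. The tag list and both function names are hypothetical stand-ins for what `wp_kses_allowed_html()` and `wp_kses()` would do inside WordPress; `strip_tags()` is used only so the sketch runs standalone (its array form needs PHP 7.4+), and unlike kses it does not filter attributes on the tags it keeps. `$show_pending_links` mirrors the variable in the diff.

```php
<?php
// Sketch only: strip links from pending comments while keeping other markup.
// sketch_allowed_comment_tags() is a hypothetical stand-in for the allow-list
// WordPress gets from wp_kses_allowed_html(); strip_tags() stands in for wp_kses().

function sketch_allowed_comment_tags(): array {
    // Tag name => allowed attributes (kses-style shape, hypothetical subset).
    return [
        'a'          => ['href' => true, 'title' => true],
        'abbr'       => ['title' => true],
        'b'          => [],
        'blockquote' => ['cite' => true],
        'code'       => [],
        'em'         => [],
        'strong'     => [],
    ];
}

function sketch_filter_pending_comment(string $text, string $approved, bool $show_pending_links): string {
    // Approved comments, or visitors allowed to see pending links, are untouched.
    if ('0' !== $approved || $show_pending_links) {
        return $text;
    }

    $allowed = sketch_allowed_comment_tags();
    unset($allowed['a']); // Remove links only; keep the rest of the allow-list.

    // Array form of strip_tags() needs PHP 7.4+. It keeps the listed tags with
    // their attributes intact and drops all other tags, leaving their text.
    return strip_tags($text, array_keys($allowed));
}

$pending = 'Visit <a href="https://spam.example">cheap pills</a> <strong>now</strong>';
echo sketch_filter_pending_comment($pending, '0', false), "\n";
// Prints: Visit cheap pills <strong>now</strong>
```

The link text survives as plain text, which matches the "+1, we only want to remove links" intent rather than flattening the whole comment.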

// Otherwise we match against email addresses.
if ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) {
// Only include requested comment.
$approved_clauses[] = $wpdb->prepare( "( comment_author_email = %s AND comment_approved = '0' AND comment_ID = %d )", $unapproved_identifier, (int) $_GET['unapproved'] );
Contributor Author

Edge case: this prevents replies to the commenter's own unmoderated comments from displaying.

Contributor Author

Fixed in 80475ec.
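To see the edge case, here is the visibility rule from the quoted clause modelled in plain PHP rather than SQL: approved comments are always shown, and when a moderation hash is present only the requested pending comment is added. The function is hypothetical, the email-only branch is an assumption about code outside the excerpt, and the sketch reproduces the behaviour being flagged, not the fix in 80475ec.

```php
<?php
// Hypothetical model of the approved-comment clauses; not WordPress code.
// Keys mirror wp_comments columns; values are strings, as $wpdb returns them.

function sketch_visible_comments(array $comments, string $email, ?int $requested_id): array {
    $visible = array_filter($comments, function (array $c) use ($email, $requested_id): bool {
        if ('1' === $c['comment_approved']) {
            return true; // Approved comments are always visible.
        }
        $own_pending = '0' === $c['comment_approved'] && $c['comment_author_email'] === $email;
        if (null !== $requested_id) {
            // Moderation-hash request: only the requested comment is included.
            return $own_pending && (int) $c['comment_ID'] === $requested_id;
        }
        // Assumed fallback: any pending comment matching the email address.
        return $own_pending;
    });
    return array_values($visible);
}

$comments = [
    ['comment_ID' => '10', 'comment_approved' => '1', 'comment_author_email' => 'other@example.com'],
    ['comment_ID' => '11', 'comment_approved' => '0', 'comment_author_email' => 'me@example.com'],
    // Pending reply by the same author to comment 11.
    ['comment_ID' => '12', 'comment_approved' => '0', 'comment_author_email' => 'me@example.com'],
];

// IDs 10 and 11 only: the pending reply (12) is hidden, which is the edge case.
print_r(array_column(sketch_visible_comments($comments, 'me@example.com', 11), 'comment_ID'));
```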

@@ -404,6 +404,9 @@ public function send_headers() {

if ( is_user_logged_in() ) {
$headers = array_merge( $headers, wp_get_nocache_headers() );
} elseif ( ! empty( $_GET['unapproved'] ) && ! empty( $_GET['moderation-hash'] ) ) {
// Unmoderated comments are only visible for one minute via the moderation hash.
$headers['Expires'] = gmdate( 'D, d M Y H:i:s', time() + MINUTE_IN_SECONDS );
Contributor Author

Big hosts and CDN providers may not love this.

Reviewer

wp_get_nocache_headers() returns an Expires header too, in addition to Cache-Control. I don't think we need this, because those two headers already turn caching off (with no-cache directives and a 1984 Expires header).

Reviewer

However, we could emit an X-Robots-Tag header to make absolutely sure nobody can submit the URL and get it crawled within the one-minute window.

Contributor Author

Good catch re robots tags. In ee3fe6d, I've added a noindex instruction, but by extending the existing code for replytocom links rather than adding the header. The effect should be the same.

> wp_get_nocache_headers() returns an Expires header too, in addition to Cache-Control...

In the current release, there isn't a no-caching instruction for CDNs, so I've added one. I decided to allow the page to be cached for one minute, as that's the life of the page.

There's a narrow window for exploiting the CDN cache: a page generated just before the preview expires can stay cached for another minute, so the comment could be visible up to two minutes after it was posted. I think that's a safe compromise, especially with your noindex suggestion.

Please let me know if you think I am missing something.

Reviewer

Thank you.
CDNs and browsers will pick the Cache-Control header over the Expires header if both are present. Pretty much every client that supports HTTP/1.1 should prefer the Cache-Control header, which supports the max-age=XYZ pattern to limit the cache duration. Do you think we need to update the Cache-Control header too, if we are going to allow caching for one minute?

I think the intention is to prevent browsers, proxies, CDNs, etc. from caching at all, so I believe an Expires header with a 1984 date, combined with a Cache-Control header that forces all clients and CDNs to revalidate and not store the page, is the more appropriate approach. With that, these pages will never be cached, even for the one-minute duration, but I doubt there is much to gain from a CDN or browser caching the response for one minute anyway.

Contributor Author

Cache-Control header added in f8bbb95 with max-age=60, must-revalidate.
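For reference, a sketch of the header pair the thread settles on: an Expires date one minute out plus `Cache-Control: max-age=60, must-revalidate`. Per RFC 7234, a cache that sees `max-age` ignores Expires, which is the reviewer's point about Cache-Control taking precedence. The function name and `$now` parameter are hypothetical. One difference from the quoted diff: it formats Expires without the trailing `GMT` that the HTTP-date format requires, so the sketch appends it.

```php
<?php
// Sketch: cache headers for a one-minute unmoderated-comment preview.
// sketch_preview_cache_headers() and SKETCH_PREVIEW_TTL are hypothetical.
const SKETCH_PREVIEW_TTL = 60; // Seconds; MINUTE_IN_SECONDS in WordPress.

function sketch_preview_cache_headers(int $now): array {
    return [
        // HTTP-date (IMF-fixdate) ends with a literal "GMT".
        'Expires'       => gmdate('D, d M Y H:i:s', $now + SKETCH_PREVIEW_TTL) . ' GMT',
        // HTTP/1.1 caches use max-age and ignore Expires when both are present.
        'Cache-Control' => sprintf('max-age=%d, must-revalidate', SKETCH_PREVIEW_TTL),
    ];
}

foreach (sketch_preview_cache_headers(time()) as $name => $value) {
    echo "$name: $value\n"; // In a live request this would be header("$name: $value").
}
```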

@@ -1852,7 +1852,12 @@ function wp_get_unapproved_comment_author_email() {
$comment = get_comment( $comment_id );

if ( $comment && hash_equals( $_GET['moderation-hash'], wp_hash( $comment->comment_date_gmt ) ) ) {
$commenter_email = $comment->comment_author_email;
// The comment will only be viewable by the comment author for 1 minute.
$comment_preview_expires = strtotime( $comment->comment_date_gmt . '+1 minute' );
Reviewer

We can just use $comment->comment_date_gmt + 60 because it's already a UNIX timestamp.

Contributor Author

$comment->comment_date_gmt is MySQL-formatted, e.g. 2020-05-24 04:21:27, so strtotime() is required.

Reviewer

Right, sorry about the noise here.
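A sketch of the check discussed in this thread: verify the moderation hash, then allow the preview for one minute after `comment_date_gmt`. Because that column is a MySQL `DATETIME` string rather than a timestamp, it is parsed as UTC before adding seconds; the sketch uses `DateTimeImmutable::createFromFormat()` instead of `strtotime()` to make the format and timezone explicit. `sketch_wp_hash()` and `SKETCH_SALT` are hypothetical, and modelling `wp_hash()` as HMAC-MD5 with a placeholder salt is an assumption made so the sketch runs standalone.

```php
<?php
// Sketch only. SKETCH_SALT and sketch_wp_hash() are hypothetical stand-ins
// for the site salt and wp_hash(); they are not WordPress APIs.
const SKETCH_SALT = 'placeholder-salt';
const SKETCH_PREVIEW_WINDOW = 60; // Seconds the author may view a pending comment.

function sketch_wp_hash(string $data): string {
    return hash_hmac('md5', $data, SKETCH_SALT);
}

function sketch_preview_expires(string $comment_date_gmt): int {
    // comment_date_gmt is 'Y-m-d H:i:s' in UTC, not a UNIX timestamp,
    // so parse it before adding seconds. '!' zeroes unspecified fields.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d H:i:s', $comment_date_gmt, new DateTimeZone('UTC'));
    if (false === $date) {
        throw new InvalidArgumentException("Unexpected date format: $comment_date_gmt");
    }
    return $date->getTimestamp() + SKETCH_PREVIEW_WINDOW;
}

function sketch_can_preview(string $moderation_hash, string $comment_date_gmt, int $now): bool {
    // hash_equals() takes the known hash first and the user-supplied one second.
    return hash_equals(sketch_wp_hash($comment_date_gmt), $moderation_hash)
        && $now < sketch_preview_expires($comment_date_gmt);
}

$date_gmt = '2020-05-24 04:21:27';
$now      = sketch_preview_expires($date_gmt) - 30; // 30 seconds after posting.
var_dump(sketch_can_preview(sketch_wp_hash($date_gmt), $date_gmt, $now)); // bool(true)
```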

if ( isset( $_GET['replytocom'] ) ) {
if (
isset( $_GET['replytocom'] ) ||
( isset( $_GET['unapproved'] ) && isset( $_GET['moderation-hash'] ) )
Reviewer

Oof, I'm sorry for the minor nitpick: I think we can simplify this to isset( $_GET['replytocom'] ) || isset( $_GET['unapproved'], $_GET['moderation-hash'] ), because isset() accepts any number of arguments and returns true only if all of them are set. We don't have to call isset() multiple times.
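The suggested simplification works because `isset()` with several arguments returns true only when every argument is set and not null. A small sketch with a hypothetical function name, where `$query` stands in for `$_GET`:

```php
<?php
// Hypothetical helper illustrating multi-argument isset(); $query stands in for $_GET.
function sketch_is_reply_or_preview(array $query): bool {
    // isset($a, $b) is true only if both are set and neither is null.
    return isset($query['replytocom'])
        || isset($query['unapproved'], $query['moderation-hash']);
}

var_dump(sketch_is_reply_or_preview(['replytocom' => '5']));                             // bool(true)
var_dump(sketch_is_reply_or_preview(['unapproved' => '7', 'moderation-hash' => 'abc'])); // bool(true)
var_dump(sketch_is_reply_or_preview(['unapproved' => '7']));                            // bool(false)
```

One difference to keep in mind: `isset()` is true for an empty string such as `?unapproved=&moderation-hash=`, while the `send_headers()` hunk tests `! empty()`, which is false for empty strings.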

Contributor Author

@peterwilsoncc May 25, 2020

I was in two minds about blocking robots on these pages, so I asked @jono-alderson from Yoast, who frequently contributes SEO recommendations to the WordPress project, for some advice.

I've been told the canonical meta tag pointing to the post's main page should be adequate for these pages, so I'll revert the relevant commit.

> I'm sorry for the minor nitpick

No need to be sorry, I'd forgotten isset() allowed this, and given the number of sites using WordPress, getting the code to its best possible state is important. :)

@Ayesh

Ayesh commented May 26, 2020

Hi @peterwilsoncc - can we take a look at the robots tag again please?

It looks like Google has indeed indexed URLs containing "moderation-hash", and if we were to backport this to earlier versions, having a more strict robots tag would help to eventually de-index those URLs.

@peterwilsoncc
Contributor Author

> Hi @peterwilsoncc - can we take a look at the robots tag again please?
>
> It looks like Google has indeed indexed URLs containing "moderation-hash", and if we were to backport this to earlier versions, having a more strict robots tag would help to eventually de-index those URLs.

@jono-alderson are you able to help out with @Ayesh's question?

@jonoalderson

Assuming there's a valid canonical URL tag in place, and that the site/page doesn't suffer from any significant SEO issues, then Google shouldn't index the variant version.

I'm still nervous about adding robots controls (specifically, a noindex, follow directive) to the page because:

  • WordPress core doesn't do a good job of reconciling the relationship between canonical tags and meta robots tags; and having a noindex and a canonical can cause problems. Without logic to handle this, I'd be nervous about making a bigger mess.
  • Legitimate (non-abusive) links which contain these types of parameters might be shared or linked to by users, in which case the page shouldn't be noindex'd.

@peterwilsoncc force-pushed the 49956-spam-comment-links branch from 80475ec to 1f7434d on May 29, 2020 08:04
noindex unapproved comment previews.

Revert "noindex unapproved comment previews."

This reverts commit ee3fe6d.

Include a cache control header too.

Hide reply link on unapproved comments using mod hash.
@peterwilsoncc force-pushed the 49956-spam-comment-links branch from 1f7434d to 1717e3e on May 29, 2020 08:12

@peterwilsoncc
Contributor Author

Merged

@peterwilsoncc deleted the 49956-spam-comment-links branch on June 7, 2020 03:14
4 participants