Remove 'X-Robots-Tag: noindex' from RSS feeds with podcasts #10262

Open
monbauza opened this Issue Jul 5, 2018 · 20 comments

monbauza commented Jul 5, 2018

Is your feature request related to a problem? Please describe.

Yoast SEO automatically adds an X-Robots-Tag: noindex to RSS feeds (see plugin code here). In most situations, you don't want RSS feeds to appear in the search results, as this could lead to duplicate content issues and result in a bad user experience.

On the other hand, Google doesn't show feeds in the search results, with the exception of podcasts (feeds with multimedia enclosures). See Google Webmaster Central Blog for more information.

A user recently contacted us on the WordPress.org forum explaining that his podcasts will not show up in the new Google Podcasts app because of the X-Robots-Tag: noindex generated by Yoast SEO.

Describe the solution you'd like

Investigate whether the X-Robots-Tag: noindex is really needed for RSS feeds and whether Google is smart enough to exclude RSS feeds from the search results when the X-Robots-Tag: noindex is not present in the feed.

In case the X-Robots-Tag: noindex is still needed, provide a way (possibly a filter?) to exclude the noindex tag from RSS feeds with podcasts.

@yoastbot yoastbot added the support label Jul 5, 2018

@monbauza monbauza added the enhancement label Jul 5, 2018

Pcosta88 (Contributor) commented Jul 6, 2018

A user reports

From my testing it appears the Yoast SEO plugin is adding the tag. A simple fix: do not set the header if the function is_feed() returns true. There should be no need to set noindex for an RSS feed since it is not an HTML page in the first place.

sunkast commented Jul 6, 2018

Here are the guidelines from Google regarding how podcast RSS feeds need to be created.

https://developers.google.com/search/docs/data-types/podcast

The most relevant part of the guideline states:

The homepage, the RSS feed, and any non-blocked audio files must be exposed to Googlebot; that is, they must not require a login, and must not be protected by robots.txt or <noindex> tags.

Also at the bottom, under the troubleshooting section it states:

Make sure your feed is indexed using the link generator tool. If the tool shows that your feed “is not currently indexed by Google Podcasts,” make sure you are following our guidelines. If you have additional questions, visit the Google Webmaster Central Help Forum.

If Yoast sets X-Robots-Tag to noindex, then the RSS feed will not meet Google's guidelines.

amandato commented Jul 6, 2018

Hello, I am the developer of the PowerPress plugin. It appears that all feeds are getting marked as noindex by the Yoast SEO plugin. This has prompted some questions...

  1. Why would we want to make sure search does not index RSS feeds from a WordPress site? Has there been a situation where RSS gets indexed in search results, or where the RSS harms the SEO ranking in some way? If this is just code that was added preemptively without a real reason, can the is_feed check be removed from Yoast's SEO plugin?
  2. If the noindex is necessary in all RSS feeds on a site, is it safe to remove it when the feed is for a podcast? If it is still not safe, please explain how best to proceed, as the tag is telling Google Podcasts, which is now part of Google search, not to index the podcast, which is problematic for podcasters. If yes, we will add logic to stop Yoast's plugin from adding the noindex for podcast feeds. That way, non-podcast feeds in a WordPress site will still get the noindex, but podcast feeds will have it removed.

Thanks,
Angelo

amandato commented Jul 6, 2018

Side note: is it correct to noindex the robots.txt? I think this may also be causing a problem for podcasts that use the Google validator when submitting a podcast to Google Play Music, as one of the errors we hear about from users is that they need to make sure their robots.txt allows the path to their podcast feed to be indexable.

It seems either redundant or excessive to set a robots tag in the headers for a file that itself is for robots. The examples I found were used to prevent search engines from indexing files that you may not want them to index, for example images and pdf files that you do link to on your site.

sunkast commented Jul 31, 2018

In the meantime, can there at least be an option to disable this function?

devnihil commented Aug 24, 2018

Please inform the customer of conversation # 418706 when this conversation has been closed.

jono-alderson commented Aug 27, 2018

From an SEO perspective, it's generally worthwhile preventing Google from indexing RSS feeds via the x-robots HTTP header. Note, for reference, that when this has a value of noindex, it doesn't prevent Google from accessing or consuming the information - just from indexing it.

That aside, we should definitely add the ability to filter this value, so that we can be podcast-friendly. Easy fix!

robantonelli commented Sep 5, 2018

Hi all, I am also experiencing this same issue, but there doesn't seem to be a resolution on GH or on the WP thread for cases where the Yoast plugin has already messed up your header. I have already made the recommended changes to my PowerPress settings. However, the damage is already done. How do you CORRECT the problem now that Yoast has made this change already? (Unfortunately, it seems the recommended workaround only helps people who haven't installed the plugin yet.)

jono-alderson commented Sep 5, 2018

Hello!

As I outlined above, it's my belief that this isn't an error/mistake - our noindex'ing of RSS content is intentional and, from an SEO perspective, good practice. We're not 'messing up' any headers.

However, I agree that there's a good case for adding the ability to allow users to disable this feature for specific feeds. That's why we've added this to our development queue and process; we'll get to this at some point in the near future!

robantonelli commented Sep 5, 2018

Hi,

I appreciate you getting back to me, but with all due respect, when your SEO plugin blocks our podcast from our listeners on the Google Podcasts platform (after previously being available), I'd absolutely describe that as a mistake, regardless of what the design intent was. And you're not 'messing up' headers in the sense that there's something wrong with the actual header as a whole; you did mess up my feed, which has an element in the header that is now blocked due to your plugin (without my knowledge) adding the X-Robots-Tag: noindex to that feed. Based on your comment, it seems that it's more important that you protect the reputation of Yoast by explaining that "this isn't an error/mistake" instead of acknowledging that there is an issue that needs to be addressed (i.e., thousands of our listeners just lost access to our podcast because of your plugin). It would be appreciated if you could outline exactly how to remove the X-Robots-Tag: noindex element from the headers after it has been added. Because, whether you believe this is a mistake or not, it doesn't change the fact that it royally messed up our podcast on Google Podcasts, which I believe warrants a resolution from the people who made the plugin. If possible, please provide and document details on how to remove the tag from the header after it has been added.

jono-alderson commented Sep 5, 2018

We don't currently have an easy way to turn the x-robots header on or off for specific feeds - that's, in part, why I'm classifying this as a feature request, rather than as a bug. This is additional functionality which we should, and need to, add in order to support this.

In the meantime, you might be able to 'undo' our addition of the header if you're comfortable juggling some PHP. You can see in the plugin source (public function noindex_feed()) that we use a noindex_feed function, which is triggered on template_redirect, to check and set the header.

I think you could also hook into template_redirect with a different priority / at a different point (or, at any point before the page begins to be constructed), and un-set that header.

It's a bit of a clunky solution, but it'd give you the opportunity to get your site working properly whilst we investigate options for a more robust solution.
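
A minimal sketch of that idea, for illustration only. It assumes Yoast's callback runs on template_redirect at the default priority (10) and that no output has been sent yet. The helper name example_has_robots_header and the priority of 20 are hypothetical choices, not part of Yoast SEO or WordPress, and the snippet has not been verified against a live site. Drop the opening tag if you paste it into an existing functions.php.

```php
<?php
// Returns true when a list of raw header lines (the format produced by
// PHP's headers_list()) contains an X-Robots-Tag header. Header names are
// case-insensitive, so the comparison is too.
function example_has_robots_header( array $header_lines ) {
	foreach ( $header_lines as $line ) {
		if ( stripos( ltrim( $line ), 'x-robots-tag:' ) === 0 ) {
			return true;
		}
	}
	return false;
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
	// Priority 20 is an assumption: it only needs to run after Yoast's callback.
	add_action( 'template_redirect', function () {
		if ( is_feed() && ! headers_sent() && example_has_robots_header( headers_list() ) ) {
			// Drop the header that was queued earlier in the request.
			header_remove( 'X-Robots-Tag' );
		}
	}, 20 );
}
```

Note that this removes the header from every feed, not just podcast feeds.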

crazyworks777 commented Oct 4, 2018

I miss the old Yoast plugin, where the user had more control. Currently Yoast thinks it knows better than the user (webmaster or SEO), and this is a perfect example: you block the RSS feed. The other problem is with the sitemaps. Yoast decided there can only be 1000 links per sitemap, and there are the transient entries for the sitemap cache (many entries are never deleted, and sometimes there is a problem with your checks there), and then you also add a noindex to the sitemap. I miss the old Yoast plugin, where a webmaster or SEO could enable and disable things as he sees fit. I think Yoast must also provide advanced options, where the SEO or webmaster can have more control, because how many times has Yoast messed up, like the images thing last time?

mike503 commented Oct 16, 2018

This is also a headache for me - this shouldn't be hardcoded behavior. It should be a toggle. I tried to insert an action after with header_remove() but it would not remove it. Neither would remove_action() (best I could tell) - so I instead used PHP's built-in header override and put in a header('X-Robots-Tag: index, follow', true) in an action; it's gross but it does change the "noindex" properly.

Would appreciate a cleaner option.
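
For reference, the override described above might look roughly like the following sketch. The hook (template_redirect), the priority of 20, and the helper name example_robots_header_line are assumptions for illustration; the comment above does not say which action was used. Drop the opening tag if you paste it into an existing functions.php.

```php
<?php
// Builds the raw header line. Passing true yields the "index, follow"
// value mentioned above; false yields a "noindex, follow" value.
function example_robots_header_line( $indexable ) {
	return 'X-Robots-Tag: ' . ( $indexable ? 'index' : 'noindex' ) . ', follow';
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
	// Priority 20 is an assumption: later than the default priority of 10.
	add_action( 'template_redirect', function () {
		if ( is_feed() && ! headers_sent() ) {
			// The second argument (true) replaces any previously set
			// header of the same name instead of adding a second one.
			header( example_robots_header_line( true ), true );
		}
	}, 20 );
}
```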

moorscode (Member) commented Nov 2, 2018

Simple snippet to remove the noindex header from the RSS feed:

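add_action( 'init', function() {
	// Bail out if Yoast SEO's frontend class is not loaded.
	if ( ! class_exists( 'WPSEO_Frontend' ) ) {
		return;
	}

	// Unhook the callback that sends the X-Robots-Tag header.
	$wpseo_frontend = WPSEO_Frontend::get_instance();
	remove_action( 'template_redirect', array( $wpseo_frontend, 'noindex_feed' ) );
});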

You can add this to the functions.php of your theme.
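
A possible variant that unhooks the callback only for podcast feeds, so other feeds keep the noindex header. This is a sketch under assumptions: the feed slug 'podcast', the constant EXAMPLE_PODCAST_FEEDS and the helper example_is_podcast_feed_slug are hypothetical, and the 'wp' action is used here (instead of 'init') because the feed query variable is only available once the request has been parsed. Drop the opening tag if you paste it into an existing functions.php.

```php
<?php
// Hypothetical list of feed slugs that carry podcast enclosures.
// 'podcast' is an assumption; adjust it to the slugs your site uses.
const EXAMPLE_PODCAST_FEEDS = array( 'podcast' );

// Returns true when the given feed slug is one of the podcast feeds.
function example_is_podcast_feed_slug( $slug, array $podcast_feeds = EXAMPLE_PODCAST_FEEDS ) {
	return is_string( $slug ) && in_array( $slug, $podcast_feeds, true );
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
	// 'wp' fires after the query is parsed and before template_redirect.
	add_action( 'wp', function () {
		if ( ! class_exists( 'WPSEO_Frontend' ) || ! is_feed() ) {
			return;
		}
		if ( ! example_is_podcast_feed_slug( get_query_var( 'feed' ) ) ) {
			return; // Non-podcast feeds keep the noindex header.
		}
		remove_action( 'template_redirect', array( WPSEO_Frontend::get_instance(), 'noindex_feed' ) );
	} );
}
```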

klarking commented Nov 13, 2018

Hello, I have been reading this thread. I have 1400 URLs not being indexed for this exact reason. URLs like https://www.btoxicfree.com/tag/most-powerful/feed/ have been excluded from search, starting on 21 August 2018. Prior to that, 0 were excluded. It is interesting to note a drop in search results and traffic; one can assume this is why. Please see a picture of what happened. I am trying to make sense of this. Why did it start on 21 August? Should I implement the fix noted above? Is it bad for them to be indexed? It seems good to me. Can someone please advise me? I am not very good at this stuff. I appreciate your help. @moorscode

[Image: no index]

LeeLevy commented Nov 18, 2018

I have the exact same issue. I am sure that suddenly removing a bulk of pages from the index does harm your SEO efforts. On the other hand, I know the best practice is noindex, to avoid duplicate content.
Can you please advise?

mayada-ibrahim commented Nov 23, 2018

Please inform the customer of conversation # 446974 when this conversation has been closed.

jono-alderson commented Dec 3, 2018

From an SEO perspective, preventing these kinds of URLs from being indexed makes sense - these aren't "pages" which you'd want users to land on from search results. Adding a noindex directive is the correct way to handle this. Our plugin does this automatically.

As for the Search Console reports for "Excluded by 'noindex' tag", this is purely informational, and perfectly normal. The rows in here don't represent errors/issues/etc, or relate to performance in any way, in this regard.

To be clear: most users don't need to do anything, and their sites will perform at their best if they leave the default setting as-is.

If you want to remove the noindex directive - which we'd typically recommend against - you can use the code snippet from @moorscode above.

princezuda commented Jan 10, 2019

You can also use our plugin that automatically removes the Yoast headers for you. https://wordpress.org/plugins/redirect-editor/

amandato commented Jan 12, 2019

I have a pre-release version of PowerPress that fixes this issue. Please contact me at cio [at] rawvoice dot com for the link if you would like to test/use it.

For those who want to wait, this feature will be released in PowerPress 7.4.1 in 1-4 weeks, depending on how fast testers get back to us on how the fix worked out for them.

For a regular website I get it, the feeds should not be indexed. For a podcast, though, this prevents your podcast from getting onto Google Podcasts, which is a big deal, especially since podcasts may soon be treated as first-class citizens in search, similar to YouTube videos.

Thanks,
Angelo
