Add robots.txt entries for ad provider's crawler automatically #76

merged 4 commits into from May 2, 2013


Automattic member

Google recommends explicitly setting a User-agent entry for its ads crawler. If that User-agent isn't found in robots.txt, Google raises an alert in the AdSense dashboard (and emails the owner, I believe) asking the owner to add an entry to their robots.txt.

We can do this step automatically using the do_robotstxt action, making it even easier for users to add ads to their site. This Pull Request adds this support to ACM_Provider by adding a new $crawler_user_agent member to the class, which is meant to be set by child classes. If the ad provider doesn't have a crawler, leaving this member null will bypass the robots.txt modification.

Whether or not robots.txt is modified can be controlled by the acm_should_do_robotstxt filter, which accepts a boolean and the ACM_Provider instance.

The disallowed paths default to blank (we might want to make this mimic the default WP behavior, which is to disallow /wp-admin - feedback welcome) and are filterable with acm_robotstxt_disallow, which accepts / returns an array of disallowed paths and the ACM_Provider instance.
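
For anyone wanting to try the two filters described above, here is a rough sketch of how a site might hook them. The callback names (`my_acm_disallow_wp_admin`, `my_acm_skip_robotstxt`) are hypothetical, and the argument order simply follows the description in this PR (the filtered value first, then the `ACM_Provider` instance); the merged code may differ in detail.

```php
<?php
/**
 * Hypothetical callback for acm_robotstxt_disallow: adds /wp-admin/ to the
 * ad crawler's disallowed paths, mimicking the WP default mentioned above.
 *
 * @param array $disallow Paths currently disallowed for the crawler.
 * @param mixed $provider The ACM_Provider instance (unused here).
 * @return array Disallowed paths, de-duplicated and re-indexed.
 */
function my_acm_disallow_wp_admin( $disallow, $provider = null ) {
	$disallow[] = '/wp-admin/';
	return array_values( array_unique( $disallow ) );
}

/**
 * Hypothetical callback for acm_should_do_robotstxt: opts out of the
 * automatic robots.txt entries entirely.
 *
 * @param bool  $should_do Whether robots.txt would be modified.
 * @param mixed $provider  The ACM_Provider instance (unused here).
 * @return bool Always false.
 */
function my_acm_skip_robotstxt( $should_do, $provider = null ) {
	return false;
}

// The guard only exists so this snippet can also be loaded outside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'acm_robotstxt_disallow', 'my_acm_disallow_wp_admin', 10, 2 );

	// Uncomment to leave robots.txt untouched:
	// add_filter( 'acm_should_do_robotstxt', 'my_acm_skip_robotstxt', 10, 2 );
}
```

Passing `2` as the last argument to `add_filter` is what lets the callback receive the provider instance in addition to the filtered value.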

If the blog is marked as private, the site root is disallowed (which is the default behavior in WP).
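
To make the intended output concrete, here is a minimal standalone sketch of the rules described above: a null crawler is skipped, a private blog disallows the site root, and an empty path list produces an empty `Disallow:`. The function `acm_sketch_robotstxt_entry` is a hypothetical helper written for illustration, not the plugin's actual method; in the plugin, this logic runs on the `do_robotstxt` action.

```php
<?php
/**
 * Build the robots.txt entry for one ad provider's crawler.
 * Hypothetical helper for illustration only.
 *
 * @param string|null $crawler_user_agent Crawler name, or null if the provider has none.
 * @param array       $disallow           Paths to disallow (may be empty).
 * @param bool        $blog_is_public     False when the blog is marked private.
 * @return string Lines to append to robots.txt, or '' when nothing should be added.
 */
function acm_sketch_robotstxt_entry( $crawler_user_agent, array $disallow, $blog_is_public ) {
	// Provider has no crawler: bypass the robots.txt modification.
	if ( null === $crawler_user_agent ) {
		return '';
	}

	// Private blog: disallow the site root, regardless of the filtered paths.
	if ( ! $blog_is_public ) {
		$disallow = array( '/' );
	}

	$lines = array( 'User-agent: ' . $crawler_user_agent );

	if ( empty( $disallow ) ) {
		// An empty Disallow allows the crawler everywhere.
		$lines[] = 'Disallow:';
	} else {
		foreach ( $disallow as $path ) {
			$lines[] = 'Disallow: ' . $path;
		}
	}

	return implode( "\n", $lines ) . "\n";
}

echo acm_sketch_robotstxt_entry( 'Mediapartners-Google', array(), true );
```

With the default (blank) disallow list on a public blog, this prints a `User-agent: Mediapartners-Google` line followed by an empty `Disallow:` line.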

Wanted to run this by anyone who is interested for feedback before committing.

Fixes #75

nickdaugherty added some commits May 1, 2013
@nickdaugherty nickdaugherty Added .DS_Store to .gitignore f709367
@nickdaugherty nickdaugherty Merge remote-tracking branch 'upstream/master' 7a3c74f
@nickdaugherty nickdaugherty Add robots.txt entries for provider's crawlers
Google recommends setting an empty robots.txt Disallow for their user
agent, Mediapartners-Google for sites serving their ads.

To prevent users from having to add these entries on their own, we can
add them automatically. Each provider now has a $crawler_user_agent
parameter, and the Disallows are filterable.

Looks good to me

@nickdaugherty nickdaugherty merged commit c063740 into Automattic:master May 2, 2013