Bingbot cannot index (or scrape) WPcom Simple sites #83341

Closed
inaikem opened this issue Oct 23, 2023 · 42 comments
Labels
  • Customer Report: Issues or PRs that were reported via Happiness. Previously known as "Happiness Request".
  • [Feature Group] User Interaction & Engagement: Tools and features for site owners to share, promote, and manage engagement with their audiences.
  • [Feature] SEO Tools: Tools for improving a site's search engine optimization.
  • [Interaction #] > 20: (Automated) interaction count label for better visibility. Please don't add these manually.
  • [Platform] Simple
  • [Pri] High
  • [Product] WordPress.com: All features accessible on and related to WordPress.com.
  • [Status] Priority Review Triggered: Quality squad has been notified of this issue in #dotcom-triage-alerts
  • Triaged: To be used when issues have been triaged.
  • [Type] Bug

Comments

@inaikem
Contributor

inaikem commented Oct 23, 2023

Predef

Please see this internal P2 for details on our predef: p7DVsv-j5F-p2#comment-48067.

This issue will be updated with additional context as needed.

Quick summary

We have determined that Microsoft is using its generic Bingbot crawler to scrape sites. They have not yet documented a way to block the scraping behavior, so for the moment, we have blocked Bingbot from indexing Simple sites via robots.txt directives.
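For anyone triaging a report, the effect of such a block can be checked with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the rules in `ASSUMED_ROBOTS_TXT` are an assumption about what a Bingbot block might look like, not the directives actually served on Simple sites, and `example.wordpress.com` is a placeholder hostname.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a typical way to block one crawler while allowing the rest.
# The real robots.txt served on Simple sites is not quoted in this issue.
ASSUMED_ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.wordpress.com/"  # placeholder site
    for agent in ("bingbot", "Googlebot"):
        print(agent, is_allowed(ASSUMED_ROBOTS_TXT, agent, url))
```

To test a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches that site's robots.txt; pass the short crawler token (`bingbot`) rather than the full User-Agent header, since the parser matches on the product token.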

We've created this issue to track support interactions related to this.

Steps to reproduce

N/A

What you expected to happen

N/A

What actually happened

N/A

Impact

Some (< 50%)

Available workarounds?

No but the platform is still usable

Platform (Simple and/or Atomic)

No response

Logs or notes

No response

@inaikem inaikem added [Type] Bug Needs triage Ticket needs to be triaged [Product] WordPress.com All features accessible on and related to WordPress.com. [Feature Group] User Interaction & Engagement Tools and features for site owners to share, promote, and manage engagement with their audiences. labels Oct 23, 2023
@github-actions github-actions bot added [Status] Priority Review Triggered Quality squad has been notified of this issue in #dotcom-triage-alerts [Pri] High labels Oct 23, 2023
@inaikem
Contributor Author

inaikem commented Oct 23, 2023

Initial interactions:

7163433-zd-a8c
7176519-zd-a8c
7197506-zd-a8c
7187505-zd-a8c
7198087-zd-a8c

Update to the P2 here: peGwbA-Oi-p2#comment-1679

Edit: It looks like the script didn't catch the link above. Opening the comment for editing changes the link automagically.

@github-actions

github-actions bot commented Oct 23, 2023

Support References

This comment is automatically generated. Please do not edit it.

  • p7DVsv-j5F-p2#comment-48067
  • 7163433-zen
  • 7176519-zen
  • 7197506-zen
  • 7187505-zen
  • 7198087-zen
  • peGwbA-Oi-p2#comment-1679
  • 7174793-zen
  • 7162106-zen
  • 7151777-zen
  • 7220012-zen
  • 7191656-zen
  • 7229871-zen
  • 7205505-zen
  • 7211878-zen
  • 7232284-zen
  • 7253862-zen
  • 7243871-zen
  • 7274083-zen
  • 7277358-zen
  • 7277446-zen
  • 7293184-zen
  • 7329176-zen
  • 7334644-zen
  • https://wordpress.com/forums/topic/blocking-bingbot-in-robots-txt-why
  • 7344690-zen
  • 7360087-zen
  • 7381700-zen
  • 7232492-zen
  • 7411333-zen
  • 7412303-zen
  • 7414145-zen
  • 7471023-zen
  • 7472791-zen
  • 7453782-zen
  • 7497842-zen
  • 7493070-zen
  • 070306-zen
  • 7560761-zen
  • 7581254-zen
  • 7581664-zen
  • 7669127-zen
  • 7658463-zen

@github-actions github-actions bot added the Customer Report Issues or PRs that were reported via Happiness. Previously known as "Happiness Request". label Oct 23, 2023
@john-legg john-legg added [Pri] Normal and removed [Pri] High [Status] Priority Review Triggered Quality squad has been notified of this issue in #dotcom-triage-alerts labels Oct 23, 2023
@inaikem
Contributor Author

inaikem commented Oct 24, 2023

Adding a quick update: this link to a Sep 2023 announcement covers content usage controls.

@Greatdane

7174793-zd-a8c

@Greatdane

7162106-zd-a8c

@dragstor
Member

7151777-zen

@cuemarie cuemarie added Triaged To be used when issues have been triaged. [Feature] SEO Tools Tools for improving a site's search engine optimization. [Platform] Simple and removed Needs triage Ticket needs to be triaged labels Oct 25, 2023
@cuemarie cuemarie changed the title Bingbot cannot index (or scrape) WPcom simeple sites Bingbot cannot index (or scrape) WPcom Simple sites Oct 25, 2023
@carolframen

7220012-zen

@ariel-maidana

7191656-zd-a8c

@carolframen

7229871-zen

@github-actions github-actions bot added the [Interaction #] > 10 (Automated) interaction count label for better visibility. Please don't add these manually. label Oct 28, 2023
@Greatdane

7205505-zd-a8c

@github-actions github-actions bot added the [Status] Priority Review Triggered Quality squad has been notified of this issue in #dotcom-triage-alerts label Nov 1, 2023
@jorpdesigns

7293184-zen

@github-actions github-actions bot added [Interaction #] > 20 (Automated) interaction count label for better visibility. Please don't add these manually. and removed [Interaction #] > 10 (Automated) interaction count label for better visibility. Please don't add these manually. labels Nov 9, 2023
@Aurorum
Contributor

Aurorum commented Nov 18, 2023

7329176-zen

@Aurorum
Contributor

Aurorum commented Nov 20, 2023

7334644-zen

@aleone89

@Greatdane

Another report; 81441-odie

@ahmadbaig1

Another report- 7344690-zen

@dataspun

Another report, self-reporting

@Aurorum
Contributor

Aurorum commented Nov 25, 2023

7360087-zen

@sudeepbaral

  • 7381700-zen

Another report

@Aurorum
Contributor

Aurorum commented Dec 4, 2023

7232492-zen

@jorpdesigns

7411333-zen

@Robertght

7412303-zen

@Aurorum
Contributor

Aurorum commented Dec 6, 2023

7414145-zen

@essleeung

7471023-zd-a8c

@ariel-maidana

7472791-zd-a8c
This user requested Bingbot to be unblocked.

@OmarFPG

OmarFPG commented Dec 21, 2023

7453782-zen

@Gustavo-Hilario

Another report: 7497842-zen

@suyogyashukla
Member

7493070-zen

@AminMehrani

070306-Zen

@edequalsawesome

Another report: #7560761-zen

@Aurorum
Contributor

Aurorum commented Jan 14, 2024

7581254-zen

@coder-mahfuz

7581664-zen

@AminMehrani

7669127-Zen

@AlexanderSky

7658463-zen

@happychait

Closing this issue as the Bingbot is no longer blocked on WordPress.com sites.
