policy and code changes for robots check #28

Merged: 2 commits merged on May 5, 2023


sjledoux (Collaborator) commented May 4, 2023

After investigating the case-insensitivity issue, we determined that it was not the source of the problem with the robots check: our code uses Python's urllib library for requests, which implements a case-insensitive dictionary for request headers.
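To illustrate the point about case-insensitive lookup, here is a minimal sketch. It assumes the check reads headers through `http.client.HTTPMessage`, the mapping type that urllib exposes on a response; the header value and variable names are made up for the example and are not taken from this repository's code.

```python
from http.client import HTTPMessage

# Build a header mapping by hand instead of making a network request.
# HTTPMessage is the type urllib exposes as `response.headers`.
headers = HTTPMessage()
headers["X-Robots-Tag"] = "noindex"  # illustrative value only

# Lookups ignore the case of the header name.
lower = headers["x-robots-tag"]
upper = headers.get("X-ROBOTS-TAG")

# A header that was never set yields None rather than raising.
missing = headers.get("X-Missing")

print(lower, upper, missing)
```

All three spellings of the header name resolve to the same entry, so a casing mismatch between the server's header and the name the check looks up should not cause a failure on its own.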

However, we did find issues with the current implementation of the robots check. In particular, the absence of a robots.txt file does not prevent web crawlers from indexing a page. Additionally, the existing method of checking the 'X-Robots-Tag' header rejected values of the tag that ought to be valid. This PR changes the code and the submission guide to fix these issues.
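The two fixes described above can be sketched roughly as follows. This is not the code from this PR: the function name `blocks_indexing`, the set of accepted directives (`noindex` and `none`), and the comma-splitting rule are assumptions chosen for illustration, and the sketch does not handle user-agent prefixes such as `googlebot: noindex`.

```python
from typing import Optional

# Assumption: these are the directives that count as "do not index".
BLOCKING_DIRECTIVES = {"noindex", "none"}


def blocks_indexing(header_value: Optional[str]) -> bool:
    """Return True if an X-Robots-Tag value contains a blocking directive.

    A missing header (None) is treated as NOT blocking, mirroring the
    point that the absence of a rule does not stop crawlers from indexing.
    """
    if header_value is None:
        return False
    # Split the value into individual directives so that combined values
    # such as "noindex, nofollow" are accepted, not only an exact "noindex".
    directives = {part.strip().lower() for part in header_value.split(",")}
    return bool(directives & BLOCKING_DIRECTIVES)


if __name__ == "__main__":
    for value in ("noindex", "NoIndex, nofollow", "nofollow", None):
        print(repr(value), blocks_indexing(value))
```

Comparing the whole header string against a single expected value would reject `noindex, nofollow`; splitting into directives first avoids that kind of false negative.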

github-actions bot commented May 4, 2023

Looks like you've passed all of the checks!

sjledoux requested a review from helenyc May 4, 2023 19:48
sjledoux merged commit cc9dc6c into main May 5, 2023
sjledoux deleted the Change-Robots-Policy branch May 5, 2023 14:34
erikb-stripe (Contributor)

I opened #246 today. This change breaks for service domains that redirect to another domain that does not itself have an X-Robots-Tag.
