Robots.txt default message is confusing #345

Open
fe-lix- opened this issue Jul 5, 2023 · 0 comments
Labels: enhancement (New feature or request)

fe-lix- (Contributor) commented on Jul 5, 2023

Is your feature request related to a problem? Please describe.

Description

The current default robots.txt can confuse users. It does not help them understand why, on a production website, it is returned instead of the robots.txt configured in the site repository.

This is the current default robots.txt:

# Helix robots.txt FAQ
#
# Q: This looks like a default robots.txt, how can I provide my own?
# A: Put a file named robots.txt into the root of your GitHub 
# repo, Franklin will serve it from there.
#
# Q: Why am I'm seeing this robots.txt instead of the one I 
# configured?
# A: You are visiting from *.hlx.page or *.hlx.live - in order 
# to prevent these sites from showing up in search engines and 
# giving you a duplicate content penalty on your real site we 
# exclude all robots 
# 
# Q: What do you mean with "real site"?
# A: If you add a custom domain to this site (e.g. 
# example.com), then Franklin detects that you are ready for 
# production and serves your own robots.txt - but only on 
# example.com
#
# Q: This does not answer my questions at all. What can I do?
# A: head over to #franklin-chat on Slack or 
# github.com/adobe/helix-home/issues and ask your question 
# there.
User-agent: *
Disallow: /

Phrasing issues in the default robots.txt

The problem lies in the part defining the "real site". The message makes two problematic claims:

Problem 1 - Franklin detects that you are ready for production

This is actually not the case: whether the default robots.txt is returned or not is determined by the presence of the x-forwarded-host header in the BYOCDN configuration. A user reading this message would therefore go looking for where to configure this example.com domain in Helix.

The domain is not mentioned anywhere in the Helix documentation except in the Push Invalidation configuration. And the BYOCDN documentation does not mention the importance of x-forwarded-host as the definition of the "real site"; there is only a screenshot with the header configured.
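
To make the described behavior concrete, here is a minimal TypeScript sketch (not Helix's actual code) of the decision as this issue understands it; the function name selectRobotsTxt and the exact host check are illustrative assumptions:

```ts
// Illustrative sketch only: models the serving decision as described in this
// issue, i.e. it is the x-forwarded-host header, not a "ready for production"
// detection, that selects which robots.txt is returned.
const DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n";

function selectRobotsTxt(
  headers: Record<string, string | undefined>,
  repoRobots: string,
): string {
  const forwardedHost = headers["x-forwarded-host"];
  // Missing header (e.g. a BYOCDN misconfiguration) or an inner *.hlx.page /
  // *.hlx.live hostname: serve the blocking default, regardless of what the
  // repository contains.
  if (!forwardedHost || /\.hlx\.(page|live)$/.test(forwardedHost)) {
    return DEFAULT_ROBOTS;
  }
  // Header present with a production hostname: serve the repository file.
  return repoRobots;
}
```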

Problem 2 - but only on example.com

This statement is not accurate: once the CDN is correctly configured, any domain hooked up to that CDN will serve the robots.txt from the repository. I believe rephrasing this passage would help users understand the issue.

For example, if you are using CloudFront, the repository robots.txt would be returned both from the "real site" domain (i.e. example.com) and from your CloudFront distribution (randomid123.cloudfront.net).
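
A quick way to check this from the outside (a sketch assuming Node 18+ with global fetch, run as an ESM module for top-level await; both hostnames are placeholders from the example above):

```ts
// Both the production domain and the bare CloudFront hostname should return
// the repository robots.txt once the CDN forwards x-forwarded-host.
const hosts = ["https://example.com", "https://randomid123.cloudfront.net"];

for (const base of hosts) {
  const res = await fetch(`${base}/robots.txt`);
  const body = await res.text();
  // The first line is enough to tell the Helix default ("# Helix robots.txt
  // FAQ") apart from a custom file.
  console.log(base, "->", body.split("\n")[0]);
}
```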

Behavior in which the problem appears

The current problematic behavior is the following:

  1. Create a new website
  2. Configure the BYOCDN but omit the x-forwarded-host header (by mistake, let's say)
  3. See the default robots.txt
  4. Following the message, you commit a robots.txt to your repository
  5. Everything works as expected, except that the default robots.txt is still returned (see the diagnostic sketch after this list)
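
One way to see step 5 in isolation is to request robots.txt from the Helix origin with and without the header. This is a sketch: the hostname is a placeholder, and whether the origin honors an x-forwarded-host set on a direct request is an assumption on my part.

```ts
// Compare the robots.txt the origin serves with and without x-forwarded-host.
const origin = "https://main--repo--owner.hlx.live/robots.txt"; // placeholder

const plain = await fetch(origin);
const forwarded = await fetch(origin, {
  headers: { "x-forwarded-host": "example.com" },
});

// Without the header the Helix default comes back, mirroring step 5 above.
console.log("without header:", (await plain.text()).split("\n")[0]);
console.log("with header:   ", (await forwarded.text()).split("\n")[0]);
```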

Suggested solution

  1. Add information about the importance of x-forwarded-host to the BYOCDN documentation
  2. Add a check for the presence of the x-forwarded-host header to the Go-Live verification documentation
  3. Change the default robots.txt text to mention that seeing the default robots.txt on the "real site" is most probably due to a CDN configuration problem, and point users to the amended documentation above (a possible rewording is sketched below)
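
For point 3, one possible rewording of the relevant FAQ entry (a rough sketch, not a final proposal):

```
# Q: Why am I seeing this robots.txt instead of the one I configured?
# A: Your request did not carry an x-forwarded-host header for your
# production domain. If you see this on your real site, your CDN is
# most probably not forwarding that header - see the BYOCDN setup
# documentation for how to configure it.
```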
fe-lix- added the enhancement (New feature or request) label on Jul 5, 2023