
robots.txt is missing #56

Closed
bekanntmacher opened this issue Feb 8, 2018 · 12 comments
@bekanntmacher

The installation does not ship a robots.txt. The clean solution would be:

User-Agent: *
Disallow: 

because a missing robots.txt returns a 404 to the crawler.
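For illustration (not part of the thread), Python's standard urllib.robotparser shows what this allow-all file means in practice; the bot name and URL below are placeholders:

```python
from urllib import robotparser

# Parse the suggested allow-all robots.txt: an empty "Disallow:"
# under "User-Agent: *" permits every URL for every crawler.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-Agent: *",
    "Disallow:",
])

# Bot name and URL are placeholders for this sketch.
print(rp.can_fetch("Googlebot", "https://example.com/any/page"))  # → True
```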

@aschempp (Member) commented Feb 8, 2018

@contao/developers should we add one to the skeleton? I honestly liked the optimizations we had in Contao 3.5 (same for the .htaccess).

@Toflar (Member) commented Feb 8, 2018

I liked them too. I'd say yes.

@leofeyer leofeyer added this to the 4.6.0 milestone Feb 8, 2018
@fritzmg (Contributor) commented Feb 8, 2018

Because a missing robots.txt returns a 404 to the crawler.

That is not a problem, and of course it applies to any resource that does not exist on the server ;).
Google itself (allegedly) even once recommended not using a robots.txt at all if you want everything indexed anyway.

@bekanntmacher (Author)

Google itself (allegedly) even once recommended not using a robots.txt at all if you want everything indexed anyway.

That is not correct. Google recommends a robots.txt.

All major bots rely on the robots.txt. Why should we withhold this information from them?

@fritzmg (Contributor) commented Feb 8, 2018

With a robots.txt like the one you posted, you give the crawler no additional information.

@bekanntmacher (Author)

You are not answering my question.

@fritzmg (Contributor) commented Feb 8, 2018

You are not answering my question.

If your question is whether there could be a reason not to have a robots.txt with the content

User-Agent: *
Disallow: 

then I cannot answer that either ;). I am merely questioning it.

I honestly liked the optimizations we had in Contao 3.5 (same for the .htaccess).

Regarding the .htaccess I agree. I always copy the same parts into the .htaccess for each new Contao 4 installation (e.g. the caching and compression rules present in the .htaccess.default of Contao 3). This has already been suggested in the past, but nothing has been done about it (yet).
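As a rough sketch of the kind of caching and compression rules meant here (not the actual .htaccess.default shipped with Contao 3; it assumes Apache with mod_deflate and mod_expires enabled):

```apache
# Compress text-based responses before sending them (mod_deflate)
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css application/javascript application/json image/svg+xml
</IfModule>

# Let browsers cache static assets instead of re-requesting them (mod_expires)
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png              "access plus 1 month"
    ExpiresByType image/jpeg             "access plus 1 month"
    ExpiresByType text/css               "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>
```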

@leofeyer leofeyer removed this from the 4.6.0 milestone Mar 20, 2018
@leofeyer leofeyer added this to the 4.6.0 milestone Apr 11, 2018
@leofeyer leofeyer self-assigned this Apr 11, 2018
@leofeyer (Member)

Implemented in 1ee049f.

@akroii commented Aug 22, 2019

Just a note:
Google no longer supports robots.txt as of September 1, 2019:
https://webmasters.googleblog.com/2019/07/a-note-on-unsupported-rules-in-robotstxt.html

@fritzmg (Contributor) commented Aug 22, 2019

@akroii the article says

In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we're retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.

The robots.txt itself is still used.

@akroii commented Aug 22, 2019

Sorry, my fault. I meant the Disallow rule.
See here https://t3n.de/news/robotstxt-google-schafft-andere-1175830/

@fritzmg (Contributor) commented Aug 22, 2019

No, the Disallow rule is still supported.

The following rules will not be supported any more:

  • Noindex
  • Nofollow
  • Crawl-delay
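As a hypothetical illustration (example.com and the paths are placeholders), a robots.txt that sticks to rules Googlebot still honors, with the retired directives shown as comments:

```
User-Agent: *
Disallow: /private/
Allow: /
Sitemap: https://example.com/sitemap.xml

# Retired as of September 1, 2019 (Googlebot ignores these):
# Noindex: /drafts/
# Crawl-delay: 10
```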


6 participants