Support the Robots meta tag #18

sylvinus · 2016-02-22T07:44:40Z

Not sure whether Common Crawl already filters out pages marked noindex, but we should filter them on our side anyway.

Some pointers:

- Add an is_indexable() method to Document
- HTMLDocument.is_indexable() should return False when self.head_metas["robots"] contains noindex
- indexer.py should skip documents for which is_indexable() returns False
- To support nofollow, we could simply check for its presence in close_tag
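The pointers above could be sketched roughly as follows. This is a minimal, self-contained sketch, not the project's actual code: `Document`, `HTMLDocument`, `head_metas` and `close_tag` are simplified stand-ins named after the issue, while `parse_robots_directives`, `should_follow_links`, `open_tag` and `index_documents` are hypothetical helpers. It assumes `head_metas` maps lowercased meta names to their `content` strings.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set


def parse_robots_directives(content: str) -> Set[str]:
    """Hypothetical helper: split e.g. "NoIndex, Follow" into lowercase directives."""
    directives = {token.strip().lower() for token in content.split(",") if token.strip()}
    # "none" is shorthand for "noindex, nofollow".
    if "none" in directives:
        directives |= {"noindex", "nofollow"}
    return directives


class Document:
    """Simplified stand-in: non-HTML documents are indexable by default."""

    def is_indexable(self) -> bool:
        return True

    def should_follow_links(self) -> bool:
        return True


class HTMLDocument(Document):
    """Simplified stand-in for the real HTMLDocument."""

    def __init__(self, head_metas: Dict[str, str]):
        # Assumption: meta names are lowercased and map to their content value.
        self.head_metas = head_metas
        self.links: List[str] = []
        self._pending_href: Optional[str] = None

    def _robots(self) -> Set[str]:
        return parse_robots_directives(self.head_metas.get("robots", ""))

    def is_indexable(self) -> bool:
        return "noindex" not in self._robots()

    def should_follow_links(self) -> bool:
        return "nofollow" not in self._robots()

    def open_tag(self, tag: str, attrs: Dict[str, str]) -> None:
        # Hypothetical parser hook: remember the href of an opening <a>.
        if tag == "a":
            self._pending_href = attrs.get("href")

    def close_tag(self, tag: str) -> None:
        # Hypothetical parser hook: keep the link only if nofollow is absent.
        if tag == "a":
            if self._pending_href and self.should_follow_links():
                self.links.append(self._pending_href)
            self._pending_href = None


def index_documents(docs: Iterable[Document]) -> Iterator[Document]:
    """Sketch of the indexer.py loop: yield only documents that may be indexed."""
    for doc in docs:
        if not doc.is_indexable():
            continue  # skip pages carrying a noindex (or none) directive
        yield doc
```

Matching is case-insensitive and treats `none` as `noindex, nofollow`. A page carrying only `nofollow` is still indexed; its links are simply not collected.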


sylvinus · 2016-03-06T05:47:59Z

sylvinus · 2016-10-13T08:56:16Z

@jhildreth started a branch with some good work that should be merged:
master...jhildreth:feature/robots-tag

sylvinus added the enhancement, help wanted, easy and python labels on Feb 22, 2016
