
[Server] Before initiating the scrape, check robots.txt on the domain. #236

@R-Sandor

Description


Details

Resource: https://yoast.com/ultimate-guide-robots-txt/

Robots.txt syntax

A robots.txt file consists of one or more blocks of directives, each starting with a user-agent line. The “user-agent” is the name of the specific spider the block addresses. You can have one block for all search engines, using a wildcard for the user-agent, or specific blocks for specific search engines. A search engine spider will always pick the block that best matches its name.

These blocks look like this (don’t be scared, we’ll explain below):

```
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow: /not-for-bing/
```
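
For reference, Python's standard-library urllib.robotparser implements this best-match behavior, so the example file above can be checked directly. A minimal sketch (example.com is a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, as it would be served at
# https://example.com/robots.txt (example.com is a placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow: /not-for-bing/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each spider picks the block that best matches its name:
print(parser.can_fetch("Googlebot", "https://example.com/photo/"))       # True: empty Disallow allows everything
print(parser.can_fetch("bingbot", "https://example.com/not-for-bing/"))  # False: path explicitly disallowed
print(parser.can_fetch("SomeOtherBot", "https://example.com/photo/"))    # False: falls back to the * block, Disallow: /
```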

Directives like Allow and Disallow are not case-sensitive, so it’s up to you whether you write them in lowercase or capitalize them. The values are case-sensitive, however: /photo/ is not the same as /Photo/. We like capitalizing directives because it makes the file easier (for humans) to read.
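
With that syntax in mind, the check this issue asks for amounts to: resolve robots.txt at the root of the target domain, parse it, and only initiate the scrape for URLs the file permits for our user-agent. A minimal sketch of that flow, again using urllib.robotparser; "FindFirstBot" is a hypothetical agent name, and the server's actual implementation language may differ:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "FindFirstBot"  # hypothetical agent name; substitute the scraper's real one


def allowed_by_robots(url: str, user_agent: str = USER_AGENT) -> bool:
    """Fetch robots.txt for the URL's domain and check whether scraping this URL is allowed."""
    parts = urlparse(url)
    robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
    parser = RobotFileParser()
    parser.set_url(robots_url)
    try:
        parser.read()  # downloads and parses the domain's robots.txt
    except OSError:
        return True  # unreachable robots.txt: no rules to obey
    return parser.can_fetch(user_agent, url)


# Gate the scrape on the robots.txt check.
if allowed_by_robots("https://example.com/some/page"):
    ...  # proceed with the scrape
```

Treating an unreachable robots.txt as "allowed" mirrors common crawler practice (no file means no restrictions), but that policy choice is worth confirming for this project.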
