Add robots.txt checking option #1

@ArkNill

Description


Feature Request

Currently markgrab does not check robots.txt before fetching. Add an optional respect_robots parameter that, when set to True:

  1. Fetches and parses robots.txt for the target domain
  2. Checks if the URL path is allowed for the configured user agent
  3. Raises RobotsDisallowed (or silently skips, depending on configuration) if the URL is disallowed

This should be opt-in (default False) to maintain backward compatibility.
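The steps above could be sketched with Python's standard urllib.robotparser. This is only a rough illustration of the proposed behavior, not markgrab's actual API: the RobotsDisallowed exception, the is_allowed/check_robots helpers, and the "markgrab" user-agent string are all assumptions.

```python
from urllib import robotparser

class RobotsDisallowed(Exception):
    """Raised when robots.txt disallows the requested URL (proposed name)."""

def is_allowed(url: str, robots_txt: str, user_agent: str = "markgrab") -> bool:
    # Parse an already-fetched robots.txt and check whether the URL's
    # path is allowed for the given user agent.
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def check_robots(url: str, robots_txt: str, user_agent: str = "markgrab") -> None:
    # Raise instead of returning a bool, matching option 3 above.
    if not is_allowed(url, robots_txt, user_agent):
        raise RobotsDisallowed(f"{user_agent!r} may not fetch {url}")
```

For example, with rules `User-agent: *` / `Disallow: /private/`, a URL under `/private/` would be rejected while other paths pass. Fetching and caching robots.txt per domain (step 1) is left out here; robotparser can also do that itself via `set_url()` + `read()`.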

Motivation

Legal compliance for production deployments. robots.txt compliance is currently documented in the Disclaimer but not enforced by the library.


Labels: enhancement (New feature or request)
