An async broken-link checker for websites, built with Python's asyncio and aiohttp.
Crawls a site from a starting URL, follows internal links up to a configurable depth, and reports broken links, timeouts, and errors.
Requires Python 3.11+.
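As a rough illustration of "follows internal links", here is a minimal sketch using only the standard library. The names `LinkExtractor`, `extract_links`, and `is_internal` are hypothetical and are not taken from `parser.py`. The sketch also assumes that "internal" means "same host as the starting URL", which the actual checker may define differently.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> list[str]:
    """Return absolute http(s) links found in html, with fragments stripped."""
    extractor = LinkExtractor()
    extractor.feed(html)
    links = []
    for href in extractor.hrefs:
        # Resolve relative links against the page URL, then drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def is_internal(start_url: str, link: str) -> bool:
    """Assumption: a link is internal when its host matches the start URL's host."""
    return urlparse(start_url).netloc == urlparse(link).netloc


html = (
    '<a href="/about#team">About</a> '
    '<a href="https://other.org/">Other</a> '
    '<a href="mailto:a@b.c">Mail</a>'
)
links = extract_links("https://example.com/index.html", html)
internal = [link for link in links if is_internal("https://example.com", link)]
```

In this sketch only the links in `internal` would be queued for further crawling; external links could still be checked once without being followed.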
pip install aiohttp

For running tests:

pip install pytest pytest-asyncio

python -m link_checker https://example.com --depth 2

| Flag | Default | Description |
|---|---|---|
| `--depth` | 3 | Maximum crawl depth |
| `--concurrency` | 10 | Max concurrent requests (semaphore limit) |
| `--timeout` | 15 | Request timeout in seconds |
| `--workers` | 10 | Number of worker coroutines |
| `--output` | none | Path to save JSON report |
| `--verbose` | none | Enable debug logging |
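To show how a semaphore limit and a per-request timeout can work together, here is a small sketch. It is not the code in `fetcher.py`: `fetch_with_limit`, `check_all`, and `fake_fetch` are hypothetical names, and the network request is replaced by `asyncio.sleep` so the example runs without aiohttp.

```python
import asyncio


async def fetch_with_limit(url, fetch, semaphore, timeout):
    """Run one request under the shared semaphore with a per-request timeout."""
    async with semaphore:  # at most `concurrency` requests pass this point at once
        try:
            return url, await asyncio.wait_for(fetch(url), timeout=timeout)
        except asyncio.TimeoutError:
            return url, "timeout"


async def check_all(urls, fetch, concurrency=10, timeout=15.0):
    semaphore = asyncio.Semaphore(concurrency)
    jobs = (fetch_with_limit(url, fetch, semaphore, timeout) for url in urls)
    return await asyncio.gather(*jobs)  # results come back in input order


stats = {"in_flight": 0, "peak": 0}


async def fake_fetch(url):
    """Stand-in for an HTTP request; records how many calls overlap."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    try:
        await asyncio.sleep(1.0 if "slow" in url else 0.01)
        return 200
    finally:
        stats["in_flight"] -= 1


urls = [
    "https://example.com/a",
    "https://example.com/slow",
    "https://example.com/b",
    "https://example.com/c",
]
results = asyncio.run(check_all(urls, fake_fetch, concurrency=2, timeout=0.2))
```

With `concurrency=2`, no more than two fake requests overlap, and the slow URL is reported as a timeout instead of stalling the rest of the batch.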
# Check a site with depth 2, save JSON report
python -m link_checker https://example.com --depth 2 --output report.json
# More aggressive crawl
python -m link_checker https://example.com --depth 4 --concurrency 20 --workers 20

Pressing Ctrl+C during a crawl prints the partial results collected so far.
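One way to get this behavior is to keep the results in an object owned by the caller, so that it survives the interrupt. This is a sketch under that assumption, not the code in `__main__.py`, and `crawl`, `run_with_partial_results`, and `fake_check` are hypothetical names. On Python 3.11+, `asyncio.run()` cancels the main task on Ctrl+C and then raises `KeyboardInterrupt`. The demo simulates that by raising `KeyboardInterrupt` from the stand-in check.

```python
import asyncio


async def crawl(urls, check, results):
    """Stand-in crawl loop: records each result as soon as it is known."""
    for url in urls:
        status = await check(url)
        results.append((url, status))


def run_with_partial_results(urls, check):
    results = []  # owned by the caller, so it outlives an interrupted event loop
    interrupted = False
    try:
        asyncio.run(crawl(urls, check, results))
    except KeyboardInterrupt:
        interrupted = True
    return results, interrupted


async def fake_check(url):
    await asyncio.sleep(0)  # placeholder for a real request
    if url.endswith("/c"):
        raise KeyboardInterrupt  # simulated Ctrl+C, for the demo only
    return 200


urls = [f"https://example.com/{name}" for name in ("a", "b", "c", "d")]
results, interrupted = run_with_partial_results(urls, fake_check)
for url, status in results:
    print(status, url)
```

A real CLI would print a short "interrupted" notice before the report so the user knows the results are incomplete.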
Unit tests for models, parser, and reporter run without external dependencies:

pytest tests/test_models.py tests/test_parser.py tests/test_reporter.py -v

Integration tests require go-httpbin, a local HTTP test server.
# Build go-httpbin (requires Go)
git clone https://github.com/mccutchen/go-httpbin.git /tmp/go-httpbin
cd /tmp/go-httpbin && go build -o go-httpbin ./cmd/go-httpbin
# Copy binary to your PATH or set the path in conftest.py
cp /tmp/go-httpbin/go-httpbin /usr/local/bin/
# Run all tests
pytest -v

The test fixture in tests/conftest.py starts and stops the go-httpbin server automatically.
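For readers curious how such a fixture can manage a server process, here is a sketch of the start, wait, and stop lifecycle. It is not the contents of `tests/conftest.py`: `wait_for_port` and `running_server` are hypothetical helpers, and the port number and the `-host`/`-port` flags in the commented fixture are assumptions to check against your go-httpbin build.

```python
import contextlib
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.2):
                return True
        except OSError:
            time.sleep(0.05)
    return False


@contextlib.contextmanager
def running_server(command: list[str], host: str, port: int):
    """Start a server process, yield its base URL, and always stop it afterwards."""
    process = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        if not wait_for_port(host, port):
            raise RuntimeError(f"server did not start on {host}:{port}")
        yield f"http://{host}:{port}"
    finally:
        process.terminate()
        try:
            process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            process.kill()


# In conftest.py this could back a session-scoped fixture (assumed flags and port):
#
# @pytest.fixture(scope="session")
# def httpbin_url():
#     command = ["go-httpbin", "-host", "127.0.0.1", "-port", "8081"]
#     with running_server(command, "127.0.0.1", 8081) as url:
#         yield url
```

Waiting for the port before yielding avoids flaky tests that send requests before the server is listening.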
link_checker/
├── __init__.py
├── __main__.py # CLI entry point, graceful shutdown
├── crawler.py # Queue + TaskGroup worker pool
├── fetcher.py # Single-URL fetcher with semaphore
├── models.py # Data models (CrawlURL, LinkCheckResult, CrawlReport)
├── parser.py # HTML link extractor
└── reporter.py # Console and JSON reporting
tests/
├── conftest.py # go-httpbin fixture
├── test_crawler.py
├── test_fetcher.py
├── test_models.py
├── test_parser.py
└── test_reporter.py