dsdesign-dev/link_checker_async

link_checker

An async broken-link checker for websites, built with Python's asyncio and aiohttp.

Crawls a site from a starting URL, follows internal links up to a configurable depth, and reports broken links, timeouts, and errors.
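The depth-limited crawl described above can be sketched as a breadth-first walk. This is only an illustration, not the project's code: `is_internal`, `plan_crawl`, and the in-memory `links` graph are hypothetical, and "internal" is assumed to mean "same host as the starting URL".

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def is_internal(url: str, root: str) -> bool:
    """Assumption: a URL is internal when it shares the root's host."""
    return urlparse(url).netloc == urlparse(root).netloc


def plan_crawl(start: str, links: dict[str, list[str]], max_depth: int) -> dict[str, int]:
    """Breadth-first walk over a hypothetical in-memory link graph.

    Returns every URL that would be checked, mapped to the depth at which
    it was first seen. External URLs are recorded but never expanded.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        # Only internal pages are parsed for further links, and only
        # while the depth limit has not been reached.
        if not is_internal(page, start) or depth >= max_depth:
            continue
        for href in links.get(page, []):
            target = urljoin(page, href)  # resolve relative links
            if target not in seen:
                seen[target] = depth + 1
                queue.append(target)
    return seen
```

With a limit of 2, pages found at depth 2 are still checked, but the links they contain are not followed. Whether the real crawler counts depth this way is an assumption.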

Installation

Requires Python 3.11+.

pip install aiohttp

For running tests:

pip install pytest pytest-asyncio

Usage

python -m link_checker https://example.com --depth 2

Options

Flag           Default  Description
--depth        3        Maximum crawl depth
--concurrency  10       Max concurrent requests (semaphore limit)
--timeout      15       Request timeout in seconds
--workers      10       Number of worker coroutines
--output       -        Path to save JSON report
--verbose      -        Enable debug logging

Example

# Check a site with depth 2, save JSON report
python -m link_checker https://example.com --depth 2 --output report.json

# More aggressive crawl
python -m link_checker https://example.com --depth 4 --concurrency 20 --workers 20

Pressing Ctrl+C during a crawl prints partial results collected so far.
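One way to get partial results on interrupt is to collect them in a list owned by the caller, so they survive when the event loop is torn down. This is a minimal sketch under that assumption; `crawl`, `run`, and the `check` callable are hypothetical names, not the project's API.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def crawl(urls: list[str], check: Callable[[str], Awaitable[str]], results: list[str]) -> None:
    """Append each result as soon as it is available."""
    for url in urls:
        results.append(await check(url))


def run(urls: list[str], check: Callable[[str], Awaitable[str]]) -> tuple[list[str], bool]:
    """Run the crawl and return (results, interrupted)."""
    results: list[str] = []  # lives outside the event loop
    interrupted = False
    try:
        asyncio.run(crawl(urls, check, results))
    except KeyboardInterrupt:
        # Ctrl+C stopped the loop; whatever was appended so far is kept.
        interrupted = True
    return results, interrupted
```

A CLI entry point could then print or save `results` in both cases, noting that the report is incomplete when `interrupted` is true.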

Testing

Unit tests for models, parser, and reporter run without external dependencies:

pytest tests/test_models.py tests/test_parser.py tests/test_reporter.py -v

Integration tests (fetcher, crawler)

Integration tests require go-httpbin, a local HTTP test server.

# Build go-httpbin (requires Go)
git clone https://github.com/mccutchen/go-httpbin.git /tmp/go-httpbin
cd /tmp/go-httpbin && go build -o go-httpbin ./cmd/go-httpbin

# Copy the binary to a directory on your PATH, or set its path in conftest.py
cp /tmp/go-httpbin/go-httpbin /usr/local/bin/

# Run all tests
pytest -v

The test fixture in tests/conftest.py starts and stops the go-httpbin server automatically.
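A fixture that starts and stops a local server usually follows a start, wait, yield, terminate pattern. The sketch below shows that pattern with the standard library only; it is not the contents of tests/conftest.py, and `running_server` and `wait_for_port` are hypothetical helpers.

```python
import contextlib
import socket
import subprocess
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0) -> None:
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.2):
                return
        except OSError:
            time.sleep(0.05)
    raise RuntimeError(f"server on {host}:{port} did not start")


@contextlib.contextmanager
def running_server(command: list[str], host: str, port: int):
    """Start a server process, yield its base URL, then shut it down."""
    proc = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        wait_for_port(host, port)
        yield f"http://{host}:{port}"
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()  # fall back if the process ignores SIGTERM
            proc.wait()
```

In a conftest.py this could be wrapped in a session-scoped pytest fixture that enters `running_server` with the go-httpbin command line and yields the base URL to the tests. The exact command, host, and port the project uses are not stated here, so check tests/conftest.py and the go-httpbin documentation for the flags to pass.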

Project structure

link_checker/
├── __init__.py
├── __main__.py    # CLI entry point, graceful shutdown
├── crawler.py     # Queue + TaskGroup worker pool
├── fetcher.py     # Single-URL fetcher with semaphore
├── models.py      # Data models (CrawlURL, LinkCheckResult, CrawlReport)
├── parser.py      # HTML link extractor
└── reporter.py    # Console and JSON reporting
tests/
├── conftest.py    # go-httpbin fixture
├── test_crawler.py
├── test_fetcher.py
├── test_models.py
├── test_parser.py
└── test_reporter.py
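
parser.py is described as an HTML link extractor. A minimal sketch of that idea using only the standard library's html.parser follows; `LinkExtractor` and `extract_links` are hypothetical names, and the project's own parser may handle more tags or edge cases.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return links in document order, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

Resolving each href against the page URL means relative links such as /about come out as full URLs that a fetcher can request directly.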
