A recursive web crawler that captures full-page screenshots along navigation chains, producing a visual tree of your website's structure. Now with authentication, stealth mode, rate limiting, and click tracking for JS-heavy sites.
Given any website, NavTree Crawler will:
- Start at the homepage (root)
- Find all clickable navigation elements (CTAs) — including JavaScript-only buttons
- Visit each one and take a full-page screenshot
- Recursively discover CTAs on each new page
- Build a tree structure representing the navigation hierarchy
- Generate an HTML report with thumbnails and links
The result is a complete visual map of how users navigate through your site.
```
Homepage
├── About Us (🔗 link)
│   ├── Team
│   └── History
├── Products (🖱️ JS click)
│   ├── Category A
│   │   ├── Product 1
│   │   └── Product 2
│   └── Category B
└── Contact
```
## Features

| Feature | Description |
|---|---|
| 🔐 Authentication | Cookies, localStorage, sessionStorage, HTTP headers |
| 🥷 Stealth Mode | Bypass bot detection with playwright-extra stealth plugin |
| ⏱️ Rate Limiting | Fixed or randomized delays between requests |
| 🖱️ Click Tracking | Detect navigation from JavaScript button clicks, not just `<a>` links |
| 📁 Auto-named Output | Screenshots folder automatically named after the website (e.g., screenshots_example_com) |
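The auto-naming in the last row is easy to reproduce; a sketch of the likely derivation (the crawler's exact sanitization rules may differ):

```javascript
// Derive a screenshots folder name from a URL's hostname,
// e.g. https://example.com -> screenshots_example_com
function outputDirFor(baseUrl) {
  const { hostname } = new URL(baseUrl);
  return `screenshots_${hostname.replace(/[^a-z0-9]+/gi, '_')}`;
}

console.log(outputDirFor('https://example.com')); // screenshots_example_com
```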
## Installation

```bash
# Clone or download this repository
cd nav-tree-crawler

# Install dependencies
npm install

# Install Playwright browsers
npx playwright install chromium
```

## CLI Usage

```bash
# Basic crawl
node src/cli.js https://example.com

# With rate limiting (be polite!)
node src/cli.js https://example.com --random-delay 1000-3000

# Authenticated crawl
node src/cli.js https://dashboard.example.com --cookies ./cookies.json

# Full options
node src/cli.js https://spa.example.com \
  --random-delay 500-1500 \
  --cookies ./auth.json \
  --depth 4 \
  --pages 200

# See all options
node src/cli.js --help
```

## Programmatic Usage

```js
const NavTreeCrawler = require('./src/crawler');

const crawler = new NavTreeCrawler({
  baseUrl: 'https://example.com',
  // outputDir defaults to './screenshots_example_com' based on the URL
  maxDepth: 5,
  maxPages: 100,

  // Authentication
  cookies: [{ name: 'session', value: 'abc123', domain: '.example.com', path: '/' }],

  // Rate limiting
  randomDelay: [1000, 3000],

  // Click tracking for SPAs
  clickTracking: true,
});

await crawler.crawl();
await crawler.generateReport();
```

## Authentication

Crawl login-protected sites by injecting authentication credentials.
### Cookies

```js
// Direct cookie injection
new NavTreeCrawler({
  cookies: [
    {
      name: 'session_id',
      value: 'your-session-token',
      domain: '.example.com',
      path: '/',
      httpOnly: true,
      secure: true,
    },
  ],
});

// Or load from file
new NavTreeCrawler({
  cookiesFile: './cookies.json',
});
```

Cookies file format (same as a browser export):
```json
[
  {
    "name": "session",
    "value": "abc123",
    "domain": ".example.com",
    "path": "/"
  }
]
```

### HTTP Headers

```js
new NavTreeCrawler({
  extraHeaders: {
    'Authorization': 'Bearer your-jwt-token',
    'X-API-Key': 'your-api-key',
  },
});
```

### localStorage / sessionStorage

Useful for SPAs that store auth tokens in browser storage:
```js
new NavTreeCrawler({
  localStorage: {
    'auth_token': 'your-token',
    'user_id': '12345',
  },
  sessionStorage: {
    'temp_data': 'value',
  },
});
```

## Stealth Mode

Avoid bot detection using playwright-extra's stealth plugin (puppeteer-extra-plugin-stealth).
Stealth mode is enabled by default and helps bypass:
- Cloudflare bot protection
- DataDome
- PerimeterX
- Basic headless browser detection
```js
new NavTreeCrawler({
  stealth: true, // default

  // Custom user agent (optional)
  userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...',
});
```

To disable stealth mode:

```js
new NavTreeCrawler({
  stealth: false,
});
```

## Rate Limiting

Be polite to servers and avoid getting blocked.
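Under the hood, throttling amounts to awaiting a pause before each page visit. A minimal sketch (the helper names here are illustrative, not the crawler's actual internals):

```javascript
// Pick a pause length from the rate-limiting options:
// a fixed requestDelay, or a random value within randomDelay's [min, max].
function pickDelay({ requestDelay = 0, randomDelay = null } = {}) {
  if (Array.isArray(randomDelay)) {
    const [min, max] = randomDelay;
    return min + Math.floor(Math.random() * (max - min + 1));
  }
  return requestDelay;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Before each navigation: await sleep(pickDelay(options));
```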
```js
new NavTreeCrawler({
  requestDelay: 1000, // 1 second between each request
});
```

More human-like behavior, harder to detect as a bot:

```js
new NavTreeCrawler({
  randomDelay: [1000, 3000], // Random 1-3 seconds between requests
});
```

From the CLI:

```bash
# Fixed delay
node src/cli.js https://example.com --delay 1000

# Random delay
node src/cli.js https://example.com --random-delay 1000-3000
```

## Click Tracking

Detect navigation from JavaScript button clicks, not just `<a href="...">` links. Essential for SPAs (React, Vue, Angular).
- Finds all clickable elements (`button`, `[role="button"]`, `[onclick]`, etc.)
- Clicks each element
- Detects whether navigation occurred (URL change OR content change)
- Uses content hashing to detect SPA state changes even when the URL doesn't change
```js
new NavTreeCrawler({
  clickTracking: true, // default

  // How long to wait for navigation after click
  clickNavigationTimeout: 5000, // 5 seconds

  // Extra wait for SPA transitions
  additionalWait: 1500,

  // Custom clickable selectors
  clickableSelectors: [
    'button:not([disabled])',
    '[role="button"]',
    '[onclick]',
    '[data-href]',
    '.nav-item',
    '.menu-trigger',
  ],
});
```

```bash
# Disable click tracking (links only)
node src/cli.js https://example.com --no-click-tracking

# Custom click timeout
node src/cli.js https://spa.example.com --click-timeout 8000
```

## Options

### General

| Option | Default | Description |
|---|---|---|
| `baseUrl` | - | Starting URL for the crawl |
| `outputDir` | `./screenshots_<hostname>` | Where to save screenshots and reports (auto-generated from URL) |
| `maxDepth` | `5` | Maximum navigation depth |
| `maxPages` | `100` | Maximum pages to crawl (safety limit) |
| `viewport` | `{width: 1920, height: 1080}` | Browser viewport size |
| `waitUntil` | `'networkidle'` | Page load wait strategy |
| `additionalWait` | `500` | Extra wait time (ms) after page load |
| `sameDomainOnly` | `true` | Only crawl pages on the same domain |
| `includeSubdomains` | `false` | Include subdomains when `sameDomainOnly` is true |
### Authentication

| Option | Default | Description |
|---|---|---|
| `cookies` | `[]` | Array of cookie objects to inject |
| `cookiesFile` | `null` | Path to JSON file with cookies |
| `extraHeaders` | `{}` | HTTP headers to add to all requests |
| `localStorage` | `{}` | Items to set in localStorage |
| `sessionStorage` | `{}` | Items to set in sessionStorage |
### Stealth

| Option | Default | Description |
|---|---|---|
| `stealth` | `true` | Enable stealth mode |
| `userAgent` | `null` | Custom user agent string |
### Rate Limiting

| Option | Default | Description |
|---|---|---|
| `requestDelay` | `0` | Fixed delay between requests (ms) |
| `randomDelay` | `null` | Random delay range `[min, max]` in ms |
### Click Tracking

| Option | Default | Description |
|---|---|---|
| `clickTracking` | `true` | Enable JS click tracking |
| `clickNavigationTimeout` | `5000` | Timeout waiting for navigation after click (ms) |
| `clickableSelectors` | (see below) | Selectors for clickable elements |
**CTA Selectors:**

```js
[
  'nav a',
  'header a',
  '[role="navigation"] a',
  'a[href]:not([href^="mailto:"]):not([href^="tel:"]):not([href^="#"]):not([href^="javascript:"])',
  'button',
  '[role="button"]',
  '[onclick]',
  '.btn',
  '.button',
]
```

**Clickable Selectors:**
```js
[
  'button:not([disabled])',
  '[role="button"]',
  '[onclick]',
  '[data-href]',
  '[data-link]',
  '[data-nav]',
  '.clickable',
  '[tabindex="0"]',
]
```

**Exclude Selectors:**
```js
[
  'a[href*="login"]',
  'a[href*="logout"]',
  'a[href*="signin"]',
  'a[href*="signup"]',
  'a[rel="external"]',
]
```

## Output

After crawling, you'll find screenshots in a directory named after the website (e.g., `./screenshots_example_com` for `https://example.com`). The folder contains:

- `*.png` - Full-page screenshots
- `tree.json` - JSON tree structure
- `metadata.json` - Crawl metadata and options
- `report.html` - Visual HTML report

`tree.json` structure:
```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "screenshot": "d0_001_home_abc123.png",
  "depth": 0,
  "contentHash": "a1b2c3d4...",
  "children": [
    {
      "url": "https://example.com/about",
      "ctaText": "About Us",
      "ctaType": "link",
      "screenshot": "d1_002_about_def456.png",
      "depth": 1,
      "children": []
    },
    {
      "url": "https://example.com/products",
      "ctaText": "View Products",
      "ctaType": "click",
      "screenshot": "d1_003_products_ghi789.png",
      "depth": 1,
      "children": []
    }
  ]
}
```

## Use Cases

- **QA Testing** - Visual regression testing across your entire site
- **Site Audits** - Understand your site's navigation structure
- **Documentation** - Generate visual documentation of user flows
- **Competitive Analysis** - Map out competitor site structures
- **Accessibility Review** - Screenshot all pages for manual review
- **Archive/Backup** - Visual snapshot of your entire site
- **SPA Testing** - Verify all JS-navigated routes are working
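Several of these workflows start from the generated `tree.json`. A small sketch of consuming it, assuming the node shape shown in the example above:

```javascript
// Depth-first walk over tree.json, collecting every crawled URL.
function collectUrls(node, urls = []) {
  urls.push(node.url);
  for (const child of node.children ?? []) collectUrls(child, urls);
  return urls;
}

// Usage (path depends on your output directory):
// const tree = JSON.parse(
//   require('node:fs').readFileSync('./screenshots_example_com/tree.json', 'utf8'));
// collectUrls(tree);
```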
## Limitations

- **CAPTCHAs** - Can't solve CAPTCHAs (though stealth helps avoid them)
- **2FA** - Can't handle two-factor authentication flows
- **Infinite Scroll** - Captures the initial viewport only
- **Complex SPAs** - Some edge cases with JS navigation may be missed
- **iframes** - Doesn't crawl into iframes

## License

MIT