@posty5/screenshoter

Crawl any website, discover all its pages, and take full-page screenshots, saved in a folder structure that mirrors the URL path. Built with Crawlee and Playwright.



🌟 What is @posty5/screenshoter?

@posty5/screenshoter is a TypeScript package that automates the process of capturing screenshots for every page on a website. It covers four tasks:

  1. Crawl – Discover all URLs on a website using Crawlee's PlaywrightCrawler or sitemap parsing
  2. Capture – Take a screenshot of each URL using Playwright and save it to disk
  3. Single Page – Capture a screenshot of a single URL without crawling
  4. Batch Capture – Capture screenshots of a list of URLs you provide

Screenshots are saved in a folder structure that mirrors the URL path:

https://posty5.com/en/social-media-publisher
  → screenshots/posty5.com/en/social-media-publisher/capture.webp

https://posty5.com/en/qr-code-generator
  → screenshots/posty5.com/en/qr-code-generator/capture.webp

https://posty5.com/
  → screenshots/posty5.com/capture.webp

Use Cases:

  • 📸 Visual regression testing
  • 🗂️ Website archiving and documentation
  • 🖼️ Generating preview images for SEO / social sharing
  • 📊 Auditing website pages at scale
  • 🔍 Collecting URLs from a website for analysis

📦 Installation

npm install @posty5/screenshoter

After installing, make sure Playwright's Chromium browser is available:

npx playwright install chromium

🚀 Quick Start

Programmatic API

import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  excludePatterns: ["/api/**", "/user/*"],
});

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);

CLI

# Take screenshots of all pages
npx screenshoter capture https://posty5.com -o ./screenshots

# Collect URLs only (save to file)
npx screenshoter collect-urls https://posty5.com --file urls.txt

📚 API Reference

captureWebsite(config)

Crawl a website, discover all pages, and take a screenshot of each one.

Returns: Promise<CaptureResult>

import { captureWebsite } from "@posty5/screenshoter";

const result = await captureWebsite({
  url: "https://posty5.com",
  outputDir: "./screenshots",
  format: "webp",
  viewport: { width: 1440, height: 900 },
  maxPages: 50,
  maxDepth: 3,
  excludePatterns: ["/api/**", "/user/*", "*.pdf"],
  concurrency: 3,
  waitAfterLoad: 1000,
  headless: true,
});

console.log(result);
// {
//   totalUrls: 42,
//   captured: 40,
//   failed: 2,
//   skipped: 0,
//   pages: [
//     { url: 'https://posty5.com/en', filePath: './screenshots/posty5.com/en/capture.webp', status: 'ok' },
//     { url: 'https://posty5.com/en/about', filePath: './screenshots/posty5.com/en/about/capture.webp', status: 'ok' },
//     ...
//   ]
// }
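The per-page entries make the result easy to post-process, for example to collect URLs that need a retry. A sketch assuming the shapes shown above (the "failed" and "skipped" status strings are inferred from the counters, not confirmed by the package):

```typescript
// Result shapes inferred from the example output above.
interface PageResult {
  url: string;
  filePath?: string;
  status: "ok" | "failed" | "skipped"; // non-"ok" values are assumptions
}

interface CaptureResult {
  totalUrls: number;
  captured: number;
  failed: number;
  skipped: number;
  pages: PageResult[];
}

// URLs that did not produce a screenshot, e.g. to feed back into capturePages().
function failedUrls(result: CaptureResult): string[] {
  return result.pages.filter((p) => p.status !== "ok").map((p) => p.url);
}
```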

collectUrls(config)

Crawl a website and save all discovered URLs to a text file, one URL per line.

Config type: CollectUrlsConfig – only URL collection options (no screenshot settings).

Returns: Promise<string[]>

import { collectUrls } from "@posty5/screenshoter";

const urls = await collectUrls({
  url: "https://posty5.com",
  outputFile: "./urls.txt",
  maxPages: 100,
  excludePatterns: ["/api/**"],
});

console.log(`Found ${urls.length} URLs`);
// urls.txt:
// https://posty5.com
// https://posty5.com/en
// https://posty5.com/en/social-media-publisher
// https://posty5.com/en/qr-code-generator
// ...

capturePage(config)

Capture a screenshot of a single page – no crawling involved.

Config type: CapturePageConfig – only screenshot options + url.

Returns: Promise<PageResult>

import { capturePage } from "@posty5/screenshoter";

const result = await capturePage({
  url: "https://posty5.com/en/about",
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  waitAfterLoad: 1500,
});

console.log(result);
// {
//   url: 'https://posty5.com/en/about',
//   filePath: './screenshots/posty5.com/en/about/capture.png',
//   status: 'ok'
// }

capturePages(urls, config?)

Capture screenshots for a list of URLs. Reuses a single browser instance and supports concurrency.

Config type: CaptureConfig – only screenshot options (no url or crawl settings).

Returns: Promise<CaptureResult>

import { capturePages } from "@posty5/screenshoter";

const result = await capturePages(
  [
    "https://posty5.com",
    "https://posty5.com/en/about",
    "https://posty5.com/en/social-media-publisher",
  ],
  {
    outputDir: "./screenshots",
    format: "png",
    viewport: { width: 1440, height: 710 },
    concurrency: 2,
    waitAfterLoad: 1500,
  }
);

console.log(`Captured ${result.captured} of ${result.totalUrls} pages`);

scrollToEnd(page)

A beforeScreenshot utility that smoothly scrolls to the bottom of the page and waits for scroll-triggered animations to settle before the screenshot is taken. Useful for pages with fade-in or reveal animations driven by IntersectionObserver.

Returns: Promise<void>

import { capturePages, scrollToEnd } from "@posty5/screenshoter";

const result = await capturePages(urls, {
  outputDir: "./screenshots",
  format: "png",
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});

βš™οΈ Configuration

The configuration is split into two concerns:

URL Collection Options (CollectUrlsConfig)

Used by collectUrls() and captureWebsite().

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | `string` | (required) | Root URL to start crawling from |
| `strategy` | `'crawl' \| 'sitemap'` | `'crawl'` | URL discovery strategy |
| `sitemapUrl` | `string` | `<url>/sitemap.xml` | Custom sitemap URL (sitemap strategy only) |
| `maxPages` | `number` | `100` | Maximum number of pages to crawl |
| `maxDepth` | `number` | `5` | Maximum crawl depth from the root URL |
| `excludePatterns` | `string[]` | `[]` | Glob patterns for URL paths to skip |
| `includePatterns` | `string[]` | `[]` | Glob patterns; only matching URLs are captured |
| `sameDomainOnly` | `boolean` | `true` | Only capture URLs on the same domain |
| `headless` | `boolean` | `true` | Run browser in headless mode (crawl strategy) |
| `shouldCapture` | `(url: string) => boolean` | – | Programmatic filter; return `false` to skip a URL |

For collectUrls(), an additional option is available:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `outputFile` | `string` | `'urls.txt'` | File path to save the URL list |

Screenshot Options (CaptureConfig)

Used by capturePage(), capturePages(), and captureWebsite().

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `outputDir` | `string` | `'./screenshots'` | Directory to save screenshots |
| `format` | `'webp' \| 'png' \| 'jpeg'` | `'webp'` | Screenshot image format |
| `viewport` | `{ width, height }` | `{ width: 1440, height: 900 }` | Browser viewport size |
| `fullPage` | `boolean` | `false` | Capture the full scrollable page instead of just the viewport |
| `waitAfterLoad` | `number` | `1000` | Milliseconds to wait after page load before the screenshot |
| `concurrency` | `number` | `3` | Number of parallel browser pages |
| `headless` | `boolean` | `true` | Run browser in headless mode |
| `scrollToEnd` | `boolean` | `false` | Smoothly scroll to the bottom before the screenshot (triggers scroll animations); auto-enabled when `fullPage` is `true` |
| `beforeScreenshot` | `(page: Page) => Promise<void>` | – | Callback to run before each screenshot |

captureWebsite() accepts both sets of options combined (CaptureWebsiteConfig).
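As a sketch, a combined config using the sitemap strategy might look like this (option names come from the tables above; the values are illustrative):

```typescript
// Combined CaptureWebsiteConfig sketch: URL-collection options and
// screenshot options in one object. With strategy "sitemap", URLs are
// read from the site's sitemap instead of discovered by crawling links.
const config = {
  // URL collection options (CollectUrlsConfig)
  url: "https://posty5.com",
  strategy: "sitemap" as const,
  maxPages: 200,
  excludePatterns: ["/api/**"],
  // Screenshot options (CaptureConfig)
  outputDir: "./screenshots",
  format: "webp" as const,
  viewport: { width: 1440, height: 900 },
  fullPage: true,
};
```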


🚫 Filtering Dynamic Pages

Three mechanisms to control which pages get captured:

1. Exclude Patterns

Skip URLs matching glob patterns:

await captureWebsite({
  url: "https://example.com",
  excludePatterns: [
    "/api/**", // Skip all API routes
    "/user/*", // Skip user profile pages
    "/*/edit", // Skip edit pages
    "*.pdf", // Skip PDF links
    "/auth/**", // Skip auth pages
  ],
});
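As a rough illustration of the pattern semantics (`*` matches within a single path segment, while `**` matches across segments), here is a minimal glob-to-RegExp sketch. The package likely delegates to a full glob library, so edge cases may differ:

```typescript
// Minimal glob-to-RegExp sketch, for illustration only:
// "*" matches within one path segment, "**" matches any depth.
function globToRegExp(pattern: string): RegExp {
  const source = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")           // protect "**" from the next step
    .replace(/\*/g, "[^/]*")              // "*"  -> within one segment
    .replace(/\u0000/g, ".*");            // "**" -> any depth
  return new RegExp(`^${source}$`);
}

globToRegExp("/api/**").test("/api/v1/users");    // true: "**" spans segments
globToRegExp("/user/*").test("/user/alice/edit"); // false: "*" stops at "/"
```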

2. Include Patterns (Whitelist)

When set, only matching URLs are captured:

await captureWebsite({
  url: "https://example.com",
  includePatterns: [
    "/en/**", // Only capture English pages
    "/products/*", // Only capture product pages
  ],
});

3. Custom Filter Callback

For complex logic:

await captureWebsite({
  url: "https://example.com",
  shouldCapture: (url) => {
    const parsed = new URL(url);
    // Skip URLs with query parameters
    if (parsed.search) return false;
    // Skip paths with more than 4 segments
    if (parsed.pathname.split("/").filter(Boolean).length > 4) return false;
    return true;
  },
});

🖥️ CLI Reference

capture – Take Screenshots

npx screenshoter capture <url> [options]
| Option | Description | Default |
| --- | --- | --- |
| `-o, --output <dir>` | Output directory | `./screenshots` |
| `-f, --format <format>` | Image format: `webp`, `png`, `jpeg` | `webp` |
| `--width <number>` | Viewport width | `1440` |
| `--height <number>` | Viewport height | `900` |
| `--max-pages <number>` | Maximum pages to crawl | `100` |
| `--max-depth <number>` | Maximum crawl depth | `5` |
| `--exclude <patterns...>` | Glob patterns to exclude | – |
| `--include <patterns...>` | Glob patterns to include | – |
| `--concurrency <number>` | Parallel browser pages | `3` |
| `--wait <number>` | Wait ms after page load | `1000` |
| `--no-headless` | Show browser window | – |
| `--urls-file <path>` | Use URLs from a file instead of crawling | – |

Examples:

# Basic usage
npx screenshoter capture https://posty5.com

# Custom output and format
npx screenshoter capture https://posty5.com -o ./my-screenshots -f png

# Exclude dynamic pages
npx screenshoter capture https://posty5.com --exclude "/api/**" "/user/*" "/auth/**"

# Only capture specific sections
npx screenshoter capture https://posty5.com --include "/en/**"

# Limit crawl depth and page count
npx screenshoter capture https://posty5.com --max-pages 20 --max-depth 2

# Use a pre-collected URLs file
npx screenshoter capture https://posty5.com --urls-file urls.txt

collect-urls – Collect URLs Only

npx screenshoter collect-urls <url> [options]
| Option | Description | Default |
| --- | --- | --- |
| `--file <path>` | Output file path | `urls.txt` |
| `--max-pages <number>` | Maximum pages to crawl | `100` |
| `--max-depth <number>` | Maximum crawl depth | `5` |
| `--exclude <patterns...>` | Glob patterns to exclude | – |
| `--include <patterns...>` | Glob patterns to include | – |
| `--no-headless` | Show browser window | – |

Examples:

# Collect all URLs
npx screenshoter collect-urls https://posty5.com

# Save to custom file with filters
npx screenshoter collect-urls https://posty5.com --file sitemap.txt --exclude "/api/**"

🔧 Advanced Usage

Scroll Animations (Fade-in on Scroll)

Some pages reveal content as you scroll (using IntersectionObserver or CSS fade-in classes). A viewport-only screenshot would capture those elements as invisible/empty.

Use scrollToEnd: true or the built-in scrollToEnd utility to scroll through the page first:

import { capturePages, scrollToEnd } from "@posty5/screenshoter";

// Option 1 – config flag (auto-enabled when fullPage: true)
const fullPageResult = await capturePages(urls, {
  fullPage: true, // scrollToEnd is automatically true
  waitAfterLoad: 1000,
});

// Option 2 – explicit flag
const scrollResult = await capturePages(urls, {
  scrollToEnd: true,
  waitAfterLoad: 1000,
});

// Option 3 – use the scrollToEnd utility as a beforeScreenshot hook
const hookedResult = await capturePages(urls, {
  fullPage: true,
  beforeScreenshot: scrollToEnd,
});

Before Screenshot Hook

Dismiss cookie banners, close popups, or interact with the page before taking the screenshot:

await captureWebsite({
  url: "https://example.com",
  beforeScreenshot: async (page) => {
    // Dismiss cookie consent
    const cookieBtn = page.locator('button:has-text("Accept")');
    if (await cookieBtn.isVisible()) {
      await cookieBtn.click();
      await page.waitForTimeout(500);
    }
  },
});

Three-Step Workflow: Collect → Edit → Capture

First collect URLs, manually edit the list, then capture only the URLs you want:

# Step 1: Collect all URLs
npx screenshoter collect-urls https://posty5.com --file urls.txt

# Step 2: Edit urls.txt and remove any pages you don't want

# Step 3: Capture only the remaining URLs
npx screenshoter capture https://posty5.com --urls-file urls.txt

Or programmatically with separated configs:

import { collectUrls, capturePages } from "@posty5/screenshoter";

// Step 1: Collect URLs (CollectUrlsConfig only)
const urls = await collectUrls({
  url: "https://posty5.com",
  strategy: "sitemap",
  maxPages: 500,
  excludePatterns: ["/api/**"],
  outputFile: "./urls.txt",
});

// Step 2: Filter programmatically
const staticPages = urls.filter((u) => !u.includes("/trends/"));

// Step 3: Capture with separate config (CaptureConfig only)
const result = await capturePages(staticPages, {
  outputDir: "./screenshots",
  format: "png",
  viewport: { width: 1440, height: 710 },
  concurrency: 3,
});

📂 Output Structure

The output folder mirrors the URL path structure:

screenshots/
└── posty5.com/
    ├── capture.webp                     ← https://posty5.com
    ├── en/
    │   ├── capture.webp                 ← https://posty5.com/en
    │   ├── social-media-publisher/
    │   │   └── capture.webp             ← https://posty5.com/en/social-media-publisher
    │   ├── qr-code-generator/
    │   │   └── capture.webp             ← https://posty5.com/en/qr-code-generator
    │   └── url-shortener/
    │       └── capture.webp             ← https://posty5.com/en/url-shortener
    └── ar/
        ├── capture.webp                 ← https://posty5.com/ar
        └── social-media-publisher/
            └── capture.webp             ← https://posty5.com/ar/social-media-publisher

💻 Requirements

  • Node.js: >= 18.0.0
  • TypeScript: Full type definitions included
  • Browser: Chromium (auto-installed via Playwright)

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Commit your changes: git commit -m 'Add amazing feature'
  5. Push to the branch: git push origin feature/amazing-feature
  6. Submit a pull request

📄 License

MIT License - see LICENSE file for details.


Made with ❤️ by the Posty5 team
