Jacob's Crawler

CLI Version

You can try the CLI version by installing the package globally:

pnpm i -g @jacoblincool/crawler-cli

Then you can run the crawler with the following command:

crawler --depth 3

It will dump the whole website into a directory named in the current working directory.
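Presumably, `--depth` limits how many link hops the crawler follows from the starting page. The sketch below shows one common way to enforce such a limit: a breadth-first traversal that stops after a fixed number of levels. It illustrates the idea only and is not the crawler's own code. The in-memory `LinkGraph`, the `crawlToDepth` helper, the `example.com` URLs, and the convention that the entry page counts as level 1 are all assumptions.

```typescript
// Hypothetical stand-in for real pages: each page URL maps to the hyperlinks found on it.
type LinkGraph = Map<string, string[]>;

// Breadth-first traversal that visits at most `depth` levels of pages.
// Assumption: the entry page is level 1, so depth 3 covers the entry page,
// the pages it links to, and the pages those pages link to.
function crawlToDepth(graph: LinkGraph, entry: string, depth: number): string[] {
    const visited = new Set<string>();
    let frontier: string[] = [entry];

    for (let level = 1; level <= depth && frontier.length > 0; level++) {
        const next: string[] = [];
        for (const url of frontier) {
            if (visited.has(url)) continue; // already visited at this or an earlier level
            visited.add(url);
            for (const href of graph.get(url) ?? []) {
                if (!visited.has(href)) next.push(href); // queue for the next level
            }
        }
        frontier = next;
    }

    // A Set iterates in insertion order, so this is the order pages were visited.
    return [...visited];
}

const graph: LinkGraph = new Map<string, string[]>([
    ["https://example.com/", ["https://example.com/a", "https://example.com/b"]],
    ["https://example.com/a", ["https://example.com/a/1"]],
    ["https://example.com/a/1", ["https://example.com/a/1/x"]],
    ["https://example.com/b", ["https://example.com/"]],
]);

// Visits /, /a, /b and /a/1; /a/1/x would be level 4 and is skipped.
console.log(crawlToDepth(graph, "https://example.com/", 3));
```

A real crawler discovers the graph on the fly by loading each page, but the level-by-level bookkeeping stays the same.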

Library Version

You can also use the library version in your own project:

pnpm i @jacoblincool/crawler

Then you can use it in your code:

import { Crawler } from "@jacoblincool/crawler";

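(async () => {
    const crawler = new Crawler();

    const result = await crawler.start({
        entry: "", // the URL the crawl starts from
        depth: 3,
        actors: {
            // an actor named `images`; `match` presumably restricts which URLs its `action` runs on
            images: {
                match: /^https:\/\//,
                action: async ({ page, targets, path }) => {
                    // queue every hyperlink on the page for the next crawl level
                    const hyperlinks = await page.getByRole("link").all();
                    for (const link of hyperlinks) {
                        const href = await link.getAttribute("href");
                        if (href !== null) {
                            // assumption: `targets` is a Set of absolute URLs to visit next
                            targets.add(new URL(href, page.url()).href);
                        }
                    }

                    // collect the `src` attribute of every image on the page
                    const images = await page.getByRole("img").all();
                    const urls = await Promise.all(
                        images.map((image) => image.getAttribute("src")),
                    );

                    // drop images without a `src` and remove duplicates
                    return [...new Set(urls.filter((url) => url !== null))];
                },
            },
        },
    });

    await crawler.stop();

    console.log(result);
})();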
// console.log(result):
{
  images: Map(17) {
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png', '' ],
    '' => [ '/logo.png', '' ],
    '' => [ '/logo.png', '' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ],
    '' => [ '/logo.png' ]
  }
}
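The `action` in the library example does two things: it gathers link targets for the crawler to follow, and it returns the unique image sources of the page. The printed result suggests that each actor's return values are collected in a `Map` keyed by what appear to be page URLs. The dependency-free sketch below reproduces that data flow on plain objects so it runs without a browser. `PageSnapshot`, `resolveLinks`, `uniqueImageSources`, `runImagesActor` and the `example.com` URLs are hypothetical, and the per-URL `Map` is an assumption inferred from the output rather than documented library behavior. Relative links are resolved with the standard WHATWG `URL` constructor. Note that image values such as `'/logo.png'` stay relative because `getAttribute("src")` returns the attribute exactly as written in the HTML.

```typescript
// Hypothetical plain-data view of a page, standing in for a Playwright `page`.
interface PageSnapshot {
    url: string;               // the page's own URL
    hrefs: (string | null)[];  // `href` of each link, null when the attribute is missing
    srcs: (string | null)[];   // `src` of each image, null when the attribute is missing
}

// Turn possibly-relative hrefs into absolute URLs the crawler could visit next.
function resolveLinks(page: PageSnapshot): string[] {
    return page.hrefs
        .filter((href): href is string => href !== null)
        .map((href) => new URL(href, page.url).href);
}

// Drop missing `src` values and deduplicate, keeping first-seen order.
function uniqueImageSources(page: PageSnapshot): string[] {
    return [...new Set(page.srcs.filter((src): src is string => src !== null))];
}

// Assumption: results are collected per actor in a Map keyed by page URL,
// and an actor only runs on pages whose URL matches its `match` pattern.
function runImagesActor(pages: PageSnapshot[], match: RegExp): Map<string, string[]> {
    const results = new Map<string, string[]>();
    for (const page of pages) {
        if (match.test(page.url)) {
            results.set(page.url, uniqueImageSources(page));
        }
    }
    return results;
}

const pages: PageSnapshot[] = [
    {
        url: "https://example.com/",
        hrefs: ["/about", null, "https://other.example/"],
        srcs: ["/logo.png", "/logo.png", null],
    },
    {
        url: "https://example.com/about",
        hrefs: ["../"],
        srcs: ["/logo.png", "/team.jpg"],
    },
];

console.log(resolveLinks(pages[0]));
console.log(runImagesActor(pages, /^https:\/\/example\.com\//));
```

Keeping the DOM queries apart from this pure logic makes the deduplication and URL handling easy to test without launching a browser.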