
scpr

scpr is a simple, straightforward web-scraping CLI tool that scrapes pages as markdown content. It is designed to be used both by humans and by coding agents (either as an MCP server or as a skill).

scpr is written in Go and based on colly for web scraping and html-to-markdown for converting HTML pages to markdown.

Installation

Install with Go (v1.24+ required):

go install github.com/AstraBert/scpr@latest

Or install globally with npm:

npm install -g @cle-does-things/scpr

Extra instructions for Windows installation

If you are on Windows, scpr might not be available immediately after a global npm installation. In that case, you may need to take a few extra steps:

  1. Find where the node executable is stored on your machine:
Get-Command node

This will print the directory where node.exe is stored; scpr will be installed at .\bin\scpr.exe inside that folder.

Note

If you are using nvm for Windows, node.exe will be at C:\Users\nvm4w\nodejs

  2. Add {NODE_FOLDER}\bin (with nvm: C:\Users\nvm4w\nodejs\bin) to your PATH environment variable. Follow this guide for instructions on how to set PATH environment variables.
  3. Restart your computer.
  4. Run scpr --help from your terminal. The first execution might be flagged by your antivirus, but since the executable does not contain any harmful code, the antivirus will eventually allow it.

Usage

As a CLI tool

Basic usage (scrape a single page):

scpr --url https://example.com --output ./scraped

This will scrape the page and save it as a markdown file in the ./scraped folder.

Recursive scraping

To scrape a page and all linked pages within the same domain:

scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 3
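Conceptually, recursive scraping is a depth-limited crawl: start at the root URL, collect links, and only follow those on an allowed domain until the --max depth is reached. The sketch below illustrates that traversal in plain Go over a hypothetical in-memory link graph (a stand-in for fetching pages and extracting their anchors); it is not scpr's actual crawler.

```go
package main

import (
	"fmt"
	"strings"
)

// crawl visits pages breadth-first up to maxDepth, only following links
// that match allowedPrefix (a toy stand-in for the --allowed domain check).
// links maps each URL to the URLs it points to.
func crawl(start string, links map[string][]string, allowedPrefix string, maxDepth int) []string {
	type node struct {
		url   string
		depth int
	}
	visited := map[string]bool{start: true}
	var order []string
	queue := []node{{start, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)
		if cur.depth == maxDepth {
			continue // --max reached: scrape this page but follow no further links
		}
		for _, next := range links[cur.url] {
			if !visited[next] && strings.HasPrefix(next, allowedPrefix) {
				visited[next] = true
				queue = append(queue, node{next, cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	links := map[string][]string{
		"https://example.com":   {"https://example.com/a", "https://other.org/x"},
		"https://example.com/a": {"https://example.com/b"},
		"https://example.com/b": {"https://example.com/c"},
	}
	// other.org is filtered out, and /c sits beyond depth 2.
	fmt.Println(crawl("https://example.com", links, "https://example.com", 2))
	// Output: [https://example.com https://example.com/a https://example.com/b]
}
```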

Parallel scraping

Speed up recursive scraping with multiple threads:

scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2 --parallel 5
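The --parallel flag bounds how many pages are processed at once. A common way to implement that kind of bounded concurrency in Go is a counting semaphore over goroutines; the sketch below shows the pattern with a hypothetical fetch stand-in, not scpr's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeAll processes urls with at most `parallel` workers active at once,
// mirroring what a --parallel flag typically controls.
func scrapeAll(urls []string, parallel int, fetch func(string) string) []string {
	results := make([]string, len(urls))
	sem := make(chan struct{}, parallel) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			results[i] = fetch(u)    // each goroutine writes its own index
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	urls := []string{"https://example.com", "https://example.com/a", "https://example.com/b"}
	// fetch is a stand-in for the real HTTP request + markdown conversion.
	fetch := func(u string) string { return "scraped:" + u }
	for _, r := range scrapeAll(urls, 2, fetch) {
		fmt.Println(r)
	}
}
```

Results keep their input order regardless of which worker finishes first, since each goroutine writes to its own slot in the results slice.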

Additional options

  • --log - Set logging level (info, debug, warn, error)
  • --max - Maximum depth of pages to follow (default: 1)
  • --parallel - Number of concurrent threads (default: 1)
  • --allowed - Allowed domains for recursive scraping (can be specified multiple times)

For more details, run:

scpr --help

As a stdio MCP server

Start the MCP server with:

scpr mcp

And configure it in agents using:

{
  "mcpServers": {
    "web-scraping": {
      "type": "stdio",
      "command": "scpr",
      "args": [
        "mcp"
      ],
      "env": {}
    }
  }
}

The JSON snippet above is in the format used by Claude Code; adapt it to your agent before using it.

Contributing

Contributions are welcome! Please read the Contributing Guide to get started.

License

This project is licensed under the MIT License.
