Crawlarr

Crawlarr is a web crawler built with the Go programming language. Given a base URL, it searches the page's HTML for anchor tags (<a>), follows each discovered link, and repeats the process on every subsequent page until it runs out of links or reaches a user-defined maximum depth. Crawling is concurrent, which significantly increases speed.
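
To make the idea concrete, below is a minimal, hypothetical sketch of such a crawl loop in Go. It is illustrative only and not Crawlarr's actual code: it assumes the golang.org/x/net/html tokenizer for anchor extraction and uses naive one-goroutine-per-link concurrency, with no match-type filtering, politeness delay, or output file.

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
        "sync"

        "golang.org/x/net/html"
    )

    type crawler struct {
        mu       sync.Mutex
        seen     map[string]bool
        wg       sync.WaitGroup
        maxDepth int
    }

    // crawl fetches one page, prints every link it finds, then follows
    // each unvisited link in its own goroutine until maxDepth is reached.
    func (c *crawler) crawl(pageURL string, depth int) {
        defer c.wg.Done()
        if depth > c.maxDepth {
            return
        }
        c.mu.Lock()
        visited := c.seen[pageURL]
        c.seen[pageURL] = true
        c.mu.Unlock()
        if visited {
            return
        }

        resp, err := http.Get(pageURL)
        if err != nil {
            return
        }
        defer resp.Body.Close()

        base, err := url.Parse(pageURL)
        if err != nil {
            return
        }
        for _, link := range extractLinks(base, resp.Body) {
            fmt.Println(link)
            c.wg.Add(1)
            go c.crawl(link, depth+1) // one goroutine per discovered link
        }
    }

    // extractLinks scans the HTML for <a> tags and returns each href,
    // resolved against the page URL so relative links become absolute.
    func extractLinks(base *url.URL, body io.Reader) []string {
        var links []string
        z := html.NewTokenizer(body)
        for {
            tt := z.Next()
            if tt == html.ErrorToken {
                return links // io.EOF or a parse error ends the scan
            }
            if tt != html.StartTagToken && tt != html.SelfClosingTagToken {
                continue
            }
            name, hasAttr := z.TagName()
            if string(name) != "a" || !hasAttr {
                continue
            }
            for {
                key, val, more := z.TagAttr()
                if string(key) == "href" {
                    if u, err := base.Parse(string(val)); err == nil {
                        links = append(links, u.String())
                    }
                }
                if !more {
                    break
                }
            }
        }
    }

    func main() {
        c := &crawler{seen: make(map[string]bool), maxDepth: 2}
        c.wg.Add(1)
        go c.crawl("http://example.com/", 0)
        c.wg.Wait()
    }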

See the changelog for the latest updates.

Table of contents

  • Installation
    • Compiling from source
  • Usage
  • Configuring Crawlarr
  • License

Usage

  • With a binary:
    • Make the binary executable with chmod +x crawlarr.
    • Start the tool with ./crawlarr.
  • Running from source:
    • Start the tool with go run ./cmd/crawlarr/main.go, or with cd ./cmd/crawlarr/ && go run .

Find the results in links.txt.

Configuring Crawlarr

The config can be found at the root of the project.

  • Open the config in your favorite editor.
  • Enable the features you want to use. See Config details for in-depth explanations.

Config details

Item        Values    Meaning
debug       boolean   Enable debug logs
baseUrl     text      The URL to start crawling from
matchType   text      Matching type for URLs (see Matching types below)
depthLimit  number    Maximum crawling depth
delay       number    Delay in ms between crawls
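
For illustration, a configuration using these options might look like the following. This assumes a JSON-style config file; check the file at the project root for its actual name and format.

    {
        "debug": false,
        "baseUrl": "http://example.com/",
        "matchType": "SAME_HOST",
        "depthLimit": 3,
        "delay": 250
    }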

Matching types

  • SAME_BASE:
    Match the same base URL, e.g.:

    baseUrl: "http://example.com/this-page/"
    + valid match: http://example.com/this-page/random-page/
    - discarded match: http://example.com/another-page/
    - discarded match: http://test.example.com/
    - discarded match: http://random.site/a-third-page/
  • SAME_HOST:
    Match the same host, e.g.:

    baseUrl: "http://example.com/this-page/"
    + valid match: http://example.com/this-page/random-page/
    + valid match: http://example.com/another-page/
    - discarded match: http://test.example.com/
    - discarded match: http://random.site/another-page/
  • SAME_ORIGIN:
    Match the same origin, e.g.:

    baseUrl: "http://example.com/this-page/"
    + valid match: http://example.com/this-page/random-page/
    + valid match: http://example.com/another-page/
    + valid match: http://test.example.com/
    - discarded match: http://random.site/another-page/
  • DANGEROUS_NO_MATCH_TYPE_ONLY_ENABLE_IF_YOU_KNOW_WHAT_YOURE_DOING:
    Match any URL (this can crawl far beyond the starting site), e.g.:

    baseUrl: "http://example.com/this-page/"
    + valid match: http://example.com/this-page/random-page/
    + valid match: http://example.com/another-page/
    + valid match: http://test.example.com/
    + valid match: http://random.site/another-page/
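
As a sketch of how these four rules could be implemented (hypothetical, not Crawlarr's actual code), the behaviors map naturally onto Go's standard net/url package:

    package matchsketch

    import (
        "net/url"
        "strings"
    )

    // matches reports whether candidate should be crawled, given the
    // configured matchType and the parsed base URL.
    func matches(matchType string, base, candidate *url.URL) bool {
        switch matchType {
        case "SAME_BASE":
            // The candidate must start with the full base URL.
            return strings.HasPrefix(candidate.String(), base.String())
        case "SAME_HOST":
            // Hosts must be identical; subdomains do not match.
            return candidate.Hostname() == base.Hostname()
        case "SAME_ORIGIN":
            // The base host and any of its subdomains match.
            return candidate.Hostname() == base.Hostname() ||
                strings.HasSuffix(candidate.Hostname(), "."+base.Hostname())
        default:
            // The "dangerous" mode accepts every URL.
            return true
        }
    }

With baseUrl http://example.com/this-page/, SAME_ORIGIN accepts http://test.example.com/ because its host ends in .example.com, while SAME_HOST rejects it because the hosts differ.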

License

Copyright (c) 2023 LockBlock-dev

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with Crawlarr. If not, see https://www.gnu.org/licenses/.
