Crawlarr is a web crawler written in Go. Given a base URL, it searches the page's HTML for all anchor tags (`<a>`), follows those links, and repeats the process on each subsequent page until it reaches either the end of the website or a user-defined maximum depth. Pages are crawled concurrently to speed up the process.
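The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not Crawlarr's actual implementation: the `crawl` function, the `fetchLinks` type, and the in-memory `demoSite` map are hypothetical names standing in for real HTTP fetching and HTML parsing.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchLinks is a hypothetical stand-in for "download the page and
// extract the href of every <a> tag".
type fetchLinks func(url string) []string

// crawl visits pages level by level, fetching every page of a level
// concurrently. It stops when no new links turn up or when depthLimit
// levels have been fetched.
func crawl(base string, depthLimit int, fetch fetchLinks) []string {
	visited := map[string]bool{base: true}
	level := []string{base}

	for depth := 0; depth < depthLimit && len(level) > 0; depth++ {
		var (
			mu   sync.Mutex
			wg   sync.WaitGroup
			next []string
		)
		for _, u := range level {
			wg.Add(1)
			go func(u string) {
				defer wg.Done()
				links := fetch(u)
				// The visited set is shared, so guard it with the mutex.
				mu.Lock()
				defer mu.Unlock()
				for _, l := range links {
					if !visited[l] {
						visited[l] = true
						next = append(next, l)
					}
				}
			}(u)
		}
		wg.Wait()
		level = next
	}

	found := make([]string, 0, len(visited))
	for u := range visited {
		found = append(found, u)
	}
	sort.Strings(found)
	return found
}

// demoSite is an assumed in-memory website used only for this example.
func demoSite(url string) []string {
	site := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/c"},
		"/b": {"/a"},
		"/c": {"/d"},
	}
	return site[url]
}

func main() {
	fmt.Println(crawl("/", 2, demoSite))
}
```

With a depth limit of 2, the sketch reaches `/a`, `/b`, and `/c` but never fetches `/c`, so `/d` is not discovered. A larger limit lets the crawl run until no unvisited links remain.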
See the changelog for the latest updates.
- Download Go (Go 1.20 required).
- Download or clone the project.
- Download the binary from the Releases or build it yourself.
- Configure Crawlarr.
- Use `build.sh` or run `go build` in `cmd/crawlarr/`.
- With a binary:
  - Run `chmod +x crawlarr`.
  - Start the tool with `./crawlarr`.
- Running from source:
  - Start the tool with `go run ./cmd/crawlarr/main.go` or `cd ./cmd/crawlarr/ && go run .`
Find the results in `links.txt`.
The config can be found at the root of the project.
- Open the `config` in your favorite editor.
- Enable the features you want to use. See Config details for in-depth explanations.
Item | Values | Meaning |
---|---|---|
debug | boolean | Enable debug logs |
baseUrl | text | The URL to start with |
matchType | text | Matching type for URL |
depthLimit | number | Maximum crawling depth |
delay | number | Delay in ms between crawls |
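As a rough illustration of how these options fit together, the sketch below models them as a Go struct with a basic sanity check. It makes several assumptions: the `Config` type, its field types, and the `validate` and `crawlDelay` helpers are hypothetical and not taken from Crawlarr's source, and the on-disk format of the config file is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"time"
)

// Config mirrors the options in the table above. The Go field names and
// types are assumptions made for this example.
type Config struct {
	Debug      bool   // debug: enable debug logs
	BaseURL    string // baseUrl: the URL to start with
	MatchType  string // matchType: matching type for URLs
	DepthLimit int    // depthLimit: maximum crawling depth
	Delay      int    // delay: delay in ms between crawls
}

// crawlDelay converts the millisecond delay into a time.Duration.
func (c Config) crawlDelay() time.Duration {
	return time.Duration(c.Delay) * time.Millisecond
}

// validate is a hypothetical sanity check for the values above.
func (c Config) validate() error {
	u, err := url.Parse(c.BaseURL)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return errors.New("baseUrl must be an absolute URL")
	}
	switch c.MatchType {
	case "SAME_BASE", "SAME_HOST", "SAME_ORIGIN",
		"DANGEROUS_NO_MATCH_TYPE_ONLY_ENABLE_IF_YOU_KNOW_WHAT_YOURE_DOING":
		// Known match type.
	default:
		return fmt.Errorf("unknown matchType %q", c.MatchType)
	}
	if c.DepthLimit < 0 || c.Delay < 0 {
		return errors.New("depthLimit and delay must not be negative")
	}
	return nil
}

func main() {
	cfg := Config{
		BaseURL:    "http://example.com/this-page/",
		MatchType:  "SAME_HOST",
		DepthLimit: 3,
		Delay:      250,
	}
	fmt.Println(cfg.validate(), cfg.crawlDelay())
}
```

Keeping the delay as a plain number of milliseconds and converting it in one place avoids unit mix-ups when the value is later passed to something like `time.Sleep`.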
- `SAME_BASE`: Match the same base URL, e.g. with baseUrl: "http://example.com/this-page/"
  - valid match: http://example.com/this-page/random-page/
  - discarded match: http://example.com/another-page/
  - discarded match: http://test.example.com/
  - discarded match: http://random.site/a-third-page/
- `SAME_HOST`: Match the same host, e.g. with baseUrl: "http://example.com/this-page/"
  - valid match: http://example.com/this-page/random-page/
  - valid match: http://example.com/another-page/
  - discarded match: http://test.example.com/
  - discarded match: http://random.site/another-page/
- `SAME_ORIGIN`: Match the same origin, e.g. with baseUrl: "http://example.com/this-page/"
  - valid match: http://example.com/this-page/random-page/
  - valid match: http://example.com/another-page/
  - valid match: http://test.example.com/
  - discarded match: http://random.site/another-page/
- `DANGEROUS_NO_MATCH_TYPE_ONLY_ENABLE_IF_YOU_KNOW_WHAT_YOURE_DOING`: Match any URL (this can go very far), e.g. with baseUrl: "http://example.com/this-page/"
  - valid match: http://example.com/this-page/random-page/
  - valid match: http://example.com/another-page/
  - valid match: http://test.example.com/
  - valid match: http://random.site/another-page/
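The four match types can be approximated with a small filter function like the one below. This is a sketch that reproduces the examples listed above, not Crawlarr's actual code: the `matches` function is a hypothetical name, and treating `SAME_ORIGIN` as "the base host or any of its subdomains" is an assumption inferred from the `test.example.com` example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// matches reports whether link would be followed under the given match
// type. Hypothetical helper written to mirror the examples above.
func matches(matchType, baseURL, link string) bool {
	base, err := url.Parse(baseURL)
	if err != nil {
		return false
	}
	target, err := url.Parse(link)
	if err != nil {
		return false
	}
	baseHost, host := base.Hostname(), target.Hostname()

	switch matchType {
	case "SAME_BASE":
		// Same host, and the link's path sits under the base path.
		return host == baseHost && strings.HasPrefix(target.Path, base.Path)
	case "SAME_HOST":
		return host == baseHost
	case "SAME_ORIGIN":
		// Assumption: subdomains of the base host also count.
		return host == baseHost || strings.HasSuffix(host, "."+baseHost)
	case "DANGEROUS_NO_MATCH_TYPE_ONLY_ENABLE_IF_YOU_KNOW_WHAT_YOURE_DOING":
		return true
	default:
		return false
	}
}

func main() {
	base := "http://example.com/this-page/"
	for _, link := range []string{
		"http://example.com/this-page/random-page/",
		"http://example.com/another-page/",
		"http://test.example.com/",
		"http://random.site/another-page/",
	} {
		fmt.Println(link, matches("SAME_HOST", base, link))
	}
}
```

Note that this sketch compares hostnames only, so it ignores scheme and port differences. A stricter filter would compare those as well.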
Copyright (c) 2023 LockBlock-dev
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with [project name]. If not, see https://www.gnu.org/licenses/.