This web crawler crawls a given website and generates a report of all the internal and external links found during the crawl.
- Code Flow: https://codeflow.dananglin.me.uk/apollo/web-crawler
- GitHub: https://github.com/dananglin/web-crawler
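
Whether a link counts as internal or external comes down to comparing its host with the host of the start URL, after resolving relative links against the current page. The snippet below is a minimal sketch of that classification using Go's standard net/url package; it is illustrative only and is not taken from the crawler's source.

```go
package main

import (
	"fmt"
	"net/url"
)

// isInternal reports whether link points to the same host as base.
// Relative links resolve against base, so they count as internal.
func isInternal(base *url.URL, link string) (bool, error) {
	u, err := url.Parse(link)
	if err != nil {
		return false, err
	}
	resolved := base.ResolveReference(u)
	return resolved.Host == base.Host, nil
}

func main() {
	base, _ := url.Parse("https://crawler-test.com")
	for _, link := range []string{"/about", "https://example.org/page"} {
		internal, err := isInternal(base, link)
		if err != nil {
			continue
		}
		fmt.Printf("%s internal=%v\n", link, internal)
	}
}
```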
- Go: Go 1.23.0 or later is required to build or install the web crawler. You can download the latest version from https://go.dev/dl/.
Clone this repository to your local machine.

```
git clone https://github.com/dananglin/web-crawler.git
```
Build the application.
- Build with go:

  ```
  go build -o crawler .
  ```

- Or build with mage if you have it installed:

  ```
  mage build
  ```
Run the application, specifying the website that you want to crawl.

```
./crawler [FLAGS] URL
```
- Crawl the [Crawler Test Site](https://crawler-test.com).

  ```
  ./crawler https://crawler-test.com
  ```

- Crawl the site using 3 concurrent workers and stop the crawl after discovering a maximum of 100 unique pages.

  ```
  ./crawler --max-workers 3 --max-pages 100 https://crawler-test.com
  ```

- Crawl the site and print out a JSON report.

  ```
  ./crawler --max-workers 3 --max-pages 100 --format json https://crawler-test.com
  ```

- Crawl the site and save the report to a CSV file.

  ```
  mkdir -p reports
  ./crawler --max-workers 3 --max-pages 100 --format csv --file reports/report.csv https://crawler-test.com
  ```
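The --max-workers and --max-pages flags bound the crawl: the first caps how many pages are fetched concurrently, the second caps how many unique pages are discovered in total. The sketch below illustrates one common way to implement those two bounds in Go, using a buffered channel as a semaphore and a shared visited set. It shows the pattern only, not the crawler's actual code, and fetchLinks is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
)

// crawl visits pages starting from seed, with at most maxWorkers
// concurrent fetches and at most maxPages unique pages overall.
func crawl(seed string, maxWorkers, maxPages int) {
	var (
		mu      sync.Mutex
		visited = map[string]bool{}
		wg      sync.WaitGroup
		sem     = make(chan struct{}, maxWorkers) // limits concurrency
	)

	var visit func(page string)
	visit = func(page string) {
		defer wg.Done()
		sem <- struct{}{}        // acquire a worker slot
		defer func() { <-sem }() // release it when done

		for _, link := range fetchLinks(page) {
			mu.Lock()
			if visited[link] || len(visited) >= maxPages {
				mu.Unlock()
				continue
			}
			visited[link] = true
			mu.Unlock()
			wg.Add(1)
			go visit(link)
		}
	}

	visited[seed] = true
	wg.Add(1)
	go visit(seed)
	wg.Wait()
	fmt.Printf("visited %d pages\n", len(visited))
}

// fetchLinks is a hypothetical stand-in; a real crawler would perform
// an HTTP GET and parse the HTML for anchor tags.
func fetchLinks(page string) []string { return nil }

func main() {
	crawl("https://crawler-test.com", 3, 100)
}
```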
You can configure the application with the following flags.
| Name | Description | Default |
|---|---|---|
| max-workers | The maximum number of concurrent workers. | 2 |
| max-pages | The maximum number of pages the crawler can discover before stopping the crawl. | 10 |
| format | The format of the generated report. Currently supports text, csv or json. | text |
| file | The file to save the generated report to. Leave this empty to print to the screen instead. | |
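
As a rough illustration of how a CLI like this might wire up the flags in the table, here is a minimal sketch using Go's standard flag package (which accepts both -name and --name forms). Whether the crawler actually parses its flags this way is an assumption; the names and defaults simply mirror the table above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// Flag names and defaults mirror the table above; that the crawler
	// uses the standard flag package is an assumption.
	maxWorkers := flag.Int("max-workers", 2, "maximum number of concurrent workers")
	maxPages := flag.Int("max-pages", 10, "maximum number of pages to discover")
	format := flag.String("format", "text", "report format: text, csv or json")
	file := flag.String("file", "", "file to save the report to (empty prints to the screen)")
	flag.Parse()

	// Exactly one positional argument is expected: the URL to crawl.
	if flag.NArg() != 1 {
		fmt.Fprintln(os.Stderr, "usage: crawler [FLAGS] URL")
		os.Exit(1)
	}
	url := flag.Arg(0)

	fmt.Println(url, *maxWorkers, *maxPages, *format, *file)
}
```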