A simple crawler that checks the HTTP status code of every URL on your site, written in Go and shipped in a distroless container (only ~11 MB)!
Also visit the Docker Hub repository.
Basically, it is a simple app that parses your sitemap.xml and then makes a request to each URL on your site. It is very useful for CI integration (for example, to test every URL and fail the build if any URL returns a bad HTTP status code).
You can also configure the size of the worker pool to increase the number of parallel tasks and process all URLs more quickly (the default is 1)!
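A fixed-size worker pool is the usual Go pattern behind a setting like `WORKER_SIZE`: N goroutines read URLs from a shared channel. The sketch below assumes that design rather than reproducing the project's code; `checkAll`, `result`, and the injected `check` function are hypothetical, and the fake checker in `main` avoids real network calls.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// result pairs a URL with the status code (or error) observed for it.
type result struct {
	URL    string
	Status int
	Err    error
}

// checkAll fans the URLs out to workerSize goroutines and collects one
// result per URL. check is whatever function performs the actual request.
func checkAll(urls []string, workerSize int, check func(string) (int, error)) []result {
	if workerSize < 1 {
		workerSize = 1 // fall back to the documented default of 1
	}
	jobs := make(chan string)
	// Buffered so workers never block on sending results.
	results := make(chan result, len(urls))

	var wg sync.WaitGroup
	for i := 0; i < workerSize; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				status, err := check(u)
				results <- result{URL: u, Status: status, Err: err}
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	close(results)

	out := make([]result, 0, len(urls))
	for r := range results {
		out = append(out, r)
	}
	// Workers finish in arbitrary order; sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].URL < out[j].URL })
	return out
}

func main() {
	fake := func(u string) (int, error) {
		if u == "/broken" {
			return 500, nil
		}
		return 200, nil
	}
	for _, r := range checkAll([]string{"/", "/about", "/broken"}, 2, fake) {
		fmt.Println(r.URL, r.Status)
	}
}
```

Raising the worker count only helps while the site can absorb the extra concurrent requests; each worker still handles one URL at a time.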
- Docker Engine. ❤️

Just run this command in your terminal:

```shell
docker run \
  -e 'HOST=https://www.enriquetejeda.com' \
  etejeda/crawler-http-checker:latest
```
- Rename `.env.example` to `.env` and configure the values.
- Compile with the command `make build`.
- Run the command `make run`.
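Example `Jenkinsfile` (declarative pipeline):

```groovy
#!groovy
pipeline {
    agent { node { label 'master' } }
    options { skipDefaultCheckout true }
    stages {
        stage('Build') {
            steps {
                checkout scm
            }
        }
        stage('Test') {
            steps {
                echo 'Verify all urls..'
                // Run the checker in the foreground: `docker run` returns the
                // crawler's exit code, so a bad URL fails this stage.
                // (docker.image(...).run(...) starts a detached container and
                // needs a script {} block in a declarative pipeline.)
                sh 'docker run --rm -e HOST=https://www.enriquetejeda.com etejeda/crawler-http-checker:latest'
            }
        }
        stage('Deploy') {
            steps {
                echo 'deploy'
            }
        }
    }
}
```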
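The `Test` stage only fails if the container exits with a non-zero code. A minimal sketch of how a checker could turn its results into an exit code, assuming that any status outside 2xx counts as "bad" (the tool's real threshold isn't documented here); `isBadStatus` and `exitCode` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// isBadStatus treats any status outside 2xx as a failure (an assumption).
// Status 0 can stand for "no response", e.g. a network error.
func isBadStatus(code int) bool {
	return code < 200 || code > 299
}

// exitCode reports every bad URL on stderr and returns 1 if there was
// at least one, otherwise 0.
func exitCode(statuses map[string]int) int {
	code := 0
	for u, status := range statuses {
		if isBadStatus(status) {
			fmt.Fprintf(os.Stderr, "bad status %d for %s\n", status, u)
			code = 1
		}
	}
	return code
}

func main() {
	// A real CLI would pass this value to os.Exit so the CI job sees the failure.
	fmt.Println("all ok:", exitCode(map[string]int{"/": 200, "/about": 200}))
	fmt.Println("one 404:", exitCode(map[string]int{"/": 200, "/missing": 404}))
}
```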
To compile the binary, I provided a Makefile; just run:

```shell
make build
```

To build the Docker image, just run:

```shell
make build-docker
```
| Name | Description | Default | Required |
|---|---|---|---|
| HOST | The host to scan | - | yes |
| NEW_HOST | Replacement host, if you need to rewrite the URLs to a different one | - | no |
| USER_AGENT | User-Agent used for each request | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | no |
| WORKER_SIZE | Number of parallel tasks | 1 | no |
| SITEMAP_FILENAME | Filename of the sitemap | sitemap.xml | no |
Feel free to contribute: fork the repository and open a pull request! ❤️
Like this project? Please give it a ★ on GitHub (it helps me a lot)!
See LICENSE for full details.
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.