Skip to content

Bviolet12/Cheker

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitHub Actions: My Broken Link Checker âś”

GitHub Marketplace license release GitHub release date GitHub Actions status Docker Hub Build Status

This is a GitHub Action to check broken link in your static files or web pages. The muffet is used for URL checking task.

See the basic GitHub Action example to run periodic checks (weekly) against google.com:

on:
  schedule:
    - cron: '0 0 * * 0'

name: Check markdown links
jobs:
  my-broken-link-checker:
    name: Check broken links
    runs-on: ubuntu-latest
    steps:
      - name: Check
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://www.google.com
          cmd_params: "--one-page-only --max-connections=3 --color=always"  # Check just one page

Check out the real demo:

My Broken Link Checker demo

This deploy action can be combined with Static Site Generators (Hugo, MkDocs, Gatsby, GitBook, mdBook, etc.). The following examples expects to have the web page stored in ./build directory. There is a caddy web server started during the tests which is using the hostname from the URL parameter and serving the web pages (see the details in entrypoint.sh).

- name: Check
  uses: ruzickap/action-my-broken-link-checker@v2
  with:
    url: https://www.example.com/test123
    pages_path: ./build/
    cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --timeout=20'  # muffet parameters

Do you want to skip the docker build step? OK, the script mode is also available:

- name: Check
  env:
    INPUT_URL: https://www.example.com/test123
    INPUT_PAGES_PATH: ./build/
    INPUT_CMD_PARAMS: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --skip-tls-verification'  # --skip-tls-verification is mandatory parameter when using https and "PAGES_PATH"
  run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

Parameters

Environment variables used by ./entrypoint.sh script.

Variable Default Description
INPUT_CMD_PARAMS --buffer-size=8192 --max-connections=10 --color=always --verbose Command-line parameters for URL checker muffet - details here
INPUT_DEBUG false Enable debug mode for the ./entrypoint.sh script (set -x)
INPUT_PAGES_PATH Relative path to the directory with local web pages
INPUT_URL (Mandatory / Required) URL which will be checked

Example of Periodic checks

Pipeline for periodic link checks:

name: periodic-broken-link-checks

on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/periodic-broken-link-checks.yml
  schedule:
    - cron: '3 3 * * 3'

jobs:
  broken-link-checker:
    runs-on: ubuntu-latest
    steps:

      - name: Get GH Pages URL
        id: gh_pages_url
        uses: actions/github-script@v5
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            let result = await github.request('GET /repos/:owner/:repo/pages', {
              owner: context.repo.owner,
              repo: context.repo.repo
            });
            console.log(result.data.html_url);
            return result.data.html_url
          result-encoding: string

      - name: Check broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: ${{ steps.gh_pages_url.outputs.result }}
          cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --timeout=20'

Full example

GitHub Action example:

name: Checks

on:
  push:
    branches:
      - main

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Create web page
        run: |
          mkdir -v public
          cat > public/index.html << EOF
          <!DOCTYPE html>
          <html>
            <head>
              My page which will be stored on my-testing-domain.com domain
            </head>
            <body>
              Links:
              <ul>
                <li><a href="https://my-testing-domain.com">https://my-testing-domain.com</a></li>
                <li><a href="https://my-testing-domain.com:443">https://my-testing-domain.com:443</a></li>
              </ul>
            </body>
          </html>
          EOF

      - name: Check links using script
        env:
          INPUT_URL: https://my-testing-domain.com
          INPUT_PAGES_PATH: ./public/
          INPUT_CMD_PARAMS: '--skip-tls-verification --verbose --color=always'
          INPUT_DEBUG: true
        run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--skip-tls-verification --verbose --color=always'
          debug: true

Best practices

Let's try to automate the creating the web pages as much as possible.

The ideal situation require the repository naming convention, where the name of the GitHub repository should match the URL where it will be hosted.

GitHub Pages with custom domain

The mandatory part is the repository name awsug.cz which is the same as the domain:

The web pages will be stored as GitHub Pages on it's own domain.

The GH Action file may looks like:

name: hugo-build

on:
  pull_request:
    types: [opened, synchronize]
  push:

jobs:
  hugo-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Checkout submodules
        shell: bash
        run: |
          auth_header="$(git config --local --get http.https://github.com/.extraheader)"
          git submodule sync --recursive
          git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --force --recursive --depth=1

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: '0.62.0'

      - name: Build
        run: |
          hugo --gc
          cp LICENSE README.md public/
          echo "${{ github.event.repository.name }}" > public/CNAME

      - name: Check broken links
        env:
          INPUT_URL: https://${{ github.event.repository.name }}
          INPUT_PAGES_PATH: public
          INPUT_CMD_PARAMS: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --exclude="(mylabs.dev|linkedin.com)"'
        run: |
          wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --exclude="(mylabs.dev|linkedin.com)"'
          debug: true

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.event_name }} == 'push' && github.ref == 'refs/heads/main'
        env:
          ACTIONS_DEPLOY_KEY: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          PUBLISH_BRANCH: gh-pages
          PUBLISH_DIR: public
        with:
          forceOrphan: true

The example is using Hugo.

GitHub Pages with github.io domain

The mandatory part is the repository name k8s-harbor which is the directory part at the and of ruzickap.github.io:

In the example the web pages will be using GitHub's domain github.io.

name: vuepress-build-check-deploy

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json
  push:
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json

jobs:
  vuepress-build-check-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install Node.js 12
        uses: actions/setup-node@v1
        with:
          node-version: 12.x

      - name: Install VuePress and build the document
        run: |
          npm install
          npm run build
          cp LICENSE docs/.vuepress/dist
          sed -e "s@(part-@(https://github.com/${GITHUB_REPOSITORY}/tree/main/docs/part-@" -e 's@.\/.vuepress\/public\/@./@' docs/README.md > docs/.vuepress/dist/README.md
          ln -s docs/.vuepress/dist ${{ github.event.repository.name }}

      - name: Check broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}
          pages_path: .
          cmd_params: '--exclude=mylabs.dev --max-connections-per-host=5 --rate-limit=5 --timeout=20 --header="User-Agent:curl/7.54.0" --skip-tls-verification'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.event_name }} == 'push' && github.ref == 'refs/heads/main'
        env:
          ACTIONS_DEPLOY_KEY: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          PUBLISH_BRANCH: gh-pages
          PUBLISH_DIR: ./docs/.vuepress/dist
        with:
          forceOrphan: true

In this case I'm using VuePress to create my page.

GitHub Action my-broken-link-checker


Both examples can be used as a generic template, and you do not need to change them for your projects.

Running locally

It's possible to use the checking script locally. It will install caddy and muffet binaries if they are not already installed on your system.

export INPUT_URL="https://google.com"
export INPUT_CMD_PARAMS="--ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
./entrypoint.sh

Output:

*** INFO: [2019-12-30 14:53:54] Start checking: "https://google.com"
https://www.google.com/
        200     http://www.google.cz/history/optout?hl=cs
        200     http://www.google.cz/intl/cs/services/
        200     https://accounts.google.com/ServiceLogin?hl=cs&passive=true&continue=https://www.google.com/
        200     https://drive.google.com/?tab=wo
        200     https://mail.google.com/mail/?tab=wm
        200     https://maps.google.cz/maps?hl=cs&tab=wl
        200     https://news.google.cz/nwshp?hl=cs&tab=wn
        200     https://play.google.com/?hl=cs&tab=w8
        200     https://www.google.com/advanced_search?hl=cs&authuser=0
        200     https://www.google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png
        200     https://www.google.com/intl/cs/about.html
        200     https://www.google.com/intl/cs/ads/
        200     https://www.google.com/intl/cs/policies/privacy/
        200     https://www.google.com/intl/cs/policies/terms/
        200     https://www.google.com/language_tools?hl=cs&authuser=0
        200     https://www.google.com/preferences?hl=cs
        200     https://www.google.com/setprefdomain?prefdom=CZ&prev=https://www.google.cz/&sig=K_WmKyDZc24PJiXFyTjsUeLLrG-P4%3D
        200     https://www.google.com/textinputassistant/tia.png
        200     https://www.google.cz/imghp?hl=cs&tab=wi
        200     https://www.google.cz/intl/cs/about/products?tab=wh
        200     https://www.youtube.com/?gl=CZ&tab=w1
*** INFO: [2019-12-30 14:53:55] Checks completed...

You can also use the advantage of the container to run the checks locally without touching your system:

export INPUT_URL="https://google.com"
export INPUT_CMD_PARAMS="--ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
docker run --rm -t -e INPUT_URL -e INPUT_CMD_PARAMS peru/my-broken-link-checker

my-broken-link-checker-demo

Another example when checking the the web page locally stored on your disk. In this case I'm using the web page created in the ./tests/ directory from this git repository:

export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose --color=always"
./entrypoint.sh

Output:

*** INFO: Using path "/home/pruzicka/git/action-my-broken-link-checker/tests/" as domain "my-testing-domain.com" with URI "https://my-testing-domain.com"
*** INFO: [2019-12-30 14:54:22] Start checking: "https://my-testing-domain.com"
https://my-testing-domain.com/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
https://my-testing-domain.com:443/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
*** INFO: [2019-12-30 14:54:22] Checks completed...

The same example as above, but in this case I'm using the container:

export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose"
docker run --rm -t -e INPUT_URL -e INPUT_CMD_PARAMS -e INPUT_PAGES_PATH -v "$INPUT_PAGES_PATH:$INPUT_PAGES_PATH" peru/my-broken-link-checker

Examples

Some other examples of building and checking web pages using Static Site Generators and GitHub Actions can be found here: https://github.com/peaceiris/actions-gh-pages/

The following links contains real examples of My Broken Link Checker: