Filters access log lines by verifying each client IP address with a reverse DNS lookup, matched against a given domain suffix.

verify-ip

Reads access log request lines on STDIN and prints only those from an IP that reverse-resolves to a domain matching the one specified.

An example use case is keeping only the web server access log lines that genuinely come from Google, rather than from a client spoofing Google's user agent string. For example, for Feedfetcher requests:

cat access.log |
  grep -E 'Feedfetcher-Google[^"]+"$' |
  verify-ip --domain 'google\.com'
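The core check is a reverse (PTR) DNS lookup of each client IP. As a minimal sketch in POSIX shell, assuming the `host` utility from BIND is installed and reports PTR records as `... domain name pointer NAME.`, a lookup helper might look like this (`reverse_lookup` is a hypothetical name, not part of verify-ip):

```shell
#!/bin/sh
# Hypothetical helper, not verify-ip's own code.
# Prints each PTR name for the IP in $1, one per line, keeping the trailing
# dot that `host` reports (e.g. "crawl-66-249-66-1.googlebot.com.").
# Prints nothing if the lookup fails or there is no PTR record.
reverse_lookup() {
  host "$1" </dev/null 2>/dev/null |
    awk '/domain name pointer/ { print $NF }'
}
```

Running `reverse_lookup 66.249.66.1` prints whatever PTR names are currently published for that address. The names keep their trailing dot, which the `\.$` anchoring described under Help & usage relies on.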

Comes with a convenience option, --google, that keeps only requests from Google's web crawlers (this does not include Feedfetcher):

verify-ip --google

License and contributing

Copyright (c) 2013 Adam Prescott https://aprescott.com/.

verify-ip is released under the MIT license. See LICENSE for details.

The quickest way to get changes contributed:

  1. Visit the GitHub repository.
  2. Fork the repository.
  3. Check out a branch from the latest master for your change: git checkout -b new-feature master; do not make changes on master!
  4. Send a pull request on GitHub, including a description of what you've changed.

Help & usage

Synopsis:

    verify-ip --domain[-pattern] REGEX_PATTERN
              [--google]
              [-h | --help]

Options:

    -h, --help

        Print the help page and exit.

    --domain[-pattern] REGEX_PATTERN

        When the IP at the start of an input line is put through a
        reverse lookup, only treat it as a "valid" IP if the domain found
        matches REGEX_PATTERN. Note that REGEX_PATTERN will be used as

            (^|\.?)${REGEX_PATTERN}\.$

        to anchor the match at the end of the fully-qualified domain name,
        so that, e.g., "googlebot.com.fakedomain.com." does not match
        "googlebot\.com", and "myfakegooglebot.com." does not match
        "googlebot\.com\.$".
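A minimal sketch of this end-anchoring with `grep -E`, assuming the PTR name keeps its trailing dot. Note that in `(^|\.?)` the `\.?` can match the empty string, so on its own it does not stop "myfakegooglebot.com." from matching `googlebot\.com`. This sketch therefore requires the dot, `(^|\.)`, and wraps the user pattern in a group so that an alternation such as `google\.com|googlebot\.com` stays anchored. The function name `domain_matches` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper, not verify-ip's own code.
# Succeeds if the fully-qualified name $1 (with trailing dot) ends in the
# domain pattern $2, starting at a label boundary.
# The regex handed to grep is: (^|\.)(PATTERN)\.$
domain_matches() {
  printf '%s\n' "$1" | grep -Eq "(^|\\.)($2)\\.\$"
}
```

For example, `domain_matches "crawl-66-249-66-1.googlebot.com." 'googlebot\.com' && echo valid` prints `valid`, while both "googlebot.com.fakedomain.com." and "myfakegooglebot.com." are rejected.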

    --google

        Optional.

        Pre-filters lines, keeping only those whose user agent contains a
        known Google web crawler user agent string fragment, such as
        "Mediapartners-Google", and behaves as if --domain "googlebot\.com"
        were passed.

        Assumes that the user agent strings do not contain any "
        characters and appear at the end of the line, as with
        the combined log format.
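The pre-filter can be sketched with `grep -E`, relying on the same assumption that the user agent is the last double-quoted field on the line. Only "Mediapartners-Google" is named above; the fragment list in this sketch (`Googlebot`, `Mediapartners-Google`) is an assumption, and verify-ip's own list may differ. The function name `google_prefilter` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper, not verify-ip's own code.
# Keeps only lines whose final double-quoted field (the user agent in the
# combined log format) contains a Google crawler fragment. Because [^"]*
# cannot cross a quote, a fragment elsewhere on the line (e.g. in the
# request path) does not count. The fragment list is an assumption.
google_prefilter() {
  grep -E '"[^"]*(Googlebot|Mediapartners-Google)[^"]*"$'
}
```

Lines that pass this filter would still need their IP verified against "googlebot\.com"; the user agent alone is trivially spoofable.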

Notes:

    All lines are assumed to contain an IP address as the first space-
    delimited token.

Examples:

    Simple usage:
  
        cat access.log | verify-ip --domain "foo\.com"

    Filter Google web crawler requests by UA + IP:

        cat access.log | verify-ip --google

    Keep only verified AdsBot-Google requests, matching the UA
    string at the end of a line in the combined log format:

        cat access.log | grep -E 'AdsBot-Google[^"]+"$' |
          verify-ip --domain 'googlebot\.com'

    Keep only Google Reader requests that come from Feedfetcher,
    which uses the "google.com" domain, as per
    http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182072 :

        cat access.log | grep -E 'Feedfetcher-Google' |
          verify-ip --domain "google\.com"
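Putting the pieces together, the whole filter can be sketched in POSIX shell. This is an illustration under stated assumptions, not verify-ip's actual source: it assumes the `host` utility is available, takes the first space-delimited token of each line as the IP (as described under Notes), and uses the hypothetical function names `reverse_lookup`, `names_match`, and `verify_ip_filter`:

```shell
#!/bin/sh
# Illustrative sketch of a verify-ip-style filter; not the tool's own source.
# Assumes the `host` utility (BIND) is available for PTR lookups.

# Print each PTR name for IP $1, one per line, with its trailing dot.
reverse_lookup() {
  host "$1" </dev/null 2>/dev/null |
    awk '/domain name pointer/ { print $NF }'
}

# Succeed if any name on stdin ends in domain pattern $1 at a label boundary.
names_match() {
  grep -Eq "(^|\\.)($1)\\.\$"
}

# Read log lines on stdin; print those whose first space-delimited token
# reverse-resolves to a name matching the pattern $1.
verify_ip_filter() {
  while IFS= read -r line || [ -n "$line" ]; do
    ip=${line%% *}               # first space-delimited token
    [ -n "$ip" ] || continue     # skip blank lines
    if reverse_lookup "$ip" | names_match "$1"; then
      printf '%s\n' "$line"
    fi
  done
}
```

Usage mirrors the examples above, e.g. `cat access.log | verify_ip_filter 'googlebot\.com'`. This sketch performs one DNS lookup per line; on large logs, caching the verdict per IP would avoid repeated lookups for the same crawler address.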