Reads web addresses from a CSV file and uses curl to record their HTTP status codes. Results are provided in a modified version of the original CSV file. Created and tested on Fedora 32 Linux.
There is no installation. The script can be cloned or copied. If copied, make sure the new file is executable.
Clone method.
git clone https://github.com/bcgov/site-status-checker
cd site-status-checkerCopy and paste method.
<copy and paste into count.sh>
chmod +x count.shRun the script without parameters for basic help.
./count.sh
Reads a CSV file of websites (header=URL) and outputs their statuses (header=HTTP_STATUS)
[TIMEOUT=10] ./count.sh ./input.csv [./results.csv]Required: Consume an input file with the first parameter.
./count.sh sites.csvOptional: Change the default file output from results.csv with the second parameter.
./count.sh sites.csv output-file.csvA CSV (comma separated value) is expected. Call the input column header URL. A second CSV file is created with the same content, but adding results under the HTTP_STATUS column. If not present, the column will be created.
URL is the default input header.
HTTP_STATUS is the default output header.
Input and output headers can be specified at runtime.
HEADER_IN=URL HEADER_OUT=HTTP_STATUS ./count.sh sites.csvThe default curl timeout has been set at 15 seconds, but may be changed as follows.
TIMEOUT=30 ./count.sh sites.csvA verbose mode is provided for troubleshooting.
VERBOSE=true ./count.sh sites.csvAny combination of variables and optional parameters can be specified.
VERBOSE=true TIMEOUT=30 HEADER_IN=URL HEADER_OUT=HTTP_STATUS ./count.sh sites.csv output-file.csvEmpty and filtered out addresses will be labeled Excluded.
FTP sites and local file shares cannot be used. E.g. ftp:// or \\.
Addresses that do not receive a response will be labeled No Response.
This may mean that the address is wrong, has been moved without providing a redirect or is down.
Sites that do respond will receive HTTP status codes. Please see Wikipedia for a List of HTTP status codes.
200-codes indicate success. Any other codes may be generalized as an error.
400-codes indicate client errors. They are usually related to rights or incorrect addresses.
500-codes indicate server errors. The site could be down, overloaded, timing out otherwise unavailable.
100-codes are informational. These will not be reported.
300 codes are redirections. The will be silently followed, resulting in 200, 400 or 500 codes.
Sample data.
| Won | Too | URL | Fore | HTTP_STATUS |
|---|---|---|---|---|
| sweet | bike | https://github.com/bcgov/site-status-checker | margin | 1 |
| encourage | egg | https://www.google.ca/ | 2 | |
| adventure | lily | https://www.google.ca/ (duplicate!) | threaten | 3 |
| traffic | \\things.yup.blorg.idir.yup | butterfly | 4 | |
| reckless | read | ftp://watermelon | conscious | 5 |
| sphere | appear | eato.burrito (this is not a thing) | gradient | |
| landowner | suite | https://www.facebook.com/marketplace | album | 7 |
| free | https://farts.com/ (redirects to thepooter.com) | reduction | 8 | |
| tribute | balance | www.amazon.ca/dp/B06XR8LS2L?keywords=parrot&pirate=yaar | infinite | 9 |
Sample run.
./count.sh sites.csv| Won | Too | URL | Fore | HTTP_STATUS |
|---|---|---|---|---|
| sweet | bike | https://github.com/bcgov/site-status-checker | margin | 200 |
| encourage | egg | https://www.google.ca/ | 200 | |
| adventure | lily | https://www.google.ca/ (duplicate!) | threaten | 200 |
| traffic | \\things.yup.blorg.idir.yup | butterfly | Excluded | |
| reckless | read | ftp://watermelon | conscious | Excluded |
| sphere | appear | eato.burrito (this is not a thing) | gradient | No Response |
| landowner | suite | https://www.facebook.com/marketplace | album | 200 |
| free | https://farts.com/ (redirects to thepooter.com) | reduction | 200 | |
| tribute | balance | www.amazon.ca/dp/B06XR8LS2L?keywords=parrot&pirate=yaar | infinite | 405 |
| 10 in / 10 out |
Sample run appending a non-standard header.
HEADER_OUT=NEW ./count.sh sites.csv| Won | Too | URL | Fore | HTTP_STATUS | NEW |
|---|---|---|---|---|---|
| sweet | bike | https://github.com/bcgov/site-status-checker | margin | 1 | 200 |
| encourage | egg | https://www.google.ca/ | 2 | 200 | |
| adventure | lily | https://www.google.ca/ (duplicate!) | threaten | 3 | 200 |
| traffic | \\things.yup.blorg.idir.yup | butterfly | 4 | Excluded | |
| reckless | read | ftp://watermelon | conscious | 5 | Excluded |
| sphere | appear | eato.burrito (this is not a thing) | gradient | No Response | |
| landowner | suite | https://www.facebook.com/marketplace | album | 7 | 200 |
| free | https://farts.com/ (redirects to thepooter.com) | reduction | 8 | 200 | |
| tribute | balance | www.amazon.ca/dp/B06XR8LS2L?keywords=parrot&pirate=yaar | infinite | 9 | 503 |
| 10 in / 10 out |
Julian Subda, Product Owner
Shivagani Murti, Technical Administrator
Derek Roberts, DevOps Specialst
Matt Lewis, creator of csvtomd-web, which formatted the CSV in this guide
Danny Guo, for the starter README.md
Please request features or issue corrections by submitting issues.
Pull requests are even better! Contributions will be squashed on merge after review and acceptance. Providing adequate description of changes makes this much easier.
Of course, please test thoroughly using sites.csv and any other CSV data.
Please be aware it is unsafe to provide confidential data to an online tool.