
Big Crawler IPs 🕷️

Welcome to Big Crawler IPs! This is a serverless function that routinely fetches Google's and Bing's official crawler IP address lists and saves them to a BigQuery table.

Deployment Guide

Watch the step-by-step deployment guide here.

Environment Variables

When deploying Big Crawler IPs, your Cloud Function needs the following environment variables (see the sketch after this list for how they might be read):

  • bqProjectId: Your Google Cloud Project's ID
  • bqDataset: The name of your BigQuery dataset
  • bqTable: The name of your table in BigQuery
  • gServiceAccount: Your Google Cloud Service Account
  • kgKey: A unique identifier to "authenticate" incoming HTTP requests
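
As a rough illustration of how these variables come into play, here is a minimal sketch of an HTTP entry point. This assumes a Python runtime; the handler name and the `key` query parameter used to carry kgKey are assumptions for illustration, not taken from this repository's source.

```python
# Minimal sketch of an HTTP entry point, assuming a Python Cloud Function.
# The handler name and the "key" query parameter are illustrative assumptions.
import os

def big_crawler_ips(request):
    # Reject requests that don't present the shared kgKey secret.
    if request.args.get("key") != os.environ["kgKey"]:
        return ("Forbidden", 403)

    project = os.environ["bqProjectId"]  # Google Cloud project ID
    dataset = os.environ["bqDataset"]    # BigQuery dataset name
    table = os.environ["bqTable"]        # BigQuery table name
    # gServiceAccount names the service account the function runs as.
    service_account = os.environ["gServiceAccount"]

    # ... fetch, diff, and insert crawler IPs (see "How It Works" below) ...
    return ("OK", 200)
```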

How It Works

Once deployed as a Google Cloud Function that is triggered by a cron-scheduled HTTP request, the function (see the code sketch after this list):

  1. Checks that your BigQuery table exists and creates it if it doesn't
  2. Gathers all of the existing IPs from your BigQuery table
  3. Scrapes the official GoogleBot and BingBot IP address files
  4. Cross-references the pre-existing IPs in BigQuery against the freshly scraped official IPs
  5. Saves the IPs that appear on the official lists but not yet in BigQuery
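
The steps above map naturally onto a short script. Below is a minimal sketch of that flow, assuming a Python runtime with the google-cloud-bigquery and requests libraries; the table schema and field names are illustrative assumptions, while the two feed URLs are the publicly documented GoogleBot and BingBot IP range files.

```python
# Minimal sketch of the sync flow described above. Assumes Python with
# google-cloud-bigquery and requests; schema and field names are assumptions.
import os
import requests
from google.cloud import bigquery
from google.api_core.exceptions import NotFound

# Publicly documented crawler IP range feeds. Both publish JSON of the shape
# {"prefixes": [{"ipv4Prefix": ...} or {"ipv6Prefix": ...}]}.
FEEDS = {
    "googlebot": "https://developers.google.com/search/apis/ipranges/googlebot.json",
    "bingbot": "https://www.bing.com/toolbox/bingbot.json",
}

# Assumed two-column table: which crawler the range belongs to, and the range.
SCHEMA = [
    bigquery.SchemaField("crawler", "STRING"),
    bigquery.SchemaField("ip_range", "STRING"),
]

def sync_crawler_ips():
    client = bigquery.Client(project=os.environ["bqProjectId"])
    table_id = f'{os.environ["bqProjectId"]}.{os.environ["bqDataset"]}.{os.environ["bqTable"]}'

    # 1. Make sure the table exists; create it if it doesn't.
    try:
        table = client.get_table(table_id)
    except NotFound:
        table = client.create_table(bigquery.Table(table_id, schema=SCHEMA))

    # 2. Gather the IP ranges already stored in BigQuery.
    query = client.query(f"SELECT ip_range FROM `{table_id}`")
    existing = {row["ip_range"] for row in query.result()}

    # 3. Scrape the official GoogleBot and BingBot IP range files.
    new_rows = []
    for crawler, url in FEEDS.items():
        prefixes = requests.get(url, timeout=30).json()["prefixes"]
        for prefix in prefixes:
            ip_range = prefix.get("ipv4Prefix") or prefix.get("ipv6Prefix")
            # 4. Keep only ranges that aren't already in the table.
            if ip_range and ip_range not in existing:
                new_rows.append({"crawler": crawler, "ip_range": ip_range})

    # 5. Save the ranges found on the official lists but not in BigQuery.
    if new_rows:
        client.insert_rows_json(table, new_rows)
    return len(new_rows)
```

Because only ranges missing from BigQuery are inserted, the job is idempotent and safe to run on any schedule.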
