MySQL to Google BigQuery Sync Tool
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bin Added support for .env files when installed via composer Oct 25, 2016
cache
src Added custom bigquery table name option Jul 14, 2017
.env.sample Removed google/apiclient dependecy Oct 27, 2016
.gitignore Added cache folder Oct 25, 2016
LICENSE First commit. Syncing a single table without table creation support. Oct 25, 2016
README.md +updated README Apr 6, 2017
composer.json
composer.lock Removed google/apiclient dependecy Oct 27, 2016

README.md

MySQL to Google BigQuery Logo

MySQL to Google BigQuery Sync Tool

Table of Contents

How it works

Steps when no order column has been supplied:

  • Count MySQL table rows
  • Count BigQuery table rows
  • MySQL rows > BigQuery rows?
  • Get the rows diff, split in batches of XXXXX rows/batch

Steps when order column has been supplied:

  • Get max value for order column from MySQL table
  • Get max value for order column from BigQuery table
  • Max value MySQL > Max value BigQuery?
  • Delete all rows with order column value = max value BigQuery to make sure no duplicate records are being created in BigQuery
  • Get max value for order column from BigQuery table
  • Get the rows diff based on new max value BigQuery, split in batches of XXXXX rows/batch

Final three steps:

  • Dump MySQL rows to a JSON
  • Send JSON to BigQuery
  • Repeat until all batches are sent

Tip: Create a cron job for keep syncing the tables using an interval like 15 minutes (respect the Load Jobs quota policy)

Requirements

The following PHP versions are supported:

  • PHP 7
  • HHVM
  • PDO Extension with MySQL driver

Usage

Download the library using composer:

$ composer require memeddev/mysql-to-google-bigquery

Now, define some environment variables or create a .env file on the root of the project, replacing the values:

BQ_PROJECT_ID=bigquery-project-id
BQ_KEY_FILE=google-service-account-json-key-file.json
BQ_DATASET=bigquery-dataset-name

DB_DATABASE_NAME=mysql-database-name
DB_USERNAME=mysql_username
DB_PASSWORD=mysql_password
DB_HOST=mysql-host

IGNORE_COLUMNS=password,hidden_column,another_column

PS: To create the Google Service Account JSON Key File, access https://console.cloud.google.com/apis/credentials/serviceaccountkey

Run:

vendor/bin/console sync table-name

If you want to auto create the table on BigQuery:

vendor/bin/console sync table-name --create-table

If you want to delete (and create) the table on BigQuery for a full dump:

vendor/bin/console sync table-name --delete-table

Credits

❤️ Memed SA (memed.com.br)

License

MIT license, see LICENSE