
bballboy8/airbyte_sync


What this is

This script leverages and extends the integration code from Airbyte to ingest data from GitHub (a source) and write that data into Google's data warehouse service, BigQuery (a destination). It allows you to run the integration locally through a Docker container.

You can find the integration code for the GitHub source here, and the integration code for the BigQuery destination here.

Prerequisites

  • A GitHub instance and a Personal Access Token. You can find more details about how to set up the GitHub connector here, including the required scopes for the Personal Access Token.
  • A Google BigQuery instance with a dedicated project and dataset for this sync, plus a Google Cloud service account. You can find more details about how to set up the BigQuery integration here.

How to run

  • If running on EC2, use the Amazon Linux 2 AMI (HVM) - Kernel 5.10, SSD Volume Type image with a t2.medium instance type and 30GB of disk space. This instance type and disk space are what Airbyte recommends. The scripts will probably run on other image types as well, but you need to make sure that bash and envsubst are available.
  • Install Docker and docker-compose. See this as an example.
  • Download this repo
  • Create config.yml to configure the GitHub org/repository and the BigQuery project and dataset IDs (you can use sample_config.yml as an example)
  • In one shell:
cd airbyte_sync
docker-compose up

Wait until the server has started on port 8000 (there will be some fancy ASCII art).
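For reference, the config.yml created in the earlier step might look roughly like the sketch below. The key names here are illustrative guesses based on the description above; sample_config.yml in the repo is the authoritative template.

```yaml
# Illustrative only -- consult sample_config.yml for the real key names.
github:
  organization: my-org          # GitHub org to sync from
  repository: my-org/my-repo    # repository to ingest
bigquery:
  project_id: my-gcp-project    # BigQuery project ID
  dataset_id: airbyte_sync      # dataset the sync writes into
```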

  • In the other shell:
export GITHUB_PERSONAL_ACCESS_TOKEN=<your GitHub PAT>
export GCP_CREDENTIALS_JSON='<JSON auth file for your GCP service account>'
cd airbyte_sync
bash ./configure_connection.sh config.yml
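If the configuration step fails, a common culprit is a mis-quoted GCP_CREDENTIALS_JSON export. A quick sanity check (our own addition, not part of the repo's scripts) is to confirm the variable holds parseable JSON before running the script:

```python
import json

def validate_gcp_credentials(raw: str) -> str:
    """Parse the service-account JSON and return its project_id.

    Raises ValueError if the string is not valid JSON, which is the
    usual symptom of a mis-quoted GCP_CREDENTIALS_JSON export.
    """
    parsed = json.loads(raw)
    return parsed.get("project_id", "<no project_id>")

# Typical use before running configure_connection.sh:
#   import os
#   validate_gcp_credentials(os.environ["GCP_CREDENTIALS_JSON"])
```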

Once it finishes, run

bash ./sync_data.sh

This kicks off the data sync, which may take a while; watch the docker-compose logs to see the progress. By default, this sync is configured to run manually (it's not configured to run on a schedule).
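Because the connection is manual-only, each run has to be triggered explicitly. The sketch below shows one way to do that from Python, assuming the local server from docker-compose and the POST /api/v1/connections/sync endpoint of Airbyte's public configuration API; sync_data.sh may use a different mechanism, and the connection-ID placeholder is hypothetical.

```python
import json
import urllib.request

AIRBYTE_API = "http://localhost:8000/api/v1"  # server from `docker-compose up`

def build_sync_request(connection_id: str) -> urllib.request.Request:
    """Build the POST request that asks Airbyte to run one manual sync."""
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        f"{AIRBYTE_API}/connections/sync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# req = build_sync_request("<your connection id>")  # hypothetical placeholder
# urllib.request.urlopen(req)  # fires the sync; progress appears in the docker-compose logs
```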
