db2viz is a data pipeline project that demonstrates how to move data from an on-premises database (Postgres) to Google Cloud BigQuery for visualization in Looker Studio.
cmd/main.go: The entry point for the application.config/config.go: Configuration loader.internal/db/postgres_connector.go: Connects to the Postgres database.internal/db/db.go: Contains logic for connecting to the database and loading data.internal/data/loader.go: Loads data from Postgres.internal/data/transformer.go: Transforms data before uploading.internal/gcp/pubsub.go: Publishes data to Google Cloud Pub/Sub.scripts/run_postgres_docker.sh: Script to run a Postgres container.
-
Run the Postgres Docker container
./scripts/run_postgres_docker.sh
-
Configure your
config/config.yamlwith the necessary details for Postgres and Google Cloud Pub/Sub.postgres: host: localhost port: 5432 user: postgres password: mysecretpassword dbname: tfmv sslmode: disable tables: - name: nation schema: production topic_id: nation - name: regions schema: production topic_id: regions concurrency: 2 pubsub: project_id: tfmv-371720 credentials: /path/to/your/service-account.json workers: 20
-
Build and run the Go application
docker build -t db2viz . docker run --rm db2viz
This project demonstrates a simple ETL (Extract, Transform, Load) pipeline:
- Extract: Data is extracted from a Postgres database.
- Transform: The data is transformed as necessary within the application.
- Load:
- The transformed data is published to a Google Cloud Pub/Sub topic.
- Pub/Sub subscriptions are used to directly populate the BigQuery tables as the data streams into Pub/Sub.
The data is ultimately loaded into Google BigQuery for visualization in Looker Studio.
To stream data from Pub/Sub to BigQuery, follow these steps:
- Ensure you have the necessary permissions and enable the required APIs for Pub/Sub and BigQuery in your Google Cloud project.
- Create Pub/Sub subscriptions that write directly to BigQuery:
- Define a subscription for each Pub/Sub topic associated with your tables.
- Configure the subscription to use a push endpoint that writes to BigQuery.
Once the data is in BigQuery, you can use Looker Studio to visualize it:
- Go to Looker Studio (formerly Data Studio).
- Click on "Create" and select "Data Source".
- Choose "BigQuery" as the connector.
- Select your Google Cloud project, dataset, and table.
- Click "Connect".
- Create your report and add visualizations as needed.
To set up connectivity between your on-premises network and Google Cloud, you can use one of the following methods:
- Create a Cloud VPN Gateway: Establish a VPN gateway in GCP.
- Configure VPN Tunnels: Set up IPsec tunnels between your on-premises VPN gateway and the GCP VPN gateway.
- Use Cloud Router: Configure dynamic routing using BGP.
- Configure Firewall Rules: Allow traffic between on-premises and GCP.
For higher bandwidth and lower latency:
- Order Circuits: Arrange for dedicated or partner interconnect circuits.
- Create Interconnect Connection: Establish a physical connection to GCP.
- Configure VLAN Attachments: Set up VLANs for traffic routing.
- Use Cloud Router: Manage dynamic routing.
To set up an HA VPN:
- Create an HA VPN Gateway:
gcloud compute target-vpn-gateways create ha-vpn-gateway --region us-central1
