How to migrate from database to data lake with dbt Cloud and Starburst Galaxy

With the advent of modern table formats such as Apache Iceberg, which bring data warehousing capabilities to object storage, interest in the modern data lake has grown rapidly. However, actually standing up a data lake can be daunting if you don’t know where to start.

In this blog post, we will walk through the steps required to migrate a table currently residing in Postgres into a modern data lake on AWS S3. Along the way, we will use Starburst Galaxy as our data lake analytics platform and dbt Cloud as our data transformation tool. This involves connecting to the underlying Postgres database and modeling the data as a Slowly Changing Dimension (SCD) Type 2 table, so that DML statements (inserts, updates, and deletes) against the source are tracked over time in the lake.
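In dbt, the standard way to build an SCD Type 2 table is a snapshot. The sketch below is a minimal example rather than this repo’s exact model: it assumes a hypothetical `customers` table with a `custkey` primary key and an `updated_at` timestamp column, exposed through a dbt source named `postgres`; adjust all three to match your project.

```sql
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='burst_bank',
      unique_key='custkey',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

-- On each run, dbt compares the query result against the prior
-- snapshot and maintains dbt_valid_from / dbt_valid_to columns,
-- giving every row a validity window (SCD Type 2).
select * from {{ source('postgres', 'customers') }}

{% endsnapshot %}
```

Because the dbt Cloud development credentials point at the AWS S3 catalog (configured in the setup below), the snapshot table is written to the data lake rather than back into Postgres.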

Prerequisites

Before getting started, ensure you have the following:

  • A GitHub account
  • A Starburst Galaxy domain
  • A running PostgreSQL instance (any RDBMS supported by Starburst Galaxy will also work)
  • Access to a dbt Cloud account
  • Authentication credentials to Postgres and AWS S3

Setup Process

  1. Configure two new catalogs within Starburst Galaxy (see the sanity-check queries after this list):
  • A PostgreSQL catalog pointing to your existing database
  • An AWS S3 catalog for your data lake tables
  2. Locate your connection variables in Starburst Galaxy’s Partner Connect pane.
  3. Navigate to dbt Cloud and configure a new project:
  • From account settings (via the gear menu in the top-right corner), click “+ New Project”.
  • Enter a project name and choose Starburst as your connection.
  • Enter the settings for your new project (Host and Port) obtained in step 2.
  • Enter the development credentials for your new project:
    • User and password (from step 2)
    • Catalog: the name of the AWS S3 catalog you created in step 1
    • Schema: the name of the schema you would like your data lake tables to reside in (e.g. “burst_bank”)
  4. Configure the integration between dbt Cloud and GitHub.
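Before building any models, it is worth confirming from the Starburst Galaxy query editor that both catalogs respond. The queries below are a sketch with hypothetical names (`pg_burst_bank` for the PostgreSQL catalog, `s3_lake` for the S3 catalog, and a `customers` table); substitute whatever you chose in step 1.

```sql
-- Read through the PostgreSQL catalog to verify source connectivity
SELECT count(*) FROM pg_burst_bank.public.customers;

-- Create the target schema in the S3 catalog; depending on your
-- catalog configuration you may need WITH (location = 's3://...')
CREATE SCHEMA IF NOT EXISTS s3_lake.burst_bank;
```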
