diff --git a/.github/tests/dbt-get-started.sh b/.github/tests/dbt-get-started.sh index ae5eec3a..41d91631 100755 --- a/.github/tests/dbt-get-started.sh +++ b/.github/tests/dbt-get-started.sh @@ -13,5 +13,5 @@ docker-compose exec -T dbt dbt run sleep 5 # Check that there's data making it's way to the avg_bid materialized view -record_count=$(docker-compose run -T mzcli -Atc 'SELECT COUNT(*) FROM avg_bid') +record_count=$(docker-compose run -T cli -Atc 'SELECT COUNT(*) FROM avg_bid') [[ "$record_count" -gt 0 ]] diff --git a/dbt-get-started/Dockerfile b/dbt-get-started/Dockerfile new file mode 100644 index 00000000..ee788df5 --- /dev/null +++ b/dbt-get-started/Dockerfile @@ -0,0 +1,8 @@ +FROM python:3.9.9-bullseye + +WORKDIR /usr/app/dbt + +RUN set -ex; \ + pip install --no-cache-dir dbt-materialize==1.1.2 + +ENTRYPOINT ["/bin/bash"] diff --git a/dbt-get-started/README.md b/dbt-get-started/README.md index e1543563..be14aa37 100644 --- a/dbt-get-started/README.md +++ b/dbt-get-started/README.md @@ -2,7 +2,7 @@ [dbt](https://docs.getdbt.com/docs/introduction) has become the standard for data transformation (“the T in ELT”). It combines the accessibility of SQL with software engineering best practices, allowing you to not only build reliable data pipelines, but also document, test and version-control them. -While dbt is a great fit for **batch** transformations, it can only **approximate** transforming streaming data. This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer. +This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer. 
## Docker @@ -38,15 +38,15 @@ dbt --version We've created a few core models that take care of defining the building blocks of a dbt+Materialize project, including a streaming [source](https://materialize.com/docs/overview/api-components/#sources): -- `market_orders_raw.sql` +- `sources/market_orders_raw.sql` , as well as a staging [view](https://materialize.com/docs/overview/api-components/#non-materialized-views) to transform the source data: -- `market_orders.sql` +- `staging/stg_market_orders.sql` -and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes: +, and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes: -- `avg_bid.sql` +- `marts/avg_bid.sql` To run the models: @@ -56,12 +56,50 @@ dbt run > :crab: As an exercise, you can add models for the queries demonstrating [joins](https://materialize.com/docs/get-started/#joins) and [temporal filters](https://materialize.com/docs/get-started/#temporal-filters). +### Test the project + +To help demonstrate how `dbt test` works with Materialize for **continuous testing**, we've added some [generic tests](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) to the [`avg_bid` model](dbt/models/marts/avg_bid.sql): + +```yaml +models: + - name: avg_bid + description: 'Computes the average bid price' + columns: + - name: symbol + description: 'The stock ticker' + tests: + - not_null + - unique +``` + +, and configured testing in the [project file](dbt/dbt_project.yml): + +```yaml +tests: + mz_get_started: + marts: + +store_failures: true + +schema: 'etl_failure' +``` + +Note that tests are configured to [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures), which instructs dbt to create a materialized view for each test using the respective `SELECT` statements. 
+ +To run the tests: + +```bash +dbt test +``` + +This creates two materialized views in a dedicated schema (`public_etl_failure`): `not_null_avg_bid_symbol` and `unique_avg_bid_symbol`. dbt takes care of naming the views based on the type of test (`not_null`, `unique`) and the columns being tested (`symbol`). + +These views are continuously updated as new data streams in, and allow you to monitor failing rows **as soon as** an assertion fails. You can use this feature for unit testing during the development of your dbt models, and later in production to trigger real-time alerts downstream. + ## Materialize -To connect to the running Materialize service, you can use `mzcli`, which is included in the setup: +To connect to the running Materialize service, you can use a PostgreSQL-compatible client like `psql`, which is bundled in the `materialize/cli` image: ```bash -docker-compose run mzcli +docker-compose run cli ``` and run a few commands to check the objects created through dbt: @@ -99,6 +137,30 @@ SHOW MATERIALIZED VIEWS; You'll notice that you're only able to `SELECT` from `avg_bid` — this is because it is the only materialized view! This view is incrementally updated as new data streams in, so you get fresh and correct results with low latency. Behind the scenes, Materialize is indexing the results of the embedded query in memory. +### Continuous testing + +To validate that the schema storing the tests was created: + +```sql +SHOW SCHEMAS; + + name +-------------------- + public + public_etl_failure +``` + +, and that the materialized views that continuously test the `avg_bid` view for failures are up and running: + +```sql +SHOW VIEWS FROM public_etl_failure; + + name +------------------------- + not_null_avg_bid_symbol + unique_avg_bid_symbol +``` + ## Local installation To set up dbt and Materialize in your local environment instead of using Docker, follow the instructions in the [documentation](https://materialize.com/docs/guides/dbt/). 
diff --git a/dbt-get-started/compose.yaml b/dbt-get-started/compose.yaml index c5739346..66e5f936 100644 --- a/dbt-get-started/compose.yaml +++ b/dbt-get-started/compose.yaml @@ -6,11 +6,11 @@ services: ports: - 6875:6875 healthcheck: {test: curl -f localhost:6875, interval: 1s, start_period: 30s} - mzcli: + cli: image: materialize/cli:v0.26.0 - container_name: mzcli + container_name: cli dbt: - image: materialize/dbt-materialize:v0.26.0 + build: ./ container_name: dbt init: true entrypoint: /bin/bash diff --git a/dbt-get-started/dbt/dbt_project.yml b/dbt-get-started/dbt/dbt_project.yml index d2e34d5d..4db385b7 100644 --- a/dbt-get-started/dbt/dbt_project.yml +++ b/dbt-get-started/dbt/dbt_project.yml @@ -14,3 +14,9 @@ target-path: 'target' # directory which will store compiled SQL files clean-targets: # directories to be removed by `dbt clean` - 'target' - 'dbt_modules' + +tests: + mz_get_started: + marts: + +store_failures: true + +schema: 'etl_failure' diff --git a/dbt-get-started/dbt/models/avg_bid.sql b/dbt-get-started/dbt/models/marts/avg_bid.sql similarity index 53% rename from dbt-get-started/dbt/models/avg_bid.sql rename to dbt-get-started/dbt/models/marts/avg_bid.sql index 4dd4538b..e2441cbe 100644 --- a/dbt-get-started/dbt/models/avg_bid.sql +++ b/dbt-get-started/dbt/models/marts/avg_bid.sql @@ -1,6 +1,6 @@ {{ config(materialized='materializedview') }} SELECT symbol, - AVG(bid_price) AS avg -FROM {{ ref('market_orders') }} + AVG(bid_price) AS avg_bid +FROM {{ ref('stg_market_orders') }} GROUP BY symbol diff --git a/dbt-get-started/dbt/models/marts/models.yml b/dbt-get-started/dbt/models/marts/models.yml new file mode 100644 index 00000000..67b9ffb7 --- /dev/null +++ b/dbt-get-started/dbt/models/marts/models.yml @@ -0,0 +1,11 @@ +version: 2 + +models: + - name: avg_bid + description: 'Computes the average bid price' + columns: + - name: symbol + description: 'The stock ticker' + tests: + - not_null + - unique diff --git 
a/dbt-get-started/dbt/models/schema.yml b/dbt-get-started/dbt/models/schema.yml deleted file mode 100644 index 32815f56..00000000 --- a/dbt-get-started/dbt/models/schema.yml +++ /dev/null @@ -1,18 +0,0 @@ -version: 2 - -models: - - name: public - description: 'Raw market order data' - - - name: market_orders - description: 'Converts market order data to proper data types' - - - name: avg_bid - description: 'Computes the average bid price' - columns: - - name: symbol - description: 'The stock ticker' - tests: - - not_null - - name: avg - description: 'The average bid price' diff --git a/dbt-get-started/dbt/models/market_orders_raw.sql b/dbt-get-started/dbt/models/sources/market_orders_raw.sql similarity index 100% rename from dbt-get-started/dbt/models/market_orders_raw.sql rename to dbt-get-started/dbt/models/sources/market_orders_raw.sql diff --git a/dbt-get-started/dbt/models/sources/sources.yml b/dbt-get-started/dbt/models/sources/sources.yml new file mode 100644 index 00000000..0501ebc7 --- /dev/null +++ b/dbt-get-started/dbt/models/sources/sources.yml @@ -0,0 +1,7 @@ +version: 2 + +sources: + - name: market_orders + schema: public + tables: + - name: market_orders_raw diff --git a/dbt-get-started/dbt/models/staging/models.yml b/dbt-get-started/dbt/models/staging/models.yml new file mode 100644 index 00000000..959d53fa --- /dev/null +++ b/dbt-get-started/dbt/models/staging/models.yml @@ -0,0 +1,5 @@ +version: 2 + +models: + - name: stg_market_orders + description: 'Converts market order data to proper data types' diff --git a/dbt-get-started/dbt/models/market_orders.sql b/dbt-get-started/dbt/models/staging/stg_market_orders.sql similarity index 84% rename from dbt-get-started/dbt/models/market_orders.sql rename to dbt-get-started/dbt/models/staging/stg_market_orders.sql index 79dc3c3a..1a41bdc9 100644 --- a/dbt-get-started/dbt/models/market_orders.sql +++ b/dbt-get-started/dbt/models/staging/stg_market_orders.sql @@ -6,4 +6,4 @@ SELECT (text::jsonb)->>'symbol' 
AS symbol, (text::jsonb)->>'trade_type' AS trade_type, to_timestamp(((text::jsonb)->'timestamp')::bigint) AS ts -FROM {{ ref('market_orders_raw') }} +FROM {{ source('market_orders', 'market_orders_raw') }}