dbt-getting-started: materialized tests #31
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| FROM python:3.9.9-bullseye | ||
|
|
||
| WORKDIR /usr/app/dbt | ||
|
|
||
| RUN set -ex; \ | ||
| pip install --no-cache-dir dbt-materialize==1.1.2 | ||
|
|
||
| ENTRYPOINT ["/bin/bash"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ | |
|
|
||
| [dbt](https://docs.getdbt.com/docs/introduction) has become the standard for data transformation (“the T in ELT”). It combines the accessibility of SQL with software engineering best practices, allowing you to not only build reliable data pipelines, but also document, test and version-control them. | ||
|
|
||
| While dbt is a great fit for **batch** transformations, it can only **approximate** transforming streaming data. This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer. | ||
| This demo recreates the Materialize [getting started guide](https://materialize.com/docs/get-started/) using dbt as the transformation layer. | ||
|
|
||
| ## Docker | ||
|
|
||
|
|
@@ -38,15 +38,15 @@ dbt --version | |
|
|
||
| We've created a few core models that take care of defining the building blocks of a dbt+Materialize project, including a streaming [source](https://materialize.com/docs/overview/api-components/#sources): | ||
|
|
||
| - `market_orders_raw.sql` | ||
| - `sources/market_orders_raw.sql` | ||
|
|
||
| , as well as a staging [view](https://materialize.com/docs/overview/api-components/#non-materialized-views) to transform the source data: | ||
|
|
||
| - `market_orders.sql` | ||
| - `staging/stg_market_orders.sql` | ||
|
|
||
| and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes: | ||
| , and a [materialized view](https://materialize.com/docs/overview/api-components/#materialized-views) that continuously updates as the underlying data changes: | ||
|
|
||
| - `avg_bid.sql` | ||
| - `marts/avg_bid.sql` | ||
|
|
||
| To run the models: | ||
|
|
||
|
|
@@ -56,12 +56,50 @@ dbt run | |
|
|
||
| > :crab: As an exercise, you can add models for the queries demonstrating [joins](https://materialize.com/docs/get-started/#joins) and [temporal filters](https://materialize.com/docs/get-started/#temporal-filters). | ||
|
|
||
| ### Test the project | ||
|
|
||
| To help demonstrate how `dbt test` works with Materialize for **continuous testing**, we've added some [generic tests](https://docs.getdbt.com/docs/building-a-dbt-project/tests#generic-tests) to the [`avg_bid` model](dbt/models/marts/avg_bid.sql): | ||
|
|
||
| ```yaml | ||
| models: | ||
| - name: avg_bid | ||
| description: 'Computes the average bid price' | ||
| columns: | ||
| - name: symbol | ||
| description: 'The stock ticker' | ||
| tests: | ||
| - not_null | ||
| - unique | ||
| ``` | ||
|
|
||
| , and configured testing in the [project file](dbt/dbt_project.yml): | ||
|
|
||
| ```yaml | ||
| tests: | ||
| mz_get_started: | ||
| marts: | ||
| +store_failures: true | ||
| +schema: 'etl_failure' | ||
| ``` | ||
|
|
||
| Note that tests are configured to [`store_failures`](https://docs.getdbt.com/reference/resource-configs/store_failures), which instructs dbt to create a materialized view for each test using the respective `SELECT` statements. | ||
|
|
||
| To run the tests: | ||
|
|
||
| ```bash | ||
| dbt test | ||
| ``` | ||
|
|
||
| This creates two materialized views in a dedicated schema (`public_etl_failure`): `not_null_avg_bid_symbol` and `unique_avg_bid_symbol`. dbt takes care of naming the views based on the type of test (`not_null`, `unique`) and the columns being tested (`symbol`). | ||
|
|
||
| These views are continuously updated as new data streams in, and allow you to monitor failing rows **as soon as** an assertion fails. You can use this feature for unit testing during the development of your dbt models, and later in production to trigger real-time alerts downstream. | ||
|
|
||
| ## Materialize | ||
|
|
||
| To connect to the running Materialize service, you can use `mzcli`, which is included in the setup: | ||
| To connect to the running Materialize service, you can use a PostgreSQL-compatible client like `psql`, which is bundled in the `materialize/cli` image: | ||
|
|
||
| ```bash | ||
| docker-compose run mzcli | ||
| docker-compose run cli | ||
| ``` | ||
|
|
||
| and run a few commands to check the objects created through dbt: | ||
|
|
@@ -99,6 +137,30 @@ SHOW MATERIALIZED VIEWS; | |
|
|
||
| You'll notice that you're only able to `SELECT` from `avg_bid` — this is because it is the only materialized view! This view is incrementally updated as new data streams in, so you get fresh and correct results with low latency. Behind the scenes, Materialize is indexing the results of the embedded query in memory. | ||
|
|
||
| ### Continuous testing | ||
|
|
||
| To validate that the schema storing the tests was created: | ||
|
|
||
| ```sql | ||
| SHOW SCHEMAS; | ||
|
|
||
| name | ||
| -------------------- | ||
| public | ||
| public_etl_failure | ||
| ``` | ||
|
|
||
| , and that the materialized views that continuously test the `avg_bid` view for failures are up and running: | ||
|
Contributor
The README is kind of minimal right now, but in a second iteration we could add an example that shows a test failing (this is probably easier if we base things off |
||
|
|
||
| ```sql | ||
| SHOW VIEWS FROM public_etl_failure; | ||
|
|
||
| name | ||
| ------------------------- | ||
| not_null_avg_bid_symbol | ||
| unique_avg_bid_symbol | ||
| ``` | ||
|
|
||
| ## Local installation | ||
|
|
||
| To set up dbt and Materialize in your local environment instead of using Docker, follow the instructions in the [documentation](https://materialize.com/docs/guides/dbt/). | ||
|
|
||
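To check for failures from the continuous testing setup in the README diff above, the two test views can be queried like any other materialized view. This is a minimal sketch, assuming the schema (`public_etl_failure`) and view names shown in the README and a `psql` session connected to Materialize. The `test_name` and `failures` column aliases are illustrative only.

```sql
-- An empty result means the assertion currently holds.
SELECT * FROM public_etl_failure.not_null_avg_bid_symbol;
SELECT * FROM public_etl_failure.unique_avg_bid_symbol;

-- Number of failing rows per test, in a single result set.
SELECT 'not_null_avg_bid_symbol' AS test_name, count(*) AS failures
FROM public_etl_failure.not_null_avg_bid_symbol
UNION ALL
SELECT 'unique_avg_bid_symbol', count(*)
FROM public_etl_failure.unique_avg_bid_symbol;
```

Because the views are maintained incrementally, re-running these queries reflects the current state of `avg_bid` without re-running `dbt test`.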
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| {{ config(materialized='materializedview') }} | ||
|
|
||
| SELECT symbol, | ||
| AVG(bid_price) AS avg | ||
| FROM {{ ref('market_orders') }} | ||
| AVG(bid_price) AS avg_bid | ||
| FROM {{ ref('stg_market_orders') }} | ||
| GROUP BY symbol |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| version: 2 | ||
|
|
||
| models: | ||
| - name: avg_bid | ||
| description: 'Computes the average bid price' | ||
| columns: | ||
| - name: symbol | ||
| description: 'The stock ticker' | ||
| tests: | ||
| - not_null | ||
| - unique |
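For reference, the `not_null` and `unique` tests declared above correspond roughly to the queries below. This is an approximation of what dbt's built-in generic tests select, written by hand rather than copied from the project. The SQL dbt actually compiles depends on the dbt version and can be inspected in the project's `target/` directory after running `dbt test`.

```sql
-- Approximation of the query behind not_null_avg_bid_symbol:
-- every row returned is a failure.
SELECT *
FROM avg_bid
WHERE symbol IS NULL;

-- Approximation of the query behind unique_avg_bid_symbol:
-- one row per symbol that appears more than once.
SELECT symbol AS unique_field,
       count(*) AS n_records
FROM avg_bid
WHERE symbol IS NOT NULL
GROUP BY symbol
HAVING count(*) > 1;
```

With `store_failures` enabled, each of these `SELECT` statements is what ends up wrapped in a materialized view in the failures schema.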
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| version: 2 | ||
|
|
||
| sources: | ||
| - name: market_orders | ||
| schema: public | ||
| tables: | ||
| - name: market_orders_raw |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| version: 2 | ||
|
|
||
| models: | ||
| - name: stg_market_orders | ||
| description: 'Converts market order data to proper data types' |
@ahelium, after shuffling things around, I think it makes more sense to jump straight to continuous testing, rather than adding to the old text. Does it make sense? I wonder if there are scenarios where users will not want to run testing continuously. 🤔
Reworded the original a bunch and tried to use some terminology we want to standardize on (e.g. continuous testing). Let me know if something feels yucky.
Sure! Go for it.