Commit 9bd68d6

Merge pull request #41 from dp2020-dev/blog_content
Blog content
2 parents 6b10b7b + 6358491 commit 9bd68d6

2 files changed

+41
-25
lines changed

_posts/2024-12-08-dbt-expectations.md

Lines changed: 41 additions & 25 deletions
@@ -1,48 +1,64 @@
11
---
22
layout: post
3-
title: Using dbt Expectations as part of a dbt build.
3+
title: Using dbt expectations as part of a dbt build.
44
---
55

6-
<why look at data testing>
6+
<i> The objective of the blog post is to give a practical overview of the data transformation testing tool Great Expectations/dbt expectations. </i>
7+
8+
### Why data testing?
9+
10+
Having been involved in data transformations in the past (e.g. moving data from on prem to the Azure cloud) I'm aware of the potential complexity of ensuring the quality of data from source to target, verifying the transformations at each stage and maintaining data integrity.
11+
12+
Given
13+
14+
### Great Expectations
15+
16+
[Great Expectations](https://greatexpectations.io/) and [dbt expectations](https://github.com/calogica/dbt-expectations), an open source dbt package inspired by it, are frameworks that enable automated tests to be embedded in ingestion/transformation pipelines.
17+
18+
<GE Image>
19+
![Great Expectations logo, December 2024](/images/gx_logo_horiz_color.png)
20+
21+
This is a widely used tool in data engineering. To try it out and evaluate it, I took the following Udemy course, on which the screenshots and material in this post are based:
722

823
[The Complete dbt (Data Build Tool) Bootcamp:](https://www.udemy.com/course/complete-dbt-data-build-tool-bootcamp-zero-to-hero-learn-dbt)
924
![Microsoft AI Fundamentals](/images/AI900.png)
1025

11-
This course covers the theory and practical application of a data project using snowflake as the data warehouse, and the open source version of dbt. What was particularly relevant for a tester are the sections covering dbt expectations<add link>. This post will explain at a high level what dbt expectations can do, how it can enable QA in a data ingestion/data transformation project rather than a hand on how to' guide (I can recommend the aforementioned Udemy course).
26+
This course covers the theory and practical application of a data project using Snowflake as the data warehouse, and the open source version of dbt. What was particularly relevant for a tester were the sections covering dbt expectations<add link>. This post explains at a high level what dbt expectations can do and how it can enable QA in a data ingestion/data transformation project, rather than being a hands-on 'how to' guide.
1227

13-
Purpose of this post:
28+
### What is dbt expectations?
1429

15-
Demand for data transformation testing- dbt is a widely used tool for data engineering
30+
dbt expectations is an open source package for dbt based on Great Expectations, to enable testing in a data warehouse.
1631

17-
What is dbt?
32+
<b> How is it used to test, and why? </b>
1833

19-
What is dbt expectations?
34+
Using the dbt expectations package allows data to be verified in terms of quality and accuracy at specific stages of the transformation process. It is used alongside dbt's built-in tests (not_null, unique etc.) and custom tests written in SQL, which can extend test coverage (see /tests/no_nulls_in_dim_listings for example).
2035

21-
How is it used to test, and why?
36+
Once the package is imported, the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml):
2237

23-
Using the dbt expectations package allows data to be verified in terms of quality and accuracy at specific stages of the transformation process. It includes built in tests including not_null, unique etc. and custom tests written in sql which can extend test coverage (see /tests/no_nulls_in_dim_listings for example.)
38+
#### Basic Expectations:
39+
40+
<b>not_null:</b> Ensures that the column doesn't contain null values.
41+
<b>unique:</b> Verifies that all values in the column are distinct.
42+
43+
#### Relationship Expectations:
44+
45+
<b>relationships:</b> Checks if a foreign key relationship exists between two columns in different models.
2446

25-
When the package is imported etc. the tests are written in the schema.yml file. This is a breakdown of the examples in /models/schema.yml:
47+
#### Value-Based Expectations:
2648

27-
Basic Expectations:
49+
<b>accepted_values:</b> Ensures that the column only contains specific values from a predefined list.
50+
<b>positive_value:</b> Verifies that the column values are positive numbers.
2851

29-
not_null: Ensures that the column doesn't contain null values.
30-
unique: Verifies that all values in the column are distinct.
31-
Relationship Expectations:
52+
#### Statistical Expectations:
3253

33-
relationships: Checks if a foreign key relationship exists between two columns in different models.
34-
Value-Based Expectations:
54+
<b>dbt_expectations.expect_table_row_count_to_equal_other_table:</b> Compares the row count of two tables.
3555

36-
accepted_values: Ensures that the column only contains specific values from a predefined list.
37-
positive_value: Verifies that the column values are positive numbers.
38-
Statistical Expectations:
56+
<b>dbt_expectations.expect_column_values_to_be_of_type: </b>Checks the data type of a column.
57+
<b>dbt_expectations.expect_column_quantile_values_to_be_between:</b> Verifies that quantile values fall within a specific range.
58+
<b>dbt_expectations.expect_column_max_to_be_between:</b> Ensures that the maximum value of a column is within a certain range.
3959

40-
dbt_expectations.expect_table_row_count_to_equal_other_table: Compares the row count of two tables.
41-
dbt_expectations.expect_column_values_to_be_of_type: Checks the data type of a column.
42-
dbt_expectations.expect_column_quantile_values_to_be_between: Verifies that quantile values fall within a specific range.
43-
dbt_expectations.expect_column_max_to_be_between: Ensures that the maximum value of a column is within a certain range.
60+
#### Example test:
4461

45-
Example test:
4662
Room_type, see screenshot.
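
As a rough sketch of how the expectations listed above might be declared in a schema.yml file, the fragment below is illustrative only and is not copied from the repository. The model names, most column names, the accepted values and the numeric bounds are all assumptions; only the test names and `room_type` come from this post. `positive_value` is left out because it appears to be a generic test defined in the project rather than one that ships with dbt or dbt expectations. Recent dbt versions also accept `data_tests:` in place of `tests:`.

```yaml
# models/schema.yml (illustrative sketch; names and values marked below are assumptions)
version: 2

models:
  - name: dim_listings                 # hypothetical model name
    columns:
      - name: listing_id               # hypothetical primary key column
        tests:
          - unique
          - not_null

      - name: host_id                  # hypothetical foreign key column
        tests:
          - relationships:
              to: ref('dim_hosts')     # hypothetical parent model
              field: host_id

      - name: room_type
        tests:
          - accepted_values:
              values: ['Entire home/apt', 'Private room']   # placeholder list

      - name: price                    # hypothetical numeric column
        tests:
          - dbt_expectations.expect_column_values_to_be_of_type:
              column_type: number
          - dbt_expectations.expect_column_quantile_values_to_be_between:
              quantile: 0.99
              min_value: 50            # placeholder bounds
              max_value: 500
          - dbt_expectations.expect_column_max_to_be_between:
              max_value: 5000          # placeholder bound
              config:
                severity: warn         # report a warning instead of failing the run

    # Model-level test: compares this model's row count with another model's
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('stg_listings')   # hypothetical model to compare against
```

Column-level tests sit under the column they check, while table-level expectations such as the row count comparison sit directly under the model.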
4763

4864
To run the tests in the schema:
@@ -54,7 +70,7 @@ To debug, the standard tool is dbt test --debug, but the advice on the bootcamp
5470

5571
In a specific example, the failing sql code is run directly against the table (in Snowflake in this example) to find where exactly the failure is.
5672

57-
Lineage Graph (Data Flow DAG)
73+
### Lineage Graph (Data Flow DAG)
5874

5975
Source data in green -> dependencies
6076

images/gx_logo_horiz_color.png

7.3 KB