---
layout: post
title: Using dbt expectations as part of a dbt build.
---
<i>The objective of this post is to give a practical overview of Great Expectations and dbt expectations, tools for testing data transformations.</i>
### Why data testing?
Having been involved in data transformations in the past (e.g. moving data from on-prem to the Azure cloud), I'm aware of the potential complexity of ensuring the quality of data from source to target, verifying the transformations at each stage and maintaining data integrity.
Given this complexity, automated checks on the data at each stage of the pipeline are essential.
### Great Expectations
[Great Expectations.io](https://greatexpectations.io/) and [dbt expectations](https://github.com/calogica/dbt-expectations), its open-source port for dbt, are frameworks that enable automated tests to be embedded in ingestion/transformation pipelines.

This is a widely used tool in data engineering, and in order to try it out and evaluate it I undertook the following Udemy course; the screenshots and material in this post are based on it. The course covers the theory and practical application of a data project using Snowflake as the data warehouse and the open-source version of dbt. What was particularly relevant for a tester are the sections covering dbt expectations<addlink>. This post explains at a high level what dbt expectations can do and how it can enable QA in a data ingestion/transformation project, rather than being a hands-on 'how to' guide (for that, I can recommend the aforementioned Udemy course).
### What is dbt expectations?
dbt expectations is an open-source package for dbt, based on Great Expectations, that enables testing of the data in a data warehouse.
<b>How is it used to test, and why?</b>
Using the dbt expectations package allows data to be verified in terms of quality and accuracy at specific stages of the transformation process. It includes built-in tests such as not_null and unique, and custom tests written in SQL can extend coverage (see /tests/no_nulls_in_dim_listings for an example).
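A custom ("singular") test is simply a SQL file in the tests folder: dbt runs the query and the test fails if it returns any rows. A minimal sketch of the idea follows; the model and column names are illustrative, not necessarily the exact contents of that file.

```sql
-- tests/no_nulls_in_dim_listings.sql (illustrative sketch)
-- dbt runs this query during `dbt test`; any returned rows count as failures.
SELECT *
FROM {{ ref('dim_listings_cleansed') }}  -- assumed model name
WHERE listing_id IS NULL                 -- assumed column under test
```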
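Before the tests can be written, the package itself has to be pulled into the project. As a sketch of that step (the version range here is illustrative), it is declared in packages.yml and installed by running `dbt deps`:

```yaml
# packages.yml -- declare the dbt-expectations package (version range is illustrative)
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```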
Once the package is imported, the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml):
#### Basic Expectations:
<b>not_null:</b> Ensures that the column doesn't contain null values.
<b>unique:</b> Verifies that all values in the column are distinct.
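A sketch of how these look in schema.yml (the model and column names are illustrative):

```yaml
version: 2
models:
  - name: dim_listings_cleansed   # illustrative model name
    columns:
      - name: listing_id          # illustrative column name
        tests:
          - not_null   # fails if the column contains any NULLs
          - unique     # fails if any value appears more than once
```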
#### Relationship Expectations:
<b>relationships:</b> Checks if a foreign key relationship exists between two columns in different models.
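A sketch of a relationships test (model, column and target names are illustrative):

```yaml
version: 2
models:
  - name: fct_reviews                           # illustrative child model
    columns:
      - name: listing_id
        tests:
          - relationships:
              to: ref('dim_listings_cleansed')  # illustrative parent model
              field: listing_id   # every child value must exist in this column
```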
#### Value-Based Expectations:
<b>accepted_values:</b> Ensures that the column only contains specific values from a predefined list.
<b>positive_value:</b> Verifies that the column values are positive numbers.
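A sketch of both in schema.yml (the column names and value list are illustrative; note that positive_value is a custom generic test defined in the course project rather than a dbt built-in):

```yaml
version: 2
models:
  - name: dim_listings_cleansed     # illustrative model name
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']   # illustrative allowed values
      - name: minimum_nights
        tests:
          - positive_value   # custom generic test from the course project
```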
#### Statistical Expectations:
<b>dbt_expectations.expect_table_row_count_to_equal_other_table:</b> Compares the row count of two tables.
<b>dbt_expectations.expect_column_values_to_be_of_type:</b> Checks the data type of a column.
<b>dbt_expectations.expect_column_quantile_values_to_be_between:</b> Verifies that quantile values fall within a specific range.
<b>dbt_expectations.expect_column_max_to_be_between:</b> Ensures that the maximum value of a column is within a certain range.
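A sketch of how these are declared in schema.yml (model names, columns and thresholds are illustrative):

```yaml
version: 2
models:
  - name: dim_listings_w_hosts      # illustrative model name
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref('src_listings')   # illustrative comparison table
    columns:
      - name: price
        tests:
          - dbt_expectations.expect_column_values_to_be_of_type:
              column_type: number
          - dbt_expectations.expect_column_quantile_values_to_be_between:
              quantile: .99       # check the 99th percentile
              min_value: 50       # illustrative bounds
              max_value: 500
          - dbt_expectations.expect_column_max_to_be_between:
              max_value: 5000     # illustrative upper bound
```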
#### Example test:
A test on the room_type column; see screenshot.
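Since the screenshot isn't reproduced here, a hedged sketch of what such a test might look like, assuming an accepted_values check on room_type with Airbnb-style values like those used in the course (the exact values are illustrative):

```yaml
version: 2
models:
  - name: dim_listings_cleansed     # illustrative model name
    columns:
      - name: room_type
        tests:
          - accepted_values:
              values: ['Entire home/apt', 'Private room', 'Shared room', 'Hotel room']
```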
To run the tests in the schema, the `dbt test` command is used; tests for a single model can be run with `dbt test --select <model_name>`.
To debug, the standard tool is `dbt test --debug`, but the advice on the bootcamp was to work with the compiled SQL of the failing test instead.
In a specific example, the failing SQL code is run directly against the table (in Snowflake, in this case) to find exactly where the failure is.
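To illustrate the idea, this is roughly the shape of SQL that dbt compiles for a failing not_null test (the database, schema, table and column names are illustrative); pasting it into a Snowflake worksheet returns the offending rows directly:

```sql
-- Approximate shape of the compiled SQL behind a not_null test (illustrative names)
SELECT listing_id
FROM analytics.dev.dim_listings_cleansed  -- assumed database.schema.table
WHERE listing_id IS NULL                  -- each returned row is a test failure
```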