_posts/2024-12-08-dbt-expectations.md
7 additions & 15 deletions
@@ -29,7 +29,7 @@ dbt expectations is an open source package for dbt based on Great Expectations,
29
29
30
30
Using the dbt expectations package allows data to be verified in terms of quality and accuracy at specific stages of the transformation process. It includes built-in tests such as not_null and unique, and custom tests written in SQL which can extend test coverage (see /tests/no_nulls_in_dim_listings for an example).
31
31
32
-
When the package is imported etc. the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml):
32
+
When the package is imported etc. the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml)
33
33
34
34
#### Basic Expectations:
35
35
@@ -68,24 +68,16 @@ In a specific example, the failing sql code is run directly against the table (i
68
68
69
69
### Lineage Graph (Data Flow DAG)
70
70
71
-
In the sections above we've looked at practical tests in dbt expectations which can be embedded in the data transformation pipeline, they can also be featured in the 'lineage graph' alongside the source tables, dimension, fact tables etc. to show where and when the tests run, what table it relates to etc.
72
-
73
-
Provided the test in question is included in the schema.yml and has a description value, we can see it included on the lineage graph generated by dbt:
71
+
In the section above we've looked at practical tests in dbt expectations which can be embedded in the data transformation pipeline. These tests can also be shown on a really useful dbt feature, the 'lineage graph', alongside the source, dimension and fact tables, to show where and when each test runs and which table it relates to.
74
72
75
73

76
74
77
-
Source data in green -> dependencies
78
-
79
-
Select what types of elements to include in the graph, refresh to only show selection
80
-
81
-
To create: dbt docs generate
82
-
To open: dbt docs serve
83
-
84
-
Opens docs based on .md (confirm this)
75
+
Provided the test in question is included in the schema.yml and has a description value, it will be included in the correct part of the data transformation flow.
85
76
86
77
For example:
87
78
88
-
The lineage graph shows the flow of data in our tata warehouse, for instance we can see at a glance that dim_listings_cleansed is a cleansed dimension table based on the src_listings table.
89
-
Figure 1 Lineage Graph png
79
+
The lineage graph shows the flow of data in our data warehouse. For instance, we can see at a glance that {% highlight js %}dim_listings_cleansed{% endhighlight js %} is a cleansed dimension table based on the {% highlight js %}src_listings{% endhighlight js %} table.
80
+
81
+
By right clicking and checking documentation for {% highlight js %}dim_listings_cleansed{% endhighlight js %}, we can check all the tests we have in place for this stage of the transformation. For instance, we can tell that the {% highlight js %}room_type{% endhighlight js %} test checks the type of room as per the description.
90
82
91
-
By right clicking and checking documentation for dim_listings_cleansed, we can check all the tests we have in place for this stage of the transformation, for instance we can tell the the room_type test checks the type of room as per the description. While it takes some additional time to understand the how to set descriptions and how to link the schema.yml to the test files (I found I had to adhere closely to a set folder structure to get this to work), the benefit of having this lineage gr[ah and information are evidence- we can see what's tested where during the data transformation, and I feel it will save signicant time for someone picking up these tests to extend coverage/adapt.
83
+
While it takes some additional time to understand how to set descriptions and how to link the schema.yml to the test files (I found I had to adhere closely to a set folder structure to get this to work), the benefit of having this lineage graph and information is evident: we can see what's tested where during the data transformation, and I feel it will save significant time for someone picking up these tests to extend coverage or adapt them.
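
As a rough illustration of the setup described above (a test declared in schema.yml with a description value), a minimal sketch might look like the following. The model and column names come from the post. The description strings and the accepted values list are placeholders I've assumed for illustration, not the contents of the actual /models/schema.yml:

```yaml
version: 2

models:
  - name: dim_listings_cleansed
    # Assumed wording; this text is what appears in the generated docs.
    description: Cleansed dimension table based on src_listings.
    columns:
      - name: room_type
        # Assumed wording; describes what the test on this column checks.
        description: Type of room for the listing.
        tests:
          # accepted_values is a generic test that ships with dbt.
          - accepted_values:
              # Placeholder values, not taken from the real dataset.
              values: ['example_room_type_a', 'example_room_type_b']
```

With descriptions in place, running `dbt docs generate` followed by `dbt docs serve` builds and opens the documentation site, where the lineage graph and these descriptions can be viewed.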