Commit 462ceb9

Merge pull request #44 from dp2020-dev/blog_content
add graph.
2 parents 271e9a3 + fc422cd commit 462ceb9

1 file changed

+7
-15
lines changed

_posts/2024-12-08-dbt-expectations.md

Lines changed: 7 additions & 15 deletions
@@ -29,7 +29,7 @@ dbt expectations is an open source package for dbt based on Great Expectations,
 
 Using the dbt expectations package allows data to be verified in terms of quality and accuracy at specific stages of the transformation process. It includes built in tests including not_null, unique etc. and custom tests written in sql which can extend test coverage (see /tests/no_nulls_in_dim_listings for example.)
 
-When the package is imported etc. the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml):
+When the package is imported etc. the tests are written in the schema.yml file. This is a breakdown of the examples in [/models/schema.yml](https://github.com/dp2020-dev/completeDbtBootcamp/blob/main/models/schema.yml)
 
 #### Basic Expectations:
 
@@ -68,24 +68,16 @@ In a specific example, the failing sql code is run directly against the table (i
 
 ### Lineage Graph (Data Flow DAG)
 
-In the sections above we've looked at practical tests in dbt expectations which can be embedded in the data transformation pipeline, they can also be featured in the 'lineage graph' alongside the source tables, dimension, fact tables etc. to show where and when the tests run, what table it relates to etc.
-
-Provided the test in question is included in the schema.yml and has a description value, we can see it included on the lineage graph generated by dbt:
+In the section above we've looked at practical tests in dbt expectations which can be embedded in the data transformation pipeline. These tests can be included on a really useful dbt feature, the 'lineage graph' alongside the source tables, dimension, fact tables etc. to show where and when the tests run, what table it relates to etc.
 
 ![dbt lineage graph](/images/dbt-dag-3.png)
 
-Source data in green -> dependencies
-
-Select what types of elements to include in the graph, refresh to only show selection
-
-To create: dbt docs generate
-To open: dbt docs serve
-
-Opens docs based on .md (confirm this)
+Provided the test in question is included in the schema.yml and has a description value, it will be included in the correct part of the data transformation flow.
 
 For example:
 
-The lineage graph shows the flow of data in our tata warehouse, for instance we can see at a glance that dim_listings_cleansed is a cleansed dimension table based on the src_listings table.
-Figure 1 Lineage Graph png
+The lineage graph shows the flow of data in our data warehouse, for instance we can see at a glance that {% highlight js %}dim_listings_cleansed{% endhighlight js %} is a cleansed dimension table based on the {% highlight js %}src_listings table.{% endhighlight js %}
+
+By right clicking and checking documentation for {% highlight js %} dim_listings_cleansed {% endhighlight js %} , we can check all the tests we have in place for this stage of the transformation, for instance we can tell the the {% highlight js %}room_type{% endhighlight js %} test checks the type of room as per the description.
 
-By right clicking and checking documentation for dim_listings_cleansed, we can check all the tests we have in place for this stage of the transformation, for instance we can tell the the room_type test checks the type of room as per the description. While it takes some additional time to understand the how to set descriptions and how to link the schema.yml to the test files (I found I had to adhere closely to a set folder structure to get this to work), the benefit of having this lineage gr[ah and information are evidence- we can see what's tested where during the data transformation, and I feel it will save signicant time for someone picking up these tests to extend coverage/adapt.
+While it takes some additional time to understand the how to set descriptions and how to link the schema.yml to the test files (I found I had to adhere closely to a set folder structure to get this to work), the benefit of having this lineage gr[ah and information are evidence- we can see what's tested where during the data transformation, and I feel it will save significant time for someone picking up these tests to extend coverage/adapt.
