[CT-1905] [Spike] Get model "catalog" info after building, and fire in an event #6732
Labels
Impact: CA
Impact: Orch
logging
performance
Refinement
Maintainer input needed
Team:Adapters
copying from #5325 (comment)
Let's gather catalog info about the relations produced by the materialization, as soon as it finishes building them (here). Materializations already return the set of relations they create or update, for the purposes of updating dbt's cache. Why not share the wealth with programmatic consumers of dbt metadata?
(Will serializing Relation objects be an absolute nightmare? Relation objects can be reimplemented by adapter, of course, though they all inherit from `BaseRelation`, which should be serializable. Even so, we may not want all the object attributes included in the logging event, probably just a subset.)
For now, the only really valuable information included on the relation object is database location (`database.schema.identifier`) and relation type (view, table, etc.). However, I could see doing two things to make this very valuable, for which this logging lays necessary groundwork:
- Add optional fields to the relation object for `columns` (with data types) and table statistics (maybe even column-level stats, too), a.k.a. the same basic contract as `CatalogInfo`
- After each materialization, `describe` the just-built relation to populate those fields, which will then be logged out once the materialization completes
Put it all together, and we'll be able to provide realer-time access to catalog info, rather than trying to grab it all in one big memory-intensive batch during `docs generate`.
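To make the proposal concrete, here is a minimal sketch in plain Python of what a per-relation event payload could look like. It does not use dbt's actual classes or event system: `RelationInfo`, `ColumnInfo`, `relation_built_event`, `fake_describe`, and the `RelationBuilt` event name are all hypothetical, and the describe step is a hard-coded stand-in for a real warehouse query.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass
class ColumnInfo:
    # Hypothetical column record: name plus warehouse data type.
    name: str
    data_type: str


@dataclass
class RelationInfo:
    # Hypothetical subset of relation attributes worth logging.
    database: str
    schema: str
    identifier: str
    relation_type: str  # e.g. "view" or "table"
    # Optional catalog-style fields, left as None until a describe step fills them.
    columns: Optional[List[ColumnInfo]] = None
    stats: Optional[Dict[str, int]] = None


def relation_built_event(
    relation: RelationInfo,
    describe: Optional[Callable[[RelationInfo], RelationInfo]] = None,
) -> str:
    """Return a JSON payload for one just-built relation."""
    if describe is not None:
        relation = describe(relation)
    # asdict() also converts the nested ColumnInfo records to plain dicts.
    data = {k: v for k, v in asdict(relation).items() if v is not None}
    return json.dumps({"name": "RelationBuilt", "data": data}, sort_keys=True)


def fake_describe(relation: RelationInfo) -> RelationInfo:
    # Stand-in for a real describe query; the values are invented for illustration.
    return replace(
        relation,
        columns=[ColumnInfo("id", "integer")],
        stats={"row_count": 3},
    )


if __name__ == "__main__":
    rel = RelationInfo("analytics", "prod", "orders", "table")
    print(relation_built_event(rel))                 # location and type only
    print(relation_built_event(rel, fake_describe))  # with columns and stats
```

Keeping `columns` and `stats` optional means the event can ship today with just location and type, and grow richer once a describe step exists, without changing the payload shape for consumers.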