Multi Active Satellite v0

tkirschke edited this page Oct 18, 2022 · 3 revisions

This macro creates a multi-active satellite version 0, meaning that it should be materialized as an incremental table. It should be applied 'on top' of the staging layer, and is either connected to a Hub or a Link. On top of each version 0 multi-active satellite, a version 1 multi-active satellite should be created, using the ma_sat_v1 macro. This extends the v0 satellite by a virtually calculated load end date. Each satellite can only be loaded by one source model, since we typically recommend a satellite split by source system.
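To illustrate what a "virtually calculated load end date" means, here is a minimal sketch. It is not the SQL that ma_sat_v1 generates. It assumes a v0 satellite model named customer_ma_sat_v0 (a hypothetical name), a parent hashkey column hk_customer_h, and a load date column named ldts.

```sql
-- Sketch only: derive a load end date at query time instead of storing it.
-- Assumptions: model name customer_ma_sat_v0 and column name ldts are hypothetical.
WITH load_dates AS (
    -- In a multi-active satellite, all rows sharing a hashkey and ldts
    -- form one group, so end-date the distinct load dates first.
    SELECT DISTINCT hk_customer_h, ldts
    FROM {{ ref('customer_ma_sat_v0') }}
),

end_dated AS (
    SELECT
        hk_customer_h,
        ldts,
        -- The next load date of the same parent closes the current group;
        -- the latest group keeps NULL as its load end date.
        LEAD(ldts) OVER (PARTITION BY hk_customer_h ORDER BY ldts) AS ledts
    FROM load_dates
)

SELECT
    s.*,
    e.ledts
FROM {{ ref('customer_ma_sat_v0') }} AS s
INNER JOIN end_dated AS e
    ON s.hk_customer_h = e.hk_customer_h
    AND s.ldts = e.ldts
```

Because the end date is computed in a view, the v0 table itself stays insert-only.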

If a stage model is defined as multi-active, all satellites out of that stage model need to be implemented as multi-active satellites.

Features:

  • Can handle multiple updates per batch without losing intermediate changes. Therefore, initial loading is supported.
  • Uses a dynamic high-water mark to optimize loading performance across multiple loads.
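The high-water-mark idea can be sketched in a plain dbt incremental model. This is a simplified illustration of the concept, not the macro's actual logic: it only filters by load date and leaves out the hashdiff comparison that the macro performs. The column names ldts and rsrc are assumptions, and the other names are taken from the example on this page.

```sql
-- Sketch only: a high-water-mark filter in a hand-written incremental model.
-- Assumption: the stage exposes the load date as ldts and the record source as rsrc.
{{ config(materialized='incremental') }}

SELECT
    hk_customer_h,
    hd_customer_s,
    ma_attribute,
    phonenumber,
    address,
    ldts,
    rsrc
FROM {{ ref('stg_customer') }}

{% if is_incremental() %}
-- Only scan stage rows newer than the latest load date already in the target.
-- COALESCE covers the case of an existing but empty target table.
WHERE ldts > (
    SELECT COALESCE(MAX(ldts), TIMESTAMP '1900-01-01 00:00:00')
    FROM {{ this }}
)
{% endif %}
```

The mark is "dynamic" in the sense that it is read from the target table on every run rather than stored as a fixed value.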
| Parameters | Data Type | Explanation |
|---|---|---|
| parent_hashkey | string | Name of the hashkey column inside the stage of the object that this satellite is attached to. |
| src_hashdiff | string | Name of the hashdiff column of this satellite. It is created inside the staging area and calculated from the entire payload of this satellite. The stage must hold one hashdiff per satellite entity. |
| src_ma_key | string or list of strings | Name(s) of the multi-active key(s) inside the staging area. These must be the same ones as defined in the stage model. |
| src_payload | list of strings | A list of all the descriptive attributes that should be included in this satellite. These must be the columns that are fed into the hashdiff calculation of this satellite. Do not include the multi-active key in the payload of a multi-active satellite; it is included automatically! |
| source_model | string | Name of the underlying staging model. It must be available inside dbt as a model. |
| src_ldts | string | Name of the ldts column inside the source model. Optional; defaults to the global variable 'datavault4dbt.ldts_alias'. Must match the column alias defined inside the staging model. |
| src_rsrc | string | Name of the rsrc column inside the source model. Optional; defaults to the global variable 'datavault4dbt.rsrc_alias'. Must match the column alias defined inside the staging model. |
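As a hedged illustration of the parameters above, the following sketch passes src_ma_key as a list and sets the optional src_ldts and src_rsrc explicitly. The multi-active key columns phone_type and address_type and the column names ldts and rsrc are hypothetical; they would have to match what the stage model actually defines.

```sql
{# Sketch only: list-valued src_ma_key plus explicit src_ldts / src_rsrc. #}
{# Assumptions: phone_type, address_type, ldts and rsrc are hypothetical column names. #}
{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
source_model: 'stg_customer'
parent_hashkey: 'hk_customer_h'
src_hashdiff: 'hd_customer_s'
src_ma_key:
    - phone_type
    - address_type
src_payload:
    - phonenumber
    - address
src_ldts: 'ldts'
src_rsrc: 'rsrc'
{%- endset -%}

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{{ datavault4dbt.ma_sat_v0(source_model=metadata_dict['source_model'],
                        parent_hashkey=metadata_dict['parent_hashkey'],
                        src_hashdiff=metadata_dict['src_hashdiff'],
                        src_ma_key=metadata_dict['src_ma_key'],
                        src_payload=metadata_dict['src_payload'],
                        src_ldts=metadata_dict['src_ldts'],
                        src_rsrc=metadata_dict['src_rsrc']) }}
```

If src_ldts and src_rsrc are omitted, the macro falls back to the global variables named in the table.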

Example 1

{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
source_model: 'stg_customer'
parent_hashkey: 'hk_customer_h'
src_hashdiff: 'hd_customer_s'
src_ma_key: 'ma_attribute'
src_payload: 
    - phonenumber
    - address
{%- endset -%}

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{{ datavault4dbt.ma_sat_v0(source_model=metadata_dict['source_model'],
                        parent_hashkey=metadata_dict['parent_hashkey'],
                        src_hashdiff=metadata_dict['src_hashdiff'],
                        src_ma_key=metadata_dict['src_ma_key'],
                        src_payload=metadata_dict['src_payload']) }}

Description

  • source_model:
    • stg_customer: This satellite is created from the stage for customer data. The stage must be set up as a multi-active stage to enable proper hashdiff calculation.
  • parent_hashkey:
    • hk_customer_h: The multi-active satellite is attached to the main business object of stg_customer, which is the customer Hub. The hashkey of that Hub is hk_customer_h.
  • src_hashdiff:
    • hd_customer_s: The hashdiff column inside the staging model that belongs to this satellite. It must be calculated from the same input attributes as the payload of this satellite.
  • src_ma_key:
    • ma_attribute: For each hashkey and load date, multiple ma_attribute values can be active at the same time. The combination of hashkey and ma_attribute must be unique per ldts.
  • src_payload:
    • ['phonenumber', 'address']: The multi-active satellite for customers needs to contain all descriptive attributes that belong to customers. These must be the same columns that are used for the hashdiff calculation of this satellite.
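The uniqueness requirement on the multi-active key can be checked with a dbt singular test. This is a minimal sketch, assuming the load date column in the stage is named ldts and that the file is placed in the project's tests directory under a name such as assert_stg_customer_ma_key_unique.sql (a hypothetical name). A singular test fails when the query returns any rows.

```sql
-- Sketch only: returns every (hashkey, multi-active key, load date)
-- combination that occurs more than once in the stage.
-- Assumption: the load date column is named ldts.
SELECT
    hk_customer_h,
    ma_attribute,
    ldts,
    COUNT(*) AS row_count
FROM {{ ref('stg_customer') }}
GROUP BY
    hk_customer_h,
    ma_attribute,
    ldts
HAVING COUNT(*) > 1
```

Running such a check against the stage before loading helps to catch a multi-active key that does not uniquely identify the active rows of a parent.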