PIT
Tim Kirschke edited this page Mar 17, 2023
·
11 revisions
This macro creates a PIT table to gather snapshot based information of one hub and its surrounding satellites. For this macro to work, a snapshot table is required, that has a trigger collumn to identify which snapshots to include in the PIT table. The easiest way to create such a snapshot table is to use the control_snap macros provided by this package.
Features:
- Tracks the active satellite entries for each entry in a Hub for each snapshot
- Strongly improves performance if upstream queries requires many JOIN operations
- Creates a unique dimension key to optimize loading performance of incremental loads
- Allows to insert a static string as record source column, matching business vault definition of a record source
Parameters | Data Type | Explanation |
---|---|---|
pit_type | string | String to insert into the 'pit_type' column. Allows for future implementations of other PIT variants, like T-PITs etc. Can be set freely, something like 'PIT' could be the default. Optional. |
tracked_entity | string | Name of the tracked Hub entity. Must be available as a model inside the dbt project. |
hashkey | string | The name of the hashkey column inside the previously refered Hub entity. |
sat_names | list of strings | A list of all the satellites that should be included in this PIT table. Can only be satellites that are attached to the tracked Hub, and should typically include all those satellites. You should always refer here to the version 1 satellites, since those hold the load-end-date. The macro currently supports regular satellites and nh-satellites. |
snapshot_relation | string | The name of the snapshot relation. It needs to be available as a model inside this dbt project. |
snapshot_trigger_column | string | The name of the column inside the previously mentioned snapshot relation, that is boolean and identifies the snapshots that should be included in the PIT table. |
dimension_key | string | The desired name of the dimension key inside the PIT table. Should follow some naming conventions. Recommended is the name of the hashkey with a '_d' suffix. |
ldts | string | Name of the ldts column inside all source models. Is optional, will use the global variable 'datavault4dbt.ldts_alias'. Needs to use the same column name as defined as alias inside the staging model. |
custom_rsrc | string | A custom string that should be inserted into the 'rsrc' column inside the PIT table. Since a PIT table is a business vault entity, the technical record source is no longer used here. Optional. |
ledts | string | Name of the load-end-date column inside the satellites. Is optional, will use the global variable 'datavault4dbt.ledts_alias' if not set here. |
{{ config(materialized='incremental',
post_hook="{{ datavault4dbt.clean_up_pit('control_snap_v1') }}") }}
{%- set yaml_metadata -%}
pit_type: 'Regular PIT'
tracked_entity: 'account_h'
hashkey: 'hk_account_h'
sat_names:
- account_lroc_p_s
- account_lroc_n_s
- account_hroc_p_s
- account_hroc_n_s
snapshot_relation: 'control_snap_v1'
snapshot_trigger_column: 'is_active'
dimension_key: 'hk_account_d'
custom_rsrc: 'PIT table for SAP/Accounts. For more information see our Wiki!'
{%- endset -%}
{%- set metadata_dict = fromyaml(yaml_metadata) -%}
{%- set pit_type = metadata_dict['pit_type'] -%}
{%- set tracked_entity = metadata_dict['tracked_entity'] -%}
{%- set hashkey = metadata_dict['hashkey'] -%}
{%- set sat_names = metadata_dict['sat_names'] -%}
{%- set snapshot_relation = metadata_dict['snapshot_relation'] -%}
{%- set snapshot_trigger_column = metadata_dict['snapshot_trigger_column'] -%}
{%- set dimension_key = metadata_dict['dimension_key'] -%}
{%- set custom_rsrc = metadata_dict['custom_rsrc'] -%}
{{ datavault4dbt.pit(pit_type=pit_type,
tracked_entity=tracked_entity,
hashkey=hashkey,
sat_names=sat_names,
snapshot_relation=snapshot_relation,
snapshot_trigger_column=snapshot_trigger_column,
dimension_key=dimension_key,
custom_rsrc=custom_rsrc) }}
-
pit_type:
- Regular PIT: PIT type is set to 'Regular PIT'. Optional.
-
tracked_entity:
- account_h: This PIT table tracks the Hub Account.
-
hashkey:
- hk_account_h: The name of the hashkey column ('hk_account_h') inside the previously refered Hub entity ('account_h').
-
sat_names:
- ['account_lroc_p_s','account_lroc_n_s','account_hroc_p_s','account_hroc_n_s']: This four satellites are inclueded in the PIT table.
-
snapshot_relation:
- control_snap_v1: The name of the snapshot relation.
-
snapshot_trigger_column:
- is_active: The name of the column inside the previously mentioned snapshot relation that is boolean and identifies the snapshots that should be included in the PIT table.
-
dimension_key:
- hk_account_d: The desired name of the dimension key inside the PIT table.
-
custom_rsrc:
- PIT table for SAP/Accounts.: A custom string that should be inserted into the 'rsrc' column inside the PIT table. Optional.
Table of Content
- Staging
- DV-Entities
- Hubs
- Links
- Satellites
- Standard Satellite
- Multi-Active Satellite
- Record-Tracking Satellite
- Non-Historized Satellite
-
Reference Data
- Reference Hub
- Reference Satellite
- Reference Tables
- Business Vault
- PIT
- Snapshot Control
- General
- Global Variables
- Multi-Source Entites
- How To Track Effectivity
- The rsrc_static Attribute
- Adaper Specific Notes
- Releases