Improve insert method used (#381)
* Add named columns

* Remove unused `rows_affected` column in `test_executions` #195

* Reorganise macro folder

* Add documentation for macros

* Consolidate logic in upload results

* Simplify how nodes are extracted

* Generalise identifying objects to load further

* Have all results included in the loop

* Split out get_dataset_content

* Move the get table content into its own macro

* Bug fixing

* Move relation definition to insert macro to avoid quoting issues

* Revert "Remove unused `rows_affected` column in `test_executions` #195"

This reverts commit 9182fdc.

* Remove ---debug left in databricks tox

* Apply suggestions from code review

Co-authored-by: Jared Rimmer <100997264+jared-rimmer@users.noreply.github.com>

---------

Co-authored-by: Jared Rimmer <100997264+jared-rimmer@users.noreply.github.com>
glsdown and jared-rimmer committed Sep 18, 2023
1 parent 532edbb commit 7a33172
Showing 26 changed files with 656 additions and 255 deletions.
246 changes: 246 additions & 0 deletions macros/_macros.yml
@@ -0,0 +1,246 @@
version: 2

macros:
## DATABASE SPECIFIC HELPERS ##
- name: column_identifier
description: |
Dependent on the adapter type, return the identifier for a column using a numerical index.
arguments:
- name: column_index
type: integer
description: |
The index of the column to return the identifier for
- name: generate_surrogate_key
description: |
Since folks commonly install dbt_artifacts alongside a myriad of other packages,
we copy the dbt_utils implementation of the surrogate_key macro so we don't have
any dependencies to make conflicts worse!
This version is:
URL: https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_surrogate_key.sql
Commit SHA: eaa0e41b033bdf252eff0ae014ec11888f37ebff
Date: 2023-04-28
arguments:
- name: field_list
type: list
description: |
A list of fields to concatenate together to form the surrogate key
- name: get_relation
description: |
Identify a relation in the graph from a relation name
arguments:
- name: get_relation_name
type: string
description: |
The name of the relation to return from the graph
- name: parse_json
description: |
Dependent on the adapter type, return a column which parses the JSON field.
arguments:
- name: field
type: string
description: |
The name of the field to parse
- name: type_array
description: |
Dependent on the adapter type, returns the native type for storing an array.
- name: type_boolean
description: |
Dependent on the adapter type, returns the native boolean type.
- name: type_json
description: |
Dependent on the adapter type, returns the native type for storing JSON.
## MIGRATION ##
- name: migrate_from_v0_to_v1
description: |
A macro to assist with migrating from v0 to v1 of dbt_artifacts. See
https://github.com/brooklyn-data/dbt_artifacts/blob/main/README.md#migrating-from-100-to-100
for details on the usage.
arguments:
- name: old_database
type: string
description: |
The database of the <1.0.0 output (fct_/dim_) models - does not have to be different to `new_database`
- name: old_schema
type: string
description: |
The schema of the <1.0.0 output (fct_/dim_) models - does not have to be different to `new_schema`
- name: new_database
type: string
description: |
The target database that the v1 artifact sources are in - does not have to be different to `old_database`
- name: new_schema
type: string
description: |
The target schema that the v1 artifact sources are in - does not have to be different to `old_schema`
## UPLOAD INDIVIDUAL DATASETS ##
- name: upload_exposures
description: |
The macro to support upload of the data to the exposures table.
arguments:
- name: exposures
type: list
description: |
A list of exposure objects extracted from the dbt graph
- name: upload_invocations
description: |
The macro to support upload of the data to the invocations table.
- name: upload_model_executions
description: |
The macro to support upload of the data to the model_executions table.
arguments:
- name: models
type: list
description: |
A list of model execution results objects extracted from the dbt result object
- name: upload_models
description: |
The macro to support upload of the data to the models table.
arguments:
- name: models
type: list
description: |
A list of model objects extracted from the dbt graph
- name: upload_seed_executions
description: |
The macro to support upload of the data to the seed_executions table.
arguments:
- name: seeds
type: list
description: |
A list of seed execution results objects extracted from the dbt result object
- name: upload_seeds
description: |
The macro to support upload of the data to the seeds table.
arguments:
- name: seeds
type: list
description: |
A list of seed objects extracted from the dbt graph
- name: upload_snapshot_executions
description: |
The macro to support upload of the data to the snapshot_executions table.
arguments:
- name: snapshots
type: list
description: |
A list of snapshot execution results objects extracted from the dbt result object
- name: upload_snapshots
description: |
The macro to support upload of the data to the snapshots table.
arguments:
- name: snapshots
type: list
description: |
A list of snapshot objects extracted from the dbt graph
- name: upload_sources
description: |
The macro to support upload of the data to the sources table.
arguments:
- name: sources
type: list
description: |
A list of source objects extracted from the dbt graph
- name: upload_test_executions
description: |
The macro to support upload of the data to the test_executions table.
arguments:
- name: tests
type: list
description: |
A list of test execution results objects extracted from the dbt result object
- name: upload_tests
description: |
The macro to support upload of the data to the tests table.
arguments:
- name: tests
type: list
description: |
A list of test objects extracted from the dbt graph
## UPLOAD RESULTS ##
- name: get_column_name_list
description: |
A macro to return the list of column names for a particular dataset. Returns a comment if the dataset is not
valid.
arguments:
- name: dataset
type: string
description: |
The name of the dataset to return the column names for e.g. `models`
- name: get_dataset_content
description: |
A macro to extract the data to be uploaded from either the results or the graph object.
arguments:
- name: dataset
type: string
description: |
The name of the dataset to return the data for e.g. `models`
- name: get_table_content_values
description: |
A macro to create the insert statement values required to be uploaded to the table.
arguments:
- name: dataset
type: string
description: |
The name of the dataset to generate the insert statement values for e.g. `models`
- name: objects_to_upload
type: list
description: |
The objects to be used to generate the insert statement values - extracted from `get_dataset_content`
- name: insert_into_metadata_table
description: |
Dependent on the adapter type, the wrapper to insert the data into a table from a list of values. Used in the
`upload_results` macro, alongside the `get_column_name_list` macro to generate the column names and the
per-dataset `upload_*` macros to generate the data to be inserted.
arguments:
- name: database_name
type: string
description: |
The database name for the relation that the data is to be inserted into
- name: schema_name
type: string
description: |
The schema name for the relation that the data is to be inserted into
- name: table_name
type: string
description: |
The table name for the relation that the data is to be inserted into
- name: fields
type: string
description: |
The list of fields for the relation that the data is to be inserted into
- name: content
type: string
description: |
The data content to insert into the relation
- name: upload_results
description: |
The main macro called to upload the metadata into each of the source tables.
arguments:
- name: results
type: list
description: |
The results object from dbt.
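
Taken together, the last block of macros above describes the refactored upload pipeline. The sketch below is a hedged illustration of how they might compose inside `upload_results` (the real macro body is not part of this diff; the three-dataset loop, the hypothetical macro name and the `target.*` database/schema defaults are assumptions for illustration only):

{% macro example_upload_pipeline_sketch() %}
    {% if execute %}
        {% for dataset in ['models', 'sources', 'exposures'] %}
            {# 1. pull the objects for this dataset out of the graph / results #}
            {% set objects_to_upload = dbt_artifacts.get_dataset_content(dataset) %}
            {# 2. build the column list and the values to insert #}
            {% set fields = dbt_artifacts.get_column_name_list(dataset) %}
            {% set content = dbt_artifacts.get_table_content_values(dataset, objects_to_upload) %}
            {# 3. adapter-dispatched insert into the matching source table #}
            {% do dbt_artifacts.insert_into_metadata_table(
                database_name=target.database,
                schema_name=target.schema,
                table_name=dataset,
                fields=fields,
                content=content
            ) %}
        {% endfor %}
    {% endif %}
{% endmacro %}

In practice the package documents calling {{ dbt_artifacts.upload_results(results) }} from an on-run-end hook, which is what makes the `results` object available to the execution datasets.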
File renamed without changes.
File renamed without changes.
14 changes: 14 additions & 0 deletions macros/database_specific_helpers/get_relation.sql
@@ -0,0 +1,14 @@
{% macro get_relation(relation_name) %}
{% if execute %}
{% set model_get_relation_node = graph.nodes.values() | selectattr('name', 'equalto', relation_name) | first %}
{% set relation = api.Relation.create(
database = model_get_relation_node.database,
schema = model_get_relation_node.schema,
identifier = model_get_relation_node.alias
)
%}
{% do return(relation) %}
{% else %}
{% do return(api.Relation.create()) %}
{% endif %}
{% endmacro %}
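
A hypothetical call site, purely for illustration (the model name and the logging are not from this commit; within the package the macro is simply a name-based lookup against the graph):

{# Hypothetical usage: resolve a node by name and log the rendered relation #}
{% set my_relation = dbt_artifacts.get_relation('my_model') %}
{% if execute %}
    {{ log("my_model resolves to: " ~ my_relation, info=true) }}
{% endif %}

Note that the macro only searches `graph.nodes`, so it resolves models, seeds and snapshots by name but not sources or exposures.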
File renamed without changes.
@@ -19,11 +19,11 @@
{% endmacro %}

{% macro snowflake__type_json() %}
OBJECT
object
{% endmacro %}

{% macro bigquery__type_json() %}
JSON
json
{% endmacro %}

{#- ARRAY -#}
@@ -37,9 +37,9 @@
{% endmacro %}

{% macro snowflake__type_array() %}
ARRAY
array
{% endmacro %}

{% macro bigquery__type_array() %}
ARRAY<string>
array<string>
{% endmacro %}
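
These adapter-dispatched type helpers are what let the metadata tables declare semi-structured columns portably. A minimal, assumed usage in a column definition (illustrative only; the table and column names below are not taken from this commit):

{# Illustrative only: pick the native JSON / array / boolean type for the current adapter #}
create table if not exists {{ target.schema }}.example_artifact_table (
    all_results {{ dbt_artifacts.type_json() }},
    tags {{ dbt_artifacts.type_array() }},
    was_full_refresh {{ dbt_artifacts.type_boolean() }}
)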
38 changes: 0 additions & 38 deletions macros/insert_into_metadata_table.sql

This file was deleted.
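
The deleted top-level `insert_into_metadata_table.sql` is superseded by the adapter-dispatched `insert_into_metadata_table` documented in `_macros.yml` above. Combined with the "Add named columns" item in the commit message, the generated DML now names its target columns explicitly, roughly the shape sketched below (illustrative, not an excerpt from the new macro):

{# Illustrative shape only: `fields` carries the explicit column list, `content` the row values #}
insert into {{ database_name }}.{{ schema_name }}.{{ table_name }}
    {{ fields }}   {# e.g. (column_a, column_b, ...) #}
values
    {{ content }}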

File renamed without changes.
@@ -1,8 +1,4 @@
{% macro upload_exposures(graph) -%}
{% set exposures = [] %}
{% for node in graph.exposures.values() %}
{% do exposures.append(node) %}
{% endfor %}
{% macro upload_exposures(exposures) -%}
{{ return(adapter.dispatch('get_exposures_dml_sql', 'dbt_artifacts')(exposures)) }}
{%- endmacro %}

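The same simplification repeats for each dataset below: the graph/results filtering that used to live inside every `upload_*` macro now happens once upstream, so each macro just dispatches the adapter-specific DML for a list it is handed. A hedged sketch of the new caller side for exposures (not a literal excerpt from this commit):

{# Sketch only: build the list up front, then ask the macro for the adapter-specific DML values #}
{% set exposures = graph.exposures.values() | list %}
{% set exposures_dml = dbt_artifacts.upload_exposures(exposures) %}
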
File renamed without changes.
@@ -1,10 +1,4 @@
{% macro upload_model_executions(results) -%}
{% set models = [] %}
{% for result in results %}
{% if result.node.resource_type == "model" %}
{% do models.append(result) %}
{% endif %}
{% endfor %}
{% macro upload_model_executions(models) -%}
{{ return(adapter.dispatch('get_model_executions_dml_sql', 'dbt_artifacts')(models)) }}
{%- endmacro %}

File renamed without changes.
@@ -1,10 +1,4 @@
{% macro upload_seed_executions(results) -%}
{% set seeds = [] %}
{% for result in results %}
{% if result.node.resource_type == "seed" %}
{% do seeds.append(result) %}
{% endif %}
{% endfor %}
{% macro upload_seed_executions(seeds) -%}
{{ return(adapter.dispatch('get_seed_executions_dml_sql', 'dbt_artifacts')(seeds)) }}
{%- endmacro %}

@@ -1,8 +1,4 @@
{% macro upload_seeds(graph) -%}
{% set seeds = [] %}
{% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "seed") %}
{% do seeds.append(node) %}
{% endfor %}
{% macro upload_seeds(seeds) -%}
{{ return(adapter.dispatch('get_seeds_dml_sql', 'dbt_artifacts')(seeds)) }}
{%- endmacro %}

@@ -1,10 +1,4 @@
{% macro upload_snapshot_executions(results) -%}
{% set snapshots = [] %}
{% for result in results %}
{% if result.node.resource_type == "snapshot" %}
{% do snapshots.append(result) %}
{% endif %}
{% endfor %}
{% macro upload_snapshot_executions(snapshots) -%}
{{ return(adapter.dispatch('get_snapshot_executions_dml_sql', 'dbt_artifacts')(snapshots)) }}
{%- endmacro %}

@@ -1,8 +1,5 @@
{% macro upload_snapshots(graph) -%}
{% set snapshots = [] %}
{% for node in graph.nodes.values() | selectattr("resource_type", "equalto", "snapshot") %}
{% do snapshots.append(node) %}
{% endfor %}
{% macro upload_snapshots(snapshots) -%}

{{ return(adapter.dispatch('get_snapshots_dml_sql', 'dbt_artifacts')(snapshots)) }}

{%- endmacro %}
File renamed without changes.