Releases · dbt-labs/dbt-core

dbt 0.5.4

tl;dr

- Added support for custom SQL data tests
  - SQL returns 0 results --> pass
  - SQL returns > 0 results --> fail
- dbt-core integration tests
  - running in Continuous Integration environments
    - Windows (AppVeyor)
    - Linux (CircleCI)
  - with code coverage

Custom SQL data tests

Schema tests have proven to be an essential part of a modern analytical workflow. These schema tests validate basic constraints about your data: not null, unique, accepted value, and foreign key relationship properties can all be asserted using schema tests.

With dbt v0.5.4, you can now write your own custom "data tests". These data tests are SQL SELECT statements that return 0 rows on success, or > 0 rows on failure. A typical data test might look like:

-- tests/assert_less_than_5_pct_event_cookie_ids_are_null.sql

-- If >= 5% of cookie_ids are null, then the test returns 1 row (failure).
-- If < 5% of cookie_ids are null, then the test returns 0 rows (success)

with calc as (

    select
      sum(case when cookie_id is null then 1 else 0 end)::float / count(*)::float as fraction
    from {{ ref('events') }}

)

select * from calc where fraction >= 0.05
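To see the 0-rows-means-pass convention in action, here is a minimal Python sketch that runs the test query against an in-memory SQLite database via the standard-library sqlite3 module. Assumptions: a toy events table replaces {{ ref('events') }}, SQLite's `* 1.0` replaces the Postgres-style ::float casts, and the failure condition is written as fraction >= 0.05 to match the comments at the top of the test file. run_data_test and make_events are hypothetical helpers, not dbt's test runner.

```python
import sqlite3

# The data test: return one row (failure) when >= 5% of cookie_ids are null.
# SQLite has no ::float cast, so "* 1.0" forces floating-point division.
TEST_SQL = """
with calc as (
    select
      sum(case when cookie_id is null then 1 else 0 end) * 1.0 / count(*) as fraction
    from events
)
select * from calc where fraction >= 0.05
"""


def run_data_test(conn: sqlite3.Connection, sql: str) -> bool:
    """Hypothetical runner: a data test passes if and only if its query returns 0 rows."""
    rows = conn.execute(sql).fetchall()
    return len(rows) == 0


def make_events(cookie_ids) -> sqlite3.Connection:
    """Build a throwaway events table; None values become SQL NULLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (cookie_id text)")
    conn.executemany("insert into events values (?)", [(c,) for c in cookie_ids])
    return conn


# 1 null out of 100 rows (1%): the query returns 0 rows, so the test passes.
ok = make_events(["c"] * 99 + [None])
print(run_data_test(ok, TEST_SQL))   # True

# 10 nulls out of 100 rows (10%): the query returns 1 row, so the test fails.
bad = make_events(["c"] * 90 + [None] * 10)
print(run_data_test(bad, TEST_SQL))  # False
```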

To enable data tests, add the test-paths config to your dbt_project.yml file:

name: 'Vandelay Industries'
version: '1.0'

source-paths: ["models"]
target-path: "target"
test-paths: ["tests"]        # look for *.sql files in the "tests" directory
# ....

Any .sql file found in the test-paths directory (or directories) will be evaluated as a data test. These tests can be run with:

dbt test # run schema + data tests
dbt test --schema # run only schema tests
dbt test --data # run only data tests
dbt test --data --schema # run schema + data tests

# For more information, try
dbt test -h

dbt-core integration tests

With the dbt 0.5.4 release, dbt now features a robust integration test suite. These integration tests will help mitigate the risk of software regressions and, in so doing, help us develop dbt more quickly. You can check out the tests in the repository, with results on CircleCI (Linux/OSX) and AppVeyor (Windows).

The Future

You can check out the dbt roadmap here. In the next few weeks, we'll be working on bug fixes, minor features, improved macro support, and expanded control over runtime materialization configs.

As always, feel free to reach out to us on Slack with any questions or comments!

27 Oct 04:15

drewbanin

v0.5.3

40948fb

dbt 0.5.3

Bugfix release.

Fixes regressions introduced in 0.5.1 and 0.5.2.

Fixed 0.5.1 regressions

Incremental models were broken by the new column expansion feature. Column expansion is implemented as:

alter table ... add column tmp_col varchar({new_size});
update ... set tmp_col = existing_col;
alter table ... drop column existing_col;
alter table ... rename column tmp_col to existing_col;

This has the side-effect of moving the existing_col to the "end" of the table. When an incremental model tries to

insert into {table} (
   select * from tmp_table
)

the columns in {table} are suddenly in a different order than the columns in tmp_table, so this positional insert fails.

The fix for this issue is twofold:

- If the incremental model table DOES NOT already exist, avoid inserts altogether. Instead, run a create table as (...) statement.
- If the incremental model table DOES already exist, query for the columns in the existing table and use those to build the insert statement, e.g.:

insert into "dbt_dbanin"."sessions" ("session_end_tstamp", "session_start_tstamp", ...)
(
    select "session_end_tstamp", "session_start_tstamp", ...
    from "sessions__dbt_incremental_tmp"
);

In this way, the source and destination columns are guaranteed to be in the same order!
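A minimal Python sketch of the second half of the fix, assuming the existing table's column names have already been fetched (for example from information_schema.columns, which this sketch does not query). build_incremental_insert and quote are hypothetical helper names, not dbt's internals; the point is that one ordered column list is reused on both sides of the insert.

```python
def quote(identifier: str) -> str:
    """Double-quote an identifier, escaping any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def build_incremental_insert(schema: str, table: str, tmp_table: str, existing_columns: list) -> str:
    """Name every column explicitly, in the existing table's order, on both sides
    of the insert so the positional mismatch from column expansion cannot occur."""
    cols = ", ".join(quote(c) for c in existing_columns)
    return "\n".join([
        f"insert into {quote(schema)}.{quote(table)} ({cols})",
        "(",
        f"    select {cols}",
        f"    from {quote(tmp_table)}",
        ");",
    ])


# Hypothetical column list; in practice it would come from querying the existing table.
columns = ["session_end_tstamp", "session_start_tstamp"]
print(build_incremental_insert("dbt_dbanin", "sessions", "sessions__dbt_incremental_tmp", columns))
```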

Fixed 0.5.2 regressions

We attempted to refactor the way profiles work in dbt. Previously, a default user profile was loaded, and the profiles specified in dbt_project.yml or on the command line (with --profile) would be applied on top of the user config. This implementation is some of the earliest code that was committed to dbt.

As dbt has grown, we found this implementation to be a little unwieldy and hard to maintain. The 0.5.2 release made it so that only one profile could be loaded at a time. This profile needed to be specified either in dbt_project.yml or on the command line with --profile. This change inadvertently introduced a bug that broke the handling of dependency projects.

The future

The addition of automated testing and a more comprehensive manual testing process will go a long way toward ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/analyst-collective/dbt/milestone/16.

As always, feel free to reach out to us on Slack with any questions or concerns!

25 Oct 01:56

drewbanin

0.5.2

c8c049e

dbt 0.5.2

Patch release fixing a bug that arises when profiles are overridden on the command line with the --profile flag.

See https://github.com/analyst-collective/dbt/releases/tag/v0.5.1

21 Oct 17:11

drewbanin

v0.5.1

4739be2

dbt version 0.5.1

0. tl;dr

- Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
- Column type resolution for incremental models (no more Value too long for character type errors)
- Postgres support
- Top-level configs applied to your project + all dependencies
- --threads CLI option + better multithreaded output

1. Source table archival #183

Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. Imagine a users table which is synced to your data warehouse from a production database. This users table represents what your users look like now. Consider what happens if you need to look at revenue by city for each of your users, trended over time. Specifically, what happens if a user moved from, say, Philadelphia to New York? With only the current users table, all of that user's historical revenue would be attributed to New York. To do this correctly, you need to archive snapshots of the users table on a recurring basis. With this release, dbt now provides an easy mechanism to store such snapshots.

To use this new feature, declare the tables you want to archive in your dbt_project.yml file:

archive:
    - source_schema: synced_production_data  # schema to look for tables in (declared below)
      target_schema: dbt_archive             # where to archive the data to
      tables:                                # list of tables to archive
        - source_table: users                # table to archive
          target_table: users_archived       # table to insert archived data into
          updated_at: updated_at             # used to determine when data has changed
          unique_key: id                     # used to generate archival query

        - source_table: some_other_table
          target_table: some_other_table_archive
          updated_at: "updatedAt"
          unique_key: "expressions || work || LOWER(too)"

    - source_schema: some_other_schema
      # ....

The archived tables will mirror the schema of the source tables they're generated from. In addition, three fields are added to the archive table:

- valid_from: The timestamp when this archived row was inserted (and first considered valid).
- valid_to: The timestamp when this archived row became invalid. When a record for a given unique_key is first archived, its valid_to is NULL. When newer data is archived for that unique_key, the valid_to field of the old record is set to the valid_from field of the new record!
- scd_id: A unique key generated for each archive record. SCD stands for Slowly Changing Dimension.

dbt models can be built on top of these archived tables. The most recent record for a given unique_key is the one where valid_to is null.
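The bookkeeping described above can be sketched in plain Python, with lists of dicts standing in for the source and archive tables. This is a minimal sketch under stated assumptions, not dbt's implementation (dbt generates SQL for this): it takes valid_from to be the archive run's timestamp, treats a row as changed when its updated_at value has advanced, and derives scd_id by MD5-hashing the unique_key together with valid_from. The archive and make_scd_id names are hypothetical.

```python
import hashlib
from datetime import datetime


def make_scd_id(key, valid_from: datetime) -> str:
    # Assumption: hash the unique_key together with valid_from to get a per-record id.
    return hashlib.md5(f"{key}|{valid_from.isoformat()}".encode()).hexdigest()


def archive(archived: list, source_rows: list, unique_key: str, updated_at: str, run_at: datetime) -> list:
    """Append new versions of new or changed source rows to `archived`, in place."""
    # The current version of each key is the archived row whose valid_to is None.
    current = {r[unique_key]: r for r in archived if r["valid_to"] is None}
    for row in source_rows:
        key = row[unique_key]
        old = current.get(key)
        if old is not None and row[updated_at] <= old[updated_at]:
            continue  # nothing changed since this key was last archived
        if old is not None:
            old["valid_to"] = run_at  # close the old version at the new version's valid_from
        new = dict(row, valid_from=run_at, valid_to=None, scd_id=make_scd_id(key, run_at))
        archived.append(new)
        current[key] = new
    return archived


archived = []
day1, day2 = datetime(2016, 10, 1), datetime(2016, 10, 2)
archive(archived, [{"id": 1, "city": "Philadelphia", "updated_at": datetime(2016, 9, 30)}],
        "id", "updated_at", run_at=day1)
archive(archived, [{"id": 1, "city": "New York", "updated_at": datetime(2016, 10, 1, 12)}],
        "id", "updated_at", run_at=day2)

for record in archived:
    print(record["city"], record["valid_from"], record["valid_to"])
```

Re-running with unchanged source data adds nothing, and the most recent record for id 1 is the one whose valid_to is None.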

To run this archive process, use the command dbt archive. After testing and confirming that the archival works, you should schedule this process through cron (or similar).

2. Incremental column expansion #175

Incremental tables are a powerful dbt feature, but there was at least one edge case that made working with them difficult. During the first run of an incremental model, Redshift infers a type for every column in the table. Subsequent runs can insert new data which does not conform to the inferred type. For example, values from a varchar(16) column may be inserted into a varchar(8) column.
In practice, this error looks like:

Value too long for character type
DETAIL:
  -----------------------------------------------
  error:  Value too long for character type
  code:      8001
  context:   Value too long for type character varying(8)
  query:     3743263
  location:  funcs_string.hpp:392
  process:   query4_35 [pid=18194]
  -----------------------------------------------

With this release, dbt will detect when column types are incongruent and will attempt to reconcile these different types if possible. Specifically, dbt will alter the incremental model table schema from character varying(x) to character varying(y) for some y > x. This should drastically reduce the occurrence of this class of error.
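A hypothetical Python sketch of the detect-and-widen step: compare varchar lengths and, for any column that must grow, emit the add/copy/drop/rename sequence described in the 0.5.3 notes. The function names, the __tmp suffix, and the input dictionaries are assumptions for illustration; how dbt actually reads column sizes from the warehouse is not shown.

```python
def columns_to_expand(existing: dict, incoming: dict) -> dict:
    """Return {column: new_size} for varchar columns that need to grow.

    Both arguments map column names to varchar lengths (hypothetical inputs; real
    sizes would come from the warehouse's column metadata). Columns are only ever
    widened (y > x), never narrowed."""
    return {
        col: size
        for col, size in incoming.items()
        if col in existing and size > existing[col]
    }


def expansion_sql(table: str, column: str, new_size: int) -> list:
    """Four-statement widening: add a wider temp column, copy, drop, rename."""
    tmp = f"{column}__tmp"  # hypothetical temp column name
    return [
        f"alter table {table} add column {tmp} varchar({new_size});",
        f"update {table} set {tmp} = {column};",
        f"alter table {table} drop column {column};",
        f"alter table {table} rename column {tmp} to {column};",
    ]


existing = {"session_id": 8, "user_agent": 256}
incoming = {"session_id": 16, "user_agent": 128}
for col, size in columns_to_expand(existing, incoming).items():
    for statement in expansion_sql("analytics.sessions", col, size):
        print(statement)
```

Note that the final rename leaves the widened column at the end of the table, which is the side effect that later broke incremental models.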

3. First-class Postgres support #183

With this release, Postgres becomes a first-class dbt target. You can configure a Postgres database target in your ~/.dbt/profiles.yml file:

warehouse:
  outputs:
    dev:
      type: postgres    # configure a target for Postgres
      host: localhost
      user: Drew
      # ....
  run-target: dev

While Redshift is built on top of Postgres, the two are subtly different. For instance, Redshift supports sort and dist keys, while Postgres does not! dbt uses the type parameter of the database target to generate the appropriate SQL for the target database.
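A hypothetical Python sketch of dispatching on the target type when generating a create table as statement. The create_table_as helper and its sort_key/dist_key parameters are illustrative assumptions, not dbt's code; the distkey(...) and sortkey(...) table attributes are Redshift CREATE TABLE AS syntax, and since Postgres has no equivalent, this sketch simply leaves them out for a postgres target.

```python
def create_table_as(target_type: str, relation: str, select_sql: str,
                    sort_key: str = None, dist_key: str = None) -> str:
    """Build a CTAS statement, adding Redshift-only attributes only for Redshift."""
    attributes = ""
    if target_type == "redshift":
        if dist_key:
            attributes += f" distkey({dist_key})"
        if sort_key:
            attributes += f" sortkey({sort_key})"
    elif target_type != "postgres":
        raise ValueError(f"unsupported target type: {target_type}")
    return f"create table {relation}{attributes} as (\n{select_sql}\n);"


print(create_table_as("redshift", "analytics.users", "select 1 as id", sort_key="id", dist_key="id"))
print(create_table_as("postgres", "analytics.users", "select 1 as id", sort_key="id", dist_key="id"))
```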

4. Root-level configs #161

Configurations in dbt_project.yml can now be declared at the models: level. These configurations will apply to the primary project, as well as any dependency projects. This feature is particularly useful for setting pre- or post- hooks that run for every model. In practice, this looks like:

name: 'My DBT Project'

models:
    post-hook:
        - "grant select on {{this}} to looker_user"     # Applied to 'My DBT Project' and 'Snowplow' dependency
    'My DBT Project':
        enabled: true
    'Snowplow':
        enabled: true

5. --threads CLI option #143

The number of threads that dbt uses can now be overridden with a CLI argument. The number of threads used must be between 1 and 8.

dbt run --threads 1    # fine
# or
dbt run --threads 4    # great
# or
dbt run --threads 42    # too many!
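A minimal Python sketch of the 1-to-8 bounds check as an argparse type function. thread_count is a hypothetical name and this is not dbt's implementation; with argparse, a value rejected by the type function makes parse_args print an error and exit.

```python
import argparse

MIN_THREADS, MAX_THREADS = 1, 8  # bounds stated in the release notes


def thread_count(value: str) -> int:
    """argparse type: accept integers from 1 to 8, reject everything else."""
    n = int(value)  # a non-integer raises ValueError, which argparse also reports
    if not MIN_THREADS <= n <= MAX_THREADS:
        raise argparse.ArgumentTypeError(
            f"--threads must be between {MIN_THREADS} and {MAX_THREADS}, got {n}"
        )
    return n


parser = argparse.ArgumentParser(prog="dbt run")
parser.add_argument("--threads", type=thread_count)

print(parser.parse_args(["--threads", "4"]).threads)
```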

In addition to this new CLI argument, the output from multi-threaded dbt runs should be a little more orderly now. Models won't show as STARTed until they're actually queued to run. Previously, the output here was a little confusing. Happy threading!

Upgrading

To upgrade to version 0.5.1 of dbt, run:

pip install --upgrade dbt

And another thing

Join us on Slack with questions or comments!

Made with ♥️ by 🐟🏙 📈

28 Sep 01:17

drewbanin

v0.5.0

0096774

dbt version 0.5.0

0. tl;dr

- use a temp table when executing incremental models
- arbitrary configuration (using config variables)
- specify branches for dependencies
- more & better docs

1. new incremental model generation #138

In previous versions of dbt, an edge case caused the sql_where query to select different rows in the delete and insert steps. As a result, it was possible to construct incremental models that would insert duplicate records into the specified table. With this release, dbt uses a temp table, which 1) circumvents this issue and 2) improves query performance. For more information, check out the GitHub issue: #138.
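A minimal sketch of the temp-table approach, using Python's built-in sqlite3 module as a stand-in warehouse. The key property is that the model query, filtered by sql_where, is evaluated exactly once into a temp table, and both the delete and the insert read from that one snapshot, so they cannot disagree about which rows are new. The delete-by-unique-key step, the table names, and the run_incremental helper are assumptions for illustration, not dbt's generated SQL.

```python
import sqlite3


def run_incremental(conn, target, model_sql, sql_where, unique_key):
    """Hypothetical temp-table incremental run (not dbt's actual SQL)."""
    tmp = f"{target}__dbt_incremental_tmp"
    conn.execute(f"drop table if exists {tmp}")
    # Evaluate the filtered model query once; sql_where sees the target *before* any delete.
    conn.execute(f"create temp table {tmp} as select * from ({model_sql}) where {sql_where}")
    # Both steps below read the same snapshot of new rows.
    conn.execute(f"delete from {target} where {unique_key} in (select {unique_key} from {tmp})")
    conn.execute(f"insert into {target} select * from {tmp}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("create table raw_events (id integer, ts integer)")
conn.executemany("insert into raw_events values (?, ?)", [(1, 10), (2, 20)])
MODEL_SQL = "select id, ts from raw_events"
SQL_WHERE = "ts > (select max(ts) from events)"  # hypothetical filter on the existing table

conn.execute(f"create table events as {MODEL_SQL}")  # first run: build the table

# New and updated source rows arrive: id 3 is new, id 2 has a newer version.
conn.executemany("insert into raw_events values (?, ?)", [(3, 30), (2, 40)])
run_incremental(conn, "events", MODEL_SQL, SQL_WHERE, "id")
run_incremental(conn, "events", MODEL_SQL, SQL_WHERE, "id")  # re-running adds nothing
print(sorted(conn.execute("select id, ts from events").fetchall()))
```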

2. Arbitrary configuration #146

Configuration in dbt is incredibly powerful: it is what allows models to change their behavior without changing their code. Previously, all configuration was done through built-in parameters, which limited what users could actually configure.

With this release, you can inject variables from dbt_project.yml into your top-level and dependency models. In practice, variables work like this:

# dbt_project.yml

models:
  my_project:
    vars:
      exclude_ip: '192.168.1.1'

-- filtered_events.sql

-- source code
select * from public.events where ip_address != '{{ var("exclude_ip") }}'

-- compiles to
select * from public.events where ip_address != '192.168.1.1'
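dbt renders var() calls with Jinja. To show the substitution step without a third-party dependency, here is a hypothetical Python stand-in built on the standard-library re module; it handles only the simple {{ var("name") }} form and is not how dbt compiles models. render_vars and VAR_PATTERN are illustrative names.

```python
import re

# Matches {{ var("name") }} or {{ var('name') }}, with optional whitespace.
VAR_PATTERN = re.compile(r"""\{\{\s*var\(\s*(["'])(\w+)\1\s*\)\s*\}\}""")


def render_vars(sql: str, variables: dict) -> str:
    """Substitute each var() call with its value from a vars mapping."""
    def substitute(match: re.Match) -> str:
        name = match.group(2)
        if name not in variables:
            raise KeyError(f"var '{name}' is not defined")
        return str(variables[name])
    return VAR_PATTERN.sub(substitute, sql)


source = "select * from public.events where ip_address != '{{ var(\"exclude_ip\") }}'"
compiled = render_vars(source, {"exclude_ip": "192.168.1.1"})
print(compiled)
```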

The vars parameter in dbt_project.yml is compiled, so you can use Jinja templating there as well! The primary use case for this is specifying "input" models to a dependency.

Previously, dependencies used ref(...) to select from a project's base models. That interface was brittle, and the idea that dependency code had unbridled access to all of your top-level models made us a little uneasy. As of this release, we're deprecating the ability for dependencies to ref(...) top-level models. Instead, the recommended way for this to work is with vars! An example:

-- dbt_modules/snowplow/models/events.sql

select * from {{ var('snowplow_events_table') }}

and

models:
  Snowplow:
    vars:
      snowplow_events_table: "{{ ref('base_events') }}"

This effectively mirrors the previous behavior, but is much more explicit about what's happening under the hood!

3. specify a dependency branch #165

With this release, you can point dbt to a specific branch of a dependency repo. The syntax looks like this:

repositories:
    - https://github.com/analyst-collective/dbt-audit.git@development # use the "development" branch
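Splitting a dependency spec into a clone URL and a branch might look like the following Python sketch. The rule it applies, treating whatever follows the last `.git@` as the branch name, is an assumption that fits the example above; parse_repository is a hypothetical helper, not dbt's code. Anchoring on `.git@` rather than any `@` keeps ssh-style addresses such as git@github.com:... intact and allows branch names that contain `/`.

```python
def parse_repository(spec: str):
    """Split 'https://.../repo.git@branch' into (url, branch).

    Returns branch=None when no branch is given."""
    url, sep, branch = spec.rpartition(".git@")
    if not sep or not branch:
        return spec, None
    return url + ".git", branch


print(parse_repository("https://github.com/analyst-collective/dbt-audit.git@development"))
```

The resulting pair could then be handed to `git clone --branch <branch> <url>`.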

4. More & Better Docs!

Check 'em out! And let us know if there's anything you think we can improve!

Upgrading

To upgrade to version 0.5.0 of dbt, run:

pip install --upgrade dbt

20 Sep 16:17

drewbanin

v0.4.7

94b6e82

dbt version 0.4.7

0. tl;dr

- --version command
- pre- and post- run hooks
- windows support
- event tracking

1. --version #135

The --version command was added to aid debugging. Further, organizations can use it to ensure that everyone in their org is up to date with dbt.

$ dbt --version
installed version: 0.4.7
   latest version: 0.4.7
Up to date!
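A hypothetical Python sketch of the comparison behind that output, assuming plain dotted-integer version strings. How dbt discovers the latest release is not described here, so the sketch takes it as an argument, and the out-of-date message is placeholder wording (the notes only show the up-to-date case).

```python
def parse_version(v: str) -> tuple:
    # "0.4.7" -> (0, 4, 7); assumes plain dotted integers (no "rc"/"b" suffixes)
    return tuple(int(part) for part in v.split("."))


def version_message(installed: str, latest: str) -> str:
    lines = [f"installed version: {installed}", f"   latest version: {latest}"]
    if parse_version(installed) >= parse_version(latest):
        lines.append("Up to date!")
    else:
        lines.append("Out of date!")  # placeholder wording, not dbt's actual message
    return "\n".join(lines)


print(version_message("0.4.7", "0.4.7"))
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: as strings, "0.10.0" sorts before "0.9.0".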

2. pre-and-post-hooks #147

With this release, you can now specify pre- and post- hooks that are run before and after a model is run, respectively. Hooks are useful for running grant statements, inserting a log of runs into an audit table, and more! Here's an example of a grant statement implemented using a post-hook:

models:
  my_project:
    post-hook: "grant select on table {{this}} to looker_user"
    my_model:
      materialized: view
    some_model:
      materialized: table
      post-hook: "insert into my_audit_table (model_name, run_at) values ('{{this.name}}', getdate())"

Hooks are recursively appended, so the my_model model will only receive the grant select... hook, whereas the some_model model will receive both the grant select... and insert into... hooks.
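Hook inheritance can be sketched as a walk down the models: config tree from the root to one model. In this hypothetical Python sketch (resolve_config is not a dbt function), a plain dict stands in for the parsed dbt_project.yml, ordinary keys such as materialized are overridden by deeper levels, and pre-hook/post-hook values are appended. The same walk also picks up hooks declared directly under models:, as with the root-level configs introduced in 0.5.1. Outer hooks come first here; the notes do not state the execution order, so treat that as an assumption.

```python
HOOK_KEYS = {"pre-hook", "post-hook"}


def resolve_config(models_tree: dict, path: list) -> dict:
    """Merge config from the models: root down to the model named by `path`."""
    resolved = {"pre-hook": [], "post-hook": []}
    levels = [models_tree]
    for name in path:
        levels.append(levels[-1].get(name, {}))
    for level in levels:
        for key, value in level.items():
            if isinstance(value, dict):
                continue  # a nested project/model section, not a setting
            if key in HOOK_KEYS:
                # Hooks accumulate: a string or a list both add to the inherited hooks.
                resolved[key].extend(value if isinstance(value, list) else [value])
            else:
                resolved[key] = value  # deeper levels override ordinary settings
    return resolved


models = {
    "my_project": {
        "post-hook": "grant select on table {{this}} to looker_user",
        "my_model": {"materialized": "view"},
        "some_model": {
            "materialized": "table",
            "post-hook": "insert into my_audit_table (model_name, run_at) values ('{{this.name}}', getdate())",
        },
    }
}

print(resolve_config(models, ["my_project", "my_model"]))
print(resolve_config(models, ["my_project", "some_model"]))
```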

Finally, note that the grant statement uses the (hopefully familiar) {{this}} syntax, whereas the insert statement uses the {{this.name}} syntax. When dbt creates a model:

1) A temp table is created
2) The original model is dropped
3) The temp table is renamed to the final model name

dbt intelligently uses the right table/view name when you invoke {{this}}, but you have a few more specific options available if you need them:

{{this}}: "schema"."table__dbt_tmp"
{{this.schema}}: "schema"
{{this.table}}: "table__dbt_tmp"
{{this.name}}: "table"

3. Event tracking #89

We want to build the best version of dbt possible, and a crucial part of that is understanding how users work with dbt. To this end, we've added some really simple event tracking to dbt (using Snowplow). We do not track credentials, model contents, or model names (we consider these private, and frankly none of our business). This release includes basic event tracking that reports 1) when dbt is invoked, 2) when models are run, and 3) basic platform information (OS + Python version). The schemas for these events can be seen here.

You can opt out of event tracking at any time by adding the following to the top of your ~/.dbt/profiles.yml file:

config:
    send_anonymous_usage_stats: False

4. Windows support #154
