Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt should have testing functionality to ensure common schema rules are valid #38

Closed
jthandy opened this issue May 14, 2016 · 2 comments
Assignees

Comments

@jthandy
Copy link
Member

jthandy commented May 14, 2016

Rules like uniqueness and referential integrity are incredibly common when doing analysis on a schema. As a part of the testing process, dbt should be able to be fed a schema configuration that instructs it how to test that a schema is following these rules, and then be able to run these tests automatically.

There are three specific schema constraints that we should test for:

  1. Not null
  2. Uniqueness
  3. Referential integrity

Below I'll provide the most standardized SQL to use to test these constraints. All tests pass when the queries return 0.

not null

Could be declared like table.field is not null.

with t as (

  select [field] as f
  from [table]

)

select count(*) 
from t
where f is null

uniqueness

Could be declared like table.field is unique.

with t as (

  select [field] as f
  from [table]

)

select count(*) from (

  select f
  from t
  group by f
  having count(*) > 1

)

referential integrity

Could be declared like parent table one joins to child table many on one.id = many.one_id.

with one as (

  select [pk field name] as id
  from [table name of parent table]

), many as (

  select [fk field name] as id
  from [table name of child table]

)

select count(*)
from many
where id not in (select id from one)
and id is not null
@jthandy
Copy link
Member Author

jthandy commented May 14, 2016

/cc @drewbanin

Wanted to have thought this through so that when you're ready for it you have a roadmap to implement. I think that generating and running a bunch of tests using these as templates would cover what we need.

I don't feel too strongly about the specific syntax I used for constructing the schema declarations, but I did try to make it "sql-like", which I think is a good thing.

@jthandy
Copy link
Member Author

jthandy commented Jun 22, 2016

This has been deployed to pypi.

@jthandy jthandy closed this as completed Jun 22, 2016
yu-iskw pushed a commit to yu-iskw/dbt that referenced this issue Aug 17, 2021
…te_series

added new generate_series macro
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants