# DBT Tests

### Introduction

Another benefit of DBT is that it allows us to test our data.  For example, we can use tests to ensure that no value in a specific column is null, or that all values in a column are unique.  In this lesson, we'll see how DBT allows us to write tests to ensure data quality.

### Schema Testing Options

DBT provides a few tests for us out of the box.  The built-in tests are the following:

* `unique`: Assert that all values in a column are unique

* `not_null`: Assert that no values in a column are null

* `accepted_values`: Assert that all values in a column are one of the accepted values

* `relationships`: Assert that every foreign key value maps to a primary key value in another model
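
To make these four options concrete, here is a rough sketch of how each one might be declared (the file layout itself is walked through in the next section).  Only the model name `stg_hubspot_contacts` comes from this lesson.  The column names `contact_id`, `lifecycle_stage`, and `company_id`, the accepted values, and the `stg_hubspot_companies` model are all hypothetical, chosen just for illustration.

```yaml
version: 2

models:
  - name: stg_hubspot_contacts
    columns:
      # hypothetical column names, for illustration only
      - name: contact_id
        tests:
          - unique      # no two rows share a contact_id
          - not_null    # every row has a contact_id
      - name: lifecycle_stage
        tests:
          - accepted_values:
              values: ['lead', 'customer']   # assumed set of allowed values
      - name: company_id
        tests:
          - relationships:
              to: ref('stg_hubspot_companies')   # hypothetical parent model
              field: company_id                  # hypothetical key column in that model
```

Notice that `unique` and `not_null` take no arguments, while `accepted_values` and `relationships` each need a little extra configuration nested beneath them.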

### Adding a test

Ok, so now that we see what we can do with DBT, let's begin to add our first test.  We'll start by adding tests for some of the models in our `staging` folder.

Under our `hubspot` folder, we can add a new file called `schema.yaml` and add the following to that file.

```yaml
version: 2

models: 
  - name: stg_hubspot_contacts
    columns:
      - name: first_name
        tests:
          - not_null
      - name: last_name
        tests:
          - not_null
```

> This is in addition to our `sources.yaml` file.

The yaml above asserts that in the `stg_hubspot_contacts` model, the `first_name` and `last_name` columns do not have any null values.  Now let's move through the details.

At the top, the `version: 2` indicates the version of DBT's yaml file format that we are using.  Then, because we can define tests for multiple models within the same file, we added the `models` key, and under it listed our model by `name`.  Next, under that model's `columns` key, we specified the name of each column we want to test.  And finally, under each column, we added a `tests` key and listed each test.

> So with yaml, we indicate that one entity is nested under another by indenting it two spaces.  And, as for those dashes, we need a dash before each element of a list.
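
To see that nesting at a glance, here is the same file again with a comment marking each level.  Nothing new is being configured here; the comments are only there as a reading aid.

```yaml
version: 2

models:                             # a list of models (one dash per model)
  - name: stg_hubspot_contacts      # the first (and only) model in the list
    columns:                        # a list nested under that model
      - name: first_name            # first column in the list
        tests:                      # a list nested under that column
          - not_null                # first (and only) test on first_name
      - name: last_name             # second column, same level as first_name
        tests:
          - not_null
```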

### Running the tests

Now it's time to run our tests against the data.  

In our DBT command line at the bottom, we can run tests by typing in `dbt test` and then pressing return. 

> Or, if we only want to run the tests for a specific model, we can do so with `dbt test --models stg_hubspot_contacts`.  Newer versions of DBT prefer `--select` in place of `--models`.

Upon doing so, we should see something like the following:

```bash
16:38:18  1 of 2 START test not_null_stg_hubspot_contacts_first_name ..................... [RUN]
16:38:18  1 of 2 PASS not_null_stg_hubspot_contacts_first_name ........................... [PASS in 0.03s]
16:38:18  2 of 2 START test not_null_stg_hubspot_contacts_last_name ...................... [RUN]
16:38:18  2 of 2 PASS not_null_stg_hubspot_contacts_last_name ............................ [PASS in 0.01s]
16:38:18
16:38:18  Finished running 2 tests in 0 hours 0 minutes and 0.29 seconds (0.29s).
16:38:18
16:38:18  Completed successfully
16:38:18
16:38:18  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
```

### Viewing the compiled tests

In the output above, we can see `logs` of our test run.  However, we can also see the SQL that executed these tests if we look in our `target/compiled` folder.

Go to the `target/compiled/northwinds_dbt/models/staging/hubspot/schema.yaml` folder, and there we'll see the SQL behind our two tests, with one file per test.

`not_null_stg_hubspot_contacts_first_name.sql`

```sql
select first_name
from "northwinds"."dev"."stg_hubspot_contacts"
where first_name is null
```

So to check that every contact has a first name, DBT runs a query that selects any rows where `first_name` is null.  If the query returns any rows, the test fails.

> So just think of the test as looking for any defects in our data.  When a defect is spotted, a flag is raised.

### Debugging Tests

Now let's move through the procedure we can perform when a test does fail.  Let's say that we want to assert the following:

* All first names must be either `fran` or `daphne`

We can add that test by updating our yaml to be the following:

```yaml
version: 2

models: 
  - name: stg_hubspot_contacts
    columns:
      - name: first_name
        tests:
          - not_null
          - accepted_values:
              values: ['fran', 'daphne']
      - name: last_name
        tests:
          - not_null
```

Ok, now let's run the tests.

`dbt test`

```bash
16:59:25  Completed with 1 error and 0 warnings:
16:59:25
16:59:25  Failure in test accepted_values_stg_hubspot_contacts_first_name__fran__daphne (models/staging/hubspot/schema.yaml)
16:59:25    Got 477 results, configured to fail if != 0
```

Ok, so this time we can see that something is broken: the test found 477 results that are not one of those accepted values.

It would be nice if we could take a closer look at our defective data, to see what is going on.

Well, notice that in our console it says:
```bash
compiled Code at target/compiled/northwinds_dbt/models/staging/hubspot/schema.yaml/accepted_values_stg_hubspot_contacts_first_name__fran__daphne.sql
```

If we go to that file, we'll see the following:

```sql
with all_values as (

    select
        first_name as value_field,
        count(*) as n_records

    from "northwinds"."dev"."stg_hubspot_contacts"
    group by first_name

)

select *
from all_values
where value_field not in (
    'fran','daphne'
)
```

To see the invalid data, we can copy that query into postgres, or we can just run the file against our database like so:

```bash
psql -d northwinds -f target/compiled/northwinds_dbt/models/staging/hubspot/schema.yaml/accepted_values_stg_hubspot_contacts_first_name__fran__daphne.sql
```

```bash
value_field | n_records
-------------+-----------
 Torrie      |         1
 Derk        |         1
 Madge       |         1
 Rozele      |         1
```

### Finishing up

Ok, so let's change back to our original tests.

```yaml
version: 2

models: 
  - name: stg_hubspot_contacts
    columns:
      - name: first_name
        tests:
          - not_null
      - name: last_name
        tests:
          - not_null
```

And from there run `dbt test` to confirm it works.  Then in git, add a new commit.  Then merge the branch to main and push the changes to GitHub.

### Resources

[DBT Test Documentation](https://docs.getdbt.com/reference/resource-properties/tests)