# Source Freshness

### Introduction

In the last lesson, we saw how we can use sources so that if the name or schema of our source table changes, we will only need to change the source yaml file, and not the rest of our codebase.  

In this lesson, we'll see how sources will allow us to enforce the freshness of our data, so that we are not querying stale data.

Let's get started.

### Asserting Freshness

**Warning!!** You likely **will not** be able to code along with this reading if your source tables do not have a `_fivetran_synced` column (Fivetran adds this column when it loads data).  Still, read the lesson and ensure that you understand why we would use this feature, and how it works.

When performing analysis, it's common to want to ensure that our data is up to date.  We can assess the freshness of our data directly in our sources.  We can do so by updating our yaml file to something like the following:

```yaml
version: 2
sources:
  - name: rds
    database: fivetran_database
    freshness: 
      warn_after: {count: 12, period: hour}
    loaded_at_field: _fivetran_synced
    schema: postgres_northwinds_rds_public
    tables:
      - name: customers
```

In the yaml above, we added a new field called `freshness`, and asked dbt to warn us if the most recently loaded data is more than 12 hours old.

Notice that we also added a `loaded_at_field` pointing to `_fivetran_synced`.  This is required for the freshness check.  Here, we're specifying the timestamp column that dbt will look at to determine how long ago the data was last loaded.
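The `freshness` block can hold two thresholds: `warn_after` and `error_after`.  As a minimal sketch, the following sets both on the same source.  The 24 hour `error_after` value is an assumption chosen for illustration and is not taken from this lesson's project.  The `period` can be `minute`, `hour`, or `day`.

```yaml
version: 2
sources:
  - name: rds
    database: fivetran_database
    freshness:
      # Warn once the newest row is more than 12 hours old
      warn_after: {count: 12, period: hour}
      # Illustrative value (an assumption): fail once it is more than 24 hours old
      error_after: {count: 24, period: hour}
    # Timestamp column dbt reads to find the most recent load
    loaded_at_field: _fivetran_synced
    schema: postgres_northwinds_rds_public
    tables:
      - name: customers
```

With a configuration like this, data between 12 and 24 hours old would produce a warning, and anything older would produce an error.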

Once we have updated our yaml file, we can check for freshness by running the following in our dbt console:

`dbt source snapshot-freshness`

Note that in dbt v0.21 and later, this command was renamed to `dbt source freshness`.

> <img src="./dbt-freshness.png" width="100%">

> Notice the `1` in yellow indicates a warning.

So when we ran `dbt source snapshot-freshness`, the check passed, but with one warning.

If we then look at the logs for the customers table and open the details, we can see the query that dbt performed:

> <img src="./cust-details.png" width="100%">

So we can see that dbt looks at the most recent time in the `_fivetran_synced` column, which we specified above as the `loaded_at_field`.  If that most recent time is longer ago than the threshold we set, dbt raises a warning, which it did here.

### Table level freshness

Above, we placed the same freshness rules across our entire `rds` source, that is, across every table listed under that source.  But we can override this general configuration for an individual table with something like the following:

```yaml
version: 2
sources:
  - name: rds
    database: fivetran_database
    freshness: 
      warn_after: {count: 12, period: hour}
    loaded_at_field: _fivetran_synced
    schema: postgres_northwinds_rds_public
    tables:
      - name: customers
        freshness:
          error_after: {count: 6, period: hour}
```

So now, we can think of the `warn after 12 hours` as the default behavior, while we override that behavior on the customers table to error after six hours.

If we run `dbt source snapshot-freshness`, this time we see that customers failed.

> <img src="./failed-freshness.png" width="80%">

Let's remove the freshness requirement on customers so we no longer get that failure.

```yaml
version: 2
sources:
  - name: rds
    database: fivetran_database
    freshness: 
      warn_after: {count: 12, period: hour}
    loaded_at_field: _fivetran_synced
    schema: postgres_northwinds_rds_public
    tables:
      - name: customers
```

### Summary

In this lesson, we learned how to check for freshness through our dbt sources.  We did so by updating our yaml file to have a `freshness` field.

```yaml
version: 2
sources:
  - name: rds
    database: fivetran_database
    freshness: 
      warn_after: {count: 12, period: hour}
    loaded_at_field: _fivetran_synced
```

The `loaded_at_field` is essential here, as this is the column that dbt looks at when checking for freshness.  And to run this check, we run the command `dbt source snapshot-freshness` in our console.

### Resources

[DBT FTW](https://www.justinwagg.com/dbt/)