# Sources in DBT

### Introduction

So far, we have seen how we can use DBT to transform data that already lives in our data warehouse, by writing queries that reshape that data.

But DBT also gives us a mechanism for keeping track of that original source data, and we do this with sources.

In this lesson, we'll see how we can set up sources in DBT, and also see some of the benefits of doing so. 

### Naming Dependencies with sources

So far, our DBT models folder looks something like the following:

```md
models
└── staging
    ├── stg_shippers.sql
    ├── stg_rds_customers.sql
    └── stg_rds_suppliers.sql
```

So we have written code in our staging models that queries our shippers, customers, and suppliers tables. It would be nice if we had a way to describe that source data from within DBT.

That's what sources give us: a way to describe and name our loaded raw data from inside DBT.

> **Before getting started**, confirm you are on the main branch.

* `git branch -a`

> Then create a new branch called `adding_sources`.

* `git checkout -b adding_sources`



### Creating Sources

We can create sources by adding a `sources.yaml` file inside the `models/staging` folder. In that file, we can write something like the following:

```yaml
# sources.yaml
version: 2
sources:
  - name: rds
    database: northwinds
    schema: public
    tables:
      - name: customers
```

> **What's rds?** The name `rds` comes from Amazon RDS (Relational Database Service), AWS's managed service for transactional databases ([more info here](https://aws.amazon.com/rds/)). In a production stack, we would not pull our data from a local Postgres instance, but rather from a database hosted on RDS.
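The file above only declares the `customers` table, but our staging folder also queries suppliers and shippers. A fuller `sources.yaml` might list every raw table we depend on, and can use dbt's `description` property to document them. This is a sketch that assumes the raw tables are named `suppliers` and `shippers` and live in the same `northwinds.public` schema as `customers`; the description strings are placeholder wording.

```yaml
# models/staging/sources.yaml
version: 2
sources:
  - name: rds
    description: Raw tables loaded from our northwinds transactional database  # placeholder wording
    database: northwinds
    schema: public
    tables:
      - name: customers
        description: Raw customer records  # placeholder wording
      - name: suppliers  # assumed raw table name
      - name: shippers   # assumed raw table name
```

Each table listed here can then be referenced from a model with the same pattern, for example `{{ source('rds', 'suppliers') }}`.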

### Referencing our sources

Once we do that, we should reference our source tables through the source rather than by their names in the data warehouse. For example, we can change the beginning of our customers staging model from:

```sql
with source as (
    select contact_name, address, phone from public.customers
),
```

To the following:

```sql
with source as (
    select contact_name, address, phone from {{ source('rds', 'customers') }}
),
```

Here's how this works.

We just used DBT's `source` function to reference the source name (`rds`) and table name (`customers`) that we declared in the `sources.yaml` file above. When dbt compiles the model, it uses the information in that yaml file to fill in the database, schema, and table name, so the query ends up selecting from `northwinds.public.customers`.

But what are those `{{ }}`? They are delimiters indicating that, inside the braces, we are no longer writing SQL but something called Jinja. Jinja is a templating language written in Python, and its expressions look a lot like Python.

### Why this is valuable

So we just saw how to replace a hard-coded table name with a reference to a source. Why would we want to do this?

This protects us from naming changes in our external database. For example, if our schema name changes from `public` to `team`, we can simply update our yaml file like so:

```yaml
version: 2

sources:
  - name: rds
    database: northwinds
    schema: team # not public
    tables:
      - name: customers
```
and every reference to our source outside of that yaml file stays exactly the same:

```sql
{{ source('rds', 'customers') }} 
```

So sources are valuable because they isolate our external dependencies (the database name, schema, or table name) in a single source yaml file.
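Table renames can be absorbed the same way. A dbt source table accepts an `identifier` property that sets the actual table name in the warehouse, while `name` stays the label our models pass to `source()`. In this sketch, `customer_accounts` is a hypothetical new name for the raw customers table:

```yaml
version: 2
sources:
  - name: rds
    database: northwinds
    schema: team
    tables:
      - name: customers                # the name our models keep using in source()
        identifier: customer_accounts  # hypothetical new table name in the warehouse
```

With this in place, `{{ source('rds', 'customers') }}` resolves to the `customer_accounts` table, and none of our staging models need to change.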

### Summary

In this lesson, we learned how to write sources. We do so by creating a `yaml` file within our models folder. Then, instead of referencing the raw table name, we use the `source` function to reference our table, like so:

```sql
{{ source('rds', 'customers') }}
```

Because `source` is a function that DBT provides, and not SQL, we need to surround it with the `{{ }}` delimiters to indicate that we are writing Jinja, a templating language whose syntax resembles Python.