# The DBT Workflow

### Introduction

Now that we have connected our DBT account to our data warehouse and our github repository, it's time to see how we can use DBT with our data warehouse.  Remember that there are a few main benefits of using DBT:


1. It provides an opinionated workflow for writing our queries
2. It provides an opinionated *file structure* for organizing our SQL queries
3. It allows us to quickly turn SQL SELECT statements into new SQL tables populated with that selected data


In this lesson, we'll focus on two of these benefits: we'll see how DBT can allow us to quickly create and populate new tables, and we'll see how it enforces a proper development workflow.

### Moving to DBT

Now as we know, we can always directly query our Northwinds database where it's currently set up, in postgres, with something like:

```sql
select * from customers limit 1;
```

Ok, now let's try to run this same query from DBT.  First, remember that we should have already connected DBT to our postgres database.  

You can confirm your connection by navigating to your local dbt repository, and then typing in the following:

```bash
dbt debug
```

There, under `connection` you should see something like the following: 

```bash
22:54:19  Connection:
22:54:19    host: localhost
22:54:19    port: 5432
22:54:19    user: jeffreykatz
22:54:19    database: northwinds
```

And remember that this connection is set up from the `dbt_project.yml` file, through the value for `profile`.

```yaml
profile: 'northwinds_dbt'
```

That profile name points to an entry in the `~/.dbt/profiles.yml` file.  If you run `cat ~/.dbt/profiles.yml` from your command line, you should see the database information.

```yaml
northwinds_dbt:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: jeffreykatz # add your username
      password: "" # add your password, or use "" if none
      port: 5432
      dbname: northwinds
      schema: dev
      threads: 1
      connect_timeout: 30
```
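To make the link between the two files concrete, here is a minimal sketch of the lookup: it reads the `profile` value from a project file and checks that a matching top-level entry exists in a profiles file.  It works on scratch copies in a temporary directory rather than your real files, the variable names are only illustrative, and the `sed`/`grep` matching is a simplification, not how DBT itself parses YAML.

```bash
# Scratch copies of the two files (assumption: trimmed to the relevant lines)
workdir=$(mktemp -d)
cat > "$workdir/dbt_project.yml" <<'EOF'
profile: 'northwinds_dbt'
EOF
cat > "$workdir/profiles.yml" <<'EOF'
northwinds_dbt:
  target: dev
  outputs:
    dev:
      type: postgres
EOF

# Pull the profile name out of the project file
profile_name=$(sed -n "s/^profile: *'\(.*\)'/\1/p" "$workdir/dbt_project.yml")

# Look for a top-level key with the same name in the profiles file
if grep -q "^${profile_name}:" "$workdir/profiles.yml"; then
  match_status="found"
else
  match_status="missing"
fi

echo "profile ${profile_name}: ${match_status}"
```

To try this against your own setup, point the two paths at `northwinds_dbt/dbt_project.yml` and `~/.dbt/profiles.yml`.  If the names do not match, `dbt debug` is the authoritative check.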

### A good workflow

Ok, so the main purpose of DBT is to bring the best practices of coding to analytics engineering.  And that means a good git workflow.  So currently, your `codebase` folder should look something like the following:

```bash
logs requirements.txt
northwinds_dbt venv
```

The northwinds_dbt folder is where our dbt repository is located.  And `venv` contains our required pip packages.  We should also create a `.gitignore` file so that our venv folder is not pushed up to github.

`.gitignore`

```
venv/
```

And we should also create a git repository inside of the `codebase` directory.

```bash
git init
```

Now if you type `ls`, inside of the `codebase` directory you should see something like the following:

```bash
.git logs requirements.txt
.gitignore northwinds_dbt venv
```
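Before committing, it can be worth confirming that the `.gitignore` rule behaves as intended.  The sketch below does this with `git check-ignore` in a throwaway repository; the two file names inside the folders are placeholders chosen for illustration, and in practice you would run the `git check-ignore` lines directly inside `codebase`.

```bash
# Throwaway repository so the check does not touch your real codebase
workdir=$(mktemp -d)
cd "$workdir" || exit 1
git init -q

# Same rule as in the lesson's .gitignore
printf 'venv/\n' > .gitignore

# Placeholder files standing in for the real folder contents
mkdir -p venv northwinds_dbt
touch venv/pyvenv.cfg northwinds_dbt/dbt_project.yml

# git check-ignore exits with 0 when the path is ignored
if git check-ignore -q venv/pyvenv.cfg; then
  venv_ignored="yes"
else
  venv_ignored="no"
fi

if git check-ignore -q northwinds_dbt/dbt_project.yml; then
  project_ignored="yes"
else
  project_ignored="no"
fi

echo "venv ignored: $venv_ignored"
echo "northwinds_dbt ignored: $project_ignored"
```

We want the files under `venv` to be ignored and the files under `northwinds_dbt` to be tracked.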

Next, we can create a github repository.  So go ahead, create a repo -- mine is called `northwinds_dbt_core`.

<img src="./north-dbt.png" width="60%">

And then make a git commit, and push up to github.

`git status`

```bash
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.gitignore
	logs/
	northwinds_dbt/
	requirements.txt
```

```bash
git add -A
git commit -m 'add initial setup'
```

Then add the remote origin:

`git remote add origin your_github_repo`

And finally push up to github.

```bash
git push origin main
```

### Making a change

Ok, so now that we pushed the initial repository, let's add some code to the codebase.  

First create a new branch in git.

`git checkout -b build_customers_dim_model`

Ok, so now it's time to add our query to DBT.

Create a `models/staging` folder, and then under that folder create a file called `stg_customers.sql`.  Once this new file is created, we can write the following query.

`select contact_name, address, phone from customers`

> Notice that there is no semi-colon at the end of our SQL statement. DBT will add that for us.  

Then run the following.

```bash
dbt compile
```

When you run `dbt compile`, DBT will not actually touch your database.  Instead, all it does is generate a file.  Go to the `northwinds_dbt` directory and navigate to `target/compiled`, where you'll find `models/staging/stg_customers.sql` (DBT nests it under a folder named after the project).

There you'll see something like the following:

```sql
select contact_name, address, phone from customers
```

Now if you look at the `target/run` folder, you will see that there is not yet a `stg_customers.sql` file.  So now run:

```
dbt run
```

This will do a couple of things:
    
1. If there's any update, it recreates the file in the `target/compiled` folder
2. It adds a new `stg_customers.sql` file to the `target/run` folder
3. It runs the file in the `target/run` folder against the database

Ok, so we already looked at the file in the `target/compiled` folder, next let's look at the file in the `target/run` folder.

```sql
create view "northwinds"."dev"."stg_customers__dbt_tmp"
  as (
    select contact_name, address, phone from customers
  );
```

By default, DBT creates a view named after the file (above, `stg_customers`) and defines that view with the select statement from the file we wrote.  The `__dbt_tmp` suffix is temporary: DBT builds the view under that name and then renames it to `stg_customers`.  So we wrote:

`select contact_name, address, phone from customers` and DBT used it to generate the code for the view above.

Also, notice that in the terminal, it says that a new view was created: `dev.stg_customers`.

```bash
00:04:57  1 of 1 START sql view model dev.stg_customers .................................. [RUN]
00:04:57  1 of 1 OK created sql view model dev.stg_customers ............................. [CREATE VIEW in 0.07s]
```
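The naming rule described above, where the model's file name becomes the view name, can be sketched in a few lines of shell.  This is only an illustration of the idea and not DBT's actual implementation: it works on a scratch copy of the model file, and the `database` and `schema` variables are assumed to mirror the `dbname` and `schema` values from `profiles.yml`.

```bash
# Scratch copy of the model file from the lesson
workdir=$(mktemp -d)
mkdir -p "$workdir/models/staging"
printf '%s\n' 'select contact_name, address, phone from customers' \
  > "$workdir/models/staging/stg_customers.sql"

model_path="$workdir/models/staging/stg_customers.sql"

# The view name is the file name without its .sql extension
model_name=$(basename "$model_path" .sql)
select_sql=$(cat "$model_path")

# Assumed to match dbname and schema in profiles.yml
database="northwinds"
schema="dev"

# Wrap the select statement in a create view statement
view_sql=$(printf 'create view "%s"."%s"."%s"\n  as (\n    %s\n  );' \
  "$database" "$schema" "$model_name" "$select_sql")

echo "$view_sql"
```

Because the wrapping statement supplies its own closing `);`, a semicolon at the end of the model file would land inside the parentheses, which is why we leave it out.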

### Returning to Postgres

So at this point we've used DBT to quickly create a new view.  Let's also confirm that it has in fact changed our database by returning to postgres.  Connect to your `northwinds` database.

```bash
psql northwinds
```

And then run the following:

```sql
\dv dev.*
```

```bash
List of relations
 Schema |      Name      | Type |    Owner
--------+----------------+------+-------------
 dev    | stg_categories | view | jeffreykatz
 dev    | stg_customers  | view | jeffreykatz
```

You should see `stg_customers` listed.  And then if you select from it, you'll see the view populated with the three columns we specified:

```sql
select * from dev.stg_customers limit 3;
```

```bash
  contact_name  |            address            |    phone
----------------+-------------------------------+--------------
 Maria Anders   | Obere Str. 57                 | 030-0074321
 Ana Trujillo   | Avda. de la Constitución 2222 | (5) 555-4729
 Antonio Moreno | Mataderos  2312               | (5) 555-3932
(3 rows)
```

### Updating the codebase

Alright, so now that we've confirmed that our postgres database was updated properly with our DBT code, the next step is to commit our changes to the codebase, and merge our changes to the `main` branch.

First confirm that you are on the feature branch (not main).

```bash
git branch -a
```

```bash
* build_customers_dim_model
  main
  remotes/origin/main
```

> The `*` indicates your current branch.  Remember that you can run `git checkout -b name_of_branch` to create and switch to a new branch.
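If you would rather have the shell do this check for you, a small guard like the one below is one option.  It is a sketch run in a throwaway repository, with `safe_to_commit` as a made-up variable name; in your own `codebase` directory only the `git symbolic-ref` line and the `if` block would be needed.

```bash
# Throwaway repository standing in for the codebase directory
workdir=$(mktemp -d)
cd "$workdir" || exit 1
git init -q
git checkout -q -b build_customers_dim_model

# Name of the branch that HEAD currently points to
current_branch=$(git symbolic-ref --short HEAD)

if [ "$current_branch" = "main" ]; then
  echo "On main: create a feature branch before committing"
  safe_to_commit="no"
else
  echo "On feature branch: $current_branch"
  safe_to_commit="yes"
fi
```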

Then add a new commit and push that branch up to github.

```bash
git add -A
git commit -m 'add stg_customers model'
git push origin build_customers_dim_model 
```

If you go to your github repository, you can toggle to your new branch.

<img src="./github-branch.png" width="60%">

Then, click on `compare and pull request`.

> <img src="./compare-pull.png">

Then click `Create pull request`, and add a message describing the change.

<img src="./create-pull.png" width="60%">

Finally, click on `Merge pull request` to merge our updated code with the main branch.

<img src="./merge-request.png" width="60%">

And then click on `Confirm merge`.

If we go to the main branch, we will find our `models/staging/stg_customers.sql` file with the query we wrote for DBT.

<img src="./updated-final.png">

And our work is complete :)

### Summary

In this lesson, we saw the workflow of DBT.  DBT both connects to our database and runs SQL commands against it.  
We write our query in the models folder, and then can run the model against the database by running:

```
dbt run
```
which runs every model in the project, or `dbt run --models models/staging/stg_customers.sql` to run just that specific file.  The `dbt run` command performs the following:

1. Creates a file in the `target/compiled` folder
2. Creates a file in `target/run` that wraps our code in a `create view` statement
3. Runs the file in the `target/run` folder against the database, thus creating a new view

And we confirmed the change was made in postgres, by connecting to our db, and looking at the view.

At that point, we still had not added our changes to the `main` branch of our codebase.  So to do this, we used git to create a new commit on our feature branch and push it, and then went to github to create a pull request and merge those changes into `main`.