# DBT Workflow Lab

### Introduction

In this lesson, we'll practice working through the DBT workflow, using DBT with both our database and our GitHub repository.

We'll create a new view in our database, and then update our main branch on GitHub to reflect the changes in our DBT codebase.  Let's get started.

### Working with our Postgres Data

Let's start by connecting to our `northwinds` database in postgres.

From there, run a query in the query panel to select the first five shippers from the `shippers` table.

```bash
 shipper_id |   company_name    |     phone
------------+-------------------+----------------
          1 | Speedy Express    | (503) 555-9831
          2 | United Package    | (503) 555-3199
          3 | Federal Shipping  | (503) 555-9931
          4 | Alliance Shippers | 1-800-222-0451
          5 | UPS               | 1-800-782-7892
```

### Setting up DBT

Ok, now it's time to query our postgres data from DBT.  Remember that in our ELT process, we use DBT to perform just the `T` component.  The data is generally already loaded into our data warehouse (here postgres), and from there we'll select and clean our loaded data.

But to make any changes, we first need to create a new branch, as we cannot make changes directly on the main branch.  So begin by creating a new branch called `build_stg_shippers_model` and switching to it.

> Running `git branch -a` should show the `*` next to the `build_stg_shippers_model` branch.
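
As a minimal sketch of this step, the commands below create the branch and confirm that it is checked out. The first few lines build a throwaway repository (an assumption made only so the sketch runs anywhere); in the lab, you would run just the last two commands from inside your dbt project.

```bash
# Throwaway repository so the sketch runs anywhere (not part of the lab).
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the starting branch main
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial commit"

# Create the feature branch and switch to it in one step.
git checkout -q -b build_stg_shippers_model

# The asterisk marks the branch that is currently checked out.
git branch -a
```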

And then in the `models/staging` folder add a new file: `stg_shippers.sql`.

### Querying from DBT 

Ok, so now that we created a new branch and updated our file structure, it's time to query our database from dbt.  

So write a select statement to just select the `shipper_id` and `company_name` columns from the `shippers` table. 

Then **only compile** the SQL.

> If you go to the `target/compiled` folder, you should see a new file `stg_shippers.sql`.

Ok, now let's try to create a new view called `stg_shippers`.  If it works, you should see something like the following in the terminal.

```bash
01:28:04  1 of 1 START sql view model dev.stg_shippers ................................... [RUN]
01:28:04  1 of 1 OK created sql view model dev.stg_shippers .............................. [CREATE VIEW in 0.07s]
```

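DBT creates the view in two steps: it first writes the generated SQL to a new file in the `target/run` folder, and then runs that file against the database.  So take a look at the file in the `target/run` folder.  The `__dbt_tmp` suffix you'll see there is expected: DBT builds the view under a temporary name and then renames it to `stg_shippers`.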

```sql
create view "northwinds"."dev"."stg_shippers__dbt_tmp"
  as (
    select shipper_id, company_name from shippers
  );
```

Then connect to postgres and select from the `stg_shippers` view in the `dev` schema to confirm that it has been created.

```bash
 shipper_id |   company_name
------------+------------------
          1 | Speedy Express
          2 | United Package
          3 | Federal Shipping
(3 rows)
```

Ok, this looks good.  Now that the changes have been made to our database, we need to update our main branch.

### Updating the main branch

Ok, so let's first see the files we have changed so far.

```bash
git status

    models/staging/stg_shippers.sql
```

> Notice the files in the `target` directory are not added -- this is because of the .gitignore in our dbt project folder.
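
One way to confirm this is `git check-ignore`, which reports the `.gitignore` rule that matches a path. The sketch below uses a throwaway repository with a `target/` entry, which is what a freshly initialized dbt project typically ships with (an assumption here, so check your own `.gitignore`). The file contents are stand-ins.

```bash
# Throwaway repository that mimics the relevant parts of a dbt project.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
printf 'target/\n' > .gitignore
mkdir -p models/staging target/run
echo "select shipper_id, company_name from shippers" > models/staging/stg_shippers.sql
echo "-- stand-in for dbt's generated SQL" > target/run/stg_shippers.sql

# Print the ignore rule that matches the generated file.
git check-ignore -v target/run/stg_shippers.sql

# List untracked files git would offer to add; nothing under target/ appears.
git status --short --untracked-files=all
```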

Ok, let's now update the main branch on GitHub with this file.  To do so, we'll need to:

* make a commit on the current branch
* push that branch to GitHub
* create the pull request
* merge the pull request to main
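
The first two steps can be sketched on the command line as below. A local bare repository stands in for GitHub so the sketch is runnable anywhere (an assumption for illustration; in the lab, `origin` already points at your GitHub repository), and the commit message is only an example. The pull request itself is opened and merged on GitHub.

```bash
# Stand-in setup: a local bare repository plays the role of GitHub.
workdir=$(mktemp -d)
git init -q --bare "$workdir/origin.git"
git init -q "$workdir/project"
cd "$workdir/project"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$workdir/origin.git"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"
git checkout -q -b build_stg_shippers_model
mkdir -p models/staging
echo "select shipper_id, company_name from shippers" > models/staging/stg_shippers.sql

# The lab steps: stage the new model, commit it, and push the feature branch.
git add models/staging/stg_shippers.sql
git commit -q -m "Add stg_shippers staging model"
git push -q -u origin build_stg_shippers_model
```

The `-u` flag links the local branch to the pushed one, so later `git push` calls on this branch need no extra arguments.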

You'll know that the main branch was properly updated if you go to your repository's main branch and see the new `stg_shippers.sql` file.

### Summary

Let's make sure that we have a good understanding of the DBT workflow.  We start off with a good coding workflow in general:

* Check out a new feature branch from main
* Add some code to that branch (e.g. a new .sql file)

Then we can create a new view:

* Create a new view through DBT with the `dbt run --models` command (spelled `--select` in recent DBT versions)
* Go to postgres (or whichever database is being used) to verify the changes were properly made

Then update the main branch:

* Make a commit
* Push the branch to GitHub
* Open the pull request on GitHub
* Merge the pull request to main

And then we go to the main branch to verify that our codebase on main has been updated.

### Resources

[RDS Redshift Lab](https://github.com/jigsawlabs-student/rds-to-redshift-lab/blob/main/notes/0-rds-to-redshift-solution.ipynb)

[Data Warehouse Spectrum -> DBT Towards DS](https://towardsdatascience.com/a-data-warehouse-implementation-on-aws-a96d0e251abd)

[Stitch -> DBT](https://www.startdataengineering.com/post/build-a-simple-data-engineering-platform/)