# DBT Workflow Lab

### Introduction

In this lesson, we'll practice working through the DBT workflow. We'll use DBT to connect to both our Snowflake and GitHub accounts, build a new `dim_suppliers` model in our data warehouse, and then update the master branch on GitHub to reflect the changes in our DBT codebase. Let's get started.

### Working with our Snowflake Data

Let's start by logging into Snowflake and connecting to the database where the Northwind data is loaded.

Then run a query against the `suppliers` table directly in Snowflake.
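
As a minimal sketch, a first query in a Snowflake worksheet could look like the following. It assumes the worksheet's database and schema are already set to wherever the `suppliers` table was loaded; if they are not, the table name would need to be fully qualified.

```sql
-- Assumption: the current database and schema contain the loaded suppliers table.
select *
from suppliers
limit 10;
```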

> <img src="./suppliers_snowflake.png" width="90%">

### Setting up DBT

Ok, now it's time to perform our Snowflake queries from DBT. Remember that in our ELT process, we use DBT only for the `T` (transform) step. The data is already loaded into our data warehouse, and from there we'll select and clean it.

> If you are not currently on the master branch, change over to the master branch by clicking on `checkout branch` and then selecting `master`.

> <img src="./checkout-branch.png" width="90%">

You'll know that you are on the master branch if it says `branch: master`.

> <img src="./master-branch.png" width="70%">

Ok, as we know, we cannot make any changes directly on the master branch.  So begin by creating a new branch called `build_dim_suppliers_model`. 

Then create a new folder called `models` with a `dim_suppliers.sql` file in it, and remove the `examples` folder. When completed, your file structure should look something like the following.

<img src="./dim_supp.png" width="40%">

### Querying from DBT 

Ok, now that we have created a new branch and updated our file structure, it's time to query our data warehouse.

Before having DBT make any new tables in our data warehouse, let's take a look at some of the data we have already loaded in.  Begin by querying the `suppliers` table from DBT.

<img src="./suppliers-preview.png" width="100%">

And now let's take a look at the `Compiled SQL` tab to see what DBT executed for us. 

> <img src="./compiled-sql.png" width="80%">

So we can see that DBT added a `limit 500` to our query.
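
To make the difference concrete, here is a rough sketch of what was typed versus what the preview sends to Snowflake. The query text is an assumption based on the step above, and the exact compiled SQL can differ between DBT versions, so treat the `Compiled SQL` tab as the source of truth.

```sql
-- What we typed in the editor (assumed):
--   select * from suppliers

-- Roughly what the preview runs against the warehouse:
select * from suppliers
limit 500
```

The limit is there to keep previews fast. It should only affect the preview, not the query that DBT later uses to build the model.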

Ok, now let's try to create a new table called `dim_suppliers`. To do so, first use DBT to select the `supplier_id` and `company_name` columns, renaming `supplier_id` to `id` and `company_name` to `name`.
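
One way to write the model is sketched below. It assumes the `suppliers` table can be referenced by its bare name from the DBT connection, and that the target column names are `id` and `name`; adjust the table reference if your data lives in a different schema.

```sql
-- models/dim_suppliers.sql
-- Assumption: `suppliers` resolves in the connection's default database and schema.
select
    supplier_id as id,
    company_name as name
from suppliers
```

Note that the model has no trailing semicolon. DBT wraps this select in its own statement when it builds the model, so a semicolon here can cause a compilation error.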

Then preview the results, and you should see something like the following.

<img src="./updated_supp.png" width="80%">

Make sure that you have saved the `dim_suppliers.sql` file, so that the green dot at the top of the file no longer appears.

Once you have, we are ready to create a new `dim_suppliers` table with the `id` and `name` columns that we see above.

So now use the proper dbt command to create the `dim_suppliers` table.

> If it works, you should see `Passed` and something like the following:

> <img src="./dim-supply.png" width="60%">

Then open the tabs to view the logs and details.

> <img src="./suppliers_success.png" width="80%">

Notice under the details at the bottom it says `OK created view model dim_suppliers`.
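
Behind that log line, DBT wraps the model's select in a statement that creates the view. A rough sketch of that statement on Snowflake follows. The schema name `dbt_jdoe` is hypothetical, and the exact text DBT generates may differ, so check the logs tab for the statement that actually ran.

```sql
-- Approximate shape of the statement DBT issues for a view model.
-- `dbt_jdoe` is a hypothetical schema name, not one taken from this lab.
create or replace view dbt_jdoe.dim_suppliers as (
    select
        supplier_id as id,
        company_name as name
    from suppliers
);
```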

A view is not a physical table but a saved query that can be selected from like a table in Postgres or Snowflake, and DBT builds models as views by default. Let's confirm that the new view has been created by going back to Snowflake and checking that there is a schema beginning with `dbt_` that contains `dim_suppliers`.
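
The same check can be done with a couple of queries in a Snowflake worksheet. This is a sketch that assumes the worksheet's current database is the one DBT writes to; `dbt_jdoe` is a hypothetical schema name, so substitute the `dbt_` schema that appears in your account.

```sql
-- `dbt_jdoe` is a hypothetical schema name; replace it with your own dbt_ schema.
show views like 'dim_suppliers' in schema dbt_jdoe;

-- Confirm the renamed columns are present.
select id, name
from dbt_jdoe.dim_suppliers
limit 5;
```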

> <img src="./snowflake-supp.png" width="80%">

Ok, this looks good.  Now that the changes have been made to our database, we need to update our master branch.

### Updating the master branch

So now to update the master branch, we'll need to:

* make a commit on the current branch
* open the pull request
* create the pull request
* merge the pull request to master

You'll know that the master branch was properly updated if you go to your repository's master branch and see the following:

> <img src="./added_supp.png" width="60%">

With that you have completed your first set of changes through DBT.

### Summary

Let's make sure that we have a good understanding of the DBT workflow.  We start off with a good coding workflow in general:

* checkout a new branch from master
* code on that new branch

Then we get to our DBT queries:

* Preview a DBT query
* Create a new table (or view) through DBT with the `dbt run --models` command (`dbt run --select` on recent DBT versions)
* Go to Snowflake to verify that the changes were properly made

Then we update the master branch:

* Make a commit
* Open the pull request on GitHub
* Create the pull request
* Merge the pull request to master

And then we go to the master branch to verify that our codebase on master has been updated.

### Resources

[RDS Redshift Lab](https://github.com/jigsawlabs-student/rds-to-redshift-lab/blob/main/notes/0-rds-to-redshift-solution.ipynb)

[Data Warehouse Spectrum -> DBT Towards DS](https://towardsdatascience.com/a-data-warehouse-implementation-on-aws-a96d0e251abd)

[Stitch -> DBT](https://www.startdataengineering.com/post/build-a-simple-data-engineering-platform/)