# DBT Workflow Lab

### Introduction

In this lesson, we'll practice working through the DBT workflow.  We'll use DBT to connect to both our Redshift and GitHub accounts, create a new table in our data warehouse, and then update our master branch on GitHub to reflect the changes in our DBT codebase.  Let's get started.

### Working with our Redshift Data

Let's start by logging into AWS and connecting to our Redshift database through the Query Editor.  You'll know that the Query Editor is connected to the proper database if you see various tables listed over to the right.

> <img src="./tables_avail.png" width="40%">

From there, run a query in the query panel to select the first five events from the `event` table.

> While the actual events may vary, we should see something like the following:

<img src="./event_queried.png" width="60%">

### Setting up DBT

Ok, now it's time to perform our Redshift queries from DBT.  Remember that in our ELT process, we use DBT to perform just the `T` component.  The data is generally already loaded into our data warehouse, and from there we'll select and clean our loaded data.  

But to make any changes, we first need to create a new branch, as we cannot make any changes directly on the master branch.  So begin by creating a new branch called `build_dim_categories_model`. 

Then create a new folder called `models` with a `dim_categories.sql` file in it, and remove the `examples` folder.  When completed, your file structure should look like the following.

<img src="./build_dim_categories.png" width="30%">

### Querying Redshift from DBT 

Ok, so now that we've created a new branch and updated our file structure, it's time to query our data warehouse.  

Before having DBT make any new tables in our data warehouse, let's take a look at some of the data we have already loaded in.  Begin by querying the `event` table from DBT.

<img src="./cat_id.png" width="100%">

Ok, so now let's take a look at the `Compiled SQL` tab to see what DBT executed for us. 

<img src="./cat_limited.png" width="100%">

So we can see that it added in the `limit 500`.
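As a rough illustration of what the preview does, the sketch below appends a `limit 500` clause to a query and runs it.  It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Redshift.  The `preview_sql` helper, the `event` columns, and the 600 generated rows are all hypothetical, and this is not how DBT itself is implemented.

```python
import sqlite3


def preview_sql(query: str, limit: int = 500) -> str:
    """Return the query with a row cap appended (hypothetical helper)."""
    cleaned = query.strip().rstrip(";")
    return f"{cleaned}\nlimit {limit}"


# In-memory stand-in for the warehouse; the columns here are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table event (eventid integer, eventname text)")
conn.executemany(
    "insert into event values (?, ?)",
    [(i, f"event {i}") for i in range(600)],
)

compiled = preview_sql("select * from event")
rows = conn.execute(compiled).fetchall()

print(compiled)   # the query that actually ran, with the cap appended
print(len(rows))  # capped at 500 even though the table holds 600 rows
```

The cap only affects what the preview returns.  The query saved in the model file is unchanged.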

Ok, now let's try to create a new table called `dim_categories`.  To do so, first use DBT to select the `catid`, `catgroup`, and `catname` columns, and rename them as `id`, `group` and `name`.  

Then preview the results, and you should see something like the following.

<img src="./cat_results.png" width="100%">
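One way the select could look is sketched below.  The SQL in `MODEL_SQL` is the kind of text that would go in `models/dim_categories.sql`.  It assumes the columns come from a table named `category`, which the lesson does not state, and it quotes `"group"` because `group` is a reserved word in SQL.  To keep the example runnable without a warehouse, it uses Python's built-in `sqlite3` as a stand-in for Redshift, and the two sample rows are made up.

```python
import sqlite3

# Candidate contents for models/dim_categories.sql.
# The source table name `category` is an assumption.
MODEL_SQL = """
select
    catid    as id,
    catgroup as "group",
    catname  as name
from category
"""

# In-memory stand-in for the warehouse, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table category "
    "(catid integer, catgroup text, catname text, catdesc text)"
)
conn.executemany(
    "insert into category values (?, ?, ?, ?)",
    [
        (1, "Sports", "MLB", "Major League Baseball"),
        (6, "Shows", "Musicals", "Musical theatre"),
    ],
)

cursor = conn.execute(MODEL_SQL)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)  # the renamed column headers
print(rows)
```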

So we are now ready to create a new `dim_categories` table with the `id`, `group` and `name` columns that we see above.

So now use the proper dbt command to create the `dim_categories` table.

> If it works, you should see `Passed` and something like the following:

<img src="./proper_run.png" width="60%">

Then open the tabs to view the logs and details.

> <img src="./logs_details.png" width="80%">

Notice under the details at the bottom it says `OK created view model dim_categories`.

A view is a saved query that can be queried just like a table in Postgres or Redshift.  Let's confirm that it has been created by going back to our Redshift query editor, and checking that there is a schema beginning with `dbt` that contains `dim_categories`.

> <img src="./dim_cat_cols.png" width="30%">
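To see how a view differs from a regular table, here is a small sketch.  It again uses Python's built-in `sqlite3` as a stand-in for Redshift, with a made-up `category` table.  The lookup against `sqlite_master` is specific to SQLite; Redshift has its own catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table category (catid integer, catgroup text, catname text)"
)
conn.execute("insert into category values (1, 'Sports', 'MLB')")

# A view stores the query, not a copy of the rows.
conn.execute(
    'create view dim_categories as '
    'select catid as id, catgroup as "group", catname as name from category'
)

# SQLite records whether an object is a table or a view in sqlite_master.
kind = conn.execute(
    "select type from sqlite_master where name = 'dim_categories'"
).fetchone()[0]

before = conn.execute("select count(*) from dim_categories").fetchone()[0]
conn.execute("insert into category values (2, 'Sports', 'NHL')")
after = conn.execute("select count(*) from dim_categories").fetchone()[0]

print(kind, before, after)  # prints: view 1 2
```

Because the view reruns its query each time it is read, the row added to `category` shows up in `dim_categories` without rebuilding anything.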

Ok, this looks good.  Now that the changes have been made to our database, we need to update our master branch.

### Updating the master branch

So now to update the master branch, we'll need to:

* make a commit on the current branch
* open the pull request
* create the pull request
* merge the pull request to master

You'll know that the master branch was properly updated if you go to your repository's master branch and see the following:

> <img src="./pull_request_lab.png" width="60%">

With that you have completed your first set of changes through DBT.

### Summary

Let's make sure that we have a good understanding of the DBT workflow.  We start off with a good coding workflow in general:

* checkout a new branch from master
* code on that new branch

Then we get to our DBT queries:

* Preview a DBT query
* Create a new table through DBT with the `dbt run --models` command
* Go to Redshift to verify that the changes were properly made

Then we update the master branch:

* Make a commit
* Open the pull request on GitHub
* Create a pull request
* Merge the pull request to master

And then we go to the master branch to verify that our codebase on master has been updated.

### Resources

[RDS Redshift Lab](https://github.com/jigsawlabs-student/rds-to-redshift-lab/blob/main/notes/0-rds-to-redshift-solution.ipynb)

[Data Warehouse Spectrum -> DBT Towards DS](https://towardsdatascience.com/a-data-warehouse-implementation-on-aws-a96d0e251abd)

[Stitch -> DBT](https://www.startdataengineering.com/post/build-a-simple-data-engineering-platform/)