# End to End DBT

### Introduction

In this lesson, we'll start by exploring our data in Snowflake, use DBT to transform it, and end by producing a visualization in Mode.

### Getting a fact table

Return to our Snowflake database, and take a look at the tables listed in our northwinds OLTP database.

<img src="./rds-tables.png" width="40%">

Move through the tables above in your own Snowflake database, and try to identify our fact table.  Remember that the fact table should consist primarily of foreign keys, along with the numerical information we wish to measure.

> There may be more than one table that will help us construct our fact table.

### Moving to DBT

Now let's move to DBT, and create a new model in staging.  The staging model should be for orders and should have the following columns: `order_id`, `order_date`, `employee_id`, `customer_id`, `product_id`, `quantity`, `discount`, and `unit_price`.

To build the staging model, first update the sources YAML file with the necessary sources.

> Notice that we did not include the shipping information -- including the `required_date`.  This is because, theoretically, that would go in its own staging file.  All of the `ship_` prefixes were hints that this information belongs in a separate model.
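
To get a feel for the shape of the staging query before writing it in DBT, here is a minimal sketch that uses Python's built-in `sqlite3` in place of Snowflake. It assumes the columns come from two source tables named `orders` and `order_details`, as in the standard Northwind schema; those table names and all of the sample rows are assumptions, so check them against your own database.

```python
import sqlite3

# In-memory stand-in for the warehouse; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (
        order_id integer, customer_id text, employee_id integer,
        order_date text, required_date text, ship_city text
    );
    create table order_details (
        order_id integer, product_id integer,
        unit_price real, quantity integer, discount real
    );
    insert into orders values
        (10249, 'TOMSP', 6, '1996-07-05', '1996-08-16', 'Lyon'),
        (10248, 'VINET', 5, '1996-07-04', '1996-08-01', 'Reims');
    insert into order_details values
        (10248, 11, 14.0, 12, 0.0),
        (10248, 42, 9.8, 10, 0.0),
        (10249, 14, 18.6, 9, 0.0);
""")

# One row per order line; the shipping columns are deliberately left out.
staging_sql = """
    select o.order_id, o.order_date, o.employee_id, o.customer_id,
           d.product_id, d.quantity, d.discount, d.unit_price
    from orders o
    join order_details d on d.order_id = o.order_id
    order by o.order_date
"""
stg_orders = conn.execute(staging_sql).fetchall()
for row in stg_orders:
    print(row)
```

In the DBT model itself, the two table references would point at the entries declared in the sources file rather than at raw table names.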

We should see the following results, ordered by order date from earliest to latest.

<img src="./updated-orders.png" width="100%">

* Using Jinja

Notice that each of the foreign keys, `order_id`, `product_id`, `employee_id`, and `customer_id`, has `rds-` preceding the id.

Use a for loop in Jinja to handle this repetitive work for the `product_id`, `employee_id`, and `customer_id` columns.
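
A Jinja loop is a text generator: it renders one line of SQL per item in a list. The sketch below mimics that rendering in plain Python so we can see the SQL the loop should produce. In the model, the same idea would be written with `{% for col in [...] %}` and `{% endfor %}`, using `{% if not loop.last %},{% endif %}` to handle the commas. The `'rds-' || column` expression is an assumed way to add the prefix, not necessarily the one used in the lesson.

```python
# Columns that need the same treatment (taken from the lesson text).
id_columns = ["product_id", "employee_id", "customer_id"]

# Python stand-in for a Jinja `for` loop: build one select expression per column.
# The `'rds-' || col` form is an assumed way to add the prefix.
select_lines = [f"'rds-' || {col} as {col}" for col in id_columns]

# Join with commas between items, so there is no trailing comma to clean up.
rendered = ",\n    ".join(select_lines)

print(f"select\n    {rendered}\nfrom orders")
```

Adding another id column later only means adding one more name to the list.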

* Adding tests

Finally, before wrapping up, add a couple of tests to the new sources.  We can see how by looking at the [DBT documentation](https://docs.getdbt.com/docs/building-a-dbt-project/using-sources).

### Moving to Integration

Next, create an integration file.  Remember that in our integration section, we generally combine our related staging files together.  Here, of course, we only have one staging file related to orders.

So let's just load that in.  We should see something like the following in our lineage.

> <img src="./orders-lineage.png" width="90%">

* Surrogate key

Now we use the `dbt_utils.surrogate_key` function to generate a new id for us.  This is because, when we merge duplicate records, we lose the original id.  In this case, because our data only comes from one source, there are no duplicate records, so we are likely in good shape.

Still, because it's a very small amount of work, and we may add additional sources in the future, we'll use the `surrogate_key` function anyway.

Have the surrogate key generated from the `product_id`, `order_date`, and `customer_id` columns.
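
As a rough mental model (and not the package's actual source), a hashed surrogate key can be pictured as the column values joined into one string and then passed through `md5`. The helper below is a hypothetical Python approximation of that idea; the `-` separator and the handling of nulls are assumptions, and the row values are made up.

```python
import hashlib

def surrogate_key(*values):
    """Approximate a hashed surrogate key: stringify, join with '-', then md5."""
    # Treating None as an empty string is an assumption about null handling.
    joined = "-".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Hypothetical row values: product_id, order_date, customer_id.
key_a = surrogate_key("rds-11", "1996-07-04", "rds-VINET")
key_b = surrogate_key("rds-11", "1996-07-04", "rds-VINET")
key_c = surrogate_key("rds-42", "1996-07-04", "rds-VINET")

print(key_a == key_b)  # same inputs, same key
print(key_a == key_c)  # different product, different key
print(len(key_a))      # md5 hex digests are 32 characters long
```

The key point is that the key is a pure function of its inputs: two rows with identical input columns always receive the same key.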

In our preview, we should see something like the following.

<img src="./int_orders.png" width="100%">

* `dbt_utils.star`

> Notice that we do not have the original `order_id` column.  Use the `dbt_utils.star` function to select each column minus the `order_id` column.
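
Conceptually, `dbt_utils.star` expands to a comma-separated list of a relation's columns, skipping any names passed to its `except` argument. The sketch below imitates that behavior against an in-memory SQLite table. The `star` helper here is a hypothetical Python stand-in, and the table and column names (including `order_pk`) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical integration columns; `order_pk` stands in for the surrogate key.
conn.execute(
    "create table int_orders (order_pk text, order_id text, "
    "order_date text, product_id text, quantity integer)"
)

def star(connection, table, exclude=()):
    """Return a comma-separated column list for `table`, skipping `exclude`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in connection.execute(f"pragma table_info({table})")]
    return ", ".join(c for c in columns if c not in exclude)

column_list = star(conn, "int_orders", exclude=["order_id"])
print(f"select {column_list} from int_orders")
```

The benefit is that new columns added upstream flow through automatically, while the excluded column stays out.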

### Onto marts

Now let's create an `orders_fact` table.  Begin by creating a new file in the `mart` folder called `orders_fact.sql`.

> The previewed data should look something like the following.  We ordered the data by `order_date`.

<img src="./updated-mart.png" width="100%">

> Remember that in the final fact table, we would like to use all of the foreign keys generated by our surrogate keys.  Here, that is only the `contact_pk`.  

* Materialize the table

Make sure to materialize the model as a table.  Then move to Snowflake to confirm that it has been converted.

<img src="./mart-tables.png" width="60%">

### Adding some tests

Before moving to our business intelligence tool, let's add some additional tests to our codebase.  

#### Integration tests

At the integration level, we should ensure that all of our primary key columns are unique across each integration table, and that no primary key is null.  Do so by adding a new file called `schema.yml` in the integration folder, and then adding the related tests for each integration file.

> Remember that our primary keys are generated from the `surrogate_key` function, and the same key will be generated if the dependent values are the same.

* So at this point we should ensure that the primary keys for the `int_companies`, `int_contacts`, and `int_orders` models are not null, and that they are each unique.
* We should also have tests that `hubspot_contacts` does not have null values in the `first_name` and `last_name` columns.
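
It helps to know what these tests do under the hood: each one is a query that selects the offending rows, and the test passes when the query returns nothing. The sketch below reproduces the idea for a not-null check and a uniqueness check with `sqlite3`. The table contents are made up, and `order_pk` is an assumed name for the primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table int_orders (order_pk text, product_id text);
    -- Made-up rows: 'abc' appears twice and one key is missing.
    insert into int_orders values
        ('abc', 'rds-11'), ('abc', 'rds-42'), ('def', 'rds-14'), (null, 'rds-51');
""")

# A test passes when its query returns no rows.
not_null_failures = conn.execute(
    "select * from int_orders where order_pk is null"
).fetchall()

unique_failures = conn.execute("""
    select order_pk, count(*) as n_records
    from int_orders
    where order_pk is not null
    group by order_pk
    having count(*) > 1
""").fetchall()

print("not_null failures:", not_null_failures)
print("unique failures:", unique_failures)
```

Here both checks would fail, since each query returns at least one row.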

If we add the related tests, and then run `dbt test`, we see something like the following:

> <img src="./failing-test.png" width="100%">

We can see that we have one failing test.

If you look at the details of the test, you can select part of the query, execute, and discover the reason for the failure. 

> Copy the relevant part of the query into a new file in DBT.  There, we can discover the record that is failing us.

<img src="./unique-fail.png" width="60%">

It looks like we did not specify a unique combination of columns when generating the surrogate key.  So add the `order_id` column, and no other new columns, to the list of columns used for the orders surrogate key.  That should do the trick.
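
One plausible way such a collision arises is a customer ordering the same product on two separate orders on the same day. The sketch below, which uses made-up rows and an assumed `stg_orders` table, counts duplicated column combinations with and without `order_id` to show why adding it removes the duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stg_orders (
        order_id text, order_date text, customer_id text, product_id text
    );
    -- Made-up rows: one customer orders the same product twice on one day.
    insert into stg_orders values
        ('rds-1', '1996-07-04', 'rds-VINET', 'rds-11'),
        ('rds-2', '1996-07-04', 'rds-VINET', 'rds-11'),
        ('rds-3', '1996-07-05', 'rds-TOMSP', 'rds-14');
""")

def duplicate_groups(columns):
    """Return the value combinations of `columns` that occur more than once."""
    cols = ", ".join(columns)
    query = (
        f"select {cols}, count(*) from stg_orders "
        f"group by {cols} having count(*) > 1"
    )
    return conn.execute(query).fetchall()

without_order_id = duplicate_groups(["product_id", "order_date", "customer_id"])
with_order_id = duplicate_groups(
    ["order_id", "product_id", "order_date", "customer_id"]
)

print(without_order_id)  # one colliding combination
print(with_order_id)     # no collisions once order_id is included
```

Running a query like this against the real staging model is a quick way to check a candidate key before rerunning the tests.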

<img src="./passing-int-tests.png" width="100%">

After making the necessary changes, confirm that all tests are now passing.

#### Mart Tests

Next, we should add the following set of tests for our mart tables.

* Primary key tests

Begin by adding the same tests for the primary keys -- that pks are not null, and are unique -- for each of our three mart tables.

* Referential tests

One of the main components to ensure in the marts is referential integrity.  By that, we mean that we should ensure that each foreign key has a corresponding primary key in the relevant table.

For example, when a contact has a `company_pk`, we want to make sure that `company_pk` exists in the companies dimension table.  Look for `relationships` in the [testing documentation](https://docs.getdbt.com/docs/building-a-dbt-project/tests).  

Then add two new tests -- one for each of the relevant foreign keys across our mart tables.
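
A referential integrity check boils down to looking for child rows whose foreign key has no match in the parent table. The sketch below expresses that as a left join in `sqlite3`. The table names `contacts_dim` and `companies_dim` and all of the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table companies_dim (company_pk text);
    create table contacts_dim (contact_pk text, company_pk text);
    insert into companies_dim values ('c1'), ('c2');
    -- Made-up rows: the last contact points at a company that does not exist.
    insert into contacts_dim values ('p1', 'c1'), ('p2', 'c2'), ('p3', 'c9');
""")

# Child rows whose foreign key has no matching parent row.
# An empty result means referential integrity holds.
orphans = conn.execute("""
    select child.contact_pk, child.company_pk
    from contacts_dim as child
    left join companies_dim as parent
        on child.company_pk = parent.company_pk
    where child.company_pk is not null
      and parent.company_pk is null
""").fetchall()

print(orphans)
```

In this made-up data, contact `p3` would cause the test to fail, because company `c9` is missing from the dimension table.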

We should see that all of our tests are now passing.

<img src="./passing-tests.png" width="100%">

### Drawing Insights

From there, use Mode to produce the following chart, which displays the total revenue for each day (you do not have to factor in discounts).

<img src="./revenue-by-day.png" width="100%">
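
The query behind a chart like this is a single aggregation: multiply `unit_price` by `quantity` for each order line, then sum by `order_date`. Here is a minimal sketch using `sqlite3`, with made-up rows and an `orders_fact` table reduced to just the columns the calculation needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders_fact (order_date text, unit_price real, quantity integer);
    -- Made-up order lines for illustration.
    insert into orders_fact values
        ('1996-07-04', 14.0, 12),
        ('1996-07-04', 10.0, 10),
        ('1996-07-05', 18.5, 2);
""")

# Revenue per day, ignoring discounts: sum of unit_price * quantity.
revenue_by_day = conn.execute("""
    select order_date, sum(unit_price * quantity) as revenue
    from orders_fact
    group by order_date
    order by order_date
""").fetchall()

for order_date, revenue in revenue_by_day:
    print(order_date, revenue)
```

In Mode, the same `select` would run against the materialized fact table in Snowflake, with `order_date` on the x-axis and `revenue` on the y-axis.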

### Summary

Think back through the steps involved in this lab, and through the components involved in each stage -- from exploring data in Snowflake, to sources, staging, integration and marts.

As a review, record some example code that belongs in each component of the codebase (from sources, to staging, to integration, to marts).  And if you would like even more practice, you can load in some Mixpanel data, which is available at:

* `s3://jigsaw-labs-student/northwinds/northwinds_mixpanel.csv`

And can be accessed with the API keys of:

* KEY_ID: 'AKIARIMMA5YSLC62OGJ4'
* SECRET: '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'

Take a look at the Staging in Snowflake reading, and Northwinds Data Lab for reference.