# Loading CRM data lab

### Introduction

In this lesson, we'll work with our source data in dbt and apply some initial coercions to that data in the staging layer.

### Loading Hubspot Data

Start from the master branch.  You should currently have a `sources.yaml` file on the master branch.  

> If you don't, check that you merged the previous branch into master. Then you can `pull from master` while on the master branch in dbt.

Then, create a new branch called `add_hubspot_models`.

From here, let's begin to place some of our staging files into different folders.  Create a new folder called `rds` to hold the rds files, as well as the related `sources.yaml` file.

Also create a folder called `hubspot` that will have its own `sources.yaml` and a model for hubspot contacts called `stg_hubspot_contacts.sql`.


> You can move the files into the newly created folder by clicking on each file, and choosing `rename`, and then updating the path.

When it's all set up, your file structure should look like the following:

> <img src="./file-structure.png" width="40%">

As the screenshot shows, the `hubspot` folder holds the new `stg_hubspot_contacts.sql` file and its own `sources.yaml` file.

For the hubspot contacts:

1. Create a source, and reference it from the staging file

> We can confirm that the source is working when we see the following in the lineage:
> <img src="./hubspot-lineage.png" width="60%">

2. Apply the same cleanup to the phone number data as we did in the `stg_rds_customers.sql` model, and also add a `contact_id` column with a prefix of `hubspot`.

And when our phone number information is properly cleaned, the data should look like the following:

> <img src="./hubspot-prefix.png" width="100%">
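
As a rough guide, the two steps above could be sketched as follows. This is only a sketch under assumptions: the source name `hubspot`, the table name `contacts`, and the raw columns `id` and `phone` are hypothetical and should match whatever you declared in your `sources.yaml`. The digits-only phone cleanup and the `hubspot-` separator are also assumptions, so mirror whatever `stg_rds_customers.sql` actually does.

```sql
-- stg_hubspot_contacts.sql (sketch; source, table, and column names are assumed)
select
    -- Prefix the raw id so it cannot collide with ids from other systems.
    -- The '-' separator is an assumption.
    'hubspot-' || cast(id as varchar) as contact_id,
    -- Assumed cleanup: strip every non-digit character from the phone number.
    regexp_replace(phone, '[^0-9]', '') as phone
from {{ source('hubspot', 'contacts') }}
```

Because the model reads from `source(...)` rather than a hard-coded table name, dbt can draw the source as an upstream node in the lineage graph.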

### Adding relations

Now in our hubspot data, we also have company information.  Let's add that now.  

Create a file called `stg_hubspot_companies.sql` with a `company_id` column derived from the company name, along with a column for the company name itself.  When complete, the data should look like the following.

> <img src="./hubspot-queries.png" width="100%">

And group by business name to make sure we do not have duplicates.
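
One possible shape for this model is sketched below. It assumes the company name lives in a `business_name` column on the same hypothetical `contacts` source table, and it uses an MD5 hash of the name as the derived id. Hashing is just one way to derive a stable id from a name, not necessarily the approach this lab intends.

```sql
-- stg_hubspot_companies.sql (sketch; source and table names are assumed)
select
    -- Derive a deterministic id from the name: the same name always
    -- produces the same company_id.
    md5(business_name) as company_id,
    business_name
from {{ source('hubspot', 'contacts') }}
-- One row per distinct business name, so companies are not duplicated.
group by business_name
```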

Next, we should associate our contacts with the related business through their business id.  When this is complete, our contact data should look like the following.

> <img src="./related-hubspot.png" width="100%">

And the lineage for contacts should look like the following.

<img src="./hubspot-contacts-lineage.png" width="100%">
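
A sketch of how the contacts model might pick up the company id is shown below. It assumes the same hypothetical source and column names as before (`id`, `phone`, `business_name`) and joins on the business name, since that is the value the company id is derived from. Referencing the companies model with `ref(...)` is what would make it appear upstream of contacts in the lineage.

```sql
-- stg_hubspot_contacts.sql with the company relation (sketch; names are assumed)
select
    'hubspot-' || cast(contacts.id as varchar) as contact_id,
    -- Foreign key pointing at the related row in stg_hubspot_companies.
    companies.company_id,
    regexp_replace(contacts.phone, '[^0-9]', '') as phone
from {{ source('hubspot', 'contacts') }} as contacts
-- Left join so contacts without a matching company are still kept.
left join {{ ref('stg_hubspot_companies') }} as companies
    on contacts.business_name = companies.business_name
```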

### Making it consistent

We have now done most of the work of setting up our hubspot models.  But we should take a moment to make our hubspot staging models a bit more consistent with the related models in our rds folder.

This means the following:

* We should align the columns in our various sources so that our columns are in the same order, with consistent names and consistent formatting

If we look at the `stg_rds_customers` file, the columns should currently align with the `stg_hubspot_contacts` file.  

* Confirm that the columns are in the same order, and with the same formatting (for example the same formatting should apply to phone numbers).  

Then take a look at the `stg_rds_companies` and `stg_hubspot_companies` files.  The main issue right now is that under hubspot, we are using the column name `business_name`, whereas under our `stg_rds_companies` we are using `company_name`.  Let's change both to use the column name `name`.

> Make sure to update the join when complete.

Confirm that everything is working by running:
    
```bash
dbt run --models "staging.*"
```

You should see only green check marks.

### Resources

[DBT date utils](https://github.com/calogica/dbt-date)

[Snowflake datetime](https://docs.snowflake.com/en/sql-reference/functions/year.html#examples)