# Loading CRM data lab

### Introduction

In this lesson, we'll use our source data in dbt and apply some initial coercions to that data in staging.

### Loading Hubspot Data

Start from the main branch.  You should currently have a `sources.yaml` file on the main branch.  

Then, create a new branch called `add_hubspot_models`.

From here, let's begin to place some of our staging files into different folders.  Create a new folder called `models/staging/rds` (if you don't already have this) to place the rds files, as well as the related `sources.yaml` file.  

Also create a folder called `models/staging/hubspot` that will have its own `sources.yaml` and a model for hubspot contacts called `stg_hubspot_contacts.sql`.

When it's all set up, your file structure should look like the following:

```
models
└── staging
    ├── hubspot
    │   ├── sources.yaml
    │   └── stg_hubspot_contacts.sql
    └── rds
        ├── sources.yaml
        ├── stg_shipping.sql
        ├── stg_rds_companies.sql
        ├── stg_rds_customers.sql
        └── stg_rds_suppliers.sql
```

For the hubspot contacts:

1. Update the `sources.yaml` file in the hubspot folder, so that it references our new `dev.northwinds_hubspot` table.

Then from the `stg_hubspot_contacts.sql` file, select * from that source.  Run `dbt run` to confirm that it is connected correctly. 
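
As a minimal sketch, `stg_hubspot_contacts.sql` could look like the following at this point. It assumes the source in the hubspot `sources.yaml` is named `hubspot` and exposes a table named `northwinds_hubspot`; both names are assumptions, so adjust them to whatever you declared.

```sql
-- models/staging/hubspot/stg_hubspot_contacts.sql
-- Assumption: sources.yaml declares a source named `hubspot`
-- with a table named `northwinds_hubspot` in the `dev` schema.
select *
from {{ source('hubspot', 'northwinds_hubspot') }}
```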

2. Provide the same cleanup to the phone numbers data as we did in the `stg_rds_customers.sql` model, and also display a `contact_id` with a prefix of `hubspot`.

Once the phone numbers are cleaned and you have run `dbt run`, you should be able to select from the `stg_hubspot_contacts` table and see the properly formatted data.

Run the following:
    
`psql -d northwinds -c "select * from dev.stg_hubspot_contacts order by last_name limit 3"`

```
 contact_id  | first_name | last_name |     phone
-------------+------------+-----------+----------------
 hubspot-462 | Cello      | Abbado    | (821) 998-1092
 hubspot-237 | Paolo      | Accorti   | (011) 498-8260
 hubspot-100 | Marci      | Addy      | (651) 597-0736
 ```
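
One possible way to produce output shaped like the above is sketched below. It is not necessarily the same cleanup used in `stg_rds_customers.sql`. It assumes Postgres, a source named `hubspot`, a raw id column called `hubspot_id` (hypothetical), and that every raw phone value contains exactly ten digits.

```sql
-- models/staging/hubspot/stg_hubspot_contacts.sql
-- Sketch only: `hubspot_id` and the source name are assumed, not taken from the lab data.
with source as (
    select * from {{ source('hubspot', 'northwinds_hubspot') }}
),

digits as (
    select
        hubspot_id,
        first_name,
        last_name,
        -- Strip every non-digit character so any raw layout is normalized.
        regexp_replace(phone, '[^0-9]', '', 'g') as phone_digits
    from source
)

select
    -- Prefix the id so hubspot contacts stay distinguishable from rds customers.
    'hubspot-' || hubspot_id::text as contact_id,
    first_name,
    last_name,
    -- Rebuild the number as (XXX) XXX-XXXX; assumes exactly ten digits.
    '(' || substring(phone_digits from 1 for 3) || ') '
        || substring(phone_digits from 4 for 3) || '-'
        || substring(phone_digits from 7 for 4) as phone
from digits
```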

### Adding relations

Now in our hubspot data, we also have company information.  Let's add that now.  

Create a file called `stg_hubspot_companies.sql`. Add a `company_id` column derived from the company name, along with a column for the business name. When complete, the data should look like the following.

> <img src="./hubspot-queries.png" width="100%">

Run the following command:

`psql -d northwinds -c "select * from dev.stg_hubspot_companies order by business_name limit 3"`

```
                 company_id                 |           business_name
--------------------------------------------+------------------------------------
 hubspot-adams,-rau-and-pollich             | Adams, Rau and Pollich
 hubspot-alfreds-futterkiste                | Alfreds Futterkiste
 hubspot-ana-trujillo-emparedados-y-helados | Ana Trujillo Emparedados y helados
```

In the model, group by business name to make sure we do not have duplicate companies.
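
The sample output suggests that `company_id` is the lowercased business name with spaces replaced by hyphens and a `hubspot-` prefix. A minimal sketch along those lines, assuming Postgres, a source named `hubspot`, and a raw column called `business_name`:

```sql
-- models/staging/hubspot/stg_hubspot_companies.sql
-- Assumption: the raw table has a `business_name` column; the source name is hypothetical.
select
    -- 'Adams, Rau and Pollich' becomes 'hubspot-adams,-rau-and-pollich'
    'hubspot-' || replace(lower(business_name), ' ', '-') as company_id,
    business_name
from {{ source('hubspot', 'northwinds_hubspot') }}
-- One row per business, so repeated companies collapse into a single record.
group by business_name
```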

Next, we should associate our contacts with the related business through its company id.  When this is complete, confirm the data is correct by running the following:

`psql -d northwinds -c "select * from dev.stg_hubspot_contacts order by last_name limit 3"`

```
 contact_id  | first_name | last_name |     phone      |            company_id
-------------+------------+-----------+----------------+-----------------------------------
 hubspot-462 | Cello      | Abbado    | (821) 998-1092 | hubspot-pagac-spencer
 hubspot-237 | Paolo      | Accorti   | (011) 498-8260 | hubspot-franchi-s.p.a.
 hubspot-100 | Marci      | Addy      | (651) 597-0736 | hubspot-wiegand,-upton-and-ledner
 ```
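
One way to attach `company_id` is to join the contacts to the `stg_hubspot_companies` model on the business name. The sketch below assumes the raw contacts rows carry a `business_name` column and a `hubspot_id` column (both hypothetical names), and it leaves out the phone cleanup for brevity.

```sql
-- models/staging/hubspot/stg_hubspot_contacts.sql
-- Sketch only: column and source names are assumptions.
with contacts as (
    select * from {{ source('hubspot', 'northwinds_hubspot') }}
),

companies as (
    select * from {{ ref('stg_hubspot_companies') }}
)

select
    'hubspot-' || contacts.hubspot_id::text as contact_id,
    contacts.first_name,
    contacts.last_name,
    contacts.phone,  -- apply the phone formatting here in the real model
    companies.company_id
from contacts
-- Left join keeps contacts even when no matching company row exists.
left join companies
    on contacts.business_name = companies.business_name
```

Using `ref` rather than repeating the `company_id` expression keeps the id logic in one place, and lets dbt build `stg_hubspot_companies` before this model.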

### Making it consistent

Now we have done most of the work with setting up our hubspot models.  But we should take a moment to make our hubspot staging models a bit more consistent with the related models in our rds folder.

This means the following:

* We should align the columns in our various models so that our columns are in the same order, with consistent names and consistent formatting

If we look at the `stg_rds_customers` file, the columns should currently align with the `stg_hubspot_contacts` file.  

* Confirm that the columns are in the same order, and with the same formatting (for example the same formatting should apply to phone numbers).  

Then take a look at the `stg_rds_companies` and `stg_hubspot_companies` files.  The main issue right now is that under hubspot, we are using the column name `business_name`, whereas under our `stg_rds_companies` we are using `company_name`.  Let's change both to use the column name `name`.

> Make sure to update the join when complete.
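
As a sketch of the rename on the hubspot side, the companies model could alias the column as shown below. It assumes the raw column is `business_name` and the source is named `hubspot`. The rds model would get the equivalent change from `company_name` to `name`.

```sql
-- models/staging/hubspot/stg_hubspot_companies.sql
-- Assumption: raw column `business_name`; source name `hubspot` is hypothetical.
select
    'hubspot-' || replace(lower(business_name), ' ', '-') as company_id,
    -- Expose the shared column name used by both staging folders.
    business_name as name
from {{ source('hubspot', 'northwinds_hubspot') }}
group by business_name
```

Any join in `stg_hubspot_contacts.sql` that compared against `companies.business_name` would then need to compare against `companies.name` instead.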

Confirm that everything is working by running:
    
```bash
dbt run --models models/staging
```

You should see only green check marks.

### Wrapping Up

Commit your work, then merge it into main and push:

* `git add -A`
* `git commit -m 'add hubspot staging'`
* `git checkout main`
* `git merge -` (the hyphen references the branch you were most recently on)
* `git push origin main`

### Resources

[dbt date utils](https://github.com/calogica/dbt-date)

[Snowflake datetime](https://docs.snowflake.com/en/sql-reference/functions/year.html#examples)