# Staging Lab

### Introduction

In this lesson, we'll work through creating another staging table.  Let's get started.

### Getting set up

We can begin by starting from the main branch, and then creating a new branch for staging suppliers.  

Type the proper git command to confirm that you are no longer on the main branch, and are instead on the `build_staging_suppliers` branch.

Then make sure that your files are structured so that you have a staging folder underneath the models folder.  And in the staging folder, create the `stg_rds_suppliers.sql` file.

If you also have the `stg_rds_customers.sql` file from the reading, the file structure will look like the following.

```
northwinds-dbt
│   
└───models
   │  
   └───staging
    ├── stg_shippers.sql
    ├── stg_rds_customers.sql
    └── stg_rds_suppliers.sql
```
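
One way to create the folder and the empty model file is sketched below. It assumes you run it from the root of the `northwinds-dbt` project; adjust the paths if your project is laid out differently.

```shell
# Assumes the current directory is the root of the dbt project (northwinds-dbt).
# -p creates models/ and models/staging/ only if they do not already exist.
mkdir -p models/staging

# Create an empty model file; touch leaves an existing file's contents untouched.
touch models/staging/stg_rds_suppliers.sql

# List the staging folder to confirm the file is there.
ls models/staging
```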

And from here we can begin to fill in the contents of the `stg_rds_suppliers.sql` file.

### Writing our Staging File

Remember that our staging file should follow the structure of:

1. Import CTEs
2. Logical CTEs
3. Final select statement

Let's take care of the import CTE and the final select statement first.  

* For the import CTE, select from the relevant suppliers table, and name the CTE as `source`.
* For the final select statement, select from the CTE, and run `dbt run` to create the views.
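
As a rough sketch of these two steps, the snippet below writes a minimal model file from the command line. The table name `public.suppliers` is an assumption made for illustration; use whatever relation holds the raw suppliers data in your project.

```shell
# Assumes the current directory is the dbt project root.
mkdir -p models/staging

# Quoting the heredoc delimiter ('SQL') stops the shell from expanding
# anything inside the SQL text.
cat > models/staging/stg_rds_suppliers.sql <<'SQL'
-- Import CTE: pull in the raw suppliers table unchanged.
-- public.suppliers is an assumed location; point this at your own raw table.
with source as (

    select * from public.suppliers

)

-- Final select statement: return everything from the import CTE.
select * from source
SQL
```

There is no logical CTE yet, so the view should simply mirror the raw table.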



When you call `dbt run`, you should first see the text `Completed successfully`.  Then confirm that the run created a new view, `dev.stg_rds_suppliers`, with all of the columns from our original suppliers table.

> From the command line (bash) you can run the following.

`psql -d northwinds -c "select * from dev.stg_rds_suppliers limit 1"`

And you should see the following columns outputted to the screen.

> `supplier_id |  company_name  |   contact_name   |   contact_title    |    address     |  city  | region | postal_code | country |     phone      | fax | homepage`

If you see this, it's a good sign that you have successfully created your import CTE.

### Understanding our bash command

We'll be using that bash command a bit, so it's worth making sure we understand it.

`psql -d northwinds -c "select * from dev.stg_rds_suppliers limit 1"`

* `psql` - the program we'll be using, the PostgreSQL command-line client
* `-d northwinds` - specifying our northwinds database
* `-c "select ..."` - running a single command (`-c` is for command), here our `select *` query, and then exiting.

### Creating our logical CTE

Next, it's time to write our logical CTE.  For this, we'll want to select each individual column.  But for the contact name, we want the data split into two columns, `contact_first_name` and `contact_last_name`.
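
One possible shape for this logical CTE is sketched below. It uses Postgres's `split_part` function and assumes each contact name is a first name and a last name separated by a single space; the CTE name `renamed` and the source table `public.suppliers` are illustrative choices, not requirements of the lab.

```shell
# Assumes the current directory is the dbt project root.
mkdir -p models/staging

cat > models/staging/stg_rds_suppliers.sql <<'SQL'
-- Import CTE (public.suppliers is an assumed table location).
with source as (

    select * from public.suppliers

),

-- Logical CTE: list every column and split contact_name on the space.
-- split_part(text, delimiter, n) returns the n-th piece of the string.
renamed as (

    select
        supplier_id,
        company_name,
        split_part(contact_name, ' ', 1) as contact_first_name,
        split_part(contact_name, ' ', 2) as contact_last_name,
        contact_title,
        address,
        city,
        region,
        postal_code,
        country,
        phone,
        fax,
        homepage
    from source

)

select * from renamed
SQL
```

A name with more than two words would lose everything after the second word under this approach, so it is worth checking the data before relying on it.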

Call `dbt run`, and then place the same bash command into the terminal.

`psql -d northwinds -c "select * from dev.stg_rds_suppliers limit 1"`

This time you should see `contact_first_name` and `contact_last_name` as separate fields, and the names separated out.

> The output will look something like the following.

> <img src="./prev-logical.png" width="100%">

Now our data is starting to look pretty strong, but we have noticed that our phone number data is inconsistent.  For example, if we just select the first eight phone numbers, we'll see something like the following:

`psql -d northwinds -c "select phone from dev.stg_rds_suppliers limit 8"`

> <img src="./phone-numbers.png" width="40%">

And if we look further down our table, we'll see the following:

> <img src="./ill-formatted.png" width="40%">

Here, we can see a couple of errors -- our phone numbers have different formats and are of different lengths. 

Before we clean up this data, let's explore our data to see what we're working with.  

From postgres, remove any dashes, parentheses, and periods, and then group the phone numbers by their length, counting how many phone numbers we have of each length.

> To do so, see this [stackoverflow post](https://stackoverflow.com/questions/38619072/how-to-replace-multiple-special-characters-in-postgres-9-5), which mentions the `translate` function.  You will likely need to break this query into CTEs.
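
A minimal sketch of such an exploratory query follows, saved to a file so it can be passed to `psql`. The file name, CTE names, and column aliases are made up for illustration, and the sketch also strips spaces (an assumption beyond the characters listed above) so that a number like `(171) 555-2222` comes out at ten characters.

```shell
cat > explore_phone_lengths.sql <<'SQL'
-- Step 1: strip the formatting characters from each phone number.
-- In Postgres, translate() removes any character in its second argument
-- that has no counterpart in the third, so an empty third argument removes all of them.
with stripped as (

    select translate(phone, '()-. ', '') as stripped_phone
    from dev.stg_rds_suppliers

),

-- Step 2: measure the length of each stripped number.
lengths as (

    select length(stripped_phone) as phone_length
    from stripped

)

-- Step 3: count how many phone numbers share each length.
select
    phone_length,
    count(*) as phone_count
from lengths
group by phone_length
order by phone_length;
SQL

# To run it against the database used in this lab:
#   psql -d northwinds -f explore_phone_lengths.sql
```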

We should see something like the following:

> <img src="./grouped_nums.png" width="100%">

Looking at the data above, we can assume that our ten digit numbers are valid, and that everything under 10 digits is invalid. 

So by the time we select our final data, we should:
   
1. Only have phone numbers that have 10 digits -- any other phone numbers we should replace with null.
2. Have our phone numbers in the format of `(171) 555-2222`.
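
The two rules above could be expressed with `case when` and `substring`, roughly as in the preview query below. It is only a sketch: the file name and aliases are hypothetical, spaces are stripped along with the other formatting characters, and it is meant to be run against the view before the phone logic is added to the model.

```shell
cat > preview_phone_format.sql <<'SQL'
-- Strip formatting characters, keeping the raw value for comparison.
with stripped as (

    select
        phone as raw_phone,
        translate(phone, '()-. ', '') as stripped_phone
    from dev.stg_rds_suppliers

)

select
    raw_phone,
    case
        -- Rule 1: only numbers with exactly 10 characters left are kept.
        when length(stripped_phone) = 10 then
            -- Rule 2: rebuild the number as (XXX) XXX-XXXX.
            '(' || substring(stripped_phone from 1 for 3) || ') '
            || substring(stripped_phone from 4 for 3) || '-'
            || substring(stripped_phone from 7 for 4)
        -- Everything else becomes null.
        else null
    end as phone
from stripped;
SQL

# To preview the result:
#   psql -d northwinds -f preview_phone_format.sql
```

Once the output looks right, the same `case` expression can replace the plain `phone` column in the logical CTE of the staging model. Note that the length check alone does not verify that the remaining characters are all digits.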

If you run `dbt run`, and then query the phone column with the bash command, you should see the data returned like so.

`psql -d northwinds -c "select phone from dev.stg_rds_suppliers limit 3"`

```markdown
phone
----------------
 (171) 555-2222
 (100) 555-4822
 (313) 555-5735
```

Once the staging model is written, make a commit and merge the changes into the main branch.

```
git status

git add -A

git commit -m 'add staging suppliers, coerce phone numbers'

git checkout main

git merge build_staging_suppliers
```

### Summary

In this lesson, we learned about structuring our staging file so that it has import CTEs, logical CTEs, and a final select statement.

We also practiced working with strings and along the way we saw:
    
1. `translate`
2. `case when`
3. `substring`