# Adding Ids

### Seeing the problem

There's one more component to add to our integration models, and that is a unique primary key.  Currently, if we look at our data, we can see that we have two different identifiers that link back to our sources.

<img src="./combined_companies.png" width="100%">

This will lead to complicated queries when we need to associate records.  For example, if we look at the contact staging model, we can see that we currently have two foreign keys that associate a contact with a company.

<img src="./company_ids.png" width="90%">
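
To make the problem concrete, here is a rough sketch of what joining contacts to companies could look like with two source-specific keys. The model names `int_contacts` and `int_companies`, and the assumption that the companies model carries the same two id columns, are illustrative guesses rather than something confirmed by the project.

```sql
-- Sketch only: model names and the companies id columns are assumptions.
-- Every query that relates the two tables has to check both source ids.
select
    contacts.first_name,
    contacts.last_name,
    companies.name as company_name
from {{ ref('int_contacts') }} as contacts
left join {{ ref('int_companies') }} as companies
    on contacts.hubspot_company_id = companies.hubspot_company_id
    or contacts.rds_company_id = companies.rds_company_id
```

Repeating that two-part join condition in every downstream query is the complication a single key removes.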

In this lesson, we'll generate a new, single primary key that we'll use in our final mart tables.

Then, we'll re-associate our data using these new primary keys.  

> For example, with our contacts table above, we'll replace the `hubspot_company_id` and `rds_company_id` columns with the newly generated company primary key.

Ok, let's get started.

### Getting set up

To generate the primary key, we'll use a package -- `dbt_utils` -- that we do not currently have installed in dbt.  So let's see how we can install the package, and then start using it to generate a primary key.

* Installing a dbt package

We can see how to install packages with the following [documentation](https://docs.getdbt.com/docs/building-a-dbt-project/package-management).  There, we'll see the following example:

> <img src="./adding-package.png" width="60%">

So we can see that we need to add a `packages.yml` file at the same level as our `dbt_project.yml` file.  Go ahead and add that file so our file tree looks like the following.

> <img src="./packages.png" width="60%">

Then, in that file, we can add the following:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.0
```

> Copy and paste the above into the `packages.yml`. 

Then in the dbt command line, run `dbt deps`.

> <img src="./dbt-deps.png" width="100%">

Now that we've installed the `dbt_utils` package, we can use it to generate our primary keys.

### Generating primary keys

From `dbt_utils`, we'll want to use the `surrogate_key` function, which we can use to generate an id.

We can add a primary key to our contacts integration table by updating the select statement at the very end of the file to the following:

```sql
select
    {{ dbt_utils.surrogate_key(['first_name', 'last_name', 'phone']) }} as contact_pk,
    hubspot_contact_id, rds_contact_id,
    first_name, last_name, phone,
    hubspot_company_id, rds_company_id
from final
```

The `dbt_utils.surrogate_key` function will generate a new id, which we see in the first column below.

<img src="./contact-pk.png" width="100%">

Let's take another look at the `surrogate_key` function.

```sql
{{ dbt_utils.surrogate_key(['first_name', 'last_name', 'phone']) }}
```

This generates an id by hashing the values from the `first_name`, `last_name` and `phone` columns.
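
To see why the id is repeatable, it can help to look at roughly what the macro compiles to. The sketch below is an approximation for this package version on a warehouse that supports `md5` and `||` concatenation. The exact SQL depends on the adapter and the `dbt_utils` version, so treat it as illustrative and check the compiled model in your own `target/` folder.

```sql
-- Approximate compiled output (an assumption, not copied from a real run):
-- each column is cast to text, nulls become empty strings,
-- the pieces are joined with '-', and the result is hashed.
select
    md5(
        coalesce(cast(first_name as varchar), '') || '-' ||
        coalesce(cast(last_name as varchar), '') || '-' ||
        coalesce(cast(phone as varchar), '')
    ) as contact_pk
from final
```

Because the hash depends only on the column values, the same row produces the same `contact_pk` on every run.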

> Why use the surrogate key?  The main benefit of the surrogate key is that so long as the values in the specified columns -- above `first_name`, `last_name` and `phone` -- do not change, the generated key will not change.  This is a good thing, because it means that our id will stay consistent regardless of how many times we re-generate our tables.

> Of course, it's important that we specify one or more columns that, taken together, are unique across our data.  Above, the specified columns will do the trick.
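
One way to check that the chosen columns really are unique is a quick duplicate search. This is a minimal sketch, assuming the contacts integration model is named `int_contacts` (a guess based on the screenshots); adjust the `ref` to match your project.

```sql
-- Sketch only: the model name is an assumption.
-- Any row returned is a combination that would produce a duplicate key.
select
    first_name,
    last_name,
    phone,
    count(*) as occurrences
from {{ ref('int_contacts') }}
group by first_name, last_name, phone
having count(*) > 1
```

If this returns no rows, the three columns together identify each contact and are safe to feed into `surrogate_key`.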

From there, we can just generate the contacts model.

<img src="./int_contacts.png" width="100%">

### Changing the companies table

Next, let's add a primary key to the companies table, also using our `dbt_utils.surrogate_key` function.

Do so by using `surrogate_key` to derive the primary key from the `name` of the company.  Our data should look like the following:

<img src="./primary-key.png" width="100%">
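
If you get stuck, the final select might look something like the sketch below. The column name `company_pk`, the two source id columns, and the `final` CTE are assumptions that mirror the contacts model, so adapt them to whatever your companies model actually contains.

```sql
-- Sketch only: column and CTE names are assumed, mirroring the contacts model.
select
    {{ dbt_utils.surrogate_key(['name']) }} as company_pk,
    hubspot_company_id,
    rds_company_id,
    name
from final
```

Note that `surrogate_key` still takes a list, even when the key is derived from a single column.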

Then run the models to create a new view from this data.

### Summary

In this lesson, we saw how we can generate a new primary key using the `surrogate_key` function.  With the surrogate key function, the same id will be generated each time, so long as the values of the related columns do not change.

This keeps the ids of our data consistent regardless of how many times we run our models.

In our mart tables, we will reassociate our records -- i.e., contacts and their related companies -- with these newly generated primary keys.
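
As a preview of where this is heading, here is a hypothetical mart query written as if contacts already carried the company primary key. The model names `contacts` and `companies` and the columns `company_pk` and `contact_pk` on those models are assumptions for illustration only.

```sql
-- Hypothetical sketch: assumes mart models named `contacts` and `companies`,
-- and that contacts already store the generated company_pk.
select
    contacts.contact_pk,
    contacts.first_name,
    contacts.last_name,
    companies.name as company_name
from {{ ref('contacts') }} as contacts
left join {{ ref('companies') }} as companies
    on contacts.company_pk = companies.company_pk
```

With a single key on each side, the join condition no longer needs to know which source a record came from.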