# Integration models with Jinja

### Introduction

In the last couple of lessons, we saw how to use Jinja to loop through select statements and then combine them with `union all`.

A natural place for this pattern is our integration models. For example, in our companies integration model, we select from each source that has companies (rds and hubspot), pick the related fields, and union them together.

In this lesson, we'll work towards refactoring our companies integration model with Jinja.

### Our Starting Point

Our `int_companies.sql` may contain some complicated code at the moment, so to make the refactoring easier, let's create a new file and use the following as a starting point.

```sql
with merged_companies as (
    select name from {{ ref('stg_hubspot_companies') }} union all
    select name from {{ ref('stg_rds_companies') }}
)
select name from merged_companies group by name
```

Then create an additional file, and place the following at the top of it.

```sql
{% set sources = ["stg_hubspot_companies", "stg_rds_companies"] %}
```

Use this `sources` variable and some Jinja to create the merged companies CTE.  The compiled SQL should look something like the following:

```sql
with merged_companies as (
        select name from FIVETRAN_DATABASE.dbt_jigsawlabsstudent.stg_hubspot_companies
        union all
        select name from FIVETRAN_DATABASE.dbt_jigsawlabsstudent.stg_rds_companies
  )
select name from merged_companies group by name
limit 500
```

And if we preview the results, we should see the following:

> <img src="./prev-jin.png" width="60%">

### Adding additional columns

Above, we select only the `name` column from each table. But in our original query, we selected multiple columns from our source tables, like so:

```sql
merged_companies as (
    select company_id as hubspot_company_id, null as rds_company_id,
    name from hubspot_companies union all
    select null as hubspot_company_id, company_id as rds_company_id,
    name from rds_companies
)
```

One challenge with the above is that each id column should only be populated by its own source: we select `company_id` when we are selecting from that source, and null otherwise.

For example, when our source is hubspot, we want to select `company_id` as `hubspot_company_id`, but for any other source we want to select null. It turns out we can accomplish this with a one-line if/else expression. In Jinja, this looks like the following:

```sql
select name, {{ 'company_id' if 'hubspot' in source else 'null' }} as hubspot_company_id
```
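For comparison, the same conditional can also be written with Jinja's block-style `{% if %}` tags. The following is a minimal sketch, assuming `source` is the loop variable from a `for` loop over the `sources` list defined earlier, and that both staging models expose `name` and `company_id` columns.

```sql
{% set sources = ["stg_hubspot_companies", "stg_rds_companies"] %}

{% for source in sources %}
select
    name,
    -- block form of the inline if/else: only the matching branch is rendered
    {% if 'hubspot' in source %}company_id{% else %}null{% endif %} as hubspot_company_id
from {{ ref(source) }}
{% if not loop.last %}union all{% endif %}
{% endfor %}
```

For `stg_hubspot_companies`, the conditional renders as `company_id as hubspot_company_id`, and for `stg_rds_companies` it renders as `null as hubspot_company_id`. The inline form is usually easier to read when both branches are short strings like these.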

Add the one-line expression to the statement you constructed above.  When you preview the query, it should now look like the following:

> <img src="./prev-updated-cols.png" width="60%">

Next, use another one-line if/else expression to add an `rds_company_id` column to the query.  When we preview the results, it should look like the following:

> <img src="./hubspot-satterfield.png" width="80%">

From there, we join in the `stg_rds_companies` model so that we can add the `address`, `city`, `postal_code` and `country` columns.

> We can refer to, and even copy code from, our original query in `int_companies.sql` for inspiration.

When we are finished, and preview the results, we should see something like the following:

> <img src="./remainder-factor.png" width="100%">

Finally, we can replace the original code in the `int_companies.sql` file with the code we created above.  Then remove the extra files, make a commit, and merge the code into master.

### Summary

In this lesson, we saw how to refactor our integration model with Jinja.  Our integration models are a natural place to take advantage of Jinja because, in merging our tables, we cycle through multiple tables, select the same columns, and then union these select statements together.

### Solution

```sql
{% set sources = ["stg_hubspot_companies", "stg_rds_companies"] %}

with merged_companies as (
    {% for source in sources %}
    -- select company_id only for the matching source, null otherwise
    select
        name,
        {{ 'company_id' if 'hubspot' in source else 'null' }} as hubspot_company_id,
        {{ 'company_id' if 'rds' in source else 'null' }} as rds_company_id
    from {{ ref(source) }}
    {% if not loop.last %}union all{% endif %}
    {% endfor %}
),
deduped as (
    -- collapse to one row per company name
    select
        max(hubspot_company_id) as hubspot_company_id,
        max(rds_company_id) as rds_company_id,
        name
    from merged_companies
    group by name
)
select
    {{ dbt_utils.surrogate_key(['deduped.name']) }} as company_pk,
    hubspot_company_id,
    rds_company_id,
    deduped.name,
    address,
    postal_code,
    city,
    country
from deduped
join {{ ref('stg_rds_companies') }} rds_companies
    on rds_companies.company_id = deduped.rds_company_id
```
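
The `deduped` CTE in the solution relies on `max()` ignoring null values, so grouping by `name` folds the hubspot row and the rds row for the same company into a single row carrying both ids. The sketch below illustrates this with plain SQL; the company name and id values are made up for illustration and do not come from the course data.

```sql
-- Hypothetical sample rows standing in for the output of merged_companies
with merged_companies as (
    select 'Acme' as name, 101 as hubspot_company_id, null as rds_company_id
    union all
    select 'Acme' as name, null as hubspot_company_id, 7 as rds_company_id
)
select
    name,
    max(hubspot_company_id) as hubspot_company_id,  -- null from the rds row is skipped
    max(rds_company_id) as rds_company_id           -- null from the hubspot row is skipped
from merged_companies
group by name
-- returns a single row: Acme, 101, 7
```

Note that this only merges rows whose `name` values match exactly across the two sources.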