# Introduction to SQL for Excel Users – Part 17: JOIN Filtering

[Original post](https://www.daveondata.com/blog/introduction-to-sql-for-excel-users-part-17-join-filtering-gotchas/)

## INNER JOIN Filtering

_NOTE – There will be no Excel coverage in this post as it isn’t necessary for the concepts and I wanted to keep the post to a reasonable length._

So far I’ve covered the basics of using LEFT OUTER and INNER JOINs.

One of the cool aspects of JOINs is that you can use the ON clause to not only specify JOIN conditions, but also to filter.

```sql
SELECT C.CustomerKey
      ,C.FirstName
      ,C.LastName
      ,SUM(FIS.SalesAmount) AS TotalSalesAmount
FROM DimCustomer C
    INNER JOIN FactInternetSales FIS ON (C.CustomerKey = FIS.CustomerKey)
WHERE FIS.ProductKey = 374
GROUP BY C.CustomerKey, C.FirstName, C.LastName
```


The SQL code ☝ returns only the data for those customers that have placed Internet orders for ProductKey 374 and aggregates the total amount of sales for each customer.

If you execute the query in SQL Server Management Studio (SSMS), you get back 142 rows of data.

You can also write the ☝ SQL like so:

```sql
SELECT C.CustomerKey
      ,C.FirstName
      ,C.LastName
      ,SUM(FIS.SalesAmount) AS TotalSalesAmount
FROM DimCustomer C
    INNER JOIN FactInternetSales FIS ON (C.CustomerKey = FIS.CustomerKey AND
                                         FIS.ProductKey = 374)
GROUP BY C.CustomerKey, C.FirstName, C.LastName
ORDER BY C.CustomerKey
```

When you execute this second query, you get exactly the same results!

This is how the second bit of SQL code works conceptually:

1. Grab all records from DimCustomer…
1. Grab all records from FactInternetSales that have a ProductKey of 374…
1. INNER JOIN #1 to #2 ON CustomerKey matches…
1. GROUP BY customers…
1. SELECT customers…
1. SUM the sales for each customer…
1. ORDER the results BY CustomerKey.

OK, so when using INNER JOINs you can filter using either WHERE or ON.
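
To see the WHERE/ON equivalence without the AdventureWorks database, here is a minimal sketch that uses two made-up table variables, `@ToyCustomer` and `@ToySales` (hypothetical names and data, not part of the original example). It assumes SQL Server, since table variables are T-SQL syntax, and it should be run as a single batch in SSMS.

```sql
-- Hypothetical toy data: three customers, four sales rows.
DECLARE @ToyCustomer TABLE (CustomerKey INT, FirstName VARCHAR(20));
DECLARE @ToySales TABLE (CustomerKey INT, ProductKey INT, SalesAmount DECIMAL(10, 2));

INSERT INTO @ToyCustomer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @ToySales VALUES (1, 374, 10.00), (1, 374, 5.00), (1, 500, 99.00), (2, 500, 20.00);

-- Version 1: filter in the WHERE clause.
SELECT C.CustomerKey
      ,C.FirstName
      ,SUM(S.SalesAmount) AS TotalSalesAmount
FROM @ToyCustomer C
    INNER JOIN @ToySales S ON (C.CustomerKey = S.CustomerKey)
WHERE S.ProductKey = 374
GROUP BY C.CustomerKey, C.FirstName
ORDER BY C.CustomerKey;

-- Version 2: the same filter moved into the ON clause.
SELECT C.CustomerKey
      ,C.FirstName
      ,SUM(S.SalesAmount) AS TotalSalesAmount
FROM @ToyCustomer C
    INNER JOIN @ToySales S ON (C.CustomerKey = S.CustomerKey AND
                               S.ProductKey = 374)
GROUP BY C.CustomerKey, C.FirstName
ORDER BY C.CustomerKey;

-- Both queries return the same single row: 1, Ann, 15.00
```

Only Ann bought product 374 in this toy data, so Bob and Cy drop out of the result either way.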

However, things are a bit more complicated with LEFT OUTER JOINs…

## LEFT OUTER JOIN Filtering

The two key concepts to remember when it comes to filtering LEFT OUTER JOINs:

1. JOINs are executed before WHERE.
1. As I covered previously, LEFT OUTER JOINs preserve all the rows in the left virtual table.

The easiest way to understand how these two concepts affect your filtering is by example.

When you execute the following SQL in SSMS you get back 142 rows, just like the INNER JOIN examples I covered previously.

```sql
SELECT C.CustomerKey
      ,C.FirstName
      ,C.LastName
      ,SUM(FIS.SalesAmount) AS TotalSalesAmount
FROM DimCustomer C
    LEFT OUTER JOIN FactInternetSales FIS ON (C.CustomerKey = FIS.CustomerKey)
WHERE FIS.ProductKey = 374
GROUP BY C.CustomerKey, C.FirstName, C.LastName
ORDER BY C.CustomerKey
```

Conceptually, the SQL code is executed as follows:

1. Grab all the records from DimCustomer…
1. Grab all the records from FactInternetSales…
1. LEFT OUTER JOIN #1 on #2, keep all records from #1…
1. In the case where there’s no CustomerKey match between #1 and #2, fill with NULLs…
1. Keep only the records WHERE ProductKey is 374…
1. GROUP BY customers…
1. SELECT customers…
1. SUM the sales for each customer…
1. ORDER the results BY CustomerKey.

The critical step in the above conceptual execution is #5.

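Rows with a ProductKey other than 374 are discarded, including the NULL-filled rows from step #4! A NULL ProductKey is never equal to 374, so customers without a match are filtered out and the LEFT OUTER JOIN ends up behaving like an INNER JOIN.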
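
Why do the NULL rows disappear? A comparison such as `NULL = 374` is neither true nor false in SQL, and WHERE keeps only the rows where the condition is true. The sketch below shows the joined rows before and after the WHERE filter. It uses made-up `@ToyCustomer` and `@ToySales` table variables (hypothetical names and data) and assumes SQL Server.

```sql
-- Hypothetical toy data: Cy (CustomerKey 3) has no sales at all.
DECLARE @ToyCustomer TABLE (CustomerKey INT, FirstName VARCHAR(20));
DECLARE @ToySales TABLE (CustomerKey INT, ProductKey INT, SalesAmount DECIMAL(10, 2));

INSERT INTO @ToyCustomer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @ToySales VALUES (1, 374, 10.00), (1, 374, 5.00), (1, 500, 99.00), (2, 500, 20.00);

-- Before filtering: the LEFT OUTER JOIN keeps Cy and fills his sales columns with NULLs.
SELECT C.CustomerKey
      ,C.FirstName
      ,S.ProductKey
      ,S.SalesAmount
FROM @ToyCustomer C
    LEFT OUTER JOIN @ToySales S ON (C.CustomerKey = S.CustomerKey)
ORDER BY C.CustomerKey, S.ProductKey, S.SalesAmount;
-- 1, Ann, 374,  5.00
-- 1, Ann, 374, 10.00
-- 1, Ann, 500, 99.00
-- 2, Bob, 500, 20.00
-- 3, Cy,  NULL, NULL

-- After filtering: WHERE removes every row that is not ProductKey 374,
-- and that includes Cy's NULL-filled row.
SELECT C.CustomerKey
      ,C.FirstName
      ,S.ProductKey
      ,S.SalesAmount
FROM @ToyCustomer C
    LEFT OUTER JOIN @ToySales S ON (C.CustomerKey = S.CustomerKey)
WHERE S.ProductKey = 374
ORDER BY C.CustomerKey, S.ProductKey, S.SalesAmount;
-- 1, Ann, 374,  5.00
-- 1, Ann, 374, 10.00
```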

By way of comparison, consider the following SQL:

```sql
SELECT C.CustomerKey
      ,C.FirstName
      ,C.LastName
      ,SUM(FIS.SalesAmount) AS TotalSalesAmount
FROM DimCustomer C
    LEFT OUTER JOIN FactInternetSales FIS ON (C.CustomerKey = FIS.CustomerKey AND
                                              FIS.ProductKey = 374)
GROUP BY C.CustomerKey, C.FirstName, C.LastName
ORDER BY C.CustomerKey
```

If you run the SQL ☝ you get back 18,484 rows of data! 😲

This result is totally logical when you consider the conceptual execution:

1. Grab all the records from DimCustomer…
1. Grab all the records from FactInternetSales that have a ProductKey of 374…
1. LEFT OUTER JOIN #1 on #2, keep all records from #1…
1. In the case where there’s no CustomerKey match between #1 and #2, fill with NULLs…
1. GROUP BY customers…
1. SELECT customers…
1. SUM the sales for each customer…
1. ORDER the results BY CustomerKey.

The critical steps in the above conceptual execution are #3 and #4.
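
One way to picture the conceptual steps above is to write step #2 out explicitly as a pre-filtered derived table and then LEFT OUTER JOIN to it. The sketch below does that on made-up `@ToyCustomer` and `@ToySales` table variables (hypothetical names and data, assuming SQL Server). It illustrates the logic only and is not a claim about how the database engine actually runs the query.

```sql
-- Hypothetical toy data: only Ann bought product 374.
DECLARE @ToyCustomer TABLE (CustomerKey INT, FirstName VARCHAR(20));
DECLARE @ToySales TABLE (CustomerKey INT, ProductKey INT, SalesAmount DECIMAL(10, 2));

INSERT INTO @ToyCustomer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @ToySales VALUES (1, 374, 10.00), (1, 374, 5.00), (1, 500, 99.00), (2, 500, 20.00);

SELECT C.CustomerKey
      ,C.FirstName
      ,SUM(S.SalesAmount) AS TotalSalesAmount
FROM @ToyCustomer C
    -- Step #2: grab only the sales rows for ProductKey 374...
    LEFT OUTER JOIN (SELECT CustomerKey, SalesAmount
                     FROM @ToySales
                     WHERE ProductKey = 374) S
        -- Steps #3 and #4: keep every customer, NULLs where there is no match.
        ON (C.CustomerKey = S.CustomerKey)
GROUP BY C.CustomerKey, C.FirstName
ORDER BY C.CustomerKey;
-- 1, Ann, 15.00
-- 2, Bob, NULL
-- 3, Cy,  NULL
```

Every customer survives, and the customers without a 374 sale simply get a NULL total. Putting `S.ProductKey = 374` in the ON clause of the LEFT OUTER JOIN returns the same three rows.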

There are a total of 18,484 records in DimCustomer.

There are a total of 142 records in FactInternetSales with a ProductKey of 374.

The result set ☝ has exactly 142 non-NULL values as the following query demonstrates:

```sql
SELECT C.CustomerKey
      ,C.FirstName
      ,C.LastName
      ,SUM(FIS.SalesAmount) AS TotalSalesAmount
FROM DimCustomer C
    LEFT OUTER JOIN FactInternetSales FIS ON (C.CustomerKey = FIS.CustomerKey AND
                                              FIS.ProductKey = 374)
WHERE FIS.ProductKey IS NOT NULL
GROUP BY C.CustomerKey, C.FirstName, C.LastName
ORDER BY C.CustomerKey
```
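
The same split between total rows and rows with actual sales can be checked in a single query. This sketch counts both on made-up `@ToyCustomer` and `@ToySales` table variables (hypothetical names and data, assuming SQL Server). It relies on the fact that `COUNT(*)` counts every row while `COUNT(column)` skips NULLs.

```sql
-- Hypothetical toy data: three customers, one of whom bought product 374.
DECLARE @ToyCustomer TABLE (CustomerKey INT, FirstName VARCHAR(20));
DECLARE @ToySales TABLE (CustomerKey INT, ProductKey INT, SalesAmount DECIMAL(10, 2));

INSERT INTO @ToyCustomer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO @ToySales VALUES (1, 374, 10.00), (1, 374, 5.00), (1, 500, 99.00), (2, 500, 20.00);

SELECT COUNT(*) AS TotalRows                      -- every customer row
      ,COUNT(T.TotalSalesAmount) AS RowsWithSales -- NULL totals are not counted
FROM (SELECT C.CustomerKey
            ,SUM(S.SalesAmount) AS TotalSalesAmount
      FROM @ToyCustomer C
          LEFT OUTER JOIN @ToySales S ON (C.CustomerKey = S.CustomerKey AND
                                          S.ProductKey = 374)
      GROUP BY C.CustomerKey) T;
-- TotalRows = 3, RowsWithSales = 1
```

On the toy data, 3 and 1 play the roles that 18,484 and 142 play in the real tables.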

There you have it, the subtleties of JOIN filtering.

Nothing to worry about with INNER JOINs.

However, be mindful of filtering LEFT OUTER JOINs.

## Oh, BTW…

There’s a reason beyond just writing top-notch SQL that you want to learn the subtleties of JOIN filtering.

It’s a commonly asked interview question for jobs that require SQL skills.

I know because I’ve asked it myself – both as a hiring manager and as an individual contributor.

You’ve been warned! 😉

## The Learning Arc

The next post will begin coverage of arguably the most useful part of SQL for the analytics pro…

Yes, I’m referring to CASE WHEN. 💥😁

Stay healthy and happy data sleuthing!