# Introduction to SQL for Excel Users – Part 10: Basic CTEs

[Original post](https://www.daveondata.com/blog/introduction-to-sql-for-excel-users-part-10-basic-ctes/)

## The Problem

In the last post I introduced the mighty ROW_NUMBER SQL window function.

Here’s one of the queries from that post:

```sql
SELECT FCC.FactCallCenterID
      ,FCC.DateKey
      ,FCC.WageType
      ,FCC.Shift
      ,FCC.Calls
      ,ROW_NUMBER() OVER (PARTITION BY FCC.DateKey ORDER BY FCC.Calls DESC) AS RowNum
FROM FactCallCenter FCC
ORDER BY FCC.DateKey ASC
```

In the last post, the above query sought to answer the question, “which Shift was busiest in terms of the highest number of Calls?”

In the returned data, the rows that really answer the question are the ones where RowNum = 1.

There’s a bunch of extraneous data returned.

It’s very tempting to alter the query with a WHERE clause to filter down to just the rows I want:

```sql
SELECT FCC.FactCallCenterID
      ,FCC.DateKey
      ,FCC.WageType
      ,FCC.Shift
      ,FCC.Calls
      ,ROW_NUMBER() OVER (PARTITION BY FCC.DateKey ORDER BY FCC.Calls DESC) AS RowNum
FROM FactCallCenter FCC
WHERE RowNum = 1
ORDER BY FCC.DateKey ASC
```

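Unfortunately, that query doesn’t run. The DB responds with an error because it doesn’t recognize RowNum in the WHERE clause.

The problem stems from the logical order of query processing.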

Given what this series has covered so far, here is the logical order of query processing:

1. Pull data using FROM
1. If present, filter data using WHERE
1. If present, group data using GROUP BY
1. SELECT data, including any engineered features and/or aggregate/window functions
1. If present, sort the data using ORDER BY
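To make the order concrete, here’s a small sketch that labels each clause with its step number. The four sample rows are made up for illustration (they are not real FactCallCenter data), and the inline VALUES syntax assumes a DB like SQL Server:

```sql
-- Hypothetical sample rows stand in for FactCallCenter.
SELECT SC.DateKey                                   -- Step 4: SELECT
      ,SUM(SC.Calls) AS TotalCalls                  -- Step 4: aggregate is computed, alias is created
FROM (VALUES (20101101, 'AM',  120)
            ,(20101101, 'PM1',  95)
            ,(20101102, 'AM',   80)
            ,(20101102, 'PM1', 140)
     ) AS SC (DateKey, Shift, Calls)                -- Step 1: FROM
WHERE SC.Calls > 90                                 -- Step 2: WHERE (TotalCalls does not exist yet)
GROUP BY SC.DateKey                                 -- Step 3: GROUP BY
ORDER BY TotalCalls DESC;                           -- Step 5: ORDER BY (the alias is available now)
```

With these sample rows the query returns 20101101 with 215 calls, then 20101102 with 140 calls. Notice that ORDER BY can use the TotalCalls alias because it runs after SELECT, while WHERE runs too early to see anything SELECT creates.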

Now the error message makes sense.

The creation of RowNum happens after the WHERE!

So what I need is a way to work with RowNum after the whole query is executed. 🤔

As I covered in a previous post, SELECTs produce virtual tables.

Turns out my SQL code spends a lot of time working with virtual tables, so I’m going to review the concept again.

## Virtual Tables in Excel

Per the process of this series, I will use Excel to explore the concept of SQL virtual tables.

The simplest way to think of a virtual table is that it is a copy of the “actual” table.

We can easily simulate this by making a copy of the CallCenter worksheet in Excel:

![excel copy](10\excelcopy.png)

Excel is smart enough to know that the copy is not the “actual” table, so it changes the table name:

![copy table](10\copytablename.png)

Now I can do whatever I want to the CallCenter3 table, knowing that the “actual” table (i.e., CallCenter) will not be changed by anything that I do.

While this Excel example is contrived, you will quickly find that virtual tables are a very powerful and useful idea!

## Virtual Tables in SQL

SQL is chock full of virtual tables.

Here’s the thing. Unlike Excel, SQL virtual tables are unnamed by default.

Most of the time this isn’t a problem. You’ll work with unnamed virtual tables in your SQL code with nary a second thought (e.g., when you use JOINs – preview of coming attractions 😁).

However, there are times when being able to give a name to a virtual table so you can work with it directly is very handy.

Enter what are known as SQL common table expressions, commonly called CTEs (pun intended! 🤣).

SQL CTEs allow you to define queries and give the virtual tables resulting from the queries a name.

This allows you to work with virtual tables explicitly in your SQL code.

Righteous!

Here’s a conceptual template of how you code up CTEs:

```
WITH <CTE name /> AS
(
     <CTE query here />
)
SELECT <CTE columns />
FROM <CTE name />;
```

Note the ; at the end of the code. Think of this as telling the DB, “Hey! That’s the whole query.”

I’m going to introduce some terminology that will be used a ton in this series:

- Inner query
- Outer query

Using the above conceptual template, I can think of the query inside the WITH as the inner query.

I can also think of the query outside the WITH as the outer query (slick, huh?).

Both the inner and outer queries can be as complicated (or simple) as I would like.
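Here’s a minimal sketch of the template filled in, with the inner and outer queries labeled. The CTE name BusyShifts, the threshold of 100 calls, and the sample rows are all made up for illustration, and the inline VALUES syntax assumes a DB like SQL Server:

```sql
WITH BusyShifts AS
(
    -- Inner query: defines the virtual table named BusyShifts
    SELECT SC.DateKey
          ,SC.Shift
          ,SC.Calls
    FROM (VALUES (20101101, 'AM',  120)
                ,(20101101, 'PM1',  95)
                ,(20101102, 'AM',   80)
                ,(20101102, 'PM1', 140)
         ) AS SC (DateKey, Shift, Calls)   -- hypothetical sample rows
    WHERE SC.Calls > 100
)
-- Outer query: treats BusyShifts like any other table
SELECT BS.DateKey
      ,BS.Shift
      ,BS.Calls
FROM BusyShifts BS
ORDER BY BS.DateKey;
```

With these sample rows, the outer query returns two rows: the AM shift on 20101101 (120 calls) and the PM1 shift on 20101102 (140 calls).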

Time to see a CTE in action.

## CTEs Were Made for ROW_NUMBER

As you will see, CTEs are a perfect complement to SQL window functions like the mighty ROW_NUMBER!

Here’s the SQL code:

```sql
WITH DailyShiftsByCalls AS
(
    SELECT FCC.FactCallCenterID
          ,FCC.DateKey
          ,FCC.WageType
          ,FCC.Shift
          ,FCC.Calls
          ,ROW_NUMBER() OVER (PARTITION BY FCC.DateKey ORDER BY FCC.Calls DESC) AS RowNum
    FROM FactCallCenter FCC
)
SELECT DSBC.FactCallCenterID
      ,DSBC.DateKey
      ,DSBC.WageType
      ,DSBC.Shift
      ,DSBC.Calls
FROM DailyShiftsByCalls DSBC
WHERE DSBC.RowNum = 1
ORDER BY DSBC.DateKey
```

Taking a look at the code ☝ some things are worth noting:

- SQL Server doesn’t allow ORDER BY inside a CTE (unless it’s paired with something like TOP), so I took it out of the inner query
- The outer query uses the CTE name in the FROM
- The outer query uses WHERE to do the filtering
- The outer query is the place for ORDER BY

Excellent!

The combination of CTEs and ROW_NUMBER will be a recurring pattern in your SQL queries.

It is very common for data in a DB to have logical sub-groupings where you want to ask specific questions about the sub-groupings (aka windows). Things like:

- What is the largest value of X for each window?
- What is the oldest value of X for each window?
- What is the smallest value of X for each window?
- What is the newest value of X for each window?
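As a sketch of how the same pattern answers a different question, flipping the window’s ORDER BY from DESC to ASC finds the quietest Shift (i.e., the smallest value of Calls) for each DateKey. The sample rows below are made up so the query can run on its own, and the inline VALUES syntax assumes a DB like SQL Server:

```sql
WITH DailyShiftsByCalls AS
(
    SELECT SC.DateKey
          ,SC.Shift
          ,SC.Calls
           -- ASC puts the smallest Calls value first within each DateKey window
          ,ROW_NUMBER() OVER (PARTITION BY SC.DateKey ORDER BY SC.Calls ASC) AS RowNum
    FROM (VALUES (20101101, 'AM',  120)
                ,(20101101, 'PM1',  95)
                ,(20101101, 'PM2',  60)
                ,(20101102, 'AM',   80)
                ,(20101102, 'PM1', 140)
                ,(20101102, 'PM2', 110)
         ) AS SC (DateKey, Shift, Calls)   -- hypothetical sample rows
)
SELECT DSBC.DateKey
      ,DSBC.Shift
      ,DSBC.Calls
FROM DailyShiftsByCalls DSBC
WHERE DSBC.RowNum = 1
ORDER BY DSBC.DateKey;
```

With these sample rows the result is PM2 (60 calls) for 20101101 and AM (80 calls) for 20101102. Ordering the window by a date column instead would answer the “oldest” and “newest” questions the same way.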

Wildly useful stuff!

## The Learning Arc

In the next post I will revisit SQL window functions, including an RFM-like analysis.

It’s gonna be sweet!

Stay healthy and happy data sleuthing!