# Introduction to SQL for Excel Users – Part 14: More Dates

[Original post](https://www.daveondata.com/blog/introduction-to-sql-for-excel-users-part-14-more-dates/)

## RFM Analysis

I cover RFM analysis more extensively in Part 11 of the series, but I will summarize the work so far for convenience.

RFM analysis is a simple, but wildly useful, technique from the world of direct marketing.

RFM is primarily used as a means of quantifying the value of customers along three vectors:

- **R**ecency – How recently has a customer made a purchase?
- **F**requency – How often does a customer make a purchase?
- **M**onetary – How much does a customer spend on purchases?

The analysis consists of ranking each customer's **R**, **F**, and **M** with a score ranging from 1 to 10 – where 10 is the best score.

In the previous post I skipped Recency as I had not yet covered dates.

The following query was the end result of the previous post that provided **F** and **M** scores via the NTILE window function:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
)
SELECT CSOH.CustomerKey
      ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
FROM CustomerSalesOrderHistory CSOH
ORDER BY CSOH.CustomerKey;
```
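
As a quick refresher on what NTILE is doing in that query, here is a minimal sketch on made-up data. The eight customers and their order counts are hypothetical, and I use NTILE(4) instead of NTILE(10) only so that the result fits on a screen:

```sql
-- Hypothetical sample data: eight customers with made-up order counts.
-- NTILE(4) sorts the rows and deals them into four equally sized buckets.
SELECT C.CustomerKey
      ,C.SalesOrderCount
      ,NTILE(4) OVER (ORDER BY C.SalesOrderCount ASC) AS FrequencyQuartile
FROM (VALUES (1, 1), (2, 2), (3, 3), (4, 5)
            ,(5, 8), (6, 13), (7, 21), (8, 34))
     AS C (CustomerKey, SalesOrderCount)
ORDER BY C.SalesOrderCount ASC;
```

With ascending order, the two customers with the fewest orders land in bucket 1 and the two with the most orders land in bucket 4, which is the same idea as a score of 10 being best in the full query.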

## Adding Sales Order Dates

To assign a Recency score to customers I need sales order dates.

More specifically, I need the most recent sales order date for each customer.

As discussed in Part 11, I need to account for two aspects of the data:

- Every sales order consists of multiple rows of data – one row for each sales order line
- There can be multiple sales orders per customer

No worries!

Nothing the MAX aggregate function can’t handle!

First up, I need to change the CustomerSalesOrders CTE to return the most recent (i.e., MAX) order date for each sales order:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
          ,MAX(OrderDate) AS OrderDate
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
)
SELECT CSOH.CustomerKey
      ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
FROM CustomerSalesOrderHistory CSOH
ORDER BY CSOH.CustomerKey;
```

By using MAX I’ve made sure that every sales order has the single, most recent order date.

Moving on, I need to modify the CustomerSalesOrderHistory CTE to have the most recent order date for each customer.

Once again, MAX to the rescue:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
          ,MAX(OrderDate) AS OrderDate
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
          ,MAX(CSO.OrderDate) AS MostRecentOrderDate
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
)
SELECT CSOH.CustomerKey
      ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
FROM CustomerSalesOrderHistory CSOH
ORDER BY CSOH.CustomerKey;
```
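
To see the two MAX steps at a small scale, here is a minimal sketch that swaps FactInternetSales for a handful of made-up order lines. The OrderLines CTE and every value in it are hypothetical and exist only for illustration:

```sql
-- Hypothetical order lines: customer 1 has two orders (SO1 has two lines),
-- customer 2 has a single one-line order.
WITH OrderLines AS
(
    SELECT V.CustomerKey, V.SalesOrderNumber, V.OrderDate, V.SalesAmount
    FROM (VALUES (1, 'SO1', CAST('2013-01-05' AS DATE), 10.00)
                ,(1, 'SO1', CAST('2013-01-05' AS DATE), 25.00)
                ,(1, 'SO2', CAST('2013-06-20' AS DATE), 40.00)
                ,(2, 'SO3', CAST('2012-11-11' AS DATE), 15.00))
         AS V (CustomerKey, SalesOrderNumber, OrderDate, SalesAmount)
),
CustomerSalesOrders AS
(
    -- Step 1: collapse order lines to one row per sales order.
    SELECT OL.CustomerKey
          ,OL.SalesOrderNumber
          ,SUM(OL.SalesAmount) AS SalesAmount
          ,MAX(OL.OrderDate) AS OrderDate
    FROM OrderLines OL
    GROUP BY OL.CustomerKey, OL.SalesOrderNumber
)
-- Step 2: collapse sales orders to one row per customer.
SELECT CSO.CustomerKey
      ,COUNT(*) AS SalesOrderCount
      ,SUM(CSO.SalesAmount) AS SalesAmount
      ,MAX(CSO.OrderDate) AS MostRecentOrderDate
FROM CustomerSalesOrders CSO
GROUP BY CSO.CustomerKey
ORDER BY CSO.CustomerKey;
```

Customer 1 should come back with 2 orders, 75.00 in sales, and a most recent order date of 2013-06-20. Customer 2 should come back with 1 order, 15.00 in sales, and 2012-11-11.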

With MostRecentOrderDate added to CustomerSalesOrderHistory I’ve got all the raw materials I need to calculate Recency.

## Date Differences in T-SQL

It shouldn’t surprise you to know that SQL Server has a number of date and time functions.

As with Excel, one of the most useful T-SQL date functions is the mighty DATEDIFF.

Conceptually, T-SQL’s DATEDIFF works just like Excel’s DATEDIF, but the parameters are in a different order:

```
DATEDIFF(<timescale>, <start date>, <end date>)
```

```sql
SELECT DATEDIFF(DAY,'2018-01-01 00:00:00', '2020-01-01 00:00:00') AS DiffInDays
      ,DATEDIFF(MONTH,'2018-01-01 00:00:00', '2020-01-01 00:00:00') AS DiffInMonths
      ,DATEDIFF(YEAR,'2018-01-01 00:00:00', '2020-01-01 00:00:00') AS DiffInYears;
```

Also, just like in Excel, I can get the chocolate and peanut butter effect by combining DATEDIFF with CURRENT_TIMESTAMP:

```sql
SELECT DATEDIFF(DAY,'2018-01-01 00:00:00', CURRENT_TIMESTAMP) AS DiffInDays
      ,DATEDIFF(MONTH,'2018-01-01 00:00:00', CURRENT_TIMESTAMP) AS DiffInMonths
      ,DATEDIFF(YEAR,'2018-01-01 00:00:00', CURRENT_TIMESTAMP) AS DiffInYears;
```

Most excellent.
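
One difference from Excel is worth a quick sketch. T-SQL's DATEDIFF counts how many boundaries of the given timescale are crossed between the two dates, while Excel's DATEDIF counts complete units. The dates below are arbitrary, chosen one day apart across a new year:

```sql
-- One day apart, but the dates straddle a day, month, and year boundary,
-- so all three columns return 1.
SELECT DATEDIFF(DAY,   '2019-12-31', '2020-01-01') AS DayBoundaries
      ,DATEDIFF(MONTH, '2019-12-31', '2020-01-01') AS MonthBoundaries
      ,DATEDIFF(YEAR,  '2019-12-31', '2020-01-01') AS YearBoundaries;
```

Excel's DATEDIF with the "Y" unit would return 0 for the same pair of dates, since no complete year has passed. For the RFM analysis this does not matter, because days are the finest unit the order dates need.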

With the ability to calculate elapsed times, I can now finish up the RFM analysis.

Using DATEDIFF and CURRENT_TIMESTAMP I can modify the CustomerSalesOrderHistory CTE to calculate the elapsed days from the most recent sales order:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
          ,MAX(OrderDate) AS OrderDate
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
          ,DATEDIFF(DAY, MAX(CSO.OrderDate), CURRENT_TIMESTAMP) AS ElapsedDaysToMostRecentOrder
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
)
SELECT CSOH.CustomerKey
      ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
FROM CustomerSalesOrderHistory CSOH
ORDER BY CSOH.CustomerKey;
```
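
Because CURRENT_TIMESTAMP changes on every run, the elapsed days grow each time the query is executed. When a repeatable number is handy (for example, when checking results by hand), one option is a fixed "as of" date. This is only a sketch: the @AsOfDate variable and the sample rows are hypothetical and not part of the series' queries:

```sql
-- Hypothetical fixed reference date used in place of CURRENT_TIMESTAMP.
DECLARE @AsOfDate DATE = '2014-01-01';

-- Made-up orders: customer 1 has two order dates, customer 2 has one.
SELECT O.CustomerKey
      ,DATEDIFF(DAY, MAX(O.OrderDate), @AsOfDate) AS ElapsedDaysToMostRecentOrder
FROM (VALUES (1, CAST('2013-01-05' AS DATE))
            ,(1, CAST('2013-06-20' AS DATE))
            ,(2, CAST('2012-11-11' AS DATE)))
     AS O (CustomerKey, OrderDate)
GROUP BY O.CustomerKey
ORDER BY O.CustomerKey;
```

Customer 1's most recent order is 2013-06-20, which is 195 days before the reference date. Customer 2's only order is 416 days before it.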


Next up, I need to modify the outer query to add the Recency score:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
          ,MAX(OrderDate) AS OrderDate
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
          ,DATEDIFF(DAY, MAX(CSO.OrderDate), CURRENT_TIMESTAMP) AS ElapsedDaysToMostRecentOrder
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
)
SELECT CSOH.CustomerKey
      ,NTILE(10) OVER (ORDER BY CSOH.ElapsedDaysToMostRecentOrder DESC) AS RecencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
      ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
FROM CustomerSalesOrderHistory CSOH
ORDER BY CSOH.CustomerKey;
```

Notice that the RecencyScore is calculated with NTILE using CSOH.ElapsedDaysToMostRecentOrder in descending order.

This is because I want the smallest value to receive a score of 10 (i.e., fewer elapsed days are better).
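
The effect of the descending order is easy to check on a tiny made-up data set. The four customers and their elapsed days below are hypothetical, and NTILE(4) stands in for NTILE(10) to keep the output short:

```sql
-- Hypothetical elapsed days since each customer's most recent order.
-- Sorting DESC puts the largest elapsed value in bucket 1,
-- so the most recent buyer ends up in the highest bucket.
SELECT C.CustomerKey
      ,C.ElapsedDays
      ,NTILE(4) OVER (ORDER BY C.ElapsedDays DESC) AS RecencyQuartile
FROM (VALUES (1, 5), (2, 30), (3, 90), (4, 400))
     AS C (CustomerKey, ElapsedDays)
ORDER BY C.CustomerKey;
```

Customer 1, who bought 5 days ago, gets the top bucket of 4, while customer 4, at 400 days, gets bucket 1.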

Alrighty, then!

Lastly, if I wanted to see just my 10-10-10 customers:

```sql
WITH CustomerSalesOrders AS
(
    SELECT FIS.CustomerKey
          ,FIS.SalesOrderNumber
          ,SUM(SalesAmount) AS SalesAmount
          ,MAX(OrderDate) AS OrderDate
    FROM FactInternetSales FIS
    GROUP BY FIS.CustomerKey, FIS.SalesOrderNumber
),
CustomerSalesOrderHistory AS
(
    SELECT CSO.CustomerKey
          ,COUNT(*) AS SalesOrderCount
          ,SUM(CSO.SalesAmount) AS SalesAmount
          ,DATEDIFF(DAY, MAX(CSO.OrderDate), CURRENT_TIMESTAMP) AS ElapsedDaysToMostRecentOrder
    FROM CustomerSalesOrders CSO
    GROUP BY CSO.CustomerKey
),
RFMAnalysis AS
(
    SELECT CSOH.CustomerKey
          ,NTILE(10) OVER (ORDER BY CSOH.ElapsedDaysToMostRecentOrder DESC) AS RecencyScore
          ,NTILE(10) OVER (ORDER BY CSOH.SalesOrderCount ASC) AS FrequencyScore
          ,NTILE(10) OVER (ORDER BY CSOH.SalesAmount ASC) AS MonetaryScore
    FROM CustomerSalesOrderHistory CSOH
)
SELECT RFM.CustomerKey
      ,RFM.RecencyScore
      ,RFM.FrequencyScore
      ,RFM.MonetaryScore
FROM RFMAnalysis RFM
WHERE RFM.RecencyScore = 10 AND
      RFM.FrequencyScore = 10 AND
      RFM.MonetaryScore = 10
ORDER BY RFM.CustomerKey ASC;
```

There you have it.

RFM analysis is a wildly simple and useful technique.

I’ve personally used the ideas of RFM, for example, to rank US zip codes in terms of desirability for marketing efforts.

Now you can use RFM with your own business data.

## The Learning Arc

This won’t be the last time I cover working with time, but the series will be moving on.

Next up is coverage of working with more than one table of data at a time.

Yes, it is time to cover JOINs.

Stay healthy and happy data sleuthing!