An example of using SQL WITH Common Table Expressions to produce more legible
SQL.

A major complaint about SQL is that it composes statements by rightward nesting.
That is, a sequence of operations `A -> B -> C` is represented as `SELECT C FROM (SELECT B FROM (SELECT A))`.
However, the SQL:1999 standard introduced the `WITH` clause and common table
expressions ([ref](https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL)).
These let each step be named and then used by later steps, which allows forward composition.
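
To make the contrast concrete, here is a minimal hand-written sketch using Python's standard `sqlite3` module. The table `d`, the columns `x`, `z`, `q`, and the alias `step_1` are hypothetical names chosen for illustration; this is not SQL emitted by any package.

```python
import sqlite3

# Hypothetical example table; the names d, x, z, q, step_1 are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (x INTEGER)")
conn.executemany("INSERT INTO d (x) VALUES (?)", [(1,), (2,), (3,)])

# Nested form: the first step (z) is written innermost, the last step (q) outermost.
sql_nested = """
SELECT x, z, z + 2 AS q
FROM (SELECT x, x + 1 AS z FROM d) step_1
ORDER BY x
"""

# WITH form: the steps are written in the order they are performed.
sql_cte = """
WITH step_1 AS (SELECT x, x + 1 AS z FROM d)
SELECT x, z, z + 2 AS q
FROM step_1
ORDER BY x
"""

rows_nested = conn.execute(sql_nested).fetchall()
rows_cte = conn.execute(sql_cte).fetchall()
conn.close()

# Both forms describe the same calculation, so the rows should match.
print(rows_nested == rows_cte)
```

The two queries differ only in where the first step is written: inside the `FROM` of the final `SELECT`, or ahead of it under a name.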

Let's take a look at asking the `data_algebra` package ([ref](https://github.com/WinVector/data_algebra)) to emit SQL with and without common table expressions.

First we set up some example data.

In [1]:
import sqlite3

from data_algebra.data_ops import *
import data_algebra.test_util
import data_algebra.SQLite

d = data_algebra.default_data_model.pd.DataFrame({
    'x': [1, 2, 3]
})

d

|   | x |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |


Next we set up our calculations. Please note the order in which they are performed.

In [2]:
ops = describe_table(d, table_name='d') .\
    extend({'z': 'x + 1'}) .\
    extend({'q': 'z + 2'}) .\
    extend({'h': 'q + 3'})

ops

TableDescription(
 table_name='d',
 column_names=[
   'x']) .\
   extend({
    'z': 'x + 1'}) .\
   extend({
    'q': 'z + 2'}) .\
   extend({
    'h': 'q + 3'})

In [3]:
res_pandas = ops.transform(d)

res_pandas

|   | x | z | q | h |
|---|---|---|---|---|
| 0 | 1 | 2 | 4 | 7 |
| 1 | 2 | 3 | 5 | 8 |
| 2 | 3 | 4 | 6 | 9 |


In [4]:
expect = data_algebra.default_data_model.pd.DataFrame({
    'x': [1, 2, 3],
    'z': [2, 3, 4],
    'q': [4, 5, 6],
    'h': [7, 8, 9]
})

assert data_algebra.test_util.equivalent_frames(res_pandas, expect)

In [5]:
db_model = data_algebra.SQLite.SQLiteModel()
with sqlite3.connect(":memory:") as conn:
    db_model.prepare_connection(conn)
    db_handle = db_model.db_handle(conn)
    db_handle.insert_table(d, table_name='d')
    sql_regular = db_handle.to_sql(ops, pretty=True, use_with=False, annotate=True)
    res_regular = db_handle.read_query(sql_regular)
    sql_with = db_handle.to_sql(ops, pretty=True, use_with=True, annotate=True)
    res_with = db_handle.read_query(sql_with)

assert data_algebra.test_util.equivalent_frames(res_regular, expect)
assert data_algebra.test_util.equivalent_frames(res_with, expect)

The standard nested SQL for these operations looks like the following.

In [6]:
print(sql_regular)

SELECT -- extend({ 'h': 'q + 3'})
 "x",
 "z",
 "q",
 "q" + 3 AS "h"
FROM
  (SELECT -- extend({ 'q': 'z + 2'})
 "x",
 "z",
 "z" + 2 AS "q"
   FROM
     (SELECT -- extend({ 'z': 'x + 1'})
 "x",
 "x" + 1 AS "z"
      FROM "d") "extend_0") "extend_1"


The common table expression version looks like this. It involves less nesting, and values move forward through the notation: each step is written before the step that uses it.

In [7]:
print(sql_with)


WITH "extend_0" AS
  (SELECT -- extend({ 'z': 'x + 1'})
 "x",
 "x" + 1 AS "z"
   FROM "d"),
     "extend_1" AS
  (SELECT -- extend({ 'q': 'z + 2'})
 "x",
 "z",
 "z" + 2 AS "q"
   FROM "extend_0")
SELECT -- extend({ 'h': 'q + 3'})
 "x",
 "z",
 "q",
 "q" + 3 AS "h"
FROM "extend_1"


It is interesting to note when `WITH` and common table expressions became widely available.
Wikipedia lists the versions (and hence dates) ([ref](https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL))
at which common table expressions became supported in the following
databases.

  * Teradata (starting with version 14) ([2012](https://downloads.teradata.com/database/training/teradata-database-14-overview))
  * Microsoft SQL Server (starting with version 2005)
  * Oracle (with recursion since 11g release 2) ([2009](https://support.oracle.com/knowledge/Oracle%20Cloud/2068368_1.html))
  * PostgreSQL (since 8.4) ([2009](https://www.postgresql.org/about/news/postgresql-84-released-now-easier-to-use-than-ever-1108/))
  * MariaDB (since 10.2) ([2017](https://mariadb.com/kb/en/changes-improvements-in-mariadb-102/))
  * MySQL (since 8.0) ([2016](https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-0.html))
  * SQLite (since 3.8.3) ([2014](https://www.sqlite.org/releaselog/3_8_3.html))
  * DB2 (starting with version 11.5 Mod Pack 2, [ref](https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0059217.html)) ([2019](https://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/9/897/ENUS219-219/index.html&request_locale=en))
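
For SQLite in particular, one way to check a local installation against the version listed above is sketched below. It compares the version of the SQLite library linked into Python with 3.8.3, and also simply tries a small `WITH` query. The names `cte_supported`, `probe_ok`, and `t` are made up for this sketch.

```python
import sqlite3

# sqlite_version_info describes the SQLite library linked into Python,
# not the version of the sqlite3 module itself.
cte_supported = sqlite3.sqlite_version_info >= (3, 8, 3)

# Independent check: try a tiny WITH query and see whether it parses and runs.
conn = sqlite3.connect(":memory:")
try:
    conn.execute("WITH t AS (SELECT 1 AS v) SELECT v FROM t").fetchall()
    probe_ok = True
except sqlite3.OperationalError:
    probe_ok = False
finally:
    conn.close()

print(sqlite3.sqlite_version, cte_supported, probe_ok)
```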

Part of the cost of implementing common table expressions is that they are where databases allow recursive or fixed-point
semantic extensions. From the database's point of view these are major semantic changes, not mere notational conveniences.
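
As a small illustration of the recursive extension, the sketch below runs a `WITH RECURSIVE` query through Python's standard `sqlite3` module. The name `counter` and the bound of 5 are arbitrary choices for this example. The query refers to itself, which a purely notational reading of `WITH` (substitute each named query where it is used) cannot express.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# counter is defined in terms of itself: start at 1, then keep adding
# n + 1 for each existing row until the WHERE condition stops producing rows.
sql_recursive = """
WITH RECURSIVE counter(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM counter WHERE n < 5
)
SELECT n FROM counter ORDER BY n
"""

values = [row[0] for row in conn.execute(sql_recursive)]
conn.close()

print(values)
```

Here the database has to iterate until no new rows appear, which is the fixed-point behavior mentioned above.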