Jamie Brandon has a super thoughtful [writeup](https://www.scattered-thoughts.net/writing/against-sql/) titled `Against SQL` about why working with SQL is difficult and how it could be improved. While replacing SQL is comparable to implementing a new programming language, writing SQL queries can be made easier today using regular programming abstractions.

We address the SQL query examples highlighted in the post using FunSQL, a Julia/Python library for composing SQL queries. While its goals are similar to those of pipelined DSLs or ORMs, it stays close to SQL semantics and aims to feel just like writing SQL directly.

In [1]:
from funsql import *

### Verbose to express - [link](https://www.scattered-thoughts.net/writing/against-sql/#verbose-to-express)

The example shows that SQL is verbose because we can't abstract over common patterns. While the SQL spec allows function definitions, the article points out that they are limited in the types of arguments they accept, which reduces their flexibility.

```sql
select foo.id, quux.value 
from foo, bar, quux 
where foo.bar_id = bar.id and bar.quux_id = quux.id
```

Regular languages like Python don't share the restrictions of SQL functions, so creating higher-level abstractions is convenient. Here, we create a function to join multiple tables through foreign key relationships.

In [2]:
foo = SQLTable(S.foo, ["id", "bar_id"])
bar = SQLTable(S.bar, ["id", "quux_id"])
quux = SQLTable(S.quux, ["id", "value"])

Since FunSQL constructs are regular functions and objects in the host language, we can use them to abstract over composite SQL clauses. 

In [3]:
def fk_join(*args, id_column="id"):
    # args is an interleaved list of tables and foreign key names
    table = From(args[0])
    fk_name = None
    for i, arg in enumerate(args[1:]):
        if i % 2 == 0:
            fk_name = S(arg)
        else:
            joinee = From(arg)
            table = table >> Join(
                joinee, on=Fun("=", Get(fk_name), Get(id_column, over=joinee))
            )
    return table


q = fk_join(foo, "bar_id", bar, "quux_id", quux)
render(q)

query: 
SELECT
  "foo_1"."bar_id", 
  "bar_1"."quux_id", 
  "quux_1"."value"
FROM "foo" AS "foo_1"
INNER JOIN "bar" AS "bar_1" ON ("foo_1"."bar_id" = "bar_1"."id")
INNER JOIN "quux" AS "quux_1" ON ("bar_1"."quux_id" = "quux_1"."id")
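
As a sanity check that a generated join chain means the same thing as the comma-separated join from the post, here is a minimal sketch that does not use FunSQL at all. It assumes an in-memory SQLite database with made-up sample rows, and `fk_join_sql` is a hypothetical string-based helper written only for this illustration.

```python
import sqlite3


def fk_join_sql(*args, id_column="id", select="*"):
    """Build an INNER JOIN chain from interleaved table and foreign key names.

    Hypothetical helper for illustration; it is not part of FunSQL.
    """
    tables = args[0::2]  # "foo", "bar", "quux"
    fks = args[1::2]     # "bar_id", "quux_id"
    sql = f'SELECT {select} FROM "{tables[0]}"'
    for left, fk, right in zip(tables, fks, tables[1:]):
        sql += f' INNER JOIN "{right}" ON "{left}"."{fk}" = "{right}"."{id_column}"'
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (id INTEGER, bar_id INTEGER);
    CREATE TABLE bar (id INTEGER, quux_id INTEGER);
    CREATE TABLE quux (id INTEGER, value TEXT);
    INSERT INTO foo VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO bar VALUES (10, 100), (20, 200);
    INSERT INTO quux VALUES (100, 'a'), (200, 'b');
    """
)

# The query as written in the post (implicit join through WHERE).
implicit = conn.execute(
    "select foo.id, quux.value from foo, bar, quux "
    "where foo.bar_id = bar.id and bar.quux_id = quux.id order by foo.id"
).fetchall()

# The same query assembled by the helper (explicit INNER JOINs).
generated_sql = fk_join_sql(
    "foo", "bar_id", "bar", "quux_id", "quux", select='"foo"."id", "quux"."value"'
)
explicit = conn.execute(generated_sql + ' ORDER BY "foo"."id"').fetchall()

print(generated_sql)
print(implicit == explicit)
```

On this sample data both forms return the same rows. The row with `bar_id = 30` drops out of both because it has no matching `bar`.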

### Fragile structure - [link](https://www.scattered-thoughts.net/writing/against-sql/#fragile-structure)

The example shows how correlated subqueries in a `SELECT` clause can only return a single column, and must be swapped for lateral joins if we need flexibility in the output type. 

```sql
-- inline for a single column
select
    manager.name,
    (select employee.name
    from employee
    where employee.manager = manager.name
    order by employee.salary desc
    limit 1)
from manager;

-- lateral join for multiple columns
select manager.name, employee.name, employee.salary
from manager
join lateral (
   select employee.name, employee.salary
   from employee
   where employee.manager = manager.name
   order by employee.salary desc
   limit 1
) as employee
on true;
```

I couldn't find out why inline subqueries are allowed in places where a _scalar_ expression is required, such as the arguments of a `SELECT` clause, `WHERE` expressions, and more. It feels like an inconsistency that comes from SQL's desire to be less verbose. The nested query is really more like a _table_ than a _column_, so lateral joins are the "correct" choice. However, inline subqueries are slightly easier to read and write (and perhaps also to test).

To _hide_ this detail from the query writer, we could compile to an inline query when a single column is selected, and use a lateral join otherwise. Alternatively, we could just output a lateral join every time. By creating an abstraction for the correlated join, we can still keep the query syntax concise.

In [4]:
manager = SQLTable("manager", ["id", "name"])
employee = SQLTable("employee", ["id", "name", "salary", "manager"])

# returns highest paid employee for a given manager
def most_paid_employee(m_name):
    return (
        From(employee)
        >> Where(Fun("=", Get.manager, Var.MANAGER_NAME))
        >> Order(Get.salary >> Desc())
        >> Limit(1)
        >> Bind(aka(m_name, S.MANAGER_NAME))
    )


q = most_paid_employee("ABC") >> Select(Get.name, Get.salary)
render(q)

query: 
SELECT
  "employee_1"."name", 
  "employee_1"."salary"
FROM "employee" AS "employee_1"
WHERE ("employee_1"."manager" = 'ABC')
ORDER BY "employee_1"."salary" DESC
FETCH FIRST 1 ROW ONLY

Now, we can use this subquery to compute the top-salaried employee for each manager.

In [5]:
q = (
    From(manager)
    >> Join(most_paid_employee(Get.name) >> As("employee"), on=True, left=True)
    >> Select(
        Get.name,
        aka(Get.employee.name, "emp_name"),
        aka(Get.employee.salary, "emp_salary"),
    )
)
render(q)

query: 
SELECT
  "manager_1"."name", 
  "employee_2"."name" AS "emp_name", 
  "employee_2"."salary" AS "emp_salary"
FROM "manager" AS "manager_1"
LEFT JOIN LATERAL (
  SELECT
    "employee_1"."name", 
    "employee_1"."salary"
  FROM "employee" AS "employee_1"
  WHERE ("employee_1"."manager" = "manager_1"."name")
  ORDER BY "employee_1"."salary" DESC
  FETCH FIRST 1 ROW ONLY
) AS "employee_2" ON TRUE

Since the selected columns are specified at the end, we don't have to go back and edit the correlated query, whether we pick a single column, multiple columns, or none from it!
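
The other option mentioned earlier, compiling to an inline subquery when a single column is selected and to a lateral join otherwise, can be sketched in plain Python. This only illustrates the dispatch and is not how FunSQL works internally. `top_employee_sql` is a hypothetical helper that builds SQL text, and it uses `left join lateral` so that managers without employees are kept, matching the scalar subquery's `NULL`.

```python
def top_employee_sql(columns):
    """Render the 'best-paid employee per manager' query for the given employee columns.

    Hypothetical helper for illustration; it is not part of FunSQL.
    """
    if not columns:
        raise ValueError("select at least one employee column")
    tail = (
        "from employee where employee.manager = manager.name "
        "order by employee.salary desc limit 1"
    )
    if len(columns) == 1:
        # One column: a scalar subquery in the SELECT list is enough.
        return f"select manager.name, (select employee.{columns[0]} {tail}) from manager"
    # Several columns: the subquery behaves like a table, so join it laterally.
    cols = ", ".join(f"employee.{c}" for c in columns)
    return (
        f"select manager.name, {cols} from manager "
        f"left join lateral (select {cols} {tail}) as employee on true"
    )


single = top_employee_sql(["name"])
multi = top_employee_sql(["name", "salary"])
print(single)
print(multi)
```

The caller only changes the list of columns. Which SQL form to use is decided in one place.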

### Incompressible code - [link](https://www.scattered-thoughts.net/writing/against-sql/#incompressible)

The post provides multiple examples where SQL makes you tear your hair out. I concur. 

#### Variables

Temporary scalar variables can't be created unless they are included in the output, so the repeated arithmetic below can't be factored out without creating a subquery.

```sql
-- repeated structure
select a+((z*2)-1), b+((z*2)-1) from foo;

-- compressed?
select a2, b2 from (select a+tmp as a2, b+tmp as b2, (z*2)-1 as tmp from foo);
```

Since FunSQL nodes are regular Python values, we can just reuse them. The shared expression is simply inlined at each place it is used in the rendered SQL.

In [6]:
foo = SQLTable("foo", ["a", "b", "z"])


def add_z(col):
    # (z * 2) - 1 is the sub-expression shared by both output columns
    tmp = Fun("-", Fun("*", Get.z, 2), 1)
    return Fun("+", tmp, col)


q = From(foo) >> Select(add_z(Get.a) >> As(S.a), add_z(Get.b) >> As(S.b))
render(q)

query: 
SELECT
  ((("foo_1"."z" * 2) - 1) + "foo_1"."a") AS "a", 
  ((("foo_1"."z" * 2) - 1) + "foo_1"."b") AS "b"
FROM "foo" AS "foo_1"
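
To check that inlining the shared expression gives the same result as a subquery that names it, here is a small sketch run against an in-memory SQLite database. The sample rows and the `a2`/`b2` aliases are assumptions made for this example, and the SQL is built with plain f-strings rather than FunSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (a INTEGER, b INTEGER, z INTEGER);
    INSERT INTO foo VALUES (1, 2, 3), (10, 20, 0);
    """
)

# The shared fragment lives in a Python variable and is pasted in twice.
tmp = "((z * 2) - 1)"
inlined_sql = f"SELECT {tmp} + a AS a2, {tmp} + b AS b2 FROM foo"

# The subquery names the value once and the outer query reuses it.
subquery_sql = (
    "SELECT tmp + a AS a2, tmp + b AS b2 "
    "FROM (SELECT a, b, (z * 2) - 1 AS tmp FROM foo)"
)

inlined_rows = sorted(conn.execute(inlined_sql).fetchall())
subquery_rows = sorted(conn.execute(subquery_sql).fetchall())
print(inlined_rows == subquery_rows)
```

The abstraction lives in the host language here, so the generated SQL stays flat and needs no extra `select`.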

SQL doesn't allow naming the arguments of a `GROUP BY` clause.

```sql
-- can't name this value
> select x2 from foo group by x+1 as x2;
ERROR:  syntax error at or near "as"
LINE 1: select x2 from foo group by x+1 as x2;

-- sprinkle some more select on it
> select x2 from (select x+1 as x2 from foo) group by x2;
 ?column?
```

FunSQL adds the names created by the `Group` node to the namespace of that subquery, and moves the alias to the corresponding `SELECT`.

In [7]:
foo = SQLTable("foo", ["x", "y"])
q = (
    From(foo)
    >> Group(aka(Fun("+", Get.x, 1), S.x2))
    >> Select(Get.x2, Agg.count(Get.y))
)

render(q)

query: 
SELECT
  ("foo_1"."x" + 1) AS "x2", 
  COUNT("foo_1"."y") AS "count"
FROM "foo" AS "foo_1"
GROUP BY ("foo_1"."x" + 1)
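
The shape of the rendered query, with the expression repeated in `GROUP BY` and the alias attached in `SELECT`, can be tried on a real engine. This is a minimal sketch that assumes an in-memory SQLite database and invented sample rows; the grouping expression is kept in a Python string so it is written only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (x INTEGER, y INTEGER);
    INSERT INTO foo VALUES (1, 10), (1, 11), (2, 12), (3, NULL);
    """
)

# The grouping key is defined once and used in both clauses.
key = '("x" + 1)'
sql = f'SELECT {key} AS "x2", COUNT("y") AS "count" FROM "foo" GROUP BY {key}'

rows = sorted(conn.execute(sql).fetchall())
print(rows)
```

Each distinct value of `x + 1` produces one row. `COUNT("y")` skips the `NULL`, so the last group has a count of zero.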

#### CTEs

SQL didn't have CTEs until SQL:1999. Without them, a subquery that is used twice has to be written out twice.

```sql
-- repeated structure
select * 
from 
  (select x, x+1 as x2 from foo) as foo1 
left join 
  (select x, x+1 as x2 from foo) as foo2 
on 
  foo1.x2 = foo2.x;
  
-- compressed?
with foo_plus as 
  (select x, x+1 as x2 from foo)
select * 
from 
  foo_plus as foo1 
left join 
  foo_plus as foo2 
on 
  foo1.x2 = foo2.x;
```

With FunSQL, inline subqueries can be written as concisely as CTEs, without duplication. We just reuse the variable representing the subquery.

In [8]:
foo = SQLTable("foo", ["x", "y"])
foo_plus = From(foo) >> Select(Get.x, aka(Fun("+", Get.x, 1), S.x2))

The inline version is rendered as:

In [9]:
q = foo_plus >> Join(
    aka(foo_plus, "foo_2"), left=True, on=Fun("=", Get.x2, Get.foo_2.x)
)
render(q)

query: 
SELECT
  "foo_2"."x", 
  "foo_2"."x2"
FROM (
  SELECT
    "foo_1"."x", 
    ("foo_1"."x" + 1) AS "x2"
  FROM "foo" AS "foo_1"
) AS "foo_2"
LEFT JOIN (
  SELECT
    "foo_3"."x", 
    ("foo_3"."x" + 1) AS "x2"
  FROM "foo" AS "foo_3"
) AS "foo_2_1" ON ("foo_2"."x2" = "foo_2_1"."x")

With the base table defined as a CTE instead, we get:

In [10]:
q = (
    From(S.foo_plus)
    >> Join(
        aka(From(S.foo_plus), "foo_plus_2"),
        left=True,
        on=Fun("=", Get.x2, Get.foo_plus_2.x),
    )
    >> With(foo_plus >> As(S.foo_plus))
)
render(q)

query: 
WITH "foo_plus_1" ("x", "x2")  AS (
  SELECT
    "foo_1"."x", 
    ("foo_1"."x" + 1) AS "x2"
  FROM "foo" AS "foo_1"
)
SELECT
  "foo_plus_2"."x", 
  "foo_plus_2"."x2"
FROM "foo_plus_1" AS "foo_plus_2"
LEFT JOIN "foo_plus_1" AS "foo_plus_3" ON ("foo_plus_2"."x2" = "foo_plus_3"."x")
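
Both renderings should return the same rows. The sketch below checks this on an in-memory SQLite database with assumed sample data. The subquery text is kept in a Python variable (`foo_plus`, echoing the cell above) and spliced into hand-written SQL, so FunSQL is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo (x INTEGER);
    INSERT INTO foo VALUES (1), (2), (4);
    """
)

# The subquery is written once and reused in both variants.
foo_plus = "SELECT x, x + 1 AS x2 FROM foo"

inline_sql = f"""
    SELECT foo1.x, foo1.x2, foo2.x, foo2.x2
    FROM ({foo_plus}) AS foo1
    LEFT JOIN ({foo_plus}) AS foo2 ON foo1.x2 = foo2.x
    ORDER BY foo1.x
"""

cte_sql = f"""
    WITH foo_plus AS ({foo_plus})
    SELECT foo1.x, foo1.x2, foo2.x, foo2.x2
    FROM foo_plus AS foo1
    LEFT JOIN foo_plus AS foo2 ON foo1.x2 = foo2.x
    ORDER BY foo1.x
"""

inline_rows = conn.execute(inline_sql).fetchall()
cte_rows = conn.execute(cte_sql).fetchall()
print(inline_rows == cte_rows)
```

Only the row with `x = 1` finds a partner (`x2 = 2` matches `x = 2`). The other rows are kept by the left join with `NULL` on the right side.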

### Conclusion

While SQL definitely needs a redo for the long list of reasons given in the `Against SQL` post, FunSQL lets us get around some of the lexical issues with SQL. It could be useful for querying systems that speak SQL, either directly or by implementing a more concise DSL on top of it.