NOTE: This notebook gets rendered with all cells executed in the `docs` directory.     

FunSQL represents SQL syntax internally as clause objects. This notebook shows how clauses are compiled to SQL strings, and it doubles as a test of the different dialects. Users do not normally interact with this API directly.

## Working with clauses

SQLClause objects correspond directly to the syntax of the SQL language. A SQL query can be assembled by constructing clauses and chaining them. The final object can then be rendered to a string.

In [1]:
from funsql.common import S
from funsql.clausedefs import *
from funsql.render import dialect_default
from funsql.compiler.serialize import SerializationContext, serialize

The serialization of a query depends on the selected database dialect. This document uses the default dialect, obtained from `dialect_default()`.

In [2]:
def test_render(clause, only_query = False) -> None:
    if not only_query:
        print("clause: \n", clause, "\n", sep="")
        print("-" * 80)

    ctx = SerializationContext(dialect_default())
    serialize(clause, ctx)
    print(ctx.render())

Constructing a query and serializing it.

In [3]:
c = FROM(S.person) >> SELECT(S.person_id, S.date_of_birth)
test_render(c)

clause: 
ID(person) >> FROM() >> SELECT(ID(person_id), ID(date_of_birth))

--------------------------------------------------------------------------------
query: 
SELECT
  "person_id", 
  "date_of_birth"
FROM "person"


#### Symbols

Python strings are needed to represent two different things:
* identifiers (for example, table, column, or function names)
* literal values in SQL (for example, values in a `user_name` column of type `TEXT`)

To let clause constructors distinguish between the two, identifiers are wrapped in `Symbol` objects, which can be created using the shorthand `S`. So,
* `SELECT(S("user_name"))` corresponds to: `SELECT "user_name"`
* `SELECT("user_name")` corresponds to: `SELECT 'user_name'`

The fluent (attribute) syntax also works: `S.user_name` creates the same symbol as `S("user_name")`.

In [4]:
S.user_name == S("user_name")
type(S.user_name)

funsql.common.Symbol
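
As a rough illustration of why the wrapper helps, here is a minimal standalone sketch. It does not use FunSQL's code: `Symbol`, `SYM`, and `render_arg` are hypothetical stand-ins, and the quoting rules are assumptions made for the example.

```python
class Symbol:
    """Hypothetical stand-in for an identifier name (not FunSQL's class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Symbol) and self.name == other.name

    def __hash__(self) -> int:
        return hash(("Symbol", self.name))

    def __repr__(self) -> str:
        return f"Symbol({self.name!r})"


class _SymbolFactory:
    """Builds symbols via call syntax SYM("x") or attribute syntax SYM.x."""

    def __call__(self, name: str) -> Symbol:
        return Symbol(name)

    def __getattr__(self, name: str) -> Symbol:
        # Only called for attributes that are not found normally.
        # Dunder lookups (used by copy, pickle, etc.) must still fail.
        if name.startswith("__"):
            raise AttributeError(name)
        return Symbol(name)


SYM = _SymbolFactory()


def render_arg(arg) -> str:
    """Symbols become quoted identifiers; plain strings become literals."""
    if isinstance(arg, Symbol):
        return '"' + arg.name.replace('"', '""') + '"'
    return "'" + str(arg).replace("'", "''") + "'"
```

With this sketch, `render_arg(SYM.user_name)` yields a double-quoted identifier, while `render_arg("user_name")` yields a single-quoted string literal.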

To chain multiple symbols, use the utility function `qual` (short for "qualifier"). FunSQL translates it to chained `ID` clauses.

In [5]:
qual(S.schema_name, S.table_name, S.col_name)

ID(schema_name) >> ID(table_name) >> ID(col_name)

#### Literals

These represent SQL literals and can be constructed using the `LIT` clause. 

In [6]:
c = LIT("SQL is funny!")
c

LIT("SQL is funny!")

Common Python types are converted to SQL literals when passed to a clause.

In [7]:
import datetime

c = SELECT(None, True, 100, 200.2, "FunSQL", datetime.date(2022, 2, 1))
test_render(c)

clause: 
SELECT(LIT(NULL),
       LIT(True),
       LIT(100),
       LIT(200.2),
       LIT("FunSQL"),
       LIT(DATE '2022-02-01'))

--------------------------------------------------------------------------------
query: 
SELECT
  NULL, 
  TRUE, 
  100, 
  200.2, 
  'FunSQL', 
  DATE '2022-02-01'
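
The conversion shown above can be approximated with a small dispatch on the Python type. This is a sketch only, not FunSQL's implementation: `to_sql_literal` is a hypothetical name, and the set of supported types is an assumption based on the cell above.

```python
import datetime


def to_sql_literal(value) -> str:
    """Render a Python value as SQL literal text (illustrative only)."""
    if value is None:
        return "NULL"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    # datetime.datetime is a subclass of date; it is excluded here
    # because it would need a TIMESTAMP literal instead.
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        return f"DATE '{value.isoformat()}'"
    raise TypeError(f"no literal conversion for {type(value).__name__}")


values = [None, True, 100, 200.2, "FunSQL", datetime.date(2022, 2, 1)]
rendered = [to_sql_literal(v) for v in values]
```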


#### Identifiers

A SQL identifier is constructed using the `ID` clause.

In [8]:
c = ID(S.person)
c

ID(person)

In [9]:
c = ID(S.person) >> ID(S.email_addr)
c

ID(person) >> ID(email_addr)

In [10]:
c = ID(S.person_id, over=ID(S.person))
test_render(c)

clause: 
ID(person) >> ID(person_id)

--------------------------------------------------------------------------------
query: 
"person"."person_id"


When used in the context of a SQL clause, `Symbol` objects are converted to `ID` clauses.

In [11]:
FROM(S.person) >> SELECT(S.person_id, S.date_of_birth)

ID(person) >> FROM() >> SELECT(ID(person_id), ID(date_of_birth))
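
The relationship between `>>` and the `over` argument shown in this section can be sketched with a tiny standalone class. `Ident` is a hypothetical stand-in, not FunSQL's `ID`, and the way `>>` attaches the left operand as the outermost qualifier is an assumption inferred from the outputs above.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ident:
    """Hypothetical identifier node; `over` is the qualifier to its left."""

    name: str
    over: Optional["Ident"] = None

    def __rshift__(self, other: "Ident") -> "Ident":
        # `a >> b` returns a copy of `b` with `a` attached as its
        # outermost qualifier; neither operand is modified.
        result = copy.deepcopy(other)
        node = result
        while node.over is not None:
            node = node.over
        node.over = self
        return result

    def render(self) -> str:
        # Double any embedded quote, then wrap the name in double quotes.
        quoted = '"' + self.name.replace('"', '""') + '"'
        if self.over is None:
            return quoted
        return f"{self.over.render()}.{quoted}"


chained = Ident("person") >> Ident("person_id")
explicit = Ident("person_id", over=Ident("person"))
```

In this sketch both spellings build the same object, and `chained.render()` produces the dotted, quoted form.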

#### Variables

SQL placeholder parameters are represented using the `VAR` clause. 

In [12]:
c = VAR(S.year)
c

VAR(year)

Serializing a clause yields a `SQLString` object which also contains a list of the variables used. They can be bound to values when executing the query.

In [13]:
c = (
    FROM(S.flights) >> 
    WHERE(OP("OR", OP("=", S.origin, VAR(S.city)), OP("=", S.dest, VAR(S.city)))) >> 
    SELECT(S.flight_id)
)
test_render(c, only_query=True)

query: 
SELECT "flight_id"
FROM "flights"
WHERE (("origin" = $1) OR ("dest" = $1))

vars: [city]
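
One way to arrive at this numbering, where a repeated variable reuses its placeholder, is to keep an ordered list of the names seen so far. The sketch below is standalone and illustrative: `PlaceholderRegistry` and its methods are hypothetical and not part of FunSQL.

```python
class PlaceholderRegistry:
    """Assigns `$n` placeholders; a repeated name reuses its number."""

    def __init__(self) -> None:
        self.names = []

    def placeholder(self, name: str) -> str:
        if name not in self.names:
            self.names.append(name)
        return f"${self.names.index(name) + 1}"

    def bind(self, values: dict) -> list:
        # Positional arguments in placeholder order, ready for a driver call.
        return [values[name] for name in self.names]


registry = PlaceholderRegistry()
origin = registry.placeholder("city")
dest = registry.placeholder("city")
where = f'(("origin" = {origin}) OR ("dest" = {dest}))'
params = registry.bind({"city": "Mumbai"})
```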


#### Operator

A SQL operator is constructed using the `OP` clause.

In [14]:
c = OP(S.NOT, OP("=", S.zip, 42000))
test_render(c)

clause: 
OP("NOT", OP("=", ID(zip), LIT(42000)))

--------------------------------------------------------------------------------
query: 
(NOT ("zip" = 42000))


In [15]:
c = OP(S.CURRENT_TIMESTAMP)
test_render(c)

clause: 
OP("CURRENT_TIMESTAMP")

--------------------------------------------------------------------------------
query: 
CURRENT_TIMESTAMP
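
The outputs so far suggest three shapes: an operator with no operands renders as a bare keyword, one operand gives a prefix form, and two give an infix form. The standalone sketch below mimics that. `render_op` is a hypothetical helper, and joining any further operands with spaces is an assumption, not FunSQL's documented behaviour.

```python
def render_op(name: str, *args: str) -> str:
    """Render an operator from its name and already-rendered operands."""
    if not args:
        # Nullary operator such as CURRENT_TIMESTAMP.
        return name
    if len(args) == 1:
        # Prefix operator such as NOT.
        return f"({name} {args[0]})"
    # Infix: first operand, operator name, then the remaining operands.
    return f"({args[0]} {name} {' '.join(args[1:])})"


negated = render_op("NOT", render_op("=", '"zip"', "42000"))
```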


Composite operators can be constructed using the `KW` clause.

In [16]:
c = OP("BETWEEN", S.year_of_birth, 2000, KW(S.AND, 2020))
test_render(c)

clause: 
OP("BETWEEN", ID(year_of_birth), LIT(2000), LIT(2020) >> KW(AND))

--------------------------------------------------------------------------------
query: 
("year_of_birth" BETWEEN 2000 AND 2020)


#### Case

Case expressions are constructed using the `CASE` clause. The arguments to the constructor are an interleaved sequence of conditions (`WHEN`) and corresponding values (`THEN`).

When the total number of arguments is odd, the last argument is used as the default (`ELSE`) value.

In [17]:
c = CASE(OP(">", S.year_of_birth, 2000), "youngling")
test_render(c)

clause: 
CASE(OP(">", ID(year_of_birth), LIT(2000)), LIT("youngling"))

--------------------------------------------------------------------------------
query: 
(CASE WHEN ("year_of_birth" > 2000) THEN 'youngling' END)


In [18]:
c = CASE(OP(">", S.year_of_birth, 2000), "youngling", "millennial")
test_render(c)

clause: 
CASE(OP(">", ID(year_of_birth), LIT(2000)), LIT("youngling"), LIT("millennial"))

--------------------------------------------------------------------------------
query: 
(CASE WHEN ("year_of_birth" > 2000) THEN 'youngling' ELSE 'millennial' END)
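
The pairing rule can be sketched in a few lines. This is a standalone illustration that works on already-rendered SQL fragments. `render_case` is a hypothetical helper, not FunSQL's serializer, and rejecting fewer than two arguments is an assumption.

```python
def render_case(*args: str) -> str:
    """Render CASE from interleaved, already-rendered conditions and values."""
    if len(args) < 2:
        raise ValueError("CASE needs at least one condition/value pair")
    # With an odd count, the trailing argument is the ELSE value.
    pairs_end = len(args) - (len(args) % 2)
    parts = ["CASE"]
    for i in range(0, pairs_end, 2):
        parts.append(f"WHEN {args[i]} THEN {args[i + 1]}")
    if pairs_end < len(args):
        parts.append(f"ELSE {args[-1]}")
    parts.append("END")
    return "(" + " ".join(parts) + ")"


without_default = render_case('("year_of_birth" > 2000)', "'youngling'")
with_default = render_case('("year_of_birth" > 2000)', "'youngling'", "'other'")
```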


#### AS

A SQL `AS` expression is constructed using the `AS` clause.


In [19]:
c = ID(S.person) >> AS(S.p)
test_render(c)

clause: 
ID(person) >> AS(p)

--------------------------------------------------------------------------------
query: 
"person" AS "p"


In [20]:
c = FROM(alias("person", "p")) >> SELECT(qual("p", "person_id"))
test_render(c)

clause: 
ID(person) >> AS(p) >> FROM() >> SELECT(ID(p) >> ID(person_id))

--------------------------------------------------------------------------------
query: 
SELECT "p"."person_id"
FROM "person" AS "p"


#### Function

SQL functions are represented using the `FUN` clause, with the first arg being the function name.  

In [21]:
c = FUN("CONCAT", S.city, ", ", S.state)
test_render(c)

clause: 
FUN("CONCAT", ID(city), LIT(", "), ID(state))

--------------------------------------------------------------------------------
query: 
CONCAT("city", ', ', "state")


Functions with keyword arguments are constructed using the `KW` clause.

In [22]:
c = FUN("SUBSTRING", S.zip, KW("FROM", 1), KW("FOR", 3))
test_render(c)

clause: 
FUN("SUBSTRING", ID(zip), LIT(1) >> KW(FROM), LIT(3) >> KW(FOR))

--------------------------------------------------------------------------------
query: 
SUBSTRING("zip" FROM 1 FOR 3)


Function without any arguments

In [23]:
c = FUN("NOW")
test_render(c)

clause: 
FUN("NOW")

--------------------------------------------------------------------------------
query: 
NOW()


#### Aggregates

Aggregates are defined using the `AGG` clause.

In [24]:
c = AGG("COUNT", S("*"))
test_render(c)

clause: 
AGG("COUNT", OP("*"))

--------------------------------------------------------------------------------
query: 
COUNT(*)


With a `DISTINCT` modifier

In [25]:
c = AGG("COUNT", S.birth_year, distinct=True)
test_render(c)

clause: 
AGG("COUNT", distinct = True, ID(birth_year))

--------------------------------------------------------------------------------
query: 
COUNT(DISTINCT "birth_year")


With a `FILTER` modifier

In [26]:
c = AGG("COUNT", S("*"), filter_=OP("=", S.year_of_birth, 1970))
test_render(c)

clause: 
AGG("COUNT", OP("*"), filter = OP("=", ID(year_of_birth), LIT(1970)))

--------------------------------------------------------------------------------
query: 
(COUNT(*) FILTER (WHERE ("year_of_birth" = 1970)))
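
The three aggregate forms above differ only in the optional modifiers. A standalone sketch of that logic, working on already-rendered fragments, might look as follows. `render_agg` is a hypothetical helper and not FunSQL's serializer.

```python
from typing import Optional


def render_agg(name: str, *args: str, distinct: bool = False,
               filter_: Optional[str] = None) -> str:
    """Render an aggregate call from already-rendered arguments."""
    inner = ", ".join(args)
    if distinct:
        inner = f"DISTINCT {inner}"
    call = f"{name}({inner})"
    if filter_ is not None:
        # The filtered form is parenthesized as a whole, as in the output above.
        call = f"({call} FILTER (WHERE {filter_}))"
    return call


plain = render_agg("COUNT", "*")
unique = render_agg("COUNT", '"birth_year"', distinct=True)
filtered = render_agg("COUNT", "*", filter_='("year_of_birth" = 1970)')
```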


#### Partition

Window functions can be constructed by chaining a `PARTITION` clause to an `AGG` clause. 


In [27]:
c = PARTITION(S.year_of_birth, order_by = [S.month_of_birth, S.day_of_birth]) >> AGG("ROW_NUMBER")
test_render(c)

clause: 
AGG("ROW_NUMBER",
    over = PARTITION(ID(year_of_birth),
                     order_by = [ID(month_of_birth), ID(day_of_birth)]))

--------------------------------------------------------------------------------
query: 
(ROW_NUMBER() OVER (PARTITION BY "year_of_birth" ORDER BY "month_of_birth", "day_of_birth"))


The `frame` argument takes a `Frame` object that specifies the window frame. Unlike regular SQL, both the start and the end of the frame must be given explicitly.

In [28]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.GROUPS, 
        start=FrameEdge(FrameEdgeSide.PRECEDING, 2), 
        end=FrameEdge(FrameEdgeSide.CURRENT_ROW)
    )
)
test_render(c)

clause: 
PARTITION(order_by = [ID(year_of_birth)],
          frame = [mode = GROUPS, start = PRECEDING(2), end = CURRENT_ROW])

--------------------------------------------------------------------------------
query: 
ORDER BY "year_of_birth" GROUPS BETWEEN 2 PRECEDING AND CURRENT ROW


In [29]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.ROWS, 
        start=FrameEdge(FrameEdgeSide.PRECEDING, 2), 
        end=FrameEdge(FrameEdgeSide.FOLLOWING, 2), 
        exclude=FrameExclude.CURRENT_ROW
    )
)
test_render(c, only_query=True)

query: 
ORDER BY "year_of_birth" ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING EXCLUDE CURRENT ROW


In [30]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(FrameMode.RANGE, start=FrameEdge(FrameEdgeSide.PRECEDING), end=FrameEdge(FrameEdgeSide.CURRENT_ROW))
)
test_render(c, only_query=True)

query: 
ORDER BY "year_of_birth" RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW


In [31]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.RANGE, 
        start=FrameEdge(FrameEdgeSide.PRECEDING), 
        end=FrameEdge(FrameEdgeSide.FOLLOWING), 
        exclude=FrameExclude.TIES
    )
)
test_render(c, only_query=True)

query: 
ORDER BY "year_of_birth" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING EXCLUDE TIES
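
The frame outputs above follow a regular pattern: an edge without an offset is unbounded, and the `EXCLUDE` part is optional. The standalone sketch below reproduces that pattern. `Edge` and `render_frame` are hypothetical names that use plain strings instead of FunSQL's `FrameEdge`, `FrameMode`, and `FrameExclude` enums.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    """Hypothetical frame edge: a side plus an optional offset."""

    side: str  # "PRECEDING", "FOLLOWING", or "CURRENT ROW"
    offset: Optional[int] = None

    def render(self) -> str:
        if self.side == "CURRENT ROW":
            return self.side
        # A missing offset means the edge is unbounded.
        bound = "UNBOUNDED" if self.offset is None else str(self.offset)
        return f"{bound} {self.side}"


def render_frame(mode: str, start: Edge, end: Edge,
                 exclude: Optional[str] = None) -> str:
    """Render `<mode> BETWEEN <start> AND <end> [EXCLUDE <exclude>]`."""
    text = f"{mode} BETWEEN {start.render()} AND {end.render()}"
    if exclude is not None:
        text += f" EXCLUDE {exclude}"
    return text


groups = render_frame("GROUPS", Edge("PRECEDING", 2), Edge("CURRENT ROW"))
full_range = render_frame("RANGE", Edge("PRECEDING"), Edge("FOLLOWING"), exclude="TIES")
```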


#### Where

The SQL `WHERE` expression is constructed using the `WHERE` clause.

In [32]:
c = FROM(S.person) >> WHERE(OP("<", S.year_of_birth, 2000)) >> SELECT(S.person_id)
test_render(c)

clause: 
ID(person) >> FROM() >> WHERE(OP("<", ID(year_of_birth), LIT(2000))) >>
SELECT(ID(person_id))

--------------------------------------------------------------------------------
query: 
SELECT "person_id"
FROM "person"
WHERE ("year_of_birth" < 2000)


#### Limit

A `LIMIT/OFFSET` expression is constructed using the `LIMIT` clause.

In [33]:
c = FROM(S.person) >> LIMIT(100)
test_render(c, only_query=True)

query: 

FROM "person"
FETCH FIRST 100 ROWS ONLY


In [34]:
c = FROM(S.person) >> LIMIT(100, offset=20) >> SELECT(S.person_id)
test_render(c, only_query=True)

query: 
SELECT "person_id"
FROM "person"
OFFSET 20ROWS
FETCH NEXT 100 ROWS ONLY


An offset can be specified without a limit value. 

In [35]:
c = FROM(S.person) >> ORDER(S.year_of_birth) >> SELECT(S.person_id) >> LIMIT(offset=100) 
test_render(c, only_query=True)

query: 
SELECT "person_id"
FROM "person"
ORDER BY "year_of_birth"
OFFSET 100ROWS


#### Join

The SQL join operation can be expressed using the `JOIN` clause.


In [36]:
c = (
    FROM(alias(S.person, S.p)) >> 
    JOIN(
        alias(S.location, S.l), 
        on=OP("=", qual(S.p, S.location_id), qual(S.l, S.location_id)), 
        left = True
    )
)
test_render(c)

clause: 
ID(person) >> AS(p) >> FROM() >>
JOIN(ID(location) >> AS(l),
     OP("=", ID(p) >> ID(location_id), ID(l) >> ID(location_id)),
     left = True)

--------------------------------------------------------------------------------
query: 

FROM "person" AS "p"
LEFT JOIN "location" AS "l" ON ("p"."location_id" = "l"."location_id")


Different join types are selected through the keyword arguments of the `JOIN` clause. For example, setting both `left` and `right` produces a `FULL JOIN`.

In [37]:
t1 = alias(S.person, "p")
t2 = alias(S.provider, "pr")
c = t1 >> JOIN(t2, OP("=", qual("p", "provider_id"), qual("pr", "id")), left=True, right=True)

test_render(c)

clause: 
ID(person) >> AS(p) >>
JOIN(ID(provider) >> AS(pr),
     OP("=", ID(p) >> ID(provider_id), ID(pr) >> ID(id)),
     left = True,
     right = True)

--------------------------------------------------------------------------------
query: 
"person" AS "p"
FULL JOIN "provider" AS "pr" ON ("p"."provider_id" = "pr"."id")


A cross join is constructed by setting the `on` argument to `True`.

In [38]:
t1 = alias(S.person, "p")
t2 = alias(S.provider, "pr")
c = t1 >> JOIN(t2, on=True)

test_render(c)

clause: 
ID(person) >> AS(p) >> JOIN(ID(provider) >> AS(pr), LIT(True))

--------------------------------------------------------------------------------
query: 
"person" AS "p"
CROSS JOIN "provider" AS "pr"


Writing a lateral join

In [39]:
t1 = FROM(alias("person", "p"))
t2 = (
    FROM(alias("visit_occurence", "vo")) >> 
    WHERE(OP("=", qual("p", "person_id"), qual("vo", "person_id"))) >>
    ORDER(qual("vo", "start_date") >> SORT(ValueOrder.DESC)) >>
    LIMIT(1) >>
    SELECT(qual("vo", "visit_start_date")) >>
    AS(S("vo"))
)
c = t1 >> JOIN(t2, on=True, left=True, lateral=True) >> SELECT(qual("p", "person_id"), qual("vo", "visit_start_date"))

test_render(c, only_query=True)

query: 
SELECT
  "p"."person_id", 
  "vo"."visit_start_date"
FROM "person" AS "p"
LEFT JOIN LATERAL (
  SELECT "vo"."visit_start_date"
  FROM "visit_occurence" AS "vo"
  WHERE ("p"."person_id" = "vo"."person_id")
  ORDER BY "vo"."start_date" DESC
  FETCH FIRST 1 ROW ONLY
) AS "vo" ON TRUE
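
The join examples above suggest a simple mapping from the flags to the join keyword. The standalone sketch below captures it. `join_keyword` is a hypothetical helper, and the `RIGHT JOIN` and plain `JOIN` branches are assumptions, since neither appears in the outputs above.

```python
def join_keyword(left: bool = False, right: bool = False,
                 on_is_true: bool = False, lateral: bool = False) -> str:
    """Pick the join keyword from the flags (illustrative only)."""
    if left and right:
        kind = "FULL JOIN"
    elif left:
        kind = "LEFT JOIN"
    elif right:
        kind = "RIGHT JOIN"  # assumption: not shown in the outputs above
    elif on_is_true:
        # No outer side and a trivially true condition: a cross join.
        kind = "CROSS JOIN"
    else:
        kind = "JOIN"  # assumption: not shown in the outputs above
    return f"{kind} LATERAL" if lateral else kind
```

Note that `on_is_true` only matters when neither `left` nor `right` is set, which matches the lateral example, where `on=True` with `left=True` still renders `LEFT JOIN LATERAL ... ON TRUE`.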


#### Group

A SQL `GROUP BY` expression is constructed with the `GROUP` clause.

In [40]:
c = FROM(S.person) >> GROUP(S.year_of_birth) >> SELECT(S.year_of_birth, AGG("COUNT", S("*")))
test_render(c)

clause: 
ID(person) >> FROM() >> GROUP(ID(year_of_birth)) >>
SELECT(ID(year_of_birth), AGG("COUNT", OP("*")))

--------------------------------------------------------------------------------
query: 
SELECT
  "year_of_birth", 
  COUNT(*)
FROM "person"
GROUP BY "year_of_birth"


#### Having

The SQL `HAVING` expression is constructed using the `HAVING` clause.

In [41]:
c = FROM(S.person) >> GROUP(S.year_of_birth) >> HAVING(OP(">", AGG("COUNT", S("*")), 10)) >> SELECT(S.year_of_birth)
test_render(c)

clause: 
ID(person) >> FROM() >> GROUP(ID(year_of_birth)) >>
HAVING(OP(">", AGG("COUNT", OP("*")), LIT(10))) >>
SELECT(ID(year_of_birth))

--------------------------------------------------------------------------------
query: 
SELECT "year_of_birth"
FROM "person"
GROUP BY "year_of_birth"
HAVING (COUNT(*) > 10)
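
To see what the rendered query does, the same SQL can be run by hand against an in-memory SQLite database with Python's standard `sqlite3` module. The table contents are made-up sample data, and the SQL is copied from the output above rather than generated by FunSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "person" ("person_id" INTEGER, "year_of_birth" INTEGER)')

# 12 people born in 1970 and 3 born in 1985 (made-up sample data).
people = [(i, 1970) for i in range(12)] + [(100 + i, 1985) for i in range(3)]
conn.executemany('INSERT INTO "person" VALUES (?, ?)', people)

# Only birth years with more than 10 people survive the HAVING filter.
rows = conn.execute(
    """
    SELECT "year_of_birth"
    FROM "person"
    GROUP BY "year_of_birth"
    HAVING (COUNT(*) > 10)
    """
).fetchall()
conn.close()
```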


#### Order

An `ORDER BY` expression is constructed using the `ORDER` clause. The order of values in a column is specified using a `SORT` object. 

In [42]:
c = FROM(S.person) >> ORDER(
    S.year_of_birth >> SORT(ValueOrder.ASC), 
    S.person_id
) >> SELECT(S.person_id)
test_render(c)

clause: 
ID(person) >> FROM() >> ORDER(ID(year_of_birth) >> SORT(ASC), ID(person_id)) >>
SELECT(ID(person_id))

--------------------------------------------------------------------------------
query: 
SELECT "person_id"
FROM "person"
ORDER BY
  "year_of_birth" ASC, 
  "person_id"


`ASC` and `DESC` are shorthands for the `SORT` object. 

In [43]:
c = FROM(S.person) >> ORDER(
    S.year_of_birth >> SORT(ValueOrder.DESC, nulls=NullsOrder.FIRST), 
    S.city >> SORT(ValueOrder.ASC),
    S.person_id
) >> SELECT(S.person_id)
test_render(c, only_query=True)

query: 
SELECT "person_id"
FROM "person"
ORDER BY
  "year_of_birth" DESC NULLS FIRST, 
  "city" ASC, 
  "person_id"


#### Union

`UNION` expressions are constructed using the `UNION` clause.

In [44]:
t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2)

test_render(c, only_query=True)

query: 
SELECT
  "person_id", 
  "measurement_date" AS "date"
FROM "measurement"
UNION
SELECT
  "person_id", 
  "observation_date" AS "date"
FROM "observation"


Use the `all_` keyword argument to construct a `UNION ALL` expression.

In [45]:
t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2, all_=True)

test_render(c, only_query=True)

query: 
SELECT
  "person_id", 
  "measurement_date" AS "date"
FROM "measurement"
UNION ALL
SELECT
  "person_id", 
  "observation_date" AS "date"
FROM "observation"
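
The difference between the two forms is that `UNION` removes duplicate rows while `UNION ALL` keeps them. A quick way to check this is to run both by hand with Python's standard `sqlite3` module. The tables and rows below are made-up sample data, and the SQL is written out manually rather than generated by FunSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measurement (person_id INTEGER, measurement_date TEXT);
    CREATE TABLE observation (person_id INTEGER, observation_date TEXT);
    INSERT INTO measurement VALUES (1, '2020-01-01');
    INSERT INTO observation VALUES (1, '2020-01-01'), (2, '2021-06-15');
    """
)

# The same row (1, '2020-01-01') appears in both tables.
query = """
SELECT "person_id", "measurement_date" AS "date" FROM "measurement"
{op}
SELECT "person_id", "observation_date" AS "date" FROM "observation"
"""

union_rows = conn.execute(query.format(op="UNION")).fetchall()
union_all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()
conn.close()
```

Here `union_rows` holds two distinct rows, while `union_all_rows` holds three because the shared row is kept twice.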


An example of a `UNION` nested inside a `FROM` clause

In [46]:
import datetime

t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2, all_=True) >> FROM() >> AS(S.union) >> WHERE(OP(">", S.date, datetime.date(2000, 1, 1))) >> SELECT(S.person_id)

test_render(c, only_query=True)

query: 
SELECT "person_id"
FROM (
  SELECT
    "person_id", 
    "measurement_date" AS "date"
  FROM "measurement"
  UNION ALL
  SELECT
    "person_id", 
    "observation_date" AS "date"
  FROM "observation"
) AS "union"
WHERE ("date" > DATE '2000-01-01')


#### Values

`VALUES` expressions are constructed using the `VALUES` clause. Common Python data types are converted to SQL literals.

In [47]:
c = VALUES([("SQL", 1974), ("Julia", 2012), ("FunSQL", 2021)])
test_render(c, only_query=True)

query: 
VALUES
  ('SQL', 1974),
  ('Julia', 2012),
  ('FunSQL', 2021)


With only a single row of values

In [48]:
c = VALUES([("SQL", "Julia", "FunSQL")])
test_render(c, only_query=True)

query: 
VALUES ('SQL', 'Julia', 'FunSQL')


Nested `VALUES` expression in a `FROM` clause

In [49]:
c = (
    VALUES([("SQL", 1974), ("Julia", 2012), ("FunSQL", 2021)]) >> 
    AS(S.values, columns = [S.name, S.year]) >>
    FROM() >>
    SELECT(OP("*"))
)

test_render(c, only_query=True)

query: 
SELECT *
FROM (
  VALUES
    ('SQL', 1974),
    ('Julia', 2012),
    ('FunSQL', 2021)
) AS "values" ("name", "year") 


#### Window

Window expressions are constructed using the `WINDOW` clause.

In [50]:
t1 = PARTITION(S.gender) >> AS(S.w1)
t2 = S.w1 >> PARTITION(S.year_of_birth, order_by=[S.month_of_birth, S.date_of_birth]) >> AS(S.w2)

c = FROM(S.person) >> WINDOW(t1, t2) >> SELECT(S.w1 >> AGG("ROW_NUMBER"), S.w2 >> AGG("ROW_NUMBER"))
test_render(c)

clause: 
ID(person) >> FROM() >>
WINDOW(PARTITION(ID(gender)) >> AS(w1),
       ID(w1) >>
       PARTITION(ID(year_of_birth),
                 order_by = [ID(month_of_birth), ID(date_of_birth)]) >>
       AS(w2)) >>
SELECT(AGG("ROW_NUMBER", over = ID(w1)), AGG("ROW_NUMBER", over = ID(w2)))

--------------------------------------------------------------------------------
query: 
SELECT
  (ROW_NUMBER() OVER ("w1")), 
  (ROW_NUMBER() OVER ("w2"))
FROM "person"
WINDOW
  "w1" AS (PARTITION BY "gender"), 
  "w2" AS ("w1" PARTITION BY "year_of_birth" ORDER BY "month_of_birth", "date_of_birth")


#### With

Common table expressions (CTEs) are constructed using the `WITH` clause.

In [51]:
cte = (
    FROM(S.flights) >> 
    WHERE(OP("=", S.dest, "Mumbai")) >>
    SELECT(S.flight_id, S.airline) >>
    AS(S.flights_to_mumbai)
)

c = FROM(S.flights_to_mumbai) >> SELECT(S("*")) >> WITH(cte)
test_render(c)

clause: 
ID(flights_to_mumbai) >> FROM() >> SELECT(OP("*")) >>
WITH(ID(flights) >> FROM() >> WHERE(OP("=", ID(dest), LIT("Mumbai"))) >>
     SELECT(ID(flight_id), ID(airline)) >>
     AS(flights_to_mumbai))

--------------------------------------------------------------------------------
query: 
WITH "flights_to_mumbai" AS (
  SELECT
    "flight_id", 
    "airline"
  FROM "flights"
  WHERE ("dest" = 'Mumbai')
)
SELECT *
FROM "flights_to_mumbai"


The `WITH` clause can also be used to construct a recursive CTE.

In [52]:
cte = SELECT(1) >> UNION(SELECT(OP("+", S.x, 1)) >> LIMIT(100)) >> AS(S.counter, columns=[S.x])

c = FROM(S.counter) >> SELECT(S.x) >> WITH(cte, recursive=True)
test_render(c)

clause: 
ID(counter) >> FROM() >> SELECT(ID(x)) >>
WITH(recursive = True,
     SELECT(LIT(1)) >> UNION(SELECT(OP("+", ID(x), LIT(1))) >> LIMIT(100)) >>
     AS(counter, columns = [x]))

--------------------------------------------------------------------------------
query: 
WITH RECURSIVE "counter" ("x")  AS (
  SELECT 1
  UNION
  SELECT ("x" + 1)
  FETCH FIRST 100 ROWS ONLY
)
SELECT "x"
FROM "counter"
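
For comparison, here is a hand-written recursive counter that runs on SQLite through Python's standard `sqlite3` module. It is not FunSQL output, and it differs from the rendered query above: the recursive term reads from `counter`, it uses `UNION ALL`, and it caps the row count with `LIMIT`, which is the form SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH RECURSIVE "counter" ("x") AS (
      SELECT 1
      UNION ALL
      SELECT "x" + 1 FROM "counter"
      LIMIT 100
    )
    SELECT "x" FROM "counter"
    """
).fetchall()
conn.close()

# Each row is a 1-tuple; unpack it into a flat list of integers.
values = [x for (x,) in rows]
```

The `LIMIT` inside the CTE is what stops the recursion, so `values` ends up holding the integers from 1 to 100.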
