NOTE: This notebook gets rendered with all cells executed in the `docs` directory.     

## Working with clauses

SQLClause objects correspond directly to the syntax of the SQL language. A SQL query can be assembled by constructing clauses and chaining them. The final object can then be rendered to a string.

In [None]:
from funsql.common import S
from funsql.clausedefs import *
from funsql.render import dialect_sqlite
from funsql.compiler.serialize import SerializationContext, serialize

The serialized form of a query depends on the selected database dialect. This document uses SQLite.

In [None]:
def test_render(clause, only_query = False) -> None:
    if not only_query:
        print("clause: \n", clause, "\n", sep="")
        print("-" * 80)

    ctx = SerializationContext(dialect_sqlite())
    serialize(clause, ctx)
    print(ctx.render())

Constructing a query and serializing it.

In [None]:
c = FROM(S.person) >> SELECT(S.person_id, S.date_of_birth)
test_render(c)

#### Symbols

Python strings are needed to represent two different things:
* identifiers (say, table, column, or function names)
* literal values in SQL (say, values in the `user_name` column of type `TEXT`)

To make it easy for clause constructors to distinguish between the two, we wrap identifiers in `Symbol` objects, which can be created using the shorthand `S`. So, 
* `SELECT(S("user_name"))` corresponds to: `SELECT user_name`
* `SELECT("user_name")` corresponds to: `SELECT 'user_name'`

Attribute access works as a shorthand too, so the two forms compared below are equivalent.

In [None]:
S.user_name == S("user_name")
type(S.user_name)
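
To illustrate how a shorthand like `S` can accept both call syntax and attribute access, here is a minimal sketch. The `Symbol`, `_SymbolFactory`, `S_demo` and `render_arg` names are hypothetical stand-ins written for this example; they are not FunSQL's implementation, and the quoting shown may differ from what FunSQL renders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for an identifier name."""
    name: str


class _SymbolFactory:
    """Creates Symbol objects via call syntax or attribute access."""

    def __call__(self, name: str) -> Symbol:
        return Symbol(name)

    def __getattr__(self, name: str) -> Symbol:
        # Invoked only when normal attribute lookup fails,
        # so S_demo.user_name ends up here.
        return Symbol(name)


S_demo = _SymbolFactory()  # named S_demo to avoid shadowing the real `S`


def render_arg(arg) -> str:
    """Render identifiers bare and plain strings as quoted SQL literals."""
    if isinstance(arg, Symbol):
        return arg.name
    return "'" + str(arg).replace("'", "''") + "'"


print(S_demo.user_name == S_demo("user_name"))
print("SELECT", render_arg(S_demo.user_name))
print("SELECT", render_arg("user_name"))
```

The type check in `render_arg` is the whole point of the wrapper: a bare string and a `Symbol` carry the same characters but are rendered differently.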

To chain multiple symbols into a qualified name, use the utility function `qual` (for "qualifier"). FunSQL translates it to chained `ID` clauses.

In [None]:
qual(S.schema_name, S.table_name, S.col_name)

#### Literals

These represent SQL literals and can be constructed using the `LIT` clause. 

In [None]:
c = LIT("SQL is funny!")
c

Common Python types are converted to SQL literals when used with a clause.

In [None]:
import datetime

c = SELECT(None, True, 100, 200.2, "FunSQL", datetime.date(2022, 2, 1))
test_render(c)
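
As a rough picture of what such a conversion involves, the sketch below maps a few Python types to SQL literal text. The function `to_sql_literal` is hypothetical and written only for this illustration; the exact text FunSQL emits (for example, how booleans or dates are spelled for a given dialect) may differ.

```python
import datetime


def to_sql_literal(value) -> str:
    """Convert a Python value to SQL literal text (illustrative only)."""
    if value is None:
        return "NULL"
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (datetime.date, datetime.datetime)):
        return "'" + value.isoformat() + "'"
    if isinstance(value, str):
        # Single quotes inside a SQL string literal are escaped by doubling.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"no SQL literal form for {type(value).__name__}")


values = [None, True, 100, 200.2, "FunSQL", datetime.date(2022, 2, 1)]
print("SELECT " + ", ".join(to_sql_literal(v) for v in values))
```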

#### Identifiers

A SQL identifier is constructed using the `ID` clause.

In [None]:
c = ID(S.person)
c

In [None]:
c = ID(S.person) >> ID(S.email_addr)
c

In [None]:
c = ID(S.person_id, over=ID(S.person))
test_render(c)

When used in the context of a SQL clause, `Symbol` objects are converted to `ID` clauses.

In [None]:
FROM(S.person) >> SELECT(S.person_id, S.date_of_birth)

#### Variables

SQL placeholder parameters are represented using the `VAR` clause. 

In [None]:
c = VAR(S.year)
c

Serializing a clause yields a `SQLString` object which also contains a list of the variables used. They can be bound to values when executing the query.

In [None]:
c = (
    FROM(S.flights) >> 
    WHERE(OP("OR", OP("=", S.origin, VAR(S.city)), OP("=", S.dest, VAR(S.city)))) >> 
    SELECT(S.flight_id)
)
test_render(c, only_query=True)
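
To show what binding a variable at execution time can look like, the sketch below runs a comparable query through Python's built-in `sqlite3` module. The SQL is written by hand and the `:city` named-placeholder style is an assumption; the text FunSQL renders for `VAR(S.city)` may differ. The table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_id INTEGER, origin TEXT, dest TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [(1, "Mumbai", "Delhi"), (2, "Pune", "Mumbai"), (3, "Delhi", "Pune")],
)

# Hand-written SQL; the placeholder syntax is an assumption for this sketch.
query = """
SELECT flight_id
FROM flights
WHERE origin = :city OR dest = :city
ORDER BY flight_id
"""

# One binding supplies every occurrence of the named placeholder.
rows = conn.execute(query, {"city": "Mumbai"}).fetchall()
flight_ids = [row[0] for row in rows]
print(flight_ids)
conn.close()
```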

#### Operator

A SQL operator is constructed using the `OP` clause.

In [None]:
c = OP(S.NOT, OP("=", S.zip, 42000))
test_render(c)

In [None]:
c = OP(S.CURRENT_TIMESTAMP)
test_render(c)

Composite operators can be constructed using the `KW` clause.

In [None]:
c = OP("BETWEEN", S.year_of_birth, 2000, KW(S.AND, 2020))
test_render(c)

#### Case

Case expressions are constructed using the `CASE` clause. The arguments to the constructor are an interleaved sequence of conditions (`WHEN`) and corresponding values (`THEN`).

When the total number of args is odd, the last argument is used as the default value.

In [None]:
c = CASE(OP(">", S.year_of_birth, 2000), "youngling")
test_render(c)

In [None]:
c = CASE(OP(">", S.year_of_birth, 2000), "youngling", "millennial")
test_render(c)
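
The pairing rule described above can be sketched in plain Python. The helper `case_sql` is hypothetical and operates on already-rendered SQL fragments; it only illustrates how an interleaved argument list splits into `WHEN`/`THEN` pairs plus an optional `ELSE`, and is not how FunSQL is implemented.

```python
def case_sql(*args: str) -> str:
    """Build a CASE expression from pre-rendered SQL fragments (illustrative)."""
    if len(args) < 2:
        raise ValueError("CASE needs at least one condition/value pair")
    # An odd argument count means the final argument is the ELSE value.
    has_default = len(args) % 2 == 1
    pairs = args[:-1] if has_default else args
    parts = ["CASE"]
    for condition, value in zip(pairs[0::2], pairs[1::2]):
        parts.append(f"WHEN {condition} THEN {value}")
    if has_default:
        parts.append(f"ELSE {args[-1]}")
    parts.append("END")
    return " ".join(parts)


print(case_sql("year_of_birth > 2000", "'youngling'"))
print(case_sql("year_of_birth > 2000", "'youngling'", "'millennial'"))
```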

#### AS

A SQL `AS` expression is constructed using the `AS` clause.


In [None]:
c = ID(S.person) >> AS(S.p)
test_render(c)

In [None]:
c = FROM(alias("person", "p")) >> SELECT(qual("p", "person_id"))
test_render(c)

#### Function

SQL functions are represented using the `FUN` clause, with the first arg being the function name.  

In [None]:
c = FUN("CONCAT", S.city, ", ", S.state)
test_render(c)

Functions with keyword arguments are constructed using the `KW` clause.

In [None]:
c = FUN("SUBSTRING", S.zip, KW("FROM", 1), KW("FOR", 3))
test_render(c)

A function without any arguments:

In [None]:
c = FUN("NOW")
test_render(c)

#### Aggregates

Aggregates are defined using the `AGG` clause.

In [None]:
c = AGG("COUNT", S("*"))
test_render(c)

With a `DISTINCT` modifier

In [None]:
c = AGG("COUNT", S.birth_year, distinct=True)
test_render(c)

With a `FILTER` modifier

In [None]:
c = AGG("COUNT", S("*"), filter_=OP("=", S.year_of_birth, 1970))
test_render(c)

#### Partition

Window functions can be constructed by chaining a `PARTITION` clause to an `AGG` clause. 


In [None]:
c = PARTITION(S.year_of_birth, order_by = [S.month_of_birth, S.day_of_birth]) >> AGG("ROW_NUMBER")
test_render(c)
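
For readers less familiar with window functions, the sketch below executes a hand-written query of the same shape with Python's built-in `sqlite3` module. The sample rows are invented, the SQL is not FunSQL output, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER, "
    "month_of_birth INTEGER, day_of_birth INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, 1990, 5, 20), (2, 1990, 1, 3), (3, 1985, 7, 14)],
)

# Row numbers restart within each year_of_birth partition.
query = """
SELECT person_id,
       ROW_NUMBER() OVER (
           PARTITION BY year_of_birth
           ORDER BY month_of_birth, day_of_birth
       ) AS rn
FROM person
ORDER BY person_id
"""
numbered = conn.execute(query).fetchall()
print(numbered)
conn.close()
```

Person 2 is born earlier in 1990 than person 1, so it receives row number 1 within that partition.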

The `frame` argument takes a `Frame` object that specifies the aggregation window. Unlike regular SQL, the start and end of the window must both be specified explicitly.

In [None]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.GROUPS, 
        start=FrameEdge(FrameEdgeSide.PRECEDING, 2), 
        end=FrameEdge(FrameEdgeSide.CURRENT_ROW)
    )
)
test_render(c)

In [None]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.ROWS, 
        start=FrameEdge(FrameEdgeSide.PRECEDING, 2), 
        end=FrameEdge(FrameEdgeSide.FOLLOWING, 2), 
        exclude=FrameExclude.CURRENT_ROW
    )
)
test_render(c, only_query=True)

In [None]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(FrameMode.RANGE, start=FrameEdge(FrameEdgeSide.PRECEDING), end=FrameEdge(FrameEdgeSide.CURRENT_ROW))
)
test_render(c, only_query=True)

In [None]:
c = PARTITION(
    order_by = [S.year_of_birth], 
    frame = Frame(
        FrameMode.RANGE, 
        start=FrameEdge(FrameEdgeSide.PRECEDING), 
        end=FrameEdge(FrameEdgeSide.FOLLOWING), 
        exclude=FrameExclude.TIES
    )
)
test_render(c, only_query=True)

#### Where

The SQL `WHERE` expression is constructed using the `WHERE` clause.

In [None]:
c = FROM(S.person) >> WHERE(OP("<", S.year_of_birth, 2000)) >> SELECT(S.person_id)
test_render(c)

#### Limit

A `LIMIT/OFFSET` expression is constructed using the `LIMIT` clause.

In [None]:
c = FROM(S.person) >> LIMIT(100)
test_render(c, only_query=True)

In [None]:
c = FROM(S.person) >> LIMIT(100, offset=20) >> SELECT(S.person_id)
test_render(c, only_query=True)

An offset can be specified without a limit value. 

In [None]:
c = FROM(S.person) >> ORDER(S.year_of_birth) >> SELECT(S.person_id) >> LIMIT(offset=100) 
test_render(c, only_query=True)
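
When writing SQLite by hand, an `OFFSET` has to follow a `LIMIT`, and a negative limit means "no upper bound". The sketch below demonstrates that behaviour with the built-in `sqlite3` module; it says nothing about the exact text FunSQL emits for `LIMIT(offset=100)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER)")
conn.executemany("INSERT INTO person VALUES (?)", [(i,) for i in range(1, 6)])

# In SQLite, a negative LIMIT means "no upper bound", so this skips two rows
# and returns everything after them.
rows = conn.execute(
    "SELECT person_id FROM person ORDER BY person_id LIMIT -1 OFFSET 2"
).fetchall()
remaining = [row[0] for row in rows]
print(remaining)
conn.close()
```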

#### Join

The SQL join operation can be expressed using the `JOIN` clause.


In [None]:
c = (
    FROM(alias(S.person, S.p)) >> 
    JOIN(
        alias(S.location, S.l), 
        on=OP("=", qual(S.p, S.location_id), qual(S.l, S.location_id)), 
        left = True
    )
)
test_render(c)
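
The effect of a left join can be checked against a small in-memory database. The query below is hand-written SQL of the same shape as the clause above, run with the built-in `sqlite3` module; the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, location_id INTEGER)")
conn.execute("CREATE TABLE location (location_id INTEGER, city TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 10), (2, None)])
conn.execute("INSERT INTO location VALUES (10, 'Pune')")

# A left join keeps every person, filling unmatched columns with NULL.
query = """
SELECT p.person_id, l.city
FROM person AS p
LEFT JOIN location AS l ON p.location_id = l.location_id
ORDER BY p.person_id
"""
joined = conn.execute(query).fetchall()
print(joined)
conn.close()
```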

Different types of joins can be expressed using the available arguments to the `JOIN` clause.

In [None]:
t1 = alias(S.person, "p")
t2 = alias(S.provider, "pr")
c = t1 >> JOIN(t2, OP("=", qual("p", "provider_id"), qual("pr", "id")), left=True, right=True)

test_render(c)

A cross join can be constructed by setting the `on` argument to `True`.

In [None]:
t1 = alias(S.person, "p")
t2 = alias(S.provider, "pr")
c = t1 >> JOIN(t2, on=True)

test_render(c)

Writing a lateral join

In [None]:
t1 = FROM(alias("person", "p"))
t2 = (
    FROM(alias("visit_occurrence", "vo")) >> 
    WHERE(OP("=", qual("p", "person_id"), qual("vo", "person_id"))) >>
    ORDER(qual("vo", "visit_start_date") >> SORT(ValueOrder.DESC)) >>
    LIMIT(1) >>
    SELECT(qual("vo", "visit_start_date")) >>
    AS(S("vo"))
)
c = t1 >> JOIN(t2, on=True, left=True, lateral=True) >> SELECT(qual("p", "person_id"), qual("vo", "visit_start_date"))

test_render(c, only_query=True)

#### Group

A SQL `GROUP BY` expression is constructed with the `GROUP` clause.

In [None]:
c = FROM(S.person) >> GROUP(S.year_of_birth) >> SELECT(S.year_of_birth, AGG("COUNT", S("*")))
test_render(c)

#### Having

The SQL `HAVING` expression is constructed using the `HAVING` clause.

In [None]:
c = FROM(S.person) >> GROUP(S.year_of_birth) >> HAVING(OP(">", AGG("COUNT", S("*")), 10)) >> SELECT(S.year_of_birth)
test_render(c)
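
To see grouping and `HAVING` filter actual rows, the sketch below runs a hand-written query of the same shape with the built-in `sqlite3` module. The data is invented, and the threshold is lowered from 10 to 2 so that the tiny sample table produces a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1970), (2, 1970), (3, 1970), (4, 1985), (5, 1990), (6, 1990)],
)

# HAVING filters whole groups after aggregation, unlike WHERE.
query = """
SELECT year_of_birth
FROM person
GROUP BY year_of_birth
HAVING COUNT(*) > 2
"""
years = [row[0] for row in conn.execute(query)]
print(years)
conn.close()
```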

#### Order

An `ORDER BY` expression is constructed using the `ORDER` clause. The order of values in a column is specified using a `SORT` object. 

In [None]:
c = FROM(S.person) >> ORDER(
    S.year_of_birth >> SORT(ValueOrder.ASC), 
    S.person_id
) >> SELECT(S.person_id)
test_render(c)

`ASC` and `DESC` are shorthands for the `SORT` object. The cell below uses the explicit `SORT` form, together with a `nulls` ordering.

In [None]:
c = FROM(S.person) >> ORDER(
    S.year_of_birth >> SORT(ValueOrder.DESC, nulls=NullsOrder.FIRST), 
    S.city >> SORT(ValueOrder.ASC),
    S.person_id
) >> SELECT(S.person_id)
test_render(c, only_query=True)

#### Union

`UNION` expressions are constructed using the `UNION` clause.

In [None]:
t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2)

test_render(c, only_query=True)

Use the `all_` keyword argument to construct a `UNION ALL` expression.

In [None]:
t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2, all_=True)

test_render(c, only_query=True)
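
The practical difference between the two forms is whether duplicate rows survive. The sketch below shows this with hand-written SQL and the built-in `sqlite3` module; the tables hold one invented row each, chosen to be identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (person_id INTEGER, measurement_date TEXT)")
conn.execute("CREATE TABLE observation (person_id INTEGER, observation_date TEXT)")
conn.execute("INSERT INTO measurement VALUES (1, '2020-01-01')")
conn.execute("INSERT INTO observation VALUES (1, '2020-01-01')")

# The same pair of SELECTs, combined with each operator in turn.
template = """
SELECT person_id, measurement_date AS "date" FROM measurement
{op}
SELECT person_id, observation_date AS "date" FROM observation
"""
distinct_rows = conn.execute(template.format(op="UNION")).fetchall()
all_rows = conn.execute(template.format(op="UNION ALL")).fetchall()
print(len(distinct_rows), len(all_rows))
conn.close()
```

`UNION` collapses the two identical rows into one, while `UNION ALL` keeps both.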

Example of a nested UNION clause

In [None]:
import datetime

t1 = FROM(S.measurement) >> SELECT(S.person_id, alias("measurement_date", "date"))
t2 = FROM(S.observation) >> SELECT(S.person_id, alias("observation_date", "date"))
c = t1 >> UNION(t2, all_=True) >> FROM() >> AS(S.union) >> WHERE(OP(">", S.date, datetime.date(2000, 1, 1))) >> SELECT(S.person_id)

test_render(c, only_query=True)

#### Values

`VALUES` expressions are constructed using the `VALUES` clause. Common Python data types are converted to SQL literals.

In [None]:
c = VALUES([("SQL", 1974), ("Julia", 2012), ("FunSQL", 2021)])
test_render(c, only_query=True)
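
A `VALUES` expression is a complete query on its own, which can be checked with the built-in `sqlite3` module. The SQL below is hand-written to mirror the clause above and is not FunSQL output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each parenthesised tuple becomes one result row.
rows = conn.execute(
    "VALUES ('SQL', 1974), ('Julia', 2012), ('FunSQL', 2021)"
).fetchall()
print(rows)
conn.close()
```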

With only a single row of values

In [None]:
c = VALUES([("SQL", "Julia", "FunSQL")])
test_render(c, only_query=True)

A `VALUES` expression nested in a `FROM` clause

In [None]:
c = (
    VALUES([("SQL", 1974), ("Julia", 2012), ("FunSQL", 2021)]) >> 
    AS(S.values, columns = [S.name, S.year]) >>
    FROM() >>
    SELECT(OP("*"))
)

test_render(c, only_query=True)

#### Window

Window expressions are constructed using the `WINDOW` clause.

In [None]:
t1 = PARTITION(S.gender) >> AS(S.w1)
t2 = S.w1 >> PARTITION(S.year_of_birth, order_by=[S.month_of_birth, S.day_of_birth]) >> AS(S.w2)

c = FROM(S.person) >> WINDOW(t1, t2) >> SELECT(S.w1 >> AGG("ROW_NUMBER"), S.w2 >> AGG("ROW_NUMBER"))
test_render(c)

#### With

Common table expressions (CTEs) are constructed using the `WITH` clause.

In [None]:
cte = (
    FROM(S.flights) >> 
    WHERE(OP("=", S.dest, "Mumbai")) >>
    SELECT(S.flight_id, S.airline) >>
    AS(S.flights_to_mumbai)
)

c = FROM(S.flights_to_mumbai) >> SELECT(S("*")) >> WITH(cte)
test_render(c)

The `WITH` clause can also be used to construct a recursive CTE.

In [None]:
cte = SELECT(1) >> UNION(FROM(S.counter) >> SELECT(OP("+", S.x, 1)) >> LIMIT(100)) >> AS(S.counter, columns=[S.x])

c = FROM(S.counter) >> SELECT(S.x) >> WITH(cte, recursive=True)
test_render(c)
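
For reference, a recursive counter CTE of this kind can be executed directly with the built-in `sqlite3` module. The SQL below is hand-written and may not match FunSQL's rendering; it assumes the recursive branch selects from the CTE itself, which is what makes the query recursive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The recursive branch reads from `counter` itself, and LIMIT caps the
# number of rows the recursion is allowed to produce.
query = """
WITH RECURSIVE counter(x) AS (
    SELECT 1
    UNION
    SELECT x + 1 FROM counter
    LIMIT 100
)
SELECT x FROM counter ORDER BY x
"""
numbers = [row[0] for row in conn.execute(query)]
print(numbers[0], numbers[-1], len(numbers))
conn.close()
```

Without the `LIMIT`, this particular recursion would never terminate, since every step produces a new row.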