# Core Read

This notebook focuses on the process of selecting data.<br>
It also covers quite a few transformative operations.

Column transformations are covered in "10 - Column Transformations".

The following topics are covered:
- Where
- Parameters
- Operators (OR, AND)
- IN
- Case (switch case)
- Extract
- Order By
- Group By
- Joins
  - Inner Join
  - Left Join
  - Outer Join
  - Cross Join / Cartesian Product
- Union
- Except
- Intersect
- Subquery
- Having
- Streaming
- With Hint
- op (advanced SQLAlchemy)


Further Reading:
- [SELECT syntax](https://www.sqlite.org/lang_select.html) by SQLite
- ["Using SELECT Statements"](https://docs.sqlalchemy.org/en/20/tutorial/data_select.html) by SQLAlchemy
- ["SQL Statements and Expressions API"](https://docs.sqlalchemy.org/en/20/core/expression_api.html) by SQLAlchemy

# Setup

## Tables
- Product
- Customer
- Order
- Orderline

In [None]:
import sqlalchemy as sa
from utils import *  # notebook helpers used below, such as logs() and rollback()

base = sa.MetaData()
Products = sa.Table('products', base, 
                        sa.Column('id', sa.INTEGER, primary_key=True, autoincrement=True),
                        sa.Column('name', sa.VARCHAR(255), nullable=False, index=True),
                        sa.Column('price', sa.DOUBLE, nullable=True)
                    )

Customers = sa.Table('customers', base, 
                        sa.Column('id', sa.INTEGER, primary_key=True, autoincrement=True),
                        sa.Column('name', sa.VARCHAR(255)),
                    )

Orders = sa.Table('orders', base, 
                        sa.Column('id', sa.INTEGER, primary_key=True, autoincrement=True),
                        # An order belongs to a customer, so the foreign key targets customers.id
                        sa.Column('customer_id', sa.INTEGER, sa.ForeignKey(Customers.c['id']), nullable=False)
                 )

OrderLines = sa.Table('orderlines', base, 
                        sa.Column('order_id', sa.INTEGER, sa.ForeignKey(Orders.c['id'],  ondelete='CASCADE'), nullable=False),
                        sa.Column('product_id', sa.INTEGER, sa.ForeignKey(Products.c['id']), nullable=False),
                        sa.Column('quantity', sa.DOUBLE, nullable=False),
                        # order_id and product_id should be unique (as a pair).
                        sa.PrimaryKeyConstraint('order_id', 'product_id'),
                     )

In [None]:
engine = sa.create_engine('sqlite:///')
con = engine.connect()
base.create_all(engine)

In [None]:
print(repr(Customers.c.name))
print(repr(Customers.c['name']))
print(repr(Customers.columns.name))
print(repr(Customers.columns['name']))

## Data
Add a little bit of starting data.

In [None]:
with con.begin():
    con.execute(Customers.insert(), [{'name': 'Alice'}, {'name': 'Bob'}])
    con.execute(Products.insert(), [{'name': 'Cookie', 'price': 1}, { 'name': 'Ice Cream', 'price': 2}])

In [None]:
with con.begin() as t:
    result = con.execute(Orders.insert(), {'customer_id': 1})
    order_id = result.inserted_primary_key[0]
    result = con.execute(OrderLines.insert(), [{'order_id': order_id, 'product_id': 1, 'quantity': 1}])

# Basic Select
The basic select query can be made using `Table.select()` or `sa.select(Table_or_Columns)`.<br>
It can be useful to stick with `sa.select` as that is also used for column-based expressions and ORM expressions.

In [None]:
# Select All columns
print('\n--- A ---')
print(Customers.select())

print('\n--- B ---')
print(sa.select(Customers))

# Select 1 column
print('\n--- C ---')
print(sa.select(Customers.c.id))

In [None]:
print(sa.select(Customers.c.id).distinct())

In [None]:
s = sa.select(Customers.c.id)
print(str(s.add_columns(Customers.c.name)))
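
Executing a select returns a `Result`, and the notebook later reads it in a few different ways. The sketch below (a hypothetical stand-in for `Customers` on an in-memory SQLite database) shows the common options: tuple-like `Row` objects, dict-like rows via `.mappings()`, and single values via `.scalars()`.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Customers table
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)

engine = sa.create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])
    query = sa.select(customers).order_by(customers.c.id)

    # Row objects: tuple-like, with attribute access by column name
    rows = conn.execute(query).all()
    # RowMapping objects: dict-like access by column name
    mappings = conn.execute(query).mappings().all()
    # Only the first column of each row
    names = conn.execute(
        sa.select(customers.c.name).order_by(customers.c.id)
    ).scalars().all()

print(rows[0].name, mappings[1]["name"], names)
```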

# Where
This is the classic 'where' clause from SQL.

In [None]:
query = (
    sa.select(Customers)
    .where(
        # .where has an implied 'AND' between its arguments.
        Customers.c.id == 0, 
        Customers.c.name == 'DoesNotExist'
    )
)
print(query)
print('--- SQL start ---')
with logs(), con.begin():
    for row in con.execute(query):
        print('row:', row)
print('--- SQL end ---')

-----
When an extra `.where` is attached to the query, it is combined with the existing criteria using `AND`.
This means each additional `.where` *should* only narrow the result down to a subset.

In [None]:
print(str(sa.select(Customers).where(Customers.c.id > 0).where(Customers.c.id < 10)))

-----
Extending the criteria with an `OR` by chaining is not supported, as this could effectively turn the query into `<original> OR true`, which would simply return all rows.
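
When an `OR` really is wanted, one option is to read the existing criteria from `Select.whereclause` and build a *new* select around `sa.or_`. This is a sketch of that approach, not something the notebook itself does; the table and data are hypothetical.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Customers table
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

base_query = sa.select(customers.c.name).where(customers.c.id > 1)

# Select.whereclause holds the combined WHERE criteria of base_query;
# a fresh SELECT is built so the OR is explicit, rather than chained
widened = (
    sa.select(customers.c.name)
    .where(sa.or_(base_query.whereclause, customers.c.name == "Alice"))
    .order_by(customers.c.id)
)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}, {"name": "Carol"}])
    narrow = conn.execute(base_query.order_by(customers.c.id)).scalars().all()
    wide = conn.execute(widened).scalars().all()

print(narrow, wide)
```

As the text warns, the widened query returns more rows than the original.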

# Parameters
It's possible to build a query whose criteria can't be filled in yet.<br>
In those situations, a bound parameter can be used as a placeholder.

The placeholder can be created using `sqlalchemy.bindparam(name)`.<br>
The value can be filled in later using `query.params(name=value)`.

In [None]:
query = (
    sa.select(Customers)
    .where(
        # .where has an implied 'AND'
        Customers.c.id == 0, 
        Customers.c.name == sa.bindparam('named', required=True)
    )
)
print(query)
query = query.params(named='jack')

with logs(), con.begin():
    for row in con.execute(query):
        print('row:', row)
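
Besides `query.params()`, the value of a `bindparam` can also be passed at execution time, keyed by the parameter's name, which lets one query be reused with different values. A minimal sketch with a hypothetical customers table:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Customers table
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# "named" is only a placeholder; no value is attached to the query itself
query = sa.select(customers.c.name).where(customers.c.name == sa.bindparam("named"))

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])
    # The value is supplied per execution
    found = conn.execute(query, {"named": "Bob"}).scalars().all()
    missing = conn.execute(query, {"named": "Jack"}).scalars().all()

print(found, missing)
```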

# Logical operators

Most column expressions can be turned into boolean expressions, such as comparisons.<br>
These boolean expressions can in turn be combined to build a where clause.

Some function and method names end with an awkward `_` to prevent conflicts with Python's keywords and built-ins (e.g. `or_`, `in_`, `is_`).

Operation | Function/Method | Operator
---|---|---
OR | or_(a, b) | a \| b 
AND | and_(a, b) | a & b
NOT | not_(a) | ~a
IS NULL | a.is_(None) | -
IS NOT NULL | a.is_not(None) | -
IN | a.in_(sequence_or_select) | -
ANY | any_(expr) | -
ALL | all_(expr) | -

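**Common Bug(s):** The operators are Python's single-character bitwise operators `&`, `|` and `~`.<br>
Python's `and`, `or` and `not` keywords (or a doubled `&&`, which is not valid Python at all) do not build SQL. Because `&` and `|` bind more tightly than `==`, each comparison also needs its own parentheses.<br>
In addition, boolean comparisons in SQLAlchemy are written as `column == True`, but many linters will complain and suggest `column is True`. Following that advice breaks the query, because Python evaluates `is` itself and produces a plain `False` instead of SQL. `column.is_(True)` is a linter-friendly alternative.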

In [None]:
# OR
or_1 = (Customers.c['id'] == 0) | (Customers.c['name'] == 'DoesNotExist')

id_is_zero = Customers.c['id'] == 0
name_does_not_exist = Customers.c['name'] == 'DoesNotExist'

or_2 = sa.or_(id_is_zero,  name_does_not_exist)

print('or_1)', str(or_1))
print('-----')
print('or_2)', str(or_2))

query = (
    sa.select(Customers)
    .where(or_1)
)
print('\n--- SQL ---\n')
with logs(), con.begin():
    for row in con.execute(query):
        print('row:', row)
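
A small sketch of the other rows of the table, plus the two pitfalls from "Common Bug(s)". The table is a hypothetical stand-in for `Customers`; nothing is executed, the expressions are only compiled to strings.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Customers table
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)

# & binds tighter than ==, so the comparison gets its own parentheses
both = (customers.c.id > 0) & customers.c.name.is_not(None)
negated = ~customers.c.name.in_(["Alice", "Bob"])
missing_name = customers.c.name.is_(None)

for expr in (both, negated, missing_name):
    print(expr)

# Pitfall 1: Python's `and` asks the comparison for a truth value, which it refuses
try:
    (customers.c.id > 0) and (customers.c.name == "Alice")
    and_raised = False
except TypeError:
    and_raised = True

# Pitfall 2: `is` is evaluated by Python itself and yields a plain bool, not SQL
python_is = customers.c.name is None

print(and_raised, python_is)
```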

# IN
The 'IN' operator checks whether a value is a member of a set.<br>
The set can be a sequence of values or a query. A `select()` can be passed directly; it does not need to be turned into a `.subquery()` first.

In [None]:
all_ids = 1, 2, 3, 4
query = sa.select(Customers).where(Customers.c.id.in_(all_ids))
print('-- values --')
with logs(), con.begin():
    con.execute(query)
    

In [None]:
all_ids = sa.select(Customers.c.id)
query = sa.select(Customers).where(Customers.c.id.in_(all_ids))
print('-- expression --')
with logs(), con.begin():
    con.execute(query)
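
A few more `in_` variations, sketched against a hypothetical customers table on in-memory SQLite: a plain Python list, the negated `not_in`, and an empty list, which is accepted and simply matches nothing.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Customers table
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])

    # A Python sequence becomes an "expanding" bind parameter
    in_rows = conn.execute(
        sa.select(customers.c.name).where(customers.c.id.in_([1, 3]))
    ).scalars().all()

    # not_in() is the negated form (NOT IN)
    not_in_rows = conn.execute(
        sa.select(customers.c.name).where(customers.c.id.not_in([1]))
    ).scalars().all()

    # An empty sequence is allowed and matches no rows
    empty_rows = conn.execute(
        sa.select(customers.c.name).where(customers.c.id.in_([]))
    ).scalars().all()

print(in_rows, not_in_rows, empty_rows)
```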
    

# Case

SQLAlchemy can create a `CASE` expression in two ways.

The first style maps the values of a single expression to results:
```python
sqlalchemy.case(mapping: dict, *, value: column_expression, else_=None)
```

The second style is:
```python
sqlalchemy.case(*cases: tuple[Expression, Value], value: column_expression=None, else_=None)
```
where each expression is a condition (usually a comparison), and the value is the result returned when that condition is true.

In [None]:
# Style 1: map the values of Products.c.name to results
case = sa.case(
        {'Cookie': 'Cheap', 'Ice Cream': 'Expensive'},
        value=Products.c.name,
        else_='Everyday')

with rollback(con), logs():
    con.execute(Products.insert(), { 'name': 'cake'})
    for row in con.execute(sa.select(Products, case.label('remark'))):
        print(row.id, row.name, row.remark)


In [None]:
# Style 2: explicit (condition, result) pairs
case = sa.case(
    (Products.c.name == 'Cookie', 'Cheap'), 
    (Products.c.name == 'Ice Cream', 'Expensive'), 
    else_='Everyday')

with rollback(con), logs():
    con.execute(Products.insert(), { 'name': 'cake'})
    for row in con.execute(sa.select(Products, case.label('remark'))):
        print(row.id, row.name, row.remark)
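
The condition style becomes more useful when the conditions are not plain equality. A sketch with hypothetical price bands, which also shows how a `NULL` price falls through to `else_`:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
    sa.Column("price", sa.Float, nullable=True),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

price_band = sa.case(
    (products.c.price < 2, "Cheap"),
    (products.c.price >= 2, "Expensive"),
    # A NULL price makes both comparisons NULL (not true), so else_ is used
    else_="Unknown",
).label("band")

with engine.begin() as conn:
    # Every parameter set lists the same keys, as executemany requires
    conn.execute(products.insert(), [
        {"name": "Cookie", "price": 1},
        {"name": "Ice Cream", "price": 2},
        {"name": "Cake", "price": None},
    ])
    rows = [tuple(r) for r in conn.execute(
        sa.select(products.c.name, price_band).order_by(products.c.id)
    )]

print(rows)
```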

# Extract

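`sa.extract(field, expr)` represents SQL's `EXTRACT(field FROM expr)`, which pulls a component such as the year or month out of a date/time expression. Dialects without `EXTRACT` get an equivalent rendering. See https://docs.sqlalchemy.org/en/20/core/sqlelement.html#sqlalchemy.sql.expression.extract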
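
The notebook's tables have no date column, so this sketch uses a hypothetical `events` table. SQLite has no `EXTRACT` function, and SQLAlchemy's SQLite dialect renders it with `STRFTIME` instead.

```python
import datetime

import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical table with a date column
events = sa.Table(
    "events", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("happened_on", sa.Date, nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# Pull individual components out of the date
year = sa.extract("year", events.c.happened_on).label("year")
month = sa.extract("month", events.c.happened_on).label("month")

with engine.begin() as conn:
    conn.execute(events.insert(), {"happened_on": datetime.date(2024, 3, 15)})
    row = conn.execute(sa.select(year, month)).one()

print(row.year, row.month)
```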

# Order By


In [None]:
query = sa.select(Customers).order_by(Customers.c.id.asc())

print(query)

# Column.asc() / Column.desc() set the sort direction;
# .nulls_first() / .nulls_last() control where NULLs are placed.

In [None]:
query = sa.select(Customers).order_by(Customers.c.name.asc().nulls_first())

print(query)

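
`order_by` accepts several columns; later ones only break ties left by earlier ones. A sketch on a hypothetical products table, sorting by price (highest first) and then by name:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
    sa.Column("price", sa.Float),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(products.insert(), [
        {"name": "Cookie", "price": 1},
        {"name": "Ice Cream", "price": 2},
        {"name": "Cake", "price": 2},
    ])
    # Highest price first; equal prices are ordered by name (A-Z)
    query = sa.select(products.c.name).order_by(
        products.c.price.desc(), products.c.name.asc()
    )
    names = conn.execute(query).scalars().all()

print(names)
```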

In [None]:
# Not an ordering feature: Column.contains() renders a LIKE '%...%' substring match
print(sa.select(Customers).where(Customers.c.name.contains('xyz')))

# Group By

In [None]:
query = sa.select(
        sa.func.count(Customers.c.id).label('my_count')
    ).group_by(Customers.c.name)

print(query)
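
The query above only compiles the SQL. As a runnable sketch (hypothetical order lines on in-memory SQLite), the grouped column is typically selected alongside aggregates computed per group:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's OrderLines table (foreign keys omitted)
order_lines = sa.Table(
    "orderlines", metadata,
    sa.Column("order_id", sa.Integer, primary_key=True),
    sa.Column("product_id", sa.Integer, primary_key=True),
    sa.Column("quantity", sa.Float, nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(order_lines.insert(), [
        {"order_id": 1, "product_id": 1, "quantity": 2},
        {"order_id": 2, "product_id": 1, "quantity": 3},
        {"order_id": 2, "product_id": 2, "quantity": 1},
    ])
    # One row per product: the grouped column plus aggregates over its group
    query = (
        sa.select(
            order_lines.c.product_id,
            sa.func.count().label("line_count"),  # COUNT(*)
            sa.func.sum(order_lines.c.quantity).label("total_quantity"),
        )
        .group_by(order_lines.c.product_id)
        .order_by(order_lines.c.product_id)
    )
    rows = [tuple(r) for r in conn.execute(query)]

print(rows)
```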

# Joins

Many SQL dialects provide their own shorthands for certain operations.<br>
Remember that when debugging queries.

Additionally, SQLAlchemy has no `RIGHT JOIN` construct.<br>
Most SQLAlchemy developers will just tell you to reverse the position of the operands so a `LEFT JOIN` can be used instead.

Developers can build their 'select' statements without defining the join beforehand.<br>
This allows statements to be written a bit more like regular SQL.

## Inner Join
An inner join returns only the rows that have a match in both tables.<br>
SQLAlchemy writes this as `(SELECT).join(table, expr)`.

- **Table:** The table to join with.
- **Expr:** The 'on' expression, usually a column comparison. It can be omitted when a single foreign key links the two tables.

For example, columns from both tables can be selected first, with the join attached afterwards:

```python
query = sa.select(Customers.c['name'], Orders.c['id'].label('order_id'))
query = query.join(Orders, Customers.c['id'] == Orders.c['customer_id'])
```

**Remember:** `JOIN` and `INNER JOIN` are the same thing.


In [None]:
query = sa.select(
    Customers.c['name'], 
    Orders.c['id'].label('order_id')
)
query = query.join(Orders, Customers.c['id'] == Orders.c['customer_id'])
print(str(query))

In [None]:
query = (
    Orders.select()
    .join(Customers, Customers.c['id'] == Orders.c['customer_id'])
)
with logs(), con.begin():
    for row in con.execute(query):
        print(row)
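
When a single foreign key links the two tables, the ON clause can be left out and SQLAlchemy derives it from the `ForeignKey`. A compile-only sketch with hypothetical stand-ins for `Customers` and `Orders`:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-ins for the notebook's Customers and Orders tables
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)
orders = sa.Table(
    "orders", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("customer_id", sa.Integer, sa.ForeignKey("customers.id"), nullable=False),
)

# No ON expression given: it is inferred from orders.customer_id -> customers.id
query = sa.select(customers.c.name, orders.c.id.label("order_id")).join(orders)
sql = str(query)
print(sql)
```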

## Left Join

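The Left Outer Join returns every row of the left table, extended with the matching data from the other table; where there is no match, the other table's columns are `NULL`.<br>
The syntax is similar to a regular join: `join(table, expr, isouter=True)`, or the shorthand `outerjoin(table, expr)`.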

**Remember:** `LEFT JOIN` and `LEFT OUTER JOIN` are the same thing.

In [None]:
query = (
    Customers.select()
    .join(Orders, Customers.c.id == Orders.c.customer_id, isouter=True)
)
with logs(), con.begin():
    for row in con.execute(query):
        print(row)
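
To make the `NULL` filling visible, this sketch (hypothetical tables, in-memory SQLite) gives only one of two customers an order and uses the `outerjoin` shorthand:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-ins for the notebook's Customers and Orders tables
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)
orders = sa.Table(
    "orders", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("customer_id", sa.Integer, sa.ForeignKey("customers.id"), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

query = (
    sa.select(customers.c.name, orders.c.id.label("order_id"))
    # Equivalent to .join(orders, ..., isouter=True): a LEFT OUTER JOIN
    .outerjoin(orders, customers.c.id == orders.c.customer_id)
    .order_by(customers.c.name)
)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])
    conn.execute(orders.insert(), {"customer_id": 1})  # only Alice has an order
    rows = [tuple(r) for r in conn.execute(query)]

print(rows)
```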

## Outer Join
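`FULL JOIN` and `FULL OUTER JOIN` are the same thing: rows from both tables are kept, with `NULL`s wherever there is no match.

`Select.join(table, expr, full=True)`<br>
`Select.outerjoin(table, expr, full=True)`

Without `full=True`, `outerjoin` produces a `LEFT OUTER JOIN`. SQLite only supports `FULL OUTER JOIN` from version 3.39.0.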

In [None]:
query = (
    sa.select(Customers, Orders)
    # FULL OUTER JOIN requires SQLite 3.39.0 or newer
    .join(Orders, Customers.c['id'] == Orders.c['customer_id'], full=True)
)
with logs(), con.begin():
    for row in con.execute(query):
        print(row)

## Cross Join / Cartesian Product
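A cross join pairs every row of one table with every row of the other.<br>
In SQLAlchemy Core this can be written as a join on an always-true condition, ``join(table, sa.true())``. Selecting from two tables without any join, e.g. ``sa.select(Tbl1, Tbl2)``, also produces a Cartesian product (`FROM tbl1, tbl2`), although SQLAlchemy may warn about it.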
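
A sketch of the always-true join, using two small hypothetical tables so the row multiplication is easy to check. `select_from` states the join explicitly, so SQLAlchemy does not have to infer the left side.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical tables, only used to illustrate the Cartesian product
sizes = sa.Table("sizes", metadata, sa.Column("size", sa.String(10)))
flavours = sa.Table("flavours", metadata, sa.Column("flavour", sa.String(20)))

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(sizes.insert(), [{"size": "S"}, {"size": "L"}])
    conn.execute(flavours.insert(), [{"flavour": "Vanilla"}, {"flavour": "Mint"}, {"flavour": "Cocoa"}])

    # JOIN ... ON <always true>: every size is paired with every flavour
    query = (
        sa.select(sizes.c.size, flavours.c.flavour)
        .select_from(sizes.join(flavours, sa.true()))
        .order_by(sizes.c.size, flavours.c.flavour)
    )
    pairs = [tuple(r) for r in conn.execute(query)]

print(len(pairs), pairs[:3])
```

Two sizes times three flavours gives six rows.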

# Union

Remember that `UNION ALL` keeps duplicate records, whereas `UNION` removes them.

In [None]:
query_a = sa.select(Products.columns.id.label('x')).where(Products.columns.id == 1)
query_b = sa.select(Products.columns.id.label('x')).where(Products.columns.id == 1)

with logs(), con.begin():
    for product in con.execute(sa.union_all(query_a, query_b)).all():
        print(product)
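
Since `query_a` and `query_b` above return the same row, they make a convenient check of the difference. A self-contained sketch with a hypothetical products table:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# Two identical single-row queries
query_a = sa.select(products.c.id.label("x")).where(products.c.id == 1)
query_b = sa.select(products.c.id.label("x")).where(products.c.id == 1)

with engine.begin() as conn:
    conn.execute(products.insert(), [{"name": "Cookie"}, {"name": "Ice Cream"}])
    # UNION removes the duplicate row, UNION ALL keeps it
    union_rows = conn.execute(sa.union(query_a, query_b)).all()
    union_all_rows = conn.execute(sa.union_all(query_a, query_b)).all()

print(len(union_rows), len(union_all_rows))
```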

# Except / Intersect
The `except` and `intersect` clauses can be used to reduce the results of a selection.<br>
The syntax is the same as the UNION syntax, and follows the same rules:
1. The number of columns must be the same.
2. The columns must be of the same type.

- [Microsoft SQL](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-ver16)
- [SQLite](https://www.sqlite.org/lang_select.html)
- [SQLAlchemy](https://docs.sqlalchemy.org/en/20/core/selectable.html#sqlalchemy.sql.expression.except_)

In short:

`<A> EXCEPT <B>` will take the result of A and remove entries also found in B.

`<A> INTERSECT <B>` will take the result of A and remove entries not found in B.

In SQLAlchemy, these two are not written exactly the same:

The `except_` method comes with an underscore at the end (because `except` is a Python keyword), while `intersect` does not.

In [None]:
with logs(), con.begin():
    A = sa.select(Products.c.id)
    B = sa.select(Products.c.id).where(Products.c.id > 3)
    query = A.except_(B)
    for product in con.execute(query):
        print(product)
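
With the notebook's data, `B` is empty, so the cell above returns every id. This sketch (hypothetical products table with five rows) shows both operators returning something, and uses both the method form and the function form:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

a = sa.select(products.c.id)
b = sa.select(products.c.id).where(products.c.id > 3)

with engine.begin() as conn:
    conn.execute(products.insert(), [{"name": f"Product {i}"} for i in range(1, 6)])
    # Method form; note the trailing underscore on except_
    only_in_a = sorted(conn.execute(a.except_(b)).scalars())
    # Function form; sa.except_(a, b) exists as well
    in_both = sorted(conn.execute(sa.intersect(a, b)).scalars())

print(only_in_a, in_both)
```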

# Subquery
Not to be confused with Common Table Expressions and IN (...) expressions.<br>
When columns need to be selected from the result of another select statement, that inner statement is a subquery.<br>
Subqueries can be given a name for clarity, or will be called `anon_*` by default.

Once a subquery has been made, treat it like any regular table.
Its columns can be accessed via `.c` or `.columns`.


In [None]:
highest_id = sa.func.max(Customers.c.id).label('maxed')
sub = sa.select(highest_id).subquery(name='x')
query = sa.select(sub.c.maxed)

with logs(), con.begin():
    result = con.execute(query)
    for row in result:
        print(row)
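
Treating a subquery "like any regular table" includes joining it. A sketch with hypothetical stand-ins for `Customers` and `Orders`, where a named subquery counts orders per customer and is joined back to the customers:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-ins for the notebook's Customers and Orders tables
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)
orders = sa.Table(
    "orders", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("customer_id", sa.Integer, sa.ForeignKey("customers.id"), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# Named subquery: number of orders per customer
order_counts = (
    sa.select(orders.c.customer_id, sa.func.count().label("n_orders"))
    .group_by(orders.c.customer_id)
    .subquery("order_counts")
)

# The subquery is joined like a table; its columns live on .c
query = (
    sa.select(customers.c.name, order_counts.c.n_orders)
    .join(order_counts, order_counts.c.customer_id == customers.c.id)
    .order_by(customers.c.name)
)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])
    conn.execute(orders.insert(), [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}])
    rows = [tuple(r) for r in conn.execute(query)]

print(rows)
```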

# Having
The 'having' clause is often used to filter the groups produced by a group by.<br>
For example: "select customers having at least 20 orders."

The filter usually involves calculated (aggregate) columns, such as a count.


In [None]:
order_count = sa.func.count(Orders.c.id).label('order_count')

query = (
    sa.select(Customers, order_count)
    .join(Orders, Orders.c.customer_id == Customers.c.id)
    .group_by(Customers.c.id)
    .having(order_count > 5)
)

with rollback(con):
    customer_id = con.execute(sa.insert(Customers), {'id': 9, 'name': 'Jack'}).inserted_primary_key[0]
    for i in range(8):
        con.execute(sa.insert(Orders), {'customer_id': customer_id})

    with logs():
        for row in con.execute(query).mappings():
            print(row)


In [None]:
# Window

In [None]:
# UNION (ALL)

In [None]:


# WITH (expr) -> Common Table Expression (cte)
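
A common table expression is built much like a subquery, but with `.cte(name)` instead of `.subquery(name)`, and it is rendered up front as `WITH name AS (...)`. A sketch with hypothetical stand-ins for `Customers` and `Orders`:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-ins for the notebook's Customers and Orders tables
customers = sa.Table(
    "customers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)
orders = sa.Table(
    "orders", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("customer_id", sa.Integer, sa.ForeignKey("customers.id"), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# Named CTE: number of orders per customer
order_counts = (
    sa.select(orders.c.customer_id, sa.func.count().label("n_orders"))
    .group_by(orders.c.customer_id)
    .cte("order_counts")
)

query = (
    sa.select(customers.c.name, order_counts.c.n_orders)
    .join(order_counts, order_counts.c.customer_id == customers.c.id)
    .order_by(customers.c.name)
)
sql = str(query)  # begins with: WITH order_counts AS (...)

with engine.begin() as conn:
    conn.execute(customers.insert(), [{"name": "Alice"}, {"name": "Bob"}])
    conn.execute(orders.insert(), [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}])
    rows = [tuple(r) for r in conn.execute(query)]

print(sql)
print(rows)
```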

# Streaming

In [None]:
# yield_per
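
A sketch of the `yield_per` execution option in Core, assuming SQLAlchemy 2.0 (where `Connection.execute` accepts `execution_options`). It asks for rows in batches instead of all at once, and `Result.partitions()` then hands them out batch by batch. On drivers with server-side cursors it also enables `stream_results`; the in-memory SQLite table here is hypothetical.

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(products.insert(), [{"name": f"Product {i}"} for i in range(10)])

    # Fetch at most 4 rows at a time from the cursor
    result = conn.execute(
        sa.select(products.c.name).order_by(products.c.id),
        execution_options={"yield_per": 4},
    )
    # partitions() yields lists of rows, sized by yield_per
    batch_sizes = [len(batch) for batch in result.partitions()]

print(batch_sizes)
```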

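# With Hint

`Select.with_hint()` attaches a table hint that is only rendered for the named dialect; compiling the same query for another dialect leaves the hint out.
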
In [None]:
from sqlalchemy.dialects import mssql, sqlite

query = sa.select(Products)
query = query.with_hint(Products, text='WITH(NOLOCK)', dialect_name='mssql')
print('Microsoft SQL Server:')
print(str(query.compile(dialect=mssql.dialect())).replace('\n', ''))
print('SQLite:')
print(str(query.compile(dialect=sqlite.dialect())).replace('\n', ''))


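# Row Number over Partition By

`sa.func.row_number().over(partition_by=..., order_by=...)` numbers the rows within each partition, restarting at 1 for every partition. SQLite supports window functions from version 3.25.0.
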
In [None]:
query = sa.select(
            Products.c.name,
            # Partitioning by the primary key gives every row its own partition, so X is always 1
            sa.func.row_number().over(partition_by=Products.c.id, order_by=Products.c.name).label('X')
)

print(query)
with con.begin():
    for row in con.execute(query).mappings():
        print(row)
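
Partitioning by a column that actually repeats makes the numbering meaningful. A sketch on hypothetical order lines, ranking the lines of each order by quantity:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's OrderLines table (foreign keys omitted)
order_lines = sa.Table(
    "orderlines", metadata,
    sa.Column("order_id", sa.Integer, primary_key=True),
    sa.Column("product_id", sa.Integer, primary_key=True),
    sa.Column("quantity", sa.Float, nullable=False),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# Numbering restarts at 1 for every order_id; the largest quantity comes first
rank = sa.func.row_number().over(
    partition_by=order_lines.c.order_id,
    order_by=order_lines.c.quantity.desc(),
).label("rank_in_order")

query = (
    sa.select(order_lines.c.order_id, order_lines.c.product_id, rank)
    .order_by(order_lines.c.order_id, rank)
)

with engine.begin() as conn:
    conn.execute(order_lines.insert(), [
        {"order_id": 1, "product_id": 1, "quantity": 2},
        {"order_id": 1, "product_id": 2, "quantity": 5},
        {"order_id": 2, "product_id": 1, "quantity": 3},
    ])
    rows = [tuple(r) for r in conn.execute(query)]

print(rows)
```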

# Alchemy: Text

The `text()` construct inserts a raw SQL fragment into a statement as-is.

The `op` method can be used for custom **op**erations.<br>
Using `op` often implies that a dialect-specific feature isn't available in SQLAlchemy's implementation of the dialect.<br>
In turn, SQLAlchemy will not validate it and will *try* to pass it through as-is.

For example, a raw 'IN' syntax can be performed like this:

In [None]:
all_ids = sa.text('(1,2,3,4)')
query = sa.select(Customers).where(Customers.c.id.op('IN')(all_ids))
with logs(), con.begin():
    con.execute(query)

Because the content of `sa.text` is not validated, it could be replaced with anything, like the Python-style `sa.text("range(1,10)")`.<br>
SQLite does not support that syntax, so it will only error when the database executes the statement (server-side), not during client-side validation.

In [None]:
print(str(sa.text('xyz as abc')))
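
`sa.text` can also hold a complete statement. Values are then best passed as named `:parameters` rather than pasted into the string. A sketch against a hypothetical products table:

```python
import sqlalchemy as sa

metadata = sa.MetaData()
# Hypothetical stand-in for the notebook's Products table
products = sa.Table(
    "products", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255), nullable=False),
    sa.Column("price", sa.Float),
)

engine = sa.create_engine("sqlite://")
metadata.create_all(engine)

# :min_price is a bound parameter, filled in at execution time
query = sa.text("SELECT name FROM products WHERE price > :min_price ORDER BY name")

with engine.begin() as conn:
    conn.execute(products.insert(), [
        {"name": "Cookie", "price": 1},
        {"name": "Ice Cream", "price": 2},
        {"name": "Cake", "price": 3},
    ])
    names = conn.execute(query, {"min_price": 1}).scalars().all()

print(names)
```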