# Environment
[varname](https://github.com/pwwang/python-varname): retrieves the name of the variable that a value is being assigned to.

`pip install varname`

# Files Overview
- __./pysdql__: `pysdql` package that should be imported.
- __./docs/QueryStandard.md__: SDQL.py standard queries.
- __./docs/QueryGenerated.md__: pysdql generated queries.
- __./FlattenQuery.py__: generates the standard queries for `QueryStandard.md`.
- __./pandas2sdql.py__: the pandas queries behind `QueryGenerated.md`.
- __./pysdql/core/dtypes/sdql_ir.py__: a modified `sdql_ir`; only `__repr__` has been overridden.
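
To show what overriding only `__repr__` achieves, here is a minimal sketch. The three classes are hypothetical stand-ins written for this example, not the real `sdql_ir` definitions; the assumption is that each node prints itself as the constructor call that would rebuild it.

```python
class VarExpr:
    """Hypothetical stand-in for an IR variable node."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Constructor-style text, so print() shows code that rebuilds the node.
        return f"VarExpr({self.name!r})"


class ConstantExpr:
    """Hypothetical stand-in for an IR constant node."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"ConstantExpr({self.value!r})"


class LetExpr:
    """Hypothetical stand-in for an IR let-binding node."""

    def __init__(self, var, value, body):
        self.var, self.value, self.body = var, value, body

    def __repr__(self):
        # Nested nodes are rendered through their own __repr__.
        return f"LetExpr({self.var!r}, {self.value!r}, {self.body!r})"


expr = LetExpr(VarExpr("out"), ConstantExpr(1), ConstantExpr(True))
text = repr(expr)
print(text)
```

Because only the printed form changes, the nodes keep whatever behaviour they had before; the override just makes a whole query readable as text.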

# Usage

__You can get a query directly as a string by calling `pysdql.q1()`.__

import pysdql

print(pysdql.q1())

li = VarExpr('db->li_dataset')
x_li = VarExpr('x_li')
li_part = VarExpr('li_part')
li_having = VarExpr('li_having')
x_li_groupby_agg = VarExpr('x_li_groupby_agg')
out = VarExpr('out')
li_groupby_agg = VarExpr('li_groupby_agg')
li_groupby_agg_concat = VarExpr('li_groupby_agg_concat')

query = LetExpr(li_groupby_agg, SumExpr(x_li, li, IfExpr(CompareExpr(CompareSymbol.LTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shipdate'), ConstantExpr(19980902)), DicConsExpr([(RecConsExpr([('l_returnflag', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_returnflag')), ('l_linestatus', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_linestatus'))]), RecConsExpr([('sum_qty', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_quantity')), ('sum_base_price', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_extendedprice')), ('sum_disc_price', MulExpr(RecAccessExpr(PairAccessExpr(x_li, 0), 'l_extendedprice'), SubExpr(ConstantExpr(1), RecAccessExpr(PairAccessExpr(x_li, 0), 'l_discount')))), ('sum_charge', MulExpr(MulExpr(RecAccessExpr

__If you only need to read the SDQL IR generated from the pandas queries, go to `./docs/QueryGenerated.md`.__

__Both `QueryStandard.md` and `QueryGenerated.md` are indented for readability. To check whether a difference between the standard and generated queries affects efficiency, compare the two files.__

# Features

I'm not sure whether these behaviours are drawbacks, so they are listed here simply as features. If a query goes wrong, this section is a good place to check first.

### 1. Sensitive to `float` and `integer`.

An integer literal in a pandas query, such as `0`, is always converted to `ConstantExpr(0)` rather than `ConstantExpr(0.0)`, even when a float could be inferred from the context. A mismatched data type always fails with an error instead of being promoted.
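
The distinction can be sketched as follows. `ConstantExpr` here is a hypothetical stand-in, and `to_constant` and `require_same_type` are illustrative helpers that do not exist in `pysdql`; the sketch only assumes that the literal's Python type is kept unchanged and that a mismatch is reported rather than promoted.

```python
class ConstantExpr:
    """Hypothetical stand-in for the IR constant node."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"ConstantExpr({self.value!r})"


def to_constant(literal):
    # The literal's Python type is kept as-is: 0 stays an int, 0.0 stays a float.
    return ConstantExpr(literal)


def require_same_type(left, right):
    # Assumed behaviour for illustration: a mismatch is an error, never a promotion.
    if type(left.value) is not type(right.value):
        raise TypeError(
            f"type mismatch: {type(left.value).__name__} "
            f"vs {type(right.value).__name__}"
        )
    return True


int_zero = to_constant(0)
float_zero = to_constant(0.0)
print(int_zero)    # ConstantExpr(0)
print(float_zero)  # ConstantExpr(0.0)

try:
    require_same_type(int_zero, float_zero)
except TypeError as exc:
    print("rejected:", exc)
```

Under these assumptions, writing `0.0` instead of `0` in the pandas query is the way to keep a comparison against a float column well typed.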

### 2. Redundant variables.

In some queries, such as `Q16`, a redundant variable `pa_ps_having = VarExpr('pa_ps_having')` is defined even though it is never used in the query.
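
One way to spot such leftovers is a purely textual check on the generated string. The helper `unused_vars` below is hypothetical (it is not part of `pysdql`), and the sample text is a toy written for this example, not the real `Q16` output. It assumes declarations have the form `name = VarExpr(...)` and that the query starts on a line beginning with `query =`.

```python
import re


def unused_vars(generated):
    """Return declared VarExpr names that never appear in the query text."""
    # Names bound on lines of the form `name = VarExpr(...)`.
    declared = re.findall(r"^(\w+)\s*=\s*VarExpr\(", generated, re.MULTILINE)
    # Everything after `query =` up to the end of the string.
    match = re.search(r"^query\s*=\s*(.*)$", generated, re.MULTILINE | re.DOTALL)
    query_text = match.group(1) if match else ""
    return [
        name
        for name in declared
        if not re.search(rf"\b{re.escape(name)}\b", query_text)
    ]


# Toy input for illustration only.
sample = """\
pa = VarExpr('db->pa_dataset')
x_pa = VarExpr('x_pa')
pa_ps_having = VarExpr('pa_ps_having')
out = VarExpr('out')

query = LetExpr(out, SumExpr(x_pa, pa, ConstantExpr(1)), ConstantExpr(True))
"""

print(unused_vars(sample))
```

The match is on whole words in the text, so a name that only occurs inside a string literal of the query would still count as used.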

### 3. `VarExpr("out")` is not just an assignment.

In most cases `out` is bound directly to the computation, as in `LetExpr(VarExpr("out"), SumExpr(...), ConstantExpr(True))`,

rather than to an already computed result, as in `LetExpr(VarExpr("out"), VarExpr("results"), ConstantExpr(True))`.
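
To tell the two shapes apart in a query string, one option is to parse the text with Python's standard `ast` module instead of evaluating it. The helper `out_binding_kind` is hypothetical and assumes the query is a single Python expression in which `out` appears either as `VarExpr("out")` or as a variable named `out`.

```python
import ast


def _is_out(node):
    # Matches either the bare name `out` or the call VarExpr("out").
    if isinstance(node, ast.Name):
        return node.id == "out"
    return (
        isinstance(node, ast.Call)
        and getattr(node.func, "id", None) == "VarExpr"
        and len(node.args) == 1
        and isinstance(node.args[0], ast.Constant)
        and node.args[0].value == "out"
    )


def out_binding_kind(query_text):
    """Return the constructor name of the value bound to `out`, or None."""
    tree = ast.parse(query_text, mode="eval")
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and getattr(node.func, "id", None) == "LetExpr"
            and len(node.args) == 3
            and _is_out(node.args[0])
        ):
            value = node.args[1]
            if isinstance(value, ast.Call):
                return getattr(value.func, "id", None)
            return type(value).__name__
    return None


plain = 'LetExpr(VarExpr("out"), VarExpr("results"), ConstantExpr(True))'
direct = 'LetExpr(VarExpr("out"), SumExpr(x, li, ConstantExpr(1)), ConstantExpr(True))'
print(out_binding_kind(plain))
print(out_binding_kind(direct))
```

Only the syntax tree is inspected, so none of the IR classes need to be importable for the check to run.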