# Environment
[varname](https://github.com/pwwang/python-varname): Get the name of the variable from the variable assignment.

`pip install varname`

# Files Overview
- __./pysdql__: `pysdql` package that should be imported.
- __./docs/QueryStandard.md__: SDQL.py standard queries.
- __./docs/QueryGenerated.md__: pysdql generated queries.
- __./FlattenQuery.py__: generate standard queries for `QueryStandard.md`.
- __./pandas2sdql.py__: pandas queries for `QueryGenerated.md`.
- __./pysdql/core/dtypes/sdql_ir.py__: modified `sdql_ir`; only `__repr__` has been overridden.

# Usage

__You can get a query directly as a string by calling `pysdql.q1()`.__

import pysdql

print(pysdql.q1())

li = VarExpr('db->li_dataset')
x_li = VarExpr('x_li')
li_part = VarExpr('li_part')
li_having = VarExpr('li_having')
x_li_groupby_agg = VarExpr('x_li_groupby_agg')
out = VarExpr('out')
li_groupby_agg = VarExpr('li_groupby_agg')
li_groupby_agg_concat = VarExpr('li_groupby_agg_concat')

query = LetExpr(li_groupby_agg, SumExpr(x_li, li, IfExpr(CompareExpr(CompareSymbol.LTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shipdate'), ConstantExpr(19980902)), DicConsExpr([(RecConsExpr([('l_returnflag', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_returnflag')), ('l_linestatus', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_linestatus'))]), RecConsExpr([('sum_qty', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_quantity')), ('sum_base_price', RecAccessExpr(PairAccessExpr(x_li, 0), 'l_extendedprice')), ('sum_disc_price', MulExpr(RecAccessExpr(PairAccessExpr(x_li, 0), 'l_extendedprice'), SubExpr(ConstantExpr(1), RecAccessExpr(PairAccessExpr(x_li, 0), 'l_discount')))), ('sum_charge', MulExpr(MulExpr(RecAccessExpr

__If__ you __only need to read the SDQL IR__ generated from the pandas queries, __go to `./docs/QueryGenerated.md`__.

__If__ you __need to compare the two versions__ to see whether the generated query has any impact on efficiency, compare __`QueryStandard` and `QueryGenerated`__. Both are indented for readability.

If you would like to __change a particular query__, there are two ways:

1. Go to `./pysdql/core/query/__init__.py` and modify the function for that query (the one called as, for example, `pysdql.q1()`).
2. Go to `./pandas2sdql.py` and modify the script. You then have to run the script and read the output on the terminal (stdout).

If you are using the script, you might find debug messages on the terminal.

As shown in the following example, the optimized output will be given under `>> li Optimizer Output <<`. 

Search `Optimizer Output` to find it.

from pysdql import DataFrame
from pandas2sdql import q6

li = DataFrame()
q6(li)

>> li Columns(In) <<
['l_orderkey', 'l_partkey', 'l_suppkey', 'l_linenumber', 'l_quantity', 'l_extendedprice', 'l_discount', 'l_tax', 'l_returnflag', 'l_linestatus', 'l_shipdate', 'l_commitdate', 'l_receiptdate', 'l_shipinstruct', 'l_shipmode', 'l_comment']
>> li Columns(Out) <<
['l_orderkey', 'l_partkey', 'l_suppkey', 'l_linenumber', 'l_quantity', 'l_extendedprice', 'l_discount', 'l_tax', 'l_returnflag', 'l_linestatus', 'l_shipdate', 'l_commitdate', 'l_receiptdate', 'l_shipinstruct', 'l_shipmode', 'l_comment']
>> li Columns(Used) <<
['l_extendedprice', 'l_discount', 'l_shipdate', 'l_quantity']
>> li Context Variables <<
{'li': li, 'x_li': x_li, 'li_part': li_part}
>> li Operation Sequence <<
{'iter': False, 'op_type': <class 'pysdql.core.dtypes.CondExpr.CondExpr'>, 'op': MulExpr(MulExpr(MulExpr(MulExpr(CompareExpr(CompareSymbol.GTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shipdate'), ConstantExpr(19940101)), CompareExpr(CompareSymbol.LT, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shi

"li = VarExpr('db->li_dataset')\nx_li = VarExpr('x_li')\nli_part = VarExpr('li_part')\nli_having = VarExpr('li_having')\nout = VarExpr('out')\n\nquery = LetExpr(out, SumExpr(x_li, li, IfExpr(MulExpr(MulExpr(MulExpr(MulExpr(CompareExpr(CompareSymbol.GTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shipdate'), ConstantExpr(19940101)), CompareExpr(CompareSymbol.LT, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_shipdate'), ConstantExpr(19950101))), CompareExpr(CompareSymbol.GTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_discount'), ConstantExpr(0.05))), CompareExpr(CompareSymbol.LTE, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_discount'), ConstantExpr(0.07))), CompareExpr(CompareSymbol.LT, RecAccessExpr(PairAccessExpr(x_li, 0), 'l_quantity'), ConstantExpr(24))), DicConsExpr([('revenue', MulExpr(RecAccessExpr(PairAccessExpr(x_li, 0), 'l_extendedprice'), RecAccessExpr(PairAccessExpr(x_li, 0), 'l_discount')))]), EmptyDicConsExpr()), False), ConstantExpr(True))"
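
Because the debug messages go to stdout, the optimizer section can also be pulled out programmatically instead of searched by eye. The following is only a minimal sketch using the standard library: `extract_section` and `fake_query` are hypothetical helpers that are not part of pysdql, and the sketch assumes every section starts with a `>> <name> <title> <<` header line, as in the output above.

```python
import contextlib
import io
import re

# Header lines in the debug output look like ">> li Optimizer Output <<".
HEADER = re.compile(r"^>> (?P<title>.+) <<$")


def extract_section(text, title_suffix="Optimizer Output"):
    """Return the lines under the first header whose title ends with title_suffix."""
    collected = []
    inside = False
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            if inside:
                break  # the next header closes the section we were collecting
            inside = match.group("title").endswith(title_suffix)
            continue
        if inside:
            collected.append(line)
    return "\n".join(collected).strip()


def fake_query():
    """Hypothetical stand-in for a query function that prints debug messages."""
    print(">> li Columns(Used) <<")
    print("['l_extendedprice', 'l_discount']")
    print(">> li Optimizer Output <<")
    print("query = LetExpr(out, ..., ConstantExpr(True))")


# Capture everything the function prints, then keep only the optimizer section.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    fake_query()

optimized = extract_section(buffer.getvalue())
print(optimized)
```

In real use, `fake_query()` would be replaced by a call such as `q6(li)` from the example above; whether the real output always follows this header layout is an assumption.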

# Features

I'm not sure whether these count as drawbacks, so they are listed simply as features. If something goes wrong, this section is a good place to come back to.

### 1. Sensitive to `float` and `integer`.

If an integer such as `0` is given in a pandas query, it is always converted to `ConstantExpr(0)` rather than `ConstantExpr(0.0)`, even when a float could be inferred from the context. A mismatched data type always triggers an assertion error rather than being promoted.
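
To illustrate the behaviour, here is a toy model rather than pysdql's actual code: `ConstantExpr` below is a hypothetical stand-in that only stores its value, and `check_same_type` is an invented helper that mimics a strict type assertion. It suggests why writing `0.0` instead of `0` in the pandas query matters when the other operand is a float.

```python
class ConstantExpr:
    """Hypothetical stand-in for the IR constant node; it only stores a value."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"ConstantExpr({self.value!r})"


def check_same_type(left, right):
    """Invented helper: fail on a type mismatch instead of promoting int to float."""
    assert type(left.value) is type(right.value), (
        f"type mismatch: {left!r} vs {right!r}"
    )


# The literal written in the query decides the constant's type.
print(repr(ConstantExpr(0)))
print(repr(ConstantExpr(0.0)))

# Two floats pass the strict check.
check_same_type(ConstantExpr(0.0), ConstantExpr(0.05))

# An int next to a float is rejected rather than promoted.
try:
    check_same_type(ConstantExpr(0), ConstantExpr(0.05))
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

print(mismatch_detected)
```

In practice this means spelling float literals explicitly (for example `0.0`) in the pandas query whenever the column being compared holds floats.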

### 2. Redundant variables.

In some queries, such as `Q16`, a redundant variable `pa_ps_having = VarExpr('pa_ps_having')` is defined even though it is never used in the query.

### 3. `VarExpr("out")` is not just assignment.

Instead of binding a previously computed result, as in `LetExpr(VarExpr("out"), VarExpr("results"), ConstantExpr(True))`,

`LetExpr(VarExpr("out"), SumExpr(...), ConstantExpr(True))` is used in most cases, so `out` is bound directly to the final expression.