In [ ]:
#;.pykx.disableJupyter()

In [ ]:
# https://code.kx.com/pykx/3.0/examples/jupyter-integration.html#q-first-mode
import pykx as kx
kx.util.jupyter_qfirst_enable()

##### Initialization

In [None]:
/insert in init.q 
\l buildtaq.q
\l ./db/taq

# Queries - qSQL 
##### Learning Objectives

To understand:
* How to construct a qSQL query
* The four different qSQL queries - `select`,`exec`,`update` and `delete`
* Building queries with constraints
* Building queries with aggregations
* Building queries with grouping
* Updating existing data
* Deleting existing data
* Using `fby` 

# Introduction 

The most common method of table querying and manipulation is qSQL, an SQL-like syntax built into the q language.

There are four fundamental actions qSQL allows us to use with a table:
* [`select`](https://code.kx.com/q/ref/select/) - choose data from a table
* [`exec`](https://code.kx.com/q/ref/exec/) - return data from a table, in a non-table format
* [`update`](https://code.kx.com/q/ref/update/) - perform some modification on a table
* [`delete`](https://code.kx.com/q/ref/delete/) - remove data from a table 


## Data

The tables that are used throughout this notebook comprise some [partitioned](https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#14634-partitioned-tables) tables (<code>\`trade</code>,<code>\`quote</code> and <code>\`nbbo</code>),  and some [flat](https://www.tutorialspoint.com/kdbplus/q_tables_on_disk.htm) tables (<code>\`daily</code>,<code>\`depth</code> and <code>\`mas</code>) which are stored locally to this Queries module in a folder called db/taq.

In [None]:
tables[]

In [None]:
tables[]! count each value each tables[]          //A quick shortcut to see each table and the associated table counts 

Let's look at the schema of the `trade` and `daily` tables:

In [None]:
meta trade
meta daily

#  Choosing data from a table - `select` 

The qSQL `select` statement can be used to return data from a table, select particular columns, aggregate and/or filter data where necessary.

## Syntax

The `select` template has the following form:

    select <return columns> by <grouping columns> from <table> where <filter conditions>

The most basic qSQL `select` statement is the below:

In [None]:
select from daily       //returns all the records in the daily table
daily~select from daily //this is the same as calling the table as a variable 

## Virtual column 

##### Virtual column `i`

In addition to existing and computed columns, a virtual column `i` exists which maps to the record index within the table (for a partitioned table such as `trade`, the index within each partition). We call this column virtual because it is not visible in the `meta` of the table, but we can use it as we would any other column.

In [None]:
select i from trade      
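
As a minimal sketch of how `i` behaves, the cell below uses a small hypothetical in-memory table `t` (not part of the course database) so the results are easy to inspect:

```q
// t is a hypothetical table used only for illustration
t:([] sym:`A`B`C; px:1 2 3f)
select i, sym from t          // the index is returned in a column named x
select from t where i>0       // i can be used in a constraint like any other column
select rows:count i from t    // counting i is a common way to count records
```

Note that when `i` is selected without a name, q labels the resulting column `x`.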

## Queries with specified return - the `select` clause


We can use `select` to return a subset of the columns within a table, or to create new columns. 

In [None]:
select date, sym, open, size from daily //selecting a subset of columns 

We can use assignment within our statement to rename the resultant columns too: 

In [None]:
//we can pick and choose which to rename
select dt: date, stock:sym, open, sz: size from daily 

We can also create new columns on the fly, e.g. a new column called `mid` which is the midpoint of our `high` and `low` prices:

In [None]:
select date, sym, high, low, mid: 0.5*high+low from daily
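
The `mid` expression relies on q evaluating expressions from right to left, so `0.5*high+low` means `0.5*(high+low)`. A minimal sketch on a hypothetical two-row table `t` (the values are made up for illustration):

```q
// t is a hypothetical table used only for illustration
t:([] high:10 20f; low:6 10f)
// right-to-left evaluation: high+low is computed first, then halved
select mid:0.5*high+low from t           // mid is 8 15
// explicit parentheses change the meaning to (0.5*high)+low
select notMid:(0.5*high)+low from t      // notMid is 11 20
```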

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> The newly created column can't be referenced later within the same query as the column does not actually exist until the final result table is returned.</i></p>

In [None]:
//example - this will error with 'mid as kdb+/q doesn't know what this is yet
select date, sym, high, low, mid: 0.5*high+low, mid+high from daily

Creating a column doesn't mean that it permanently exists in the table. From the below query, we can see that our new column `mid` doesn't remain in our `daily` table. 

In [None]:
daily 
meta daily

If we want to keep the result, we can assign it to a new variable:

In [None]:
daily2:select date, sym, high, low, mid: 0.5*high+low from daily
daily2

##### Exercise

Extract the `sym`, `close` and `size` columns from our `daily` table. 

In [None]:
select sym, close, size from daily

In [None]:
//your answer here 

##### Exercise 
Extract the same columns, but this time add a new boolean column called `Asym` which is true when the sym starts with an `"A"` and false otherwise. Assign this output to a new table `aDaily`.

In [None]:
aDaily:select sym, close, size, Asym:sym like "A*" from daily //we can evaluate any q expressions we like here!
aDaily

In [None]:
//your answer here 

### Querying with aggregations 
The columns of a table are lists, and we can perform aggregations and other functions or analytics using them like we can any list. 

In [None]:
select sum size,sum price from trade

##### Exercise 

Return the maximum price and average trade size from the trade table 

In [None]:
select max price, avg size from trade

In [None]:
//your answer here 

## Queries with constraints - the `where` clause

The `where` clause in qSQL allows us to specify conditions and filter our data accordingly. 

Suppose we want to select only the records associated with Apple. We can add this as a condition using the `where` clause:

In [None]:
select from daily where sym=`AAPL

The `where` statement can contain any number of constraints separated by commas:

In [None]:
select from trade where sym =`AAPL, size > 70, date = 2020.01.02  //looking at our bigger trade table now 

In [None]:
\t:10 select from trade where sym =`AAPL, size > 70, date = 2020.01.02 // This will take significantly more time
\t:10 select from trade where date = 2020.01.02, sym =`AAPL, size > 70 // This query is more efficient

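Always use `,` instead of `and` in the `where` clause. Comma-separated constraints are applied one after another, each only to the rows that passed the previous one, whereas `and` evaluates both conditions on every row. Order matters too: on a partitioned table, constrain the partition column (`date` here) first, as the timings above show.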

In [None]:
//performance comparison using and instead of ,
\t:10 select from trade where date = 2020.01.02, sym =`AAPL, size > 70         //the "right" way
\t:10 select from trade where (date = 2020.01.02) and  sym =`AAPL, size > 70   //the "wrong" way 
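
The difference between `,` and `and` is easier to see on a small example. The sketch below uses a hypothetical in-memory table `t`; both forms return the same rows, but the comma form evaluates `size>70` only on the rows where `sym` is `AAPL`:

```q
// t is a hypothetical table used only for illustration
t:([] sym:`AAPL`AAPL`IBM`IBM; size:50 90 80 100)
// comma form: the second constraint only sees rows that passed the first
a:select from t where sym=`AAPL, size>70
// and form: both comparisons run over every row, then the results are combined
// (the parentheses are needed because q evaluates right to left)
b:select from t where (sym=`AAPL) and size>70
a~b      // 1b - same result, different amount of work
```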

##### Exercise
Find all trades (using the `trade` table) associated with Dell (<code>\`DELL</code>) where the price is greater than 12.

In [None]:
select from trade where sym=`DELL,price > 12

In [None]:
//your answer here

##### Exercise

Write a select query using our `trade` table to find the volume-weighted average price (vwap) for the Google (<code>\`GOOG</code>) stock

Suggested reading: [wavg](https://code.kx.com/q/ref/avg/#wavg)

In [None]:
select vwap:size wavg price from trade where sym=`GOOG

In [None]:
// Enter your qSQL code here

## Queries with grouping - the `by` clause

The easiest way to obtain data summarized by grouping similar values together is to use the `by` clause.

In [None]:
select size by sym from daily 
select max size by sym from daily   //performing an aggregation on the list

We see that the returned tables are keyed - this is often helpful for quick retrieval.

In [None]:
(select max size by sym from daily)`IBM    //getting the max size for IBM

We can also use our own defined functions on these lists, e.g. to return the last five days' closing prices:

In [None]:
last5:{-5 sublist raze x}
select last5DaysClose:last5 close by sym from daily

<img src="../qbies.png" width="50px" style="width: 50px;padding-right:5px;padding-top:10px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> A neat feature of the <code>by</code> clause: if we don't specify any columns to return, we get the last record in the table for each group!</i></p>

In [None]:
select by sym from daily   //very convenient for quick inspections!
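
To see what "last record per group" means in practice, here is a minimal sketch on a hypothetical in-memory table `t`, comparing the shorthand with an explicit `last` aggregation:

```q
// t is a hypothetical table used only for illustration
t:([] sym:`A`B`A`B; px:1 2 3 4)
select by sym from t            // keyed on sym; px holds the last value per sym (3 and 4)
select last px by sym from t    // the same idea written out explicitly
```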

##### Exercise 
Write a select statement that returns from our `trade` table the maximum and minimum prices and total number of trades (`numTrades`) broken down by `sym`.

In [None]:
select max price, min price, numTrades:count size  by sym from trade //we can count any column in our table not just i

In [None]:
//your answer here 

##### Exercise 
Write a select statement to recreate our `daily` table from our `trade` table. 

This has the open, high, low, close prices, a price column calculated as size x price, and size as the total traded volume for each sym on every date. Assign this value to `daily2` and verify it matches the `daily` table. 

*(Just this once we'll allow not using a where clause on a partitioned table!)*

In [None]:
//let's look first at what we're trying to reproduce
meta daily
daily

In [None]:
//so we need to recreate this - it's broken down by sym and date so they'll be our by clause 
daily2:select open:first price, high: max price, low: min price, close: last price,  //OHLC prices
            price:sum price*size, size:sum size      //total price as a cost (price*size) and total traded volume
//next our grouping clause - break down by date, then sym
        by date,sym                                  
        from trade 
//does this look the same? 
daily2

In [None]:
daily2: 0!daily2            //removing our key since daily isn't keyed
daily2~daily

In [None]:
//your answer here

### Temporal arithmetic

One of the most common uses of the `by` clause within qSQL is to return aggregations over a specified period of time.


In [None]:
select trds:count i, vwap:size wavg price by sym, 15 xbar time.minute from trade where date = last date 
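
The bucketing above is done by `xbar`, which rounds each value down to the nearest multiple of its left argument. A minimal sketch, first on a plain list and then on a hypothetical in-memory table `t` with made-up times:

```q
// xbar rounds each value down to the nearest multiple of 5
5 xbar 3 7 10 14                 // 0 5 10 10
// t is a hypothetical table used only for illustration
t:([] time:09:01:00.000 09:07:00.000 09:16:00.000 09:29:00.000; size:10 20 30 40)
// 15-minute buckets: the first two trades fall in 09:00, the last two in 09:15
select sum size by 15 xbar time.minute from t
```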

##### Exercise
* Show the total volume every 1.5 minutes from our trade table on the 2nd of Jan 2020
* Further break this down by sym

(Hint: the [`xbar`](https://code.kx.com/q/ref/xbar/) documentation has a domain and range mapping table at the end to help understand which types work together)

In [None]:
select sum size by `time$0D00:01:30.000 xbar `timespan$time from trade where date = 2020.01.02

In [None]:
select sum size by `time$0D00:01:30.000 xbar `timespan$time, sym  from trade where date = 2020.01.02

In [None]:
//your answer here 

##### Exercise
Use `xbar` to generate a count of the number of trades (`trade where date = last date`) in intervals of trade size (interval size 10). 

(*This is commonly used to generate a histogram of trade size distribution*) 

In [None]:
select count i by 10 xbar size from trade where date = last date 

In [None]:
//your answer here

# Extracting data from tables - `exec`

The qSQL `exec` can also be used to query tables. All `exec` statements are written with the same `by`, `from`, and `where` clauses as `select` statements. However, instead of returning only tables, `exec` statements can return a list, a dictionary, or a table, depending on the specific query. They are used primarily to extract data from the table format, or to restructure our data (see Practical Guidance for pivoting using `exec`).

If we only specify one column to be returned from our `exec` statement this is returned as a list: 

In [None]:
exec size from daily 

If we specify more than one column, a dictionary of lists is returned:

In [None]:
exec size, price from daily    //this is nice because the dictionary values are lists 

If we add a grouping clause we get our values broken down by that grouping:

In [None]:
// returns a dictionary mapping each sym to the list of its prices
exec price by sym from daily

If we add more columns to be returned at this stage, we actually end up returning a dictionary where the keys are the broken down groupings and the value is a table with each column we selected as a column: 

In [None]:
exec 3 sublist price, 3 sublist size by sym from daily //sublisting for visibility

This is because what we are returning is a series of dictionaries for each of our groupings! 
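
The different return shapes can be checked with `type`. The sketch below uses a hypothetical in-memory table `t`:

```q
// t is a hypothetical table used only for illustration
t:([] sym:`A`B`A; px:1 2 3f; size:10 20 30)
type exec px from t           // 9h  - a simple float list
type exec px, size from t     // 99h - a dictionary keyed by column name
type exec px by sym from t    // 99h - a dictionary keyed by sym
type select px from t         // 98h - select returns a table
```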



In [None]:
exec sym from select sym from trade  //pulling the selection into memory, and then using exec 
exec sym from trade                  //can't do this on disk - there is really a sym list for each date

##### Exercise

Using the `daily` table, return the first `open` and last `close` prices for all symbols ending with "L".

Output the result as a dictionary, and also specifically as a keyed table. 

In [None]:
exec first open, last close by sym from daily where sym like "*L"  //not a keyed table, a dictionary
type exec first open, last close by sym from daily where sym like "*L"  //not a keyed table, a dictionary
type 0! exec first open, last close by sym from daily where sym like "*L"  //can't unkey this 


In [None]:
exec first open, last close by sym:sym from daily where sym like "*L" //other column names fine too 
type exec first open, last close by sym:sym from daily where sym like "*L" //keyed table - also a dictionary 
type 0! exec first open, last close by sym:sym from daily where sym like "*L" //can unkey this because it's a table

In [None]:
//your answer here 

# Updating/modifying table data - `update`

The qSQL `update` statement can be used to modify existing rows or add new columns to a table. All `update` statements are written with the same `by`, `from`, and `where` clauses as `select` and `exec` statements.

Suppose we wanted to change our price to be negative for all `AAPL` stocks - we can do that using update. 

In [None]:
5 sublist daily                                          //table before modification (sublisting for visibility)
5 sublist update neg[price] from daily where sym =`AAPL  //table after we make the price negative for AAPL

If we wanted to persist this change, we can pass the table by reference: 

In [None]:
update neg[price] from `daily where sym =`AAPL //we are returned the table reference as output when persisting
5 sublist daily                                //confirming our change is present
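
The difference between passing the table by value and by name is shown below as a minimal sketch on a hypothetical in-memory table `t`:

```q
// t is a hypothetical table used only for illustration
t:([] sym:`A`B; px:1 2f)
update px:2*px from t      // by value: returns a modified copy
t                          // the original is unchanged, px is still 1 2
update px:2*px from `t     // by name: modifies t in place and returns `t
t                          // px is now 2 4
```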

We can also use `update` to create new columns, and to do so on a grouped basis. For example, we can add a new column to our `daily` table showing the max size for each symbol:

In [None]:
show daily3:update maxTradeSize: max size by sym from daily
5 sublist select from daily3 where sym = `AAPL   //updated for all syms with their specific size max
5 sublist select from daily3 where sym = `DELL   //updated for all syms with their specific size max 

##### Exercise

Update the `daily` table to have a new column `mid` which is the midpoint of the high and low prices. Do this without modifying our original table.

In [None]:
update mid:0.5*high+low from daily

In [None]:
//your answer here 

##### Exercise
Persist a change to our `daily` table so that the `price` of all `DOW` records is halved.

In [None]:
update price:price*0.5 from `daily where sym =`DOW  //naming the column explicitly makes the target clear
select from daily where sym =`DOW

In [None]:
//your answer here 

# Remove data from table - `delete`
The qSQL `delete` can be used to remove whole rows or whole columns from a table. All `delete` statements specify either column names (to delete columns) or a `where` clause (to delete rows). They cannot have both, as partial column or row deletions are not supported.

In [None]:
5 sublist delete from daily where date=2020.01.02  //Table is passed by value, we are deleting rows
5 sublist daily                                    //change not persisted

In [None]:
delete price from `daily  //we are deleting the whole price column from our daily table, and persisting
5 sublist daily

If we try to combine the two and delete *part* of a row or column we will get an error: 

In [None]:
delete sym from daily where date = 2020.01.02

##### Exercise

Delete all occurrences of `AAPL` from our `daily` table by passing the table in as reference.

In [None]:
delete from `daily where sym =`AAPL
daily

In [None]:
//your answer here 

# Using `fby` to avoid nested queries


The [`fby`](https://code.kx.com/q/ref/fby/) keyword, sometimes referred to as "filter by", allows us to avoid the multiple aggregation and joining steps that would usually be required in another language.

The form of fby is `(aggregation;data) fby group` where: 
* aggregation refers to a function which takes a list and returns a single atom 
* data refers to the column to which you want to apply this function 
* group refers to a column by which you want to group, or a table of multiple columns on which you want to group 
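
To make the form concrete, here is a minimal sketch, first on plain lists and then on a hypothetical in-memory table `t`. The aggregate is computed per group and then spread back across the original rows:

```q
// sum per group, returned in the shape of the original list
(sum;1 2 3 4) fby `a`b`a`b       // 4 6 4 6
// t is a hypothetical table used only for illustration
t:([] ex:`N`N`L`L; size:10 30 100 200)
// average size is 20 on N and 150 on L, so the rows with size 10 and 100 are kept
select from t where size<(avg;size) fby ex
```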

Consider finding all trades where the size is less than the average trade size on the exchange where they traded. We can express this as follows:

In [None]:
select from trade where date = last date, size < (avg;size) fby ex

Compare the above statement with how the same result would be obtained using ordinary qSQL statements. We would first get the average size for each exchange, then join this data to our original table and perform a new selection:

In [None]:
//first, get the average by exchange
show resby:select exAvg:avg size by ex from trade where date = last date

In [None]:
//next, combine that average value with your original table using lj
show interim:(select from trade where date = last date) lj resby

In [None]:
//finally, return the results from our original table that are less than the exchange average
select from interim where size < exAvg

Hopefully this illustrates how much simpler using `fby` is compared to the above statements.

`fby` doesn't have to be used only in the `where` clause. We can use it in any part of our statement:

In [None]:
select sym, size, ex, lessThanEx: size < (avg;size) fby ex from trade where date = last date

In [None]:
update  filterSize:(avg;size) fby ex,
        lessThanEx: size < (avg;size) fby ex from  //as an update to the table instead
        (select from trade where date = last date) //partitioned, so we first select, then update

##### Exercise
Write a statement using `fby` to find the largest volume in our `trade` table (`where date = last date`) for which the price is greater than the average price for that symbol.

In [None]:
select max size from trade where date = last date,  price > (avg;price) fby sym

In [None]:
// Enter your qSQL code here