# `Table` Usage

In this notebook we'll show common `Table` operations using the stock market data introduced in the previous notebook.  `NDSparse` operations are *nearly* identical, so we will focus on `Table`.  The functions we cover here are:

1. `select`
1. `filter`
1. `map`
1. `reduce`
1. `groupreduce`
1. `groupby`
1. `summarize`
1. `columns`/`rows`
1. `join`
1. `merge`

Each of the above functions has detailed inline documentation, accessed from a Julia REPL with `?select`, for example.

# Begin by Loading Data

- Let's load the data we saved in the previous notebook:

In [1]:
using JuliaDB

# Print table rather than column summary
IndexedTables.set_show_compact!(false)

# loadtable("stocksample"; filenamecol = :Ticker, indexcols = [:Ticker, :Date]);
t = load("stocks.jdb")

Table with 56023 rows, 8 columns:
Ticker         Date        Open     High     Low      Close    Volume    OpenInt
────────────────────────────────────────────────────────────────────────────────
"aapl.us.txt"  1984-09-07  0.42388  0.42902  0.41874  0.42388  23220030  0
"aapl.us.txt"  1984-09-10  0.42388  0.42516  0.41366  0.42134  18022532  0
"aapl.us.txt"  1984-09-11  0.42516  0.43668  0.42516  0.42902  42498199  0
"aapl.us.txt"  1984-09-12  0.42902  0.43157  0.41618  0.41618  37125801  0
"aapl.us.txt"  1984-09-13  0.43927  0.44052  0.43927  0.43927  57822062  0
"aapl.us.txt"  1984-09-14  0.44052  0.45589  0.44052  0.44566  68847968  0
"aapl.us.txt"  1984-09-17  0.45718  0.46357  0.45718  0.45718  53755262  0
"aapl.us.txt"  1984-09-18  0.45718  0.46103  0.44052  0.44052  27136886  0
"aapl.us.txt"  1984-09-19  0.44052  0.44566  0.43157  0.43157  29641922  0
"aapl.us.txt"  1984-09-20  0.43286  0.43668  0.43286  0.43286  18453585  0
"aapl.us.txt"  1984-09-21  0.43286  

# Return a Subset of Columns: 

- `select` applies a selector (introduced in the previous notebook) to a table and returns the result.

## `select(table, selection)`

- When multiple selectors are involved, rows are "passed around" as a `NamedTuple`.
- A function paired with multiple selections must then accept a `NamedTuple`.

- For example, to calculate the range of stock prices for each day we can:
  1. Select `:High` and `:Low`
  1. Pair it with the anonymous function `row -> row.High - row.Low`

In [2]:
select(t, (:High, :Low) => row -> row.High - row.Low)

56023-element Array{Float64,1}:
  0.01028
  0.0115 
  0.01152
  0.01539
  0.00125
  0.01537
  0.00639
  0.02051
  0.01409
  0.00382
  0.02178
  0.00641
  0.0077 
  ⋮      
  7.03   
  7.93   
  6.53   
 11.77   
 12.349  
 16.06   
 11.12   
  8.49   
  6.47   
  5.59   
  8.16   
  6.51   
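
To make the `NamedTuple` hand-off concrete, here is a minimal sketch in plain Julia (assuming Julia 1.x `NamedTuple` syntax; JuliaDB is not needed to run it). The three rows are copied from the first rows of the table above, and `rows_hl` and `price_range` are illustrative names. It mimics what the paired function receives; it is not how `select` is implemented.

```julia
# First three :High and :Low values from the table above
high = [0.42902, 0.42516, 0.43668]
low  = [0.41874, 0.41366, 0.42516]

# With two selections, each row reaches the function as a NamedTuple
# whose fields are the selected columns
rows_hl = [(High = h, Low = l) for (h, l) in zip(high, low)]

# The same anonymous function that was paired with (:High, :Low)
price_range = map(row -> row.High - row.Low, rows_hl)
```

The results should agree, up to floating-point rounding, with the first three entries of the output above.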

# Return a Subset of Rows:

- We can get the rows that satisfy some condition (when a function returns true) with the syntax:

## `filter(function, table; select)`

- Here we retrieve the data for AMZN (Amazon) by getting the rows for which `Ticker == "amzn.us.txt"`.

In [3]:
filter(x -> x == "amzn.us.txt", t; select = :Ticker)

Table with 5153 rows, 8 columns:
Ticker         Date        Open     High     Low      Close    Volume    OpenInt
────────────────────────────────────────────────────────────────────────────────
"amzn.us.txt"  1997-05-16  1.97     1.98     1.71     1.73     14700000  0
"amzn.us.txt"  1997-05-19  1.76     1.77     1.62     1.71     6106800   0
"amzn.us.txt"  1997-05-20  1.73     1.75     1.64     1.64     5467200   0
"amzn.us.txt"  1997-05-21  1.64     1.65     1.38     1.43     18853200  0
"amzn.us.txt"  1997-05-22  1.44     1.45     1.31     1.4      11776800  0
"amzn.us.txt"  1997-05-23  1.41     1.52     1.33     1.5      15937200  0
"amzn.us.txt"  1997-05-27  1.51     1.65     1.46     1.58     8697600   0
"amzn.us.txt"  1997-05-28  1.62     1.64     1.53     1.53     4574400   0
"amzn.us.txt"  1997-05-29  1.54     1.54     1.48     1.51     3472800   0
"amzn.us.txt"  1997-05-30  1.5      1.51     1.48     1.5      2594400   0
"amzn.us.txt"  1997-06-02  1.51     1
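
As a rough illustration of the `select` keyword, the plain-Julia sketch below (Julia 1.x assumed, no JuliaDB) filters a vector of `NamedTuple`s. The miniature data set and the helper `selection` are hypothetical. The point is that the predicate only sees the selected value while whole rows are returned.

```julia
# Hypothetical miniature of the stock table as a vector of NamedTuples
demo_rows = [
    (Ticker = "aapl.us.txt", Close = 0.42388),
    (Ticker = "amzn.us.txt", Close = 1.73),
    (Ticker = "amzn.us.txt", Close = 1.71),
]

# Stand-in for `select = :Ticker`: pull one field out of a row
selection(row) = row.Ticker

# The predicate is applied to the selected value, but full rows are kept
amzn_rows = filter(row -> selection(row) == "amzn.us.txt", demo_rows)
```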

# Apply a Function to a Selection:

- We can use `map` to apply a function on a selection of a table with the syntax below:

## `map(function, table; select)`

- If `select` is not provided, each full row will be passed to the function.  
- Here we return the first item in each row:

In [4]:
map(first, t)

56023-element Array{String,1}:
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 ⋮            
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"

- Note that `map` and `select` can often be used to produce the same result since selections can be paired with a function.  
- For example, we previously used 

    ```julia
    select(t, (:High, :Low) => row -> row.High - row.Low)
    ```

    to calculate stock price ranges.  Equivalently, we can use:

In [5]:
map(r -> r.High - r.Low, t)

56023-element Array{Float64,1}:
  0.01028
  0.0115 
  0.01152
  0.01539
  0.00125
  0.01537
  0.00639
  0.02051
  0.01409
  0.00382
  0.02178
  0.00641
  0.0077 
  ⋮      
  7.03   
  7.93   
  6.53   
 11.77   
 12.349  
 16.06   
 11.12   
  8.49   
  6.47   
  5.59   
  8.16   
  6.51   

# `reduce`

- `reduce` applies a function (`reducer`) pair-wise to a selection through the syntax:

## `reduce(reducer, table; select)`

- For example, if a table is four rows long, `reduce(reducer, t)` is equivalent to

```julia
out = reducer(row1, row2)
out = reducer(out, row3)
out = reducer(out, row4)
```

- For the result not to depend on how the rows are grouped, the `reducer` must be associative, as `+` is:

$$(A + B) + C = A + (B + C)$$



In [6]:
reduce(+, t; select = :Volume)

1830996051150
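
The pair-wise evaluation and the associativity requirement can be checked on a small scale in plain Julia. This sketch uses the first four `:Volume` values from the table above and `Base.reduce` on a vector rather than JuliaDB's method, so treat it as an illustration only. The subtraction example is an added contrast showing what goes wrong with a non-associative reducer.

```julia
# First four :Volume values from the table above
volumes = [23220030, 18022532, 42498199, 37125801]

# Pair-wise application, as in the four-row example
out = volumes[1] + volumes[2]
out = out + volumes[3]
out = out + volumes[4]

# Base.reduce on a vector gives the same total
same_as_reduce = out == reduce(+, volumes)

# `+` is associative, so regrouping does not change the answer
regrouped = (volumes[1] + volumes[2]) + (volumes[3] + volumes[4])

# `-` is not associative, so its result depends on the grouping
minus_left  = (10 - 3) - 2
minus_right = 10 - (3 - 2)
```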

You can also `reduce` with estimators from **OnlineStats** (more on this later):

In [7]:
using OnlineStats

reduce(Sum(Int), t; select = :Volume)

Sum: n=56023 | value=1830996051150
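
Reducing with an estimator means folding observations into a piece of state one at a time. The sketch below imitates that idea in plain Julia (1.x assumed). `RunningSum` and `update!` are hypothetical stand-ins written for this example; they are not the OnlineStats types or functions.

```julia
# Hypothetical stand-in for an estimator (not the real OnlineStats `Sum`)
mutable struct RunningSum
    n::Int       # number of observations seen so far
    value::Int   # running total
end
RunningSum() = RunningSum(0, 0)

# Fold one observation into the state and return the updated state
function update!(s::RunningSum, x::Integer)
    s.n += 1
    s.value += x
    return s
end

# First four :Volume values from the table above
volumes = [23220030, 18022532, 42498199, 37125801]
s = foldl(update!, volumes; init = RunningSum())
```

After the fold, `s.n` counts the observations and `s.value` holds their sum, which parallels the `n=... | value=...` summary printed above.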

# `groupreduce`

- Like `reduce`, `groupreduce` applies a reducer pair-wise to table elements.  
- However, the reducer is applied separately across groups (unique values of another selection).  
- The syntax is:

## `groupreduce(reducer, table, by; select)`

- For example, we can find the total number of shares traded for each stock by calculating the sum of `:Volume`, grouped by `:Ticker`:

In [8]:
groupreduce(+, t, :Ticker; select = :Volume)

Table with 8 rows, 2 columns:
Ticker          +
────────────────────────────
"aapl.us.txt"   891950579821
"amzn.us.txt"   40385735209
"dis.us.txt"    85815802336
"googl.us.txt"  26503128932
"ibm.us.txt"    81302723803
"msft.us.txt"   634313240042
"nflx.us.txt"   62518969374
"tsla.us.txt"   8205871633
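
One way to picture `groupreduce` is as a dictionary holding one running result per group. The function below is a minimal plain-Julia sketch of that idea; `groupreduce_sketch` and the toy data are made up for illustration and say nothing about how JuliaDB implements it.

```julia
# Keep one accumulated value per group key and update it pair-wise
function groupreduce_sketch(reducer, ks, vs)
    acc = Dict{eltype(ks), eltype(vs)}()
    for (k, v) in zip(ks, vs)
        # The first value of a group starts the accumulator;
        # later values are combined with the reducer
        acc[k] = haskey(acc, k) ? reducer(acc[k], v) : v
    end
    return acc
end

# Toy data: group labels and the values to reduce
tickers = ["aapl", "aapl", "amzn", "amzn", "amzn"]
volumes = [10, 20, 1, 2, 3]

totals = groupreduce_sketch(+, tickers, volumes)
```

Only one accumulated value per group is ever stored, which is why a pair-wise reducer is sufficient here.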

# `groupby`

- `groupby` applies a function to each group subset (not pair-wise like `reduce`) through the syntax:

## `groupby(function, table [, by]; select)`

- Here we get the mean and standard deviation of closing price for each stock:

In [9]:
groupby((mean, std), t, :Ticker; select = :Close)

Table with 8 rows, 3 columns:
Ticker          mean     std
────────────────────────────────
"aapl.us.txt"   22.281   37.7645
"amzn.us.txt"   181.769  239.548
"dis.us.txt"    20.6212  26.4787
"googl.us.txt"  389.856  235.102
"ibm.us.txt"    48.5542  49.2977
"msft.us.txt"   18.9847  16.424
"nflx.us.txt"   39.5213  47.5733
"tsla.us.txt"   150.355  107.024
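
The difference from `groupreduce` is that the function receives all values of a group at once. A minimal plain-Julia sketch of that behavior follows (Julia 1.x with the `Statistics` standard library assumed). `groupby_sketch` and the toy data are hypothetical.

```julia
using Statistics  # standard library: mean, std

# Collect every value of a group first, then apply `f` to the whole vector
function groupby_sketch(f, ks, vs)
    groups = Dict{eltype(ks), Vector{eltype(vs)}}()
    for (k, v) in zip(ks, vs)
        push!(get!(() -> eltype(vs)[], groups, k), v)
    end
    return Dict(k => f(g) for (k, g) in groups)
end

# Toy data: group labels and closing prices
tickers = ["a", "a", "a", "b", "b"]
closes  = [1.0, 2.0, 3.0, 10.0, 20.0]

stats = groupby_sketch(v -> (mean = mean(v), std = std(v)), tickers, closes)
```

Because each group is materialized in full, this approach also works for functions that cannot be written as a pair-wise reducer, at the cost of holding the group's values in memory.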

# `summarize`

- `summarize` applies a function (or functions) column-wise.  The syntax is:

## `summarize(function, table, by; select)`

In [10]:
summarize((mean, std), t, :Ticker; select = (:Open, :Close))

Table with 8 rows, 5 columns:
Ticker          Open_mean  Close_mean  Open_std  Close_std
──────────────────────────────────────────────────────────
"aapl.us.txt"   22.2844    22.281      37.7634   37.7645
"amzn.us.txt"   181.747    181.769     239.611   239.548
"dis.us.txt"    20.6162    20.6212     26.4788   26.4787
"googl.us.txt"  389.993    389.856     235.105   235.102
"ibm.us.txt"    48.5355    48.5542     49.271    49.2977
"msft.us.txt"   18.9779    18.9847     16.4161   16.424
"nflx.us.txt"   39.5034    39.5213     47.5678   47.5733
"tsla.us.txt"   150.39     150.355     107.072   107.024
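
The output columns above are named `<column>_<function>`, with all columns listed for the first function before moving to the second. The sketch below reproduces that naming for a single, ungrouped toy data set in plain Julia (1.x, `Statistics` standard library). It only imitates the pattern seen in the output; `cols`, `colnames`, and `result` are illustrative names.

```julia
using Statistics  # standard library: mean, std

# Toy columns stored as a NamedTuple of vectors
cols = (Open = [1.0, 2.0, 3.0], Close = [2.0, 4.0, 6.0])
fs = (mean, std)

# Functions vary slowest, columns fastest, matching the header order above
colnames = [Symbol(c, :_, nameof(f)) for f in fs for c in keys(cols)]
vals     = [f(cols[c]) for f in fs for c in keys(cols)]

result = NamedTuple{Tuple(colnames)}(Tuple(vals))
```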

# AoS and SoA

- We can retrieve the table as a "struct of arrays" (`NamedTuple` of `Vector`s) or as an "array of structs" (`Vector` of `NamedTuple`s) via `columns` and `rows`, respectively.

## `columns(t, selection)`

## `rows(t, selection)`

In [11]:
# NamedTuple of Vectors
columns(t)[1]

56023-element Array{String,1}:
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 "aapl.us.txt"
 ⋮            
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"
 "tsla.us.txt"

In [12]:
# Vector of NamedTuples
rows(t)[1]

(Ticker = "aapl.us.txt", Date = 1984-09-07, Open = 0.42388, High = 0.42902, Low = 0.41874, Close = 0.42388, Volume = 23220030, OpenInt = 0)
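
The two layouts hold the same data and can be converted into each other. Here is a small plain-Julia sketch (1.x `NamedTuple` syntax assumed, toy data, no JuliaDB) showing both directions. The names `soa`, `aos`, and `soa_again` are illustrative.

```julia
# Struct of arrays: one NamedTuple whose fields are whole columns
soa = (Ticker = ["aapl.us.txt", "amzn.us.txt"], Close = [0.42388, 1.73])

# Array of structs: one NamedTuple per row
aos = [(Ticker = t, Close = c) for (t, c) in zip(soa.Ticker, soa.Close)]

# Converting back collects each field across the rows
soa_again = (Ticker = [r.Ticker for r in aos], Close = [r.Close for r in aos])
```

Column access (`soa.Close`) is natural in the first layout, and row access (`aos[1]`) in the second.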

# Joins

## `join(left, right; how, <options>)`

Join tables together based on matching keys.

- `how` can be one of `:inner`, `:left`, `:outer`, or `:anti`
- `<options>`: `rkey`, `lkey` (default to indexed variable), `rselect`, `lselect`

In [13]:
t1 = table(@NT(x=1:5, y = rand(5)); pkey = :x)

Table with 5 rows, 2 columns:
x  y
────────────
1  0.598867
2  0.880299
3  0.0651531
4  0.961551
5  0.499147

In [14]:
t2 = table(@NT(x=3:7, z = rand(5)); pkey = :x)

Table with 5 rows, 2 columns:
x  z
───────────
3  0.506358
4  0.215028
5  0.925266
6  0.680365
7  0.25632

In [15]:
# also try how = :left, :outer, or :anti
tjoin = join(t1, t2; how = :inner)

Table with 3 rows, 3 columns:
x  y          z
──────────────────────
3  0.0651531  0.506358
4  0.961551   0.215028
5  0.499147   0.925266
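
Which keys survive each `how` option can be previewed with set operations on the key columns alone. The plain-Julia sketch below uses the same key ranges as `t1` and `t2` and follows the conventional definitions of these join types. It does not call JuliaDB, so treat it as a mental model rather than a description of the implementation.

```julia
# Key columns of t1 and t2 from the cells above
left_keys  = collect(1:5)
right_keys = collect(3:7)

inner_keys = intersect(left_keys, right_keys)  # keys present in both tables
left_only  = left_keys                         # :left keeps every left key
outer_keys = union(left_keys, right_keys)      # keys present in either table
anti_keys  = setdiff(left_keys, right_keys)    # left keys with no match on the right
```

The inner result matches the three rows (`x` = 3, 4, 5) returned by the join above.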

# Merging

- `merge` combines the rows of two tables that share the same columns. The result is still ordered by the primary key(s).

In [16]:
t3 = table(@NT(x=11:15, y = randn(5)), pkey = :x)

merge(t1, t3)

Table with 10 rows, 2 columns:
x   y
─────────────
1   0.598867
2   0.880299
3   0.0651531
4   0.961551
5   0.499147
11  -0.476002
12  0.862928
13  -0.37055
14  0.0595302
15  0.708924
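
In the cell above every key of `t3` is larger than every key of `t1`, so the ordering guarantee is easy to miss. The plain-Julia sketch below uses hypothetical rows with interleaved keys to show the intended outcome: rows from both inputs end up sorted by key. It concatenates and sorts vectors of `NamedTuple`s and makes no claim about how `merge` works internally.

```julia
# Two hypothetical row sets, each already sorted by the key `x`
a = [(x = 1, y = 0.1), (x = 3, y = 0.3), (x = 5, y = 0.5)]
b = [(x = 2, y = 0.2), (x = 4, y = 0.4)]

# Combine the rows and restore the ordering by key
merged = sort(vcat(a, b); by = row -> row.x)

merged_keys = [row.x for row in merged]
```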