In [8]:
#;.pykx.disableJupyter()

PyKX now running in 'python' mode (default). All cells by default will be run as python code. 
Include '%%q' at the beginning of each cell to run as q code. 


In [9]:
# https://code.kx.com/pykx/3.0/examples/jupyter-integration.html#q-first-mode
import pykx as kx
kx.util.jupyter_qfirst_enable()

PyKX now running in 'jupyter_qfirst' mode. All cells by default will be run as q code. 
Include '%%py' at the beginning of each cell to run as python code. 


##### Learning objectives

To understand: 
* The structure of kdb+ tables on disk
* Saving and loading flat tables

# Introduction

In the real world, tables can get very large. For example, the daily average number of trades executed on the New York Stock Exchange is roughly 2 billion. Manufacturing and IoT have much larger volumes: an IoT sensor could produce as many as 100,000 events per second. Luckily, kdb+/q offers a number of different methods for storing tables, which allow efficient storage and querying of these massive tables, even over many days or years of data at this volume!

## How can we save tables in kdb+?
Tables in kdb+ can be stored on disk in four different formats:

| Table format on disk | Representation | Number of rows | Useful functions |
| -------------------- | -------------- | -------------- | ---------------- |
| Flat file | single binary file | up to a few million | [set](https://code.kx.com/q/ref/get/#set), [save](https://code.kx.com/q/ref/save/) |
| Splayed | directory of column files | up to 100 million | [.Q.en](https://code.kx.com/q/kb/splayed-tables/#enumerating-symbol-columns) |
| Partitioned | table partitioned by e.g. date, with a splayed table for each date | more than 100 million, or growing steadily | [.Q.dpft](https://code.kx.com/q/ref/dotq/#qdpft-save-table) |
| Segmented | partitioned tables distributed across disks | tables larger than a single disk, or when access needs to be parallelized | [.Q.par](https://code.kx.com/q/ref/dotq/#qpar-locate-partition) |

These formats are listed in order of how well they perform as data sets grow larger.

 <img src="../images/introToData.png" width="500" height="500">

## Guidance on choosing a format for saving

As a general rule of thumb, the choice of format depends on three questions:

* **Will the table continue to grow at a fast rate?** - if not, choose flat or splayed depending on the size; if yes, choose partitioned or segmented depending on data volumes.
* **Am I working in a RAM-constrained environment?** - if yes, you may prefer to store your reference tables in splayed rather than flat format. Flat tables require all the data to be read into memory before use, while splayed tables only retrieve table rows after applying the qSQL constraints.
* **What level of performance do I want?** - if high performance is central to the system and there are additional disk mounts available, a well-structured segmented database can avoid some of the I/O competition that a partitioned table on one disk mount would face for the same query.

# Flat file tables

A [flat file table](https://code.kx.com/q/database/#object) is a kdb+ table saved on disk entirely in one file. Flat files are fully loaded into memory, which is why their size (memory footprint) should be small. Small tables, configuration tables and keyed tables are well suited to this format.

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:10px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Small tables in this context are roughly any table up to a few million rows. The size of a flat file is limited by the length of a vector in kdb+. In practice, if we know that the table will grow to an order of millions of rows, then we would choose instead to splay the table. </i></p>

## Recap `set` and `get`

When we work with tables in memory and our process dies, the table (and any modifications we have made to it) is lost. We can [serialize](https://code.kx.com/q/database/object/#set-and-get) our table to persistent storage using [set](https://code.kx.com/q/ref/get/#set). This function allows us to save the table to disk and easily reload it at another time.

Syntax: ``` `:filepath set tablename```

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Reminder: We can use <code>hsym</code> to create a filepath from a symbol. </i></p>

[Reference to code.kx.com](https://code.kx.com/q/ref/hsym/)
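
File handles are often built at run time rather than typed as literals. The sketch below assumes a directory and file name held as strings (`dir` and `name` are hypothetical names chosen for illustration), casts the joined string to a symbol, and passes it to `hsym`:

```q
dir:"data"                   // hypothetical directory name
name:"flatsector"            // hypothetical file name
path:hsym `$dir,"/",name     // join the strings, cast to a symbol, prefix with ":"
path                         // the file handle `:data/flatsector
```

The resulting handle can be used anywhere a literal such as `` `:data/flatsector `` would be, for example as the left argument of `set`.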

In [10]:
hsym `filepath 

:filepath


Let's look at a simple example:

In [11]:
show sector:([sym:`TSLA`IBM`MS`GM]sector:`Auto`Comp`Bank`Auto) //creating a keyed table

`:flatsector set sector   //saves sector to flatsector in the current directory

delete sector from `.  //deletes sector table from memory

:flatsector
.
sym | sector
----| ------
TSLA| Auto  
IBM | Comp  
MS  | Bank  
GM  | Auto  


In [12]:
//sector  //sector is no longer in memory

When we executed the `set` operator on the kdb+ table, it was serialized and saved down to disk.

In [14]:
//system"ls " //looking at the files in the current directory
//system"cat flatsector"  //Looking at the flatsector file
\ls

"Introduction to data on disk Exercises.ipynb"
"Introduction to data on disk.ipynb"
"flatsector"


This has created a new file called `flatsector` on disk. If a file of that name already exists, `set` overwrites it:

In [17]:
show newSector:([sym:`IBM`TSLA`MS`GM`TSLA]sector:`Comp`Auto`Bank`Auto`Auto) //creating a keyed table

`:flatsector set newSector //overwriting flatsector file


:flatsector
sym | sector
----| ------
IBM | Comp  
TSLA| Auto  
MS  | Bank  
GM  | Auto  
TSLA| Auto  


<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Can we save other kdb+/q data structures down to disk?</i></p>

In [19]:
key `:flatsector

:flatsector


In [20]:
\pwd

"/home/jovyan/course-advanced/01 Introduction to data on disk"


Absolutely! We will see later in the course that vectors can be saved down as flat files. Other q entities can be serialized too, but tables are the most common case.

In [None]:
`:qlist set til 10         //saving a list
`:qdict set `a`b`c!1 2 3   //saving a dictionary 

Now that we have the flat file saved to disk, we can use the [get](https://code.kx.com/q/ref/get/) keyword to read this back into memory:

``` tablename: get `:path_to_file/filename```

The `get` function deserializes the table when it takes it into memory so it will look like a kdb+ table:   

In [22]:
get `:flatsector  //load in flatsector from disk
value `:flatsector //value and get are the same function

sym | sector
----| ------
IBM | Comp  
TSLA| Auto  
MS  | Bank  
GM  | Auto  
TSLA| Auto  
sym | sector
----| ------
IBM | Comp  
TSLA| Auto  
MS  | Bank  
GM  | Auto  
TSLA| Auto  
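
`get` is not limited to tables. As a small sketch, the list and dictionary from the earlier cell can be written and read back, using match (`~`) to confirm that the round trip preserves the value exactly:

```q
`:qlist set til 10                // save a list
`:qdict set `a`b`c!1 2 3          // save a dictionary
(til 10)~get `:qlist              // 1b: the list read back matches the original
(`a`b`c!1 2 3)~get `:qdict        // 1b: so does the dictionary
```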


##### Exercise

Save down a flat table called flatT which is the table:

```t:([]sym:`AAPL`IBM`MSFT;price:10 20 30)``` 

with the price values multiplied by 10. Do this without creating a new table variable in memory.

In [28]:
`:flatT set update price*10 from t:([]sym:`AAPL`IBM`MSFT;price:10 20 30) 

:flatT


In [29]:
get `:flatT

sym  price
----------
AAPL 100  
IBM  200  
MSFT 300  


In [30]:
`:t set update price:10*price from t

get `:t //can use get to check if the solution is correct
t //The table hasn't changed in memory

:t
sym  price
----------
AAPL 100  
IBM  200  
MSFT 300  
sym  price
----------
AAPL 10   
IBM  20   
MSFT 30   


In [34]:
//your answer here
`:flatT set update price*10 from t:([]sym:`AAPL`IBM`MSFT;price:10 20 30) 
t
get `:flatT

:flatT
sym  price
----------
AAPL 10   
IBM  20   
MSFT 30   
sym  price
----------
AAPL 100  
IBM  200  
MSFT 300  


<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Since flat file tables load entirely into memory, we need to make sure that there is enough memory allocated to the process to fit these tables in.</i></p>

##### Quiz Time! 
<i>Try the Get and Set Exercises to test what you've learned so far and have a go of saving files yourself!</i>

##  `load` and `save` 
There are also built-in functions for [saving tables](https://code.kx.com/q/database/object/#save-and-load) to flat files on disk and loading them back into memory. Both take one argument: a file path. The table name is taken from the path: `save` writes the global table of that name to the file path, and `load` reads the file into a global variable of that name.
- [save](https://code.kx.com/q/ref/save/) 
- [load](https://code.kx.com/q/ref/load/#load)

In [35]:
t:([]sym:`TSLA`IBM`MS`GM;size:20 30 40 50;price:1.1 2.2 3.3 4.4) //creating small table
save `t //works the same as set 
load `:t //works the same as tabname: get `:<path>/tabname

:t
t
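
Because the table name is taken from the last part of the path, `save` and `load` also work with a file handle that includes a directory. A minimal sketch, assuming a hypothetical table called `trades` and a `data` directory under the current directory:

```q
trades:([]sym:`TSLA`IBM;size:20 30)   // hypothetical table name
save `:data/trades                    // writes the global trades to the file data/trades
delete trades from `.                 // remove the in-memory copy
load `:data/trades                    // defines the global trades again and returns `trades
trades                                // the table is back in memory
```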


##### Exercise

Using `load`, bring the `flatT` table into memory and verify that it exists in the current process.

In [None]:
load `:flatT

In [None]:
`flatT in key `.      //does the variable exist in the root context
flatT

In [38]:
//your answer here
load `:flatT
key `.

flatT
`newSector`t`flatT`price


<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Reminder: We can also use <code>\l /path/to/file</code> to load binary files and scripts. </i></p>

[Load File or Directory](https://code.kx.com/q/basics/syscmds/#l-load-file-or-directory)

`save` can also be used to save a file in a specific format. This is done by adding the extension to the end of the table name. File extensions supported are .csv, .txt, .xls and .xml. In kdb+ 4.0, .json is also supported.

In [40]:
save `t.csv
save `t.txt

// In kdb+ 4.0
save `t.json

system"cat t.csv" //Looking at the t.csv file. 
\cat t.json

:t.csv
:t.txt
:t.json
"sym,size,price"
"TSLA,20,1.1"
"IBM,30,2.2"
"MS,40,3.3"
"GM,50,4.4"
"{\"sym\":\"TSLA\",\"size\":20,\"price\":1.1}"
"{\"sym\":\"IBM\",\"size\":30,\"price\":2.2}"
"{\"sym\":\"MS\",\"size\":40,\"price\":3.3}"
"{\"sym\":\"GM\",\"size\":50,\"price\":4.4}"


`load` does not have equivalent functionality for loading these file types. Below are examples of how to load each type of file:

In [47]:
// Loading a CSV: you must specify the column types and the delimiter
("SJF";enlist csv) 0: `:t.csv

// Loading a text file
//read0 `:t.txt

// Loading a JSON file (kdb+ 4.0 only): parse each line with .j.k
//.j.k each read0 `:t.json

sym  size price
---------------
TSLA 20   1.1  
IBM  30   2.2  
MS   40   3.3  
GM   50   4.4  
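
The text and JSON files can be turned back into tables as well. The sketch below assumes the files were written by `save` as shown above, so that `t.txt` is tab-delimited with a header row and `t.json` holds one JSON object per line:

```q
t:([]sym:`TSLA`IBM`MS`GM;size:20 30 40 50;price:1.1 2.2 3.3 4.4)
save `t.txt
save `t.json
("SJF";enlist "\t") 0: `:t.txt     // same approach as the CSV, with a tab as the delimiter
.j.k each read0 `:t.json           // parse each line; the list of dictionaries forms a table
```

Note that JSON carries no q type information, so `.j.k` returns `sym` as strings and the numeric columns as floats. Cast them if the original types are needed.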


## `set/get` versus `save/load`

Now that we have these functions to save our data, when should we use `set/get` versus `save/load`?

* `load/save` are perfect for casual use.
* For more organized writing and reading, we need the keywords `set` and `get`
* Use `set` to save a variable to a file of a different name
* If we want to save and **compress** our tables, use [`set`](https://code.kx.com/q/ref/get/#compression)
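
As a sketch of the compression case, `set` accepts a list of the file handle followed by three compression parameters in place of a plain file handle. The table `big` is hypothetical, and the parameters 17, 2 and 6 (logical block size, gzip, compression level) are one common choice that assumes the gzip library is available to the q process:

```q
big:([]sym:100000?`AAPL`IBM`MSFT;price:100000?100f)  // hypothetical table, large enough to be worth compressing
(`:zbig;17;2;6) set big      // logical block size 2 xexp 17, algorithm 2 (gzip), level 6
-21!`:zbig                   // compression statistics for the file
big~get `:zbig               // 1b: get decompresses transparently
```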

What are the main differences between these functions?

* `save` takes one argument, `set` takes two
* `get` returns the table value but does not assign it to a variable. `load` assigns the table to a global variable and returns the name of that variable
* `set` always saves files in binary format. `save` can save binary files as well as in alternative formats such as .csv
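
The difference between `get` and `load` can be seen in a short sketch (the names `demo` and `r` are hypothetical):

```q
demo:([]sym:`TSLA`IBM;size:20 30)   // hypothetical table name
`:demo set demo                     // write it to a file called demo
delete demo from `.                 // remove the in-memory copy
r:get `:demo                        // get returns the value; we choose the variable name
`demo in key `.                     // 0b: get alone did not define demo
load `:demo                         // load defines the global demo and returns `demo
r~demo                              // 1b: both hold the same table
```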

## Operating on flat tables

We know how to save these files and load them into memory, but is there a difference in how we operate on them? The quick answer is **NO**; however, there are a few techniques that we can use to optimize our code.

Let's create a new table `t`:

In [48]:
`:data/t set ([]sym:`AAPL`MSFT`KX;price:10 20 30) //setting a flat table in directory data

:data/t


In [49]:
t:get `:data/t

As we have assigned the table to `t`, we can operate on it as an in-memory table:

In [None]:
select from t
select from t where sym=`AAPL

We can also operate on the on-disk table by specifying its file handle as the table name:

In [50]:
`:data/t1 set ([]sym:`JPM`AMZN`FB;price:100 200 300)
select from `:data/t1 where sym in `JPM`AMZN

:data/t1
sym  price
----------
JPM  100  
AMZN 200  


We can update the flat table using its file handle. Let's say we want to append a new row to the table:

In [53]:
`:data/t1 upsert (`KX;240) //Appending 1 row
`:data/t1 upsert ([]sym:`AAPL`JPM;price:260 140) //appending multiple rows

:data/t1
:data/t1
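
To confirm that the rows really were appended on disk, one option is to count the rows read back with `get`. This sketch uses a separate, hypothetical file `data/demo` so that it does not disturb `t1`:

```q
`:data/demo set ([]sym:`JPM`AMZN`FB;price:100 200 300)   // hypothetical file, 3 rows
`:data/demo upsert (`KX;240)                             // append one row
`:data/demo upsert ([]sym:`AAPL`JPM;price:260 140)       // append two more rows
count get `:data/demo                                    // 6
```

Note that every run of the `upsert` lines appends again, so re-running a cell makes the file grow.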


##### Exercise 

Using the table `t1` that we created above, drop the first two rows and save it to disk.

In [None]:
`:data/t1 set 2_ get `:data/t1

In [58]:
//your answer here
`:data/t1 set 2_get`:data/t1
get `:data/t1

:data/t1
sym  price
----------
AAPL 260  
JPM  140  
KX   240  
AAPL 260  
JPM  140  


##### Quiz Time!
Try the Flat Files Exercises to test your understanding of saving and loading flat files!