In [1]:
#;.pykx.disableJupyter()

In [2]:
# https://code.kx.com/pykx/3.0/examples/jupyter-integration.html#q-first-mode
import pykx as kx
kx.util.jupyter_qfirst_enable()

PyKX now running in 'jupyter_qfirst' mode. All cells by default will be run as q code. 
Include '%%py' at the beginning of each cell to run as python code. 


##### Initialization Code

In [4]:
system"l init.q"

//subsequent calls to this init will throw an error because we have changed directory 
    //this error can be safely ignored 

//if you need to do a hard reset, please restart the kernel. 

Database present - loading local partitioned database /home/jovyan/course-advanced/.hidden/db/taq


**Learning outcomes** 

To understand:
* The structure of a partitioned database
* How to modify a partitioned database
* Considerations when working with extremely large tables 
* On disk compression
* Introduction to dbmaint.q

# Introduction
As implied by the name, a partitioned database is structured as a series of partitions, each of which contains folders for each table in the database. These folders are really splayed tables which store the data related to that particular partition. 

## Data 

For the duration of this module we will work with data that has been created in the `/db/taq` folder local to this module directory. This data has already been loaded into the current notebook: 

In [5]:
tables[]

`s#`daily`depth`mas`nbbo`quote`trade


In [6]:
key `:.

`s#`2020.01.02`2020.01.03`2020.01.06`2020.01.07`2020.01.08`2020.01.09`2020.01.10`2020.01.13`2020.01.14`2020.01.15`2020.01.16`2020.01.17`2020.01.20`2020.01.21`2020.01.22`2020.01.23`2020.01.24`2020.0..


In [7]:
//.Q.pt
//.Q.pf
.Q.pv

2020.01.02 2020.01.03 2020.01.06 2020.01.07 2020.01.08 2020.01.09 2020.01.10 2020.01.13 2020.01.14 2020.01.15 2020.01.16 2020.01.17 2020.01.20 2020.01.21 2020.01.22 2020.01.23 2020.01.24 2020.01.27..


The `daily`, `depth` and `mas` tables are flat tables, while `nbbo`, `quote` and `trade` are partitioned tables. 

# Structure of a partitioned database

The partitions of a partitioned database are special directories that contain tables split by a certain criterion, e.g. date. 

 <img src="../images/PartitionedDb.png" width="500" height="500">

## On Disk structure

Inside each partition are the tables, each with its own directory that has the structure of a splayed table. A splayed table can be thought of as a table cut vertically along its columns; a partitioned table is cut vertically along its columns (splayed) and then also cut horizontally by date, month, year, or int. 

Tables are usually split this way because of the large data volumes involved, e.g. one partition per day for high-frequency, high-volume data.

Partitioned tables suit data with millions of records per partition (e.g. daily time series data). Queries are usually executed against a limited set of partitions, so that only the partitions specified are accessed. 
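The layout described above can be explored directly from q with `key`. A minimal sketch, assuming the `taq` database from the Data section is loaded, the current directory is its root, and the `2020.01.02` partition contains a `trade` table as shown elsewhere in this notebook:

```q
// database root: partition folders, flat tables and the sym file
key `:.
// inside one partition: one folder per partitioned table
key `:2020.01.02
// inside one table folder: one file per column, plus .d
key `:2020.01.02/trade
// .d holds the column names in their stored order
get `:2020.01.02/trade/.d
```

Reading `.d` is a quick way to confirm the column order of a splayed table without loading any data.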

## Partition types 

Tables can only be [partitioned](https://code.kx.com/q4m3/14_Introduction_to_Kdb+/#143-partitioned-tables) on the following types: 
* date 
* month 
* year 
* long (previously int in versions pre v3.x) 

The partition type is determined from the format of the partition name: 
 
    2008.06.10 - the type is date, 
    2008.06 - the type is month, 
    2008 - the type is year, 
    25 - the type is long. 
 
One database can contain only one partition type at a time. Each table in a partition will have an extra virtual column with the same type and name as the partition type, and same value as the partition name.
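
The naming rule above can be mimicked in a few lines of q. This is a simplified sketch: the helper name `ptype` is hypothetical and it only looks at the length of the folder name, so it illustrates the convention rather than documenting how kdb+ performs the check internally.

```q
// hypothetical helper: guess the partition field from a partition folder name (a string)
ptype:{[nm]
  fields:`date`month`year`int;       // the last field is named int, with values held as longs
  fields[10 7 4?count nm]}           // name lengths 10, 7, 4 map to date, month, year; anything else to int

ptype each ("2008.06.10";"2008.06";"2008";"25")
```

Applied to the four example names listed above, the helper is intended to give one field name per folder name, in the same order as the list.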


For example, if we have trade table partitioned on date, each day's trade data will have a virtual column called date, and the value will be the name of the partition (folder) the data is from.

In [8]:
meta trade                //our trade table shows a column called date 

c    | t f a
-----| -----
date | d    
sym  | s   p
time | t    
price| f    
size | j    
stop | b    
cond | c    
ex   | c    


In [9]:
key `:2020.01.02/trade    //no file for the date column exists - inferred from directory structure

`s#`.d`cond`ex`price`size`stop`sym`time


# Creating a Partitioned Database 

Creating a partitioned database involves creating many separate, schema-consistent splayed tables, divided into different directories.

First we'll mock up some data to save: 

In [10]:
n: 10000
show 5 sublist t:([]date: asc n?2020.01.01 + til 10; sensor:n?`1;reading:n?82.9)

date       sensor reading 
--------------------------
2020.01.01 c      55.33425
2020.01.01 k      15.73327
2020.01.01 c      9.554756
2020.01.01 n      63.89734
2020.01.01 g      46.26324


We have data from 10 different days, so if we create a date-partitioned database we will have ten different partitions. We can save our data into a directory `../sensorDB` beside our `taq` database. Because our data contains symbol columns, we need to enumerate it before attempting to save it down. 
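
Enumeration replaces each symbol with an index into a list of distinct symbols, which lets symbol columns be stored on disk as fixed-width integers. A minimal in-memory sketch of the idea follows; the names `dom` and `e` are illustrative only, whereas `.Q.en` enumerates against the `sym` file in the target directory and adds any new symbols to it.

```q
dom:`a`b`c              // illustrative enumeration domain (on disk this role is played by the sym file)
e:`dom$`b`a`b           // enumerate a symbol list against dom
dom?`b`a`b              // the positions in dom that the enumeration refers to
value e                 // de-enumerate back to plain symbols
```

The same `value` trick is what lets us compare an enumerated table loaded from disk with the original in-memory table.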

In [11]:
show enumT:.Q.en[`:../sensorDB;t]
key `:../sensorDB                 //this has created the folder and our sym file - check via Jupyter Home!

`s#`2020.01.02`sym
date       sensor reading 
--------------------------
2020.01.01 c      55.33425
2020.01.01 k      15.73327
2020.01.01 c      9.554756
2020.01.01 n      63.89734
2020.01.01 g      46.26324
2020.01.01 i      28.48091
2020.01.01 b      6.236588
2020.01.01 k      18.03259
2020.01.01 b      26.43753
2020.01.01 d      71.13844
2020.01.01 g      49.22499
2020.01.01 h      79.25316
2020.01.01 d      2.35304 
2020.01.01 g      63.77775
2020.01.01 j      40.07898
..


In [12]:
//meta t
enumT

date       sensor reading 
--------------------------
2020.01.01 c      55.33425
2020.01.01 k      15.73327
2020.01.01 c      9.554756
2020.01.01 n      63.89734
2020.01.01 g      46.26324
2020.01.01 i      28.48091
2020.01.01 b      6.236588
2020.01.01 k      18.03259
2020.01.01 b      26.43753
2020.01.01 d      71.13844
2020.01.01 g      49.22499
2020.01.01 h      79.25316
2020.01.01 d      2.35304 
2020.01.01 g      63.77775
2020.01.01 j      40.07898
..


The final step is to save our data down into a separate directory for each date. We begin by getting our list of dates and writing to disk for one of them:

In [13]:
show dts:exec distinct date from enumT 
show dt: first dts 

`s#2020.01.01 2020.01.02 2020.01.03 2020.01.04 2020.01.05 2020.01.06 2020.01.07 2020.01.08 2020.01.09 2020.01.10
2020.01.01


In [14]:
`$(string dt)
/,`sensorTab`
show path: ` sv `:../sensorDB,(`$string dt),`sensorTab`
path set select from enumT where date = 2020.01.02

2020.01.01
:../sensorDB/2020.01.01/sensorTab/
`:../sensorDB/2020.01.01/sensorTab/


Taking a moment to look at the directory structure:

In [15]:
system"dir ../sensorDB"

"2020.01.01  2020.01.02\tsym"


Let's load this data in, and see how it looks: 

In [16]:
\l ../sensorDB
select from sensorTab

date       sensor reading 
--------------------------
2020.01.02 l      46.68199
2020.01.02 o      70.27368
2020.01.02 l      37.28489
2020.01.02 n      32.30919
2020.01.02 m      81.84824
2020.01.02 f      73.51871
2020.01.02 h      50.66184
2020.01.02 j      65.68112
2020.01.02 d      18.3917 
2020.01.02 n      76.96679
2020.01.02 b      76.42116
2020.01.02 o      15.94944
2020.01.02 k      59.70587
2020.01.02 c      60.60719
2020.01.02 c      55.50768
..


Hmmm something looks a bit odd - can you spot what has happened and where we've gone wrong?

<b>*Take some time to review the steps up to here before continuing on!*</b>

In [17]:
5#select from sensorTab where date = 2020.01.01
5#select from sensorTab where date = 2020.01.02

date       sensor reading 
--------------------------
2020.01.02 l      46.68199
2020.01.02 o      70.27368
2020.01.02 l      37.28489
2020.01.02 n      32.30919
2020.01.02 m      81.84824
date       sensor reading 
--------------------------
2020.01.02 k      10.66391
2020.01.02 m      77.11459
2020.01.02 i      72.0592 
2020.01.02 n      50.15432
2020.01.02 p      47.86468


##### Continue

There are two errors here: 
1. We saved the data from our table relating to 2020.01.02 but saved this data under our 2020.01.01 partition!
2. We saved down a column with the same name as our date partition! 

Well done if you spotted either (or both!)

Fundamentally, queries against kdb+/q partitioned tables rely on the top-level on-disk partition structure to reduce the amount of data read. If you recall, the `where` clause is executed first in qSQL. 

This means that when we specify `where date = 2020.01.01`, the constraint selects the `2020.01.01` partition as the place to look. Because we had (incorrectly) stored a column with the same name as our partition field, the stored column is what gets returned, and since it holds the 2020.01.02 rows we see the odd-looking results above. 



<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> We should never store a table column on disk with the same name as our partitions! </i></p>

##### Exercise

Fix the error above by saving the data in the correct directory instead and reloading the database. 

Verify this now returns sensible results.

In [45]:
//\pwd
//\mv 2020.01.01 2020.01.02
\l . 

In [None]:
select from sensorTab where date = 2020.01.01
select from sensorTab where date = 2020.01.02   //success!

In [18]:
//your answer here 
\pwd
\mv 2020.01.01 2020.01.02
\l . 



"/home/jovyan/course-advanced/.hidden/db/sensorDB"
()


QError: ./2020.01.02/2020.01.01/.d. OS reports: No such file or directory

In [19]:
5#select from sensorTab where date = 2020.01.01
5#select from sensorTab where date = 2020.01.02

date sensor reading
-------------------
                   
                   
                   
                   
                   
date       sensor reading 
--------------------------
2020.01.02 k      10.66391
2020.01.02 m      77.11459
2020.01.02 i      72.0592 
2020.01.02 n      50.15432
2020.01.02 p      47.86468


Continue with the exercise and create a function, `saveTableParted`, which takes the following arguments: 
* d: the database root directory, under which the partition directories are created 
* t: a table to save 
* tableName: the name to give the new table

The function should create a date-partitioned database. The table has a date column, but ensure this is not saved down.

Check this works by running the following successfully: 
    
    saveTableParted[`:.;t;`sensorTab] 
    
And loading in the completed database.

In [20]:
saveTableParted:{[d;t;tableName] 
            dts: exec distinct date from t;              //get the unique dates in the table - will be our partitions
            enumT: .Q.en[d;t];                           //enumerating against our directory to save to - NB!
            {[d;tab;tableName;dt]                        //need to loop over all dates, so making a lambda 
                toSave: delete date from select from tab where date = dt;   //only this date's rows; drop date, it is inferred from the partition name
                path: ` sv d,(`$string dt),tableName,`;    //file path for our new data directory  (NB needs ending /)
                path set toSave                            //saving using set
                        }[d;enumT;tableName] each dts    //everything else is the same, looping over dates
        }

In [21]:
saveTableParted[`:.;t;`sensorTab] 

`:./2020.01.01/sensorTab/`:./2020.01.02/sensorTab/`:./2020.01.03/sensorTab/`:./2020.01.04/sensorTab/`:./2020.01.05/sensorTab/`:./2020.01.06/sensorTab/`:./2020.01.07/sensorTab/`:./2020.01.08/sensorT..


In [22]:
\l . 
select count i by date from sensorTab
select count i by date from t                    
t~ update value sensor from select from sensorTab     //removing the enumeration to verify these match

date      | x   
----------| ----
2020.01.01| 959 
2020.01.02| 962 
2020.01.03| 1021
2020.01.04| 1008
2020.01.05| 1021
2020.01.06| 1001
2020.01.07| 1018
2020.01.08| 1034
2020.01.09| 984 
2020.01.10| 992 
date      | x   
----------| ----
2020.01.01| 959 
2020.01.02| 962 
2020.01.03| 1021
2020.01.04| 1008
2020.01.05| 1021
2020.01.06| 1001
2020.01.07| 1018
2020.01.08| 1034
2020.01.09| 984 
2020.01.10| 992 
1b


**Before moving on please move back into the /db/taq directory by running the below!**

In [24]:
\l ../taq
select count i by date from trade
\pwd

date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751
2020.01.09| 140281
2020.01.10| 137972
2020.01.13| 147727
2020.01.14| 145750
2020.01.15| 146362
2020.01.16| 150264
2020.01.17| 149713
2020.01.20| 147049
2020.01.21| 144268
2020.01.22| 146677
..
"/home/jovyan/course-advanced/.hidden/db/taq"


# Querying data in a Partitioned Database 

We will see that querying data in a partitioned database is very similar to querying data in a splayed table.

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:12px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i>  When querying a partitioned database, it is important to include the partition as the first constraint in the <code>where</code> clause. This speeds up queries, as kdb+/q only has to search through the specified partitions, as opposed to all of them.</i></p>
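
One way to see the effect is to time the same query with the constraints in both orders. This is a sketch assuming the `trade` table loaded earlier, with the `2020.01.02` partition and the `AAPL` symbol seen elsewhere in this notebook. Timings depend on hardware and caching, so none are quoted here.

```q
// partition constraint first: only the 2020.01.02 folder is read
\ts select from trade where date=2020.01.02, sym=`AAPL

// sym constraint first: every partition is searched before the date filter is applied
\ts select from trade where sym=`AAPL, date=2020.01.02
```

`\ts` reports the elapsed time in milliseconds and the memory used in bytes, so the two lines can be compared directly.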

## Restrictions 
Similar to the behaviour we saw with Splayed tables, we cannot `update`, `delete` or `exec` data directly from a partitioned table - we must first return the data into memory to use these functions.

In [25]:
exec time from 
    select from trade where date = last date 

09:30:00.021 09:30:00.025 09:30:00.028 09:30:00.031 09:30:00.041 09:30:00.147 09:30:00.216 09:30:00.413 09:30:00.439 09:30:00.441 09:30:00.536 09:30:00.575 09:30:00.594 09:30:00.646 09:30:00.796 09..
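
The same pattern applies to `update` and `delete`. A sketch assuming the `trade` table loaded earlier; the first expression traps the error so that the cell does not fail, and the exact error text (typically `par`) is an assumption.

```q
// updating the partitioned table directly signals an error, trapped here and returned as a string
@[{update price:2*price from trade};::;{"error: ",x}]

// pulling a single partition into memory first works, and leaves the on-disk data untouched
update price:2*price from select from trade where date=first date
```

Neither expression changes anything on disk; persistent changes to a partitioned table are covered in the section on modifying a partitioned database.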


Upon initial load of the database, the partitioned structure is memory mapped. Running `count i by date` across the entire `trade` table, or returning the count of any table, is therefore usually not an expensive operation: 

In [26]:
select count i by date from trade 

date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751
2020.01.09| 140281
2020.01.10| 137972
2020.01.13| 147727
2020.01.14| 145750
2020.01.15| 146362
2020.01.16| 150264
2020.01.17| 149713
2020.01.20| 147049
2020.01.21| 144268
2020.01.22| 146677
..


However, outside of this we should always ensure that our database queries have a constraint against the partition column.  

## Partitioned Database Utilities
There are a number of kdb+/q [`.Q utilities`](https://code.kx.com/q/ref/dotq/) which are helpful when working with large partitioned databases, some of the more popular are discussed here. 

### `.Q.chk`

[`.Q.chk`](https://code.kx.com/q/ref/dotq/#qchk-fill-hdb) is a utility which will look across all partitions in the data and fill any missing tables with an empty schema definition (as per the schema in the most recent partition), allowing the data to be loadable. 

Let's work through an example of saving down just one individual table in a new directory:  

In [27]:
show t: get `:2020.01.02/trade   //grabbing a convenient table

sym  time         price size stop cond ex
-----------------------------------------
AAPL 09:30:00.021 83.88 17   0    G    N 
AAPL 09:30:00.025 83.87 74   0    J    N 
AAPL 09:30:00.028 83.84 57   0    N    N 
AAPL 09:30:00.031 83.87 81   0    K    N 
AAPL 09:30:00.041 83.87 52   0    G    N 
AAPL 09:30:00.147 83.83 20   0    Z    N 
AAPL 09:30:00.216 83.98 67   0    8    N 
AAPL 09:30:00.413 83.97 47   0    P    N 
AAPL 09:30:00.439 83.95 70   0    8    N 
AAPL 09:30:00.441 83.9  62   0    A    N 
AAPL 09:30:00.536 83.94 18   0    G    N 
AAPL 09:30:00.575 83.89 32   0    G    N 
AAPL 09:30:00.594 83.85 72   0    A    N 
AAPL 09:30:00.646 83.76 10   0    O    N 
AAPL 09:30:00.796 83.77 25   0    P    N 
..


In [28]:
show fp:` sv hsym[`$string .z.d],`trade`   //our filepath to save to 
fp set t                                   //t is already enumerated - we were lazy before!

`:2025.03.07/trade/
:2025.03.07/trade/


Now we can reload our database and inspect our data: 

In [29]:
\l . 
`date xdesc select count i by date from trade   //our new data is there - all good!

date      | x     
----------| ------
2025.03.07| 146790
2020.01.31| 150960
2020.01.30| 152930
2020.01.29| 158067
2020.01.28| 162477
2020.01.27| 155718
2020.01.24| 153839
2020.01.23| 145783
2020.01.22| 146677
2020.01.21| 144268
2020.01.20| 147049
2020.01.17| 149713
2020.01.16| 150264
2020.01.15| 146362
2020.01.14| 145750
..


In [30]:
\pwd

"/home/jovyan/course-advanced/.hidden/db/taq"


Great! It looks like we were successful in adding our data and can query our `trade` table - but what about our other tables? 

In [None]:
`date xdesc select count i by date from quote   //no such table!

This is problematic - we have broken our `quote` table by adding `trade` without supplying data for `quote` in the same partition. Luckily this is where we can use `.Q.chk`, which takes as its input the directory of the database we want to unify: 

In [31]:
.Q.chk[`:.]    //returns the directories to which empty tables were added 


















..


In [32]:
`date xdesc select count i by date from quote   //problem solved! We don't have an entry for our date as it's empty

date      | x     
----------| ------
2020.01.31| 750931
2020.01.30| 761369
2020.01.29| 788241
2020.01.28| 813131
2020.01.27| 780007
2020.01.24| 769034
2020.01.23| 728081
2020.01.22| 733703
2020.01.21| 722632
2020.01.20| 734560
2020.01.17| 750446
2020.01.16| 753248
2020.01.15| 732749
2020.01.14| 729196
2020.01.13| 737660
..


This is a very commonly used utility when working with databases. 
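
Before running `.Q.chk`, it can be useful to see which partitions are missing which tables. The helper below is a hypothetical diagnostic, not part of `.Q`: it assumes a simple (non-segmented) database whose partition folders have names starting with a digit, and it only reads directory listings.

```q
// hypothetical helper: for each partition folder under db, list the tables it lacks
findMissing:{[db]
  k:key db;
  ps:k where k like "[0-9]*";                     // partition folders (names starting with a digit)
  tabs:{[db;p] key ` sv db,p}[db] each ps;        // tables present in each partition
  allTabs:distinct raze tabs;                     // every table seen in any partition
  m:ps!allTabs except/: tabs;                     // what each partition lacks
  (where 0<count each m)#m}                       // keep only the partitions with gaps

findMissing `:.
```

An empty result means every partition already holds every table, which is the state `.Q.chk` leaves the database in.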

### `.Q.view`
One of the most helpful utilities for protecting a database from inexperienced users is [`.Q.view`](https://code.kx.com/q/ref/dotq/#qview-subview). It allows us to specify a default window against which unconstrained database queries are executed. 

In [33]:
select count i by date from trade                                  //query against the full database 
select count i by date from trade where date in (5 sublist date)   //query against the first 5 dates 

date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751
2020.01.09| 140281
2020.01.10| 137972
2020.01.13| 147727
2020.01.14| 145750
2020.01.15| 146362
2020.01.16| 150264
2020.01.17| 149713
2020.01.20| 147049
2020.01.21| 144268
2020.01.22| 146677
..
date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751


`.Q.view` allows us to set a subview on our partitioned tables that gets enacted when people query without a constraint: 

In [59]:
.Q.view 5 sublist date 

In [60]:
select count i by date from trade   //no constraint explicitly defined but subset to the date provided to .Q.view 

date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751


This can be removed by calling `.Q.view` with no input. The cell below does so, then sets a new view on the last five dates:

In [62]:
.Q.view[]
select count i by date from trade
.Q.view -5 sublist date 
select count i by date from trade

date      | x     
----------| ------
2020.01.02| 146790
2020.01.03| 152463
2020.01.06| 147258
2020.01.07| 141046
2020.01.08| 144751
2020.01.09| 140281
2020.01.10| 137972
2020.01.13| 147727
2020.01.14| 145750
2020.01.15| 146362
2020.01.16| 150264
2020.01.17| 149713
2020.01.20| 147049
2020.01.21| 144268
2020.01.22| 146677
..
date      | x     
----------| ------
2020.01.28| 162477
2020.01.29| 158067
2020.01.30| 152930
2020.01.31| 150960
2025.03.07| 146790


This is helpful as a fail-safe to protect against unconstrained queries which may cause the process to attempt to read the whole table into memory, consuming all the RAM and causing the process to fail. 

### `.Q.dpft`
Arguably the most frequently used utility associated with partitioned databases, [`.Q.dpft`](https://code.kx.com/q/ref/dotq/#qdpft-save-table) saves a global table to a specified partition of a database. It is the end-of-day save-down utility used in the vanilla tick.q architecture.

Let's break down the syntax: 

    .Q.dpft[d;p;f;t] [.Q.(directory)(partition)(field - parted)(table - global)]
where `d` is a directory handle, `p` is a partition of the database, `f` is a field of the table and `t` is the name of a global table, given as a symbol. The table is enumerated against `sym`, sorted on `f` with the parted attribute (`p#`) applied, and saved splayed into partition `p`. 
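
To make those steps concrete, here is a rough manual equivalent. It is a simplified sketch under stated assumptions: the name `manualDpft` is hypothetical, it is not the library implementation, and it omits any validation a production utility would want.

```q
// hypothetical, simplified stand-in for .Q.dpft[d;p;f;t]
manualDpft:{[d;p;f;t]
  tab:.Q.en[d;get t];                      // enumerate symbol columns against d/sym (this also resets the global sym)
  tab:@[f xasc tab;f;`p#];                 // sort on f, then apply the parted attribute to it
  (` sv d,(`$string p),t,`) set tab;       // splay the result into d/p/t/
  t}                                       // like .Q.dpft, return the table name
```

Because `.Q.en` reassigns the global `sym`, a call such as ``manualDpft[`:.;.z.d;`sym;`t]`` is best made against the root of the database that is currently loaded. In practice the built-in `.Q.dpft` is the one to use.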

Grabbing some data, let's use this to save down: 

In [34]:
show t: get `:2020.01.02/trade 

sym  time         price size stop cond ex
-----------------------------------------
AAPL 09:30:00.021 83.88 17   0    G    N 
AAPL 09:30:00.025 83.87 74   0    J    N 
AAPL 09:30:00.028 83.84 57   0    N    N 
AAPL 09:30:00.031 83.87 81   0    K    N 
AAPL 09:30:00.041 83.87 52   0    G    N 
AAPL 09:30:00.147 83.83 20   0    Z    N 
AAPL 09:30:00.216 83.98 67   0    8    N 
AAPL 09:30:00.413 83.97 47   0    P    N 
AAPL 09:30:00.439 83.95 70   0    8    N 
AAPL 09:30:00.441 83.9  62   0    A    N 
AAPL 09:30:00.536 83.94 18   0    G    N 
AAPL 09:30:00.575 83.89 32   0    G    N 
AAPL 09:30:00.594 83.85 72   0    A    N 
AAPL 09:30:00.646 83.76 10   0    O    N 
AAPL 09:30:00.796 83.77 25   0    P    N 
..


In [35]:
.Q.dpft[`:.;.z.d;`sym;`t]       //this will save the table t to the partitioned database 
\l . 

t


In [36]:
key hsym[`$ string .z.d]        //we can see this table is now present in our most recent directory! 

`s#`nbbo`quote`t`trade


<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Like .Q.en, .Q.dpft has a limitation: the enumeration file can only be named <code>sym</code>. To enumerate against a differently named file, we can use <a href="https://code.kx.com/q/ref/dotq/#qdpfts-save-table-with-symtable">.Q.dpfts</a>.</i></p>

In [37]:
newtab:([]sym:10?`AAPL`MSFT`KX;price:10?100.) //creating another table
.Q.dpfts[`:newhdb;.z.d;`sym;`newtab;`mysym] //saving to new directory

newtab


### Other useful utility functions to know  

Most of the time, a partitioned database is stored on a server where users cannot browse the file system to see how it is structured. There are questions we need answered in order to write performant queries. How do we know if a table is a flat file or partitioned? How do we find out how the database is partitioned? Luckily, kdb+/q has utilities that give us this information:

* [.Q.pt](https://code.kx.com/q/ref/dotq/#qpt-partitioned-tables) - returns a list of the partitioned tables
* [.Q.pf](https://code.kx.com/q/ref/dotq/#qpf-partition-field) - returns the partition field
* [.Q.pv](https://code.kx.com/q/ref/dotq/#qpv-modified-partition-values) - returns the partition values 
* [.Q.cn](https://code.kx.com/q/ref/dotq/#qcn-count-partitioned-table) - returns the row count of a partitioned table in each partition
* [.Q.pd](https://code.kx.com/q/ref/dotq/#qpd-modified-partition-locations) - returns the location of each partition
* [.Q.pn](https://code.kx.com/q/ref/dotq/#qpn-partition-counts) - returns the cached row counts per partition for each table (populated by `.Q.cn`)
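
For the first of those questions, the `.Q` reference also lists `.Q.qp`, which is not among the utilities above: it returns `1b` for a partitioned table, `0b` for a splayed table and `0` for anything else. The wrapper below is a hypothetical convenience, assuming the `taq` database is loaded.

```q
// hypothetical helper: classify a table, given its name, using .Q.qp
tableKind:{[nm] r:.Q.qp get nm; $[r~1b;`partitioned;r~0b;`splayed;`other]}

tables[]!tableKind each tables[]
```

Flat tables that were loaded into memory fall into the `other` bucket in this sketch.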

In [42]:
.Q.pn

nbbo | 
quote| 
t    | 
trade| 


In [43]:
.Q.pt
.Q.pf
.Q.pv 
.Q.pd
.Q.cn trade
.Q.pn  //nbbo,quote and t are empty because .Q.cn was only executed for trade
.Q.cn quote
.Q.pn //quote number is filled in now 

`s#`nbbo`quote`t`trade
date
2020.01.02 2020.01.03 2020.01.06 2020.01.07 2020.01.08 2020.01.09 2020.01.10 2020.01.13 2020.01.14 2020.01.15 2020.01.16 2020.01.17 2020.01.20 2020.01.21 2020.01.22 2020.01.23 2020.01.24 2020.01.27..
`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.`:.
146790 152463 147258 141046 144751 140281 137972 147727 145750 146362 150264 149713 147049 144268 146677 145783 153839 155718 162477 158067 152930 150960 146790
nbbo | ()
quote| ()
t    | ()
trade| 146790 152463 147258 141046 144751 140281 137972 147727 145750 146362 150264 149713 147049 144268 146677 145783 153839 155718 162477 158067 152930 150960 146790
731943 759571 735464 707893 724462 702548 689443 737660 729196 732749 753248 750446 734560 722632 733703 728081 769034 780007 813131 788241 761369 750931 0
nbbo | ()
quote| 731943 759571 735464 707893 724462 702548 689443 737660 729196 732749 753248 750446 734560 722632 733703 728081 769034 780007 813131 788241 761369 750931 0
t    | ()
trad

# Modifying a Partitioned Database
If we want to modify a table in our partitioned database, we need to modify each of the underlying splayed tables that exist within each partition. 

The [dbmaint.q](https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md) script provides some useful utilities for editing and maintaining a historical database (HDB). Generally, these functions are safer and should be used in place of the raw commands for any database amendments. 

In [44]:
system "l ."

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i>Once you have modified the on-disk database, remember to adjust the schema (tick/???.q) to reflect your changes to the data.</i></p>

Using the dbmaint functions, the following arguments apply:

* **dbdir** : a file symbol for the database folder
* **table** : the symbol naming a table
* **col** : the symbol name of a column

The dbmaint script has already been loaded into memory.

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> The <a href="https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md"><b>dbmaint.q</b></a> script is a valuable tool in our arsenal when it comes to modifying databases! </i></p>

## Modify Columns

For simplicity, we will look at two of these functions:
* addcol - adds a new column to a table, with a default value in each row.
* deletecol - deletes a column from a table.

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i> Other common use cases of  <a href="https://github.com/KxSystems/kdb/blob/master/utils/dbmaint.md"><b>dbmaint.q</b></a> script include renaming columns or table name, reordering columns, adding or removing attributes and casting columns.</i></p>

Firstly let's look at how we would add a new column to the trade table:

Syntax: ```addcol[dbdir;table;col;defaultvalue]```
 
Let's add a new column called `newCol1` of type `symbol` to the trade table. Then add a second new column called `newCol2` of type `float` to the trade table. 

In [45]:
addcol[`:.;`trade;`newCol1;`]
addcol[`:.;`trade;`newCol2;0f]
system "l ."
"Column addition ",?[all `newCol1`newCol2 in cols select from trade where date=last date;"successful!";"failure!"]

2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.02/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.03/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.06/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.07/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.08/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.09/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.10/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.13/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.14/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.15/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.16/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.17/trade
2025.03.07 03:08:42 adding column newCol1 (type -20) to `:./2020.01.20/trade

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i>Always remember to run system "l ." after running dbmaint functions to make sure your process has picked up the changes. </i></p>
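
To see what adding a column means on disk, the sketch below does it by hand for a single splayed table in a scratch directory. The helper name `addColOne` and the `/tmp` path are hypothetical. It handles only non-symbol defaults and one partition, whereas the dbmaint function works across every partition of the database.

```q
// hypothetical helper: add column col with default value dflt to ONE splayed table directory
// assumes dflt is a non-symbol atom; a symbol default would also need enumerating
addColOne:{[tdir;col;dflt]
  dfile:` sv tdir,`.d;                          // .d lists the column names in order
  n:count get ` sv tdir,first get dfile;        // row count, taken from the first existing column
  (` sv tdir,col) set n#dflt;                   // write the new column file
  dfile set distinct get[dfile],col;            // register the new column in .d
  tdir}

tdir:`:/tmp/addColSketch/2020.01.01/tab         // scratch location, not part of the course database
(` sv tdir,`) set ([] a:1 2 3; b:10 20 30f)     // a small splayed table to experiment on
addColOne[tdir;`c;0f]
get tdir                                        // columns a and b, plus the new column c
```

Two things change on disk: a new column file appears, and `.d` gains the column name. Forgetting the second step is a common cause of a column that exists on disk but never shows up in queries.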

Syntax: ```deletecol[dbdir;table;col]```

Let's delete the previously created columns `newCol1` and `newCol2` 

In [46]:
deletecol[`:.;`trade;`newCol1]
deletecol[`:.;`trade;`newCol2]
system "l ."
"Column deletion ",?[not all `newCol1`newCol2 in cols select from trade where date=last date;"successful!";"failure!"]

2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.02/trade
Column deletion successful!
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.03/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.06/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.07/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.08/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.09/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.10/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.13/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.14/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.15/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.16/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.17/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.20/trade
2025.03.07 03:14:22 deleting column newCol1 from `:./2020.01.2

<img src="../images/qbies.png" style="width: 50px;padding-right:5px;padding-top:2px;padding-left:5px;" align="left"/>

<p style='color:#273a6e'><i>The deletecol function doesn't delete the col# files for nested columns (the files containing the actual values). You will need to delete these manually.</i></p>

##### Exercise 

Using a function in dbmaint, change the type of the size column to short.

In [None]:
castcol[`:.;`trade;`size;`short]
\l .        //reloading database
meta trade //checking if size column is now a short

In [None]:
//your answer here


### Summary table of functions in dbmaint.q

|Function |Purpose | Syntax|
|---------|------|--------|
|addcol |Add a new column `col` to `table`, with `defaultvalue` in each row|`addcol[dbdir;table;col;defaultvalue]`|
|castcol|Cast the values in the column to the newtype and save.|`castcol[dbdir;table;col;newtype]`|
|clearattrcol |Remove any attributes from column `col` |`clearattrcol[dbdir;table;col]`|
|copycol|Copy the values from `oldcol` into a new column named `newcol`, undefined in the table.|`copycol[dbdir;table;oldcol;newcol]`|
|deletecol| Delete column `col` from `table`|`deletecol[dbdir;table;col]`|
|findcol|Print a list of the partition directories where `col` exists and its type in each|`findcol[dbdir;table;col]`|
|fixtable|Adds missing columns to all partitions of a table, given the location of a good partition.|`fixtable[dbdir;table;goodpartition]`|
|fncol|Apply a function to the list of values in `col` and save the results as its values.|`fncol[dbdir;table;col;fn]`|
|listcols|List the columns of `table` (relies on the first partition)|`listcols[dbdir;table]`|
|renamecol|Rename column `oldname` to `newname`, which must be undefined in the table|`renamecol[dbdir;table;oldname;newname]`|
|reordercols|Reorder the columns of `table`. `neworder` is a full list of the column names as they appear in the updated table.| `reordercols[dbdir;table;neworder]`|
|setattrcol|Apply an attribute to `col`. The data in the column must be valid for that attribute.|`setattrcol[dbdir;table;col;newattr]`|
|addtable|Add a table called `tablename`, creating in each partition an empty table with the same schema as `table`.|`addtable[dbdir;tablename;table]`|
|rentable|Rename table `old` to `new`|`rentable[dbdir;old;new]`|

# Further resources 

Make sure to check out the *Partitioned Databases - Tips and Tricks* notebook to find out more about:
* Virtual columns
* Sorting large partitions 
* Database compression 
* Database modification  