Skip to content

Latest commit

 

History

History
118 lines (92 loc) · 2.07 KB

fast_columnar_database.md

File metadata and controls

118 lines (92 loc) · 2.07 KB

Fast Columnar Database

KlongPy provides a module "klongpy.db" that includes DuckDb integration. DuckDb can operate directly on NumPy arrays, which allows for zero-copy SQL execution over pre-existing NumPy data.

Tables

?> .py("klongpy.db")
?> t::.table([["a" [1 2 3]] ["b" [2 3 4]]])
a b
1 2
2 3
3 4
?> t,"c",,[3 4 5]
a b c
1 2 3
2 3 4
3 4 5

Indexes (one or more columns) can be created on a table. The current indexes can be seen in the table discription prefix.

?> .index(t; ["a"])
['a']

When a column is indexed, it appears with an asterisk in the pretty-print format:

?> t
a* b
 1 2
 2 3
 3 5

Inserting a row with a pre-existing value at an index results in an update:

?> .insert(t, [3 5 6])
a* b c
 1 2 3
 2 3 4
 3 5 6

Indexes may be reset via .rindex(). True is returned if the index was reset.

?> .rindex(t)
1

Database

Databases are created from a map of table names to table instances. A database instance is a function which accepts SQL and runs it over the underlying tables. SQL results are NumPy arrays and can be directly used in normal KlongPy operations.

?> T::.table(,"a",,[1 2 3])
a
1
2
3
?> db::.db(:{},"T",,T)
:db
?> db("select * from T")
[1 2 3]

Since KlongPy uses DuckDb under the hood, you can perform sophisticated SQL over the underlying NumPy arrays.

For example, it's easy to use JOIN with this setup:

d::[]
d::d,,"a",,[1 2 3]
d::d,,"b",,[2 3 4]
T::.table(d)

e::,"c",,[3 4 5]
G::.table(e)

q:::{}
q,"T",,T
q,"G",,G
db::.db(q)

We can now issue a JOIN SQL:

?> db("select * from T join G on G.c = T.b")
[[2 3 3]
 [3 4 4]]

Pandas DataFrame integration

Tables are backed by Pandas DataFrames, so it's easy to integrate Pandas directly into KlongPy via DuckDb.

from klongpy import KlongInterpreter
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)

klong = KlongInterpreter()
klong['df'] = df
r = klong("""
.py("klongpy.db")
t::.table(df)
db::.db(:{},"people",t)
db("select Age from people")
""")