These extensions allow you to represent Python NumPy/Pandas objects in Postgres.
Warning
|
This is very much a work in progress. Expect the APIs and what is in each extension to change! |
Currently you can think of PandaPost more as an experiment than anything else. It seems like it would be useful to be able to natively store ndarrays, DataFrames, etc in a Postgres column. Then again, maybe not. :)
If you can make use of any of this, please drop a line to our Google Group. We’d love any feedback!
Install the pgxn client, and then pgxn install panda_post --unstable
. After that you can CREATE EXTENSION "PandaPost";
in your database. Currently it installs everything in your default schema; I suggest creating a panda schema and installing there. Note that it also requires plpythonu.
To wrap all that together:
pgxn install panda_post --unstable psql <database name> CREATE EXTENSION plpythonu; CREATE SCHEMA panda; CREATE EXTENSION panda_post WITH SCHEMA panda;
There are now casts between ndarray and all the plpython supported types (boolean, int*, float, real, numeric and text). This allows things like SELECT panda.str(array[1,1,2,2]::panda.ndarray);
.
Function | Description |
---|---|
T |
Apply ndarray T() function to input |
eval |
Returns the results of running python’s eval() on the input, as an ndarray. IE: |
repr |
Returns the python repr() of the input. |
str |
Returns the python str() of the input. |
Function | Description |
---|---|
ediff1d |
|
in1d |
Returns boolean array of whether each element in |
intersect1d |
Return unique list of values that are present in both |
ndunique |
Return unique elements in |
ndunique1 |
Same as ndunique() but only return the array of unique elements. |
setdiff1d |
Return the difference between |
setxor1d |
Return the values that are in only one (but not both) |
union1d |
Return the unique values that are in either |
Function | Description |
---|---|
create_cast |
Creates casts to and from a Postgres data type and ndarray |
Next up: porting the more common ndarray functions:
np.ndarray.byteswap np.ndarray.cumsum np.ndarray.flat np.ndarray.min np.ndarray.ravel np.ndarray.shape np.ndarray.tobytes np.ndarray.all np.ndarray.choose np.ndarray.data np.ndarray.flatten np.ndarray.nbytes np.ndarray.real np.ndarray.size np.ndarray.tofile np.ndarray.any np.ndarray.clip np.ndarray.diagonal np.ndarray.getfield np.ndarray.ndim np.ndarray.repeat np.ndarray.sort np.ndarray.tolist np.ndarray.argmax np.ndarray.compress np.ndarray.dot np.ndarray.imag np.ndarray.newbyteorder np.ndarray.reshape np.ndarray.squeeze np.ndarray.tostring np.ndarray.argmin np.ndarray.conj np.ndarray.dtype np.ndarray.item np.ndarray.nonzero np.ndarray.resize np.ndarray.std np.ndarray.trace np.ndarray.argpartition np.ndarray.conjugate np.ndarray.dump np.ndarray.itemset np.ndarray.partition np.ndarray.round np.ndarray.strides np.ndarray.transpose np.ndarray.argsort np.ndarray.copy np.ndarray.dumps np.ndarray.itemsize np.ndarray.prod np.ndarray.searchsorted np.ndarray.sum np.ndarray.var np.ndarray.astype np.ndarray.ctypes np.ndarray.fill np.ndarray.max np.ndarray.ptp np.ndarray.setfield np.ndarray.swapaxes np.ndarray.view np.ndarray.base np.ndarray.cumprod np.ndarray.flags np.ndarray.mean np.ndarray.put np.ndarray.setflags np.ndarray.take
CREATE EXTENSION IF NOT EXISTS plpythonu; CREATE EXTENSION panda_post; CREATE TEMP TABLE s AS SELECT array['a','b','c']::ndarray AS s1, array['c','d']::ndarray AS s2; -- python repr() of an ndarray of strings SELECT repr(s1) FROM s; repr ------------------------- array(['a', 'b', 'c'], + dtype='|S1') (1 row) -- python str() of same array SELECT str(s1) FROM s; str --------------- ['a' 'b' 'c'] (1 row) -- exclusive-or of two ndarrays SELECT str(setxor1d(s1, s2)) FROM s; str --------------- ['a' 'b' 'd'] (1 row) -- Intersection SELECT str(intersect1d(s1, s2)) FROM s; str ------- ['c'] (1 row)
This more complicated example uses the lambad extension to create a dataframe, which can currently be returned as an ndarray.
Note
|
Eventually there will be an actual DataFrame Postgres data type |
CREATE EXTENSION IF NOT EXISTS lambda; \set df pd.DataFrame.from_dict([{"a":1,"b":"a"},{"a":2,"b":"b"}]) SELECT repr( lambda( $l$( ndarray ) RETURNS ndarray LANGUAGE plpythonu TRANSFORM FOR TYPE ndarray AS $body$ import pandas as pd return $l$ || :'df' || $l$ $body$ $l$ , NULL::ndarray ) ); repr --------- a b+ 0 1 a+ 1 2 b (1 row)
PandaPost is released under a BSD liscense.
Copyright (c) 2016 Jim Nasby <Jim.Nasby@BlueTreble.com>.