ajfriend/pdx2

PDX2: Helper functions to run SQL on Pandas DataFrames

GitHub | PyPI

NOTE: This is essentially a clone of https://github.com/ajfriend/pdx, published under a new name because pdx is already taken on PyPI.

pip install pdx2

Small ergonomic improvements to make it easy to run DuckDB queries on Pandas DataFrames.

Query a Pandas DataFrame with df.sql(...). Omit the FROM clause because it is added implicitly:

import pdx2
iris = pdx2.data.get_iris()  # returns pandas.DataFrame

iris.sql("""
select
    species,
    count(*)
        as num,
group by
    1
""")

You can use short SQL (sub-)expressions because FROM and SELECT * are implied whenever they're omitted:

iris.sql('where petal_length > 4.5')
iris.sql('limit 10')
iris.sql('order by petal_length')
iris.sql('')  # returns the DataFrame unmodified, i.e., 'select * from iris'
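The implied SELECT * and FROM can be pictured as a query-completion step that runs before the SQL reaches DuckDB. The sketch below uses a hypothetical helper, complete_query, with a deliberately simple keyword heuristic; it is an illustration under that assumption, not pdx2's actual implementation, and it ignores subqueries, string literals, and comments.

```python
import re

# Clause keywords that can follow FROM in a simple SELECT (assumed, simplified list).
_CLAUSES = re.compile(
    r"\b(where|group\s+by|having|qualify|window|order\s+by|limit|offset)\b",
    flags=re.IGNORECASE,
)


def complete_query(sql: str, table: str) -> str:
    """Hypothetical helper: fill in an omitted SELECT * and/or FROM clause."""
    body = sql.strip()

    # No leading SELECT: treat the whole string as clauses after "select * from <table>".
    if not re.match(r"select\b", body, flags=re.IGNORECASE):
        return f"select * from {table} {body}".strip()

    # SELECT with an explicit FROM: leave the query untouched.
    if re.search(r"\bfrom\b", body, flags=re.IGNORECASE):
        return body

    # SELECT without FROM: insert it before the first trailing clause, or at the end.
    m = _CLAUSES.search(body)
    if m is None:
        return f"{body} from {table}"
    return f"{body[:m.start()]}from {table} {body[m.start():]}"


print(complete_query("where petal_length > 4.5", "iris"))
print(complete_query("select species, count(*) as num, group by 1", "iris"))
```

The second call yields `select species, count(*) as num, from iris group by 1`. DuckDB accepts a trailing comma in the select list, which is why the example query above can end its column list with `num,`.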

For more, check out the example notebook folder.

Other affordances

  • df.aslist()
  • df.asdict()
  • df.asitem()
  • df.cols2dict()
  • save/load helpers for DuckDB database files

Reference

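For bleeding-edge DuckDB, install the Python package from a source checkout (the last command assumes a virtualenv at ../env):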

git clone https://github.com/duckdb/duckdb.git
cd duckdb
../env/bin/pip install -e tools/pythonpkg --verbose