PDX2: Helper functions to run SQL on Pandas DataFrames

NOTE: This is essentially a clone of https://github.com/ajfriend/pdx, published as pdx2 because the name pdx is already taken on PyPI.

pip install pdx2

Small ergonomic improvements to make it easy to run DuckDB queries on Pandas DataFrames.

pdx2 monkey-patches pandas.DataFrame to provide a df.sql(...) method.
Because pdx2 uses DuckDB, you can take advantage of DuckDB's convenient SQL dialect:
- https://duckdb.org/2022/05/04/friendlier-sql.html
- https://duckdbsnippets.com/
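
To make the monkey-patching idea concrete, here is a minimal sketch of the pattern. It is not pdx2's implementation: `Frame` is a hypothetical stand-in for pandas.DataFrame, and the standard-library sqlite3 module is swapped in for DuckDB so the example runs without extra dependencies. The table name `df` and the helper `_sql` are likewise assumptions made for illustration.

```python
import sqlite3


class Frame:
    """Hypothetical stand-in for pandas.DataFrame: column names plus row tuples."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]


def _sql(self, fragment=""):
    """Run `select * from df <fragment>` over this frame's rows.

    sqlite3 is used here only as a stand-in engine; pdx2 itself uses DuckDB.
    """
    con = sqlite3.connect(":memory:")
    try:
        # Load the frame into a temporary in-memory table named `df`.
        cols = ", ".join(f'"{c}"' for c in self.columns)
        con.execute(f"create table df ({cols})")
        marks = ", ".join("?" for _ in self.columns)
        con.executemany(f"insert into df values ({marks})", self.rows)

        # FROM and SELECT * are supplied here, so callers pass only the rest.
        cur = con.execute(f"select * from df {fragment}")
        out_cols = [d[0] for d in cur.description]
        return Frame(out_cols, cur.fetchall())
    finally:
        con.close()


# The monkey-patch itself: attach the function to the existing class,
# so every instance gains a .sql(...) method.
Frame.sql = _sql

flowers = Frame(
    ["species", "petal_length"],
    [("setosa", 1.4), ("virginica", 5.1), ("versicolor", 4.7)],
)
long_petals = flowers.sql("where petal_length > 4.5 order by petal_length")
print(long_petals.rows)
```

The key line is the assignment `Frame.sql = _sql`; the same pattern applies to any class whose source you do not control, at the cost of modifying it globally for the whole process.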

Query a Pandas DataFrame with df.sql(...). Omit the FROM clause because it is added implicitly:

import pdx2
iris = pdx2.data.get_iris()  # returns pandas.DataFrame

iris.sql("""
select
    species,
    count(*)
        as num,
group by
    1
""")

You can use short SQL (sub-)expressions because FROM and SELECT * are implied whenever they're omitted:

iris.sql('where petal_length > 4.5')

iris.sql('limit 10')

iris.sql('order by petal_length')

iris.sql('')  # returns the DataFrame unmodified, i.e., 'select * from iris'
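
One plausible way to get this behavior relies on DuckDB's FROM-first syntax, where a query may begin with FROM and SELECT * is implied if no SELECT follows. The sketch below only builds the query string; the helper name `with_implied_from` and the default table name `df` are hypothetical, and this is an assumption about how such a feature could work rather than a description of pdx2's internals.

```python
def with_implied_from(fragment: str, table: str = "df") -> str:
    """Prefix a SQL fragment with a FROM clause (hypothetical helper, not part of pdx2)."""
    # With FROM-first syntax, a plain prefix covers both kinds of fragment:
    # ones that carry their own SELECT and ones that omit it entirely.
    return f"from {table} {fragment.strip()}".strip()


print(with_implied_from("limit 10", "iris"))
print(with_implied_from("select species, count(*) as num group by 1", "iris"))
print(with_implied_from("", "iris"))
```

The resulting string could then be handed to DuckDB, for example through duckdb.query(...), with the DataFrame visible under the chosen table name.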

For more, check out the examples in the notebooks folder.

Other affordances

- df.aslist()
- df.asdict()
- df.asitem()
- df.cols2dict()
- save/load helpers for DuckDB database files

Reference

Apache Arrow and the "10 Things I Hate About pandas"

For bleeding edge DuckDB

git clone https://github.com/duckdb/duckdb.git
cd duckdb
../env/bin/pip install -e tools/pythonpkg --verbose
