PyBallista

Python client for Ballista.

This project is versioned and released independently from the main Ballista project and is intentionally not part of the default Cargo workspace so that it doesn't cause overhead for maintainers of the main Ballista codebase.

Creating a SessionContext

Important

The current approach is to support the DataFusion Python API. This approach has known limitations, and some cases produce errors. We are trying to find the best way to support the DataFusion Python interface. More details can be found in #1142.

Creates a new context and connects to a Ballista scheduler process.

>>> from ballista import BallistaBuilder
>>> ctx = BallistaBuilder().standalone()

Example SQL Usage

>>> ctx.sql("create external table t stored as parquet location './testdata/test.parquet'")
>>> df = ctx.sql("select * from t limit 5")
>>> pyarrow_batches = df.collect()

Example DataFrame Usage

>>> df = ctx.read_parquet('./testdata/test.parquet').limit(5)
>>> pyarrow_batches = df.collect()
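
As a minimal sketch of working with the collected result, the helper below wraps the DataFrame calls shown above and totals the rows across the returned batches. It assumes `collect()` returns a list of `pyarrow.RecordBatch` objects, as the variable name `pyarrow_batches` suggests. `preview_parquet` and `count_rows` are hypothetical names used for illustration, not part of the ballista package.

```python
def count_rows(batches):
    """Return the total number of rows across a list of record batches."""
    # Each pyarrow.RecordBatch exposes its row count as `num_rows`.
    return sum(batch.num_rows for batch in batches)


def preview_parquet(ctx, path, n=5):
    """Read up to `n` rows from a Parquet file.

    Hypothetical helper: it only uses the calls shown in this README
    (read_parquet, limit, collect) on an existing context `ctx`.
    Returns a (batches, row_count) tuple.
    """
    batches = ctx.read_parquet(path).limit(n).collect()
    return batches, count_rows(batches)
```

With the `ctx` created earlier, `preview_parquet(ctx, './testdata/test.parquet')` would return the collected batches together with their combined row count.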

Scheduler and Executor

The scheduler and executors can be configured and started from Python code.

To start the scheduler:

from ballista import BallistaScheduler

scheduler = BallistaScheduler()

scheduler.start()
scheduler.wait_for_termination()

To start an executor:

from ballista import BallistaExecutor

executor = BallistaExecutor()

executor.start()
executor.wait_for_termination()
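
The two snippets above differ only in the class they instantiate, so a single launcher function can cover both. The following is a sketch under that assumption: `run` and its `role` argument are hypothetical names, and only the `BallistaScheduler` and `BallistaExecutor` calls shown above are used.

```python
def run(role):
    """Start a Ballista scheduler or executor and wait for it to terminate.

    Hypothetical helper; `role` must be "scheduler" or "executor".
    """
    # Imported inside the function so that defining `run` does not
    # require the ballista package to be installed.
    import ballista

    if role == "scheduler":
        process = ballista.BallistaScheduler()
    elif role == "executor":
        process = ballista.BallistaExecutor()
    else:
        raise ValueError(f"unknown role: {role!r}")

    process.start()
    # Same call as in the snippets above; expected to block the caller.
    process.wait_for_termination()
```

Because `wait_for_termination()` is expected to block, the scheduler and each executor would typically be launched from separate Python processes, for example by calling `run("scheduler")` in one and `run("executor")` in another.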

Development Process

Creating Virtual Environment

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Building

maturin develop

Note that you can also run maturin develop --release to get a release build locally.

Testing

python3 -m pytest