Minimal example of serving a Delta Lake table over Apache Arrow Flight (gRPC). Demonstrates how to bridge delta-rs and PyArrow Flight so clients can stream columnar data without a Spark or cloud dependency.
- Reads a CSV of Divvy bike-share trips into a Delta Lake table (one-time setup)
- Serves that table over Arrow Flight on grpc://localhost:5005
- Client connects, discovers available datasets, fetches all rows as Arrow record batches, and prints summary stats
setup_delta.py CSV → Delta Lake (Parquet on disk)
server.py Delta Lake → Arrow Flight gRPC server
client.py Arrow Flight client → in-memory Arrow table
main.py CLI entrypoint (setup | server | client)
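As an illustration of the `setup | server | client` entrypoint, here is a minimal sketch using `argparse` subcommands. The function names `build_parser` and `dispatch` and the returned handler labels are hypothetical; the real main.py may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per stage of the example."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Delta Lake over Arrow Flight example",
    )
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("setup", help="convert the CSV into a Delta Lake table")
    sub.add_parser("server", help="start the Arrow Flight server")
    sub.add_parser("client", help="fetch the table and print summary stats")
    return parser


def dispatch(argv: list) -> str:
    """Parse argv and return the name of the module that would handle it.

    The mapping mirrors the file layout above; a real entrypoint would
    import and call into these modules instead of returning their names.
    """
    args = build_parser().parse_args(argv)
    handlers = {
        "setup": "setup_delta.py",
        "server": "server.py",
        "client": "client.py",
    }
    return handlers[args.command]
```

Calling `dispatch(["server"])` selects the server module; an unknown or missing subcommand makes `argparse` exit with a usage message.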
| Concept | Role |
|---|---|
| `FlightServerBase` | Base class; handles gRPC, you implement the 3 RPCs |
| `list_flights` | Discovery: returns available datasets + their schema |
| `get_flight_info` | Returns schema + endpoint for a named dataset |
| `do_get` | Streams record batches to the client |
| `Ticket` | Opaque bytes the client holds and returns on `do_get`; server encodes whatever it needs (dataset name, partition filter, etc.) |
| `FlightInfo` | Schema + endpoints + row/byte counts returned to client |
| `GeneratorStream` | Wraps a Python batch generator into the Flight streaming protocol |
Ticket design: in this implementation a ticket is just b"divvy_trips", meaning the whole dataset. A ticket could instead encode a partition filter (e.g. b"divvy_trips/year=2024") so that clients can request slices; the server would decode the ticket and apply the filter before scanning.
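A ticket format like `b"divvy_trips/year=2024"` could be encoded and decoded along the following lines. This is a sketch under assumptions: the helper names `encode_ticket` and `decode_ticket` are hypothetical, and the `name/key=value` layout is only one possible convention, not something the Flight protocol prescribes.

```python
from typing import Optional


def encode_ticket(dataset: str, filters: Optional[dict] = None) -> bytes:
    """Build ticket bytes such as b"divvy_trips/year=2024"."""
    parts = [dataset] + [f"{key}={value}" for key, value in (filters or {}).items()]
    return "/".join(parts).encode("utf-8")


def decode_ticket(raw: bytes) -> tuple:
    """Split ticket bytes into (dataset name, {partition column: value})."""
    dataset, *segments = raw.decode("utf-8").split("/")
    filters = {}
    for segment in segments:
        key, sep, value = segment.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed partition filter: {segment!r}")
        filters[key] = value
    return dataset, filters
```

Because the ticket is opaque to the client, the server is free to change this layout later without breaking clients, as long as it keeps decoding the tickets it has already handed out.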
client.list_flights()
→ server yields FlightInfo (schema + Ticket)
client.do_get(ticket)
→ server opens DeltaTable
→ scans Parquet files in 65,536-row batches
→ streams Arrow RecordBatches over gRPC
→ client reassembles into Arrow Table
- Python 3.12+
- pyarrow >= 16.0.0
- deltalake >= 0.10.0
Install with uv:
uv sync
1. Download data
Download a Divvy monthly CSV (e.g. 202604-divvy-tripdata.csv) from Divvy trip data and place it at:
data/202604-divvy-tripdata.csv
2. Setup — convert CSV to Delta Lake
python main.py setup
3. Start the Flight server
python main.py server
4. Run the client (separate terminal)
python main.py client
Client prints total row count, rideable_type breakdown, and member/casual split.