# Parquet Explorer

This tutorial walks through basic query operations on Parquet files written by Nautilus, using both the `datafusion` and `pyarrow` libraries.

<div style="border:1px solid #5bc0de; padding:10px; margin-top:10px; margin-bottom:10px; background-color:#333333; color: #5bc0de;">
<b>Note:</b> This notebook expects to be run from the <code>docs/tutorials/</code> directory within the NautilusTrader repository, as it references test data files using relative paths.
</div>

Before proceeding, ensure that you have `datafusion` installed:
```bash
uv pip install datafusion
```

In [None]:
import datafusion
import pyarrow.parquet as pq

In [None]:
from pathlib import Path

# Test data paths (relative to docs/tutorials/)
test_data_dir = Path("../../tests/test_data/nautilus")
trade_tick_path = test_data_dir / "trades.parquet"
bar_path = test_data_dir / "bars.parquet"

# Verify paths exist
assert trade_tick_path.exists(), f"Trade tick file not found: {trade_tick_path.absolute()}"
assert bar_path.exists(), f"Bar file not found: {bar_path.absolute()}"

In [None]:
ctx = datafusion.SessionContext()

In [None]:
# Register the Parquet files as SQL tables (run this cell once;
# registering a table name that is already in use is expected to raise an error)
ctx.register_parquet("trade_0", str(trade_tick_path))
ctx.register_parquet("bar_0", str(bar_path))

## TradeTick data

In [None]:
# Select every trade tick, ordered by initialization timestamp (ts_init)
query = "SELECT * FROM trade_0 ORDER BY ts_init"
df = ctx.sql(query)

In [None]:
df.schema()

In [None]:
df

In [None]:
# Read the same file directly with pyarrow, without going through SQL
table = pq.read_table(trade_tick_path)

In [None]:
table.schema

## Bar data

In [None]:
# Select every bar, ordered by initialization timestamp (ts_init)
query = "SELECT * FROM bar_0 ORDER BY ts_init"
df = ctx.sql(query)

In [None]:
df.schema()

In [None]:
df

In [None]:
# Read the bar file directly with pyarrow and inspect its schema
table = pq.read_table(bar_path)
table.schema