# Apache Parquet Course

A hands-on introduction to the [Apache Parquet](https://parquet.apache.org/) file format using **PyArrow**.

---

## Contents

| # | Notebook | Topics |
|---|----------|--------|
| 01 | [Why Parquet?](01_why_parquet.ipynb) | Motivation, columnar storage, benchmark vs CSV |
| 02 | [File Format](02_file_format.ipynb) | Magic bytes, row groups, column chunks, pages, footer |
| 03 | [Metadata](03_metadata.ipynb) | FileMetaData, RowGroupMetaData, ColumnChunkMetaData, statistics |
| 04 | [Types](04_types.ipynb) | Physical types, logical type annotations, schema mapping |
| 05 | [Encodings](05_encodings.ipynb) | PLAIN, RLE_DICTIONARY, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT |
| 06 | [Compression](06_compression.ipynb) | SNAPPY, GZIP, ZSTD, LZ4_RAW: size and speed comparison |
| 07 | [Nested Encoding](07_nested_encoding.ipynb) | Dremel algorithm, definition & repetition levels, lists, structs |
| 08 | [Page Index & Bloom Filters](08_page_index_bloom_filter.ipynb) | Statistics-based row-group skipping, bloom filter lookups |

---

## Prerequisites

- Python 3.9+
- `pyarrow` installed (`pip install pyarrow` or via the course `pixi` / `conda` environment)
- Familiarity with Arrow arrays and tables is helpful; see the [Apache Arrow course](../00_intro.ipynb) if needed

No additional packages are required: PyArrow bundles full Parquet read/write support via `pyarrow.parquet`.