Port parquet codec to v0.54.0-exaforce #36

Merged: sundaresanr merged 1 commit into v0.54.0-exaforce from v0.54.0-exaforce-parquet on Apr 21, 2026

@sundaresanr

Summary

  • Ports the parquet codec from v0.51-exaforce-parquet (PR #35) onto upstream v0.54.0
  • Enables `aws_s3` sink parquet output via `encoding.codec: parquet` — existing production vector configs keep working unchanged
  • `cargo check -p vector` green on default features

Design differences vs PR #35

Upstream v0.54 independently added a batched encoding scaffold (`BatchSerializer`, `BatchEncoder`, `EncoderKind::Batch`) in `lib/codecs/src/encoding/encoder.rs` with an `Arrow` variant behind the `arrow` feature. Rather than ship PR #35's duplicate `BatchSerializer` enum in `serializer.rs`, this PR:

  • Adds `BatchSerializer::Parquet(ParquetSerializer)` alongside upstream's `Arrow` variant
  • Ungates `EncoderKind::Batch`, the `(Transformer, BatchEncoder)` impl, and the `EncoderKind::Batch` arm of the `(Transformer, EncoderKind)` impl (parquet is always-on, so they must compile without `codecs-arrow`)
  • Switches `S3RequestOptions::encoder` from `(Transformer, Encoder<Framer>)` to `(Transformer, EncoderKind)`, so v0.54's existing `Encoder<Vec<Event>> for (Transformer, EncoderKind)` impl handles the framed-vs-batched dispatch. There is no need for the `Arc<dyn Encoder<Vec<Event>>>` polymorphism that PR #35 used on v0.51
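
For readers unfamiliar with the framed-vs-batched split, the dispatch can be pictured roughly as below. This is a minimal, self-contained sketch and not the code in `src/sinks/util/encoding.rs`: `Event`, `Transformer`, `FramedEncoder`, `BatchEncoder`, and the `Encoder` trait here are simplified stand-ins, and the byte layout written by each arm is invented purely for illustration.

```rust
// Simplified stand-ins; the real types live in Vector's codecs and sink crates.
struct Event(String);

/// Stand-in for Vector's `Transformer`.
struct Transformer;

impl Transformer {
    fn transform(&self, event: &mut Event) {
        // Hypothetical transformation: trim surrounding whitespace.
        event.0 = event.0.trim().to_string();
    }
}

/// Framed path: each event is serialized on its own, then a delimiter is written.
struct FramedEncoder {
    delimiter: u8,
}

/// Batched path: the whole batch is serialized as one unit (e.g. one parquet file).
struct BatchEncoder;

enum EncoderKind {
    Framed(FramedEncoder),
    Batch(BatchEncoder),
}

/// Stand-in for the sink-side `Encoder<T>` trait.
trait Encoder<T> {
    fn encode_input(&self, input: T, out: &mut Vec<u8>) -> std::io::Result<usize>;
}

impl Encoder<Vec<Event>> for (Transformer, EncoderKind) {
    fn encode_input(&self, mut events: Vec<Event>, out: &mut Vec<u8>) -> std::io::Result<usize> {
        let start = out.len();
        // The transformer runs on every event regardless of the encoder kind.
        for event in events.iter_mut() {
            self.0.transform(event);
        }
        match &self.1 {
            EncoderKind::Framed(framed) => {
                for event in &events {
                    out.extend_from_slice(event.0.as_bytes());
                    out.push(framed.delimiter);
                }
            }
            EncoderKind::Batch(_batch) => {
                // A real batch serializer would emit a columnar file here; this
                // placeholder writes a count header followed by the payloads.
                out.extend_from_slice(format!("batch:{}|", events.len()).as_bytes());
                for event in &events {
                    out.extend_from_slice(event.0.as_bytes());
                }
            }
        }
        // Report how many bytes this call appended.
        Ok(out.len() - start)
    }
}
```

Because the sink holds a plain `(Transformer, EncoderKind)` value, the choice between the two paths is a `match` on an enum rather than a call through a trait object.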

`lib/codecs/src/encoding/format/parquet.rs` itself is copied verbatim (parquet 39.0.0, 1317 lines). `SerializerConfig::Parquet { parquet: ParquetSerializerOptions }`, `SerializerConfig::build_batched()`, and `EncodingConfigWithFraming::build_batched()` all preserved.
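
As a rough illustration of how a `build_batched()` entry point can relate to the config enum, consider the sketch below. It is an assumption-laden model, not the ported code: the `schema` field on `ParquetSerializerOptions`, the `Json` variant standing in for per-event codecs, and the `Option` return type are all hypothetical, since the PR text does not show the real signatures.

```rust
/// Hypothetical options; the real `ParquetSerializerOptions` fields are not
/// shown in this PR, so `schema` is an assumed field for illustration only.
#[derive(Clone, Debug, Default, PartialEq)]
struct ParquetSerializerOptions {
    schema: String,
}

#[derive(Debug, PartialEq)]
struct ParquetSerializer {
    options: ParquetSerializerOptions,
}

#[derive(Debug, PartialEq)]
enum BatchSerializer {
    Parquet(ParquetSerializer),
}

#[derive(Clone, Debug)]
enum SerializerConfig {
    /// Stand-in for the per-event codecs (json, text, ...).
    Json,
    /// Mirrors the shape named in the PR: `Parquet { parquet: ... }`.
    Parquet { parquet: ParquetSerializerOptions },
}

impl SerializerConfig {
    /// Returns a batch serializer for batch-only codecs and `None` for
    /// per-event codecs. The return type is an assumption of this sketch.
    fn build_batched(&self) -> Option<BatchSerializer> {
        match self {
            SerializerConfig::Parquet { parquet } => {
                Some(BatchSerializer::Parquet(ParquetSerializer {
                    options: parquet.clone(),
                }))
            }
            SerializerConfig::Json => None,
        }
    }
}
```

Under these assumptions, a caller such as the `aws_s3` sink could try `build_batched()` first and fall back to the framed encoder when it yields nothing.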

@sundaresanr force-pushed the v0.54.0-exaforce-parquet branch 3 times, most recently from b1f167c to b972a5c (April 20, 2026 23:15)

Brings the parquet codec from v0.51-exaforce-parquet (PR #35) onto
upstream v0.54.0. Enables the aws_s3 sink to emit parquet files via
`encoding.codec: parquet`, preserving the existing user-facing config
that production vector configs already use.

Design differences vs PR #35, to fit v0.54's already-reshaped
batched encoding infrastructure:

- Upstream v0.54 gained a `BatchSerializer` enum (in
  lib/codecs/src/encoding/encoder.rs) with an `Arrow` variant behind
  the `arrow` feature, plus `BatchEncoder` and `EncoderKind::Batch`.
  We add `BatchSerializer::Parquet(ParquetSerializer)` alongside the
  `Arrow` variant rather than defining a new `BatchSerializer` enum
  in serializer.rs. `EncoderKind::Batch` is now ungated.
- aws_s3 sink's `S3RequestOptions::encoder` switches from
  `(Transformer, Encoder<Framer>)` to `(Transformer, EncoderKind)`.
  The existing `Encoder<Vec<Event>> for (Transformer, EncoderKind)`
  impl in src/sinks/util/encoding.rs dispatches between framed and
  batched paths, so there is no need for the `Arc<dyn Encoder>`
  polymorphism PR #35 used on v0.51.
- `(Transformer, BatchEncoder)` impl is no longer gated behind
  `codecs-arrow`, since parquet is now always-on.
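
The gating change in the bullets above can be illustrated with a small sketch, assuming a Cargo feature named `arrow` as the commit message describes. The unit structs and the `name` helper are hypothetical and exist only to show that the enum and its `match` compile whether or not the feature is enabled.

```rust
/// Always compiled: parquet support does not depend on any feature flag.
struct ParquetSerializer;

/// Only compiled when the `arrow` feature is enabled.
#[cfg(feature = "arrow")]
struct ArrowSerializer;

enum BatchSerializer {
    /// Feature-gated variant, as upstream v0.54 has it.
    #[cfg(feature = "arrow")]
    Arrow(ArrowSerializer),
    /// Ungated variant added by the port.
    Parquet(ParquetSerializer),
}

impl BatchSerializer {
    /// Hypothetical helper: the gated arm disappears together with the gated
    /// variant, so the match stays exhaustive in both configurations.
    fn name(&self) -> &'static str {
        match self {
            #[cfg(feature = "arrow")]
            BatchSerializer::Arrow(_) => "arrow",
            BatchSerializer::Parquet(_) => "parquet",
        }
    }
}
```

If the enum itself were left behind the feature gate, the `Parquet` variant would vanish in default builds, which is why the enclosing items have to be ungated while only the `Arrow` pieces keep their `cfg` attributes.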

Parquet support itself is unchanged: `lib/codecs/src/encoding/format/parquet.rs`
is copied verbatim from PR #35 (parquet 39.0.0, 1317 lines), and
`SerializerConfig::Parquet { parquet: ParquetSerializerOptions }` plus
`SerializerConfig::build_batched()` and
`EncodingConfigWithFraming::build_batched()` are preserved.

Verified with `cargo check -p vector` (default features).

@sundaresanr force-pushed the v0.54.0-exaforce-parquet branch from b972a5c to 655c841 (April 20, 2026 23:41)
@sundaresanr merged commit f50bd86 into v0.54.0-exaforce on Apr 21, 2026
3 checks passed