@EmilyMatt
Contributor

@EmilyMatt EmilyMatt commented Nov 18, 2025

Which issue does this PR close?

Closes #18810.

Rationale for this change

Performance of the Avro reader is currently dismal :(
While going over it, I saw a couple of opportunities for improvement, but then I noticed the lack of support for binary types, which is something I'd like to tackle early on.

What changes are included in this PR?

  1. Separated binary and string types so that they actually respect the output type (previously they all used the i32 offset variant).
  2. Added support for nested binary types.
  3. Added support for fixed-size binary reads.
  4. Moved binary/utf8 reads to copy-on-write wherever possible (identified one case where it isn't currently possible: an Array type with Byte elements).
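
To illustrate the copy-on-write idea in item 4, here is a minimal sketch using only the standard library. It is not the code from this PR: the `Value` enum and the `resolve_bytes` function are hypothetical stand-ins for a decoded Avro value and the reader's byte-resolution step. It borrows whenever the value already owns a contiguous buffer and allocates only in the array-of-byte-elements case, which is the one case noted above where borrowing isn't possible.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for a decoded Avro value (assumption for illustration;
// this is not the type used by the actual reader).
#[derive(Debug)]
enum Value {
    Bytes(Vec<u8>),
    String(String),
    Fixed(Vec<u8>),
    Int(i32),
    Array(Vec<Value>),
}

/// Returns the value's bytes, borrowing when the value already holds a
/// contiguous buffer and allocating only when the bytes must be assembled.
fn resolve_bytes(value: &Value) -> Option<Cow<'_, [u8]>> {
    match value {
        // Bytes and fixed values already own a contiguous buffer: borrow it.
        Value::Bytes(b) | Value::Fixed(b) => Some(Cow::Borrowed(b.as_slice())),
        // A string's UTF-8 bytes can be borrowed as well.
        Value::String(s) => Some(Cow::Borrowed(s.as_bytes())),
        // An array of byte-sized elements is not contiguous, so it has to be
        // collected into a new buffer. Any element that is not an integer in
        // 0..=255 makes the whole conversion fail.
        Value::Array(items) => items
            .iter()
            .map(|item| match item {
                Value::Int(i) => u8::try_from(*i).ok(),
                _ => None,
            })
            .collect::<Option<Vec<u8>>>()
            .map(Cow::Owned),
        // A bare integer is not a binary value.
        Value::Int(_) => None,
    }
}
```

With this shape, a caller that only appends the bytes to an output buffer never pays for a copy in the common `Bytes`, `Fixed`, and `String` cases, and the owned path stays confined to the array case.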

Are these changes tested?

The new nested support is tested by modifying the existing test_complex_list test; the move to copy-on-write is covered by all the current tests that exercise strings and binaries.
Added a new test for the fixed type, both on its own and inside a list.

Are there any user-facing changes?

No, just extended type support.

@github-actions github-actions bot added the datasource Changes to the datasource crate label Nov 18, 2025
@alamb
Contributor

alamb commented Nov 20, 2025

Performance of the Avro reader is currently dismal :(

FWIW this is a major reason we have been working on the new arrow-avro crate: https://arrow.apache.org/blog/2025/10/23/introducing-arrow-avro/

Perhaps rather than working on the current avro reader, you could help with the migration effort to the new reader:

Development

Successfully merging this pull request may close these issues.

Improve Avro Reader Types Support

3 participants