
Expose arrow schema on async avro reader #9533

Closed
mzabaluev-flarion wants to merge 3 commits into apache:main from mzabaluev:expose-arrow-schema-on-async-avro-reader


@mzabaluev-flarion
Contributor

Rationale for this change

Exposes the Arrow schema produced by the async Avro file reader, analogous to the schema method on the synchronous reader.

This lets an application prepare casts or other schema transformations without first fetching a record batch just to learn the Arrow schema the reader produces. Because the async reader currently only parses OCF (Object Container File) content, whose writer schema is fixed in the file header, the schema does not change from batch to batch.

What changes are included in this PR?

A new schema method on AsyncAvroFileReader returns the Arrow schema of the record batches produced by the reader.
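To make the intended contract concrete, here is a minimal, synchronous Rust sketch. It does not use the arrow-avro API: SchemaSketch, BatchSketch, and OcfReaderSketch are hypothetical stand-ins for Arrow's Schema, RecordBatch, and AsyncAvroFileReader, and a plain Iterator replaces the async stream. It assumes the Arrow schema is resolved once from the OCF header when the reader is opened, so schema() is a cheap Arc clone that needs no I/O, and every batch shares that same schema.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow schema: (field name, type name) pairs.
#[derive(Debug, PartialEq)]
pub struct SchemaSketch {
    pub fields: Vec<(String, String)>,
}

/// Hypothetical stand-in for a record batch: a shared schema plus a row count.
#[derive(Debug)]
pub struct BatchSketch {
    pub schema: Arc<SchemaSketch>,
    pub num_rows: usize,
}

/// Hypothetical OCF reader: the schema is resolved once, from the file header,
/// when the reader is opened, so it can be handed out before any batch is read.
pub struct OcfReaderSketch {
    schema: Arc<SchemaSketch>,
    // Row counts of the data blocks still to decode, stored in reverse file order.
    remaining_blocks: Vec<usize>,
}

impl OcfReaderSketch {
    /// Pretends to parse the OCF header; `header_fields` plays the role of the
    /// Arrow schema derived from the writer's Avro schema (an assumption).
    pub fn open(header_fields: Vec<(String, String)>, block_rows: Vec<usize>) -> Self {
        let mut remaining_blocks = block_rows;
        remaining_blocks.reverse(); // so that pop() yields blocks in file order
        Self {
            schema: Arc::new(SchemaSketch { fields: header_fields }),
            remaining_blocks,
        }
    }

    /// Mirrors the idea of the new schema method: a cheap Arc clone, no I/O.
    pub fn schema(&self) -> Arc<SchemaSketch> {
        Arc::clone(&self.schema)
    }
}

impl Iterator for OcfReaderSketch {
    type Item = BatchSketch;

    fn next(&mut self) -> Option<BatchSketch> {
        // Every batch shares the header schema: within one OCF file it cannot change.
        self.remaining_blocks.pop().map(|num_rows| BatchSketch {
            schema: Arc::clone(&self.schema),
            num_rows,
        })
    }
}
```

Because every batch in this sketch points at the same Arc, a caller can build a cast plan from schema() once, before reading anything, and reuse it for every batch; Arc::ptr_eq offers a cheap way to check that assumption.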

Are these changes tested?

Added tests verifying that the returned schema matches the expected schema.

Are there any user-facing changes?

Added a schema method to AsyncAvroFileReader.

Add a schema method to obtain the Arrow schema from the async Avro reader.
Add metadata on fields of nested records and the list type, so that the expected schema matches the one produced by the reader.
Add a test reading nested_records.avro to verify the schema exposed by the reader.
github-actions bot added the labels arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) on Mar 10, 2026


2 participants