feat(ipc): add support for deserializing messages with nested dictionary fields #923

Merged (3 commits) on Nov 8, 2021

Contributor

@helgikrs helgikrs commented Nov 8, 2021

Which issue does this PR close?

Closes #846

What changes are included in this PR?

This change recursively walks the entire schema to collect all dictionary fields, including those nested within other fields. This allows IPC deserialization to match dictionaries with their fields even when those fields are not at the top level of the schema.
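The recursive walk described above can be sketched roughly as follows. This is a minimal illustration using simplified stand-in types, not the arrow crate's actual `Field`, `Schema`, or `DataType` definitions; the names `collect_dict_fields`, `fields_with_dict_id`, and `example_schema` are hypothetical and chosen only for this sketch.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
    // Key type and value type of a dictionary-encoded column.
    Dictionary(Box<DataType>, Box<DataType>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    // Only meaningful when `data_type` is a Dictionary.
    dict_id: i64,
}

impl Field {
    fn new(name: &str, data_type: DataType, dict_id: i64) -> Self {
        Field { name: name.to_string(), data_type, dict_id }
    }

    /// Push this field (if dictionary-encoded) and every dictionary-encoded
    /// descendant onto `out`, walking through list and struct children.
    /// Simplification: dictionary value types are not searched further.
    fn collect_dict_fields<'a>(&'a self, out: &mut Vec<&'a Field>) {
        match &self.data_type {
            DataType::Dictionary(_, _) => out.push(self),
            DataType::List(child) => child.collect_dict_fields(out),
            DataType::Struct(children) => {
                for child in children {
                    child.collect_dict_fields(out);
                }
            }
            DataType::Int32 | DataType::Utf8 => {}
        }
    }
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Find every dictionary field with the given id, at any nesting depth.
    fn fields_with_dict_id(&self, dict_id: i64) -> Vec<&Field> {
        let mut all = Vec::new();
        for field in &self.fields {
            field.collect_dict_fields(&mut all);
        }
        all.into_iter().filter(|f| f.dict_id == dict_id).collect()
    }
}

/// Build a small schema with one top-level and two nested dictionary fields.
fn example_schema() -> Schema {
    let dict = || DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    Schema {
        fields: vec![
            Field::new("id", DataType::Int32, 0),
            Field::new("top", dict(), 1),
            Field::new(
                "outer",
                DataType::Struct(vec![
                    Field::new("inner", dict(), 2),
                    Field::new(
                        "items",
                        DataType::List(Box::new(Field::new("item", dict(), 3))),
                        0,
                    ),
                ]),
                0,
            ),
        ],
    }
}
```

With this sketch, looking up dictionary id 2 or 3 on `example_schema()` finds the fields nested inside `outer`, which a scan of only the top-level fields would miss.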

@github-actions github-actions bot added the "arrow" label (Changes to the arrow crate) on Nov 8, 2021
@helgikrs helgikrs changed the title from "feat(ipc): read a message containing nested dictionary fields" to "feat(ipc): add support for deserializing messages with nested dictionary fields" on Nov 8, 2021
@codecov-commenter

codecov-commenter commented Nov 8, 2021

Codecov Report

Merging #923 (c04f543) into master (62934e9) will increase coverage by 0.00%.
The diff coverage is 87.93%.


@@           Coverage Diff           @@
##           master     #923   +/-   ##
=======================================
  Coverage   82.29%   82.30%           
=======================================
  Files         168      168           
  Lines       48028    48083   +55     
=======================================
+ Hits        39527    39577   +50     
- Misses       8501     8506    +5     
| Impacted Files | Coverage Δ |
|---|---|
| arrow/src/datatypes/field.rs | 53.93% <80.55%> (+3.25%) ⬆️ |
| arrow/src/datatypes/schema.rs | 66.93% <100.00%> (+0.54%) ⬆️ |
| arrow/src/ipc/reader.rs | 85.62% <100.00%> (+0.49%) ⬆️ |
| arrow/src/array/transform/mod.rs | 85.33% <0.00%> (-0.14%) ⬇️ |
| parquet/src/encodings/encoding.rs | 93.71% <0.00%> (+0.19%) ⬆️ |
| parquet_derive/src/parquet_field.rs | 66.21% <0.00%> (+0.22%) ⬆️ |
| arrow/src/datatypes/datatype.rs | 65.36% <0.00%> (+0.43%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@helgikrs helgikrs force-pushed the ipc-read-nested-dict branch 2 times, most recently from 444e203 to 2f564eb, on November 8, 2021 01:58
Contributor

@alamb alamb left a comment

Thank you @helgikrs -- this looks very nice. 👍

I left some stylistic comments, but I think this PR is also good to merge as is.

Please let me know what you would like to do

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@helgikrs
Contributor Author

helgikrs commented Nov 8, 2021

> Thank you @helgikrs -- this looks very nice. +1
>
> I left some stylistic comments, but I think this PR is also good to merge as is.
>
> Please let me know what you would like to do

Thanks for the suggestions, I applied both.

@alamb
Contributor

alamb commented Nov 8, 2021

Thanks @helgikrs !

@alamb alamb merged commit e20d3fa into apache:master Nov 8, 2021
alamb added a commit that referenced this pull request Nov 9, 2021
…ary fields (#923)

* feat(ipc): read a message containing nested dictionary fields

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* address lints

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb added a commit that referenced this pull request Nov 9, 2021
…ary fields (#923) (#931)

* feat(ipc): read a message containing nested dictionary fields

* Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* address lints

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgi@lacework.net>
Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

Support nested dictionaries in ipc serialization & deserialization
3 participants