Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify interactions with arrow flight APIs #377

Merged
merged 16 commits into from
Jul 5, 2021

Conversation

garyanaplan
Copy link
Contributor

@garyanaplan garyanaplan commented May 28, 2021

Initial work to implement some basic traits and illustrate the direction of travel

Which issue does this PR close?

closes #376

Rationale for this change

Illustrating direction of travel. Open Questions:

  • Should I put this into a separate module or is it ok in utils.rs?
  • I've moved the trait stuff into src/lib.rs and hidden the generated code into a gen module.
  • Would it be better to replace Vec with NewTypes (such as IpcMessage that I added to the src/utils.rs)
  • Should I work on hiding some of the raw generated code?
  • I've hidden the generated code and re-exported the documented bits that I think are required.

What changes are included in this PR?

Illustrates what kind of work I'm doing before I go too far in the wrong direction

Are there any user-facing changes?

Deprecation warnings...

Initial work to implement some basic traits
Some more polishing of the basic code I provided last week.
Add support for representing tickets as base64 encoded strings.

Also: more polishing of Display, etc...
When writing BOOLEAN data, writing more than 2048 rows of data will
overflow the hard-coded 256 buffer set for the bit-writer in the
PlainEncoder. Once this occurs, further attempts to write to the encoder
fail, becuase capacity is exceeded, but the errors are silently ignored.

This fix improves the error detection and reporting at the point of
encoding and modifies the logic for bit_writing (BOOLEANS). The
bit_writer is initially allocated 256 bytes (as at present), then each
time the capacity is exceeded the capacity is incremented by another
256 bytes.

This certainly resolves the current problem, but it's not exactly a
great fix because the capacity of the bit_writer could now grow
substantially.

Other data types seem to have a more sophisticated mechanism for writing
data which doesn't involve growing or having a fixed size buffer. It
would be desirable to make the BOOLEAN type use this same mechanism if
possible, but that level of change is more intrusive and probably
requires greater knowledge of the implementation than I possess.

resolves: apache#349
Tacky, but I can't think of better way to do this without
specialization.
Remove the byte tracking from the PlainEncoder and use the existing
bytes_written() method in BitWriter.

This is neater.
The test ensures that we can write > 2048 rows to a parquet file and
that when we read the data back, it finishes without hanging (defined as
taking < 5 seconds).

If we don't want that extra complexity, we could remove the
thread/channel stuff and just try to read the file and let the test
runner terminate hanging tests.
The values.len() reports the number of values to be encoded and so must
be divided by 8 (bits in a bytes) to determine the effect on the byte
capacity of the bit_writer.
Following merge with master, make sure this is exposed so that
integration tests work.

also: there has been a release since I last looked at this so update the
deprecation warnings.
@garyanaplan garyanaplan marked this pull request as ready for review June 17, 2021 07:51
clippy complains about using deprecated functions, so replace them with
the new trait support.

also: fix the trait documentation
@codecov-commenter
Copy link

Codecov Report

Merging #377 (f2568e0) into master (9f56afb) will decrease coverage by 0.18%.
The diff coverage is 16.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #377      +/-   ##
==========================================
- Coverage   82.65%   82.47%   -0.19%     
==========================================
  Files         165      166       +1     
  Lines       45556    45691     +135     
==========================================
+ Hits        37655    37683      +28     
- Misses       7901     8008     +107     
Impacted Files Coverage Δ
arrow-flight/src/utils.rs 0.00% <0.00%> (ø)
...ng/src/flight_client_scenarios/integration_test.rs 0.00% <0.00%> (ø)
...ng/src/flight_server_scenarios/integration_test.rs 0.00% <0.00%> (ø)
arrow-flight/src/lib.rs 17.83% <17.83%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f56afb...f2568e0. Read the comment docs.

@nevi-me nevi-me self-requested a review June 17, 2021 10:21
Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments @garyanaplan

// TODO: add more explicit conversion that exposes flight descriptor and metadata options
/// Convert a `Schema` to `SchemaResult` by converting to an IPC message
#[deprecated(
since = "4.3.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be 4.4.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


fn try_from(value: i32) -> ArrowResult<Self> {
match value {
0 => Ok(DescriptorType::Unknown),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DescriptorType should have a method for this IIRC. Prost allows creating enums from numbers, so we might not need to implement this ourselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

 - update deprecated warnings
 - improve TryFrom for DescriptorType
@github-actions github-actions bot added the arrow-flight Changes to the arrow-flight crate label Jun 23, 2021
@garyanaplan garyanaplan requested a review from nevi-me June 23, 2021 16:52
@nevi-me nevi-me added the api-change Changes to the arrow API label Jul 5, 2021
@nevi-me
Copy link
Contributor

nevi-me commented Jul 5, 2021

@alamb I've marked this as an API change because the import paths for the proto module have changed.

@nevi-me nevi-me merged commit 21d69ca into apache:master Jul 5, 2021
@garyanaplan garyanaplan deleted the simpler-apis branch July 5, 2021 08:45
@alamb alamb added enhancement Any new improvement worthy of a entry in the changelog and removed enhancement Any new improvement worthy of a entry in the changelog labels Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify interactions with arrow flight APIs
4 participants