Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test upgrade to arrow 7.x with new prost/tonic #1283

Closed
wants to merge 3 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 11, 2021

Rationale for this change

Test what would happen if we upgraded prost and tonic dependencies in the arrow-rs 6.x line (aka what would happen if we merged apache/arrow-rs#945)?

What changes are included in this PR?

temporarily override arrow to use apache/arrow-rs#945

Are there any user-facing changes?

No

@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2021

When upgrading arrow only, the build fails like this

   Compiling datafusion v5.1.0 (/Users/alamb/Software/arrow-datafusion/datafusion)
error[E0308]: mismatched types
  --> ballista/rust/core/src/client.rs:75:19
   |
75 |         Ok(Self { flight_client })
   |                   ^^^^^^^^^^^^^ expected struct `Channel`, found struct `tonic::transport::channel::Channel`
   |
   = note: expected struct `FlightServiceClient<Channel>`
              found struct `FlightServiceClient<tonic::transport::channel::Channel>`
   = note: perhaps two different versions of crate `tonic` are being used?

error[E0599]: the method `do_get` exists for struct `FlightServiceClient<Channel>`, but its trait bounds were not satisfied
   --> ballista/rust/core/src/client.rs:112:14
    |
112 |             .do_get(request)
    |              ^^^^^^ method cannot be called on `FlightServiceClient<Channel>` due to unsatisfied trait bounds
    |
   ::: /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/tonic-0.5.2/src/transport/channel/mod.rs:68:1
    |
68  | pub struct Channel {
    | ------------------
    | |
    | doesn't satisfy `<_ as Service<tonic::codegen::http::Request<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>>>::Response = tonic::codegen::http::Response<_>`
    | doesn't satisfy `_: Service<tonic::codegen::http::Request<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>>`
    | doesn't satisfy `_: tonic::client::service::GrpcService<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>`
    |
    = note: the following trait bounds were not satisfied:
            `Channel: tonic::client::service::GrpcService<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>`
            `<Channel as Service<tonic::codegen::http::Request<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>>>::Response = tonic::codegen::http::Response<_>`
            `Channel: Service<tonic::codegen::http::Request<http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::status::Status>>>`

@github-actions github-actions bot added ballista datafusion Changes in the datafusion crate labels Nov 11, 2021
@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2021

When I also update prost and tonic in the Cargo.toml` in the ballista projects, now it fails like this:


cd /Users/alamb/Software/arrow-datafusion && CARGO_TARGET_DIR=/Volumes/RAMDisk/df-target cargo test --all
   Compiling ballista-core v0.6.0 (/Users/alamb/Software/arrow-datafusion/ballista/rust/core)
error[E0277]: `(dyn tonic::codec::Decoder<Item = FlightData, Error = tonic::Status> + std::marker::Send + 'static)` cannot be shared between threads safely
   --> ballista/rust/core/src/client.rs:128:20
    |
128 |                 Ok(Box::pin(FlightDataStream::new(stream, schema)))
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `(dyn tonic::codec::Decoder<Item = FlightData, Error = tonic::Status> + std::marker::Send + 'static)` cannot be shared between threads safely
    |
    = help: the trait `Sync` is not implemented for `(dyn tonic::codec::Decoder<Item = FlightData, Error = tonic::Status> + std::marker::Send + 'static)`
    = note: required because of the requirements on the impl of `Sync` for `std::ptr::Unique<(dyn tonic::codec::Decoder<Item = FlightData, Error = tonic::Status> + std::marker::Send + 'static)>`
    = note: required because it appears within the type `Box<(dyn tonic::codec::Decoder<Item = FlightData, Error = tonic::Status> + std::marker::Send + 'static)>`
    = note: required because it appears within the type `Streaming<FlightData>`
note: required because it appears within the type `FlightDataStream`
   --> ballista/rust/core/src/client.rs:137:8
    |
137 | struct FlightDataStream {
    |        ^^^^^^^^^^^^^^^^
    = note: required for the cast to the object type `dyn RecordBatchStream<Item = std::result::Result<datafusion::arrow::record_batch::RecordBatch, datafusion::arrow::error::ArrowError>> + Sync + std::marker::Send`

error[E0277]: `(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)` cannot be shared between threads safely
   --> ballista/rust/core/src/client.rs:128:20
    |
128 |                 Ok(Box::pin(FlightDataStream::new(stream, schema)))
    |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)` cannot be shared between threads safely
    |
    = help: the trait `Sync` is not implemented for `(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)`
    = note: required because of the requirements on the impl of `Sync` for `std::ptr::Unique<(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)>`
    = note: required because it appears within the type `Box<(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)>`
    = note: required because it appears within the type `Pin<Box<(dyn tonic::codegen::Body<Error = tonic::Status, Data = prost::bytes::Bytes> + std::marker::Send + 'static)>>`
    = note: required because it appears within the type `http_body::combinators::box_body::UnsyncBoxBody<prost::bytes::Bytes, tonic::Status>`
    = note: required because it appears within the type `Streaming<FlightData>`
note: required because it appears within the type `FlightDataStream`
   --> ballista/rust/core/src/client.rs:137:8
    |
137 | struct FlightDataStream {
    |        ^^^^^^^^^^^^^^^^
    = note: required for the cast to the object type `dyn RecordBatchStream<Item = std::result::Result<datafusion::arrow::record_batch::RecordBatch, datafusion::arrow::error::ArrowError>> + Sync + std::marker::Send`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `ballista-core` due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
error: build failed

(namely because FlightData is no longer marked Sync for some reason)

@tustvold any ideas?

"Fixed" (worked around) in f2cfa38

@@ -135,24 +135,28 @@ impl BallistaClient {
}

struct FlightDataStream {
stream: Streaming<FlightData>,
stream: Mutex<Streaming<FlightData>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed because after tonic/prost upgrade FlightData is no longer Sync. There are perhaps better ways to handle this

@alamb alamb marked this pull request as draft November 11, 2021 12:21
@houqp
Copy link
Member

houqp commented Nov 12, 2021

@alamb let us know if you need help with the ballista change.

@alamb
Copy link
Contributor Author

alamb commented Nov 13, 2021

@alamb let us know if you need help with the ballista change.

Thanks @houqp -- I think this PR is likely to be part of the "upgrade datafusion to use arrow 7.0.0" when that is released

If anyone has any good ideas about how to avoid the use of Mutex in FlightDataStream that would be awesome

@alamb alamb changed the title Test upgrade to arrow 6.x with new prost/tonic Test upgrade to arrow 7.x with new prost/tonic Nov 15, 2021
@yjshen
Copy link
Member

yjshen commented Nov 29, 2021

Facing with the same issue while updating #68 with arrow2 version 0.8 and arrow-format 0.3, since the same breaking change was introduced in DataEngineeringLabs/arrow-format#2.

Thanks @alamb I use your mutex way here as a workaround, please let me know if you have further great solutions.😄

@tustvold
Copy link
Contributor

tustvold commented Nov 29, 2021

I did some poking around a while back and was going to write up an issue, but got pulled away to other things. The problem is that SendableRecordBatchStream has a Sync constraint on it. I think this is just a mistake in DataFusion, I can't see a compelling reason that being able to share immutable references to streams across threads is important, especially when stream consumption requires &mut.

This would, however, be a fairly annoying breaking change to DataFusion and I wanted to provide more context before filing a ticket, but ran out of time

@alamb alamb mentioned this pull request Jan 5, 2022
@alamb
Copy link
Contributor Author

alamb commented Jan 5, 2022

Included in #1523

@alamb alamb closed this Jan 5, 2022
@alamb alamb deleted the alamb/test_updated_deps branch August 8, 2023 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants