Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust/drivermgr): start rust driver_manager #416

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wjones127
Copy link
Member

No description provided.

@wjones127 wjones127 force-pushed the rust-driver-manager branch 5 times, most recently from 0dc016f to 6f6a5f9 Compare February 4, 2023 22:46
@wjones127 wjones127 force-pushed the rust-driver-manager branch 18 times, most recently from c6e3c8e to 1492bc2 Compare February 8, 2023 05:58
@wjones127
Copy link
Member Author

wjones127 commented Feb 8, 2023

I'm still figuring out the Windows CI, but otherwise this is ready for review. I recommend reading the files in the following order:

  • rust/src/lib.rs
  • rust/src/error.rs
  • rust/src/interface.rs
  • rust/src/ffi.rs
  • rust/src/driver_manager.rs
  • rust/tests/test_driver_manager.rs

@wjones127 wjones127 force-pushed the rust-driver-manager branch 6 times, most recently from a50c6f5 to 157d304 Compare February 9, 2023 05:31
@wjones127
Copy link
Member Author

Rust CI is successful on my branch: https://github.com/wjones127/arrow-adbc/actions/runs/4131461920
🚀

@wjones127
Copy link
Member Author

I am working on a follow up that adds tests with Miri (#446). As part of that I've found there are some memory leaks and use-after-free errors in this code. Once I'm done with #446 I can backport them into this PR.

Comment on lines +37 to +43
//! [AdbcConnection] should not be used across multiple threads. Driver
//! implementations do not guarantee connection APIs are safe to call from
//! multiple threads, unless calls are carefully sequenced. So instead of using
//! the same connection across multiple threads, create a connection for each
//! thread. [AdbcConnectionBuilder] is [core::marker::Send], so it can be moved
//! to a new thread before initialized into an [AdbcConnection]. [AdbcConnection]
//! holds it's inner data in a [std::rc::Rc], so it is also cheaply copyable.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking just last night that we might want to implement async execute methods, which would involve invoking with spawn_blocking(). That might mean we need to make AdbcConnection Send, but not 100% sure yet. Maybe something for a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wanted to implement async interfaces in general. That'll require some thought between Flight RPC, and Rust/C++/Go. (I want to make the C-level ABI in a way that all three languages can expose their native concurrency models efficiently, and import others' models efficiently. But that means I need to really go study up on how those models work.) I'd appreciate any thoughts you and @zeroshade have.

e.g.: gRPC C++ uses a callback-based model. UCX (in C) uses a polling-based model. There's precedent like kqueue/epoll, and of course the new io_uring (don't know if that's relevant).

There's also Python's model to consider (as a consumer), and probably we'd want to anticipate things like OpenTelemetry (or generally: non-stack-frame, non-thread based 'task' context). And eventually, whenever I get around to JNI bindings, Java's model as well (which, with virtual threading, is also getting more complicated/interesting).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and consideration of whether we need an async version of C Data Interface...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the Rust model is described well here: https://rust-lang.github.io/async-book/02_execution/02_future.html.

But for now I was simply thinking that Rust driver manager could wrap the blocking calls into C drivers with tokio::spawn_blocking(), which sends the call to another thread to execute. This is what Rust async libraries do for filesystem interaction on systems that don't provide a native async interface. And then native Rust drivers could always be async. But maybe there is value in having both async and blocking versions of execute() in the Rust API. Something I'll try to get feedback from the Rust community on once I have a proposal for the ADBC Rust API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this in the R package too. In the Arrow R package we always execute "cancellable arrow stuff" on another thread (via an Executor) and wait for that future to complete. That relies on Arrow's async API and cancellation signal handlers, but it would be a huge asset for the driver manager to handle this (nothing worse than accidentally executing a huge query without the ability to cancel).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. If I can find some spare time...

I think async interfaces in general throughout our ecosystem are going to be more important as they get adopted more by developers (C++ is, well, coming along, but Go/mostly Rust/sorta Python have generally mature concurrency stories nowadays)

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, thank you!

I had some questions, but my Rust is rather out of date so hopefully it makes sense

run: cargo fmt -- --check
- name: Clippy
run: cargo clippy --tests
- name: Build Driver SQLite (Windows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may or may not want to integrate with the main pipeline, which will build all these artifacts once so you can just download them from other jobs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it seems clear enough while separate. But if we rely on any more artifacts from the other build, then I think we should merge.


[package]
name = "arrow-adbc"
version = "0.1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: version needs to be bumped (also, we'll have to add this to the release scripts)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll need to look at what needs to be done for release.

arrow = { version = "32.0.0", features = ["ffi"], default_features = false}
libloading = "0.7"

# TODO: support arrow2 with a non-default feature.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the upcoming merger - should we target arrow2 directly if that is going to be the primary implementation?

///
/// Can be used in combination with [check_err] when implementing ADBC FFI
/// functions.
pub trait AdbcError {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Instead of requiring boxing everywhere, what about having a concrete error type and just converting to/from the FFI version as necessary?

Also in general, I think it'd be better to compartmentalize the FFI bits the same way Go does, instead of assuming FFI as a fundamental part of the implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this now. When I was thinking more FFI-centric this made a bit more sense. I have refactored to use a concrete struct in #478.

/// Databases hold state shared by multiple connections. This typically means
/// configuration and caches. For in-memory databases, it provides a place to
/// hold ownership of the in-memory database.
pub trait DatabaseApi {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it typical to suffix traits with 'Api'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was mostly to differentiate between the AdbcDatabase in the driver_manager module and this trait. But given we want to be less FFI-centric, I renamed the driver_manager AdbcDatabase -> DriverDatabase and renamed this one AdbcDatabase.

@wjones127 wjones127 marked this pull request as draft February 26, 2023 23:11
@wjones127
Copy link
Member Author

@lidavidm I think I'll halt this PR in favor of #478. I'd like to get feedback from the Rust community on the Rust API first, then go ahead with the driver manager and C interface implementation modules.

@lidavidm
Copy link
Member

Sounds good to me. Thanks Will!

@lidavidm
Copy link
Member

lidavidm commented Jul 1, 2024

I think this has been superseded?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants