[Format] Simplify Execute and Query interface #61
I think it may still make sense to have a generic Execute to ease compatibility with APIs that do not differentiate between the types of queries (and note JDBC has all three!), but having an execute-with-rows-affected and an execute-with-result-set is reasonable.

Concurrency is a separate discussion. I'm torn on whether we should push the complexity into the driver (and declare up front that, for example, everything must be thread-safe), declare that nothing is thread-safe and leave clients to deal with it (so Go would have to lock the statement to use it), or, like DBAPI in Python, provide a way for drivers to indicate what they support (though this is probably the worst of both worlds).
Also, the driver manager could possibly define execute-with-result-set and execute-with-rows-affected in terms of the generic execute plus generic getters that retrieve the affected rows/last row ID. A basic driver (e.g. SQLite, which doesn't differentiate) would then still be straightforward to implement, while clients remain mostly none the wiser. It does increase the actual API surface to have all those permutations, though.
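A minimal sketch of the layering suggested above, assuming a driver that exposes only one generic execute plus getters. Every name here (ToyStatement, driver_execute, manager_execute_query, manager_execute_update) is hypothetical and not part of ADBC; the "database" is faked by looking at the SQL prefix.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical statement handle for this sketch; not an ADBC type. */
typedef struct {
  const char* sql;
  int has_result_set;    /* filled in by driver_execute */
  int64_t rows_affected; /* filled in by driver_execute; -1 if unknown */
  int64_t num_rows;      /* stand-in for a real result set */
} ToyStatement;

typedef enum { TOY_OK = 0, TOY_INVALID_STATE = 1 } ToyStatus;

/* Driver side: a single generic execute, as a SQLite-style driver that does
   not distinguish query kinds might provide. For illustration only, anything
   starting with "SELECT" yields 3 rows and anything else affects 2 rows. */
static ToyStatus driver_execute(ToyStatement* stmt) {
  assert(stmt != NULL && stmt->sql != NULL);
  if (strncmp(stmt->sql, "SELECT", 6) == 0) {
    stmt->has_result_set = 1;
    stmt->num_rows = 3;
    stmt->rows_affected = -1;
  } else {
    stmt->has_result_set = 0;
    stmt->num_rows = 0;
    stmt->rows_affected = 2;
  }
  return TOY_OK;
}

/* Driver side: generic getters. */
static int64_t driver_get_rows_affected(const ToyStatement* stmt) {
  return stmt->rows_affected;
}
static int64_t driver_get_num_rows(const ToyStatement* stmt) {
  return stmt->num_rows;
}

/* Driver-manager side: execute-with-result-set built on the generic calls. */
static ToyStatus manager_execute_query(ToyStatement* stmt, int64_t* out_rows) {
  ToyStatus status = driver_execute(stmt);
  if (status != TOY_OK) return status;
  if (!stmt->has_result_set) return TOY_INVALID_STATE; /* nothing to return */
  *out_rows = driver_get_num_rows(stmt);
  return TOY_OK;
}

/* Driver-manager side: execute-with-rows-affected; any result set is ignored. */
static ToyStatus manager_execute_update(ToyStatement* stmt, int64_t* rows_affected) {
  ToyStatus status = driver_execute(stmt);
  if (status != TOY_OK) return status;
  *rows_affected = driver_get_rows_affected(stmt);
  return TOY_OK;
}
```

Here the manager rejects execute-with-result-set for a statement that produced no result set; returning an empty result instead would be an equally valid choice.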
Another reason to differentiate between queries with/without result sets: in a Postgres driver, that means we know when we can attempt to use
Just to be sure we're on the same page:
In R DBI we're using the term "query" to indicate something that returns a result set (with or without parameters), and "statement" for something that doesn't return a result set but for which we can query the number of rows affected. The more information we can share with the driver up front, the better. Do we really need two methods, though? Would a single

What should happen to queries that indicate that no result set is expected but where a result set is available? Is a warning useful or annoying?
That sounds reasonable. (
A single method sounds reasonable:

Query(struct AdbcConnection*, const char*, struct ArrowArrayStream*, size_t*, struct AdbcError*)

If the query returns a result set, the ArrowArrayStream contains the result set and the size_t is the number of rows (if known, else 0). If not, the ArrowArrayStream is not set (or: contains last inserted IDs, if supported and configured) and the size_t contains the number of rows affected.
I think we can just ignore the result set in that case. This also makes it easy to support "Retrieve the last inserted id for inserts into an auto-increment table" that @zeroshade mentioned in #55: the caller can set an option to return the inserted IDs, and they'll come back via the result set.
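A rough sketch of the single-method semantics described in the last two comments, assuming a caller signals "no result set expected" by passing a NULL stream. ToyStream is a hypothetical stand-in for struct ArrowArrayStream, and toy_query only imitates the proposed Query signature (the connection and error arguments are dropped).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ArrowArrayStream: records only whether it
   was populated and how many rows it would yield. */
typedef struct {
  int populated;
  size_t num_rows;
} ToyStream;

typedef enum { TOY_OK = 0 } ToyStatus;

/* Hypothetical single entry point. Pass stream == NULL to say that no result
   set is expected; a result set produced anyway is silently ignored. */
static ToyStatus toy_query(const char* sql, ToyStream* stream, size_t* count) {
  assert(sql != NULL && count != NULL);
  int produces_rows = strncmp(sql, "SELECT", 6) == 0;
  if (produces_rows) {
    if (stream != NULL) {
      stream->populated = 1;
      stream->num_rows = 5;     /* pretend the row count is known */
      *count = stream->num_rows;
    } else {
      *count = 0;               /* result set ignored; count unknown */
    }
  } else {
    /* No result set: the stream is left untouched ("not set"). */
    *count = 2;                 /* pretend two rows were affected */
  }
  return TOY_OK;
}
```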
Last inserted IDs (or the results of computed columns in general, for that matter) can be obtained with the
R DBI has
Right, and on the other hand, databases like SQLite have no reliable way to get the info. But APIs like JDBC, Python DBAPI, and Go's database API expose standard ways to get last inserted ID(s). I wasn't originally concerned with it, since it feels like the 'wrong' use case to worry about, so I'm fine ignoring it. I think what's being discussed here is basically to have the same things as DBI, and the
Ok, so how does this sound (here, I'm ignoring #59, i.e. providing a "just query" method): Remove
That seems reasonable to me @lidavidm
Ok, while digging I realized we also need to account for partitioning (i.e. Flight/Flight SQL's ability to return multiple endpoints in a GetFlightInfo). So what I'm tentatively about to refactor towards is this:

/// \brief Execute a statement and get the results.
///
/// This invalidates any prior result sets.
///
/// \param[in] statement The statement to execute.
/// \param[in] output_type The expected result type:
/// - ADBC_OUTPUT_TYPE_NONE if the query should not generate a
/// result set;
/// - ADBC_OUTPUT_TYPE_ARROW for an ArrowArrayStream;
/// - ADBC_OUTPUT_TYPE_PARTITIONS for a count of partitions (see \ref
/// adbc-statement-partition below).
/// The result set will be in out.
/// \param[out] out The results. Must be NULL for output type NONE, a
/// pointer to an ArrowArrayStream for ARROW, or a
/// pointer to a size_t for PARTITIONS.
/// \param[out] rows_affected The number of rows affected if known,
/// else -1. Pass NULL if the client does not want this information.
/// \param[out] error An optional location to return an error
/// message if necessary.
ADBC_EXPORT
AdbcStatusCode AdbcStatementExecute(struct AdbcStatement* statement, int output_type,
void* out, int64_t* rows_affected,
struct AdbcError* error);
/// \brief No results are expected from AdbcStatementExecute. Pass
/// NULL to out.
#define ADBC_OUTPUT_TYPE_NONE 0
/// \brief Arrow data is expected from AdbcStatementExecute. Pass
/// ArrowArrayStream* to out.
#define ADBC_OUTPUT_TYPE_ARROW 1
/// \brief Partitions are expected from AdbcStatementExecute. Pass
/// size_t* to out to get the number of partitions, and use
/// AdbcStatementGetPartitionDesc to get a partition.
///
/// Drivers are not required to support partitioning. In that case,
/// AdbcStatementExecute will return ADBC_STATUS_NOT_IMPLEMENTED.
#define ADBC_OUTPUT_TYPE_PARTITIONS 2

If we decide to add native support for "affected row IDs", that could also go here. Otherwise I don't expect that we'd need more enum variants here.
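To illustrate how a driver might validate the output_type/out pairing documented above, here is a sketch. The three ADBC_OUTPUT_TYPE_* values are copied from the proposal; the status codes, ToyStream, and ToyStatement are hypothetical stand-ins, and the connection/error plumbing is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Output types as proposed above. */
#define ADBC_OUTPUT_TYPE_NONE 0
#define ADBC_OUTPUT_TYPE_ARROW 1
#define ADBC_OUTPUT_TYPE_PARTITIONS 2

/* Hypothetical status codes for this sketch only. */
typedef enum {
  TOY_STATUS_OK = 0,
  TOY_STATUS_INVALID_ARGUMENT,
  TOY_STATUS_NOT_IMPLEMENTED,
} ToyStatus;

/* Hypothetical stand-in for struct ArrowArrayStream. */
typedef struct { int populated; } ToyStream;

typedef struct {
  int supports_partitions;
  size_t num_partitions;   /* what the "server" would report */
  int64_t rows_affected;   /* -1 if unknown */
} ToyStatement;

static ToyStatus toy_statement_execute(ToyStatement* stmt, int output_type,
                                       void* out, int64_t* rows_affected) {
  assert(stmt != NULL);
  switch (output_type) {
    case ADBC_OUTPUT_TYPE_NONE:
      if (out != NULL) return TOY_STATUS_INVALID_ARGUMENT; /* must be NULL */
      break;
    case ADBC_OUTPUT_TYPE_ARROW:
      if (out == NULL) return TOY_STATUS_INVALID_ARGUMENT;
      ((ToyStream*)out)->populated = 1;
      break;
    case ADBC_OUTPUT_TYPE_PARTITIONS:
      if (!stmt->supports_partitions) return TOY_STATUS_NOT_IMPLEMENTED;
      if (out == NULL) return TOY_STATUS_INVALID_ARGUMENT;
      *(size_t*)out = stmt->num_partitions;
      break;
    default:
      return TOY_STATUS_INVALID_ARGUMENT;
  }
  /* rows_affected is optional: the client passes NULL to skip it. */
  if (rows_affected != NULL) *rows_affected = stmt->rows_affected;
  return TOY_STATUS_OK;
}
```

Because out is a void*, the compiler cannot catch a caller passing, say, a ToyStream* together with ADBC_OUTPUT_TYPE_PARTITIONS; the driver would write a size_t into it.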
@zeroshade just to follow up, does the linked PR look reasonable (at least interface-wise)?
For the record, SQLite now also has
How is the caller supposed to know the size of the
I suspect the type of
It seems like this would be cleaner as multiple functions (as Kirill noted), one per output type? Maybe:

ADBC_EXPORT AdbcStatusCode AdbcStatementExecuteVoid(struct AdbcStatement* statement, struct AdbcError* error);
ADBC_EXPORT AdbcStatusCode AdbcStatementExecute(struct AdbcStatement* statement, struct ArrowArrayStream* stream, struct AdbcError* error);
ADBC_EXPORT AdbcStatusCode AdbcStatementExecutePartitioned(struct AdbcStatement* statement, void* something, struct AdbcError* error);
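A sketch of what the split could look like inside a driver, assuming the three entry points share one internal execution path and differ only in what they hand back. All names are hypothetical, ToyStream stands in for struct ArrowArrayStream, and size_t* num_partitions is an assumed concrete type for the still-open void* something argument.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_OK = 0, TOY_NOT_IMPLEMENTED = 1 } ToyStatus;

/* Hypothetical stand-in for struct ArrowArrayStream. */
typedef struct { int populated; } ToyStream;

typedef struct {
  int executions;          /* counts calls into the shared core */
  int supports_partitions;
} ToyStatement;

/* Shared core: runs the query once; the public entry points below differ
   only in what they return to the caller. */
static ToyStatus run_query(ToyStatement* stmt) {
  assert(stmt != NULL);
  stmt->executions++;
  return TOY_OK;
}

static ToyStatus toy_execute_void(ToyStatement* stmt) {
  return run_query(stmt);
}

static ToyStatus toy_execute(ToyStatement* stmt, ToyStream* stream) {
  assert(stream != NULL); /* the parameter type already rules out size_t* */
  ToyStatus status = run_query(stmt);
  if (status == TOY_OK) stream->populated = 1;
  return status;
}

static ToyStatus toy_execute_partitioned(ToyStatement* stmt, size_t* num_partitions) {
  if (!stmt->supports_partitions) return TOY_NOT_IMPLEMENTED;
  ToyStatus status = run_query(stmt);
  if (status == TOY_OK) *num_partitions = 3; /* pretend 3 endpoints came back */
  return status;
}
```

Each signature now states its output type, so mixing up the out-parameter becomes a compile-time error rather than a runtime one.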
Alright, I'll split into three functions then.
Updated the PR.
* [Format][C][Java][Python] Simplify execute/query interface. Fixes #61.
* Update vendored nanoarrow
* [C] Split Execute
Rather than the separate Execute/GetStream functions, it might be better to follow something similar to FlightSQL's interface or Go's database/sql API. Have two functions:

* Execute(struct AdbcConnection*, const char*, struct AdbcResult*, struct AdbcError*)
  where AdbcResult would contain an optional LastInsertedID and the number of rows affected.
* Query(struct AdbcConnection*, const char*, struct ArrowArrayStream*, struct AdbcError*)
  where the ArrowArrayStream is populated with the result set.

Corresponding methods would exist for a Statement, just without the need for taking the const char*, as it would already be prepared in the statement.

Some benefits of this idea:
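To make the AdbcResult idea from this proposal concrete, here is a minimal sketch that assumes each optional field is paired with a presence flag. The field names, ToyResult, ToyConnection, and the SQL-prefix dispatch are hypothetical; only Execute is shown, since Query would simply populate an ArrowArrayStream.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for the proposed AdbcResult: each value is optional,
   so a flag records whether the driver could supply it. */
typedef struct {
  int has_last_inserted_id;
  int64_t last_inserted_id;
  int has_rows_affected;
  int64_t rows_affected;
} ToyResult;

typedef enum { TOY_OK = 0, TOY_INVALID_ARGUMENT = 1 } ToyStatus;

/* Toy "database" holding only an auto-increment counter. */
typedef struct { int64_t next_id; } ToyConnection;

static ToyStatus toy_execute(ToyConnection* conn, const char* sql, ToyResult* result) {
  if (conn == NULL || sql == NULL || result == NULL) return TOY_INVALID_ARGUMENT;
  memset(result, 0, sizeof(*result)); /* start with "nothing known" */
  if (strncmp(sql, "INSERT", 6) == 0) {
    result->has_last_inserted_id = 1;
    result->last_inserted_id = conn->next_id++;
    result->has_rows_affected = 1;
    result->rows_affected = 1;
  } else if (strncmp(sql, "DELETE", 6) == 0) {
    result->has_rows_affected = 1;
    result->rows_affected = 0; /* pretend nothing matched */
  }
  /* Other statements (e.g. DDL) leave both fields unset. */
  return TOY_OK;
}
```

The flags let a driver for a database without a reliable last-inserted-ID mechanism simply leave that field unset.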