Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] JDBC Flight SQL driver should follow Flight RPC spec #34532

Closed
lidavidm opened this issue Mar 10, 2023 · 1 comment · Fixed by #38521
Closed

[Java] JDBC Flight SQL driver should follow Flight RPC spec #34532

lidavidm opened this issue Mar 10, 2023 · 1 comment · Fixed by #38521

Comments

@lidavidm
Copy link
Member

Describe the bug, including details regarding any error messages, version, and platform.

The JDBC Flight SQL driver assumes that the endpoints are all the same as the main server connection, but this is wrong. It should connect to individual endpoints specified by GetFlightInfo.

#13900 tried to fix it but the original contributor isn't interested - this would be a starting point for future contributors

Component(s)

FlightRPC, Java

@jduo
Copy link
Member

jduo commented Oct 27, 2023

take

jduo added a commit to Bit-Quill/arrow that referenced this issue Oct 30, 2023
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Change connecting to each endpoint to happen asynchronously
instead of iteratively.
- Change stream clean-up to be done as soon as a stream is
finished instead of at the end of the result set.
jduo added a commit to Bit-Quill/arrow that referenced this issue Oct 31, 2023
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Rework ArrowFlightSqlHandler.Builder to avoid changing its state
during build() so that it can be re-used.
- Add a copy constructor to ArrowFlightSqlHandler.Builder to make it
easy to replicate connection settings when connecting to different
locations.
jduo added a commit to Bit-Quill/arrow that referenced this issue Nov 3, 2023
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Rework ArrowFlightSqlHandler.Builder to avoid changing its state
during build() so that it can be re-used.
- Add a copy constructor to ArrowFlightSqlHandler.Builder to make it
easy to replicate connection settings when connecting to different
locations.
- Add unit test infrastructure for building a distributed FlightSql
server
jduo added a commit to Bit-Quill/arrow that referenced this issue Nov 3, 2023
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Rework ArrowFlightSqlHandler.Builder to avoid changing its state
during build() so that it can be re-used.
- Add a copy constructor to ArrowFlightSqlHandler.Builder to make it
easy to replicate connection settings when connecting to different
locations.
- Add unit test infrastructure for building a distributed FlightSql
server
lidavidm pushed a commit that referenced this issue Nov 3, 2023
…8521)

### Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

### What changes are included in this PR?
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
- Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
- Add utility classes in unit tests for constructing a distributed Flight SQL Server

### Are these changes tested?
Yes.

### Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

* Closes: #34532

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 15.0.0 milestone Nov 3, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ts (apache#38521)

### Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

### What changes are included in this PR?
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
- Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
- Add utility classes in unit tests for constructing a distributed Flight SQL Server

### Are these changes tested?
Yes.

### Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

* Closes: apache#34532

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ts (apache#38521)

### Rationale for this change
The Flight SQL JDBC Driver currently doesn't fetch at multiple endpoints correctly when the data is not at the same location as the original connection.

### What changes are included in this PR?
- Create new clients to connect to new locations in endpoints.
- If no location is reported using the current connection.
- Make ArrowFlightSqlClientHandler's builder's build() function to be idempodent.
- Add functionality to clone ArrowFlightSqClientHandler's builder so that it can be used for temporary connections to locations returned by getFlightInfo().
- Add utility classes in unit tests for constructing a distributed Flight SQL Server

### Are these changes tested?
Yes.

### Are there any user-facing changes?
The behavior for when there are reported endpoints from getFlightInfo is now fixed. However if users relied on the previous behavior of just getting streams from the same node, and their server only ever reported the original node, they may observe more Flight client connections opening and closing than before (since new connections get spawned for each partition that has at least one Location now).

* Closes: apache#34532

Authored-by: James Duong <james.duong@improving.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants