-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10542: [C#][Flight] Add beginning on flight code for net core #8694
Conversation
Also reduced version of testweb to 3.0
This is required since gRPC requires a minimum of 3.0
This should solve the sourcelink issue. Also removed Flight.Client namespace, to just Flight.
This makes it possible to keep netstandard1.3
Also added license header in assembly info
Also made ToProtocol methods internal.
Added EmbedUntrackedSources as a test to fix the build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @Ulimo! Thanks so much for the contribution. This will greatly help the .NET Arrow ecosystem.
Here's some initial feedback on the structure. I will review the code deeper tomorrow as I get more time.
csharp/src/Apache.Arrow.Flight/Reader/RecordBatcReaderImplementation.cs
Outdated
Show resolved
Hide resolved
csharp/test/Apache.Arrow.Flight.TestWeb/Apache.Arrow.Flight.TestWeb.csproj
Outdated
Show resolved
Hide resolved
csharp/test/Apache.Arrow.Flight.TestWeb/Apache.Arrow.Flight.TestWeb.csproj
Outdated
Show resolved
Hide resolved
csharp/test/Apache.Arrow.Flight.TestWeb/Apache.Arrow.Flight.TestWeb.csproj
Outdated
Show resolved
Hide resolved
Co-authored-by: James Newton-King <james@newtonking.com>
This was required to remove dependency from Grpc package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic work, @Ulimo! I'm really excited to get this in, I've wanted a .NET Arrow Flight library for a while, but haven't had the chance to work on it.
I left a few suggestions I found while playing with the code. One last suggestion I'll make is to go through each class and make sure we only make public
what actually needs to be public. Once we ship an API it is hard to change/break it. So erroring on the side of hiding something until it is absolutely needed to be public is a good approach.
Client code is under client namespace. Server code is under server namespace. Added support for application metadata. Removed exposure of properties not required for the user.
@eerhardt first of all thank you so much for your support and feedback and sorry for this wall of text. Based on your last feedback that the API is hard to change etc in the future, I really took a long sitting to go through all the classes etc, and found some issues that I really needed to address. This resulted in a small overhaul on the structure etc. I also went through so the gRPC schema information is actually exposed to the user as well. This also resulted in some rewrites. The feedback you have given has been exactly what I needed to come to the conclusions etc on structure, so I am once again extremely thankful for all the feedback you have given. This ofcourse increased the scope of the PR a bit again, but I hope it helps give a more real look and feel on how the framework will behave, and if the implemented design actually works. Here comes information on the work that has been done, and also some checklists to see that all the required information is exposed etc, and why some classes are public. Major changes
Minor changes
Test coverage is now above 80% on the non generated code. Application metadata additionTo enable users to write and read application metadata, the class FlightRecordBatchStreamingCall was added that Application metadata can be read with each record batch, it is implemented as a list since hypothetically metadata can Schema -> record batch -> record batch -> ... -> done So the first message could contains metadata as well. When a user gets a new record batch the metadata is cleared similar For writing, this is done by an additional method in FlightRecordBatchStreamWriter which looks like this:
gRPC supportHere is a check on what gRPC methods and properties are exposed and can be read/used by a user of the framework, mostly to check that everything that is required to be exposed are exposed at this time. gRPC methods exposed or not exposed to the user
gRPC properties exposed or not exposed to the userHandshakeRequestHandshake is not exposed at all
HandshakeResponseHandshake is not exposed at all
BasicAuthBasic auth is not exposed at all
ActionType
Criteria
Action
Result
SchemaResultSchema result is not mapped to its own type, but returns a schema directly
FlightDescriptor
FlightInfo
FlightEndpoint
Location
Ticket
FlightDataFlight data is not exposed as a class, but the data is exposed in getStream, startPut for client, and DoGet and DoPut for server.
PutResult
Classes public or internalHere is a list of classes and if they are public or internal, and motivation on why they are public.
|
@eerhardt to be honest after I went through and really thought regarding the calls, I had one final thing I was not too happy with. The API in itself feels quite easy to use, but I think I overspecified it for RecordBatch, and there will be trouble if we add dictionary support etc. The solution I can think of would reduce on ease of use thought (I think?), which would be:
FlightData structure:
This stream solution could also have a FlightDataVisitor for the different types also, and I think that it allows If not using a visitor, the loop would not be as "pretty" but still functional with easier additions of new types (atleast in my opinion):
Would love to get your input, so the flight framework starts with a good foundation. EDIT: Doing put operations would be a bit wierder with this though, it would require the user to have knowledge to send the schema as the first message. I think the putStream should still be similar to the existing solution even with this, but with extra write commands such as Write(ArrowDictionary) etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I dug a bit deeper into the code this time. I think this is close to merging. Just a few more comments.
csharp/src/Apache.Arrow.Flight/Internal/FlightRecordBatchStreamWriter.cs
Outdated
Show resolved
Hide resolved
csharp/src/Apache.Arrow.Flight/Client/FlightClientRecordBatchStreamWriter.cs
Outdated
Show resolved
Hide resolved
csharp/src/Apache.Arrow.Flight/Client/FlightRecordBatchDuplexStreamingCall.cs
Outdated
Show resolved
Hide resolved
|
||
// | ||
// Summary: | ||
// Gets the call status if the call has already finished. Throws InvalidOperationException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do callers know when the call has finished? Do they await
on the ResponseHeadersAsync
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller ends the stream, so they call CompleteAsync on the request stream.
csharp/src/Apache.Arrow.Flight/Internal/FlightRecordBatchStreamReader.cs
Outdated
Show resolved
Hide resolved
csharp/src/Apache.Arrow.Flight/Internal/FlightRecordBatchStreamReader.cs
Outdated
Show resolved
Hide resolved
csharp/src/Apache.Arrow.Flight/Internal/FlightMessageSerializer.cs
Outdated
Show resolved
Hide resolved
Also fixed PR comments.
I think this is OK, for now. This is just the initial impementation. The Flight API won't necessarily be "stable" in its first release - even in the C++/Python APIs it isn't stable yet, as far as I understand. There are probably many more places that will need to change to support dictionaries in the future. |
Do you think you can fill out this table with the current support? https://github.com/apache/arrow/blob/master/docs/source/status.rst#flight-rpc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great @Ulimo! Thanks for all the awesome work here.
I'll merge this today unless I hear other feedback.
@eerhardt I just have one question, getting this out as a preview nuget package soonish, is that possible? |
I believe the next round of releases is scheduled for sometime in January. (See this recent dev-list thread) Potential options for private feeds are Azure DevOps or myget.org. |
FYI - @Ulimo, the https://www.nuget.org/packages/Apache.Arrow.Flight/ |
Upgraded build version to 3.1 and C# version to 8. Also added EmbedUntrackedSources so auto generated code will work with sourcelink. This change is a requirement for apache#8694 and is isolating some of the changes that it required regarding upgrading net core version. Closes apache#8702 from Ulimo/ARROW-10634_ci_build Authored-by: Östman Alexander <alexander.ostman@sweco.se> Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
I closed the previous PR and opened this one, since the original one did not use InternalsVisibleTo and required a version bump on the Apache.Arrow project from netstandard1.3 to netstandard1.5. This adds basic support for both a flight server and a flight client for Net Core. Sorry for the massive PR, but I wanted to have some basic functionality in place with atleast some tests that use the put and get flow before creating a PR. I hope that this PR can help create some discussion on how the interfaces should look etc, and if this looks like an accetable interface for flight in net core. It tries to mimic as much as possible the original gRPC net core interface, but mapping the classes from the network protocol to the C# classes. This implementation uses InternalsVisibleTo, to hinder a bump of .netstandard version from 1.3 to 1.5. So the Apache.Arrow project still uses netstandard1.3. All flight code is in a seperate project Apache.Arrow.Flight This also required changing build version from 2.2 to 3.0 The code does not include: * Handshake - the reason is that AspNetCore contains features already for authentication/authorization for gRPC. Can be added later ofcourse. * DoExchange - I feel that more feedback/discussion is required before DoExchange can be implemented. Note: **This has been fixed** Sourcelink did not work when using grpc.tools to have code compilation in the build step. So I had to generate the grpc code manually for sourcelink to work. This means that there are alot of extra code in this PR that are auto generated. cc: @eerhardt Looks like you are the most active for C#, would be great to get some feedback if this is similar to how you would implement it, or if major changes are required I am ofcourse up for that as well. Closes apache#8694 from Ulimo/master Lead-authored-by: Östman Alexander <alexander.ostman@sweco.se> Co-authored-by: Ulimo <alexander.ostman@hotmail.com> Signed-off-by: Eric Erhardt <eric.erhardt@microsoft.com>
I closed the previous PR and opened this one, since the original one did not use InternalsVisibleTo and required a version bump on the Apache.Arrow project from netstandard1.3 to netstandard1.5.
This adds basic support for both a flight server and a flight client for Net Core.
Sorry for the massive PR, but I wanted to have some basic functionality in place with atleast some tests that use the put and get flow before creating a PR.
I hope that this PR can help create some discussion on how the interfaces should look etc, and if this looks like an accetable interface for flight in net core. It tries to mimic as much as possible the original gRPC net core interface, but mapping the classes from the network protocol to the C# classes.
This implementation uses InternalsVisibleTo, to hinder a bump of .netstandard version from 1.3 to 1.5.
So the Apache.Arrow project still uses netstandard1.3.
All flight code is in a seperate project Apache.Arrow.Flight
This also required changing build version from 2.2 to 3.0
The code does not include:
Note:
This has been fixed
Sourcelink did not work when using grpc.tools to have code compilation in the build step. So I had to generate the grpc code manually for sourcelink to work. This means that there are alot of extra code in this PR that are auto generated.
cc: @eerhardt
Looks like you are the most active for C#, would be great to get some feedback if this is similar to how you would implement it, or if major changes are required I am ofcourse up for that as well.