New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add generic protobuf #2634
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me. Lets proceed with the next step from the meeting.
protos/io_descriptors.proto
Outdated
* The repeated field for the identified dtype contains the array. | ||
* Naming convention: Follows `numpy.dtype` where appropriate. | ||
* `numpy.dtype`: https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types */ | ||
message Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
message Array { | |
message NumpyNdarray { |
Also i think the filename should be protos/io_descriptors/numpy.proto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the message name, I thought we were keeping it Array
to prevent confusion from clients using languages other than python. If I separate all the io descriptors in their own proto, there will be one generated file for each proto. Having one io_descriptor proto means users can access them from one generated file. For example: io_descriptor_pb.Array()
to create a new array and io_descriptor_pb.PandasDataframe()
to create a pandas dataframe and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NumpyNdarray provides a better distinction than normal Array. Since a lot of the proto message is Numpy specific, thus the name NumpyNdarray
makes it more clear than just Array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't do this in JSON, and I'm not sure we want to do this here. Numpy is what's being used in the server under the hood to process generic array data coming from clients, I don't think clients should care what the input format is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change to NumpyNdarray
. If you guys want, I can change it back to Array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sauyon imo we should follow the naming in io_descriptors for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we merge this now so you can start working on the I/O descriptors, or do you want to finish up the TODOs first?
Lets merge this sauyon |
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
* Added generic proto file with basic Value message * Added basic Array message to represent arrays * Added Tuple message to represent tuples * Added timestamp and duration datatypes * Added concise todo comments * Changed field names, message name `NumpyNdarray` * Applied changed message name, moved todo comment * Added trailing newline * Fixed sint32 and sint64 field names Co-authored-by: Sadab Hafiz <sadabhafizny@gmail.com> Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Description
Added a generic protobuf file to represent arrays in protobuf format for grpc. Currently contains message
Value
to represent single instance of supported types, messageArray
to represent arrays of supported types, and messageNumpyNdarray
which stores more details about the arrays similar to python numpy arrays.Motivation and Context
This is a step towards allowing serving through grpc.
How Has This Been Tested?
This has been tested by creating protobuf instances of different types of arrays using the generated code of this proto file.
Checklist:
make format
andmake lint
script have passed(instructions).