feat: add generic protobuf #2634

Merged
merged 9 commits into from
Jun 28, 2022
56 changes: 56 additions & 0 deletions protos/io_descriptors.proto
@@ -0,0 +1,56 @@
syntax = "proto3";

import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

/* Value represents a single instance of the supported datatypes.
 * Naming convention: Follows `numpy.dtype` where appropriate.
 * `numpy.dtype`: https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types */
message Value{
  oneof dtype{
    float f4=1;
    double f8=2;
    uint32 u4=3;
    uint64 u8=4;
    sint32 i4=5;
    sint64 i8=6;
    google.protobuf.Timestamp timestamp_=7;
    google.protobuf.Duration duration_=8;
    bool bool_=9;
    string str_=10;
    bytes bytes_=11;
    Value value_=12;
    Array array_=13;
    Tuple tuple_=14;
    // TODO: int32, int64, fixed32, fixed64, sfixed32, sfixed64
  }
}

/* Tuple represents a repeated field containing data with same or different datatypes */
message Tuple{
  repeated Value value_=1;
}

// TODO: message complex types

/* Array contains a dtype which identifies the type of the array.
 * The repeated field for the identified dtype contains the array.
 * Naming convention: Follows `numpy.dtype` where appropriate.
 * `numpy.dtype`: https://numpy.org/doc/stable/reference/arrays.dtypes.html#specifying-and-constructing-data-types */
message Array {
@aarnphm (Member), Jun 25, 2022

Suggested change
- message Array {
+ message NumpyNdarray {

Also, I think the filename should be protos/io_descriptors/numpy.proto.

Contributor Author

For the message name, I thought we were keeping it as Array to avoid confusing clients that use languages other than Python. If I split the IO descriptors into separate proto files, there will be one generated file per proto. Keeping a single io_descriptor proto lets users access everything from one generated file, for example io_descriptor_pb.Array() to create a new array, io_descriptor_pb.PandasDataframe() to create a pandas dataframe, and so on.

Member

I think NumpyNdarray provides a better distinction than a plain Array. Since much of the proto message is NumPy-specific, the name NumpyNdarray is clearer than just Array.

Contributor

We don't do this in JSON, and I'm not sure we want to do it here. NumPy is what the server uses under the hood to process generic array data coming from clients; I don't think clients should care what the input format is.

Contributor Author

I made the change to NumpyNdarray. If you guys want, I can change it back to Array.

Member

@sauyon IMO we should follow the naming in io_descriptors for consistency.

  string dtype=1;
  repeated float f4=2;
  repeated double f8=3;
  repeated uint32 u4=4;
  repeated uint64 u8=5;
  repeated sint32 i4=6;
  repeated sint64 i8=7;
  repeated google.protobuf.Timestamp timestamp_=8;
  repeated google.protobuf.Duration duration_=9;
  repeated bool bool_=10;
  repeated string str_=11;
  repeated bytes bytes_=12;
  repeated Array array_=13;
  repeated Tuple tuple_=14;
  // TODO: int32, int64, fixed32, fixed64, sfixed32, sfixed64
}
Proto007 marked this conversation as resolved.
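
As a rough illustration of the layout described in the Array comment (a `dtype` tag plus one populated repeated field), the Python sketch below packs and unpacks a flat list using plain dictionaries shaped like the message. It does not use the generated protobuf classes. `FIELDS`, `pack_array`, and `unpack_array` are hypothetical names introduced here, and the rule that `dtype` holds the name of the populated field is an assumption that the diff does not confirm.

```python
# Field names copied from the scalar repeated fields of the Array message.
# Nested and well-known-type fields (array_, tuple_, timestamp_, duration_)
# are left out to keep the sketch short.
FIELDS = {"f4", "f8", "u4", "u8", "i4", "i8", "bool_", "str_", "bytes_"}


def pack_array(dtype, values):
    """Build a dict shaped like the Array message: a dtype tag plus one repeated field."""
    if dtype not in FIELDS:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    # Assumption: `dtype` stores the name of the repeated field that carries the data.
    return {"dtype": dtype, dtype: list(values)}


def unpack_array(message):
    """Return the values from the repeated field named by message['dtype']."""
    dtype = message.get("dtype", "")
    if dtype not in FIELDS:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    # proto3 repeated fields default to empty, so a missing key is read as an empty list.
    return list(message.get(dtype, []))


if __name__ == "__main__":
    msg = pack_array("i8", [1, 2, 3])
    print(msg)
    print(unpack_array(msg))
```

A client in any language could follow the same pattern without knowing that the server converts the data to NumPy, which is the concern raised in the thread about keeping the message name generic.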