[Format][FlightSQL] Spec refers to "int" type (which seems Java specific) rather than "int32" #35118

Closed
appletreeisyellow opened this issue Apr 13, 2023 · 3 comments · Fixed by #35120 or #35213

@appletreeisyellow
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

While implementing FlightSQL in our project, we found that the FlightSql.proto format reference refers to a type int:

* key_sequence: int not null,

It was slightly unclear what int means: the Arrow spec refers to "Int" types with a bitWidth of 8, 16, 32 or 64, while the Rust implementation has Int8, Int16, Int32 and Int64.

arrow/format/Schema.fbs

Lines 145 to 148 in c03ca8f

table Int {
  bitWidth: int; // restricted to 8, 16, 32, and 64 in v1
  is_signed: bool;
}
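
To make the ambiguity concrete, here is a minimal Go sketch of the `Int` table above. `IntType` and its `Name` method are hypothetical names made up for illustration and are not part of any Arrow library. The sketch only shows that a type name is fully determined once both the bit width and the signedness are given, which a bare `int` does not do.

```go
package main

import "fmt"

// IntType is a hypothetical stand-in for the `Int` table in Schema.fbs.
// It is not an Arrow API; it only mirrors the two fields shown above.
type IntType struct {
	BitWidth int  // restricted to 8, 16, 32, and 64
	Signed   bool // corresponds to is_signed
}

// Name spells out the type the way the spec comments are proposed to,
// e.g. "int32" or "uint8". It returns an error for unsupported widths.
func (t IntType) Name() (string, error) {
	switch t.BitWidth {
	case 8, 16, 32, 64:
		// supported width, fall through to formatting below
	default:
		return "", fmt.Errorf("unsupported bit width %d", t.BitWidth)
	}
	prefix := "int"
	if !t.Signed {
		prefix = "uint"
	}
	return fmt.Sprintf("%s%d", prefix, t.BitWidth), nil
}

func main() {
	// Print the four signed names that a bare "int" could have meant.
	for _, w := range []int{8, 16, 32, 64} {
		name, err := IntType{BitWidth: w, Signed: true}.Name()
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```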

Go uses Int32:

{Name: "key_sequence", Type: arrow.PrimitiveTypes.Int32, Nullable: false},

C++ uses Int32:

field("key_sequence", int32(), false), field("fk_key_name", utf8(), true),

Java uses INT:

which is defined at:

According to The Java Language Specification from Section 4.2: Primitive Types and Values:

The integral types are byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit and 64-bit signed two's-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing UTF-16 code units

Therefore, we can conclude that INT in Arrow Java refers to a 32-bit integer.

Proposal

I propose that we change the proto spec to be consistent and use int<bitwidth> to refer to integer types. So specifically, change int to int32 to match C++/Rust/Go as well as the convention in the rest of Flight.proto
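
As a rough illustration of the proposed change, the following Go sketch rewrites a bare `int` in a spec comment line to `int32`. The function name `normalizeIntType` and the regular expression are assumptions made for this example; this is not how the actual fix was produced. It only handles simple `name: int` entries and leaves nested types untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// bareInt matches a column entry such as "* key_sequence: int not null,"
// and captures everything up to the type name. The trailing \b keeps it
// from matching names that already carry a width, such as "int32".
var bareInt = regexp.MustCompile(`^(\s*\*\s*\w+:\s*)int\b`)

// normalizeIntType is a hypothetical helper that replaces a bare "int"
// column type with "int32" and returns other lines unchanged.
func normalizeIntType(line string) string {
	// ${1} is required here: "$1int32" would be read as a group named "1int32".
	return bareInt.ReplaceAllString(line, "${1}int32")
}

func main() {
	lines := []string{
		"* key_sequence: int not null,",
		"* column_size: int (The maximum size supported by that column.",
		"* literal_prefix: utf8 (Character or characters used to prefix a literal,",
	}
	for _, l := range lines {
		fmt.Println(normalizeIntType(l))
	}
}
```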

The details of our discussion are at https://github.com/influxdata/influxdb_iox/pull/7546#discussion_r1165830088

Component(s)

Format

@alamb
Contributor

alamb commented Apr 16, 2023

Also related to #35107

alamb pushed a commit that referenced this issue Apr 16, 2023
…than `int` (#35120)

### Rationale for this change

The spec is inconsistent -- see details on #35118 

### What changes are included in this PR?

Use `int32` to refer to 32-bit integers rather than `int`

### Are these changes tested?

No, only comments are changed

### Are there any user-facing changes?

This clarifies a small corner case in the document

* Closes: #35118

Authored-by: Chunchun <14298407+appletreeisyellow@users.noreply.github.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb added this to the 13.0.0 milestone Apr 16, 2023
@appletreeisyellow
Contributor Author

appletreeisyellow commented Apr 18, 2023

There are more places where int should change to int32, in the following part of the code:

arrow/format/FlightSql.proto

Lines 1067 to 1112 in 00072f9

* data_type: int not null (The SQL data type),
* column_size: int (The maximum size supported by that column.
* In case of exact numeric types, this represents the maximum precision.
* In case of string types, this represents the character length.
* In case of datetime data types, this represents the length in characters of the string representation.
* NULL is returned for data types where column size is not applicable.),
* literal_prefix: utf8 (Character or characters used to prefix a literal, NULL is returned for
* data types where a literal prefix is not applicable.),
* literal_suffix: utf8 (Character or characters used to terminate a literal,
* NULL is returned for data types where a literal suffix is not applicable.),
* create_params: list<utf8 not null>
* (A list of keywords corresponding to which parameters can be used when creating
* a column for that specific type.
* NULL is returned if there are no parameters for the data type definition.),
* nullable: int not null (Shows if the data type accepts a NULL value. The possible values can be seen in the
* Nullable enum.),
* case_sensitive: bool not null (Shows if a character data type is case-sensitive in collations and comparisons),
* searchable: int not null (Shows how the data type is used in a WHERE clause. The possible values can be seen in the
* Searchable enum.),
* unsigned_attribute: bool (Shows if the data type is unsigned. NULL is returned if the attribute is
* not applicable to the data type or the data type is not numeric.),
* fixed_prec_scale: bool not null (Shows if the data type has predefined fixed precision and scale.),
* auto_increment: bool (Shows if the data type is auto incremental. NULL is returned if the attribute
* is not applicable to the data type or the data type is not numeric.),
* local_type_name: utf8 (Localized version of the data source-dependent name of the data type. NULL
* is returned if a localized name is not supported by the data source),
* minimum_scale: int (The minimum scale of the data type on the data source.
* If a data type has a fixed scale, the MINIMUM_SCALE and MAXIMUM_SCALE
* columns both contain this value. NULL is returned if scale is not applicable.),
* maximum_scale: int (The maximum scale of the data type on the data source.
* NULL is returned if scale is not applicable.),
* sql_data_type: int not null (The value of the SQL DATA TYPE which has the same values
* as data_type value. Except for interval and datetime, which
* uses generic values. More info about those types can be
* obtained through datetime_subcode. The possible values can be seen
* in the XdbcDataType enum.),
* datetime_subcode: int (Only used when the SQL DATA TYPE is interval or datetime. It contains
* its sub types. For type different from interval and datetime, this value
* is NULL. The possible values can be seen in the XdbcDatetimeSubcode enum.),
* num_prec_radix: int (If the data type is an approximate numeric type, this column contains
* the value 2 to indicate that COLUMN_SIZE specifies a number of bits. For
* exact numeric types, this column contains the value 10 to indicate that
* column size specifies a number of decimal digits. Otherwise, this column is NULL.),
* interval_precision: int (If the data type is an interval data type, then this column contains the value
* of the interval leading precision. Otherwise, this column is NULL. This fields
* is only relevant to be used by ODBC).

Go uses Int32:

{Name: "data_type", Type: arrow.PrimitiveTypes.Int32, Nullable: false},
{Name: "column_size", Type: arrow.PrimitiveTypes.Int32, Nullable: true},
{Name: "literal_prefix", Type: arrow.BinaryTypes.String, Nullable: true},
{Name: "literal_suffix", Type: arrow.BinaryTypes.String, Nullable: true},
{Name: "create_params", Type: arrow.ListOfField(arrow.Field{Name: "item", Type: arrow.BinaryTypes.String, Nullable: false}), Nullable: true},
{Name: "nullable", Type: arrow.PrimitiveTypes.Int32, Nullable: false},
{Name: "case_sensitive", Type: arrow.FixedWidthTypes.Boolean, Nullable: false},
{Name: "searchable", Type: arrow.PrimitiveTypes.Int32, Nullable: false},
{Name: "unsigned_attribute", Type: arrow.FixedWidthTypes.Boolean, Nullable: true},
{Name: "fixed_prec_scale", Type: arrow.FixedWidthTypes.Boolean, Nullable: false},
{Name: "auto_increment", Type: arrow.FixedWidthTypes.Boolean, Nullable: true},
{Name: "local_type_name", Type: arrow.BinaryTypes.String, Nullable: true},
{Name: "minimum_scale", Type: arrow.PrimitiveTypes.Int32, Nullable: true},
{Name: "maximum_scale", Type: arrow.PrimitiveTypes.Int32, Nullable: true},
{Name: "sql_data_type", Type: arrow.PrimitiveTypes.Int32, Nullable: false},
{Name: "datetime_subcode", Type: arrow.PrimitiveTypes.Int32, Nullable: true},
{Name: "num_prec_radix", Type: arrow.PrimitiveTypes.Int32, Nullable: true},
{Name: "interval_precision", Type: arrow.PrimitiveTypes.Int32, Nullable: true},

C++ uses int32:

field("data_type", int32(), false),
field("column_size", int32()),
field("literal_prefix", utf8()),
field("literal_suffix", utf8()),
field("create_params", list(field("item", utf8(), false))),
field("nullable", int32(), false),
field("case_sensitive", boolean(), false),
field("searchable", int32(), false),
field("unsigned_attribute", boolean()),
field("fixed_prec_scale", boolean(), false),
field("auto_increment", boolean()),
field("local_type_name", utf8()),
field("minimum_scale", int32()),
field("maximum_scale", int32()),
field("sql_data_type", int32(), false),
field("datetime_subcode", int32()),
field("num_prec_radix", int32()),
field("interval_precision", int32()),

Java uses INT:

Field.notNullable("data_type", INT.getType()),
Field.nullable("column_size", INT.getType()),
Field.nullable("literal_prefix", VARCHAR.getType()),
Field.nullable("literal_suffix", VARCHAR.getType()),
new Field(
    "create_params", FieldType.nullable(LIST.getType()),
    singletonList(Field.notNullable("item", VARCHAR.getType()))),
Field.notNullable("nullable", INT.getType()),
Field.notNullable("case_sensitive", BIT.getType()),
Field.notNullable("searchable", INT.getType()),
Field.nullable("unsigned_attribute", BIT.getType()),
Field.notNullable("fixed_prec_scale", BIT.getType()),
Field.nullable("auto_increment", BIT.getType()),
Field.nullable("local_type_name", VARCHAR.getType()),
Field.nullable("minimum_scale", INT.getType()),
Field.nullable("maximum_scale", INT.getType()),
Field.notNullable("sql_data_type", INT.getType()),
Field.nullable("datetime_subcode", INT.getType()),
Field.nullable("num_prec_radix", INT.getType()),
Field.nullable("interval_precision", INT.getType())

As mentioned above, INT in Arrow Java refers to a 32-bit integer.
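
The comparison above was done by eye. A small checker along the following lines could do it mechanically. This is only a sketch under stated assumptions: `Field`, `parseSpecLine` and `mismatches` are hypothetical names, the implementation side is a hand-written table rather than a real Arrow schema, and nested types such as `list<utf8 not null>` are not parsed properly (only the leading word is captured).

```go
package main

import (
	"fmt"
	"regexp"
)

// Field is a hypothetical, simplified description of one schema column.
type Field struct {
	Name     string
	Type     string
	Nullable bool
}

// specLine matches the first line of a column entry in the spec comment,
// e.g. "* column_size: int (The maximum size ...". Continuation lines do
// not start with "name:" and therefore do not match.
var specLine = regexp.MustCompile(`^\s*\*\s*(\w+):\s*(\w+)( not null)?\b`)

// parseSpecLine extracts name, type and nullability from one spec line.
func parseSpecLine(line string) (Field, bool) {
	m := specLine.FindStringSubmatch(line)
	if m == nil {
		return Field{}, false
	}
	// A column is nullable unless the spec says "not null".
	return Field{Name: m[1], Type: m[2], Nullable: m[3] == ""}, true
}

// mismatches reports every spec column whose type or nullability differs
// from the implementation table. Columns missing from impl are skipped.
func mismatches(spec []string, impl map[string]Field) []string {
	var out []string
	for _, line := range spec {
		f, ok := parseSpecLine(line)
		if !ok {
			continue
		}
		want, known := impl[f.Name]
		if !known || want == f {
			continue
		}
		out = append(out, fmt.Sprintf("%s: spec has %+v, implementation has %+v", f.Name, f, want))
	}
	return out
}

func main() {
	spec := []string{
		"* data_type: int not null (The SQL data type),",
		"* column_size: int (The maximum size supported by that column.",
		"*              In case of exact numeric types, this represents the maximum precision.",
		"* literal_prefix: utf8 (Character or characters used to prefix a literal, NULL is returned for",
	}
	// Hand-written stand-in for what the Go and C++ schemas above declare.
	impl := map[string]Field{
		"data_type":      {Name: "data_type", Type: "int32", Nullable: false},
		"column_size":    {Name: "column_size", Type: "int32", Nullable: true},
		"literal_prefix": {Name: "literal_prefix", Type: "utf8", Nullable: true},
	}
	for _, m := range mismatches(spec, impl) {
		fmt.Println(m)
	}
}
```

With the spec spelled as `int`, the two integer columns are reported as mismatches; once the spec says `int32`, the same check reports nothing for them.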

kou pushed a commit that referenced this issue Apr 18, 2023
…egers rather than `int` (#35213)

### Rationale for this change

More inconsistencies in the spec format were found -- see details on the original issue #35118. #35120 is the first PR with the same fix.

### What changes are included in this PR?

Use `int32` to refer to 32-bit integers rather than `int`

### Are these changes tested?

No, only comments are changed

### Are there any user-facing changes?

This clarifies a small corner case in the document

* Closes: #35118

Authored-by: Chunchun <14298407+appletreeisyellow@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@alamb
Contributor

alamb commented Apr 24, 2023

> There are more places where int should change to int32 in the following part of the code

Note this was fixed in #35213 -- thanks @appletreeisyellow and @kou

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…ather than `int` (apache#35120)

liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…it integers rather than `int` (apache#35213)

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…ather than `int` (apache#35120)

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…it integers rather than `int` (apache#35213)

rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…ather than `int` (apache#35120)

rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…it integers rather than `int` (apache#35213)
