Different primitive types in different languages #35052

Closed
izveigor opened this issue Apr 11, 2023 · 8 comments

Comments

@izveigor
Contributor

izveigor commented Apr 11, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Primitive types differ depending on the programming language.

| Language | Types for which "is_primitive" returns true | Source |
| --- | --- | --- |
| Rust | UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float16, Float32, Float64, Decimal128, Decimal256, Date32, Date64, Timestamp, Time32, Time64, Duration, Interval | link |
| Go | BOOL, UINT8, INT8, UINT16, INT16, UINT32, INT32, UINT64, INT64, FLOAT16, FLOAT32, FLOAT64, DATE32, DATE64, TIME32, TIME64, TIMESTAMP, DURATION, INTERVAL_MONTHS, INTERVAL_DAY_TIME, INTERVAL_MONTH_DAY_NANO | link |
| Python | _Type_NA, _Type_BOOL, _Type_UINT8, _Type_INT8, _Type_UINT16, _Type_INT16, _Type_UINT32, _Type_INT32, _Type_UINT64, _Type_INT64, _Type_TIMESTAMP, _Type_DATE32, _Type_TIME32, _Type_TIME64, _Type_DATE64, _Type_HALF_FLOAT, _Type_FLOAT, _Type_DOUBLE | link |

I think there should be a rule that determines whether a type counts as primitive.
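To make the differences concrete, the three lists from the table can be compared as sets. This is only a sketch: the type names are copied from the table, but the lowercase spelling and the renames (`HALF_FLOAT` to `float16`, `FLOAT` to `float32`, `DOUBLE` to `float64`, `NA` to `null`) are conventions chosen here, not something any of the implementations define.

```python
# Type names taken from the table above, normalised to one spelling.
# The normalisation itself is an assumption made for this comparison.
INTS = {"uint8", "uint16", "uint32", "uint64", "int8", "int16", "int32", "int64"}
FLOATS = {"float16", "float32", "float64"}

RUST = INTS | FLOATS | {
    "decimal128", "decimal256", "date32", "date64", "timestamp",
    "time32", "time64", "duration", "interval",
}
GO = INTS | FLOATS | {
    "bool", "date32", "date64", "time32", "time64", "timestamp", "duration",
    "interval_months", "interval_day_time", "interval_month_day_nano",
}
PYTHON = INTS | FLOATS | {
    "null", "bool", "timestamp", "date32", "time32", "time64", "date64",
}

# Types that all three listings agree on.
common = RUST & GO & PYTHON

# Types that appear in exactly one listing.
only_rust = RUST - (GO | PYTHON)
only_go = GO - (RUST | PYTHON)
only_python = PYTHON - (RUST | GO)

print("common:", sorted(common))
print("only Rust:", sorted(only_rust))
print("only Go:", sorted(only_go))
print("only Python:", sorted(only_python))
```

Under this normalisation the three listings agree on the integer, float, date, time and timestamp types, and disagree on booleans, null, decimals, durations and intervals.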

Component(s)

Go, Python

@westonpace
Member

It might be nice to agree on other type categories too (temporal, numeric, etc.)

@izveigor
Contributor Author

izveigor commented Apr 12, 2023

I think it would also be a good idea to define which tensor types are supported, because these differ between implementations as well.

| Language | Supported tensor data types | Source |
| --- | --- | --- |
| Rust | BooleanTensor, Int8Tensor, Int16Tensor, Int32Tensor, Int64Tensor, UInt8Tensor, UInt16Tensor, UInt32Tensor, UInt64Tensor, Float16Tensor, Float32Tensor, Float64Tensor | link |
| Golang | Int8, Int16, Int32, Int64, Uint8, Uint16, Uint32, Uint64, Float32, Float64, Date32, Date64 | link |

@izveigor
Contributor Author

@tustvold
Contributor

tustvold commented Apr 12, 2023

For Rust we don't define booleans as primitives because it is a separate array type, BooleanArray vs PrimitiveArray<T>.

This arises because there is a non-trivial behaviour and API difference between arrays of aligned scalars and bit-packed bools. The former have native language support, e.g. [i8] or [u32], and support transparent zero-copy slicing, etc.

I suspect there may be differing definitions of what constitutes a primitive type: is it being a native scalar value (which is what Rust uses), or does it reflect the buffer layout?

Edit: although the lack of Decimal in Python and the presence of null in Go is confusing... 🤔
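The difference between aligned scalars and bit-packed booleans can be sketched with the Python standard library alone. This is an illustration of the layout argument, not of any Arrow implementation: `pack_bits`, `get_bit` and `slice_bits` are hypothetical helpers, and the least-significant-bit-first ordering is the one Arrow uses for its bitmaps.

```python
from array import array

# Aligned scalars: every value occupies a whole number of bytes,
# so a slice is just a view onto the same memory.
values = array("i", [10, 20, 30, 40, 50])
window = memoryview(values)[1:4]   # zero-copy view of elements 1..3
values[2] = 99                     # write through the original array
print(window.tolist())             # the view observes the write


def pack_bits(flags):
    """Pack booleans eight to a byte, least-significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def get_bit(buf, i):
    """Read the i-th boolean from a bit-packed buffer."""
    return bool((buf[i // 8] >> (i % 8)) & 1)


def slice_bits(buf, start, stop):
    """Slice by repacking; a start that is not a multiple of 8
    cannot be expressed as a plain byte range."""
    return pack_bits([get_bit(buf, i) for i in range(start, stop)])


flags = [True, False, True, True, False, False, True, False, True, True]
packed = pack_bits(flags)
sliced = slice_bits(packed, 3, 9)
print(packed.hex(), sliced.hex())
```

A real implementation would more likely carry a bit offset alongside the buffer rather than repack, but either way the boolean case needs machinery that a plain slice of scalars does not.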

@alamb
Contributor

alamb commented Apr 12, 2023

My initial reading of this ticket is that I would expect the native type mappings of Arrow --> languages to differ somewhat, given that different languages have different notions of what "primitive types" are. Most languages have native integer and float support, but from there the differences get substantial, as highlighted above.

So in other words, maybe this is "not a bug, working as expected".

@izveigor I wonder if you could provide some background about why you are raising this issue (like what problem does varying native types in different language bindings cause)?

@izveigor
Contributor Author

I didn't describe the problem accurately, so let me instead ask about the points I did not understand.

  1. I don't understand the main principle by which a type is classified as primitive. From the user's point of view, primitive types are the opposite of nested types. If the differences are due to the peculiarities of each language, then which peculiarities?
  2. Tensor problem. As written above, the supported tensor types also differ. In my opinion, a tensor should accept all primitive types.

I think the answers to these questions will be of interest to other users who are unsure about the definition.

@westonpace
Member

> I don't understand the main principle by which a type is classified as primitive. From the user's point of view, primitive types are the opposite of nested types. If the differences are due to the peculiarities of each language, then which peculiarities?

"Not nested" is one definition of primitive (and it matches the one I have in my head), but it seems that not all implementations have chosen this definition. For example, if I interpret @tustvold's comment correctly, Rust has chosen "primitive" to mean "maps directly to a Rust primitive array". Since "primitive" is not defined by the spec, it is probably valid (as @alamb mentions) for each implementation to have a different definition.
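The two definitions mentioned here can be written as two predicates over a small type catalogue to see where they part ways. Everything in this sketch is hypothetical: `TypeInfo`, the catalogue entries and both predicate names are made up for illustration, and the `native_scalar` flags follow the reasoning in this thread (booleans are bit-packed, strings are variable-width).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeInfo:
    name: str
    nested: bool          # has child arrays (list, struct, ...)
    native_scalar: bool   # stored as an array of aligned machine scalars


# A deliberately tiny, hypothetical catalogue.
CATALOGUE = [
    TypeInfo("int32", nested=False, native_scalar=True),
    TypeInfo("float64", nested=False, native_scalar=True),
    TypeInfo("bool", nested=False, native_scalar=False),   # bit-packed
    TypeInfo("utf8", nested=False, native_scalar=False),   # variable-width
    TypeInfo("list<int32>", nested=True, native_scalar=False),
    TypeInfo("struct<a: int32>", nested=True, native_scalar=False),
]


def primitive_if_not_nested(t: TypeInfo) -> bool:
    """Definition 1: primitive means 'not nested'."""
    return not t.nested


def primitive_if_native_scalar(t: TypeInfo) -> bool:
    """Definition 2: primitive means 'maps to a native scalar array'."""
    return t.native_scalar


# Types on which the two definitions give different answers.
disagree = [
    t.name
    for t in CATALOGUE
    if primitive_if_not_nested(t) != primitive_if_native_scalar(t)
]
print(disagree)
```

Both definitions agree on the fixed-width numeric types and on the nested types; the disagreement is confined to types that are neither nested nor laid out as plain scalars.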

> Tensor problem. As written above, tensor types are also different. In my opinion, a tensor should accept all primitive types.

The Rust tensor implementation is older and predates the recent discussion on a formalized definition for tensors added in #33925. In the definition from #33925 I don't see any use of the word "primitive" or, in fact, any limitation on the possible types. So I think it would be legal (if not altogether sensible) to have tensors of nested types. For example, a tensor of strings should be legal.

@izveigor
Contributor Author

Thanks for the answer, @westonpace. I think after that I have no questions left about primitive types.
