Fix 17 Add support for i128 #48

Merged: 1 commit, Jun 27, 2022

README.md (4 changes: 2 additions, 2 deletions)
@@ -9,7 +9,7 @@ The Arrow ecosystem provides many ways to convert between Arrow and other popula
Types that implement the `ArrowField`, `ArrowSerialize` and `ArrowDeserialize` traits can be converted to/from Arrow. The `ArrowField` implementation for a type defines the Arrow schema. The `ArrowSerialize` and `ArrowDeserialize` implementations provide the conversion logic via arrow2's data structures.


-For serializing to arrow, the `TryIntoArrow::try_into_arrow` method can be used to serialize any iterable into an `arrow2::Array`, which represents the in-memory Arrow layout or a `arrow2::Chunk`, which represents a column group, `Vec<arrow2::Array>>`. `arrow2::Chunk` can be used with `arrow2` API for other functionality such converting to parquet and arrow flight RPC.
+For serializing to arrow, `TryIntoArrow::try_into_arrow` can be used to serialize any iterable into an `arrow2::Array` or an `arrow2::Chunk`. `arrow2::Array` represents the in-memory Arrow layout. `arrow2::Chunk` represents a column group and can be used with the `arrow2` API for other functionality, such as converting to Parquet and Arrow Flight RPC.

For deserializing from arrow, the `TryIntoCollection::try_into_collection` can be used to deserialize from an `arrow2::Array` representation into any container that implements `FromIterator`.
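The round trip described above (serialize any iterable, deserialize into any `FromIterator` container) can be modelled without `arrow2` at all. The sketch below is a std-only toy, assuming a column is nothing more than a values buffer plus one validity flag per slot; `Column`, `to_column` and `from_column` are hypothetical names standing in for the roles of `try_into_arrow` and `try_into_collection`, not arrow2_convert APIs.

```rust
/// Hypothetical toy column: a values buffer plus one validity flag per slot.
/// Not an arrow2 type; it only mimics the shape of a nullable primitive array.
#[derive(Debug, Clone, PartialEq)]
pub struct Column<T> {
    pub values: Vec<T>,
    pub validity: Vec<bool>,
}

/// Serialize any iterable of optional values into a column (the role the README
/// gives `try_into_arrow`). Null slots store `T::default()` as a placeholder.
pub fn to_column<T, I>(items: I) -> Column<T>
where
    T: Default + Copy,
    I: IntoIterator<Item = Option<T>>,
{
    let mut col = Column { values: Vec::new(), validity: Vec::new() };
    for item in items {
        col.values.push(item.unwrap_or_default());
        col.validity.push(item.is_some());
    }
    col
}

/// Deserialize a column into any container implementing `FromIterator`
/// (the role the README gives `try_into_collection`).
pub fn from_column<T, C>(col: &Column<T>) -> C
where
    T: Copy,
    C: FromIterator<Option<T>>,
{
    col.values
        .iter()
        .zip(&col.validity)
        .map(|(value, &valid)| if valid { Some(*value) } else { None })
        .collect()
}
```

Because `from_column` is generic over `C: FromIterator<Option<T>>`, the same column can be collected into a `Vec`, a `VecDeque`, or any other collection, which mirrors the property the README attributes to `try_into_collection`.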

@@ -34,7 +34,7 @@ For deserializing from arrow, the `TryIntoCollection::try_into_collection` can b

This is not an exhaustive list. Please open an issue if you need a feature.

-### A note on nested option times
+### A note on nested option types

Since the Arrow format only supports one level of validity, nested option types such as `Option<Option<T>>` after serialization to Arrow will lose intermediate nesting of None values. For example, `Some(None)` will be serialized to `None`,
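To illustrate the nested-option note: with a single validity flag per slot there is nowhere to record which level of `None` was present. The std-only sketch below assumes exactly that one-flag model and does not call arrow2_convert; `flatten_one_level` and `restore` are hypothetical helpers.

```rust
/// Serialize nested options with only ONE validity level, as the note describes.
/// Returns (values, validity); null slots hold `T::default()` as a placeholder.
fn flatten_one_level<T: Copy + Default>(items: &[Option<Option<T>>]) -> (Vec<T>, Vec<bool>) {
    items
        .iter()
        .map(|item| match item {
            Some(Some(v)) => (*v, true),
            // Outer and inner `None` land in the same null slot: the nesting is lost.
            Some(None) | None => (T::default(), false),
        })
        .unzip()
}

/// Deserialize back into nested options; a null slot can only become the outer `None`.
fn restore<T: Copy>(values: &[T], validity: &[bool]) -> Vec<Option<Option<T>>> {
    values
        .iter()
        .zip(validity)
        .map(|(v, &valid)| if valid { Some(Some(*v)) } else { None })
        .collect()
}
```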

arrow2_convert/src/deserialize.rs (23 changes: 12 additions, 11 deletions)
@@ -50,7 +50,7 @@ where

// Macro to facilitate implementation for numeric types and numeric arrays.
macro_rules! impl_arrow_deserialize_primitive {
-    ($physical_type:ty, $logical_type:ident) => {
+    ($physical_type:ty) => {
        impl ArrowDeserialize for $physical_type {
            type ArrayType = PrimitiveArray<$physical_type>;

@@ -103,16 +103,17 @@ where
}
}

-impl_arrow_deserialize_primitive!(u8, UInt8);
-impl_arrow_deserialize_primitive!(u16, UInt16);
-impl_arrow_deserialize_primitive!(u32, UInt32);
-impl_arrow_deserialize_primitive!(u64, UInt64);
-impl_arrow_deserialize_primitive!(i8, Int8);
-impl_arrow_deserialize_primitive!(i16, Int16);
-impl_arrow_deserialize_primitive!(i32, Int32);
-impl_arrow_deserialize_primitive!(i64, Int64);
-impl_arrow_deserialize_primitive!(f32, Float32);
-impl_arrow_deserialize_primitive!(f64, Float64);
+impl_arrow_deserialize_primitive!(u8);
+impl_arrow_deserialize_primitive!(u16);
+impl_arrow_deserialize_primitive!(u32);
+impl_arrow_deserialize_primitive!(u64);
+impl_arrow_deserialize_primitive!(i8);
+impl_arrow_deserialize_primitive!(i16);
+impl_arrow_deserialize_primitive!(i32);
+impl_arrow_deserialize_primitive!(i64);
+impl_arrow_deserialize_primitive!(i128);
+impl_arrow_deserialize_primitive!(f32);
+impl_arrow_deserialize_primitive!(f64);

impl ArrowDeserialize for String {
    type ArrayType = Utf8Array<i32>;
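After this change the numeric macros take only the physical Rust type; the Arrow logical type is no longer passed in. That appears to be what lets `i128`, whose Arrow type is a parameterized `Decimal` rather than a bare variant name, reuse the same macro. As a rough, std-only illustration of the single-parameter pattern, the sketch below generates impls of a hypothetical `Primitive` trait (not the crate's `ArrowDeserialize`) that decodes little-endian bytes into each listed type.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for a per-type trait such as `ArrowDeserialize`.
trait Primitive: Sized {
    /// Rust type name, captured by the macro for illustration.
    const NAME: &'static str;
    /// Decode one value from exactly `size_of::<Self>()` little-endian bytes.
    fn from_le_slice(bytes: &[u8]) -> Option<Self>;
}

// One argument is enough: everything the impl needs is derived from the type itself.
macro_rules! impl_primitive {
    ($physical_type:ty) => {
        impl Primitive for $physical_type {
            const NAME: &'static str = stringify!($physical_type);
            fn from_le_slice(bytes: &[u8]) -> Option<Self> {
                // Fails (returns None) if the slice has the wrong width.
                let arr: [u8; std::mem::size_of::<$physical_type>()] = bytes.try_into().ok()?;
                Some(<$physical_type>::from_le_bytes(arr))
            }
        }
    };
}

impl_primitive!(u8);
impl_primitive!(u16);
impl_primitive!(u32);
impl_primitive!(u64);
impl_primitive!(i8);
impl_primitive!(i16);
impl_primitive!(i32);
impl_primitive!(i64);
impl_primitive!(i128);
impl_primitive!(f32);
impl_primitive!(f64);
```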

arrow2_convert/src/field.rs (11 changes: 10 additions, 1 deletion)
@@ -4,7 +4,7 @@ use chrono::{NaiveDate, NaiveDateTime};
/// Trait implemented by all types that can be used as an Arrow field.
///
/// Implementations are provided for types already supported by the arrow2 crate:
-/// - numeric types: [`u8`], [`u16`], [`u32`], [`u64`], [`i8`], [`i16`], [`i32`], [`i64`], [`f32`], [`f64`]
+/// - numeric types: [`u8`], [`u16`], [`u32`], [`u64`], [`i8`], [`i16`], [`i32`], [`i128`], [`i64`], [`f32`], [`f64`],
/// - other types: [`bool`], [`String`]
/// - temporal types: [`chrono::NaiveDate`], [`chrono::NaiveDateTime`]
///
@@ -107,6 +107,15 @@ impl_numeric_type_full!(i64, Int64);
impl_numeric_type_full!(f32, Float32);
impl_numeric_type_full!(f64, Float64);

+impl ArrowField for i128 {
+    type Type = i128;
+
+    #[inline]
+    fn data_type() -> arrow2::datatypes::DataType {
+        arrow2::datatypes::DataType::Decimal(32, 32)

Review comments on this line:

Collaborator: note that this is just the default. In general the user needs to specify the correct precision and scale via e.g. `.to(DataType::Decimal(10, 11))`

Collaborator Author: Oh good point - what would you recommend for this? One idea is we could provide a datatype override for a field to override the default.

Collaborator: yeap, I would recommend an override (ideally mandatory), so that people do not mistakenly use 32, 32 without thinking about it. If not possible mandatory, always present the override in the examples so people do not forget about it.

+    }
+}
+
impl ArrowField for String {
    type Type = String;
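The review thread turns on what `Decimal(precision, scale)` means: the stored `i128` is read as `raw × 10^(-scale)`, and `precision` bounds the number of significant decimal digits. With the `Decimal(32, 32)` default every digit is fractional, so a raw `12345` reads as 1.2345 × 10^-28. The std-only sketch below makes that concrete; `render_decimal` is a hypothetical helper, not an arrow2 function.

```rust
/// Render a raw decimal integer as text: value = raw * 10^(-scale).
/// Errors if the raw integer has more digits than `precision` allows.
fn render_decimal(raw: i128, precision: u32, scale: u32) -> Result<String, String> {
    let abs = raw.unsigned_abs().to_string();
    // Precision bounds the number of significant base-10 digits in the stored integer.
    if abs.len() as u32 > precision {
        return Err(format!("{} has {} digits, more than precision {}", raw, abs.len(), precision));
    }
    let sign = if raw < 0 { "-" } else { "" };
    if scale == 0 {
        return Ok(format!("{}{}", sign, abs));
    }
    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = format!("{:0>width$}", abs, width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    Ok(format!("{}{}.{}", sign, int_part, frac_part))
}
```

For example, `render_decimal(12345, 10, 2)` gives `"123.45"`, while the same raw value under `(32, 32)` gives `"0."` followed by 27 zeros and `12345`.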

arrow2_convert/src/serialize.rs (23 changes: 12 additions, 11 deletions)
@@ -36,7 +36,7 @@ pub trait ArrowMutableArray: arrow2::array::MutableArray {

// Macro to facilitate implementation of serializable traits for numeric types and numeric mutable arrays.
macro_rules! impl_numeric_type {
-    ($physical_type:ty, $logical_type:ident) => {
+    ($physical_type:ty) => {
        impl ArrowSerialize for $physical_type {
            type MutableArrayType = MutablePrimitiveArray<$physical_type>;

@@ -97,16 +97,17 @@ where
}
}

-impl_numeric_type!(u8, UInt8);
-impl_numeric_type!(u16, UInt16);
-impl_numeric_type!(u32, UInt32);
-impl_numeric_type!(u64, UInt64);
-impl_numeric_type!(i8, Int8);
-impl_numeric_type!(i16, Int16);
-impl_numeric_type!(i32, Int32);
-impl_numeric_type!(i64, Int64);
-impl_numeric_type!(f32, Float32);
-impl_numeric_type!(f64, Float64);
+impl_numeric_type!(u8);
+impl_numeric_type!(u16);
+impl_numeric_type!(u32);
+impl_numeric_type!(u64);
+impl_numeric_type!(i8);
+impl_numeric_type!(i16);
+impl_numeric_type!(i32);
+impl_numeric_type!(i128);
+impl_numeric_type!(i64);
+impl_numeric_type!(f32);
+impl_numeric_type!(f64);

impl ArrowSerialize for String {
    type MutableArrayType = MutableUtf8Array<i32>;

arrow2_convert/tests/test_round_trip.rs (1 change: 1 addition, 0 deletions)
@@ -185,6 +185,7 @@ fn test_primitive_type_vec() {
test_int_type!(i16);
test_int_type!(i32);
test_int_type!(i64);
+test_int_type!(i128);
test_int_type!(u8);
test_int_type!(u16);
test_int_type!(u32);
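`test_int_type!(i128)` adds `i128` to the existing round-trip tests. The body of the crate's `test_int_type!` is not shown in this diff, so the sketch below is only a std-only analogue under stated assumptions: a hypothetical `roundtrip_int_type!` macro that encodes `MIN`, `None`, `0` and `MAX` as fixed-width little-endian slots plus a validity mask, decodes them, asserts equality, and returns the encoded buffer length (16 bytes per slot for `i128`).

```rust
// Hypothetical round-trip check in the spirit of `test_int_type!`; not the crate's macro.
macro_rules! roundtrip_int_type {
    ($t:ty) => {{
        const WIDTH: usize = std::mem::size_of::<$t>();
        let original: Vec<Option<$t>> = vec![Some(<$t>::MIN), None, Some(0), Some(<$t>::MAX)];
        // Encode: fixed-width value slots (zeros for nulls) plus one validity flag per slot.
        let mut bytes = Vec::with_capacity(original.len() * WIDTH);
        let mut validity = Vec::with_capacity(original.len());
        for &item in &original {
            bytes.extend_from_slice(&item.unwrap_or(0).to_le_bytes());
            validity.push(item.is_some());
        }
        // Decode each WIDTH-byte chunk back into the integer type.
        let decoded: Vec<Option<$t>> = bytes
            .chunks_exact(WIDTH)
            .zip(&validity)
            .map(|(chunk, &valid)| {
                let mut buf = [0u8; WIDTH];
                buf.copy_from_slice(chunk);
                if valid { Some(<$t>::from_le_bytes(buf)) } else { None }
            })
            .collect();
        assert_eq!(decoded, original);
        bytes.len()
    }};
}
```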

arrow2_convert/tests/test_schema.rs (3 changes: 3 additions, 0 deletions)
@@ -18,6 +18,8 @@ fn test_schema_types() {
a5: chrono::NaiveDateTime,
// timestamp(ns, None)
a6: Option<chrono::NaiveDateTime>,
+// i128
+a7: i128,
// array of date times
date_time_list: Vec<chrono::NaiveDateTime>,
// optional list array of optional strings
@@ -122,6 +124,7 @@ fn test_schema_types() {
Field::new("a4", DataType::Date32, false),
Field::new("a5", DataType::Timestamp(TimeUnit::Nanosecond, None), false),
Field::new("a6", DataType::Timestamp(TimeUnit::Nanosecond, None), true),
+Field::new("a7", DataType::Decimal(32, 32), false),
Field::new(
"date_time_list",
DataType::List(Box::new(Field::new(
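The schema test pins the default `Decimal(32, 32)` for a bare `i128` field. One hypothetical way to make the reviewer's override mandatory is to put precision and scale into the type itself with const generics, so a field cannot be declared without choosing them. `Decimal<P, S>`, `DecimalType` and `DecimalField` below are illustrative names, not part of arrow2 or arrow2_convert; the bound of 38 digits is the maximum precision of a 128-bit Arrow decimal.

```rust
/// Hypothetical descriptor standing in for a decimal logical type (not arrow2's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DecimalType {
    precision: u8,
    scale: u8,
}

/// Hypothetical trait: "this Rust type maps to this decimal logical type".
trait DecimalField {
    fn logical_type() -> DecimalType;
}

/// Hypothetical wrapper: precision `P` and scale `S` are part of the type,
/// so there is no silent default to fall back on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Decimal<const P: u8, const S: u8>(i128);

impl<const P: u8, const S: u8> Decimal<P, S> {
    // Evaluated per instantiation; rejects precisions outside 1..=38 at compile time.
    const CHECK: () = assert!(P >= 1 && P <= 38, "precision must be in 1..=38");
}

impl<const P: u8, const S: u8> DecimalField for Decimal<P, S> {
    fn logical_type() -> DecimalType {
        // Referencing CHECK forces the compile-time assertion for this (P, S) pair.
        let () = Self::CHECK;
        DecimalType { precision: P, scale: S }
    }
}
```

This design trades the convenience of a bare `i128` for an explicit choice at every field; a runtime override, as the PR author suggests, keeps `i128` usable directly but relies on users remembering to apply it.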