ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type #4127
6421d3b to ec67279
@kou This is ready for review.
c_glib/arrow-glib/composite-array.h
gchar **field_names,
gsize n_field_names,
guint8 *type_codes,
gsize n_type_codes,
I don't want to provide an API that requires many arguments, because such an API isn't easy to use.
How about using GArrowSparseUnionDataType?
diff --git a/c_glib/arrow-glib/composite-array.cpp b/c_glib/arrow-glib/composite-array.cpp
index b202fb45..aac3c24c 100644
--- a/c_glib/arrow-glib/composite-array.cpp
+++ b/c_glib/arrow-glib/composite-array.cpp
@@ -366,6 +366,54 @@ garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
}
}
+/**
+ * garrow_sparse_union_array_new_data_type:
+ * @data_type: The data type for the sparse array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ * as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ * or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+ GArrowInt8Array *type_ids,
+ GList *fields,
+ GError **error)
+{
+ auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(data_type));
+ auto arrow_union_data_type =
+ std::static_pointer_cast<arrow::UnionType>(arrow_data_type);
+ std::vector<std::string> arrow_field_names;
+ for (const auto &arrow_field : arrow_union_data_type->children()) {
+ arrow_field_names.push_back(arrow_field->name());
+ }
+ std::vector<uint8_t> arrow_type_codes(arrow_union_data_type->type_codes());
+ auto arrow_type_ids = garrow_array_get_raw(GARROW_ARRAY(type_ids));
+ std::vector<std::shared_ptr<arrow::Array>> arrow_fields;
+ for (auto node = fields; node; node = node->next) {
+ auto *field = GARROW_ARRAY(node->data);
+ arrow_fields.push_back(garrow_array_get_raw(field));
+ }
+ std::shared_ptr<arrow::Array> arrow_union_array;
+ auto status = arrow::UnionArray::MakeSparse(*arrow_type_ids,
+ arrow_fields,
+ arrow_field_names,
+ arrow_type_codes,
+ &arrow_union_array);
+ if (garrow_error_check(error,
+ status,
+ "[sparse-union-array][new][data-type]")) {
+ return GARROW_SPARSE_UNION_ARRAY(garrow_array_new_raw(&arrow_union_array));
+ } else {
+ return NULL;
+ }
+}
+
G_DEFINE_TYPE(GArrowDenseUnionArray,
garrow_dense_union_array,
diff --git a/c_glib/arrow-glib/composite-array.h b/c_glib/arrow-glib/composite-array.h
index a181ffcc..c65c1d66 100644
--- a/c_glib/arrow-glib/composite-array.h
+++ b/c_glib/arrow-glib/composite-array.h
@@ -108,6 +108,11 @@ GArrowSparseUnionArray *
garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
GList *fields,
GError **error);
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+ GArrowInt8Array *type_ids,
+ GList *fields,
+ GError **error);
#define GARROW_TYPE_DENSE_UNION_ARRAY (garrow_dense_union_array_get_type())
diff --git a/c_glib/test/test-sparse-union-array.rb b/c_glib/test/test-sparse-union-array.rb
index 721f95c1..4a9e7c81 100644
--- a/c_glib/test/test-sparse-union-array.rb
+++ b/c_glib/test/test-sparse-union-array.rb
@@ -18,32 +18,69 @@
class TestSparseUnionArray < Test::Unit::TestCase
include Helper::Buildable
- def setup
- type_ids = build_int8_array([0, 1, nil, 1, 0])
- fields = [
- build_int16_array([1, nil, nil, nil, 5]),
- build_string_array([nil, "b", nil, "d", nil]),
- ]
- @array = Arrow::SparseUnionArray.new(type_ids, fields)
- end
+ sub_test_case(".new") do
+ sub_test_case("default") do
+ def setup
+ type_ids = build_int8_array([0, 1, nil, 1, 0])
+ fields = [
+ build_int16_array([1, nil, nil, nil, 5]),
+ build_string_array([nil, "b", nil, "d", nil]),
+ ]
+ @array = Arrow::SparseUnionArray.new(type_ids, fields)
+ end
- def test_value_data_type
- fields = [
- Arrow::Field.new("0", Arrow::Int16DataType.new),
- Arrow::Field.new("1", Arrow::StringDataType.new),
- ]
- assert_equal(Arrow::SparseUnionDataType.new(fields, [0, 1]),
- @array.value_data_type)
- end
+ def test_value_data_type
+ fields = [
+ Arrow::Field.new("0", Arrow::Int16DataType.new),
+ Arrow::Field.new("1", Arrow::StringDataType.new),
+ ]
+ assert_equal(Arrow::SparseUnionDataType.new(fields, [0, 1]),
+ @array.value_data_type)
+ end
- def test_field
- assert_equal([
- build_int16_array([1, nil, nil, nil, 5]),
- build_string_array([nil, "b", nil, "d", nil]),
- ],
- [
- @array.get_field(0),
- @array.get_field(1),
- ])
+ def test_field
+ assert_equal([
+ build_int16_array([1, nil, nil, nil, 5]),
+ build_string_array([nil, "b", nil, "d", nil]),
+ ],
+ [
+ @array.get_field(0),
+ @array.get_field(1),
+ ])
+ end
+ end
+
+ sub_test_case("DataType") do
+ def setup
+ data_type_fields = [
+ Arrow::Field.new("number", Arrow::Int16DataType.new),
+ Arrow::Field.new("text", Arrow::StringDataType.new),
+ ]
+ type_codes = [11, 13]
+ @data_type = Arrow::SparseUnionDataType.new(data_type_fields, type_codes)
+ type_ids = build_int8_array([0, 1, nil, 1, 0])
+ fields = [
+ build_int16_array([1, nil, nil, nil, 5]),
+ build_string_array([nil, "b", nil, "d", nil]),
+ ]
+ @array = Arrow::SparseUnionArray.new(@data_type, type_ids, fields)
+ end
+
+ def test_value_data_type
+ assert_equal(@data_type,
+ @array.value_data_type)
+ end
+
+ def test_field
+ assert_equal([
+ build_int16_array([1, nil, nil, nil, 5]),
+ build_string_array([nil, "b", nil, "d", nil]),
+ ],
+ [
+ @array.get_field(0),
+ @array.get_field(1),
+ ])
+ end
+ end
end
end
OK. I'll rewrite the code to do so.
ec67279 to d755ba7
8406d4b to 5ad5572
@kou It's ready for review again. Please have a look.
Codecov Report
@@ Coverage Diff @@
## master #4127 +/- ##
==========================================
+ Coverage 87.77% 89.18% +1.4%
==========================================
Files 758 617 -141
Lines 92506 82202 -10304
Branches 1251 0 -1251
==========================================
- Hits 81201 73315 -7886
+ Misses 11188 8887 -2301
+ Partials 117 0 -117
Could you check my comments?
for (const auto &arrow_field : arrow_union_data_type->children()) {
arrow_field_names.push_back(arrow_field->name());
}
std::vector<uint8_t> arrow_type_codes(arrow_union_data_type->type_codes());
Can we use arrow_union_data_type->type_codes() directly for arrow::UnionArray::MakeSparse() instead of copying it?
@@ -420,6 +468,57 @@ garrow_dense_union_array_new(GArrowInt8Array *type_ids,
}
}

/**
 * garrow_dense_union_array_new_data_type:
 * @data_type: The data type for the sparse array.
"sparse" -> "dense"
@array = Arrow::DenseUnionArray.new(type_ids, value_offsets, fields)
end
sub_test_case(".new") do
def setup
Could you add more sub test cases under the ".new" sub test case?
sub_test_case(".new") do
sub_test_case("default") do # or "no DataType"?
end
sub_test_case("DataType") do
end
end
]
type_codes = [11, 13]
@data_type = Arrow::DenseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([0, 1, nil, 1, 0])
Is this right? [11, 13, nil, 13, 13]?
Yes, the type ids must be [11, 13, nil, 13, 11].
But it doesn't affect the tests, so I didn't notice the mistake.
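To illustrate why the type ids have to use the declared type codes rather than 0 and 1, here is a minimal plain-Ruby sketch. It does not use the Arrow bindings at all; the helper name `field_index_for` is hypothetical and exists only for this illustration. It assumes that each type id is looked up among the type codes of the union data type to select a child field.

```ruby
# Plain-Ruby illustration; no Arrow API is used here.
# Hypothetical helper: maps a type id to the index of the child field
# whose declared type code matches it. A nil type id stays nil.
def field_index_for(type_codes, type_id)
  return nil if type_id.nil?
  index = type_codes.index(type_id)
  raise ArgumentError, "unknown type id: #{type_id}" if index.nil?
  index
end

type_codes = [11, 13]              # codes declared on the union data type
field_names = ["number", "text"]   # child fields, in the same order

type_ids = [11, 13, nil, 13, 11]
selected = type_ids.map do |type_id|
  index = field_index_for(type_codes, type_id)
  index && field_names[index]
end
p selected
```

With type codes [11, 13], a type id of 0 or 1 matches no declared code, so the helper raises instead of selecting a field. This is only a model of the lookup, not a statement about how Arrow validates its input.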
We should add an element accessor to union arrays and test union array values later.
Understood.
]
type_codes = [11, 13]
@data_type = Arrow::SparseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([0, 1, nil, 1, 0])
Is this right? [11, 13, nil, 13, 11]?
@kou I've fixed the code according to your comments.
Thanks.
I'll push a fix and merge this.
]
type_codes = [11, 13]
@data_type = Arrow::DenseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([11, 13, nil, 11, 13])
[11, 13, nil, 11, 13] should be [11, 13, nil, 13, 13] because the number field only has one element and the text field has three elements.
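The counting argument can be checked with a small plain-Ruby sketch. It does not use the Arrow bindings; it only assumes, as the comment states, that the number field holds one element and the text field holds three, and that in a dense union each child stores only the values for its own slots.

```ruby
# Plain-Ruby illustration; no Arrow API is used here.
type_codes = [11, 13]              # codes declared on the union data type
field_names = ["number", "text"]   # child fields, in the same order
field_lengths = { "number" => 1, "text" => 3 }  # lengths taken from the comment

# Count how many non-null slots refer to each field.
def slots_per_field(type_ids, type_codes, field_names)
  counts = Hash.new(0)
  type_ids.compact.each do |type_id|
    counts[field_names[type_codes.index(type_id)]] += 1
  end
  counts
end

wrong = slots_per_field([11, 13, nil, 11, 13], type_codes, field_names)
fixed = slots_per_field([11, 13, nil, 13, 13], type_codes, field_names)

puts "wrong ids match field lengths: #{wrong == field_lengths}"
puts "fixed ids match field lengths: #{fixed == field_lengths}"
```

The first list refers to the number field twice although it has a single element, while the corrected list refers to it once and to the text field three times.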
!!
This is separated from #3723.
This should be merged after #3723.