Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type #4127

Closed

Conversation

mrkn
Copy link
Member

@mrkn mrkn commented Apr 9, 2019

This is separated from #3723.
This should be merged after #3723.

@mrkn mrkn force-pushed the glib_ruby_make_union_array_with_field_names branch from 6421d3b to ec67279 Compare April 10, 2019 01:35
@mrkn mrkn changed the title WIP: [GLib][Ruby]MakeDense and MakeSparse in UnionArray should accept a vector of Field ARROW-5155: [GLib][Ruby]MakeDense and MakeSparse in UnionArray should accept a vector of Field Apr 10, 2019
@mrkn
Copy link
Member Author

mrkn commented Apr 10, 2019

@kou This is ready to review.
Could you please have a look this again?

gchar **field_names,
gsize n_field_names,
guint8 *type_codes,
gsize n_type_codes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to provide API that requires many arguments. Because the API isn't easy to use.

How about using GArrowSparseUnionDataType?

diff --git a/c_glib/arrow-glib/composite-array.cpp b/c_glib/arrow-glib/composite-array.cpp
index b202fb45..aac3c24c 100644
--- a/c_glib/arrow-glib/composite-array.cpp
+++ b/c_glib/arrow-glib/composite-array.cpp
@@ -366,6 +366,54 @@ garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
   }
 }
 
+/**
+ * garrow_sparse_union_array_new_data_type:
+ * @data_type: The data type for the sparse array.
+ * @type_ids: The field type IDs for each value as #GArrowInt8Array.
+ * @fields: (element-type GArrowArray): The arrays for each field
+ *   as #GList of #GArrowArray.
+ * @error: (nullable): Return location for a #GError or %NULL.
+ *
+ * Returns: (nullable): A newly created #GArrowSparseUnionArray
+ *   or %NULL on error.
+ *
+ * Since: 0.14.0
+ */
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+                                        GArrowInt8Array *type_ids,
+                                        GList *fields,
+                                        GError **error)
+{
+  auto arrow_data_type = garrow_data_type_get_raw(GARROW_DATA_TYPE(data_type));
+  auto arrow_union_data_type =
+    std::static_pointer_cast<arrow::UnionType>(arrow_data_type);
+  std::vector<std::string> arrow_field_names;
+  for (const auto &arrow_field : arrow_union_data_type->children()) {
+    arrow_field_names.push_back(arrow_field->name());
+  }
+  std::vector<uint8_t> arrow_type_codes(arrow_union_data_type->type_codes());
+  auto arrow_type_ids = garrow_array_get_raw(GARROW_ARRAY(type_ids));
+  std::vector<std::shared_ptr<arrow::Array>> arrow_fields;
+  for (auto node = fields; node; node = node->next) {
+    auto *field = GARROW_ARRAY(node->data);
+    arrow_fields.push_back(garrow_array_get_raw(field));
+  }
+  std::shared_ptr<arrow::Array> arrow_union_array;
+  auto status = arrow::UnionArray::MakeSparse(*arrow_type_ids,
+                                              arrow_fields,
+                                              arrow_field_names,
+                                              arrow_type_codes,
+                                              &arrow_union_array);
+  if (garrow_error_check(error,
+                         status,
+                         "[sparse-union-array][new][data-type]")) {
+    return GARROW_SPARSE_UNION_ARRAY(garrow_array_new_raw(&arrow_union_array));
+  } else {
+    return NULL;
+  }
+}
+
 
 G_DEFINE_TYPE(GArrowDenseUnionArray,
               garrow_dense_union_array,
diff --git a/c_glib/arrow-glib/composite-array.h b/c_glib/arrow-glib/composite-array.h
index a181ffcc..c65c1d66 100644
--- a/c_glib/arrow-glib/composite-array.h
+++ b/c_glib/arrow-glib/composite-array.h
@@ -108,6 +108,11 @@ GArrowSparseUnionArray *
 garrow_sparse_union_array_new(GArrowInt8Array *type_ids,
                               GList *fields,
                               GError **error);
+GArrowSparseUnionArray *
+garrow_sparse_union_array_new_data_type(GArrowSparseUnionDataType *data_type,
+                                        GArrowInt8Array *type_ids,
+                                        GList *fields,
+                                        GError **error);
 
 
 #define GARROW_TYPE_DENSE_UNION_ARRAY (garrow_dense_union_array_get_type())
diff --git a/c_glib/test/test-sparse-union-array.rb b/c_glib/test/test-sparse-union-array.rb
index 721f95c1..4a9e7c81 100644
--- a/c_glib/test/test-sparse-union-array.rb
+++ b/c_glib/test/test-sparse-union-array.rb
@@ -18,32 +18,69 @@
 class TestSparseUnionArray < Test::Unit::TestCase
   include Helper::Buildable
 
-  def setup
-    type_ids = build_int8_array([0, 1, nil, 1, 0])
-    fields = [
-      build_int16_array([1, nil, nil, nil, 5]),
-      build_string_array([nil, "b", nil, "d", nil]),
-    ]
-    @array = Arrow::SparseUnionArray.new(type_ids, fields)
-  end
+  sub_test_case(".new") do
+    sub_test_case("default") do
+      def setup
+        type_ids = build_int8_array([0, 1, nil, 1, 0])
+        fields = [
+          build_int16_array([1, nil, nil, nil, 5]),
+          build_string_array([nil, "b", nil, "d", nil]),
+        ]
+        @array = Arrow::SparseUnionArray.new(type_ids, fields)
+      end
 
-  def test_value_data_type
-    fields = [
-      Arrow::Field.new("0", Arrow::Int16DataType.new),
-      Arrow::Field.new("1", Arrow::StringDataType.new),
-    ]
-    assert_equal(Arrow::SparseUnionDataType.new(fields, [0, 1]),
-                 @array.value_data_type)
-  end
+      def test_value_data_type
+        fields = [
+          Arrow::Field.new("0", Arrow::Int16DataType.new),
+          Arrow::Field.new("1", Arrow::StringDataType.new),
+        ]
+        assert_equal(Arrow::SparseUnionDataType.new(fields, [0, 1]),
+                     @array.value_data_type)
+      end
 
-  def test_field
-    assert_equal([
-                   build_int16_array([1, nil, nil, nil, 5]),
-                   build_string_array([nil, "b", nil, "d", nil]),
-                 ],
-                 [
-                   @array.get_field(0),
-                   @array.get_field(1),
-                 ])
+      def test_field
+        assert_equal([
+                       build_int16_array([1, nil, nil, nil, 5]),
+                       build_string_array([nil, "b", nil, "d", nil]),
+                     ],
+                     [
+                       @array.get_field(0),
+                       @array.get_field(1),
+                     ])
+      end
+    end
+
+    sub_test_case("DataType") do
+      def setup
+        data_type_fields = [
+          Arrow::Field.new("number", Arrow::Int16DataType.new),
+          Arrow::Field.new("text", Arrow::StringDataType.new),
+        ]
+        type_codes = [11, 13]
+        @data_type = Arrow::SparseUnionDataType.new(data_type_fields, type_codes)
+        type_ids = build_int8_array([0, 1, nil, 1, 0])
+        fields = [
+          build_int16_array([1, nil, nil, nil, 5]),
+          build_string_array([nil, "b", nil, "d", nil]),
+        ]
+        @array = Arrow::SparseUnionArray.new(@data_type, type_ids, fields)
+      end
+
+      def test_value_data_type
+        assert_equal(@data_type,
+                     @array.value_data_type)
+      end
+
+      def test_field
+        assert_equal([
+                       build_int16_array([1, nil, nil, nil, 5]),
+                       build_string_array([nil, "b", nil, "d", nil]),
+                     ],
+                     [
+                       @array.get_field(0),
+                       @array.get_field(1),
+                     ])
+      end
+    end
   end
 end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I'll rewrite the code to do so.

@mrkn mrkn force-pushed the glib_ruby_make_union_array_with_field_names branch from ec67279 to d755ba7 Compare April 22, 2019 09:30
@mrkn mrkn force-pushed the glib_ruby_make_union_array_with_field_names branch 2 times, most recently from 8406d4b to 5ad5572 Compare April 24, 2019 01:21
@mrkn
Copy link
Member Author

mrkn commented Apr 24, 2019

@kou It's ready to review again. Please have a look.

@codecov-io
Copy link

codecov-io commented Apr 24, 2019

Codecov Report

Merging #4127 into master will increase coverage by 1.4%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4127      +/-   ##
==========================================
+ Coverage   87.77%   89.18%    +1.4%     
==========================================
  Files         758      617     -141     
  Lines       92506    82202   -10304     
  Branches     1251        0    -1251     
==========================================
- Hits        81201    73315    -7886     
+ Misses      11188     8887    -2301     
+ Partials      117        0     -117
Impacted Files Coverage Δ
python/pyarrow/_plasma.pyx 91.27% <0%> (-0.06%) ⬇️
python/pyarrow/pandas_compat.py 96.7% <0%> (-0.01%) ⬇️
python/pyarrow/_csv.pyx 99.15% <0%> (-0.01%) ⬇️
cpp/src/arrow/array-test.cc 100% <0%> (ø) ⬆️
python/pyarrow/tests/test_array.py 96.79% <0%> (ø) ⬆️
cpp/src/arrow/array-binary-test.cc 100% <0%> (ø) ⬆️
python/pyarrow/types.pxi 69.27% <0%> (ø) ⬆️
go/arrow/math/uint64_amd64.go
go/arrow/memory/memory_avx2_amd64.go
go/arrow/ipc/file_reader.go
... and 143 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 532450d...e625556. Read the comment docs.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check my comments?

for (const auto &arrow_field : arrow_union_data_type->children()) {
arrow_field_names.push_back(arrow_field->name());
}
std::vector<uint8_t> arrow_type_codes(arrow_union_data_type->type_codes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use arrow_union_data_type->type_codes() directly for arrow::UnionArray::MakeSparse() instead of copying it?

@@ -420,6 +468,57 @@ garrow_dense_union_array_new(GArrowInt8Array *type_ids,
}
}

/**
* garrow_dense_union_array_new_data_type:
* @data_type: The data type for the sparse array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sparse" -> "dense"

@array = Arrow::DenseUnionArray.new(type_ids, value_offsets, fields)
end
sub_test_case(".new") do
def setup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put more sub test cases under ".new" sub test case?

sub_test_case(".new") do
  sub_test_case("default") do # or "no DataType"?
  end

  sub_test_case("DataType") do
  end
end

]
type_codes = [11, 13]
@data_type = Arrow::DenseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([0, 1, nil, 1, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right?
[11, 13, nil, 13, 13]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, type ids must be [11, 13, nil, 13, 11].
But, it doesn't affect the tests so I couldn't notice the mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add element accessor to union arrays and test union array values later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it.

]
type_codes = [11, 13]
@data_type = Arrow::SparseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([0, 1, nil, 1, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right?
[11, 13, nil, 13, 11]?

@mrkn
Copy link
Member Author

mrkn commented Apr 25, 2019

@kou I've done to fix code for your comments.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
I'll push a fix and merge this.

]
type_codes = [11, 13]
@data_type = Arrow::DenseUnionDataType.new(data_type_fields, type_codes)
type_ids = build_int8_array([11, 13, nil, 11, 13])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[11, 13, nil, 11, 13] should be [11, 13, nil, 13, 13] because number field only has one element and text field has three elements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!!

@kou kou changed the title ARROW-5155: [GLib][Ruby]MakeDense and MakeSparse in UnionArray should accept a vector of Field ARROW-5155: [GLib][Ruby] MakeDense and MakeSparse in UnionArray should accept a vector of Field Apr 25, 2019
@kou kou changed the title ARROW-5155: [GLib][Ruby] MakeDense and MakeSparse in UnionArray should accept a vector of Field ARROW-5155: [GLib][Ruby] Add support for building union arrays from data type Apr 25, 2019
@kou kou closed this in ecfb807 Apr 25, 2019
@mrkn mrkn deleted the glib_ruby_make_union_array_with_field_names branch April 25, 2019 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants