Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8343: [GLib] Add GArrowRecordBatchIterator #6847

Closed
wants to merge 8 commits into from

Conversation

mrkn
Copy link
Member

@mrkn mrkn commented Apr 6, 2020

I'd like to add GArrowRecordBatchIterator as a binding of arrow::RecordBatchIterator class.

@mrkn mrkn requested a review from kou April 6, 2020 08:44
@github-actions
Copy link

github-actions bot commented Apr 6, 2020

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
I've moved implementations to record-batch.{cpp,h,hpp} to reduce C++ build time. We already do similar things for array implementations.
(Generally, GLib library use one file per class. But we may put multiple classes to one file to reduce build time.)

GArrowRecordBatchIterator *
garrow_record_batch_iterator_new(GList *record_batches)
{
std::vector<std::shared_ptr<arrow::RecordBatch>> arrow_record_batches;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with the idea that it can avoid making a vector to introduce GListIterator<T> like arrow::VectorIterator<T>.

@kou How do you think about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I've removed temporary vector in garrow_record_batch_iterator_to_list().

Copy link

@stevedanomodolor stevedanomodolor Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up with the idea that it can avoid making a vector to introduce GListIterator<T> like arrow::VectorIterator<T>.

@kou How do you think about this?

How do you create a Glist of anything, especially fields. It is not clear how do just looking at the api funtions available. Please I would be gratefull.

What I understand so far is,

To create a GList of fields, your need this funtionn call ->
GList * garrow_struct_data_type_get_fields (GArrowStructDataType *struct_data_type);

which means you have to create a GArrowStructDatatype -> which is dont with this function call

GArrowStructDataType *
garrow_struct_data_type_new (GList *fields);, which also depends on GList, I am a bit confused, how to a funtion depends on another function that depends on it self.

An example how to create Glist would be really appreciated.

I also tried creating each field and when creating the schema, initializing it like this, {filedx, filedy} but it gives an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/schema.cpp#L241-L247, this is a typical pattern to make a GList from scratch.

By the way, please do not ask unrelated things to this pull-request here.
You can ask anything about Apache Arrow in the mailing list.
http://mail-archives.apache.org/mod_mbox/arrow-dev/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou I noticed GListIterator is not usable for implementing GArrowRecordBatchIterator because it is a binding of single arrow::RecordBatchIterator instead of a list of GArrowRecordBatch.

I postponed to introduce GListIterator until we actually need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then I'll merge this pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou I noticed again that GListIterator can be used just only for garrow_record_batch_iterator_new. I was confused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. We can work on it as a follow-up pull request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!

@kou kou closed this in 0a66565 Apr 8, 2020
@mrkn mrkn deleted the ARROW-8343 branch April 8, 2020 02:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants