
[Go] Improved building of structs into arrow record  #64

@gmintoco

Describe the enhancement requested

Hi,

I recently posted on the mailing list, but I thought this might be a better place to discuss it. I am using Arrow for Go mostly to read and write Parquet and IPC files. I often want to use the very helpful schema.NewSchemaFromStruct() from github.com/apache/arrow/go/v11/parquet/schema. Naturally, I then want to build an Arrow record using that schema, something like this:

	// pqschema is github.com/apache/arrow/go/v11/parquet/schema.
	// Test is assumed to have the fields Id string and Values [][]float64.
	var input []Test // the structs to convert
	pool := memory.NewGoAllocator()

	parquetSchema, err := pqschema.NewSchemaFromStruct(Test{})
	if err != nil {
		return nil, nil, err
	}
	schema, err := pqarrow.FromParquet(parquetSchema, &pqarrow.ArrowReadProperties{}, metadata.KeyValueMetadata{})
	if err != nil {
		return nil, nil, err
	}
	pqschema.PrintSchema(parquetSchema.Root(), os.Stdout, 2)

	builder := array.NewRecordBuilder(pool, schema)
	defer builder.Release()

	for _, obj := range input {
		builder.Field(0).(*array.BinaryBuilder).Append([]byte(obj.Id))
		list := builder.Field(1).(*array.ListBuilder)
		// Open this row's outer list before appending any children.
		list.Append(true)
		subList := list.ValueBuilder().(*array.ListBuilder)
		for _, values := range obj.Values {
			// Open one inner list per inner slice, then fill it.
			subList.Append(true)
			for _, value := range values {
				subList.ValueBuilder().(*array.Float64Builder).Append(value)
			}
		}
	}

	rec := builder.NewRecord()

This is fine for smaller structs, but when they get larger or much more complicated, writing out all of the builder code by hand is very tedious. (If there is already a better way of doing this, or if I am approaching this wrong, I would love to know. I am quite new to Go :) )

I thought it would make sense to have some reflection-based builder that can build a record from a struct. I took a stab at implementing something like this here: https://gist.github.com/gmintoco/3e65aa7b47ae37b0685db88b2755933f

My questions are:

  1. Is there a better way of doing this?
  2. Does a function like this make sense to add to the Go Arrow implementation? (I would be happy to try to write a PR if so.)

Looking forward to any feedback :)

Component(s)

Go
