Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-5640: [Go] Implement Arrow Map Array #10106

Closed
wants to merge 8 commits into from

Conversation

zeroshade
Copy link
Member

Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately.

@github-actions
Copy link

@zeroshade
Copy link
Member Author

Tagging @emkornfield @sbinet for visibility

@@ -277,6 +277,8 @@ func NewBuilder(mem memory.Allocator, dtype arrow.DataType) Builder {
case arrow.UNION:
case arrow.DICTIONARY:
case arrow.MAP:
typ := dtype.(*arrow.MapType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like in this change, you're only adding map support, but here you are changing union and dictionary to use the map builder. I'm not sure in this context the difference between map and dictionary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're still thinking in C++ context. Go doesn't automatically fallthrough like C++, this only modifies the Map case, it doesn't change anything about Union or Dictionary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh thanks


func arrayEqualMap(left, right *Map) bool {
// since Map is implemented using a list, we can just use arrayEqualList
return arrayEqualList(left.List, right.List)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would maps with the same key values, but in different orders be considered equal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently they would not, and as far as I can tell that seems consistent with other implementations. Someone else can tell me if i'm wrong there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see anywhere in the spec docs or cod that mentions keysSorted with regards to comparison

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't. To my knowledge the keySorted doesn't have strong semantics other than to indicate the keys follow some logical ordering.

}

if len(data.childData[0].childData) != 2 {
panic("arrow/array: map array child array should have two fields")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the panics necessary here ? is this common in arrow/go ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, just following the pattern that is used elsewhere. Until the version tags are added properly as part of the release process here I can't change that because it would require adding "error" returns that would be breaking changes in order to get rid of the panics.

@zeroshade
Copy link
Member Author

bump for reviews and hopefully merges

@@ -85,10 +85,16 @@ func TestMakeFromData(t *testing.T) {
}},
{name: "duration", d: &testDataType{arrow.DURATION}},

{name: "map", d: &testDataType{arrow.MAP}, child: []*array.Data{
array.NewData(&testDataType{arrow.STRUCT}, 0, make([]*memory.Buffer, 4), []*array.Data{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is probably consistent with other code here, but literal comments like

/*elementByteWidth=*/4 could make this more readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments for literals


// KeysSorted checks the datatype that was used to construct this array and
// returns the KeysSorted boolean value used to denote if the key array is
// sorted for each list element.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be worth commenting on the somewhat lack of semantics of keySored and linking to the spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments expanding on the keysorted lack of semantics

// the Value field (the second field) of the child struct.
func (a *Map) Items() Interface { return a.items }

func (a *Map) Retain() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for Retain and release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

// NewMapBuilder returns a builder, using the provided memory allocator.
// The created Map builder will create a map array whose keys will be a non-nullable
// array of type `keytype` and whose mapped items will be a nullable array of itemtype.
func NewMapBuilder(mem memory.Allocator, keytype, itemtype arrow.DataType, keysSorted bool) *MapBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth noting keysSorted is not enforced here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}
}

func (b *MapBuilder) Retain() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for Retain and Release.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

// Len returns the current number of Maps that are in the builder
func (b *MapBuilder) Len() int { return b.listBuilder.Len() }

func (b *MapBuilder) Cap() int { return b.listBuilder.Cap() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for Cap and NullN?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

keysSorted bool
}

// NewMapBuilder returns a builder, using the provided memory allocator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since Maps are relatively complex to work with, it might pay to give a simple usage example here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the example_test file to provide a simple usage example which would show up in the generated docs on pkg.go.dev, I figured that was more directly useful than putting one here, but i can also add one here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I didn't realize the examples end up on the docs page, I think that is sufficient from now on.

panic("arrow: nil key or item type for MapType")
}

return &MapType{value: ListOf(StructOf(Field{Name: "key", Type: key}, Field{Name: "value", Type: item, Nullable: true}))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does StructOf provide a name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current implementation of StructOf does not provide a name for the resulting struct, it only creates a datatype, rather than a Field. The DataType can then be used to create a Field and thus Name the struct.

The same is true for the current implementation of ListOf here. The result is a DataType not a field and thus doesn't have a name.

panic("arrow: nil key or item type for MapType")
}

return &MapType{value: ListOf(StructOf(Field{Name: "key", Type: key}, Field{Name: "value", Type: item, Nullable: true}))}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does "key" and "value" hardcoded here have any implications for reading maps that name this differentlcy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check but I don't believe I enforce these names having to be named this way during reading. That said, naming them this way is how the spec describes it should be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I confirmed that the names hardcoded here do not affect reading maps that name it differently. The assumption, as per the spec, is always that the first field is the keys and the second field is the values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks.

@@ -593,3 +593,66 @@ func Example_table() {
// rec[3]["f1-i32"]: [16 17 18 19 20]
// rec[3]["f2-f64"]: [16 17 18 19 20]
}

// This example demonstrates how to create a Map Array.
// The resulting array should be:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the comment at the end of the method be moved up here? Maybe provide a reference here instead of documentation of usage I mentioned above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By putting the comment at the end of the method here in that format this example actually gets run as a test when running tests and confirms that the output of running this method matches the output comment at the end of the method. moving the comment at the end of the method would disable that benefit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, very cool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other cool aspect of this is that it shows up in the go docs too as an example like the examples on the pkg.go.dev docs here 😄

Unit string `json:"unit,omitempty"`
TimeZone string `json:"timezone,omitempty"`
Scale int `json:"scale,omitempty"` // for Decimal128
Name string `json:"name"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just whitespace adjustment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeysSorted was added to the struct which is why the whitespace got adjusted


func makeMapsWantJSONs() string {
return `{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for testing purposes would it make sense to use a shorter example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because these tests are generated from the arrdata record batches that's why this ends up this large. The reason why those records are the size they are is to ensure that we're properly testing handling multiple chunks with a map and multiple records.

Copy link
Contributor

@emkornfield emkornfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seems reasonable, mostly nits and some questions.

@zeroshade
Copy link
Member Author

@emkornfield I've added comments and docs as requested and responded to the questions. Lemme know if there's anything else needed.

@emkornfield
Copy link
Contributor

@zeroshade I think this needs a rebase since the merge of the Decimal128 PR.

@zeroshade
Copy link
Member Author

@emkornfield yup, i was already doing the rebase when you commented haha. I've pushed the rebase

@zeroshade
Copy link
Member Author

@emkornfield all set with the rebase here now and all tests passed still, huzzah

DileepSrigiri pushed a commit to DileepSrigiri/arrow that referenced this pull request May 6, 2021
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately.

Closes apache#10106 from zeroshade/maptype

Authored-by: Matthew Topol <mtopol@factset.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 10, 2021
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately.

Closes apache#10106 from zeroshade/maptype

Authored-by: Matthew Topol <mtopol@factset.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately.

Closes apache#10106 from zeroshade/maptype

Authored-by: Matthew Topol <mtopol@factset.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
@zeroshade zeroshade deleted the maptype branch September 12, 2021 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants