-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-5640: [Go] Implement Arrow Map Array #10106
Conversation
Tagging @emkornfield @sbinet for visibility |
@@ -277,6 +277,8 @@ func NewBuilder(mem memory.Allocator, dtype arrow.DataType) Builder { | |||
case arrow.UNION: | |||
case arrow.DICTIONARY: | |||
case arrow.MAP: | |||
typ := dtype.(*arrow.MapType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like in this change, you're only adding map support, but here you are changing union and dictionary to use the map builder. I'm not sure in this context the difference between map and dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're still thinking in C++ context. Go doesn't automatically fallthrough like C++, this only modifies the Map case, it doesn't change anything about Union or Dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh thanks
|
||
func arrayEqualMap(left, right *Map) bool { | ||
// since Map is implemented using a list, we can just use arrayEqualList | ||
return arrayEqualList(left.List, right.List) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would maps with the same key values, but in different orders be considered equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently they would not, and as far as I can tell that seems consistent with other implementations. Someone else can tell me if i'm wrong there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't see anywhere in the spec docs or cod that mentions keysSorted
with regards to comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't. To my knowledge the keySorted doesn't have strong semantics other than to indicate the keys follow some logical ordering.
} | ||
|
||
if len(data.childData[0].childData) != 2 { | ||
panic("arrow/array: map array child array should have two fields") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the panics necessary here ? is this common in arrow/go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, just following the pattern that is used elsewhere. Until the version tags are added properly as part of the release process here I can't change that because it would require adding "error" returns that would be breaking changes in order to get rid of the panics.
bump for reviews and hopefully merges |
go/arrow/array/array_test.go
Outdated
@@ -85,10 +85,16 @@ func TestMakeFromData(t *testing.T) { | |||
}}, | |||
{name: "duration", d: &testDataType{arrow.DURATION}}, | |||
|
|||
{name: "map", d: &testDataType{arrow.MAP}, child: []*array.Data{ | |||
array.NewData(&testDataType{arrow.STRUCT}, 0, make([]*memory.Buffer, 4), []*array.Data{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is probably consistent with other code here, but literal comments like
/*elementByteWidth=*/4
could make this more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments for literals
|
||
// KeysSorted checks the datatype that was used to construct this array and | ||
// returns the KeysSorted boolean value used to denote if the key array is | ||
// sorted for each list element. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: might be worth commenting on the somewhat lack of semantics of keySored and linking to the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments expanding on the keysorted lack of semantics
// the Value field (the second field) of the child struct. | ||
func (a *Map) Items() Interface { return a.items } | ||
|
||
func (a *Map) Retain() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for Retain and release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
// NewMapBuilder returns a builder, using the provided memory allocator. | ||
// The created Map builder will create a map array whose keys will be a non-nullable | ||
// array of type `keytype` and whose mapped items will be a nullable array of itemtype. | ||
func NewMapBuilder(mem memory.Allocator, keytype, itemtype arrow.DataType, keysSorted bool) *MapBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth noting keysSorted is not enforced here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
} | ||
} | ||
|
||
func (b *MapBuilder) Retain() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for Retain and Release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
go/arrow/array/map.go
Outdated
// Len returns the current number of Maps that are in the builder | ||
func (b *MapBuilder) Len() int { return b.listBuilder.Len() } | ||
|
||
func (b *MapBuilder) Cap() int { return b.listBuilder.Cap() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs for Cap and NullN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
keysSorted bool | ||
} | ||
|
||
// NewMapBuilder returns a builder, using the provided memory allocator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since Maps are relatively complex to work with, it might pay to give a simple usage example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the example_test file to provide a simple usage example which would show up in the generated docs on pkg.go.dev, I figured that was more directly useful than putting one here, but i can also add one here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I didn't realize the examples end up on the docs page, I think that is sufficient from now on.
panic("arrow: nil key or item type for MapType") | ||
} | ||
|
||
return &MapType{value: ListOf(StructOf(Field{Name: "key", Type: key}, Field{Name: "value", Type: item, Nullable: true}))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does StructOf provide a name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current implementation of StructOf
does not provide a name for the resulting struct, it only creates a datatype, rather than a Field. The DataType can then be used to create a Field and thus Name the struct.
The same is true for the current implementation of ListOf
here. The result is a DataType
not a field and thus doesn't have a name.
panic("arrow: nil key or item type for MapType") | ||
} | ||
|
||
return &MapType{value: ListOf(StructOf(Field{Name: "key", Type: key}, Field{Name: "value", Type: item, Nullable: true}))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does "key" and "value" hardcoded here have any implications for reading maps that name this differentlcy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double check but I don't believe I enforce these names having to be named this way during reading. That said, naming them this way is how the spec describes it should be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I confirmed that the names hardcoded here do not affect reading maps that name it differently. The assumption, as per the spec, is always that the first field is the keys and the second field is the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks.
@@ -593,3 +593,66 @@ func Example_table() { | |||
// rec[3]["f1-i32"]: [16 17 18 19 20] | |||
// rec[3]["f2-f64"]: [16 17 18 19 20] | |||
} | |||
|
|||
// This example demonstrates how to create a Map Array. | |||
// The resulting array should be: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the comment at the end of the method be moved up here? Maybe provide a reference here instead of documentation of usage I mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By putting the comment at the end of the method here in that format this example actually gets run as a test when running tests and confirms that the output of running this method matches the output comment at the end of the method. moving the comment at the end of the method would disable that benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, very cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other cool aspect of this is that it shows up in the go docs too as an example like the examples on the pkg.go.dev docs here 😄
go/arrow/internal/arrjson/arrjson.go
Outdated
Unit string `json:"unit,omitempty"` | ||
TimeZone string `json:"timezone,omitempty"` | ||
Scale int `json:"scale,omitempty"` // for Decimal128 | ||
Name string `json:"name"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just whitespace adjustment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeysSorted
was added to the struct which is why the whitespace got adjusted
|
||
func makeMapsWantJSONs() string { | ||
return `{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for testing purposes would it make sense to use a shorter example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because these tests are generated from the arrdata
record batches that's why this ends up this large. The reason why those records are the size they are is to ensure that we're properly testing handling multiple chunks with a map and multiple records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly seems reasonable, mostly nits and some questions.
@emkornfield I've added comments and docs as requested and responded to the questions. Lemme know if there's anything else needed. |
@zeroshade I think this needs a rebase since the merge of the Decimal128 PR. |
@emkornfield yup, i was already doing the rebase when you commented haha. I've pushed the rebase |
@emkornfield all set with the rebase here now and all tests passed still, huzzah |
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately. Closes apache#10106 from zeroshade/maptype Authored-by: Matthew Topol <mtopol@factset.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately. Closes apache#10106 from zeroshade/maptype Authored-by: Matthew Topol <mtopol@factset.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately. Closes apache#10106 from zeroshade/maptype Authored-by: Matthew Topol <mtopol@factset.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Took it upon myself to implement the Map Array type for Golang and uncomment the tests appropriately.