-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Go: Schema and Types #8122
Go: Schema and Types #8122
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, what else can I say since it is very much a copy of the PyIceberg code :D Thanks for working on this! Curious what @nastra thinks of this since he's much more experienced in Go than me.
return false | ||
} | ||
|
||
return slices.EqualFunc(s.fields, other.fields, func(a, b NestedField) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious. In Python we use a generator that will break the loop once it encounters the first False
:
schema_is_equal = all(lhs == rhs for lhs, rhs in zip(self.columns, other.columns))
Is there something similar in Go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slices.EqualFunc
will break its loop when it encounters the first false.
if bl, ok := visitor.(BeforeListElementVisitor); ok { | ||
bl.BeforeListElement(elemField) | ||
} else if bf, ok := visitor.(BeforeFieldVisitor); ok { | ||
bf.BeforeField(elemField) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only want to visit the first branch of this if:
@visit.register(ListType)
def _(obj: ListType, visitor: SchemaVisitor[T]) -> T:
"""Visit a ListType with a concrete SchemaVisitor."""
visitor.before_list_element(obj.element_field)
result = visit(obj.element_type, visitor)
visitor.after_list_element(obj.element_field)
return visitor.list(obj, result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default implementation of before_list_element
calls before_field
(https://github.com/apache/iceberg/blob/c27c1fb1f279c8be96847fc41ce637b3eff19661/python/pyiceberg/schema.py#L282-284)
So it does require the second branch
) | ||
|
||
var ( | ||
tableSchemaNested = iceberg.NewSchemaWithIdentifiers(1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for using the same tests :)
}, Required: true}, | ||
9: {ID: 9, Name: "key", Type: iceberg.PrimitiveTypes.String, Required: true}, | ||
10: {ID: 10, Name: "value", Type: iceberg.PrimitiveTypes.Int32, Required: true}, | ||
11: tableSchemaNested.Field(5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
|
||
require ( | ||
github.com/davecgh/go-spew v1.1.1 // indirect | ||
github.com/pmezard/go-difflib v1.0.0 // indirect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer maintained, do we still want to continue with this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are indirect dependencies of github.com/stretchr/testify/assert
which is very well maintained currently so I don't think it's an issue.
I'll review this early next week |
|
||
# Get in Touch | ||
|
||
- [Iceberg community](https://iceberg.apache.org/community/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to have a contributing section or file for GO so that people know how to set up Intellij and how to run tests and such (can be a follow-up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, I'll follow up with that after I get the existing stuff merged.
t.Type = BinaryType{} | ||
default: | ||
switch { | ||
case strings.HasPrefix(typename, "fixed"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use the same regex that we're using on the Java side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above, I was following the way that pyiceberg works, which regex is essentially just doing the same thing as I'm doing here by checking for the prefix and then parsing the number out of the brackets. If we're gonna switch this to the same regex as the Java side, we should probably do the same in pyiceberg so that all three are aligned, yes?
|
||
package iceberg_test | ||
|
||
import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all types should be tested, similar to TestSchemaConversions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for all the remaining types.
rev: v4.4.0 | ||
hooks: | ||
- id: golangci-lint | ||
entry: bash -c 'cd iceberg && golangci-lint run --fix --timeout 5m' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this require any installation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with pyiceberg, you need to pip install pre-commit
first and then you can run pre-commit install
to get what you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for someone who isn't actively working on pyiceberg (like me) this isn't obvious what exactly is required for installation, so would be good to add this somewhere in the docs. This can be a follow-up and can be part of the Docs that describe how to set up IntelliJ with Go and such, so that non-Go committers can review code and know how to run tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I can add stuff to the README for it. Though I have no idea how to set up IntelliJ with Go since I use VSCode lol, so i welcome any contributions on that 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the thing is that most reviewers will be using Intellij and having some docs on how to set up stuff for Go would be great and also improves review cycles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Goland is the recommended IDE for go from Jetbrains, but it is paid. VScode has good support for Go compared to IntelliJ.
Core contributors of open source projects can get Goland for free
go/iceberg/schema_test.go
Outdated
assert.Truef(t, tableSchemaNested.Equals(&sc), "expected: %s\ngot: %s", tableSchemaNested, &sc) | ||
} | ||
|
||
func TestRemainingTypes(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a types_test.go
where this should live and where we can test all types with the JSON parsing, similar to how it's done on the Python side in test_types.py
@nastra I'll probably get to the remaining comments tomorrow, sorry for the delay here |
@nastra just to confirm, is increasing the tests in |
I believe so, but I'd have to re-review the PR tomorrow. Sorry, for the review delay here. |
go/iceberg/types_test.go
Outdated
"github.com/stretchr/testify/require" | ||
) | ||
|
||
func TestRemainingTypes(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remaining
is rather a confusing name. The goal here is to test all types, including lists/maps/nested structs.
I think test_types.py
does a great job in testing JSON & string representations of all types, and I think we should do the same here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade I think that the only item that is currently outstanding. We just need to make sure that all types are extensively tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated and tests are updated too. Let me know if there's any other tests you think I should add. Confirmed that all JSON and string representations are covered in the tests (according to the test coverage reports)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I think this should be fine now. We should probably re-open this against https://github.com/apache/iceberg-go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll go do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra Looks like the iceberg-go
repo is completely empty, the github UI doesn't allow forking an empty repository so that I can submit a PR. Could someone please commit a LICENSE file or empty README.md or something so that I can fork it to create the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a README to iceberg-go
, so hopefully things should work now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra I've created apache/iceberg-go#1 😄 in addition to the existing code here I've added general dev script stuff there.
Closing this PR in favor of it having been merged / added to github.com/apache/iceberg-go |
Initial PR of a Golang implementation of the Iceberg spec. This PR contains: