Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Schema and Types #8122

Closed
wants to merge 13 commits into from
Closed

Conversation

zeroshade
Copy link
Member

Initial PR of a Golang implementation of the Iceberg spec. This PR contains:

  • Representation of Schema and Types
  • GHA workflow to run unit tests and linting on the Go code
  • basic README/docs

@zeroshade
Copy link
Member Author

zeroshade commented Jul 20, 2023

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, what else can I say since it is very much a copy of the PyIceberg code :D Thanks for working on this! Curious what @nastra thinks of this since he's much more experienced in Go than me.

return false
}

return slices.EqualFunc(s.fields, other.fields, func(a, b NestedField) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious. In Python we use a generator that will break the loop once it encounters the first False:

schema_is_equal = all(lhs == rhs for lhs, rhs in zip(self.columns, other.columns))

Is there something similar in Go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slices.EqualFunc will break its loop when it encounters the first false.

if bl, ok := visitor.(BeforeListElementVisitor); ok {
bl.BeforeListElement(elemField)
} else if bf, ok := visitor.(BeforeFieldVisitor); ok {
bf.BeforeField(elemField)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only want to visit the first branch of this if:

@visit.register(ListType)
def _(obj: ListType, visitor: SchemaVisitor[T]) -> T:
    """Visit a ListType with a concrete SchemaVisitor."""
    visitor.before_list_element(obj.element_field)
    result = visit(obj.element_type, visitor)
    visitor.after_list_element(obj.element_field)

    return visitor.list(obj, result)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default implementation of before_list_element calls before_field (https://github.com/apache/iceberg/blob/c27c1fb1f279c8be96847fc41ce637b3eff19661/python/pyiceberg/schema.py#L282-284)

So it does require the second branch

)

var (
tableSchemaNested = iceberg.NewSchemaWithIdentifiers(1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for using the same tests :)

go/iceberg/schema.go Show resolved Hide resolved
}, Required: true},
9: {ID: 9, Name: "key", Type: iceberg.PrimitiveTypes.String, Required: true},
10: {ID: 10, Name: "value", Type: iceberg.PrimitiveTypes.Int32, Required: true},
11: tableSchemaNested.Field(5),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻


require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer maintained, do we still want to continue with this ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are indirect dependencies of github.com/stretchr/testify/assert which is very well maintained currently so I don't think it's an issue.

@nastra
Copy link
Contributor

nastra commented Jul 22, 2023

I'll review this early next week

go/iceberg/doc.go Outdated Show resolved Hide resolved

# Get in Touch

- [Iceberg community](https://iceberg.apache.org/community/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a contributing section or file for GO so that people know how to set up Intellij and how to run tests and such (can be a follow-up)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, I'll follow up with that after I get the existing stuff merged.

go/iceberg/types.go Outdated Show resolved Hide resolved
go/iceberg/types.go Show resolved Hide resolved
t.Type = BinaryType{}
default:
switch {
case strings.HasPrefix(typename, "fixed"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the same regex that we're using on the Java side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, I was following the way that pyiceberg works, which regex is essentially just doing the same thing as I'm doing here by checking for the prefix and then parsing the number out of the brackets. If we're gonna switch this to the same regex as the Java side, we should probably do the same in pyiceberg so that all three are aligned, yes?


package iceberg_test

import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all types should be tested, similar to TestSchemaConversions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for all the remaining types.

rev: v4.4.0
hooks:
- id: golangci-lint
entry: bash -c 'cd iceberg && golangci-lint run --fix --timeout 5m'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this require any installation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with pyiceberg, you need to pip install pre-commit first and then you can run pre-commit install to get what you need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for someone who isn't actively working on pyiceberg (like me) this isn't obvious what exactly is required for installation, so would be good to add this somewhere in the docs. This can be a follow-up and can be part of the Docs that describe how to set up IntelliJ with Go and such, so that non-Go committers can review code and know how to run tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I can add stuff to the README for it. Though I have no idea how to set up IntelliJ with Go since I use VSCode lol, so i welcome any contributions on that 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the thing is that most reviewers will be using Intellij and having some docs on how to set up stuff for Go would be great and also improves review cycles

Copy link
Contributor

@coded9 coded9 Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goland is the recommended IDE for go from Jetbrains, but it is paid. VScode has good support for Go compared to IntelliJ.
Core contributors of open source projects can get Goland for free

go/iceberg/types.go Outdated Show resolved Hide resolved
go/iceberg/types.go Outdated Show resolved Hide resolved
go/iceberg/types.go Outdated Show resolved Hide resolved
assert.Truef(t, tableSchemaNested.Equals(&sc), "expected: %s\ngot: %s", tableSchemaNested, &sc)
}

func TestRemainingTypes(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a types_test.go where this should live and where we can test all types with the JSON parsing, similar to how it's done on the Python side in test_types.py

@zeroshade
Copy link
Member Author

@nastra I'll probably get to the remaining comments tomorrow, sorry for the delay here

@zeroshade
Copy link
Member Author

@nastra just to confirm, is increasing the tests in types_test.go the only remaining thing here? or am i overlooking a comment somewhere?

@nastra
Copy link
Contributor

nastra commented Aug 9, 2023

@nastra just to confirm, is increasing the tests in types_test.go the only remaining thing here? or am i overlooking a comment somewhere?

I believe so, but I'd have to re-review the PR tomorrow. Sorry, for the review delay here.

"github.com/stretchr/testify/require"
)

func TestRemainingTypes(t *testing.T) {
Copy link
Contributor

@nastra nastra Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining is rather a confusing name. The goal here is to test all types, including lists/maps/nested structs.

I think test_types.py does a great job in testing JSON & string representations of all types, and I think we should do the same here as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeroshade I think that the only item that is currently outstanding. We just need to make sure that all types are extensively tested

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and tests are updated too. Let me know if there's any other tests you think I should add. Confirmed that all JSON and string representations are covered in the tests (according to the test coverage reports)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I think this should be fine now. We should probably re-open this against https://github.com/apache/iceberg-go

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll go do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra Looks like the iceberg-go repo is completely empty, the github UI doesn't allow forking an empty repository so that I can submit a PR. Could someone please commit a LICENSE file or empty README.md or something so that I can fork it to create the PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a README to iceberg-go, so hopefully things should work now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nastra I've created apache/iceberg-go#1 😄 in addition to the existing code here I've added general dev script stuff there.

@zeroshade
Copy link
Member Author

Closing this PR in favor of it having been merged / added to github.com/apache/iceberg-go

@zeroshade zeroshade closed this Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants