Add UnboundReference class by samredai · Pull Request #4679 · apache/iceberg

samredai · 2022-05-02T00:43:32Z

This adds an UnboundReference class with a bind method that returns a BoundReference. Merging this depends on PR #4678, which adds the Schema.accessor_for_field method. For complete functionality, a subsequent PR needs to implement the struct method in the BuildPositionAccessors visitor (added in the other PR) so that it builds a map of field ID to schema position accessor. The same PR should probably include tests for UnboundReference and BuildPositionAccessors.
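To make the shape of the change concrete, here is a minimal, self-contained sketch of how bind could tie the pieces together. It is not the code in this PR: Field, Accessor, and the toy Schema are hypothetical stand-ins, and the signature of accessor_for_field (taking a field ID) is an assumption. Only the find_field call mirrors the diff under review.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not the real NestedField)."""
    field_id: int
    name: str


@dataclass(frozen=True)
class Accessor:
    """Hypothetical stand-in: the position of a field's value in a row."""
    position: int


@dataclass(frozen=True)
class BoundReference:
    field: Field
    accessor: Accessor


class Schema:
    """Toy flat schema; only the two methods that bind relies on are sketched."""

    def __init__(self, *fields: Field) -> None:
        self._fields: Tuple[Field, ...] = fields

    def find_field(self, name_or_id: Union[str, int], case_sensitive: bool = True) -> Field:
        for field in self._fields:
            if isinstance(name_or_id, int):
                matched = field.field_id == name_or_id
            elif case_sensitive:
                matched = field.name == name_or_id
            else:
                matched = field.name.lower() == name_or_id.lower()
            if matched:
                return field
        raise ValueError(f"Could not find field: {name_or_id}")

    def accessor_for_field(self, field_id: int) -> Accessor:
        # Assumed signature: look up the accessor by field ID.
        for position, field in enumerate(self._fields):
            if field.field_id == field_id:
                return Accessor(position=position)
        raise ValueError(f"Could not find accessor for field ID: {field_id}")


class UnboundReference:
    """A reference not yet bound to a field in a schema."""

    def __init__(self, name: str) -> None:
        self.name = name

    def bind(self, schema: Schema, case_sensitive: bool) -> BoundReference:
        # Resolve the name to a field, then fetch that field's accessor.
        field = schema.find_field(name_or_id=self.name, case_sensitive=case_sensitive)
        accessor = schema.accessor_for_field(field.field_id)
        return BoundReference(field=field, accessor=accessor)


schema = Schema(Field(field_id=1, name="id"), Field(field_id=2, name="Name"))
bound = UnboundReference(name="name").bind(schema=schema, case_sensitive=False)
print(bound)
```

In this sketch, binding the same reference with case_sensitive=True raises ValueError, because the toy schema only contains "Name".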

To summarize:

1. Review and merge PR #4678 (Add a skeleton for the BuildPositionAccessors visitor)
2. Rebase this PR (tests will then pass), then review and merge it
3. Follow up with an update to the BuildPositionAccessors visitor from PR #4678 that adds the logic for building the field ID -> Accessor map
4. Follow up with tests for BuildPositionAccessors and UnboundReference
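For the third step, the field ID -> Accessor map could be built roughly as follows. This is only a sketch under stated assumptions: it uses a plain recursive function rather than the BuildPositionAccessors visitor, Field is a hypothetical stand-in for the schema types, and the Accessor(position, inner) shape is borrowed from the example output discussed later in this thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Accessor:
    """Position of a value in a row, optionally chained into a nested struct."""
    position: int
    inner: Optional["Accessor"] = None

    def get(self, row: Sequence[Any]) -> Any:
        value = row[self.position]
        return value if self.inner is None else self.inner.get(value)


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; children model a struct."""
    field_id: int
    name: str
    children: Tuple["Field", ...] = ()


def build_position_accessors(fields: Sequence[Field]) -> Dict[int, Accessor]:
    """Map every field ID to an accessor that reaches it from the top-level row."""
    result: Dict[int, Accessor] = {}
    for position, field in enumerate(fields):
        result[field.field_id] = Accessor(position)
        # Wrap each nested accessor so it first steps into this struct's position.
        for child_id, child_accessor in build_position_accessors(field.children).items():
            result[child_id] = Accessor(position, inner=child_accessor)
    return result


fields = (
    Field(1, "id"),
    Field(2, "location", children=(Field(3, "lat"), Field(4, "long"))),
)
accessors = build_position_accessors(fields)
row = (7, (52.5, 13.4))
print(accessors[4].get(row))
```

Chaining accessors through inner keeps each lookup a sequence of positional reads, so no name resolution is needed once a reference is bound.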

cc: @rdblue @emkornfield @dramaticlly @CircArgs @jun-he @dhruv-pratap

samredai · 2022-05-03T15:59:50Z

This has been rebased and is ready for review

kbendick · 2022-05-03T17:31:49Z

BTW, might want to add a note in the PR description that this class corresponds to the Java API's NamedReference.

A lot of times after looking through the git blame, I'll go back and look at PRs for context. Given that we're changing the name, that note might give others a jumping-off point to investigate further.

rdblue · 2022-05-03T20:31:42Z

python/src/iceberg/expressions/base.py

+
+
+class UnboundReference:
+    """A reference not yet bounded to a field in a schema


I think you want "bound" instead of "bounded".

Fixed! Thanks

samredai · 2022-05-03T21:02:27Z

@kbendick good point, I've added this to the docstring:

    Note:
        An unbound reference is sometimes referred to as a "named" reference

emkornfield · 2022-05-04T05:01:43Z

python/src/iceberg/expressions/base.py

+    """A reference not yet bound to a field in a schema
+
+    Args:
+        name (str): The name of the field


It might pay to provide more details on name here for nested fields, or at least point to other documentation on how they are expected to be referred to.

This isn't directly related to this PR, but I couldn't find in the specification what is considered a valid field name, other than the fact that names can contain periods ("."), at least for imports. This is quite possibly user error though.

There aren't spec requirements for field names. In the implementations, we index using . to join the names so that you can easily project names like a.b.
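As an illustration of the indexing described above, nested names can be flattened by joining each level with ".". This is a sketch only: Field and index_by_name are hypothetical helpers, not the implementation's actual indexing code.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; children model a struct."""
    field_id: int
    name: str
    children: Tuple["Field", ...] = ()


def index_by_name(fields: Sequence[Field], prefix: str = "") -> Dict[str, int]:
    """Map each full dotted name to its field ID."""
    index: Dict[str, int] = {}
    for field in fields:
        full_name = f"{prefix}.{field.name}" if prefix else field.name
        index[full_name] = field.field_id
        # Nested fields inherit the parent's full name as their prefix.
        index.update(index_by_name(field.children, prefix=full_name))
    return index


fields = (
    Field(1, "id"),
    Field(2, "a", children=(Field(3, "b"), Field(4, "c"))),
)
name_index = index_by_name(fields)
print(name_index["a.b"])
```

With an index like this, projecting a.b is a single dictionary lookup.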

@emkornfield agreed this could use more details. Once the accessor functionality is added I'm planning to circle back here and add an example in the docstring along the lines of:

    Example:
        >>> from iceberg.expressions.base import UnboundReference
        >>> from iceberg.schema import Schema
        >>> from iceberg.types import FloatType, MapType, NestedField, StringType
        >>> schema = Schema(
        ...     NestedField(
        ...         field_id=1,
        ...         name="location",
        ...         field_type=MapType(
        ...             key_id=2,
        ...             key_type=StringType(),
        ...             value_id=3,
        ...             value_type=FloatType(),
        ...             value_is_optional=False,
        ...         ),
        ...         is_optional=False,
        ...     ),
        ...     schema_id=1,
        ... )
        >>> name = "location.lat"
        >>> unbound_reference = UnboundReference(name="location.lat")
        >>> unbound_reference.bind(schema=schema, case_sensitive=False)
        BoundReference(field=1, accessor=Accessor(position=1, inner=Accessor(position=2)))

I'll also expand the description of name to be more detailed. Thanks!

emkornfield · 2022-05-04T05:04:14Z

python/src/iceberg/expressions/base.py

+        Returns:
+            BoundReference: A reference bound to the specific field in the Iceberg schema
+        """
+        field = schema.find_field(name_or_id=self.name, case_sensitive=case_sensitive)


Are nested field names guaranteed to be unambiguous?

Can there be a field named "a.b" and a field named "b" nested under a struct "a"?

Yes, they are guaranteed to be unambiguous. Iceberg will fail if there are names that map to the same flattened version.
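The guarantee can be pictured as a duplicate check on the flattened names. The sketch below is an assumption about the general idea, not Iceberg's actual validation: check_unambiguous is a hypothetical helper, and the real error type and message are not shown in this thread.

```python
from typing import Dict, Iterable, Tuple


def check_unambiguous(paths: Iterable[Tuple[str, ...]]) -> Dict[str, Tuple[str, ...]]:
    """Flatten each name path with "." and fail if two paths collide."""
    seen: Dict[str, Tuple[str, ...]] = {}
    for path in paths:
        flattened = ".".join(path)
        if flattened in seen:
            raise ValueError(
                f"Ambiguous name {flattened!r}: {seen[flattened]} and {path}"
            )
        seen[flattened] = path
    return seen


# A struct "a" with a nested field "b" is fine on its own.
valid = check_unambiguous([("a",), ("a", "b"), ("c",)])

# Adding a top-level field literally named "a.b" collides with the nested one.
try:
    check_unambiguous([("a",), ("a", "b"), ("a.b",)])
    collision_detected = False
except ValueError:
    collision_detected = True
print(collision_detected)
```

This is exactly the case asked about: a top-level "a.b" and a "b" nested under "a" flatten to the same name, so the schema is rejected rather than resolved arbitrarily.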

samredai mentioned this pull request May 2, 2022

Add a skeleton for the BuildPositionAccessors visitor #4678

Merged

github-actions bot added the python label May 2, 2022

samredai marked this pull request as draft May 2, 2022 00:48

samredai added 2 commits May 3, 2022 08:47

Add UnboundReference class

565c4aa

Add return types and bind method docstring

2e12fee

samredai marked this pull request as ready for review May 3, 2022 15:59

rdblue reviewed May 3, 2022

Fix typo and add note about named reference

83c69ae

White space linter error

6977eee

emkornfield reviewed May 4, 2022

rdblue approved these changes May 4, 2022

rdblue merged commit 42e47f5 into apache:master May 4, 2022

rdblue merged 4 commits into apache:master from samredai:UnboundReference
