Skip to content

Conversation

@CircArgs
Copy link
Contributor

@CircArgs CircArgs commented Apr 1, 2022

Created the basic boolean expression types.

@github-actions github-actions bot added the python label Apr 1, 2022
@CircArgs
Copy link
Contributor Author

CircArgs commented Apr 1, 2022

@samredai @rdblue please take a look when you get the chance, thanks a lot.

@CircArgs
Copy link
Contributor Author

CircArgs commented Apr 6, 2022

@rdblue I think I've made the changes you suggested and also referred to the link for some further modifications. I have some CI issues to work out, but if I'm on the right or wrong track with these changes please let me know

@CircArgs CircArgs requested a review from rdblue April 6, 2022 15:50
@CircArgs
Copy link
Contributor Author

CircArgs commented Apr 6, 2022

Fixed my failing tox checks. Had a few logical errors and a mistakenly overwritten file.

return left
return super().__new__(cls)

def __init__(self, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this ensure that rest is empty since it should only be handled using __new__? If someone calls this directly, it would currently discard rest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong. If rest is non-empty then it ignores left and right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If rest is non empty then new uses reduce to construct the object with only left and right specified

Copy link
Contributor

@rdblue rdblue Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not in this method. If this method is called directly then it would do the wrong thing. That could happen in a subclass, right?

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks great. I think there are a couple of slight problems to fix that I noted.

@dramaticlly
Copy link
Contributor

Thanks @CircArgs , we actually merged #4468 to renamed the expression module structure. Maybe you will want to rename accordingly from expression to expressions so we dont have both of expression module? Some rebase of the change might be needed as well

@CircArgs
Copy link
Contributor Author

Thanks @dramaticlly I think it's good now in that respect. @rdblue are we all set?

):
if not rest:
self.left = left
self.right = right
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still need to be fixed. Even if we don't expect calls to __init__ with both left/right and rest, we should not ignore the possibility.

Copy link
Contributor Author

@CircArgs CircArgs Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @rdblue, I was trying to suggest that it's not plainly ignored and that __new__ takes care of making sure __init__ is only called with left and right. This test shows how And handles more arguments. Is that what you were thinking would not work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was meant to be a convenience feature and I saw it in the Java file you shared which is why it is here. If you don't think this is necessary I can remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a great idea to handle this in __new__. But __init__ still exists and can be called directly. I don't think it should assume it is always called through __new__. I think the only thing that you need to do is add an else case that throws ValueError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__ still exists and can be called directly. I don't think it should assume it is always called through __new__.

Would you mind elaborating on what you mean by this? To my knowledge, there is no calling init directly without implicitly calling new since new gives you the base object and init essentially adds/modifies the attributes

Copy link
Contributor

@samredai samredai Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But __init__ still exists and can be called directly.

I don't think there's a way to do this while avoiding a call to __new__. The one way I wasn't sure of was if a child class calls init via super but I did this quick test and even that calls __new__.

class Foo:
    def __new__(cls):
        print("New called")
        return cls
    def __init__(self, val):
        self.val = val

class Bar(Foo):
    def __init__(self):
        super().__init__(5)

Bar()  # "New called"

Copy link
Contributor

@samredai samredai Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this could use some doc-strings describing this behavior though. Specifically mentioning __new__

    """AND operation expression - logical conjunction

    Args:
        left(BooleanExpression): The left hand boolean expression
        right(BooleanExpression): The right hand boolean expression
        *rest(BooleanExpression): Additional boolean expressions to fold into the AND expression

    Note:
        Additional boolean expressions provided to `rest` will be folded/reduced to create an
        object with only `left` and `right` specified (see  the `__new__` constructor) . For
        example, And(a, b, c) wil result in the equivalent of And(And(a, b), c).
    """

Once predicates are added we could include an Example: section here too without failing the pydoc tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Foo.__init__(obj) directly doesn't call __new__:

f = Foo() # "New called"
Foo.__init__(f, 6) # avoids __new__

I'm not saying that these are likely cases, I'm just saying that we should prefer throwing exceptions for cases like this rather than failing silently.

    def __init__(self, left, right, *rest):
        if not rest:
            self.left = left
            self.right = right
        else:
            raise ValueError("Expected only left and right operands")

It isn't that intrusive to have the else case and it will catch crazy cases because we can't prevent people from calling __init__.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you can also avoid calling __new__ by overriding it in the subclass:

class Bar(Foo):
    def __new__(cls):
        print("Subclass new called")

Bar()  # "Subclass new called"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah I agree it's a small addition to protect against these cases. It also makes the init easier to understand.

right: BooleanExpression,
*rest: BooleanExpression,
):
if not rest:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the tests are failing on the new ValueError. Maybe updating this to if not rest or len(rest) <= 0:?

@CircArgs
Copy link
Contributor Author

CircArgs commented Apr 22, 2022

Hey @rdblue I wanted to circle back on the original without raising the error. I threw it in without considering the implications based on your suggestion, but I feel as though we still don't need such a thing. When one creates an And for example, it will call __new__ and run through the logic on whether we are dealing with a tautology or a contradiction and yield AlwaysTrue or AlwaysFalse respectively. This logic, in conjunction with taking in *rest and reducing to binary Ands makes us have to return and that's why I'm using __new__. Take this example:

class A:
    def __new__(cls):
        return B()
    def __init__(self):
        print('A init')
        
class B:
    def __init__(self):
        print('B init')

A()
>>> "B init"

So A's __new__ was called but the resulting object was a B so its __init__ was called. I'm just showing how __new__ has full control here and that And's __init__ will only ever see what __new__ wants it to see. As for what happens via possible calls to super from child classes, do we have to worry about that? I believed these classes would not be inherited from and if they were it is the user's responsibility to understand the implications especially when dealing with dunder methods.

@rdblue
Copy link
Contributor

rdblue commented Apr 22, 2022

@CircArgs, the method can still be called with bad arguments. Since we cannot hide it entirely, I think throwing an exception is a reasonable step to ensure it isn't misused.

It's also interesting that we're actually hitting the ValueError in tests:

  ==================================== ERRORS ====================================
  _________ ERROR collecting tests/expressions/test_expressions_base.py __________
  tests/expressions/test_expressions_base.py:159: in <module>
      base.And(TestExpressionA(), TestExpressionB(), TestExpressionA()),
  src/iceberg/expressions/base.py:162: in __init__
      raise ValueError("Expected only left and right operands")
  E   ValueError: Expected only left and right operands
  _________ ERROR collecting tests/expressions/test_expressions_base.py __________
  tests/expressions/test_expressions_base.py:159: in <module>
      base.And(TestExpressionA(), TestExpressionB(), TestExpressionA()),
  src/iceberg/expressions/base.py:162: in __init__
      raise ValueError("Expected only left and right operands")
  E   ValueError: Expected only left and right operands

Why is that case getting hit? I thought that we were controlling the inputs to __init__?

@samredai
Copy link
Contributor

samredai commented Apr 23, 2022

The ValueError getting raised comes from the fact that __init__ is always called after __new__ when you call the standard class constructor. The first argument to __init__ (meaning self) is the value returned from __new__.

However for the if rest case that's in __new__ here, a fully constructed instance is returned by return reduce(And, (left, right, *rest)) and so __init__ doesn't need to do anything, which is why it was only doing something if not rest. So whether or not rest is provided is being used as a sort of signal to __init__ on whether or not the class has already been fully initialized in __new__. Adding the ValueError disrupts the "already initialized path" by causing __init__ to panic.

Two other alternatives I can think of is:

  1. Move the initialization entirely into __new__ and get rid of __init__ completely:
class And(BooleanExpression):
    """AND operation expression - logical conjunction"""

    def __new__(cls, left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression):
        if rest:
            return reduce(And, (left, right, *rest))
        if left is AlwaysFalse() or right is AlwaysFalse():
            return AlwaysFalse()
        elif left is AlwaysTrue():
            return right
        elif right is AlwaysTrue():
            return left
        _instance = super().__new__(cls)
        _instance.args = (left, right)
        vars(_instance)["left"] = left
        vars(_instance)["right"] = right
        return _instance
  1. Move the reduction logic out into a standalone function and simplify the And class to only accept a left and right arg:
def and_expression(left: BooleanExpression, right: BooleanExpression, *rest: BooleanExpression) -> Union[And, AlwaysTrue, AlwaysFalse]:
    if rest:
        return reduce(And, (left, right, *rest))
    if left is AlwaysFalse() or right is AlwaysFalse():
        return AlwaysFalse()
    elif left is AlwaysTrue():
        return right
    elif right is AlwaysTrue():
        return left
    return And(left, right)

A note on option 1, someone can still override __new__ in a subclass as @rdblue pointed out but I think the expectation holds that they should call super().__new__(cls, *args, **kwargs), just as it's expected to call super().__init__(*args, **kwargs) if you override __init__ in a subclass.

A note about option 2 is that it may make checking if something is an instance of And less intuitive, in other words:

from iceberg.expressions.base import and_expression, And

expr = and_expression(left, right, rest)
isinstance(expr, And)

as opposed to:

from iceberg.expressions.base import And

expr = And(left, right, rest)
isinstance(expr, And)

(sorry for the super long comment...)

@rdblue
Copy link
Contributor

rdblue commented Apr 24, 2022

Thanks for finding the problem, @samredai. I think the solution to remove __init__ makes sense. Thanks for implementing that change, @CircArgs!

@rdblue rdblue merged commit 449a743 into apache:master Apr 24, 2022
@rdblue
Copy link
Contributor

rdblue commented Apr 24, 2022

Thanks, @CircArgs!

@CircArgs
Copy link
Contributor Author

Thanks, @CircArgs!

Thank you for all the helpful commentary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants