Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error on instance property and init-only variable with the same name in a dataclass #17219

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

roberfi
Copy link
Contributor

@roberfi roberfi commented May 6, 2024

Fixes #12046

It does not raise "already defined" fail if already existing node is InitVar, so name of the property will be redefined. Then, when getting the method of that property, it looks for a possible redefinition.

This comment has been minimized.

@roberfi roberfi marked this pull request as ready for review May 6, 2024 19:07
@stroxler
Copy link
Contributor

The actual fix for the reported bug looks like it works to me, and I think it's probably a good thing for get_method to return the last definition when there are redefinitions of a method (this could come up in library code that isn't type checked, and using the last method seems right).

But I'm a little concerned that the example reported in the bug is actually relying on undefined behavior that just sort of happens to work.

@dataclass
class Test:
    foo: InitVar[str]
    _foo: str = field(init=False)

    def __post_init__(self, foo: str) -> None:
        self._foo = foo

    @property
    def foo(self) -> str:
        return self._foo

    @foo.setter
    def foo(self, value: str) -> None:
        self._foo = value

does indeed work at runtime, because dataclasses relies on inspect.get_annotations and Test.__annotations__ winds up recording the attribute annotation.

But the dataclasses docs do not attempt to specify behavior here, nor do the unit tests at https://github.com/python/cpython/blob/main/Lib/test/test_dataclasses/__init__.py validate anything.

And we can very easily run into poorly-defined behavior; for example if I tweak the first two lines in the example valid to provide a default argument but this code

@dataclass
class Test:
    foo: InitVar[str] = 6
    ... etc ...

then it will not work as expected, trying to use the default value will give

>>> Test()
Test(_foo=<property object at 0x108aa04a0>)

because even though dataclasses is interpreting the annotation using __annotations__ which the runtime is setting based on assignments in the class body, dataclasses in interpreting the value using the actual name bound to Test.foo which is no longer 6 by the time the decorator executes.

cc @JelleZijlstra / @hauntsaninja - I think this question is close to being either a typing spec or a CPython standard library question.

Comment on lines 3166 to +3171
if name in cls.names:
node = cls.names[name].node
if isinstance(node, FuncBase):
return node
elif isinstance(node, Decorator): # Two `if`s make `mypyc` happy
return node
else:
return None
elif possible_redefinitions := sorted(
[n for n in cls.names.keys() if n.startswith(f"{name}-redefinition")]
):
node = cls.names[possible_redefinitions[-1]].node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of what we decide to do about dataclasses that redefine InitVar fields, this seems like it might be a good change.

In general, validating that there aren't redefinitions (which is happening in semanal) seems like it should be independent of getting the appropriate value for a name; when dealing with dependencies, using the last-defined value for a name seems like a good thing.

I wonder if it would be worth pulling into a helper function, and using not just for methods but other attributes as well?

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented May 25, 2024

Oh, nice catch! I didn't fully realise that InitVar could have a default.

From the CPython side, redefinition in general is problematic, even with normal fields. In my similar-to-dataclasses library, I have some heuristic based runtime checks against this case. Basically I complain if something in __annotations__ has a value matching the following:

if _is_property_like(value) or (
    isinstance(value, types.FunctionType)
    and value.__name__ != "<lambda>"
    and value.__qualname__.startswith(cls.__qualname__)
):

(my similar-to-dataclasses lib is sometimes used in ways that make it more likely that people will try to clobber things)

CPython is probably stuck with its behaviour given that folks are clearly relying on it. Either way, it should definitely have tests for this case. Want to open a PR?

From the mypy side, it seems like a half dozen folks have run into this, so it would be nice to support. The ideal behaviour would match the runtime behaviour, i.e. we allow redefinition when there isn't a value and we disallow redefinition when there is a value. @roberfi are you interested in adding the warning back when there's a default value? :-)

@roberfi
Copy link
Contributor Author

roberfi commented May 25, 2024

Oh, that's crazy! This kind of implementation could never have a default value.

Thank you so much for the catch and the explanation. It was a very good one!

@roberfi are you interested in adding the warning back when there's a default value? :-)

Yes, sure, I will add the check/warning and a test to verify that it's working. Let's do it!

@stroxler
Copy link
Contributor

Yeah, I can look into adding some CPython unit tests later this week

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

stroxler added a commit to stroxler/cpython that referenced this pull request May 26, 2024
As originally discussed in
python/mypy#17219,
MyPy has had a false-positive bug report because it errors when a
dataclass has methods that shadow an `InitVar` field.

It is actually a bit surprising that this works, it turns out
that `__annotations__` "remembers" field assignments even if the
bound names are later overwritten by methods; it will *not* work
to provide a default value.

There have been multiple bug reports on MyPy so we know people are
actually relying on this in practice; most likely it comes up when
a dataclass wants to take a "raw" value as an InitVar and transform
it somehow in `__post_init__` into a different value before assigning
it to a field; in that case they may choose to make the actual field
private and provide a property for access.

I currently provide a test of the happy path where there is no default
value provided, but no tests of the behavior when no default is
provided (in which case the property will override the default) and no
documentation (because I'm not sure we want to consider this behavior
officially supported).

The main goal is to have a regression test since it would be easy for a
refactor to break this.
stroxler added a commit to stroxler/cpython that referenced this pull request May 26, 2024
As originally discussed in
python/mypy#17219,
MyPy has had a false-positive bug report because it errors when a
dataclass has methods that shadow an `InitVar` field.

It is actually a bit surprising that this works, it turns out
that `__annotations__` "remembers" field assignments even if the
bound names are later overwritten by methods; it will *not* work
to provide a default value.

There have been multiple bug reports on MyPy so we know people are
actually relying on this in practice; most likely it comes up when
a dataclass wants to take a "raw" value as an InitVar and transform
it somehow in `__post_init__` into a different value before assigning
it to a field; in that case they may choose to make the actual field
private and provide a property for access.

I currently provide a test of the happy path where there is no default
value provided, but no tests of the behavior when no default is
provided (in which case the property will override the default) and no
documentation (because I'm not sure we want to consider this behavior
officially supported).

The main goal is to have a regression test since it would be easy for a
refactor to break this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error on instance property and init-only variable with the same name in a dataclass
3 participants