Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #35235 -- ArrayAgg() doesn't return default when filter contains __in=[]. #17890

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sharonwoo
Copy link
Contributor

https://code.djangoproject.com/ticket/35235

New to Django core here and wanting to try for next djangonaut run, thought I could try to triage by writing some quickie tests to see if I could replicate the issue... and I can't. My 2 additional tests (and all of them, actually) pass locally.

Let me know if this is not a good approach.

@sharonwoo sharonwoo marked this pull request as draft February 20, 2024 08:14
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

)
self.assertEqual([], result)

self.assertIn("[] != '{}'", str(context.exception))
Copy link
Contributor Author

@sharonwoo sharonwoo Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test, with _output_field_or_none as @cached_property, raises an exception as result gives '{}'. I then call the context.exception to show what the result is... a bit clumsy I know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for experiments, right?
Normally, I'd use:

        qs = AggregateTestModel.objects.annotate(
            test_array_agg=ArrayAgg(
                "stattestmodel__int1",
                filter=Q(pk__in=[]),
                default=Value([]),
            )
        )
        self.assertSequenceEqual(qs.get().test_array_agg, [])

.first()
.test_array_agg
)
self.assertEqual([], result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's preferred to have an expected value as a second argument.

Suggested change
self.assertEqual([], result)
self.assertSequenceEqual(result, [])

test_array_agg=ArrayAgg(
"stattestmodel__int1",
filter=Q(pk__in=[]),
default=Value([]),
Copy link
Contributor

@shangxiao shangxiao Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optional idea: maybe setup this as a subtest to test the conditions:

filter value:

  1. [-1]
  2. []

default value:

  1. []
  2. Value([])

This increases our "black box" coverage. (Simon mentioned that Value(…) is no longer necessary for defaults and we may want to update the docs to reflect that but it still may be valuable to use for coverage.)

Edit: Clarified conditions as filter value × default value as per Mariusz' setup in the ticket.

@shangxiao
Copy link
Contributor

I wonder if we should also add a test around _output_field_or_none though it could be a bit too trivial 🤔

eg

def test_output_field_or_none(self):
    expression = FooExpression()
    self.assertIsNone(assert expression._output_field_or_none)
    new_output_field = BooleanField()
    expression.output_field = new_output_field
    self.assertIs(expression._output_field_or_none, new_output_field)

@sharonwoo sharonwoo changed the title ArrayAgg inconsistent return value when no results - could not replicate issue Fixed #35235 -- ArrayAgg() doesn't return default when filter contains __in=[]. Feb 23, 2024
@sharonwoo sharonwoo force-pushed the arrayagg-inconsistent-return-value-ticket-35235 branch from 51f14bf to c4c12a0 Compare February 23, 2024 03:18
@sharonwoo sharonwoo marked this pull request as ready for review February 23, 2024 03:27
@sharonwoo sharonwoo force-pushed the arrayagg-inconsistent-return-value-ticket-35235 branch from c4c12a0 to 80fab95 Compare February 23, 2024 03:33
def test_array_agg_with_empty_filter_and_default_values(self):
for filter_value in ([-1], []):
for default_value in ([], Value([])):
with self.subTest(filter=filter_value, default=default_value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏☺️

@charettes
Copy link
Member

charettes commented Feb 23, 2024

@shangxiao

I wonder if we should also add a test around _output_field_or_none though it could be a bit too trivial 🤔

I think it might be worth it yes just to capture that is it an unexpected behaviour of BaseExpression

Also do you know how to trigger a benchmark run on a PR? I figured it might be worth a run to assert our expectations wrt/to _output_field_or_none uncaching hold.

My expectations are that since every expression is composed of other expressions that they resolve their output field from and eventually cache them it should have no impact but it might be valuable to confirm prior to merging if it's just a few more CPU cycle away.

@charettes
Copy link
Member

charettes commented Feb 23, 2024

BTW I also just noticed _output_field_resolved_to_none which gets assigned in output_field and it seems that flag might also warrant clearing? I wonder if its existence warrants the usage of a slightly more involved solution that has output_field assignment clear both _output_field_or_none and _output_field_resolved_to_none instead.

@felixxm felixxm added the benchmark Apply to have ASV benchmarks run on a PR label Feb 23, 2024
@shangxiao
Copy link
Contributor

BTW I also just noticed _output_field_resolved_to_none which gets assigned in output_field and it seems that flag might also warrant clearing? I wonder if its existence warrants the usage of a slightly more involved solution that has output_field assignment clear both _output_field_or_none and _output_field_resolved_to_none instead.

I saw that. Initially I figured we could ignore it as it's only used in that module, but as folks always say we don't know how this module gets used out in the wild 😝

A simpler (and possibly more Pythonic?) approach could just be to remove flags altogether and raise a different exception type, and use that as the distinguishing factor as to whether _output_field_or_none() reraises.

eg

--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -172,7 +172,6 @@ class BaseExpression:
     empty_result_set_value = NotImplemented
     # aggregate specific fields
     is_summary = False
-    _output_field_resolved_to_none = False
     # Can the expression be used in a WHERE clause?
     filterable = True
     # Can the expression can be used as a source expression in Window?
@@ -310,8 +309,7 @@ class BaseExpression:
         """Return the output type of this expressions."""
         output_field = self._resolve_output_field()
         if output_field is None:
-            self._output_field_resolved_to_none = True
-            raise FieldError("Cannot resolve expression type, unknown output_field")
+            raise ValueError("Cannot resolve expression type, unknown output_field")
         return output_field

     @property
@@ -323,8 +321,7 @@ class BaseExpression:
         try:
             return self.output_field
         except FieldError:
-            if not self._output_field_resolved_to_none:
-                raise
+            raise

     def _resolve_output_field(self):
         """

@shangxiao
Copy link
Contributor

shangxiao commented Feb 23, 2024

Or maybe this is better as it keeps the API consistent - this way it doesn't disrupt folks expecting FieldError from .output_field.

I know Mariusz doesn't like specific exceptions but this is a practice I find valuable - a practice which I got from old mate Raymond Hettinger in one of his presentations.

Anyway though the point I'm making is that "state is the root of all evil" (aside from premature optimisation that is 😁)

--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -166,13 +166,16 @@ class Combinable:
         return NegatedExpression(self)


+class OutputFieldIsNoneError(FieldError):
+    pass
+
+
 class BaseExpression:
     """Base class for all query expressions."""

     empty_result_set_value = NotImplemented
     # aggregate specific fields
     is_summary = False
-    _output_field_resolved_to_none = False
     # Can the expression be used in a WHERE clause?
     filterable = True
     # Can the expression can be used as a source expression in Window?
@@ -310,8 +313,9 @@ class BaseExpression:
         """Return the output type of this expressions."""
         output_field = self._resolve_output_field()
         if output_field is None:
-            self._output_field_resolved_to_none = True
-            raise FieldError("Cannot resolve expression type, unknown output_field")
+            raise OutputFieldIsNoneError(
+                "Cannot resolve expression type, unknown output_field"
+            )
         return output_field

     @property
@@ -322,9 +326,10 @@ class BaseExpression:
         """
         try:
             return self.output_field
+        except OutputFieldIsNoneError:
+            return
         except FieldError:
-            if not self._output_field_resolved_to_none:
-                raise
+            raise

     def _resolve_output_field(self):
         """

@charettes
Copy link
Member

charettes commented Feb 23, 2024

I like the OutputFieldIsNoneError approach a lot, it seems like it should have been this way in the first place.

One small tweak I would make; the whole except FieldError: raise branch is now unnecessary

--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -166,13 +166,16 @@ class Combinable:
         return NegatedExpression(self)


+class OutputFieldIsNoneError(FieldError):
+    pass
+
+
 class BaseExpression:
     """Base class for all query expressions."""

     empty_result_set_value = NotImplemented
     # aggregate specific fields
     is_summary = False
-    _output_field_resolved_to_none = False
     # Can the expression be used in a WHERE clause?
     filterable = True
     # Can the expression can be used as a source expression in Window?
@@ -310,8 +313,9 @@ class BaseExpression:
         """Return the output type of this expressions."""
         output_field = self._resolve_output_field()
         if output_field is None:
-            self._output_field_resolved_to_none = True
-            raise FieldError("Cannot resolve expression type, unknown output_field")
+            raise OutputFieldIsNoneError(
+                "Cannot resolve expression type, unknown output_field"
+            )
         return output_field

     @property
@@ -322,9 +326,10 @@ class BaseExpression:
         """
         try:
             return self.output_field
+        except OutputFieldIsNoneError:
+            return
-         except FieldError:
-            if not self._output_field_resolved_to_none:
-                raise

     def _resolve_output_field(self):
         """

@shangxiao
Copy link
Contributor

Ah yes of course you're right, one of those forest-trees moments 😂

@sharonwoo sharonwoo force-pushed the arrayagg-inconsistent-return-value-ticket-35235 branch from 80fab95 to b148ed7 Compare February 24, 2024 14:48
@shangxiao
Copy link
Contributor

@sharonwoo Just need some tests around _output_field_or_none 👍

@sharonwoo sharonwoo force-pushed the arrayagg-inconsistent-return-value-ticket-35235 branch 3 times, most recently from 1e346f5 to 407d534 Compare February 27, 2024 05:11
@sharonwoo sharonwoo marked this pull request as draft February 27, 2024 05:13
with self.assertRaisesMessage(FieldError, msg):
Value(object()).output_field
def test_output_field_resolution(self):
# why does this fail, but test_resolve_output_field_with_null pass?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding - to understand the difference between this test and the set of tests containing test_resolve_output_field_with_null

Copy link
Contributor Author

@sharonwoo sharonwoo Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reading https://code.djangoproject.com/ticket/33397 and #15271 Q__Q

test_output_field_resolution should see the expression combinations put into the annotation raise OutputFieldIsNoneError EXCEPT the three handled by db.models.expressions._connector_combinations, 'IntegerField', 'DecimalField', 'FloatField'.

@sharonwoo sharonwoo force-pushed the arrayagg-inconsistent-return-value-ticket-35235 branch from 407d534 to 14d3065 Compare March 4, 2024 08:35
@sharonwoo
Copy link
Contributor Author

Sorry, let me come back to this this or next week :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Apply to have ASV benchmarks run on a PR
Projects
None yet
5 participants