Fixed #15648 -- Allowed QuerySet.values_list() to return a namedtuple. #8852

sir-sigurd · 2017-08-04T11:49:00Z

https://code.djangoproject.com/ticket/15648

I'm not sure if there is consensus on API, but I decided to push it forward.
I'm not sure also if the current way of naming expressions is OK and if should support expressions with named=True at all.

atombrella · 2017-08-05T14:21:51Z

docs/ref/models/querysets.txt

@@ -651,6 +651,11 @@ rather than one-tuples. An example should make the difference clearer::

 It is an error to pass in ``flat`` when there is more than one field.

+``named`` parameter can be passed in to get results as named tuples::
+
+    >>> Entry.objects.values_list('id', 'headline')


Did you forget named=True here?

Yes, thanks!

jarshwah · 2017-08-08T21:47:20Z

I like it! What are your concerns with passing back annotations? With regular querysets the annotations are provided with their names as attributes, I don't see any issue here.

sir-sigurd · 2017-08-09T03:33:52Z

@jarshwah
I mean when we pass expressions directly to values_list() it doesn't look good:

In [2]: PersonSkill.objects.values_list('id', models.Value(1, output_field=models.IntegerField()), named=True).first()
Out[2]: Row(id=10001, value140320025193552=1)

adamchainz

This is nice, I've thought of this nearly every time I've used values_list and then found myself doing [0] etc. later on, having to refer back to see what it means

adamchainz · 2017-08-16T06:54:28Z

django/db/models/query.py

+            extra_names = list(query.extra_select)
+            annotation_names = list(query.annotation_select)
+            names = extra_names + field_names + annotation_names
+        return map(namedtuple('Row', names)._make, rows)


We should try cache the namedtuple classes that come out of this, they can take about half a millisecond to instantiate, which is a lot of time - if a view invokes 10 queries that's 5ms, or 10% of a tight performance budge, gone:

In [14]: %timeit namedtuple('Row', ['a', 'b', 'c', 'd']) 1000 loops, best of 3: 417 µs per loop

If you make names a tuple (so it's hashable) then you can just have:

@lru_cache() # using default maxsize, i don't think it would need tweaking really def create_namedtuple(names): return namedtuple('Row', names)

Also the temp name rows isn't needed, I'd move it into the map call, leaving:

return map( create_namedtuple(names)._make, super().__iter__(), )

adamchainz · 2017-08-16T06:55:22Z

django/db/models/query.py

+            NamedValuesListIterable if named
+            else FlatValuesListIterable if flat
+            else ValuesListIterable
+        )


not sure if this really passes the django style guide, compared to plain if/else statements

It looks okay to me.

adamchainz · 2017-08-16T06:56:10Z

docs/releases/2.0.txt

@@ -263,6 +263,9 @@ Models
 * The new ``field_name`` parameter of :meth:`.QuerySet.in_bulk` allows fetching
  results based on any unique model field.

+* The new ``named`` parameter of :meth:`.QuerySet.values_list` allows to get
+  results as named tuples.


"allows fetching results as named tuples" (or should that be "namedtuple instances"?)

adamchainz · 2017-08-16T06:58:55Z

docs/ref/models/querysets.txt

@@ -651,6 +651,11 @@ rather than one-tuples. An example should make the difference clearer::

 It is an error to pass in ``flat`` when there is more than one field.

+``named`` parameter can be passed in to get results as named tuples::


"You can also pass in the named parameter to get results as named tuples (ref:collection.namedtuple):"

It might be a user's first time coming across collections.namedtuple when reading these docs, so we should link out to the Python ones so they can learn what it means.

adamchainz · 2017-08-16T07:00:37Z

tests/queries/tests.py

+
+        expr = F('num') + 1
+        values = qs.values_list(expr, named=True).first()
+        self.assertEqual(values._fields, ('combinedexpression%s' % id(expr),))


I think the test should be split into multiple test methods, one per thing-being-tested, as above

adamchainz · 2017-08-16T07:12:11Z

As to value140320025193552 we could change the field name generation from using str(id(field)), to something simple like:

field_names = {f for f in fields if not hasattr(f, 'resolve_expression')}
_fields = []
expressions = {}
unnamed_counter = 1
for field in fields:
    if hasattr(field, 'resolve_expression'):
        while True:
            field_id = 'x' + str(unnamed_counter)
            unnamed_counter += 1
            if field_id not in field_names:
                break
        expressions[field_id] = field
        _fields.append(field_id)
    else:
        _fields.append(field)

Then you just get x1, x2 etc. for the unnamed things - 0 based indexing to be bikeshedded ofc 🚲. Users can always address them by index too.

The current patch has an incredibly rare bug, if some model happens to have a field value1234 that is requested, and the id() of an expression resolves to 1234, then it will overwrite the existing field ;)

timgraham · 2017-09-04T23:13:53Z

django/db/models/query.py

+    """
+
+    @staticmethod
+    @lru_cache()


It's not immediately obvious to me the purpose of caching here. Maybe a comment would be useful. Also might want to consider the discussion in #8825.

Added comment for this.
I don't think that it worth to set maxsize=None, because overhead of @lru_cache() seems to be futile relatively for obtaining QuerySet:

In [7]: %timeit Person.objects.values_list(named=True) 156 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

maxsize=None is a bad idea like Aymeric says in #8825.

timgraham · 2017-09-04T23:15:36Z

django/db/models/query.py

+            NamedValuesListIterable if named
+            else FlatValuesListIterable if flat
+            else ValuesListIterable
+        )


It looks okay to me.

timgraham · 2017-09-05T01:13:39Z

tests/queries/tests.py

+            Number.objects.values_list('num', flat=True, named=True)
+
+    def test_named_values_list_bad_field_name(self):
+        msg = "Type names and field names must be valid identifiers: '1'"


Did you consider catching this exception and reraising a message that explains the issue in terms that a Django user might more easily understand?

How about this?

try: tuple_class = self.create_namedtuple_class(*names) except Exception as e: msg = 'namedtuple() raised exception. Make sure that field names are valid identifiers.' raise ValueError(msg) from e

If only QuerySet.extra() is affected, we can skip it since extra() usage is discouraged.

timgraham · 2017-09-05T01:14:18Z

tests/queries/tests.py

+    def test_named_values_list_bad_field_name(self):
+        msg = "Type names and field names must be valid identifiers: '1'"
+        with self.assertRaisesMessage(ValueError, msg):
+            Number.objects.extra(select={'1': 'num+1'}).values_list('1', named=True).first()


Is it possible to trigger the error without using QuerySet.extra() considering that it might be deprecated in the future?

It seems that QuerySet.extra() is the only way to make field name which is not valid identifier.

timgraham · 2017-09-05T01:24:48Z

django/db/models/query.py

+    @staticmethod
+    @lru_cache()
+    def create_namedtuple_class(*names):
+        return namedtuple('Row', names)


Did you consider using queryset.model instead of 'Row'? I don't know if that would cause confusion with model instances.

IMHO it doesn't make much sense as returned tuples don't have strong connection to model. User could do something like this:

Model.objects.annotate(f=models.F('related__f')).values_list('f', named=True)

timgraham · 2017-09-05T01:25:29Z

docs/ref/models/querysets.txt

@@ -651,6 +651,12 @@ rather than one-tuples. An example should make the difference clearer::

 It is an error to pass in ``flat`` when there is more than one field.

+You can also pass in the named parameter to get results as named tuples


You can pass ``named=True`` to get results as a :func:`~python:collections.namedtuple`::

sir-sigurd · 2017-09-05T07:41:33Z

django/db/models/query.py

+            names.extend(query.values_select)
+            names.extend(query.annotation_select)
+        tuple_class = self.create_namedtuple_class(*names)
+        # tuple_class._make() is inlined here for better performance.


Using _make():

%timeit for x in Person.objects.values_list(named=True): pass 23.1 ms ± 438 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using inlined _make():

%timeit for x in Person.objects.values_list(named=True): pass 18 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Comparing with plain tuples:

%timeit for x in Person.objects.values_list(): pass 14 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I think tuple_class._make refers to tuple.__new__ but are the names intentionally different?

No, I meant this _make():

In [107]: namedtuple('Row', ('field',), verbose=True) from builtins import property as _property, tuple as _tuple from operator import itemgetter as _itemgetter from collections import OrderedDict class Row(tuple): 'Row(field,)' __slots__ = () _fields = ('field',) def __new__(_cls, field,): 'Create new instance of Row(field,)' return _tuple.__new__(_cls, (field,)) @classmethod def _make(cls, iterable, new=tuple.__new__, len=len): 'Make a new Row object from a sequence or iterable' result = new(cls, iterable) if len(result) != 1: raise TypeError('Expected 1 arguments, got %d' % len(result)) return result ...

I added this comment so it would be easier to understand from what I derived this.

I'm still not sure what "tuple_class._make() is inlined here" means.

Probably it's better to remove it in this case.

adamchainz · 2017-09-05T08:01:24Z

django/db/models/query.py

        for field in fields:
            if hasattr(field, 'resolve_expression'):
-                field_id = str(id(field))
+                field_id_prefix = getattr(field, 'default_alias', field.__class__.__name__.lower())


timgraham · 2017-09-06T15:23:47Z

docs/ref/models/querysets.txt

+
+    >>> Entry.objects.values_list('id', 'headline', named=True)
+    <QuerySet [Row(id=1, headline='First entry'), ...]>
+


To explain why namedtuple isn't the default (it seems backwards compatible as Anssi noted on the ticket), maybe we should add something like:
Using a named tuple may make use of the results more readable, at the expense
of a small performance penalty for transforming the results into a named tuple.

timgraham · 2017-09-06T15:24:06Z

django/db/models/query.py

+    @staticmethod
+    @lru_cache()
+    def create_namedtuple_class(*names):
+        # namedtuple() is too slow to be called for every QuerySet evaluation


# Cache namedtuple() with @lru_cache() since it's too slow to be # called for every QuerySet evaluation.

timgraham · 2017-09-06T15:25:07Z

django/db/models/query.py

-    def values_list(self, *fields, flat=False):
+    def values_list(self, *fields, flat=False, named=False):
+        if flat and named:
+            raise TypeError("'flat' and 'named' can't be passed together.")


For similarity with other messages (e.g. 'The nowait option cannot be used with skip_locked.') you might change 'passed' to 'used'.

…dtuple.

sir-sigurd force-pushed the named-tuples branch from d395d5c to 8ccd567 Compare August 4, 2017 12:12

atombrella reviewed Aug 5, 2017

View reviewed changes

sir-sigurd force-pushed the named-tuples branch from 8ccd567 to b9deeeb Compare August 5, 2017 16:29

adamchainz reviewed Aug 16, 2017

View reviewed changes

sir-sigurd force-pushed the named-tuples branch 5 times, most recently from baa7b16 to d5da9e8 Compare August 17, 2017 03:27

sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 6c07725 to 4d2c6b2 Compare August 30, 2017 02:50

timgraham reviewed Sep 5, 2017

View reviewed changes

sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 9f5d5dd to cc12fd9 Compare September 5, 2017 07:12

sir-sigurd commented Sep 5, 2017

View reviewed changes

adamchainz reviewed Sep 5, 2017

View reviewed changes

timgraham approved these changes Sep 6, 2017

View reviewed changes

sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 77498c1 to 306403f Compare September 6, 2017 16:42

Fixed django#15648 -- Allowed QuerySet.values_list() to return a name…

f3c9562

…dtuple.

timgraham force-pushed the named-tuples branch from 306403f to f3c9562 Compare September 6, 2017 19:32

timgraham merged commit f3c9562 into django:master Sep 6, 2017

sir-sigurd deleted the named-tuples branch September 7, 2017 01:13

UmanShahzad mentioned this pull request Dec 27, 2017

Implement invoice back-end open-craft/accounting#6

Merged

		@@ -651,6 +651,11 @@ rather than one-tuples. An example should make the difference clearer::

		It is an error to pass in ``flat`` when there is more than one field.

		``named`` parameter can be passed in to get results as named tuples::

		@@ -651,6 +651,12 @@ rather than one-tuples. An example should make the difference clearer::

		It is an error to pass in ``flat`` when there is more than one field.

		You can also pass in the named parameter to get results as named tuples


		>>> Entry.objects.values_list('id', 'headline', named=True)
		<QuerySet [Row(id=1, headline='First entry'), ...]>

Fixed #15648 -- Allowed QuerySet.values_list() to return a namedtuple. #8852

Fixed #15648 -- Allowed QuerySet.values_list() to return a namedtuple. #8852

Conversation

sir-sigurd commented Aug 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jarshwah commented Aug 8, 2017

sir-sigurd commented Aug 9, 2017 • edited Loading

adamchainz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adamchainz commented Aug 16, 2017 • edited Loading

Choose a reason for hiding this comment

sir-sigurd Sep 5, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sir-sigurd Sep 6, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sir-sigurd commented Aug 9, 2017 •

edited

Loading

adamchainz commented Aug 16, 2017 •

edited

Loading

sir-sigurd Sep 5, 2017 •

edited

Loading

sir-sigurd Sep 6, 2017 •

edited

Loading