Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed #15648 -- Allowed QuerySet.values_list() to return a namedtuple. #8852

Merged
merged 1 commit into from
Sep 6, 2017

Conversation

sir-sigurd
Copy link
Member

https://code.djangoproject.com/ticket/15648

I'm not sure if there is consensus on API, but I decided to push it forward.
I'm not sure also if the current way of naming expressions is OK and if should support expressions with named=True at all.

@@ -651,6 +651,11 @@ rather than one-tuples. An example should make the difference clearer::

It is an error to pass in ``flat`` when there is more than one field.

``named`` parameter can be passed in to get results as named tuples::

>>> Entry.objects.values_list('id', 'headline')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you forget named=True here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks!

@jarshwah
Copy link
Member

jarshwah commented Aug 8, 2017

I like it! What are your concerns with passing back annotations? With regular querysets the annotations are provided with their names as attributes, I don't see any issue here.

@sir-sigurd
Copy link
Member Author

sir-sigurd commented Aug 9, 2017

@jarshwah
I mean when we pass expressions directly to values_list() it doesn't look good:

In [2]: PersonSkill.objects.values_list('id', models.Value(1, output_field=models.IntegerField()), named=True).first()
Out[2]: Row(id=10001, value140320025193552=1)

Copy link
Sponsor Member

@adamchainz adamchainz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice, I've thought of this nearly every time I've used values_list and then found myself doing [0] etc. later on, having to refer back to see what it means

extra_names = list(query.extra_select)
annotation_names = list(query.annotation_select)
names = extra_names + field_names + annotation_names
return map(namedtuple('Row', names)._make, rows)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try cache the namedtuple classes that come out of this, they can take about half a millisecond to instantiate, which is a lot of time - if a view invokes 10 queries that's 5ms, or 10% of a tight performance budge, gone:

In [14]: %timeit namedtuple('Row', ['a', 'b', 'c', 'd'])
1000 loops, best of 3: 417 µs per loop

If you make names a tuple (so it's hashable) then you can just have:

@lru_cache()  # using default maxsize, i don't think it would need tweaking really
def create_namedtuple(names):
    return namedtuple('Row', names)

Also the temp name rows isn't needed, I'd move it into the map call, leaving:

return map(
    create_namedtuple(names)._make,
    super().__iter__(),
)

NamedValuesListIterable if named
else FlatValuesListIterable if flat
else ValuesListIterable
)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this really passes the django style guide, compared to plain if/else statements

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks okay to me.

@@ -263,6 +263,9 @@ Models
* The new ``field_name`` parameter of :meth:`.QuerySet.in_bulk` allows fetching
results based on any unique model field.

* The new ``named`` parameter of :meth:`.QuerySet.values_list` allows to get
results as named tuples.
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"allows fetching results as named tuples" (or should that be "namedtuple instances"?)

@@ -651,6 +651,11 @@ rather than one-tuples. An example should make the difference clearer::

It is an error to pass in ``flat`` when there is more than one field.

``named`` parameter can be passed in to get results as named tuples::
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"You can also pass in the named parameter to get results as named tuples (ref:collection.namedtuple):"

It might be a user's first time coming across collections.namedtuple when reading these docs, so we should link out to the Python ones so they can learn what it means.


expr = F('num') + 1
values = qs.values_list(expr, named=True).first()
self.assertEqual(values._fields, ('combinedexpression%s' % id(expr),))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test should be split into multiple test methods, one per thing-being-tested, as above

@adamchainz
Copy link
Sponsor Member

adamchainz commented Aug 16, 2017

As to value140320025193552 we could change the field name generation from using str(id(field)), to something simple like:

field_names = {f for f in fields if not hasattr(f, 'resolve_expression')}
_fields = []
expressions = {}
unnamed_counter = 1
for field in fields:
    if hasattr(field, 'resolve_expression'):
        while True:
            field_id = 'x' + str(unnamed_counter)
            unnamed_counter += 1
            if field_id not in field_names:
                break
        expressions[field_id] = field
        _fields.append(field_id)
    else:
        _fields.append(field)

Then you just get x1, x2 etc. for the unnamed things - 0 based indexing to be bikeshedded ofc 🚲. Users can always address them by index too.

The current patch has an incredibly rare bug, if some model happens to have a field value1234 that is requested, and the id() of an expression resolves to 1234, then it will overwrite the existing field ;)

@sir-sigurd sir-sigurd force-pushed the named-tuples branch 5 times, most recently from baa7b16 to d5da9e8 Compare August 17, 2017 03:27
@sir-sigurd sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 6c07725 to 4d2c6b2 Compare August 30, 2017 02:50
"""

@staticmethod
@lru_cache()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not immediately obvious to me the purpose of caching here. Maybe a comment would be useful. Also might want to consider the discussion in #8825.

Copy link
Member Author

@sir-sigurd sir-sigurd Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment for this.
I don't think that it worth to set maxsize=None, because overhead of @lru_cache() seems to be futile relatively for obtaining QuerySet:

In [7]: %timeit Person.objects.values_list(named=True)
156 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxsize=None is a bad idea like Aymeric says in #8825.

NamedValuesListIterable if named
else FlatValuesListIterable if flat
else ValuesListIterable
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks okay to me.

Number.objects.values_list('num', flat=True, named=True)

def test_named_values_list_bad_field_name(self):
msg = "Type names and field names must be valid identifiers: '1'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider catching this exception and reraising a message that explains the issue in terms that a Django user might more easily understand?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?

try:
    tuple_class = self.create_namedtuple_class(*names)
except Exception as e:
    msg = 'namedtuple() raised exception. Make sure that field names are valid identifiers.'
    raise ValueError(msg) from e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only QuerySet.extra() is affected, we can skip it since extra() usage is discouraged.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur

def test_named_values_list_bad_field_name(self):
msg = "Type names and field names must be valid identifiers: '1'"
with self.assertRaisesMessage(ValueError, msg):
Number.objects.extra(select={'1': 'num+1'}).values_list('1', named=True).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to trigger the error without using QuerySet.extra() considering that it might be deprecated in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that QuerySet.extra() is the only way to make field name which is not valid identifier.

@staticmethod
@lru_cache()
def create_namedtuple_class(*names):
return namedtuple('Row', names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider using queryset.model instead of 'Row'? I don't know if that would cause confusion with model instances.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it doesn't make much sense as returned tuples don't have strong connection to model. User could do something like this:

Model.objects.annotate(f=models.F('related__f')).values_list('f', named=True)

@@ -651,6 +651,12 @@ rather than one-tuples. An example should make the difference clearer::

It is an error to pass in ``flat`` when there is more than one field.

You can also pass in the named parameter to get results as named tuples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass ``named=True`` to get results as a
:func:`~python:collections.namedtuple`::

@sir-sigurd sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 9f5d5dd to cc12fd9 Compare September 5, 2017 07:12
names.extend(query.values_select)
names.extend(query.annotation_select)
tuple_class = self.create_namedtuple_class(*names)
# tuple_class._make() is inlined here for better performance.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using _make():

%timeit for x in Person.objects.values_list(named=True): pass
23.1 ms ± 438 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using inlined _make():

%timeit for x in Person.objects.values_list(named=True): pass
18 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Comparing with plain tuples:

%timeit for x in Person.objects.values_list(): pass
14 ms ± 167 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think tuple_class._make refers to tuple.__new__ but are the names intentionally different?

Copy link
Member Author

@sir-sigurd sir-sigurd Sep 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant this _make():

In [107]: namedtuple('Row', ('field',), verbose=True)
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class Row(tuple):
    'Row(field,)'

    __slots__ = ()

    _fields = ('field',)

    def __new__(_cls, field,):
        'Create new instance of Row(field,)'
        return _tuple.__new__(_cls, (field,))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Row object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 1:
            raise TypeError('Expected 1 arguments, got %d' % len(result))
        return result
...

I added this comment so it would be easier to understand from what I derived this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure what "tuple_class._make() is inlined here" means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it's better to remove it in this case.

for field in fields:
if hasattr(field, 'resolve_expression'):
field_id = str(id(field))
field_id_prefix = getattr(field, 'default_alias', field.__class__.__name__.lower())
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice


>>> Entry.objects.values_list('id', 'headline', named=True)
<QuerySet [Row(id=1, headline='First entry'), ...]>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explain why namedtuple isn't the default (it seems backwards compatible as Anssi noted on the ticket), maybe we should add something like:
Using a named tuple may make use of the results more readable, at the expense
of a small performance penalty for transforming the results into a named tuple.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@staticmethod
@lru_cache()
def create_namedtuple_class(*names):
# namedtuple() is too slow to be called for every QuerySet evaluation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Cache namedtuple() with @lru_cache() since it's too slow to be
# called for every QuerySet evaluation.

def values_list(self, *fields, flat=False):
def values_list(self, *fields, flat=False, named=False):
if flat and named:
raise TypeError("'flat' and 'named' can't be passed together.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For similarity with other messages (e.g. 'The nowait option cannot be used with skip_locked.') you might change 'passed' to 'used'.

@sir-sigurd sir-sigurd force-pushed the named-tuples branch 2 times, most recently from 77498c1 to 306403f Compare September 6, 2017 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants