Made Field.error_messages a cached property. #15415

collinanderson · 2022-02-09T15:13:09Z

This speeds up field creation and reduces memory usage.

Tested speed using this code:

import time
from django.db import models
times = []
for x in range(1000):
    start = time.time()
    for x in range(1000):
        models.CharField(max_length=100)
    elapsed = time.time() - start
    times.append(elapsed)
    times.sort()
    print('%.04f min: %.04f, median: %.04f' % (elapsed, times[0], times[len(times) // 2]))

baseline:
min: 0.0077, median: 0.0082
with patch:
min: 0.0057, median: 0.0063

So about 23% faster creation. Some fields don't use model validation, like annotate(output_field=DecimalField()).

This speeds up field creation and reduces memory usage.

kezabelle · 2022-02-11T12:05:16Z

So generally speaking (AFAIK), introducing the cached_property to defer calculation is a net-negative to first-access (by a few hundred ns or so. Not massively slow), because of the indirection involved to call the underlying func and patch the instance's dict. It's worth deferring if the values generally aren't used, or the object is held for long enough for the values to be used multiple times.

In this case, it's kind of all of them; they can be less eager because they're not required for further work within __init__, they're generally held around permanently across the process (except where they're generated at runtime by output_field), and because they're held in memory for the duration, the calculated values will be re-used endlessly, where they're used at all.

Here, for example, is initing a couple of fields on baseline/main as of 25514b6 on my machine:

In [2]: %timeit Field()
4.29 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit CharField()
5.35 µs ± 37.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

and after the patch (manually applied):

In [2]: %timeit Field()
2.72 µs ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit CharField()
3.4 µs ± 44.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

So creation does get quicker. Good for both fields defined forever on models ahead-of-time, and for those generated by output_field at runtime.

When we access the error_messages for the first time, and the work is done, it notably falls back to roughly the same ballpark as if the patch weren't applied; first baseline/main:

In [8]: %timeit Field().error_messages
4.38 µs ± 63.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit CharField().error_messages
5.41 µs ± 48.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

and then after patch:

In [4]: %timeit Field().error_messages
4.83 µs ± 48.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit CharField().error_messages
5.91 µs ± 67.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The cost is init + error_messages, which seems the best way to account for the fact that you'd otherwise be getting an outlier on the first access and the rest would be invalidated by having been cached in the dict. You can see there's some additional cost to gating it behind the cached property, but that should be amortized over the lifetime of the Field definition (the cost falls to 35ns per access thereafter), and isn't applicable to fields generated by output_field anyway (you'd think I'd have remembered that given #15277 but hey ho)

I can't currently think of a reason this wouldn't just be a net-positive based on the numbers I'm collecting which agree with Colin's data, unless it transpired that error_messages were used in some cases for output_field or what-have-you.

I'd suggest that if this does land, it may be worth revisiting the #15410 PR subsequently, with an eye to the fact that Fields are spawned at runtime so there is potentially a memory/speed saving to consider there independently, which I'd entirely forgotten about (... again despite #15277, sigh. Sorry!). It being separate PRs actually makes it easier to evaluated as a whole though, so that's nice.

If there's other usages/use-cases that we've not considered in this discussion to date, someone remind us so we can try and account for those too ;)

kezabelle

Tentatively agreeing at this stage, pending anyone thinking of issues/use-cases we've not thought of, or data-points indicating it would be a generally negative for existing, relatively common use X/Y/Z.

felixxm · 2022-02-16T19:29:51Z

@collinanderson Thanks 👍

collinanderson · 2022-02-16T20:36:27Z

Thanks!

Move Field.error_messages to @cached_property

d3a1119

This speeds up field creation and reduces memory usage.

felixxm requested a review from kezabelle February 10, 2022 08:24

kezabelle approved these changes Feb 11, 2022

View reviewed changes

felixxm self-assigned this Feb 16, 2022

felixxm changed the title ~~Move Field.error_messages to @cached_property~~ Made Field.error_messages a cached property. Feb 16, 2022

felixxm merged commit 35c2474 into django:main Feb 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Made Field.error_messages a cached property. #15415

Made Field.error_messages a cached property. #15415

collinanderson commented Feb 9, 2022

kezabelle commented Feb 11, 2022

kezabelle left a comment

felixxm commented Feb 16, 2022

collinanderson commented Feb 16, 2022

Made Field.error_messages a cached property. #15415

Made Field.error_messages a cached property. #15415

Conversation

collinanderson commented Feb 9, 2022

kezabelle commented Feb 11, 2022

kezabelle left a comment

Choose a reason for hiding this comment

felixxm commented Feb 16, 2022

collinanderson commented Feb 16, 2022