Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made Field.error_messages a cached property. #15415

Merged
merged 1 commit into from
Feb 16, 2022

Conversation

collinanderson
Copy link
Contributor

This speeds up field creation and reduces memory usage.

Refs #15410

Tested speed using this code:

import time
from django.db import models
times = []
for x in range(1000):
    start = time.time()
    for x in range(1000):
        models.CharField(max_length=100)
    elapsed = time.time() - start
    times.append(elapsed)
    times.sort()
    print('%.04f min: %.04f, median: %.04f' % (elapsed, times[0], times[len(times) // 2]))

baseline:
min: 0.0077, median: 0.0082
with patch:
min: 0.0057, median: 0.0063

So about 23% faster creation. Some fields don't use model validation, like annotate(output_field=DecimalField()).

This speeds up field creation and reduces memory usage.
@kezabelle
Copy link
Contributor

So generally speaking (AFAIK), introducing the cached_property to defer calculation is a net-negative to first-access (by a few hundred ns or so. Not massively slow), because of the indirection involved to call the underlying func and patch the instance's dict. It's worth deferring if the values generally aren't used, or the object is held for long enough for the values to be used multiple times.

In this case, it's kind of all of them; they can be less eager because they're not required for further work within __init__, they're generally held around permanently across the process (except where they're generated at runtime by output_field), and because they're held in memory for the duration, the calculated values will be re-used endlessly, where they're used at all.

Here, for example, is initing a couple of fields on baseline/main as of 25514b6 on my machine:

In [2]: %timeit Field()
4.29 µs ± 38.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit CharField()
5.35 µs ± 37.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

and after the patch (manually applied):

In [2]: %timeit Field()
2.72 µs ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit CharField()
3.4 µs ± 44.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

So creation does get quicker. Good for both fields defined forever on models ahead-of-time, and for those generated by output_field at runtime.

When we access the error_messages for the first time, and the work is done, it notably falls back to roughly the same ballpark as if the patch weren't applied; first baseline/main:

In [8]: %timeit Field().error_messages
4.38 µs ± 63.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit CharField().error_messages
5.41 µs ± 48.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

and then after patch:

In [4]: %timeit Field().error_messages
4.83 µs ± 48.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [5]: %timeit CharField().error_messages
5.91 µs ± 67.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The cost is init + error_messages, which seems the best way to account for the fact that you'd otherwise be getting an outlier on the first access and the rest would be invalidated by having been cached in the dict. You can see there's some additional cost to gating it behind the cached property, but that should be amortized over the lifetime of the Field definition (the cost falls to 35ns per access thereafter), and isn't applicable to fields generated by output_field anyway (you'd think I'd have remembered that given #15277 but hey ho)

I can't currently think of a reason this wouldn't just be a net-positive based on the numbers I'm collecting which agree with Colin's data, unless it transpired that error_messages were used in some cases for output_field or what-have-you.

I'd suggest that if this does land, it may be worth revisiting the #15410 PR subsequently, with an eye to the fact that Fields are spawned at runtime so there is potentially a memory/speed saving to consider there independently, which I'd entirely forgotten about (... again despite #15277, sigh. Sorry!). It being separate PRs actually makes it easier to evaluated as a whole though, so that's nice.

If there's other usages/use-cases that we've not considered in this discussion to date, someone remind us so we can try and account for those too ;)

Copy link
Contributor

@kezabelle kezabelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentatively agreeing at this stage, pending anyone thinking of issues/use-cases we've not thought of, or data-points indicating it would be a generally negative for existing, relatively common use X/Y/Z.

@felixxm felixxm self-assigned this Feb 16, 2022
@felixxm felixxm changed the title Move Field.error_messages to @cached_property Made Field.error_messages a cached property. Feb 16, 2022
@felixxm
Copy link
Member

felixxm commented Feb 16, 2022

@collinanderson Thanks 👍

@felixxm felixxm merged commit 35c2474 into django:main Feb 16, 2022
@collinanderson
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants