Instantiation performance vs. dataclasses #12

hohav · 2020-12-31T17:42:11Z

Thanks for creating dataclassy! In terms of features it does everything I want, but I'm also very sensitive to performance since I'm working on a project that constructs hundreds of thousands of small objects in a tight loop.

I've observed roughly a 33% slowdown for small object creation between dataclasses and dataclassy in CPython 3.9.0 (benchmark below). Do you have any thoughts on this? How much of a priority is performance for dataclassy? Would you welcome a PR that closed the performance gap at the cost of significantly longer or more complex code?

from timeit import timeit
import dataclasses
import dataclassy

@dataclasses.dataclass
class Foo1:
    __slots__ = 'x', 'y'
    x: int
    y: int

@dataclassy.dataclass(slots=True)
class Foo2:
    x: int
    y: int

def f1():
    Foo1(1, 2)

def f2():
    Foo2(1, 2)

print(timeit(f1, number=1000000))
print(timeit(f2, number=1000000))

biqqles · 2020-12-31T18:23:57Z

Hi there. I have been thinking about performance recently - for example see #10 where I am testing a change with the aim of improving performance in a different scenario. Earlier this month I created some rudimentary benchmarks and got the same results as you. Performance is something I am happy to focus on now I'm confident dataclassy has achieved its primary goal of being more expressive and enjoyable to use than dataclasses.

Now, as to what I think specifically:

dataclasses is fast because it minimises the amount of time CPython has to spend executing Python on class initialisation. It does this by only touching __init__. dataclassy currently uses __new__ or __call__ to do initialisation so that it stays out of the user's way, allowing them to beautifully define an __init__ that takes the role of __post_init__. The initialisation logic between the two (setting instance variables to parameters) is identical; what slows this down is more or less having to call object.__new__ and return the object from Python.
dataclassy absolutely can gain the same performance in the simple case. Changing it to generate __init__ instead of __new__ required a grand total of 4 lines of code to be changed and in my benchmarks results in identical (if not marginally better) performance to dataclasses. Of course this totally breaks the user-defined init feature, but this is an optimisation that can be made conditionally.
This "pay for what you need" paradigm is appealing and wouldn't be difficult or especially longwinded to implement. I have a few ideas for other optimisations too, but they're much less dramatic than this.
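To illustrate the first point above, here is a minimal sketch (not dataclassy's actual generated code) comparing a class initialised through `__init__` with one initialised through `__new__`. The class names `ViaInit` and `ViaNew` are hypothetical. Both set the same slots, but the `__new__` variant has to call `object.__new__` and return the instance from Python, which is the extra work described above.

```python
from timeit import timeit


class ViaInit:
    """Fields are assigned in __init__, the only method dataclasses touches."""
    __slots__ = 'x', 'y'

    def __init__(self, x, y):
        self.x = x
        self.y = y


class ViaNew:
    """Fields are assigned in __new__, leaving __init__ free for the user."""
    __slots__ = 'x', 'y'

    def __new__(cls, x, y):
        self = object.__new__(cls)  # explicit allocation, done from Python
        self.x = x
        self.y = y
        return self  # the instance must also be returned from Python


if __name__ == '__main__':
    # Timings vary by machine and interpreter version; compare them relatively.
    print('__init__:', timeit(lambda: ViaInit(1, 2), number=100_000))
    print('__new__: ', timeit(lambda: ViaNew(1, 2), number=100_000))
```

Both classes produce equivalent instances, so any timing difference comes from the construction path alone.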

biqqles · 2020-12-31T20:23:53Z

OK, I realised that, with the simplifications made to how initialisation works over the months, I don't actually need to mess around with __new__ and __call__ and can do everything with __init__, meaning the performance upgrade is attainable across all use-cases. One test, for decorator reuse on a blank class, is still failing and I need to figure out why.
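As a rough sketch of how a generated `__init__` could coexist with a user-defined one, the example below builds `__init__` from the class annotations and then calls the user's original `__init__` as a post-initialisation hook. This is an assumption about the general approach, not dataclassy's implementation: the decorator name `add_init` is hypothetical, and defaults, slots, inheritance and extra arguments to the user's initialiser are all ignored.

```python
def add_init(cls):
    """Hypothetical decorator: generate __init__ from the class annotations."""
    fields = list(getattr(cls, '__annotations__', {}))
    user_init = cls.__dict__.get('__init__')  # user-defined initialiser, if any

    params = ', '.join(['self', *fields])
    body = [f'    self.{name} = {name}' for name in fields]
    if user_init is not None:
        # Run the user's initialiser once the fields have been assigned.
        body.append('    user_init(self)')
    source = '\n'.join([f'def __init__({params}):', *(body or ['    pass'])])

    namespace = {'user_init': user_init}
    exec(source, namespace)  # compile the generated function
    cls.__init__ = namespace['__init__']
    return cls


@add_init
class Point:
    x: int
    y: int

    def __init__(self):
        # Runs after x and y are set, playing the role of __post_init__.
        self.norm1 = abs(self.x) + abs(self.y)
```

With this sketch, `Point(3, -4)` assigns `x` and `y` in the generated `__init__` and then runs the user's code, so object creation never goes through a Python-level `__new__` or `__call__`.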

biqqles · 2020-12-31T20:32:08Z

OK, the prototype is in the performance branch. All tests pass (though it needs neatening) but I'm now off to enjoy the new year. Happy New Year to you too.

hohav · 2021-01-01T18:14:06Z

Wow, that was quick! I tried the performance branch, and (for my use at least) it closely matches dataclasses' performance. Let me know if you want any help with further testing or implementation. Thanks, and happy new year to you too!

biqqles · 2021-01-11T12:47:47Z

@TylerYep, would you be able to check that the code in the performance branch works in your applications? This is quite a big internal change and it would be good if a few people could check it doesn't break anything. It should also fix the instantiation performance problem you noted in #6.

TylerYep · 2021-01-14T06:52:48Z

Hey, I tested it out (I checked out the version of the code on the open PR) and it works great!
Prior to this, dataclassy was taking around 30+ seconds to complete either of the two benchmark tests.

My quick benchmark (my application is essentially a SAT solver that creates roughly 2^n instances of the dataclass).
I am using cProfile to measure the runtimes.

                dataclasses         dataclassy
Small N:          2.2 sec            2.3 sec
Larger N:         4.7 sec            4.9 sec

biqqles · 2021-01-14T09:02:33Z

Awesome, thanks!

biqqles added the question label Dec 31, 2020

This was referenced Jan 11, 2021

RFC: using __call__ instead of __new__ #10

Closed

make subclasses of DataClassMeta compatible with __init__ #13

Closed

Performance improvements #14

Merged

biqqles linked a pull request Jan 17, 2021 that will close this issue

Performance improvements #14

Merged

biqqles closed this as completed in #14 Jan 23, 2021
