There's about a 15% speedup by tweaking the data classes.
CPython's __slots__ is a bit of a hack which makes instance attributes faster to look up and makes the instance more compact. The Python 3.7 dataclass decorator does not set it automatically (see https://www.python.org/dev/peps/pep-0557/#support-for-automatically-setting-slots), but it can be added manually, which is a recommended workaround. Doing that improves your Python benchmark over the original.
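As a rough illustration of the manual workaround (a minimal sketch, not the benchmark code itself), the following declares __slots__ by hand on a dataclass whose fields have no default values, and checks that instances no longer carry a per-instance __dict__. The variable names `has_dict` and `rejected` are only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    # Declared by hand; the 3.7 decorator does not generate this.
    # This only works here because no field has a default value.
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


v = Vertex(1.0, 2.0, 3.0)

# Slotted instances have no per-instance __dict__, which is where
# the memory saving comes from.
has_dict = hasattr(v, "__dict__")

# Assigning an attribute that is not listed in __slots__ fails.
try:
    v.w = 4.0
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, rejected)
```

The generated __repr__ and __eq__ still work as usual, so the only visible change is that ad-hoc attributes can no longer be attached to an instance.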
Then, for a reason I don't understand, the default __init__ adds measurable though small overhead compared to a manual one.

    # 30137.319999997915 μs
    @dataclass
    class Vertex:
        __slots__ = ("x", "y", "z")
        x: float
        y: float
        z: float

        def __init__(self, x, y, z):
            self.x = x
            self.y = y
            self.z = z

However, a manual __init__ seems wrong given the goal of the dataclass decorator.
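If you want to check the __init__ difference on your own machine, one way is to time construction of both variants with timeit. This is only a sketch: the class names `GeneratedInit` and `ManualInit` and the helper `time_construction` are made up for the comparison, and the numbers will depend on the Python version and hardware.

```python
import timeit
from dataclasses import dataclass


@dataclass
class GeneratedInit:
    # __init__ is written by the dataclass decorator.
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


@dataclass
class ManualInit:
    # The decorator keeps an __init__ that the class already defines.
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def time_construction(cls, number=200_000):
    """Return the best-of-5 total seconds for `number` constructions."""
    timer = timeit.Timer(lambda: cls(1.0, 2.0, 3.0))
    return min(timer.repeat(repeat=5, number=number))


generated = time_construction(GeneratedInit)
manual = time_construction(ManualInit)
print(f"generated __init__: {generated:.4f} s")
print(f"manual __init__:    {manual:.4f} s")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the difference being measured is only a few percent.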
If I add a __slots__ = ("normal", "v1", "v2", "v3") to the Triangle class, the timing drops further, to 28504 μs.
A few micro-optimizations improved things by another couple of percent, but not enough to warrant considering them in this benchmark.
ctypes alternative
One way to get better performance is to use the ctypes module from the standard library. The following takes about 124 μs:
It's a bit of a cheat, as no objects are instantiated. If I uncomment the test code, the benchmark time goes to 6881 μs. If I compromise and instead create Triangle instances up front but Vertex instances on demand, using return list((Triangle*trianglecount).from_buffer_copy(s)), then the parse time goes to 1560 μs and the benchmark+test time increases only slightly, to 7000 μs.
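The ctypes listing itself is not reproduced here, so the following is only a sketch of the general idea, not the code that produced the timings above. It assumes each triangle is stored as 12 consecutive native-order 32-bit floats (normal, v1, v2, v3); the real file layout may differ, for example with a header or padding. The function name `parse` and the test data are hypothetical.

```python
import ctypes
import struct


class Vertex(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_float),
        ("y", ctypes.c_float),
        ("z", ctypes.c_float),
    ]


class Triangle(ctypes.Structure):
    # Assumed layout: four vertices' worth of floats, no padding.
    _fields_ = [
        ("normal", Vertex),
        ("v1", Vertex),
        ("v2", Vertex),
        ("v3", Vertex),
    ]


def parse(s, trianglecount):
    # Copy the bytes into a ctypes array, then materialize one
    # Triangle object per element. Vertex objects are only created
    # when a field such as .v1 is accessed.
    return list((Triangle * trianglecount).from_buffer_copy(s))


# Two triangles' worth of made-up data: the floats 0.0 .. 23.0.
raw = struct.pack("24f", *range(24))
triangles = parse(raw, 2)
print(triangles[1].v1.y)
```

Dropping the list(...) call and returning the ctypes array directly corresponds to the variant with no object instantiation at parse time.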
NumPy alternative
If you're willing to give up the attribute access API, another option is to use NumPy, which brings the timing down to 80 μs. With structured types, what would be triangles[10].v1.y becomes triangles[10]["v1"]["y"]. However, I don't think this is acceptable for what you are looking for.
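For reference, a structured dtype along these lines might look like the sketch below. As with the ctypes sketch, the layout (12 little-endian 32-bit floats per triangle, no header) is an assumption, and the names `vertex`, `triangle` and the sample data are made up.

```python
import numpy as np

# Nested structured dtypes: a vertex is three float32 values,
# a triangle is four vertices.
vertex = np.dtype([("x", "<f4"), ("y", "<f4"), ("z", "<f4")])
triangle = np.dtype(
    [("normal", vertex), ("v1", vertex), ("v2", vertex), ("v3", vertex)]
)

# Two triangles' worth of made-up data: the floats 0.0 .. 23.0.
raw = np.arange(24, dtype="<f4").tobytes()

# frombuffer creates a view over the bytes without building
# any per-triangle Python objects.
triangles = np.frombuffer(raw, dtype=triangle)

# Field access uses string keys instead of attributes.
y = triangles[1]["v1"]["y"]

# Whole columns can be pulled out in one step.
all_v1_y = triangles["v1"]["y"]
print(y, all_v1_y)
```

The column-style access in the last step is where NumPy tends to pay off, since any follow-up arithmetic can run over all triangles at once.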