some suggestions for the Python code #1

adalke · 2018-10-07T20:49:26Z

There's about a 15% speedup by tweaking the data classes.

CPython's __slots__ is a bit of a hack: it makes instance attributes faster to look up and makes each instance more compact. The Python 3.7 dataclass decorator does not set it automatically (see https://www.python.org/dev/peps/pep-0557/#support-for-automatically-setting-slots ), but it can be added manually, which is a recommended workaround. Doing that benefits your Python benchmark, improving it from the original:

# 36077.7910000003 μs
@dataclass
class Vertex:
    x: float
    y: float
    z: float

to

# 32817.94299999996 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float
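
To make the effect of __slots__ concrete, here is a minimal sketch. The class names PlainVertex and SlottedVertex are hypothetical and not part of the benchmark. It only shows that a slotted instance has no per-instance __dict__ and rejects attributes that are not listed; it does not measure speed.

```python
from dataclasses import dataclass


@dataclass
class PlainVertex:  # hypothetical name, for illustration only
    x: float
    y: float
    z: float


@dataclass
class SlottedVertex:  # hypothetical name, for illustration only
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


plain = PlainVertex(1.0, 2.0, 3.0)
slotted = SlottedVertex(1.0, 2.0, 3.0)

# A regular instance stores its attributes in a per-instance dict.
print(hasattr(plain, "__dict__"))    # True
# A slotted instance stores them in fixed slots and has no dict.
print(hasattr(slotted, "__dict__"))  # False

# Without a dict, attributes outside __slots__ cannot be added.
try:
    slotted.w = 4.0
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False
print(extra_attribute_allowed)       # False
```

Note that this manual approach works here because none of the fields has a default value.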

Then, for a reason I don't understand, the generated default __init__ adds a small but measurable overhead compared to a manual one.

# 30137.319999997915 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

However, a manual __init__ seems wrong given the goal of the dataclass decorator.
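
If you want to check the __init__ difference on your own machine, something like the following sketch might do. The names GeneratedInit, ManualInit and best_time_us are made up for this example, and the iteration counts are arbitrary assumptions. It times only instance construction, not STL parsing, so the numbers are not comparable to the benchmark timings above.

```python
import timeit
from dataclasses import dataclass


@dataclass
class GeneratedInit:  # hypothetical name: __init__ written by @dataclass
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


@dataclass
class ManualInit:  # hypothetical name: hand-written __init__ is kept as-is
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def best_time_us(cls, number=100_000, repeat=5):
    """Best-of-`repeat` time, in microseconds, to build `number` instances."""
    timer = timeit.Timer(lambda: cls(1.0, 2.0, 3.0))
    return min(timer.repeat(number=number, repeat=repeat)) * 1e6


generated_us = best_time_us(GeneratedInit)
manual_us = best_time_us(ManualInit)
print(f"generated __init__: {generated_us:.1f} us")
print(f"manual __init__:    {manual_us:.1f} us")
```

Taking the minimum over several repeats mirrors how the benchmark script reports its time. Which variant comes out faster, and by how much, may depend on the Python version.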

If I add a __slots__ = ("normal", "v1", "v2", "v3") to the Triangle class, the timing drops further, to 28504 μs.

A couple of further micro-optimizations improved things by a couple of percent, which is not enough to warrant including them in this benchmark.

ctypes alternative

One way to get better performance is to use the ctypes module from the standard library. The following takes about 124 μs:

import struct
import timeit
import ctypes

class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]
        
class Triangle(ctypes.Structure):
    _pack_ = 2
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]

def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]

        buffer_size = 50 * trianglecount
        s = stl.read(buffer_size)
        assert len(s) == (buffer_size), (len(s), buffer_size)
        return (Triangle*trianglecount).from_buffer_copy(s)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle.normal.x + triangle.v1.y + triangle.v2.z + triangle.v3.x
    ##                       for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6

print(str(time) + " μs")

It's a bit of a cheat, as there isn't any object instantiation. If I uncomment the test code, the benchmark time goes to 6881 μs. If I compromise and create Triangle instances up front but Vertex instances on demand, using return list((Triangle*trianglecount).from_buffer_copy(s)), then the parse time goes to 1560 μs and the benchmark+test code time increases only slightly, to 7000 μs.
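
To try the ctypes layout without an STL file, here is a small sketch that builds two made-up 50-byte records in memory. The sample values and the struct format string are assumptions for this example; the structure definitions are the same as above.

```python
import ctypes
import struct


class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]


class Triangle(ctypes.Structure):
    _pack_ = 2  # keeps the trailing 2-byte field from padding the record to 52 bytes
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]


# One record is 12 floats (4 bytes each) plus one 2-byte field.
print(ctypes.sizeof(Triangle))  # 50

# Two made-up records. "=" means native byte order with no padding,
# which matches what ctypes.Structure expects.
record = struct.Struct("=12fH")
first = [float(i) for i in range(12)]       # 0.0 .. 11.0
second = [float(i) for i in range(12, 24)]  # 12.0 .. 23.0
raw = record.pack(*first, 0) + record.pack(*second, 0)

# No per-triangle Python objects are created here, only one array.
triangles = (Triangle * 2).from_buffer_copy(raw)

# Triangle and Vertex wrappers are created on demand by indexing and
# attribute access. In the second record, v1 holds (15.0, 16.0, 17.0).
print(triangles[1].v1.y)  # 16.0

# The compromise variant: Triangle instances up front, Vertex on demand.
triangle_list = list(triangles)
print(len(triangle_list))  # 2
```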

NumPy alternative

If you're willing to give up the attribute access API, another option is to use NumPy, which brings the timing down to 80 μs. With structured types, what would be triangles[10].v1.y is written as triangles[10]["v1"]["y"]. However, I don't think this is acceptable for what you are looking for.

import numpy as np
import struct
import timeit

point = [("x", np.float32), ("y", np.float32), ("z", np.float32)]
triangle_fields = np.dtype([
    ("normal", point),
    ("v1", point),
    ("v2", point),
    ("v3", point),
    ("ignore", "2S")
    ])
def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]

        s = stl.read(50 * trianglecount)
        assert len(s) == (50 * trianglecount), (len(s), 50 * trianglecount)
        return np.frombuffer(s, triangle_fields, count=trianglecount)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle["normal"]["x"] + triangle["v1"]["y"] + triangle["v2"]["z"] + triangle["v3"]["x"]
    ##                       for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6

print(str(time) + " μs")
