some suggestions for the Python code #1

Open
adalke opened this issue Oct 7, 2018 · 0 comments

There's about a 15% speedup by tweaking the data classes.

CPython's __slots__ is a bit of a hack which makes instance attributes faster to look up and makes instances more compact. The Python 3.7 dataclass decorator does not set it automatically (see https://www.python.org/dev/peps/pep-0557/#support-for-automatically-setting-slots ). It can be added manually, which is a recommended workaround. Doing that helps your Python benchmark, improving it from the original:

# 36077.7910000003 μs
@dataclass
class Vertex:
    x: float
    y: float
    z: float

to

# 32817.94299999996 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float
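
As a quick check of what __slots__ changes, here is a minimal sketch comparing the two variants side by side. The class names PlainVertex and SlottedVertex are made up for this illustration and are not in the benchmark code. It only shows the structural difference (no per-instance __dict__, no ad-hoc attributes), not the timing.

```python
from dataclasses import dataclass


@dataclass
class PlainVertex:  # hypothetical name, mirrors the original Vertex
    x: float
    y: float
    z: float


@dataclass
class SlottedVertex:  # hypothetical name, mirrors the __slots__ variant
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


plain = PlainVertex(1.0, 2.0, 3.0)
slotted = SlottedVertex(1.0, 2.0, 3.0)

# A regular instance stores its attributes in a per-instance dict;
# a slotted instance has fixed storage and no __dict__ at all.
plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")

# The flip side of the compact layout: attributes outside __slots__
# cannot be added to an instance.
try:
    slotted.w = 4.0
    extra_attribute_rejected = False
except AttributeError:
    extra_attribute_rejected = True
```

Note that this manual approach only works as long as the fields have no default values, since a class-level default would collide with the slot of the same name.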

Then, for a reason I don't understand, the default __init__ adds measurable though small overhead compared to a manual one.

# 30137.319999997915 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

However, a manual __init__ seems wrong given the goal of the dataclass decorator.
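
To look at the __init__ difference in isolation, away from the file parsing, something like the following sketch could be used. The class names and the best_time_us helper are hypothetical, and the repeat counts are arbitrary; it reuses the min-of-repeats approach from the benchmark. I make no claim about which variant wins on a given Python version.

```python
import timeit
from dataclasses import dataclass


@dataclass
class GeneratedInitVertex:  # hypothetical name: dataclass-generated __init__
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


@dataclass
class ManualInitVertex:  # hypothetical name: hand-written __init__
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float

    # dataclass keeps an __init__ that the class already defines
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def best_time_us(cls, number=10_000, repeat=5):
    """Best-of-`repeat` time for one construction, in microseconds."""
    timer = timeit.Timer(lambda: cls(1.0, 2.0, 3.0))
    return min(timer.repeat(number=number, repeat=repeat)) / number * 1e6


generated_us = best_time_us(GeneratedInitVertex)
manual_us = best_time_us(ManualInitVertex)
print(f"generated __init__: {generated_us:.3f} us per call")
print(f"manual __init__:    {manual_us:.3f} us per call")
```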

If I add a __slots__ = ("normal", "v1", "v2", "v3") to the Triangle class, the timing drops further, to 28504 μs.

There are a couple of microoptimizations which improved things by a couple of percent, but not enough to warrant them being considered in this benchmark.

ctypes alternative

One way to get better performance is to use the ctypes module from the standard library. The following takes about 124 μs:

import struct
import timeit
import ctypes

class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]
        
class Triangle(ctypes.Structure):
    _pack_ = 2
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]

def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]

        buffer_size = 50 * trianglecount
        s = stl.read(buffer_size)
        assert len(s) == (buffer_size), (len(s), buffer_size)
        return (Triangle*trianglecount).from_buffer_copy(s)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle.normal.x + triangle.v1.y + triangle.v2.z + triangle.v3.x
    ##                       for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6

print(str(time) + " μs")

It's a bit of a cheat as there isn't any object instantiation. If I uncomment the test code, the benchmark time goes to 6881 μs. If I compromise and instead create Triangle instances up front but Vertex instances on demand, using return list((Triangle*trianglecount).from_buffer_copy(s)), then the parse time goes to 1560 μs and the benchmark+test code only slightly increases to 7000 μs.
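
To show why the _pack_ settings matter and what "on demand" means here, this is a small sketch that parses two synthetic records instead of nist.stl. The record contents are made up for the illustration, and the bytes are packed in native byte order to match what ctypes.Structure expects.

```python
import ctypes
import struct


class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]


class Triangle(ctypes.Structure):
    # With the usual 4-byte alignment the struct would be padded past
    # 50 bytes; packing to 2 keeps it at the on-disk record size.
    _pack_ = 2
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]


# Two made-up 50-byte records: 12 floats plus a 2-byte trailer each.
# "=" means native byte order with no alignment padding.
record = struct.Struct("=12fH")
first = [float(i) for i in range(12)]    # 0.0, 1.0, ..., 11.0
second = [float(i) + 0.5 for i in range(12)]  # 0.5, 1.5, ..., 11.5
raw = record.pack(*first, 0) + record.pack(*second, 0)

triangle_size = ctypes.sizeof(Triangle)

# One copy of the bytes; no Triangle or Vertex objects exist yet.
triangles = (Triangle * 2).from_buffer_copy(raw)

# list() creates the Triangle wrappers up front; the Vertex wrappers
# are still only created when an attribute such as .v2 is accessed.
triangle_list = list(triangles)
v1_y_of_first = triangles[0].v1.y
v2_z_of_second = triangle_list[1].v2.z
```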

NumPy alternative

If you're willing to give up the attribute access API, another option is to use NumPy, which brings the timing down to 80 μs. With structured types I can reference what would be triangles[10].v1.y as triangles[10]["v1"]["y"]. However, I don't think this is acceptable for what you are looking for.

import numpy as np
import struct
import timeit

point = [("x", np.float32), ("y", np.float32), ("z", np.float32)]
triangle_fields = np.dtype([
    ("normal", point),
    ("v1", point),
    ("v2", point),
    ("v3", point),
    ("ignore", "2S")
    ])
def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]

        s = stl.read(50 * trianglecount)
        assert len(s) == (50 * trianglecount), (len(s), 50 * trianglecount)
        return np.frombuffer(s, triangle_fields, count=trianglecount)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle["normal"]["x"] + triangle["v1"]["y"] + triangle["v2"]["z"] + triangle["v3"]["x"]
    ##                       for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6

print(str(time) + " μs")
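
For reference, a sketch of how the structured array is used once parsed, again on two made-up records rather than nist.stl. Two things differ from the code above and are my own substitutions: the dtype spells out little-endian fields and uses a uint16 for the 2-byte trailer, and the last lines use an np.recarray view, which was not part of the timings above, to get attribute-style access at the array level.

```python
import struct

import numpy as np

# Explicit little-endian layout (assumption: spelled out here so the
# sketch does not depend on the machine's byte order).
point = [("x", "<f4"), ("y", "<f4"), ("z", "<f4")]
triangle_dtype = np.dtype([
    ("normal", point),
    ("v1", point),
    ("v2", point),
    ("v3", point),
    ("attr", "<u2"),  # the 2-byte trailer of each record
])

# Two made-up 50-byte records: 12 floats plus a 2-byte trailer each.
record = struct.Struct("<12fH")
first = [float(i) for i in range(12)]    # 0.0, 1.0, ..., 11.0
second = [float(i) + 0.5 for i in range(12)]  # 0.5, 1.5, ..., 11.5
raw = record.pack(*first, 0) + record.pack(*second, 0)

triangles = np.frombuffer(raw, dtype=triangle_dtype, count=2)

# Indexing by field name works on the whole array, giving one
# float32 column without a Python-level loop.
v1_y = triangles["v1"]["y"]

# Viewing the same memory as a recarray allows attribute-style access.
records = triangles.view(np.recarray)
v3_x = records.v3.x
```

Working on whole columns like this is where NumPy pays off; iterating triangle by triangle in Python gives up most of the advantage.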