
Significant speedups with minimal changes #62

Merged
3 commits merged on Aug 19, 2016

Conversation

@karimbahgat (Collaborator) commented on Aug 16, 2016

Up to 8-20x reading speedups from a series of changes, all based on one principle: unpack bytes in batch instead of one piece at a time, while maintaining the general convention and style of the original pyshp code. See the commits for more detail.

  1. By changing just one line of code, I fixed a weakness in shape reading, leading to a significant 5-8x speedup. For instance, iterating through the shapes of level 1 of the global administrative units database drops from 80 secs to 10 secs. The speedups are greatest for complex shapes, and least or nonexistent for single-point geometries.

  2. Also implemented a minimal version of the shape-index speedup suggested in "Performance improvement for Reader.__shapeIndex" (#52).

  3. Finally, a general 1.3x speedup for reading records by "precompiling" the byte format, and more importantly a 15-20x speedup when reading all records at once.

When unpacking the list of coordinate points, each point pair was
unpacked one at a time, making it very slow for long lists of points. A
fairly simple change to read all points at once, which incidentally is in
line with the existing convention elsewhere, leads to a 5-9x speedup when
reading shapes, with bigger gains for files with more complex shapes
(more points). A sketch of the idea follows below.
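For illustration, a minimal before/after sketch, assuming a raw buffer of little-endian x/y doubles (the function names and fixed 16-byte point layout here are simplified stand-ins, not pyshp's actual internals):

```python
import struct

def read_points_slow(buf, n_points):
    # One unpack call per point: each call re-parses its format string
    # and pays Python-level call overhead for every coordinate pair.
    points = []
    for i in range(n_points):
        points.append(struct.unpack("<2d", buf[i * 16:(i + 1) * 16]))
    return points

def read_points_fast(buf, n_points):
    # One unpack call for the whole coordinate run, then pair x with y.
    flat = struct.unpack("<%sd" % (2 * n_points), buf[:16 * n_points])
    return list(zip(flat[::2], flat[1::2]))
```

The one-call variant is where complex shapes (many points) gain the most; a single-point shape barely changes, which matches the numbers above.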
Implemented a basic version of #52, but without numpy and memoryview, to
keep it simple. Not sure whether there is any practical/noticeable
speedup, since reading the offsets is only a one-time cost with very
little data involved; this is more out of principle. See the sketch below.
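A sketch of the idea, assuming a file-like handle on the .shx index; the helper name is hypothetical, and the byte offsets follow the shapefile spec, where lengths and offsets are stored big-endian in 16-bit words:

```python
import struct

def read_shx_offsets(shx):
    # The .shx header stores the file length (in 16-bit words) at byte 24.
    shx.seek(24)
    shx_length = struct.unpack(">i", shx.read(4))[0] * 2  # convert to bytes
    num_records = (shx_length - 100) // 8  # 100-byte header, 8-byte records
    # Read the whole index body and unpack every (offset, content length)
    # pair in a single call, instead of one unpack per index record.
    shx.seek(100)
    values = struct.unpack(">%si" % (2 * num_records),
                           shx.read(8 * num_records))
    # Keep only the offsets, converted from 16-bit words to byte offsets.
    return [v * 2 for v in values[::2]]
```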
Since each row/record has the same format every time, precompile the
format using struct.Struct instead of doing a regular unpack each time,
so the unpacking runs faster. Leads to roughly a 1.33x speedup, as in
the sketch below.
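A sketch of the precompilation, with a made-up row layout standing in for the real dBase field format (which pyshp derives from the .dbf header):

```python
import struct

fmt = "<i10sd"                   # illustrative layout: int, 10-byte string, double
row_struct = struct.Struct(fmt)  # format string compiled once up front

def parse_row_slow(buf):
    return struct.unpack(fmt, buf)  # re-parses the format string per call

def parse_row_fast(buf):
    return row_struct.unpack(buf)   # reuses the compiled format
```

The struct module keeps a small internal cache of parsed formats, but a compiled Struct skips even that lookup, consistent with the roughly 1.33x reported here.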

In addition, when users aren't worried about memory (i.e. the records()
method), we can exploit this by reading all records into memory at once.
Leads to a 15-20x speedup, sketched below.
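A sketch of the all-at-once read, again with hypothetical names, assuming fixed-size records and a precompiled struct as above:

```python
def read_all_records(dbf, num_records, record_size, row_struct):
    # One big read() replaces num_records small reads; the per-record
    # work is then just slicing and unpacking an in-memory buffer.
    data = dbf.read(num_records * record_size)
    return [row_struct.unpack(data[i * record_size:(i + 1) * record_size])
            for i in range(num_records)]
```

This trades memory for speed, which is why it suits records() (everything is returned anyway) but not the iterator-style access paths.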