
Significant speedups with minimal changes #62

Merged
3 commits merged on Aug 19, 2016

Conversation

@karimbahgat (Collaborator) commented on Aug 16, 2016

Up to 8-20x reading speedups from a series of changes, all based on one principle: unpack bytes in batch instead of one piece at a time, while maintaining the general convention and style of the original pyshp code. See the commits for more detail.

  1. By changing just one line of code, I fixed a weakness in shape reading, leading to a significant 5-8x speedup. For instance, iterating through the shapes of level 1 of the global administrative units database drops from 80 secs to 10 secs. The speedups are greatest for complex shapes, and least or nonexistent for single-point geometries.

  2. Also implemented a minimal version of the shape-index speedup suggested in "Performance improvement for Reader.__shapeIndex" (#52).

  3. Finally, a general 1.3x speedup for reading records by "precompiling" the byte format, and more importantly a 15-20x speedup when reading all records at once.

When unpacking the list of coordinate points, each point pair was
unpacked one at a time, making it very slow for long lists of points. A
fairly simple change to read all points at once, which incidentally is in
line with the existing convention elsewhere, leads to a 5-9x speedup when
reading shapes, with bigger gains for files with more complex shapes
(more points). A sketch of the idea follows below.
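For illustration, a minimal before/after sketch, assuming a raw buffer of little-endian x/y doubles (the function names and fixed 16-byte point layout here are simplified stand-ins, not pyshp's actual internals):

```python
import struct

def read_points_slow(buf, n_points):
    # One unpack call per point: each call re-parses its format string
    # and pays Python-level call overhead for every coordinate pair.
    points = []
    for i in range(n_points):
        points.append(struct.unpack("<2d", buf[i * 16:(i + 1) * 16]))
    return points

def read_points_fast(buf, n_points):
    # One unpack call for the whole coordinate run, then pair x with y.
    flat = struct.unpack("<%sd" % (2 * n_points), buf[:16 * n_points])
    return list(zip(flat[::2], flat[1::2]))
```

The one-call variant is where complex shapes (many points) gain the most; a single-point shape barely changes, which matches the numbers above.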
Implemented a basic version of #52, but without numpy and memoryview, to
keep it simple. Not sure whether there is any practical/noticeable
speedup, since reading the offsets is only a one-time cost with very
little data involved; this is more out of principle. See the sketch below.
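A sketch of the idea, assuming a file-like handle on the .shx index; the helper name is hypothetical, and the byte offsets follow the shapefile spec, where lengths and offsets are stored big-endian in 16-bit words:

```python
import struct

def read_shx_offsets(shx):
    # The .shx header stores the file length (in 16-bit words) at byte 24.
    shx.seek(24)
    shx_length = struct.unpack(">i", shx.read(4))[0] * 2  # convert to bytes
    num_records = (shx_length - 100) // 8  # 100-byte header, 8-byte records
    # Read the whole index body and unpack every (offset, content length)
    # pair in a single call, instead of one unpack per index record.
    shx.seek(100)
    values = struct.unpack(">%si" % (2 * num_records),
                           shx.read(8 * num_records))
    # Keep only the offsets, converted from 16-bit words to byte offsets.
    return [v * 2 for v in values[::2]]
```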
Since each row/record has the same format every time, precompile the
format using struct.Struct instead of doing a regular unpack each time,
so the unpacking runs faster. Leads to roughly a 1.33x speedup, as in
the sketch below.
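A sketch of the precompilation, with a made-up row layout standing in for the real dBase field format (which pyshp derives from the .dbf header):

```python
import struct

fmt = "<i10sd"                   # illustrative layout: int, 10-byte string, double
row_struct = struct.Struct(fmt)  # format string compiled once up front

def parse_row_slow(buf):
    return struct.unpack(fmt, buf)  # re-parses the format string per call

def parse_row_fast(buf):
    return row_struct.unpack(buf)   # reuses the compiled format
```

The struct module keeps a small internal cache of parsed formats, but a compiled Struct skips even that lookup, consistent with the roughly 1.33x reported here.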

In addition, when users aren't worried about memory (i.e. the records()
method), we can exploit this by reading all records into memory at once.
Leads to a 15-20x speedup, sketched below.
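A sketch of the all-at-once read, again with hypothetical names, assuming fixed-size records and a precompiled struct as above:

```python
def read_all_records(dbf, num_records, record_size, row_struct):
    # One big read() replaces num_records small reads; the per-record
    # work is then just slicing and unpacking an in-memory buffer.
    data = dbf.read(num_records * record_size)
    return [row_struct.unpack(data[i * record_size:(i + 1) * record_size])
            for i in range(num_records)]
```

This trades memory for speed, which is why it suits records() (everything is returned anyway) but not the iterator-style access paths.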