las reading is unexpectedly slow #726
Hey guys, I'm going to respond here rather than on the diff b50b298 so it doesn't get lost. @hobu - thanks for giving a concrete test case independent of displaz. To repeat the same thing on my system:
As for my system, it's a very standard 64 bit ubuntu:
@abellgithub sorry if the above was a rather non-specific bug report - I'll blame it on the late night. Luckily it's very easy to reproduce: you can translate pretty much any piece of las data, as Howard has kindly done.
I've made a bunch of changes that have made a pretty good dent in the slowness of pdal translate and other I/O-heavy operations. There is more that can be done, but we may be at a point where the work/reward ratio just doesn't justify it. We're not going to be as fast as LAStools for doing straight translation (LAS -> LAS) because PDAL is focused on supporting multiple file formats and allowing generic filtering. For example, we usually store X/Y/Z values internally as doubles, which takes conversion, and then we need to convert back to ints to write LAS. You'd never do this if all you were doing was LAS. There is also a tradeoff between the simplicity of the code and some performance. For example, in the past we stored data LAS-style (base value and scaling factor/offset), but that became problematic when supporting other formats that didn't use the same notion, and it greatly complicated the code that needed scaled values in some cases and unscaled in others. An optimization for one format may hurt performance for another unless we want to write/maintain a lot more code. The hope is that the current design is more flexible and easier to use than what existed in the past, along with being sufficiently fast for most operations.
Right, the new version is significantly faster. Current timings on my system:
I agree the API has become a lot easier to use, but I do find the speed hit hard to stomach. Particularly for interactive use (such as in displaz), an extra factor of 3x in loading time is really noticeable. It's an interesting reversal from the very early pdal days: http://lists.osgeo.org/pipermail/pdal/2011-March/000014.html
I think this might be the wrong St. Helens file. I suspect the test in the mailing list was done with one of the LAZ files found on http://www.liblas.org/samples/, which are only ~110mb uncompressed, rather than the liblas.org/samples/st-helens.las file, which is ~400mb in size. I made the latter st-helens.las in 2013, which was after that mailing list post. Which St. Helens files are you using for these tests? Additionally, translate has the cost of writing the data included in the test, which isn't a fair comparison to feeding data to a GUI. Neither of these things means we're reading as fast as we could, however. I'm sure there's more performance to get, but we do want to be careful not to hork up the API and basic usage in the process.
The file I've been testing with is a 402mb version of St Helens. I don't mean we should directly compare to the old mailing list post - who knows what machine it was done on. What's interesting is that there's a direct comparison of pc2pc vs las2las on a given piece of data on the same machine. Measured in the GUI, I get the following:
Thought I'd put a little bit of data behind this performance question, since I had some cycles this morning. I put together a quick little thing (https://github.com/gadomski/dogshow) to compare between PDAL versions and against lastools (https://github.com/LAStools/LAStools/) and liblas (https://github.com/libLAS/libLAS/). These tests were all done on my MacBook Pro laptop, OS X 10.1.1, Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz. Some notes:
These numbers are all from a single run of the appropriate las-to-las passthrough command (pc2pc, pdal translate, and las2las). No averaging over multiple runs was done. Buyer beware.

Edit: as @hobu mentioned above, these are all reading + writing tests, not necessarily applicable to a GUI application. You'd probably have to build some custom test executables to just test reading, and I don't have that many free cycles this morning.

Edit 2: all this testing was done with an SSD hard drive.
And some pretty pictures:
Thanks Pete! IMO translating las->las is a reasonable proxy for measuring it in a GUI application: assuming that the point unpacking and repacking code each take half of the time, just halve the numbers above and you get a rough representation of the unpacking time. If we were talking about a 10% performance difference it'd be a different story, but we're talking about 200 to 300% here.
Generalizing Pete’s data even further by adding the wall-clock time of the two datasets, we get:
liblas is roughly twice as fast as PDAL-old and three times faster than PDAL-new. (Check my math, though.)

-mpg
Well, a quick look at the liblas code says that it's just different. Doing a comparison of liblas and PDAL isn't really apples to apples. PDAL loads the data from the file into memory so that it can be processed. las2las does no such thing. It simply writes the points as it reads them. This may be fine for streaming operations, but it doesn't meet the needs of some algorithms that PDAL exploits. We discussed streaming points through the system early in development and decided against it. One can't, for example, sort streamed points or perform certain spatial algorithms. Today Howard and I discussed adding an interface for applications that only want a simple reader or writer, but that's simply not how PDAL was designed, as streaming didn't meet the requirements. We may get there, though.
Right, I didn't quite appreciate that all data is now read into memory before you can get at the first point. It's certainly really useful if you're going to do filtering that requires spatial context, but not so good for applications which need to read the data into their own buffers for whatever reason. (In my case, because I need to pass the data efficiently to OpenGL, which means having the buffer layout optimized for the OpenGL API.)
Closing in favor of #754. |
Hi guys, I ran into some performance issues when reviewing c42f/displaz#34. Rather than derailing that conversation too far, I thought I'd bring it home to the pdal repo.
I guess the last time I had time to look into the pdal code was more than a year ago now. Since then it looks like you've been hard at work improving the API, which is nice to see :) Unfortunately it also looks like there's been a major performance regression reading las: I'm seeing something like a 5x performance hit vs lastools.
Some profiling using valgrind exposes a disturbing amount of time spent inside standard library stream machinery in `LasReader::loadPointV10()` - ripping that out gave a boost of about 2x - see b50b298 on the las-unpack-perf branch. Note that this patch is intended as a discussion point rather than a sterling example of production code.

Something like that patch will help, but unfortunately pdal still needs another factor of 2.5x or so. It's possible that the rest of the speed hit is due to the new `PointBuffer` API, which is a lot safer and simpler to use than last time I looked at it, but also looks a lot more internally complex. In particular, I doubt you can get really great performance unless you amortize the cost of dimension metadata and type handling over more than one point at a time.