
las reading is unexpectedly slow #726

Closed
c42f opened this issue Jan 24, 2015 · 12 comments


@c42f
Contributor

c42f commented Jan 24, 2015

Hi guys, I ran into some performance issues when reviewing c42f/displaz#34. Rather than derailing that conversation too far, I thought I'd bring it home to the pdal repo.

I guess that the last time I had time to look into the pdal code was more than a year ago now. Since then it looks like you've been hard at work improving the API which is nice to see :) Unfortunately it also looks like there's been a major performance regression reading las: I'm seeing something like a 5x performance hit vs lastools.

Some profiling using valgrind exposes a disturbing amount of time spent inside standard library stream machinery in LasReader::loadPointV10() - ripping that out gave a boost of about 2x - see b50b298 on the las-unpack-perf branch. Note that this patch is intended as a discussion point rather than a sterling example of production code.

Something like that patch will help, but unfortunately pdal still needs another factor of 2.5x or so. It's possible that the rest of the speed hit is due to the new PointBuffer API which is a lot safer and simpler to use than last time I looked at it, but also looks a lot more internally complex. In particular, I doubt you can get really great performance unless you amortize the cost of dimension metadata and type handling over more than one point at a time.
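To make the idea behind the b50b298 patch concrete: rather than pulling each field through iostream extraction, read a whole block of raw point records into one buffer and decode fields with `memcpy`. The `Point` struct, `decodeBlock` function, and field offsets below are a hypothetical sketch (the offsets match the first fields of LAS point format 0), not PDAL's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative unpacked point holding the first core fields of a
// LAS point format 0 record: int32 x/y/z then uint16 intensity.
struct Point { int32_t x, y, z; uint16_t intensity; };

// Decode a contiguous block of raw point records in one pass, so the
// per-point cost is a few memcpys instead of many stream reads.
// pointSize is the on-disk record length (20 bytes for format 0).
std::vector<Point> decodeBlock(const uint8_t* raw, size_t count,
                               size_t pointSize)
{
    std::vector<Point> out(count);
    for (size_t i = 0; i < count; ++i)
    {
        const uint8_t* p = raw + i * pointSize;
        std::memcpy(&out[i].x, p + 0, 4);
        std::memcpy(&out[i].y, p + 4, 4);
        std::memcpy(&out[i].z, p + 8, 4);
        std::memcpy(&out[i].intensity, p + 12, 2);
    }
    return out;
}
```

The point of the bulk read is that the stream machinery's per-call overhead (locking, sentry construction, buffer checks) is paid once per block instead of several times per point.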

@c42f
Contributor Author

c42f commented Jan 25, 2015

Hey guys, I'm going to respond here rather than on the diff b50b298 so it doesn't get lost.

@hobu - thanks for giving a concrete test case independent of displaz. To repeat the same thing on my system:

$ time ./las2las -i st-helens.las -o foo.las

real    0m2.271s
user    0m1.967s
sys     0m0.303s


$ time ~/local/bin/pdal translate st-helens.las bar.las
st-helens.las: Found invalid value of '0' for point's return number.
st-helens.las: Found invalid value of '0' for point's number of returns.
................................................................................[snip]

real    0m15.675s
user    0m14.750s
sys     0m0.634s

As for my system, it's a very standard 64-bit Ubuntu:

$ uname -a
Linux spheroid 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ g++ --version
g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@c42f
Contributor Author

c42f commented Jan 25, 2015

@abellgithub sorry if the above was a rather non-specific bug report, I'll blame it on the late night.

Luckily it's very easy to reproduce - you can just translate pretty much any piece of las data as Howard has kindly done.

@abellgithub
Contributor

I've made a bunch of changes that have made a pretty good dent in the slowness of pdal translate and other I/O-heavy operations. There is more that could be done, but we may be at a point where the work/reward ratio just doesn't justify it.

We're not going to be as fast as LAStools for a straight translation (LAS -> LAS) because PDAL is focused on supporting multiple file formats and allowing generic filtering. For example, we usually store X/Y/Z values internally as doubles, which takes a conversion, and then we need to convert back to ints to write LAS. You'd never do this if all you handled was LAS.

There is also a tradeoff between simplicity of the code and some performance. For example, in the past we stored data LAS-style (base value plus scaling factor/offset), but that became problematic when supporting other formats that didn't share that notion, and it greatly complicated code that needed scaled values in some cases and unscaled in others. An optimization for one format may hurt performance for another unless we want to write and maintain a lot more code.

The hope is that the current design is more flexible and easier to use than what existed in the past, while being sufficiently fast for most operations.
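The scaled-int round trip described above looks like this in miniature. The function names and values are illustrative, not PDAL's API; the formula (raw * scale + offset) is the standard LAS coordinate encoding:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// LAS stores each coordinate as an int32 plus a per-file scale and
// offset: actual = raw * scale + offset. A format-neutral library that
// keeps doubles internally converts on read and inverts on write.
double lasToDouble(int32_t raw, double scale, double offset)
{
    return raw * scale + offset;
}

int32_t doubleToLas(double v, double scale, double offset)
{
    // Round rather than truncate, so the round trip is lossless for
    // values that originated as scaled ints.
    return static_cast<int32_t>(std::llround((v - offset) / scale));
}
```

A LAS-only tool like las2las can skip both conversions and shuffle the raw int32s straight through, which is one structural reason it wins on a pure LAS -> LAS pass.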

@c42f
Contributor Author

c42f commented Jan 30, 2015

Right, the new version is significantly faster. Current timings on my system:

$ time ~/local/bin/pdal translate st-helens.las bar.las
/home/chris/src/lastools/src/st-helens.las: Found invalid value of '0' for point's return number.
/home/chris/src/lastools/src/st-helens.las: Found invalid value of '0' for point's number of returns.

real    0m6.956s
user    0m5.583s
sys     0m0.617s

I agree the API has become a lot easier to use, but I do find the speed hit hard to stomach. Particularly for interactive use (such as in displaz), an extra factor of 3x loading time is really noticeable.

It's an interesting reversal from the very early pdal days: http://lists.osgeo.org/pipermail/pdal/2011-March/000014.html

@hobu
Member

hobu commented Jan 30, 2015

I think this might be the wrong St. Helens file. I suspect the test in the mailing list was done with one of the LAZ files found on http://www.liblas.org/samples/, which are only ~110 MB uncompressed, rather than the liblas.org/samples/st-helens.las file, which is ~400 MB. I made the latter st-helens.las in 2013, which was after that mailing list post. Which St. Helens file are you using for these tests?

Additionally, translate includes the cost of writing the data in the test, which isn't a fair comparison to feeding data to a GUI. Neither of these things means we're reading as fast as we could, however. I'm sure there's more performance to get, but we do want to be careful not to hork up the API and basic usage in the process.

@c42f
Contributor Author

c42f commented Jan 31, 2015

The file I've been testing with is a 402 MB version of St. Helens. I don't mean we should directly compare to the old mailing list post - who knows what machine it was run on. What's interesting is that it's a direct comparison of pc2pc vs las2las on the same data on the same machine.

Measured in the GUI, I get the following (loading happens in its own thread, so this is pretty consistent between runs):

lastools: 1.46s
pdal: 4.70s

@gadomski
Member

Thought I'd put a little bit of data behind this performance question, since I had some cycles this morning. I put together a quick little thing (https://github.com/gadomski/dogshow) to compare between PDAL versions and against lastools and liblas. These tests were all done on my MacBook Pro laptop, OS X 10.1.1, Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz. Some notes:

  • PDAL 0.9.8 is pre-rewrite
  • PDAL @ 0.9.8-2107-gf02dea5 is master from this morning (name gotten from git describe --tags)
  • LAStools is actually built from my fork (https://github.com/gadomski/LAStools/), which just includes a couple tweaks to get things to compile on my system (removing tr1)
  • The 12388139 point lasfile is https://liblas.org/samples/st-helens.las
  • The 22404422 point lasfile is some personal data from a RIEGL VZ-6000 TLS scan of the Helheim Glacier in Greenland

These numbers are all from a single run of the appropriate las-to-las passthrough command (pc2pc, pdal translate, and las2las). No averaging over multiple runs was done. Buyer beware. 💀

Edit: as @hobu mentioned above these are all reading + writing tests — not necessarily applicable to a GUI application. You'd probably have to build some custom test executables to just test reading, and I don't have that many free cycles this morning.

Edit 2: all this testing was done with an SSD.

name                          points      real(s)   user(s)   sys(s)
PDAL @ 0.9.8                  12388139    5.689     4.921     0.730
PDAL @ 0.9.8                  22404422    10.385    8.922     1.417
PDAL @ 0.9.8-2107-gf02dea5    12388139    8.133     7.332     0.772
PDAL @ 0.9.8-2107-gf02dea5    22404422    14.920    13.444    1.449
lastools @ 7799e5a            12388139    4.649     4.434     0.205
lastools @ 7799e5a            22404422    8.354     7.966     0.379
liblas @ 1.8.0-19-g2fec87a    12388139    3.102     2.507     0.572
liblas @ 1.8.0-19-g2fec87a    22404422    5.476     4.429     1.014

And some pretty pictures:

[chart: user-plus-sys (https://cloud.githubusercontent.com/assets/58314/5989034/d775a616-a93e-11e4-94b0-e9b0913a8bbd.png)]

[chart: user-to-sys (https://cloud.githubusercontent.com/assets/58314/5989035/d91b3594-a93e-11e4-8bea-0520438bb996.png)]

@c42f
Contributor Author

c42f commented Feb 1, 2015

Thanks Pete! IMO translating las->las is a reasonable proxy for measuring it in a GUI application: Assuming that the point unpacking and repacking code each take half of the time, just halve the numbers above and you get a rough representation of the unpacking time.

If we were talking about factors of 10% in performance difference it'd be a different story, but we're talking about 200 to 300% here.

@mpgerlek
Contributor

mpgerlek commented Feb 3, 2015

Generalizing Pete’s data even further by adding the wall-clock time of the two datasets, we get:

PDAL-old: 16.1 sec
PDAL-new: 23.0 sec
lastools: 13.0 sec
liblas: 8.6 sec

liblas is roughly twice as fast as PDAL-old and three times faster than PDAL-new.

(check my math, though)

-mpg
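mpg's sums can be checked mechanically against the "real" column of Pete's table. A throwaway snippet, with the values copied from the table above:

```cpp
#include <cassert>
#include <cmath>

// Wall-clock ("real") seconds from Pete's table, summed over the
// 12388139-point and 22404422-point runs.
constexpr double pdalOld  = 5.689 + 10.385;  // ~16.1 s
constexpr double pdalNew  = 8.133 + 14.920;  // ~23.1 s
constexpr double lastools = 4.649 + 8.354;   // ~13.0 s
constexpr double liblas   = 3.102 + 5.476;   // ~8.6 s
```

The ratios come out at roughly 1.9x (PDAL-old vs liblas) and 2.7x (PDAL-new vs liblas), so "roughly twice" and "three times" are fair summaries.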


@abellgithub
Contributor

Well, a quick look at the liblas code shows that it's just different, so comparing liblas and PDAL isn't really apples to apples. PDAL loads the data from the file into memory so that it can be processed. las2las does no such thing; it simply writes the points as it reads them. This may be fine for streaming operations, but it doesn't meet the needs of some algorithms that PDAL supports. We discussed streaming points through the system early in development and decided against it. One can't, for example, sort streamed points or perform certain spatial algorithms.

Today Howard and I discussed adding an interface for applications that only want a simple reader or writer, but that's simply not the way PDAL was designed, since a streaming interface didn't meet the requirements. We may get there, though.
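The two models under discussion can be sketched side by side. Everything here is illustrative (the `Point` struct and function names are invented, not PDAL's or liblas's API); the point is only that a sort-style filter needs the load-all model:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

struct Point { double x, y, z; };

// las2las-style streaming: each point is handed to a sink as it is
// decoded and can be written out immediately; only O(1) points are
// ever held in memory.
void streamPoints(const std::vector<Point>& source,
                  const std::function<void(const Point&)>& sink)
{
    for (const Point& p : source)
        sink(p);
}

// A filter such as "order by elevation" needs random access to the
// whole buffer, which is why PDAL reads everything in before
// processing: you cannot sort points you have already emitted.
std::vector<Point> sortByZ(std::vector<Point> pts)
{
    std::sort(pts.begin(), pts.end(),
              [](const Point& a, const Point& b) { return a.z < b.z; });
    return pts;
}
```

A streaming pipeline wins on pass-through translation and on memory, but the load-all buffer is what makes sorting and spatial filtering possible at all.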

@c42f
Contributor Author

c42f commented Feb 4, 2015

Right, I hadn't quite appreciated that all data is now read into memory before you can get at the first point. That's certainly really useful if you're going to do filtering that requires spatial context, but not so good for applications that need to read the data into their own buffers for whatever reason. (In my case, because I need to pass the data efficiently to OpenGL, which means having the buffer layout optimized for the OpenGL API.)

@abellgithub
Contributor

Closing in favor of #754.
