Question/Feature request: Make filters.randomize streamable #4195

Open
jo-chemla opened this issue Sep 28, 2023 · 3 comments

Comments

@jo-chemla

jo-chemla commented Sep 28, 2023

Hi PDAL community,
I'm wondering what prevents filters.randomize from being streamable. Using it in combination with other PDAL pipeline filters (head, stats, crop, splitter, merge, etc.) would allow out-of-core operations on point clouds that are too large to fit in RAM.

If the randomize filter produced a random list of indices, and subsequent pipeline stages processed points entry by entry (or in batches of a specified size), it looks like this stage could support streaming.
Best, Jonathan

@abellgithub
Contributor

I don't understand how this would work. You could theoretically reorder the points within a loaded chunk (this operation is not currently supported), but that seems of little use. You will have to provide some more detail on an implementation.

@jo-chemla
Author

Hi Andrew,
It is true that even once the random indices have been computed, the entire point cloud file has to be crawled to retrieve each point (coordinates and attributes) and feed it to the other pipeline stages. The question should therefore be rephrased as something like:

Can we execute filters.randomize so that it processes batches of points from the input point cloud without ever overflowing RAM?

If so, one could then just chain two pipelines: the first would use filters.randomize to produce an intermediate point cloud, which could then be read as the input of a streamable pipeline. Thanks for the feedback!
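
For illustration, here is a minimal sketch of the two-pipeline idea, written as Python dictionaries that serialize to PDAL pipeline JSON. The file names (`input.las`, `shuffled.las`, `subset.las`), the `count` value, and the `write_pipeline` helper are placeholders and not something from this thread. The first pipeline still contains filters.randomize, so it would not run in stream mode and does not by itself solve the memory problem.

```python
import json

# Pipeline 1: shuffle the whole cloud and write an intermediate file.
# filters.randomize is not streamable, so this pipeline still loads all points.
shuffle_pipeline = {
    "pipeline": [
        {"type": "readers.las", "filename": "input.las"},     # placeholder path
        {"type": "filters.randomize"},
        {"type": "writers.las", "filename": "shuffled.las"},  # placeholder path
    ]
}

# Pipeline 2: read the shuffled file sequentially and keep the first points.
# Whether it actually runs in stream mode depends on the stages chosen.
downstream_pipeline = {
    "pipeline": [
        {"type": "readers.las", "filename": "shuffled.las"},
        {"type": "filters.head", "count": 1000000},           # placeholder count
        {"type": "writers.las", "filename": "subset.las"},    # placeholder path
    ]
}


def write_pipeline(spec, path):
    """Hypothetical helper: save a pipeline definition as a JSON file."""
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)
```

Each saved file could then be run in turn with `pdal pipeline <file>.json`.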

@abellgithub
Contributor

This just isn't how PDAL works. We never perform random access on points in a pipeline; access is always sequential. Random access just doesn't work well with some file types (notably compressed files). I don't know what else to suggest.
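
As a side note on the sequential-access constraint, a shuffle can in principle be done with sequential reads and writes only, by scattering points into random on-disk buckets and then shuffling one bucket at a time in memory. The sketch below is plain Python and not PDAL code or a PDAL API; the function name `external_shuffle`, the `num_buckets` parameter, and the use of `pickle` for temporary storage are all assumptions made for illustration. Peak memory is roughly the size of the largest bucket.

```python
import os
import pickle
import random
import tempfile


def external_shuffle(points, num_buckets=16, seed=None):
    """Yield the items of `points` in random order using sequential I/O only."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.bin") for i in range(num_buckets)]
        files = [open(p, "wb") for p in paths]
        try:
            # Pass 1: read the input once, appending each point to a random bucket.
            for pt in points:
                pickle.dump(pt, rng.choice(files))
        finally:
            for f in files:
                f.close()

        # Pass 2: load one bucket at a time, shuffle it in memory, and emit it.
        for path in paths:
            bucket = []
            with open(path, "rb") as f:
                while True:
                    try:
                        bucket.append(pickle.load(f))
                    except EOFError:
                        break
            rng.shuffle(bucket)
            yield from bucket
```

For example, `list(external_shuffle(iter_points(), num_buckets=64))` would consume a hypothetical `iter_points()` generator once, in order, and return the same points in a shuffled order.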
