-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unbuffered output option to proj #554
Conversation
This reduces external processes from between 13 and 1462 calls to 1 depending on the plot.
@QuLogic: I may be too blind to see the obvious, but could you add a few comments about why it is necessary to switch off the buffering for this to work? |
Well... As I don't plan on generating the complete set of projection plots very often I don't care about faster execution time. I don't understand the difference of unbuffered versus buffered output and why it would make a difference to users. |
Not the pull request owner, but my understanding is that @QuLogic used to start one process for every path, but with this change would be able to pipe all paths through the same process. (Of course sequentially.) When using a pipe as The problem here: when starting a new path, some output data of the old path might still be stuck in the buffer. Having unbuffered output is essentially the same as calling fflush after every output method, so this guarantees all the data that was written to the However, I'm not sure this is a clean solution. Even when turning off buffering, a lot of things have to be in the right place for this to work. A cleaner solution would be to merge the paths in Python, piping them altogether into Summary: A better solution would be to not turn off buffering, but merge the input data in Python and separate it there again. Intermediate flushing only works if all parts work together, and is rather fragile. |
Thanks, @aaronpuchert, for this clear analysis. I see the reason now, and vaguely remember to have done a similar thing some 20 years ago - which I would obviously not have done, had I known what you have told me now :-) @QuLogic - what do you think about Aaron's suggestion of assembling and splitting the individual paths in Python, and keeping the I/O buffered? |
That might work for the frame and graticule, but I'm not sure we can use that for the general feature polygons as it's run through shapely's transform machinery. I don't think there's any way to get shapely to "transform and wait for results later". I'm pretty sure that the features (i.e., coastlines) are the majority of the small things that should be batched, so ignoring them is not really going to help.
|
I don't think an improvement of plot generation is enough reason for this to be added to the proj utility. It's just a quick hack. There's better ways to get good performance if that's your concern. I am still not convinced that there are other use cases for this were interfacing with the library directly won't be the better option. Aarons analysis backs it up. I am -1 on merging this. |
I think it's a right pain to make any incremental changes to the plots when it's so slow, but as this is a pretty niche change, that's understandable. |
So you want to use this for an interactive feature? That changes the situation of course. Piping data through If you want to interactively transform a lot coordinates, you should use the library directly. Now I have a very limited understanding of shapely, but generally it's possible to bind to C functions in Python. Since there are but a few functions needed for a typical transformation, I think this shouldn't even be hard. The good thing is that it scales much better than piping unbuffered data back and forth. I don't think this would have solved your problem sustainably, because having no buffering carries a significant performance impact of its own. |
It's not interactive. It's the script that's in this PR. |
With this option, we are no longer limited to one single path per
proj
process when creating the plots. There are between 13 and 1462 paths, so cutting it down to 1 process per plot greatly improves plot runtime. Creating all plots takes 6 minutes onmaster
and 1m25s with this branch.