Ungridded to ungridded performance #9

Open
duncanwp opened this issue Jan 27, 2017 · 3 comments

@duncanwp
Member

This originated here: https://jira.ceh.ac.uk/browse/JASCIS-305

One way to speed up collocation (and perhaps other operations) is to output the set of values as a ragged array, then perform the kernel as an array operation.

So, for example, output a numpy array whose length is the number of output values, but whose elements are themselves numpy arrays of the values to operate on. The kernel could then be applied to those values as an array-wise numpy operation. I'm not sure how much time this would save, but it could be much quicker in some situations.
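A minimal sketch of the idea, assuming the constraint step has already produced the group of data values for each sample point (the values and the mean kernel here are purely illustrative, not the CIS internals):

```python
import numpy as np

# Hypothetical output of the constraint step: for each output (sample) point,
# the data values that satisfy the constraint. The groups have different
# lengths, so the result is ragged.
groups = [np.array([1.0, 3.0, 8.1]), np.array([2.5]), np.array([4.2, 0.7])]

# The ragged array itself: a numpy array of length n_output whose elements
# are numpy arrays of the values to operate on.
ragged = np.empty(len(groups), dtype=object)
ragged[:] = groups

# One way to apply a simple kernel (here a mean) array-wise rather than via a
# per-sample-point kernel call: flatten the groups and reduce over boundaries.
flat = np.concatenate(groups)
counts = np.array([len(g) for g in groups])
starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
means = np.add.reduceat(flat, starts) / counts
```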

@duncanwp
Member Author

I'm currently thinking that we would be better off converting the sample-points loop to C, through Cython or PyPy (or Numba?), rather than trying to vectorise it, because I don't think we'll get the flexibility we want that way.
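For illustration, a hedged sketch of what moving the per-sample-point loop into compiled code might look like with Numba (the function, its nearest-neighbour kernel and its signature are hypothetical, not the CIS internals):

```python
import numpy as np
from numba import njit

@njit
def nearest_neighbour_collocate(sample_x, sample_y, data_x, data_y, data_vals):
    # For each sample point, find the closest data point by squared Euclidean
    # distance and take its value - a stand-in for the kind of flexible,
    # per-sample-point loop we'd compile rather than vectorise.
    out = np.empty(sample_x.shape[0])
    for i in range(sample_x.shape[0]):
        best = np.inf
        best_j = 0
        for j in range(data_x.shape[0]):
            d = (sample_x[i] - data_x[j]) ** 2 + (sample_y[i] - data_y[j]) ** 2
            if d < best:
                best = d
                best_j = j
        out[i] = data_vals[best_j]
    return out
```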

@duncanwp
Member Author

Profiling an example CloudSat-onto-aircraft collocation shows that the biggest bottleneck is actually the constraint step within the loop. Underneath we use the query_ball_point method to find the nearby points for each sample point, but the query_ball_tree method should be much faster since we can pass it all the sample points in one go. This moves the constraint step outside the sample-point loop.
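A minimal sketch of the two scipy calls, with made-up 2D coordinates standing in for the data and sample points (CIS works on its own point representation, so this only illustrates the shape of the change):

```python
import numpy as np
from scipy.spatial import cKDTree

data_points = np.random.rand(100000, 2)   # e.g. CloudSat data locations
sample_points = np.random.rand(5000, 2)   # e.g. aircraft sample locations

data_tree = cKDTree(data_points)
sample_tree = cKDTree(sample_points)

# Current approach: one query_ball_point call per sample point, inside the loop.
per_point = [data_tree.query_ball_point(p, r=0.01) for p in sample_points]

# Proposed approach: a single query_ball_tree call returns, for every sample
# point at once, the indices of the data points within the radius, moving the
# constraint step outside the sample-point loop.
all_neighbours = sample_tree.query_ball_tree(data_tree, r=0.01)
```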

I've created a branch for this work here: https://github.com/cedadev/cis/tree/optimize_ug_col

@duncanwp
Member Author

duncanwp commented Mar 3, 2017

I've had a bit more of a play with this today. Using query_ball_tree does help (a speed-up of about 50%) but not as much as I'd hoped - the bottleneck just gets moved to the loop over the constrained points.

I've had a look at ways of dealing with the staggered (jagged) array which you end up with. I think a scipy sparse array is probably the way to do this, but how to construct it and then actually use it to calculate the kernel values is going to need a bit more thought.
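As a hedged sketch of how the jagged constraint output could be packed into a scipy sparse matrix and a simple kernel evaluated from it (the neighbour lists, values and mean kernel are illustrative only):

```python
import numpy as np
from scipy import sparse

# Hypothetical constraint output: for each sample point (row), the indices of
# the constrained data points (columns).
neighbours = [[0, 2, 5], [1], [3, 4]]
data_vals = np.array([1.0, 2.5, 3.0, 4.2, 0.7, 8.1])

rows = np.repeat(np.arange(len(neighbours)), [len(n) for n in neighbours])
cols = np.concatenate(neighbours)

# One row per sample point, one column per data point, with the constrained
# values stored as the non-zero entries.
mat = sparse.csr_matrix((data_vals[cols], (rows, cols)),
                        shape=(len(neighbours), len(data_vals)))

# A mean kernel over each row, using only the stored (constrained) entries.
counts = np.diff(mat.indptr)
means = np.asarray(mat.sum(axis=1)).ravel() / counts
```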

The other option is to calculate the whole sparse_distance_matrix in the first place. I've created an implementation of this, but it's very slow once there are many constrained points to calculate distances for. We'll need it when calculating weighted averages anyway, so I'll probably concentrate on making this faster - using Cython for the main loop, I think.
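For reference, a minimal sketch of the scipy call for computing the whole sparse distance matrix up front (the coordinates, sizes and cut-off are arbitrary, not the CIS data):

```python
import numpy as np
from scipy.spatial import cKDTree

sample_tree = cKDTree(np.random.rand(5000, 2))
data_tree = cKDTree(np.random.rand(100000, 2))

# Returns a scipy.sparse.dok_matrix whose (i, j) entry is the distance from
# sample point i to data point j, for every pair within the cut-off. These
# distances are what a distance-weighted average kernel would need.
distances = sample_tree.sparse_distance_matrix(data_tree, max_distance=0.01)
```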
