Shorten tests and vectorize patches_method
#292
Conversation
Ready for review! To fix them in this PR:
We should indeed transfer all the convolution functionalities to geoutils in the long term, I think (when the numba convolution works!). Maybe just raise an issue for the time being?
Great job speeding up all these tests!!
I'm force merging because tests passed two days ago on the finalized PR. However, some tests now fail in
Summary

The test part of the CI now runs in 6 min (including the test on documentation building), instead of 20 min before! We should be able to get that under 3 min once the `terrain.py` functions are vectorized 😄

This PR improves the speed of many tests, especially those linked to functions of `spatialstats.py` and `fit.py` that require a lot of processing during either optimization or sampling.

The `patches_method` is reworked and can now be performed by convolution. This dramatically increases the computing speed relative to the sample size drawn. However, if the convolution is performed on an entire raster, it can still be slower than random patch sampling.

Two points are left to a later PR: improving the speed of `terrain.py` functions by convolution, which will naturally improve the speed of related tests and examples; and improving the speed of `test_coreg.py`, which will benefit from the advances planned for the module in the next weeks/months.

@adehecq: It will be interesting to discuss where to put some functions listed further below, which probably fit better in `geoutils`.

As a change summary, this PR:
- Makes a `load_ref_and_diff` function available to the test classes of `test_spatialstats.py`, then calls the variables using e.g. `self.ref` to avoid duplicated loading in every function;
- Shortens `test_nd_binning` to reduce computing time;
- Saves intermediate results to disk in `test_spatialstats.py` to avoid duplicating long processing steps;
- Reorders `test_spatialstats.py` for point 3 to work correctly;
- Adapts `xdem.spatialstats.interp_nd_binning` to recognize `pd.Interval` columns that are (inescapably) converted to string when saving to CSV, to support a `pd.DataFrame` initiated from files read on disk (now done in point 3);
- Adapts `xdem.spatialstats._choose_cdist_equidistant_sampling_parameters` to respect the pairwise samples and to allow small sample sizes while avoiding `skgstat` errors (the minimum is 10);
- Updates tests of `xdem.spatialstats._choose_cdist_equidistant_sampling_parameters` to test the new functions;
- Shortens `xdem.spatialstats.sample_empirical_variogram` calls with `subsample=10` to reduce computing time;
- Shortens `number_effective_samples` calls that depend on `neff_hugonnet_approx` with `subsample=10` to reduce computing time;
- Adjusts `maxlag` in `xdem.spatialstats.sample_empirical_variogram`;
- Adds a `niter` argument to `scipy.optimize.basinhopping` in `xdem.fit.robust_sumsin_fit`;
- Reduces `test_fit.py` functions using basinhopping to `niter<25`, depending on the test case (they need to converge to the test function);
- Updates `docs/source/code` and adjusts the line changes in the docs;
- Adds support for `Raster` objects for the `patches_method`;
- Reworks the `_patches_loop_cadrants` function;
- Adds a `_patches_convolution` method based on convolution.

Additionally, this PR adds some functions that might be good to move to `geoutils`:
- Converting a `Raster` or `ndarray` input + an exclusion/inclusion mask as `Vector`, `np.ndarray` or `gpd.GeoDataFrame` into a 1D `np.ndarray` of included terrain, or a 2D `np.ndarray` with NaNs on included terrain, is now consistently performed by the function `_preprocess_values_with_mask_to_array` (then used in `infer_...` functions and `patches_method`, for now).
- A `convolution` function wrapper that calls either `_scipy_convolution` or `_numba_convolution`. SciPy has methods that are quite efficient for large arrays with any kernel size, while numba is very fast for small kernel sizes (source: https://laurentperrinet.github.io/sciblog/posts/2017-09-20-the-fastest-2d-convolution-in-the-world.html). Strangely, the `numba` convolution currently fails with a segmentation fault; the error has been impossible to trace so far.
- A `mean_filter_nan` function that handles NaN arrays to compute the mean and count of valid samples by convolution.

Resolves #289
Resolves #294
Resolves #284
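As a sketch of the `mean_filter_nan` idea described above: the zero-filled values and a validity mask are convolved separately, and dividing the two gives the NaN-aware moving mean together with the count of valid samples. The function name here and the use of `scipy.signal.fftconvolve` are my assumptions for illustration, not necessarily the PR's exact implementation:

```python
import numpy as np
from scipy import signal


def nan_mean_filter(arr: np.ndarray, kernel_size: int):
    """NaN-aware mean filter by convolution (illustrative sketch only).

    Convolves the zero-filled values and a validity mask separately, then
    divides the summed values by the count of valid samples in each window.
    """
    kernel = np.ones((kernel_size, kernel_size))
    valid = np.isfinite(arr).astype(float)
    filled = np.where(np.isfinite(arr), arr, 0.0)
    # Per-window sum of valid values and count of valid samples
    sum_vals = signal.fftconvolve(filled, kernel, mode="same")
    count = signal.fftconvolve(valid, kernel, mode="same")
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sum_vals / count
    mean[count < 0.5] = np.nan  # windows containing no valid sample
    return mean, count
```

Because both convolutions run over the whole array at once, this is much faster per sample than looping over random patches, which matches the trade-off noted above: fast relative to the sample size drawn, but potentially slower than random sampling when only a few patches are needed.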
To-do list:

The twelve labors of Hercules (reduce to under 1 s, or up to 5 s for fitting/sampling requiring more processing):

The four horsemen of the apocalypse (first three unchanged to avoid damaging the quality of the examples):

(`patches_method`)
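On the `interp_nd_binning` change mentioned above: when a binning `pd.DataFrame` is saved to CSV, `pd.Interval` columns are inescapably converted to strings such as `"[0.0, 10.0)"`. A minimal sketch of turning such strings back into `pd.Interval` objects (a hypothetical helper for illustration, not the code used in xdem):

```python
import pandas as pd


def parse_interval(s: str) -> pd.Interval:
    """Parse a string like "[0.0, 10.0)" back into a pd.Interval.

    Hypothetical helper: the bracket pair encodes which sides are closed,
    and the two numbers inside are the interval edges.
    """
    closed = {
        ("[", "]"): "both",
        ("[", ")"): "left",
        ("(", "]"): "right",
        ("(", ")"): "neither",
    }[(s[0], s[-1])]
    left, right = (float(x) for x in s[1:-1].split(","))
    return pd.Interval(left, right, closed=closed)
```

A column of such strings can then be mapped back with e.g. `df["bin"].map(parse_interval)` before interpolating, which is the kind of round-trip the adapted `interp_nd_binning` needs to support for DataFrames read from disk.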