Homogenize and simplify spatialstats.py
#276
Update upstream
…unction checks, names, descriptions
Thanks for this huge effort! 👏 I made some (mostly minor) comments directly in the code. We will see with practice how it goes.
I went through each function's name and commented when I thought some changes could be useful. We should particularly try to be consistent between the use of
Mmmh. Maybe a good idea to have just one function with a parameter to use one or the other.
Sounds perfect!
Yes, I think it is important to have a simple pipeline that runs all the steps.
…t_equidistant_sampling_params
Merging now that comments are accounted for.
This PR aims to address several issues linked to spatialstats, in particular the ease-of-use, clarity of parameters and tests!

Summary of changes

- sample_empirical_variogram in relation to skgstat.MetricSpace have been simplified, and further described. In particular the subsample argument is now automatically calculated to match the number of pairwise samples found in a single ensemble (N**2/2 if N is the subsample), even if the method uses more complex distinct ensembles (as is the case for the default cdist_equidistant method),
- nd_binning, plot_1d_binning and plot_2d_binning.
- skgstat.models, which results in the deletion of the vgm, cov functions that existed in xdem.spatialstats, and the replacement of occurrences in fit_sum_model_variogram and neff functions (for fitting to empirical variograms, and spatial integration),
- pd.DataFrame object with columns "models", "range", "psills" and "smooth", described in related function output/input. The format is tested before each function run to provide clear ValueError messages in case of wrong user input,
- get_func_sum_vgm_models, covariance_from_vgm and correlation_from_vgm have been added to convert the variogram model parameters of a pd.DataFrame into a function of the sum of variogram models with spatial lag, a spatial covariance function with spatial lag, and a spatial correlation function with spatial lag,
- neff functions to estimate a number of effective samples based on spatial correlation have been improved, clarified, and further tested. In particular, neff_circular_approx_theoretical now contains exact circular integration formulas for any number of summed spherical, gaussian, exponential and cubic models. It is used to thoroughly test the neff_circular_approx_numerical function, which can numerically integrate any number of summed variograms of any form. Those two functions are to be used when the user solely provides an area value. When the shape of the area is provided, neff_exact or neff_approx_hugonnet, based on a double sum of covariance with exact coordinates, can be used to derive the number of effective samples. The double covariance sum has been vectorized for computational speed, and the random sampling of neff_approx_hugonnet is now tested with random_state arguments.

Additional changes after answers to questions below:
- vgm into parameters or functions names into variogram,
- _ in function names std_err, std_err_finite, distance_latlon, create_circular_mask and create_circular_ring to remove from API.
- sample_empirical_variogram and _choose_cdist_equidistant_sampling_parameters,

(Old) Questions for @adehecq and @erikmannerfelt to move forward with this PR:
- neff function, which defaults to one of the other functions, and can be used to choose which calculation? Should I change the others to hidden functions _neff_...? This would remove them from API.
- nd_binning and interp_nd_binning, a new "standardization" function, and a new "error_map" function; then another one that combines sample_empirical_variogram, fit_sum_model_variogram and neff to get the error in an area? Something along these lines.

To-do-list for related issues
- statistics argument for nd_binning #261,
- bins column in empirically sampled variograms #253,
- spatialstats #250,
- nd_binning example in the documentation #238,
- double_sum_covar() clarify and improve function arguments #215,
- sample_empirical_variogram with multiple jobs #204.

As well as:
- spatialstats to ensure clear error message with wrong user input.
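
For readers new to the concepts in the summary of changes, here is a minimal, dependency-free sketch of the idea: validate a table of variogram parameters, turn it into a summed variogram function, derive covariance and correlation functions of spatial lag, and estimate a number of effective samples from a double sum over exact coordinates. It is not the xdem implementation. All function names are hypothetical, a plain list of dicts stands in for the pd.DataFrame rows (the "smooth" column is omitted), the model formulas are textbook forms whose range scaling may differ from skgstat.models, and the effective-sample formula assumes equal variance at every point.

```python
import math

# Hypothetical stand-in for the parameter table: one dict per summed model,
# keyed like the columns named in the PR description ("smooth" is omitted here).
REQUIRED_KEYS = ("models", "range", "psills")


def _spherical(h, r, c):
    # Textbook spherical model: reaches the partial sill c exactly at range r.
    if h >= r:
        return c
    x = h / r
    return c * (1.5 * x - 0.5 * x ** 3)


def _exponential(h, r, c):
    # Textbook exponential model (assumption: no effective-range rescaling).
    return c * (1.0 - math.exp(-h / r))


def _gaussian(h, r, c):
    # Textbook gaussian model (assumption: no effective-range rescaling).
    return c * (1.0 - math.exp(-((h / r) ** 2)))


MODELS = {"spherical": _spherical, "exponential": _exponential, "gaussian": _gaussian}


def check_params(params):
    """Raise a clear ValueError if the parameter table is malformed."""
    if not params:
        raise ValueError("At least one variogram model is required.")
    for i, row in enumerate(params):
        missing = [k for k in REQUIRED_KEYS if k not in row]
        if missing:
            raise ValueError(f"Row {i} is missing column(s): {missing}")
        if row["models"] not in MODELS:
            raise ValueError(
                f"Row {i}: unknown model {row['models']!r}; expected one of {sorted(MODELS)}"
            )
        if row["range"] <= 0 or row["psills"] <= 0:
            raise ValueError(f"Row {i}: 'range' and 'psills' must be positive.")


def sum_variogram(params):
    """Return gamma(h), the sum of all variogram models at spatial lag h."""
    check_params(params)

    def gamma(h):
        return sum(MODELS[row["models"]](h, row["range"], row["psills"]) for row in params)

    return gamma


def covariance(params):
    """Return C(h) = total sill - gamma(h)."""
    gamma = sum_variogram(params)
    total_sill = sum(row["psills"] for row in params)
    return lambda h: total_sill - gamma(h)


def correlation(params):
    """Return rho(h) = C(h) / total sill."""
    cov = covariance(params)
    total_sill = sum(row["psills"] for row in params)
    return lambda h: cov(h) / total_sill


def neff_double_sum(coords, params):
    """Effective sample count n**2 / sum_ij rho(d_ij), assuming equal variances."""
    rho = correlation(params)
    n = len(coords)
    total = sum(rho(math.dist(p, q)) for p in coords for q in coords)
    return n ** 2 / total


params = [
    {"models": "spherical", "range": 10.0, "psills": 2.0},
    {"models": "exponential", "range": 5.0, "psills": 1.0},
]
gamma = sum_variogram(params)
```

With this sketch, points separated by more than the range of a purely spherical model are uncorrelated, so the effective sample count equals the number of points, while fully coincident points count as a single sample.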