WIP : Remove smirnov entries from special/_legacy.pxd #9089
Modify a generic test framework to respect type of distribution shape arguments. Allowed removal of smirnov entries from _legacy.pxd and functions.json in special. Still fails one test test_ksone_fit_freeze in scipy/stats/tests/test_distributions.py.
Test Failure 2: It fails trying to construct a moment because the first arg ('n') passed in has a float type, not an int type. (See below.) What exactly is this function
With the "legacy" behavior of flooring integer parameters, you probably get some semi-reasonable result from the fit --- the default optimizer is Nelder-Mead, which probably is not terribly misled.
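To make the "legacy" behavior concrete, here is a minimal sketch of a wrapper that floors a float passed for an integer parameter and warns instead of raising. `strict_kernel` and `legacy_kernel` are hypothetical stand-ins, not scipy functions, and the warning text is an assumption.

```python
import math
import warnings


def strict_kernel(n, e):
    """Hypothetical stand-in for a function typed as (int n, double e)."""
    if not isinstance(n, int):
        raise TypeError("n must be an int, got %s" % type(n).__name__)
    return n * e  # placeholder computation, not the Smirnov formula


def legacy_kernel(n, e):
    """Hypothetical legacy-style wrapper: floor a float n and warn."""
    if isinstance(n, float):
        if math.isnan(n) or math.isinf(n):
            return float("nan")  # nothing sensible to floor to
        if n != math.floor(n):
            warnings.warn(
                "floating point number truncated to an integer",
                RuntimeWarning,
            )
        n = int(math.floor(n))
    return strict_kernel(n, e)


if __name__ == "__main__":
    print(legacy_kernel(3.0, 0.5))  # integral float: accepted silently
    print(legacy_kernel(3.7, 0.5))  # floored to n=3, with a RuntimeWarning
```

Under this sketch an optimizer that varies `n` continuously sees a step function in `n`, whereas the strict version raises `TypeError` on the first float it receives.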
two reasons why
Also, it is not clear which function or whether a function has an extension to real numbers, e.g. degrees of freedom in F, t and chi2 distributions don't need to be int. Many discrete distribution parameters extend to floats.
[There are actually more than two tests failing in the CI runs. The failures internal to For My observation on
I don't have the latest scipy installed, I get what I would expect. Although it would be better if the return for np.inf and other large n were the asymptotic value, if that is possible.
I'm mainly in favor of a general policy, where the ufuncs in scipy special handle those cases. This is more convenient and makes it easier to handle many distributions consistently in scipy.stats and in other packages. If I remember correctly (I haven't looked at it in a long time), then the asymptotic distribution for the one-sided kstest has a simple form that would not have an int requirement (n=np.inf IIUC) Related to extension to positive real line: scipy stats distribution uses
The bug was reported from pymvpa. pymvpa was (and maybe still is) fitting all distributions to find a candidate distribution that is closest to the data. AFAIK, the fit unit tests in scipy.stats.distributions still use a blacklist for distributions where I don't know of a use case where I would fit smirnov, but that doesn't mean that it might not show up. I hadn't looked at gh-7491 (too close to or at the beginning of my vacation last year). I don't see how that is related. For many distributions boundary values for the parameters are explicitly excluded and return or should return nan in those cases. There have been changes over the years to the behavior at the boundaries, but most likely the code for those cases is still not completely "clean" for distributions that are not used much.
If the test data is from real data then it is a real use case. [Some distributions are blacklisted for testing but this particular test is an individual test specifically targeted at
For Re gh-7491: If I recall correctly
Closing as removing
(WIP/Example)
@person142 has noted in #8737 (review) that adding unsafe versions of functions to `special/_legacy.pxd` is unwanted. This work/example removes the `smirnov` and `smirnovi` entries to understand the effect.

After removing `smirnov` entries from `_legacy.pxd` and `functions.json`, two tests fail with TypeError exceptions.

Test Failure 1: `scipy/stats/tests/test_continuous_basic.py: test_rvs_broadcast` fails as an array constructed from the distribution's shape values is not given the type of the value. `test_rvs_broadcast` is a generic routine used to test 100+ distributions. It gets the shape parameters from `scipy/stats/_distr_params.py`.

I.e. an array of `float` is created even though the first element of the shape may be an int. Attempting to pass the float into `smirnov(int n, double e)` fails.

This test can be made to pass. The change to `stats/tests/test_continuous_basic.py` in this PR uses the type of the shape_args element when creating the array of ones. I.e.

`allargs.append(shape_args[k]*np.ones(shp))`
->
`allargs.append(shape_args[k]*np.ones(shp, dtype=type(shape_args[k])))`

This would affect the invocation of `test_rvs_broadcast` for all 100+ distributions. None of the current list of distributions in `_distr_params.py:distcont` fail. Any reason to think that this change would not be safe to apply, independent of gh-8737?
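
The dtype effect of that one-line change can be checked in isolation. The following is a minimal sketch, not the test itself: the helper names, the example shape values `(3, 0.5)` and the broadcast shape `(2, 1)` are assumptions chosen for illustration.

```python
import numpy as np


def broadcast_arg_old(value, shp):
    # np.ones defaults to float64, so an int shape value becomes a float array.
    return value * np.ones(shp)


def broadcast_arg_new(value, shp):
    # Building the ones array with the value's own type keeps ints as ints.
    return value * np.ones(shp, dtype=type(value))


shape_args = (3, 0.5)  # hypothetical (int n, float e) style shape values
shp = (2, 1)           # hypothetical broadcast shape

old = [broadcast_arg_old(a, shp) for a in shape_args]
new = [broadcast_arg_new(a, shp) for a in shape_args]

print([a.dtype.kind for a in old])  # ['f', 'f']
print([a.dtype.kind for a in new])  # ['i', 'f']
```

For float shape values the two variants give the same dtype, so only distributions with integer-valued shape entries see a difference.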