- About the MINEN test
- About the A-D test
- About the K-S test
- About the DNN test
- About the Li test
This CV answer mentions the ndtest Python code. It is a translation of the Matlab minentest function from the multdist code. It implements Aslan & Zech's and Szekely & Rizzo's test, which is based on an analogy to statistical energy: the energy is minimized when the two samples are drawn from the same parent distribution.
The main references for this method appear to be:
- Statistical energy as a tool for binning-free, multivariate goodness-of-fit tests, two-sample comparison and unfolding, Aslan & Zech (2005)
- Energy statistics: A class of statistics based on distances, Szekely & Rizzo (2014)
The multdist package is now developed as part of the highdim code. Recently, the energy package by Maria Rizzo (written in R) and its manual became available.
Multi-dimensional Anderson-Darling statistic based goodness-of-fit test for spectrum sensing, Gurugopinath & Samudhyatha (2015)
(..) we propose a multi-dimensional extension of the Anderson-Darling statistic based goodness-of-fit test for spectrum sensing in a cognitive radio network with multiple nodes.
The ASAIP article Beware the Kolmogorov-Smirnov test! by Feigelson & Babu states that:
KS test probabilities are wrong if the model was derived from the dataset.
The KS test can not be applied in two or more dimensions.
we recommend that astronomers replace the Kolmogorov-Smirnov test with the similar, but more sensitive, Anderson-Darling test.
The CV question Why can't one generalize the Kolmogorov-Smirnov test to 2 or more dimensions? is related to this article. Regarding the statement that "The KS test can not be applied in two or more dimensions", user Glen_b says:
As stated, this seems too strong.
(..) difficulties have been considered in several ways in a number of papers that yield bivariate/multivariate versions of Kolmogorov-Smirnov statistics that don't suffer from that problem.
and whuber comments:
I find the OP's reference of dubious quality (at the outset it misinterprets what hypothesis tests mean); it finally admits that "the bootstrap can come to the rescue, and significance levels for the particular multidimensional statistic and the particular dataset under study can be numerically computed."
so the dismissal of the K-S test by Feigelson & Babu is disputed. This article also mentions several references on the topic.
The article The two-dimensional Kolmogorov-Smirnov test by Lopes, Reid & Hobson (2007) compares:
three variations on the Kolmogorov-Smirnov test for multi-dimensional data sets are surveyed: Peacock’s test (..) Fasano and Franceschini’s test (..) Cooke’s test
Adapting goodness-of-fit tests to multi-dimensional space is generally seen as a challenge. Tests based on binning face the hurdle of what is called in the literature "the curse of dimensionality".
About Cooke's test they mention:
We show that his test is not a faithful variation of Peacock’s test and that the upper-bound for computing it is incorrectly stated
Apparently, only the two-dimensional case is addressed in this article. The references for the methods compared are:
- Two-dimensional goodness-of-fit testing in astronomy, Peacock (1983)
(..) Two new statistical tests are developed. The first is a two-dimensional version of the Kolmogorov–Smirnov test, for which the distribution of the test statistic is investigated using a Monte Carlo method. This test is found in practice to be very nearly distribution-free, and empirical formulae for the confidence levels are given.
- A multidimensional version of the Kolmogorov-Smirnov test, Fasano and Franceschini (1987)
The authors discuss a generalization of the classical Kolmogorov-Smirnov test, which is suitable to analyse random samples defined in two or three dimensions. This test provides some improvements with respect to an earlier version proposed by Peacock.
Cooke's method does not appear to have a valid reference and I could not find it online.
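To give an idea of how the Peacock / Fasano-Franceschini family works: instead of a single empirical-CDF difference, one compares the fractions of each sample falling in the four quadrants around candidate origin points. The sketch below is a crude two-sample variant of my own (the published tests use specific choices of origins, an averaging of the per-sample statistics, and calibrated critical values):

```python
import numpy as np

def ks2d_statistic(x, y):
    """Crude 2-D K-S-style two-sample statistic: the largest difference
    between the samples' quadrant fractions, sweeping the quadrant origin
    over every point of both samples (x and y have shape (n, 2))."""
    d = 0.0
    for ox, oy in np.vstack([x, y]):
        for sx in (1.0, -1.0):          # quadrant orientation in x
            for sy in (1.0, -1.0):      # quadrant orientation in y
                fx = np.mean((sx * (x[:, 0] - ox) > 0) & (sy * (x[:, 1] - oy) > 0))
                fy = np.mean((sx * (y[:, 0] - ox) > 0) & (sy * (y[:, 1] - oy) > 0))
                d = max(d, abs(fx - fy))
    return d
```

Significance would again have to come from permutation or Monte Carlo, since the statistic is only approximately distribution-free.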
This CV answer mentions "A two-dimensional extension of the Kolmogorov-Smirnov test" described in the article:
A multivariate Kolmogorov-Smirnov test of goodness of fit, Justel, Peña & Zamar (1997)
This paper presents a distribution-free multivariate Kolmogorov-Smirnov goodness-of-fit test. The test uses a statistic which is built using Rosenblatt's transformation and an algorithm is developed to compute it in the bivariate case. An approximate test, that can be easily computed in any dimension, is also presented. The power of these multivariate tests is studied in a simulation study.
Estimation of Goodness-of-Fit in Multidimensional Analysis Using Distance to Nearest Neighbor, Narsky (2003)
A new method for calculation of goodness of multidimensional fits in particle physics experiments is proposed. This method finds the smallest and largest clusters of nearest neighbors for observed data points. The cluster size is used to estimate the goodness-of-fit and the cluster location provides clues about possible problems with data modeling. The performance of the new method is compared to that of the likelihood method and Kolmogorov-Smirnov test using toy Monte Carlo studies.
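Narsky's statistic is built from clusters of nearest neighbours; its basic ingredient, nearest-neighbour distances compared against Monte Carlo replicates from the model, can be sketched as follows. This is only an illustration of that ingredient, not Narsky's cluster-based method, and `sampler` is a hypothetical model-sampling function of my own:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_distances(points):
    """Distance from each point to its nearest neighbour in the sample."""
    tree = cKDTree(points)
    d, _ = tree.query(points, k=2)   # k=2: nearest neighbour besides the point itself
    return d[:, 1]

def nn_gof_pvalue(data, sampler, n_mc=200, seed=0):
    """Crude Monte Carlo test: compare the mean nearest-neighbour distance
    of `data` to replicates drawn from sampler(n, rng) under the model."""
    rng = np.random.default_rng(seed)
    obs = nn_distances(data).mean()
    reps = np.array([nn_distances(sampler(len(data), rng)).mean()
                     for _ in range(n_mc)])
    # two-sided Monte Carlo p-value around the replicate mean
    extreme = np.sum(np.abs(reps - reps.mean()) >= abs(obs - reps.mean()))
    return (extreme + 1) / (n_mc + 1)
```

Clustered data yield nearest-neighbour distances far below the model's replicates, which is roughly the signal Narsky's cluster statistic exploits.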
A nonparametric test for equality of distributions with mixed categorical and continuous data, Li, Maasoumi & Racine (2009)
In this paper we consider the problem of testing for equality of two density or two conditional density functions defined over mixed discrete and continuous variables. We smooth both the discrete and continuous variables, with the smoothing parameters chosen via least-squares cross-validation. The test statistics are shown to have (asymptotic) normal null distributions. However, we advocate the use of bootstrap methods in order to better approximate their null distribution in finite-sample settings and we provide asymptotic validity of the proposed bootstrap method. Simulations show that the proposed tests have better power than both conventional frequency-based tests and smoothing tests based on ad hoc smoothing parameter selection, while a demonstrative empirical application to the joint distribution of earnings and educational attainment underscores the utility of the proposed approach in mixed data settings.
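In the continuous-only special case, the statistic of Li, Maasoumi & Racine reduces to a sample analogue of the integrated squared difference between two kernel density estimates, with the null distribution approximated by the bootstrap as the abstract advocates. A minimal 1-D sketch of that special case (a fixed bandwidth instead of the paper's cross-validated one, and no discrete-variable smoothing):

```python
import numpy as np

def gauss_kernel(u, h):
    """Gaussian kernel with bandwidth h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

def integrated_sq_diff(x, y, h):
    """Sample analogue of the integrated squared difference between the
    kernel density estimates of 1-D samples x and y."""
    kxx = gauss_kernel(x[:, None] - x[None, :], h).mean()
    kyy = gauss_kernel(y[:, None] - y[None, :], h).mean()
    kxy = gauss_kernel(x[:, None] - y[None, :], h).mean()
    return kxx + kyy - 2 * kxy

def li_style_bootstrap_test(x, y, h=0.5, n_boot=200, seed=0):
    """Bootstrap p-value: resample both samples from the pooled data to
    approximate the finite-sample null distribution."""
    rng = np.random.default_rng(seed)
    obs = integrated_sq_diff(x, y, h)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_boot):
        xb = rng.choice(pooled, size=len(x), replace=True)
        yb = rng.choice(pooled, size=len(y), replace=True)
        if integrated_sq_diff(xb, yb, h) >= obs:
            count += 1
    return obs, (count + 1) / (n_boot + 1)
```

The mixed-data version of the test replaces the Gaussian kernel with a product of continuous and discrete kernels and picks all smoothing parameters by least-squares cross-validation.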