Ball Statistics
Introduction
Three fundamental problems in data mining, statistical analysis, and machine learning are:
- testing whether several distributions differ;
- testing whether random variables are dependent;
- selecting useful variables/features from high-dimensional data.
Ball statistics tackle these problems and offer the following advantages:
- applicable to most kinds of data (e.g., traditional tabular data, brain shapes, functional connectomes, wind directions, and so on);
- insensitive to outliers, distribution-free, and model-free;
- theoretically guaranteed and computationally efficient.
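To make the two-sample problem concrete, the Python sketch below computes the sample Ball divergence directly from our reading of its definition in Pan et al. (2018) (see References). For each pair of points from the same sample, it draws the closed ball centred at the first point that reaches the second, and compares the fractions of each sample that fall inside. A permutation test then turns the statistic into a p-value for equal distributions. The function names (`absolute_distance`, `ball_divergence`, `bd_permutation_test`) are hypothetical and are not the API of the R or Python Ball packages, and the direct O(N^3) loop is meant for illustration, not speed.

```python
import random


def absolute_distance(a, b):
    """Default metric for scalar samples; any metric d(a, b) can be passed instead."""
    return abs(a - b)


def ball_divergence(x, y, dist=absolute_distance):
    """Sample Ball divergence between samples x and y (direct O(N^3) evaluation).

    For every ordered pair of points from the same sample, build the closed ball
    centred at the first point whose radius is the distance to the second point,
    then compare the fraction of x-points and of y-points that fall inside it.
    """
    n, m = len(x), len(y)

    def squared_gaps(sample):
        total = 0.0
        for center in sample:
            for edge in sample:
                radius = dist(center, edge)
                frac_x = sum(dist(center, u) <= radius for u in x) / n
                frac_y = sum(dist(center, v) <= radius for v in y) / m
                total += (frac_x - frac_y) ** 2
        return total / len(sample) ** 2

    # First term: balls built from pairs of x-points; second: pairs of y-points.
    return squared_gaps(x) + squared_gaps(y)


def bd_permutation_test(x, y, num_permutations=199, seed=0, dist=absolute_distance):
    """Permutation test of H0: x and y come from the same distribution."""
    rng = random.Random(seed)
    observed = ball_divergence(x, y, dist)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # relabel the pooled points at random
        if ball_divergence(pooled[:n], pooled[n:], dist) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (num_permutations + 1)


if __name__ == "__main__":
    gen = random.Random(42)
    a = [gen.gauss(0.0, 1.0) for _ in range(15)]
    b = [gen.gauss(1.5, 1.0) for _ in range(15)]
    stat, p_value = bd_permutation_test(a, b, num_permutations=99)
    print(f"BD = {stat:.4f}, permutation p-value = {p_value:.3f}")
```

Since the statistic only checks whether one distance is at most another, applying any strictly increasing transformation to the distances leaves it unchanged, and the permutation p-value assumes no parametric model. `dist` can be any metric, for example an angular distance for wind directions.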
Software
R package
Install the Ball package from CRAN:
install.packages("Ball")
Comparison with selected R packages that handle data in metric spaces:
| fastmit | energy | HHG | Ball | |
|---|---|---|---|---|
| Test of equal distributions | ||||
| Test of independence | ||||
| Test of joint independence | ||||
| Feature screening / Sure Independence Screening (SIS) | ||||
| Iterative Feature screening / Iterative SIS | ||||
| Datasets in metric spaces | SNT | |||
| Robustness | ||||
| Parallel programming | ||||
| Computational efficiency | ||||
SNT stands for strong negative type, i.e., that package only handles metric spaces of strong negative type.
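The independence and feature-screening rows of the table rest on Ball covariance. The sketch below evaluates the squared sample Ball covariance with unit weights, following our reading of Pan et al. (2019): for each pair (i, j) it compares the joint frequency of falling in both balls, centred at X_i and Y_i with radii d(X_i, X_j) and d(Y_i, Y_j), against the product of the marginal frequencies. The `ball_screen` helper is hypothetical: it keeps the `d` features with the largest scores, ranking by Ball covariance for brevity, whereas the screening procedure of Pan, Wang, Xiao & Zhu (2018) may rank by a normalized measure. None of these names belong to the Ball packages.

```python
import random


def absolute_distance(a, b):
    """Default metric for scalar observations; any metric d(a, b) can be passed instead."""
    return abs(a - b)


def ball_covariance_sq(x, y, dist_x=absolute_distance, dist_y=absolute_distance):
    """Squared sample Ball covariance of paired samples x and y (O(n^3), unit weights)."""
    n = len(x)
    dx = [[dist_x(a, b) for b in x] for a in x]
    dy = [[dist_y(a, b) for b in y] for a in y]
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Balls centred at X_i and Y_i with radii d(X_i, X_j) and d(Y_i, Y_j).
            count_x = count_y = count_joint = 0
            for k in range(n):
                in_x = dx[i][k] <= dx[i][j]
                in_y = dy[i][k] <= dy[i][j]
                count_x += in_x
                count_y += in_y
                count_joint += in_x and in_y
            # Joint ball frequency minus the product of marginal frequencies.
            gap = count_joint / n - (count_x / n) * (count_y / n)
            total += gap ** 2
    return total / n ** 2


def ball_screen(features, response, d):
    """Hypothetical helper: indices of the d features with the largest Ball covariance."""
    scores = [ball_covariance_sq(feature, response) for feature in features]
    ranked = sorted(range(len(features)), key=lambda idx: scores[idx], reverse=True)
    return ranked[:d]


if __name__ == "__main__":
    gen = random.Random(7)
    n = 30
    signal = [gen.uniform(-1.0, 1.0) for _ in range(n)]
    # Non-linear, non-monotone link between the signal feature and the response.
    response = [s ** 2 + 0.05 * gen.gauss(0.0, 1.0) for s in signal]
    noise = [[gen.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(4)]
    print(ball_screen(noise + [signal], response, d=2))
```

For a formal test of independence, the same permutation idea as in a two-sample test applies: shuffle `y` relative to `x` and recompute the statistic.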
See the following documents for more details about the Ball package:
- GitHub page (short)
- vignette (moderate)
- JSS paper (detailed)
Python package
Install the Ball package from PyPI:
pip install Ball
References
- Pan, W., Tian, Y., Wang, X., and Zhang, H. (2018). Ball divergence: Nonparametric two sample test. The Annals of Statistics, 46(3), 1109-1137. doi:10.1214/17-AOS1579. https://projecteuclid.org/euclid.aos/1525313077
- Pan, W., Wang, X., Xiao, W., and Zhu, H. (2018). A generic sure independence screening procedure. Journal of the American Statistical Association. doi:10.1080/01621459.2018.1462709
- Pan, W., Wang, X., Zhang, H., Zhu, H., and Zhu, J. (2019). Ball covariance: A generic measure of dependence in Banach space. Journal of the American Statistical Association. doi:10.1080/01621459.2018.1543600
- Zhu, J., Pan, W., Zheng, W., and Wang, X. (2018). Ball: An R package for detecting distribution difference and association in metric spaces. arXiv preprint arXiv:1811.03750. http://arxiv.org/abs/1811.03750
Bug report
Open an issue or send an email to Jin Zhu at zhuj37@mail2.sysu.edu.cn