Add multivariate hypergeometric distribution #1963
Open
Overview
This pull request adds the multivariate hypergeometric distribution, a generalization of the hypergeometric distribution. It includes:
- `src/multivariate/mvhypergeom.jl`, which implements `MvHypergeom` as a subtype of `DiscreteMultivariateDistribution`.
- `src/samplers/mvhypergeom.jl`, which implements sampling.
- `test/multivariate/mvherpgeom.jl`.
Motivation
The multivariate hypergeometric distribution is an important distribution in statistics for testing independence in contingency tables. It is implemented in the `numpy` and `scipy` Python packages, but it is not currently supported in `Distributions.jl`.
Implementation details
The type `MvHypergeometric` is created as a subtype of `DiscreteMultivariateDistribution`. Functions for the mean, variance and covariance matrix are implemented, as are evaluation of the log pdf and sampling. Sampling is implemented in the file `src/samplers/mvhypergeom.jl`. The procedure is analogous to sampling from a multinomial distribution: the entries are sampled sequentially, each from a univariate hypergeometric distribution conditioned on the entries already drawn.
Testing
Tests are included in `test/multivariate/mvherpgeom.jl`. The statistics, pdf and sampling are all tested. The pdf is also compared to the pdf of the univariate hypergeometric distribution, because the marginal and conditional distributions of the multivariate hypergeometric are univariate hypergeometric (as used in the sampling).
Dependencies
No new dependencies are added.
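Illustrative sketch
For readers who want a concrete picture of the sequential sampling described under Implementation details, here is a minimal sketch. It is not the code in this pull request: the function names `sketch_mvhypergeom_rand` and `sketch_mvhypergeom_logpdf` and the argument names `m` (category sizes) and `n` (number of draws) are hypothetical and chosen only for illustration. It assumes the existing `Hypergeometric(s, f, n)` parameterization in `Distributions.jl` (successes, failures, draws).

```julia
using Distributions
using Random

# Hypothetical helper (not the PR's implementation): draw one vector of
# category counts when n items are taken without replacement from an urn
# that holds m[i] items of category i.
function sketch_mvhypergeom_rand(rng::AbstractRNG, m::Vector{Int}, n::Int)
    0 <= n <= sum(m) || throw(ArgumentError("n must lie between 0 and sum(m)"))
    x = zeros(Int, length(m))
    items_left = sum(m)  # items in categories not yet processed
    draws_left = n       # draws not yet allocated to a category
    for i in 1:length(m)
        items_left -= m[i]
        # Given the earlier counts, x[i] is univariate hypergeometric with
        # m[i] "successes", items_left "failures" and draws_left draws.
        x[i] = rand(rng, Hypergeometric(m[i], items_left, draws_left))
        draws_left -= x[i]
    end
    return x
end

# Hypothetical helper: log pdf built from the same chain of conditional
# univariate hypergeometric distributions.
function sketch_mvhypergeom_logpdf(m::Vector{Int}, n::Int, x::Vector{Int})
    in_support = length(x) == length(m) && sum(x) == n && all(0 .<= x .<= m)
    in_support || return -Inf
    lp = 0.0
    items_left = sum(m)
    draws_left = n
    for i in 1:length(m)
        items_left -= m[i]
        lp += logpdf(Hypergeometric(m[i], items_left, draws_left), x[i])
        draws_left -= x[i]
    end
    return lp
end
```

Calling `sketch_mvhypergeom_rand(MersenneTwister(1), [5, 3, 2], 4)` returns a length-3 count vector that sums to 4 and never exceeds the category sizes. The log pdf helper reuses the same conditional factorization, which is the property the tests rely on when comparing against the univariate hypergeometric pdf.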