In [1]:
pwd()

"/mnt/Data/projects/personal/scratch-code/julia_stuff/tpoisson"

In [2]:
using Pkg
Pkg.activate(".")

[32m[1m Activating[22m[39m environment at `/mnt/Data/projects/personal/scratch-code/julia_stuff/tpoisson/Project.toml`


# Big Idea

Take some correlated gamma sims, use them as the rates of Poisson draws, and now you magically have correlated negative binomial sims. Wow!
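The gamma-to-Poisson step can be checked for a single margin. This is only a sketch under stated assumptions: it uses the `Distributions.jl` parameterization, where `Gamma(r, θ)` takes a shape and a scale, and it relies on the standard result that a Poisson with a `Gamma(r, (1 - p) / p)` rate is marginally `NB(r, p)`. The names `r`, `p`, `θ`, `λ`, and `y` are illustrative, and nothing here handles the correlation between margins.

```julia
using Distributions, Statistics, Random

Random.seed!(1)                       # fixed seed so the sketch is repeatable
r, p = 23, 0.14                       # illustrative size and probability
θ = (1 - p) / p                       # gamma scale that corresponds to NB(r, p)
λ = rand(Gamma(r, θ), 100_000)        # gamma-distributed Poisson rates
y = rand.(Poisson.(λ))                # one Poisson draw per rate

# The simulated mean should sit close to the negative binomial mean r(1 - p)/p
(mean(y), mean(NegativeBinomial(r, p)))
```

If the mixture works as described, the two numbers in the returned tuple should be close to each other.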

Now we need to calculate the PMF of the multivariate negative binomial sims. How?

Magic!

In [3]:
using Distributions, Statistics, LinearAlgebra, SpecialFunctions

Okay, not really. Say we have a set of target negative binomial distributions. By this I mean we want to end up with margins that have a set of target parameters and a desired correlation structure.

Negative Binomial (`NB`) distributions are parameterized by a *size* $r$ and a *probability* $p$. So we need to set some target parameters.
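For reference, and assuming the `Distributions.jl` convention (where $Y$ counts the failures before the $r^{th}$ success), the PMF and first two moments are

$$
g(y) = \frac{\Gamma(y + r)}{y!\,\Gamma(r)}\, p^r (1 - p)^y, \qquad
\mathbb{E}(Y) = \frac{r(1 - p)}{p}, \qquad
\mathrm{Var}(Y) = \frac{r(1 - p)}{p^2}
$$

As a quick worked example, $r = 23$ and $p = 0.14$ give a mean of $23 \cdot 0.86 / 0.14 \approx 141.3$.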

Smaller probabilities allow for a wider range of possible correlations.

In [4]:
target_probs = (0.14, 0.08)

(0.14, 0.08)

In [5]:
target_sizes = (23, 30)

(23, 30)

In [6]:
margins = Tuple(NegativeBinomial(r, p) for (r, p) in zip(target_sizes, target_probs))

(NegativeBinomial{Float64}(r=23.0, p=0.14), NegativeBinomial{Float64}(r=30.0, p=0.08))

In [7]:
Y = median.(margins)

(139, 341)

If we assume that the margins are independent, then the joint PMF is just the product of the marginal PMFs, each evaluated at its own value.

In [8]:
independent_logPMF(margins, Y) = sum([logpdf(m, y) for (m, y) in zip(margins, Y)])
independent_PMF(margins, Y) = exp(independent_logPMF(margins, Y))

independent_PMF (generic function with 1 method)

In [9]:
independent_PMF(margins, Y)

7.74716982679846e-5

In [10]:
scale_factors = √(prod(1 .- target_probs))

0.8894942383174834

In [11]:
sum([independent_PMF(margins, [x, y]) for x in 0:2000 for y in 0:2000])

0.999999999999989

# Setup

There are these two equations from this one paper, **Eq_14** and **Eq_29**, that can somehow determine the probability mass function of correlated NB distributions. That doesn't sound too amazing until you consider that holy carp! we have a correlated MvNB distribution!

Okay, so what's the deal?

Well, there's this beast of an equation, **Eq_14**:

$$
\mathbb{P}(Y = y) = \mathbb{E}\left\lbrace c\left(F_1(X_1), \ldots, F_d(X_d)\right)\right\rbrace \cdot \prod_{i=1}^d g_i (y_i)
$$

where $g_i(\cdot)$ is the PMF of the $i^{th}$ target negative binomial margin. In this bivariate case, the margins are

$$
\begin{align*}
Y_1 &\sim \mathrm{NB}(23, 0.14) \\
Y_2 &\sim \mathrm{NB}(30, 0.08)
\end{align*}
$$

The next part is the expected value of the copula density $c$. For a Gaussian copula with correlation matrix $R$, the density is given by **Eq_29**:

$$
c_R(u_1, \ldots, u_d) = \frac{1}{|{R}|^{1/2}}e^{-\frac{1}{2}(\Phi^{-1}(u))^T (R^{-1} - I_d) \Phi^{-1}(u)},\quad u=(u_1, \ldots, u_d)^T \in [0, 1]^d
$$

Essentially it's the PDF of a multivariate Gaussian with correlation matrix $R$, divided by the product of its standard normal margins, which is why $I_d$ gets subtracted from $R^{-1}$ in the exponent. Let's try to write it as a function of $u$ and $R$.

In [12]:
# Standard normal quantile, broadcast over the elements of u
Φ⁻¹(u) = quantile.(Normal(), u)
# Gaussian copula density from Eq_29
cR(u, R) = inv(√det(R)) * exp(-0.5 * Φ⁻¹(u)' * (inv(R) - I) * Φ⁻¹(u))

cR (generic function with 1 method)
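As a sanity check on **Eq_29**, the closed form can be compared against the ratio of a multivariate normal density to the product of standard normal densities. This is a minimal sketch: `copula_density_ratio` and `copula_density_closed` are hypothetical helper names, and `R_check` and `u_check` are arbitrary illustrative values, not taken from the paper.

```julia
using Distributions, LinearAlgebra

# Hypothetical helper: copula density written as a ratio of densities
function copula_density_ratio(u, R)
    z = quantile.(Normal(), u)                        # map uniforms to normal scores
    joint = pdf(MvNormal(zeros(length(u)), R), z)     # joint Gaussian density
    return joint / prod(pdf.(Normal(), z))            # divide by the normal margins
end

# Hypothetical helper: the closed form from Eq_29
function copula_density_closed(u, R)
    z = quantile.(Normal(), u)
    return exp(-0.5 * dot(z, (inv(R) - I) * z)) / sqrt(det(R))
end

R_check = [1.0 0.3; 0.3 1.0]    # illustrative correlation matrix
u_check = [0.2, 0.7]            # illustrative point in the unit square
copula_density_ratio(u_check, R_check) ≈ copula_density_closed(u_check, R_check)
```

The two forms should agree up to floating-point error, and with the identity matrix in place of `R_check` the closed form should reduce to 1.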

In [13]:
U = rand(10000, 2)
Φ⁻¹(U[1,:])

2-element Array{Float64,1}:
 0.9848110654261236
 0.06075252162771529

In [14]:
ρ = 0.0
R = Float64[1 ρ; ρ 1]

2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0

In [24]:
U = rand(1, 2);
cR_val = mapslices(x -> cR(x, R), U; dims=2);
E = mean(cR_val)

1.0
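One caveat about this estimate: a copula density integrates to 1 over the unit square, so its average over independent uniform draws should be close to 1 for any valid $R$, not only for $\rho = 0$. The sketch below checks that with an illustrative $\rho = 0.3$. The names `copula_density`, `R_demo`, `U_demo`, and `E_demo` are hypothetical. Whether independent uniforms are the right inputs is a separate question: the expectation in **Eq_14** is taken over $X$, and this section does not spell out how $X$ is distributed.

```julia
using Distributions, LinearAlgebra, Statistics, Random

Random.seed!(42)                         # fixed seed, for repeatability only

# Same closed form as Eq_29; `copula_density` is a hypothetical name
function copula_density(u, R)
    z = quantile.(Normal(), u)
    return exp(-0.5 * dot(z, (inv(R) - I) * z)) / sqrt(det(R))
end

R_demo = [1.0 0.3; 0.3 1.0]              # illustrative correlation, not from the source
U_demo = rand(100_000, 2)                # independent uniforms, one row per draw

# Monte Carlo average of the copula density over the uniform draws
E_demo = mean(copula_density(row, R_demo) for row in eachrow(U_demo))
```

If `E_demo` comes out near 1 here as well, then an estimate built from independent uniforms cannot move the joint PMF away from the independent case.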

In [16]:
# Joint PMF per Eq_14: copula expectation E times the product of the marginal PMFs
myPDF(x, y) = exp(log(E) + sum(logpdf(m, v) for (m, v) in zip(margins, (x, y))))

myPDF (generic function with 1 method)

In [17]:
myCDF(a, b) = sum(myPDF(u, v) for u in 0:a for v in 0:b)

myCDF (generic function with 1 method)

In [18]:
myCDF(median(margins[1]), median(margins[2]))

0.2541964274438002

In [19]:
independent_PMF(margins, Y)

7.74716982679846e-5

In [20]:
myPDF(Y...)

7.74716982679846e-5
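The two values match, which is what **Eq_14** predicts when $\rho = 0$. Working it through with **Eq_29**:

$$
\begin{align*}
R = I_d \;&\Rightarrow\; R^{-1} - I_d = 0 \ \text{ and } \ |R| = 1 \\
&\Rightarrow\; c_R(u_1, \ldots, u_d) = 1 \ \text{ for all } u \in [0, 1]^d \\
&\Rightarrow\; \mathbb{P}(Y = y) = 1 \cdot \prod_{i=1}^d g_i(y_i)
\end{align*}
$$

So with zero correlation the formula collapses to the independent joint PMF computed earlier.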