# Mandatory imports and utils

In [None]:
{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
import Control.Monad
import Control.Monad.Primitive

import qualified Data.Vector.Unboxed as U

import Numeric.SpecFunctions
import Numeric.MathFunctions.Constants
import Numeric.MathFunctions.Comparison
import Numeric.Polynomial.Chebyshev

import Text.Printf(printf)

import IHaskell.Display
import Graphics.Rendering.Chart.Backend.Cairo
import Graphics.Rendering.Chart.Easy

:l NB/Plot

# Incomplete beta

A quick reminder about the beta function and the (regularized) incomplete beta function:

Beta function:
$$B(a,b) = \int_0^1 t^{a-1}(1 - t)^{b-1} \,dt $$

Incomplete beta:
$$B(x; a,b) = \int_0^x t^{a-1}(1 - t)^{b-1} \,dt \qquad x \in [0,1]$$

Regularized incomplete beta (from now on referred to simply as the incomplete beta):
$$I(x; a,b) = \frac{B(x; a,b)}{B(a,b)}$$
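As a slow but transparent reference for these definitions, here is a minimal sketch that evaluates $I(x; a,b)$ straight from the integrals with composite Simpson's rule. It is not how `incompleteBeta` works, it assumes $a, b \ge 1$ (so the integrand has no endpoint singularity), and all names in it (`simpson`, `regIncBetaQuad`, ...) are hypothetical helpers for this notebook.

```haskell
-- Hypothetical reference implementation: quadrature, not math-functions' algorithm.

-- | Composite Simpson's rule for f on [lo, hi] with n subintervals (n even).
simpson :: (Double -> Double) -> Double -> Double -> Int -> Double
simpson f lo hi n = h / 3 * (f lo + f hi + 4 * odds + 2 * evens)
  where
    h      = (hi - lo) / fromIntegral n
    node i = lo + fromIntegral i * h
    odds   = sum [f (node i) | i <- [1, 3 .. n - 1]]
    evens  = sum [f (node i) | i <- [2, 4 .. n - 2]]

-- | Number of Simpson subintervals (an arbitrary choice for this sketch).
simpsonSteps :: Int
simpsonSteps = 10000

-- | Integrand of the beta function, t^(a-1) * (1-t)^(b-1).
betaIntegrand :: Double -> Double -> Double -> Double
betaIntegrand a b t = t ** (a - 1) * (1 - t) ** (b - 1)

-- | Non-regularized incomplete beta B(x; a, b) by quadrature (assumes a, b >= 1).
incBetaQuad :: Double -> Double -> Double -> Double
incBetaQuad a b x = simpson (betaIntegrand a b) 0 x simpsonSteps

-- | Regularized incomplete beta I(x; a, b) = B(x; a, b) / B(a, b).
regIncBetaQuad :: Double -> Double -> Double -> Double
regIncBetaQuad a b x = incBetaQuad a b x / incBetaQuad a b 1
```

It is far too slow for real use, but handy for spot-checking values such as $I(0.5; a,a) = 0.5$ or the identity $I(x; a,b) = 1 - I(1-x; b,a)$.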


## Debugging of [math-functions#36](https://github.com/bos/math-functions/issues/36)

This bug was again uncovered by the plot of the `cumulative . quantile` roundtrip error for the beta distribution, where it manifested itself as a sharp spike near `p = 0.5`. Expressed in terms of `incompleteBeta` and `invIncompleteBeta`:

In [None]:
let fun x = let p  = invIncompleteBeta 4.5 4.5 x
                x' = incompleteBeta 4.5 4.5 p
            in x'
toRenderable
  $ plotFunctions [\x -> logBase 10 $ relativeError (fun x) x] (0, 1)

Note that both the incomplete beta and its inverse are well behaved near 0.5, as the plots below show. So the spike is clearly a bug.

In [None]:
toRenderable
  $ layout_title .~ "incomplete beta"
  $ plotFunctions [incompleteBeta 4.5 4.5] (0,1)
toRenderable
  $ layout_title .~ "inverse incomplete beta"
  $ plotFunctions [invIncompleteBeta 4.5 4.5] (0,1)

A close-up of the roundtrip error:

In [None]:
let fun x = let p  = invIncompleteBeta 4.5 4.5 x
                x' = incompleteBeta 4.5 4.5 p
            in x'
toRenderable
  $ let d = 5e-9 
     in plotFunctions [\x -> logBase 10 $ relativeError (fun x) x] (0.5-d, 0.5+d)

So what does `incompleteBeta` look like in the neighborhood of 0.5?

In [None]:
toRenderable
  $ let d = 5e-9 
        a = 4        
     in plotFunctions [incompleteBeta a a] (0.5 - d, 0.5 + d)

One of the properties of the incomplete beta is:

  $$I(x; a,b) = 1 - I(1-x; b,a)$$
  

Also, this implementation evaluates the series directly only when $x \le \frac{a}{a+b}$; otherwise it computes $1 - I(1-x; b,a)$.
Since $a = b$ here, $\frac{a}{a+b} = 0.5$, so the approximations switch exactly at `x = 0.5`, which should explain the jump in the plot.
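
A toy sketch of why switching branches can produce a jump. Everything here is hypothetical: for $a = b = 2$ the exact value has the closed form $I(x; 2,2) = 3x^2 - 2x^3$, `sloppyWorker` stands in for a direct series result that is off by a small relative error, and `dispatch` mimics the shape of the guards in `incompleteBeta_` (the direct branch is taken when `p >= (p+q) * x`). With an exact worker the two branches meet continuously at 0.5; with a relative error $\varepsilon$ they miss each other by about $\varepsilon$.

```haskell
-- Hypothetical toy model of the branch switch in incompleteBeta_ for a = b = 2.

-- | Exact I(x; 2, 2) = 3x^2 - 2x^3.
exactI22 :: Double -> Double
exactI22 x = x * x * (3 - 2 * x)

-- | Stand-in for the direct series: exact value scaled by (1 + relErr).
sloppyWorker :: Double -> Double -> Double
sloppyWorker relErr x = (1 + relErr) * exactI22 x

-- | Same guard structure as incompleteBeta_ with p = q = 2:
-- direct branch for x <= p/(p+q) = 0.5, otherwise the identity
-- I(x; a, b) = 1 - I(1-x; b, a).
dispatch :: Double -> Double -> Double
dispatch relErr x
  | x <= 0.5  = sloppyWorker relErr x
  | otherwise = 1 - sloppyWorker relErr (1 - x)

-- | Value just right of 0.5 minus value just left of 0.5.
jumpAtHalf :: Double -> Double
jumpAtHalf relErr = dispatch relErr (0.5 + h) - dispatch relErr (0.5 - h)
  where
    h = 1e-12
```

For instance, `jumpAtHalf 0` is only the genuine slope over the tiny window (about `3e-12`), while `jumpAtHalf 1e-8` comes out close to `-1e-8`.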
Let's compare `incompleteBeta`'s output with exact values. Setting $x = 0.5$ and $b = a$ in the identity above gives:

$$I(0.5; a,a) = 0.5$$

And 

$$\frac{d}{dx}I(x; a,b) = \frac{1}{B(a,b)} x^{a-1}(1-x)^{b-1}$$

So in a small neighborhood of 0.5 we can approximate $I(x; a,a)$ as $0.5 + k\cdot(x-0.5)$, where

$$k = \frac{1}{2^{2a-2} B(a,a)}$$
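
Spelling out the step: substituting $b = a$ and $x = \tfrac12$ into the derivative, and checking the result against the closed form $I(x; 2,2) = 3x^2 - 2x^3$ for $a = 2$:

$$\left.\frac{d}{dx}I(x; a,a)\right|_{x=1/2} = \frac{1}{B(a,a)}\left(\tfrac12\right)^{a-1}\left(\tfrac12\right)^{a-1} = \frac{1}{2^{2a-2} B(a,a)} = k$$

$$a = 2:\qquad B(2,2) = \frac{\Gamma(2)\Gamma(2)}{\Gamma(4)} = \frac{1}{6}, \qquad k = \frac{1}{2^{2}\cdot\frac{1}{6}} = \frac{3}{2}, \qquad \left.\left(6x - 6x^2\right)\right|_{x=1/2} = \frac{3}{2}$$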

In [None]:
toRenderable
  $ let d = 5e-9 
        a = 7.5
        k = 1 / (exp (logBeta a a) * 2**(2*a - 2))
     in plotFunctions [ incompleteBeta a a
                      , \x -> 0.5 + k*(x-0.5)
                      ] (0.5 - d, 0.5 + d)

Hmm... `incompleteBeta` clearly fails to converge to the exact answer. What could cause this?

Let's also plot how the size of the gap depends on `a`:

In [None]:
toRenderable $
  let fun a = incompleteBeta a a (addUlps   1  0.5)
            - incompleteBeta a a (addUlps (-1) 0.5)
  in plotFunctions [fun] (1e-3,11)

It definitely looks like something that fails to converge. Also notice that the issue suddenly disappears when `a > 10`, as if we were switching approximations. But `incompleteBeta` has no switch between approximations at `a = 10`! Very strange.

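But maybe the problem is caused by not including enough terms in the expansion? Let's play with the convergence criterion (`eps` in the definition of `incompleteBetaWorker`). The cell below defines a local variant `incompleteBeta'` with `eps = m_epsilon / 128`: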

In [None]:
incompleteBeta' :: Double -- ^ /p/ > 0
                -> Double -- ^ /q/ > 0
                -> Double -- ^ /x/, must lie in [0,1] range
                -> Double
incompleteBeta' p q = incompleteBeta_ (logBeta p q) p q

-- | Regularized incomplete beta function. Same as 'incompleteBeta'
-- but also takes logarithm of beta function as parameter.
incompleteBeta_ :: Double -- ^ logarithm of beta function for given /p/ and /q/
                -> Double -- ^ /p/ > 0
                -> Double -- ^ /q/ > 0
                -> Double -- ^ /x/, must lie in [0,1] range
                -> Double
incompleteBeta_ beta p q x
  | p <= 0 || q <= 0            =
      error $ printf "incompleteBeta_: p <= 0 || q <= 0. p=%g q=%g x=%g" p q x
  | x <  0 || x >  1 || isNaN x =
      error $ printf "incompleteBeta_: x out of [0,1] range. p=%g q=%g x=%g" p q x
  | x == 0 || x == 1            = x
  | p >= (p+q) * x   = incompleteBetaWorker beta p q x
  | otherwise        = 1 - incompleteBetaWorker beta q p (1 - x)

-- Worker for incomplete beta function. It is separate function to
-- avoid confusion with parameter during parameter swapping
incompleteBetaWorker :: Double -> Double -> Double -> Double -> Double
incompleteBetaWorker beta p q x
  -- For very large p and q this method becomes very slow, so the
  -- library switches to another method (incompleteBetaApprox).
  -- That method is not reproduced in this notebook.
  | p > 3000 && q > 3000 = error "incompleteBetaApprox beta p q x"
  | otherwise            = loop (p+q) (truncate $ q + cx * (p+q)) 1 1 1
  where
    -- Constants
    eps = m_epsilon / 128
    cx  = 1 - x
    -- Loop
    loop !psq (ns :: Int) ai term betain
      | done      = betain' * exp( p * log x + (q - 1) * log cx - beta) / p
      | otherwise = loop psq' (ns - 1) (ai + 1) term' betain'
      where
        -- New values
        term'   = term * fact / (p + ai)
        betain' = betain + term'
        fact | ns >  0   = (q - ai) * x/cx
             | ns == 0   = (q - ai) * x
             | otherwise = psq * x
        -- Iterations are complete
        done = db <= eps && db <= eps*betain' where db = abs term'
        psq' = if ns < 0 then psq + 1 else psq

-- Create plots
toRenderable
  $ let d = 5e-9 
        a = 7.5
        k = 1 / (exp (logBeta a a) * 2**(2*a - 2))
     in plotFunctions [ incompleteBeta a a
                      , \x -> 0.5 + k*(x-0.5)
                      , incompleteBeta' a a
                      ] (0.5 - d, 0.5 + d)

toRenderable $
  let fun a = incompleteBeta' a a (addUlps   1  0.5)
            - incompleteBeta' a a (addUlps (-1) 0.5)
  in plotFunctions [logBase 10 . abs . fun] (1e-3,11)

No effect!

The next lead is `a = 10`. Is there such a condition anywhere in the code? Yes! In `logBeta`, and `incompleteBeta` does use `logBeta`:

```
logBeta' :: Double -> Double -> Double
logBeta' a b
    | ...
    | p >= 10   = ...
    | q >= 10   = ...
    | otherwise = logGamma p + logGamma q - logGamma pq
    where
      p   = min a b
      q   = max a b
      ...
```
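
Why would `logBeta` matter at all? In `incompleteBetaWorker` the series sum is multiplied by `exp( p * log x + (q - 1) * log cx - beta)`, where `beta` is the precomputed $\log B(p,q)$. Suppose, as a simplifying assumption, that `beta` carries an absolute error $\delta$ and nothing else is wrong. Then every value returned by the worker is scaled by $e^{-\delta}$. Writing $\tilde I$ for the computed value, and noting that for $a = b$ both branches receive the same `beta`:

$$\tilde I(0.5^-; a,a) \approx \tfrac12 e^{-\delta}, \qquad \tilde I(0.5^+; a,a) \approx 1 - \tfrac12 e^{-\delta}$$

$$\tilde I(0.5^+; a,a) - \tilde I(0.5^-; a,a) \approx 1 - e^{-\delta} \approx \delta$$

In this model the jump is roughly the absolute error of `logBeta`, no matter how many series terms are summed, which would be consistent with tightening `eps` having no effect.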

So when `a < 10` we compute $B(a,a)$ as $\Gamma(a)\Gamma(a)/\Gamma(2a)$ (in log space) using `logGamma`, and `logGamma`'s documentation clearly states that it doesn't have full double precision. Looks like the culprit! Let's try it, using `logGammaL` instead of `logGamma` in the last branch:

In [None]:
-- | Compute the natural logarithm of the beta function.
logBeta' :: Double -> Double -> Double
logBeta' a b
    | p < 0     = m_NaN
    | p == 0    = m_pos_inf
    | p >= 10   = log q * (-0.5) + m_ln_sqrt_2_pi + logGammaCorrection p + c +
                  (p - 0.5) * log ppq + q * log1p(-ppq)
    | q >= 10   = logGamma p + c + p - p * log pq + (q - 0.5) * log1p(-ppq)
    | otherwise = logGammaL p + logGammaL q - logGammaL pq
    where
      p   = min a b
      q   = max a b
      ppq = p / pq
      pq  = p + q
      c   = logGammaCorrection q - logGammaCorrection pq

-- | Compute the log gamma correction factor for @x@ ≥ 10.  This
-- correction factor is suitable for an alternate (but less
-- numerically accurate) definition of 'logGamma':
--
-- >lgg x = 0.5 * log(2*pi) + (x-0.5) * log x - x + logGammaCorrection x
logGammaCorrection :: Double -> Double
logGammaCorrection x
    | x < 10    = m_NaN
    | x < big   = chebyshevBroucke (t * t * 2 - 1) coeffs / x
    | otherwise = 1 / (x * 12)
  where
    big    = 94906265.62425156
    t      = 10 / x
    coeffs = U.fromList [
               0.1666389480451863247205729650822e+0,
              -0.1384948176067563840732986059135e-4,
               0.9810825646924729426157171547487e-8,
              -0.1809129475572494194263306266719e-10,
               0.6221098041892605227126015543416e-13,
              -0.3399615005417721944303330599666e-15,
               0.2683181998482698748957538846666e-17
             ]

incompleteBeta'' :: Double -- ^ /p/ > 0
                 -> Double -- ^ /q/ > 0
                 -> Double -- ^ /x/, must lie in [0,1] range
                 -> Double
incompleteBeta'' p q = incompleteBeta_ (logBeta' p q) p q

-----------------------------------------------------------------------------------
toRenderable
  $ let d = 5e-9 
        a = 7.5
        k = 1 / (exp (logBeta a a) * 2**(2*a - 2))
     in plotFunctions [ incompleteBeta a a
                      , \x -> 0.5 + k*(x-0.5)
                      , incompleteBeta'' a a
                      ] (0.5 - d, 0.5 + d)

toRenderable $
  let fun a = incompleteBeta'' a a (addUlps   1  0.5)
            - incompleteBeta'' a a (addUlps (-1) 0.5)
  in plotFunctions [logBase 10 . abs . fun] (1e-3,11)

That's it! So the problem is the insufficient precision of `logBeta` for `a < 10`. Switching to `logGammaL` fixes it, but we still need to check performance and look for other methods. In particular, we need to understand what `logGammaCorrection` does.
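
For the follow-up it may help to see what a Lanczos-based log-gamma looks like; according to its documentation, `logGammaL` uses a Lanczos approximation. The sketch below is *not* that implementation: it is a stand-in built on the widely published $g = 7$, 9-coefficient set, plus a hypothetical `logBetaLanczos` with the same shape as the `otherwise` branch of `logBeta'`. It assumes $x \ge 0.5$ (no reflection formula).

```haskell
-- Sketch only: a textbook Lanczos approximation (g = 7, 9 coefficients).
-- Not claimed to be the algorithm behind math-functions' logGammaL.

-- | Leading Lanczos coefficient for g = 7.
lanczosC0 :: Double
lanczosC0 = 0.99999999999980993

-- | Remaining Lanczos coefficients for g = 7, paired with 1, 2, ..., 8.
lanczosCs :: [Double]
lanczosCs =
  [ 676.5203681218851
  , -1259.1392167224028
  , 771.32342877765313
  , -176.61502916214059
  , 12.507343278686905
  , -0.13857109526572012
  , 9.9843695780195716e-6
  , 1.5056327351493116e-7
  ]

-- | log Gamma(x) for x >= 0.5 via
-- Gamma(x) = sqrt(2 pi) * t^(z + 1/2) * exp(-t) * s,  z = x - 1,  t = z + g + 1/2.
logGammaLanczos :: Double -> Double
logGammaLanczos x
  | x < 0.5   = error "logGammaLanczos: this sketch requires x >= 0.5"
  | otherwise = 0.5 * log (2 * pi) + (z + 0.5) * log t - t + log s
  where
    g = 7
    z = x - 1
    t = z + g + 0.5
    s = lanczosC0 + sum (zipWith (\c i -> c / (z + i)) lanczosCs [1 ..])

-- | log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b).
logBetaLanczos :: Double -> Double -> Double
logBetaLanczos a b =
  logGammaLanczos a + logGammaLanczos b - logGammaLanczos (a + b)
```

In the notebook one could compare, say, `logBetaLanczos 4.5 4.5` with `logBeta 4.5 4.5` and `logBeta' 4.5 4.5`. For large arguments the three log-gamma terms are large and nearly cancel, so a formula of this form loses accuracy there; the `p >= 10` and `q >= 10` branches of `logBeta'` use different expressions.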