## MCMC sampling
In the following we compare adaptive and nonadaptive sampling with different proposals and measurement errors $\sigma$.

We start by loading some previous samplings to infer the nonadaptive proposal from their covariance structure.

In [1]:
using GynC: readsamples, tabulate, proposal, Config, Lausanne, batch


In [2]:
ss=readsamples("../data/0729")
tabulate(ss)

UndefVarError: UndefVarError: displaysize not defined

We now compute the proposal covariance matrix from the concatenated samplings.
Note that since these samplings belong to different persons, we may overestimate the desired proposal covariance: the samplings also differ in their means, which inflates the covariance of the concatenated samples.
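The size of this overestimation can be made explicit with the law of total covariance. As a sketch, assume $K$ samplings of equal length with means $\mu_k$ and covariances $\Sigma_k$, each covariance normalized by the sample count (these symbols and the equal-length assumption are introduced here for illustration and are not part of the notebook):

$$ \Sigma_{\text{concat}} = \frac{1}{K}\sum_{k=1}^{K} \Sigma_k + \frac{1}{K}\sum_{k=1}^{K} (\mu_k - \bar\mu)(\mu_k - \bar\mu)^T, \qquad \bar\mu = \frac{1}{K}\sum_{k=1}^{K} \mu_k $$

The second term is positive semidefinite, so the trace of the concatenated covariance can only be larger than or equal to the trace of the averaged covariance.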

In [3]:
prop=proposal(ss)

116x116 Array{Float64,2}:
  0.0447767     0.0166214     0.0122865    …   0.00470154   -3.96231e-5 
  0.0166214     0.0631213     0.00111703       0.00669164    0.000295814
  0.0122865     0.00111703    0.0294749        0.00297939    0.000163624
 -0.0144799    -0.013932      0.00211897      -0.00128058    0.000299357
  0.019061      0.0128569     0.00353484       0.00409058    0.000296154
  0.0104416    -0.00332791   -0.00245846   …   0.00176953   -0.000673728
  0.000169384  -0.0126778    -0.00216748       0.0036454    -0.000231338
  0.00180442    0.0029766    -0.00178019      -0.000285955  -0.000124594
  0.00949314    0.0169925    -0.000450821      0.0019432    -0.000499532
  0.0229441     0.0187565     0.00307691       0.00151511   -0.000418582
  0.00768178    0.00109877    0.00278791   …   0.0029169     0.000472576
  0.00714424    0.0142447    -1.15786e-5      -0.00123643   -0.000523399
 -0.00588239   -0.00063318   -0.0013083        0.00313556    0.000162285
  ⋮                      

In [4]:
trace(prop)

3.527998930059972

We now can compute further samplings with this proposal density.

In [5]:
csman = [Config(Lausanne(p), sigma_rho=r, propvar=prop*scale, adapt=false, thin=100) for p=1:3, r=[0.05, 0.1], scale=[0.05, 0.1, 0.3, 0.5, 1, 2]]

csadapt = [Config(Lausanne(p), sigma_rho=r, propvar=prop/3, adapt=true, thin=100) for p=1:3, r=[0.05, 0.1]]

cs = Config[csman[:]; csadapt[:]]

datadir = "/datanumerik/bzfsikor/gync/0905/"
@spawn batch(cs, [1000, 100_000, 1_000_000, 2_000_000, 5_000_000, 10_000_000, 20_000_000], dir=datadir)

42-element Array{GynC.Config,1}:
 Config:
 patient: l1
 sigma:   0.05
 propvar trace: 0.17639994650299856
 adapt:   false
 thin:    100
 init:    4858380976513133059
 prior:   Tuple{Array{Distributions.Distribution{Distributions.Univariate,S<:Distributions.ValueSupport},1},Distributions.MixtureModel{Distributions.Multivariate,Distributions.Continuous,Distributions.MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}}}}
 Config:
 patient: l2
 sigma:   0.05
 propvar trace: 0.17639994650299856
 adapt:   false
 thin:    100
 init:    4858380976513133059
 prior:   Tuple{Array{Distributions.Distribution{Distributions.Univariate,S<:Distributions.ValueSupport},1},Distributions.MixtureModel{Distributions.Multivariate,Distributions.Continuous,Distributions.MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}}}}
 Config:
 patient: l3
 sigma:   0.05
 propvar trace: 0.17639994650299856
 adapt:   false
 thin:    100
 init:    4858380976513133059
 prior: 

In [35]:
datadir = "/datanumerik/bzfsikor/gync/0905/"
tab = readsamples(datadir) |> tabulate
hcat(tab[[:person, :sigma, :adapt, :tracepropinit]], tab[:length] ./ tab[:unique])

| | person | sigma | adapt | tracepropinit | x1 (length / unique) |
|---|---|---|---|---|---|
| 1 | l1 | 0.05 | false | 0.1763999465029985 | 44.84304932735426 |
| 2 | l1 | 0.05 | false | 0.352799893005997 | 114.28571428571429 |
| 3 | l1 | 0.05 | false | 1.058399679017991 | 303.030303030303 |
| 4 | l1 | 0.05 | true | 1.1759996433533244 | 4.701457451810061 |
| 5 | l1 | 0.05 | false | 1.763999465029986 | 312.5 |
| 6 | l1 | 0.05 | false | 3.527998930059972 | 1190.4761904761904 |
| 7 | l1 | 0.05 | false | 7.055997860119944 | 2857.1428571428573 |
| 8 | l1 | 0.1 | false | 0.17639994650299853 | 2.450379808870375 |
| 9 | l1 | 0.1 | false | 0.352799893005997 | 6.822213125938054 |
| 10 | l1 | 0.1 | false | 1.058399679017991 | 183.8235294117647 |


## Results

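Unfortunately this method does not work too well: in the nonadaptive case the number of unique samples is much lower than with the corresponding adaptive method. For $\sigma = 0.05$ the ratio `length / unique` in the table above is about 4.7 for the adaptive run, but ranges from about 45 to 2857 for the nonadaptive runs.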
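To illustrate why the ratio `length / unique` grows with the proposal scale, here is a minimal sketch of a random-walk Metropolis sampler on a toy target. It is not GynC's sampler: the standard normal target, the function name `length_per_unique`, and all parameter values are assumptions made for this example, it ignores thinning, and it is written for current Julia (1.x) rather than the 0.4 version used in this notebook.

```julia
using Random

# Toy random-walk Metropolis on a standard normal target in `dim` dimensions.
# Returns the chain length divided by the number of distinct states visited.
function length_per_unique(nsteps, dim, scale; rng = MersenneTwister(1))
    logtarget(x) = -sum(abs2, x) / 2
    x = zeros(dim)
    lp = logtarget(x)
    nunique = 1                       # the initial state
    for _ in 1:nsteps
        y = x .+ scale .* randn(rng, dim)
        lq = logtarget(y)
        if log(rand(rng)) < lq - lp   # Metropolis acceptance step
            x, lp = y, lq
            nunique += 1              # every accepted move yields a new state
        end
    end
    return (nsteps + 1) / nunique
end

for scale in (0.1, 0.5, 2.0)
    println("scale = ", scale, "  length/unique = ",
            length_per_unique(20_000, 10, scale))
end
```

A proposal that is too wide gets rejected most of the time, so the chain repeats the same state and the ratio becomes large, which matches the trend in the `tracepropinit` column.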

## Using the average covariances

We can also try the average of the samplings' covariances, which eliminates the overestimation caused by the different sampling means.
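The effect can be checked on synthetic data. The following sketch uses made-up groups with identical unit covariance but shifted means; it relies only on the standard libraries of current Julia (1.x) and does not use GynC's `proposal`.

```julia
using Statistics, LinearAlgebra, Random

rng = MersenneTwister(42)
# Three synthetic "samplings" (rows are samples): unit covariance, shifted means.
groups = [randn(rng, 5000, 2) .+ [m 0.0] for m in (-2.0, 0.0, 2.0)]

pooled   = cov(vcat(groups...))     # covariance of the concatenated samples
averaged = mean(map(cov, groups))   # average of the per-sampling covariances

println("trace pooled:   ", tr(pooled))
println("trace averaged: ", tr(averaged))
```

The pooled trace is larger because it also contains the spread of the group means, while the averaged covariance stays close to the common within-group covariance.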

In [5]:
meanprop = map(proposal, ss) |> mean

116x116 Array{Float64,2}:
  0.00993125   -0.000497752   0.0014299    …   0.000743674  -3.3408e-5 
 -0.000497752   0.0144269     0.00158371       0.00131299   -5.10832e-5
  0.0014299     0.00158371    0.0105035        0.00144064    5.78305e-5
 -0.00198158   -0.00081318    0.00150236      -0.000677366  -1.77825e-5
  0.00173317    0.000516856  -1.34737e-6       0.00108705    5.13792e-5
  2.37634e-5   -0.000434933  -0.000190143  …  -0.000651979   7.19817e-5
 -4.45762e-5   -0.00126062    0.000436126      0.000753306   2.04528e-5
  0.00219808    0.000818994   0.000863762      0.000999007  -1.20646e-5
  0.000672605   0.00153512   -0.000375445      0.00117915   -2.9707e-5 
  0.00178312    0.00069866   -0.000687709      0.00090427    5.95139e-5
  0.000384667   0.0015601     0.000530375  …   0.000593993   1.84736e-5
 -0.000221954   0.0016267    -0.000646137     -0.00029772   -7.28629e-6
  0.000338944  -0.000795736   5.00691e-5       0.00129151    6.52405e-5
  ⋮                                   

In [6]:
prop ./ meanprop

116x116 Array{Float64,2}:
   4.50866    -33.3929    …   -55.457       6.32205     1.18604 
 -33.3929       4.37525       605.322       5.09649    -5.79082 
   8.59253      0.705323        0.0401785   2.0681      2.82938 
   7.30726     17.1328         26.3062      1.89053   -16.8344  
  10.9977      24.8753         12.3858      3.76302     5.76409 
 439.399        7.65154   …    -9.25022    -2.71409    -9.35972 
  -3.79987     10.0569         12.8041      4.8392    -11.3108  
   0.820907     3.63446        -2.14761    -0.286239   10.3273  
  14.114       11.0691         31.4607      1.64797    16.8153  
  12.8674      26.8465       -145.371       1.67551    -7.03335 
  19.9699       0.704295  …     2.35808     4.91067    25.5811  
 -32.188        8.75681        24.1956      4.15298    71.8334  
 -17.3551       0.795716       -1.90063     2.42783     2.48748 
   ⋮                      ⋱                             ⋮       
  -8.67015     29.2517         34.5476      1.28819     6.54013 

## Observation

These two matrices look quite different. We need a way to compare multivariate normal distributions or covariance matrices (a recurring problem).

## A similarity measure for two multivariate normals

Using the geometric intuition of projecting the two ellipses onto each other, define the similarity $s$ as $$ s(\varphi_1, \varphi_2) := \frac{\int \varphi_1 \varphi_2}{\sqrt{\int \varphi_1 \varphi_1 \int \varphi_2 \varphi_2}} $$

We know (https://en.wikipedia.org/wiki/Gaussian_function#Multi-dimensional_Gaussian_function) that 
$$ \int \exp{(-x^TAx)} = \sqrt{\frac{\pi^n}{\det A}}  $$ and thus, with $A = \frac{1}{2}V^{-1}$ and $\det A = 2^{-n} \det V^{-1}$,
$$
\int \varphi= \frac{1}{(2\pi)^\frac{n}{2} \sqrt{\det{V}}} \sqrt{\frac{(2\pi)^n}{\det{V^{-1}}}} = 1 $$

Furthermore, for densities with a common mean, the product of the pdfs $\varphi_1\varphi_2$ is, up to a constant factor, the pdf of the normal distribution with inverse covariance matrix $$V_{12}^{-1} = V_1^{-1}+V_2^{-1} $$
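As a worked sketch, and assuming both densities are centered at the same mean, the integrals in $s$ can be evaluated in closed form. Applying the Gaussian integral to the product gives

$$ \int \varphi_1 \varphi_2 = \frac{1}{(2\pi)^{\frac{n}{2}} \sqrt{\det V_1 \, \det V_2 \, \det(V_1^{-1}+V_2^{-1})}} = \frac{1}{(2\pi)^{\frac{n}{2}} \sqrt{\det(V_1+V_2)}} $$

where the second step uses $\det V_1 \det(V_1^{-1}+V_2^{-1}) \det V_2 = \det(V_1+V_2)$. Setting $V_1 = V_2$ yields $\int \varphi_i \varphi_i = (2\pi)^{-\frac{n}{2}} \det(2V_i)^{-\frac{1}{2}}$, so all factors of $2\pi$ cancel in the quotient:

$$ s(\varphi_1, \varphi_2) = \frac{2^{\frac{n}{2}} \, (\det V_1 \det V_2)^{\frac{1}{4}}}{\sqrt{\det(V_1+V_2)}} $$

For $V_1 = V_2$ this gives $s = 1$, as expected.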

In [7]:
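# similarity s of two normals with a common mean and covariances V1, V2
sim(V1, V2) = intprod(V1, V2) / sqrt(intprod(V1,V1) * intprod(V2,V2))

# integral of exp(-x' * Vinv * x / 2), i.e. of the unnormalized product density;
# the normalization constants of the pdfs cancel in sim
function intprod(V1, V2) 
    n = size(V1, 1)
    Vinv = invcovprod(V1, V2)
    sqrt((2*pi)^n / det(Vinv))
end

# inverse covariance matrix of the product density
invcovprod(V1, V2) = (inv(V1) + inv(V2))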

invcovprod (generic function with 1 method)

In [8]:
function randsympos(n)
  X = rand(n,n)
  X*X'
end

V1 = randsympos(100)
V2 = randsympos(100)
@show det(V1), det(V2)
sim(V1,V2)

(det(V1),det(V2)) = (5.319533928478608e52,1.3210188501718356e47)


1.5303951013324258e-16
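With determinants of this magnitude, evaluating the similarity through `det` and `inv` can overflow or lose precision in higher dimensions. A possible alternative, sketched here for current Julia (1.x), is to work with log-determinants. It uses the identity $s = 2^{n/2} (\det V_1 \det V_2)^{1/4} / \sqrt{\det(V_1+V_2)}$, which follows from the definition of $s$ for normals with a common mean. The names `spd_logdet` and `logsim` are hypothetical helpers, not part of GynC.

```julia
using LinearAlgebra

# log-determinant of a symmetric positive definite matrix via its Cholesky factor
spd_logdet(V) = logdet(cholesky(Symmetric(V)))

# logarithm of the similarity s of two normals with a common mean
function logsim(V1, V2)
    n = size(V1, 1)
    return n / 2 * log(2) + (spd_logdet(V1) + spd_logdet(V2)) / 4 -
           spd_logdet(V1 + V2) / 2
end

A = [2.0 0.5; 0.5 1.0]
B = [1.0 -0.3; -0.3 3.0]
println("log s(A, A) = ", logsim(A, A))   # zero up to rounding
println("s(A, B)     = ", exp(logsim(A, B)))
```

Comparing `logsim` values instead of `sim` values keeps very small similarities, such as the ones seen here for 100-dimensional matrices, on a readable scale.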

In [9]:
tabulate(ss)

UndefVarError: UndefVarError: displaysize not defined

In [10]:
covs = [cov(s.samples) for s in ss]
[sim(a,b) for a in covs, b in covs]

# strange that they are all that different

12x12 Array{Any,2}:
 1.0          9.28706e-21  1.23424e-18  …  2.70521e-18  3.88395e-23
 9.28706e-21  1.0          8.06485e-22     2.1501e-20   4.19068e-23
 1.23424e-18  8.06485e-22  1.0             1.30632e-16  6.08356e-24
 4.01151e-23  2.27259e-23  7.39637e-25     4.03087e-23  2.37113e-23
 1.35408e-20  7.56007e-22  9.05158e-21     1.10256e-19  7.46903e-23
 1.40092e-19  5.91076e-21  2.241e-17    …  5.12076e-18  6.58311e-23
 8.72578e-24  2.33017e-26  9.43486e-17     1.02374e-20  2.14134e-29
 9.502e-21    6.85974e-22  8.50018e-23     6.89425e-21  3.56641e-23
 1.99077e-22  1.15691e-23  2.2995e-24      1.2283e-21   3.64434e-22
 2.84938e-22  8.06172e-23  2.8076e-25      3.15078e-23  3.32214e-23
 2.70521e-18  2.1501e-20   1.30632e-16  …  1.0          2.10963e-23
 3.88395e-23  4.19068e-23  6.08356e-24     2.10963e-23  1.0        

In [11]:
# Now we see that prop and meanprop aren't that different
sim(prop,meanprop)

0.015921692533508155

## Simulation with meanprop

In [15]:
csman = Config[Config(Lausanne(p), sigma_rho=0.1, propvar=meanprop*scale, adapt=false, thin=100) for p=1:3, scale=[0.1, 0.3, 0.5, 0.8, 1, 1.2, 1.5, 2]][:]

datadir = "/datanumerik/bzfsikor/gync/0906/"
@spawn batch(csman, [1000, 100_000, 1_000_000, 2_000_000, 5_000_000, 10_000_000, 20_000_000], dir=datadir)

RemoteRef{Channel{Any}}(1,1,74)

connecting to worker 1 out of 24

srun: job 171 queued and waiting for resources
srun: job 171 has been allocated resources




In [None]:
# simulate for priorest
cs = Config[Config(Lausanne(p), sigma_rho=s, thin=100, adapt=true) for p=1:45, s=[0.1]][:]

datadir = "/datanumerik/bzfsikor/gync/0911/"
batch(cs, [1000, 100_000, 1_000_000, 2_000_000, 5_000_000, 10_000_000, 20_000_000], dir=datadir)

connecting to worker 1 out of 45

srun: job 219 queued and waiting for resources
srun: job 219 has been allocated resources
