# mmCDH23 probabilities for $Ca^{2+}$-site occupancy
This notebook automates MCMC Gibbs sampling of the probability distributions for CDH23 $\mathrm{Ca}^{2+}$ occupancy at the discrete binding sites between repeats. Most users only need to set the parameters in the section **Inputs to be specified by user**.

In [None]:
# load packages
include("gibbs.jl");
gr(); 
myrng = MersenneTwister();


myclr∇ = cgrad(:RdBu_10,rev=false);
myclrp = palette(:seaborn_muted6);
myclrs = [myclr∇[z] for z∈LinRange(0.0,0.7,30)];

### Inputs to be specified by user
In this section, the user chooses the $\mathrm{Ca}^{2+}$ binding affinities, the parameters of the binomial models of $\mathrm{Ca}^{2+}$ occupancy in hair bundles, and the MCMC and plotting parameters. The default $K_d$ values are taken from <a href=https://doi.org/10.1016/j.neuron.2010.03.028>(Sotomayor 2010)</a> for EC1-2. Findings reported in <a href=https://doi.org/10.7554/eLife.43473>(Tobin 2019)</a> indicate that there are about 40 tiplinks in a bundle; the code below sets $N_M=2\times 40$, the number of CDH23 monomers in a bundle. We also take $N_L=24$ as the number of linker regions in a tiplink. Following <a href=https://doi.org/10.1101/cshperspect.a029280>(Jaiganesh 2018)</a>, these are the canonical $\mathrm{Ca}^{2+}$-binding linkers; the two noncanonical linkers are omitted.

Recall that the <i>typical</i> $\mathrm{Ca}^{2+}$ configurations at a linker are those in which the sites fill in order of binding affinity: 000,001,011,111. The <i>atypical</i> configurations are those in which the sites fill out of order of binding affinity: 010,100,101,110. In the triple ijk, the digits i,j,k correspond to sites 1,2,3, and each digit is 1 if $\mathrm{Ca}^{2+}$ is bound at that site and 0 otherwise. With the default $K_d$ values below, site 3 has the highest affinity (smallest $K_d$) and site 1 the lowest.
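The classification above can be reproduced with a minimal sketch. It assumes that "in order of binding affinity" means site 3 fills before site 2, which fills before site 1, so a triple ijk is typical exactly when its digits are nondecreasing. The names `configs`, `istypical`, `typical`, and `atypical` are illustrative and are not defined in `gibbs.jl`.

```julia
# Enumerate all occupancy triples ijk (sites 1,2,3) in lexicographic order.
configs = [string(i, j, k) for i in 0:1 for j in 0:1 for k in 0:1]

# Assumed rule: a configuration is typical when its digits are nondecreasing,
# i.e. a lower-affinity site is occupied only if every higher-affinity site is.
istypical(c) = issorted(collect(c))

typical  = filter(istypical, configs)    # configurations that fill in affinity order
atypical = filter(!istypical, configs)   # all remaining configurations
```

The two lists come out in the same order used for the columns of the sampled arrays later in the notebook (atypical states first, then typical states).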

In [None]:
# Biochemical parameters
#  Kd values
prm = Dict{Symbol,Float64}();
prm[:K₁]=71.4;
prm[:K₂]=44.3;
prm[:K₃]=(1.9+5)/2;

#  Upper bound τ on the total probability of the atypical configurations (3/8 here)
prm[:τ] = 3/8; # p010+p100+p101+p110 <= τ

#  Ca²⁺ ranges to be examined 
#  the code assumes the range is 1:1:M for some M >= 500, so that row k of the sample arrays corresponds to [Ca²⁺] = k
Carg = 1.0:1.0:500;

#  Ca²⁺ value at which the more specific analysis of Table # is run
Caval = 20.;

In [None]:
# Bundle occupancy parameters
#  NM is the number of CDH23 monomers within a bundle
NM = 2*40; 
#  NL is the number of linker regions within a tiplink
NL = 24;
#  Ca values at which bundle occupancy is analyzed; there should be four of them, integer-valued and within Carg
Ca = [20.0,40.0,50.0,500.0];

In [None]:
# MCMC Gibbs sampling parameters
#  number of samples in a chain after discarding burn-in
nsmp = 25000; 

#  number of independent mcmc chains
nchains = 4;

# burn-in period
nburnin = nsmp÷2;

In [None]:
# Plot parameters
#  window for Ca²⁺ concentration zoom-in
figwin = (0,50);

### Statistics by Gibbs sampling

In [None]:
# Perform gibbs sampling and compute marginal probabilities 
# ordered as p010 p100 p101 p110 p000 p001 p011 p111
SMPS = Vector{Array{Float64,3}}(undef,nchains);
pmargs = Vector{Matrix{Float64}}(undef,nchains);
@inbounds for i=1:nchains
    ram = gibbsrun(prm,Carg;nsmp=nsmp+nburnin,rng=myrng);
    # discard burn-in
    SMPS[i] = copy(ram[:,nburnin+1:end,:]);
    pmargs[i] = sum(SMPS[i],dims=2)/nsmp |> (x->reshape(x,(:,8)));
    
    println("finished with MCMC chain $i ...")
end

# Aggregate samples across all chains
SMP = cat(SMPS...;dims=2);
pmarg = sum(SMP,dims=2)/(nchains*nsmp) |> (x->reshape(x,(:,8)));

In [None]:
# export the Gibbs samples for any further computation
#  SMP is nCa x nchains*nsmp x 8 
#  we permute the first and last indices of the array and then store it as a reshaped 8*nchains*nsmp x nCa matrix
#  Each column corresponds to one Ca²⁺ value; consecutive blocks of eight rows in a column are the probabilities of all Ca²⁺ configs in a single Gibbs sample 
xpt = permutedims(SMP,(3,2,1));
xpt = reshape(xpt,8*nchains*nsmp,:);
dfxpt = DataFrame(xpt,:auto);
CSV.write("gibbssmp.csv",dfxpt);

MCMC convergence across all $[\mathrm{Ca}^{2+}]$ is assessed by running several independent chains and comparing the within-chain and between-chain variances. This is quantified by the <i>Gelman-Rubin</i> (GR) statistic, which is computed according to the procedure described in (Sec 11.4, <a href=http://www.stat.columbia.edu/~gelman/book/>Bayesian Data Analysis</a>). GR statistics close to $1$ are supporting evidence of MCMC convergence. GR statistics significantly larger than $1$ suggest that more MCMC samples are needed.

Below, this statistic is computed for the probability of each $\mathrm{Ca}^{2+}$ configuration at a linker and across all $\mathrm{Ca}^{2+}$ concentrations. The effect of changing the $\mathrm{Ca}^{2+}$ concentration on the GR statistic is summarized by quantiles.
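For readers who want to see the computation spelled out, here is a minimal sketch of a split-chain GR statistic for a single scalar quantity, following the general recipe of comparing between-chain and within-chain variances. The function `rhat_sketch` is hypothetical and is not the notebook's `grstat`, whose implementation in `gibbs.jl` may differ in detail.

```julia
using Random, Statistics

# Hypothetical helper (not the notebook's `grstat`): split-chain GR statistic
# for one scalar quantity. `chains` is n × m: n post-burn-in draws (rows)
# from each of m independent chains (columns).
function rhat_sketch(chains::AbstractMatrix{<:Real})
    n = size(chains, 1)
    h = n ÷ 2                                              # length of each half-chain
    halves = hcat(chains[1:h, :], chains[end-h+1:end, :])  # h × 2m matrix of half-chains
    chainmeans = vec(mean(halves, dims=1))                 # mean of each half-chain
    chainvars  = vec(var(halves, dims=1))                  # sample variance of each half-chain
    B = h * var(chainmeans)                                # between-chain variance
    W = mean(chainvars)                                    # within-chain variance
    varplus = (h - 1) / h * W + B / h                      # pooled variance estimate
    return sqrt(varplus / W)
end

# Usage on synthetic draws (illustrative data only, not output of the sampler).
rng = MersenneTwister(1)
mixed = randn(rng, 1000, 4)             # four chains drawn from the same distribution
stuck = mixed .+ [0.0 0.0 5.0 5.0]      # two chains shifted away from the others
r_mixed = rhat_sketch(mixed)
r_stuck = rhat_sketch(stuck)
```

In this toy setup `r_mixed` should sit close to 1, while `r_stuck` is well above 1 because the chain means disagree.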

In [None]:
# Compute Gelman-Rubin convergence statistics for each Ca²⁺ configuration probability across all Ca²⁺ ranges
#  Each chain is split in half, and the resulting half-chains (twice as many) are analyzed
grs = grstat(SMPS);
println("Gelman-Rubin statistic for each linker configuration summarized over all Ca²⁺ values:")
dfgrstat = DataFrame("pCa²⁺"=>["p010","p100","p101","p110","p000","p001","p011","p111"],
                     "min"=>[minimum(grs[:,j]) for j=1:8],
                     "2.5%"=>[quantile(grs[:,j],0.025) for j=1:8],
                     "50%"=>[quantile(grs[:,j],0.5) for j=1:8],
                     "97.5%"=>[quantile(grs[:,j],0.975) for j=1:8],
                     "max"=>[maximum(grs[:,j]) for j=1:8])

In [None]:
p1 = plot(Carg,pmarg[:,1],labels="p010",linewidth=3,color_palette=myclrp,
          ribbon=(pmarg[:,1]-minimum(SMP[:,:,1],dims=2),
                  maximum(SMP[:,:,1],dims=2)-pmarg[:,1]),fillalpha=0.2)
plot!(Carg,pmarg[:,2],labels="p100",linewidth=3,
          ribbon=(pmarg[:,2]-minimum(SMP[:,:,2],dims=2),
                  maximum(SMP[:,:,2],dims=2)-pmarg[:,2]),fillalpha=0.2)
plot!(Carg,pmarg[:,3],labels="p101",linewidth=3,
          ribbon=(pmarg[:,3]-minimum(SMP[:,:,3],dims=2),
                  maximum(SMP[:,:,3],dims=2)-pmarg[:,3]),fillalpha=0.2)
plot!(Carg,pmarg[:,4],labels="p110",linewidth=3,
      xlabel="[Ca²⁺]",ylabel="probability",
          ribbon=(pmarg[:,4]-minimum(SMP[:,:,4],dims=2),
                  maximum(SMP[:,:,4],dims=2)-pmarg[:,4]),fillalpha=0.2,
      color_palette=myclrp,
      legend=:topleft,
      xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10,
      xlims=figwin);

p4 = plot(Carg,pmarg[:,1],labels="p010",linewidth=3,color_palette=myclrp,
          ribbon=(pmarg[:,1]-minimum(SMP[:,:,1],dims=2),
                  maximum(SMP[:,:,1],dims=2)-pmarg[:,1]),fillalpha=0.2)
plot!(Carg,pmarg[:,2],labels="p100",linewidth=3,
          ribbon=(pmarg[:,2]-minimum(SMP[:,:,2],dims=2),
                  maximum(SMP[:,:,2],dims=2)-pmarg[:,2]),fillalpha=0.2)
plot!(Carg,pmarg[:,3],labels="p101",linewidth=3,
          ribbon=(pmarg[:,3]-minimum(SMP[:,:,3],dims=2),
                  maximum(SMP[:,:,3],dims=2)-pmarg[:,3]),fillalpha=0.2)
plot!(Carg,pmarg[:,4],labels="p110",linewidth=3,
      xlabel="[Ca²⁺]",ylabel="probability",
          ribbon=(pmarg[:,4]-minimum(SMP[:,:,4],dims=2),
                  maximum(SMP[:,:,4],dims=2)-pmarg[:,4]),fillalpha=0.2,
      color_palette=myclrp,
      legend=:topleft,
      xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10);

p3 = plot(Carg,pmarg[:,5],labels="p000",linewidth=3,color_palette=myclrp,
          ribbon=(pmarg[:,5]-minimum(SMP[:,:,5],dims=2),
                  maximum(SMP[:,:,5],dims=2)-pmarg[:,5]),fillalpha=0.2)
plot!(Carg,pmarg[:,6],labels="p001",linewidth=3,
      ribbon=(pmarg[:,6]-minimum(SMP[:,:,6],dims=2),
              maximum(SMP[:,:,6],dims=2)-pmarg[:,6]),fillalpha=0.2)
plot!(Carg,pmarg[:,7],labels="p011",linewidth=3,
      ribbon=(pmarg[:,7]-minimum(SMP[:,:,7],dims=2),
              maximum(SMP[:,:,7],dims=2)-pmarg[:,7]),fillalpha=0.2)
plot!(Carg,pmarg[:,8],labels="p111",linewidth=3,
      ribbon=(pmarg[:,8]-minimum(SMP[:,:,8],dims=2),
              maximum(SMP[:,:,8],dims=2)-pmarg[:,8]),fillalpha=0.2,
      xlabel="[Ca²⁺]",ylabel="probability",
      color_palette=myclrp,
      legend=:topright,
      xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10);

p2 = deepcopy(p3); plot!(p2,xlims=figwin); plot!(p1,ylims=ylims(p2));

In [None]:
println()
println("Solid lines are the marginal probabilities across the constraint region of admissible probability distributions.")
println("Left plot shows probabilities of the typical configurations when [Ca²⁺] varies along (0, 500) micromolar range.")
println("Right plot zooms into [Ca²⁺] in the $(figwin) micromolar range.");
println("Bands show the extreme ranges observed over the Gibbs samples.")
lay = @layout [b c]
plot(p3,p2,layout=lay,size=(900,300),margin=4mm)

In [None]:
savefig("margprb_typical.pdf");

In [None]:
println()
println("Solid lines are the marginal probabilities across the constraint region of admissible probability distributions.")
println("Left plot shows probabilities of the atypical configurations when [Ca²⁺] varies along (0, 500) micromolar range.")
println("Right plot zooms into [Ca²⁺] in the $(figwin) micromolar range.");
println("Bands show the extreme ranges observed over the Gibbs samples.")
lay = @layout [b c]
plot!(p4,ylims=ylims(p2))
plot(p4,p1,layout=lay,size=(900,300),margin=4mm)

In [None]:
savefig("margprb_atypical.pdf");

### Gibbs sampling at a highlighted $[\mathrm{Ca}^{2+}]$ value

In [None]:
ram = gibbsrun(prm,Caval;nsmp=nsmp+nburnin,rng=myrng);
SMP0 = copy(ram[nburnin+1:end,:]);

In [None]:
# Plot histogram of typicals
p1 = histogram(SMP0[:,5],title="p000",labels="",normalize=:probability,bins=10,
               ylabel="fraction of\n constraint region");
p2 = histogram(SMP0[:,6],title="p001",labels="",normalize=:probability,bins=10);
p3 = histogram(SMP0[:,7],title="p011",labels="",normalize=:probability,bins=10,
               ylabel="fraction of\n constraint region",
               xlabel="probability");
p4 = histogram(SMP0[:,8],title="p111",labels="",normalize=:probability,bins=10,
               xlabel="probability");

println()
println("Marginal histograms at [Ca²⁺]=$(Caval)")
lay = @layout [a b;c d];
plot(p1,p2,p3,p4,layout=lay,margin=2mm)

In [None]:
savefig("marghist_Ca=$(Caval).pdf");

In [None]:
# Trace plots of the typical configurations
q1 = plot(SMP0[:,5],title="p000",labels="",ylabel="value");
q2 = plot(SMP0[:,6],title="p001",labels="");
q3 = plot(SMP0[:,7],title="p011",labels="",xlabel="sample",ylabel="value");
q4 = plot(SMP0[:,8],title="p111",labels="",xlabel="sample");

println()
println("Trace plots at [Ca²⁺]=$(Caval)")
lay = @layout [a b;c d];
plot(q1,q2,q3,q4,layout=lay)

In [None]:
savefig("extrace_Ca=$(Caval).pdf");

In [None]:
# Compute some statistics
M = Matrix{Float64}(undef,4,4);
for i=5:8
    for j=5:8
        M[i-4,j-4] = sum((SMP0[:,i].<=SMP0[:,j]))/nsmp;
    end
end

println("For [Ca²⁺] = $Caval:")
println("Probabilities of pi<=pj for all pairs of p000,p001,p011,p111:");
df = DataFrame("Pair"=>["p000","p001","p011","p111"],
               "p000"=>M[:,1],"p001"=>M[:,2],"p011"=>M[:,3],"p111"=>M[:,4]);
println(df)

# Bundle $Ca^{2+}$-occupancy
This section takes $N_M$ as the number of CDH23 monomers in a bundle and $N_L$ as the number of linker regions in a tiplink.
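As a rough illustration of a binomial occupancy model, the sketch below computes the distribution of the number of linkers in a given state along one monomer, assuming the $N_L$ linkers are independent and each is in that state with the same probability. The helper `binompmf` and the value of `p_state` are placeholders chosen only for illustration; they are not outputs of the sampler, and the notebook's `pltbndl` and `bndlvar` may be implemented differently.

```julia
# Binomial probability mass function written out directly (hypothetical helper).
binompmf(n, k, p) = binomial(n, k) * p^k * (1 - p)^(n - k)

NL_demo = 24       # number of linker regions, as in the notebook's NL
p_state = 0.8      # placeholder per-linker probability, NOT a sampled value

# Probability that exactly k of the NL_demo linkers are in the chosen state.
pmf = [binompmf(NL_demo, k, p_state) for k in 0:NL_demo]

# Mean number of linkers in the chosen state; equals NL_demo * p_state.
meancount = sum(k * pmf[k + 1] for k in 0:NL_demo)
```

In practice `p_state` would be replaced by a marginal probability such as the mean of `SMP[20,:,8]` for state 111 at $[\mathrm{Ca}^{2+}]=20$.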
### Record marginal $Ca^{2+}$ state probabilities at different concentrations for bundle analysis

In [None]:
Ps = Dict{String,Vector}(
     "state"=>["p000","p001","p011","p111"],
     "[Ca²⁺]=20 μM"=>[mean(SMP[20,:,5]),mean(SMP[20,:,6]),mean(SMP[20,:,7]),mean(SMP[20,:,8])],
     "[Ca²⁺]=40 μM"=>[mean(SMP[40,:,5]),mean(SMP[40,:,6]),mean(SMP[40,:,7]),mean(SMP[40,:,8])],
     "[Ca²⁺]=50 μM"=>[mean(SMP[50,:,5]),mean(SMP[50,:,6]),mean(SMP[50,:,7]),mean(SMP[50,:,8])],
     "[Ca²⁺]=500 μM"=>[mean(SMP[500,:,5]),mean(SMP[500,:,6]),mean(SMP[500,:,7]),mean(SMP[500,:,8])]
      );
dfPs = DataFrame(Ps)

### Distributions for occupancy at a linker

In [None]:
hst1 = pltbndl(20;SMP=SMP,NM=NM,NL=NL);
plot!(hst1,xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10,titlefontsize=14,
      ylims=(0,1),xlims=(0,25))
hst2 = pltbndl(40;SMP=SMP,NM=NM,NL=NL);
plot!(hst2,xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10,titlefontsize=14,
      ylims=(0,1),xlims=(0,25))
hst3 = pltbndl(50;SMP=SMP,NM=NM,NL=NL);
plot!(hst3,xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10,titlefontsize=14,
      ylims=(0,1),xlims=(0,25))
hst4 = pltbndl(500;SMP=SMP,NM=NM,NL=NL);
plot!(hst4,xtickfontsize=10,ytickfontsize=10,fontsize=12,legendfontsize=10,titlefontsize=14,
      ylims=(0,1),xlims=(0,25))
plot!(hst1,xlabel=""); plot!(hst2,xlabel="",ylabel="",legend=false); 
plot!(hst3,legend=false); plot!(hst4,ylabel="",legend=false);

lay = @layout [a b;c d]
plot(hst1,hst2,hst3,hst4,layout=lay,size=(750,500),margin=3mm)

In [None]:
savefig("bndllnk.pdf");

#### Variance in bundles

In [None]:
Zs = [bndlvar(k;SMP=SMP,NM=NM,NL=NL,nsmp=3000) for k∈[20,40,50,500]];
sct = pltbndlvar("111";Zs=Zs,Cavals=[20,40,50,500],xlims=(0,25),ylims=(0,3),mrkα=0.3)

In [None]:
savefig("bndlvar_cbar.pdf");