<h1 id="tocheading">TABLE OF CONTENTS</h1>
<div id="toc"></div>

**The table of contents updates periodically; run the cell below to start those updates or to force one.**

**2017-09-04 11:30am**

The main fluxSense() idea was to add a trajectory at a sensitive point for the gradient, to avoid gradients going to zero.

It seems like that idea addresses a problem that isn't really there: although the gradients do go very small, at least with the MGO network they are not small enough to kill the learning, and the Hessian minimization quickly focuses on the right direction and gets us home.

Instead, the problem (getting stuck in training) arises because of the term that penalizes small differences in the two outputs, i.e., the term that wants output decisions to be clear. We'll call this the **beta term**. (In the code, we use beta as a param to quantify how much of this term to add to the total cost function.) This term is needed because otherwise all outputs could go to the desired mean target, without clear decisions. However, in changing the landscape, it looks like this term also creates some obstacles that produce local minima. Specifically, if a final output is to *smoothly* go from decision "0" to decision "1", it *must* pass through the ambiguous decision=0.5; thus the term that penalizes such ambiguous decisions puts an obstacle on that path, and can create local minima. FluxSense() does solve a problem, and this is it: it specifically places a trajectory at decision=0.5, and therefore drives a non-zero gradient (abolishes the local minimum) even if the original data points tend to stay away from that informative location.
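
To make the obstacle concrete, here is a minimal sketch of the per-trial beta term as a function of the difference d = V1 - V2 between the two final outputs. It uses the same tanh form and the default theta2=0.2 that appear in the JJ() cost function later in this notebook; the helper name `beta_term` is ours, for illustration only.

```julia
# Per-trial contribution of the beta term, as a function of the difference
# d = V1 - V2 between the two final outputs (same form as cost2 in JJ()).
# beta_term is an illustrative name, not a function from the notebook.
beta_term(d; beta=0.003, theta2=0.2) = -beta * tanh(d/theta2)^2

ds    = [-0.5, -0.25, 0.0, 0.25, 0.5]
costs = [beta_term(d) for d in ds]

# The term is lowest for clear decisions (large |d|) and highest at d = 0,
# so a smooth path from d < 0 to d > 0 has to climb over a bump at d = 0.
for (d, c) in zip(ds, costs)
    println("d = ", d, "   beta term = ", c)
end
```

The bump is small when beta is small, which fits the observation in the to-do list that reducing beta can be enough to get unstuck.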

Can we do without the beta term? Not really, not without completely changing the nature of the outputs, which currently are single-trial decisions. If we were propagating the probability of a decision, then maybe having the single-trial output be a target like 0.75 might be fine, instead of what we currently do, which is to want single trial to randomly end at output=1 75% of the time, output=0 the other 25%.

But are there simpler strategies than fluxSense()? For example, divide the training into two phases: first train with beta=0, and then start training again from the resulting endpoint, but now with beta>0? Not obviously: we tried a simple MGO with beta=0, and it turns out that in certain regimes, training fails to even get to target=0.75, never mind whether the decisions are ambiguous.

> As detailed in a cell below, the issue appeared to be that at beta=0, when initial mean(hits) < 0.5, we go to a limiting case where final mean(hits)=0.5, probably because the fastest way to improve mean(hits) is to drive everyone into a single fixed point and reach 0.5. When initial mean(hits)>0.5, we always get to the target mean(hits)=0.75
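
As a rough illustration of that limiting case, the sketch below evaluates the `hits` expression from JJ() (with its default theta1=0.15) for two hand-made sets of 20 endpoints. It assumes that "a single fixed point" means every trial ends with V1 = V2, so d = V1 - V2 = 0; the helper names `hit_prob`, `avg` and `cost1` are ours.

```julia
# hit_prob follows the 'hits' expression in JJ(); theta1=0.15 is its default.
hit_prob(d; theta1=0.15) = 0.5 * (1 + tanh(d/theta1))
avg(x) = sum(x) / length(x)
cost1(mh) = (mh - 0.75)^2          # first term of the cost in JJ()

# Case 1 (assumption): every trial collapses onto one symmetric fixed point,
# so the output difference d is 0 on all 20 trials.
collapsed = zeros(20)

# Case 2: 15 of 20 trials decide clearly one way, 5 clearly the other way.
decided = vcat(fill(0.5, 15), fill(-0.5, 5))

mh_collapsed = avg([hit_prob(d) for d in collapsed])
mh_decided   = avg([hit_prob(d) for d in decided])

println("collapsed: mean(hits) = ", mh_collapsed, "  cost1 = ", cost1(mh_collapsed))
println("decided:   mean(hits) = ", mh_decided,   "  cost1 = ", cost1(mh_decided))
```

The collapsed case gives mean(hits)=0.5 and cost1=0.0625, which matches the starting cost printed in the minimization log further down.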

So, we need beta>0. This is not a big general problem, it's a specific problem for us (so can't be trumpeted as I wanted to :). That brings us to the current state, namely, asking whether with beta>0 we can find a clear example where training gets stuck, and where fluxSense() would resolve it. If the answer is yes, will try to incorporate fluxSense() into bbox_Hessian_minimization().


In [1]:
macro javascript_str(s) display("text/javascript", s); end

javascript"""
$.getScript('https://sites.google.com/site/brodylabhome/files/make_table_of_contents.js')
"""

In [2]:

using PyCall
using PyPlot
using ForwardDiff
using DiffBase
using MAT

pygui(true)

import Base.convert
convert(::Type{Float64}, x::ForwardDiff.Dual) = Float64(x.value)
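# Dispatch on the target *type* (::Type{...}); Array{<:Dual} also matches arrays of concrete Dual types
function convert(::Type{Array{Float64}}, x::Array{<:ForwardDiff.Dual}) 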
    y = zeros(size(x)); 
    for i in 1:prod(size(x)) 
        y[i] = convert(Float64, x[i]) 
    end
    return y
end

include("general_utils.jl")
include("hessian_utils.jl")

"""
We define functions to convert Duals, the variable types used by ForwardDiff, 
to Floats. This is useful if we want to print out the value of a variable 
(since print doesn't know how to handle Duals). Note that after being converted to a Float, no
differentiation by ForwardDiff can happen!  e.g. after
    x = convert(Float64, y)
ForwardDiff can still differentiate y, but it can't differentiate x
"""




# Setup -- definitions of forwardModel() and backwardsModel()

In [3]:
"""
o = g(z)    squashing tanh function, running from 0 to 1, is equal to 0.5 when input is 0.
"""
function g(z)
    return 0.5*tanh.(z)+0.5
end
    
"""
z = g^-1(o)    inverse of squashing tanh function, input must be in (0, 1), output is zero when passed 0.5.
"""
function ginverse(z)
    return 0.5*log.(z./(1-z))
end


"""
forwardModel(startU; dt=0.01, tau=0.1, nsteps=100, input=[], noise=[], W=[0 -5;-5 0], 
init_add=0, start_add=0, const_add=0, sigma=0, g_leak=1, U_rest=0, theta=0, beta=1,
    do_plot=false, nderivs=0, difforder=0, clearfig=true, fignum=1, dUdt_mag_only=false)

Runs a tanh() style-network forwards in time, given its starting point, using simple Euler integration
    tau dU/dt = g_leak*(U_rest - U) + W*V + I
    V = 0.5*tanh((U - theta)/beta) + 0.5

**PARAMETERS:**

startU     A column vector, nunits-by-1, indicating the values of U at time zero


**OPTIONAL PARAMETERS**

dt      Scalar, timestep size

tau     Scalar, in seconds

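g_leak, U_rest
        dUdt will have a term equal to g_leak*(U_rest - U)

theta, beta
        V is computed as g((U - theta)/beta): theta shifts and beta scales the input to the
        squashing function. (This beta is not the beta term of the cost function.)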

nsteps  Number of timesteps to run, including time=0.

input   Either an nunits-by-1 vector, in which case inputs to each unit are constant
        across time, or a matrix, nunits-by-nsteps, indicating input for each unit at each timepoint.

W       Weight matrix, nunits-by-nunits

init_add    DEPRECATED: Vector or scalar that gets added to the input current at very first timestep.
            Deprecated because this made it dt-dependent. Replaced by start_add.

start_add   Vector or scalar that gets added, once, to the initial U[:,1], before the integration process begins.

const_add   Scalar that gets added to the input current at every timestep

sigma       After each timestep, add sigma*sqrt(dt)*randn() to each element of U

do_plot   Default false, if true, plots V of up to the first two dimensions

fignum     Figure number on which to plot

clearfig   If true, the figure is first cleared, otherwise any plot is overlaid

nderivs, difforder     Required for making sure function can create its own arrays and 
                       still be differentiated

dUdt_mag_only  If true, returns |dUdt|^2 from the first timestep only, then stops.

** RETURNS:**

Uend Vend       nunits-by-1 vectors representing the final values of U and V that were found.
U, V            nunits-by-nsteps matrices containing the full trajectories

"""
function forwardModel(startU; dt=0.01, tau=0.1, nsteps=100, input=[], noise=[], W=[0 -5;-5 0], 
    init_add=0, start_add=0, const_add=0, do_plot=false, nderivs=0, difforder=0, clearfig=true, fignum=1,
    dUdt_mag_only=false, sigma=0, g_leak=1, U_rest=0, theta=0, beta=1, other_unused_params...)

    my_input = ForwardDiffZeros(size(input,1), size(input,2), nderivs=nderivs, difforder=difforder)
    for i=1:prod(size(input)); my_input[i] = input[i]; end
    input = my_input;
    
    nunits = length(startU)
    if size(startU,2) > size(startU,1)
        error("startU must be a column vector")
    end
    
    # --- formatting input ---
    if ~(typeof(input)<:Array) || prod(size(input))==1  # was a scalar
        input = input[1]*(1+ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder))
    elseif length(input)==0 # was the empty matrix
        input = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    elseif size(input,2)==1     # was a column vector
        input = input*(1+ForwardDiffZeros(1, nsteps, nderivs=nderivs, difforder=difforder))
    end    
    # --- formatting noise ---
    if ~(typeof(noise)<:Array) || prod(size(noise))==1  # was a scalar
        noise = noise[1]*(1+ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder))
    elseif length(noise)==0 # was the empty matrix
        noise = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    elseif size(noise,2)==1     # was a column vector
        noise = noise*(1+ForwardDiffZeros(1, nsteps, nderivs=nderivs, difforder=difforder))
    end    
    
    U = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    V = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    
    if ~(typeof(W)<:Array); W = [W]; end

    W     = reshape(W, nunits, nunits)
    U     = reshape(U, nunits, nsteps)
    V     = reshape(V, nunits, nsteps)
    input = reshape(input, nunits, nsteps)
    noise = reshape(noise, nunits, nsteps)

    input[:,1] += init_add
    input      += const_add

    #@printf("size(U) is (%d,%d), and size(startU) is (%d,%d) and size(noise) is (%d,%d)", 
    #    size(U,1), size(U,2), size(startU,1), size(startU,2), size(noise,1), size(noise,2))
    # @printf("U[1]=%g, noise[1]=%g\n", startU, noise[1])
    U[:,1] = startU + noise[:,1] + start_add; # @printf("Resulting U=%g\n", U[1])
    V[:,1] = g((U[:,1]-theta)/beta); # @printf("Resulting V=%g\n", V[1])
    
    for i=2:nsteps
        dUdt = g_leak*(U_rest -U[:,i-1]) + W*V[:,i-1] + input[:,i-1]
        if dUdt_mag_only; return sum(dUdt.*dUdt); end;
        # @printf("dUdt=%g\n", dUdt[1])
        # @printf("i=%g\n", i)
        # @printf("noise[2]=%g\n", noise[2])
        U[:,i] = U[:,i-1] + (dt/tau)*dUdt + noise[:,i] + sigma*sqrt(dt)*randn(size(U,1),1)
        # @printf("Resulting U[2]=%g\n", U[2])
        V[:,i] = g((U[:,i]-theta)/beta)
        # @printf("Resulting V[2]=%g\n", V[2])
    end

    if do_plot
        figure(fignum)
        if length(startU)==1
            if clearfig; clf(); end;
            t = (0:nsteps-1)*dt
            plot(t, V[1,:], "b-")
            plot(t[1], V[1,1], "g.")
            plot(t[end], V[1,end], "r.")
            xlabel("t"); ylabel("V1"); ylim([-0.01, 1.01])
        elseif length(startU)>=2
            if clearfig; clf(); end;
            plot(V[1,:], V[2,:], "b-")
            plot(V[1,1], V[2,1], "g.")
            plot(V[1,end], V[2,end], "r.")
            xlabel("V1"); ylabel("V2"); 
            xlim([-0.01, 1.01]); ylim([-0.01, 1.01])
        end
    end

    return U[:,end], V[:,end], U, V
end


"""
backwardsModel(endU; dt=0.01, tau=0.1, nsteps=100, input=[0],noise=[],  W=[0 -5;-5 0], 
    do_plot=false, nderivs=0, difforder=0, clearfig=true, fignum=1, tol=1e-15, start_eta=10)

Runs a tanh() style-network BACKWARDS in time, given its ending point, by making a backwards
guess at each timepoint and then using Hessian minimization to find the backwards vector that correctly
leads to the current timestep value.  Uses forwardModel() . The forwards equations are:

    tau dU/dt = -U + W*V + I
    V = 0.5*tanh(U)+ 0.5

**PARAMETERS:**

endU     A column vector, nunits-by-1, indicating the values of U at time=end


**OPTIONAL PARAMETERS:**

dt      Scalar, timestep size

tau     Scalar, in seconds

nsteps  Number of timesteps to run, including time=0.

input   Either an nunits-by-1 vector, in which case inputs to each unit are constant
        across time, or a matrix, nunits-by-nsteps, indicating input for each unit at each timepoint.

W       Weight matrix, nunits-by-nunits

do_plot   Default false, if true, plots V of up to the first two dimensions

tol       Tolerance in the minimization procedure for finding each backwards timestep. Passed on
          to trust_region_Hessian_minimization()

start_eta   Passed on to trust_region_Hessian_minimization()

fignum     Figure number on which to plot

clearfig   If true, the figure is first cleared, otherwise any plot is overlaid

nderivs, difforder     Required for making sure function can create its own arrays and 
                       still be differentiated



** RETURNS:**

Ustart Vstart   nunits-by-1 vectors representing the starting values of U and V that were found.
U, V            nunits-by-nsteps matrices containing the full trajectories
costs           1-by-nsteps vector with the final cost from the minimization procedure for each
                timestep. This is the squared difference between the U[t+1] produced by the U[t] 
                guess and the actual U[t+1]

"""
function backwardsModel(endU; nsteps=100, start_eta=10, tol=1e-15, maxiter=400, 
    do_plot=false, init_add=0, start_add=0, dt=0.01, 
    input=[], noise=[], nderivs=0, difforder=0, clearfig=false, fignum=1, params...)    

    nunits = length(endU)

    # --- formatting input ---
    if ~(typeof(input)<:Array) || prod(size(input))==1  # was a scalar
        input = input[1]*(1+ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder))
    elseif length(input)==0 # was the empty matrix
        input = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    elseif size(input,2)==1     # was a column vector
        input = input*(1+ForwardDiffZeros(1, nsteps, nderivs=nderivs, difforder=difforder))
    end    
    # --- formatting noise ---
    if ~(typeof(noise)<:Array)  # was a scalar
        noise = noise*(1+ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder))
    elseif length(noise)==0 # was the empty matrix
        noise = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    elseif size(noise,2)==1     # was a column vector
        noise = noise*(1+ForwardDiffZeros(1, nsteps, nderivs=nderivs, difforder=difforder))
    end    
    
    # Cost of a backwards guess U1: squared distance between its one-step forward prediction and U2
    function J(U1, U2; nderivs=0, difforder=0, noise=[], input=[], pars...)
        U2hat = forwardModel(U1; nsteps=2, noise=noise, input=input, nderivs=nderivs, difforder=difforder, pars...)[1]
        DU = U2hat - U2
    
        return sum(DU.*DU)
    end
    
    if length(noise)==0
        noise = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    end

    U = ForwardDiffZeros(nunits, nsteps, nderivs=nderivs, difforder=difforder)
    U = reshape(U, nunits, nsteps)
    costs = ForwardDiffZeros(nsteps, 1, nderivs=nderivs, difforder=difforder)    
    
    U[:,end] = endU
    for i=(nsteps-1):-1:1
        if i==1
            my_init_add = init_add
            my_start_add = start_add
        else
            my_init_add = 0
            my_start_add = 0
        end
                
        # dt is an explicit keyword of backwardsModel, so it is not in params; pass it on explicitly
        U[:,i], costs[i] = trust_region_Hessian_minimization(U[:,i+1], 
            (x) -> J(x, U[:,i+1]; nderivs=length(endU), difforder=2, dt=dt, 
            input=input[:,i:i+1], noise = noise[:,i:i+1], 
            init_add=my_init_add, start_add=my_start_add, params...); 
            verbose=false, start_eta=start_eta, tol=tol, maxiter=maxiter)
        if i>1; U[:,i] += noise[:,i]; end
    end
    
    
    V = g(U)
    
    if do_plot
        figure(fignum)   
        if typeof(params)<:Array; params = Dict(params); end;
        if haskey(params, :dt);     dt     = params[:dt];     end
        if haskey(params, :nsteps); nsteps = params[:nsteps]; end
        if length(endU)==1
            if clearfig; clf(); end;
            t = (0:nsteps-1)*dt
            plot(t, V[1,:], "m-")
            plot(t[1], V[1,1], "go")
            plot(t[end], V[1,end], "ro")            
            ylim([-0.01, 1.01])
        elseif length(endU)>=2
            if clearfig; clf(); end;            
            plot(V[1,:], V[2,:], "m-")
            plot(V[1,1], V[2,1], "go")
            plot(V[1,end], V[2,end], "ro")
            xlim([-0.01, 1.01]); ylim([-0.01, 1.01])
        end
    end
    
    return U[:,1], V[:,1], U, V, costs
end

backwardsModel
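
The idea behind backwardsModel() can be sketched in one dimension without any of the notebook's helpers: take one Euler step forwards, then search for the earlier U that reproduces it. The sketch below is an assumption-laden stand-in, not the notebook's method: it uses plain Newton's method on the residual instead of trust_region_Hessian_minimization(), has no noise or input, and the names `euler_step` and `invert_step` are hypothetical.

```julia
# Squashing function and its derivative (same g as defined above, scalar version)
g(u)  = 0.5*tanh(u) + 0.5
dg(u) = 0.5*(1 - tanh(u)^2)

# One forward Euler step of tau dU/dt = -U + W*g(U), for a single unit
euler_step(u; dt=0.01, tau=0.1, W=-2.0) = u + (dt/tau)*(-u + W*g(u))

# Find u1 such that euler_step(u1) == u2, using Newton's method on the residual.
# (Illustrative stand-in for the Hessian-based minimization in backwardsModel.)
function invert_step(u2; dt=0.01, tau=0.1, W=-2.0, tol=1e-12, maxiter=50)
    u1 = u2                                   # initial guess: no change
    for k in 1:maxiter
        r = euler_step(u1; dt=dt, tau=tau, W=W) - u2
        if abs(r) < tol; break; end
        slope = 1 + (dt/tau)*(-1 + W*dg(u1))  # d(euler_step)/du1
        u1 -= r/slope
    end
    return u1
end

u1      = 1.1
u2      = euler_step(u1)
u1_back = invert_step(u2)
println("forward: ", u1, " -> ", u2, "   recovered start: ", u1_back)
```

backwardsModel() repeats this kind of inversion once per timestep, from the endpoint back to time zero, and reports the leftover squared residual of each step in `costs`.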

### Testing forward and backwards models with only 1 dimension

In [4]:
figure(1); clf();
params = Dict(:noise => [0.1], :W => [-2], :nsteps=>10, :start_add=>-1.9)
Uend = forwardModel([1.1]; do_plot=true, params...)[1]
Ustart = backwardsModel(Uend; do_plot=true, tol=1e-30, params...)[1]
@printf("Ustart came back as %g\n", Ustart[1])

Ustart came back as 1.1


### Testing forward and backwards models now with 2 dimensions

In [5]:
nsteps=50
params = Dict(:noise =>0.03*randn(2,nsteps) + [0.1,0]*ones(1,nsteps), :W => [0 -5; -5 0], :nsteps=>nsteps)

Uend, Vend, U, V              = forwardModel([0.1,0.1]; do_plot=true, params...);
Ustart, Vstart, bU, bV, costs = backwardsModel(Uend; do_plot=true, tol=1e-30, params...)

@printf("Ustart came back as : "); print_vector_g(Ustart); print("\n")

Ustart came back as : [0.1, 0.1]


# TO-DOs

1. ~~Be able to use W as an optimizable parameter (including configs like "all horizontal weights are the same")~~ DONE!
2. ~~Check out what is going on with the weird trajectories in the function-based MGO example~~  DONE: it's just the strong, single-timestep initial_add
3. ~~Check out whether reducing beta solves the sticking issue even without extra finalFluxPoint locations~~. It does. Reducing beta from 0.01 to 0.003 was enough.  (We also needed to change the cost_limit to -0.00288, since the range of costs changes when beta changes.)
3. Find the saddle points and use those as the finalFluxPoint locations
4. ===
5. ~~Run a ProAnti model with noise only in initial conditions, and thus with the framework as we have it~~ (skipped, went straight to next step)
6. ~~Make a cost function with frozen noise, and figure out how frozen noise will interact with the backwards trajectory in the minimizations~~
7. ~~Make a forwards and backwards model with Urest, etc., just like in ProAnti()~~
6. ===
7. ~~Make sure that minimization procedures that use tanh() walls report the model parameter, not the control parameter~~ DONE
7. ~~Figure out what is going on with the change in gradient and Hessian upon change of dt~~ DONE: it was just the init step
7. Clean up examples of forward and backwards models and of 1-d use of fluxSense() function
8. Find a 2-d example where flux points are actually needed -- when beta=0, it is not so clear.
9. ~~Fix the walls issue in bbox_Hessian_minimization using tanh encoding.~~
8. Measure gradient sensitivity to each of the endpoints in a set of trajectories, as a measure of whether fluxSense is needed or not.
9. Optimize either an MGO or a ProAnti
10. If fluxSense is needed in ProAnti, could try choosing the Anti unit endpoint values by maximizing the |dJ/dw|^2 over those values.
11. Clean up the notebooks and write up what we've been doing!
12. Try to combine fluxSense with bbox_Hessian_minimization
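
For the "measure gradient sensitivity to each of the endpoints" item, one possible ingredient is how strongly `hits` responds to a small change in each endpoint's output difference d = V1 - V2. The sketch below computes only that factor, from the `hits` expression in JJ() with its default theta1=0.15; it is not the full dJ/dw, and the names `hit_prob` and `hit_sensitivity` are ours.

```julia
# 'hits' as a function of the endpoint difference d = V1 - V2 (as in JJ())
hit_prob(d; theta1=0.15) = 0.5 * (1 + tanh(d/theta1))

# Analytic derivative of hit_prob with respect to d:
#   d/dd [0.5*(1 + tanh(d/theta1))] = (1 - tanh(d/theta1)^2) / (2*theta1)
hit_sensitivity(d; theta1=0.15) = (1 - tanh(d/theta1)^2) / (2*theta1)

ds   = [-0.6, -0.3, 0.0, 0.3, 0.6]
sens = [hit_sensitivity(d) for d in ds]

# Endpoints near d = 0 (decision = 0.5) carry almost all of the sensitivity;
# endpoints with clear decisions contribute very little gradient.
for (d, s) in zip(ds, sens)
    println("d = ", d, "   d(hits)/dd = ", s)
end
```

If none of the endpoints in a training set sit near d = 0, this factor is tiny for every trajectory, which is the situation in which an extra trajectory placed at decision=0.5 would be informative.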

# At beta=0, when initial mean(hits) < 0.5, we go to a limiting case where final mean(hits)=0.5, probably because the fastest way to improve mean(hits) is to drive everyone into a single fixed point and reach 0.5.  When initial mean(hits)>0.5, we always get to the target mean(hits)=0.75

In [53]:
# The following sequence leads to a situation where having only [-0.8, -0.8] as the single finalFluxPoint 
# leads to the minimization getting stuck.  Adding further finalFluxPoints solves the problem

dt = 0.02
t = 0:dt:1
tau = 0.1
nsteps = length(t)
t = t[1:nsteps]

W = -4
noise = 0
input = 0
sigma = 0


model_params = Dict(:dt=>dt, :tau=>tau, :W=>[0 W; W 0], :nsteps=>nsteps, 
:noise=>noise, :input=>input, :sigma=>sigma, :const_add=>0, :init_add=>0)


# WORKING gradient:
# ForwardDiff.gradient((x)->JJ(startU; do_plot=true, nderivs=length(x), difforder=1, 
#    make_dict([["init_add" 2], "const_add"], x, model_params)...), [2.9, -2.9, 0.1])



# The backward and costfunc functions should turn a single-scalar parameter W into the matrix W
# backward always runs with no within-forward noise, i.e., sigma=0
backward = (endpoint; do_plot=false, pars...) -> begin
    pars = Dict(pars)
    if haskey(pars, :W); 
        W=pars[:W];   # mess with it only if it is not already a matrix:
        if length(W)==1; pars=make_dict(["W"], [[0 W;W 0]], pars); end;
    end;     
    backwardsModel(endpoint; do_plot=do_plot, make_dict(["sigma"], [0], pars)...)[1]
end



costfunc = (startpoints; do_plot=false, verbose=false, beta=0.003, nderivs=0, difforder=0, sr=26, pars...) -> begin
    pars = Dict(pars)
    if haskey(pars, :W); 
        W=pars[:W];   # mess with it only if it is not already a matrix:
        if length(W)==1; pars=make_dict(["W"], [[0 W;W 0]], pars); end;
    end;         
    JJ(startpoints; seedrand=sr, beta=beta, 
        do_plot=do_plot, verbose=verbose, nderivs=nderivs, difforder=difforder, pars...)
end

function JJ(initUs; theta1=0.15, theta2=0.2, beta=0.003, verbose=false, nderivs=0, difforder=0, 
    do_plot=false, pre_string="", zero_last_sigmas=0, seedrand=NaN, params...)

    if ~isnan(seedrand); srand(seedrand); end
    
    Vend = ForwardDiffZeros(size(initUs,1), size(initUs,2), nderivs=nderivs, difforder=difforder)

    if do_plot; clf(); end;
    
    for i=1:size(initUs,1)
        if false # i>size(initUs,1) - zero_last_sigmas
            Ue, Ve, U, V = forwardModel(initUs[i,:]; sigma=0, nderivs=nderivs, difforder=difforder, 
                do_plot=do_plot, clearfig=false, params...)            
        else
            Ue, Ve, U, V = forwardModel(initUs[i,:]; nderivs=nderivs, difforder=difforder, 
                do_plot=do_plot, clearfig=false, params...)
        end
        Vend[i,:] = Ve
    end
    
    hits = 0.5*(1 + tanh.((Vend[:,1]-Vend[:,2])/theta1))
    diffs = tanh.((Vend[:,1]-Vend[:,2])/theta2).^2
    
    cost1 = (mean(hits) - 0.75).^2 
    cost2 = -beta*mean(diffs)

    if do_plot
        title(@sprintf("mean(hits)=%g, mean(diffs)=%g", convert(Float64, mean(hits)), convert(Float64, mean(diffs))))
    end
    
    if verbose
        @printf("%s", pre_string)
        @printf("-- cost=%g,   cost1=%g, cost2=%g :  mean(hits)=%g, mean(diffs)=%g\n", 
            convert(Float64, cost1+cost2), convert(Float64, cost1), convert(Float64, cost2),
            convert(Float64, mean(hits)), convert(Float64, mean(diffs)))
    end
    
    return cost1 + cost2, mean(hits), mean(diffs)
end
  


fluxFinalPoint = [-0.8 -0.8; -0.6 -0.6 ; -0.4 -0.4; -0.2 -0.2; 0 0; 0.2 0.2]
fluxFinalPoint = zeros(0,2);


beta = 0.05
args = [["start_add" 2], "const_add", "W", "sigma"]
seed = [0.1, 0.1, 2.1, -1, 0.1]
walls = Dict(:start_add=>[-5.1, 5.1], :W=>[-5.1, 5.1], :sigma=>[-0.5, 0.5])
# sr = 1504432803 causes a total mess with everything around the decision boundary; 
# sr = 1504432962 gets stuck at mean(hits)=0.66 but if we reduce the bounds of sigma, reaches 0.74


beta = 0.05
args = [["start_add" 2], "const_add", "W"] 
seed = [0.1, 0.1, 2.1, -1] 
walls = Dict(:start_add=>[-5.1, 5.1], :W=>[-5.1, 5.1]) # 
# sr = 1504433339 gives a mean(hits)=0.5 mess
# sr = 1504433467 also fails with mean(hits)=0.5
# sr = 1504433544 fails with mean(hits)=0.80, and if beta=0, causes a zero-eval hess, and beta=0.00001 makes for mean(hits)=0.75 success


beta = 0.003 # 00001
args = [["start_add" 2], "const_add", "W"] 
seed = [0.1, 0.1, 2.1, -1] 
# This seed proves deadly, and almost always starts with mean(hits)<0.5:  seed = [-0.1, 0.1, 2.1, -1] 
# even raising beta to 0.3 doesn't solve the problem with that seed
# If we use the regular balanced seed = [0.1, 0.1, 2.1, -1]  then beta=0.003 is very helpful
walls = Dict(:start_add=>[-5.1, 5.1], :W=>[-5.1, 5.1]) # 
# sr = 1504577892 gives a mean(hits)=0.5 mess
# sr = 1504577918 gives a mean(hits)=0.5 mess
# sr = 1504578029 gives a mean(hits)=0.5 mess
# sr = 1504578098 gives a mean(hits)=0.5 mess


new_random_seed = true; if new_random_seed
    sr = convert(Int64, round(time()))
else
    sr = old_sr
end
# sr = 1504433892
old_sr = sr

srand(sr)

# THIS IS THE GOOD ONE FOR ALL THE COMMENTS ON sr NUMBERS ABOVE: startU=randn(50,2)-3
startU=randn(20,2)-3


clf()
print("seed = "); print_vector_g(seed); print("\n")
ocost, omhits, omdiffs = costfunc(startU; do_plot=true, sr=sr, verbose=true, make_dict(args, seed, model_params)...)


# :sigma=>[-0.3, 0.3] does fine but :sigma=>[-0.2, 0.2] gets stuck.
# If we fix sigma at 0 it also gets stuck, but the dynamics are kind of odd; W may be a bit too big, or we should decrease dt
params, traj = bbox_Hessian_keyword_minimization(seed, args, walls, # , :sigma=>[-0.2, 0.2]), 
(;params...) -> costfunc(startU; beta=beta, sr=sr, do_plot=false, verbose=true, merge(model_params, Dict(params))...)[1], 
verbose=true, start_eta=1, tol=1e-12, hardbox=true, maxiter=200 )

# params, cost, ptraj, gtraj = fluxSense(costfunc, backward, model_params, startU, fluxFinalPoint, args, seed; 
#    start_eta=0.01, tol=1e-15, maxiter=400, verbose=true, report_every=1, do_plot=false, cost_limit=cost_limit) # cost_limit=-0.000935) # for beta=0.01

# And show the final position
figure(1); clf()
cost, mhits, mdiffs = 
    costfunc(startU; beta=beta, do_plot=true, sr=sr, verbose=true, make_dict(args, params, model_params)...)

repeat_results_in_fig2 = false; if repeat_results_in_fig2
    figure(2); clf()
    costfunc(startU; beta=beta, do_plot=true, sr=sr, verbose=true, 
        make_dict(args, params, merge(Dict(:fignum=>2), model_params))...)
    figure(1); 
end
params'

# For beta=0, and ntrials=20, we collected a bunch of results and observed that it fails about half the time
# WHEN the initial mean(hits) is below 0.5.  It never fails if the initial mean(hits) is above 0.5. 
# Seems like when it starts below 0.5, the fastest way to increase mean(hits) is to push it to 0.5 and floor it there.
# The results were collected in "Results.mat"
res = [res ; omhits mhits]

[omhits mhits]

seed = [0.1, 0.1, 2.1, -1]




-- cost=0.0625021,   cost1=0.0625021, cost2=-8.09569e-11 :  mean(hits)=0.499996, mean(diffs)=2.69856e-08
0: eta=1 ps=[0.100, 0.100, 2.100, -1.000]
-- cost=0.0625021,   cost1=0.0625021, cost2=-8.09569e-11 :  mean(hits)=0.499996, mean(diffs)=2.69856e-08
-- cost=0.0591229,   cost1=0.0591239, cost2=-9.84305e-07 :  mean(hits)=0.506846, mean(diffs)=0.000328102
1: eta=1.1 cost=0.0591229 jtype=constrained costheta=-0.619 ps=[0.655246, -0.454896, 1.74443, -1.47202]
-- cost=0.0613303,   cost1=0.0613304, cost2=-1.17822e-07 :  mean(hits)=0.50235, mean(diffs)=3.92741e-05
eta going down: new_cost-cost=0.00220733 and jumptype='Newton'
2: eta=0.55 cost=0.0591229 jtype=Newton costheta=NaN ps=[0.655246, -0.454896, 1.74443, -1.47202]
-- cost=0.0613303,   cost1=0.0613304, cost2=-1.17822e-07 :  mean(hits)=0.50235, mean(diffs)=3.92741e-05
eta going down: new_cost-cost=0.00220733 and jumptype='Newton'
3: eta=0.275 cost=0.0591229 jtype=Newton costheta=NaN ps=[0.655246, -0.454896, 1.74443, -1.47202]
-- cost=0.

1×2 Array{Float64,2}:
 0.499996  0.749999

In [41]:
old_sr

1504578552

In [8]:
res = zeros(0,2)

0×2 Array{Float64,2}

In [34]:
figure(2);
clf();
plot(res[:,1], res[:,2], ".");
ylim(0, 1)
title(@sprintf("%d runs", size(res,1)))

I = sortperm(res[:,1])
res[I,:]

16×2 Array{Float64,2}:
 0.499955  0.5     
 0.49996   0.500001
 0.499981  0.5     
 0.499985  0.749999
 0.499993  0.749999
 0.499999  0.5     
 0.5       0.749999
 0.500001  0.75    
 0.500001  0.749998
 0.500001  0.749999
 0.500008  0.75    
 0.50001   0.75    
 0.500015  0.749999
 0.500016  0.750001
 0.50002   0.75    
 0.500035  0.75    
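
The pattern in this table can be tallied directly. The sketch below copies the 16 displayed rows (which are rounded, so the row shown as exactly 0.5 may in fact have started slightly to either side) and counts how many runs ended stuck near mean(hits)=0.5, split by whether the initial mean(hits) was below 0.5. The variable names and the 0.01 tolerance are our choices.

```julia
# Rows copied from the sorted table above: [initial mean(hits)  final mean(hits)]
res_table = [0.499955  0.5
             0.49996   0.500001
             0.499981  0.5
             0.499985  0.749999
             0.499993  0.749999
             0.499999  0.5
             0.5       0.749999
             0.500001  0.75
             0.500001  0.749998
             0.500001  0.749999
             0.500008  0.75
             0.50001   0.75
             0.500015  0.749999
             0.500016  0.750001
             0.50002   0.75
             0.500035  0.75]

initial_hits = res_table[:, 1]
final_hits   = res_table[:, 2]

stuck = abs.(final_hits .- 0.5) .< 0.01   # ended at the 0.5 limiting case
below = initial_hits .< 0.5               # started below 0.5
above = initial_hits .>= 0.5              # started at or above 0.5

n_below, stuck_below = sum(below), sum(below .& stuck)
n_above, stuck_above = sum(above), sum(above .& stuck)

println("started below 0.5:       ", stuck_below, " of ", n_below, " runs stuck")
println("started at or above 0.5: ", stuck_above, " of ", n_above, " runs stuck")
```

On these rows, 4 of the 6 runs that started below 0.5 got stuck, and none of the 10 runs that started at or above 0.5 did.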

In [37]:
gu = ["mean(hits) at start" "mean(hits) at end" ; res[sortperm(res[:,1]),:]]
matwrite("Results.mat", Dict("results"=>gu))

In [None]:
a = matread("Results.mat")


In [None]:
dt = 0.001
t = 0:dt:1
tau = 0.1
nsteps = length(t)
t = t[1:nsteps]

model_params = Dict(:dt=>dt, :tau=>tau, :W=>[0 W; W 0], :nsteps=>nsteps, 
:noise=>noise, :input=>input, :sigma=>sigma, :const_add=>0, :init_add=>0)

figure(2); clf()
costfunc(startU; beta=beta, sr=convert(Int64, round(time())), do_plot=true, verbose=true, 
    make_dict(args, params, merge(Dict(:fignum=>2), model_params))...)
params'

In [None]:
dt = 0.01
t = 0:dt:1
tau = 0.1
nsteps = length(t)
t = t[1:nsteps]

model_params = Dict(:dt=>dt, :tau=>tau, :W=>[0 W; W 0], :nsteps=>nsteps, 
:noise=>noise, :input=>input, :sigma=>sigma, :const_add=>0, :init_add=>0)
