
Multihypo docs page, better intro? #674

Open · dehann opened this issue Jan 8, 2021 · 7 comments

dehann (Member) commented Jan 8, 2021

"""
I have read the data association page and I think it makes sense. One of the questions I had was on scope/approach - depending on where you want to go with it, it may make sense to first introduce the problem in general terms before describing how it is tackled within Caesar.
"""

dehann added this to the v0.0.x milestone Jan 8, 2021
dehann (Member, Author) commented Feb 7, 2021

Separately, here is the answer to the nullhypo= question, which will go on a similar docs page. I'm just dropping it in here as the most convenient issue thread for now.

In case of nullhypo= being set:

  • We don't want to change the structure of the factor graph and the eliminated Bayes net --> tree, which is what makes nullhypo= awkward:
    • If =0.1, then 10% of the time that "factor should not be there"...?
  • So we use the next best alternative to achieve very nearly the same thing as nullhypo=:
    • nullhypo= in a stochastic sense means maximum entropy, or said differently -- under the null hypothesis the factor does not reduce the entropy in the variable estimate.
      • i.e. the variable could be anywhere else, totally unrestricted by this factor (with say 10% likelihood).
    • So one solution is that nullhypo= can 'increase' entropy at the sampling proposal stage, but must then reject any samples that add 'too much' entropy:
      • The easiest approach is a rejection sampler when nullhypo is occurring.
        • The proposal product step then rejects the 'high entropy' samples at the required rate.
      • So in MM-iSAMv2 there is no explicit 'rejection sampler'; it happens implicitly during the clique Gibbs sampling process. Sounds a bit like magic, but once you get used to how the system works it is very natural.

What does that mean... When a factor is used as a conditional likelihood for calculating proposal distributions, and nullhypo is active, we add a large amount of entropy to the given sample (it must stay on-manifold). This gain in entropy is then 'rejected' during the product between the proposals coming from the other factors on the variable. The multiscale Gibbs operation selects the most relevant kernels from the incoming proposal densities, and it is very likely that the high-entropy kernels will be rejected.

Said another way, if a variable were not restricted (say one connected factor at nullhypo=0.5), then there needs to be enough entropy that 50% of the time the variable can be anywhere on the domain. That is not computationally possible either, so we do the next best thing: 50% of the time we adaptively add a large amount of entropy based on the current numerical values, which spreads the likely location of the variable over a much larger area of the domain. If more factors pull the variable elsewhere, the adaptive entropy will still spread around the current estimate. So the big boost in entropy remains somewhat local, and infinite support is impossible with the current understanding of sampling and Bayes tree techniques. Practical testing, and tweaking of how much noise to add (i.e. params.spreadNH), gives a pretty powerful "best alternative" solution to a pretty hard problem. In most cases a user would likely not notice the difference between a purist max-entropy nullhypo= and our current rejection-sampling approach.
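
To make the mechanics concrete, here is a minimal 1D Euclidean sketch of the proposal side of this idea. This is not the actual IIF implementation; the function name propose_with_nullhypo and the exact way spreadDist is scaled from the current sample spread are assumptions for illustration only:

# Hedged sketch, not IIF code: for a nullhypo fraction of the samples, ignore the
# factor and instead take the variable's existing value plus a large uniform spread.
function propose_with_nullhypo(preX::Vector{Float64},      # existing samples in X
                               factor_proposal::Function;  # i -> proposal sample from p(X|Y)
                               nullhypo::Float64 = 0.1,
                               spreadNH::Float64 = 3.0)
    # scale the injected noise relative to the current spread of the existing samples
    spreadDist = spreadNH * (maximum(preX) - minimum(preX))
    proposal = similar(preX)
    for i in eachindex(preX)
        if rand() < nullhypo
            # nullhypo case: this sample is effectively unconstrained by the factor
            proposal[i] = preX[i] + spreadDist * (rand() - 0.5)
        else
            # regular case: follow the factor's conditional proposal p(X|Y=Y)
            proposal[i] = factor_proposal(i)
        end
    end
    return proposal
end

# e.g. current estimate near 10, a factor pulling toward -100, 10% nullhypo
preX = randn(100) .+ 10.0
pts  = propose_with_nullhypo(preX, i -> -100.0 + 0.1 * randn())

The later proposal-product (Gibbs) step is what actually rejects most of these high-entropy samples at the required rate.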

It is possible that the current IIF code is not always adding the noise (entropy) correctly, or that the spreadNH mechanics can be improved, but I'm pretty confident in this as a good medium-term stand-in solution (barring any bugs, of course).

cc @lemauee, @GearsAD

dehann self-assigned this Feb 7, 2021
lemauee commented Feb 8, 2021

Thanks @dehann for the writeup, that's pretty much what I had already figured out by performing convolutions using factors with nullhypo set.

However, "add entropy" is still a somewhat vague term for me. What distribution (uniform? broad Gaussian?) underlies this process? params.spreadNH should set some parameter of that distribution (related to the width for a uniform, or the standard deviation for a Gaussian?). Which parameter is it exactly?

Best,
Leo

lemauee commented Feb 16, 2021

Hi @dehann ,

I came across having to describe the mechanics of "adding entropy" for my thesis again (see the previous post). Could you elaborate on that term a little or point me to the implementation? Grasping the mathematical model behind this would be really important for me :)

Best,
Leo

Affie (Member) commented Feb 16, 2021

Hi Leo, I can point you to the code where it is added: https://github.com/JuliaRobotics/IncrementalInference.jl/blob/4ac8ccda4f27d8834950ad6a1913851031932add/src/ApproxConv.jl#L228-L239
Sorry, I can't help you more than that. My understanding is that it only spreads what is currently there, so I don't think it follows any particular distribution.

lemauee commented Feb 17, 2021

Thanks for the hint. If maniAddOps does not do anything strange, then by using rand() we should end up with a uniform distribution of width spreadDist?:

using Gadfly

spreadDist = 1
n = 1000
x = Vector{Float64}(undef, n)
for k in eachindex(x)
    x[k] = spreadDist * (rand() - 0.5)   # uniform on [-spreadDist/2, spreadDist/2]
end
pl = plot(x=x, Geom.histogram(bincount=10))

[image: nullhypo_spread histogram]

The next question would be whether spreadDist is directly equal to the spreadNH solver parameter. From performing an example convolution it seems like this is the case:

julia> fg = initfg()
julia> getSolverParams(fg).spreadNH
3.0

using Caesar, RoMEPlotting, LinearAlgebra, Statistics, Gadfly

N = 100
fg = initfg()

addVariable!(fg, :l0, Point2, N=N)

trans = MvNormal([1.0,0.0], Diagonal([0.01;0.01]))
addVariable!(fg, :l1, Point2, N=N)
l1Prior = PriorPoint2(MvNormal([1,0], Diagonal([0.01;0.01])))
addFactor!(fg,[:l1], l1Prior)

f1 = Point2Point2(trans)

addFactor!(fg, [:l0, :l1], f1, nullhypo=0.5)

pts = approxConv(fg, :l0l1f1, :l0)
p = kde!(pts)
pll0 = plotKDE(p)

ensureAllInitialized!(fg)

pl = plotSLAM2DLandmarks(fg)

[image: nullhypo_convolution_example]

The particles of l1 get spread by the convolution, in a uniform way, over a circle of diameter 3.0 (plus the prior noise on l1 and the factor's noise) around the position for l0 implied by the factor.

Best,
Leo

dehann (Member, Author) commented Feb 19, 2021

Hi @lemauee ,

So the most important aspect of nullhypo is that for some fraction of the samples calculated for a variable proposal (via samples as proxy) from say p(X | Y=Y) (i.e. evaluating the conditional p(X|Y) for given values Y), it is as if p(X|Y) never even existed. So if you say 10% of samples are "nullhypo" (and remember, given the Bayes tree approach we don't want to change the factor graph by deleting factors), we need to do the next best thing ... how do we make it as if the factor was never there?

Here in IIF we inject noise onto 10% of the existing samples in X, and then solve the remaining 90% according to the structure in p(X|Y). It is important that the 10% follows from the existing values in X, since the factor should have NO effect on those 10% of samples. However, we cannot simply reuse the values in X, because that would break the Markov assumption in the sampling, namely that each step in the sampling chain should only be correlated with the previous step and not correlate back over multiple stages. The practical effect is this: say the true value x* is around 10, but the conditional p(X|Y=Y) is trying to force X toward -100; then there needs to be some way (during this iteration of the Gibbs sampling) to relieve that "strain". Setting 10% nullhypo allows some chance that a few samples will be much closer to x*. As the Gibbs process continues, samples nearer x* will be selected, and nearly all the other samples predicted by p(X|Y=Y) and by nullhypo will be rejected at that stage. The posterior on X might therefore be a little biased due to our workaround, but that is the difference between, say, 10 vs 11, rather than 10 vs -100. The "win" in this compromise is that the structure of the Bayes tree does not have to be updated.
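
As a toy illustration of that rejection effect, here is a 1D sketch with Gaussian kernels. This is not the actual clique Gibbs code; all names and numbers are made up for the example:

# proposal from the "straining" factor: mostly near -100, with 10% nullhypo
# samples spread around the current estimate near +10
proposal_A = [rand() < 0.1 ? 10.0 + 5.0 * (rand() - 0.5) : -100.0 + 0.1 * randn()
              for _ in 1:1000]
# proposal from another factor (e.g. a prior) near +10
proposal_B = 10.0 .+ 0.5 .* randn(1000)

# crude kernel-density weight of each A-sample under proposal B
kde_weight(x, pts; bw=0.5) = sum(exp(-0.5 * ((x - p) / bw)^2) for p in pts)
w = kde_weight.(proposal_A, Ref(proposal_B))
w ./= sum(w)

# resample A according to those weights: nearly all survivors are the nullhypo
# samples near +10, i.e. the "strain" toward -100 has been relieved
cw = cumsum(w)
survivors = [proposal_A[min(searchsortedfirst(cw, rand()), length(cw))] for _ in 1:1000]
count(>(0), survivors) / length(survivors)   # ≈ 1.0

This only shows the direction of the effect; in MM-iSAMv2 the selection happens implicitly inside the multiscale Gibbs product over kernel densities.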

What spreadNH does is magnify the existing spread of the pre-X values by adding random values that are much bigger than the current spread (which makes the added noise relative to the scale being used). If spreadNH is too big, the bias on the posterior gets a bit worse. If spreadNH is too small, large discrepancies between the true value and the proposals from p(X|Y) will not be allowed to relieve the "strain" as a nullhypo should. spreadNH=3 works well as a tradeoff, from trial and error in SLAM problems. The idea is to add this "noise" to the pre-X samples on-manifold, so we have to be careful how the exponential map is defined. Before AMP 41, RoME 244, we have to use maniAddOps as a callback -- we are busy changing all of that to just use Manifolds.jl as the standard.
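
For intuition on the "on-manifold" part, here is a tiny sketch for a circle-valued variable such as a heading angle. This is plain Julia, not the maniAddOps / Manifolds.jl machinery, and scaling spreadDist by the current sample spread is an assumption based on the description above:

# wrap an angle back onto the circle after adding the spread
wrap_to_pi(θ) = mod(θ + π, 2π) - π

# spread the existing samples by spreadNH times their current spread,
# staying on the circular manifold
function add_nullhypo_spread(θs::Vector{Float64}; spreadNH::Float64 = 3.0)
    spreadDist = spreadNH * (maximum(θs) - minimum(θs))
    return [wrap_to_pi(θ + spreadDist * (rand() - 0.5)) for θ in θs]
end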

On rand vs randn: it does not really matter much, since this is a rejection-sampling technique that is still localized up to spreadNH around the current values. rand is a little better in that there is no concentration within the spreadNH region -- but either should work fine. rand should give just a little less bias in the final posterior result.
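
For reference, the two choices would look roughly like this in a toy Euclidean setting (matching the Gaussian's 3-sigma range to the spreadNH region is only one possible convention, chosen here for the comparison):

spreadDist = 3.0
uniform_noise  = spreadDist .* (rand(1000) .- 0.5)   # flat over ±spreadDist/2
gaussian_noise = (spreadDist / 6) .* randn(1000)     # ~99.7% within ±spreadDist/2, concentrated near 0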

Hope that helps, we still need to write all this up in our publications :-)

Best,
Dehann

lemauee commented Feb 19, 2021

Hi @dehann ,

Thanks for the insight; understanding that the process works on the existing values of X was really important for understanding the behavior. In my little example that could not be distinguished from my (wrong) assumption. So to get a belief for a certain conditional before the Gibbs sampling step, in case there is a nullhypo on that conditional, the prior kernels of the variable that is being sampled to ("pre-X") are spread for the nullhypo fraction, and not the kernels originating from the factor "X|Y"?

What still needs clarification for me from your post is "makes us add noise relative to the scale being used". How does the relative spreadNH translate into an absolute spread? Or did I misunderstand something there?

I experienced first hand what setting spreadNH too big does: the all-famous 150-pose example with 10% outliers added and nullhypo set to 0.1 looks like this when spreadNH is set to the standard 3.0:
[screenshot: 2021-02-19 08-38-01]
With spreadNH set to 16 (the 16 coming from my wrong assumption of how the particles must be spread to reach every corner of my example) the particles drift away by several orders of magnitude. The solver even crashed due to some infinite-number error:
[screenshot: 2021-02-19 08-37-20]

But with spreadNH=3.0 the example looks wrong after the whole round:
[screenshot: 2021-02-19 08-50-30]
GTSAM with a robust cost function (which needed some manual tuning, I admit; I thought that maybe wasn't necessary in Caesar ;) ) solves it fine:
[screenshot: 2021-02-19 08-51-50]
Is there at least some advice on when to set spreadNH to lower and when to higher values, and by how much the standard 3.0 should be changed? Blind trial and error with Caesar can become a little time-consuming at the moment :) I suppose for my example setting spreadNH a little higher would make sense, because one can see a wrong "strain" (a wrong loop closure) acting.

If it's of any help I can port the current "read-from-matfile" example to a 1-script Julia version. I don't want to see Caesar lose against parametric solving in such a simple scenario ;)

Best,
Leo
