# cranmer/play


# Manifold Learning and Information Geometry

Kyle Cranmer, BSD License

I'm working on a paper about Information Geometry.
The first step was an IPython notebook on visualizing information geometry with multidimensional scaling.

The other Python files are my work toward something more ambitious.

Step 1: The first step is described nicely in the IPython notebook on visualizing information geometry with multidimensional scaling (final result shown here). The important part is a grid of points in (μ,σ) being mapped by f:(μ,σ)→ℝ³.
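The mechanics of this step can be sketched as follows, using the closed-form Fisher-Rao distance between univariate Gaussians (the (μ,σ) half-plane with metric ds² = (dμ² + 2dσ²)/σ² is hyperbolic) and a classical MDS embedding done with plain NumPy; the grid ranges and sizes are illustrative, not the notebook's actual values:

```python
import numpy as np

def fisher_rao_distance(mu1, sig1, mu2, sig2):
    """Closed-form Fisher-Rao distance between N(mu1, sig1^2) and N(mu2, sig2^2)."""
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2.0 * (sig1 - sig2) ** 2) / (4.0 * sig1 * sig2)
    return np.sqrt(2.0) * np.arccosh(arg)

def classical_mds(D, dim=3):
    """Embed a precomputed distance matrix D into R^dim via classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]        # keep the top-dim eigenvalues
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))

# A grid of points in (mu, sigma) ...
mus, sigmas = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(0.5, 1.5, 5))
params = np.column_stack([mus.ravel(), sigmas.ravel()])

# ... mapped by f:(mu, sigma) -> R^3 via their pairwise information distances.
D = np.array([[fisher_rao_distance(m1, s1, m2, s2) for m2, s2 in params]
              for m1, s1 in params])
embedding = classical_mds(D, dim=3)
```

Classical MDS on the exact geodesic distances gives the ℝ³ picture directly; a curved manifold cannot embed perfectly, so the top eigenvalues tell you how much distortion the 3-D picture carries.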

Step 2: Try various scikit-learn regression algorithms, settling on NuSVR. The results are not so good: the smaller red spots don't approximate the larger colorful training samples well, particularly near the edges. The lower-right panel is what I'm ultimately after, the inverse map back into the (μ,σ) space -- for these points, that should be a perfect grid. Surprised I can't do better with regression here.
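A minimal sketch of the NuSVR setup: since scikit-learn's `NuSVR` is single-output, learning the inverse map means fitting one regressor per parameter. The 3-D "embedding" below is a made-up smooth stand-in, not the real MDS output, and the `nu`/`C` values are illustrative:

```python
import numpy as np
from sklearn.svm import NuSVR

# Toy stand-in for the embedded points: a smooth map (mu, sigma) -> R^3.
mu, sigma = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(0.5, 1.5, 10))
params = np.column_stack([mu.ravel(), sigma.ravel()])        # targets (mu, sigma)
X = np.column_stack([params[:, 0], params[:, 1],
                     params[:, 0] * params[:, 1]])           # fake 3-D embedding

# NuSVR is single-output, so fit one regressor per parameter coordinate.
models = [NuSVR(nu=0.5, C=10.0).fit(X, params[:, j]) for j in range(2)]
pred = np.column_stack([m.predict(X) for m in models])       # inverse-map estimate
```

Even on a noiseless problem, the ε-insensitive loss tolerates errors inside its tube, which is one plausible reason the recovered grid looks smeared compared to exact interpolation.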

Step 3: Since this is a low-dimensional problem with little noise, just try SciPy's interpolation algorithms. Ah, that's better -- now I'm getting the grid I want.
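One way this can look with SciPy (the source doesn't say which interpolator was used; `RBFInterpolator` is one option, and the toy embedding is the same stand-in as above): radial-basis interpolation with zero smoothing passes exactly through the training points, so the (μ,σ) grid comes back unsmeared.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Same toy setup: a smooth map (mu, sigma) -> R^3 standing in for the embedding.
mu, sigma = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(0.5, 1.5, 10))
params = np.column_stack([mu.ravel(), sigma.ravel()])
X = np.column_stack([params[:, 0], params[:, 1], params[:, 0] * params[:, 1]])

# Interpolate the inverse map R^3 -> (mu, sigma); with smoothing=0 (default)
# the interpolant is exact at the training nodes.
inverse = RBFInterpolator(X, params)
recovered = inverse(X)    # reproduces the (mu, sigma) grid at the nodes
```

Unlike the regression in Step 2, there is no loss tolerance here: interpolation is the right tool when the data are dense and essentially noise-free.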

Step 4: To do: make a regular grid in the target space and project it via the inverse map into the (μ,σ) plane. This is a "smart" sampling from an information-theoretic perspective. It's also a good choice when simulating the statistical model is expensive (e.g. full simulation with GEANT at the LHC, which takes about 20 min per event, while the real collisions happen at 40 million/sec).
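A rough sketch of the mechanics, under a simplifying assumption that the source doesn't make explicit: here the first two embedding coordinates are taken to parameterize the embedded surface, so a regular grid can be laid down in those coordinates and pulled back. The toy embedding is again hypothetical:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Toy embedding; assume its first two coordinates parameterize the surface.
mu, sigma = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(0.5, 1.5, 10))
params = np.column_stack([mu.ravel(), sigma.ravel()])
X = np.column_stack([params[:, 0] + 0.2 * params[:, 1] ** 2,
                     params[:, 1] + 0.2 * params[:, 0] ** 2,
                     params[:, 0] * params[:, 1]])

# Inverse map from the parameterizing embedding coordinates back to (mu, sigma).
inverse = RBFInterpolator(X[:, :2], params)

# A regular grid in the target (embedding) space ...
g1, g2 = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 8),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 8))
grid = np.column_stack([g1.ravel(), g2.ravel()])

# ... projected back into the (mu, sigma) plane: points roughly equidistant
# in information, an efficient design when each simulated point is expensive.
smart_samples = inverse(grid)
```

The resulting (μ,σ) points are irregular in parameter space but approximately uniform in information distance, which is the point of the exercise.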

Step 5: To do: repeat the exercise with the MSSM SUSY grids published on HepData. Change from Gaussian to Poisson, where I can still solve for the information distance analytically. Repeat steps 3 & 4 for that case, evaluate the embedding from an information perspective, and propose a better choice for the parameter-scan "grids" used by the ATLAS collaboration.
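The analytic Poisson distance referred to here is standard: the Fisher information for Poisson(λ) is 1/λ, and the substitution t = 2√λ flattens the metric, so for a product of independent counting bins the Fisher-Rao distance is just Euclidean in √λ. A sketch:

```python
import numpy as np

def poisson_fisher_distance(lam1, lam2):
    """Fisher-Rao distance between products of independent Poisson bins.

    The metric g = diag(1/lam) becomes Euclidean in t = 2*sqrt(lam),
    so the geodesic distance is 2 * ||sqrt(lam1) - sqrt(lam2)||.
    """
    lam1, lam2 = np.atleast_1d(np.asarray(lam1, float)), np.atleast_1d(np.asarray(lam2, float))
    return 2.0 * np.linalg.norm(np.sqrt(lam1) - np.sqrt(lam2))

# Single bin: d(Poisson(1), Poisson(4)) = 2 * |1 - 2| = 2.
d = poisson_fisher_distance(1.0, 4.0)
```

For a SUSY scan, each grid point's vector of expected signal-plus-background counts per bin would play the role of `lam`, and the pairwise distances feed into the same MDS/interpolation pipeline as the Gaussian case.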

Step 6: To do: move to a non-number-counting model, generate a graph of KL distances, and use shortest paths on this graph (via Dijkstra's algorithm) to approximate geodesics of the Fisher information metric -- perhaps using the Boost libraries or the code from FINE. Amazingly, the code is available -- nicely done, Kevin Carter!
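The idea above can be sketched with SciPy's Dijkstra instead of Boost (a substitution, not what the source specifies). For nearby models, KL(p‖q) ≈ ½ d_F², so weighting short edges by √(2·KL) and summing along shortest paths approximates the Fisher geodesic. The 1-D Gaussian family below is a toy check where the exact answer is known:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def kl_gauss(mu1, mu2, sigma=1.0):
    """KL divergence between N(mu1, sigma^2) and N(mu2, sigma^2)."""
    return (mu1 - mu2) ** 2 / (2.0 * sigma ** 2)

# Sample models along a line in parameter space (fixed sigma = 1).
mus = np.linspace(0.0, 1.0, 11)
n = len(mus)

# Local edges only: for close neighbours d_F ~ sqrt(2 * KL), and Dijkstra
# sums these short edges into an approximate geodesic length.
rows, cols, weights = [], [], []
for i in range(n):
    for j in range(i + 1, min(i + 3, n)):     # connect each point to 2 neighbours
        w = np.sqrt(2.0 * kl_gauss(mus[i], mus[j]))
        rows += [i, j]; cols += [j, i]; weights += [w, w]
graph = csr_matrix((weights, (rows, cols)), shape=(n, n))

dist = dijkstra(graph, indices=0)
# dist[-1] approximates the Fisher geodesic distance from mu=0 to mu=1;
# for fixed sigma=1 the exact value is |1 - 0| / sigma = 1.
```

This is essentially the FINE recipe: local divergences are trustworthy, long-range ones are not, so the graph restricts comparisons to neighbours and lets shortest paths do the rest.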