In [None]:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
%matplotlib inline

# In progress #

This notebook is still under construction.


## Background ##

In 1801 a young Gauss became an overnight sensation when he predicted where to find the minor planet [Ceres](https://en.wikipedia.org/wiki/Ceres_(dwarf_planet)), which had been "lost" by its discoverer (Piazzi) earlier that year; it was recovered close to his predicted position at the end of that year.  You can read more of the history and his methodology [here](https://math.berkeley.edu/~mgu/MA221/Ceres_Presentation.pdf) and [here](http://sites.math.rutgers.edu/~cherlin/History/Papers1999/weiss.html).  In brief, he was clever, he calculated by hand (for over 100 hours), and he developed the method of least squares.  We're not as clever and we'll use a computer, but we'll follow his lead and use the method of least squares too.  Hopefully this will take less than 100 hours!

### Model ###

We integrate the (very simple) ODEs for the motion of a test particle under a central force.  We use "odeint" to evolve the particle from its initial position and velocity, then "observe" it from the Earth, which we simplify as a point moving in the x-y plane at radius 1 with 1 revolution per unit time (i.e. our length units are AU and our time units are years).  One could certainly imagine a more complex set-up that includes the Earth's true motion and even perturbations from the other (major) planets.  The ODE isn't very hard to set up.

If the Earth were on a circular orbit, then $v^2=GM_\odot/r$.  For a period of 1 unit and a radius of 1 unit this would imply $v=2\pi$ and hence $GM_\odot=(2\pi)^2$.
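A quick numerical check of these units: with $GM_\odot=4\pi^2$, a circular orbit of radius $r$ (in AU) has speed $v=\sqrt{GM_\odot/r}$ and period $2\pi r/v=r^{3/2}$ years, which is Kepler's third law.  The helper `circular_period` is our own, for illustration; 2.77 AU is roughly the semi-major axis of Ceres.

```python
import numpy as np

GM_SUN = 4 * np.pi**2  # GM_sun in AU^3/yr^2, from v = 2*pi AU/yr at r = 1 AU

def circular_period(r):
    """Period (yr) of a circular orbit of radius r (AU) about the Sun."""
    v = np.sqrt(GM_SUN / r)      # circular speed in AU/yr
    return 2 * np.pi * r / v     # circumference divided by speed

for r in [1.0, 2.77, 5.2]:
    print(f"r = {r:5.2f} AU  ->  P = {circular_period(r):6.3f} yr,  r**1.5 = {r**1.5:6.3f}")
```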

In [None]:
# Set up code which will integrate the orbit given initial conditions and
# "observe" our object from the Earth.
# We'll idealize this problem dramatically, putting the Earth on a circular
# orbit in the x-y plane and ignoring the influence of everything but the Sun.
def derivs(y,t):
    """The derivatives for the ODE integration (see below)."""
    kSol   = 4*np.pi**2                # GM_sun in AU^3/yr^2 (AU and year units).
    r2     = np.sum(y[:3]**2)          # Squared radius.
    nhat   = y[:3]/(np.sqrt(r2)+1e-20) # Avoid divide by zero.
    dy     = np.empty_like(y)
    # To avoid round-off error and avoid Ceres-Sun scattering in case our
    # orbit gets too close to the origin, we add a small positive constant
    # to our squared radius.
    # The derivatives are just dr/dt=v and dv/dt=-GM/r^2 nhat:
    dy[:3] = y[3:]
    dy[3:] = -kSol/(r2+1e-4) * nhat
    return dy


def predict_angles(r,v,tobs):
    """Given initial positions and velocities (r,v), predict the angular positions
       as viewed from the Earth at observation times tobs."""
    # We pack the position and velocity into a 6-vector, y, and integrate the EOM.
    y0  = np.append(r,v)
    res = odeint(derivs,y0,tobs)
    # Now work out the Earth's position and subtract it from each observation
    # to get a relative vector.
    omegat = 2*np.pi*tobs
    earth  = np.vstack( (np.cos(omegat),np.sin(omegat),0*omegat) ).T
    rel    = res[:,:3] - earth
    # and from the relative vector work out theta and phi.
    rr     = np.sqrt( np.sum( rel**2, axis=1 ) )
    thta   = np.arccos( rel[:,2]/(rr+1e-30))
    phi    = np.arctan2(rel[:,1],rel[:,0])
    # We return the "true" path of this orbit as well, in case we want
    # to look at it in 3D or something later.
    return( (thta,phi,res[:,:3]) )

In [None]:
# Generate a data vector by simulating an orbit, observing it and then adding noise.
r    = np.array([1.0,5.0,5.0])
vcir = (2*np.pi)/(np.sum(r**2))**0.25  # Velocity for a circular orbit: Sqrt[GM/r].
# v  = vcir * np.random.normal(size=3) # Random velocity: NOT guaranteed to be bound.
v    = vcir * np.array([0.5,0.5,0.5])  # Hold fixed for now; |v|~0.87*vcir < sqrt(2)*vcir, so bound.
tobs = np.linspace(0.,15.,500)         # odeint takes tobs[0] as the initial time, so start at t=0.
print("Initial conditions: ",r," ",v)
# Generate a true path and add noise.
thta,phi,pth = predict_angles(r,v,tobs)
thta += 0.01 * np.random.normal(size=len(thta))
phi  += 0.01 * np.random.normal(size=len(phi ))

In [None]:
deg    = 180./np.pi
fig,ax = plt.subplots(1,1,figsize=(6,4))
cax    = ax.scatter(phi*deg,(np.pi/2-thta)*deg,c=tobs)
ax.set_xlabel(r'Azimuth, $\phi$ [deg]')
ax.set_ylabel('Elevation above the x-y plane [deg]')
plt.colorbar(cax,label='t [yr]')

## Orbit determination through minimization ##

Finding the orbit of Ceres is now a minimization problem in 6D (the initial position and velocity components).  Let us define $\chi^2$ as a function of the 6D vector $(r,v)$.  For simplicity we'll assume $\theta$ and $\phi$ have constant errors of 0.01 radians, since this is what we put in!  This isn't super physical, but it serves our purposes.

In [None]:
def chi2(pars):
    """Returns the chi^2 for fitting the data given the 6 "parameters" (r,v)."""
    r = pars[:3]
    v = pars[3:]
    t,p,pth = predict_angles(r,v,tobs)
    dp = np.mod(p-phi+np.pi,2*np.pi) - np.pi   # Wrap azimuth residuals into [-pi,pi).
    c2 = np.sum( ((t-thta)/0.01)**2 ) + np.sum( (dp/0.01)**2 )
    return(c2)

# Check chi^2 for some random initial conditions just to make sure.
pars = np.array([10.,10.,10.,0.,0.,0.]) + np.random.normal(size=6)
chi2(pars)

## Minimization without gradient information ##

The first issue we face is that the integration is not "differentiable": we don't know how to compute the gradient of $\chi^2$ with respect to our 6 parameters (we could talk about automatic differentiation here).

* Random flailing around
* Pattern search
* Powell
* Levenberg-Marquardt

Have students investigate how minimizers approach the right solution.  We can introduce different minimizers and plot sky projections of our orbits, how $\chi^2$ decreases with the number of steps, etc.

Have students try to write their own code or possibly use canned SciPy routines for minimization.
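A sketch of both ideas, under our own assumptions: a self-contained copy of the notebook's model (observing relative to the Earth, with azimuth residuals wrapped into $[-\pi,\pi)$), a shorter seeded data set so it runs quickly, a hypothetical `random_search` helper for the "random flailing" approach (keep a random step only if it lowers $\chi^2$), and SciPy's `minimize(method="Powell")` as the canned routine.  Both start from the same perturbed guess.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize

K_SOL = 4 * np.pi**2   # GM_sun in AU^3/yr^2
SIGMA = 0.01           # assumed angular error in radians

def derivs(y, t):
    """dr/dt = v, dv/dt = -GM rhat / (r^2 + softening), as in the notebook."""
    r2 = np.sum(y[:3]**2)
    nhat = y[:3] / (np.sqrt(r2) + 1e-20)
    return np.concatenate((y[3:], -K_SOL / (r2 + 1e-4) * nhat))

def predict_angles(pars, tobs):
    """Polar angle and azimuth of the object as seen from the idealized Earth."""
    res = odeint(derivs, pars, tobs)
    wt = 2 * np.pi * tobs
    earth = np.column_stack((np.cos(wt), np.sin(wt), np.zeros_like(wt)))
    rel = res[:, :3] - earth
    rr = np.linalg.norm(rel, axis=1)
    return np.arccos(rel[:, 2] / rr), np.arctan2(rel[:, 1], rel[:, 0])

def chi2(pars, tobs, thta, phi):
    t, p = predict_angles(pars, tobs)
    dp = np.mod(p - phi + np.pi, 2 * np.pi) - np.pi   # wrap azimuth residuals
    return np.sum(((t - thta) / SIGMA)**2) + np.sum((dp / SIGMA)**2)

def random_search(f, x0, step, nsteps, rng):
    """Hypothetical 'random flailing' minimizer: accept a random step only if f drops."""
    x_best = np.array(x0, dtype=float)
    f_best = f(x_best)
    history = [f_best]
    for _ in range(nsteps):
        trial = x_best + step * rng.normal(size=x_best.size)
        f_trial = f(trial)
        if f_trial < f_best:
            x_best, f_best = trial, f_trial
        history.append(f_best)
    return x_best, f_best, history

# Synthetic data in the spirit of the notebook (shorter and seeded so it is quick).
rng = np.random.default_rng(42)
r0 = np.array([1.0, 5.0, 5.0])
vcir = 2 * np.pi / np.sqrt(np.linalg.norm(r0))
truth = np.concatenate((r0, vcir * np.array([0.5, 0.5, 0.5])))
tobs = np.linspace(0.0, 5.0, 60)
thta, phi = predict_angles(truth, tobs)
thta = thta + SIGMA * rng.normal(size=thta.size)
phi = phi + SIGMA * rng.normal(size=phi.size)

f = lambda pars: chi2(pars, tobs, thta, phi)
start = truth + 0.1 * rng.normal(size=6)   # a perturbed starting guess
x_rs, f_rs, hist = random_search(f, start, step=0.02, nsteps=200, rng=rng)
res = minimize(f, start, method="Powell", options={"maxfev": 600})
print(f"start chi2 = {hist[0]:.4g}, random search = {f_rs:.4g}, "
      f"Powell = {res.fun:.4g} ({res.nfev} evaluations)")
```

Plotting `hist` against step number gives the "$\chi^2$ versus number of steps" curve; a callback passed to `minimize` can record the same for Powell.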

## Minimization with gradient information ##

To compare this with a situation where we have gradient information, let us modify our problem somewhat.  For 6D this will help; for much higher dimensions it will gain us tremendously.  Roughly speaking, searching in $N_{par}$ dimensions without gradients (e.g. Powell's method on a quadratic) requires O($N_{par}^2$) function evaluations, while searching with gradients requires O($N_{par}$) gradient evaluations.
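One rough way to see the difference is to count evaluations on a toy problem: a randomly rotated quadratic in $N$ dimensions (rotated so the coordinate axes are not already conjugate directions), minimized with SciPy's gradient-free Powell method and with BFGS given an analytic gradient.  This is a sketch on an assumed test problem (`make_quadratic` is our own helper), not on the Ceres $\chi^2$; the exact counts depend on the SciPy version and tolerances, so look at how they grow with $N$ rather than at the numbers themselves.

```python
import numpy as np
from scipy.optimize import minimize

def make_quadratic(n, rng):
    """Hypothetical test problem: f(x) = x^T A x / 2 with a randomly rotated SPD A."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))       # random orthogonal matrix
    a = q @ np.diag(np.arange(1.0, n + 1.0)) @ q.T      # eigenvalues 1..n
    f = lambda x: 0.5 * x @ a @ x
    grad = lambda x: a @ x
    return f, grad

rng = np.random.default_rng(0)
results = {}
for n in [2, 4, 8, 16]:
    f, grad = make_quadratic(n, rng)
    x0 = np.ones(n)
    powell = minimize(f, x0, method="Powell")            # function values only
    bfgs = minimize(f, x0, method="BFGS", jac=grad)      # analytic gradient
    results[n] = (f(x0), powell, bfgs)
    print(f"N = {n:2d}: Powell f-evals = {powell.nfev:5d} | "
          f"BFGS f-evals = {bfgs.nfev:3d}, gradient evals = {bfgs.njev:3d}")
```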

Suppose Ceres is distant from the Sun, so its orbital period is long.  Further suppose that the observations are taken over a short time span.  We can then Taylor-expand the path of Ceres around the central observation, using $d^2\vec{x}/dt^2=-k\hat{x}/r^2$ with $k=GM_\odot$:
$$
  \vec{x}(t) \simeq \vec{x}_0 + (t-t_0)\left.\frac{d\vec{x}}{dt}\right|_0
    + \frac{1}{2}(t-t_0)^2 \left.\frac{d^2\vec{x}}{dt^2}\right|_0
    = \vec{x}_0 + (t-t_0)\vec{v}_0 - \frac{k\hat{x}_0}{2r_0^2}(t-t_0)^2
$$
where $r_0^2=|\vec{x}_0|^2$.  For fixed observation times $t_i$, $\vec{x}(t_i)$ is a function of $\vec{x}_0$ and $\vec{v}_0$ alone.  The complication now is just the position-to-angle step.  To simplify the algebra so we can focus on the numerics, let's imagine we observe from the Sun, not the Earth (or that the orbital radius of Ceres is so large compared to that of the Earth that the difference doesn't matter).  This assumption could be dropped; it wouldn't change our strategies, just some algebra along the way.
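To get a feel for how good the second-order expansion is, here is a sketch comparing it with a direct `odeint` integration for an assumed object 30 AU from the Sun moving at the local circular speed.  For simplicity it expands about the first observation ($t_0=0$) rather than the central one and uses an unsoftened force law; `derivs` and `taylor_path` here are our own versions.

```python
import numpy as np
from scipy.integrate import odeint

K_SOL = 4 * np.pi**2  # GM_sun in AU^3/yr^2

def derivs(y, t):
    """Unsoftened Kepler problem: dr/dt = v, dv/dt = -GM x / r^3."""
    r = np.linalg.norm(y[:3])
    return np.concatenate((y[3:], -K_SOL * y[:3] / r**3))

def taylor_path(x0, v0, dt):
    """Second-order Taylor expansion of x(t) about t0; dt = t - t0."""
    r0 = np.linalg.norm(x0)
    acc0 = -K_SOL * x0 / r0**3          # = -k xhat_0 / r_0^2
    return x0 + np.outer(dt, v0) + 0.5 * np.outer(dt**2, acc0)

# Assumed distant object: 30 AU from the Sun, moving at the local circular speed.
x0 = np.array([30.0, 0.0, 0.0])
v0 = np.array([0.0, np.sqrt(K_SOL / 30.0), 0.0])
dt = np.linspace(0.0, 0.5, 11)          # half a year of observations after t0 = 0
exact = odeint(derivs, np.concatenate((x0, v0)), dt, rtol=1e-10, atol=1e-12)[:, :3]
approx = taylor_path(x0, v0, dt)
err = np.linalg.norm(exact - approx, axis=1)
print(f"max position error over 0.5 yr: {err.max():.2e} AU")
```

The neglected term is cubic in $t-t_0$ and scales like $k\,v/r^3$, which is why a distant, slowly moving object and a short observing span make the expansion accurate.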

The obvious path would be to take as our data $\theta_i$ and $\phi_i$ and as our model $\theta(t)$ and $\phi(t)$ then:
$$
  \chi^2 = \sum_i\frac{[\theta_i-\theta(t_i)]^2}{\sigma_i^2} +
           \sum_i\frac{[\phi_i-\phi(t_i)]^2}{\sigma_i^2}
$$
while
\begin{eqnarray}
  \frac{d\chi^2}{d\vec{x}_0} &=& -2\sum_i \frac{[\theta_i-\theta(t_i)]}{\sigma_i^2}
  \frac{d\theta(t_i)}{d\vec{x}_0} + \cdots \\
  &=& -2\sum_i \frac{[\theta_i-\theta(t_i)]}{\sigma_i^2}
  \frac{-1}{\sqrt{1-y_i^2}}\frac{dy_i}{d\vec{x}_0} + \cdots \\
  &=& -2\sum_i \frac{[\theta_i-\theta(t_i)]}{\sigma_i^2}
  \frac{-1}{\sqrt{1-y_i^2}}\left[ \frac{dz/d\vec{x}_0}{r_i}
  - \frac{z(t_i)\,dr/d\vec{x}_0}{r_i^2} \right] + \cdots
\end{eqnarray}
where $y_i=z(t_i)/|\vec{x}(t_i)|$ and $r_i=r(t_i)=|\vec{x}(t_i)|$.  Note that the $\vec{x}_0$ dependence enters through the first term, $\vec{x}_0$, but also through the $\hat{x}_0/r_0^2$ factor multiplying $(t-t_0)^2$.
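A finite-difference check is a cheap way to catch mistakes in derivations like this.  The sketch below codes the bracketed chain-rule factor $d\theta/d\vec{x}=-\frac{1}{\sqrt{1-y^2}}\left[\frac{dz/d\vec{x}}{r}-\frac{z\,dr/d\vec{x}}{r^2}\right]$ at a single position (ignoring, for now, how $\vec{x}(t_i)$ depends on $\vec{x}_0$) and compares it with central differences.  The helper names are our own, and the point must be off the $z$ axis so that $1-y^2>0$.

```python
import numpy as np

def theta_of(x):
    """Polar angle theta = arccos(z / r) of a position vector x."""
    return np.arccos(x[2] / np.linalg.norm(x))

def dtheta_dx(x):
    """Chain rule: -1/sqrt(1-y^2) * [dz/dx / r - z (dr/dx) / r^2], with y = z/r."""
    r = np.linalg.norm(x)
    y = x[2] / r
    dz_dx = np.array([0.0, 0.0, 1.0])
    dr_dx = x / r
    return -1.0 / np.sqrt(1.0 - y**2) * (dz_dx / r - x[2] * dr_dx / r**2)

def finite_diff(f, x, h=1e-6):
    """Central-difference gradient of a scalar function, for checking the algebra."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([3.0, -1.0, 2.0])
print(dtheta_dx(x))
print(finite_diff(theta_of, x))
```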

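You can do this, but it is actually fairly awkward.  It is perhaps easier to recast the observations as unit vectors, $\hat{n}_i$.  We can predict $\hat{n}(t)$ simply by dividing $\vec{x}(t)$ by $r(t)$.  Being close to the observations now means maximizing $\hat{n}(t_i)\cdot\hat{n}_i$.  For a small angular separation $\delta_i$ between the two directions, $\hat{n}(t_i)\cdot\hat{n}_i=\cos\delta_i\approx 1-\delta_i^2/2$, so maximizing a weighted sum of these dot products is, up to a constant, the same as minimizing $\sum_i w_i\delta_i^2/2$, i.e. $\chi^2/2$ when $w_i=\sigma_i^{-2}$.  So our log-likelihood could be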
$$
  \ln L = \sum_i w_i\ \hat{n}_i\cdot\hat{n}(t_i)
$$
with weights $w_i\propto \sigma_i^{-2}$.  This is simpler to deal with.  For $\vartheta=\{\vec{x}_0,\vec{v}_0\}$ we have
$$
  \frac{\partial\ln L}{\partial\vartheta} = \sum_i w_i\hat{n}_i\cdot
  \left( \frac{1}{r_i}\frac{\partial\vec{x}(t_i)}{\partial\vartheta} -
  \frac{\vec{x}(t_i)}{r_i^3}\vec{x}(t_i)\cdot\frac{\partial\vec{x}(t_i)}{\partial\vartheta}\right)
  = \sum_i\frac{w_i\hat{n}_i}{r_i}\cdot\left( \mathbf{1} - \frac{\vec{x}(t_i)}{r_i^2}\vec{x}(t_i)\cdot\right)\frac{\partial\vec{x}(t_i)}{\partial\vartheta}
$$
which is relatively easy to code up since
$$
  \frac{\partial x^a(t_i)}{\partial x_0^b} = \delta_{ab} + \cdots
  \quad , \quad
  \frac{\partial x^a(t_i)}{\partial v_0^b} = (t_i-t_0)\delta_{ab}
$$
It is a useful trick when doing these sorts of problems to let the computer do the chain rule for you: numerically, the chain rule is just multiplication of Jacobian matrices.  This helps to debug the code and keeps the source closer to the algebra.  So in this case...
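Here is one way the chain rule could be organized as array products, as a sketch under our own assumptions: the Taylor model above with $k=GM_\odot=4\pi^2$, observations made from the Sun, synthetic unit vectors with roughly 0.01 rad of scatter, and $w_i=\sigma_i^{-2}$.  The elided term in $\partial x^a/\partial x_0^b$ is filled in here from our own differentiation of $-k\vec{x}_0/r_0^3$, namely $-\tfrac{k}{2}(t_i-t_0)^2\left(\delta_{ab}/r_0^3-3x_0^a x_0^b/r_0^5\right)$, so check it before relying on it.  The last lines compare the gradient with central finite differences.

```python
import numpy as np

K_SOL = 4 * np.pi**2  # GM_sun in AU^3/yr^2

def taylor_path(pars, dt):
    """x(t_i) from the second-order Taylor model; pars = (x0, v0), dt = t_i - t0."""
    x0, v0 = pars[:3], pars[3:]
    acc0 = -K_SOL * x0 / np.linalg.norm(x0)**3
    return x0 + dt[:, None] * v0 + 0.5 * dt[:, None]**2 * acc0

def dpath_dpars(pars, dt):
    """Jacobian dx^a(t_i)/dvartheta^b of the Taylor model, shape (N, 3, 6)."""
    x0 = pars[:3]
    r0 = np.linalg.norm(x0)
    eye = np.eye(3)
    # Derivative of the acceleration -k x0/r0^3 with respect to x0 (our own algebra).
    dacc = -K_SOL * (eye / r0**3 - 3.0 * np.outer(x0, x0) / r0**5)
    J = np.empty((dt.size, 3, 6))
    J[:, :, :3] = eye + 0.5 * dt[:, None, None]**2 * dacc
    J[:, :, 3:] = dt[:, None, None] * eye
    return J

def lnL_and_grad(pars, dt, nobs, w):
    """ln L = sum_i w_i nobs_i . nhat(t_i) and its gradient, via chained Jacobians."""
    x = taylor_path(pars, dt)                       # (N, 3)
    r = np.linalg.norm(x, axis=1)                   # (N,)
    nhat = x / r[:, None]
    lnL = np.sum(w * np.sum(nobs * nhat, axis=1))
    # d nhat / d x = (1 - nhat nhat^T) / r, shape (N, 3, 3)
    dn_dx = (np.eye(3) - nhat[:, :, None] * nhat[:, None, :]) / r[:, None, None]
    dn_dp = dn_dx @ dpath_dpars(pars, dt)           # chain rule as a matrix product
    grad = np.einsum("i,ia,iab->b", w, nobs, dn_dp)
    return lnL, grad

# Assumed synthetic set-up: a distant object observed from the Sun for +/- 0.3 yr.
rng = np.random.default_rng(1)
dt = np.linspace(-0.3, 0.3, 21)
truth = np.array([30.0, 5.0, 2.0, -0.2, 1.1, 0.1])
x_true = taylor_path(truth, dt)
nobs = x_true / np.linalg.norm(x_true, axis=1)[:, None]
nobs = nobs + 0.01 * rng.normal(size=nobs.shape)    # ~0.01 rad of scatter
nobs /= np.linalg.norm(nobs, axis=1)[:, None]
w = np.full(dt.size, 1.0 / 0.01**2)

pars = truth + np.array([0.1, -0.1, 0.05, 0.01, -0.02, 0.01])
lnL, grad = lnL_and_grad(pars, dt, nobs, w)

# Central finite differences as a debugging check on the chained Jacobians.
h = 1e-5
fd = np.array([(lnL_and_grad(pars + h * e, dt, nobs, w)[0]
                - lnL_and_grad(pars - h * e, dt, nobs, w)[0]) / (2 * h)
               for e in np.eye(6)])
print("max |analytic - finite difference| =", np.max(np.abs(grad - fd)))
```

Keeping `dn_dx` and `dpath_dpars` as separate factors mirrors the two pieces of the algebra, so an error in either shows up on its own in the finite-difference comparison.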

#### Symbolic manipulation ####

Depending upon what we want to teach, this is a good place to introduce sympy and symbolic manipulation.  Since our formulae involve only elementary functions, we can have the computer generate all of the derivatives automatically as vectorized NumPy expressions.
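A sketch of what this could look like, assuming SymPy is installed: differentiate $\hat{n}=\vec{x}/r$ symbolically and turn the Jacobian into a NumPy function with `sympy.lambdify`.  Passing arrays evaluates it at many points at once (this works here because no entry of this Jacobian is a constant).  The result can be checked against the projection form $(\mathbf{1}-\hat{n}\hat{n}^T)/r$ that appears in the gradient of $\ln L$.

```python
import numpy as np
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
nhat = sp.Matrix([x, y, z]) / r                  # unit vector as a symbolic expression
jac = nhat.jacobian(sp.Matrix([x, y, z]))        # 3x3 symbolic Jacobian d nhat / d x
jac_fn = sp.lambdify((x, y, z), jac, "numpy")    # NumPy function of (x, y, z)

# Evaluate at several points at once by passing arrays of coordinates.
pts = np.array([[1.0, 2.0, 3.0], [30.0, -5.0, 1.0], [-2.0, 0.5, -4.0]])
J = np.asarray(jac_fn(pts[:, 0], pts[:, 1], pts[:, 2]))    # shape (3, 3, N)
print(J.shape)
```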

## Minimizers ##

Minimizers we can investigate:

* Powell
* BFGS
* PCG
* AdaProp
* ADAM
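As an example of the last entry, here is a minimal Adam update loop written directly in NumPy, applied to an assumed, mildly ill-conditioned toy quadratic with an analytic gradient.  The `adam_minimize` helper, the learning rate, and the step count are our own choices; $\beta_1=0.9$, $\beta_2=0.999$ are the commonly used defaults.  For the orbit problem one would pass the negative of the $\ln L$ gradient, since Adam minimizes.

```python
import numpy as np

def adam_minimize(grad, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, nsteps=5000):
    """Hypothetical helper: plain Adam with bias-corrected moment estimates."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)            # running mean of gradients
    v = np.zeros_like(x)            # running mean of squared gradients
    for t in range(1, nsteps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        mhat = m / (1 - beta1**t)   # bias corrections for the zero initialization
        vhat = v / (1 - beta2**t)
        x -= lr * mhat / (np.sqrt(vhat) + eps)
    return x

scales = np.array([1.0, 10.0])
f = lambda x: 0.5 * np.sum(scales * x**2)
grad_f = lambda x: scales * x
x_best = adam_minimize(grad_f, [3.0, 2.0])
print(x_best, f(x_best))
```

Because each component is rescaled by its own running gradient magnitude, the step size is roughly `lr` per coordinate regardless of the curvature, which is both the appeal of the method and the reason `lr` sets how closely it settles near the minimum.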


### Other references ###

Multigrid article
https://link.springer.com/article/10.1007/BF01227487

NASA report.
https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19700016051.pdf

Kalman filter & numerical integration.
https://www.degruyter.com/downloadpdf/j/arsa.2014.49.issue-2/arsa-2014-0007/arsa-2014-0007.pdf

Quasi-linearization
http://www.dtic.mil/dtic/tr/fulltext/u2/608287.pdf

Taylor expansion
http://lfvn.astronomer.ru/report/0000037/timflohrer_2.pdf

Review article
http://www.jhuapl.edu/techdigest/TD/td2703/vetter.pdf

Discussion of classical methods and optimization [hard to follow]
https://www.hindawi.com/journals/aaa/2013/960582/

Ideas about the "f and g" methods
https://academic.oup.com/mnras/article/391/3/1259/978116

Lecture notes on Laplace and Gauss methods
https://www.stardust2013.eu/Portals/63/Images/Training/OTS%20Repository/gronchi_OTS2013.pdf

