# Introduction to Finite Differences and Numerical Solution of Differential Equations

While we began the quarter looking at a couple of examples of difference equations (the logistic map and the Mandelbrot set), it is perhaps more common in engineering to deal with models based on differential equations. This notebook provides a very concise introduction to numerical methods for computing approximate solutions to differential equations based on finite difference methods for numerical estimation of derivatives. Throughout the discussion, the major mathematical tool that comes into play is the __Taylor series approximation__.

## Numerical Differentiation

When we deal with differentiation (and integration), we are talking about the basic operations of calculus, so we start with the definition of the derivative we learned in calculus:

$$\frac{d f(t)}{dt} = \lim_{\Delta t\to 0} \frac{f(t+\Delta t)-f(t)}{\Delta t}$$ (1)

In the context of digital computation with floating-point arithmetic, we immediately run into a fundamental issue: we cannot really compute in the limit as something goes to zero. Finite precision floating-point systems can only represent a finite set of numbers, and that set does __not__ include arbitrarily small quantities.

The feasible alternative is to compute an approximation of the limit, and that almost always involves Taylor series. The (hopefully familiar) expression for the Taylor expansion of the function $f$ about the argument $t$ is:

$$f(t+\Delta t) = f(t)+\Delta t\frac{df(t)}{dt}+\frac{\Delta t^2}{2!} \frac{d^2f(t)}{dt^2}+\frac{\Delta t^3}{3!} \frac{d^3f(c_1)}{dt^3}$$ (2)


Solving for $\frac{df(t)}{dt}$ gives:

$$\frac{df(t)}{dt} = \frac{f(t + \Delta t) - f(t)}{\Delta t} + O(\Delta t)$$ (3)

and we have arrived at the first __finite difference__ formula. The terminology indicates that a derivative (that really involves a limit) is approximated in terms of function evaluations at finitely separated points.

This particular formula is the __first-order forward difference approximation to the first derivative__: it has an error term proportional to $(\Delta t)$ to the first power, it looks forward (i.e. it involves evaluation at $t$ and $t + \Delta t$), and it estimates the first derivative.
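As a quick check of the first-order claim, the sketch below applies Eq. 3 to a function with a known derivative. It assumes Python with only the standard `math` module; the helper name `forward_diff` and the test function $\sin t$ are illustrative choices, not part of the notes. If the error is $O(\Delta t)$, halving $\Delta t$ should roughly halve the error.

```python
import math

def forward_diff(f, t, dt):
    """First-order forward difference estimate of f'(t) (Eq. 3)."""
    return (f(t + dt) - f(t)) / dt

# Test function with a known derivative: d/dt sin(t) = cos(t)
t0 = 1.0
exact = math.cos(t0)
for dt in (1e-1, 5e-2, 2.5e-2):
    err = abs(forward_diff(math.sin, t0, dt) - exact)
    print(f"dt = {dt:.4f}  error = {err:.3e}")
```

The printed errors should drop by a factor close to 2 per halving, consistent with the leading error term $\frac{\Delta t}{2}\frac{d^2f}{dt^2}$ left over after dividing Eq. 2 by $\Delta t$.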

Similarly, looking backward in time (replacing $\Delta t$ with $-\Delta t$) produces an alternative Taylor expansion:

$$f(t-\Delta t) = f(t)-\Delta t\frac{df(t)}{dt}+\frac{\Delta t^2}{2!} \frac{d^2f(t)}{dt^2}-\frac{\Delta t^3}{3!} \frac{d^3f(c_1)}{dt^3}$$ (4)

which leads to the first-order backward difference formula:

$$\frac{df(t)}{dt} = \frac{f(t) - f(t - \Delta t)}{\Delta t} + O(\Delta t)$$ (5)

We can also obtain a higher-order estimate by considering the difference Eq.2 - Eq.4:

$$f(t+\Delta t) - f(t-\Delta t) = 2 \Delta t\frac{df(t)}{dt}+2 \frac{\Delta t^3}{3!} \frac{d^3f(c_1)}{dt^3}$$ (6)

Note that the terms of even degree cancel, including the degree 2 term that led to the leading order error in the previous formula, so solving for the first derivative now produces a formula with error of order $(\Delta t)^2$: 

$$\frac{df(t)}{dt} = \frac{f(t + \Delta t) - f(t - \Delta t)}{2 \Delta t} + O(\Delta t^2)$$ (7)

Eq. 7 is referred to as the second-order central difference estimate of the first derivative; it is "central" because it looks both forward and backward, evaluating $f$ at $t+\Delta t$ and $t-\Delta t$.
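A minimal comparison, again using $\sin t$ as an illustrative test function, contrasts the backward difference (Eq. 5) with the central difference (Eq. 7). The helper names are hypothetical. If the orders claimed above hold, halving $\Delta t$ should shrink the backward-difference error by about 2 and the central-difference error by about 4.

```python
import math

def backward_diff(f, t, dt):
    """First-order backward difference estimate of f'(t) (Eq. 5)."""
    return (f(t) - f(t - dt)) / dt

def central_diff(f, t, dt):
    """Second-order central difference estimate of f'(t) (Eq. 7)."""
    return (f(t + dt) - f(t - dt)) / (2 * dt)

t0, exact = 1.0, math.cos(1.0)   # d/dt sin(t) = cos(t)
for dt in (1e-1, 5e-2, 2.5e-2):
    e_back = abs(backward_diff(math.sin, t0, dt) - exact)
    e_cent = abs(central_diff(math.sin, t0, dt) - exact)
    print(f"dt = {dt:.4f}  backward: {e_back:.3e}  central: {e_cent:.3e}")
```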

Employing additional evaluation points along with knowledge of the Taylor series leads to families of finite difference approximations of various derivatives with various orders of accuracy. (A quick session with your favorite search engine will yield tables of such formulas.) One formula of particular usefulness is the second-order central difference estimate of the second derivative. Isolating the second derivative is relatively straightforward: summing the Taylor series (Eqs. 2 and 4) cancels the first-derivative terms, and, with both expansions carried one term further (to fourth order), the third-derivative terms cancel as well. Solving for the second derivative then yields:

$$\frac{d^2f(t)}{dt^2} = \frac{f(t + \Delta t) -2 f(t) + f(t - \Delta t)}{(\Delta t)^2} + O(\Delta t^2)$$ (7)
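The same kind of check applies to the second-derivative formula. This sketch assumes $e^t$ as the test function (its second derivative is itself); the helper name `second_diff` is hypothetical. A ratio near 4 between successive errors is what an $O(\Delta t^2)$ truncation error predicts.

```python
import math

def second_diff(f, t, dt):
    """Second-order central difference estimate of f''(t)."""
    return (f(t + dt) - 2.0 * f(t) + f(t - dt)) / dt**2

# d^2/dt^2 exp(t) = exp(t), so the exact value at t = 0.5 is exp(0.5)
for dt in (1e-1, 5e-2, 2.5e-2):
    err = abs(second_diff(math.exp, 0.5, dt) - math.exp(0.5))
    print(f"dt = {dt:.4f}  error = {err:.3e}")
```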

Finite difference formulas give us a means to compute floating-point approximations of derivatives and, from their basis in Taylor series, we can determine the order of the __truncation error__ and estimate how the error should depend on the spacing $\Delta t$ between evaluation points. Useful formulas involve truncation errors proportional to some positive power $n$ of the spacing; i.e. $\propto(\Delta t)^n$. This leads us to expect that, as the spacing between evaluations decreases, the error decreases.

Should we then conclude that optimal accuracy is obtained by choosing to make $\Delta t$ as small as possible?

The answer is a resounding __NO!__ Why? Truncation is only the first of 2 major sources of error in digital computation. The other major source of error is __roundoff__; i.e. the error incurred in approximating an infinite-precision real number by a finite-precision floating point number. As $\Delta t$ becomes very small, the evaluation points get so close together that the floating-point system cannot accurately resolve the difference.

The typical case is that, for relatively large spacing $\Delta t$, truncation error dominates and error can be reduced by decreasing the spacing. However, for sufficiently small spacing between evaluation points, roundoff error dominates and further reduction in spacing makes the error worse, not better.

> __Bottom line on spacing and error:__ Truncation error dominates for large spacing and gets better as spacing decreases. Roundoff error dominates for small spacing and gets better as spacing increases. There is some middle ground spacing where total error is minimized; typically it is taken to be some small integer root of _machine epsilon_, the smallest number that can be added to 1 to produce a floating-point result greater than 1.
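The trade-off can be seen directly by scanning $\Delta t$ over many orders of magnitude. This sketch uses Python's `sys.float_info.epsilon` (about $2.2\times10^{-16}$ for IEEE double precision) and the forward difference from Eq. 3, with $\sin t$ as an illustrative test function. For a first-order formula, balancing truncation error ($\propto \Delta t$) against roundoff ($\propto \epsilon/\Delta t$) suggests an optimum near $\sqrt{\epsilon}\approx 1.5\times 10^{-8}$; the exact location of the minimum in a given run depends on the function and on rounding details, so treat the scan as illustrative.

```python
import math
import sys

def forward_diff(f, t, dt):
    """First-order forward difference estimate of f'(t) (Eq. 3)."""
    return (f(t + dt) - f(t)) / dt

eps = sys.float_info.epsilon      # machine epsilon for IEEE double, about 2.2e-16
t0, exact = 1.0, math.cos(1.0)    # d/dt sin(t) = cos(t)

# Scan spacings 1e-1, 1e-2, ..., 1e-15 and record (dt, total error)
errors = []
for k in range(1, 16):
    dt = 10.0 ** (-k)
    err = abs(forward_diff(math.sin, t0, dt) - exact)
    errors.append((dt, err))
    print(f"dt = {dt:.0e}  error = {err:.3e}")

best_dt, best_err = min(errors, key=lambda pair: pair[1])
print(f"smallest error at dt = {best_dt:.0e}; sqrt(eps) = {math.sqrt(eps):.1e}")
```

The error first falls as $\Delta t$ shrinks (truncation-dominated), bottoms out, and then rises again as roundoff takes over.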

With that background, we are ready to tackle some differential equations.

## Ordinary Differential Equations (ODEs) - Initial Value Problems (IVPs)

Let's start with the minimal amount of nomenclature. A differential equation is simply an equation that involves one or more derivatives. A variable on the bottom of the derivative is an __independent variable__, and a variable on the top of a derivative is a __dependent variable__. An __ordinary__ differential equation (or system of equations) includes a single independent variable. If there are multiple independent variables, then you are dealing with a __partial differential equation__ (or system of equations). The __order__ of the differential equation (or system) is the order of the highest derivative.

For now, we focus on systems of first-order ordinary differential equations. (We will return later to look at partial differential equations.) Note that there is a standard "trick" to convert an $n^{th}$-order ODE to a system of first-order ODEs: simply introduce new __dependent__ variables for the derivatives of order $0$ through $n-1$ of the original dependent variable. This immediately creates a system of $n$ first-order ODEs: the first $n-1$ equations define the new variables (each one is the derivative of the previous one), and the original ODE, rewritten in terms of the new variables, becomes the $n^{th}$ equation.
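As a small illustration of the trick for $n = 2$, the hypothetical helper below wraps a second-order equation $y'' = g(t, y, y')$ as a first-order system for the vector $u = [y, y']$. The damped-oscillator right-hand side is an arbitrary example, and NumPy is assumed only for the array return value.

```python
import numpy as np

def second_order_to_system(g):
    """Wrap y'' = g(t, y, y') as u' = F(t, u) with u = [y, y'] (hypothetical helper)."""
    def F(t, u):
        # u[0] = y and u[1] = dy/dt are the new dependent variables
        return np.array([u[1],                  # first equation: (u[0])' = u[1]
                         g(t, u[0], u[1])])     # original ODE rewritten in terms of u
    return F

# Illustrative example: damped oscillator y'' = -0.1 y' - y
F = second_order_to_system(lambda t, y, yp: -0.1 * yp - y)
print(F(0.0, np.array([1.0, 0.0])))
```

The resulting `F` has exactly the `f(t, y)` shape assumed by the stepping methods below, with a vector state in place of a scalar.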

### Euler's method

Let's start with the most fundamental problem: computing an approximate solution of a single first-order ODE with a specified initial value:

$$\frac{dy}{dt} = f(t,y)\;, \qquad y(0) = y_0$$ (8)

You have likely seen the simplest way of computing an approximate solution; i.e. __Euler's method__.

This is a classic "stepping" or "marching" method that replaces the continuous independent variable $t$ (that we think of here as time) with a discrete version:

$$t_n = t_0 + n \Delta t = t_0 + n h$$ (9)

 and computes a sequence of values at the discrete "future" times based on the existing known "history": 

$$y_n = y(t_n) = y(t_0 + n h)$$ (10)

So we will end up again with a difference equation (or map), and we will derive the difference equation by replacing the derivative with a finite difference estimate. For Euler's method, we approximate the derivative with a simple forward difference estimator

$$\frac{dy}{dt} = f(t,y)\;, \qquad y(0) = y_0 \rightarrow \frac{y_{n+1} - y_n}{h}\approx f(t_n, y_n)$$ (11)

and solve for $y_{n+1}$ to obtain a formula for the value at the next time step

$$y_{n+1} = y_n + h f(t_n, y_n)$$ (12)

This method is more precisely called the "forward Euler method" (because it uses the forward difference estimate for the first derivative) or the "explicit Euler method" (because Eq.12 gives an explicit formula for computing $y_{n+1}$ given $y_n$ and $f$).
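Eq. 12 translates almost line for line into code. The sketch below assumes NumPy and a hypothetical signature `euler(f, y0, t0, h, n_steps)`; it works for scalar or vector `y0`. The test problem $dy/dt = -y$, $y(0) = 1$ is chosen because its exact solution $e^{-t}$ is known, and for it forward Euler gives exactly $y_n = (1-h)^n$.

```python
import numpy as np

def euler(f, y0, t0, h, n_steps):
    """Forward (explicit) Euler, Eq. 12: y_{n+1} = y_n + h f(t_n, y_n)."""
    t = t0 + h * np.arange(n_steps + 1)           # discrete times t_n = t_0 + n h (Eq. 9)
    y = np.empty((n_steps + 1,) + np.shape(y0))   # storage for y_0 ... y_N
    y[0] = y0
    for n in range(n_steps):
        y[n + 1] = y[n] + h * f(t[n], y[n])
    return t, y

# Test problem: dy/dt = -y, y(0) = 1, exact solution y = exp(-t)
t, y = euler(lambda t, y: -y, 1.0, 0.0, 0.1, 10)
print(y[-1], np.exp(-1.0))
```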

Euler's method essentially computes the rate of change at the beginning of a time step (from $t_n$ to $t_{n+1}$) and ignores how that rate changes over the step. This assumption is, of course, not exactly valid, so truncation error is incurred. The forward difference in Eq. 11 has error $O(\Delta t)$, and multiplying through by $h = \Delta t$ to obtain Eq. 12 means each step of Euler's method incurs a __local truncation error__ of $O(\Delta t^2)$. The number of time steps required to cover an interval of length $O(1)$ is $O(\Delta t^{-1})$, so the local errors accumulate into a __global truncation error__ of $O(\Delta t)$. In practical terms, Euler's method is simple and can sometimes provide useful initial results, but first-order convergence is slow: halving the step size only roughly halves the error, so significant accuracy gains require many more (smaller) steps, with a corresponding increase in cost and accumulated roundoff.
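A short experiment measures the global error of forward Euler at $t = 1$ for the test problem $dy/dt = -y$, $y(0) = 1$ as the number of steps doubles. The helper `euler_final` is a hypothetical name for a version that keeps only the final value. For a smooth problem like this one, the error should shrink roughly in proportion to $h$.

```python
import math

def euler_final(f, y0, t0, t_end, n_steps):
    """Advance dy/dt = f(t, y) from t0 to t_end with n_steps forward-Euler steps."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t = t + h
    return y

exact = math.exp(-1.0)   # y(1) for dy/dt = -y, y(0) = 1
for n in (10, 20, 40, 80):
    err = abs(euler_final(lambda t, y: -y, 1.0, 0.0, 1.0, n) - exact)
    print(f"h = {1.0 / n:.4f}  global error = {err:.3e}")
```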

A method with higher-order truncation error is needed to obtain results that improve much more rapidly as the discretization is refined. The tool for creating such methods again involves applying the Taylor series at additional evaluation points.

__Modified Euler-Cauchy Method__

The next logical step up from Euler's method uses an Euler step to estimate the solution at an intermediate point, where the rate of change (i.e. the function $f$) better represents what happens over the whole interval. The detailed steps of the modified Euler-Cauchy method are as follows:

1) Compute the derivative (RHS) at the initial time (left side of the interval):

$$rate_{left} = f(t, y(t))$$ (13)

2) Use that derivative value to compute an "Euler" estimate of the solution at the middle of the interval:

$$y_{mid} = y(t) + \frac{h}{2} rate_{left}$$ (14)

3) Use the mid-interval value to estimate the rate of change over the interval:

$$rate_{mid} = f(t+\frac{h}{2}, y_{mid})$$ (15)

4) Use the midpoint rate estimate to compute the next value:
$$y_{RK2}(t+h) = y(t) + h (rate_{mid})$$ (16)

Putting the pieces together gives:
$$y_{RK2}(t+h) = y(t)+h f\big(t+\frac{h}{2}, y(t)+\frac{h}{2} \; f(t,y(t)) \big)$$ (17)

which has local truncation error $O(h^3)$ and global error $O(h^2)$, so for sufficiently smooth ("friendly") ODEs the error can be reduced substantially by taking smaller steps.
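A sketch of Eqs. 13-16 in code, assuming scalar states and the same illustrative test problem $dy/dt = -y$, $y(0) = 1$; `rk2_step` and `rk2_final` are hypothetical helper names. If the global error is $O(h^2)$, each halving of $h$ should reduce the error at $t = 1$ by roughly a factor of 4, and the printed errors let you check this.

```python
import math

def rk2_step(f, t, y, h):
    """One modified Euler-Cauchy (midpoint) step, following Eqs. 13-16."""
    rate_left = f(t, y)                  # Eq. 13
    y_mid = y + 0.5 * h * rate_left      # Eq. 14
    rate_mid = f(t + 0.5 * h, y_mid)     # Eq. 15
    return y + h * rate_mid              # Eq. 16

def rk2_final(f, y0, t0, t_end, n_steps):
    """Advance from t0 to t_end with n_steps midpoint steps; return the final value."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk2_step(f, t, y, h)
        t += h
    return y

exact = math.exp(-1.0)
for n in (10, 20, 40):
    err = abs(rk2_final(lambda t, y: -y, 1.0, 0.0, 1.0, n) - exact)
    print(f"h = {1.0 / n:.4f}  global error = {err:.3e}")
```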

This raises the question of why the subscript on the modified Euler-Cauchy formula is "RK2".

Euler's method and the modified Euler-Cauchy method are actually the first 2 methods in the family of __Runge-Kutta methods__ that aim to reduce the truncation error (i.e. increase the order by making the neglected terms involve higher powers of $h$) by including rate estimates at additional points in the timestep.

### $4^{th}$ Order Runge-Kutta Method

Perhaps the most commonly used member of the family, and a method that you should definitely be familiar with, is the __Fourth-Order Runge-Kutta (RK4)__ method, which takes a weighted average of 4 rate estimates (1 at the start of the interval, 2 in the middle, and 1 at the end). The method can be described as follows:

1) Compute the initial rate estimate:
$$f_1 = f(t_n,y_n)$$ (18)

2) Use the initial rate estimate to compute a midstep rate estimate:
$$f_2 = f(t_n+\frac{h}{2}, y_n+\frac{h}{2}f_1)$$ (19)

3) Use the midstep estimate to compute an improved midstep rate estimate:
$$f_3 = f(t_n+\frac{h}{2}, y_n+\frac{h}{2}f_2)$$ (20)

4) Use the improved midstep estimate to compute a rate estimate at the end of the interval:
$$f_4 = f(t_n+h,y_n+h f_3)$$ (21)

5) Compute the weighted sum of contributions that cancels as many terms in the Taylor series as possible. The "RK4" formula achieves local truncation error $O(h^5)$, which implies global error $O(h^4)$:
$$y_{n+1} = y_n + \frac{h}{6} [f_1 + 2 f_2 +2 f_3 + f_4]$$ (22)
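The five RK4 steps map directly onto a step function. This is a sketch assuming scalar or NumPy-array states (the arithmetic works for both), again tested on $dy/dt = -y$, $y(0) = 1$; `rk4_step` and `rk4_final` are hypothetical names. For global error $O(h^4)$, halving $h$ should cut the error by roughly a factor of 16.

```python
import math

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step (Eqs. 18-22)."""
    f1 = f(t, y)                            # Eq. 18: start of interval
    f2 = f(t + h / 2, y + h / 2 * f1)       # Eq. 19: first midstep estimate
    f3 = f(t + h / 2, y + h / 2 * f2)       # Eq. 20: improved midstep estimate
    f4 = f(t + h, y + h * f3)               # Eq. 21: end of interval
    return y + h / 6 * (f1 + 2 * f2 + 2 * f3 + f4)   # Eq. 22

def rk4_final(f, y0, t0, t_end, n_steps):
    """Advance from t0 to t_end with n_steps RK4 steps; return the final value."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

exact = math.exp(-1.0)
for n in (5, 10, 20):
    err = abs(rk4_final(lambda t, y: -y, 1.0, 0.0, 1.0, n) - exact)
    print(f"h = {1.0 / n:.4f}  global error = {err:.3e}")
```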

## Application to Stability Analysis

When studying the behavior of dynamical systems, there are a few common phases of the analysis. The first step involves modeling and application of physical principles. If you are studying a pendulum, you might make modeling approximations such as:

1) The pendulum is a rigid body.

2) The pendulum remains in a vertical plane.

3) Friction may or may not be considered according to some manageable model.

4) Some exterior forcing may be applied.

Having made those modeling assumptions, you can apply Newton's laws or Lagrange's equations and obtain a (typically) second-order ODE governing the motion of the system. The simplest model (ignoring damping and forcing) would be:

$$\theta'' = -\sin(\theta); \qquad \theta(0)=\theta_0, \quad \theta'(0) = \theta'_0$$ (23)

You can convert this to the first order system:

$$ 
\begin{aligned}
u_0' &= u_1 \\
u_1' &= -\sin(u_0)
\end{aligned}$$ (24)

with initial conditions $u_0(0)=\theta_0$ and $u_1(0)=\theta'_0$.

By coding up the algorithms above, you now have the tools to compute an approximate numerical solution for a particular choice of the initial conditions.
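As an example for Eq. 24, the sketch below integrates the undamped pendulum with the RK4 step of Eqs. 18-22, assuming NumPy. The initial conditions $\theta_0 = 0.5$, $\theta'_0 = 0$, the step $h = 0.01$, and the final time $t = 10$ are arbitrary choices. As a sanity check it monitors $E = \tfrac{1}{2}u_1^2 - \cos u_0$, which is constant along exact solutions of Eq. 24 (differentiate it and substitute the two equations), so its drift indicates accumulated numerical error.

```python
import numpy as np

def pendulum_rhs(t, u):
    """Right-hand side of Eq. 24: u[0] = theta, u[1] = dtheta/dt."""
    return np.array([u[1], -np.sin(u[0])])

def rk4_step(f, t, u, h):
    """One RK4 step (Eqs. 18-22) for a vector state u."""
    f1 = f(t, u)
    f2 = f(t + h / 2, u + h / 2 * f1)
    f3 = f(t + h / 2, u + h / 2 * f2)
    f4 = f(t + h, u + h * f3)
    return u + h / 6 * (f1 + 2 * f2 + 2 * f3 + f4)

def energy(u):
    """E = u1^2/2 - cos(u0); dE/dt = 0 along exact solutions of Eq. 24."""
    return 0.5 * u[1] ** 2 - np.cos(u[0])

u = np.array([0.5, 0.0])    # illustrative initial conditions: theta_0 = 0.5, theta'_0 = 0
h, n_steps = 0.01, 1000     # integrate to t = 10
E0 = energy(u)
for n in range(n_steps):
    u = rk4_step(pendulum_rhs, n * h, u, h)

print("state at t = 10:", u)
print("energy drift:", energy(u) - E0)
```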

The next stage of the dynamic analysis typically involves identification of steady-state equilibria. This involves setting the time derivatives to zero (so the rates all vanish), and solving the rate equations for equilibrium values. In this particular case, the equilibrium equations are:

$$ 
\begin{aligned}
0 &= u_1 \\
0 &= -\sin(u_0)
\end{aligned}$$ (25)

and there are 2 physically distinct equilibria (since $\sin(u_0) = 0$ at every integer multiple of $\pi$, all other solutions repeat these two positions): $u_1=0$ (so the pendulum is at rest) at position $u_0 = 0$ (straight down beneath the pivot) or $u_0 = \pi$ (straight up above the pivot). These are both valid equilibrium solutions, but one is routinely observed while the other does not occur in practice; the distinction between the two is __stability__.
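Floating-point arithmetic adds a small wrinkle when checking these equilibria numerically: $\pi$ itself is not exactly representable, so the computed rate at $u_0 = \pi$ is tiny but not exactly zero. A quick check, assuming NumPy and the right-hand side of Eq. 24:

```python
import numpy as np

def pendulum_rhs(t, u):
    """Right-hand side of Eq. 24: u[0] = theta, u[1] = dtheta/dt."""
    return np.array([u[1], -np.sin(u[0])])

# Evaluate the rates at both equilibria; both should vanish to within roundoff
for name, u_eq in [("down", np.array([0.0, 0.0])), ("up", np.array([np.pi, 0.0]))]:
    print(name, pendulum_rhs(0.0, u_eq))
```

The second component at the upper equilibrium comes out on the order of $10^{-16}$, i.e. zero to within roundoff.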

There are numerous definitions of stability for different applications, but a simple version suffices here:

 The equilibrium is __stable__ if it is surrounded by a neighborhood in which all initial conditions lead to trajectories that remain in the neighborhood of the equilibrium.

 So the stability question is not about the behavior arising from a particular set of initial conditions; it is about the collective behavior of __all__ the initial conditions in the vicinity of the equilibrium.
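The definition can be probed, though not proven, by perturbing each equilibrium slightly and following the trajectory. In this sketch (NumPy assumed; `max_excursion`, the perturbation size $10^{-3}$, and the time horizon $t = 20$ are illustrative choices), the trajectory starting near $u_0 = 0$ stays within about the size of the initial perturbation, while the one starting near $u_0 = \pi$ wanders far away.

```python
import numpy as np

def pendulum_rhs(t, u):
    """Right-hand side of Eq. 24."""
    return np.array([u[1], -np.sin(u[0])])

def rk4_step(f, t, u, h):
    """One RK4 step (Eqs. 18-22)."""
    f1 = f(t, u)
    f2 = f(t + h / 2, u + h / 2 * f1)
    f3 = f(t + h / 2, u + h / 2 * f2)
    f4 = f(t + h, u + h * f3)
    return u + h / 6 * (f1 + 2 * f2 + 2 * f3 + f4)

def max_excursion(u_eq, delta, h=0.01, n_steps=2000):
    """Largest distance from u_eq along the trajectory starting at u_eq + delta (hypothetical helper)."""
    u = u_eq + delta
    dist = np.linalg.norm(delta)
    for n in range(n_steps):
        u = rk4_step(pendulum_rhs, n * h, u, h)
        dist = max(dist, np.linalg.norm(u - u_eq))
    return dist

delta = np.array([1e-3, 0.0])   # small perturbation in theta only
d_down = max_excursion(np.array([0.0, 0.0]), delta)
d_up = max_excursion(np.array([np.pi, 0.0]), delta)
print(f"near u0 = 0:  max distance {d_down:.2e}")
print(f"near u0 = pi: max distance {d_up:.2e}")
```

A single perturbation is only one initial condition; the definition above asks about all of them in a neighborhood, which motivates the grid approach that follows.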

This is an ideal application for parallelization. Yes, the trajectory or history resulting from a particular set of initial conditions needs to be computed serially (we cannot reasonably compute $y((k+1) h)$ without knowing $y(k h)$); however, the trajectory for one set of initial conditions can be computed completely independently of the trajectory for any other set of initial conditions. Thus, the stability question naturally leads to the idea of launching a kernel that solves the ODE on a computational grid of many (realistically, millions of) sets of initial conditions. A 3D plot of something like the ratio of the final and initial distance from the equilibrium will convey useful information about stability. You will get a chance to pursue this concept in more detail in an upcoming homework...
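The data-parallel structure can be sketched on the CPU with NumPy array operations standing in for a GPU kernel: every grid point is advanced by the same RK4 update, and no point reads another point's state. This is not the homework's kernel; the grid size, half-width $0.1$, step size, and final time $t = 10$ are arbitrary assumptions, and `distance_ratio` is a hypothetical helper computing the final-to-initial distance ratio mentioned above.

```python
import numpy as np

def pendulum_rates(u0, u1):
    """Eq. 24 evaluated elementwise on whole arrays of states."""
    return u1, -np.sin(u0)

def rk4_grid(u0, u1, h, n_steps):
    """Advance every grid point with RK4 (Eqs. 18-22); points never interact."""
    for _ in range(n_steps):
        a0, a1 = pendulum_rates(u0, u1)
        b0, b1 = pendulum_rates(u0 + h / 2 * a0, u1 + h / 2 * a1)
        c0, c1 = pendulum_rates(u0 + h / 2 * b0, u1 + h / 2 * b1)
        d0, d1 = pendulum_rates(u0 + h * c0, u1 + h * c1)
        u0 = u0 + h / 6 * (a0 + 2 * b0 + 2 * c0 + d0)
        u1 = u1 + h / 6 * (a1 + 2 * b1 + 2 * c1 + d1)
    return u0, u1

def distance_ratio(eq, half_width=0.1, n_grid=50, h=0.01, n_steps=1000):
    """Final/initial distance from eq for a grid of initial conditions around it (hypothetical helper)."""
    # An even n_grid keeps the equilibrium itself off the grid, so no initial distance is zero
    theta = np.linspace(eq[0] - half_width, eq[0] + half_width, n_grid)
    omega = np.linspace(eq[1] - half_width, eq[1] + half_width, n_grid)
    U0, U1 = np.meshgrid(theta, omega, indexing="ij")
    d_init = np.hypot(U0 - eq[0], U1 - eq[1])
    F0, F1 = rk4_grid(U0, U1, h, n_steps)
    return np.hypot(F0 - eq[0], F1 - eq[1]) / d_init

ratio_down = distance_ratio((0.0, 0.0))
ratio_up = distance_ratio((np.pi, 0.0))
print("near u0 = 0:  ratio range", ratio_down.min(), ratio_down.max())
print("near u0 = pi: largest ratio", ratio_up.max())
```

Near $u_0 = 0$ the ratios should stay close to 1, while near $u_0 = \pi$ some ratios become large; plotting each ratio array over the $(\theta, \theta')$ grid as a surface gives the kind of 3D picture described above.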


