\chapter{Under the Hood}
\label{how}
In this chapter we ``open the hood,'' looking more closely at how some of the tools we have used---\lstinline{ode45}, \lstinline{fzero}, and \lstinline{fminsearch}---work.
\section{How ode45 Works}
\label{howode45}
According to the MATLAB documentation, \lstinline{ode45} uses ``an explicit Runge-Kutta formula, the Dormand-Prince pair.'' You can read about it at \url{https://greenteapress.com/matlab/runge}, but I'll give you a sense of it here.
\index{Runge-Kutta}
\index{ode45@\lstinline{ode45}}
\index{Dormand-Prince}
The key idea behind all Runge-Kutta methods is to evaluate the rate function several times at each time step and use a weighted average of the computed slopes to estimate the value at the next time step.
Different methods evaluate the rate function in different places and compute the average with different weights.
\index{differential equation}
\index{rate function}
\index{function!rate}
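For example, the classic fourth-order Runge-Kutta method evaluates the rate function four times per step and combines the slopes with weights $1/6$, $1/3$, $1/3$, and $1/6$. Here is a sketch of a single fixed-size step; this is a simpler method than the adaptive Dormand-Prince pair \lstinline{ode45} actually uses, but the idea of averaging several slopes is the same:

\begin{code}
function [t, y] = rk4_step(rate_func, t, y, dt)
    % One fixed-size step of the classic fourth-order
    % Runge-Kutta method: evaluate the rate function at
    % four places and take a weighted average of the slopes.
    k1 = rate_func(t, y);
    k2 = rate_func(t + dt/2, y + dt/2 * k1);
    k3 = rate_func(t + dt/2, y + dt/2 * k2);
    k4 = rate_func(t + dt, y + dt * k3);
    y = y + dt/6 * (k1 + 2*k2 + 2*k3 + k4);
    t = t + dt;
end
\end{code}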
As an example, we'll solve the following differential equation:
\[ \frac{dy}{dt}(t) = y \sin t \]
Given a differential equation, it's usually straightforward to write a rate function:
\begin{code}
function res = rate_func(t, y)
    dydt = y * sin(t);
    res = dydt;
end
\end{code}
And we can use it like this:
\begin{code}
y0 = 1;
tspan = [0 4];
options = odeset('Refine', 1);
[T, Y] = ode45(@rate_func, tspan, y0, options);
\end{code}
For this example we'll use \lstinline{odeset} to set the \lstinline{Refine} option to \lstinline{1}, which tells \lstinline{ode45} to return only the time steps it computes, rather than interpolating between them.
\index{odeset@\lstinline{odeset}}
Now we can modify the rate function to plot the places where it gets evaluated:
\begin{code}
function res = rate_func(t, y)
    dydt = y * sin(t);
    res = dydt;

    plot(t, y, 'ro')
    hold on
    dt = 0.01;
    ts = [t t+dt];
    ys = [y y+dydt*dt];
    plot(ts, ys, 'r-')
end
\end{code}
When \lstinline{rate_func} runs, it plots a red circle at each location and a short red line showing the computed slope.
\index{time step}
Figure~\ref{fig:odeplot1} shows the result; \lstinline{ode45} computes 10 time steps (not counting the initial condition) and evaluates the rate function 61 times.
\begin{figure}[h]
\centerline{\includegraphics[scale=0.8]{images/figure15_01_new.eps}}
\caption{Points where \lstinline{ode45} evaluates the rate function}
\label{fig:odeplot1}
\end{figure}
Figure~\ref{fig:odeplot2} shows the same plot, zoomed in on a single time step.
The dark squares at $0.8$ and $1.2$ show the values that were returned as part of the solution.
The circles show the places where the rate function was evaluated.
\begin{figure}[h]
\centerline{\includegraphics[scale=0.8]{images/figure15_02_new.eps}}
\caption{Points where \lstinline{ode45} evaluates the rate function, zoomed in}
\label{fig:odeplot2}
\end{figure}
We can see that \lstinline{ode45} evaluates the rate function several times per time step, at several places between the end points.
We can also see that most of the places where \lstinline{ode45} evaluates the rate function are not part of the solution it returns, and they are not always good estimates of the solution.
This is good to know when you are writing a rate function; you should not assume that the time and state you get as input variables will be part of the solution.
In a sense, the rate function is answering a hypothetical question: ``\emph{If} the state at a particular time has these particular values, what \emph{would} the slope~be?''
At each time step, \lstinline{ode45} actually computes \emph{two} estimates of the next value.
By comparing them, it can estimate the magnitude of the error, which it uses to adjust the time step.
If the error is too big, it uses a smaller time step; if the error is small enough, it uses a bigger time step.
Because \lstinline{ode45} is \emph{adaptive} in this way, it minimizes the number of times it calls the rate function to achieve a given level of accuracy.
\index{adaptive}
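The error-control loop can be sketched like this. This is only a schematic, not the actual implementation: \lstinline{two_estimates} is a hypothetical helper standing in for the embedded Runge-Kutta pair, and the step-size adjustments are simplified.

\begin{code}
% Schematic of adaptive step-size control; two_estimates is
% a hypothetical helper that returns two estimates of the
% next value, like the embedded pair ode45 uses.
while t < t_end
    [y4, y5] = two_estimates(@rate_func, t, y, dt);
    err = abs(y5 - y4);
    if err < tol
        t = t + dt;       % accept the step
        y = y5;
        dt = 1.5 * dt;    % and try a bigger step next time
    else
        dt = dt / 2;      % reject the step and try again
    end
end
\end{code}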
\section{How fzero Works}
\label{howfzero}
According to the MATLAB documentation, \lstinline{fzero} uses ``a combination of bisection, secant, and inverse quadratic interpolation methods'' (see \url{https://greenteapress.com/matlab/fzero}).
\index{fzero@\lstinline{fzero}}
\index{bisection}
\index{secant method}
\index{inverse quadratic interpolation}
\index{root}
To understand what that means, suppose we're trying to find a root of a function of one variable, $f(x)$, and assume we have evaluated the function at two places, $x_1$ and $x_2$, and found that the results have opposite signs. Specifically, assume $f(x_1) > 0$ and $f(x_2) < 0$, as shown in Figure~\ref{fig:secant}.
\begin{figure}[h]
\centerline{\includegraphics[scale=0.8]{images/figure15_03_new.eps}}
\caption{Initial state of a root-finding search}
\label{fig:secant}
\end{figure}
As long as $f$ is continuous, there must be at least one root in this interval.
In this case we would say that $x_1$ and $x_2$ \emph{bracket} a zero.
\index{bracket}
If this were all you knew about $f$, where would you go looking for
a root? If you said ``halfway between $x_1$ and $x_2$,''
congratulations! You just invented a numerical method called
\emph{bisection}!
If you said, ``I would connect the dots with a straight line
and compute the zero of the line,''
congratulations! You just invented the \emph{secant method}!
And if you said, ``I would evaluate $f$ at a third point, find the
parabola that passes through all three points, and compute the zeros
of the parabola,'' congratulations, you just invented
\emph{inverse quadratic interpolation}!
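Assuming \lstinline{f} is a function handle and \lstinline{x1}, \lstinline{x2}, and \lstinline{x3} are points with \lstinline{f(x1)} and \lstinline{f(x2)} of opposite signs, one step of each method might look like this:

\begin{code}
y1 = f(x1);
y2 = f(x2);
y3 = f(x3);

% Bisection: halfway between the bracketing points.
x_bisect = (x1 + x2) / 2;

% Secant method: zero of the line through two points.
x_secant = x1 - y1 * (x2 - x1) / (y2 - y1);

% Inverse quadratic interpolation: fit x as a quadratic
% function of y through three points, then evaluate at y = 0.
p = polyfit([y1 y2 y3], [x1 x2 x3], 2);
x_iqi = polyval(p, 0);
\end{code}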
That's most of how \lstinline{fzero} works. The details of how these methods are combined are interesting, but beyond the scope of this book. You can read more at \url{https://greenteapress.com/matlab/brent}.
\section{How fminsearch Works}
\label{howfminsearch}
According to the MATLAB documentation, \lstinline{fminsearch} uses the Nelder-Mead simplex algorithm. You can read about it at \url{https://greenteapress.com/matlab/nelder}, but you might find it overwhelming.
\index{fminsearch@\lstinline{fminsearch}}
\index{Nelder-Mead}
\index{golden-section search}
To give you a sense of how it works, I will present a simpler algorithm, the \emph{golden-section search}. Suppose we're trying to find the minimum of a function of a single variable, $f(x)$.
As a starting place, assume that we have evaluated the function at three places,
$x_1$, $x_2$, and $x_3$, and found that $x_2$ yields the lowest
value. Figure~\ref{fig:golden1} shows this initial state.
\begin{figure}[h]
\centerline{\includegraphics[scale=0.8]{images/figure15_04_new.eps}}
\caption{Initial state of a golden-section search}
\label{fig:golden1}
\end{figure}
We will assume that $f(x)$ is continuous and \emph{unimodal} in this range, which means that there is exactly one minimum between $x_1$ and $x_3$.
\index{minimum}
\index{unimodal}
The next step is to choose a fourth point, $x_4$, and evaluate
$f(x_4)$. There are two possible outcomes, depending on whether
$f(x_4)$ is greater than $f(x_2)$ or not.
Figure~\ref{fig:golden2} shows the two possible states.
\begin{figure}[h]
\centerline{\includegraphics[scale=0.8]{images/figure15_05_new.eps}}
\caption{Possible states of a golden-section search after evaluating $f(x_4)$}
\label{fig:golden2}
\end{figure}
If $f(x_4)$ is less than $f(x_2)$ (shown on the left), the
minimum must be between $x_2$ and $x_3$, so we would discard $x_1$ and proceed with the new triple $(x_2, x_4, x_3)$.
If $f(x_4)$ is greater than $f(x_2)$ (shown on the right), the
local minimum must be between $x_1$ and $x_4$, so we would discard $x_3$ and proceed with the new triple $(x_1, x_2, x_4)$.
Either way, the range gets smaller and our estimate of the optimal value of $x$ gets better.
This method works for almost any value of $x_4$, but some choices
are better than others. You might be tempted to bisect the interval between $x_2$ and $x_3$, but that turns out not to be the best choice. You can read about a better option at \url{https://greenteapress.com/matlab/golden}.
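One common choice, which gives the method its name, places the interior points a fixed fraction of the way into the interval, based on the golden ratio. Here's a sketch that maintains two interior points and shrinks the bracket until it is smaller than a tolerance; it assumes \lstinline{f} is unimodal on the interval, and for simplicity it re-evaluates \lstinline{f} at points a more careful implementation would cache.

\begin{code}
function x = golden_section(f, x1, x3, tol)
    % Sketch of a golden-section search; assumes f is
    % unimodal on the interval from x1 to x3.
    g = (sqrt(5) - 1) / 2;            % about 0.618
    x2 = x3 - g * (x3 - x1);
    x4 = x1 + g * (x3 - x1);
    while abs(x3 - x1) > tol
        if f(x4) < f(x2)
            x1 = x2;                  % discard the left end
            x2 = x4;
            x4 = x1 + g * (x3 - x1);
        else
            x3 = x4;                  % discard the right end
            x4 = x2;
            x2 = x3 - g * (x3 - x1);
        end
    end
    x = (x1 + x3) / 2;
end
\end{code}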
\section{Chapter Review}
The information in this chapter is not strictly necessary; you can use these methods without knowing much about how they work. But there are two reasons you might want to know.
One reason is pure curiosity. If you use these methods, and especially if you come to rely on them, you might find it unsatisfying to treat them as ``black boxes.'' At the risk of mixing metaphors, I hope you enjoyed opening the hood.
The other reason is that these methods are not infallible; sometimes things go wrong. If you know how they work, at least in a general sense, you might find it easier to debug them.
With that, you have reached the end of the book, so congratulations! I hope you enjoyed it and learned a lot. I think the tools in this book are useful, and the ways of thinking are important, not just in engineering and science, but in practically every field of inquiry.
Models are the tools we use to understand the world: if you build good models, you are more likely to get things right. Good luck!