### 3.3 Integration

Now we begin our work on the second principal computation of Calculus: evaluating a definite integral. Remember that a single-variable definite integral can be interpreted as the signed area between the curve and the $x$ axis. In this section we will study three different techniques for approximating the value of a definite integral.

Exercise 3.31. Consider the shaded area of the region under the function plotted in Figure 3.4 between $x=0$ and $x=2$.
a. What rectangle with area 6 gives an upper bound for the area under the curve? Can you give a better upper bound?
b. Why must the area under the curve be greater than 3?
c. Is the area greater than 4? Why/Why not?
d. Work with your partner to give an estimate of the area and provide an estimate for the amount of error that you're making.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-21.jpg?height=467&width=554&top_left_y=1192&top_left_x=688)

Figure 3.4: A sample integration

### 3.3.1 Riemann Sums

In this subsection we will build our first method for approximating definite integrals. Recall from Calculus that the definition of the Riemann integral is

$$
\int_{a}^{b} f(x) d x=\lim _{\Delta x \rightarrow 0} \sum_{j=1}^{N} f\left(x_{j}\right) \Delta x
$$

where $N$ is the number of subintervals of the interval $[a, b]$ and $\Delta x$ is the width of each subinterval. As with differentiation, we can remove the limit and have a decent approximation of the integral so long as $N$ is large (or equivalently, if $\Delta x$ is small).

$$
\int_{a}^{b} f(x) d x \approx \sum_{j=1}^{N} f\left(x_{j}\right) \Delta x .
$$

You are likely familiar with this approximation of the integral from Calculus. The value of $x_{j}$ can be chosen anywhere within the $j$th subinterval, and three common choices are the left endpoint, the midpoint, and the right endpoint. We see a depiction of this in Figure 3.5.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-22.jpg?height=394&width=974&top_left_y=817&top_left_x=668)

Figure 3.5: Left-aligned Riemann sums, midpoint-aligned Riemann sums, and right-aligned Riemann sums

Clearly, the more rectangles we choose the closer the sum of the areas of the rectangles will get to the integral.
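To make the approximation formula concrete, here is a minimal sketch of a Riemann sum in Python. The function name `riemann_sum` and the `align` keyword are hypothetical choices made for illustration, and the sketch assumes that `f` accepts a NumPy array and returns an array of the same length.

```python
import numpy as np

def riemann_sum(f, a, b, N, align="left"):
    """Approximate the integral of f on [a, b] with N rectangles."""
    # width of each subinterval
    dx = (b - a) / N
    # left endpoints of the N subintervals
    left = a + dx * np.arange(N)
    # shift the sample points according to the requested alignment
    if align == "left":
        x = left
    elif align == "mid":
        x = left + dx / 2
    elif align == "right":
        x = left + dx
    else:
        raise ValueError("align must be 'left', 'mid', or 'right'")
    # sum of (height * width) over all rectangles, with no explicit loop
    return np.sum(f(x)) * dx

# compare against a known integral: sin(x) on [0, 1]
exact = 1 - np.cos(1)
for align in ("left", "mid", "right"):
    print(align, riemann_sum(np.sin, 0, 1, 1000, align), exact)
```

Since $\sin (x)$ is increasing on $[0,1]$, the left sum should land slightly below the exact value and the right sum slightly above it.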

Exercise 3.32. Write code to approximate an integral with Riemann sums. You should ALWAYS start by writing pseudo-code as comments in your function. Your Python function should accept a Python function, a lower bound, an upper bound, the number of subintervals, and an optional input that allows the user to designate whether they want left, right, or midpoint rectangles. Test your code on several functions for which you know the integral. You should write your code without any loops.

Exercise 3.33. Consider the function $f(x)=\sin (x)$. We know the antiderivative for this function, $F(x)=-\cos (x)+C$, but in this question we are going to get a sense of the order of the error when doing Riemann Sum integration.
a. Find the exact value of

$$
\int_{0}^{1} f(x) d x .
$$

b. Now build a Riemann Sum approximation (using your code) with various values of $\Delta x$. For all of your approximations use left-justified rectangles. Fill in the table with your results.

| $\Delta x$ | Approx. Integral | Exact Integral | Abs. Percent Error |
| :--- | :--- | :--- | :--- |
| $2^{-1}=0.5$ |  |  |  |
| $2^{-2}=0.25$ |  |  |  |
| $2^{-3}$ |  |  |  |
| $2^{-4}$ |  |  |  |
| $2^{-5}$ |  |  |  |
| $2^{-6}$ |  |  |  |
| $2^{-7}$ |  |  |  |
| $2^{-8}$ |  |  |  |

c. There was nothing really special about powers of 2 in part (b) of this problem. Examine other sequences of $\Delta x$ with a goal toward answering the question:
If we find an approximation of the integral with a fixed $\Delta x$ and find an absolute percent error, then what would happen to the absolute percent error if we divide $\Delta x$ by some positive constant $M$?
d. What is the apparent approximation error of the Riemann Sum method using left-justified rectangles?

Exercise 3.34. Repeat the previous problem using right-justified rectangles.

Theorem 3.2. In approximating the integral $\int_{a}^{b} f(x) d x$ with a fixed interval width $\Delta x$ we find an absolute percent error $P$.

- If we use left rectangles and an interval width of $\frac{\Delta x}{M}$ then the absolute percent error will be approximately $\qquad$ .
- If we use right rectangles and an interval width of $\frac{\Delta x}{M}$ then the absolute percent error will be approximately $\qquad$ .

Exercise 3.35. The previous theorem could be stated in an equivalent way.
In approximating the integral $\int_{a}^{b} f(x) d x$ with a fixed number of subintervals we find an absolute percent error $P$.

- If we use left rectangles and $M$ times as many subintervals then the absolute percent error will be approximately $\qquad$ .
- If we use right rectangles and $M$ times as many subintervals then the absolute percent error will be approximately $\qquad$ .

Exercise 3.36. Create a plot with the width of the subintervals on the horizontal axis and the absolute error between your Riemann sum calculations (left, right, and midpoint) and the exact integral for a known definite integral. Your plot should be on a log-log scale. Based on your plot, what is the approximate order of the error in the Riemann sum approximation?

### 3.3.2 Trapezoidal Rule

Now let's turn our attention to some slightly better algorithms for calculating the value of a definite integral: The Trapezoidal Rule and Simpson's Rule. There are many others, but in practice these two are relatively easy to implement and have reasonably good error approximations. To motivate the idea of the Trapezoid rule consider Figure 3.6. It is plain to see that trapezoids will make better approximations than rectangles at least in this particular case. Another way to think about using trapezoids, however, is to see the top side of the trapezoid as a secant line connecting two points on the curve. As $\Delta x$ gets arbitrarily small, the secant lines become better and better approximations for tangent lines and are hence arbitrarily good approximations for the curve. For these reasons it seems like we should investigate how to systematically approximate definite integrals via trapezoids.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-24.jpg?height=362&width=1117&top_left_y=1250&top_left_x=599)

Figure 3.6: Motivation for using trapezoids to approximate a definite integral.

Exercise 3.37. Consider a single trapezoid approximating the area under a curve. From geometry we recall that the area of a trapezoid is

$$
A=\frac{1}{2}\left(b_{1}+b_{2}\right) h
$$

where $b_{1}, b_{2}$ and $h$ are marked in Figure 3.7. The function shown in the picture is $f(x)=\frac{1}{5} x^{2}(5-x)$. Find the area of the shaded region as an approximation to

$$
\int_{1}^{4}\left(\frac{1}{5} x^{2}(5-x)\right) d x .
$$

Now use the same idea with $h=\Delta x=1$ from Figure 3.6 to approximate the area under the function $f(x)=\frac{1}{5} x^{2}(5-x)$ between $x=1$ and $x=4$ using three trapezoids.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-25.jpg?height=418&width=492&top_left_y=425&top_left_x=719)

Figure 3.7: A single trapezoid to approximate area under a curve.

Exercise 3.38. Again consider the function $f(x)=\frac{1}{5} x^{2}(5-x)$ on the interval $[1,4]$. We want to evaluate the integral

$$
\int_{1}^{4} f(x) d x
$$

using trapezoids to approximate the area.
a. Work out the exact value of the definite integral by hand.
b. Summarize your answers to the previous problems in the following table then extend the data that you have for smaller and smaller values of $\Delta x$.

| $\Delta x$ | Approx. Integral | Exact Integral | Abs. \% Error |
| :--- | :--- | :--- | :--- |
| 3 |  |  |  |
| 1 |  |  |  |
| $1 / 3$ |  | $\vdots$ |  |
| $1 / 9$ |  | $\vdots$ |  |
| $\vdots$ | $\vdots$ | $\vdots$ |  |

c. From the table that you built in part (b), what do you conjecture is the order of the approximation error for the trapezoid method?

Definition 3.3. (The Trapezoidal Rule) We want to approximate $\int_{a}^{b} f(x) d x$. One of the simplest ways is to approximate the area under the function with a trapezoid. Recall from basic geometry that area of a trapezoid is $A=\frac{1}{2}\left(b_{1}+b_{2}\right) h$. In terms of the integration problem we can do the following:
a. First partition $[a, b]$ into the set $\left\{x_{0}=a, x_{1}, x_{2}, \ldots, x_{n-1}, x_{n}=b\right\}$.
b. On each part of the partition approximate the area with a trapezoid:

$$
A_{j}=\frac{1}{2}\left[f\left(x_{j}\right)+f\left(x_{j-1}\right)\right]\left(x_{j}-x_{j-1}\right)
$$

c. Approximate the integral as

$$
\int_{a}^{b} f(x) d x \approx \sum_{j=1}^{n} A_{j}
$$
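The three steps of Definition 3.3 can be sketched in a few lines of Python. This is one possible implementation, not the only one: the name `trapezoid_rule` is hypothetical, the partition is assumed to be equally spaced, and `f` is assumed to accept NumPy arrays.

```python
import numpy as np

def trapezoid_rule(f, a, b, n):
    """Approximate the integral of f on [a, b] with n trapezoids."""
    # step a: partition [a, b] into x_0 = a, x_1, ..., x_n = b
    x = np.linspace(a, b, n + 1)
    y = f(x)
    # step b: A_j = (1/2) [f(x_j) + f(x_{j-1})] (x_j - x_{j-1})
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    # step c: add up the areas of the trapezoids
    return np.sum(areas)

# compare against a known integral: sin(x) on [0, 1]
print(trapezoid_rule(np.sin, 0, 1, 100), 1 - np.cos(1))
```

Because each trapezoid follows a straight line between two points on the curve, this sketch should reproduce the integral of any linear function up to rounding error.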

Exercise 3.39. Write code to give the trapezoidal rule approximation for the definite integral $\int_{a}^{b} f(x) d x$. Test your code on functions where you know the definite area. Then test your code on functions where you have approximated the area by examining a plot (i.e. you have a visual estimate of the area).

Exercise 3.40. Use the code that you wrote in the previous problem to test your conjecture about the order of the approximation error for the trapezoid rule. Integrate the function $f(x)=\sin (x)$ from $x=0$ to $x=1$ with more and more trapezoids. In each case compare to the exact answer and find the absolute percent error. The goal is to answer the question:
If we calculate the definite integral with a fixed $\Delta x$ and get an absolute percent error, $P$, then what absolute percent error will we get if we use a width of $\Delta x / M$ for some positive number M?

### 3.3.3 Simpson's Rule

The trapezoidal rule does a decent job approximating integrals, but ultimately you are using linear functions to approximate $f(x)$ and the accuracy may suffer if the step size is too large or the function too non-linear. You likely notice that the trapezoidal rule will give an exact answer if you were to integrate a linear or constant function. A potentially better approach would be to get an integral that evaluates quadratic functions exactly. In order to do this we need to evaluate the function at three points (not two like the trapezoidal rule). Let's integrate a function $f(x)$ on the interval $[a, b]$ by using the three points $(a, f(a))$, $(m, f(m))$, and $(b, f(b))$ where $m=\frac{a+b}{2}$ is the midpoint of the two boundary points.

We want to find constants $A_{1}, A_{2}$, and $A_{3}$ such that the integral $\int_{a}^{b} f(x) d x$ can be written as a linear combination of $f(a), f(m)$, and $f(b)$. Specifically, we want to find constants $A_{1}, A_{2}$, and $A_{3}$, depending only on $a$ and $b$, such that

$$
\int_{a}^{b} f(x) d x=A_{1} f(a)+A_{2} f(m)+A_{3} f(b)
$$

is exact for all constant, linear, and quadratic functions. This would guarantee that we have an exact integration method for all polynomials of order 2 or less but should serve as a decent approximation if the function is not quadratic.

Exercise 3.41. Draw a picture showing what the previous two paragraphs discussed.

Exercise 3.42. Follow these steps to find $A_{1}, A_{2}$, and $A_{3}$.
a. Prove that

$$
\int_{a}^{b} 1 d x=b-a=A_{1}+A_{2}+A_{3}
$$

b. Prove that

$$
\int_{a}^{b} x d x=\frac{b^{2}-a^{2}}{2}=A_{1} a+A_{2}\left(\frac{a+b}{2}\right)+A_{3} b .
$$

c. Prove that

$$
\int_{a}^{b} x^{2} d x=\frac{b^{3}-a^{3}}{3}=A_{1} a^{2}+A_{2}\left(\frac{a+b}{2}\right)^{2}+A_{3} b^{2}
$$

d. Now solve the linear system of equations to prove that

$$
A_{1}=\frac{b-a}{6}, \quad A_{2}=\frac{4(b-a)}{6}, \quad \text { and } \quad A_{3}=\frac{b-a}{6}
$$
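Before working through the algebra in part (d), the linear system from parts (a) through (c) can be checked numerically on one sample interval. The sketch below assumes the interval $[0,1]$, chosen purely for illustration, and uses `np.linalg.solve`. It is a sanity check rather than a proof.

```python
import numpy as np

a, b = 0.0, 1.0            # sample interval, chosen only for illustration
m = (a + b) / 2            # midpoint

# each row enforces exactness for one of f(x) = 1, f(x) = x, f(x) = x^2
M = np.array([[1.0,  1.0,  1.0],
              [a,    m,    b],
              [a**2, m**2, b**2]])
rhs = np.array([b - a, (b**2 - a**2) / 2, (b**3 - a**3) / 3])

A = np.linalg.solve(M, rhs)                  # numerical A_1, A_2, A_3
claimed = (b - a) / 6 * np.array([1, 4, 1])  # weights stated in part (d)
print(A)
print(claimed)
```

Changing `a` and `b` to other values gives a quick way to see that the weights scale with $b-a$.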

Exercise 3.43. At this point we can see that an integral can be approximated as

$$
\int_{a}^{b} f(x) d x \approx\left(\frac{b-a}{6}\right)\left(f(a)+4 f\left(\frac{a+b}{2}\right)+f(b)\right)
$$

and the technique will give an exact answer for any polynomial of order 2 or below.

Verify the previous sentence by integrating $f(x)=1, f(x)=x$ and $f(x)=x^{2}$ by hand on the interval $[0,1]$ and using the approximation formula

$$
\int_{a}^{b} f(x) d x \approx\left(\frac{b-a}{6}\right)\left(f(a)+4 f\left(\frac{a+b}{2}\right)+f(b)\right)
$$

a. Use the method described above to approximate the area under the curve $f(x)=(1 / 5) x^{2}(5-x)$ on the interval $[1,4]$. To be clear, you will be using the points $a=1, m=2.5$, and $b=4$ in the above derivation.
b. Next find the exact area under the curve $g(x)=(-1 / 2) x^{2}+3.3 x-2$ on the interval $[1,4]$.
c. What do you notice about the two areas? What does this sample problem tell you about the formula that we derived above?

To make the punchline of the previous exercises a bit more clear, using the formula

$$
\int_{a}^{b} f(x) d x \approx\left(\frac{b-a}{6}\right)(f(a)+4 f(m)+f(b))
$$

is the same as fitting a parabola to the three points $(a, f(a)),(m, f(m))$, and $(b, f(b))$ and finding the area under the parabola exactly. That is exactly the step up from the trapezoid rule and Riemann sums that we were after:

- Riemann sums approximate the function with constant functions,
- the trapezoid rule uses linear functions, and
- now we have a method for approximating with parabolas.

To improve upon this idea we now examine the problem of partitioning the interval $[a, b]$ into small pieces and running this process on each piece. This is called Simpson's Rule for integration.

Definition 3.4. (Simpson's Rule) Now we put the process explained above into a form that can be coded to approximate integrals. We call this method Simpson's Rule after Thomas Simpson (1710-1761) who, by the way, was a basket weaver in his day job so he could pay the bills and keep doing math.
a. First partition $[a, b]$ into the set $\left\{x_{0}=a, x_{1}, x_{2}, \ldots, x_{n-1}, x_{n}=b\right\}$.
b. On each part of the partition approximate the area with a parabola:

$$
A_{j}=\frac{1}{6}\left[f\left(x_{j}\right)+4 f\left(\frac{x_{j}+x_{j-1}}{2}\right)+f\left(x_{j-1}\right)\right]\left(x_{j}-x_{j-1}\right)
$$

c. Approximate the integral as

$$
\int_{a}^{b} f(x) d x \approx \sum_{j=1}^{n} A_{j}
$$
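Definition 3.4 translates into code in much the same way as the trapezoidal rule. The following is a minimal sketch under the assumptions that the partition is equally spaced and that `f` accepts NumPy arrays. The name `simpson_rule` is a hypothetical choice.

```python
import numpy as np

def simpson_rule(f, a, b, n):
    """Approximate the integral of f on [a, b] with n parabolic pieces."""
    # step a: partition [a, b] into x_0 = a, x_1, ..., x_n = b
    x = np.linspace(a, b, n + 1)
    left, right = x[:-1], x[1:]
    mid = (left + right) / 2
    # step b: A_j = (1/6) [f(x_j) + 4 f(midpoint) + f(x_{j-1})] (x_j - x_{j-1})
    areas = (f(right) + 4 * f(mid) + f(left)) * (right - left) / 6
    # step c: add up the areas under the parabolas
    return np.sum(areas)

# compare against a known integral: sin(x) on [0, 1]
print(simpson_rule(np.sin, 0, 1, 50), 1 - np.cos(1))
```

Note that this version evaluates `f` at the midpoint of every subinterval, so it uses $2n+1$ distinct sample points for $n$ subintervals.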

Exercise 3.44. We have spent a lot of time over the past many pages building approximations of the order of the error for numerical integration and differentiation schemes. It is now up to you.

Build a numerical experiment that allows you to conjecture the order of the approximation error for Simpson's rule. Remember that the goal is to answer the question:
If I approximate the integral with a fixed $\Delta x$ and find an absolute percent error of $P$, then what will the absolute percent error be using a width of $\Delta x / M$ ?

Exercise 3.45. Write a Python function that implements Simpson's Rule. You should ALWAYS start by writing pseudo-code as comments in your file. You shouldn't need a loop in your function.

Exercise 3.46. Test your function on known integrals and approximate the order of the error based on the mesh size.

Thus far we have three numerical approximations for definite integrals: Riemann sums (with rectangles), the trapezoidal rule, and Simpson's rule. There are MANY other approximations for integrals and we leave the further research to the curious reader.

Theorem 3.3. (Numerical Integration Schemes) Let $f(x)$ be a continuous function on the interval $[a, b]$. The integral $\int_{a}^{b} f(x) d x$ can be approximated with any of the following.

- Left or Right Riemann Sum: $\int_{a}^{b} f(x) d x \approx \sum_{j=1}^{N} f\left(x_{j}\right) \Delta x$, where $x_{j}$ is the left or right endpoint of the $j$th subinterval. Error: $\mathcal{O}(\Delta x)$
- Midpoint Riemann Sum: $\int_{a}^{b} f(x) d x \approx \sum_{m=1}^{N} f\left(x_{m}\right) \Delta x$, where $x_{m}$ is the midpoint of the $m$th subinterval. Error: $\mathcal{O}\left(\Delta x^{2}\right)$
- Trapezoidal Rule: $\int_{a}^{b} f(x) d x \approx \frac{1}{2} \sum_{j=1}^{N}\left(f\left(x_{j}\right)+f\left(x_{j-1}\right)\right) \Delta x$. Error: $\mathcal{O}\left(\Delta x^{2}\right)$
- Simpson's Rule: $\int_{a}^{b} f(x) d x \approx \frac{1}{6} \sum_{j=1}^{N}\left(f\left(x_{j}\right)+4 f\left(\frac{x_{j}+x_{j-1}}{2}\right)+f\left(x_{j-1}\right)\right) \Delta x$. Error: $\mathcal{O}\left(\Delta x^{4}\right)$

In the trapezoidal rule and Simpson's rule, $\Delta x=x_{j}-x_{j-1}$, and in every case $N$ is the number of subintervals.
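One way to see these error rates without a plot is to halve $\Delta x$ and look at how much the error shrinks: if the error behaves like $C \Delta x^{p}$, then the ratio of the two errors is about $2^{p}$. The sketch below does this for $\int_{0}^{1} \sin (x) d x$. All function names are hypothetical, and the four rules are re-implemented here only so that the block is self-contained.

```python
import numpy as np

def left_sum(f, a, b, N):
    dx = (b - a) / N
    return np.sum(f(a + dx * np.arange(N))) * dx

def midpoint_sum(f, a, b, N):
    dx = (b - a) / N
    return np.sum(f(a + dx * (np.arange(N) + 0.5))) * dx

def trapezoid(f, a, b, N):
    x = np.linspace(a, b, N + 1)
    y = f(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def simpson(f, a, b, N):
    x = np.linspace(a, b, N + 1)
    l, r = x[:-1], x[1:]
    return np.sum((f(l) + 4 * f((l + r) / 2) + f(r)) * (r - l) / 6)

def observed_order(rule, f, a, b, exact, N=16):
    # absolute error with width dx, then with width dx / 2
    e1 = abs(rule(f, a, b, N) - exact)
    e2 = abs(rule(f, a, b, 2 * N) - exact)
    # if error ~ C * dx**p then e1 / e2 ~ 2**p, so p ~ log2(e1 / e2)
    return np.log2(e1 / e2)

exact = 1 - np.cos(1)
for rule in (left_sum, midpoint_sum, trapezoid, simpson):
    print(rule.__name__, observed_order(rule, np.sin, 0, 1, exact))
```

The printed values estimate the exponent $p$ for each rule, which is also the slope you would read off a log-log error plot.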

Exercise 3.47. Theorem 3.3 simply states the error rates for our three primary integration schemes. For this problem you need to empirically verify these error rates. Use the integration problem and exact answer

$$
\int_{0}^{\pi / 4} e^{3 x} \sin (2 x) d x=\frac{3}{13} e^{3 \pi / 4}+\frac{2}{13}
$$

and write code that produces a log-log error plot with $\Delta x$ on the horizontal axis and the absolute error on the vertical axis. Fully explain how the error rates show themselves in your plot.

### 3.4 Optimization

### 3.4.1 Single Variable Optimization

You likely recall that one of the major applications of Calculus was to solve optimization problems - find the value of $x$ which makes some function as big or as small as possible. The process itself can sometimes be rather challenging due to either the modeling aspect of the problems and/or the fact that the differentiation might be quite cumbersome. In this section we will revisit those problems from Calculus, but our goal will be to build a numerical method for the Calculus step in the hope of avoiding the messy algebra and differentiation.

Exercise 3.48. A piece of cardboard measuring 20 cm by 20 cm is to be cut so that it can be folded into a box without a lid (see Figure 3.8). We want to find the size of the cut, $x$, that maximizes the volume of the box.
a. Write a function for the volume of the box resulting from a cut of size $x$. What is the domain of your function?
b. We know that we want to maximize this function so go through the full Calculus exercise to find the maximum:

- take the derivative
- set it to zero
- find the critical points
- test the critical points and the boundaries of the domain using the extreme value theorem to determine the $x$ that gives the maximum.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-31.jpg?height=443&width=484&top_left_y=1622&top_left_x=731)

Figure 3.8: Folds to make a cardboard box

The hard part of the single variable optimization process is often solving the equation $f^{\prime}(x)=0$. We could use numerical root finding schemes to solve this equation, but we could also potentially do better without actually finding the derivative. In the following we propose a few numerical techniques that can approximate the solution to these types of problems. The basic ideas are simple!

Exercise 3.49. If you were blindfolded and standing on a hill could you find the top of the hill? (assume no trees and no cliffs . . . this isn't supposed to be dangerous) How would you do it? Explain your technique clearly.

Exercise 3.50. If you were blindfolded and standing on a crater on the moon could you find the lowest point? How would you do it? Remember that you can hop as far as you like ... because gravity ... but sometimes that's not a great thing because you could hop too far.

The intuition of numerical optimization schemes is typically to visualize the function that you're trying to minimize or maximize and think about either climbing the hill to the top (maximization) or descending the hill to the bottom (minimization).

Exercise 3.51. Let's turn your intuitions into algorithms. If $f(x)$ is the function that you are trying to maximize then turn your ideas from the previous problems into step-by-step algorithms which could be coded. Then try out your codes on the function

$$
f(x)=e^{-x^{2}}+\sin \left(x^{2}\right)
$$

to see if your algorithms can find the local maximum near $x \approx 1.14$. Try to generate several different algorithms.

Some of the most common algorithms are listed below. Read through them and see which one(s) you ended up recreating. The intuition for these algorithms is pretty darn simple - travel uphill if you want to maximize - travel downhill if you want to minimize.

Definition 3.5. (Derivative Free Optimization) Let $f(x)$ be the objective function which you are seeking to maximize (or minimize).

- Pick a starting point, $x_{0}$, and find the value of your objective function at this point, $f\left(x_{0}\right)$.
- Pick a small step size (say, $\Delta x \approx 0.01$ ).
- Calculate the objective function one step to the left and one step to the right from your starting point. Whichever point gives the larger function value (if you're seeking a maximum) is the point that you keep for your next step.
- Iterate (decide on a good stopping rule)
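The steps in Definition 3.5 might be coded along the following lines. This is only a sketch: the name `derivative_free_max` is hypothetical, and the stopping rule (quit when neither neighbor improves on the current point) is an assumption, since the definition leaves that choice open.

```python
import numpy as np

def derivative_free_max(f, x0, dx=0.01, max_steps=10_000):
    """Climb uphill in steps of size dx starting from x0."""
    x = x0
    for _ in range(max_steps):
        # current point first, so that ties keep us where we are
        candidates = [x, x - dx, x + dx]
        best = max(candidates, key=f)
        # assumed stopping rule: neither neighbor is better
        if best == x:
            break
        x = best
    return x

f = lambda x: np.exp(-x**2) + np.sin(x**2)
x_star = derivative_free_max(f, 1.0)
print(x_star, f(x_star))
```

With a fixed step size the answer can only be as accurate as `dx`, so one natural refinement is to shrink `dx` and restart from the point where the search stalled.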

Exercise 3.52. Write code to implement the 1D derivative free optimization algorithm and use it to solve Exercise 3.48. Compare your answer to the analytic solution.

Definition 3.6. (Gradient Descent/Ascent) Let $f(x)$ be the objective function which you are seeking to maximize (or minimize).

- Find the derivative of your objective function, $f^{\prime}(x)$.
- Pick a starting point, $x_{0}$.
- Pick a small control parameter, $\alpha$ (in machine learning this parameter is called the "learning rate" for the gradient descent algorithm).
- Use the iteration $x_{n+1}=x_{n}+\alpha f^{\prime}\left(x_{n}\right)$ if you're maximizing. Use the iteration $x_{n+1}=x_{n}-\alpha f^{\prime}\left(x_{n}\right)$ if you're minimizing.
- Iterate (decide on a good stopping rule)
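Here is a minimal sketch of the maximizing version of Definition 3.6. The function name `gradient_ascent`, the value $\alpha=0.1$, and the stopping rule (stop when two successive iterates are within a tolerance of each other) are all assumptions made for illustration. The derivative is worked out by hand for the sample function $f(x)=e^{-x^{2}}+\sin \left(x^{2}\right)$.

```python
import numpy as np

def gradient_ascent(df, x0, alpha=0.1, tol=1e-8, max_steps=10_000):
    """Iterate x_{n+1} = x_n + alpha * f'(x_n) until the steps become tiny."""
    x = x0
    for _ in range(max_steps):
        x_new = x + alpha * df(x)
        # assumed stopping rule: successive iterates are very close
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# derivative of f(x) = exp(-x^2) + sin(x^2), computed by hand
df = lambda x: -2 * x * np.exp(-x**2) + 2 * x * np.cos(x**2)
x_star = gradient_ascent(df, 1.0)
print(x_star, df(x_star))
```

For minimization, flip the sign in front of `alpha * df(x)`. If `alpha` is too large the iterates can overshoot and fail to settle down, which is worth experimenting with.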

Exercise 3.53. Write code to implement the 1D gradient descent algorithm and use it to solve Exercise 3.48. Compare your answer to the analytic solution.

Definition 3.7. (Monte-Carlo Search) Let $f(x)$ be the objective function which you are seeking to maximize (or minimize).

- Pick many (perhaps several thousand!) different $x$ values.
- Find the value of the objective function at every one of these points (Hint: use lists, not loops)
- Keep the $x$ value that has the largest (or smallest if you're minimizing) value of the objective function.
- Iterate many times and compare the function value in each iteration to the previous best function value
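A Monte Carlo search along the lines of Definition 3.7 could look like the sketch below. It assumes that candidate points are drawn uniformly from a search interval $[a, b]$ that you supply, which the definition does not specify. The name `monte_carlo_max` and the fixed random seed are also illustrative choices.

```python
import numpy as np

def monte_carlo_max(f, a, b, n_points=5000, n_rounds=20, seed=0):
    """Sample random points in [a, b] and keep the best one seen so far."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, -np.inf
    for _ in range(n_rounds):
        x = rng.uniform(a, b, n_points)   # many random candidate points
        y = f(x)                          # evaluate them all at once
        i = np.argmax(y)                  # best candidate in this round
        if y[i] > best_f:                 # compare with the previous best
            best_x, best_f = x[i], y[i]
    return best_x, best_f

f = lambda x: np.exp(-x**2) + np.sin(x**2)
x_star, f_star = monte_carlo_max(f, 0.5, 1.5)
print(x_star, f_star)
```

To minimize instead, replace `np.argmax` with `np.argmin`, start from `np.inf`, and reverse the comparison.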

Exercise 3.54. Write code to implement the 1D Monte Carlo search algorithm and use it to solve Exercise 3.48. Compare your answer to the analytic solution.

Definition 3.8. (Optimization via Numerical Root Finding) Let $f(x)$ be the objective function which you are seeking to maximize (or minimize).

- Find the derivative of your objective function.
- Set the derivative to zero and use a numerical root finding method (such as bisection or Newton) to find the critical point.
- Use the extreme value theorem to determine if the critical point or one of the endpoints is the maximum (or minimum).
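The root finding approach of Definition 3.8 can be sketched with a hand-written bisection routine applied to $f^{\prime}(x)$. The sample function, the interval $[0.5,1.5]$, and the helper name `bisection` are assumptions chosen for illustration. The derivative is computed by hand.

```python
import numpy as np

def bisection(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi], assuming g changes sign there."""
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # keep the half of the interval where the sign change lives
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

f = lambda x: np.exp(-x**2) + np.sin(x**2)
df = lambda x: -2 * x * np.exp(-x**2) + 2 * x * np.cos(x**2)

a, b = 0.5, 1.5
c = bisection(df, a, b)                    # critical point: f'(c) = 0
candidates = np.array([a, c, b])           # endpoints plus critical point
x_best = candidates[np.argmax(f(candidates))]
print(c, x_best, f(x_best))
```

The last few lines carry out the extreme value theorem step: the maximum on a closed interval must occur at a critical point or at an endpoint, so it is enough to compare those candidates.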

Exercise 3.55. Write code to implement the 1D numerical root finding optimization algorithm and use it to solve Exercise 3.48. Compare your answer to the analytic solution.

Exercise 3.56. In this problem we will compare and contrast the four methods proposed in the previous problems.
a. What are the advantages to each of the methods proposed?
b. What are the disadvantages to each of the methods proposed?
c. Which method, do you suppose, will be faster in general? Why?
d. Which method, do you suppose, will be slower in general? Why?

Exercise 3.57. The Gradient Ascent/Descent algorithm is the most geometrically interesting of the four that we have proposed. The others are pretty brute force algorithms. What is the Gradient Ascent/Descent algorithm doing geometrically? Draw a picture and be prepared to explain to your peers.

Exercise 3.58. (This problem is modified from [6])
A pig weighs 200 pounds and gains weight at a rate proportional to its current weight. Today the growth rate is 5 pounds per day. The pig costs 45 cents per day to keep due mostly to the price of food. The market price for pigs is 65 cents per pound but is falling at a rate of 1 cent per day. When should the pig be sold and how much profit do you make on the pig when you sell it? Write this situation as a single variable mathematical model and solve the problem analytically (by hand). Then solve the problem with all four methods outlined thus far in this section.

Exercise 3.59. (This problem is modified from [6])
Reconsider the pig problem 3.58 but now suppose that the weight of the pig after $t$ days is

$$
w=\frac{800}{1+3 e^{-t / 30}} \text { pounds. }
$$

When should the pig be sold and how much profit do you make on the pig when you sell it? Write this situation as a single variable mathematical model. You should notice that the algebra and calculus for solving this problem is no longer really a desirable way to go. Use an appropriate numerical technique to solve this problem.

Exercise 3.60. Numerical optimization is often seen as quite challenging since the algorithms that we have introduced here could all get "stuck" at local extrema. To illustrate this see the function shown in Figure 3.9. How will derivative free optimization methods have trouble finding the red point starting at the black point with this function? How will gradient descent/ascent methods have trouble? Why?
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-35.jpg?height=475&width=597&top_left_y=931&top_left_x=669)

Figure 3.9: A challenging numerical optimization problem. If we start at the black point then how will any of our algorithms find the local minimum at the red point?

### 3.4.2 Multivariable Optimization

Now let's look at multivariable optimization. The analytic process for finding optimal solutions is essentially the same as for single variable.

- Write a function that models a scenario in multiple variables,
- find the gradient vector (presuming that the function is differentiable),
- set the gradient vector equal to the zero vector and solve for the critical point(s), and
- interpret your answer in the context of the problem.

The trouble with unconstrained multivariable optimization is that finding the critical points is now equivalent to solving a system of nonlinear equations, a task that is often impossible to carry out exactly, even with a computer algebra system.

Let's see if you can extend your intuition from single variable to multivariable. This particular subsection is intentionally quite brief. If you want more details on multivariable optimization it would be wise to take a full course in optimization.

Exercise 3.61. The derivative free optimization method discussed in the single variable optimization section just said that you should pick two points and pick the one that takes you furthest uphill.
a. Why is it insufficient to choose just two points if we are dealing with a function of two variables? Hint: think about contour lines.
b. For a function of two variables, how many points should you use to compare and determine the direction of "uphill?"
c. Extend your answer from part (b) to $n$ dimensions. How many points should we compare if we are in $n$ dimensions and need to determine which direction is "uphill?"
d. Back in the case of a two-variable function, you should have decided that three points was best. Explain an algorithm for moving one point at a time so that your three points eventually converge to a nearby local maximum. It may be helpful to make a surface plot or a contour plot of a well-known function just as a visual.
The code below will demonstrate how to make a contour plot.

```
import numpy as np
import matplotlib.pyplot as plt
xdomain = np.linspace(-4,4,100)
ydomain = np.linspace(-4,4,100)
X, Y = np.meshgrid(xdomain,ydomain)
f = lambda x, y: np.sin(x)*np.exp(-np.sqrt(x**2+y**2))
plt.contour(X,Y,f(X,Y))
plt.grid()
plt.show()
```

Exercise 3.62. Now let's tackle the gradient ascent/descent algorithm. You should recall that the gradient vector points in the direction of maximum change. How can you use this fact to modify the gradient ascent/descent algorithm given previously? Clearly write your algorithm so that a classmate could turn it into code.

Exercise 3.63. How does the Monte Carlo algorithm extend to a two-variable optimization problem? Clearly write your algorithm.

Exercise 3.64. Try out the gradient descent/ascent and Monte Carlo algorithms on the function $f(x, y)=\sin (x) \cos (y)+0.1 x^{2}$ which has many local extrema and no global maximum. We are not going to code the multidimensional derivative free optimization routine in this section.

The derivative free, gradient ascent/descent, and Monte Carlo techniques still have good analogues in higher dimensions. We just need to be a bit careful since in higher dimensions there is much more room to move. Below we'll give the full description of the gradient ascent/descent algorithm. We don't give the full description of the derivative free or Monte Carlo algorithms since there are many ways to implement them. The interested reader should see a course in mathematical optimization or machine learning.

Definition 3.9. (The Gradient Descent Algorithm) We want to solve the problem

$$
\text { minimize } f\left(x_{1}, x_{2}, \ldots, x_{n}\right) \text { subject to }\left(x_{1}, x_{2}, \ldots, x_{n}\right) \in S
$$

a. Choose an arbitrary starting point $\boldsymbol{x}_{0}=\left(x_{1}, x_{2}, \ldots, x_{n}\right) \in S$.
b. We are going to define a difference equation that gives successive guesses for the optimal value:

$$
\boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}-\alpha \nabla f\left(\boldsymbol{x}_{k}\right), \quad k=0,1,2, \ldots
$$

The difference equation says to follow the negative gradient a certain distance from your present point (why are we doing this?). Note that the value of $\alpha$ is up to you so experiment with a few values (you should probably take $\alpha \leq 1 \ldots$ why?).
c. Repeat the iterative process in step b until two successive points are close enough to each other.
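The algorithm in Definition 3.9 might be sketched as follows. The name `gradient_descent`, the value $\alpha=0.1$, and the tolerance are illustrative assumptions, and the sketch ignores the constraint set $S$ (it treats the problem as unconstrained). The gradient of the sample function $f(x, y)=\sin (x) \cos (y)$ is computed by hand.

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_steps=10_000):
    """Follow the negative gradient until two successive points are close."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        x_new = x - alpha * grad(x)
        # step c: stop when successive points are close enough
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# gradient of f(x, y) = sin(x) cos(y), computed by hand
grad_f = lambda p: np.array([np.cos(p[0]) * np.cos(p[1]),
                             -np.sin(p[0]) * np.sin(p[1])])

p_star = gradient_descent(grad_f, [-1.0, 0.3])
print(p_star)
```

Different starting points can lead to different local minima, so it is worth rerunning the sketch from several values of `x0`.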

Take Note: If you are looking to maximize your objective function then in the Monte-Carlo search you should examine if the function value $z$ at each new point is greater than your current largest value. For gradient descent you should actually do a gradient ascent instead and follow the positive gradient instead of the negative gradient.

Exercise 3.65. Functions like $f(x, y)=\sin (x) \cos (y)$ have many local extreme values, which makes optimization challenging. Implement your Gradient Descent code on this function to find the local minimum $(-\pi / 2,0)$. Start somewhere near $(-\pi / 2,0)$ and show by way of example that your gradient descent code may not converge to this particular local minimum. Why is this important?

### 3.5 Calculus with numpy and scipy

In this section we will look at some highly versatile functions built into the numpy and scipy libraries in Python. These libraries allow us to lean on pre-built numerical routines for calculus and optimization so that we can focus our energies on setting up the problems and interpreting solutions. The downside here is that we are going to treat some of the optimization routines in Python as black boxes, so part of the goal of this section is to partially unpack these black boxes so that we know what's going on under the hood. If you haven't done Exercise 2.65 yet you may want to do so now in order to get used to some of the syntax used by the Python scipy library.

### 3.5.1 Differentiation

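There are two main tools built into the numpy and scipy libraries that do numerical differentiation. In numpy there is the np.diff() command. In scipy there is the scipy.misc.derivative() command. Note that scipy.misc.derivative() was deprecated in SciPy 1.10 and removed in SciPy 1.12, so the examples that use it require an older version of SciPy.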

Exercise 3.66. In the following blocks of Python code we demonstrate what the np.diff() command does. Use these examples to give a thorough description for what np.diff() does to a Python list.

First example of np.diff():

```
import numpy as np
myList = np.arange(0,10)
print(myList)
print( np.diff(myList) )
```

Second example of np.diff():

```
import numpy as np
myList = np.linspace(0,1,6)
print(myList)
print( np.diff(myList) )
```

Third example of np.diff():

```
import numpy as np
x = np.linspace(0,1,6)
dx = x[1]-x[0]
y = x**2
dy = 2*x
print("function values: \n",y)
print("exact values of derivative: \n",dy)
print("values from np.diff(): \n",np.diff(y))
```

```
print("values from np.diff()/dx: \n",np.diff(y) / dx )
```

Exercise 3.67. Why does the np.diff() command produce a list that is one element shorter than the original list?

Exercise 3.68. If we have a list of $x$ values and a list of $y$ values for a function $y=f(x)$ then how do we use np.diff() to approximate the first derivative of $f(x)$? What is the order of the error in the approximation?

Exercise 3.69. What does the following block of Python code do?

```
import numpy as np
x = np.linspace(0,1,6)
dx = x[1]-x[0]
y = x**2
print( np.diff(y,2) / dx**2 )
```

Exercise 3.70. Use the np.diff() command to approximate the first and second derivatives of the function $f(x)=x \sin (x)-\ln (x)$ on the domain $[1,5]$. Then create a plot that shows $f(x)$ and the approximations of $f^{\prime}(x)$ and $f^{\prime \prime}(x)$.

Exercise 3.71. Next we look into the scipy.misc.derivative() command from the scipy library. This will be another way to calculate the derivative of a function. One advantage will be that you can just send in a Python function (or a lambda function) without actually computing the lists of values. Examine the following Python code and fully describe what it does.

```
import numpy as np
import scipy.misc
f = lambda x: x**2
x = np.linspace(1,5,5)
df = scipy.misc.derivative(f,x,dx = 1e-10)
print(df)
```

One advantage to using scipy.misc.derivative() is that you choose the step size dx directly, so the accuracy of the derivative computation is not tied to the spacing of the list of values that you provide. In its simplest form you can provide just a single $x$ value, as in the next block of code.

```
import numpy as np
import scipy.misc
f = lambda x: x**2
df = scipy.misc.derivative(f,1,dx = 1e-10) # derivative at x=1
print(df)
```
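To see roughly what a routine of this kind computes, here is a minimal sketch of a centered difference written by hand. The helper name `central_diff` and its default step size are assumptions made for illustration; this is not SciPy's implementation, only the familiar formula $f^{\prime}(x) \approx \frac{f(x+dx)-f(x-dx)}{2\,dx}$.

```python
import numpy as np

def central_diff(f, x, dx=1e-6):
    """Centered-difference estimate of f'(x).

    Hypothetical helper for illustration; not part of numpy or scipy.
    """
    x = np.asarray(x, dtype=float)
    return (f(x + dx) - f(x - dx)) / (2 * dx)

f = lambda x: x**2
print(central_diff(f, 1.0))                    # estimate of f'(1)
print(central_diff(f, np.linspace(1, 5, 5)))   # estimates at x = 1, 2, 3, 4, 5
```

Because the helper only evaluates `f` at shifted inputs, it accepts either a single number or a numpy array of $x$ values.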

Exercise 3.72. In the following code we find the first and second derivatives of $f(x)=x \sin (x)-\ln (x)$ using scipy.misc.derivative(). Notice that we've chosen to take dx=1e-6 for each of the derivative computations. That may seem like an odd choice, but there is more going on here. Try successively smaller and smaller values for the dx parameter. What do you find? Why does it happen?

```
import numpy as np
import scipy.misc
import matplotlib.pyplot as plt
f = lambda x: np.sin(x)*x-np.log(x)
x = np.linspace(1,5,100) # x domain: 100 points between 1 and 5
df = scipy.misc.derivative(f,x,dx=1e-6)
df2 = scipy.misc.derivative(f,x,dx=1e-6,n=2)
plt.plot(x,f(x),'b',x,df,'r--',x,df2,'k--')
plt.legend(["f(x)","f'(x)","f''(x)"])
plt.grid()
plt.show()
```


### 3.5.2 Integration

In numpy there is a nice tool called np.trapz() that implements the trapezoidal rule (it is named np.trapezoid() as of NumPy 2.0, where np.trapz() is deprecated). In the following problem you will find several examples of the np.trapz() command. Use these examples to determine how the command works to integrate functions.

![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-42.jpg?height=595&width=898&top_left_y=429&top_left_x=706)

Figure 3.10: Derivatives with scipy

Exercise 3.73. First we'll approximate the integral $\int_{-2}^{2} x^{2} d x$. The exact answer is

$$
\int_{-2}^{2} x^{2} d x=\left.\frac{x^{3}}{3}\right|_{-2} ^{2}=\frac{16}{3}=5.3333 \ldots
$$

```
import numpy as np
x = np.linspace(-2, 2, 100)
dx = x[1]-x[0]
y = x**2
print("Approximate integral is ",np.trapz(y)*dx)
```

Next we'll approximate $\int_{0}^{2 \pi} \sin (x) d x$. We know that the exact value is 0 .

```
import numpy as np
x = np.linspace(0,2*np.pi,100)
dx = x[1]-x[0]
y = np.sin(x)
print("Approximate integral is ",np.trapz(y)*dx)
```

Pick a function and an interval for which you know the exact definite integral. Demonstrate how to use np.trapz() on your definite integral.
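As a cross-check on whatever the library routine returns, the trapezoidal rule can also be written out by hand. The sketch below is an assumption-laden illustration (the helper name `trapezoid_by_hand` is made up here, and this is not how numpy implements its routine): it averages neighboring function values and multiplies by the width of each subinterval.

```python
import numpy as np

def trapezoid_by_hand(y, x):
    """Trapezoidal rule on sample points (x, y).

    Hypothetical helper for comparison purposes; not a numpy function.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(x)                  # width of each subinterval
    avg_heights = (y[1:] + y[:-1]) / 2   # average of the two endpoint heights
    return np.sum(widths * avg_heights)

x = np.linspace(-2, 2, 100)
approx = trapezoid_by_hand(x**2, x)
print("Hand-written trapezoidal rule:", approx)
print("Exact value:", 16 / 3)
```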

Exercise 3.74. Notice in the last examples that we multiplied the result of the np.trapz() command by dx. Why did we do this? What is the np.trapz() command doing without the dx?

In the scipy library there is a more general tool called scipy.integrate.quad(). The term "quad" is short for "quadrature." In numerical analysis literature rules like Simpson's rule are called quadrature rules for integration. The function scipy.integrate.quad() accepts a Python function (or a lambda function) and the bounds of the definite integral. It outputs an approximation of the integral along with an approximation of the error in the integral calculation. See the Python code below.

```
import numpy as np
import scipy.integrate
f = lambda x: x**2
I = scipy.integrate.quad(f,-2,2)
print(I)
```
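Since the output is a pair of numbers, it can be convenient to unpack it into two names. The following sketch does that and compares the reported error estimate with the actual error against the exact value $16/3$; the variable names are assumptions chosen for readability.

```python
import scipy.integrate

f = lambda x: x**2

# quad returns (approximate integral, estimate of the absolute error)
value, abs_error_estimate = scipy.integrate.quad(f, -2, 2)

exact = 16 / 3
print("approximate integral:   ", value)
print("reported error estimate:", abs_error_estimate)
print("actual error:           ", abs(value - exact))
```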

Exercise 3.75. What are the advantages and disadvantages to using the scipy.integrate.quad() command as compared to the np.trapz() command?

Exercise 3.76. If you have data for the hourly rate at which water is being drained from a dam and you want to find the total amount of water drained over the course of the time in the dataset, then which of the tools that we know would you use? Why?

### 3.5.3 Optimization

As you've seen in this section there are many tools built into numpy and scipy that will do some of our basic numerical computations. The same is true for numerical optimization problems. Keep in mind throughout the remainder of this section that the whole topic of numerical optimization is still an active area of research and there is much more to the story than what we'll see here. However, the Python tools that we will use are highly optimized and tend to work quite well.

Exercise 3.77. Let's solve a very simple function minimization problem to get started. Consider the function $f(x)=(x-3)^{2}-5$. A moment's thought reveals that the global minimum of this parabolic function occurs at $(3,-5)$. We can have scipy.optimize.minimize() find this value for us numerically. The routine is much like Newton's Method in that we give it a starting point near where we think the optimum will be and it will iterate through some algorithm (like a derivative free optimization routine) to approximate the minimum.

```
import numpy as np
from scipy.optimize import minimize
f = lambda x: (x-3)**2 - 5
minimize(f,2)
```

a. Implement the code above then spend some time playing around with the minimize command to minimize more challenging functions.
b. Explain what all of the output information is from the .minimize() command.
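When experimenting, it may help to store the result and read off individual fields, and to request a specific algorithm through the `method` keyword. The sketch below is one possible way to do this, not the only one: it runs the default method (BFGS when no bounds or constraints are given) and the derivative-free Nelder-Mead method on the same function. Writing `x[0]` is a choice made here because the optimizer passes the unknown as a one-element array.

```python
from scipy.optimize import minimize

# The optimizer hands f a numpy array of length 1, so we index into it
f = lambda x: (x[0] - 3)**2 - 5

result_default = minimize(f, 2)                   # method chosen by scipy
result_nm = minimize(f, 2, method="Nelder-Mead")  # derivative-free simplex method

for label, res in [("default", result_default), ("Nelder-Mead", result_nm)]:
    # res.x is the minimizer, res.fun is the function value there,
    # and res.success reports whether the routine believes it converged
    print(label, "x =", res.x, "f(x) =", res.fun, "success =", res.success)
```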

Exercise 3.78. There is not a function called scipy.optimize.maximize(). Instead, Python expects you to rewrite every maximization problem as a minimization problem. How do you do that?

Exercise 3.79. Solve Exercise 3.48 using scipy.optimize.minimize().

### 3.6 Least Squares Curve Fitting

In this section we'll change our focus a bit to look at a different question from algebra, and, in turn, reveal a hidden numerical optimization problem where the scipy.optimize.minimize() tool will come in quite handy.

Here is the primary question of interest:
If we have a few data points and a reasonable guess for the type of function fitting the points, how would we determine the actual function?

You may recognize this as the basic question of regression from statistics. What we will do here is pose the statistical question of which curve best fits a data set as an optimization problem. Then we will use the tools that we've built so far to solve the optimization problem.

Exercise 3.80. Consider the function $f(x)$ that goes exactly through the points $(0,1),(1,4)$, and $(2,13)$.
a. Find a function that goes through these points exactly. Be able to defend your work.
b. Is your function unique? That is to say, is there another function out there that also goes exactly through these points?

Exercise 3.81. Now let's make a minor tweak to the previous problem. Let's say that we have the data points $(0,1.07),(1,3.9),(2,14.8)$, and $(3,26.8)$. Notice that these points are close to the points we had in the previous problem, but all of the $y$ values have a little noise in them and we have added a fourth point. If we suspect that a function $f(x)$ that best fits this data is quadratic then $f(x)=a x^{2}+b x+c$ for some constants $a, b$, and $c$.
a. Plot the four points along with the function $f(x)$ for arbitrarily chosen values of $a, b$, and $c$.
b. Work with your partner(s) to systematically change $a, b$, and $c$ so that you get a good visual match to the data. The Python code below will help you get started.

```
import numpy as np
import matplotlib.pyplot as plt
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
a = # conjecture a value of a
b = # conjecture a value of b
c = # conjecture a value of c
x = # build an x domain starting at 0 and going through 4
guess = a*x**2 + b*x + c
# make a plot of the data
# make a plot of your function on top of the data
```

![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-46.jpg?height=600&width=903&top_left_y=584&top_left_x=703)

Figure 3.11: Initial attempt at matching data with a quadratic.

As an alternative to loading the data manually we could download the data from the book's github page. All datasets in the text can be loaded in this way. We will be using the pandas library (a Python data science library) to load the .csv files.

```
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
data = np.array( pd.read_csv(URL+'Exercise3_datafit1.csv') )
# Exercise3_datafit1.csv
xdata = data[:,0]
ydata = data[:,1]
```

Exercise 3.82. Now let's be a bit more systematic about things from the previous problem. Let's say that you have a pretty good guess that $b \approx 2$ and $c \approx 0.7$. We need to get a good estimate for $a$.
a. Pick an arbitrary starting value for $a$ then for each of the four points find the error between the predicted $y$ value and the actual $y$ value. These errors are called the residuals.
b. Square all four of your errors and add them up. (Pause, ponder, and discuss: why are we squaring the errors before we sum them?)
c. Now change your value of $a$ to several different values and record the sum of the square errors for each of your values of $a$. It may be worthwhile to use a spreadsheet to keep track of your work here.
d. Make a plot with the value of $a$ on the horizontal axis and the value of the sum of the square errors on the vertical axis. Use your plot to defend the optimal choice for $a$.

Exercise 3.83. We're going to revisit part (c) of the previous problem. Write a loop that tries many values of $a$ in very small increments and calculates the sum of the squared errors. The following partial Python code should help you get started. In the resulting plot you should see a clear local minimum. What does that minimum tell you about solving this problem?

```
import numpy as np
import matplotlib.pyplot as plt
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
b = 2
c = 0.75
A = # give a numpy array of values for a
SumSqRes = [] # this is storage for the sum of the sq. residuals
for a in A:
    guess = a*xdata**2 + b*xdata + c
    residuals = # write code to calculate the residuals
    SumSqRes.append( ??? ) # calculate the sum of the squ. residuals
plt.plot(A,SumSqRes,'r*')
plt.grid()
plt.xlabel('Value of a')
plt.ylabel('Sum of squared residuals')
plt.show()
```

Now let's formalize the process that we've described in the previous problems.
Definition 3.10. (Least Squares Regression) Let

$$
S=\left\{\left(x_{0}, y_{0}\right),\left(x_{1}, y_{1}\right), \ldots,\left(x_{n}, y_{n}\right)\right\}
$$

be a set of $n+1$ ordered pairs in $\mathbb{R}^{2}$. If we guess that a function $f(x)$ is a best choice to fit the data and if $f(x)$ depends on parameters $a_{0}, a_{1}, \ldots, a_{n}$ then
a. Pick initial values for the parameters $a_{0}, a_{1}, \ldots, a_{n}$ so that the function $f(x)$ looks like it is close to the data (this is strictly a visual step ... take care that it may take some playing around to guess the initial values of the parameters)
b. Calculate the square error between the data point and the prediction from the function $f(x)$

$$
\text { error for the point } x_{i}: e_{i}=\left(y_{i}-f\left(x_{i}\right)\right)^{2}
$$

Note that squaring the error has the advantages of removing the sign, accentuating errors larger than 1, and decreasing errors that are less than 1.
c. As a measure of the total error between the function and the data, sum the squared errors

$$
\text { sum of square errors }=\sum_{i=0}^{n}\left(y_{i}-f\left(x_{i}\right)\right)^{2}
$$

(Take note that if there were a continuum of points instead of a discrete set then we would integrate the square errors instead of taking a sum.)
d. Change the parameters $a_{0}, a_{1}, \ldots$ so as to minimize the sum of the square errors.
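Steps (b) and (c) can be sketched as a small reusable function. The names `sum_sq_errors` and `quadratic` are hypothetical, introduced only for this illustration; the data are the four points from Exercise 3.81 and the parameter values are just one arbitrary starting guess.

```python
import numpy as np

def sum_sq_errors(model, params, xdata, ydata):
    """Sum of squared differences between the data and model(x, *params).

    Hypothetical helper illustrating steps (b) and (c) of the definition.
    """
    predictions = model(xdata, *params)
    return np.sum((ydata - predictions)**2)

# A quadratic guess f(x) = a x^2 + b x + c
quadratic = lambda x, a, b, c: a*x**2 + b*x + c

xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])

sse = sum_sq_errors(quadratic, (2, 2, 0.75), xdata, ydata)
print("sum of squared errors for (a, b, c) = (2, 2, 0.75):", sse)
```

Step (d) then amounts to searching over the parameters for the smallest value this function can return.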

Exercise 3.84. In Definition 3.10 the last step is a bit vague. That was purposeful since there are many techniques that could be used to minimize the sum of the square errors. However, if we just think about the sum of the squared residuals as a function then we can apply scipy.optimize.minimize() to that function in order to return the values of the parameters that minimize the sum of the squared residuals. The following blocks of Python code implement the idea in a very streamlined way. Go through the code and comment each line to describe exactly what it does.

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])
def SSRes(parameters):
    # In the next line of code we want to build our
    # quadratic approximation y = ax^2 + bx + c
    # We are sending in a list of parameters so
    # a = parameters[0], b = parameters [1], and c = parameters[2]
    yapprox = parameters[0]*xdata**2 + \
                        parameters[1]*xdata + \
                            parameters[2]
    residuals = np.abs(ydata-yapprox)
    return np.sum(residuals**2)
```

```
BestParameters = minimize(SSRes, [2, 2,0.75])
print("The best values of a, b, and c are: \n",BestParameters.x)
# If you want to print the diagnostics then use the line below:
# print("The minimization diagnostics are: \n",BestParameters)
plt.plot(xdata,ydata,'bo',markersize=5)
x = np.linspace(0,4,100)
y = BestParameters.x[0]*x**2 + \
    BestParameters.x[1]*x + \
    BestParameters.x[2]
plt.plot(x,y,'r--')
plt.grid()
plt.xlabel('x')
plt.ylabel('y')
plt.title('Best Fit Quadratic')
plt.show()
```

![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-49.jpg?height=630&width=903&top_left_y=1165&top_left_x=516)

Figure 3.12: Best fit quadratic function.
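For polynomial models there is an independent way to check the optimizer's answer: np.polyfit() solves the same least squares problem directly. This is a different technique from the one used above, offered here only as a sanity check; the function name `ss_res` is an assumption for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

xdata = np.array([0, 1, 2, 3])
ydata = np.array([1.07, 3.9, 14.8, 26.8])

def ss_res(p):
    # sum of squared residuals for the quadratic p[0]*x^2 + p[1]*x + p[2]
    return np.sum((ydata - (p[0]*xdata**2 + p[1]*xdata + p[2]))**2)

from_minimize = minimize(ss_res, [2, 2, 0.75]).x
from_polyfit = np.polyfit(xdata, ydata, 2)  # coefficients, highest power first

print("minimize:", from_minimize)
print("polyfit: ", from_polyfit)
print("largest difference:", np.max(np.abs(from_minimize - from_polyfit)))
```

The two sets of coefficients should agree to several decimal places. The optimizer approach is still the more general one, since it also works for models that are not polynomials.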

Exercise 3.85. With a partner choose a function and then choose 10 points on that function. Add a small bit of error into the $y$-values of your points. Give your 10 points to another group. Upon receiving your new points:

- Plot your points.
- Make a guess about the basic form of the function that might best fit the data. Your general form will likely have several parameters (just like the quadratic had the parameters $a, b$, and $c$ ).
- Modify the code from above to find the best collection of parameters that minimize the sum of the squares of the residuals between your function and the data.
- Plot the data along with your best fit function. If you are not satisfied with how it fit then make another guess on the type of function and repeat the process.
- Finally, go back to the group who gave you your points and check your work.

Exercise 3.86. For each dataset associated with this exercise give a functional form that might be a good model for the data. Be sure to choose the most general form of your guess. For example, if you choose "quadratic" then your functional guess is $f(x)=a x^{2}+b x+c$, if you choose "exponential" then your functional guess should be something like $f(x)=a e^{b(x-c)}+d$, or if you choose "sinusoidal" then your guess should be something like $f(x)=a \sin (b x)+c \cos (d x)+e$. Once you have a guess of the function type create a plot showing your data along with your guess for a reasonable set of parameters. Then write a function that leverages scipy.optimize.minimize() to find the best set of parameters so that your function best fits the data. Note that if scipy.optimize.minimize() does not converge then try the alternative scipy function scipy.optimize.fmin(). Also note that you likely need to be very close to the optimal parameters to get the optimizer to work properly.

You can load the data with the following script.

```
import numpy as np
import pandas as pd
URL1 = 'https://raw.githubusercontent.com/NumericalMethodsSullivan'
URL2 = '/NumericalMethodsSullivan.github.io/master/data/'
URL = URL1+URL2
datasetA = np.array( pd.read_csv(URL+'Exercise3_datafit2.csv') )
datasetB = np.array( pd.read_csv(URL+'Exercise3_datafit3.csv') )
datasetC = np.array( pd.read_csv(URL+'Exercise3_datafit4.csv') )
# Exercise3_datafit2.csv,
# Exercise3_datafit3.csv,
# Exercise3_datafit4.csv
```

As a nudge in the right direction, in the left-hand pane of Figure 3.13 the data appear to follow an exponential trend. Hence we should choose a function of the form $f(x)=a e^{b(x-c)}+d$. Moreover, we need to pick good approximations of the parameters to start the optimization process. In the left-hand pane of Figure 3.13 the data appears to start near $x=1970$ so our initial guess for $c$ might be $c \approx 1970$. To get initial guesses for $a, b$, and $d$ we can observe that the expected best fit curve will approximately go through the points $(1970,15000)$, $(1990,40000)$, and $(2000,75000)$. With this information we get the equations $a+d \approx 15000$, $a e^{20 b}+d \approx 40000$, and $a e^{30 b}+d \approx 75000$ and work to get reasonable approximations for $a, b$, and $d$ to feed into the scipy.optimize.minimize() command.
![](https://cdn.mathpix.com/cropped/2025_02_27_429587f441ab5f434461g-51.jpg?height=630&width=1161&top_left_y=577&top_left_x=387)

Figure 3.13: Raw data for least squares function matching problems.
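One way to work from the three equations above to starting values is sketched here. Subtracting the first equation from the other two gives $a\left(e^{20 b}-1\right) \approx 25000$ and $a\left(e^{30 b}-1\right) \approx 60000$. Writing $u=e^{10 b}$ and dividing gives $\frac{u^{2}+u+1}{u+1} \approx 2.4$, which is a quadratic equation in $u$. The code below solves it with the quadratic formula; the variable names are assumptions, and the three points are only the eyeballed values quoted in the text, so the results are rough starting guesses and nothing more.

```python
import numpy as np

c = 1970                                  # data start near x = 1970
y0, y20, y30 = 15000, 40000, 75000        # eyeballed values at 1970, 1990, 2000

# ratio = (e^{30b} - 1)/(e^{20b} - 1) = (u^2 + u + 1)/(u + 1) with u = e^{10b}
ratio = (y30 - y0) / (y20 - y0)

# u^2 + (1 - ratio) u + (1 - ratio) = 0; take the positive root
u = ((ratio - 1) + np.sqrt((ratio - 1)**2 + 4*(ratio - 1))) / 2

b = np.log(u) / 10
a = (y20 - y0) / (u**2 - 1)
d = y0 - a

model = lambda x: a*np.exp(b*(x - c)) + d
print("initial guesses a, b, c, d:", a, b, c, d)
print("model at 1970, 1990, 2000:", model(np.array([1970.0, 1990.0, 2000.0])))
```

The last line is a check that the guessed curve passes through the three chosen points before the values are handed to the optimizer.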

