# Vector calculus

## Single variable calculus

At its core, calculus is nothing more than **the study of relationships and change**.

### **Derivatives**

To start with, let's imagine a straight line with the following equation:

$$
y = \pm mx (\pm b)
$$

* $y$ is a **function of $x$**, often written simply as $f(x)$. In the preceding equation, the output value $y$ is dependent on the input value $x$.
* **The $m$ value is the gradient**, which **tells us how steep the straight line is**, or **what its rate of change is** (that is, how much does a change in the $x$ value affect the $y$ value).
* The $\pm$ value tells us whether the line is moving upward or downward
* The $\pm b$ value tells us by how much the line is above or below the origin. 
* The $m$ and $b$ values in a straight line are constant throughout. 

You're probably wondering how to find $m$ and $b$ for an arbitrary straight line.

We start by picking two points, $(x_1, y_1)$ and $(x_2, y_2)$, that lie on the line, and plug their values into the formula $m = \frac{y_2 - y_1}{x_2 - x_1}$. Having found the value of $m$, we find the value of $b$ by substituting $m$ and one $(x, y)$ point on the line into the line equation and solving for $b$.
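
The two-point recipe can be sketched in a few lines of Python. The function name `line_through` and the sample points are illustrative assumptions, not part of the original text.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b


# Hypothetical sample points (1, 1) and (3, 5).
m, b = line_through((1, 1), (3, 5))
print(m, b)
```

Any point on the line can be used in the second step; the first point is used here only for convenience.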

Well, that was very simple and straightforward. However, there are far more complex equations out there that aren't as straightforward—those that relate to curves (nonlinear functions), as illustrated in the following image:

![](curve.png)

Imagine a picture of a couple of hills or camel humps. If you trace the surface of them, you will have a curve, and as you may have no doubt noticed, they go up and then down and then back up, and the process repeats itself. 

From the preceding image of the curve, you can easily tell that **the gradient is not constant**, as it was in the previous example with the straight line. We could sketch straight lines along the curve and calculate their slopes to understand how the curve moves. However, there is a simpler method than this tedious one.

At the very core of calculus are two concepts, as follows:

* **Differentiation** helps us understand how much a function output changes with respect to changing input. 
* **Integration** helps us understand the impact of this change in inputs between certain points. 

We will begin initially by taking an in-depth look at differentiation. The primary equation for finding the derivative of a function is shown here:

$$
\frac{df}{dx} = f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
$$

What this equation is doing is **finding the derivative of the function $f$ with respect to the variable in the denominator, $x$**. This isn't too different from the earlier equation we used to calculate the gradient of a straight line: we subtract two output values, $f(x+h)$ and $f(x)$, and divide by the difference between their inputs, $h$. But what does $\lim_{h \to 0}$ have to do with this? **It tells us that we want the two points on the curve to be as close to each other as possible**, so that the line through them becomes the tangent line at a single point on the curve. This allows us to better visualize and understand the effect of the change, as can be seen in the following image:

![](derivate.png)

See the following example:

$$
\begin{align*}
f(x) & = 4x - 7\\
f'(x) & = \lim_{h \to 0} \frac{[4(x + h) - 7] - [4(x) - 7]}{h}\\
f'(x) & = \lim_{h \to 0} \frac{4x + 4h - 7 - 4x + 7}{h}\\
f'(x) & = \lim_{h \to 0} \frac{4h}{h}\\
f'(x) & = 4
\end{align*}
$$

### **Sum rule**

The sum rule states that the derivative of the sum of two functions is the same as the sum of the individual derivatives of the two functions, as can be seen in the following equation:

$$
\frac{d}{dx}(f(x) + g(x)) = \frac{df(x)}{dx} + \frac{dg(x)}{dx}
$$

Let's suppose we have $f(x) = 3x^4 + 12x^2 + 8$ and $g(x) = 2x^2 - 11x + 2$.

From this, we can see that $\frac{df(x)}{dx} + \frac{dg(x)}{dx} = (12x^3 + 24x) + (4x - 11) = 12x^3 + 28x - 11$ is the same as $\frac{d}{dx}(f(x) + g(x)) = \frac{d}{dx}(3x^4 + 14x^2 - 11x + 10) = 12x^3 + 28x - 11$.

### **Power rule**

The power rule helps to find the derivative of a **function where the variable has an exponent**. Simply put, **you multiply the power by the constant in front of the variable, and reduce the power by $1$.**

$$
\begin{align*}
f(x) & = 4x^2\\
f'(x) & = \lim_{h \to 0} \frac{4(x + h)^2 - 4(x)^2}{h}\\
f'(x) & = \lim_{h \to 0} \frac{4(x^2 + 2xh + h^2) - 4(x)^2}{h}\\
f'(x) & = \lim_{h \to 0} \frac{4x^2 + 8xh + 4h^2 - 4x^2}{h}\\
f'(x) & = \lim_{h \to 0} \frac{8xh + 4h^2}{h}\\
f'(x) & = \lim_{h \to 0} 8x + 4h\\
f'(x) & = 8x\\
\end{align*}
$$
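
To see the limit at work numerically, here is a minimal sketch that evaluates the difference quotient of $f(x) = 4x^2$ for shrinking values of $h$. The helper name `difference_quotient` and the evaluation point $x = 3$ are assumptions made for illustration.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def f(x):
    return 4 * x**2


x = 3.0
for h in (1.0, 0.1, 0.001):
    # Algebraically the quotient equals 8x + 4h, so it approaches 8x = 24.
    print(h, difference_quotient(f, x, h))
```

The printed slopes shrink toward $8x$ as $h$ gets smaller, which is what the power rule predicts.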

> Note that not every function has a derivative at every point.

There are certain functions, such as $f(x) = \frac{1}{x}$ or $f(x) = e^x$, that are not as straightforward as the ones we saw earlier. The function $f(x) = \frac{1}{x}$ is not differentiable at $x = 0$ because it is undefined there. This is known as a **discontinuity**.

The function $f(x) = e^x$, by contrast, is differentiable everywhere; $e$ (known as **Euler's number**) has the very interesting property that the function is equal to its own derivative, that is, $f(x) = f'(x)$.

### **Trigonometric functions**

![](trigonometric.png)

Here, sine is $f(x) = \sin(x)$ and cosine is $f(x) = \cos(x)$.

The sine and cosine functions are related, and the derivative will show us how.

If $y = \sin(x)$, then $\frac{dy}{dx} = \cos(x)$. However, if $y = \cos(x)$, then $\frac{dy}{dx} = -\sin(x)$. The derivatives create a loop, which we can see as follows:

$$
\sin(x) \xrightarrow{\frac{d}{dx}} \cos(x) \xrightarrow{\frac{d}{dx}} -\sin(x) \xrightarrow{\frac{d}{dx}} -\cos(x) \xrightarrow{\frac{d}{dx}} \sin(x)
$$
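
The first two steps of this loop can be checked numerically. The sketch below uses a central-difference estimate, which is an assumption of this example rather than the limit definition used earlier; the sample point $x = 0.7$ is arbitrary.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7
# d/dx sin(x) should match cos(x).
print(derivative(math.sin, x), math.cos(x))
# d/dx cos(x) should match -sin(x).
print(derivative(math.cos, x), -math.sin(x))
```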

### **First and second derivatives**

Now that we know how to find the derivative of a function, it is important to know that we can take the derivative more than once. 

The first derivative, as we know, **gives us the gradient (slope of the tangent line) of a function at any given point $x$ on the curve; in other words, whether the curve's altitude (that is, $y$ or $f(x)$) is increasing or decreasing**. A **positive slope tells us $f(x)$ is increasing as $x$ increases**, **a negative slope tells us $f(x)$ is decreasing as $x$ increases**, and a **slope of $0$ tells us nothing about the curve's direction**, other than that **it is likely at a turning point (local minimum or local maximum)**. This can be written as follows:

* If $\frac{d}{dx}f(t) > 0$, then $f(x)$ is increasing at $x = t$.
* If $\frac{d}{dx}f(t) < 0$, then $f(x)$ is decreasing at $x = t$.
* If $\frac{d}{dx}f(t) = 0$, then $x=t$ is a critical point of $f(x)$.

For example, let $f(x) = 4x^3 - 12x^2 + 9x - 4$. The derivative of this function is shown here:

$$
\frac{dy}{dx} = 12x^2 - 24x + 9
$$

At $x = 0$, the derivative is $9$, which tells us the function is increasing at this point. But at $x=1$, the derivative is $-3$, telling us that the function is decreasing at this point.

The second derivative is the derivative of the derivative of the function. We write this as $f''(x)$ or $\frac{d^2y}{dx^2}$. Where the first derivative told us whether the function was increasing or decreasing, the second derivative gives us the same information about the first derivative: whether it is increasing or decreasing.

If the second derivative is positive, then as $x$ increases, the first derivative is increasing; and if the second derivative is negative, then as $x$ increases, the first derivative is decreasing. 

To help us visualize this, **when the second derivative is positive, the curve is concave up (parabola open upward) at a point**, whereas **when it is negative, the curve is concave down (parabola open downward)**. And as before, **when the second derivative is equal to $0$, we learn nothing new**. This point **could be a local maximum, a local minimum, or an inflection point**. This is written as follows:

* If $\frac{d^2}{dx^2}f(t) > 0$, then $f(x)$ is concave up at $x=t$.
* If $\frac{d^2}{dx^2}f(t) < 0$, then $f(x)$ is concave down at $x=t$.
* If $\frac{d^2}{dx^2}f(t) = 0$, then at $x=t$ we obtain no new information about $f(x)$.

For example, let's take the second derivative of the same function we used, as follows

$$
\frac{d^2y}{dx^2} = 24x - 24
$$

At $x = 0$, the second derivative is $-24$, which tells us the **function is concave down here**. But at $x=2$, it is equal to $24$, telling us the function is **concave up**.

Earlier, we learned that when $x$ is a critical point of a function, the first derivative alone tells us nothing new about the function at that point, but we can use the second derivative to find out whether it is a local maximum or a local minimum. These rules can be written as follows:

* If $\frac{d}{dx}f(t) = 0$ and $\frac{d^2}{dx^2}f(t) > 0$, then $f(x)$ has a **local minimum** at $x=t$.
* If $\frac{d}{dx}f(t) = 0$ and $\frac{d^2}{dx^2}f(t) < 0$, then $f(x)$ has a **local maximum** at $x=t$.
* If $\frac{d}{dx}f(t) = 0$ and $\frac{d^2}{dx^2}f(t) = 0$, then at $x=t$ we learn nothing new about $f(x)$.
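
As a sketch of how these rules are applied, the code below runs the second derivative test on the running example, taking $f(x) = 4x^3 - 12x^2 + 9x - 4$ with $f'(x) = 12x^2 - 24x + 9$ and $f''(x) = 24x - 24$. The function names are assumptions made for illustration.

```python
import math


def f_prime(x):
    return 12 * x**2 - 24 * x + 9


def f_double_prime(x):
    return 24 * x - 24


def classify(t):
    """Apply the second derivative test at a critical point t."""
    if f_double_prime(t) > 0:
        return "local minimum"
    if f_double_prime(t) < 0:
        return "local maximum"
    return "inconclusive"


# Critical points are the roots of 12x^2 - 24x + 9 = 0 (quadratic formula).
a, b, c = 12, -24, 9
disc = math.sqrt(b * b - 4 * a * c)
critical_points = [(-b - disc) / (2 * a), (-b + disc) / (2 * a)]

for t in critical_points:
    print(t, classify(t))
```

Under these assumptions the test reports a local maximum at $x = 0.5$ and a local minimum at $x = 1.5$.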

### **Product rule**

The product rule gives us a straightforward method to find the derivative of the product of two functions. Let's take two arbitrary functions, $f(x)$ and $g(x)$, and multiply them. So, $y = f(x)g(x)$. The derivative is $\frac{dy}{dx} = f(x)g'(x) + f'(x)g(x)$.

Let's explore this in more detail to understand how this works. Starting from the definition of the derivative, we add and subtract $f(x+dx)g(x)$ in the numerator:

$$
\frac{dy}{dx} = \lim_{dx \to 0} \frac{f(x+dx)g(x+dx) - f(x)g(x)}{dx} = \lim_{dx \to 0} \frac{f(x+dx)(g(x+dx) - g(x)) + (f(x+dx) - f(x))g(x)}{dx}
$$

We can rewrite the derivative, as follows:

$$
\frac{dy}{dx} = \lim_{dx \to 0} \left[ f(x+dx)\frac{(g(x+dx) - g(x))}{dx} + g(x) \frac{(f(x+dx) - f(x))}{dx} \right]
$$

Since $f(x+dx) \to f(x)$ as $dx \to 0$, taking the limit simplifies this to $f(x)g'(x) + f'(x)g(x)$.


### **Quotient rule**

The quotient rule allows us to find the derivative of a function that is being divided by another function. This can be derived from the product rule. As before, we take two functions $f(x)$ and $g(x)$, but now, we will divide them. So, $y=\frac{f(x)}{g(x)}$. The derivative is $\frac{dy}{dx} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$.

Suppose we have $f(x) = 4x^2 + 7$ and $g(x) = 2x^3 - 11$. Then, we have the following:

$$
y = \frac{f(x)}{g(x)} = \frac{4x^2 + 7}{2x^3 - 11}
$$

By finding the derivatives $f'(x) = 8x$ and $g'(x) = 6x^2$ and plugging them into the quotient rule, we get the following:

$$
\frac{dy}{dx} = \frac{(8x)(2x^3 - 11) - (4x^2 + 7)(6x^2)}{(2x^3 - 11)^2}
$$

If we expand the numerator, we find the derivative:

$$
\frac{dy}{dx} = \frac{-8x^4 - 42x^2 - 88x}{(2x^3 - 11)^2}
$$
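
A quick numerical check helps catch sign mistakes in the quotient rule. The sketch below evaluates $\frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$ for the example functions and compares it with a central-difference estimate; the helper names and the test point $x = 1$ are assumptions.

```python
def f(x):
    return 4 * x**2 + 7


def g(x):
    return 2 * x**3 - 11


def df(x):
    return 8 * x


def dg(x):
    return 6 * x**2


def quotient_rule(x):
    """(f'g - fg') / g^2 evaluated at x."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2


def numeric(x, h=1e-6):
    """Central-difference estimate of d/dx [f(x) / g(x)]."""
    return (f(x + h) / g(x + h) - f(x - h) / g(x - h)) / (2 * h)


print(quotient_rule(1.0), numeric(1.0))
```

Both values should agree to several decimal places; swapping the two terms of the numerator flips the sign, and the comparison makes that visible.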

### **Chain rule**

The chain rule applies to functions that take in another function as input. Let's consider $F(x) = f(g(x))$, which is often written as $(f \circ g)(x)$ and read $f$ of $g$ of $x$. This means that the output of $g(x)$ will become the input to the function $f$.

The derivative of this will be written as follows:

$$
\frac{dF}{dx} = \frac{df}{dg} \times \frac{dg}{dx}
$$

This is the same as $F'(x) = f'(g(x))g'(x)$.

For example, suppose we have $f(g) = 7g^2 - 9$ and $g(x) = e^{x^2}$. We can differentiate the two functions and get $f'(g) = 14g$ and $g'(x) = 2xe^{x^2}$. Multiplying them gives $F'(x) = 14e^{x^2} \cdot 2xe^{x^2} = 28xe^{2x^2}$.
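
The chain-rule product $f'(g(x))g'(x)$ for this example can be compared against a numerical derivative of the composite function. This is a minimal sketch; the names `F`, `dF`, and `numeric` and the test point are illustrative assumptions.

```python
import math


def F(x):
    g = math.exp(x**2)          # inner function g(x)
    return 7 * g**2 - 9         # outer function f(g)


def dF(x):
    g = math.exp(x**2)          # g(x)
    dg = 2 * x * g              # g'(x)
    return 14 * g * dg          # f'(g(x)) * g'(x)


def numeric(x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


print(dF(0.5), numeric(0.5))
```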

### **Antiderivative**

We now know what derivatives are and how to find them, but now, suppose **we know the rate of change** $(f)$ of the population $(F)$, and **we want to find what the population will be at some point in time**. What we have to do is **find a function $F$ whose derivative is $f$**. This is known as the **antiderivative**, and we define it formally as follows: a function $F$ is called **an antiderivative of $f$ on** $[a, b]$ if $F'(x) = f(x)$ for all $a \leq x \leq b$.

Suppose we have a function $f(x) = x^n$ with $n \neq -1$; then $F(x) = \frac{1}{n+1}x^{n+1} + c$ (where $c$ is some constant), from which we can confirm that $F'(x) = x^n = f(x)$.

| **Function**             | **Antiderivative**        |
| ------------------------ | ------------------------- |
| $cf(x)$                  | $cF(x)$                   |
| $f(x) + g(x)$            | $F(x) + G(x)$             |
| $x^n$ ($n \neq -1$)      | $\frac{x^{n+1}}{n+1}$     |
| $\frac{1}{x}$            | $\ln \lvert x \rvert$     |
| $e^x$                    | $e^x$                     |
| $\cos x$                 | $\sin x$                  |
| $\sin x$                 | $-\cos x$                 |
| $\frac{1}{\sqrt{1-x^2}}$ | $\sin^{-1}x$              |
| $\frac{1}{1 + x^2}$      | $\tan^{-1}x$              |


Let's suppose we have the following function:

$$
f'(x) = 4\mathbf{sin}x + \frac{2x^5 - \sqrt{x}}{x}
$$

We want to find its antiderivative. I know this probably looks like a difficult equation, but by using the preceding table, we can make this very easy for ourselves. Let's see how.

First, we rewrite the function so it becomes the following:

$$
f'(x) = 4\mathbf{sin}x + 2x^4 - x^{-\frac{1}{2}}
$$

And so, the antiderivative is as follows:

$$
f(x) = 4(-\mathbf{cos}x) + 2\frac{x^5}{5} - \frac{x^{\frac{1}{2}}}{\frac{1}{2}} + c
$$

To make things easier, we rewrite this, as follows:

$$
f(x) = -4\mathbf{cos}x + \frac{2}{5}x^5 - 2\sqrt{x} + c
$$

And there you have it.

You may now be wondering whether or not we can find what the value of $c$ is, and if so, how. Let's go through another example, and see how.

Suppose we have a function that is the second derivative, and we want to find the antiderivative of the antiderivative—that is, the original function. We have the following:

$f''(x) = 12x^2 + 6x - 4$ and $f(0) = 4$ and $f(1) = 1$

Then, the first antiderivative is as follows:

$$
f'(x) = 4x^3 + 3x^2 - 4x + c
$$

And so, the second antiderivative is as follows:

$$
f(x) = x^4 + x^3 - 2x^2 + cx + d
$$

Here, we want to find the values of $c$ and $d$. We can do this simply by plugging in the preceding values and solving for the unknowns, as follows:

$f(0) = (0)^4 + (0)^3 - 2(0)^2 + c(0) + d = 4$; therefore, $d=4$. We can also do this:

$f(1) = (1)^4 + (1)^3 - 2(1)^2 + c(1) + 4 = 1$; therefore, $c = -3$. Thus, our function looks like this:

$$f(x) = x^4 + x^3 - 2x^2 - 3x + 4$$
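
The two conditions can also be solved and verified in code. The sketch below assumes the general form $x^4 + x^3 - 2x^2 + cx + d$ derived above, and checks the result against $f''(x) = 12x^2 + 6x - 4$ with a finite-difference estimate; the helper names are illustrative.

```python
def f(x, c, d):
    return x**4 + x**3 - 2 * x**2 + c * x + d


# f(0) = d, so the condition f(0) = 4 fixes d directly.
d = 4
# f(1) = 1 + 1 - 2 + c + d = c + d, so the condition f(1) = 1 gives c.
c = 1 - d


def second_derivative(x, h=1e-3):
    """Finite-difference estimate of f''(x) for the solved constants."""
    return (f(x + h, c, d) - 2 * f(x, c, d) + f(x - h, c, d)) / h**2


print(c, d)
print(second_derivative(2.0), 12 * 2.0**2 + 6 * 2.0 - 4)
```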

### **Integrals**

We have studied derivatives, which are **a method for extracting information about the rate of change of a function**. But as you may have realized, integration is the reverse of the earlier problems.

In integration, **we find the area underneath a curve**. For example, **if we have a car and our function gives us its velocity, the area under the curve will give us the distance it has traveled between two points**.

Let's suppose we have the curve $y = f(x)$, and the area under the curve between $x=a$ (the lower limit) and $x=b$ (the upper limit), that is, over the interval $[a, b]$, is $S$. Then we have the following:

$$
S = \int^{b}_{a}f(x)dx
$$

The diagrammatic representation of the curve is as follows:

![](integration.png)

This can be written as follows:

$$
\lim_{n \to \infty} \sum^{n}_{i=1}f(x^*_i)\triangle x = \lim_{n \to \infty} [f(x^*_1)\triangle x + f(x^*_2)\triangle x +\dots +  f(x^*_n)\triangle x] 
$$

In the preceding function, the following applies: $\triangle x = \frac{b-a}{n}$, and $x^*_i$ is in the subinterval $[x_{i-1}, x_i]$.

The function looks like this:

![](auc.png)

The sum gives us **an approximation of the area under the curve, and the approximation can be made as accurate as we like: for any $\epsilon > 0$ (however small), there is an $n$ large enough that the following formula applies**:

$$
\left| \int^{b}_{a}f(x)dx - \sum^{n}_{i=1}f(x^*_i)\triangle x \right| < \epsilon
$$
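
To watch the sum close in on the integral, here is a minimal sketch of a Riemann sum that takes each $x^*_i$ to be the midpoint of its subinterval. The midpoint choice, the function name `riemann_sum`, and the test function $f(x) = x^2$ on $[0, 3]$ (exact area $9$) are assumptions made for illustration.

```python
def riemann_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    return x**2


for n in (10, 100, 1000):
    # The gap to the exact value 9 shrinks as n grows.
    print(n, riemann_sum(f, 0.0, 3.0, n))
```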

Now let's suppose our function lies both above and below the $x$ axis, thus taking on positive and negative values, like so:

![](integration_2.png)

As we can see from the preceding screenshot, **the portions above the $x$ axis $(A_1)$ have a positive area, and the portions below the x axis $(A_2)$ have a negative area**. Therefore, the following formula applies:

$$
S = \int^{b}_{a}f(x)dx = A_1 - A_2
$$

Working with sums is an important part of evaluating integrals, and understanding this requires some new rules for sums. Look at the following examples:

* $\sum^{n}_{i=1}i = \frac{n(n+1)}{2}$
* $\sum^{n}_{i=1}i^2 = \frac{n(n+1)(2n + 1)}{6}$
* $\sum^{n}_{i=1}i^3 = [\frac{n(n+1)}{2}]^2$
* $\sum^{n}_{i=1}c = nc$
* $\sum^{n}_{i=1}ca_i = c\sum^{n}_{i=1}a_i$
* $\sum^{n}_{i=1}(a_i + b_i) = \sum^{n}_{i=1}a_i + \sum^{n}_{i=1}b_i$
* $\sum^{n}_{i=1}(a_i - b_i) = \sum^{n}_{i=1}a_i - \sum^{n}_{i=1}b_i$

Now, let's explore some of the important properties of integrals, which will help us as we go deeper into the chapter. Look at the following examples:

* $\int^{b}_{a} f(x)dx = -\int^{a}_{b}f(x)dx$
* $\int^{b}_{a} f(x)dx = 0$, when $a=b$
* $\int^{b}_{a} cdx = c(b-a)$, where $c$ is a constant
* $\int^{b}_{a} [f(x) + g(x)]dx = \int^{b}_{a}f(x)dx + \int^{b}_{a}g(x)dx$
* $\int^{b}_{a} [f(x) - g(x)]dx = \int^{b}_{a}f(x)dx - \int^{b}_{a}g(x)dx$
* $\int^{b}_{a} cf(x)dx = c\int^{b}_{a}f(x)dx$

Now, suppose we have the function $y = f(x)$, which looks like this:

![](function_y=f(x).png)

Then, we get the following property

$$
\int^{c}_{a}f(x)dx + \int^{b}_{c}f(x)dx = \int^{b}_{a}f(x)dx
$$

**This property holds when $f$ is integrable over both intervals and the two intervals are adjacent, sharing the endpoint $c$.**

### **The fundamental theorem of calculus**

**The fundamental theorem of calculus is the most important theorem in calculus** and is named very appropriately since **it establishes a relationship between differential calculus and integral calculus**. Let's see how.

Suppose that $f(x)$ is ***continuous on***  $[a, b]$ and ***differentiable on*** $(a, b)$, and that $F(x)$ ***is the antiderivative of*** $f(x)$. Then, we have the following:

$$
\int^{b}_{a}f(x)dx = F(b) - F(a)
$$

Let's rewrite the preceding equation a bit so it becomes this equation:

$$
\int^{x}_{a}f(t)dt = F(x) - F(a)
$$

All we have done here is **replace $x$ with $t$ and $b$ with $x$**. And we know that $F(x) - F(a)$ **is also a function**. From this, we can derive the following property:

$$
\frac{d}{dx}(F(x) - F(a)) = F'(x) = f(x)
$$

We can derive the preceding property since $F(a)$ **is a constant and thus has the derivative** $0$.

By shifting our point of view a bit, we get the following function:

$$
G(x) = \int^{x}_{a}f(t)dt
$$

Therefore, we get $G'(x) = f(x)$

In summary, if **we integrate our function** $f$ and **then differentiate it**, we **end up with the original function** $f$.
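
This round trip can be sketched numerically: build $G(x) = \int^x_a f(t)dt$ with a midpoint sum, then differentiate $G$ with a central difference and compare against $f$. The choice $f = \cos$, $a = 0$, and the helper names are assumptions made for this example.

```python
import math


def integrate(f, a, b, n=2000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def G(x, a=0.0):
    """Numerical version of G(x) = integral of cos(t) from a to x."""
    return integrate(math.cos, a, x)


x, h = 1.2, 1e-3
dG = (G(x + h) - G(x - h)) / (2 * h)

# G(x) should match F(x) - F(a) with F = sin.
print(G(x), math.sin(x) - math.sin(0.0))
# G'(x) should match the original integrand f(x) = cos(x).
print(dG, math.cos(x))
```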

### **Substitution rule**

Obviously, **being able to find the antiderivative of a function is important**, but **the anti-differentiation formulas do not tell us how to evaluate every type of integral**—for example, what to do when we have functions such as the following one:

$$
f(x) = \int 2x\sqrt{x^2 + 1} dx
$$

This **isn't as straightforward as the examples we saw earlier**. In this case, **we need to introduce a new variable to help us out and make the problem more manageable**. 

Let's make our new variable $u$, and $u = x^2 + 1$, and the differential of $u$ is then $du = 2x dx$. This changes the problem into the following:

$$
f(u) = \int \sqrt{u}du
$$

This is clearly a lot simpler. The antiderivative of this becomes the following:

$$
\frac{2}{3}u^{\frac{3}{2}} + c
$$

And by plugging in the original value $u = x^2 + 1$, we get the following:

$$
\frac{2}{3}(x^2 + 1)^{\frac{3}{2}} + c
$$

And there we have it.
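
One way to gain confidence in a substitution result is to differentiate it and check that the integrand comes back. The sketch below does this numerically for $\frac{2}{3}(x^2 + 1)^{\frac{3}{2}}$ with $c = 0$; the function names and the sample point are illustrative assumptions.

```python
import math


def integrand(x):
    return 2 * x * math.sqrt(x**2 + 1)


def antiderivative(x):
    return (2 / 3) * (x**2 + 1) ** 1.5


def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.5
print(numeric_derivative(antiderivative, x), integrand(x))
```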

This method is very useful, and works when we have problems that can be written in the following form:


$$\int f(g(x))g'(x) dx$$

If $F' = f$, then the following applies:

$$
\int F'(g(x))g'(x) dx = F(g(x)) + c
$$

That equation might look somewhat familiar to you. And it should. It is the chain rule from differentiation, run in reverse.

### **Areas between curves**

We know that **integration gives us the ability to find the area underneath a curve between two points**. But now, **suppose we want to find the area that lies between two graphs**, as in the following screenshot:

![](area_btw_2_graphs.png)

Our region $S$, as we can see, lies between the curves $f(x)$ and $g(x)$, with $f(x) \geq g(x)$, in between the two vertical lines $x = a$ and $x = b$. Therefore, we can take the area between the curves to be the following limit:

$$
A = \lim_{n \to \infty} \sum^{n}_{i=1}(f(x^*_i) - g(x^*_i))\triangle x
$$

We can rewrite this as an integral, in the following form:

$$
A = \int^{b}_{a} (f(x) - g(x))dx
$$

To visualize this better and create an intuition of what is happening, we have the following image:

![](area_btw_2_graphs_visualization.png)

### **Integration by parts**

By now, we know that **for every rule in differentiation, there is a corresponding rule in integration since they have an inverse relationship**.

In the earlier section on differentiation, we **encountered the product rule**. In integration, **the corresponding rule is known as integration by parts**.

As a recap, the **product rule states that if $f$ and $g$ are differentiable, then the following applies**:

$$
\frac{d}{dx}(f(x)g(x)) = f(x)g'(x) + g(x)f'(x)
$$

And so, in integration, this becomes the following:

$$
\int(f(x)g'(x) + g(x)f'(x))dx = f(x)g(x)
$$

We can rewrite this, as follows:

$$
\int f(x)g'(x)dx = f(x)g(x) - \int g(x)f'(x)dx
$$

We can **combine this formula with the fundamental theorem of calculus** and obtain the following equation:

$$
\int^{b}_{a}f(x)g'(x)dx = f(x)g(x)\vert^b_a - \int^{b}_{a}g(x)f'(x)dx
$$

We can **use this to evaluate the integral between the interval $[a, b]$**.

> Note: The term $f(x)g(x)\vert^b_a$ merely states that **we plug in the value $b$ in place of $x$ and evaluate it**, and **then subtract the evaluation at $a$ from it**.

We can also use the preceding substitution method for integration by parts to make our lives easier when calculating the integral. We make $u=f(x)$ and $v=g(x)$; then, the differentials are $du = f'(x)dx$ and $dv = g'(x)dx$. And so, the formula becomes this:

$$
\int u \mathbf{d}v = uv - \int v \mathbf{d}u
$$
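
As a worked check, consider $\int^1_0 xe^x dx$, an example chosen here for illustration and not taken from the text. With $u = x$ and $dv = e^x dx$, we get $du = dx$ and $v = e^x$, so the formula gives $xe^x\vert^1_0 - \int^1_0 e^x dx = e - (e - 1) = 1$. The sketch below compares both sides numerically using a midpoint-rule helper, which is also an assumption of this example.

```python
import math


def integrate(f, a, b, n=10000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Left-hand side: integral of u dv, with u = x and dv = e^x dx.
lhs = integrate(lambda x: x * math.exp(x), 0.0, 1.0)

# Right-hand side: u*v evaluated from 0 to 1, minus the integral of v du.
boundary = 1.0 * math.exp(1.0) - 0.0 * math.exp(0.0)
rhs = boundary - integrate(math.exp, 0.0, 1.0)

print(lhs, rhs)
```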

## Multivariable calculus

Multivariable calculus has a lot of similarities with single variable calculus, except—as the name suggests—here, we will be dealing with functions that accept two or more variables as input. 

### **Partial derivatives**

A **partial derivative** is a **method we use to find the derivative of a function that depends on more than one variable, with respect to one of its variables, while keeping the others constant**. This allows us to understand how a function is affected by a single variable instead of by all of them. Suppose we are **modeling the price of a stock item**, and **the price depends on a number of different factors**. We can **vary one variable at a time to determine how much this change will affect the price of the stock item**. This is different from taking a total derivative, where all the variables vary.

A multivariate function can have as many variables as you like, but to keep things simple, we will look at a function with two variables, as follows:

$$
z = f(x, y) = 5y^3 + 7x^2y - 3y + 11
$$

This function looks a lot more complicated than the ones we have previously dealt with. Let's break it down. When we take the partial derivative of a function with respect to $x$, we find **the rate of change of $z$ as $x$ varies, while keeping $y$ constant**. The same applies when we differentiate with respect to any other variable.

Let's visually imagine the $\mathbf{xy}$-plane (a flat surface) as being the set of acceptable points that can be used as input to our function. The output, $z$, can be thought of as how much we are elevated (or the height) from the $\mathbf{xy}$-plane.

Let's start by first differentiating the function with respect to $x$, as follows:

$$
\frac{\partial z}{\partial x} = \lim_{\triangle x \to 0} \frac{z(x + \triangle x, y) - z(x, y)}{\triangle x}
$$

This gives us the following:

$$
\frac{\partial z}{\partial x} = 14xy
$$

Now, we will differentiate with respect to $y$, as follows:

$$
\frac{\partial z}{\partial y} = \lim_{\triangle y \to 0} \frac{z(x, y + \triangle y) - z(x, y)}{\triangle y}
$$


This gives us the following:

$$
\frac{\partial z}{\partial y} = 15y^2 + 7x^2 - 3
$$
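
Both partial derivatives can be checked by nudging one input while holding the other fixed. This is a minimal sketch using central differences; the helper names `partial_x` and `partial_y` and the sample point $(2, 3)$ are assumptions.

```python
def z(x, y):
    return 5 * y**3 + 7 * x**2 * y - 3 * y + 11


def partial_x(f, x, y, h=1e-6):
    """Vary x only, keeping y constant."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def partial_y(f, x, y, h=1e-6):
    """Vary y only, keeping x constant."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


x0, y0 = 2.0, 3.0
print(partial_x(z, x0, y0), 14 * x0 * y0)
print(partial_y(z, x0, y0), 15 * y0**2 + 7 * x0**2 - 3)
```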

As we saw earlier, **in single variable differentiation, we can take second derivatives of functions (within reason, of course)**, but **in multivariable calculus, we can also take mixed partial derivatives**, as illustrated here:

$$
\begin{align*}
\frac{\partial^2 z}{\partial x^2} & = 14y\\
\frac{\partial^2 z}{\partial x \partial y} & = \frac{\partial}{\partial x} \left(\frac{\partial z}{\partial y}\right) = 14x\\
\frac{\partial^2 z}{\partial y^2} & = 30y\\
\end{align*}
$$

You may have noticed that **when we take a mixed partial derivative, the order of the variables does not matter** (this holds whenever the second partial derivatives are continuous, as they are here), and **we get the same result whether we first differentiate with respect to $x$ and then with respect to $y$, or vice versa**.

We can also write this in another form that is often more convenient, and this is what we will be using in this book, going forward. The notation is illustrated here:

$$
f_x = \frac{\partial f}{\partial x}, f_{x,y} = \frac{\partial^2 f}{\partial x \partial y}, f_y = \frac{\partial f}{\partial y}
$$

### **Chain rule**

Let's take an arbitrary function $f$ that takes variables $x$ and $y$ as input, and there is some change in either variable so that $(x, y) \to (x + \triangle x, y + \triangle y)$.

Using this, we can find the change in $f$ using the following:

$$
\triangle f = f(x + \triangle x, y + \triangle y) - f(x, y)
$$

For small changes, this is approximately the following:

$$
\triangle f \approx \frac{\partial f}{\partial x} \triangle x + \frac{\partial f}{\partial y}\triangle y
$$

Then, by **taking the limit** as $\triangle x, \triangle y \to 0$, we can derive the chain rule for partial derivatives.

We express this as follows:

$$
df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy
$$

We now **divide this equation by a small change $dt$ in an additional variable $t$ on which $x$ and $y$ depend**, to find the rate of change along the path $(x(t), y(t))$. The preceding equation then becomes this one:

$$
\frac{df}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}
$$
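
To make the formula concrete, the sketch below applies $\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$ to a hypothetical example, $f(x, y) = x^2y$ along the path $x(t) = \cos t$, $y(t) = \sin t$, and compares it with a direct numerical derivative. The function and path are assumptions chosen only for illustration.

```python
import math


def f(x, y):
    return x**2 * y


def df_dt(t):
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    f_x = 2 * x * y      # partial derivative of f with respect to x
    f_y = x**2           # partial derivative of f with respect to y
    return f_x * dx_dt + f_y * dy_dt


def numeric(t, h=1e-6):
    """Central-difference derivative of t -> f(x(t), y(t))."""
    def along_path(s):
        return f(math.cos(s), math.sin(s))
    return (along_path(t + h) - along_path(t - h)) / (2 * h)


print(df_dt(0.4), numeric(0.4))
```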

The differentiation rules that we came across earlier still apply here and can be extended to the multivariable case.

### **Integrals**

As in the single variable case, **we have antiderivatives and integrals for functions that depend on multiple variables as well**. Earlier, we learned that an integral gives us the area under a curve $y = f(x)$ between an interval $[a, b]$. Now, **instead of finding the area over an interval, we will be finding the volume under the graph $z = f(x, y)$ over a region**. The equation looks like this:

$$
\iint_R f(x, y) dA = \mathbf{volume}
$$

In the preceding equation, $R$ is a region in the $xy$-plane. Think of $R$ as being cut into many small rectangular regions, denoted $\triangle A_i$. Then, the volume is the limit of the sum over these rectangles:

$$
\iint_R f(x, y) dA = \lim_{\triangle A \to 0} \sum_{i} f(x_i, y_i) \triangle A_i
$$

Additionally, $dA = dxdy$; thus, $\triangle A = \triangle x \triangle y$

> Note: A **double integral is not the same as taking an integral twice**.

Now, instead of calculating over small rectangular regions, let's divide the region into long, thin slices of a fixed width $\triangle x$. Sound familiar? It should, as this is very similar to what we did earlier in single variable integration.

Let's fix $x = S$. The area of the slice of the graph above the line $x = S$ is then the following single variable integral:

$$
A(S) = \int^{d}_{c}f(S, y)dy
$$

We then multiply the result by $\triangle x$ to get the approximate volume of that slice, and add up the slices.

We can now rewrite the integral, as follows:

$$
\iint_R f(x, y)dA = \int^{b}_{a}\left(\int^{d}_{c} f(x, y)dy\right)dx = \int^{d}_{c}\left(\int^{b}_{a}f(x, y)dx\right)dy
$$

Here, $a \leq x \leq b$ and $c \leq y \leq d$

Suppose that we have the function $z = 1 - x^2 - y^2$ and the boundaries of the region are defined over $0 \leq x \leq 1$ and $0 \leq y \leq 1$. Then, the integral is as follows:

$$
Z = \int^{1}_{0}\int^{1}_{0}(1-x^2 - y^2)dx dy
$$

And by evaluating the inner integral, we get the following:

$$
Z = \int^{1}_{0} \left[ (1 - y^2)x - \frac{1}{3}x^3 \right] ^1_0 dy = \int^{1}_{0} \left( \frac{2}{3} - y^2 \right) dy
$$

And by evaluating the outer integral, we get the following:

$$
Z = \int^{1}_{0} \left(\frac{2}{3} - y^2 \right) dy = \left[\frac{2}{3}y - \frac{1}{3}y^3\right]^1_0 = \frac{1}{3}
$$

And there you have it. That is **how we find integrals of multivariable functions**.
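
The same value can be approximated by summing $f(x_i, y_i)\triangle A$ over a grid of small rectangles. The sketch below uses the midpoint of each rectangle; the function name `double_integral` and the grid size are assumptions.

```python
def double_integral(f, ax, bx, ay, by, n=200):
    """Midpoint-rule estimate of the double integral over a rectangle."""
    dx = (bx - ax) / n
    dy = (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * dx
        for j in range(n):
            y = ay + (j + 0.5) * dy
            total += f(x, y)
    return total * dx * dy      # each rectangle has area dx * dy


def z(x, y):
    return 1 - x**2 - y**2


print(double_integral(z, 0.0, 1.0, 0.0, 1.0))
```

The printed value should sit close to the exact result of $\frac{1}{3}$.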

Let's now suppose that we have a function $f(x, y) = g(x) \cdot h(y)$, and we evaluate the integral over the region where $a \leq x \leq b$ and $c \leq y \leq d$. Then we have the following:

$$
\iint_R f(x, y)dA = \left( \int^{b}_{a}g(x)dx \right) \times \left(\int^d_c h(y)dy \right)
$$

**This is a direct result of the distributive law.**

The region we have been integrating over so far has been rectangular, but this most likely will not always be the case. If the region is an irregular shape, then the limits of integration will vary at each slice.

The best way to deal with this is to write it as a function of the variable we are not integrating.

Suppose $f(x, y) = x^2 + y^2$, and the set of points it is defined on is $\{(x, y): x^2 + y^2 \leq 1\}$, which tells us $-1 \leq x \leq 1$ and $-1 \leq y \leq 1$. We can now write this in the following form:

$$\{(x, y): a \leq x \leq b, g(x) \leq y \leq h(x)\}$$

Here, as we can see, $x$ is defined on the interval $[a, b]$, and $y$ lies between two functions of $x$: $g(x)$ and $h(x)$.

We know from **trigonometry**, **particularly the Pythagorean theorem**, that the smallest value for $y$ will be $-\sqrt{1-x^2}$ and the largest value will be $+\sqrt{1-x^2}$.

We can now proceed to rewrite the preceding set of points, as follows:

$$\{(x, y): -1 \leq x \leq 1, - \sqrt{1-x^2} \leq y \leq \sqrt{1-x^2}\}$$

Changing it up and writing it this way slices the unit disk into vertical lines spaced apart by a fixed width. 

Then, our integral becomes this one:

$$
\int^b_a \int^{h(x)}_{g(x)} f(x, y)dydx
$$

If we simply want the area of the region, we integrate the constant function $1$ over it instead of $f(x, y) = x^2 + y^2$, and the preceding integral becomes this:

$$
\int^{1}_{-1} \left [\int^{\sqrt{1-x^2}}_{-\sqrt{1-x^2}} 1 dy\right]dx
$$

We then proceed by evaluating the inner integral and then the outer integral, like so:

$$
\int^1_{-1} \left[y\right]^{\sqrt{1-x^2}}_{-\sqrt{1-x^2}}dx = \int^1_{-1}2\sqrt{1-x^2}dx = \pi
$$

We know this to be true from the area of a circle: $\pi r^2 = \pi (1)^2 = \pi$.
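
Here is a minimal numerical sketch of an integral whose inner limits depend on $x$: the inner integral of $1$ over $y$ gives the length of each vertical slice of the unit disk, and the outer midpoint sum adds the slices up. The helper names and the number of slices are assumptions.

```python
import math


def integrate(f, a, b, n=20000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def slice_length(x):
    # Inner integral of 1 dy from -sqrt(1 - x^2) to +sqrt(1 - x^2).
    half = math.sqrt(1 - x**2)
    return half - (-half)


area = integrate(slice_length, -1.0, 1.0)
print(area, math.pi)
```

Convergence is slower here than for smooth integrands because the slice length has a vertical tangent at $x = \pm 1$.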

Some important properties for double integrals are shown in the following list:

* $\iint_R [f(x, y) + g(x, y)]dA = \iint_R f(x, y) dA + \iint_R g(x, y) dA$
* $\iint_R cf(x, y)dA = c\iint_R f(x, y) dA$
* $\iint_R f(x, y)dA = \iint_{R_1} f(x, y) dA + \iint_{R_2} f(x, y) dA$ if $R$ can be split into two non-overlapping regions $R_1$ and $R_2$.
* $\iint_R f(x, y)dA \leq \iint_R g(x, y)dA$ when $f(x, y) \leq g(x, y)$ for all $(x, y) \in R$

Let's now suppose we have a cylinder with a spherical top, as in the following diagram, and we want to find its volume. The region lies under the sphere $x^2 + y^2 + z^2 = 9$, inside the cylinder $x^2 + y^2 = 5$, and above the plane $z=0$.

![](cylinder.png)

We know that we find the volume of a region as follows:

$$
V = \iint_R f(x, y)dA
$$

To evaluate this integral, we start by rewriting the equation of the sphere into $z = \sqrt{9 - (x^2 + y^2)}$, and the set of points where $x$ and $y$ are defined is $\{(x, y): x^2 + y^2 \leq 5\}$.

We rewrite our points and define the limits of the region in terms of polar coordinates $\theta$ and the radius $r$, so that the equation looks like this:

$0 \leq \theta \leq 2\pi$ and $0 \leq r \leq \sqrt{5}$

We can now rewrite $z$ as follows:

$$
z = \sqrt{9-(x^2 + y^2)} = \sqrt{9-r^2}
$$

So, using the polar area element $dA = r\,dr\,d\theta$, the volume is as follows:

$$
V = \int^{2\pi}_0 \left( \int^{\sqrt{5}}_0 \sqrt{9-r^2}\, r\, dr \right) d\theta
$$

And by evaluating the inner and outer integrals, we get the following:

$$
\begin{align*}
V & = \int^{2\pi}_0 \left[ -\frac{1}{3}(9-r^2)^\frac{3}{2} \right]^{\sqrt{5}}_0 d\theta \\
V & = \int^{2\pi}_0 \frac{19}{3}d\theta = \frac{38 \pi}{3}
\end{align*}
$$
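
The result can be checked numerically. The sketch below integrates $\sqrt{9 - r^2}\,r$ over $0 \leq r \leq \sqrt{5}$ with a midpoint rule and multiplies by $2\pi$, since the integrand does not depend on $\theta$. Note the factor $r$ from the polar area element $dA = r\,dr\,d\theta$; the helper name `integrate` is an assumption.

```python
import math


def integrate(f, a, b, n=10000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Inner integral over r, including the factor r from dA = r dr dtheta.
inner = integrate(lambda r: math.sqrt(9 - r**2) * r, 0.0, math.sqrt(5))

# The integrand is independent of theta, so the outer integral is a factor 2*pi.
volume = 2 * math.pi * inner

print(inner, 19 / 3)
print(volume, 38 * math.pi / 3)
```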

We now know how to integrate our regions in $\mathbb{R}^2$ and find the volume under the graph. But what about when we have regions in $\mathbb{R}^3$? Earlier, we used a double integral for two-dimensional regions; so, naturally, for a three-dimensional region, we will use three integrals. We write this as follows:

$$
\iiint_R f(x, y, z)dV
$$

Suppose now that the region we integrate over is defined by $a \leq x \leq b$, $c \leq y \leq d$, and $r \leq z \leq s$. The triple integral then becomes the following:

$$
\iiint_R f(x, y, z)dV = \int^s_r \int^d_c \int^b_a f(x, y, z)dxdydz
$$
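
To illustrate the iterated form, here is a small Python sketch that approximates a triple integral over a box with a midpoint sum. The integrand $f(x, y, z) = xyz$, the box $[0, 1] \times [0, 2] \times [0, 3]$, and the helper name `triple_integral` are assumptions chosen for illustration. For this integrand, the exact value is $\frac{1}{2} \cdot 2 \cdot \frac{9}{2} = 4.5$.

```python
def triple_integral(f, x_lim, y_lim, z_lim, n=20):
    """Midpoint-rule approximation of an integral over the box
    a <= x <= b, c <= y <= d, r <= z <= s."""
    (a, b), (c, d), (r, s) = x_lim, y_lim, z_lim
    dx, dy, dz = (b - a) / n, (d - c) / n, (s - r) / n
    total = 0.0
    for k in range(n):                 # outermost integral: over z
        z = r + (k + 0.5) * dz
        for j in range(n):             # middle integral: over y
            y = c + (j + 0.5) * dy
            for i in range(n):         # innermost integral: over x
                x = a + (i + 0.5) * dx
                total += f(x, y, z)
    return total * dx * dy * dz

approx = triple_integral(lambda x, y, z: x * y * z, (0, 1), (0, 2), (0, 3))
print(approx)
```

The nesting of the loops mirrors the order of integration $dx\,dy\,dz$ in the formula: the innermost loop runs over $x$ and the outermost over $z$.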

Earlier on, we came across something called **substitution**, where **we made our function equal to a variable to make it easier for us to find the derivative**. We can also do the same in integration. 

Suppose we have the following integral:

$$
\int^b_a f(g(x))g'(x)dx
$$

We can make $u = g(x)$, so that $du = g'(x)dx$ and the limits become $c = g(a)$ and $d = g(b)$. The integral then becomes this:

$$
\int^d_c f(u)du
$$
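
As a short worked example (the integrand here is chosen purely for illustration), take $g(x) = x^2$ and $f(u) = \cos(u)$ on $0 \leq x \leq 2$. Then $g'(x) = 2x$, and the limits transform to $g(0) = 0$ and $g(2) = 4$:

$$
\int^2_0 \cos(x^2) \, 2x \, dx = \int^4_0 \cos(u) \, du = \sin(4)
$$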

Now, let's move on to double integrals, and see **how we can transform regions to make them easier for us to deal with**. To do this, **we will need to call on our old friend the Jacobian matrix for help**. 

As a refresher, suppose we have $x = f(u, v)$ and $y = g(u, v)$. Then the Jacobian is as follows:

$$
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{\partial x}{\partial u} && \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} && \frac{\partial y}{\partial v} \end{vmatrix}
$$

Also, recall that the vertical bars denote a determinant, so the Jacobian here is the determinant of the matrix of partial derivatives. We can expand the preceding equation as follows:

$$
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{\partial x}{\partial u} && \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} && \frac{\partial y}{\partial v} \end{vmatrix}
=
\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}
$$

Suppose now that we want to integrate $f(x, y)$ over $R$. Let's make $x = g(u, v)$ and $y = h(u, v)$, and call the corresponding region in the $(u, v)$ plane $S$. The integral now looks like this:
$$
\iint_R f(x, y)dA = \iint_S f(g(u, v), h(u, v)) \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dudv
$$

From this we can easily observe that $dA = \left| \frac{\partial(x, y)}{\partial(u, v)} \right| dudv$
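
A familiar instance of this formula is the change to polar coordinates. As a worked example, take $x = r\cos\theta$ and $y = r\sin\theta$, so that $(u, v) = (r, \theta)$:

$$
\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta && -r\sin\theta \\ \sin\theta && r\cos\theta \end{vmatrix}
= r\cos^2\theta + r\sin^2\theta = r
$$

This gives $dA = r \, dr \, d\theta$, which is where the extra factor of $r$ comes from whenever we integrate in polar coordinates.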


Let's move on to triple integrals now. Suppose we have a function $f(x, y, z)$ and we want to integrate it over $R$. We start by making $x = g(u, v, w)$, $y = h(u, v, w)$, and $z = k(u, v, w)$, and as before, we call the new region $S$. The Jacobian is the following one:

$$
\frac{\partial(x, y, z)}{\partial(u, v, w)} = \begin{vmatrix} \frac{\partial x}{\partial u} && \frac{\partial x}{\partial v} && \frac{\partial x}{\partial w} \\
\frac{\partial y}{\partial u} && \frac{\partial y}{\partial v} && \frac{\partial y}{\partial w} \\
\frac{\partial z}{\partial u} && \frac{\partial z}{\partial v} && \frac{\partial z}{\partial w}
\end{vmatrix}
$$

The triple integral now looks like this:

$$
\iiint_R f(x, y, z)dV = \iiint_S f(g(u, v, w), h(u, v, w), k(u, v, w)) \left| \frac{\partial(x, y, z)}{\partial(u, v, w)} \right| du dv dw
$$
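
One common use of this formula is the change to spherical coordinates, where the Jacobian determinant is known to be $\rho^2 \sin\phi$. The Python sketch below checks that value numerically with central differences. The coordinate convention ($\rho$ radius, $\phi$ polar angle, $\theta$ azimuth), the helper names, and the sample point are all assumptions made for illustration.

```python
import math

def spherical_to_cartesian(rho, phi, theta):
    """Map (rho, phi, theta) to (x, y, z); phi is measured from the z-axis."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def jacobian_3x3(func, point, h=1e-6):
    """Central-difference estimate of the 3x3 matrix of partial derivatives.

    Entry [i][k] approximates d(output i) / d(input k).
    """
    J = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        plus = list(point)
        minus = list(point)
        plus[k] += h
        minus[k] -= h
        fp, fm = func(*plus), func(*minus)
        for i in range(3):
            J[i][k] = (fp[i] - fm[i]) / (2 * h)
    return J

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

rho, phi, theta = 2.0, 0.7, 1.1
numeric = det3(jacobian_3x3(spherical_to_cartesian, (rho, phi, theta)))
analytic = rho ** 2 * math.sin(phi)
print(numeric, analytic)
```

The two values should match closely, so under this convention $dV = \rho^2 \sin\phi \, d\rho \, d\phi \, d\theta$.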

We now have a good enough understanding of multivariable calculus and are ready to dive into the wonderful world of vector calculus.

## Vector calculus

### **Derivatives**

Earlier, we saw that functions are differentiated by taking the limit of a difference quotient. But vectors, as we know, are not like scalars in that we cannot divide by vectors, which creates the need for new definitions for vector-valued functions.

We can define a vector function as a function $F: \R \to \R^n$; that is, it takes in a scalar value as input and outputs a vector. So, the derivative of $F$ is defined as follows:

$$
F'= \frac{dF}{dx} = \lim_{\delta x \to 0} \frac{1}{\delta x}[F(x + \delta x) - F(x)]
$$

In the preceding equation, $\delta x$ is a small perturbation on $x$. Additionally, $F$ is only differentiable if the following applies:

$$
\delta F = F(x + \delta x) - F(x) = F'(x)\delta x + o(\delta x)
$$

We can also write the preceding differential as follows:

$$
dF = F'(x)dx
$$

Generally, we differentiate vectors component-wise, so the derivative becomes this:

$$
F'(x) = F'_i(x)e_i
$$

Here, the $e_i$ are orthonormal basis vectors, and summation over the repeated index $i$ is implied.
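
To see component-wise differentiation in action, here is a minimal Python sketch. The helix $F(t) = (\cos t, \sin t, t)$, the use of $t$ for the scalar input, and the helper name `derivative` are assumptions chosen for illustration; the central difference stands in for the limit in the definition.

```python
import math

def F(t):
    """A helix in R^3: a vector-valued function of one scalar input."""
    return (math.cos(t), math.sin(t), t)

def derivative(F, t, h=1e-6):
    """Differentiate each component separately with a central difference."""
    plus, minus = F(t + h), F(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

t = 0.5
numeric = derivative(F, t)
analytic = (-math.sin(t), math.cos(t), 1.0)  # each component differentiated by hand
print(numeric)
print(analytic)
```

Each entry of the numerical result should be close to the derivative of the corresponding component, which is exactly what differentiating component-wise means.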

Some rules for vector differentiation are shown in the following list:

* $\frac{d}{dt}(f \mathbf{g}) = \frac{df}{dt}\mathbf{g} + f\frac{d\mathbf{g}}{dt}$
* $\frac{d}{dt}(\mathbf{f} \cdot \mathbf{g}) = \frac{d\mathbf{f}}{dt} \cdot \mathbf{g} + \mathbf{f} \cdot \frac{d\mathbf{g}}{dt}$
* $\frac{d}{dt}(\mathbf{f} \times \mathbf{g}) = \frac{d\mathbf{f}}{dt} \times \mathbf{g} + \mathbf{f} \times \frac{d\mathbf{g}}{dt}$
* $\frac{\partial}{\partial \mathbf{x}}(f(\mathbf{x}) g(\mathbf{x})) = \frac{\partial f}{\partial \mathbf{x}} g(\mathbf{x})+ f(\mathbf{x})\frac{\partial g}{\partial \mathbf{x}}$
* $\frac{\partial}{\partial \mathbf{x}}(f(\mathbf{x}) + g(\mathbf{x})) = \frac{\partial f}{\partial \mathbf{x}} + \frac{\partial g}{\partial \mathbf{x}}$
* $\frac{\partial}{\partial \mathbf{x}}(f \circ g)(\mathbf{x}) = \frac{\partial }{\partial \mathbf{x}} (f(g(\mathbf{x}))) = \frac{\partial f}{\partial g}\frac{\partial g}{\partial \mathbf{x}}$

We know from earlier that we use the concept of limits to find the derivative of a function. So, let's see how we can find the limit of a vector. We use the concept of norms here. We say $\mathbf{v} \to \mathbf{c}$ if $|\mathbf{v}-\mathbf{c}| \to 0$, and we write $f(\mathbf{r}) = o(\mathbf{r})$ if $\frac{|f(\mathbf{r})|}{|\mathbf{r}|} \to 0$ as $\mathbf{r}\to 0$.

Generally, the derivative is calculated in all possible directions. But what if we want to find it in only one particular direction $\mathbf{n}$ (a unit vector)? Then, assuming $\delta \mathbf{r} = h\mathbf{n}$, we have the following:

$$
f(\mathbf{r} + h\mathbf{n}) - f(\mathbf{r}) = h(\nabla f \cdot \mathbf{n}) + o(h)
$$

From this, we can derive the directional derivative to be the following:

$$
\mathbf{n} \cdot \nabla f = \lim_{h \to 0} \frac{1}{h}[f(\mathbf{r} + h\mathbf{n}) - f(\mathbf{r})]
$$

This gives us the rate of change of $f$ in this direction.
Suppose now that we have $\mathbf{n} = e_i$. Then, our directional derivative becomes the following:

$$
e_i \cdot \nabla f = \lim_{h \to 0} \frac{1}{h} [f(\mathbf{r} + he_i) - f(\mathbf{r})] = \frac{\partial f}{\partial x_i}
$$

Therefore, we have the following:

$$
\nabla f = \frac{\partial f}{\partial x_i}e_i
$$

And so, the condition of differentiability now becomes the following:

$$
\delta f = \frac{\partial f}{\partial x_i}\delta x_i + o(|\delta \mathbf{r}|)
$$

We can express this in differential notation as follows:

$$
df = \nabla f . d\mathbf{r} = \frac{\partial f}{\partial x_i}d x_i
$$

This looks very similar to something we encountered earlier. It's the chain rule for partial derivatives.
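
The relationship between the gradient and the directional derivative can be checked numerically. The following Python sketch uses an assumed example, $f(x, y) = x^2 y$ at the point $(1, 2)$ in the direction $\mathbf{n} = (0.6, 0.8)$; the function and helper names are illustrative only, and a central difference is used in place of the one-sided limit.

```python
def f(x, y):
    return x * x * y

def grad_f(x, y):
    # Partial derivatives of x^2 * y, computed by hand.
    return (2 * x * y, x * x)

def directional_derivative(f, point, n, h=1e-6):
    """Central-difference estimate of the rate of change of f along the unit vector n."""
    x, y = point
    forward = f(x + h * n[0], y + h * n[1])
    backward = f(x - h * n[0], y - h * n[1])
    return (forward - backward) / (2 * h)

point = (1.0, 2.0)
n = (0.6, 0.8)  # unit vector, since 0.6^2 + 0.8^2 = 1
g = grad_f(*point)
via_gradient = g[0] * n[0] + g[1] * n[1]        # n . grad f
via_limit = directional_derivative(f, point, n)  # difference quotient along n
print(via_gradient, via_limit)
```

Here the gradient at $(1, 2)$ is $(4, 1)$, so $\mathbf{n} \cdot \nabla f = 4(0.6) + 1(0.8) = 3.2$, and the difference quotient should agree with this value closely.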

Let's now take a function $f: \R^n \to \R$ that takes in a vector input $\mathbf{x} \in \R^n$ such that $\mathbf{x} = (x_1, x_2, \dots, x_{n-1}, x_n)$. The partial derivatives of this function are written as follows:

$$
\frac{\partial f}{\partial x_1} = \lim_{h \to 0} \frac{f(x_1 + h, x_2, \dots, x_n) - f(\mathbf{x})}{h}\\
\vdots\\
\frac{\partial f}{\partial x_n} = \lim_{h \to 0} \frac{f(x_1, x_2, \dots, x_n + h) - f(\mathbf{x})}{h}
$$

We can then write this collectively as a row vector in $\R^{1 \times n}$, which we write as follows:

$$
\frac{df}{d\mathbf{x}} = \left[ \frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \dots, \frac{\partial f(\mathbf{x})}{\partial x_n} \right ]
$$

Let's go a step further and imagine a vector function made of $m$ different scalar functions, which take the vector $\mathbf{x}$ as input. We will write this as $\mathbf{y} = \mathbf{f}(\mathbf{x})$.

Expanding $\mathbf{y} = \mathbf{f}(\mathbf{x})$, we get the following:

$$
y_1 =f_1(\mathbf{x}) \\
y_2 =f_2(\mathbf{x}) \\
\vdots \\
y_m =f_m(\mathbf{x}) \\
$$

Let's revisit the **Jacobian matrix** briefly. As you can see, it is simply an $(m \times n)$ matrix containing all the partial derivatives of the earlier vector function. We can see what this looks like here:

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial }{\partial \mathbf{x}} f_1(\mathbf{x}) \\
\frac{\partial }{\partial \mathbf{x}} f_2(\mathbf{x}) \\
\vdots \\
\frac{\partial }{\partial \mathbf{x}} f_m(\mathbf{x})
\end{bmatrix}
=
\begin{bmatrix} \frac{\partial }{\partial x_1} f_1(\mathbf{x}) && \frac{\partial }{\partial x_2} f_1(\mathbf{x}) && \dots && \frac{\partial }{\partial x_n} f_1(\mathbf{x}) \\
\frac{\partial }{\partial x_1} f_2(\mathbf{x}) && \frac{\partial }{\partial x_2} f_2(\mathbf{x}) && \dots && \frac{\partial }{\partial x_n} f_2(\mathbf{x}) \\
\vdots && \vdots && \ddots && \vdots \\
\frac{\partial }{\partial x_1} f_m(\mathbf{x}) && \frac{\partial }{\partial x_2} f_m(\mathbf{x}) && \dots && \frac{\partial }{\partial x_n} f_m(\mathbf{x})
\end{bmatrix}
$$
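
To make the $(m \times n)$ layout concrete, here is a small Python sketch that builds a Jacobian numerically. The example function (with $n = 2$ inputs and $m = 3$ outputs), the helper name `jacobian`, and the step size are assumptions for illustration; a central difference approximates each partial derivative.

```python
def f(x):
    """Hypothetical example with n = 2 inputs and m = 3 outputs."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, x1 ** 2]

def jacobian(f, x, h=1e-6):
    """Return the m x n matrix J with J[i][j] approximating d f_i / d x_j."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h   # perturb only the j-th input
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

J = jacobian(f, [2.0, 3.0])
for row in J:
    print(row)
```

Each row corresponds to one output function $f_i$ and each column to one input $x_j$. For this example, the hand-computed Jacobian at $(2, 3)$ has rows $(3, 2)$, $(1, 1)$, and $(4, 0)$, which the numerical result should reproduce up to small rounding error.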

Let's go a step further and extend this definition to multiple functions. Here, we have $y$, which is the sum of two functions $\mathbf{f}$ and $\mathbf{g}$, each taking in a different vectorial input, which gives us the following: 

$$
\mathbf{y = f(a) + g(b)}
$$

And for the sake of simplicity, $\mathbf{f, g, a}$, and $\mathbf{b}$ are all $n$-dimensional, so that $\mathbf{y}$ is an $n$-dimensional vector and each of its Jacobian matrices is $n \times n$. Written out component by component, $\mathbf{y}$ is as follows:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix} f_1(\mathbf{a}) + g_1(\mathbf{b}) \\
f_2(\mathbf{a}) + g_2(\mathbf{b}) \\
\vdots \\
f_{n-1}(\mathbf{a}) + g_{n-1}(\mathbf{b}) \\
f_{n}(\mathbf{a}) + g_{n}(\mathbf{b}) \\
\end{bmatrix}
$$

We can differentiate this vector with respect to $\mathbf{a}$ or $\mathbf{b}$ and find the Jacobian matrix for each.

By differentiating with respect to $\mathbf{a}$, we get the following:

$$
J_a = \frac{\partial \mathbf{y}}{\partial \mathbf{a}} =
\begin{bmatrix}
\frac{\partial}{\partial a_1}(f_1(\mathbf{a}) + g_1(\mathbf{b})) && \frac{\partial}{\partial a_2}(f_1(\mathbf{a}) + g_1(\mathbf{b})) && \dots && \frac{\partial}{\partial a_n}(f_1(\mathbf{a}) + g_1(\mathbf{b})) \\
\frac{\partial}{\partial a_1}(f_2(\mathbf{a}) + g_2(\mathbf{b})) && \frac{\partial}{\partial a_2}(f_2(\mathbf{a}) + g_2(\mathbf{b})) && \dots && \frac{\partial}{\partial a_n}(f_2(\mathbf{a}) + g_2(\mathbf{b})) \\
\vdots && \vdots && \ddots && \vdots \\
\frac{\partial}{\partial a_1}(f_n(\mathbf{a}) + g_n(\mathbf{b})) && \frac{\partial}{\partial a_2}(f_n(\mathbf{a}) + g_n(\mathbf{b})) && \dots && \frac{\partial}{\partial a_n}(f_n(\mathbf{a}) + g_n(\mathbf{b}))
\end{bmatrix}
$$

By differentiating with respect to $\mathbf{b}$, we get the following:

$$
J_b = \frac{\partial \mathbf{y}}{\partial \mathbf{b}} =
\begin{bmatrix}
\frac{\partial}{\partial b_1}(f_1(\mathbf{a}) + g_1(\mathbf{b})) && \frac{\partial}{\partial b_2}(f_1(\mathbf{a}) + g_1(\mathbf{b})) && \dots && \frac{\partial}{\partial b_n}(f_1(\mathbf{a}) + g_1(\mathbf{b})) \\
\frac{\partial}{\partial b_1}(f_2(\mathbf{a}) + g_2(\mathbf{b})) && \frac{\partial}{\partial b_2}(f_2(\mathbf{a}) + g_2(\mathbf{b})) && \dots && \frac{\partial}{\partial b_n}(f_2(\mathbf{a}) + g_2(\mathbf{b})) \\
\vdots && \vdots && \ddots && \vdots \\
\frac{\partial}{\partial b_1}(f_n(\mathbf{a}) + g_n(\mathbf{b})) && \frac{\partial}{\partial b_2}(f_n(\mathbf{a}) + g_n(\mathbf{b})) && \dots && \frac{\partial}{\partial b_n}(f_n(\mathbf{a}) + g_n(\mathbf{b}))
\end{bmatrix}
$$

We can do the same for any type of element-wise operation on the two functions. 
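
For the sum considered here, the two Jacobians simplify considerably. Because $\mathbf{g}(\mathbf{b})$ does not depend on $\mathbf{a}$, and $\mathbf{f}(\mathbf{a})$ does not depend on $\mathbf{b}$, one term in each entry differentiates to zero. As a short worked step (writing $f_i$ for the components of $\mathbf{f}$, following $\mathbf{y = f(a) + g(b)}$), the entries reduce to the following:

$$
(J_a)_{ij} = \frac{\partial f_i}{\partial a_j}, \qquad (J_b)_{ij} = \frac{\partial g_i}{\partial b_j}
$$

In other words, $J_a$ is simply the Jacobian of $\mathbf{f}$, and $J_b$ is simply the Jacobian of $\mathbf{g}$.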

As in single variable and multivariable calculus, we have a chain rule for vector differentiation as well.

Let's take the composition of two vector functions, $\mathbf{f(g(x))}$, where the inner function takes a vector input. The Jacobian of this composition is $\mathbf{\frac{\partial f}{\partial g} \frac{\partial g}{\partial x}}$, which looks similar to what we encountered before. Let's expand this further for the two-dimensional case, as follows:

$$
\frac{\partial}{\partial \mathbf{x}} \mathbf{f(g(x))} =
\begin{bmatrix}
\frac{\partial f_1}{\partial g_1} && \frac{\partial f_1}{\partial g_2} \\
\frac{\partial f_2}{\partial g_1} && \frac{\partial f_2}{\partial g_2} \\
\end{bmatrix}
\begin{bmatrix}
\frac{\partial g_1}{\partial x_1} && \frac{\partial g_1}{\partial x_2} \\
\frac{\partial g_2}{\partial x_1} && \frac{\partial g_2}{\partial x_2} \\
\end{bmatrix}
$$

In many cases of interest, such as when the functions are applied element-wise, the entries of the Jacobian matrix where $i \neq j$ are zero, which leads us to the following definitions:

$$
\frac{\partial \mathbf{f}}{\partial \mathbf{g}} = \operatorname{diag}\left( \frac{\partial f_i}{\partial g_i} \right) \\
\frac{\partial \mathbf{g}}{\partial \mathbf{x}} = \operatorname{diag}\left( \frac{\partial g_i}{\partial x_i} \right)
$$

And so, the following applies:

$$
\frac{\partial}{\partial \mathbf{x}} \mathbf{f(g(x))} = \operatorname{diag} \left( \frac{\partial f_i}{\partial g_i} \right) \operatorname{diag} \left( \frac{\partial g_i}{\partial x_i} \right) = \operatorname{diag} \left( \frac{\partial f_i}{\partial g_i} \frac{\partial g_i}{\partial x_i} \right)
$$

As we can see, this is a diagonal matrix.
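
To make this concrete, here is a small worked example under an assumed choice of functions (not taken from the text above): let $f_i(\mathbf{g}) = \sin(g_i)$ and $g_i(\mathbf{x}) = x_i^2$, both applied element-wise. Writing $\delta_{ij}$ for the symbol that is $1$ when $i = j$ and $0$ otherwise, we get the following:

$$
\frac{\partial f_i}{\partial g_j} = \cos(g_i)\,\delta_{ij}, \qquad
\frac{\partial g_i}{\partial x_j} = 2x_i\,\delta_{ij}
\quad \Longrightarrow \quad
\frac{\partial}{\partial \mathbf{x}} \mathbf{f(g(x))} = \operatorname{diag}\left( 2x_i\cos(x_i^2) \right)
$$

Each diagonal entry is just the single variable chain rule applied to $\sin(x_i^2)$.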

### **Vector fields**

We define a vector field as a function $\mathbf{F}: \R^n \to \R^m$, and it is differentiable only if the following applies:

$$
\mathbf{\delta F = F(x + \delta x) - F(x)} = M\mathbf{\delta x} + o(|\mathbf{\delta x}|)
$$

Here, $M \in \R^{m \times n}$ is the derivative of $F$.

We can think of $M$ as a matrix that maps one vector to another, and we can now express the differential of $F$ component-wise as follows:

$$
dy_j = \frac{\partial F_j}{\partial x_i}dx_i
$$

Here, $y_j = F_j(\mathbf{x})$ for all $j = 1,\dots, m$, and therefore, the derivative of $F$ is this:

$$
M_{j,i} = \frac{\partial y_j}{\partial x_i}
$$

Earlier on in single and multivariable calculus, we learned the importance of the chain rule, so it should be no surprise that we have it in vector calculus as well. And it goes as follows.

Suppose we have $g: \R^p \to \R^n$ and $f: \R^n \to \R^m$, and the coordinates are $u_a \in \R^p, x_i \in \R^n$, and $y_r \in \R^m$. Then, the chain rule gives us the following:

$$
\frac{\partial y_r}{\partial u_a} = \frac{\partial y_r}{\partial x_i} \frac{\partial x_i}{\partial u_a}
$$


We can rewrite this in matrix form, as follows:

$$
M(f \circ g)_{r, a} = M(f)_{r, i}M(g)_{i, a}
$$
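
The matrix form of the chain rule can be verified numerically. In the Python sketch below, the two functions, their hand-computed Jacobians, and the helper names are all assumptions chosen for illustration (with $p = n = m = 2$). It compares the product $M(f)M(g)$ with a central-difference Jacobian of the composition.

```python
import math

def g(u):
    u1, u2 = u
    return [u1 * u2, u1 + u2]

def f(x):
    x1, x2 = x
    return [math.sin(x1), x1 * x2 ** 2]

def jac_g(u):
    """Hand-computed Jacobian of g, with entries d x_i / d u_a."""
    u1, u2 = u
    return [[u2, u1],
            [1.0, 1.0]]

def jac_f(x):
    """Hand-computed Jacobian of f, with entries d y_r / d x_i."""
    x1, x2 = x
    return [[math.cos(x1), 0.0],
            [x2 ** 2, 2 * x1 * x2]]

def matmul(A, B):
    """Matrix product: sum over the shared index i."""
    return [[sum(A[r][i] * B[i][a] for i in range(len(B)))
             for a in range(len(B[0]))] for r in range(len(A))]

def numeric_jacobian(func, u, h=1e-6):
    """Central-difference Jacobian with entries d func_r / d u_a."""
    m = len(func(u))
    J = [[0.0] * len(u) for _ in range(m)]
    for a in range(len(u)):
        up = list(u)
        um = list(u)
        up[a] += h
        um[a] -= h
        fp, fm = func(up), func(um)
        for r in range(m):
            J[r][a] = (fp[r] - fm[r]) / (2 * h)
    return J

u = [0.5, 1.5]
chain = matmul(jac_f(g(u)), jac_g(u))            # M(f) evaluated at g(u), times M(g)
direct = numeric_jacobian(lambda v: f(g(v)), u)  # Jacobian of the composition
print(chain)
print(direct)
```

Note that $M(f)$ must be evaluated at the point $g(u)$, not at $u$ itself; the two printed matrices should then agree up to small numerical error.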

### **Inverse functions**

Inverse functions are a rather fascinating class of functions in that if we have two functions and we apply one after the other, we get back the identity. Mathematically, we define this as follows:

Suppose we have $f, g: \R^n \to \R^n$. Then, they are inverse functions only if $f \circ g = g \circ f = \text{identity}$. For example, we could have $f(\mathbf{u}) = \mathbf{v}$ and $g(\mathbf{v}) = \mathbf{u}$. Therefore, by the chain rule, $M(f \circ g) = M(f)M(g) = I$, which tells us that $M(g) = M(f)^{-1}$.

Here is another cool property that this has:

$$
\det M(f) \det M(g) =1
$$
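
As an illustration, consider the map from polar to Cartesian coordinates and its inverse, which are inverse functions of each other away from the origin (for $r > 0$ and $-\pi < \theta \leq \pi$). The Python sketch below uses hand-computed Jacobians for both maps; the function names and the sample point are assumptions made for this example.

```python
import math

def jac_polar_to_cartesian(r, theta):
    """Jacobian of f(r, theta) = (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def jac_cartesian_to_polar(x, y):
    """Jacobian of g(x, y) = (sqrt(x^2 + y^2), atan2(y, x))."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r, theta = 2.0, 0.3
x, y = r * math.cos(theta), r * math.sin(theta)
Mf = jac_polar_to_cartesian(r, theta)   # M(f) at (r, theta)
Mg = jac_cartesian_to_polar(x, y)       # M(g) at the image point (x, y)
product = matmul2(Mf, Mg)
det_product = det2(Mf) * det2(Mg)
print(product)      # approximately the 2x2 identity matrix
print(det_product)  # approximately 1
```

Here $\det M(f) = r$ and $\det M(g) = \frac{1}{r}$, so their product is $1$ regardless of the point chosen, as the property states.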