Commit

Merge pull request #1493 from d2l-ai/master
Release v0.15.0
astonzhang committed Oct 23, 2020
2 parents ac4e912 + 5f08dc6 commit fc866fd
Showing 209 changed files with 18,240 additions and 2,262 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -63,4 +63,4 @@ This open source book is made available under the Creative Commons Attribution-S

The sample and reference code within this open source book is made available under a modified MIT license. See the [LICENSE-SAMPLECODE](LICENSE-SAMPLECODE) file.

-[Chinese version](https://github.com/d2l-ai/d2l-zh) | [Discuss and report issues](https://discuss.d2l.ai/) | [Other Information](INFO.md)
+[Chinese version](https://github.com/d2l-ai/d2l-zh) | [Discuss and report issues](https://discuss.d2l.ai/) | [Code of conduct](CODE_OF_CONDUCT.md) | [Other Information](INFO.md)
@@ -334,7 +334,7 @@ def binom(n, k):
comb = comb * (n - i) // (i + 1)
return comb
-pmf = torch.tensor([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])
+pmf = d2l.tensor([p**i * (1-p)**(n - i) * binom(n, i) for i in range(n + 1)])
d2l.plt.stem([i for i in range(n + 1)], pmf, use_line_collection=True)
d2l.plt.xlabel('x')
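The pmf constructed above can be sanity-checked with a short, framework-free sketch; the standard library's `math.comb` replaces the hand-written `binom`, and `n = 10`, `p = 0.2` are illustrative values, not taken from the book's code:

```python
from math import comb

n, p = 10, 0.2
# Same binomial pmf as above, built with math.comb instead of a hand-rolled binom
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

assert len(pmf) == n + 1
assert abs(sum(pmf) - 1.0) < 1e-12  # probabilities over all outcomes sum to one
```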
@@ -672,7 +672,7 @@ d2l.plot(x, np.array([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')
```{.python .input}
#@tab pytorch
def phi(x):
-    return (1.0 + erf((x - mu) / (sigma * torch.sqrt(torch.tensor(2.))))) / 2.0
+    return (1.0 + erf((x - mu) / (sigma * torch.sqrt(d2l.tensor(2.))))) / 2.0
d2l.plot(x, torch.tensor([phi(y) for y in x.tolist()]), 'x', 'c.d.f.')
```
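For reference, the same c.d.f. can be written without any tensor framework using the standard library's `math.erf`; this is a sketch, and `normal_cdf` is an illustrative name rather than a book function:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Closed form of the Gaussian c.d.f. in terms of the error function
    return (1.0 + erf((x - mu) / (sigma * sqrt(2.0)))) / 2.0

assert abs(normal_cdf(0.0) - 0.5) < 1e-12    # symmetric about the mean
assert abs(normal_cdf(1.96) - 0.975) < 1e-3  # familiar 95% two-sided quantile
```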
@@ -538,7 +538,7 @@ for all practical purposes, our random vector has been transformed
into the principal eigenvector!
Indeed this algorithm is the basis
for what is known as the *power iteration*
for finding the largest eigenvalue and eigenvector of a matrix. For details see, for example, :cite:`Van-Loan.Golub.1983`.
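The power iteration described here can be sketched in a few lines of plain Python; the matrix `A` and the max-norm normalization are illustrative choices, not the book's implementation:

```python
import random

def power_iteration(A, num_iters=100):
    # Repeatedly apply A to a random vector and renormalize; the iterate
    # aligns with the eigenvector of the largest-magnitude eigenvalue.
    n = len(A)
    v = [random.random() + 0.1 for _ in range(n)]  # positive start vector
    norm = 1.0
    for _ in range(num_iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)  # approximates the dominant eigenvalue
        v = [x / norm for x in w]
    return norm, v

A = [[2.0, 1.0], [1.0, 2.0]]  # symmetric, eigenvalues 3 and 1
lam, v = power_iteration(A)
assert abs(lam - 3.0) < 1e-9          # dominant eigenvalue recovered
assert abs(v[0] - v[1]) < 1e-9        # eigenvector proportional to (1, 1)
```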

### Fixing the Normalization


Large diffs are not rendered by default.

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions chapter_appendix-mathematics-for-deep-learning/integral-calculus.md
@@ -1,7 +1,7 @@
# Integral Calculus
:label:`sec_integral_calculus`

Differentiation only makes up half of the content of a traditional calculus education. The other pillar, integration, starts out with what seems a rather disjoint question: "What is the area underneath this curve?" While seemingly unrelated, integration is tightly intertwined with differentiation via what is known as the *fundamental theorem of calculus*.

At the level of machine learning we discuss in this book, we will not need a deep understanding of integration. However, we will provide a brief introduction to lay the groundwork for any further applications we will encounter later on.

@@ -187,7 +187,7 @@ We will instead take a different approach. We will work intuitively with the no

## The Fundamental Theorem of Calculus

To dive deeper into the theory of integration, let us introduce a function

$$
F(x) = \int_0^x f(y) dy.
$$

@@ -201,10 +201,10 @@

This is a mathematical encoding of the fact that we can measure the area out to the far end-point and then subtract off the area to the near end point as indicated in :numref:`fig_area-subtract`.

-![Visualizing why we may reduce the problem of computing the area under a curve between two points to computing the area to the left of a point.](../img/SubArea.svg)
+![Visualizing why we may reduce the problem of computing the area under a curve between two points to computing the area to the left of a point.](../img/sub-area.svg)
:label:`fig_area-subtract`

Thus, we can figure out what the integral over any interval is by figuring out what $F(x)$ is.
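This reduction can be illustrated with a rough numerical sketch, assuming a simple left Riemann sum for $F$; the helper `F` and the test function $f(y) = y^2$ are illustrative choices:

```python
def F(x, f, n=10_000):
    # Left Riemann sum approximating F(x) = integral of f from 0 to x
    h = x / n
    return sum(f(i * h) for i in range(n)) * h

f = lambda y: y ** 2
a, b = 1.0, 2.0
area = F(b, f) - F(a, f)        # area under f between a and b via F(b) - F(a)
exact = (b ** 3 - a ** 3) / 3   # antiderivative y^3 / 3 at the endpoints
assert abs(area - exact) < 1e-3
```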

To do so, let us consider an experiment. As we often do in calculus, let us imagine what happens when we shift the value by a tiny bit. From the comment above, we know that

@@ -259,7 +259,7 @@ First, suppose that we have a function which is itself an integral:

$$
F(x) = \int_0^x f(y) \; dy.
$$

Let us suppose that we want to know how this function looks when we compose it with another to obtain $F(u(x))$. By the chain rule, we know

@@ -286,16 +286,16 @@ $$\int_{u(0)}^{u(x)} f(y) \; dy = \int_0^x f(u(y))\cdot \frac{du}{dy} \;dy.$$

This is the *change of variables* formula.

For a more intuitive derivation, consider what happens when we take an integral of $f(u(x))$ between $x$ and $x+\epsilon$. For a small $\epsilon$, this integral is approximately $\epsilon f(u(x))$, the area of the associated rectangle. Now, let us compare this with the integral of $f(y)$ from $u(x)$ to $u(x+\epsilon)$. We know that $u(x+\epsilon) \approx u(x) + \epsilon \frac{du}{dx}(x)$, so the area of this rectangle is approximately $\epsilon \frac{du}{dx}(x)f(u(x))$. Thus, to make the areas of these two rectangles agree, we need to multiply the first one by $\frac{du}{dx}(x)$, as illustrated in :numref:`fig_rect-transform`.

-![Visualizing the transformation of a single thin rectangle under the change of variables.](../img/RectTrans.svg)
+![Visualizing the transformation of a single thin rectangle under the change of variables.](../img/rect-trans.svg)
:label:`fig_rect-transform`

This tells us that

$$
\int_x^{x+\epsilon} f(u(y))\frac{du}{dy}(y)\;dy = \int_{u(x)}^{u(x+\epsilon)} f(y) \; dy.
$$

This is the change of variables formula expressed for a single small rectangle.
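The single-rectangle picture can be checked numerically. The sketch below, with the illustrative choices $u(x) = x^2$ and $f(y) = \cos(y)$ and a simple midpoint rule, confirms that both sides of the change of variables formula agree:

```python
from math import sin, cos

def integrate(f, a, b, n=100_000):
    # Midpoint-rule approximation of the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# u(x) = x**2, so du/dx = 2x; take f(y) = cos(y)
lhs = integrate(lambda y: cos(y**2) * 2 * y, 0.0, 1.0)  # f(u(y)) du/dy over [0, 1]
rhs = integrate(lambda y: cos(y), 0.0, 1.0)             # f(y) over [u(0), u(1)]
assert abs(lhs - rhs) < 1e-6
assert abs(rhs - sin(1.0)) < 1e-6  # both equal sin(1) in closed form
```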

@@ -404,7 +404,7 @@ ax.set_zlim(0, 1)
ax.dist = 12
```

We write this as

$$
\int_{[a, b]\times[c, d]} f(x, y)\;dx\;dy.
$$

@@ -416,7 +416,7 @@
$$
\int_{[a, b]\times[c, d]} f(x, y)\;dx\;dy = \int_c^{d} \left(\int_a^{b} f(x, y) \;dx\right) \; dy.
$$

Let us see why this is.

Consider the figure above where we have split the function into $\epsilon \times \epsilon$ squares which we will index with integer coordinates $i, j$. In this case, our integral is approximately

@@ -430,16 +430,16 @@
$$
\sum _ {j} \epsilon \left(\sum_{i} \epsilon f(\epsilon i, \epsilon j)\right).
$$

-![Illustrating how to decompose a sum over many squares as a sum over first the columns (1), then adding the column sums together (2).](../img/SumOrder.svg)
+![Illustrating how to decompose a sum over many squares as a sum over first the columns (1), then adding the column sums together (2).](../img/sum-order.svg)
:label:`fig_sum-order`

The sum on the inside is precisely the discretization of the integral

$$
G(\epsilon j) = \int _a^{b} f(x, \epsilon j) \; dx.
$$

Finally, notice that if we combine these two expressions we get

$$
\sum _ {j} \epsilon G(\epsilon j) \approx \int _ {c}^{d} G(y) \; dy = \int _ {[a, b]\times[c, d]} f(x, y)\;dx\;dy.
$$

@@ -466,9 +466,9 @@
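The equality between the double integral and the iterated integral can be illustrated with a small numerical sketch; the grid size and the test function $f(x, y) = xy + y^2$ are arbitrary illustrative choices:

```python
def double_integral(f, a, b, c, d, n=200):
    # Direct sum over an n-by-n grid of epsilon-squares
    hx, hy = (b - a) / n, (d - c) / n
    return sum(f(a + (i + 0.5) * hx, c + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

def iterated_integral(f, a, b, c, d, n=200):
    # Inner integral over x first (the G(y) of the text), then over y
    hx, hy = (b - a) / n, (d - c) / n
    G = lambda y: sum(f(a + (i + 0.5) * hx, y) for i in range(n)) * hx
    return sum(G(c + (j + 0.5) * hy) for j in range(n)) * hy

f = lambda x, y: x * y + y ** 2
lhs = double_integral(f, 0, 1, 0, 2)
rhs = iterated_integral(f, 0, 1, 0, 2)
assert abs(lhs - rhs) < 1e-9       # same grid, same total, either order
assert abs(lhs - 11 / 3) < 1e-3    # exact value over [0, 1] x [0, 2]
```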

## Change of Variables in Multiple Integrals
As with single variables in :eqref:`eq_change_var`, the ability to change variables inside a higher-dimensional integral is a key tool. Let us summarize the result without derivation.

We need a function that reparameterizes our domain of integration. We can take this to be $\phi : \mathbb{R}^n \rightarrow \mathbb{R}^n$, that is, any function which takes in $n$ real variables and returns another $n$. To keep the expressions clean, we will assume that $\phi$ is *injective*, which is to say it never folds over itself ($\phi(\mathbf{x}) = \phi(\mathbf{y}) \implies \mathbf{x} = \mathbf{y}$).

In this case, we can say that

@@ -486,7 +486,7 @@
$$
D\boldsymbol{\phi} = \begin{bmatrix}
\frac{\partial \phi_1}{\partial x_1} & \cdots & \frac{\partial \phi_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial \phi_n}{\partial x_1} & \cdots & \frac{\partial \phi_n}{\partial x_n}
\end{bmatrix}.
$$

Looking closely, we see that this is similar to the single variable chain rule :eqref:`eq_change_var`, except we have replaced the term $\frac{du}{dx}(x)$ with $\left|\det(D\phi(\mathbf{x}))\right|$. Let us see how we can interpret this term. Recall that the $\frac{du}{dx}(x)$ term existed to say how much we stretched our $x$-axis by applying $u$. The same process in higher dimensions is to determine how much we stretch the area (or volume, or hyper-volume) of a little square (or little *hyper-cube*) by applying $\boldsymbol{\phi}$. If $\boldsymbol{\phi}$ were multiplication by a matrix, then we know that the determinant already gives the answer.

With some work, one can show that the *Jacobian* provides the best approximation to a multivariable function $\boldsymbol{\phi}$ at a point by a matrix in the same way we could approximate by lines or planes with derivatives and gradients. Thus the determinant of the Jacobian exactly mirrors the scaling factor we identified in one dimension.
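This scaling-factor interpretation can be checked numerically. The sketch below estimates $\det(D\boldsymbol{\phi})$ by central finite differences for the polar-coordinate map, for which the stretching factor is exactly $r$; the helper names are illustrative:

```python
from math import sin, cos

def jacobian_det(phi, x, y, h=1e-6):
    # Central-difference estimate of det(D phi) for phi mapping R^2 -> R^2
    dx = [(phi(x + h, y)[k] - phi(x - h, y)[k]) / (2 * h) for k in (0, 1)]
    dy = [(phi(x, y + h)[k] - phi(x, y - h)[k]) / (2 * h) for k in (0, 1)]
    return dx[0] * dy[1] - dx[1] * dy[0]

polar = lambda r, theta: (r * cos(theta), r * sin(theta))
# For the polar-coordinate map the determinant of the Jacobian is exactly r
assert abs(jacobian_det(polar, 2.0, 0.7) - 2.0) < 1e-5
```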

@@ -502,7 +502,7 @@
$$
\int _ 0^\infty \int_0 ^ {2\pi} e^{-r^{2}} \left|\det(D\mathbf{\phi}(\mathbf{x}))\right|\;d\theta\;dr,
$$

where

$$
\left|\det(D\mathbf{\phi}(\mathbf{x}))\right| = \left|\det\begin{bmatrix}
\cos(\theta) & -r\sin(\theta) \\
\sin(\theta) & r\cos(\theta)
\end{bmatrix}\right| = r\left(\cos^{2}(\theta) + \sin^{2}(\theta)\right) = r,
$$

@@ -517,7 +517,7 @@
$$
\int _ 0^\infty \int _ 0 ^ {2\pi} re^{-r^{2}} \;d\theta\;dr = 2\pi\int _ 0^\infty re^{-r^{2}} \;dr = \pi,
$$

where the final equality follows by the same computation that we used in :numref:`integral_example`.
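The radial integral above is easy to confirm numerically; in the sketch below the truncation radius `R` is an illustrative choice whose tail contribution is negligible:

```python
from math import exp, pi

def radial_gaussian(n=100_000, R=10.0):
    # 2*pi * integral of r * exp(-r^2) over [0, R] by the midpoint rule
    h = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r * exp(-r * r)
    return 2 * pi * total * h

# The integral of r * exp(-r^2) over [0, inf) is 1/2, so the result is pi
assert abs(radial_gaussian() - pi) < 1e-6
```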

We will meet this integral again when we study continuous random variables in :numref:`sec_random_variables`.

