This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# The mathematical building blocks of neural networks

To provide sufficient context for introducing tensors and gradient descent, we’ll begin the
chapter with a practical example of a neural network. Then we’ll go over every new concept
that’s been introduced, point by point.

## Chapter summary

- **Tensors** form the foundation of modern machine learning systems. They come in various flavors of dtype, rank, and shape.
- You can manipulate numerical tensors via tensor operations (such as addition, tensor product, or elementwise multiplication), which can be interpreted as **encoding geometric transformations**. In general, everything in deep learning is amenable to a geometric interpretation.
- Deep learning models consist of **chains of simple tensor operations, parameterized by weights**, which are themselves tensors. The weights of a model are where its "knowledge" is stored.
- Learning means finding a set of values for the model’s weights that **minimizes a loss function for a given set of training data samples** and their corresponding targets.
- Learning happens by drawing random batches of data samples and their targets, and computing the **gradient** of the loss on the batch with respect to the model parameters. The model parameters are then moved a bit (the magnitude of the move is defined by the learning rate) in the opposite direction from the gradient. This is called **mini-batch gradient descent**.
- The entire learning process is made possible by the fact that all tensor operations in neural networks are differentiable, and thus it’s possible to apply the chain rule of differentiation to find the gradient function mapping the current parameters and current batch of data to a gradient value. This is called **backpropagation**.
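
The points above can be made concrete with a small NumPy sketch of mini-batch gradient descent on a one-weight linear model. It is an illustration only, not the book's code: the synthetic data (targets close to `3x + 2`), the learning rate, the batch size, and the names `inputs`, `targets`, `W`, `b`, and `mse` are all assumptions chosen for this example, and the gradients are derived by hand with the chain rule rather than obtained through automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): targets are roughly 3x + 2.
inputs = rng.uniform(-1.0, 1.0, size=(256, 1)).astype("float32")
noise = 0.01 * rng.standard_normal((256, 1)).astype("float32")
targets = 3.0 * inputs + 2.0 + noise

# Every tensor has a rank (ndim), a shape, and a dtype.
print(inputs.ndim, inputs.shape, inputs.dtype)  # 2 (256, 1) float32

# The model's weights are tensors too; this is where its "knowledge" lives.
W = np.zeros((1, 1), dtype="float32")
b = np.zeros((1,), dtype="float32")

learning_rate = 0.1  # magnitude of each update step
batch_size = 32


def mse(predictions, targets):
    """Mean squared error, used here as the loss to minimize."""
    return float(np.mean((predictions - targets) ** 2))


initial_loss = mse(inputs @ W + b, targets)

for epoch in range(50):
    order = rng.permutation(len(inputs))  # draw batches in random order
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        x, y = inputs[idx], targets[idx]

        predictions = x @ W + b  # forward pass: a chain of tensor operations
        error = predictions - y

        # Gradient of the batch loss with respect to each parameter,
        # worked out by hand via the chain rule.
        grad_W = 2.0 * x.T @ error / len(x)
        grad_b = 2.0 * error.mean(axis=0)

        # Move the parameters a little in the direction opposite the gradient.
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b

final_loss = mse(inputs @ W + b, targets)
print(W[0, 0], b[0], initial_loss, final_loss)
```

After training, `W` and `b` should sit close to the values used to generate the data, and `final_loss` should be far below `initial_loss`. In a real model, a framework computes the gradients automatically instead of relying on hand-written formulas.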

Two key concepts you’ll see frequently in future chapters are the loss and the optimizer. These are the two things you need to define before you begin feeding data into a model.

- The **loss** is the quantity you’ll attempt to minimize during training, so it should represent a measure of success for the task you’re trying to solve.
- The **optimizer** specifies the exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.
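
To show how these two pieces divide the work, here is a minimal NumPy sketch in which the loss is a plain function and the optimizer is a small object that decides how a gradient becomes a parameter update. The class `MomentumSGD`, the function `mean_squared_error`, the toy data, and the hyperparameter values are hypothetical names and choices made for this illustration; they are not the Keras API, and the update rule shown is one common formulation of SGD with momentum.

```python
import numpy as np


def mean_squared_error(predictions, targets):
    """The loss: the quantity training tries to minimize."""
    return float(np.mean((predictions - targets) ** 2))


class MomentumSGD:
    """Hypothetical minimal optimizer: SGD with momentum."""

    def __init__(self, learning_rate=0.01, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocity = None

    def update(self, parameters, gradient):
        """Return updated parameters given the gradient of the loss."""
        if self.velocity is None:
            self.velocity = np.zeros_like(parameters)
        # Keep a fraction of the previous step, then add the new gradient step.
        self.velocity = self.momentum * self.velocity - self.learning_rate * gradient
        return parameters + self.velocity


# Toy task (an assumption for illustration): find w such that w * x matches y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
w = np.array(0.0)

optimizer = MomentumSGD(learning_rate=0.01, momentum=0.9)
losses = []

for step in range(200):
    predictions = w * x
    losses.append(mean_squared_error(predictions, y))
    # Gradient of the mean squared error with respect to w.
    gradient = np.mean(2.0 * (predictions - y) * x)
    w = optimizer.update(w, gradient)

print(float(w), losses[0], losses[-1])
```

Swapping in a different optimizer would change only the `update` rule; the loss and the gradient computation would stay the same.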