# <span style="color:blue">Deep Learning with PyTorch</span>
Authors: Eli Stevens, Luca Antiga, Thomas Viehmann

### <span style="color:red">Chapter 1 Introducing deep learning and the PyTorch library</span>

##### <span style="color:green">1.4 PyTorch has the batteries included</span>
To bypass the cost of the Python interpreter and offer the opportunity to run models independently from a Python runtime, PyTorch also provides a deferred execution model named TorchScript. Using TorchScript, PyTorch can serialize a set of instructions that can be invoked independently from Python. Besides not incurring the costs of calling into Python, this execution mode gives PyTorch the opportunity to just-in-time (JIT) transform sequences of known operations into more efficient fused operations. These features are the basis of the production deployment capabilities of PyTorch.
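A minimal sketch of the TorchScript workflow described above, assuming a PyTorch version that still ships torch.jit. The module name AffineReLU is hypothetical; it just chains a few pointwise operations of the kind the JIT may fuse. The module is compiled, serialized to an in-memory buffer, and loaded back without referring to the original Python class.

```python
import io

import torch


class AffineReLU(torch.nn.Module):
    """Hypothetical toy module: a chain of pointwise ops the JIT may fuse."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x * 2.0 + 1.0)


# Compile the module to TorchScript; scripted.code shows the generated program.
scripted = torch.jit.script(AffineReLU())

# Serialize to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Load it back; the same serialized file could also be loaded from C++ via libtorch.
buffer.seek(0)
restored = torch.jit.load(buffer)

x = torch.tensor([-1.0, 0.0, 1.0])
print(restored(x))  # tensor([0., 1., 3.])
```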

### <span style="color:red">Chapter 2 It starts with a tensor</span>

##### <span style="color:green">2.1 Tensor fundamentals</span>
Using Python lists to store a vector of numbers can be suboptimal, for several reasons:
- Numbers in Python are full-fledged objects. Whereas a floating-point number might take only 32 bits to be represented on a computer, Python boxes it in a full-fledged Python object with reference counting and so on. This isn't a problem if you only need to store a few numbers, but allocating millions of them becomes inefficient.
- Lists in Python are meant for sequential collections of objects. No operations are defined for, say, efficiently taking the dot product of two vectors or summing vectors. Also, Python lists have no way of optimizing the layout of their content in memory, as they’re indexable collections of pointers to Python objects (of any kind, not numbers alone). Finally, Python lists are one-dimensional, and although you can create lists of lists, again, this practice is inefficient.
- The Python interpreter is slow compared with optimized, compiled code. Performing mathematical operations on large collections of numerical data can be much faster using optimized code written in a compiled, low-level language like C.
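The points in the list above can be checked with a small sketch, assuming 64-bit CPython and a float32 tensor. It compares the memory taken by a list of boxed floats with the raw buffer of a tensor, and computes a dot product once with an interpreted loop and once with a single call into compiled code. The exact byte counts for the list depend on the interpreter, so only the tensor size is stated exactly.

```python
import sys

import torch

values = [float(i) for i in range(100)]

# A list holds pointers; each float is a separately allocated, reference-counted object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A float32 tensor packs the raw 4-byte values into one contiguous buffer.
t = torch.tensor(values, dtype=torch.float32)
tensor_bytes = t.element_size() * t.nelement()

# Dot product: interpreted Python loop vs. one call into compiled code.
dot_list = sum(a * b for a, b in zip(values, values))
dot_tensor = torch.dot(t, t).item()

print(list_bytes, tensor_bytes)  # the list is several times larger than 400 bytes
print(dot_list, dot_tensor)      # 328350.0 328350.0
```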

Indexing into a tensor (for example, selecting a single row) does not allocate a new chunk of memory, because copying the values would be inefficient. Instead, the result is a different view of the same underlying data.
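A minimal sketch of this behavior, assuming PyTorch 2.x (where Tensor.untyped_storage() is available). The tensor points is a hypothetical 3×2 example. Selecting a row yields a new tensor object that reads from the same storage, so writing through it changes the original.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Indexing a row returns a new tensor object, but no values are copied.
second = points[1]
shares_memory = (
    second.untyped_storage().data_ptr() == points.untyped_storage().data_ptr()
)

# Writing through the view changes the original tensor.
second[0] = 10.0
print(shares_memory)  # True
print(points[1])      # row 1 of points now starts with 10.0
```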

##### <span style="color:green">2.2 Tensors and storages</span>
Values are allocated in contiguous chunks of memory, managed by torch.Storage instances. A storage is a one-dimensional array of numerical data. The layout of a storage is always one-dimensional, irrespective of the dimensionality of any tensors that may refer to it.
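To see that the storage is one-dimensional regardless of tensor shape, here is a short sketch, again assuming PyTorch 2.x for untyped_storage(). Three tensors of different shapes are created as views of one another, and all of them point at the same flat buffer of six float32 values.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # shape (2, 3)
b = a.view(3, 2)                                      # shape (3, 2), same values
c = a.view(6)                                         # shape (6,)

# All three tensors refer to one storage: a flat buffer of 6 float32 values.
same = {t.untyped_storage().data_ptr() for t in (a, b, c)}
print(len(same))                     # 1
print(a.untyped_storage().nbytes())  # 24, i.e. 6 values * 4 bytes
```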

##### <span style="color:green">2.3 Size, storage offset, and strides</span>
Accessing an element $i$, $j$ in a 2D tensor results in accessing the element at index $\text{storage\_offset} + \text{stride}[0] \cdot i + \text{stride}[1] \cdot j$ in the storage. The offset will usually be zero; if this tensor is a view into a storage created to hold a larger tensor, the offset might be a positive value. This indirection between Tensor and Storage makes some operations, such as transposing a tensor or extracting a subtensor, inexpensive: they don't lead to memory reallocations, and instead consist of allocating a new tensor object with a different value for size, storage offset, or stride.
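The indexing formula can be verified by hand with a short sketch. It assumes that points is contiguous with offset 0, so that flattening it exposes the storage in order; via_strides is a hypothetical helper that applies the formula. A subtensor and a transpose are both checked against ordinary indexing.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
# points is contiguous with offset 0, so flattening it exposes the storage order.
storage = points.reshape(-1)


def via_strides(t, i, j):
    """Read t[i, j] by hand: storage_offset + stride[0] * i + stride[1] * j."""
    index = t.storage_offset() + t.stride(0) * i + t.stride(1) * j
    return storage[index]


sub = points[1:]         # view starting at row 1: offset 2, strides (2, 1)
transposed = points.t()  # no copy: offset 0, strides (1, 2)

# Every view of points agrees with the hand-computed storage index.
for view in (points, sub, transposed):
    for i in range(view.shape[0]):
        for j in range(view.shape[1]):
            assert view[i, j] == via_strides(view, i, j)

print(sub.storage_offset(), sub.stride())                # 2 (2, 1)
print(transposed.storage_offset(), transposed.stride())  # 0 (1, 2)
```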

Changing the subtensor has a side effect on the original tensor too. Since this effect may not always be desirable, you can clone the subtensor into a new tensor when you need an independent copy.
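A minimal sketch contrasting a view with a clone, using the same hypothetical points tensor. Writing into the view modifies points; writing into the clone does not, because clone allocates its own storage.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

view = points[1:]          # shares storage with points
view[0, 0] = 10.0          # side effect: points[1, 0] is now 10.0

copy = points[1:].clone()  # independent storage
copy[0, 0] = 99.0          # points is unaffected
```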

Transposing in PyTorch isn’t limited to matrices. You can transpose a multidimensional array by specifying the two dimensions to be swapped.
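For example, with a hypothetical 3×4×5 tensor, transpose(0, 2) swaps the first and last dimensions. Only the shape and strides are permuted; the storage is shared with the original.

```python
import torch

t = torch.ones(3, 4, 5)
tt = t.transpose(0, 2)  # swap dimensions 0 and 2

print(t.shape, t.stride())    # torch.Size([3, 4, 5]) (20, 5, 1)
print(tt.shape, tt.stride())  # torch.Size([5, 4, 3]) (1, 5, 20)
```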

A tensor whose values are laid out in the storage starting from the rightmost dimension onward (moving along rows for a 2D tensor, for example) is defined as being contiguous. Contiguous tensors are convenient because you can visit them efficiently and in order without jumping around in the storage.
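A short sketch of the definition, using is_contiguous on a hypothetical 3×2 tensor and on its transpose. In the contiguous case the rightmost dimension has stride 1; the transpose walks the same storage column by column, so it is not contiguous.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()

# Row-major layout: the last dimension has stride 1.
print(points.stride(), points.is_contiguous())      # (2, 1) True
# The transpose reads the same storage with swapped strides.
print(points_t.stride(), points_t.is_contiguous())  # (1, 2) False
```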

You can obtain a new contiguous tensor from a noncontiguous one by using the contiguous method. The content of the tensor stays the same, but the stride changes, as does the storage.
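A minimal sketch, assuming PyTorch 2.x for untyped_storage(). Calling contiguous on a transposed view returns a tensor with the same values but a new storage whose layout matches the new row-major strides.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()                  # non-contiguous view, strides (1, 2)
points_t_cont = points_t.contiguous()  # copies into a new, reordered storage

same_values = torch.equal(points_t, points_t_cont)
same_storage = (
    points_t.untyped_storage().data_ptr()
    == points_t_cont.untyped_storage().data_ptr()
)
print(same_values, same_storage)  # True False
print(points_t_cont.stride())     # (3, 1)
```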

##### <span style="color:green">2.9 The tensor API</span>
A small number of operations exist only as methods of the tensor object. They’re recognizable by the trailing underscore in their name, such as zero_, which indicates that the method operates in-place by modifying the input instead of creating a new output tensor and returning it. Any method without the trailing underscore leaves the source tensor unchanged and returns a new tensor.
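The naming convention can be seen with add versus add_ and zero_ on a hypothetical 3×2 tensor of ones: the out-of-place call leaves a untouched, while the underscore methods overwrite its values.

```python
import torch

a = torch.ones(3, 2)

b = a.add(1.0)  # out-of-place: a is unchanged, b is a new tensor of 2.0s
a.zero_()       # in-place: a's own values are overwritten with zeros
a.add_(5.0)     # in-place add: a now holds 5.0 everywhere
```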

### <span style="color:red">Chapter 3 Real-world data representation with tensors</span>

##### <span style="color:green">3.1 Tabular data</span>
Columns may contain numerical values or labels, such as a string expressing an attribute. Therefore, tabular data typically isn’t homogeneous: different columns don’t have the same type.


PyTorch tensors, on the other hand, are homogeneous. Other data science packages, such as Pandas, have the concept of the data frame, an object representing a data set with named, heterogeneous columns. By contrast, information in PyTorch is encoded as a number, typically floating-point (though integer types are supported as well). Numeric encoding is deliberate, because neural networks are mathematical entities that take real numbers as inputs and produce real numbers as output through successive application of matrix multiplications and nonlinear functions.
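A minimal sketch of turning a heterogeneous table into one homogeneous float tensor. The rows below are hypothetical data with two numeric columns and one string label column. The label is mapped to an integer index and then one-hot encoded with scatter_ (one of several possible encodings), so every resulting column is a number.

```python
import torch

# Hypothetical table: two numeric columns and one string label column.
rows = [
    (7.4, 0.70, "red"),
    (7.8, 0.88, "white"),
    (6.3, 0.30, "red"),
]

# Numeric columns go straight into a float tensor.
numeric = torch.tensor([[r[0], r[1]] for r in rows], dtype=torch.float32)

# The label column is mapped to integer indices, then one-hot encoded.
labels = sorted({r[2] for r in rows})  # ['red', 'white']
index = torch.tensor([labels.index(r[2]) for r in rows])
onehot = torch.zeros(len(rows), len(labels))
onehot.scatter_(1, index.unsqueeze(1), 1.0)

# One homogeneous float tensor; every column is now a number.
features = torch.cat([numeric, onehot], dim=1)
print(features.shape, features.dtype)  # torch.Size([3, 4]) torch.float32
```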

### <span style="color:red">Chapter 4 The mechanics of learning</span>

##### <span style="color:green">4.2 PyTorch’s autograd: Backpropagate all things</span>
All PyTorch tensors have an attribute named grad, normally None. The training loop starts with a tensor with requires_grad set to True, calls the model, computes the loss, and then calls backward on the loss tensor.
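A minimal sketch of that sequence for a linear model, assuming hypothetical toy data where the true relation is t_c = 2 * t_u. The names model, loss_fn, t_u, t_c, and params are illustrative. Before backward, params.grad is None; afterwards it holds the derivative of the mean squared error with respect to w and b.

```python
import torch


def model(t_u, w, b):
    return w * t_u + b


def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()


# Hypothetical toy data where the true relation is t_c = 2 * t_u.
t_u = torch.tensor([1.0, 2.0, 3.0])
t_c = torch.tensor([2.0, 4.0, 6.0])

params = torch.tensor([1.0, 0.0], requires_grad=True)  # w, b
print(params.grad)  # None: nothing has been computed yet

loss = loss_fn(model(t_u, *params), t_c)
loss.backward()
# Analytically, d(loss)/dw = -28/3 and d(loss)/db = -4 at w=1, b=0.
print(params.grad)
```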

Calling backward causes derivatives to accumulate at leaf nodes: the newly computed gradient is added to whatever is already stored in grad rather than replacing it. If the gradient is not reset after being used for a parameter update, the next call to backward adds to the stale value and produces an incorrect gradient. To prevent this, zero the gradient explicitly at each iteration, which you can do easily with the in-place zero_ method.
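A sketch of both effects, using a hypothetical linear model and toy data. Two backward passes without zeroing leave twice the single-pass gradient in params.grad. The training_loop function, a sketch rather than the book's exact code, zeroes the gradient at the start of every iteration and performs the update inside torch.no_grad() so that the update itself is not tracked by autograd.

```python
import torch


def model(t_u, w, b):
    return w * t_u + b


def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()


t_u = torch.tensor([1.0, 2.0, 3.0])  # hypothetical toy data
t_c = torch.tensor([2.0, 4.0, 6.0])
params = torch.tensor([1.0, 0.0], requires_grad=True)

# Two backward passes without zeroing: the second gradient is added to the first.
for _ in range(2):
    loss_fn(model(t_u, *params), t_c).backward()
accumulated = params.grad.clone()  # twice the single-pass gradient


def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(n_epochs):
        if params.grad is not None:
            params.grad.zero_()  # reset before computing this iteration's gradient
        loss = loss_fn(model(t_u, *params), t_c)
        loss.backward()
        with torch.no_grad():  # the update itself must not be tracked by autograd
            params -= learning_rate * params.grad
    return params, loss.item()


initial_loss = loss_fn(model(t_u, *params), t_c).item()
params, final_loss = training_loop(100, 0.05, params, t_u, t_c)
```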