# <center>Machine Learning with PyTorch and Scikit-Learn - Raschka, Liu, and Mirjalili</center>
## <center>Chapters 11, 12, and 13</center>
### <center>Implementing a Multilayer Artificial Neural Network from Scratch</center>
### <center>and</center>
### <center>Parallelizing Neural Network Training with PyTorch</center>
### <center>and</center>
### <center>Going Deeper – The Mechanics of PyTorch</center>

---

## Remarks

> The trick of reverse mode is that we traverse the chain rule from right to left. We multiply a matrix by a vector, which yields another vector that is multiplied by the next matrix, and so on. Matrix-vector multiplication is computationally much cheaper than matrix-matrix multiplication, which is why backpropagation is one of the most popular algorithms used in NN training.
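
To make the cost argument in the quote concrete, here is a minimal NumPy sketch. The three Jacobians and their sizes are made up for illustration, and `matmul_cost` is a hypothetical helper that counts scalar multiplications in a naive matrix product. Because the loss is a scalar, its Jacobian is a row vector, so starting the chain from the loss side keeps every intermediate result a vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Jacobians of a three-step chain (sizes chosen only for illustration).
J1 = rng.standard_normal((50, 100))  # d(layer1) / d(input)
J2 = rng.standard_normal((30, 50))   # d(layer2) / d(layer1)
J3 = rng.standard_normal((1, 30))    # d(loss) / d(layer2): a row vector, the loss is scalar


def matmul_cost(a_shape, b_shape):
    """Scalar multiplications in a naive (m, k) @ (k, n) product."""
    m, k = a_shape
    _, n = b_shape
    return m * k * n


# Starting from the input side forces a matrix-matrix product first.
forward_mode = J3 @ (J2 @ J1)
forward_cost = matmul_cost(J2.shape, J1.shape) + matmul_cost(J3.shape, (30, 100))

# Starting from the loss side: every step is a vector-matrix product.
reverse_mode = (J3 @ J2) @ J1
reverse_cost = matmul_cost(J3.shape, J2.shape) + matmul_cost((1, 50), J1.shape)

print(np.allclose(forward_mode, reverse_mode))  # same gradient either way
print(forward_cost, reverse_cost)               # 153000 vs 6500 multiplications
```

Both orderings give the same gradient; only the amount of work differs, and the gap grows with the layer widths.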

  



## Concepts

- It is important to distinguish the _forward pass_ the book describes from the _backward pass_. The book's definition of forward propagation seems somewhat imprecise. Restate it here in terms of the _forward pass_ and the _backward pass_.

- _Feedforward artificial NN_. Define.
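
As a starting point for these two notes, the sketch below separates the two passes in a tiny feedforward network, meaning one where each layer feeds only the next and there are no loops. The layer sizes, random data, and mean-squared-error loss are assumptions chosen for illustration, not the book's example.

```python
import torch

torch.manual_seed(0)

# Toy feedforward network: 4 inputs -> 3 hidden units -> 1 output (illustrative sizes).
x = torch.rand(5, 4)   # batch of 5 made-up examples
y = torch.rand(5, 1)   # made-up regression targets
W1 = torch.randn(4, 3, requires_grad=True)
b1 = torch.zeros(3, requires_grad=True)
W2 = torch.randn(3, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)

# Forward pass: data flows input -> hidden -> output -> loss, never backwards.
hidden = torch.sigmoid(x @ W1 + b1)
output = hidden @ W2 + b2
loss = ((output - y) ** 2).mean()

# Backward pass: autograd walks the recorded graph from the loss back to each parameter.
loss.backward()

print(loss.item())
print(W1.grad.shape, W2.grad.shape)
```

The forward pass only computes values; the gradients in `W1.grad` and `W2.grad` exist only after `loss.backward()` has run.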

## Further reading

AI history:

> [AI winters](https://en.wikipedia.org/wiki/AI_winter)

Additional resources on backpropagation:

> [Who Invented Backpropagation?](http://people.idsia.ch/~juergen/who-invented-backpropagation.html)

> Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.

> Chapter 6, Deep Feedforward Networks, Deep Learning, by I. Goodfellow, Y. Bengio, and A. Courville, MIT Press, 2016 (manuscripts freely accessible at http://www.deeplearningbook.org).

> Pattern Recognition and Machine Learning, by C. M. Bishop, Springer New York, 2006.

The MNIST dataset:

> LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.

Adding skip-connections, which are the main contribution of residual NNs:

> He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

Using learning rate schedulers that change the learning rate during training:

> Smith, L. N. (2017, March). Cyclical learning rates for training neural networks. In 2017 IEEE winter conference on applications of computer vision (WACV) (pp. 464-472). IEEE.

Attaching loss functions to earlier layers in the network, as is done in the popular ``Inception v3`` architecture:

> Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826).

Automatic differentiation:

> Baydin, A. G., & Pearlmutter, B. A. (2014). Automatic differentiation of algorithms for machine learning. arXiv preprint arXiv:1404.7456.


In [1]:
import os
import torch
import pathlib
import numpy as np
import matplotlib.pyplot as plt

from PIL import Image
from torch.utils.data import DataLoader, Dataset

np.set_printoptions(precision=3)

In [2]:
torch.cuda.is_available()

True

In [3]:
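# Empty float32 tensor allocated on the GPU (replaces the legacy torch.cuda.FloatTensor constructor)
a = torch.empty(0, dtype=torch.float32, device="cuda")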

In [15]:
tensor = torch.rand(4, 5)
print(tensor)

tensor([[0.5290, 0.8148, 0.2979, 0.2081, 0.8574],
        [0.6506, 0.9768, 0.6655, 0.3058, 0.8328],
        [0.9598, 0.8347, 0.1182, 0.8505, 0.7721],
        [0.7770, 0.5142, 0.3316, 0.3090, 0.8752]])


In [19]:
# Transposing a tensor
print(tensor.T)

# Reshaping
# The reshaped tensor must have the same number of elements as the original
print(tensor.reshape(10, 2))

tensor([[0.5290, 0.6506, 0.9598, 0.7770],
        [0.8148, 0.9768, 0.8347, 0.5142],
        [0.2979, 0.6655, 0.1182, 0.3316],
        [0.2081, 0.3058, 0.8505, 0.3090],
        [0.8574, 0.8328, 0.7721, 0.8752]])
tensor([[0.5290, 0.8148],
        [0.2979, 0.2081],
        [0.8574, 0.6506],
        [0.9768, 0.6655],
        [0.3058, 0.8328],
        [0.9598, 0.8347],
        [0.1182, 0.8505],
        [0.7721, 0.7770],
        [0.5142, 0.3316],
        [0.3090, 0.8752]])
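
The element-count rule noted in the reshape comment can be checked directly. This is a small sketch using a fresh 4x5 tensor built with `torch.arange` rather than the random tensor above; the flag `failed` is only there to record that the invalid reshape was rejected.

```python
import torch

t = torch.arange(20).reshape(4, 5)  # 20 elements, the same count as the 4x5 tensor above

# -1 asks PyTorch to infer that dimension from the total element count.
print(t.reshape(-1, 2).shape)       # torch.Size([10, 2])

# A target shape with a different element count is rejected.
failed = False
try:
    t.reshape(3, 7)                 # 3 * 7 = 21 != 20
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```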


In [20]:
# Squeezing
print(torch.zeros(1, 2, 1, 4, 1))
print(torch.zeros(1, 2, 1, 4, 1).squeeze(2))

tensor([[[[[0.],
           [0.],
           [0.],
           [0.]]],


         [[[0.],
           [0.],
           [0.],
           [0.]]]]])
tensor([[[[0.],
          [0.],
          [0.],
          [0.]],

         [[0.],
          [0.],
          [0.],
          [0.]]]])
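
Printing shapes instead of values makes the effect of `squeeze` easier to follow. The sketch below reuses the same all-zeros tensor and also shows `squeeze()` without an argument and `unsqueeze`, which do not appear in the cell above.

```python
import torch

z = torch.zeros(1, 2, 1, 4, 1)

print(z.squeeze(2).shape)              # torch.Size([1, 2, 4, 1]): only dim 2 is removed
print(z.squeeze().shape)               # torch.Size([2, 4]): every size-1 dim is removed
print(z.squeeze(1).shape)              # torch.Size([1, 2, 1, 4, 1]): dim 1 has size 2, so nothing changes
print(z.squeeze().unsqueeze(0).shape)  # torch.Size([1, 2, 4]): unsqueeze adds a size-1 dim back
```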


---

In [31]:
images = os.listdir("input/ch12/")
images

['cat-01.jpg',
 'cat-02.jpg',
 'cat-03.jpg',
 'dog-01.jpg',
 'dog-02.jpg',
 'dog-03.jpg']
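
One possible next step with this file list is to derive labels from the file names and wrap both in a `Dataset`. The sketch below is an assumption about how that could look, not the notebook's code: `FileLabelDataset` is a hypothetical helper, the 0 = cat / 1 = dog convention is chosen here, and the images themselves are not opened, so it runs without the `input/ch12/` folder.

```python
import os
from torch.utils.data import DataLoader, Dataset

# File names copied from the listing above; no files are read in this sketch.
file_list = ["cat-01.jpg", "cat-02.jpg", "cat-03.jpg",
             "dog-01.jpg", "dog-02.jpg", "dog-03.jpg"]

# Assumed labeling convention: 1 for dog, 0 for cat, read from the file name.
labels = [1 if "dog" in os.path.basename(f) else 0 for f in file_list]


class FileLabelDataset(Dataset):
    """Hypothetical helper: pairs each file name with its label (no image decoding)."""

    def __init__(self, file_list, labels):
        self.file_list = file_list
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.file_list[index], self.labels[index]


dataset = FileLabelDataset(file_list, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

for names, batch_labels in loader:
    print(names, batch_labels)
```

To load pixel data, `__getitem__` could open each file with `PIL.Image.open` and convert it to a tensor before returning it.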

- Implement a project with MNIST using PyTorch. Check whether there are ways to do this without the class definitions used in the book. Do everything with k-fold cross-validation.
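
A rough sketch of what such a project could look like, under several assumptions: random tensors stand in for MNIST so the code runs without downloading anything, the model is built with `nn.Sequential` instead of a custom `nn.Module` subclass, and the folds are made by hand with `torch.randperm` and `torch.chunk`. The layer sizes, learning rate, and epoch count are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Synthetic stand-in for MNIST (assumption): 200 random 28x28 "images", 10 classes.
X = torch.rand(200, 28 * 28)
y = torch.randint(0, 10, (200,))

k = 5
indices = torch.randperm(len(X))
folds = torch.chunk(indices, k)  # k disjoint blocks of shuffled indices

fold_accuracies = []
for i in range(k):
    valid_idx = folds[i]
    train_idx = torch.cat([f for j, f in enumerate(folds) if j != i])

    # A fresh model for every fold, defined without writing a class.
    model = nn.Sequential(
        nn.Linear(28 * 28, 32),
        nn.ReLU(),
        nn.Linear(32, 10),
    )
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(5):  # full-batch updates keep the sketch short
        optimizer.zero_grad()
        loss = loss_fn(model(X[train_idx]), y[train_idx])
        loss.backward()
        optimizer.step()

    # Accuracy on the held-out fold.
    with torch.no_grad():
        preds = model(X[valid_idx]).argmax(dim=1)
        acc = (preds == y[valid_idx]).float().mean().item()
    fold_accuracies.append(acc)

print(fold_accuracies)
print(sum(fold_accuracies) / k)
```

With random labels the accuracy should hover around chance; the point is the structure of the loop. Swapping in the real MNIST tensors for `X` and `y` would leave the cross-validation logic unchanged.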