### 4 - Introduction

In this chapter, the author introduces a few data manipulation techniques.  These skills are necessary for anyone entering data science or machine learning.  Using PyTorch instead of MXNet, I reproduced the code and results shown in the book.

Linear algebra is the foundation of machine learning.  It gives us a robust set of techniques for working with tabular data.  Matrix operations are at the core of machine learning: algorithms such as backpropagation are used to optimize a model's parameters so that it fits the data as well as possible, and determining in which direction to adjust those parameters requires a little bit of calculus.


### 4.1 Data Manipulation

Generally, data manipulation has two core tasks:
- Acquire data
- Process data

PyTorch tensors are the tool we use to store our data.  They provide a few key advantages.  First, they can live on either the CPU or the GPU, and GPU operations run asynchronously.  Second, tensors support automatic differentiation, which is necessary for backpropagation and deep learning.
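The two advantages mentioned above can be sketched in a few lines. This is only a minimal illustration: the values in `x`, the function `y = sum(x^2)`, and the name `x_dev` are chosen for the example and do not come from the book.

```python
import torch

# A small tensor that tracks gradients (values chosen only for illustration).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar function y = sum(x_i^2); its gradient is dy/dx_i = 2 * x_i.
y = (x ** 2).sum()
y.backward()

print(x.grad)  # gradient of y with respect to x

# Pick a GPU when one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_dev = x.detach().to(device)
print(x_dev.device)
```

The same code runs unchanged on a machine without a GPU, because the device is selected at run time.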

#### 4.1.1 Getting Started

This chapter focuses on getting you up and running with the basic functionality.  The next two chapters concentrate more on the math behind element-wise operations, normal distributions, and other essential operations.  Section 17.2 offers more in-depth mathematical content.

First, we need to import the torch module.  PyTorch's central data structure is the tensor, a numerical array.  It is similar to MXNet's NDArray and can be stored on the CPU or the GPU.  A tensor with one axis corresponds to a vector, a tensor with two axes corresponds to a matrix, and tensors with more than two axes don't have special names.

In [33]:
import torch
import numpy as np
x = torch.arange(12)
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

The variable x contains a one-dimensional tensor of length 12.  We can get the shape of a tensor through its .shape property.  For a two-dimensional tensor, the shape is a torch.Size holding two numbers.

In [6]:
x.shape

torch.Size([12])
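To see what the shape looks like with more than one axis, here is a small sketch on a 3 x 4 matrix (the name `m` is only for this example):

```python
import torch

m = torch.arange(12).reshape(3, 4)

print(m.shape)         # torch.Size with one entry per axis
print(m.shape[0])      # number of rows
print(m.dim())         # number of axes
print(m.numel())       # total number of elements
print(tuple(m.shape))  # torch.Size behaves like a regular tuple
```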

The reshape function allows us to change the shape of a tensor.  You can transform the one-dimensional tensor x into a matrix with shape (3, 4).  If you don't want to work out every dimension yourself, you can pass -1 for one dimension and write the others; PyTorch infers the missing size from the total number of elements.

In [7]:
print(x.reshape((3,4)))
print(x.reshape((-1,4)))
print(x.reshape((3,-1)))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
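The inference behind -1 only works when the remaining dimensions divide the number of elements evenly. A short sketch of both cases, using shapes picked for illustration:

```python
import torch

x = torch.arange(12)

# 12 elements / 2 rows -> 6 columns are inferred for the -1 slot.
print(x.reshape(2, -1).shape)

# 12 elements over 2 x 3 leading axes -> the last axis is inferred as 2.
print(x.reshape(2, 3, -1).shape)

# 12 is not divisible by 5, so no size fits the -1 slot.
try:
    x.reshape(-1, 5)
    failed = False
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```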


***PyTorch*** allows us to initialize tensors in multiple ways.  For example, you can fill a tensor with ones, fill it with zeros, or take whatever values already sit in memory without initializing them.  The last method is faster because it skips initialization, but the contents are arbitrary and not useful as data.  A run may show leftover values such as [4.5710e-41, -2.4891e-37,  3.0899e-41].

In [8]:
print(torch.ones(5))
print(torch.zeros(5))
print(torch.empty(5))

tensor([1., 1., 1., 1., 1.])
tensor([0., 0., 0., 0., 0.])
tensor([0.0000e+00, 0.0000e+00, 7.0065e-45, 0.0000e+00, 8.9683e-44])
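A few more initializers, shown as a hedged sketch: the shapes and the fill values (7.0 and 0.5) are arbitrary choices for this example.

```python
import torch

print(torch.zeros(2, 3))                    # 2 x 3 matrix of zeros
print(torch.ones(2, 3, dtype=torch.int64))  # explicit integer dtype
print(torch.full((2, 3), 7.0))              # every entry set to 7.0
print(torch.tensor([[1, 2], [3, 4]]))       # from a nested Python list

# torch.empty only reserves memory, so overwrite it before reading it.
buf = torch.empty(5)
buf.fill_(0.5)
print(buf)
```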


We can randomly sample numbers from known distributions, such as the Gaussian or the exponential.

In [9]:
normal = torch.distributions.normal.Normal(torch.tensor([0.0]),torch.tensor([1.0]))
exponential = torch.distributions.exponential.Exponential(torch.tensor([1.0]))
sample_n = [normal.sample() for _ in range(5)]
sample_e = [exponential.sample() for _ in range(5)]
print(sample_n)
print(sample_e)

[tensor([1.4635]), tensor([-0.7932]), tensor([-0.9193]), tensor([1.2948]), tensor([0.3475])]
[tensor([0.0188]), tensor([0.8526]), tensor([0.6789]), tensor([1.2094]), tensor([0.0985])]


In the code snippet above, we drew five samples from a normal distribution with mean 0 and standard deviation 1 ($\mathcal{N}(0,1)$) and five samples from an exponential distribution with rate 1 ($\lambda = 1$).
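The five draws can also be requested in a single call by passing a sample shape, which avoids the Python loop. This is a sketch under the same parameters as above; the seed value is arbitrary and only makes the run repeatable.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable

normal = torch.distributions.normal.Normal(torch.tensor([0.0]), torch.tensor([1.0]))
exponential = torch.distributions.exponential.Exponential(torch.tensor([1.0]))

# A sample shape of (5,) stacks five draws; the trailing axis of size 1 comes
# from the one-element parameters passed to the distributions.
sample_n = normal.sample((5,))
sample_e = exponential.sample((5,))

print(sample_n.shape, sample_e.shape)
print(sample_n.squeeze(-1))
print(sample_e.squeeze(-1))
```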

#### 4.1.2 Operations

In [10]:
x = torch.arange(4)
y = torch.ones_like(x) * 2
print("x = ",x)
print("y = ",y)
print("x + y = ", x + y)
print("x - y = ", x - y)
print("x * y = ", x * y)
print("x / y = ", x / y)

x =  tensor([0, 1, 2, 3])
y =  tensor([2, 2, 2, 2])
x + y =  tensor([2, 3, 4, 5])
x - y =  tensor([-2, -1,  0,  1])
x * y =  tensor([0, 2, 4, 6])
x / y =  tensor([0, 0, 1, 1])


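Operations in PyTorch are element-wise.  But if you pay attention to the division, the result doesn't look right.  That is because x and y are Long tensors, and in the PyTorch version used here dividing two Long tensors performs integer division (newer PyTorch releases return a floating-point result instead).  To keep the fractional part, convert the tensors to a floating-point type: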

In [11]:
x = torch.arange(4)
y = torch.ones_like(x) * 2

# Transformation
x = x.double()
y = y.double()

print("x = ",x)
print("y = ",y)
print("x + y = ", x + y)
print("x - y = ", x - y)
print("x * y = ", x * y)
print("x / y = ", x / y)

x =  tensor([0., 1., 2., 3.], dtype=torch.float64)
y =  tensor([2., 2., 2., 2.], dtype=torch.float64)
x + y =  tensor([2., 3., 4., 5.], dtype=torch.float64)
x - y =  tensor([-2., -1.,  0.,  1.], dtype=torch.float64)
x * y =  tensor([0., 2., 4., 6.], dtype=torch.float64)
x / y =  tensor([0.0000, 0.5000, 1.0000, 1.5000], dtype=torch.float64)
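Because the behaviour of `/` on integer tensors depends on the PyTorch version, it can help to state the intended kind of division explicitly. The sketch below assumes a release where `torch.div` accepts `rounding_mode` (PyTorch 1.8 or later, as far as I know); the names `true_div` and `floor_div` are only for this example.

```python
import torch

x = torch.arange(4)
y = torch.ones_like(x) * 2

# True division: convert to floating point first, so the result does not
# depend on how a given PyTorch version treats integer operands.
true_div = x.double() / y.double()

# Floor division: rounding_mode is assumed to be available (PyTorch >= 1.8).
floor_div = torch.div(x, y, rounding_mode="floor")

print(true_div)
print(floor_div)
```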


Many operations can be applied element-wise, such as exponentiation:

In [12]:
x = torch.arange(4)
x = x.double()
print(x.exp())
print(torch.exp(x))

tensor([ 1.0000,  2.7183,  7.3891, 20.0855], dtype=torch.float64)
tensor([ 1.0000,  2.7183,  7.3891, 20.0855], dtype=torch.float64)


***torch.mm*** lets us perform matrix multiplication.  In the next code snippet, we create two matrices and transpose the second one in order to multiply x by the transpose of y.  x has shape (3, 4) and the transpose of y has shape (4, 3), so the product is a matrix with shape (3, 3):

In [13]:
x = torch.arange(12).reshape((3,4))
y = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print(x)
print(y.t())
print(torch.mm(x,y.t()))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[2, 1, 4],
        [1, 2, 3],
        [4, 3, 2],
        [3, 4, 1]])
tensor([[ 18,  20,  10],
        [ 58,  60,  50],
        [ 98, 100,  90]])
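For two-dimensional inputs, the same product can also be written with `torch.matmul` or the `@` operator. The sketch below checks that the three spellings agree and recomputes one entry by hand; the variable names are only for this example.

```python
import torch

x = torch.arange(12).reshape((3, 4))
y = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

via_mm = torch.mm(x, y.t())
via_matmul = torch.matmul(x, y.t())
via_operator = x @ y.t()

print(via_operator)

# Entry (i, j) is the dot product of row i of x with row j of y.
print((x[0] * y[1]).sum())  # matches via_mm[0, 1]
```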


Other operations:

In [14]:
print(torch.cat((x,y),0)) # Concatenation along axis 0 (rows)
print(torch.cat((x,y),1)) # Concatenation along axis 1 (columns)
print(x == y) # Element-wise comparison: 1 where x(i,j) == y(i,j), else 0
print(x.sum())
print(x.double().norm().item()) # norm() only works for floating-point types

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [ 2,  1,  4,  3],
        [ 1,  2,  3,  4],
        [ 4,  3,  2,  1]])
tensor([[ 0,  1,  2,  3,  2,  1,  4,  3],
        [ 4,  5,  6,  7,  1,  2,  3,  4],
        [ 8,  9, 10, 11,  4,  3,  2,  1]])
tensor([[0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.uint8)
tensor(66)
22.494443758403985


The .item() method converts a one-element tensor into a Python scalar.
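Reductions such as `sum` can also run along a single axis, and `.item()` then applies only to the fully reduced result. A small sketch on the same 3 x 4 matrix:

```python
import torch

x = torch.arange(12).reshape((3, 4))

print(x.sum(dim=0))  # collapse the rows: one sum per column
print(x.sum(dim=1))  # collapse the columns: one sum per row

total = x.sum().item()           # plain Python int
norm = x.double().norm().item()  # plain Python float
print(type(total), type(norm))
```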

#### 4.1.3 Broadcast Mechanism

In [15]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
print(a,a.shape)
print(b,b.shape)

tensor([[0],
        [1],
        [2]]) torch.Size([3, 1])
tensor([[0, 1]]) torch.Size([1, 2])


In [16]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])

The broadcast mechanism is similar to NumPy's.  First, the elements are replicated along rows and columns so that the two tensors have the same shape, and then the operation is applied element by element.  Above, PyTorch replicates the column of tensor a and the row of tensor b, so that both behave like (3, 2) matrices.
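The replication step can be made explicit with `expand`, which repeats a size-1 axis. This sketch rebuilds the two (3, 2) operands by hand; the names `a_full` and `b_full` are only for this example.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# expand repeats a size-1 axis without copying the underlying data.
a_full = a.expand(3, 2)
b_full = b.expand(3, 2)

print(a_full)
print(b_full)
print(a_full + b_full)
```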

#### 4.1.4 Indexing and slicing

As with a Python list, elements of a tensor can be accessed by index.  For example, x[0:3] selects from the first index up to, but not including, the last one, so items [0, 1, 2] are chosen.

In [17]:
x = torch.arange(12)
x[0:3]

tensor([0, 1, 2])

In [18]:
x.reshape(-1,4)[1:3] # Matrix: Selects second and third row.

tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [19]:
x.reshape(-1,4)[1:3,0:2] # Matrix: Selects second and third row and first and second column

tensor([[4, 5],
        [8, 9]])

In [20]:
x_diff = x.reshape(-1,4)
x_diff[1,2] = -1
print(x_diff)
x_diff[1,:] = -2
print(x_diff) # The whole second row (all columns) is set to -2

tensor([[ 0,  1,  2,  3],
        [ 4,  5, -1,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  1,  2,  3],
        [-2, -2, -2, -2],
        [ 8,  9, 10, 11]])


As shown above, we can also write to elements of a matrix, either one at a time or a whole slice at once.
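A few more indexing forms, plus one point that is easy to miss when writing through a reshaped tensor: for a contiguous tensor, `reshape` returns a view, so the write also changes the original. This is a minimal sketch; the names `view` and `copy` are only for this example.

```python
import torch

x = torch.arange(12)

print(x[-1])     # last element
print(x[::2])    # every second element
print(x[x > 8])  # boolean mask: elements greater than 8

# For a contiguous tensor like this one, reshape returns a view, so a write
# through the view is visible in the original tensor as well.
view = x.reshape(-1, 4)
view[1, 2] = -1
print(x[6])

# clone() makes an independent copy when the original must stay untouched.
copy = x.clone().reshape(-1, 4)
copy[0, 0] = 100
print(x[0])
```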

#### 4.1.5 Saving memory

Saving memory is useful when memory is limited.  In the operations we ran so far in this notebook, we always allocated new memory to hold the result.  In the example below, y = x + y, the name y points to a different tensor after the assignment.  Python's id() function gives us the identity of the referenced object (in CPython, its address in memory).  Python first evaluates y + x, allocates new memory for the result, and then redirects y to point at this new location in memory.

In [21]:
before = id(y)
y = y.reshape(12) + x
id(y) == before

False

Using in-place operations, we can reuse the same block of memory to store our results and avoid unnecessary allocations.  Using zeros_like, we create a tensor with the same shape as another one, filled with zeros.

In [22]:
z = torch.zeros_like(y)
print('id(z):', id(z))
z[:] = x + y
print('id(z):', id(z))

id(z): 140120981356120
id(z): 140120981356120


We can make even better use of memory: x + y above still allocates a temporary buffer for the sum before copying it into z.  Calling the torch operation directly with the out argument avoids that temporary buffer.

In [23]:
before = id(z)
torch.add(x, y, out=z)
id(z) == before

True
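Since `id()` identifies the Python object rather than the tensor's data, a complementary check is `data_ptr()`, which reports the address of the tensor's first element. A hedged sketch, with fresh tensors so it runs on its own:

```python
import torch

x = torch.arange(4)
y = torch.arange(4)
z = torch.zeros_like(y)

ptr_before = z.data_ptr()  # address of the first element of z's storage

z[:] = x + y               # writes into z, but x + y is a temporary
torch.add(x, y, out=z)     # writes straight into z

print(z.data_ptr() == ptr_before)

w = x + y                  # a fresh tensor with its own storage
print(w.data_ptr() == ptr_before)
```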

In-place operations can also be written as:

In [32]:
x = torch.arange(4)
y = torch.arange(4)
print("id(x):", id(x))
x += y
print(x,"id(x):",id(x))
x = torch.arange(4)
y = torch.arange(4)
print("id(x):",id(x))
x.add_(y)
print(x,"id(x):",id(x))

id(x): 140120981561848
tensor([0, 2, 4, 6]) id(x): 140120981561848
id(x): 140121834222216
tensor([0, 2, 4, 6]) id(x): 140121834222216


#### !!! Careful !!! 

In-place operations combined with autograd are ***dangerous!*** <br>
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
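One case of this warning can be reproduced directly: autograd rejects an in-place update of a leaf tensor that requires gradients. The sketch below is only an illustration of that single case (the tensor `w` and the update value are made up), not a full account of the linked note.

```python
import torch

w = torch.ones(3, requires_grad=True)  # a leaf tensor tracked by autograd

# Modifying a tracked leaf tensor in place is rejected by autograd.
try:
    w.add_(1.0)
    raised = False
except RuntimeError as err:
    raised = True
    print("autograd refused the update:", err)

# Inside no_grad the same update is allowed, which is how a manual
# parameter update is usually written.
with torch.no_grad():
    w.add_(1.0)
print(w)
```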

#### 4.1.6 Mutual Transformation of Tensor and NumPy

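This last subsection is a short example of converting a NumPy array to a tensor and vice versa.  In PyTorch, the conversions used here (torch.from_numpy and .numpy() on a CPU tensor) return objects that share memory with the original, so a change on one side is visible on the other.  Call .clone() on the tensor or .copy() on the array if you need an independent copy.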


In [38]:
x = torch.arange(5)
y = np.arange(5)
n_x = x.numpy()            # CPU tensor -> NumPy array
t_x = torch.from_numpy(y)  # NumPy array -> tensor
print(type(n_x))
print(type(t_x))


<class 'numpy.ndarray'>
<class 'torch.Tensor'>
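Whether the converted objects share memory is easy to check. As far as I know, `torch.from_numpy` and `.numpy()` on a CPU tensor both return objects backed by the same memory, which the following sketch verifies; the variable names are only for this example.

```python
import numpy as np
import torch

arr = np.arange(5)
t = torch.from_numpy(arr)  # shares memory with arr

arr[0] = 100               # change the NumPy side...
print(t)                   # ...and the tensor sees it

t2 = torch.arange(5)
n2 = t2.numpy()            # shares memory with t2 (CPU tensors only)
t2[1] = -1
print(n2)

independent = torch.from_numpy(arr).clone()  # explicit copy
arr[0] = 0
print(independent[0])
```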


#### 4.1.7 Exercises

#### a)

In [43]:
x = torch.arange(12).reshape((3,4))
y = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print(x)
print(y)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])


In [44]:
x == y

tensor([[0, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.uint8)

In [41]:
x < y

tensor([[1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]], dtype=torch.uint8)

In [45]:
x > y

tensor([[0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1]], dtype=torch.uint8)

#### b)

In [51]:
a = torch.arange(6).reshape((2,3, 1))
b = torch.arange(2).reshape((1, 2))
print(a)
print(b)

tensor([[[0],
         [1],
         [2]],

        [[3],
         [4],
         [5]]])
tensor([[0, 1]])


In [52]:
a + b

tensor([[[0, 1],
         [1, 2],
         [2, 3]],

        [[3, 4],
         [4, 5],
         [5, 6]]])

Another situation:

In [81]:
x = torch.empty(5,5,2,1)
y = torch.empty(    3,1)

In [82]:
x + y

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 2

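Broadcasting has to respect the following rule.  Comparing the two shapes from the last dimension backwards, each pair of corresponding dimensions must satisfy one of these conditions:
- The two dimensions are equal, or
- One of the two dimensions is 1 (a dimension missing from the shorter shape counts as 1)

In the case above, 2 $\neq$ 3 and neither value is 1, so the operation is impossible!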
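The rule can be written out as a small pure-Python check. The helper `broadcast_shape` is hypothetical (it is not a PyTorch function) and ignores edge cases such as zero-sized dimensions.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape, or None if the shapes are incompatible."""
    result = []
    # Walk both shapes from the last axis backwards; a missing axis counts as 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((5, 5, 2, 1), (3, 1)))
print(broadcast_shape((2, 3, 1), (1, 2)))
print(broadcast_shape((3, 1), (1, 2)))
```

The first call corresponds to the failing example above, the second to the (2, 3, 1) and (1, 2) tensors from this exercise.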

#### c)

In [104]:
a = torch.arange(12).reshape((3,4))
b = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
c = torch.ones(9).reshape((3,3)).long()
print(a)
print(b.t())
print(c)
out_mm = torch.zeros_like(c)
print("id(out_mm):",id(out_mm))
torch.mm(a, b.t(), out = out_mm)
torch.add(out_mm, c,  out = out_mm)
print(out_mm)
print("id(out_mm):",id(out_mm))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[2, 1, 4],
        [1, 2, 3],
        [4, 3, 2],
        [3, 4, 1]])
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])
id(out_mm): 140120977403120
tensor([[ 19,  21,  11],
        [ 59,  61,  51],
        [ 99, 101,  91]])
id(out_mm): 140120977403120
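A possible alternative is the fused in-place call `addmm_`, which accumulates the product into c itself. This is only a sketch: it uses floating-point copies of the matrices, which is my own choice for the example, and I have not measured whether it saves memory compared with the two-step version.

```python
import torch

# Floating-point copies of the matrices from the exercise.
a = torch.arange(12, dtype=torch.float32).reshape((3, 4))
b = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]], dtype=torch.float32)
c = torch.ones((3, 3))

ptr_before = c.data_ptr()
c.addmm_(a, b.t())  # c <- c + a @ b^T, written into c itself
print(c)
print(c.data_ptr() == ptr_before)
```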
