# Tensors and Neural Nets, a gentle introduction

## 1. What is a tensor

I didn't know very much about tensors, so I started digging for more, finding inspiration in some answers on Quora like the ones [written by William Oliver and Brian Bi](https://www.quora.com/What-is-a-tensor).

* Tensors **are not** generalizations or formalizations of vectors or matrices.

* Some tensors can be represented as 2D arrays, but these 2D arrays do not necessarily behave like matrices, and not every 2D array is a tensor. The numerical values in a matrix’s representation mean something entirely different from the numerical values in a tensor’s definition.

The fundamental definition of the dot product of two vectors x and y **is not** x1y1 + x2y2, **it is ||x|| ||y|| cos(θ)**. The former is just a convenient computational shortcut when working in Cartesian coordinates; the latter is the geometric definition, and **the dot product itself is an example of a tensor**.
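As a quick numerical check of the statement above, here is a minimal sketch that builds two 2D vectors from a length and an angle and compares the two formulas. The lengths and angles are arbitrary values chosen only for illustration.

```python
import math

# Two 2D vectors described geometrically: a length and an angle from the x axis.
# The numbers are arbitrary illustration values.
len_x, angle_x = 2.0, math.radians(20)
len_y, angle_y = 3.0, math.radians(80)

# Cartesian components of each vector
x = (len_x * math.cos(angle_x), len_x * math.sin(angle_x))
y = (len_y * math.cos(angle_y), len_y * math.sin(angle_y))

# Component formula (the Cartesian shortcut)
dot_components = x[0] * y[0] + x[1] * y[1]

# Geometric definition: ||x|| ||y|| cos(theta), theta being the angle between the vectors
theta = angle_y - angle_x
dot_geometric = len_x * len_y * math.cos(theta)

print(dot_components, dot_geometric)  # both are approximately 3.0
```

The geometric value does not depend on how the axes are oriented, which is why it is the more fundamental of the two.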



So one definition of a tensor: a tensor is any multilinear map that takes one or more vectors from a vector space and returns a scalar (from my understanding, the dot product, for instance, maps pairs of vectors to scalars and is linear in each of the two vectors).
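To make "multilinear" concrete, the sketch below checks that the dot product is linear in each argument separately. The helper names `dot` and `combine` and the sample vectors are made up for this illustration.

```python
def dot(u, v):
    """Dot product of two vectors given as tuples of Cartesian components."""
    return sum(ui * vi for ui, vi in zip(u, v))

def combine(a, u, v):
    """Return the vector a*u + v."""
    return tuple(a * ui + vi for ui, vi in zip(u, v))

u, v, w = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)
a = 2.5

# Linearity in the first argument: dot(a*u + v, w) == a*dot(u, w) + dot(v, w)
left_1 = dot(combine(a, u, v), w)
right_1 = a * dot(u, w) + dot(v, w)

# Linearity in the second argument: dot(w, a*u + v) == a*dot(w, u) + dot(w, v)
left_2 = dot(w, combine(a, u, v))
right_2 = a * dot(w, u) + dot(w, v)

print(left_1, right_1, left_2, right_2)  # all four values are 18.75
```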

Let's begin with a vector in the XOY plane (**a particular case**) and try to find its x and y components (I've followed [this great explanation](https://youtu.be/f5liqUk0ZTw)). For this, we need the projections of the vector onto each of the axes (its shadows on the x and y axes, or how far we need to go from the origin to the "tip" of our vector when travelling along the x and y axes).  

In this way, we end up representing our vector as a combination of the x and y unit basis vectors (x * i + y * j). Once the basis is known, the components alone are enough, so we can also write the vector as an array [ x, y ] (**but** keep in mind that these 2 components pertain only to the vector we started with and for which we've found the components).  

**For the general case**, taking the vector A, we will have the components Ax, Ay and Az, one for each basis vector (assuming this time an XYZ system of axes, i.e. three-dimensional space). We need only one index because each component carries only one directional indicator (one basis vector per component), and this is what makes vectors tensors of rank 1. Similarly, scalars are considered tensors of rank 0 because they don't have any directional indicators.  



**What about higher ranked tensors?**

A rank 2 tensor in 3D space has 9 components (each with two indices because of the two directional indicators), one for each of the 9 combinations of 2 basis vectors (xx, xy, xz, yx, yy, yz, zx, zy, zz).


A rank 3 tensor in 3D space is represented by 27 components (9 * 3 "slices") (each with three indices because of the three directional indicators), one for each of the 27 combinations of 3 basis vectors: xx**x**, xy**x**, xz**x**, yx**x**, yy**x**, yz**x**, zx**x**, zy**x**, zz**x**, and so on for the next components ("slices of tensors" is how I like to visualize them; one switches the third index to y or z to find the other 18).  
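The counting above (1, 3, 9, 27 components) can be reproduced by enumerating the combinations of basis directions. This is a minimal sketch; the function name `basis_combinations` is hypothetical, and it lists combinations with the last index varying fastest, which is a different order from the "slices" listed above but the same set.

```python
from itertools import product

axes = "xyz"  # the three basis directions of 3D space

def basis_combinations(rank):
    """All combinations of `rank` basis directions, e.g. 'xy' for rank 2."""
    return ["".join(combo) for combo in product(axes, repeat=rank)]

for rank in range(4):
    combos = basis_combinations(rank)
    print(rank, len(combos))
# 0 1
# 1 3
# 2 9
# 3 27

print(basis_combinations(2))
# ['xx', 'xy', 'xz', 'yx', 'yy', 'yz', 'zx', 'zy', 'zz']
```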

As Brian Bi also mentioned, **the tensor is the geometric object obtained by using the values in the multidimensional array as coefficients to form a linear combination of the unit basis tensors**. This implies that when you change your coordinate system (let's say to x′,y′,z′), the tensor’s components in the new system are obtained by expressing the old unit basis vectors in terms of the new ones. This then allows us to expand the tensor in terms of x′,y′,z′, giving its components in the new coordinate system.
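Here is a small sketch of what such a change of coordinates looks like numerically, assuming the new axes are obtained from the old ones by a rotation (the 30 degree angle and the sample components are arbitrary). The components change, but the scalar the tensor produces from two vectors does not.

```python
import numpy as np

# Orthogonal matrix relating the old axes to the rotated ones
# (rotation by 30 degrees about z, an arbitrary choice for illustration).
angle = np.radians(30)
R = np.array([[ np.cos(angle), np.sin(angle), 0.0],
              [-np.sin(angle), np.cos(angle), 0.0],
              [ 0.0,           0.0,           1.0]])

u = np.array([-1.0, 0.5, 2.0])        # rank 1 components in x, y, z
v = np.array([1.0, 2.0, 3.0])         # rank 1 components in x, y, z
T = np.arange(9.0).reshape(3, 3)      # rank 2 components in x, y, z

u_new = R @ u                         # one index -> one factor of R
v_new = R @ v
T_new = R @ T @ R.T                   # two indices -> two factors of R

# The scalar T(u, v) is the same in both coordinate systems
before = u @ T @ v
after = u_new @ T_new @ v_new
print(before, after)
print(np.allclose(T, T_new))          # False: the components themselves did change
```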


## 2. Tensors seen through Pytorch

In [59]:
import torch
import numpy as np
from numpy import array

In [9]:
# A few examples of tensors to understand their structure

# A 2D tensor with a single row (shape 1x3); it looks like a row vector
x = torch.randn((1, 3)) 

# A 2D tensor that looks like a matrix
y = torch.randn((2, 5)) 

# A 3D tensor: a stack of 3 tensors, each of size (2x3)
z = torch.randn((3, 2, 3)) 

# A scalar is a 0-rank tensor (its shape is empty) and it can easily be used in operations with other tensors.
# Note: torch.randn(0) would instead create an EMPTY 1D tensor of shape [0], so we pass the empty shape ()
t = torch.randn(())

print(x.size(), y.size(), z.size(), t.size())

torch.Size([1, 3]) torch.Size([2, 5]) torch.Size([3, 2, 3]) torch.Size([])
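A short sketch contrasting a true 0-dimensional tensor with an empty 1D tensor, since the two are easy to confuse. The variable names are only illustrative.

```python
import torch

scalar = torch.tensor(3.5)   # 0-dimensional tensor: one value, empty shape
empty = torch.randn(0)       # 1-dimensional tensor with zero elements

print(scalar.dim(), scalar.shape, scalar.numel())  # 0 torch.Size([]) 1
print(empty.dim(), empty.shape, empty.numel())     # 1 torch.Size([0]) 0

# A 0-dimensional tensor can be combined with a tensor of any shape
matrix = torch.ones(2, 5)
result = matrix + scalar
print(result.shape)  # torch.Size([2, 5])
```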


### 2.1 Breaking down the rule that element-wise operations operate on tensors of the same shape

Having the **same shape** is not quite the same as being **compatible** (as I see it).  

When operating on two tensors/arrays, PyTorch compares their shapes dimension by dimension. It starts with the trailing dimension (going from right to left) and works its way towards the leading one. Two dimensions are compatible if they:

- are equal  
**OR**
- one of them is 1

If these conditions are not met, an exception is thrown, indicating that the tensors/arrays have **incompatible** shapes.  

*However, when the shapes are compatible but not identical, the shape of the returned output tensor is determined by the **broadcasting conditions**.*

So, when doing an operation between a scalar (or a one-dimensional tensor) and another tensor with more dimensions, the scalar is "stretched" (or "copied", figuratively speaking) to fit the size of the "bigger" tensor involved in the operation, without actually creating a copy of that size (so extra memory is not allocated for the stretched operand).
This is the high-level view of broadcasting.
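The "stretching without copying" can be observed directly with NumPy's `np.broadcast_to`, which returns a read-only view. A minimal sketch: a stride of 0 along the stretched axis means that moving to the next row does not move in memory at all.

```python
import numpy as np

row = np.arange(3)                          # shape (3,)
stretched = np.broadcast_to(row, (4, 3))    # behaves like 4 stacked copies of row

print(stretched.shape)                      # (4, 3)
print(stretched.strides[0])                 # 0
print(np.shares_memory(row, stretched))     # True: same underlying buffer
```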

**How come?**

Subject to <strong>certain constraints</strong> (element-wise operations need operands whose shapes match one-to-one), the smaller array (the scalar in our case) is "broadcast" across the larger array (the matrix) so that they have compatible shapes (see [numpy documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)).



#### A . When broadcasting happens

Well... not all tensors are broadcastable; two tensors can be broadcast when:  

1. each tensor has at least one dimension (<em>so practically a tensor should be at least a 1D tensor, looking like a vector</em>);  

2. iterating over the dimension sizes, starting at the trailing dimension (<em>one can check backwards, comparing the dimensions pair by pair</em>), the dimension sizes are either equal, or one of them is 1, or one of them does not exist.  

See more details [in Pytorch's documentation](https://pytorch.org/docs/stable/notes/broadcasting.html) and also [in this interesting blogpost](https://mc.ai/broadcasting-with-pytorch/).
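The checking procedure described above can be written as a few lines of plain Python. This is only a sketch of the rule, not PyTorch's actual implementation, and the function name `broadcast_shape` is made up for the example.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for size_a, size_b in pairs:
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"incompatible sizes {size_a} and {size_b}")
    return tuple(reversed(result))

print(broadcast_shape((1, 2, 4, 1), (2, 1, 1)))  # (1, 2, 4, 1)
print(broadcast_shape((4, 1), (5,)))             # (4, 5)
```

Calling `broadcast_shape((1, 3, 4, 1), (2, 1, 1))` raises a `ValueError`, because 3 and 2 are different and neither of them is 1.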

#### B. To begin with, let's picture broadcasting with Numpy

I will play a bit with Numpy... because normally I would not add a 1D array to a 2D array and I wanted to see how it works.

The only requirement for broadcasting is a way of aligning the array dimensions such that, for every pair of aligned dimensions (check the documentation [here](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)), either:

* the **aligned dimensions** are equal;
* or one of the aligned dimensions is 1.

And btw...arrays **do not need** to have **the same number of dimensions.**

In [91]:
# add vector and scalar--> vector
a = np.arange(4) + 5
print(a)

# add a matrix(2 axes/2D array) and a vector (one axis/1D array)--> 2D array
b = np.ones((4,3)) + np.arange(3)
c = np.arange(4).reshape((4,1)) + np.arange(5)

print(b)
print(c)

# add 2 vectors (column vector + row vector results in a matrix!)
aa = np.array([0, 10, 20, 30])  # aa has shape (4,)
bb = np.array([1, 2, 3])        # bb has shape (3,)
# add an extra axis to aa, giving a column vector of shape (4,1); adding the row vector bb --> 2D array of shape (4,3)
print (aa[:, np.newaxis] + bb)

[5 6 7 8]
[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]
[[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]]
[[ 1  2  3]
 [11 12 13]
 [21 22 23]
 [31 32 33]]


#### C. Broadcasting with tensors, after checking the broadcasting conditions

When, iterating through the tensors' dimensions, we find a pair of sizes that are not equal (and one of them is 1), the size-1 dimension is stretched to the size of the other tensor's dimension (it's like taking the max of the two sizes being compared).

In [92]:
# 2D + 1D tensors--> 2D tensor
a = torch.tensor([[1, 2, 1],
                  [2, 3, 1]])
b = torch.ones(3)
print(a.size())
print(b.size())

(a+b).size()

torch.Size([2, 3])
torch.Size([3])


torch.Size([2, 3])

In [89]:
# They are broadcastable: 4D + 3D in this case
a_tensor = torch.randn(1,2,4,1)
b_tensor = torch.randn(2,1,1)

print((a_tensor + b_tensor).size())


torch.Size([1, 2, 4, 1])


In [90]:
# They are not broadcastable (because 3 != 2 in the third trailing dimension)
a_tensor_not = torch.randn(1,3,4,1)
b_tensor_not = torch.randn(2,1,1)

(a_tensor_not + b_tensor_not).size()

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1
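To see what broadcasting does to each operand before the addition, one option is `torch.broadcast_tensors`, which returns both inputs expanded to the common shape. The sketch below also catches the error for the incompatible pair instead of letting the cell fail; the `compatible` flag is just an illustrative variable.

```python
import torch

a = torch.randn(1, 2, 4, 1)
b = torch.randn(2, 1, 1)

# Both inputs are expanded (as views) to the common broadcast shape
a_expanded, b_expanded = torch.broadcast_tensors(a, b)
print(a_expanded.shape, b_expanded.shape)
# torch.Size([1, 2, 4, 1]) torch.Size([1, 2, 4, 1])

# Incompatible shapes raise a RuntimeError
try:
    torch.randn(1, 3, 4, 1) + torch.randn(2, 1, 1)
    compatible = True
except RuntimeError as error:
    compatible = False
    print("not broadcastable:", error)
```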

#### D. Broadcasting is possible with matmul but not with mm

Matmul's behaviour depends on the dimensionality of the tensors (the leading "batch" dimensions are broadcast when the broadcasting rules apply), while the mm function only performs the mathematical multiplication of two 2D matrices.

In [93]:
# Broadcasting situations in matmul function
test5 = torch.randn(1, 2, 4, 1)
test6 = torch.randn(2, 1, 1)
# print(test5)
# print(test6)
output1 = torch.matmul(test5, test6)
print(output1.shape)

test7 = torch.randn(1, 2, 3)
test8 = torch.randn( 3, 2)
output2 = torch.matmul(test7, test8)
print(output2.shape)


torch.Size([1, 2, 4, 1])
torch.Size([1, 2, 2])



Here is how I see a broadcasting case in slow motion:

1. [x] Both tensors are at least 1D;
2. [x] I check the dimensions of each tensor backwards (one dimension level at a time);  
In the above example, the last two dimensions, of sizes (4x1) and (1x1), are treated as matrices, and multiplying them results in tensors of size 4x1 (it makes more sense to me to compare the dimensions like this when it comes to multiplication);
3. [x] Then I move on to the next dimension (with the values 2 and 2, so the sizes are equal between the tensors);
4. [x] Then, for the 4th trailing dimension, one of the tensors doesn't have a 4th dimension, so it "borrows" it from the other one (where its size is 1). 



![Check for broadcasting case](Images/Capture_broadcasting_matmul.JPG "Another example of checking broadcasting")
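The steps above can be sketched as a small function that predicts the result shape of `torch.matmul` for inputs that are at least 2D: the last two dimensions follow the matrix multiplication rule, and everything in front of them is broadcast. The name `matmul_shape` is hypothetical, and the sketch assumes a PyTorch version that provides `torch.broadcast_shapes` (1.8 or newer).

```python
import torch

def matmul_shape(shape_a, shape_b):
    """Expected result shape of torch.matmul for inputs that are at least 2D."""
    # Leading ("batch") dimensions follow the usual broadcasting rules
    batch = torch.broadcast_shapes(shape_a[:-2], shape_b[:-2])
    # The last two dimensions follow the matrix multiplication rule
    rows, inner_a = shape_a[-2:]
    inner_b, cols = shape_b[-2:]
    if inner_a != inner_b:
        raise ValueError("inner matrix dimensions do not match")
    return tuple(batch) + (rows, cols)

for shape_a, shape_b in [((1, 2, 4, 1), (2, 1, 1)), ((1, 2, 3), (3, 2))]:
    actual = torch.matmul(torch.randn(shape_a), torch.randn(shape_b)).shape
    print(matmul_shape(shape_a, shape_b), tuple(actual))
# (1, 2, 4, 1) (1, 2, 4, 1)
# (1, 2, 2) (1, 2, 2)
```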


In [96]:
# The matmul function accepts operands with different numbers of dimensions: 2D x 1D gives a matrix-vector product
a1_tensor = torch.randn(2, 3)
b1_tensor = torch.randn(3)
torch.matmul(a1_tensor, b1_tensor).shape

torch.Size([2])

In [81]:
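# The mm function does not support broadcasting: both arguments must be 2D matrices
a = torch.randn(2, 3)
b = torch.randn(2)
# torch.mm(a, b) would already fail here, because b is 1D (and its size 2 does not match the 3 columns of a)

a1 = torch.randn(1, 2, 4, 1)
b1 = torch.randn(2, 1, 1)
torch.mm(a1, b1).shape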

RuntimeError: matrices expected, got 4D, 3D tensors at C:\w\1\s\tmp_conda_3.7_100118\conda\conda-bld\pytorch_1579082551706\work\aten\src\TH/generic/THTensorMath.cpp:131
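If one still wants to use `torch.mm` for a matrix-vector product, the vector has to be turned into a 2D column by hand. A minimal sketch, with illustrative variable names, comparing this with what `torch.matmul` does on its own:

```python
import torch

matrix = torch.randn(2, 3)
vector = torch.randn(3)

# torch.mm needs two 2D matrices, so turn the vector into a 3x1 column first
column = vector.unsqueeze(1)                 # shape (3, 1)
with_mm = torch.mm(matrix, column)           # shape (2, 1)

# torch.matmul handles the 1D operand itself
with_matmul = torch.matmul(matrix, vector)   # shape (2,)

print(with_mm.shape, with_matmul.shape)      # torch.Size([2, 1]) torch.Size([2])
print(torch.allclose(with_mm.squeeze(1), with_matmul))
```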

###  2.2  Visualizing the sum of the elements within a tensor's dimension
 
One can have even more details after reading [this good article](https://towardsdatascience.com/understanding-dimensions-in-pytorch-6edf9972d3be).

Depending on the dimension chosen for the operation (**assuming here the case of a 3D tensor**) we get:
-  the sum on dimension 0 (we could imagine this as a sum across all the "slices" of the tensor, collapsing the slices into one)    
-  the sum on dimension 1 (we could think of a sum down the rows of each slice, collapsing the rows of each slice)    
-  the sum on dimension 2 (we could associate it with a sum across the columns of each slice, collapsing the columns of each slice)  


In [42]:
# test4 is a 3D tensor (containing 3 other tensors, each one of size 2x3)
test4 = torch.tensor([
     [
       [1, 2, 3],
       [4, 5, 6]
     ],
     [
       [1, 2, 3],
       [3, 3, 3]
     ],
     [
       [1, 2, 3],
       [4, 5, 6]
     ]
   ])

# when summing on dimension 0
collapse_slices = torch.sum(test4, dim=0) 
print(collapse_slices)

# when summing on dimension 1
collapse_rows = torch.sum(test4, dim=1) 
print(collapse_rows)

# when summing on dimension 2
collapse_columns = torch.sum(test4, dim=2)
print(collapse_columns)

tensor([[ 3,  6,  9],
        [11, 13, 15]])
tensor([[5, 7, 9],
        [4, 5, 6],
        [5, 7, 9]])
tensor([[ 6, 15],
        [ 6,  9],
        [ 6, 15]])
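A compact way to remember the outputs above: summing over a dimension removes that dimension from the shape. The sketch below uses a tensor with the same shape as `test4` (its values are arbitrary) and also shows `keepdim=True`, which keeps the collapsed dimension with size 1.

```python
import torch

t = torch.arange(18).reshape(3, 2, 3)  # same shape as test4: 3 slices of size 2x3

for dim in range(t.dim()):
    collapsed = torch.sum(t, dim=dim)
    kept = torch.sum(t, dim=dim, keepdim=True)
    print(dim, tuple(collapsed.shape), tuple(kept.shape))
# 0 (2, 3) (1, 2, 3)
# 1 (3, 3) (3, 1, 3)
# 2 (3, 2) (3, 2, 1)
```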


## 3. Coming back to the calculation output 

### 3.1 For one layer 

In [60]:
def activation(x):
    """ Sigmoid activation function 
    
        Arguments
        ---------
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))
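As a sanity check, the hand-written sigmoid can be compared with PyTorch's built-in `torch.sigmoid`. This sketch repeats the definition of `activation` so that it runs on its own; the test points are arbitrary.

```python
import torch

def activation(x):
    """Sigmoid written by hand, as in the cell above."""
    return 1 / (1 + torch.exp(-x))

x = torch.linspace(-5, 5, steps=11)
by_hand = activation(x)
built_in = torch.sigmoid(x)

print(torch.allclose(by_hand, built_in))
print(activation(torch.tensor(0.0)))  # tensor(0.5000)
```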


<strong>Note</strong>  
There are a few ways of turning the weights into a column vector ([`source`](https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics)):
1. [`weights.reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape), which returns a tensor with the requested shape, sharing the data with `weights` when possible and copying it otherwise; 
2. [`weights.resize_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_), which is an in-place operation that directly changes the shape of the tensor itself;
3. or [`weights.view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view), which gives a new tensor that shares the same data as the input tensor. 

Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.
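A short sketch of the practical difference between `view` and `reshape`, using made-up values: a view shares its data with the original tensor, and `view` refuses to work on a non-contiguous tensor where `reshape` silently falls back to a copy.

```python
import torch

weights = torch.arange(5.0)          # shape (5,)
column = weights.view(5, 1)          # new shape, same underlying data
column[0, 0] = 100.0                 # writing through the view...
print(weights[0])                    # tensor(100.)  ...changes the original too
print(column.data_ptr() == weights.data_ptr())   # True

# A transposed tensor is not contiguous in memory
transposed = torch.arange(6).reshape(2, 3).t()   # shape (3, 2)
try:
    transposed.view(6)
    view_worked = True
except RuntimeError:
    view_worked = False

flat = transposed.reshape(6)         # reshape copies when a view is impossible
print(view_worked, flat)             # False tensor([0, 3, 1, 4, 2, 5])
```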

In [69]:
### Generate some data
data = torch.manual_seed(7) # Set the random seed so things are predictable
print(data)

# Features are 5 random normal variables
features = torch.randn((1, 5)) # a 1x5 matrix tensor, randomly drawn from N(0,1)
print(features)
print(features.size())

# True weights for our data, random normal variables again
weights = torch.randn_like(features)
print(weights)

# change the shape of the one-row weights tensor into a one-column tensor with the help of weights.view(desired shape)
print(weights.view(5,1))

# and a true bias term (a tensor as well)
bias = torch.randn((1, 1))
print(bias)

<torch._C.Generator object at 0x000001DBCD51A250>
tensor([[-0.1468,  0.7861,  0.9468, -1.1143,  1.6908]])
torch.Size([1, 5])
tensor([[-0.8948, -0.3556,  1.2324,  0.1382, -1.6822]])
tensor([[-0.8948],
        [-0.3556],
        [ 1.2324],
        [ 0.1382],
        [-1.6822]])
tensor([[0.3177]])


In [70]:
## Calculate the output of this network using the weights and bias tensors
## (the element-wise product followed by a sum works here because features and weights are both single-row vectors)
y1 = activation(torch.sum(features * weights) + bias)
# OR
y2 = activation((features * weights).sum() + bias)
# Calculate the output of a node in the network using matrix multiplication
y3 = activation(torch.mm(features, weights.view(5,1))+ bias)
# OR 
y33 = activation(torch.matmul(features, weights.view(5,1)) + bias)
# OR with Python's built-in sum(iterable, start), which here adds bias to the single row of the product
y333 = activation(sum(torch.matmul(features, weights.view(5,1)), bias))

# print(y3, y33, y333, y1, y2)
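The variants above should all give the same number. Here is a self-contained sketch that regenerates the data with the same seed and checks this with `torch.allclose`; the names `y_elementwise`, `y_mm` and `y_matmul` are only for this example.

```python
import torch

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

y_elementwise = activation(torch.sum(features * weights) + bias)
y_mm = activation(torch.mm(features, weights.view(5, 1)) + bias)
y_matmul = activation(torch.matmul(features, weights.view(5, 1)) + bias)

# All three results have shape (1, 1) and agree up to floating point rounding
print(y_elementwise.shape, y_mm.shape, y_matmul.shape)
print(torch.allclose(y_elementwise, y_mm), torch.allclose(y_mm, y_matmul))
```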

In [97]:
## Calculate the output of a node in the network using matrix multiplication with matmul,
# BUT keep in mind that matmul broadcasts (it allows operations between tensors of different sizes under some conditions, see section 2.1 D above)
tensor1 = torch.randn(2, 3, 4)
tensor2 = torch.randn(4,1)
print(tensor1, tensor2)
torch.matmul(tensor1, tensor2).size()
print(torch.matmul(tensor1, tensor2))


tensor([[[-0.1282, -0.0747, -0.9838, -0.2720],
         [ 0.8230, -0.8795,  1.9651,  1.7311],
         [ 0.2184, -0.5579, -0.6674, -0.2830]],

        [[-2.1402,  0.8374,  0.2073,  0.8716],
         [-0.9958,  1.7378, -1.0768, -1.2173],
         [-0.2837,  1.1372,  0.8003,  0.2198]]]) tensor([[-0.5145],
        [ 1.5680],
        [ 0.1112],
        [-0.0949]])
tensor([[[-0.1348],
         [-1.7482],
         [-1.0345]],

        [[ 2.3546],
         [ 3.2329],
         [ 1.9973]]])


### 3.2 When having one hidden layer with 2 nodes

In [60]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 3 random normal variables
features = torch.randn([1, 3]) # we have a 2D tensor
print(features.shape)

# Define the size of each layer in our network
n_input = features.shape[1]     # Number of input units, must match number of input features
print(n_input)

n_hidden = 2                    # Number of hidden units, we will have 2 nodes in the hidden layer
n_output = 1                    # Number of output units

# Weights for inputs to hidden layer
W1 = torch.randn(n_input, n_hidden)
print(W1.shape)
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output)
print(W2)

# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
print(B1)
B2 = torch.randn((1, n_output))

torch.Size([1, 3])
3
torch.Size([3, 2])
tensor([[-1.6822],
        [ 0.3177]])
tensor([[0.1328, 0.1373]])


**Note**:
1. h will be the hidden layer with 2 nodes;

2. h is the tensor resulting from the activation function applied to (features @ W1 + B1), where @ is the matrix product (torch.mm);

3. the output is the tensor resulting from the activation function applied to (h @ W2 + B2)

In [66]:
h = activation(torch.mm(features, W1) + B1)
print(h)
Y= activation(torch.mm(h, W2) + B2)
print(Y)

tensor([[0.6813, 0.4355]])
tensor([[0.3171]])
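The same two steps can be wrapped in a function, which also makes it easy to see that nothing changes when several samples are passed at once. This is a sketch with freshly generated random weights (so its values differ from the output above); the `forward` helper and the batch of 4 samples are assumptions made for the example.

```python
import torch

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

def forward(features, W1, B1, W2, B2):
    """Forward pass of a network with one hidden layer."""
    hidden = activation(torch.mm(features, W1) + B1)   # (batch, n_hidden)
    return activation(torch.mm(hidden, W2) + B2)       # (batch, n_output)

n_input, n_hidden, n_output = 3, 2, 1
batch = torch.randn(4, n_input)   # 4 samples instead of 1 (illustrative)
W1, B1 = torch.randn(n_input, n_hidden), torch.randn(1, n_hidden)
W2, B2 = torch.randn(n_hidden, n_output), torch.randn(1, n_output)

out = forward(batch, W1, B1, W2, B2)
print(out.shape)  # torch.Size([4, 1])
```

The bias tensors have a leading dimension of 1, so they are broadcast across the 4 rows of the batch.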


**Note**: `tensor.shape` and `tensor.size()` give the same information.  

Without an index or argument, both return the full shape of the tensor. With one, `tensor.shape[i]` and `tensor.size(i)` return the number of elements along dimension `i` (the minimum value of `i` is 0, meaning the first dimension, and the maximum value is the number of dimensions - 1).


In [104]:
t1 = torch.tensor(5)
print(t1, t1.shape) # the 0D tensor

t11= torch.tensor([5,6]) # a 1D tensor
print(t11, t11.shape)
print(t11.shape[0])

t12= torch.tensor([[5,6]]) # a 2D tensor of shape (1, 2)
print(t12, t12.shape)
print(t12.shape[0])

t13= torch.tensor((1,3)) # a 1D tensor: the tuple supplies the values 1 and 3, not a shape
print(t13, t13.shape)
print(t13.shape[0])


tensor(5) torch.Size([])
tensor([5, 6]) torch.Size([2])
2
tensor([[5, 6]]) torch.Size([1, 2])
1
tensor([1, 3]) torch.Size([2])
2


In [107]:
t2 = torch.tensor([[5, 4, 6, 7],[1, 5, 7, 9]])
print(t2, t2.shape)
print(t2.shape[0])

print(t2.shape[1])
# OR
print(t2.size(1))

t3 = torch.tensor([[10,2], [5,3], [11,5]])
print(t3, t3.shape) 
print(t3.shape[0]) 
print(t3.shape[1])


tensor([[5, 4, 6, 7],
        [1, 5, 7, 9]]) torch.Size([2, 4])
2
4
4
tensor([[10,  2],
        [ 5,  3],
        [11,  5]]) torch.Size([3, 2])
3
2


In [48]:
t4 = torch.tensor([[[10],[5]],[[4], [5]], [[5],[7]]]) # a 3D tensor
print(t4, t4.shape) #  is torch.Size([3, 2, 1])
print(t4.shape[0])
print(t4.shape[1])
print(t4.shape[2])

t = torch.unsqueeze(t4, 0) # a 4D tensor: unsqueeze inserts a new dimension of size 1 at the position given by the second argument
# t will be  torch.Size([1, 3, 2, 1]) 
print(t, t.shape) 
print(t.shape[0])
print(t.shape[1])
print(t.shape[2])
print(t.shape[3])

tensor([[[10],
         [ 5]],

        [[ 4],
         [ 5]],

        [[ 5],
         [ 7]]]) torch.Size([3, 2, 1])
3
2
1
tensor([[[[10],
          [ 5]],

         [[ 4],
          [ 5]],

         [[ 5],
          [ 7]]]]) torch.Size([1, 3, 2, 1])
1
3
2
1
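To see the effect of the second argument of `unsqueeze`, the sketch below inserts the new dimension at every possible position of a tensor shaped like `t4` (filled with zeros here, since only the shape matters) and then undoes it with `squeeze`.

```python
import torch

t4 = torch.zeros(3, 2, 1)
for position in range(t4.dim() + 1):
    print(position, tuple(torch.unsqueeze(t4, position).shape))
# 0 (1, 3, 2, 1)
# 1 (3, 1, 2, 1)
# 2 (3, 2, 1, 1)
# 3 (3, 2, 1, 1)

# squeeze removes a dimension of size 1 again
restored = torch.unsqueeze(t4, 0).squeeze(0)
print(tuple(restored.shape))  # (3, 2, 1)
```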


## 4. Switching between Numpy's arrays and Pytorch's tensors

In [72]:
import numpy as np
a = np.random.rand(4,3)
a

array([[0.78868662, 0.73321195, 0.47463454],
       [0.3379168 , 0.85028137, 0.99561204],
       [0.54409091, 0.66113114, 0.93360019],
       [0.25007664, 0.11012498, 0.82379173]])

In [73]:
b = torch.from_numpy(a) 
b # is the array a transformed into a tensor

tensor([[0.7887, 0.7332, 0.4746],
        [0.3379, 0.8503, 0.9956],
        [0.5441, 0.6611, 0.9336],
        [0.2501, 0.1101, 0.8238]], dtype=torch.float64)

In [76]:
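b.numpy() # with .numpy() we bring the tensor b back into a numpy array (the values shown are twice those of In [73], presumably because b.mul_(2) had already been run once in a cell that is not shown)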


array([[1.57737324, 1.46642389, 0.94926907],
       [0.67583361, 1.70056273, 1.99122408],
       [1.08818182, 1.32226228, 1.86720038],
       [0.50015328, 0.22024995, 1.64758346]])

In [77]:
b.mul_(2) # an in-place operation that changes the original tensor

tensor([[3.1547, 2.9328, 1.8985],
        [1.3517, 3.4011, 3.9824],
        [2.1764, 2.6445, 3.7344],
        [1.0003, 0.4405, 3.2952]], dtype=torch.float64)

In [78]:
a  # a has changed as well after the change of "its" tensor, because torch.from_numpy makes the tensor share memory with the array

array([[3.15474647, 2.93284778, 1.89853815],
       [1.35166722, 3.40112547, 3.98244815],
       [2.17636364, 2.64452456, 3.73440075],
       [1.00030656, 0.4404999 , 3.29516692]])
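The behaviour above comes from memory sharing: `torch.from_numpy` wraps the array's buffer instead of copying it. As a contrast, the sketch below (with a small made-up array) compares it with `torch.tensor`, which makes an independent copy.

```python
import numpy as np
import torch

a = np.ones((2, 3))
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # makes an independent copy

shared.mul_(2)                 # in-place: also visible through a
copied.mul_(10)                # in-place on the copy: a is untouched

print(a)                       # all entries are 2.0
print(np.shares_memory(a, shared.numpy()), np.shares_memory(a, copied.numpy()))
# True False
```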