# Tensor 
The tensor is the central data type in PyTorch (and in TensorFlow). It can be a scalar, a 1-D array, or an n-dimensional array. In many respects a tensor behaves like NumPy's "ndarray": it has the same kind of attributes, such as shape and dtype. The main difference between the two libraries is that tensor operations can be **accelerated** on a GPU.

In [1]:
import torch as t
import numpy as np

## Build up tensor
1. Using Tensor

In [2]:
# build a 5x3 matrix, no initialization
x = t.Tensor(5, 3)
# print the shape of tensor in form of tuple
print(x.shape)
# print the data type of the established tensor
print(x.dtype)

torch.Size([5, 3])
torch.float32


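A note on tensor types
We use the .type() method to get the type of an existing tensor. PyTorch uses FloatTensor as the default. We can call set_default_tensor_type to change the default type. In recent PyTorch releases this function is deprecated in favor of set_default_dtype and set_default_device.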

In [3]:
print(x.type())
t.set_default_tensor_type(t.DoubleTensor) # Only valid for newly constructed tensors, the old ones retain the original data format
print(t.tensor([1, 2.2]).type())

torch.FloatTensor
torch.DoubleTensor
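
The default can also be controlled through dtypes instead of tensor type classes. The following is a minimal sketch, not part of the original notebook. It assumes a PyTorch version that provides `get_default_dtype` and `set_default_dtype`, and it restores the previous default at the end so later cells are not affected.

```python
import torch as t

original = t.get_default_dtype()      # typically torch.float32

t.set_default_dtype(t.float64)        # affects floating point tensors created from now on
new_tensor = t.tensor([1, 2.2])
print(new_tensor.dtype)

t.set_default_dtype(original)         # restore the previous default
restored_tensor = t.tensor([1, 2.2])
print(restored_tensor.dtype)
```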


2. Using a distribution (random initialization)

In [4]:
t.set_default_tensor_type(t.FloatTensor)
# use uniform distribution 0~1 to build the 2-d tensor
x = t.rand(5, 3)
print("Tensor using uniform distribution 0~1:{ten}".format(ten=x))
print("Comparison between .size: {si}; and .shape: {sha}".format(si=x.size(), sha=x.shape)) # both return a tuple format, so we can use tuple operation

# create a 2x2 tensor of integers drawn uniformly from [1, 10); the upper bound 10 is excluded
x = t.randint(1, 10, [2, 2])
print("Tensor using uniform distribution with integer value:{ten}".format(ten=x))

# normal distribution N(0, 1), dimension as input
x = t.randn(3, 3)
print("Tensor using normal distribution N(0, 1): {ten}".format(ten=x))

# normal distribution with user-defined mean and standard deviation; the output shape follows the shapes of mean and std
x = t.normal(mean=t.full([10], 0, dtype=t.float), std=t.arange(1, 0, -0.1))
print("Tensor using normal distribution with self defined mean and standard deviation:{ten}".format(ten=x))

Tensor using uniform distribution 0~1:tensor([[0.3270, 0.6949, 0.2548],
        [0.2707, 0.1513, 0.1833],
        [0.4186, 0.6758, 0.1735],
        [0.0754, 0.7741, 0.0341],
        [0.0359, 0.2405, 0.1259]])
Comparison between .size: torch.Size([5, 3]); and .shape: torch.Size([5, 3])
Tensor using uniform distribution with integer value:tensor([[7, 5],
        [3, 9]])
Tensor using normal distribution N(0, 1): tensor([[-0.3397,  1.2896,  1.8661],
        [-0.5032, -0.5382,  0.5253],
        [-0.7063, -0.3635,  0.4330]])
Tensor using normal distribution with self defined mean and standard deviation:tensor([-0.1499, -0.8441,  0.4788, -1.6035,  0.5667,  0.4058,  1.0035, -0.0165,
        -0.0034, -0.1500])
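
Because these tensors are random, the printed values change on every run. If reproducible values are needed, the random generator can be seeded first. This is a small sketch; the seed value 0 is an arbitrary choice for illustration.

```python
import torch as t

t.manual_seed(0)          # the seed value 0 is an arbitrary choice
first = t.rand(2, 3)

t.manual_seed(0)          # resetting the seed replays the same random stream
second = t.rand(2, 3)

print(t.equal(first, second))
```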


3. Build a tensor from NumPy data or native Python data

In [5]:
# from numpy 
a = np.array([1, 1])
a_t = t.from_numpy(a)
print("Data type of a:{num}; Data type of a_t:{ten}".format(num=a.dtype, ten=a_t.dtype)) # as we see, that the a_t follows the basic type of origin numpy data

# from a Python list (not the preferred way to create a tensor)
b_ft = t.FloatTensor([2, 3.3])
b_t = t.tensor([2, 3.3]) 
print("Data type of b_ft:{num}; Data type of b_t:{ten}".format(num=b_ft.dtype, ten=b_t.dtype))

Data type of a:int32; Data type of a_t:torch.int32
Data type of b_ft:torch.float32; Data type of b_t:torch.float32
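
One detail worth knowing about `from_numpy`: the resulting tensor shares memory with the NumPy array, while `t.tensor` copies the data. The sketch below illustrates this with hypothetical variable names `shared` and `copied`.

```python
import numpy as np
import torch as t

a = np.array([1, 1])
shared = t.from_numpy(a)   # shares memory with a
copied = t.tensor(a)       # makes an independent copy

a[0] = 100                 # modify the NumPy array in place
print(shared)              # reflects the change
print(copied)              # still holds the original values
```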


4. Creation from dimensions only; the values are uninitialized (whatever happens to be in memory).

In [6]:
# create tensors by passing only the dimensions
a = t.empty(2, 3)
b = t.FloatTensor(2, 3)
c = t.IntTensor(2, 3)
a,b,c

(tensor([[2.4176e-12, 1.7740e+28, 7.1447e+31],
         [1.6216e-19, 7.0362e+22, 7.5632e+28]]),
 tensor([[0.0000e+00, 0.0000e+00, 6.7943e+22],
         [5.2669e-08, 6.6386e-07, 4.3126e-08]]),
 tensor([[0, 0, 0],
         [0, 0, 0]], dtype=torch.int32))
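
Since the contents of such tensors are arbitrary, they should be written before they are read. A minimal sketch of two common options, with illustrative names `buf` and `safe`:

```python
import torch as t

buf = t.empty(2, 3)        # contents are arbitrary until written
buf.fill_(0.5)             # in-place fill gives every element a defined value
print(buf)

safe = t.zeros(2, 3)       # alternative: allocate and initialize in one call
print(safe)
```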

5. Build a tensor filled with a single value

In [7]:
# full function, tensor data type is the same as the second input 
x = t.full([2, 3],7)
print("Create a tensor using full function: {ten} and its data type {data_t}".format(ten=x, data_t=x.dtype))

# all zeros tensor, data type follows the pytorch default type 
x = t.zeros([2, 4])
print("Create an all zeros tensor: {ten} and its data type {data_t}".format(ten=x, data_t=x.dtype))

# all ones tensor, data type follows the pytorch default type 
x = t.ones([2, 4])
print("Create an all ones tensor: {ten} and its data type {data_t}".format(ten=x, data_t=x.dtype))

# matrix with ones on the main diagonal, data type follows the PyTorch default type; the result is always a 2-D tensor
x = t.eye(2, 4)
print("Create a unit diagonal matrix tensor: {ten} and its data type {data_t}".format(ten=x, data_t=x.dtype))

Create a tensor using full function: tensor([[7, 7, 7],
        [7, 7, 7]]) and its data type torch.int64
Create an all zeros tensor: tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]]) and its data type torch.float32
Create an all ones tensor: tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]]) and its data type torch.float32
Create a unit diagonal matrix tensor: tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.]]) and its data type torch.float32
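
The data type of these factory functions can also be chosen explicitly with the dtype argument. The sketch below is an illustration with made-up variable names; the int64 result for an integer fill value matches the output shown above and may differ on very old PyTorch versions.

```python
import torch as t

int_full = t.full([2, 3], 7)                   # integer fill value
float_full = t.full([2, 3], 7.0)               # float fill value, uses the default float type
forced = t.full([2, 3], 7, dtype=t.float64)    # an explicit dtype overrides the inference
zeros_int = t.zeros([2, 4], dtype=t.int32)     # zeros with a non-default type

print(int_full.dtype, float_full.dtype, forced.dtype, zeros_int.dtype)
```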


## Shuffle a tensor 

In [8]:
a = t.rand(3, 1)
print(a)

# create a random permutation of the indices 0, 1, 2
idx = t.randperm(3)

a_sf = a[idx]
print(a_sf)

tensor([[0.3615],
        [0.6256],
        [0.7473]])
tensor([[0.6256],
        [0.3615],
        [0.7473]])
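
A typical use of this pattern is shuffling samples and their labels together. The sketch below assumes a toy data set (`features`, `labels`) invented for illustration; the point is that one permutation is reused for both tensors so the pairs stay aligned.

```python
import torch as t

features = t.arange(12, dtype=t.float32).reshape(4, 3)  # 4 samples, 3 features each
labels = features.sum(dim=1)                             # one label per sample

idx = t.randperm(features.shape[0])   # one permutation of the sample indices
features_sf = features[idx]
labels_sf = labels[idx]

# each shuffled label still matches its shuffled row
print(t.equal(features_sf.sum(dim=1), labels_sf))
```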


## Tensor operation
1. Addition

In [9]:
x = t.rand(5, 3)
y = t.rand(5, 3)
print(x,y,sep='\n')

x + y

tensor([[0.2915, 0.3833, 0.1934],
        [0.6083, 0.7394, 0.6485],
        [0.0284, 0.8289, 0.6710],
        [0.8073, 0.8999, 0.9897],
        [0.0389, 0.1359, 0.0736]])
tensor([[0.5958, 0.0376, 0.7749],
        [0.1711, 0.1967, 0.7865],
        [0.4026, 0.2271, 0.0474],
        [0.3513, 0.4790, 0.1851],
        [0.9638, 0.6838, 0.5275]])


tensor([[0.8873, 0.4208, 0.9683],
        [0.7794, 0.9361, 1.4350],
        [0.4310, 1.0560, 0.7185],
        [1.1586, 1.3788, 1.1748],
        [1.0027, 0.8197, 0.6011]])

In [10]:
result = t.add(x, y)
result

tensor([[0.8873, 0.4208, 0.9683],
        [0.7794, 0.9361, 1.4350],
        [0.4310, 1.0560, 0.7185],
        [1.1586, 1.3788, 1.1748],
        [1.0027, 0.8197, 0.6011]])

In [11]:
# object method addition
result = y.add(x)
print("Comparison between result: {res} \n and y: {yy}".format(res=result, yy=y)) # do not change the value of y 

# inplace addition
result = y.add_(x)
print("Comparison between result: {res} \n and y: {yy}".format(res=result, yy=y)) # the value of y is changed 

Comparison between result: tensor([[0.8873, 0.4208, 0.9683],
        [0.7794, 0.9361, 1.4350],
        [0.4310, 1.0560, 0.7185],
        [1.1586, 1.3788, 1.1748],
        [1.0027, 0.8197, 0.6011]]) 
 and y: tensor([[0.5958, 0.0376, 0.7749],
        [0.1711, 0.1967, 0.7865],
        [0.4026, 0.2271, 0.0474],
        [0.3513, 0.4790, 0.1851],
        [0.9638, 0.6838, 0.5275]])
Comparison between result: tensor([[0.8873, 0.4208, 0.9683],
        [0.7794, 0.9361, 1.4350],
        [0.4310, 1.0560, 0.7185],
        [1.1586, 1.3788, 1.1748],
        [1.0027, 0.8197, 0.6011]]) 
 and y: tensor([[0.8873, 0.4208, 0.9683],
        [0.7794, 0.9361, 1.4350],
        [0.4310, 1.0560, 0.7185],
        [1.1586, 1.3788, 1.1748],
        [1.0027, 0.8197, 0.6011]])


2. Subtraction

In [12]:
a = t.rand(3, 4)
b = t.rand(4)

# broadcasting: b has shape (4,) and is subtracted from every row of a
c1 = a - b
c2 = t.sub(a, b)

print(c1.shape, c2.shape)
print("a:= {a};\n b:= {b};\n c1:= {c1};\n c2:= {c2}".format(a=a, b=b, c1=c1, c2=c2))

torch.Size([3, 4]) torch.Size([3, 4])
a:= tensor([[0.9477, 0.8660, 0.9188, 0.4647],
        [0.5913, 0.2223, 0.5526, 0.9539],
        [0.8989, 0.9163, 0.8479, 0.5364]]);
 b:= tensor([0.5236, 0.2809, 0.0127, 0.2103]);
 c1:= tensor([[ 0.4241,  0.5850,  0.9061,  0.2544],
        [ 0.0677, -0.0586,  0.5400,  0.7436],
        [ 0.3753,  0.6353,  0.8352,  0.3261]]);
 c2:= tensor([[ 0.4241,  0.5850,  0.9061,  0.2544],
        [ 0.0677, -0.0586,  0.5400,  0.7436],
        [ 0.3753,  0.6353,  0.8352,  0.3261]])
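
To make the broadcasting step visible, b can be expanded to the shape of a by hand. This is a sketch for illustration only; it also shows that shapes which cannot be aligned raise an error.

```python
import torch as t

a = t.rand(3, 4)
b = t.rand(4)

b_expanded = b.expand_as(a)            # view of b with shape (3, 4), no data copied
print(b_expanded.shape)
print(t.allclose(a - b, a - b_expanded))

broadcast_failed = False
try:
    a - t.rand(3)                      # trailing dimensions 4 and 3 do not match
except RuntimeError:
    broadcast_failed = True
print(broadcast_failed)
```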


3. Element-wise multiplication

In [13]:
c1 = a * b
c2 = t.mul(a, b)
print("a:= {a};\n b:= {b};\n c1:= {c1};\n c2:= {c2}".format(a=a, b=b, c1=c1, c2=c2))

a:= tensor([[0.9477, 0.8660, 0.9188, 0.4647],
        [0.5913, 0.2223, 0.5526, 0.9539],
        [0.8989, 0.9163, 0.8479, 0.5364]]);
 b:= tensor([0.5236, 0.2809, 0.0127, 0.2103]);
 c1:= tensor([[0.4963, 0.2433, 0.0116, 0.0977],
        [0.3096, 0.0625, 0.0070, 0.2006],
        [0.4707, 0.2574, 0.0107, 0.1128]]);
 c2:= tensor([[0.4963, 0.2433, 0.0116, 0.0977],
        [0.3096, 0.0625, 0.0070, 0.2006],
        [0.4707, 0.2574, 0.0107, 0.1128]])


4. Element-wise division

In [14]:

c1 = a / b
c2 = t.div(a, b)
print(c1.shape, c2.shape)
print("a:= {a};\n b:= {b};\n c1:= {c1};\n c2:= {c2}".format(a=a, b=b, c1=c1, c2=c2))

torch.Size([3, 4]) torch.Size([3, 4])
a:= tensor([[0.9477, 0.8660, 0.9188, 0.4647],
        [0.5913, 0.2223, 0.5526, 0.9539],
        [0.8989, 0.9163, 0.8479, 0.5364]]);
 b:= tensor([0.5236, 0.2809, 0.0127, 0.2103]);
 c1:= tensor([[ 1.8099,  3.0822, 72.5329,  2.2100],
        [ 1.1292,  0.7914, 43.6273,  4.5367],
        [ 1.7167,  3.2614, 66.9348,  2.5512]]);
 c2:= tensor([[ 1.8099,  3.0822, 72.5329,  2.2100],
        [ 1.1292,  0.7914, 43.6273,  4.5367],
        [ 1.7167,  3.2614, 66.9348,  2.5512]])


5. Matrix multiplication

In [15]:
# 2D matrix multiplication

a = t.ones(2, 1)
b = t.ones(1, 2)
print(t.mm(a, b))
print(t.matmul(a, b))
print(a @ b)

# n-D matrix multiplication: only the last two dimensions are multiplied; the leading dimensions are treated as batch dimensions.

c = t.rand(4, 3, 28, 64)
d = t.rand(4, 3, 64, 32)
print(t.matmul(c, d).shape)

tensor([[1., 1.],
        [1., 1.]])
tensor([[1., 1.],
        [1., 1.]])
tensor([[1., 1.],
        [1., 1.]])
torch.Size([4, 3, 28, 32])
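
The batched behavior can be checked by multiplying a single pair of matrices by hand. In this sketch the batch position [1, 2] is an arbitrary choice.

```python
import torch as t

c = t.rand(4, 3, 28, 64)
d = t.rand(4, 3, 64, 32)
batched = t.matmul(c, d)

# multiply one (28, 64) x (64, 32) pair with mm and compare with the batched result
single = t.mm(c[1, 2], d[1, 2])
print(batched.shape, single.shape)
print(t.allclose(batched[1, 2], single, atol=1e-4))
```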


## Tensor on GPU

In [16]:
device = t.device("cuda:0" if t.cuda.is_available() else "cpu")
x = x.to(device)
y = y.to(x.device)
z = x+y
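
A slightly fuller, device-agnostic sketch of the same idea follows. It runs on the CPU when no GPU is available. The tensors are created here only for illustration; operands must live on the same device before they can be combined.

```python
import torch as t

device = t.device("cuda:0" if t.cuda.is_available() else "cpu")

x = t.rand(5, 3, device=device)   # allocated directly on the chosen device
y = t.rand(5, 3).to(device)       # created on the CPU, then moved
z = x + y                         # both operands live on the same device

z_np = z.cpu().numpy()            # move back to the CPU before converting to NumPy
print(z.device, z_np.shape)
```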

## Autograd
Deep learning algorithms essentially rely on backpropagation to compute derivatives, and PyTorch's autograd module implements this functionality. For operations on tensors, autograd provides the derivatives automatically, which avoids the tedious process of computing them by hand.

In [17]:
# overview the description of autograd
help(t.autograd)

Help on package torch.autograd in torch:

NAME
    torch.autograd

DESCRIPTION
    ``torch.autograd`` provides classes and functions implementing automatic
    differentiation of arbitrary scalar valued functions. It requires minimal
    changes to the existing code - you only need to declare :class:`Tensor` s
    for which gradients should be computed with the ``requires_grad=True`` keyword.
    As of now, we only support autograd for floating point :class:`Tensor` types (
    half, float, double and bfloat16) and complex :class:`Tensor` types (cfloat, cdouble).

PACKAGE CONTENTS
    _functions (package)
    anomaly_mode
    function
    functional
    grad_mode
    gradcheck
    profiler
    variable

CLASSES
    torch._C._FunctionBase(builtins.object)
        torch.autograd.function.Function(torch._C._FunctionBase, torch.autograd.function._ContextMethodMixin, torch.autograd.function._HookMixin)
    torch._C._LegacyVariableBase(builtins.object)
        torch.autograd.variable.Variabl

We use torch.autograd to compute gradients when a neural network is built with PyTorch. To get the gradient information, we follow these steps:
1. Build a computation graph from tensors created with requires_grad=True. The tensors form the nodes of the graph.
2. After all tensor computations are finished, call .backward() to compute the required gradients.
3. Read the .grad attribute of a tensor to get its gradient.
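
The three steps can be sketched on a single scalar. The function $w^2 + 2w$ and the value 3 are chosen only for illustration.

```python
import torch as t

# step 1: a leaf tensor that should receive a gradient
w = t.tensor(3.0, requires_grad=True)

# forward computation: loss = w^2 + 2w
loss = w ** 2 + 2 * w

# step 2: backpropagate from the scalar result
loss.backward()

# step 3: read the gradient, d(loss)/dw = 2w + 2 = 8 at w = 3
print(w.grad)
```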

In the context of gradients, a tensor has two important attributes, requires_grad and grad_fn. Both are related to how the tensor was created: directly by the user or by an operation.
1. requires_grad is a boolean. True means that the gradient of the tensor needs to be computed; False means the opposite. Its default value is False. It can be set when the tensor is created and changed later only on leaf tensors.
2. grad_fn tells whether the tensor is the result of an operation tracked by autograd, for example torch.matmul(), torch.mul() or torch.add(). If so, it holds the backward function of that operation; otherwise it is None.

To better understand "requires_grad" and "grad_fn", consider this example:

In [26]:
# build up the computation graph
x = t.randn(2, 2)
y = t.randn(2, 2)
z = t.randn(2, 2,requires_grad=True)
a = x + y
b = a + z

print(x.requires_grad, y.requires_grad, z.requires_grad, a.requires_grad, b.requires_grad)
print(x.grad_fn, y.grad_fn, z.grad_fn, a.grad_fn, b.grad_fn)

False False True False True
None None None None <AddBackward0 object at 0x0000023295860CD0>


![title](img/01.png)
The figure shows the computation graph. In PyTorch, x, y and z are called "leaf variables". The requires_grad values of x, y and z are simply what we set. Since b is computed from z, its requires_grad is also True. grad_fn is a bit special: it is set only on a tensor that results from an operation with at least one input that requires a gradient. For leaf tensors, and for results computed only from tensors that do not require gradients (such as a), grad_fn is None. If we set x's requires_grad to True and recompute a, a.grad_fn becomes <AddBackward0 object at ....>. See below:

In [28]:
x.requires_grad = True
a = x + y
print(x.requires_grad, y.requires_grad, z.requires_grad, a.requires_grad, b.requires_grad)
print(x.grad_fn, y.grad_fn, z.grad_fn, a.grad_fn, b.grad_fn)

True False True True True
None None None <AddBackward0 object at 0x00000232E83E9400> <AddBackward0 object at 0x00000232E83E95B0>


<font color=red>Pay attention! requires_grad can only be modified on leaf variables. If we write "a.requires_grad = True", PyTorch raises an error.</font>
![title](img/02.png)
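
Whether a tensor is a leaf can be checked with the is_leaf attribute. The sketch below is an illustration, not the notebook's own cell. It tries to switch requires_grad off on a non-leaf tensor, which is rejected, and shows detach() as one way to get a copy that is cut off from the graph.

```python
import torch as t

x = t.randn(2, 2, requires_grad=True)
y = t.randn(2, 2)
a = x + y                       # non-leaf: produced by an operation

print(x.is_leaf, y.is_leaf, a.is_leaf)

raised = False
try:
    a.requires_grad = False     # not allowed on a non-leaf tensor
except RuntimeError as err:
    raised = True
    print("error:", err)

a_detached = a.detach()         # same values, no longer part of the graph
print(a_detached.requires_grad, a_detached.grad_fn)
```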

Now let's see how gradients are computed in PyTorch.

In [69]:
# do forward propagation
x = t.tensor([[1.,2.,3.],[4.,5.,6.]], requires_grad=True)
y = x + 2
z = t.pow(y, 2) * 3
out = z.mean()
x, y, z, out

(tensor([[1., 2., 3.],
         [4., 5., 6.]], requires_grad=True),
 tensor([[3., 4., 5.],
         [6., 7., 8.]], grad_fn=<AddBackward0>),
 tensor([[ 27.,  48.,  75.],
         [108., 147., 192.]], grad_fn=<MulBackward0>),
 tensor(99.5000, grad_fn=<MeanBackward0>))

In [70]:
# Run this block only once; otherwise the result is wrong. We will explain why later.
out.backward() 
x.grad

tensor([[3., 4., 5.],
        [6., 7., 8.]])

The computation graph here is easy to follow: x -> y -> z -> out. Calling out.backward() computes $\frac{\partial out}{\partial x_i}$ for every element of x. We skip the mathematical derivation here.
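
For reference, the derivation is short. With $y_i = x_i + 2$, $z_i = 3y_i^2$ and $out = \frac{1}{6}\sum_i z_i$ (x has 6 elements), the chain rule gives:

$$\frac{\partial out}{\partial x_i} = \frac{\partial out}{\partial z_i}\cdot\frac{\partial z_i}{\partial y_i}\cdot\frac{\partial y_i}{\partial x_i} = \frac{1}{6}\cdot 6y_i\cdot 1 = x_i + 2$$

This agrees with the printed x.grad, which equals x + 2 element by element.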

Since out is the mean of z, $out=\frac{1}{dim(z)}\sum_i z_i$, out is a scalar. <font color=red>If we want the derivative of z with respect to x, we must pass a gradient argument to backward: a tensor of ones with the same shape as z. Otherwise PyTorch raises an error: RuntimeError: grad can be implicitly created only for scalar outputs</font>

In [52]:
gradients = t.ones(z.size())
z.backward(gradients)
x.grad

tensor([[18., 24., 30.],
        [36., 42., 48.]])

The reason we need the gradient argument is that PyTorch only computes the derivative of a scalar with respect to a tensor. The derivative of a tensor with respect to a tensor is a full Jacobian, which is expensive to build, so PyTorch is designed such that z.backward(gradient) actually means:
```python
L = torch.sum(z * gradient)
L.backward()
x.grad
```
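
This equivalence can be checked numerically. The sketch below uses the same forward pass as above with an arbitrary, made-up weight tensor in place of the all-ones gradient, and compares both ways of computing x.grad.

```python
import torch as t

def forward(x):
    # same forward pass as in the cells above: z = 3 * (x + 2)^2
    return t.pow(x + 2, 2) * 3

weights = t.tensor([[1., 0., 2.], [0.5, 1., 0.]])   # arbitrary example weights

x1 = t.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
forward(x1).backward(weights)                 # backward with an explicit gradient argument

x2 = t.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
t.sum(forward(x2) * weights).backward()       # explicit scalar first, then backward

print(x1.grad)
print(t.allclose(x1.grad, x2.grad))
```

Both gradients equal weights * 6 * (x + 2), because each $z_i$ depends only on its own $x_i$.
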
Now there is another question: how can we get the derivative with respect to an intermediate value, for example $\frac{\partial out}{\partial y}$? During debugging, we sometimes need to monitor the gradients of intermediate variables to check that the network behaves as expected. In that case we need the gradient of a non-leaf node. Here we introduce one method, retain_grad. For the other one, register_hook, see https://www.jianshu.com/p/ad66f2e38f2fd

In [53]:
x = t.ones(2, 2, requires_grad=True)
y = x + 2
y.retain_grad()
z = t.pow(y, 2) * 3
out = z.mean()
out.backward()
print(y.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
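
For comparison, here is a minimal sketch of the register_hook approach on the same computation. The dictionary `captured` is a hypothetical container introduced for this example; the hook receives the gradient with respect to y during the backward pass.

```python
import torch as t

captured = {}                                  # hypothetical container for the gradient

x = t.ones(2, 2, requires_grad=True)
y = x + 2
# the hook is called with d(out)/dy during backward; returning None leaves the gradient unchanged
handle = y.register_hook(lambda grad: captured.update(y=grad))
z = t.pow(y, 2) * 3
out = z.mean()
out.backward()

print(captured["y"])
handle.remove()                                # remove the hook when it is no longer needed
```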


Now we return to the problem mentioned earlier: why the backward block should be run only once.

In [82]:
# do forward propagation
x = t.tensor([[1.,2.,3.],[4.,5.,6.]], requires_grad=True)
y = x + 2
z = t.pow(y, 2) * 3
out = z.mean()

If we run the following block more than once, an error occurs: <font color= red>Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.</font>


In [80]:
'''
Run the following code twice if you want to see the error!
-----------------------------------------------------------
out.backward() 
x.grad

'''

'\nRun the following code twice if you want to see the error!\n-----------------------------------------------------------\nout.backward() \nx.grad\n\n'

To solve this problem, we set the parameter retain_graph to True. Now we can run this code as many times as we wish.

In [92]:
'''
If you have run the block above before you reach here, first re-run the block "do forward propagation" above and then run the code below
several times. Watch the result, you will find something interesting!

'''
out.backward(retain_graph=True) 
x.grad

tensor([[15., 20., 25.],
        [30., 35., 40.]])

However, another problem appears. As the results show, the gradients accumulate as the number of runs increases. <font color=red>Therefore, remember to clear the gradient information of x before every run!</font>

In [93]:
x.grad.zero_() # clear x's accumulated gradient in place
out.backward(retain_graph=True) 
x.grad

tensor([[3., 4., 5.],
        [6., 7., 8.]])
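
In a loop, a common alternative is to recompute the forward pass in every iteration and clear the gradient first, so that retain_graph is not needed at all. The following is a sketch under that assumption, using the same computation as above; the three iterations are arbitrary.

```python
import torch as t

x = t.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)

for step in range(3):
    if x.grad is not None:
        x.grad.zero_()                   # clear the gradient from the previous step
    out = (t.pow(x + 2, 2) * 3).mean()   # a fresh forward pass rebuilds the graph
    out.backward()                       # no retain_graph needed

print(x.grad)
```

After the loop, x.grad holds the gradient of the last iteration only, which equals x + 2.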