## Basic data types
**Everything is a tensor**

|Python|PyTorch|
|---|---|
|`int`|`IntTensor` of size `[]` (0-dim)|
|`float`|`FloatTensor` of size `[]` (0-dim)|
|`int` array|`IntTensor` of size `[d1, d2, ...]`|
|`float` array|`FloatTensor` of size `[d1, d2, ...]`|
|`str`|no built-in equivalent|

#### How to represent strings
- One-hot
- Embedding
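
Both options start by mapping each string to an integer id. The sketch below assumes a hypothetical three-word vocabulary (`vocab` is made up for illustration) and shows one possible way to build each representation with `torch.nn.functional.one_hot` and `torch.nn.Embedding`.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy vocabulary: any mapping from token to integer id works.
vocab = {"hello": 0, "world": 1, "pytorch": 2}
ids = torch.tensor([vocab[w] for w in ["hello", "pytorch"]])  # shape [2], dtype int64

# One-hot: each id becomes a vector with a single 1 at position id.
one_hot = F.one_hot(ids, num_classes=len(vocab))
print(one_hot)  # rows [1, 0, 0] and [0, 0, 1]

# Embedding: each id selects one row of a learnable [vocab_size, 4] weight matrix.
embedding = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=4)
vectors = embedding(ids)
print(vectors.shape)  # torch.Size([2, 4])
```

One-hot vectors grow with the vocabulary size, whereas the embedding dimension (4 here, chosen arbitrarily) is fixed and the vectors are trained with the rest of the model.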

### Data type
|Data type|dtype|CPU tensor|GPU tensor|
|---|---|---|---|
|32-bit floating point|`torch.float32` or `torch.float`|`torch.FloatTensor`|`torch.cuda.FloatTensor`|
|64-bit floating point|`torch.float64` or `torch.double`|`torch.DoubleTensor`|`torch.cuda.DoubleTensor`|
|16-bit floating point|`torch.float16` or `torch.half`|`torch.HalfTensor`|`torch.cuda.HalfTensor`|
|8-bit integer (unsigned)|`torch.uint8`|`torch.ByteTensor`|`torch.cuda.ByteTensor`|
|8-bit integer (signed)|`torch.int8`|`torch.CharTensor`|`torch.cuda.CharTensor`|
|16-bit integer (signed)|`torch.int16` or `torch.short`|`torch.ShortTensor`|`torch.cuda.ShortTensor`|
|32-bit integer (signed)|`torch.int32` or `torch.int`|`torch.IntTensor`|`torch.cuda.IntTensor`|
|64-bit integer (signed)|`torch.int64` or `torch.long`|`torch.LongTensor`|`torch.cuda.LongTensor`|

### Type check

In [1]:
import torch

a = torch.randn(2, 3)
print(a.type())
print(type(a))
print(isinstance(a, torch.FloatTensor))  # parameter type validation

torch.FloatTensor
<class 'torch.Tensor'>
True


`a` is a 2-dim tensor with the size 2x3

In [2]:
# the same tensor has a different type once it is moved to the GPU
import torch

data = torch.DoubleTensor(2, 3)
print(isinstance(data, torch.cuda.DoubleTensor))
data = data.cuda()
print(isinstance(data, torch.cuda.DoubleTensor))

False
True


### Dimension 0 / rank 0
**scalar**

In [3]:
import torch

print(torch.tensor(1.))
print(torch.tensor(1.3))

tensor(1.)
tensor(1.3000)


`torch.tensor(1.3)` is a 0-dim scalar, whereas `torch.tensor([1.3])` is a 1-dim **Tensor** of size 1.<br>
A **loss** value is usually represented as a 0-dim tensor.

In [4]:
import torch

data = torch.tensor(2.2)
print(data.shape)
print(len(data.shape))
print(data.size())

torch.Size([])
0
torch.Size([])


### Dimension 1 / rank 1
**vector**

In [5]:
import torch
import numpy as np

print(torch.tensor([1.1]))
print(torch.tensor([1.1, 2.2]))
print(torch.FloatTensor(1))
print(torch.FloatTensor(2))
data = np.ones(2)
print(data)
print(torch.from_numpy(data))

tensor([1.1000])
tensor([1.1000, 2.2000])
tensor([-7.9386e-10])
tensor([-7.9384e-10,  3.0662e-41])
[1. 1.]
tensor([1., 1.], dtype=torch.float64)


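**Attention**:
- `torch.tensor()` takes the data itself
- `torch.FloatTensor()` takes either a shape, in which case the memory is left uninitialized (the values printed above are arbitrary leftovers, not random samples), or the data wrapped in a list, as in `[item]`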
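
The difference is easy to miss when the arguments are small integers. A minimal sketch contrasting the three call forms (the variable names are only illustrative):

```python
import torch

from_data = torch.tensor([2, 3])       # data: the values 2 and 3, dtype int64
from_shape = torch.FloatTensor(2, 3)   # shape: a 2x3 tensor with arbitrary contents
from_list = torch.FloatTensor([2, 3])  # a list is treated as data: 2. and 3.

print(from_data.shape)   # torch.Size([2])
print(from_shape.shape)  # torch.Size([2, 3])
print(from_list)         # tensor([2., 3.])
```

When a shape is what you mean, `torch.empty(2, 3)` or `torch.zeros(2, 3)` states the intent more clearly than `torch.FloatTensor(2, 3)`.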

**Bias** and **Linear Input** are usually represented as 1-dim tensors.

### Dimension 2

In [6]:
import torch

data = torch.randn(2, 3)
print(data)
print(data.shape)
print(data.size(0))
print(data.size(1))
print(data.shape[1])

tensor([[ 1.3900,  0.3228,  2.0778],
        [ 0.0510, -0.9101,  0.0500]])
torch.Size([2, 3])
2
3
3


**Linear Input Batch** is usually represented as a 2-dim tensor.

### Dimension 3

In [7]:
import torch

data = torch.rand(2, 1, 3)
print(data)
print(data.shape)
print(data[0])
print(list(data.shape))

tensor([[[0.3193, 0.8452, 0.5259]],

        [[0.1395, 0.3114, 0.1656]]])
torch.Size([2, 1, 3])
tensor([[0.3193, 0.8452, 0.5259]])
[2, 1, 3]


**RNN Input Batch** is usually represented as a 3-dim tensor.

### Dimension 4

In [8]:
import torch

data = torch.rand(2, 3, 28, 28)
print(data)
print(data.shape)
print(data[0])
print(list(data.shape))

tensor([[[[7.9020e-01, 9.2412e-01, 1.2545e-02,  ..., 4.0728e-01,
           5.4756e-01, 4.0401e-01],
          [8.1836e-01, 9.5927e-01, 3.7657e-01,  ..., 7.0206e-02,
           2.6442e-01, 2.5098e-01],
          [6.6138e-01, 3.1906e-01, 8.5082e-01,  ..., 9.2735e-01,
           1.7698e-01, 2.6664e-01],
          ...,
          [2.6504e-01, 8.1270e-01, 2.6049e-02,  ..., 2.4965e-01,
           7.1566e-02, 6.2901e-01],
          [1.0624e-01, 7.6288e-01, 6.8958e-01,  ..., 3.0468e-01,
           1.3646e-01, 6.6147e-01],
          [8.2556e-01, 9.5165e-01, 9.2430e-02,  ..., 4.7954e-01,
           1.8112e-01, 9.6573e-01]],

         [[6.3637e-01, 6.9372e-01, 2.5479e-01,  ..., 8.6715e-01,
           4.7054e-01, 5.4061e-01],
          [3.9575e-01, 5.0156e-01, 5.0081e-01,  ..., 7.5456e-01,
           6.4379e-04, 1.2794e-01],
          [5.0436e-01, 4.7143e-01, 6.4827e-01,  ..., 3.4795e-02,
           8.0159e-01, 9.7851e-01],
          ...,
          [3.5056e-01, 6.9816e-01, 2.7687e-01,  ..., 5.9365

**Image** batches are usually represented as 4-dim tensors: `[batch_size, channel, height, width]`

### Mixed

In [9]:
import torch

data = torch.rand(2, 3, 28, 28)
print(data.shape)
print(data.numel())  # total number of elements: 2 * 3 * 28 * 28
print(data.dim())

torch.Size([2, 3, 28, 28])
4704
4


## Create Tensor
### Import from numpy

In [10]:
import numpy as np
import torch

data = np.array([2, 3.3])
print(torch.from_numpy(data))
data = np.ones([2, 3])
print(torch.from_numpy(data))

tensor([2.0000, 3.3000], dtype=torch.float64)
tensor([[1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)


### Import from List

In [11]:
import torch

print(torch.tensor([2., 3.2]))
print(torch.FloatTensor([2., 3.2]))
print(torch.tensor([[2., 3.2], [1., 22.3]]))

tensor([2.0000, 3.2000])
tensor([2.0000, 3.2000])
tensor([[ 2.0000,  3.2000],
        [ 1.0000, 22.3000]])


### Uninitialized data
- `torch.empty(l)`
- `torch.FloatTensor(d1,d2,d3)`
- `torch.IntTensor(d1,d2,d3)`
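
These constructors only allocate memory, so the contents must be overwritten before they are used. A minimal sketch of that pattern, assuming the default `float32` dtype:

```python
import torch

raw = torch.empty(2, 3)      # allocated but not initialized
print(raw.shape, raw.dtype)  # shape and dtype are defined; the values are not

# Overwrite the contents before use, for example with an in-place fill.
raw.fill_(0.0)

# Or skip the uninitialized step and ask for defined values directly.
safe = torch.zeros(2, 3)
print(torch.equal(raw, safe))  # True
```

Reading an uninitialized tensor can surface very large values or `nan`, which then propagate through later computations.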
### Set default type

In [12]:
import torch

print(torch.tensor([1.2, 3]).type())
torch.set_default_tensor_type(torch.DoubleTensor)
print(torch.tensor([1.2, 3]).type())
torch.set_default_tensor_type(torch.FloatTensor)

torch.FloatTensor
torch.DoubleTensor


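`double` is often used in **Reinforcement Learning**, whereas `float` is the usual choice elsewhere. Recent PyTorch releases deprecate `torch.set_default_tensor_type` in favor of `torch.set_default_dtype`.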

### rand / rand_like, randint
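- `rand`: uniform samples from `[0, 1)`
- `randint`: integer samples from `[min, max)`
- `*_like`: same shape as the tensor passed in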

In [13]:
import torch

a = torch.rand(3, 3)
print(a)
print(torch.rand_like(a))
print(torch.randint(1, 10, [3, 3]))  # [1,10)

tensor([[0.2482, 0.3893, 0.8060],
        [0.0617, 0.8547, 0.0323],
        [0.4225, 0.2367, 0.9361]])
tensor([[0.6445, 0.2877, 0.2359],
        [0.8700, 0.3187, 0.8145],
        [0.1740, 0.6938, 0.0955]])
tensor([[2, 7, 5],
        [4, 2, 2],
        [3, 5, 3]])


### randn
- `randn`: samples from N(0, 1)
- `normal`: samples from N(mean, std)

In [14]:
import torch

print(torch.randn(3, 3))
print(torch.normal(mean=torch.full([9], 3.0), std=torch.full([9], 2.0)).view(3, 3))

tensor([[ 1.3538, -0.9255, -0.5808],
        [-0.4653, -0.6337,  1.1070],
        [-1.0155,  0.8418, -0.6310]])
tensor([[ 1.0734,  2.2591,  4.8391],
        [ 3.9824,  6.0406,  3.8260],
        [-1.1887,  2.0839,  3.7081]])


### full

In [15]:
import torch

print(torch.full([2, 3], 7.))

tensor([[7., 7., 7.],
        [7., 7., 7.]])


### arange/range
### linspace/logspace
### ones/zeros/eye
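
The three groups of creation functions named in the headings above can be sketched as follows. `torch.range` is left out because it is deprecated in favor of `torch.arange`; the variable names are only illustrative.

```python
import torch

# arange: half-open interval [start, end) with an optional step
evens = torch.arange(0, 10, 2)
print(torch.arange(0, 10))  # 0, 1, ..., 9
print(evens)                # 0, 2, 4, 6, 8

# linspace: `steps` evenly spaced points, both endpoints included
lin = torch.linspace(0, 10, steps=5)
print(lin)                  # 0, 2.5, 5, 7.5, 10

# logspace: base ** linspace(start, end, steps), base defaults to 10
log = torch.logspace(0, -1, steps=3)
print(log)                  # 1, about 0.3162, 0.1

print(torch.ones(2, 3))
print(torch.zeros(2, 3))
identity = torch.eye(3)     # 3x3 identity matrix
print(identity)
```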
### randperm: like random.shuffle

In [16]:
import torch

print(torch.randperm(10))
a = torch.rand(2, 3)
b = torch.rand(2, 2)
print(a, '\n', b)
idx = torch.randperm(2)
print(idx)
print(a[idx], '\n', b[idx])

tensor([8, 2, 4, 9, 0, 5, 6, 1, 7, 3])
tensor([[0.2991, 0.8423, 0.6147],
        [0.1951, 0.9974, 0.6566]]) 
 tensor([[0.3686, 0.3086],
        [0.6662, 0.3752]])
tensor([1, 0])
tensor([[0.1951, 0.9974, 0.6566],
        [0.2991, 0.8423, 0.6147]]) 
 tensor([[0.6662, 0.3752],
        [0.3686, 0.3086]])


## Indexing and Slice
### Indexing
- dim 0 first

In [17]:
import torch

data = torch.rand(4, 3, 28, 28)
print(data[0].shape)
print(data[0, 0].shape)
print(data[0, 0, 2, 4])

torch.Size([3, 28, 28])
torch.Size([28, 28])
tensor(0.4916)


### select first/last N
use **`:`**

In [18]:
import torch

data = torch.rand(4, 3, 28, 28)
print(data[:2].shape)
print(data[:2, :1, :, :].shape)
print(data[:2, 1:, :, :].shape)
print(data[:2, -1:, :, :].shape)

torch.Size([2, 3, 28, 28])
torch.Size([2, 1, 28, 28])
torch.Size([2, 2, 28, 28])
torch.Size([2, 1, 28, 28])


### select by steps
```
start:end:step
```
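
A short sketch of stepped slicing on the same image batch shape used in the other cells:

```python
import torch

data = torch.rand(4, 3, 28, 28)

# Take every second row and column: 28 -> 14
print(data[:, :, 0:28:2, 0:28:2].shape)  # torch.Size([4, 3, 14, 14])
print(data[:, :, ::2, ::2].shape)        # same result, start and end omitted

# Take every third image in the batch: indices 0 and 3
print(data[::3].shape)                   # torch.Size([2, 3, 28, 28])
```

Unlike Python lists and NumPy arrays, PyTorch tensors do not accept a negative step; `torch.flip` can be used to reverse a dimension instead.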
### select by specific index

In [19]:
import torch

data = torch.rand(4, 3, 28, 28)
print(data.index_select(0, torch.tensor([0, 2])))  # select the 0th and 2nd image in the batch (dim 0)
print(data.index_select(1, torch.tensor([1, 2])))  # select the 1st and 2nd channel

tensor([[[[0.7088, 0.3793, 0.4470,  ..., 0.3573, 0.3908, 0.5766],
          [0.6455, 0.8977, 0.5200,  ..., 0.9006, 0.9129, 0.1650],
          [0.6583, 0.6642, 0.1869,  ..., 0.8427, 0.2104, 0.4289],
          ...,
          [0.8010, 0.7757, 0.6833,  ..., 0.7287, 0.8546, 0.6893],
          [0.7575, 0.3833, 0.8530,  ..., 0.8193, 0.8495, 0.7231],
          [0.0432, 0.8897, 0.9249,  ..., 0.1218, 0.9154, 0.4881]],

         [[0.2005, 0.5213, 0.9345,  ..., 0.3974, 0.0960, 0.4001],
          [0.9628, 0.4946, 0.6078,  ..., 0.2566, 0.5435, 0.6679],
          [0.2166, 0.0384, 0.7543,  ..., 0.6260, 0.4738, 0.8203],
          ...,
          [0.0226, 0.6026, 0.6478,  ..., 0.6894, 0.3150, 0.4137],
          [0.0216, 0.0967, 0.7428,  ..., 0.1063, 0.4162, 0.0745],
          [0.0060, 0.3569, 0.5468,  ..., 0.7573, 0.5379, 0.4131]],

         [[0.7357, 0.9472, 0.9805,  ..., 0.1981, 0.8496, 0.1147],
          [0.9601, 0.7892, 0.8236,  ..., 0.1969, 0.9379, 0.2697],
          [0.7427, 0.1160, 0.2293,  ..., 0

### ... (dots)

In [20]:
import torch

data = torch.rand(4, 3, 28, 28)
print(data[...].shape)
print(data[:, 1, ...].shape)

torch.Size([4, 3, 28, 28])
torch.Size([4, 28, 28])


*`...` is shorthand for "all remaining dimensions", so you do not have to write out every `:`*

### select by mask

In [21]:
import torch

data = torch.randn(3, 4)
print(data)
mask = data.ge(0.5)  # greater than or equal to
print(mask)
print(torch.masked_select(data, mask))  # the result is always flattened to 1-D

tensor([[-1.5994,  0.7768,  1.1631, -0.4760],
        [ 0.1112, -0.4026,  0.0910, -0.1785],
        [ 0.5317,  0.6668, -1.0157,  1.3639]])
tensor([[False,  True,  True, False],
        [False, False, False, False],
        [ True,  True, False,  True]])
tensor([0.7768, 1.1631, 0.5317, 0.6668, 1.3639])


### select by flattened index

In [22]:
import torch

data = torch.randn(3, 4)
print(data)
print(torch.take(data, torch.tensor([0, 2, 5])))

tensor([[-0.7062, -0.2849,  0.8619, -0.7494],
        [ 1.9935, -0.5579, -0.1603, -1.4519],
        [ 0.8707,  0.8974,  0.5575,  1.3623]])
tensor([-0.7062,  0.8619, -0.5579])


## Dimension reshape
### Operation
- view/reshape: the meaning of the original dims is lost, so you have to track it yourself

In [23]:
import torch

data = torch.randn(4, 3, 28, 28)
print(data.view(4 * 3, 28 * 28).shape)  # the total number of elements must stay the same

torch.Size([12, 784])


- squeeze/unsqueeze

In [24]:
import torch

data = torch.randn(4, 3, 28, 28)
# unsqueeze: insert one dimension of size 1 at the given position
print(data.unsqueeze(0).shape)
print(data.unsqueeze(-1).shape)
print(data.unsqueeze(4).shape)
print(data.unsqueeze(-4).shape)
print(data.unsqueeze(-5).shape)
# print(data.unsqueeze(5).shape) # IndexError: Dimension out of range (expected to be in range of [-5, 4], but got 5)

# an example: add bias to image batch
bias = torch.rand(32)
# imgs = torch.rand(4, 32, 14, 14)
bias = bias.unsqueeze(1).unsqueeze(2).unsqueeze(0)
print(bias.shape)
# now imgs + bias works through broadcasting
# (expand is shown later)
# squeeze: remove dimensions of size 1
print(bias.squeeze().shape)  # no argument: remove every dimension of size 1
print(bias.squeeze(0).shape)
print(bias.squeeze(-1).shape)
print(bias.squeeze(1).shape)  # dim 1 has size 32, so it is left unchanged
print(bias.squeeze(-4).shape)

torch.Size([1, 4, 3, 28, 28])
torch.Size([4, 3, 28, 28, 1])
torch.Size([4, 3, 28, 28, 1])
torch.Size([4, 1, 3, 28, 28])
torch.Size([1, 4, 3, 28, 28])
torch.Size([1, 32, 1, 1])
torch.Size([32])
torch.Size([32, 1, 1])
torch.Size([1, 32, 1])
torch.Size([1, 32, 1, 1])
torch.Size([32, 1, 1])


- expand/repeat

In [25]:
import torch

# expand: broadcasting; no data is copied, so it is the recommended way
imgs = torch.rand(4, 32, 14, 14)
bias = torch.rand(1, 32, 1, 1)
# src ===> dest
# 1. src and dest have the same number of dims
# 2. a src dim can be expanded only if its size is 1
print(bias.expand(4, 32, 14, 14).shape)
print(bias.expand(-1, 32, -1, -1).shape)  # -1 keeps the original size of that dim
print(bias.expand(-1, 32, -1, -4).shape)  # meaningless: only -1 is a valid placeholder
after = imgs + bias.expand(4, 32, 14, 14)
print(after.shape)

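# repeat: the data is copied into new memory
# attention: the arguments are repeat counts per dim, not the target shape
print(bias.repeat(4, 32, 1, 1).shape)  # wrong: dim 1 becomes 32 * 32 = 1024
print(bias.repeat(4, 1, 1, 1).shape)  # correct: only dim 0 is repeated 4 times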

torch.Size([4, 32, 14, 14])
torch.Size([1, 32, 1, 1])
torch.Size([1, 32, 1, -4])
torch.Size([4, 32, 14, 14])
torch.Size([4, 1024, 1, 1])
torch.Size([4, 32, 1, 1])


- transpose/t/permute

In [26]:
import torch

a = torch.rand(3, 3)
print(a)
print(a.t())
# t() expects a 2D tensor

tensor([[0.6442, 0.7168, 0.9299],
        [0.7354, 0.5221, 0.6724],
        [0.4489, 0.8888, 0.7132]])
tensor([[0.6442, 0.7354, 0.4489],
        [0.7168, 0.5221, 0.8888],
        [0.9299, 0.6724, 0.7132]])


In [27]:
import torch

data = torch.rand(4, 3, 32, 32)
# transpose swaps two dims
# [b,c,h,w] ===> [b,w,h,c]
# be careful: after view() you must restore the dim order yourself
print(data.transpose(1, 3).shape)
print(data.transpose(1, 3).contiguous().view(4, 3 * 32 * 32).view(4, 32, 32, 3).transpose(1, 3).shape)
# permute
# [b,c,h,w] ===> [b,h,w,c]
data = torch.rand(4, 3, 28, 32)
data_t = data.transpose(1, 3)
print(data_t.shape)
print(data_t.transpose(1, 2).shape)
# permute does the same reordering in one call:
print(data.permute(0, 2, 3, 1).shape)

torch.Size([4, 32, 32, 3])
torch.Size([4, 3, 32, 32])
torch.Size([4, 32, 28, 3])
torch.Size([4, 28, 32, 3])
torch.Size([4, 28, 32, 3])
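
Shapes alone do not show whether the data survived the round trip. The sketch below compares the restored tensors with the original values; `a1` and `a2` are illustrative names for the wrong and the right way to undo the flattening.

```python
import torch

a = torch.rand(4, 3, 32, 32)  # [b, c, h, w]
# After transpose + contiguous, the flat buffer is laid out in [b, w, h, c] order.
flat = a.transpose(1, 3).contiguous().view(4, 3 * 32 * 32)

# Wrong: the buffer is read back as if it were [b, c, h, w].
a1 = flat.view(4, 3, 32, 32)
# Right: restore [b, w, h, c] first, then swap dims 1 and 3 back.
a2 = flat.view(4, 32, 32, 3).transpose(1, 3)

print(a1.shape, a2.shape)          # both torch.Size([4, 3, 32, 32])
print(torch.all(torch.eq(a, a1)))  # tensor(False): same shape, scrambled data
print(torch.all(torch.eq(a, a2)))  # tensor(True)
```

Both results have the expected shape, which is why this kind of bug goes unnoticed without a value check.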


## Broadcasting
### key idea
1. insert dims of size 1 at the front until both tensors have the same number of dims
2. expand every dim of size 1 to the size of the other tensor

example:
- feature maps: [4,32,14,14]
- bias: [32,1,1] ==> [1,32,1,1] ==> [4,32,14,14]
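
A minimal sketch of this example, checking that implicit broadcasting gives the same result as doing the two steps by hand with `unsqueeze` and `expand`:

```python
import torch

feature_maps = torch.rand(4, 32, 14, 14)
bias = torch.rand(32, 1, 1)

# Implicit: bias is treated as [1, 32, 1, 1] and then expanded to [4, 32, 14, 14].
out = feature_maps + bias
print(out.shape)  # torch.Size([4, 32, 14, 14])

# Explicit: the same two steps written out.
explicit = feature_maps + bias.unsqueeze(0).expand(4, 32, 14, 14)
print(torch.equal(out, explicit))  # True
```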

### why broadcasting
1. it matches real needs
    - data laid out as [class,students,scores]
    - add a bonus for every student: +5 points
    - strictly speaking, [4,32,8]+[4,32,8] is needed
    - in practice, [4,32,8]+[5.0] is far more convenient
2. it saves memory
    - [4,32,8] ==> 1024 elements
    - [5.0] ==> 1 element

### is it broadcastable?
- match from <font color=red>**last**</font> dim!
    - if both sizes are equal, keep that size
    - if one size is 1, expand it to match the other
    - if either tensor has no dim at this position, insert a dim of size 1 and expand it
    - otherwise, NOT broadcastable
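
The matching rule can be written as a few lines of plain Python. `broadcast_shape` is a hypothetical helper written for illustration, not a PyTorch function; it is a sketch of the rule above and works on shapes only.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dim; a missing dim counts as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))


print(broadcast_shape((4, 32, 14, 14), (32, 1, 1)))  # (4, 32, 14, 14)
print(broadcast_shape((4, 32, 8), (1,)))             # (4, 32, 8)
try:
    broadcast_shape((4, 32, 14, 14), (2, 32, 14, 14))
except ValueError as err:
    print(err)  # 4 and 2 differ and neither is 1
```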

## Merge or Split
- cat

In [28]:
import torch

a = torch.rand(4, 32, 8)
b = torch.rand(5, 32, 8)
print(torch.cat([a, b], dim=0).shape)

torch.Size([9, 32, 8])


- stack: create new dim

In [29]:
import torch

a = torch.rand(4, 3, 16, 32)
b = torch.rand(4, 3, 16, 32)
print(torch.cat([a, b], dim=2).shape)
# stack requires a and b to have exactly the same shape
print(torch.stack([a, b], dim=2).shape)

torch.Size([4, 3, 32, 32])
torch.Size([4, 3, 2, 16, 32])


- split: by length (one length for every piece, or a list with the length of each piece)

In [30]:
import torch

a = torch.rand(32, 8)
b = torch.rand(32, 8)
c = torch.stack([a, b], dim=0)
aa, bb = c.split([1, 1], dim=0)  # a list gives the length of each piece
print(aa.shape, bb.shape)
aa, bb = c.split(1, dim=0)  # an int gives one length for every piece
print(aa.shape, bb.shape)

torch.Size([1, 32, 8]) torch.Size([1, 32, 8])
torch.Size([1, 32, 8]) torch.Size([1, 32, 8])


- chunk: by number of pieces

In [31]:
import torch

a = torch.rand(32, 8)
b = torch.rand(32, 8)
c = torch.stack([a, b], dim=0)
aa, bb = c.chunk(2, dim=0)  # output number
print(aa.shape, bb.shape)

torch.Size([1, 32, 8]) torch.Size([1, 32, 8])


## math operation
- add/subtract/multiply/divide
- matmul
- pow
- sqrt/rsqrt
- round
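
The cells below cover the arithmetic operators and `matmul`. The remaining entries of the list (`pow`, `sqrt`/`rsqrt`, `round`) can be sketched as follows; the input values are chosen only so that the results are easy to check by hand.

```python
import torch

a = torch.full([2, 2], 3.0)

sq = a.pow(2)      # same as a ** 2: every element is 9.
print(sq)
print(sq.sqrt())   # back to 3.
print(sq.rsqrt())  # reciprocal of the square root: 1 / 3

low = torch.tensor(3.499).round()  # rounds down to 3.
high = torch.tensor(3.6).round()   # rounds up to 4.
print(low, high)
```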

### basic

In [32]:
import torch

a = torch.rand(3, 4)
b = torch.rand(4)
print(a + b)
print(torch.add(a, b))
print(torch.all(torch.eq(a - b, torch.sub(a, b))))
print(torch.all(torch.eq(a * b, torch.mul(a, b))))
print(torch.all(torch.eq(a / b, torch.div(a, b))))

tensor([[0.1005, 0.6229, 1.6519, 1.2822],
        [0.3442, 0.6109, 1.5206, 0.8659],
        [0.5469, 1.0048, 1.6472, 0.7928]])
tensor([[0.1005, 0.6229, 1.6519, 1.2822],
        [0.3442, 0.6109, 1.5206, 0.8659],
        [0.5469, 1.0048, 1.6472, 0.7928]])
tensor(True)
tensor(True)
tensor(True)


### matmul
- `torch.mm`
- `torch.matmul`
- `@`
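
`torch.mm` is restricted to 2-D matrices, whereas `torch.matmul` and `@` also handle batched inputs: the last two dims are multiplied as matrices and the leading dims are broadcast. A minimal sketch with arbitrary shapes:

```python
import torch

a = torch.rand(4, 3, 28, 64)
b = torch.rand(4, 3, 64, 32)

# Matrix product over the last two dims: [28, 64] @ [64, 32] -> [28, 32]
out = torch.matmul(a, b)
print(out.shape)  # torch.Size([4, 3, 28, 32])

# Leading dims follow the broadcasting rules: [4, 1] is expanded to [4, 3].
c = torch.rand(4, 1, 64, 32)
out_bc = a @ c
print(out_bc.shape)  # torch.Size([4, 3, 28, 32])
```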

In [33]:
import torch

a = torch.tensor([[3., 3.], [3., 3.]])
print("a=", a)
b = torch.ones(2, 2) * 2
print("b=", b)
print("a mm b = ", torch.mm(a, b))
print("a matmul b = ", torch.matmul(a, b))
print("a @ b = ", a @ b)

a= tensor([[3., 3.],
        [3., 3.]])
b= tensor([[2., 2.],
        [2., 2.]])
a mm b =  tensor([[12., 12.],
        [12., 12.]])
a matmul b =  tensor([[12., 12.],
        [12., 12.]])
a @ b =  tensor([[12., 12.],
        [12., 12.]])


#### an example

In [34]:
import torch

a = torch.rand(4, 784)
x = torch.rand(4, 784)
w = torch.rand(512, 784)
# reduce the feature dim from 784 to 512 (w is stored as [out, in], hence the transpose)
print((x @ w.t()).shape)

torch.Size([4, 512])


## statistics
- norm
- mean, sum
- prod
- max,min,argmin,argmax
- kthvalue,topk
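
A short sketch of these reductions on a small tensor whose values make the results easy to verify by hand:

```python
import torch

a = torch.arange(8, dtype=torch.float32).view(2, 4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]

print(a.sum(), a.mean(), a.prod())  # 28., 3.5, 0. (the product contains a 0)
print(a.max(), a.min())             # 7., 0.
print(a.argmax(), a.argmin())       # without dim, indices refer to the flattened tensor: 7, 0
print(a.argmax(dim=1))              # per row: [3, 3]

print(a.norm(1))                    # sum of absolute values: 28.
print(a.norm(2))                    # sqrt(140), about 11.83

top_values, top_indices = a.topk(2, dim=1)     # two largest per row
print(top_values, top_indices)                 # [[3, 2], [7, 6]] and [[3, 2], [3, 2]]
kth_values, kth_indices = a.kthvalue(2, dim=1) # second smallest per row
print(kth_values, kth_indices)                 # [1, 5] and [1, 1]
```

Passing `dim` reduces along that dimension only; `keepdim=True` keeps the reduced dim with size 1 so the result still broadcasts against the input.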