# Tensor Attributes

In [2]:
import torch
import numpy as np

<img src="./images/tensor_attributes.png" width=550px>

## dtype

When creating a Tensor, PyTorch only accepts a `dtype` given as a `torch.[DataType]` object (for example `torch.int64`). Unlike NumPy, it does not accept the dtype as a string.

In [14]:
torch.tensor([1, 2, 3], dtype=torch.int64)

tensor([1, 2, 3])

In [15]:
np.array([1, 2, 3], dtype="int64")

array([1, 2, 3])

`torch.dtype` represents the data type of a Tensor. The 9 most common data types are:

- `torch.float32` or `torch.float`; the corresponding Tensor type is `torch.[cuda].FloatTensor`
- `torch.float64` or `torch.double`; the corresponding Tensor type is `torch.[cuda].DoubleTensor`
- `torch.float16` or `torch.half`; the corresponding Tensor type is `torch.[cuda].HalfTensor`
- `torch.uint8`; the corresponding Tensor type is `torch.[cuda].ByteTensor`
- `torch.int8`; the corresponding Tensor type is `torch.[cuda].CharTensor`
- `torch.int16` or `torch.short`; the corresponding Tensor type is `torch.[cuda].ShortTensor`
- `torch.int32` or `torch.int`; the corresponding Tensor type is `torch.[cuda].IntTensor`
- `torch.int64` or `torch.long`; the corresponding Tensor type is `torch.[cuda].LongTensor`
- `torch.bool`; the corresponding Tensor type is `torch.[cuda].BoolTensor`

A few other data types are used less often:

- `torch.bfloat16`; the corresponding Tensor type is `torch.[cuda].BFloat16Tensor`
- `torch.complex32` or `torch.chalf`; the PyTorch documentation lists no corresponding legacy Tensor class
- `torch.complex64` or `torch.cfloat`; the PyTorch documentation lists no corresponding legacy Tensor class
- `torch.complex128` or `torch.cdouble`; the PyTorch documentation lists no corresponding legacy Tensor class

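[`bfloat16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) is a 16-bit floating-point format that differs from the IEEE half-precision 16-bit float. It is obtained by truncating the IEEE 754 single-precision `float32` format and was designed specifically for machine learning systems. It consists of:
- 1 sign bit
- 8 exponent bits
- 7 fraction bits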

Neural networks are more sensitive to the size of the exponent than the size of the mantissa. To ensure identical behavior for underflows, overflows, and `NaNs`, `bfloat16` has the same exponent size as `float32`. `bfloat16` handles denormals differently from `float32`: it flushes them to zero. Unlike `float16`, which typically requires special handling like loss scaling, `bfloat16` is a drop-in replacement for `float32` when training and running deep neural networks.

In short, `bfloat16` covers a much wider range of values than `float16`, but with lower precision.
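
The range/precision trade-off can be checked with `torch.finfo`. The snippet below is a minimal sketch; the sample value `70000.0` is an arbitrary choice for illustration.

```python
import torch

# Compare the numeric limits of the two 16-bit floating-point formats.
fp16 = torch.finfo(torch.float16)
bf16 = torch.finfo(torch.bfloat16)

print(f"float16 : max={fp16.max}, eps={fp16.eps}")
print(f"bfloat16: max={bf16.max}, eps={bf16.eps}")

# bfloat16 keeps the 8 exponent bits of float32, so large values survive,
# but with only 7 fraction bits they are rounded more coarsely.
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # overflows to inf (the float16 maximum is 65504)
print(x.to(torch.bfloat16))  # stays finite, but is rounded
```

A larger `eps` means coarser spacing between representable values, which is the price `bfloat16` pays for its range.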

![floating point formats](./images/floating_point_formats.png)

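Depending on the machine (CPU architecture and so on) and on the PyTorch version, many Tensor creation functions may not support some of the `dtype`s listed above. For example, in the PyTorch version used here, `arange` cannot create a `float16` Tensor on the `cpu`: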

torch.arange(1,10, dtype=torch.float16, device=torch.device('cpu'))

CUDA has good support for half precision, so creating the same Tensor on 'cuda' works without problems:

In [2]:
torch.arange(1, 10, dtype=torch.float16, device=torch.device("cuda"))

tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.], device='cuda:0',
       dtype=torch.float16)

## device

`torch.device` represents the device on which a Tensor's data is stored. The two device types used here are 'cpu' and 'cuda'.

In [3]:
torch.device("cpu")

device(type='cpu')

In [4]:
torch.device("cuda:0")

device(type='cuda', index=0)

In [5]:
torch.device(type="cuda", index=0)

device(type='cuda', index=0)

In [5]:
torch.tensor([1, 2, 3, 4, 5, 6], device=torch.device("cuda:0"))

tensor([1, 2, 3, 4, 5, 6], device='cuda:0')

## layout

`layout` describes how a Tensor's data is laid out internally. It is still an immature (beta) feature. The layouts currently supported are:

- torch.strided
- torch.sparse_coo

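The one used most is `torch.strided`, which is the layout for dense Tensors. A Tensor's strides are a tuple giving, for each dimension, the distance (in number of elements) between two adjacent indices along that dimension.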

In [3]:
torch.arange(60).reshape(3, 4, 5).stride()

(20, 5, 1)

Unlike NumPy, torch expresses strides as a number of elements, while NumPy expresses them as a number of bytes.

In [8]:
np.arange(60).reshape(3, 4, 5).strides

(160, 40, 8)
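
The two conventions differ only by the element size. A minimal sketch that converts torch strides to byte strides with `element_size()` and compares them with NumPy:

```python
import numpy as np
import torch

t = torch.arange(60).reshape(3, 4, 5)   # int64, 8 bytes per element

# torch strides count elements; multiply by the element size to get bytes.
byte_strides = tuple(s * t.element_size() for s in t.stride())

print(t.stride())          # (20, 5, 1)
print(byte_strides)        # (160, 40, 8)
print(t.numpy().strides)   # (160, 40, 8)
```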

## Converting Tensor attributes

We can use the `to` method to specify new attributes and get a new Tensor with them.

In [8]:
device_cuda = torch.device("cuda")
data = torch.tensor([1])
print(data.dtype, data.device)
data = data.to(dtype=torch.float32, device=device_cuda)
print(data.dtype, data.device)

torch.int64 cpu
torch.float32 cuda:0
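
A new Tensor is only created when something actually changes. The sketch below runs on the CPU; falling back to `"cpu"` when no GPU is present is an assumption made here so the example runs anywhere.

```python
import torch

data = torch.tensor([1, 2, 3])            # int64 on the CPU

same = data.to(dtype=torch.int64)         # nothing to change
converted = data.to(dtype=torch.float32)  # dtype differs, so a new Tensor is created

print(same is data)        # True
print(converted is data)   # False

# Pick the GPU only when one is present (assumption: otherwise stay on the CPU).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
moved = data.to(device=device, dtype=torch.float32)
print(moved.dtype, moved.device)
```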


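## Tensor shape

Besides the 3 standard attributes above, every Tensor has a set of shape-related attributes once it is created.

- `t.shape`: returns a `torch.Size` (a tuple subclass) giving the size of each dimension
- `t.size()`: same as `t.shape`
- `t.ndim`: the number of dimensions of the Tensor
- `t.numel()`: a method that returns the total number of elements in the Tensor
- `len(t)`: the size of the Tensor along dimension 0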

In [11]:
t = torch.empty(2, 3, 4)
print(f"shape of t is {t.shape}")
print(f"size of t is {t.size()}")
print(f"strides of t is {t.stride()}")
print(f"strides of axes{1} of t is {t.stride(1)}")
print(f"ndim of t is {t.ndim}")
print(f"numel of t is {t.numel()}")
print(f"len of t is {len(t)}")

shape of t is torch.Size([2, 3, 4])
size of t is torch.Size([2, 3, 4])
strides of t is (12, 4, 1)
strides of axes1 of t is 4
ndim of t is 3
numel of t is 24
len of t is 2


# Creating Tensors

In [2]:
import torch
import numpy as np

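PyTorch offers several ways to create a Tensor. The most common are:

- From an existing scalar, list, tuple, or numpy.array
- With `arange`, `linspace`, `logspace`, etc. to create a 1-D sequence Tensor
- With `ones`, `zeros`, `eye`, `full`, `empty`, etc. to create a multi-dimensional Tensor with a specific fill value
- From random numbers, to create a Tensor of a given shape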

![](./images/tensor_creation.png)

## Creating a Tensor from existing data

We can use the `torch.tensor()` function to create a Tensor from existing array_like data.

In [3]:
# Create from a list
torch.tensor([1, 2, 3, 4, 5])

tensor([1, 2, 3, 4, 5])

In [4]:
# Create from a tuple
torch.tensor((1, 2, 3))

tensor([1, 2, 3])

In [5]:
# Create from a numpy.array, specifying dtype and device at the same time
torch.tensor(np.array([1, 2, 3, 4, 5]), dtype=torch.float32, device="cuda:0")

tensor([1., 2., 3., 4., 5.], device='cuda:0')

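Note that whether the Tensor is created from a built-in Python sequence or from a numpy.array, `torch.tensor()` always copies the source data.

If we want the new Tensor to share storage with the existing numpy.array instead of allocating extra memory, we can use `as_tensor`.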

In [6]:
arr = np.array([1, 2, 3, 4, 5])
t = torch.as_tensor(arr)
# Changes made through the Tensor are also visible in the ndarray
t[0] = 6
print(arr)

[6 2 3 4 5]


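However, `as_tensor` can only share the underlying storage when the `dtype` and `device` passed to `as_tensor` match those of the original ndarray. Since NumPy does not support CUDA, sharing is only possible for Tensors on the cpu.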

In [7]:
arr = np.array([1, 2, 3, 4, 5])
t = torch.as_tensor(arr, dtype=torch.float32)  # in this case the underlying storage is not shared
t[0] = 6
print(arr)

[1 2 3 4 5]


In [8]:
il = [1, 2, 3, 4, 5]
print(f"default integer dtype of ndarray: {np.array(il).dtype}")
print(f"default integer dtype of tensor: {torch.tensor(il).dtype}")

fl = [1.0, 2.0, 3.0, 4.0, 5.0]
print(f"default float dtype of ndarray: {np.array(fl).dtype}")
print(f"default float dtype of tensor: {torch.tensor(fl).dtype}")

default integer dtype of ndarray: int64
default integer dtype of tensor: torch.int64
default float dtype of ndarray: float64
default float dtype of tensor: torch.float32


In [9]:
# Create a tensor from another tensor (PyTorch emits a UserWarning here and recommends a.clone().detach())
a = torch.tensor([1, 2, 3])
b = torch.tensor(a, dtype=torch.float, device="cuda:1")
b


tensor([1., 2., 3.], device='cuda:1')

In [10]:
b = a.clone()
b = b.to(device="cuda:1")
print(b)

tensor([1, 2, 3], device='cuda:1')


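## The difference between `torch.tensor()` and `torch.Tensor()`

`torch.Tensor` is effectively `torch.FloatTensor`. Creating a Tensor with it calls the class constructor, which uses `torch.float32` as the `dtype` by default, whereas `torch.tensor` infers the dtype from the type of `data`.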

In [11]:
l = [1, 2, 3, 4, 5]
print(torch.Tensor(l).dtype)
print(torch.tensor(l).dtype)

torch.float32
torch.int64
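
The difference matters most when a single integer is passed. A minimal sketch, assuming the default dtype is left at `torch.float32`:

```python
import torch

a = torch.tensor(3)   # a 0-dimensional Tensor holding the value 3
b = torch.Tensor(3)   # a 1-D float32 Tensor with 3 uninitialized elements

print(a.shape, a.dtype)   # torch.Size([]) torch.int64
print(b.shape, b.dtype)   # torch.Size([3]) torch.float32
```

Because the constructor treats integer arguments as a size, `torch.tensor()` is usually the less surprising choice when building a Tensor from data.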


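## Creating Tensors with specific fill values

### torch.arange

`torch.arange(start=0, end, step=1)` creates a 1-D Tensor with the values of the half-open interval `[start, end)`, spaced by `step`.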

In [12]:
print(torch.arange(5))
print(torch.arange(1, 5))
print(torch.arange(1, 20, 3))

tensor([0, 1, 2, 3, 4])
tensor([1, 2, 3, 4])
tensor([ 1,  4,  7, 10, 13, 16, 19])


In [13]:
# If any of start, end, or step is a float, the result is a FloatTensor
torch.arange(1, 3.5, 0.5)

tensor([1.0000, 1.5000, 2.0000, 2.5000, 3.0000])

Note that the end point 3.5 is not included in the result above.

### torch.linspace

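`torch.linspace` is similar to `torch.arange`: both take a start point and an end point. However, the third argument of `linspace`, `steps`, is not a step size but the number of elements in the resulting 1-D Tensor.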

```python
linspace(start(float),end(float),steps(int))
```
Also note that `torch.linspace` produces a floating-point Tensor by default and, unlike `torch.arange`, the result includes the end point (inclusive).

In [14]:
torch.linspace(3, 10, 5)

tensor([ 3.0000,  4.7500,  6.5000,  8.2500, 10.0000])
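
The spacing follows from the arguments: with `steps` points and both ends included, the step size is `(end - start) / (steps - 1)`. A minimal sketch that rebuilds the same values with `arange`:

```python
import torch

start, end, steps = 3, 10, 5
step_size = (end - start) / (steps - 1)   # 1.75

a = torch.linspace(start, end, steps)
b = start + step_size * torch.arange(steps)

print(a)
print(b)
print(torch.allclose(a, b))   # True
```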

### torch.logspace

`torch.logspace` behaves like `torch.linspace`, except that the sequence starts at `base ** start` and ends at `base ** end`; that is, `start` and `end` are exponents of `base`.

```python
logspace(start, end, steps, base=10.0) -> Tensor
```
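
A short sketch of the relationship to `linspace`; the argument values are arbitrary examples:

```python
import torch

dec = torch.logspace(0, 3, 4)            # values 1, 10, 100, 1000
binary = torch.logspace(0, 3, 4, base=2) # values 1, 2, 4, 8

# The same values, written as base ** linspace(start, end, steps)
via_linspace = 10 ** torch.linspace(0, 3, 4)

print(dec)
print(binary)
print(torch.allclose(dec, via_linspace))   # True
```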

### torch.ones, torch.zeros, torch.empty

All three create a Tensor of the given `size`, filled with 1, 0, and uninitialized values respectively.

By default, all three return a `FloatTensor`.

In [15]:
print(torch.ones((2, 2)))
print(torch.zeros((3, 4)))
print(torch.empty((3, 3)))

tensor([[1., 1.],
        [1., 1.]])
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
tensor([[1.8560e-27, 4.5563e-41, 2.6275e-33],
        [0.0000e+00, 1.9838e-31, 0.0000e+00],
        [5.2258e-27, 4.5563e-41, 1.9818e-31]])


`torch.ones/zeros/empty` can also be called as `torch.ones(d1, d2, ...)`, which `numpy` does not support.

### torch.eye

`torch.eye` returns a 2-D float Tensor with ones on the diagonal and zeros everywhere else.

In [12]:
torch.eye(4)

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.]])

### torch.full

`torch.full` returns a Tensor with the given `size` filled with the given value; the Tensor's dtype is inferred from the type of the fill value.

```python
'''
Args:
  size(int...): a list ,tuple or torch.Size
  fill_value(Scalar)
'''
full(size, fill_value)
```

In [13]:
torch.full((2, 3), 1.0)

tensor([[1., 1., 1.],
        [1., 1., 1.]])

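## Creating Tensors from random numbers

### torch.normal

`torch.normal` returns a Tensor filled with random numbers drawn from normal distributions. It supports 4 ways of passing arguments.

The first is: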

```python
'''
Args:
    mean (Tensor): the tensor of per-element means
    std (Tensor): the tensor of per-element standard deviations
'''
normal(mean, std)
```
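The resulting Tensor has the same size as `mean` and `std`; each element is drawn from the normal distribution defined by the `mean` and `std` at the corresponding position.

According to the `torch.normal` documentation, `mean` and `std` do not need to have the same shape as long as they have the same number of elements, in which case the shape of `mean` is used for the output. In practice, shapes that cannot be broadcast against each other triggered a deprecation warning in older releases, so it is safest to give both tensors the same shape.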

In [14]:
mean = torch.arange(12).reshape(3, 4).to(dtype=torch.float32)
std = torch.linspace(0, 1, mean.numel())
# reshape std to the shape of mean so that the two tensors match
torch.normal(mean, std.reshape(mean.shape))


tensor([[ 0.0000,  0.9364,  1.9420,  2.9102],
        [ 4.0290,  4.3689,  5.2698,  7.6781],
        [ 7.8305,  8.5575, 10.7068, 12.5202]])

The second is:

```python
'''
Args:
    mean (float, optional): the mean for all distributions
    std (Tensor): the tensor of per-element standard deviations
'''
normal(mean=0.0, std, *, out=None) -> Tensor
```
This differs from the first form in that `mean` is a scalar, so all elements share the same mean.

In this case, the resulting Tensor has the same shape as `std`.

In [15]:
torch.normal(1.0, std)

tensor([ 1.0000,  0.9540,  0.7664,  1.0074,  0.7044,  0.8077,  1.3776,  2.2947,
         2.7416,  2.0305,  2.3241, -0.1520])

The third is:

```python
'''
Args:
    mean (Tensor): the tensor of per-element means
    std (float, optional): the standard deviation for all distributions
'''
normal(mean, std=1.0, *, out=None) -> Tensor
```
This is the opposite of the second form: here `std` is the value shared by all elements.

In [16]:
torch.normal(mean, 0.5)

tensor([[-0.2012,  1.6828,  1.9805,  2.8949],
        [ 2.7657,  6.4303,  5.6291,  7.1419],
        [ 8.1036,  9.3677, 10.5676, 10.7179]])

The fourth is:

```python
'''
Args:
    mean (float): the mean for all distributions
    std (float): the standard deviation for all distributions
    size (int...): a sequence of integers defining the shape of the output tensor.
'''
normal(mean, std, size, *, out=None) -> Tensor
```
In this case all elements share the same `mean` and `std`, and the shape of the resulting Tensor is determined by `size`.

In [17]:
torch.normal(0, 1, (3, 4))

tensor([[ 0.3704, -1.0554, -0.4917,  0.3783],
        [ 0.3406,  1.5351, -0.5526, -0.9879],
        [-0.5488,  1.2171, -0.1122,  0.2516]])

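### torch.rand, torch.randn

`rand` creates a Tensor of the given shape in which each element is drawn from the uniform distribution on `[0, 1)`.

`randn` creates a Tensor of the given shape in which each element is drawn from the standard normal distribution.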

In [16]:
torch.rand(3, 4)  # or torch.rand((3, 4))

tensor([[0.3252, 0.5300, 0.3352, 0.1053],
        [0.3589, 0.9020, 0.8210, 0.5692],
        [0.3691, 0.9678, 0.1090, 0.3802]])

In [17]:
torch.randn(3, 4)  # or torch.randn((3, 4))

tensor([[-1.8126,  0.3052, -1.5136,  0.6127],
        [-0.5126,  0.1974, -0.3502, -1.2904],
        [-0.2174, -0.3294,  0.3271, -1.2221]])

### torch.randint

Creates a LongTensor filled with random integers drawn uniformly from the interval `[low, high)`.

```python
randint(low=0,high,size,...)
```

In [20]:
torch.randint(1, 10, (3, 4))

tensor([[2, 3, 8, 6],
        [4, 4, 8, 3],
        [7, 5, 8, 1]])

### torch.randperm

Creates a 1-D LongTensor containing a random permutation of the integers from `0` to `n - 1`.

In [21]:
torch.randperm(12)

tensor([ 9, 10,  3,  5,  7,  4, 11,  0,  6,  1,  2,  8])
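
A common use of `randperm` is shuffling along one dimension by indexing with the permutation. A minimal sketch; the tensor `x` is an arbitrary example:

```python
import torch

x = torch.arange(12).reshape(4, 3)
perm = torch.randperm(x.size(0))   # random order of the row indices 0..3
shuffled = x[perm]                 # index with the permutation to reorder the rows

print(perm)
print(shuffled)

# argsort of the permutation is its inverse, so this restores the original order
restored = shuffled[torch.argsort(perm)]
print(torch.equal(restored, x))    # True
```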

## Creating Tensors of the same form with the `xx_like` family

Besides the shape, `dtype`, `layout`, `device`, etc. also match those of the source Tensor unless specified otherwise.

```python
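torch.zeros_like(input, ..) # returns a tensor of zeros with the same size as input

torch.ones_like(input, ..) # returns a tensor of ones with the same size as input

torch.full_like(input, fill_value, …) # returns a tensor with the same size as input, filled with fill_value

torch.empty_like(input, …) # returns a tensor with the same size as input, filled with uninitialized values

torch.rand_like(input, dtype=None, …) # returns a tensor with the same size as input, filled with uniformly distributed random values

torch.randint_like(input, low=0, high, dtype=None, …) # returns a tensor with the same size as input, filled with random integers drawn uniformly from [low, high)

torch.randn_like(input, dtype=None, …) # returns a tensor with the same size as input, filled with random values from the standard normal distribution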

```

In [22]:
src = torch.randn(4, 5)

In [23]:
torch.zeros_like(src)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [24]:
torch.ones_like(src, dtype=torch.int)

tensor([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]], dtype=torch.int32)

In [25]:
torch.empty_like(src, device="cuda:0")

tensor([[1., 2., 3., 4., 5.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]], device='cuda:0')

In [26]:
# Even though fill_value is an int here, the resulting Tensor still uses the dtype of src
torch.full_like(src, 42)

tensor([[42., 42., 42., 42., 42.],
        [42., 42., 42., 42., 42.],
        [42., 42., 42., 42., 42.],
        [42., 42., 42., 42., 42.]])

In [27]:
torch.rand_like(src)

tensor([[0.6990, 0.0617, 0.0418, 0.0900, 0.9830],
        [0.8164, 0.1852, 0.2386, 0.2956, 0.3999],
        [0.4655, 0.1092, 0.7640, 0.1811, 0.5279],
        [0.9435, 0.7238, 0.1428, 0.0752, 0.1228]])

In [28]:
torch.randn_like(src)

tensor([[ 0.9366,  0.1560,  0.6002,  1.3912,  0.5083],
        [ 1.1414,  2.5705, -0.3684, -0.0108, -0.2299],
        [-0.7489,  0.0045,  0.2403,  1.0501,  1.1373],
        [-0.4165,  1.2640,  0.7514, -0.1586,  0.4076]])

In [29]:
torch.randint_like(src, 1, 10)

tensor([[2., 1., 8., 1., 5.],
        [7., 7., 2., 6., 4.],
        [9., 7., 6., 5., 3.],
        [4., 4., 8., 7., 6.]])

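# Tensor operations

PyTorch Tensors support well over 100 operations, including mathematical operations, linear algebra, and array manipulation (transposing, indexing, slicing, etc.). All of them can run on the CPU or the GPU, which is one of the strengths of PyTorch Tensors.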

![](./images/tensor_operatrions.png)

This [page](https://pytorch.org/docs/stable/torch.html) gives an overview of all the operations that Tensors support.

In [2]:
import torch
import numpy as np

## Indexing

We can index and slice a torch.Tensor in the same way as a Numpy.ndarray.

In [4]:
t = torch.arange(12).reshape(3, 4)
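print(f"t: {t}")
print(f"all elements of row 2 of t: {t[1]}")
print(f"all elements of the last column of t: {t[:, -1]}")
print(f"all elements from column 3 to the last column of t: {t[:, 2:]}")
print(f"the element of t at index (2, 3): {t[2, 3]}")

t: tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
all elements of row 2 of t: tensor([4, 5, 6, 7])
all elements of the last column of t: tensor([ 3,  7, 11])
all elements from column 3 to the last column of t: tensor([[ 2,  3],
        [ 6,  7],
        [10, 11]])
the element of t at index (2, 3): 11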


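**Single-element Tensors**

When we access a single element of a Tensor by indexing, what we get is actually a `Tensor` object, not a built-in Python type. We can call the Tensor's `item()` method to get the value as a Python scalar.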

In [5]:
type(t[2, 3])

torch.Tensor

In [7]:
t[2, 3].shape

torch.Size([])

In [8]:
type(t[2, 3].item())

int

## Joining and splitting

### torch.cat

```
cat(tensors, dim=0) -> Tensor
```

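`torch.cat` concatenates the given sequence of tensors (`tensors`) along the given dimension. This requires all tensors to have the same size in every dimension except the one being concatenated.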

In [5]:
t1 = torch.randn(2, 3)
t2 = torch.randn(3, 3)
torch.cat([t1, t2], dim=0)

tensor([[ 1.8282, -1.3961, -1.6330],
        [-0.5860, -1.5504,  0.5470],
        [-0.4857, -0.4318, -0.0308],
        [ 0.1696, -0.7582,  0.8282],
        [-0.1592, -0.7631,  2.8051]])

### torch.stack

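`torch.stack` has the same interface as `torch.cat`, but instead of concatenating along an existing dimension it inserts a new dimension.

This requires all tensors in the sequence to have exactly the same shape.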

In [6]:
t1 = torch.randn(2, 3)
t2 = torch.randn(2, 3)
torch.stack([t1, t2], dim=0)

tensor([[[ 0.9310, -2.5397,  0.7603],
         [ 0.5423,  0.0121,  2.4951]],

        [[-0.9212,  1.1312,  0.4553],
         [-0.3836, -2.2080, -0.8785]]])
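
One way to see the relationship between the two: stacking gives the same result as adding a new dimension to every tensor and then concatenating along it. A minimal sketch:

```python
import torch

t1 = torch.randn(2, 3)
t2 = torch.randn(2, 3)

stacked = torch.stack([t1, t2], dim=0)
# Add a new leading dimension to each tensor, then concatenate along it.
via_cat = torch.cat([t1.unsqueeze(0), t2.unsqueeze(0)], dim=0)

print(stacked.shape)                  # torch.Size([2, 2, 3])
print(torch.equal(stacked, via_cat))  # True
```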

### torch.split
```python
split(tensor, split_size_or_sections, dim=0)
```
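`split` splits a tensor along the given dimension into a tuple of Tensors. The size of each chunk is given by `split_size`. If the dimension is not evenly divisible, the last chunk is smaller than `split_size`.

Each Tensor returned by `split` is a view of the original tensor.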

In [9]:
a = torch.arange(10).view(5, 2)
a

tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])

In [10]:
torch.split(a, 2)

(tensor([[0, 1],
         [2, 3]]),
 tensor([[4, 5],
         [6, 7]]),
 tensor([[8, 9]]))

`split_size_or_sections` can also be a list of ints, in which case each element gives the size of the corresponding chunk.

In [11]:
a1, a2, a3 = torch.split(a, (1, 3, 1))
a1, a2, a3

(tensor([[0, 1]]),
 tensor([[2, 3],
         [4, 5],
         [6, 7]]),
 tensor([[8, 9]]))

In [12]:
# The split tensors share storage with the original tensor
a1[0, 0] = 42
a

tensor([[42,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9]])

### torch.chunk

```python
chunk(input, chunks, dim=0) -> List of Tensors
```
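`chunk` is similar to `split`. The difference is that the second argument of `chunk` directly specifies the number of chunks, and the last chunk may be smaller than the others. If the size of the dimension is smaller than `chunks` (`input.size(dim) < chunks`), the tensor is simply split into `input.size(dim)` chunks.

The resulting Tensors share the underlying storage with the original Tensor; that is, each chunk is a view of the original Tensor.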

In [17]:
print(a.shape)
len(a.chunk(3, dim=1))

torch.Size([5, 2])


2
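
The same argument value means different things to the two functions. A minimal sketch on a 1-D tensor of 10 elements:

```python
import torch

a = torch.arange(10)

# split: the argument is the size of each piece
split_sizes = [p.numel() for p in torch.split(a, 3)]
# chunk: the argument is the number of pieces
chunk_sizes = [p.numel() for p in torch.chunk(a, 3)]

print(split_sizes)   # [3, 3, 3, 1]
print(chunk_sizes)   # [4, 4, 2]
```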

## Transformations

### torch.reshape

```python
reshape(input, shape) -> Tensor
```
`reshape` returns a Tensor with the same data and the same number of elements as the original Tensor, but with a different shape.

### torch.view

torch.view vs. torch.reshape

`reshape` can be used on both contiguous and non-contiguous tensors, while `view` can only be used on contiguous tensors (more precisely, on tensors whose strides are compatible with the new shape). When `reshape` is applied to a tensor that cannot be viewed this way, it produces a copy.

torch.view has existed for a long time. It returns a tensor with the new shape, and the returned tensor shares the underlying data with the original tensor.

torch.reshape, on the other hand, was introduced in version 0.4. According to the documentation, this method:

> Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.

This means that torch.reshape may return either a copy or a view of the original tensor, and you cannot count on it returning one or the other. According to the developer:

> if you need a copy use clone() if you need the same storage use view(). The semantics of reshape() are that it may or may not share the storage and you don't know beforehand.

Another difference is that reshape() can operate on both contiguous and non-contiguous tensors, while view() can only operate on contiguous tensors.
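
The difference shows up on a transposed tensor. The following is a minimal sketch, not an exhaustive description of when `view` succeeds:

```python
import torch

x = torch.arange(6).reshape(2, 3)
y = x.t()                      # transposed view: shape (3, 2), strides (1, 3)
print(y.is_contiguous())       # False

view_failed = False
try:
    y.view(6)                  # the strides are incompatible with a flat view
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

z = y.reshape(6)               # works, but has to copy in this case
print(z)                       # tensor([0, 3, 1, 4, 2, 5])
print(z.data_ptr() == x.data_ptr())   # False: z has its own storage

w = y.contiguous().view(6)     # make an explicit copy first, then view it
print(torch.equal(w, z))       # True
```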

### torch.transpose

```python
transpose(input, dim0, dim1) -> Tensor
```
Swaps the two given dimensions of `input`. The returned Tensor shares storage with the original Tensor.

In [19]:
x = torch.rand(2, 3, 4)
x

tensor([[[0.3844, 0.1842, 0.3928, 0.7579],
         [0.8694, 0.9604, 0.4230, 0.6530],
         [0.8197, 0.6693, 0.0917, 0.8025]],

        [[0.5022, 0.3150, 0.9407, 0.3442],
         [0.0475, 0.8234, 0.0500, 0.8506],
         [0.1328, 0.9862, 0.7935, 0.4214]]])

In [20]:
y = torch.transpose(x, 0, 2)

In [21]:
print(x.stride())
print(y.stride())

(12, 4, 1)
(1, 4, 12)


### torch.permute

In [23]:
torch.permute(x, (2, 0, 1)).shape

torch.Size([4, 2, 3])
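
`permute` takes the new order of all dimensions at once, so `transpose` can be seen as the special case that swaps exactly two of them. A minimal sketch:

```python
import torch

x = torch.rand(2, 3, 4)

# permute takes the new order of all dimensions
p = x.permute(2, 0, 1)
print(p.shape)      # torch.Size([4, 2, 3])
print(p.stride())   # (1, 12, 4)

# transpose(0, 2) is the permutation that swaps dims 0 and 2
print(torch.equal(x.transpose(0, 2), x.permute(2, 1, 0)))   # True
# No data is moved: the result shares storage with x
print(p.data_ptr() == x.data_ptr())   # True
```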

### squeeze和unsqueeze

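`unsqueeze` inserts a dimension of size 1 at the given position, while `squeeze` removes the given dimension if its size is 1; if the size of that dimension is not 1, it does nothing.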

In [22]:
x = torch.randn(2,3)
y = x.unsqueeze(dim=1).unsqueeze(dim=0)
print(f'x shape: {x.shape}, \ny (x.unsqueeze) shape: {y.shape}')
print('squeeze shape', y.squeeze(dim=0).shape)

x shape: torch.Size([2, 3]), 
y (x.unsqueeze) shape: torch.Size([1, 2, 1, 3])
squeeze shape torch.Size([2, 1, 3])


### contiguous

There are a few operations on Tensors in PyTorch that do not change the contents of a tensor, but change the way the data is organized. These operations include:

`narrow()`, `view()`, `expand()` and `transpose()`

For example: when you call transpose(), PyTorch doesn't generate a new tensor with a new layout, it just modifies meta information in the Tensor object so that the offset and stride describe the desired new shape. In this example, the transposed tensor and original tensor share the same memory:

In [13]:
x = torch.randn(3, 2)
y = torch.transpose(x, 0, 1)
x[0, 0] = 42
print(y[0, 0])

tensor(42.)


This is where the concept of contiguous comes in. In the example above, x is contiguous but y is not because its memory layout is different to that of a tensor of same shape made from scratch. Note that the word "contiguous" is a bit misleading because it's not that the content of the tensor is spread out around disconnected blocks of memory. Here bytes are still allocated in one block of memory but the order of the elements is different!

When you call contiguous(), it actually makes a copy of the tensor such that the order of its elements in memory is the same as if it had been created from scratch with the same data.

Normally you don't need to worry about this. You're generally safe to assume everything will work, and wait until you get a RuntimeError: input is not contiguous where PyTorch expects a contiguous tensor to add a call to contiguous().

## Reduction operations

### torch.mean

```python
'''
Args:
  input (Tensor): the input tensor.
  dim (int or tuple of ints): the dimension or dimensions to reduce.
  keepdim (bool): whether the output tensor has :attr:`dim` retained or not.
'''
mean(input, dim, keepdim=False, *, out=None) -> Tensor
```

Computes the mean of `input` along the dimension(s) `dim`. The specified dimension is reduced away; with `keepdim=True` it is retained with size 1.

In [20]:
t = torch.randn(5, 6)
t

tensor([[ 0.8801,  0.3556,  1.3300, -0.8489, -1.8893, -0.7004],
        [-1.1341,  0.7043,  0.0767, -0.9126, -0.9413, -0.5077],
        [-0.8743, -2.0277,  0.5664,  0.4266,  2.9812,  0.9459],
        [ 0.1711, -2.1501, -1.3418, -1.8992,  0.6031, -0.8814],
        [ 0.8263,  1.1446, -1.6875,  1.1150, -0.2767, -0.7673]])

In [21]:
# Reduce along dim=0 (down the columns), collapsing the whole Tensor to 1-D
torch.mean(t, dim=0)

tensor([-0.0262, -0.3947, -0.2112, -0.4238,  0.0954, -0.3822])

In [22]:
torch.mean(t, dim=1, keepdim=True)

tensor([[-0.1455],
        [-0.4524],
        [ 0.3364],
        [-0.9164],
        [ 0.0591]])
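
`keepdim=True` is mainly useful because the result still broadcasts against the input. A minimal sketch that centers every row; the tolerance in the final check is an arbitrary choice:

```python
import torch

t = torch.randn(5, 6)

row_mean = t.mean(dim=1, keepdim=True)   # shape (5, 1)
centered = t - row_mean                  # (5, 6) - (5, 1) broadcasts per row

print(row_mean.shape)         # torch.Size([5, 1])
print(centered.mean(dim=1))   # every entry is numerically close to 0
```

Without `keepdim=True` the mean would have shape `(5,)`, which does not broadcast against `(5, 6)` along the rows.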

For higher-dimensional Tensors, we can also reduce over several dimensions at once.

In [23]:
t = torch.randn(2, 3, 4)
t

tensor([[[ 1.0852,  1.2012,  0.0452,  0.3130],
         [-0.8747, -0.2138,  1.4990, -0.9035],
         [-0.0255,  1.0016, -1.6872,  1.2495]],

        [[ 1.3288,  0.9973, -1.3797,  1.6270],
         [ 2.6580, -0.2791,  0.3662,  1.7222],
         [ 2.5107,  0.3394,  1.1392,  0.2362]]])

In [24]:
# Equivalent to reducing dim 0 to get a 3x4 Tensor, then reducing dim 1 of that to get a vector of shape (3,)
torch.mean(t, dim=(0, 2))

tensor([0.6523, 0.4968, 0.5955])

In [25]:
t.mean(0).mean(1)

tensor([0.6523, 0.4968, 0.5955])

### torch.sum

`torch.sum` is used in much the same way as `torch.mean`; the only difference is that the reduce op is a sum rather than a mean.

In [26]:
torch.sum(t, dim=(0, 2))

tensor([5.2181, 3.9745, 4.7637])

### torch.argmax

In [30]:
x = torch.rand((2,3))
print('x:', x)
print('Argmax:', x.argmax(dim=1))

x: tensor([[0.0860, 0.1838, 0.4938],
        [0.7860, 0.8440, 0.6280]])
Argmax: tensor([2, 1])


### torch.maximum

In [41]:
def relu(x):
    return torch.maximum(x, torch.tensor(0))

x = torch.randn((2,3))
print(f'x:\n\t{x} \nrelu(x):\n\t{relu(x)}')

x:
	tensor([[-0.0520, -1.1005,  0.4070],
        [-0.4881,  1.2309,  2.3318]]) 
relu(x):
	tensor([[0.0000, 0.0000, 0.4070],
        [0.0000, 1.2309, 2.3318]])
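
For comparison, the same result can be obtained with the built-in `torch.relu` or with `clamp`. A minimal sketch:

```python
import torch

x = torch.randn(2, 3)
zero = torch.tensor(0.0)

a = torch.maximum(x, zero)   # element-wise max; the 0-d tensor is broadcast
b = torch.relu(x)
c = x.clamp(min=0)

print(torch.equal(a, b), torch.equal(a, c))   # True True
```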


## Sorting

### torch.sort

```python
sort(input, dim=-1, descending=False, *, out=None) -> (Tensor, LongTensor)
```
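`sort` sorts `input` along the given `dim`, in ascending order by default. It returns the sorted Tensor together with a Tensor holding the indices of the sorted elements in the original Tensor.

The default value of `dim` is the last dimension of the Tensor.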

In [29]:
a = torch.randn(3, 3)
torch.sort(a, dim=1, descending=True)

torch.return_types.sort(
values=tensor([[ 1.1975, -0.6680, -0.8859],
        [ 1.7126,  0.5976,  0.1704],
        [ 1.4601,  1.1260, -0.7816]]),
indices=tensor([[0, 2, 1],
        [0, 1, 2],
        [1, 2, 0]]))
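
The returned `indices` can be used to reproduce the sorted values from the original Tensor, for example with `torch.gather`. A minimal sketch:

```python
import torch

a = torch.randn(3, 3)
values, indices = torch.sort(a, dim=1, descending=True)

# indices[i, j] is the column of row i in `a` that ended up at position j
rebuilt = torch.gather(a, 1, indices)
print(torch.equal(rebuilt, values))   # True
```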

### torch.topk

```python
topk(input, k, dim=None, largest=True, sorted=True, *, out=None) -> (Tensor, LongTensor)
```
`topk` returns the `k` largest elements of `input` along the given dimension, together with their indices.

In [30]:
a = torch.randn(5)
a

tensor([ 0.4168, -1.7439, -0.4161, -0.0458,  0.5801])

In [31]:
torch.topk(a, 3)

torch.return_types.topk(
values=tensor([ 0.5801,  0.4168, -0.0458]),
indices=tensor([4, 0, 3]))

### torch.kthvalue

```python
kthvalue(input, k, dim=None, keepdim=False, *, out=None) -> (Tensor, LongTensor)
```
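`kthvalue` returns the `k`-th smallest element of the input Tensor along the given dimension, together with its index. If `dim` is not given, the last dimension of the Tensor is used.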

In [7]:
a = torch.randn(4, 3)
a

tensor([[ 0.7333,  0.4090, -0.4306],
        [-1.7655, -1.2268, -0.9078],
        [ 2.7441, -0.9277, -0.4792],
        [-0.7800, -0.5171,  0.1977]])

In [8]:
torch.kthvalue(a, 2, dim=0)

torch.return_types.kthvalue(
values=tensor([-0.7800, -0.9277, -0.4792]),
indices=tensor([3, 2, 2]))
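
Note that `k` is 1-based. A minimal sketch that checks `kthvalue` against an ascending sort along the same dimension:

```python
import torch

a = torch.randn(4, 3)
k = 2

kth = torch.kthvalue(a, k, dim=0)
sorted_values, sorted_indices = torch.sort(a, dim=0)

# k is 1-based, so the k-th smallest value is row k - 1 of the ascending sort
print(torch.equal(kth.values, sorted_values[k - 1]))   # True
```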

## In-place operations

PyTorch Tensors support many in-place operations; they are recognizable by the trailing `_` in the method name.

In [34]:
t1 = torch.ones(2, 3)
print(f"t1 = {t1}")
t1.add_(2)
print(f"after plus 2: t1 = {t1}")

t1 = tensor([[1., 1., 1.],
        [1., 1., 1.]])
after plus 2: t1 = tensor([[3., 3., 3.],
        [3., 3., 3.]])
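
A minimal sketch contrasting `add` with `add_`, using `data_ptr()` to see whether new memory was allocated:

```python
import torch

t1 = torch.ones(2, 3)
ptr = t1.data_ptr()

t2 = t1.add(2)      # out-of-place: t1 is unchanged, t2 is a new Tensor
t3 = t1.add_(2)     # in-place: t1 itself is modified and returned

print(t2.data_ptr() == ptr)   # False
print(t3 is t1)               # True
print(t1.data_ptr() == ptr)   # True
```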


## Converting to other data types

We can call `numpy()` to get a numpy.ndarray, or `tolist()` to get a (nested) Python list.

In [35]:
t = torch.tensor([1, 2, 3, 4, 5, 6])

In [36]:
# The returned ndarray still shares storage with t
t.numpy()

array([1, 2, 3, 4, 5, 6])

In [37]:
t.reshape(2, 3).tolist()

[[1, 2, 3], [4, 5, 6]]

## repeat and repeat_interleave

In [24]:
a = torch.arange(6).reshape((2, 3))
a

tensor([[0, 1, 2],
        [3, 4, 5]])

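`repeat(d0, d1, d2)` tiles the tensor the given number of times along each dimension. If the tensor has fewer dimensions than arguments, the missing leading dimensions are treated as having size 1.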

In [28]:
a.repeat((2, 1, 2))

tensor([[[0, 1, 2, 0, 1, 2],
         [3, 4, 5, 3, 4, 5]],

        [[0, 1, 2, 0, 1, 2],
         [3, 4, 5, 3, 4, 5]]])

`repeat_interleave(n, dim)` also repeats along the given dimension, but instead of repeating the whole sequence as `[a b c a b c]`, it repeats each element in place: `[a a b b c c]`.

In [29]:
a.repeat_interleave(2, dim=0)

tensor([[0, 1, 2],
        [0, 1, 2],
        [3, 4, 5],
        [3, 4, 5]])
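
On a 1-D tensor the two orderings are easy to compare. The sketch below also passes a Tensor of per-element counts to `repeat_interleave`; the values are arbitrary examples.

```python
import torch

x = torch.tensor([10, 20, 30])

tiled = x.repeat(2)
interleaved = x.repeat_interleave(2)
print(tiled)         # tensor([10, 20, 30, 10, 20, 30])
print(interleaved)   # tensor([10, 10, 20, 20, 30, 30])

# repeats may also be a Tensor with one count per element
counts = torch.tensor([1, 2, 3])
per_element = torch.repeat_interleave(x, counts)
print(per_element)   # tensor([10, 20, 20, 30, 30, 30])
```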