# 2.2 Data Manipulation

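I am starting to work through Dive-into-DL-PyTorch.
I plan to run all of its code myself to get a clearer understanding of DL. 😂 I was too weak at this before.
I plan to make one notebook per section, with all the key points written up in markdown in the first cell, for my own reference and for anyone else reading.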

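1. Installing PyTorch
2. Creating tensors
   1. **Uninitialized**: torch.empty(5,3,dtype=torch.long)
   2. **Filled with 1**: torch.ones(5,3)
   3. **Filled with 0**: torch.zeros(5,3)
   4. **Common dtypes**: torch.float (float32), torch.double (float64), torch.int (int32), torch.long (int64), torch.bool (boolean)
3. Converting between NumPy and tensors
   1. **tensor to NumPy (shares data):** with x=torch.ones(5,3), x.numpy() does the conversion
   2. **NumPy to tensor (does not share data)**: torch.tensor(np.ones(5)) copies the data, so nothing is shared
   3. **NumPy to tensor (shares data)**: torch.from_numpy(np.ones(5)) shares data with the array
4. Operations
   1. Addition: x+y or torch.add(x,y)
   2. Slicing: x[1:,:-1], as in NumPy
   3. Reshaping: x.view(3,5), x.view(-1,5) **the result shares data with x**
   4. Broadcasting: adding two tensors with different shapes triggers broadcasting. Shapes are aligned from the last dimension backwards, and any dimension that is missing or has size 1 is expanded to match the other tensor
   5. Other operations: https://pytorch.org/docs/stable/tensors.html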


## 1. Installing PyTorch

Installation instructions are on the PyTorch site:
https://pytorch.org/

Select the build that matches your setup:

![install.png](https://pic.atlasbioinfo.com//blog/20210222/Snipaste_2021-02-22_09-53-44.png)

## 2. Creating tensors

In [7]:
### 2.2.1 Creating tensors
import torch
# Uninitialized tensor: the values are whatever was already in memory
torch.empty(5,3,dtype=torch.long)

tensor([[7957688336701796208, 8319958742476218724, 7521981625659912239],
        [8392862729398479919, 8299912210326384488, 7738135719178499177],
        [8245937189886977889, 7795557684927752291, 6874854174328382057],
        [3109858083190698083, 3689686583958992944, 3991939109057404969],
        [3775814413465248358,               23859,                   0]])

### dtypes in Torch

| Data type | dtype |
| :---- | ----: |
| 32-bit floating point | torch.float32 or torch.float |
| 64-bit floating point | torch.float64 or torch.double |
| 64-bit complex | torch.complex64 or torch.cfloat |
| 128-bit complex | torch.complex128 or torch.cdouble |
| 16-bit floating point (IEEE half precision) | torch.float16 or torch.half |
| 16-bit floating point (bfloat16) | torch.bfloat16 |
| 8-bit integer (unsigned) | torch.uint8 |
| 8-bit integer (signed) | torch.int8 |
| 16-bit integer (signed) | torch.int16 or torch.short |
| 32-bit integer (signed) | torch.int32 or torch.int |
| 64-bit integer (signed) | torch.int64 or torch.long |
| Boolean | torch.bool |

The ones I expect to use most:
* torch.float
* torch.double
* torch.int
* torch.long
* torch.bool
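
To make the dtype list concrete, here is a small sketch of which dtype `torch.tensor` infers from plain Python values and how to convert explicitly. The variable names are only illustrative, and the float result assumes the default dtype has not been changed with `torch.set_default_dtype`.

```python
import torch

# dtypes inferred from Python values
ints = torch.tensor([1, 2, 3])        # Python ints   -> torch.int64 (torch.long)
floats = torch.tensor([1.0, 2.0])     # Python floats -> torch.float32 (torch.float)
flags = torch.tensor([True, False])   # Python bools  -> torch.bool

# Explicit conversion returns a new tensor; the original keeps its dtype
as_double = ints.to(torch.double)

print(ints.dtype, floats.dtype, flags.dtype, as_double.dtype)
```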




In [3]:
# Random tensor (uniform on [0, 1)) and an all-zeros tensor
import torch
randExample=torch.rand(5,3)
zeroExample = torch.zeros(5, 4, dtype=torch.long)
print(randExample)
print(zeroExample)

tensor([[0.4599, 0.7451, 0.6156],
        [0.3356, 0.2274, 0.4664],
        [0.9546, 0.7263, 0.5476],
        [0.5117, 0.4726, 0.4385],
        [0.6074, 0.3535, 0.2718]])
tensor([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]])


In [8]:
# Build a tensor from existing data
arr=[[1,2,3],[1,2,3],[1,2,3]]
x = torch.tensor(arr)
print(x)

print("x.new_ones(5,3)")
print(x.new_ones(5,3))


tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])
x.new_ones(5,3)
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])


In [13]:
# Inspect the shape
print(x.shape)
print(x.size())
print(x.new_ones(5,3).shape)


torch.Size([3, 3])
torch.Size([3, 3])
torch.Size([5, 3])


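### Other PyTorch constructors

| Function | Purpose |
| :---- | :---- |
| Tensor(\*sizes) | Basic constructor |
| tensor(data,) | Constructor similar to np.array |
| ones(\*sizes) | Tensor of all ones |
| zeros(\*sizes) | Tensor of all zeros |
| eye(\*sizes) | Ones on the diagonal, zeros elsewhere |
| arange(s,e,step) | From s to e (e excluded) in increments of step |
| linspace(s,e,steps) | steps evenly spaced points from s to e (both included) |
| rand/randn(\*sizes) | Uniform on [0, 1) / standard normal |
| normal(mean,std)/uniform(from,to) | Normal distribution / uniform distribution |
| randperm(m) | Random permutation of 0 to m-1 |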
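
A few of these constructors are easy to mix up, so here is a minimal sketch of what they return. The variable names are illustrative only. `randperm` is random, so only its contents, not its order, are predictable.

```python
import torch

evens = torch.arange(0, 10, 2)    # start 0, stop 10 (excluded), step 2
grid = torch.linspace(0, 1, 5)    # 5 evenly spaced points, both ends included
identity = torch.eye(3)           # 3x3 matrix with ones on the diagonal
perm = torch.randperm(5)          # the integers 0..4 in random order

print(evens)
print(grid)
print(identity)
print(perm)
```

Note the difference between the third arguments: `arange` takes a step size, while `linspace` takes the number of points.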

## Arithmetic operations

[PyTorch documentation: over 100 tensor operations, including linear algebra](https://pytorch.org/docs/stable/tensors.html)

### Addition

In [16]:
x=torch.rand(5,3)
y = torch.rand(5, 3)
print(x)
print(y)
print(x + y)
print(torch.add(x,y))

tensor([[0.1406, 0.2201, 0.9098],
        [0.7349, 0.1998, 0.6482],
        [0.6419, 0.3224, 0.9947],
        [0.4533, 0.3280, 0.2317],
        [0.6811, 0.5898, 0.2982]])
tensor([[0.1433, 0.8546, 0.0953],
        [0.7631, 0.7208, 0.3683],
        [0.0098, 0.5418, 0.1683],
        [0.2717, 0.9358, 0.3943],
        [0.5012, 0.1841, 0.6279]])
tensor([[0.2839, 1.0747, 1.0051],
        [1.4980, 0.9206, 1.0165],
        [0.6518, 0.8641, 1.1630],
        [0.7251, 1.2638, 0.6260],
        [1.1823, 0.7739, 0.9261]])
tensor([[0.2839, 1.0747, 1.0051],
        [1.4980, 0.9206, 1.0165],
        [0.6518, 0.8641, 1.1630],
        [0.7251, 1.2638, 0.6260],
        [1.1823, 0.7739, 0.9261]])


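**Note: assigning a tensor does not copy it. After y=x, both names refer to the same tensor in memory, so an in-place change made through one name also changes the other.**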
```python
print(x)
y=x
y+=1
print(x)
```
Output:
```python
tensor([[1.1406, 1.2201, 1.9098],
        [1.7349, 1.1998, 1.6482],
        [1.6419, 1.3224, 1.9947],
        [1.4533, 1.3280, 1.2317],
        [1.6811, 1.5898, 1.2982]])
tensor([[2.1406, 2.2201, 2.9098],
        [2.7349, 2.1998, 2.6482],
        [2.6419, 2.3224, 2.9947],
        [2.4533, 2.3280, 2.2317],
        [2.6811, 2.5898, 2.2982]])
```

To get an independent copy, use clone:
```python
y=x.clone()
```
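
The difference between a plain assignment and `clone` can be checked directly. This is a minimal sketch with illustrative names (`alias`, `copy`); `data_ptr()` returns the address of a tensor's first element, so it is one way to tell whether two tensors use the same memory.

```python
import torch

x = torch.ones(2, 3)

alias = x          # same tensor object, nothing is copied
copy = x.clone()   # new tensor with its own memory

alias += 1         # in-place add: also visible through x
copy += 10         # only changes the clone

print(alias is x)                        # True
print(copy.data_ptr() == x.data_ptr())   # False
print(x)                                 # all elements are 2
```

The in-place `+=` is what makes the aliasing visible. Writing `alias = alias + 1` instead would build a new tensor and rebind the name, leaving `x` unchanged.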
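The shape-changing function view, covered below, also shares data with its source.

**Note: the Tensor returned by view shares data with the source Tensor, but it is still a new Tensor object (a Tensor holds other attributes besides its data), so the two have different ids (memory addresses).**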

In [27]:
# Indexing and slicing
x[2:,:-1]

tensor([[2.6419, 2.3224],
        [2.4533, 2.3280],
        [2.6811, 2.5898]])
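
One thing worth knowing about basic slicing like this: the result is a view of the original tensor, not a copy. A minimal sketch, using a fresh zero tensor and an illustrative name `block` rather than the `x` from the cell above:

```python
import torch

x = torch.zeros(3, 4)
block = x[1:, :-1]   # rows 1 and 2, every column except the last
block += 1           # in-place change on the slice

print(block.shape)   # torch.Size([2, 3])
print(x)             # the same region of x is now 1
```

If the slice needs to be independent of the source, `x[1:, :-1].clone()` gives a copy.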

In [29]:
# Reshape with view

print(x.view(15))
print(x.view(-1, 5))  # the size of the dimension marked -1 is inferred from the others


tensor([2.1406, 2.2201, 2.9098, 2.7349, 2.1998, 2.6482, 2.6419, 2.3224, 2.9947,
        2.4533, 2.3280, 2.2317, 2.6811, 2.5898, 2.2982])
tensor([[2.1406, 2.2201, 2.9098, 2.7349, 2.1998],
        [2.6482, 2.6419, 2.3224, 2.9947, 2.4533],
        [2.3280, 2.2317, 2.6811, 2.5898, 2.2982]])


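**As above, view only changes how the data is laid out; nothing is copied.**
**So a tensor assigned from view still points at the same memory, and changing it changes the source data too.**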
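
Both halves of this claim (shared data, but a separate Tensor object) can be verified in a few lines. This is a sketch with illustrative names; the `clone().view(...)` combination at the end is one way to get a reshaped copy that does not share memory.

```python
import torch

x = torch.arange(6)      # tensor([0, 1, 2, 3, 4, 5])
v = x.view(2, 3)         # same data, new shape

v[0, 0] = 100            # write through the view

print(x)                              # first element is now 100
print(v is x)                         # False: a distinct tensor object
print(v.data_ptr() == x.data_ptr())   # True: same underlying memory

# For an independent reshaped copy, clone first
independent = x.clone().view(2, 3)
independent[0, 0] = -1
print(x[0].item())                    # still 100
```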

In [34]:
# Convert a one-element tensor to a Python number
print(x)
print(x[0][0].item())

tensor([[0.1406, 0.2201, 0.9098],
        [0.7349, 0.1998, 0.6482],
        [0.6419, 0.3224, 0.9947],
        [0.4533, 0.3280, 0.2317],
        [0.6811, 0.5898, 0.2982]])
0.14061522483825684
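
`item()` only works on tensors that hold exactly one element; calling it on a larger tensor raises an error. A short sketch of the usual alternatives, with illustrative variable names:

```python
import torch

x = torch.tensor([[1.5, 2.5], [3.0, 4.0]])

first = x[0, 0].item()    # single element -> Python float
total = x.sum().item()    # a reduction gives a 0-dim tensor, so item() works
nested = x.tolist()       # whole tensor -> nested Python lists

print(type(first), first)
print(total)
print(nested)
```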


In [40]:
# Broadcasting
# If x and y have different shapes, broadcasting expands them to a common shape before the operation

x=torch.tensor([1,2])
y=torch.tensor([[0],[1],[0],])
print(x)
print(y)
print(x+y)

print(torch.tensor([[1,2],[1,2],[1,2]])
      +torch.tensor([[0,0],[1,1],[0,0]]))

tensor([1, 2])
tensor([[0],
        [1],
        [0]])
tensor([[1, 2],
        [2, 3],
        [1, 2]])
tensor([[1, 2],
        [2, 3],
        [1, 2]])
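
The shape rule behind this result can be written out in plain Python. The helper below, `broadcast_shape`, is a hypothetical function written for this note, not part of PyTorch. It is a sketch of the rule as I understand it: align the shapes from the last dimension, treat a missing dimension as size 1, and require each pair of sizes to be equal or to contain a 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension backwards
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1   # a missing dim counts as 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # A size-1 dim is stretched to the other size; otherwise the sizes agree
        result.append(b if a == 1 else a)
    return tuple(reversed(result))

# x has shape (2,) and y has shape (3, 1) in the cell above
print(broadcast_shape((2,), (3, 1)))   # (3, 2)
```

This matches the output above: x is repeated along 3 rows and y along 2 columns, which is exactly the pair of (3, 2) tensors added in the second print.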


In [41]:
# A tensor and the NumPy array returned by .numpy() also share data
a = torch.ones(5)
b = a.numpy()
print(a, b)

a += 1
print(a, b)
b += 1
print(a, b)

tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]
tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]
tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]


In [43]:
# torch.from_numpy() converts a NumPy array to a tensor (sharing data)
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
print(a, b)

a += 1
print(a, b)
b += 1
print(a, b)

# torch.tensor() copies the data instead, so nothing is shared
a=np.ones(5)
b=torch.tensor(a)
print(a, b)

a += 1
print(a, b)
b += 1
print(a, b)

[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)


In [45]:
# to() moves data between devices, e.g. from the CPU to the GPU
# A common pattern:

if torch.cuda.is_available():
    device = torch.device("cuda")          # GPU
    print(device)
    y = torch.ones_like(x, device=device)  # create a Tensor directly on the GPU
    x = x.to(device)                       # equivalent to .to("cuda")
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # to() can also change the dtype at the same time

cuda
tensor([2, 3], device='cuda:0')
tensor([2., 3.], dtype=torch.float64)
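
The cell above does nothing on a machine without a GPU. A common variation, sketched here under the assumption that the same code should run on either kind of machine, is to pick the device once and fall back to the CPU:

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1, 2])
y = torch.ones_like(x, device=device)   # created directly on the chosen device
x = x.to(device)                        # no-op if x is already there
z = x + y

result = z.to("cpu", torch.double)      # back to the CPU, converted to float64
print(result)
```

Both operands of `x + y` must be on the same device, which is why `x` is moved before the addition.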


In [4]:
import torch
torch.zeros(5,3)

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])