# Pytorch基础

In [3]:
print("欢迎来到Pytorch的学习")

欢迎来到Pytorch的学习


## 导入需要使用的包

In [5]:
# 环境准备
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
print("torch_version: " + torch.__version__)
print("pandas_version: " + pd.__version__)
print("numpy_version: " + np.__version__)

torch_version: 2.9.0+cu128
pandas_version: 2.3.3
numpy_version: 2.3.4


## 1. Tensor（张量）用来表示数据，特别是多维数值数据

### 创建张量————scalar（标量）

In [6]:
# scalar
scalar = torch.tensor(7)  # 创建标量
scalar  # 获取标量

tensor(7)

In [7]:
scalar.ndim # 获取标量的维度

0

In [8]:
scalar.item() # 将标量转换为Python数值

7

### 创建张量————vector（向量）

In [9]:
# vector
vector = torch.tensor([7,7])
vector

tensor([7, 7])

### 维度可以通过闭合方括号的数量来确定

In [10]:
vector.ndim  # 获取向量的维度


1

In [11]:
vector.shape  # 获取向量的形状（元素的个数）

torch.Size([2])

### 创建张量————MATRIX（矩阵）

In [12]:
#MARTIX
MARTIX = torch.tensor([[1,2,3],[4,5,6]])
MARTIX

tensor([[1, 2, 3],
        [4, 5, 6]])

In [13]:
MARTIX.ndim  # 获取矩阵的维度

2

In [14]:
MARTIX.shape # 获取矩阵的形状

torch.Size([2, 3])

In [15]:
MARTIX[0]  # 获取矩阵的第一行

tensor([1, 2, 3])

In [16]:
MARTIX[1]  # 获取矩阵的第二行

tensor([4, 5, 6])

### 创建张量————TENSOR（张量）

In [17]:
# TENSOR
TENSOR = torch.tensor([[[1,2,3],
                       [3,6,9],
                       [2,4,5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

In [18]:
TENSOR.ndim  # 获取张量的维度,可以更多，n维

3

In [19]:
TENSOR.shape  # 获取张量的形状

torch.Size([1, 3, 3])

torch.Size([3, 1, 3]) 可解读为一个3行3列的矩阵

In [20]:
TENSOR[0]   # 获取张量的第一个矩阵

tensor([[1, 2, 3],
        [3, 6, 9],
        [2, 4, 5]])

Alright, it outputs `torch.Size([1, 3, 3])`.

维度从外到内,这意味着有一个 3 x 3 的维度。

![example of different tensor dimensions](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-pytorch-different-tensor-dimensions.png)

> **Note:** 使用小写字母表示 `scalar` and `vector` and 大写字母表示 `MATRIX` and `TENSOR`. 这是有意的. 在实践中，你经常会看到标量和向量用小写字母表示，例如 `y` or `a`. 而矩阵和张量则用大写字母表示，例如 `X` or `W`.

> You also might notice the names martrix（矩阵）and tensor（张量）used interchangably（互换地）. This is common. Since in PyTorch you're often dealing with `torch.Tensor`s (hence the tensor name), however, the shape and dimensions of what's inside will dictate what it actually is.（形状和维度才是区分他们的到底是那一类的关键）

Let's summarise.

| Name | What is it? | Number of dimensions | Lower or upper (usually/example) |
| ----- | ----- | ----- | ----- |
| **scalar** | a single number | 0 | Lower (`a`) | 
| **vector** | a number with direction (e.g. wind speed with direction) but can also have many other numbers | 1 | Lower (`y`) |
| **matrix** | a 2-dimensional array of numbers | 2 | Upper (`Q`) |
| **tensor** | an n-dimensional array of numbers | can be any number, a 0-dimension tensor is a scalar, a 1-dimension tensor is a vector | Upper (`X`) | 

![scalar vector matrix tensor and what they look like](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/00-scalar-vector-matrix-tensor.png)

## 随机张量

We've established tensors represent some form of data.

And machine learning models such as neural networks manipulate（操作） and seek patterns（模式） within tensors.

But when building machine learning models with PyTorch, it's rare you'll create tensors by hand (like what we've been doing).

Instead, a machine learning model often starts out with large random tensors of numbers and adjusts these random numbers as it works through data to better represent it.

In essence:

`Start with random numbers -> look at data -> update random numbers -> look at data -> update random numbers...`

As a data scientist, you can define how the machine learning model starts (initialization), looks at data (representation) and updates (optimization) its random numbers.

We'll get hands on with these steps later on.

For now, let's see how to create a tensor of random numbers.

We can do so using [`torch.rand()`](https://pytorch.org/docs/stable/generated/torch.rand.html) and passing in the `size` parameter.

In [21]:
# Random Tensors

RANDOM_TENSOR1 = torch.randn(3)  # 创建随机张量(3)
RANDOM_TENSOR1

tensor([-1.3099, -0.1811,  0.6901])

In [22]:
RANDOM_TENSOR1.ndim  # 获取随机张量的维度

1

In [23]:
RANDOM_TENSOR1.shape

torch.Size([3])

In [24]:
# Random Tensors

RANDOM_TENSOR2 = torch.randn(3,4)  # 创建随机张量(3*4),也可以是torch.randn(size=(3,4))
RANDOM_TENSOR2

tensor([[ 0.4556, -0.4837,  0.3098,  0.8422],
        [-0.5968,  0.2964,  0.6528,  0.9814],
        [ 0.4655,  0.6312,  0.2195, -2.0478]])

In [25]:
RANDOM_TENSOR2.ndim

2

In [26]:
RANDOM_TENSOR2.shape  

torch.Size([3, 4])

In [27]:
# Random Tensors

RANDOM_TENSOR3 = torch.randn(1,3,3)  # 创建随机张量(1*3*3)
RANDOM_TENSOR3

tensor([[[ 1.0378, -0.4633, -1.7917],
         [ 0.3666,  1.2363,  0.5686],
         [-0.0153,  0.8491,  1.2713]]])

In [28]:
RANDOM_TENSOR3.ndim

3

In [29]:
RANDOM_TENSOR3.shape

torch.Size([1, 3, 3])

### 创建图像大小的张量

In [30]:
random_tensor_image_size = torch.randn(224,224,3)  # height, width, color_channels(Red, Green, Blue)
random_tensor_image_size.shape, random_tensor_image_size.ndim

(torch.Size([224, 224, 3]), 3)

### 0-1 张量

In [31]:
# create a tensor of zeros(常用于mask)
zeros = torch.zeros(3,4)    # 创建一个2*3的全0张量
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [32]:
zeros * RANDOM_TENSOR2 # 全0张量与任意张量相乘，结果仍为全0张量

tensor([[0., -0., 0., 0.],
        [-0., 0., 0., 0.],
        [0., 0., 0., -0.]])

In [33]:
# create a tensor of ones
ones = torch.ones(3,4)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [34]:
ones.dtype,zeros.dtype,RANDOM_TENSOR2.dtype

(torch.float32, torch.float32, torch.float32)

### 范围张量和相似张量

In [35]:
# create a range of tensors
one_to_ten = torch.arange(0,10)  # 创建一个范围张量，从0到9
one_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [36]:
# create a range of tensors
Q1 = torch.arange(start =1, end =11, step=1)  # 创建一个范围张量，从1到10
Q1

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [37]:
# create a range of tensors
Q2 = torch.arange(start =1, end =11, step=2)  # 创建一个范围张量，从1到10
Q2

tensor([1, 3, 5, 7, 9])

In [38]:
# create a tensor like
Q3 = torch.ones_like(one_to_ten) # 创建一个与one_to_ten形状相同的全1张量，也可以使用torch.ones_like(input=one_to_ten)
Q4 = torch.zeros_like(one_to_ten) # 创建一个与one_to_ten形状相同的全0张量，也可以使用torch.zeros_like(input=one_to_ten)
Q3, Q4

(tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

### 张量数据类型（tensor datatypes）

**Note**: Tensor datatypes is one of the 3 big errors you'll run into with Pytorch & Deep Learning:

1. Tensors not right datatype (e.g. compute float16 with float32)
   
2. Tensors not right shape
   
3. Tensors not ont the right (same) device

In [39]:
# Float 32 Tensor
float_32_tensor = torch.tensor([1.0 , 2.0 , 3.0], 
                               dtype=None, # What datatype do you want? float32, float16, int8, etc.
                               device=None, # If you want to put tensor on cpu(default), assign device='cpu', if on gpu, device='cuda'
                               requires_grad=False # whether or not to track gradients with this tensors operations
                               )  # 默认是float32
float_32_tensor, float_32_tensor.dtype

(tensor([1., 2., 3.]), torch.float32)

In [40]:
# Float 32 Tensor
float_32_tensor = torch.tensor([1.0 , 2.0 , 3.0], dtype=None)  # 默认是float32
float_32_tensor, float_32_tensor.dtype

(tensor([1., 2., 3.]), torch.float32)

In [41]:
# Float 32 Tensor
float_32_tensor = torch.tensor([1.0 , 2.0 , 3.0], dtype=torch.float32) 
float_32_tensor, float_32_tensor.dtype

(tensor([1., 2., 3.]), torch.float32)

In [42]:
# Transform float32 to float16
float_16_tensor=float_32_tensor.type(torch.float16)
float_16_tensor, float_16_tensor.dtype

(tensor([1., 2., 3.], dtype=torch.float16), torch.float16)

In [43]:
# Float 16 Tensor
float_16_tensor = torch.tensor([1.0 , 2.0 , 3.0], dtype=torch.float16) 
float_16_tensor, float_16_tensor.dtype

(tensor([1., 2., 3.], dtype=torch.float16), torch.float16)

In [44]:
# compute float16 with float32 error
float_16_tensor * float_32_tensor # 目前Pytorch的功能强大，可以自动转换dtype，不会报错

tensor([1., 4., 9.])

### Getting information from tensors (tensor attributes)

1. Tensors not right datatype (e.g. compute float16 with float32) - to get datatype from tensor , can use `tensor.dtype`
   
2. Tensors not right shape - to get shape from tensor , can use `tensor.shape`
   
3. Tensors not ont the right (same) device - to get device from tensor , can use `tensor.device`

In [45]:
# create a tensor 
tensor_example = torch.randn(3,4,device='cuda')
print(f"Datatype of tensor_example: {tensor_example.dtype}")
print(f"Shape of tensor_example: {tensor_example.shape}")
print(f"Device of tensor_example: {tensor_example.device}")
print(f"Size of tensor_example: {tensor_example.size()}") #tensor.size() 和 tensor.shape 功能相同

Datatype of tensor_example: torch.float32
Shape of tensor_example: torch.Size([3, 4])
Device of tensor_example: cuda:0
Size of tensor_example: torch.Size([3, 4])


### Manipulating Tensors (tensor operations)

Tensor operations include:
* Addition - 加法
  
* Subtraction - 减法
  
* Multiplication (element-wise) - 元素乘法
  
* Division - 除法
  
* Martix multiplication - 矩阵乘法

In [46]:
# Addition
# create a tensor and add 10 to it
tensor_A = torch.tensor([1,2,3])
tensor_A + 10

tensor([11, 12, 13])

In [47]:
# Subtraction
# create a tensor and subtract 10 from it
tensor_A - 10

tensor([-9, -8, -7])

In [48]:
# Multiplication
# create a tensor and multiply by 10
tensor_A * 10

tensor([10, 20, 30])

In [49]:
# Division
# create a tensor and divide by 10
tensor_A / 10

tensor([0.1000, 0.2000, 0.3000])

In [50]:
# Try out Pytorch in-built functions
torch.mul(tensor_A,10) # as same as tensor_A * 10
print(f"Multiplication result: {torch.mul(tensor_A,10)}")
torch.add(tensor_A,10) # as same as tensor_A + 10
print(f"Addition result: {torch.add(tensor_A,10)}")
torch.sub(tensor_A,10) # as same as tensor_A - 10
print(f"Subtraction result: {torch.sub(tensor_A,10)}")
torch.div(tensor_A,10) # as same as tensor_A / 10
print(f"Division result: {torch.div(tensor_A,10)}")

Multiplication result: tensor([10, 20, 30])
Addition result: tensor([11, 12, 13])
Subtraction result: tensor([-9, -8, -7])
Division result: tensor([0.1000, 0.2000, 0.3000])


### MARTIX Multiplication (矩阵乘法)

Two main ways of performing multiplacation in neural networks and deep learning:

1. Element-wise multiplication
   
2. Martix multiplication（dot product————点积）

There are two main rules that performing matrix mutliplication needs to satisfy:
1. The **inner dimensions** must match:
* `(3, 2) @ (3, 2)` won't work
* `(2, 3) @ (3, 2)` will work
* `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the **outer dimensions**:
* `(2, 3) @ (3, 2)` -> `(2, 2)`
* `(3, 2) @ (2, 3)` -> `(3, 3)`

In [51]:
# Element wise Multiplication (矩阵乘法————逐元素相乘)
print(tensor_A, "*" , tensor_A)
print(f"Equals: {tensor_A * tensor_A}")
print(f"Equals: {torch.mul(tensor_A, tensor_A)}")

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])
Equals: tensor([1, 4, 9])


In [52]:
# Matrix Multiplication (矩阵乘法———点积)
print(tensor_A, "·", tensor_A)
print(f"Equals: {torch.matmul(tensor_A, tensor_A)}")
print(f"Equals: {tensor_A @ tensor_A}")

tensor([1, 2, 3]) · tensor([1, 2, 3])
Equals: 14
Equals: 14


In [53]:
# Martix Multiplication by loop
%time
value = 0
for i in range(len(tensor_A)):
    value += tensor_A[i] * tensor_A[i]
print(f"Equals: {value}")

CPU times: user 14 μs, sys: 1e+03 ns, total: 15 μs
Wall time: 28.8 μs
Equals: 14


In [54]:
# Martix Multiplication by torch in-built function
%time
value = torch.matmul(tensor_A, tensor_A)
print(f"Equals: {value}")

CPU times: user 19 μs, sys: 0 ns, total: 19 μs
Wall time: 35.3 μs
Equals: 14


**Note**: 很明显内置函数的运算更快

```python
value = 0
for i in range(1, 11):
    value += i
print(value)
```

循环基本写法

**Note**：

逐元素相乘：对应位置一一相乘，记作 A ⊙ B。

矩阵乘法：按行×列做点积，记作 A @ B 或 AB。

PyTorch 运算符

逐元素：* 或 torch.mul

矩阵乘法：@、torch.matmul、二维专用 torch.mm、批量 torch.bmm

形状规则

逐元素：对应维度要相等，或能按广播规则扩展（某维为 1 或缺失可广播）。

矩阵乘法：内维必须匹配，即 (m, n) @ (n, p) -> (m, p)；批量维可按规则广播，但参与乘法的最后两维需满足上述条件。

### One of the most common errors in deep learning : shape errors

In [55]:
# Shape for martix multiplication
tensor_1 = torch.tensor([[1,2],
                         [3,4],
                         [5,6]])

tensor_2 = torch.tensor([[7,10],
                         [8,11],
                         [9,12]])
torch.mm(tensor_1, tensor_2)  # tensor_2.T表示对tensor_2进行转置

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

To fix our tensor shape issues, we can manipulate the shape of one of our tensors using a **transpose**.

A **transpose** switches the axes or dimensions of a given tensor.

In [None]:
print(f"Shape of tensor_2 before transpose: {tensor_2.shape}")
print(f"Shape of tensor_2 after transpose: {tensor_2.T.shape}")

Shape of tensor_2 before transpose: torch.Size([3, 2])
Shape of tensor_2 after transpose: torch.Size([2, 3])


In [None]:
# The matrix multiplication operation works when tensor_2 is transposed
print(f"Original shapes: tensor_1 = {tensor_1.shape}, tensor_2 = {tensor_2.shape}")
print(f"New shapes: tensor_1 = {tensor_1.shape} (same shape as above), tensor_2.T = {tensor_2.T.shape}")
print(f"Multiplying: {tensor_1.shape} @ {tensor_2.T.shape} <- inner dimensions must match")
print("Output:\n")
output = torch.matmul(tensor_1, tensor_2.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_1 = torch.Size([3, 2]), tensor_2 = torch.Size([3, 2])
New shapes: tensor_1 = torch.Size([3, 2]) (same shape as above), tensor_2.T = torch.Size([2, 3])
Multiplying: torch.Size([3, 2]) @ torch.Size([2, 3]) <- inner dimensions must match
Output:

tensor([[ 27,  30,  33],
        [ 61,  68,  75],
        [ 95, 106, 117]])

Output shape: torch.Size([3, 3])


### Element Multiplication

In [None]:
x = torch.tensor([[1,2],
                 [3,4]])

y= torch.tensor([[5,6],
               [7,8]])

out = x*y
print(f"dim of x: {x.ndim}\n")
print(f"Shape of x: {x.shape}\n")
print(f"dim of y: {y.ndim}\n")
print(f"Shape of y: {y.shape}\n")
print(f"Shape of out: {out.shape}\n")
print(f"Contents of out: {out}")

dim of x: 2

Shape of x: torch.Size([2, 2])

dim of y: 2

Shape of y: torch.Size([2, 2])

Shape of out: torch.Size([2, 2])

Contents of out: tensor([[ 5, 12],
        [21, 32]])


**Note**: 形状完全一样，可以逐元素相乘

In [None]:
x = torch.tensor([[1,2],
                 [3,4]])

y= torch.tensor([5,6])

out = x*y
print(f"dim of x: {x.ndim}\n")
print(f"Shape of x: {x.shape}\n")
print(f"dim of y: {y.ndim}\n")
print(f"Shape of y: {y.shape}\n")
print(f"Shape of out: {out.shape}\n")
print(f"Contents of out: {out}")

dim of x: 2

Shape of x: torch.Size([2, 2])

dim of y: 1

Shape of y: torch.Size([2])

Shape of out: torch.Size([2, 2])

Contents of out: tensor([[ 5, 12],
        [15, 24]])


**Note**: 行对应，可以逐元素相乘

In [None]:
x = torch.tensor([[1,2],
                 [3,4]])

y= torch.tensor([[5],
                 [6]])

out = x*y
print(f"dim of x: {x.ndim}\n")
print(f"Shape of x: {x.shape}\n")
print(f"dim of y: {y.ndim}\n")
print(f"Shape of y: {y.shape}\n")
print(f"Shape of out: {out.shape}\n")
print(f"Contents of out: {out}")

dim of x: 2

Shape of x: torch.Size([2, 2])

dim of y: 2

Shape of y: torch.Size([2, 1])

Shape of out: torch.Size([2, 2])

Contents of out: tensor([[ 5, 10],
        [18, 24]])


**Note**: 列对应，可以逐元素相乘

### Finding the min,max,mean,sum,etc (tensor aggregation(张量聚合))


In [58]:
# Create a tensor
x = torch.arange(0,100,10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [60]:
# Find the min value in a tensor
min_value = torch.min(x) # also x.min()
print(f"Min value in tensor x is : {min_value}")

Min value in tensor x is : 0


In [61]:
# Find the max value in a tensor
max_value = torch.max(x) # also x.max()
print(f"Max value in tensor x is : {max_value}")

Max value in tensor x is : 90


In [62]:
# Find the mean value in a tensor
mean_value = torch.mean(x.type(torch.float32))  # 注意：要先将整数张量转换为浮点型张量才能计算均值,also x.type(torch.float32).mean()
print(f"Mean value in tensor x is : {mean_value}")  # dtype张量转换为浮点型张量才能计算均值

Mean value in tensor x is : 45.0


In [63]:
# compute the sum of all values in a tensor
sum_value = torch.sum(x) # also x.sum()
print(f"Sum of all values in tensor x is : {sum_value}")

Sum of all values in tensor x is : 450


### Finding the positional min and max

In [64]:
# Find the position of the max element in a tensor
max_position = torch.argmax(x) # also x.argmax()
print(f"Position of the max value in tensor x is : {max_position}")

Position of the max value in tensor x is : 9


In [65]:
# Find the position of the min element in a tensor 
min_position = torch.argmin(x) # also x.argmin()
print(f"Position of the min value in tensor x is : {min_position}")

Position of the min value in tensor x is : 0


### Reshaping,stacking,squeezing and unsqueezing tensors (重塑，堆叠，压缩和解压缩)

* Reshaping - reshapes an input tensor to a defined shape（把输入张量重塑成特定形状）
  
* View - Return a view of an input tensor of certain shape but keep the same memory as the original tensor（返回张量特定形状的视图，但不改变张量本身）
  
* Stacking - combine multiple tensors on top of each other (vstack) or side by side (hstack)（将多个张量堆叠）
  
* Squeeze - removes all `single` dimensions from a tensor
  
* Unsqueeze - add a `single` dimension to a target tensor
  
* Permute - Return a view of the input with dimensions permuted (swapped) in a certain way

In [66]:
# Create a tensor
x = torch.arange(1.,10.)
x,x.shape

(tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), torch.Size([9]))

In [67]:
# Reshape your tensor
x_shaped1 = x.reshape(1,9) # 元素数量要匹配
x_shaped2 = x.reshape(9,1) # 元素数量要匹配
x_shaped3 = x.reshape(3,3) # 元素数量要匹配
print(f"x_shaped1: {x_shaped1}, {x_shaped1.shape}")
print(f"x_shaped2: {x_shaped2}, {x_shaped2.shape}")
print(f"x_shaped3: {x_shaped3}, {x_shaped3.shape}")

x_shaped1: tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9])
x_shaped2: tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]]), torch.Size([9, 1])
x_shaped3: tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]]), torch.Size([3, 3])


In [69]:
# Change the view of a tensor
z = x.view(1,9)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]]), torch.Size([1, 9]))

In [70]:
# Changing z changes x (because a view of a tensor shares the same memory as the original input tensor)(z 和 x 共享内存，改变z会影响x)
z[:,0] = 5
print(f"Modified z: {z}")
print(f"Original x after modifying z: {x}")

Modified z: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
Original x after modifying z: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])


In [71]:
# Indexing (索引)
w = torch.tensor([[1,2,3],
                  [4,5,6]])
print(f"w的第三列: {w[:,2]}")
print(f"w的第二行: {w[1,:]}")
print(f"w的第二行第三列: {w[1,2]}")

w的第三列: tensor([3, 6])
w的第二行: tensor([4, 5, 6])
w的第二行第三列: 6


In [72]:
# Stack tensors on top 
x_stacked_0 = torch.stack([x,x,x,x])  # 默认在第0维度进行堆叠,即在最外层增加一个维度，torch.stack([<tensors>], dim=0)
x_stacked_1 = torch.stack([x,x,x,x], dim=1)  # 在第1维度进行堆叠
x_stacked_v = torch.vstack([x,x,x,x])  # 在第0维度进行堆叠
x_stacked_h = torch.hstack([x,x,x,x])  # 横向堆叠
x_stacked_0, x_stacked_0.shape,x_stacked_1, x_stacked_1.shape,x_stacked_v,x_stacked_v.shape,x_stacked_h, x_stacked_h.shape

(tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 torch.Size([4, 9]),
 tensor([[5., 5., 5., 5.],
         [2., 2., 2., 2.],
         [3., 3., 3., 3.],
         [4., 4., 4., 4.],
         [5., 5., 5., 5.],
         [6., 6., 6., 6.],
         [7., 7., 7., 7.],
         [8., 8., 8., 8.],
         [9., 9., 9., 9.]]),
 torch.Size([9, 4]),
 tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.],
         [5., 2., 3., 4., 5., 6., 7., 8., 9.]]),
 torch.Size([4, 9]),
 tensor([5., 2., 3., 4., 5., 6., 7., 8., 9., 5., 2., 3., 4., 5., 6., 7., 8., 9.,
         5., 2., 3., 4., 5., 6., 7., 8., 9., 5., 2., 3., 4., 5., 6., 7., 8., 9.]),
 torch.Size([36]))

**Note**: 

当输入是 1D 张量时：

vstack([x, x]) 与 stack([x, x], dim=0) 结果形状一致，常可视为等价。

例如 x: (4,) → vstack → (2, 4)，stack(dim=0) → (2, 4)。

当输入是 2D 或更高维张量时：

不等价。vstack 相当于按行拼接（cat(dim=0)），不会新增维度；

而 stack 会在指定位置新增一个新维度，形状会不同。

用形状直观对比

1D 情况（与你笔记本里的 x 一样是 1D）：

x: (9,)；vstack([x,x,x,x]) → (4, 9)

stack([x,x,x,x], dim=0) → (4, 9) ← 与 vstack 相同

stack([x,x,x,x], dim=1) → (9, 4) ← 新增列维

2D 情况（A、B: (3, 4)）：

vstack([A, B]) → (6, 4) ← 沿第 0 维拼接

stack([A, B], dim=0) → (2, 3, 4) ← 新增一个维度

该用哪个:

想“把行接在一起”（样本拼接/增大批量）：用 vstack 或 cat(dim=0)。

想“打包成一个新维度”（增加批次维、头数维等）：用 stack(dim=…)，它会新增一个维度。

小结:

vstack ≈ cat(dim=0)，但会把 1D 张量先提升为 2D 的行向量后再拼接；

hstack ≈ cat(dim=1)，对 1D 会当作列方向拼接；

stack(dim=k) 会“新增”一个维度，语义和结果与 vstack/hstack 不同（除 1D 特例外观一致）。

In [None]:
# 例如
w_1 = torch.stack([w,w])
w_2 = torch.vstack([w,w])
w_1,w_2

(tensor([[[1, 2, 3],
          [4, 5, 6]],
 
         [[1, 2, 3],
          [4, 5, 6]]]),
 tensor([[1, 2, 3],
         [4, 5, 6],
         [1, 2, 3],
         [4, 5, 6]]))

### 用最通俗的话理解“维度”（dimension）和 shape

- 维度是什么？
  - 可以理解成“有几条互相独立的方向/轴”。
  - 标量=0维（只有一个数）；向量=1维（像一条数轴上的一排数）；矩阵=2维（有行和列的表格）；更高维=在此基础上再套一层层“外壳”。

- shape 是什么？
  - shape 就是“每个维度上的长度”。
  - 例如 (3, 4) 表示 2D：3 行、4 列；(2, 3, 4) 表示 3D：最外层有 2 个“页”，每页是 3×4 的矩阵。
  - 在深度学习里常见的形状如 (N, C, H, W)：批次 N、通道 C、高 H、宽 W（PyTorch 常用 channels-first）。

- 常见操作如何改变维度/shape
  - stack：在指定位置“新增一个维度”。比如把多个同形状张量打包成一叠。
  - vstack/hstack：不新增维度，只是在现有维度上“接在一起”（v=竖直按行拼，h=水平按列拼）。
  - unsqueeze/squeeze：显式地“加/去”长度为 1 的维度。
  - reshape：改变形状但元素总数不变（像把乐高换个拼法）。
  - transpose/permute：交换维度顺序（把行列或更高维的轴对调）。

- 一句话技巧
  - 看 shape 从左到右就像坐标：(外层 … -> 内层)。矩阵乘法要“内维对得上（m,n)@(n,p)→(m,p)”，堆叠/拼接要“外观一致或能广播”。

In [None]:
# 维度/shape 快速可视化演示
import torch

print("--- 1D 示例 ---")
x = torch.arange(1., 6.)          # shape: (5,)
print("x:", x.shape)
x_stack0 = torch.stack([x, x], dim=0)   # 新增一个维度在最外层 -> (2, 5)
x_v = torch.vstack([x, x])              # 按行堆叠 -> (2, 5)（1D 特例与 stack(dim=0) 形状一致）
x_h = torch.hstack([x, x])              # 按列拼接 -> (10,)
print("stack(dim=0):", x_stack0.shape)
print("vstack:      ", x_v.shape)
print("hstack:      ", x_h.shape)

# unsqueeze 显式增加长度为1的维度
x_u0 = x.unsqueeze(0)   # (1, 5)
x_u1 = x.unsqueeze(1)   # (5, 1)
print("unsqueeze(0):", x_u0.shape)
print("unsqueeze(1):", x_u1.shape)

print("\n--- 2D 示例 ---")
A = torch.arange(1., 7.).reshape(2, 3)   # (2, 3)
B = torch.arange(10., 16.).reshape(2, 3) # (2, 3)
print("A:", A.shape, "B:", B.shape)
A_v = torch.vstack([A, B])               # (4, 3) ← 沿第0维拼接
A_cat0 = torch.cat([A, B], dim=0)        # (4, 3) ← 与 vstack 等价
A_s0 = torch.stack([A, B], dim=0)        # (2, 2, 3) ← 新增维度
A_h = torch.hstack([A, B])               # (2, 6) ← 沿第1维拼接
A_cat1 = torch.cat([A, B], dim=1)        # (2, 6) ← 与 hstack 等价
A_s1 = torch.stack([A, B], dim=1)        # (2, 2, 3) ← 在第1维新增维度
print("vstack:      ", A_v.shape)
print("cat(dim=0): ", A_cat0.shape)
print("stack(dim=0):", A_s0.shape)
print("hstack:      ", A_h.shape)
print("cat(dim=1): ", A_cat1.shape)
print("stack(dim=1):", A_s1.shape)

--- 1D 示例 ---
x: torch.Size([5])
stack(dim=0): torch.Size([2, 5])
vstack:       torch.Size([2, 5])
hstack:       torch.Size([10])
unsqueeze(0): torch.Size([1, 5])
unsqueeze(1): torch.Size([5, 1])

--- 2D 示例 ---
A: torch.Size([2, 3]) B: torch.Size([2, 3])
vstack:       torch.Size([4, 3])
cat(dim=0):  torch.Size([4, 3])
stack(dim=0): torch.Size([2, 2, 3])
hstack:       torch.Size([2, 6])
cat(dim=1):  torch.Size([2, 6])
stack(dim=1): torch.Size([2, 2, 3])


In [73]:
# torch.squeeze() - removes all single dimensions from a target tensor
print(f"Previous tensor: {x_shaped1}")
print(f"Previous shape: {x_shaped1.shape}")

# Remove extra dimensions from x_shaped1
x_squeezed = x_shaped1.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
Previous shape: torch.Size([1, 9])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
New shape: torch.Size([9])


In [76]:
# torch.unsqueeze() - adds a single dimension to a target tensor at a specific dim (dimension)
print(f"Previous target: {x_squeezed}")
print(f"Previous shape: {x_squeezed.shape}")

# Add an extra dimension with unsqueeze
x_unsqueezed0 = x_squeezed.unsqueeze(dim=0)
x_unsqueezed1 = x_squeezed.unsqueeze(dim=1)
print(f"\nNew tensor (dim=0): {x_unsqueezed0}")
print(f"New shape (dim=0): {x_unsqueezed0.shape}")
print(f"\nNew tensor (dim=1): {x_unsqueezed1}")
print(f"New shape (dim=1): {x_unsqueezed1.shape}")

Previous target: tensor([5., 2., 3., 4., 5., 6., 7., 8., 9.])
Previous shape: torch.Size([9])

New tensor (dim=0): tensor([[5., 2., 3., 4., 5., 6., 7., 8., 9.]])
New shape (dim=0): torch.Size([1, 9])

New tensor (dim=1): tensor([[5.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])
New shape (dim=1): torch.Size([9, 1])


In [77]:
# torch.permute - rearranges the dimensions of a target tensor in a specified order
x_original = torch.rand(size=(224, 224, 3)) # [height, width, colour_channels]

# Permute the original tensor to rearrange the axis (or dim) order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}") 
print(f"New shape: {x_permuted.shape}") # [colour_channels, height, width]

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])


**Note**:

rand:返回一个张量，包含了从区间[0, 1)的均匀分布中抽取的一组随机数。

randn:返回一个张量，包含了从标准正态分布（均值为0，方差为1，即高斯白噪声）中抽取的一组随机数。

In [96]:
# torch.permute 类似于torch.view，都与原tensor共享内存，不同之处在于permute可以改变维度的顺序，而view只能改变形状但不能改变维度的顺序。
x = torch.randn(1,2,3)
x_view = x.view(2,3)
x_permute = x.permute(2,0,1)  # 交换维度
print(f"x: {x}")
print(f"x_view: {x_view}")
print(f"x_permute: {x_permute}")
x_permute[0,:,:] = 2 # 可以看到，修改x_permute会影响x，也不会影响x_view
print(f"x: {x}")
print(f"x_view: {x_view}")
print(f"x_permute: {x_permute}")

x: tensor([[[ 0.4808, -0.2926,  0.9423],
         [ 0.9485,  0.9199,  1.0902]]])
x_view: tensor([[ 0.4808, -0.2926,  0.9423],
        [ 0.9485,  0.9199,  1.0902]])
x_permute: tensor([[[ 0.4808,  0.9485]],

        [[-0.2926,  0.9199]],

        [[ 0.9423,  1.0902]]])
x: tensor([[[ 2.0000, -0.2926,  0.9423],
         [ 2.0000,  0.9199,  1.0902]]])
x_view: tensor([[ 2.0000, -0.2926,  0.9423],
        [ 2.0000,  0.9199,  1.0902]])
x_permute: tensor([[[ 2.0000,  2.0000]],

        [[-0.2926,  0.9199]],

        [[ 0.9423,  1.0902]]])


## Indexing (selecting data from tensors)

Indexing with PyTorch is similar to indexing with NumPy.

In [107]:
# Create a tensor
import torch
x = torch.arange(1, 19).reshape(2, 3, 3)
x, x.shape

(tensor([[[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9]],
 
         [[10, 11, 12],
          [13, 14, 15],
          [16, 17, 18]]]),
 torch.Size([2, 3, 3]))

In [108]:
# Let's index on our new tensor
x[0]

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [109]:
# Let's index on the middle bracket (dim=1)
x[0][0]

tensor([1, 2, 3])

In [110]:
# Let's index on the most inner bracket (last dimension)
x[0][1][1]

tensor(5)

In [119]:
x[:,0]

tensor([[ 1,  2,  3],
        [10, 11, 12]])

In [113]:
# Get all values of 0th and 1st dimensions but only index 1 of 2nd dimension
x[:, :, 1]

tensor([[ 2,  5,  8],
        [11, 14, 17]])

In [114]:
# Get all values of the 0 dimension but only the 1 index value of 1st and 2nd dimension
x[:, 1, 1]

tensor([ 5, 14])

In [115]:
# Get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :]

tensor([1, 2, 3])

In [120]:
# Index on x to return 9
print(x[0][2][2])

# Index on x to return 3, 6, 9
print(x[0, :, 2])

tensor(9)
tensor([3, 6, 9])


## PyTorch tensors & NumPy

NumPy is a popular scientific Python numerical computing library. 

And because of this, PyTorch has functionality to interact with it.

* Data in NumPy, want in PyTorch tensor -> `torch.from_numpy(ndarray)`
  
* PyTorch tensor -> NumPy -> `torch.Tensor.numpy()` 

In [132]:
# NumPy array to tensor
import numpy as np
import torch

array = np.arange(1.,10.)
tensor = torch.from_numpy(array) # warning: when converting from numpy -> pytorch, pytorch reflects numpy's default datatype of float64 unless specified otherwise
print(f"array: {array}, dtype: {array.dtype}")
print(f"tensor: {tensor}, dtype: {tensor.dtype}")
tensor = tensor.type(torch.float32)  # convert to float32
print(f"Converted tensor: {tensor}, dtype: {tensor.dtype}")

array: [1. 2. 3. 4. 5. 6. 7. 8. 9.], dtype: float64
tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=torch.float64), dtype: torch.float64
Converted tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), dtype: torch.float32


**Note**: 

Numpy的数组默认是float64格式

In [133]:
# Change the value of array, what will this do to `tensor`?
array = array + 1
array, tensor

(array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]))

In [138]:
# tensor to NumPy array
import numpy as np
import torch

tensor = torch.arange(1.,10.)
array = tensor.numpy() # warning: when converting from pytorch -> numpy, numpy reflects pytorch's default datatype unless specified otherwise
print(f"tensor: {tensor}, dtype: {tensor.dtype}")
print(f"array: {array}, dtype: {array.dtype}")
array = array.astype(np.float64) # numpy转换格式用astype
print(f"Converted array: {array}, dtype: {array.dtype}")

tensor: tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.]), dtype: torch.float32
array: [1. 2. 3. 4. 5. 6. 7. 8. 9.], dtype: float32
Converted array: [1. 2. 3. 4. 5. 6. 7. 8. 9.], dtype: float64


In [139]:
# Change the value of tensor, what will this do to `array`?
tensor = tensor + 1
tensor, array

(tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 array([1., 2., 3., 4., 5., 6., 7., 8., 9.]))

**Note**:

array 和 tensor 不共享内存

## Reproducbility (trying to take random out of random)

In short how a neural network learns:

`start with random numbers -> tensor operations -> update random numbers to try and make them better representations of the data -> again -> again -> again...`

To reduce the randomness in neural networks and PyTorch comes the concept of a **random seed**.

Essentially what the random seed does is "flavour" the randomness. （随机种子的作用是影响随机性）

In [141]:
import torch

# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(random_tensor_A)
print(random_tensor_B)
print(random_tensor_A == random_tensor_B)

tensor([[0.9409, 0.8103, 0.7426, 0.6611],
        [0.7813, 0.2003, 0.7085, 0.2656],
        [0.1189, 0.5953, 0.6250, 0.3385]])
tensor([[0.9298, 0.8004, 0.4918, 0.8913],
        [0.9207, 0.3757, 0.1219, 0.7395],
        [0.3519, 0.1420, 0.3500, 0.6632]])
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])


In [142]:
# Let's make some random but reproducible tensors
import torch

# Set the random seed
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

torch.manual_seed(RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

print(random_tensor_C)
print(random_tensor_D)
print(random_tensor_C == random_tensor_D)

tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])


 ## Running tensors and PyTorch objects on the GPUs (and making faster computations)

 GPUs = faster computation on numbers, thanks to CUDA + NVIDIA hardware + PyTorch working behind the scenes to make everything hunky dory (good).

### 1. Getting a GPU

1. Easiest - Use Google Colab for a free GPU (options to upgrade as well)
   
3. Use your own GPU - takes a little bit of setup and requires the investment of purchasing a GPU, there's lots of options..., see this post for what option to get: https://timdettmers.com/2020/09/07/which-gpu-for-deep-learning/
   
4. Use cloud computing - GCP, AWS, Azure, these services allow you to rent computers on the cloud and access them

For 2, 3 PyTorch + GPU drivers (CUDA) takes a little bit of setting up, to do this, refer to PyTorch setup documentation: https://pytorch.org/get-started/locally/ 

In [144]:
!nvidia-smi

Sun Oct 26 17:22:53 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.01             Driver Version: 550.163.01     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:18:00.0 Off |                  N/A |
| 38%   36C    P8             21W /  320W |     266MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4080 ...    Off |   00

### 2. Check for GPU access with PyTorch

In [146]:
# Check for GPU access with PyTorch
import torch
torch.cuda.is_available()

True

For PyTorch since it's capable of running compute on the GPU or CPU, it's best practice to setup device agnostic code: https://pytorch.org/docs/stable/notes/cuda.html#best-practices

E.g. run on GPU if available, else default to CPU

In [147]:
# Setup device agnostic code 
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [148]:
# Count number of devices
torch.cuda.device_count()

4

## 3. Putting tensors (and models) on the GPU

The reason we want our tensors/models on the GPU is because using a GPU results in faster computations.

In [149]:
# Create a tensor (default on the CPU)
tensor = torch.tensor([1, 2, 3])

# Tensor not on GPU
print(tensor, tensor.device)

tensor([1, 2, 3]) cpu


In [150]:
# Move tensor to GPU (if available)
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3], device='cuda:0')

In [151]:
# If tensor is on GPU, can't transform it to NumPy
tensor_on_gpu.numpy()

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In [152]:
# To fix the GPU tensor with NumPy issue, we can first set it to the CPU
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
tensor_back_on_cpu

array([1, 2, 3])

In [173]:
tensor_on_gpu,tensor_back_on_cpu.device

(tensor([1, 2, 3], device='cuda:0'), 'cpu')

In [162]:
tensor = torch.randn(1000, 1000, 1000)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
print(tensor.sum())
end.record()
torch.cuda.synchronize()
print(f"Time taken (ms): {start.elapsed_time(end)}")

tensor(-13278.1016)
Time taken (ms): 215.99609375


In [171]:
tensor_gpu = tensor.to(device)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
print(tensor_gpu.sum())
end.record()
torch.cuda.synchronize()
print(f"Time taken (ms): {start.elapsed_time(end)}")

tensor(-13278.1045, device='cuda:0')
Time taken (ms): 6.910431861877441


**Note**:

很明显gpu的运算速度更快