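The Transformers library is built on top of the PyTorch framework (its TensorFlow version is not feature-complete).
Although the official documentation claims that using Transformers does not require PyTorch knowledge, in practice we still need PyTorch's
DataLoader class to load data, PyTorch optimizers to update model parameters, and so on.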

In [None]:
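# Tensors are the foundation of deep learning: a 0-dimensional tensor is a scalar, a 1-dimensional tensor is a vector, and a 2-dimensional tensor is a matrix. PyTorch is essentially a tensor-based numerical computing toolkit, and it offers several ways to create tensors: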


In [5]:
import torch
torch.empty(2, 3) # empty tensor (uninitialized), shape (2,3)


tensor([[-5.9185e-31,  1.3116e-42,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00]])

In [2]:
torch.rand(2, 3) # random tensor, each value taken from [0,1)

tensor([[0.0597, 0.7948, 0.3898],
        [0.4318, 0.8445, 0.8419]])

In [3]:
torch.randn(2, 3) # random tensor, each value taken from standard normal distribution

tensor([[ 0.2076,  0.9583, -0.4509],
        [-0.6981, -0.3623, -0.6361]])

In [4]:
torch.zeros(2, 3, dtype=torch.long) # long integer zero tensor

tensor([[0, 0, 0],
        [0, 0, 0]])

In [5]:
torch.zeros(2, 3, dtype=torch.double) # double float zero tensor

tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)

In [8]:
torch.arange(10) # integers 0 through 9
torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # the same values written out explicitly

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
array = [[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]]
torch.tensor(array)


tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]])

In [9]:
import numpy as np
array = np.array([[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]])
torch.from_numpy(array)

tensor([[1.0000, 3.8000, 2.1000],
        [8.6000, 4.0000, 2.4000]], dtype=torch.float64)
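
The two conversions above differ in ways that are easy to miss. As far as I know, `torch.tensor` always copies its input, and a Python list of floats becomes float32 under the default settings, whereas `torch.from_numpy` keeps NumPy's float64 dtype and shares memory with the source array. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np
import torch

values = [[1.0, 3.8, 2.1], [8.6, 4.0, 2.4]]

from_list = torch.tensor(values)      # copies; Python floats become float32 by default
np_array = np.array(values)           # NumPy defaults to float64
shared = torch.from_numpy(np_array)   # no copy; keeps float64 and shares memory
copied = torch.tensor(np_array)       # copies; also keeps float64

np_array[0, 0] = -1.0                 # modify the NumPy array in place

print(from_list.dtype, shared.dtype)  # torch.float32 torch.float64
print(shared[0, 0].item())            # -1.0, the change is visible through the shared tensor
print(copied[0, 0].item())            # 1.0, the copy is unaffected
```

If the shared memory is not wanted, copying with `torch.tensor(np_array)` as above is one way to decouple the two objects.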

     PyTorch with CUDA Support 
     conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y 

In [6]:
# Check that PyTorch can access the GPU.
# Switch to the conda env pytorh_GPU_cuda first.
import torch
#print(torch.cuda.is_available())
print(torch.version.cuda)

11.8


In [7]:
import torch, platform
print("PyTorch", torch.__version__) # pip wheels add a build suffix: "+cpu" is CPU-only, "+cuXXX" (e.g. "+cu126") is a CUDA build
print("CUDA available", torch.cuda.is_available())
print("CUDA version", torch.version.cuda)
print("GPU", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "None")

PyTorch 2.5.1
CUDA available True
CUDA version 11.8
GPU NVIDIA GeForce RTX 3060 Laptop GPU


In [12]:
import torch, platform, subprocess, sys
print("PyTorch build:", torch.__version__)          # the "+cuXXX" suffix names the CUDA version the wheel was built for (here +cu126)
print("Python:", sys.version.split()[0], platform.architecture()[0])
try:
    print("CUDA runtime (nvcc):", subprocess.check_output(["nvcc","--version"], text=True).split("\n")[3])
except FileNotFoundError:
    print("CUDA runtime (nvcc): not found  ← this is OK, wheels bundle their own")

PyTorch build: 2.8.0+cu126
Python: 3.11.13 64bit
CUDA runtime (nvcc): Cuda compilation tools, release 12.6, V12.6.85
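
The checks above can be folded into one helper. This is only a sketch: `describe_torch_build` is a hypothetical name, and it relies on `torch.version.cuda` being `None` when the installed build has no CUDA support compiled in, which is more reliable than reading the version suffix (conda builds such as the `2.5.1` shown earlier carry no suffix at all).

```python
import torch

def describe_torch_build() -> str:
    """Summarize whether the installed PyTorch build can use CUDA (illustrative helper)."""
    built_with = torch.version.cuda  # None when the build has no CUDA support, else e.g. "11.8"
    if built_with is None:
        return f"PyTorch {torch.__version__}: no CUDA support in this build"
    if not torch.cuda.is_available():
        # The build supports CUDA, but no usable GPU or driver was found at runtime.
        return f"PyTorch {torch.__version__}: built for CUDA {built_with}, GPU not available"
    return f"PyTorch {torch.__version__}: CUDA {built_with} on {torch.cuda.get_device_name(0)}"

print(describe_torch_build())
```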


In [13]:
# The tensors created above live in main memory and are computed on the CPU. To compute on the GPU, create the tensor directly on the GPU or move an existing tensor to it:

torch.rand(2, 3).cuda()


tensor([[0.6033, 0.7959, 0.3852],
        [0.0942, 0.1345, 0.6468]], device='cuda:0')

In [14]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
print("Current device:", torch.cuda.current_device())
print("Device name:", torch.cuda.get_device_name(0))

x = torch.rand(1).cuda()
print("Random tensor on GPU:", x)

CUDA available: True
Device count: 1
Current device: 0
Device name: NVIDIA GeForce RTX 3060 Laptop GPU
Random tensor on GPU: tensor([0.4079], device='cuda:0')


In [8]:

torch.rand(2, 3, device="cuda")


tensor([[0.4028, 0.8158, 0.5686],
        [0.1025, 0.6553, 0.3080]], device='cuda:0')

In [16]:

torch.rand(2, 3).to("cuda")

tensor([[0.2328, 0.0817, 0.6757],
        [0.0998, 0.0417, 0.6276]], device='cuda:0')
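
The calls above raise an error on a machine without a usable GPU. A common pattern, sketched here with illustrative variable names, is to pick the device once and pass it everywhere, so the same code also runs on the CPU:

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(2, 3, device=device)   # created directly on the chosen device
b = torch.rand(2, 3).to(device)       # created on the CPU, then moved if necessary

c = a + b                             # both operands must live on the same device
print(c.device.type == device.type)   # True
print(c.cpu().shape)                  # torch.Size([2, 3])
```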

In [17]:
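# A tensor must be contiguous before view can be applied; call is_contiguous to check. If it is not, first make it contiguous with the contiguous function. Alternatively, call the newer reshape function, which behaves almost the same as view and handles non-contiguous tensors automatically.

# Transpose: transpose swaps two dimensions of a tensor; its arguments are the two dimensions to swap: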

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
x


tensor([[1, 2, 3],
        [4, 5, 6]])

In [18]:

x.transpose(0, 1)


tensor([[1, 4],
        [2, 5],
        [3, 6]])
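
The transposed tensor is a convenient way to see the contiguity rule for view in action. The following sketch (variable names are illustrative) shows that transposing yields a non-contiguous tensor, that `view` then fails, and that both `contiguous().view(...)` and `reshape(...)` give the same result:

```python
import torch

x = torch.tensor([[1, 2, 3], [4, 5, 6]])
xt = x.transpose(0, 1)              # shape (3, 2); shares storage with x, only the strides change

print(x.is_contiguous())            # True
print(xt.is_contiguous())           # False

try:
    xt.view(6)                      # view cannot flatten this non-contiguous layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat_a = xt.contiguous().view(6)    # copy into contiguous memory first, then view
flat_b = xt.reshape(6)              # reshape copies automatically when it has to
print(flat_a.tolist())              # [1, 4, 2, 5, 3, 6]
print(torch.equal(flat_a, flat_b))  # True
```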

In [19]:
# Permute: unlike transpose, which can only swap two dimensions per call, permute sets the new order of all dimensions directly:

x = torch.tensor([[[1, 2, 3], [4, 5, 6]]])
print(x, x.shape)


tensor([[[1, 2, 3],
         [4, 5, 6]]]) torch.Size([1, 2, 3])


In [20]:

x = x.permute(2, 0, 1)
print(x, x.shape)


tensor([[[1, 4]],

        [[2, 5]],

        [[3, 6]]]) torch.Size([3, 1, 2])
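
To make the relationship between the two functions concrete, here is a small sketch showing that this particular `permute(2, 0, 1)` can be reproduced with two successive `transpose` calls. The choice of which pairs to swap is worked out by hand for this example and is not a general recipe:

```python
import torch

x = torch.tensor([[[1, 2, 3], [4, 5, 6]]])    # shape (1, 2, 3)

permuted = x.permute(2, 0, 1)                  # new order: old dim 2, old dim 0, old dim 1
swapped = x.transpose(0, 2).transpose(1, 2)    # (1, 2, 3) -> (3, 2, 1) -> (3, 1, 2)

print(permuted.shape)                          # torch.Size([3, 1, 2])
print(torch.equal(permuted, swapped))          # True
print(permuted.is_contiguous())                # False
```

Like `transpose`, `permute` only rearranges strides, so the result is non-contiguous here and would need `contiguous()` or `reshape` before a `view`.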


In [21]:

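# Broadcasting
# So far we have assumed that the two tensors in an operation have the same shape. In some cases, even when the shapes differ, the broadcasting mechanism can replicate the elements of one or both tensors so that their shapes match, and then the element-wise operation is carried out.

# For example, create two tensors with different shapes: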

x = torch.arange(1, 4).view(3, 1) # shape (3,1) 
y = torch.arange(4, 6).view(1, 2) # shape (1,2)

In [22]:
print(x + y)

tensor([[5, 6],
        [6, 7],
        [7, 8]])
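
What broadcasting does to each operand can be made visible with `torch.broadcast_tensors`, which expands its inputs to their common shape. A minimal sketch (the names `xb` and `yb` are illustrative):

```python
import torch

x = torch.arange(1, 4).view(3, 1)   # shape (3, 1): [[1], [2], [3]]
y = torch.arange(4, 6).view(1, 2)   # shape (1, 2): [[4, 5]]

# Expand both operands to the common shape (3, 2).
xb, yb = torch.broadcast_tensors(x, y)
print(xb.tolist())                  # [[1, 1], [2, 2], [3, 3]]
print(yb.tolist())                  # [[4, 5], [4, 5], [4, 5]]

# Element-wise addition of the expanded tensors matches the implicit broadcast.
print(torch.equal(xb + yb, x + y))  # True
```

Here `x` is repeated along its column dimension and `y` along its row dimension, after which the addition is an ordinary element-wise one.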


In [23]:
x = torch.arange(12).view(3, 4)
x

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [24]:
x[1, 3] # element at row 1, column 3 (zero-based)

tensor(7)

In [25]:
x[1:3] # rows 1 and 2

tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [26]:
x[:, 2]

tensor([ 2,  6, 10])

In [27]:
x[:, 2:4]

tensor([[ 2,  3],
        [ 6,  7],
        [10, 11]])

In [None]:
# # 1. Uninstall the existing Pillow (whether it was installed with pip or conda)
# # Better to run this in a terminal (note by ZHEN): Jupyter sometimes fails to show conda's yes/no prompt in a cell
# conda uninstall pillow -y 
# pip uninstall pillow  # make sure it is removed completely!

# # 2. Reinstall with conda (resolves the DLL dependencies automatically)
# conda install pillow -c conda-forge -y 

In [1]:
import platform
print(platform.architecture())  # should print ('64bit', 'WindowsPE')

('64bit', 'WindowsPE')


In [2]:
from PIL import Image
print(Image.__version__)

11.3.0


cannot import name 'datasets' from 'torchvision' (unknown location)

In [9]:
import pkg_resources
print(pkg_resources.get_distribution("torchvision").version)
# # Install from the official pytorch channel (matches the CUDA version automatically)
#conda install torchvision -c pytorch -c nvidia
# conda update torchvision -c pytorch -y

0.23.0+cu126


In [9]:
import torchvision
print(torchvision.__version__)

0.20.1


In [10]:
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)

train_features, train_labels = next(iter(train_dataloader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")

img = train_features[0].squeeze()
label = train_labels[0]
print(img.shape)
print(f"Label: {label}")

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to data\FashionMNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 26.4M/26.4M [00:03<00:00, 8.27MB/s]


Extracting data\FashionMNIST\raw\train-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to data\FashionMNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 29.5k/29.5k [00:00<00:00, 267kB/s]


Extracting data\FashionMNIST\raw\train-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 4.42M/4.42M [00:01<00:00, 4.35MB/s]


Extracting data\FashionMNIST\raw\t10k-images-idx3-ubyte.gz to data\FashionMNIST\raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 5.15k/5.15k [00:00<?, ?B/s]

Extracting data\FashionMNIST\raw\t10k-labels-idx1-ubyte.gz to data\FashionMNIST\raw

Feature batch shape: torch.Size([64, 1, 28, 28])
Labels batch shape: torch.Size([64])
torch.Size([28, 28])
Label: 1
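
The cell above only pulls a single batch. To show how a `DataLoader` is normally consumed without downloading anything, the sketch below iterates over a synthetic dataset. The random tensors and the size of 150 samples are assumptions made purely for illustration, standing in for FashionMNIST:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 150 grayscale 28x28 "images" with integer labels.
features = torch.rand(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = []
for batch_features, batch_labels in loader:
    # Each iteration yields one (features, labels) batch.
    batch_sizes.append(batch_features.shape[0])

print(len(loader))   # 3 batches: ceil(150 / 64)
print(batch_sizes)   # [64, 64, 22]; the last batch holds the remainder
```

If a smaller final batch is a problem for the model, passing `drop_last=True` to `DataLoader` discards it.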





In [17]:
#conda activate torch_cuda
!python -c "import torch; print(torch.__version__)"

2.5.1


In [2]:
from torch.utils.data import IterableDataset, DataLoader

class MyIterableDataset(IterableDataset):

    def __init__(self, start, end):
        super().__init__()
        assert end > start
        self.start = start
        self.end = end

    def __iter__(self):
        return iter(range(self.start, self.end))

ds = MyIterableDataset(start=3, end=7) # [3, 4, 5, 6]
# Single-process loading
print(list(DataLoader(ds, num_workers=0)))
# # Directly doing multi-process loading
# print(list(DataLoader(ds, num_workers=2)))

[tensor([3]), tensor([4]), tensor([5]), tensor([6])]
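
With `num_workers` greater than 0, each worker process receives its own copy of an iterable dataset, so a dataset like the one above would yield every item once per worker. One way to avoid this, sketched here along the lines of the PyTorch documentation, is to split the range inside `__iter__` using `get_worker_info`. The class name `ShardedRangeDataset` is hypothetical, and the demonstration keeps `num_workers=0` so that it runs anywhere:

```python
import math
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class ShardedRangeDataset(IterableDataset):
    """Yields integers in [start, end), split evenly across DataLoader workers."""

    def __init__(self, start, end):
        super().__init__()
        assert end > start
        self.start = start
        self.end = end

    def __iter__(self):
        info = get_worker_info()  # None when loading in the main process
        if info is None:
            lo, hi = self.start, self.end
        else:
            # Give each worker its own contiguous slice of the range.
            per_worker = math.ceil((self.end - self.start) / info.num_workers)
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

ds = ShardedRangeDataset(start=3, end=7)
print([batch.item() for batch in DataLoader(ds, num_workers=0)])  # [3, 4, 5, 6]
```

On Windows, trying `num_workers=2` usually requires running from a script with an `if __name__ == "__main__":` guard, because worker processes are spawned rather than forked.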
