In deep learning, dimensions are a key concept, especially when working with tensor data.

**Basic dimension notation.** In PyTorch, image tensors usually follow the dimension layout below:

In [1]:
# Imports used throughout this notebook
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

In [4]:
# Common layout for image data: [N, C, H, W]
# N (batch size): number of images processed at once
# C (channels): number of channels
# H (height): image height
# W (width): image width

# Example:
batch_size = 64
image_tensor = torch.randn(batch_size, 1, 28, 28)  # FashionMNIST batch shape
print(f"Tensor shape: {image_tensor.shape}")  # Output: torch.Size([64, 1, 28, 28])


Tensor shape: torch.Size([64, 1, 28, 28])
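
Because the [N, C, H, W] layout is purely positional, it can help to unpack the shape into named variables before using it. The following is a minimal sketch under that layout assumption; the variable names are illustrative and not part of the original notebook.

```python
import torch

# Assumed example batch in [N, C, H, W] layout (FashionMNIST-like shape).
image_tensor = torch.randn(64, 1, 28, 28)

# Unpack the four axes into named variables.
n, c, h, w = image_tensor.shape
print(f"batch={n}, channels={c}, height={h}, width={w}")

# Indexing the first axis selects one image and drops the batch axis.
first_image = image_tensor[0]
print(first_image.shape)  # torch.Size([1, 28, 28])

# ndim reports the number of axes.
print(image_tensor.ndim)  # 4
```

Unpacking fails loudly with a `ValueError` if the tensor does not have exactly four axes, which makes it a cheap sanity check.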


#### The number of channels depends on the image type and the application.

1. Grayscale images
Channels: 1

Examples: FashionMNIST, MNIST
Data shape: [N, 1, H, W]

In [2]:
# Grayscale image example
grayscale_image = torch.randn(64, 1, 28, 28)  # FashionMNIST
print(f"Grayscale image shape: {grayscale_image.shape}")

Grayscale image shape: torch.Size([64, 1, 28, 28])


2. Color images (RGB)
Channels: 3
Channel meaning: Red, Green, Blue

Examples: CIFAR-10, ImageNet
Data shape: [N, 3, H, W]

In [3]:
# RGB color image example
rgb_image = torch.randn(32, 3, 32, 32)  # CIFAR-10
print(f"RGB image shape: {rgb_image.shape}")

RGB image shape: torch.Size([32, 3, 32, 32])


3. RGBA images
Channels: 4
Channel meaning: Red, Green, Blue, Alpha (transparency)
Data shape: [N, 4, H, W]
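
Many models expect 3-channel input, so an RGBA batch is often reduced to RGB by slicing the channel axis. The sketch below assumes the channels are stored in R, G, B, A order along dimension 1; the tensor is random stand-in data, not a real image.

```python
import torch

# Hypothetical RGBA batch: 8 images, 4 channels, 32x32 pixels.
rgba = torch.rand(8, 4, 32, 32)

# Keep the first three channels (R, G, B) and drop alpha.
rgb = rgba[:, :3, :, :]

# Keep alpha separately; slicing with 3:4 preserves the channel axis.
alpha = rgba[:, 3:4, :, :]

print(rgb.shape)    # torch.Size([8, 3, 32, 32])
print(alpha.shape)  # torch.Size([8, 1, 32, 32])
```

Using `rgba[:, 3]` instead of `rgba[:, 3:4]` would drop the channel axis and return a `[8, 32, 32]` tensor.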

In [None]:
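# Typical channel counts for common deep learning datasets
datasets_channels = {
    "MNIST": 1,           # handwritten digits, grayscale
    "FashionMNIST": 1,    # clothing images, grayscale
    "CIFAR-10": 3,        # small color images, RGB
    "CIFAR-100": 3,       # small color images, RGB
    "ImageNet": 3,        # large color images, RGB
    "PASCAL VOC": 3,      # color images, RGB
}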

#### Dimension transformations

In [None]:
# Flatten
x = torch.randn(64, 1, 28, 28)
flattened = x.view(64, -1)  # flattened to [64, 784]
# Or use torch.flatten
flattened = torch.flatten(x, start_dim=1)  # same result

# Permute dimensions (NCHW -> NHWC)
x = torch.randn(64, 3, 28, 28)
transposed = x.permute(0, 2, 3, 1)  # becomes [64, 28, 28, 3]
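
One pitfall when combining these operations: `permute` returns a view with reordered strides, so the result is usually not contiguous and `view` may refuse to flatten it. The sketch below uses a small random batch to illustrate; `reshape` and `.contiguous().view(...)` are the two standard ways around it.

```python
import torch

x = torch.randn(4, 3, 28, 28)        # small [N, C, H, W] batch
nhwc = x.permute(0, 2, 3, 1)         # [N, H, W, C], shares storage with x

print(nhwc.is_contiguous())          # False

# view() needs compatible strides, so it fails on the permuted tensor.
try:
    nhwc.view(4, -1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)                   # True

# reshape() copies when necessary; .contiguous().view() does the same explicitly.
flat_a = nhwc.reshape(4, -1)
flat_b = nhwc.contiguous().view(4, -1)
print(flat_a.shape)                  # torch.Size([4, 2352])
print(torch.equal(flat_a, flat_b))   # True
```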


#### Adding and removing dimensions

In [None]:
# Add a dimension
x = torch.randn(64, 28, 28)
expanded = x.unsqueeze(1)  # becomes [64, 1, 28, 28]

# Remove a dimension
x = torch.randn(64, 1, 28, 28)
squeezed = x.squeeze(1)  # becomes [64, 28, 28]
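
A common use of `unsqueeze` is turning a single image into a batch of one before passing it to a model. The sketch below assumes a bare `[H, W]` grayscale image and also shows why passing an explicit axis to `squeeze` is safer than calling it with no argument.

```python
import torch

# A single grayscale image without batch or channel axes: [H, W].
img = torch.randn(28, 28)

# Add channel and batch axes to get [1, 1, 28, 28].
batch = img.unsqueeze(0).unsqueeze(0)
print(batch.shape)              # torch.Size([1, 1, 28, 28])

# Indexing with None is an equivalent spelling.
same = img[None, None, :, :]
print(same.shape)               # torch.Size([1, 1, 28, 28])

# squeeze() with no argument removes every size-1 axis, including the batch axis.
print(batch.squeeze().shape)    # torch.Size([28, 28])

# squeeze(1) removes only the channel axis and keeps the batch axis.
print(batch.squeeze(1).shape)   # torch.Size([1, 28, 28])
```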


In [None]:
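# Typical tensor shapes for different datasets
datasets_info = {
    "MNIST": {"shape": "[N, 1, 28, 28]", "description": "handwritten digits, grayscale"},
    "FashionMNIST": {"shape": "[N, 1, 28, 28]", "description": "clothing images, grayscale"},
    "CIFAR-10": {"shape": "[N, 3, 32, 32]", "description": "small color images, RGB"},
    "CIFAR-100": {"shape": "[N, 3, 32, 32]", "description": "small color images, RGB"},
    # ImageNet images vary in size; 224x224 is a common size after resizing and cropping
    "ImageNet": {"shape": "[N, 3, 224, 224]", "description": "large color images, RGB"}
}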


#### Dimensions in neural networks

1. Fully connected layer dimensions

In [5]:
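# Example of how the input dimensions change
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()  # [N, 1, 28, 28] -> [N, 784]
        self.linear_stack = nn.Sequential(
            nn.Linear(28*28, 512),   # 784 inputs, 512 outputs
            nn.ReLU(),
            nn.Linear(512, 512),     # 512 inputs, 512 outputs
            nn.ReLU(),
            nn.Linear(512, 10)       # 512 inputs, 10 outputs (number of classes)
        )

    def forward(self, x):
        x = self.flatten(x)          # [N, 1, 28, 28] -> [N, 784]
        return self.linear_stack(x)  # [N, 784] -> [N, 10]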
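
To see the shape changes concretely, a dummy batch can be pushed through the same kinds of layers one step at a time. This is a sketch with freshly initialized layers and random input, assuming a FashionMNIST-like `[64, 1, 28, 28]` batch; only the shapes are meaningful, not the values.

```python
import torch
from torch import nn

x = torch.randn(64, 1, 28, 28)   # dummy batch, assumed FashionMNIST-like shape

flat = nn.Flatten()(x)           # flattens every axis after the batch axis
print(flat.shape)                # torch.Size([64, 784])

hidden = nn.Linear(28 * 28, 512)(flat)
print(hidden.shape)              # torch.Size([64, 512])

logits = nn.Linear(512, 10)(hidden)
print(logits.shape)              # torch.Size([64, 10])
```

`nn.Linear` acts on the last axis only, so the batch axis passes through unchanged.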


2. Convolutional layer dimensions

In [6]:
# Input and output dimensions of a convolutional layer
conv_layer = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
# Input:  [N, 1, 28, 28]
# Output: [N, 32, 26, 26] (3x3 kernel, no padding, stride 1: 28 - 3 + 1 = 26)
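
The output size along each spatial axis follows the formula given in the `nn.Conv2d` documentation: `floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. The sketch below checks it against actual layers; `conv_out_size` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import torch
from torch import nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Hypothetical helper: output length along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(8, 1, 28, 28)

# No padding: each spatial axis shrinks by kernel_size - 1.
no_pad = nn.Conv2d(1, 32, kernel_size=3)
print(no_pad(x).shape)                            # torch.Size([8, 32, 26, 26])
print(conv_out_size(28, 3))                       # 26

# padding=1 with a 3x3 kernel keeps the spatial size.
same_pad = nn.Conv2d(1, 32, kernel_size=3, padding=1)
print(same_pad(x).shape)                          # torch.Size([8, 32, 28, 28])
print(conv_out_size(28, 3, padding=1))            # 28

# stride=2 halves the spatial size here.
strided = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1)
print(strided(x).shape)                           # torch.Size([8, 32, 14, 14])
print(conv_out_size(28, 3, stride=2, padding=1))  # 14
```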


In [7]:
# Check dimensions during training
def check_dimensions(dataloader):
    for batch_x, batch_y in dataloader:
        print(f"Input shape: {batch_x.shape}")
        print(f"Label shape: {batch_y.shape}")
        break

# Usage example: load FashionMNIST (assumed dataset) so that training_data is defined
training_data = datasets.FashionMNIST(
    root="data", train=True, download=True, transform=ToTensor()
)
train_dataloader = DataLoader(training_data, batch_size=64)
check_dimensions(train_dataloader)
# Example output:
# Input shape: torch.Size([64, 1, 28, 28])
# Label shape: torch.Size([64])
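
Checking only the first batch can hide one detail: when the dataset size is not a multiple of `batch_size`, the last batch is smaller. The sketch below uses a synthetic `TensorDataset` of 100 random samples as a stand-in for a real dataset, so it runs without any download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 100 samples shaped [1, 28, 28].
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# 100 samples with batch_size=64 gives one full batch and one batch of 36.
loader = DataLoader(dataset, batch_size=64)
shapes = [tuple(xb.shape) for xb, yb in loader]
print(shapes)  # [(64, 1, 28, 28), (36, 1, 28, 28)]

# drop_last=True discards the incomplete final batch.
loader_full = DataLoader(dataset, batch_size=64, drop_last=True)
print(len(loader_full))  # 1
```

Code that hard-codes the batch size, such as `x.view(64, -1)`, breaks on that final batch; `x.view(x.size(0), -1)` or `torch.flatten(x, start_dim=1)` avoids the problem.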