# Introduction to torch.normal()

In [2]:
import torch
help(torch.normal)

Help on built-in function normal in module torch:

normal(...)
    normal(mean, std, *, generator=None, out=None) -> Tensor
    
    Returns a tensor of random numbers drawn from separate normal distributions
    whose mean and standard deviation are given.
    
    The :attr:`mean` is a tensor with the mean of
    each output element's normal distribution
    
    The :attr:`std` is a tensor with the standard deviation of
    each output element's normal distribution
    
    The shapes of :attr:`mean` and :attr:`std` don't need to match, but the
    total number of elements in each tensor need to be the same.
    
    .. note:: When the shapes do not match, the shape of :attr:`mean`
              is used as the shape for the returned output tensor
    
    .. note:: When :attr:`std` is a CUDA tensor, this function synchronizes
              its device with the CPU.
    
    Args:
        mean (Tensor): the tensor of per-element means
        std (Tensor): the tensor of per-element st

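According to the help text above, torch.normal() takes four parameters:
1. mean: required, a tensor holding the mean of each output element's normal distribution
2. std: required, a tensor holding the standard deviation of each output element's normal distribution
3. generator: optional, keyword-only, the random number generator to sample from; the global generator is used by default
4. out: optional, keyword-only, an output tensor; if given, the result is written into it

**The shapes of mean and std may differ, but according to this help text both tensors must hold the same number of elements; when the shapes differ, the output takes the shape of mean**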

In [8]:
# torch.arange(0, 1, 1) is tensor([0]), so std is 0 and every sample equals its mean
x = torch.normal(torch.ones(3), torch.arange(0, 1, 1))
print(x)

tensor([1., 1., 1.])
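
Because the standard deviation in the cell above is 0, its output is not random. The following is a minimal sketch with non-zero per-element standard deviations; the values and the seed are arbitrary choices for illustration, not taken from the notes.

```python
import torch

# A dedicated generator makes the draw reproducible (the seed is arbitrary)
gen = torch.Generator().manual_seed(0)

mean = torch.tensor([0.0, 10.0, 100.0])  # per-element means
std = torch.tensor([1.0, 0.5, 2.0])      # per-element standard deviations

x = torch.normal(mean, std, generator=gen)
print(x.shape)  # one sample per (mean, std) pair

# Re-seeding the same generator reproduces the same draw
gen.manual_seed(0)
y = torch.normal(mean, std, generator=gen)
print(torch.equal(x, y))
```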


# Introduction to torch.matmul()

In [9]:
help(torch.matmul)

Help on built-in function matmul in module torch:

matmul(...)
    matmul(input, other, *, out=None) -> Tensor
    
    Matrix product of two tensors.
    
    The behavior depends on the dimensionality of the tensors as follows:
    
    - If both tensors are 1-dimensional, the dot product (scalar) is returned.
    - If both arguments are 2-dimensional, the matrix-matrix product is returned.
    - If the first argument is 1-dimensional and the second argument is 2-dimensional,
      a 1 is prepended to its dimension for the purpose of the matrix multiply.
      After the matrix multiply, the prepended dimension is removed.
    - If the first argument is 2-dimensional and the second argument is 1-dimensional,
      the matrix-vector product is returned.
    - If both arguments are at least 1-dimensional and at least one argument is
      N-dimensional (where N > 2), then a batched matrix multiply is returned.  If the first
      argument is 1-dimensional, a 1 is prepended to its dimens

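torch.matmul() takes three parameters:
1. input: tensor
2. other: tensor
3. out: tensor, optional

It computes the matrix product of the two tensors; as the help text above describes, the exact result depends on their dimensionality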
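
A short sketch of the first four cases listed in the help text, using small made-up tensors so the result shapes are easy to check.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])                       # 1-D, shape (3,)
A = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # 2-D, shape (2, 3)
B = torch.ones(3, 4)                                    # 2-D, shape (3, 4)

dot = torch.matmul(v, v)      # 1-D x 1-D: dot product, a 0-dimensional tensor
mat = torch.matmul(A, B)      # 2-D x 2-D: matrix-matrix product, shape (2, 4)
mat_vec = torch.matmul(A, v)  # 2-D x 1-D: matrix-vector product, shape (2,)
vec_mat = torch.matmul(v, B)  # 1-D x 2-D: the prepended dimension is removed, shape (4,)

print(dot, mat.shape, mat_vec, vec_mat.shape)
```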

In [10]:
from d2l import torch as d2l
help(d2l.plt.scatter)

Help on function scatter in module matplotlib.pyplot:

scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
    A scatter plot of *y* vs. *x* with varying marker size and/or color.
    
    Parameters
    ----------
    x, y : float or array-like, shape (n, )
        The data positions.
    
    s : float or array-like, shape (n, ), optional
        The marker size in points**2 (typographic points are 1/72 in.).
        Default is ``rcParams['lines.markersize'] ** 2``.
    
    c : array-like or list of colors or color, optional
        The marker colors. Possible values:
    
        - A scalar or sequence of n numbers to be mapped to colors using
          *cmap* and *norm*.
        - A 2D array in which the rows are RGB or RGBA.
        - A sequence of colors of length n.
        - A single color format string.
    
        Note that *c* should not be a single

# The random.shuffle function

In [13]:
import random
help(random.shuffle)

Help on method shuffle in module random:

shuffle(x, random=None) method of random.Random instance
    Shuffle list x in place, and return None.
    
    Optional argument random is a 0-argument function returning a
    random float in [0.0, 1.0); if it is the default None, the
    standard random.random will be used.


Shuffles the input in place: no new object is returned, and the original list itself is reordered
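
A minimal sketch of the in-place behaviour; the list of indices and the seed are illustrative only.

```python
import random

random.seed(0)  # arbitrary seed, only to make the run repeatable

indices = list(range(10))
result = random.shuffle(indices)  # reorders `indices` itself

print(result)   # None: nothing is returned
print(indices)  # the same list object, now in shuffled order
print(sorted(indices) == list(range(10)))  # True: same elements, different order
```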

# range()

In [14]:
help(range)

Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __bool__(self, /)
 |      self != 0
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |

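range() is a class. One way to call it takes three arguments:
1. start: the starting value (inclusive)
2. stop: the end value (exclusive)
3. step: the increment

It returns a range object that produces the integers defined by these arguments, in order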

In [24]:
a = range(0, 1000, 10)
print(list(a))

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990]
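
As a small complement to the cell above, this sketch shows the one- and two-argument forms and a negative step. A range object is a lazy sequence, so it supports len() and indexing without first being turned into a list.

```python
r = range(10, 0, -3)      # a negative step counts down; stop (0) is still excluded
print(list(r))            # [10, 7, 4, 1]
print(len(r), r[1])       # 4 7

print(list(range(4)))     # [0, 1, 2, 3]  (start defaults to 0)
print(list(range(2, 5)))  # [2, 3, 4]     (step defaults to 1)
```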


# min()

In [25]:
help(min)

Help on built-in function min in module builtins:

min(...)
    min(iterable, *[, default=obj, key=func]) -> value
    min(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its smallest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the smallest argument.


In [28]:
b = min(1100, 1000)
print(b)

1000
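
The help text also mentions the iterable form and the key and default keyword arguments. The sketch below shows them, plus one plausible use of min() when slicing minibatches: clamping the end of the last slice. The sizes are hypothetical.

```python
words = ["pear", "fig", "banana"]
print(min(words))           # 'banana': smallest in alphabetical order
print(min(words, key=len))  # 'fig': smallest according to len()
print(min([], default=0))   # 0: the default is returned for an empty iterable

# Clamp the end of each slice so the last batch stops at the dataset size
num_examples, batch_size = 10, 4  # hypothetical sizes
bounds = [(i, min(i + batch_size, num_examples))
          for i in range(0, num_examples, batch_size)]
print(bounds)               # [(0, 4), (4, 8), (8, 10)]
```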


# A small discovery about indexing tensors

The cell below defines a 2x4 tensor x

In [30]:
x = torch.tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(x[0, 1])  # indexing directly like this returns the 2nd element of the 1st row: 2
l = [0, 1]
print(x[l])  # but with a list l = [0, 1] as the index, x is indexed along axis 0, so the first and second rows are returned

tensor(2)
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])


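I noticed this in the linear regression section: when the dataset is read in minibatches, the index passed to features[] at the end of each loop iteration is a list or a tensor, so the result is the full row for each index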
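
The minibatch reader described here can be sketched as follows. This is an approximation written for these notes, not necessarily the exact code from the course; the name data_iter and the toy tensors are assumptions.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    """Yield (features, labels) minibatches in random order."""
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle the sample order in place
    for i in range(0, num_examples, batch_size):
        # min() keeps the end of the last slice inside the dataset
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        # Indexing with a tensor selects whole rows along axis 0
        yield features[batch_indices], labels[batch_indices]

# Toy data: 10 samples with 2 features each; row i is [2i, 2i + 1] and its label is i
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)

batches = list(data_iter(4, features, labels))
for X, y in batches:
    print(X.shape, y.shape)
```

With 10 samples and a batch size of 4, the last batch holds only the 2 remaining samples.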

**Throughout linear regression, what we ultimately solve for are the model's two parameters, w and b**

# torch.no_grad
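torch.no_grad appears in the code that trains and evaluates the model.
According to the official documentation, the no_grad class disables gradient calculation: inside its scope, the result of every computation has requires_grad=False, even when the inputs have requires_grad=True.
However, the documentation only says that this reduces the cost of computation; I could not find an explanation of why.

ChatGPT's answer to that question:
torch.no_grad() is a context manager that disables gradient tracking and automatic differentiation inside a code block, which speeds up execution and reduces memory consumption. When code does not need gradients, running it under torch.no_grad() avoids unnecessary computation and memory use.
Specifically, torch.no_grad() does the following:
1. Disables gradient tracking: inside the torch.no_grad() context, no operation is recorded for automatic differentiation, so no computational graph is built and no tensor gradients are computed. This is very useful for inference, where gradients are not needed.
2. Reduces memory consumption: computing gradients requires extra memory to store intermediate results; inside the torch.no_grad() context these intermediate results are not saved, so less memory is used.

**Gradients are generally not needed during inference, parameter updates, or evaluation on a validation set, so no_grad can be used there to save memory and improve efficiency**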
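
A minimal sketch of both uses mentioned above: evaluating without tracking gradients, and updating a parameter in place. The tensors and the step size 0.1 are made up for illustration.

```python
import torch

w = torch.tensor([2.0, -3.0], requires_grad=True)
x = torch.tensor([1.0, 1.0])

y = (w * x).sum()
print(y.requires_grad)  # True: the operation was recorded for autograd

with torch.no_grad():
    z = (w * x).sum()
print(z.requires_grad)  # False: nothing was recorded inside the context

# Parameter update: the in-place step itself must not be tracked
y.backward()            # w.grad is now equal to x
with torch.no_grad():
    w -= 0.1 * w.grad
w.grad.zero_()
print(w)                # still a leaf tensor with requires_grad=True
```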

In [31]:
help(torch.no_grad)

Help on class no_grad in module torch.autograd.grad_mode:

class no_grad(torch.utils._contextlib._NoParamDecoratorContextManager)
 |  no_grad(orig_func=None)
 |  
 |  Context-manager that disables gradient calculation.
 |  
 |  Disabling gradient calculation is useful for inference, when you are sure
 |  that you will not call :meth:`Tensor.backward()`. It will reduce memory
 |  consumption for computations that would otherwise have `requires_grad=True`.
 |  
 |  In this mode, the result of every computation will have
 |  `requires_grad=False`, even when the inputs have `requires_grad=True`.
 |  There is an exception! All factory functions, or functions that create
 |  a new Tensor and take a requires_grad kwarg, will NOT be affected by
 |  this mode.
 |  
 |  This context manager is thread local; it will not affect computation
 |  in other threads.
 |  
 |  Also functions as a decorator.
 |  
 |  .. note::
 |      No-grad is one of several mechanisms that can enable or
 |      disable

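# Steps for implementing a linear neural network with the PyTorch framework

1. Create the dataset: define a function that generates the data with the normal method
2. Read the dataset: define a function that yields minibatches of samples in a loop; one full pass of the loop visits every sample
3. Define the linear model
4. Define the loss function
5. Define the optimization method: gradient descent
6. Initialize the parameters: the weight w and the bias b
7. Set the hyperparameters: learning rate, batch_size, and number of epochs
8. Specify the network and the loss function
9. Train: loop over all minibatches of X and y, using gradient descent to update the parameters w and b
10. Evaluate on the full sample set and print the loss
11. Print the difference between the true w and b and the estimated w and b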
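
The eleven steps can be sketched end to end as below. This is one possible from-scratch version under stated assumptions: the function names, the true parameters (w = [2, -3.4], b = 4.2), the noise level, and the hyperparameters are all illustrative choices, not values given in these notes.

```python
import random
import torch

torch.manual_seed(0)  # arbitrary seeds, only for repeatability
random.seed(0)

# 1. Create the dataset: y = Xw + b + small Gaussian noise
def synthetic_data(w, b, num_examples):
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape(-1, 1)

true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# 2. Read the dataset in shuffled minibatches
def data_iter(batch_size, features, labels):
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(indices), batch_size):
        batch = torch.tensor(indices[i:i + batch_size])
        yield features[batch], labels[batch]

# 3. Linear model
def linreg(X, w, b):
    return torch.matmul(X, w) + b

# 4. Squared loss, one value per sample
def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

# 5. Minibatch gradient descent
def sgd(params, lr, batch_size):
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad / batch_size
            p.grad.zero_()

# 6. Initialize the parameters
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# 7. Hyperparameters, 8. network and loss
lr, batch_size, num_epochs = 0.03, 10, 3
net, loss = linreg, squared_loss

# 9. Train, 10. evaluate on the full sample set after each epoch
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)
        l.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f"epoch {epoch + 1}, loss {float(train_l.mean()):f}")

# 11. Compare the estimates with the true parameters
print("error in w:", true_w - w.detach().reshape(true_w.shape))
print("error in b:", true_b - b.detach())
```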

# data.TensorDataset
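**A class for wrapping data**
Its main job is to combine several tensors into one dataset and provide access by index
Any number of tensors can be passed, but they must all have the same size in the first dimension, which is the number of samples; indexing the dataset returns one sample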

In [32]:
from torch.utils import data

help(data.TensorDataset)

Help on class TensorDataset in module torch.utils.data.dataset:

class TensorDataset(Dataset)
 |  TensorDataset(*args, **kwds)
 |  
 |  Dataset wrapping tensors.
 |  
 |  Each sample will be retrieved by indexing tensors along the first dimension.
 |  
 |  Args:
 |      *tensors (Tensor): tensors that have the same size of the first dimension.
 |  
 |  Method resolution order:
 |      TensorDataset
 |      Dataset
 |      typing.Generic
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __getitem__(self, index)
 |  
 |  __init__(self, *tensors: torch.Tensor) -> None
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __len__(self)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __annotations__ = {'tensors': typing.Tuple[torch.Tensor, ...]}
 |  
 |  __orig_bases__ = (torch.utils.data.dataset.Dataset[typing.Tuple[torch....
 |  
 |  __parameters__ = ()
 |  
 |  ---

In [49]:
from torch.utils import data
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]
                  ])
print(a)
dataset = data.TensorDataset(a)
print(dataset[0])

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
(tensor([1, 2, 3]),)
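
The cell above wraps a single tensor. A short sketch with two tensors, features and labels (both made up here), shows that indexing returns one matching entry from each.

```python
import torch
from torch.utils import data

features = torch.arange(6, dtype=torch.float32).reshape(3, 2)  # 3 samples, 2 features
labels = torch.tensor([10.0, 20.0, 30.0])                      # 3 labels

dataset = data.TensorDataset(features, labels)
print(len(dataset))  # 3: the size of the first dimension

x1, y1 = dataset[1]  # a tuple with one entry per wrapped tensor
print(x1, y1)        # tensor([2., 3.]) tensor(20.)
```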


# data.DataLoader
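A class for loading and batching data. Its main job is to split a dataset into minibatches and provide an iterable that yields them one batch at a time.
Three parameters control the batching:
- dataset: the data to load
- batch_size: the number of samples to load in each batch
- shuffle: whether to shuffle the order when loading the data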

In [33]:
help(data.DataLoader)

Help on class DataLoader in module torch.utils.data.dataloader:

class DataLoader(typing.Generic)
 |  DataLoader(*args, **kwds)
 |  
 |  Data loader. Combines a dataset and a sampler, and provides an iterable over
 |  the given dataset.
 |  
 |  The :class:`~torch.utils.data.DataLoader` supports both map-style and
 |  iterable-style datasets with single- or multi-process loading, customizing
 |  loading order and optional automatic batching (collation) and memory pinning.
 |  
 |  See :py:mod:`torch.utils.data` documentation page for more details.
 |  
 |  Args:
 |      dataset (Dataset): dataset from which to load the data.
 |      batch_size (int, optional): how many samples per batch to load
 |          (default: ``1``).
 |      shuffle (bool, optional): set to ``True`` to have the data reshuffled
 |          at every epoch (default: ``False``).
 |      sampler (Sampler or Iterable, optional): defines the strategy to draw
 |          samples from the dataset. Can be any ``Iterable``

In [45]:
import torch
from torch.utils.data import DataLoader, Dataset

# Custom dataset class
class MyDataset(Dataset):
    def __init__(self, data):
        self.data = data
    
    def __getitem__(self, index):
        return self.data[index]
    
    def __len__(self):
        return len(self.data)

# Create a dataset instance
dataset = MyDataset([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Create the data loader
dataloader = DataLoader(dataset, batch_size=3, shuffle=True)

# Iterate over the batches
for batch in dataloader:
    print(batch)

tensor([7, 8, 5])
tensor([ 3,  2, 10])
tensor([4, 1, 6])
tensor([9])


In [50]:
help(iter)

Help on built-in function iter in module builtins:

iter(...)
    iter(iterable) -> iterator
    iter(callable, sentinel) -> iterator
    
    Get an iterator from an object.  In the first form, the argument must
    supply its own iterator, or be a sequence.
    In the second form, the callable is called until it returns the sentinel.
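
One likely reason iter matters here: a DataLoader is an iterable, so next(iter(...)) fetches its first batch without writing a for loop. The sketch below combines TensorDataset and DataLoader; the helper name load_array and the toy tensors are assumptions made for illustration.

```python
import torch
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Wrap tensors in a TensorDataset and return a DataLoader over it."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)

# is_train=False turns shuffling off, so the first batch is rows 0..3
data_iter = load_array((features, labels), batch_size=4, is_train=False)

X, y = next(iter(data_iter))  # take only the first batch
print(X.shape, y.shape)       # torch.Size([4, 2]) torch.Size([4, 1])
print(len(data_iter))         # 3 batches: 4 + 4 + 2 samples
```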


# The MSELoss class defined in the nn library
It computes the mean squared error and can be used directly as the loss function

In [52]:
loss = nn.MSELoss()
l = loss(net(X), y)

NameError: name 'net' is not defined
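
The NameError above arises because net, X, and y had not been defined in that session. A self-contained sketch, with a hypothetical one-layer net and made-up data, could look like this:

```python
import torch
from torch import nn

net = nn.Linear(2, 1)                       # stand-in model for illustration
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2 samples, 2 features
y = torch.tensor([[1.0], [2.0]])            # 2 targets

loss = nn.MSELoss()                         # default reduction='mean'
l = loss(net(X), y)
print(l.shape)                              # torch.Size([]): a single scalar

# The same value computed by hand: mean of the squared differences
manual = ((net(X) - y) ** 2).mean()
print(torch.allclose(l, manual))            # True
```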

In [51]:
from torch import nn
help(nn.MSELoss)

Help on class MSELoss in module torch.nn.modules.loss:

class MSELoss(_Loss)
 |  MSELoss(size_average=None, reduce=None, reduction: str = 'mean') -> None
 |  
 |  Creates a criterion that measures the mean squared error (squared L2 norm) between
 |  each element in the input :math:`x` and target :math:`y`.
 |  
 |  The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:
 |  
 |  .. math::
 |      \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
 |      l_n = \left( x_n - y_n \right)^2,
 |  
 |  where :math:`N` is the batch size. If :attr:`reduction` is not ``'none'``
 |  (default ``'mean'``), then:
 |  
 |  .. math::
 |      \ell(x, y) =
 |      \begin{cases}
 |          \operatorname{mean}(L), &  \text{if reduction} = \text{`mean';}\\
 |          \operatorname{sum}(L),  &  \text{if reduction} = \text{`sum'.}
 |      \end{cases}
 |  
 |  :math:`x` and :math:`y` are tensors of arbitrary shapes with a total
 |  of :math:`n` elements each.
 |  
 |  The mean o

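# Using the Sequential class
A model container class that combines several layers in a fixed order. There are two ways to build one:
1. Instantiate Sequential and pass the layer objects as arguments to the constructor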

In [None]:
net = nn.Sequential(
    nn.Linear(2, 1),
    nn.ReLU()
)

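2. Instantiate an empty Sequential container on its own, then add layers to it. Note that PyTorch's Sequential has no add method (hence the AttributeError below); add_module(name, module) does this job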

In [60]:
from torch import nn
net1 = nn.Sequential()
# net1.add(nn.Linear(2, 1))
# net1.add(nn.ReLU())

AttributeError: 'Sequential' object has no attribute 'add'
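
Since Sequential has no add method, one way to fill an empty container is add_module, which every nn.Module provides. A minimal sketch; the layer names "linear" and "relu" are arbitrary.

```python
import torch
from torch import nn

net1 = nn.Sequential()
net1.add_module("linear", nn.Linear(2, 1))  # add_module(name, module)
net1.add_module("relu", nn.ReLU())
print(net1)

out = net1(torch.ones(4, 2))  # layers run in the order they were added
print(out.shape)              # torch.Size([4, 1])
```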

In [55]:
help(nn.Sequential)

Help on class Sequential in module torch.nn.modules.container:

class Sequential(torch.nn.modules.module.Module)
 |  Sequential(*args)
 |  
 |  A sequential container.
 |  Modules will be added to it in the order they are passed in the
 |  constructor. Alternatively, an ``OrderedDict`` of modules can be
 |  passed in. The ``forward()`` method of ``Sequential`` accepts any
 |  input and forwards it to the first module it contains. It then
 |  "chains" outputs to inputs sequentially for each subsequent module,
 |  finally returning the output of the last module.
 |  
 |  The value a ``Sequential`` provides over manually calling a sequence
 |  of modules is that it allows treating the whole container as a
 |  single module, such that performing a transformation on the
 |  ``Sequential`` applies to each of the modules it stores (which are
 |  each a registered submodule of the ``Sequential``).
 |  
 |  What's the difference between a ``Sequential`` and a
 |  :class:`torch.nn.ModuleList`? A

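# The Linear model in the nn library
Provides a ready-made linear transformation: y = xA^T + b;
It takes two required parameters:
1. in_features: the size of each input sample (the number of input features)
2. out_features: the size of each output sample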
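
A short sketch that checks the formula y = xA^T + b against the layer's own parameters; the input is an arbitrary batch of ones.

```python
import torch
from torch import nn

layer = nn.Linear(in_features=2, out_features=1)
print(layer.weight.shape)  # torch.Size([1, 2]): A is stored as (out_features, in_features)
print(layer.bias.shape)    # torch.Size([1])

x = torch.ones(5, 2)       # batch of 5 samples with 2 features each
y = layer(x)
print(y.shape)             # torch.Size([5, 1])

# Reproduce the layer by hand: y = x A^T + b
manual = torch.matmul(x, layer.weight.T) + layer.bias
print(torch.allclose(y, manual))  # True
```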

In [61]:
help(nn.Linear)

Help on class Linear in module torch.nn.modules.linear:

class Linear(torch.nn.modules.module.Module)
 |  Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None) -> None
 |  
 |  Applies a linear transformation to the incoming data: :math:`y = xA^T + b`
 |  
 |  This module supports :ref:`TensorFloat32<tf32_on_ampere>`.
 |  
 |  On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.
 |  
 |  Args:
 |      in_features: size of each input sample
 |      out_features: size of each output sample
 |      bias: If set to ``False``, the layer will not learn an additive bias.
 |          Default: ``True``
 |  
 |  Shape:
 |      - Input: :math:`(*, H_{in})` where :math:`*` means any number of
 |        dimensions including none and :math:`H_{in} = \text{in\_features}`.
 |      - Output: :math:`(*, H_{out})` where all but the last dimension
 |        are the same shape as the input and :math

In [67]:
help(torch.optim.SGD)

Help on class SGD in module torch.optim.sgd:

class SGD(torch.optim.optimizer.Optimizer)
 |  SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False, *, maximize: bool = False, foreach: Union[bool, NoneType] = None, differentiable: bool = False)
 |  
 |  Implements stochastic gradient descent (optionally with momentum).
 |  
 |  .. math::
 |     \begin{aligned}
 |          &\rule{110mm}{0.4pt}                                                                 \\
 |          &\textbf{input}      : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)}, \: f(\theta)
 |              \text{ (objective)}, \: \lambda \text{ (weight decay)},                          \\
 |          &\hspace{13mm} \:\mu \text{ (momentum)}, \:\tau \text{ (dampening)},
 |          \:\textit{ nesterov,}\:\textit{ maximize}                                     \\[-1.ex]
 |          &\rule{110mm}{0.4pt}                                                                 \\
 |          &\textb
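
A minimal sketch of how torch.optim.SGD typically fits into a training loop, reusing the classes covered above. The synthetic data, true parameters, and hyperparameters are assumptions chosen for illustration.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Synthetic data: y = Xw + b + small Gaussian noise
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
X = torch.normal(0, 1, (1000, 2))
y = (torch.matmul(X, true_w) + true_b + torch.normal(0, 0.01, (1000,))).reshape(-1, 1)

loader = data.DataLoader(data.TensorDataset(X, y), batch_size=10, shuffle=True)

net = nn.Sequential(nn.Linear(2, 1))
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

for epoch in range(3):
    for Xb, yb in loader:
        l = loss(net(Xb), yb)
        trainer.zero_grad()  # clear gradients left over from the previous step
        l.backward()         # compute gradients of the loss
        trainer.step()       # apply the SGD update to the parameters
    with torch.no_grad():
        final_loss = loss(net(X), y).item()
        print(f"epoch {epoch + 1}, loss {final_loss:f}")

print("error in w:", true_w - net[0].weight.data.reshape(true_w.shape))
print("error in b:", true_b - net[0].bias.data)
```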
