# Using Transforms

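The functions in the `transforms.py` toolbox mainly transform images, including data-structure conversion, resizing, Gaussian blur, normalization, pipelining, and cropping.
+ .Compose: a pipeline that chains several image operations together.
+ .ToTensor: converts a PIL or ndarray image to a tensor.
+ .Resize: resizes the input image to a given size.
+ .CenterCrop: crops the input image at its center.
+ .ToPILImage: converts a tensor or ndarray image to a PIL image.
+ .GaussianBlur: applies Gaussian blur to the input image.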
---
### 1. Importing

In [3]:
from torchvision import transforms

Goal: go from ordinary Python usage to the tensor data type.   
`transforms.ToTensor` is used here to answer two questions:   
1. How transforms are used
2. Why the Tensor data type is needed

In [4]:
from PIL import Image

In [6]:
img_path = "../dataset/train/ants/0013035.jpg"
img = Image.open(img_path)
print(img)

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=768x512 at 0x7FEF65E759B0>


### 2. Introduction to transforms.ToTensor

---
Class docstring:
```python
class ToTensor:
    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor. This transform does not support torchscript.

    Converts a PIL Image or numpy.ndarray (H x W x C) in the range
    [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
    if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    or if the numpy.ndarray has dtype = np.uint8

    In the other cases, tensors are returned without scaling.

    .. note::
        Because the input image is scaled to [0.0, 1.0], this transformation should not be used when
        transforming target image masks. See the `references`_ for implementing the transforms for image masks.

    .. _references: https://github.com/pytorch/vision/tree/master/references/segmentation
    """

    def __call__(self, pic):
        """
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.

        Returns:
            Tensor: Converted image.
        """
        return F.to_tensor(pic)

    def __repr__(self):
        return self.__class__.__name__ + '()'
```   

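(1) Converts a PIL Image or numpy.ndarray to a tensor.    
(2) If the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1), or the numpy.ndarray has dtype np.uint8, values in [0, 255] are scaled to [0.0, 1.0], i.e. every value is divided by 255.    
(3) Converts the H x W x C image layout to a C x H x W tensor. CNNs are trained on data shaped [N, C, H, W], so an image processed by ToTensor() already has the per-image layout the network expects. Only the batch dimension N still has to be added (for example with `unsqueeze(0)` or by batching); no further transposing is needed.    
(4) The class implements the magic method `__call__`, so any instance of the class can be called like an ordinary function:   
           tensor_trans = transforms.ToTensor()   
           tensor_img = tensor_trans(img)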
 
#### 2.1 Passing a PIL Image
 ---

In [8]:
tensor_trans = transforms.ToTensor()
tensor_img = tensor_trans(img)
print(tensor_img)

tensor([[[0.3137, 0.3137, 0.3137,  ..., 0.3176, 0.3098, 0.2980],
         [0.3176, 0.3176, 0.3176,  ..., 0.3176, 0.3098, 0.2980],
         [0.3216, 0.3216, 0.3216,  ..., 0.3137, 0.3098, 0.3020],
         ...,
         [0.3412, 0.3412, 0.3373,  ..., 0.1725, 0.3725, 0.3529],
         [0.3412, 0.3412, 0.3373,  ..., 0.3294, 0.3529, 0.3294],
         [0.3412, 0.3412, 0.3373,  ..., 0.3098, 0.3059, 0.3294]],

        [[0.5922, 0.5922, 0.5922,  ..., 0.5961, 0.5882, 0.5765],
         [0.5961, 0.5961, 0.5961,  ..., 0.5961, 0.5882, 0.5765],
         [0.6000, 0.6000, 0.6000,  ..., 0.5922, 0.5882, 0.5804],
         ...,
         [0.6275, 0.6275, 0.6235,  ..., 0.3608, 0.6196, 0.6157],
         [0.6275, 0.6275, 0.6235,  ..., 0.5765, 0.6275, 0.5961],
         [0.6275, 0.6275, 0.6235,  ..., 0.6275, 0.6235, 0.6314]],

        [[0.9137, 0.9137, 0.9137,  ..., 0.9176, 0.9098, 0.8980],
         [0.9176, 0.9176, 0.9176,  ..., 0.9176, 0.9098, 0.8980],
         [0.9216, 0.9216, 0.9216,  ..., 0.9137, 0.9098, 0.

#### 2.2 Passing a NumPy array
---
opencv-python must be installed first:
```bash
pip install opencv-python==4.1.2.30
```

In [10]:
import cv2

In [11]:
cv_img = cv2.imread(img_path)
print(type(cv_img))

<class 'numpy.ndarray'>


In [12]:
from torch.utils.tensorboard import SummaryWriter

In [13]:
writer = SummaryWriter("./logs")
writer.add_image("Tensor Image", tensor_img)
writer.close()

#### 2.3 Why use ToTensor, and why use Tensor as the data type
**The Tensor class carries many attributes and methods needed for neural-network training, such as `_backward_hooks`, `device`, and `_grad`.**
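
As a small illustration of what a tensor carries beyond the raw numbers, the sketch below uses the public `grad`, `device`, and `dtype` attributes. The values are arbitrary and chosen only so the gradient is easy to verify by hand.

```python
import torch

# A tensor that records operations so gradients can be computed later.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # y = x1^2 + x2^2 + x3^2
y.backward()         # fills x.grad with dy/dx = 2x

print(x.grad)        # gradient stored on the tensor itself
print(x.device)      # where the data lives (CPU or a GPU)
print(x.dtype)       # numeric type of the elements
```

A PIL image or a plain ndarray has none of this bookkeeping, which is one reason images are converted to tensors before training.
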

### 3. Using the .ToTensor method

#### 3.1 Testing the built-in `__call__` method
---

In [14]:
class Person:
    
    def __call__(self, name):
        print("__call__" + "Hello" + name)
        
    def hello(self, name):
        print("hello" + name)

In [16]:
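person = Person()          # ==============>  create an instance of the class
person("zhangsan")     # ==============>  calling the instance like a function invokes the built-in __call__ method
person.hello("lisi")       # ==============>  an ordinary method can only be called through the dot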

__call__Hellozhangsan
hellolisi
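
The same mechanism is what makes a transform object usable like a function. Below is a minimal sketch with a hypothetical `Scale` class (not part of torchvision) that stores its configuration in `__init__` and does its work in `__call__`.

```python
class Scale:
    """Hypothetical transform that multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor          # configuration is stored once

    def __call__(self, values):
        # Runs whenever the instance is called like a function.
        return [v * self.factor for v in values]


halve = Scale(0.5)
print(callable(halve))                # instances with __call__ are callable
print(halve([2, 4, 8]))               # same as halve.__call__([2, 4, 8])
```
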


#### 3.2 Full Code
---

In [19]:
from PIL import Image
from torch.utils.tensorboard import SummaryWriter
from torchvision import transforms

img_path = "../dataset/train/ants/0013035.jpg"   # same image as above

writer = SummaryWriter("./logs")                    # open a TensorBoard log writer
img = Image.open(img_path)                            # load the image file as a PIL Image

trans_totensor = transforms.ToTensor()       # create a ToTensor instance
img_tensor = trans_totensor(img)                 # convert the PIL Image to a tensor image

writer.add_image("Full code for ToTensor Test", img_tensor, 5)    # write to TensorBoard with global_step=5
writer.close()

### 4. The .Normalize method

Class docstring:   
```python
class Normalize(torch.nn.Module):
    """Normalize a tensor image with mean and standard deviation.
    This transform does not support PIL Image.
    Given mean: ``(mean[1],...,mean[n])`` and std: ``(std[1],..,std[n])`` for ``n``
    channels, this transform will normalize each channel of the input
    ``torch.*Tensor`` i.e.,
    ``output[channel] = (input[channel] - mean[channel]) / std[channel]``

    .. note::
        This transform acts out of place, i.e., it does not mutate the input tensor.

    Args:
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool,optional): Bool to make this operation in-place.

    """

    def __init__(self, mean, std, inplace=False):
        super().__init__()
        self.mean = mean
        self.std = std
        self.inplace = inplace

    def forward(self, tensor: Tensor) -> Tensor:
        """
        Args:
            tensor (Tensor): Tensor image to be normalized.

        Returns:
            Tensor: Normalized Tensor image.
        """
        return F.normalize(tensor, self.mean, self.std, self.inplace)

    def __repr__(self):
        return self.__class__.__name__ + '(mean={0}, std={1})'.format(self.mean, self.std)
```   
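Standardizing a general (non-standard) normal distribution uses $z = \frac{(x - \mu)}{\sigma}$. Compare this with the formula in the docstring: $\text{output[channel]} = \frac{(\text{input[channel]} - \text{mean[channel]})}{\text{std[channel]}}$. In practice:
**Put simply, the data is processed channel by channel: given a mean and a standard deviation for each channel, every value in that channel has the mean subtracted and is then divided by the standard deviation, which yields the normalized result. `Normalize` does not compute these statistics itself; they are passed in as `mean` and `std`.
In deep-learning image processing, standardized inputs tend to fall in the range where activation functions respond well, which can help training and reduce the risk of exploding or vanishing gradients.**

In PyTorch image preprocessing, transforms.Normalize(mean, std) is commonly used to standardize an image per channel, i.e. subtract the mean and divide by the standard deviation. This can speed up model convergence. **The parameters mean and std are sequences holding the mean and standard deviation of each image channel.**
For the ImageNet dataset the values are mean=(0.485, 0.456, 0.406) and std=(0.229, 0.224, 0.225). Because they were computed over millions of images, they are commonly used for standardization during training. For a specific dataset, however, these values may not be ideal, and it can be better to compute that dataset's own per-channel mean and standard deviation.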

#### Test code:

In [21]:
print(img_tensor[0][0][0])
trans_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])      # first list: per-channel means; second list: per-channel standard deviations
img_norm = trans_norm(img_tensor)
print(img_norm[0][0][0])
writer.add_image("Normalization Test", img_norm)

writer.close()

tensor(0.3137)
tensor(-0.3725)
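
The two printed values can be reproduced by hand. The sketch below assumes the raw 8-bit value of that pixel is 80 (inferred from 0.3137 × 255 ≈ 80, not read from the image) and applies the `ToTensor` scaling followed by the `Normalize` formula with mean 0.5 and std 0.5.

```python
pixel = 80                            # assumed raw 8-bit value of the first red pixel
scaled = pixel / 255                  # ToTensor: [0, 255] -> [0.0, 1.0]
normalized = (scaled - 0.5) / 0.5     # Normalize: (input - mean) / std

print(round(scaled, 4))
print(round(normalized, 4))
```

With mean 0.5 and std 0.5, an input range of [0, 1] is mapped to [-1, 1].
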


### 5. The .Resize method
---
```python
class Resize(torch.nn.Module):
    """Resize the input image to the given size.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions

    Args:
        size (sequence or int): Desired output size. If size is a sequence like
            (h, w), output size will be matched to this. If size is an int,
            smaller edge of the image will be matched to this number.
            i.e, if height > width, then image will be rescaled to
            (size * height / width, size).
            In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
        interpolation (InterpolationMode): Desired interpolation enum defined by
            :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
            If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
            ``InterpolationMode.BICUBIC`` are supported.
            For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.

    """

    def __init__(self, size, interpolation=InterpolationMode.BILINEAR):
        super().__init__()
        if not isinstance(size, (int, Sequence)):
            raise TypeError("Size should be int or sequence. Got {}".format(type(size)))
        if isinstance(size, Sequence) and len(size) not in (1, 2):
            raise ValueError("If size is a sequence, it should have 1 or 2 values")
        self.size = size

        # Backward compatibility with integer value
        if isinstance(interpolation, int):
            warnings.warn(
                "Argument interpolation should be of type InterpolationMode instead of int. "
                "Please, use InterpolationMode enum."
            )
            interpolation = _interpolation_modes_from_int(interpolation)

        self.interpolation = interpolation

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be scaled.

        Returns:
            PIL Image or Tensor: Rescaled image.
        """
        return F.resize(img, self.size, self.interpolation)

    def __repr__(self):
        interpolate_str = self.interpolation.value
        return self.__class__.__name__ + '(size={0}, interpolation={1})'.format(self.size, interpolate_str)
 ```


In [27]:
# Resize - 1 - two values ===================================>   the image is resized directly to the given (h, w)
print(img.size) 
trans_resize = transforms.Resize((512, 512))
# img: PIL  =>  resize =>  img_resize : PIL 
img_resize = trans_resize(img)
# img_resize : PIL  =>  totensor  =>   img_resize  tensor
img_resize = trans_totensor(img_resize)
print(img_resize)

writer.add_image("Image Resize Test", img_resize, 3)

(768, 512)
tensor([[[0.3137, 0.3137, 0.3176,  ..., 0.3137, 0.3137, 0.3020],
         [0.3176, 0.3176, 0.3176,  ..., 0.3098, 0.3137, 0.3020],
         [0.3216, 0.3216, 0.3176,  ..., 0.3059, 0.3137, 0.3059],
         ...,
         [0.3412, 0.3373, 0.3373,  ..., 0.0196, 0.2196, 0.3608],
         [0.3412, 0.3373, 0.3373,  ..., 0.3490, 0.3373, 0.3373],
         [0.3412, 0.3373, 0.3373,  ..., 0.3529, 0.3137, 0.3216]],

        [[0.5922, 0.5922, 0.5961,  ..., 0.5922, 0.5922, 0.5804],
         [0.5961, 0.5961, 0.5961,  ..., 0.5882, 0.5922, 0.5804],
         [0.6000, 0.6000, 0.5961,  ..., 0.5843, 0.5922, 0.5843],
         ...,
         [0.6275, 0.6235, 0.6235,  ..., 0.1020, 0.4157, 0.6157],
         [0.6275, 0.6235, 0.6235,  ..., 0.5373, 0.5882, 0.6078],
         [0.6275, 0.6235, 0.6235,  ..., 0.6392, 0.6275, 0.6275]],

        [[0.9137, 0.9137, 0.9176,  ..., 0.9137, 0.9137, 0.9020],
         [0.9176, 0.9176, 0.9176,  ..., 0.9098, 0.9137, 0.9020],
         [0.9216, 0.9216, 0.9176,  ..., 0.9059,

### 6. The .Compose method
---
```python
class Compose:
    """Composes several transforms together. This transform does not support torchscript.
    Please, see the note below.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])

    .. note::
        In order to script the transformations, please use ``torch.nn.Sequential`` as below.

        >>> transforms = torch.nn.Sequential(
        >>>     transforms.CenterCrop(10),
        >>>     transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
        >>> )
        >>> scripted_transforms = torch.jit.script(transforms)

        Make sure to use only scriptable transformations, i.e. that work with ``torch.Tensor``, does not require
        `lambda` functions or ``PIL.Image``.

    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        for t in self.transforms:
            format_string += '\n'
            format_string += '    {0}'.format(t)
        format_string += '\n)'
        return format_string
```

In [30]:
# Compose bundles several transforms together and applies them in order to the input img
# The output of each transform is the input of the next
trans_resize_2 = transforms.Resize(512)      # with a single int, Resize keeps the aspect ratio and scales the shorter edge to 512
trans_compose = transforms.Compose([trans_resize_2,  trans_totensor])      #  img  ===>  .Resize(512)  ===>  .ToTensor  ===>  img_resize_2
img_resize_2 = trans_compose(img)
writer.add_image("Compose Test", img_resize_2, 4)

### 7. The .RandomCrop method
---
```python
class RandomCrop(torch.nn.Module):
    """Crop the given image at a random location.
    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions,
    but if non-constant padding is used, the input is expected to have at most 2 leading dimensions

    Args:
        size (sequence or int): Desired output size of the crop. If size is an
            int instead of sequence like (h, w), a square crop (size, size) is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
        padding (int or sequence, optional): Optional padding on each border
            of the image. Default is None. If a single int is provided this
            is used to pad all borders. If sequence of length 2 is provided this is the padding
            on left/right and top/bottom respectively. If a sequence of length 4 is provided
            this is the padding for the left, top, right and bottom borders respectively.
            In torchscript mode padding as single int is not supported, use a sequence of length 1: ``[padding, ]``.
        pad_if_needed (boolean): It will pad the image if smaller than the
            desired size to avoid raising an exception. Since cropping is done
            after padding, the padding seems to be done at a random offset.
        fill (number or str or tuple): Pixel fill value for constant fill. Default is 0. If a tuple of
            length 3, it is used to fill R, G, B channels respectively.
            This value is only used when the padding_mode is constant.
            Only number is supported for torch Tensor.
            Only int or str or tuple value is supported for PIL Image.
        padding_mode (str): Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

             - constant: pads with a constant value, this value is specified with fill

             - edge: pads with the last value on the edge of the image

             - reflect: pads with reflection of image (without repeating the last value on the edge)

                padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode
                will result in [3, 2, 1, 2, 3, 4, 3, 2]

             - symmetric: pads with reflection of image (repeating the last value on the edge)

                padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode
                will result in [2, 1, 1, 2, 3, 4, 4, 3]

    """

    @staticmethod
    def get_params(img: Tensor, output_size: Tuple[int, int]) -> Tuple[int, int, int, int]:
        """Get parameters for ``crop`` for a random crop.

        Args:
            img (PIL Image or Tensor): Image to be cropped.
            output_size (tuple): Expected output size of the crop.

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for random crop.
        """
        w, h = F._get_image_size(img)
        th, tw = output_size

        if h + 1 < th or w + 1 < tw:
            raise ValueError(
                "Required crop size {} is larger then input image size {}".format((th, tw), (h, w))
            )

        if w == tw and h == th:
            return 0, 0, h, w

        i = torch.randint(0, h - th + 1, size=(1, )).item()
        j = torch.randint(0, w - tw + 1, size=(1, )).item()
        return i, j, th, tw

    def __init__(self, size, padding=None, pad_if_needed=False, fill=0, padding_mode="constant"):
        super().__init__()

        self.size = tuple(_setup_size(
            size, error_msg="Please provide only two dimensions (h, w) for size."
        ))

        self.padding = padding
        self.pad_if_needed = pad_if_needed
        self.fill = fill
        self.padding_mode = padding_mode

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped.

        Returns:
            PIL Image or Tensor: Cropped image.
        """
        if self.padding is not None:
            img = F.pad(img, self.padding, self.fill, self.padding_mode)

        width, height = F._get_image_size(img)
        # pad the width if needed
        if self.pad_if_needed and width < self.size[1]:
            padding = [self.size[1] - width, 0]
            img = F.pad(img, padding, self.fill, self.padding_mode)
        # pad the height if needed
        if self.pad_if_needed and height < self.size[0]:
            padding = [0, self.size[0] - height]
            img = F.pad(img, padding, self.fill, self.padding_mode)

        i, j, h, w = self.get_params(img, self.size)

        return F.crop(img, i, j, h, w)

    def __repr__(self):
        return self.__class__.__name__ + "(size={0}, padding={1})".format(self.size, self.padding)
    
```


In [35]:
# RandomCrop
trans_randomcrop = transforms.RandomCrop((200, 400))
trans_compose2 = transforms.Compose([trans_randomcrop, trans_totensor])
for step in range(10):
    img_crop = trans_compose2(img)
    writer.add_image("RandomCrop Test", img_crop, step)

writer.close()      # flush the remaining events to disk

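*Tip: as the name suggests, TensorBoard works with tensor data. `add_image` expects the image as a tensor (a NumPy array is also accepted), so a PIL image has to be converted before it can be displayed.*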