## 1. Preprocessing Transforms

These transforms convert your raw images into tensors and prepare them for model input. They generally include:

- **Resize:**  
  Scales the image to a target size. This is useful for ensuring that all images in a batch have consistent dimensions.  
  ```python
  transforms.Resize((256, 256))
  ```  
  Alternatively, you might use:
  ```python
  transforms.Resize(256)  # A single int resizes the shorter edge to 256 and keeps the aspect ratio.
  ```

- **Center Crop / Random Crop:**  
  When working with images that have been resized, cropping helps to focus on the central region (for validation/testing) or to introduce variability (for training).  
  - **CenterCrop:** Used during evaluation to ensure a consistent crop.  
    ```python
    transforms.CenterCrop(224)
    ```
  - **RandomCrop:** Provides a random crop and is part of data augmentation during training.
    ```python
    transforms.RandomCrop(224)
    ```

- **ToTensor:**  
  Converts a PIL image or `uint8` NumPy array of shape H x W x C into a float PyTorch tensor of shape C x H x W and scales pixel values from \[0, 255\] to \[0, 1\].  
  ```python
  transforms.ToTensor()
  ```

- **Normalization:**  
  Adjusts pixel values per channel by subtracting the dataset’s mean and dividing by its standard deviation. This is key for training stability, especially when working with pretrained models. For example, models pretrained on ImageNet typically expect these statistics:
  ```python
  transforms.Normalize(mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])
  ```
  Note that the mean and std should match the values used when the model was pretrained, or be recomputed from your own dataset when you train from scratch.
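
To make the difference between the two `Resize` forms in the list above concrete, here is a minimal pure-Python sketch of how the output size is determined. The helper `resized_size` is hypothetical (it is not part of torchvision); it assumes the usual rule that a single integer is matched to the shorter edge and the longer edge is scaled by the same factor and truncated to an integer.

```python
def resized_size(height, width, size):
    """Return the (height, width) a Resize-style transform would produce.

    `size` is either an int (shorter edge is matched, aspect ratio kept)
    or a (height, width) tuple (both edges forced, aspect ratio may change).
    Hypothetical helper for illustration only.
    """
    if isinstance(size, int):
        short, long = (height, width) if height <= width else (width, height)
        new_short = size
        new_long = int(size * long / short)
        return (new_short, new_long) if height <= width else (new_long, new_short)
    return tuple(size)


# A 480 x 640 (height x width) landscape image.
print(resized_size(480, 640, 256))         # (256, 341)
print(resized_size(480, 640, (256, 256)))  # (256, 256)
# The same image in portrait orientation.
print(resized_size(640, 480, 256))         # (341, 256)
```

With an integer size, a following `CenterCrop(224)` sees the scene at its original proportions, whereas a tuple can stretch non-square images.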

---

## 2. Data Augmentation Transforms

Data augmentation increases the diversity of your training data and can help prevent overfitting. Common augmentation strategies include:

- **Random Horizontal Flip:**  
  Flips the image horizontally with a given probability (by default 0.5).  
  ```python
  transforms.RandomHorizontalFlip(p=0.5)
  ```

- **Random Vertical Flip:**  
  Flips the image vertically. Use this judiciously – it can be useful for datasets where vertical orientation is not semantically important (e.g., some aerial or medical images).  
  ```python
  transforms.RandomVerticalFlip(p=0.5)
  ```

- **Random Rotation:**  
  Rotates the image within a specified degree range.  
  ```python
  transforms.RandomRotation(degrees=15)
  ```
  A small degree range is often sufficient unless you know your objects can appear at wide angles.

- **Random Resized Crop:**  
  This transform combines random cropping with resizing. It randomly crops a portion of the image and then scales it to a target size. This is very popular for training on natural images.  
  ```python
  transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(0.75, 1.33))
  ```
  The `scale` parameter sets the range of the crop's area as a fraction of the original image area, and `ratio` sets the range of aspect ratios (width / height) sampled for the crop before it is resized.

- **Color Jitter:**  
  Adjusts brightness, contrast, saturation, and hue. This augmentation is helpful when lighting conditions vary.  
  ```python
  transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
  ```

- **Random Affine / Perspective:**  
  - **RandomAffine:** Applies random translations, rotations, scaling, and shearing.  
    ```python
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=5)
    ```
  - **RandomPerspective:** Simulates perspective distortions.  
    ```python
    transforms.RandomPerspective(distortion_scale=0.5, p=0.5)
    ```

- **Additional Transforms:**  
  - **GaussianBlur:** Blurs the image with a Gaussian kernel. Useful when real inputs may be out of focus or of low quality.  
    ```python
    transforms.GaussianBlur(kernel_size=3)
    ```
  - **Random Erasing:** For occlusion augmentation, which randomly erases a portion of the image. This can help the network become robust to missing parts of an image.  
    ```python
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3))
    ```

---

## 3. Building the Transform Pipeline

It is common practice to define separate pipelines for training and evaluation (validation/testing). Below are examples:

### Training Transform Pipeline
When building the training pipeline, the focus is on introducing variability:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                # Randomly crop and resize
    transforms.RandomHorizontalFlip(),                # Random flip horizontally
    transforms.RandomRotation(degrees=15),            # Random rotation
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),  # Vary colors
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5)                   # Randomly erase a region (works on tensors, so it comes after ToTensor)
])
```

### Evaluation Transform Pipeline
For evaluation, you want consistency and reproducibility:

```python
from torchvision import transforms

val_transform = transforms.Compose([
    transforms.Resize(256),                           # Resize images to a consistent scale
    transforms.CenterCrop(224),                       # Crop the center of the image
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
```

---

## 4. Choosing and Tuning Transforms

- **Dataset Characteristics:**  
  Not every augmentation is beneficial for every dataset. For example, if your dataset has objects with strict orientation (like digits or text), heavy rotations might hurt performance. Analyze your dataset to decide which augmentations are appropriate.

- **Model Architecture:**  
  Some models are sensitive to the scale or other characteristics of the input. Make sure your normalization values and the spatial dimensions match the pretrained model’s expected input.

- **Experimentation:**  
  It is common to start with a subset of these transforms and tune their parameters (e.g., rotation degrees, crop sizes, probabilities), keeping the combination that gives the best validation performance.

- **Data Augmentation Libraries:**  
  While torchvision.transforms covers many use cases, for complex scenarios you might explore libraries like [albumentations](https://albumentations.ai/) which offer a more extensive suite of augmentation methods and more flexibility.

---

## Conclusion

For deep learning image classification with PyTorch, a combination of the following is usually recommended:

- **Normalization:** Essential for proper convergence.  
- **Size Standardization (Resize/CenterCrop/RandomResizedCrop):** To ensure input size consistency.  
- **Random Augmentation Techniques:** Such as horizontal (or vertical) flip, random crop, rotation, color jitter, and advanced techniques like Random Erasing, to improve model robustness.

A well-crafted transform pipeline can significantly improve your model’s ability to generalize while keeping training efficient. Experiment with these suggestions, and tailor them to your specific problem domain to achieve the best results.

# Standardization Vs Normalization

## Normalization (min-max Normalization or Feature Scaling)
Normalization rescales the values into a range of [0,1]. This might be useful in some cases where all parameters need to have the same positive scale.

$X_{norm}=\frac{X-X_{min}}{X_{max}-X_{min}}$


Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbors and Neural Networks.
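
The min-max formula above can be sketched in a few lines of plain Python. The function name `min_max_normalize` and the sample values are made up for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - lo) / (hi - lo) for v in values]


data = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(data))  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, which is why a single extreme value compresses everything else toward one end of the range.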

## Standardization (Z-Score Normalization)
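Standardization rescales the data to have mean $\mu=0$ and variance $\sigma^2=1$. It shifts and scales the values but does not change the shape of the distribution, so the result is Gaussian only if the input was.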

$X_{standard}=\frac{X-\mu}{\sigma}$

Standardization, on the other hand, can be helpful when the data follows a Gaussian distribution, although this is not a requirement. Unlike normalization, standardization does **not** map values into a bounded range, so outliers are not squeezed into a fixed interval and remain identifiable as extreme values. Note, however, that outliers still influence the computed mean and standard deviation.
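
A small sketch of the z-score formula, compared with min-max scaling on data that contains one outlier. The helper names and the sample values are made up for illustration; the population standard deviation is used here as an assumption.

```python
import statistics


def standardize(values):
    """Return z-scores using the population mean and standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale values to [0, 1] for comparison."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier

z_scores = standardize(data)
scaled = min_max_normalize(data)

print([round(z, 3) for z in z_scores])  # mean 0, std 1, no fixed bounds
print([round(s, 3) for s in scaled])    # always between 0 and 1
```

In both outputs the four small values end up close together, because the outlier inflates the range as well as the mean and standard deviation. The difference is that the z-scores are not confined to a fixed interval.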


## Effects

In theory, unregularized linear regression is insensitive to standardization, since any linear transformation of the input data can be counteracted by adjusting the model parameters.

Although standardization plays little role in regression in theory, it is still used in practice for the following reasons:

1) Standardization improves the numerical stability of your model.

2) Standardization may speed up the training process. If different features have drastically different ranges, the usable learning rate is limited by the feature with the largest range, so the remaining features converge slowly. Putting all features on a comparable scale removes this bottleneck.
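
The learning-rate argument can be illustrated numerically. The sketch below is an assumption-laden toy example, not a benchmark: it builds two made-up features on very different scales and inspects the Hessian of the squared-error loss of a linear model. For gradient descent on such a quadratic loss, step sizes above `2 / largest_eigenvalue` diverge, and a large condition number means slow convergence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two hypothetical features on very different scales.
X = np.column_stack([
    rng.uniform(0.0, 1.0, size=n),
    rng.uniform(0.0, 1000.0, size=n),
])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)


def hessian_eigenvalues(features):
    """Eigenvalues of the Hessian of 0.5 * mean squared error for a linear model."""
    hessian = features.T @ features / len(features)
    return np.linalg.eigvalsh(hessian)


eig_raw = hessian_eigenvalues(X)
eig_std = hessian_eigenvalues(X_std)

cond_raw = eig_raw.max() / eig_raw.min()
cond_std = eig_std.max() / eig_std.min()

print(f"raw:          max stable lr = {2 / eig_raw.max():.2e}, condition number = {cond_raw:.2e}")
print(f"standardized: max stable lr = {2 / eig_std.max():.2e}, condition number = {cond_std:.2e}")
```

On the raw features the large-range feature forces a tiny step size, while after standardization a much larger step is stable and the condition number is close to 1.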


PyTorch lets us apply the standardization described above by passing the per-channel mean and standard deviation to the `Normalize()` transform:

```python
torchvision.transforms.Normalize(
    [meanOfChannel1, meanOfChannel2, meanOfChannel3],
    [stdOfChannel1, stdOfChannel2, stdOfChannel3]
)
```

Refs [1](https://towardsdatascience.com/understand-data-normalization-in-machine-learning-8ff3062101f0), [2](https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/), [3](https://en.wikipedia.org/wiki/Correlation_and_dependence), [4](https://deeplizard.com/learn/video/lu7TCu7HeYc)

## PyTorch Normalization
In PyTorch, normalization means transforming the data so that afterwards each channel has $\mu=0, \sigma^2=1$.
The raw pixel array stored in a torchvision dataset (its `.data` attribute) is in the range [0, 255], because the transform is applied only when a sample is accessed:

```python
import torch
import torchvision

train_transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

CIFAR10_train_dataset = torchvision.datasets.CIFAR10(
    root='../data', download=True, transform=train_transform, train=True)

# .data is the raw uint8 array (N x H x W x C); the transform is not applied to it.
min_value = CIFAR10_train_dataset.data.min()
max_value = CIFAR10_train_dataset.data.max()

print("CIFAR10_train_dataset.data.min(): ", min_value)
print("CIFAR10_train_dataset.data.max(): ", max_value)

# Per-channel statistics over all images, rows, and columns.
r_mean, g_mean, b_mean = CIFAR10_train_dataset.data.mean(axis=(0, 1, 2))
r_std, g_std, b_std = CIFAR10_train_dataset.data.std(axis=(0, 1, 2))

print("mean of r, g, b channel:", r_mean, g_mean, b_mean)
print("standard deviation  of r, g, b channel:", r_std, g_std, b_std)
```

Output:

```
Files already downloaded and verified
CIFAR10_train_dataset.data.min():  0
CIFAR10_train_dataset.data.max():  255
mean of r, g, b channel: 125.306918046875 122.950394140625 113.86538318359375
standard deviation  of r, g, b channel: 62.99321927813685 62.088707640014405 66.70489964063101
```


Samples returned by the dataset, and therefore by a DataLoader, pass through the transform. With only `ToTensor()` applied, they are in the range [0, 1]:

```python
trainloader = torch.utils.data.DataLoader(CIFAR10_train_dataset, batch_size=4,
                                          shuffle=True, num_workers=2)

dataiter = iter(trainloader)
images, labels = next(dataiter)  # iterators no longer provide a .next() method
print("images.min(): ", images.min())
print("images.max(): ", images.max())
```

Output:

```
images.min():  tensor(0.)
images.max():  tensor(1.)
```


Since the network input should have $\mu=0, \sigma^2=1$ per channel, compute the mean and std from the raw dataset in advance, divide them by the max value (255, because `ToTensor()` scales pixels to [0, 1]), and pass the results to `Normalize()`:

```python
# Scale the statistics to match the [0, 1] range produced by ToTensor().
r_mean, g_mean, b_mean = [r_mean/max_value, g_mean/max_value, b_mean/max_value]
r_std, g_std, b_std = [r_std/max_value, g_std/max_value, b_std/max_value]

train_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        (r_mean, g_mean, b_mean),
        (r_std, g_std, b_std)),  # std order must match the mean order: R, G, B
])

CIFAR10_train_dataset = torchvision.datasets.CIFAR10(
    root='../data', download=True, transform=train_transform, train=True)

trainloader = torch.utils.data.DataLoader(CIFAR10_train_dataset, batch_size=4,
                                          shuffle=True, num_workers=2)

dataiter = iter(trainloader)
images, labels = next(dataiter)
print('\nNow the data are standardized per channel\n')

print("images.min(): ", images.min())
print("images.max(): ", images.max())
print("shape of batch: batch_size x channel x row x column: ", images.shape)
print("shape of training dataset: ", CIFAR10_train_dataset.data.shape)
print("size of images: row x column x channel: ", CIFAR10_train_dataset.data[0].shape)
```

Output (the exact min and max depend on the sampled batch):

```
Files already downloaded and verified

Now the data are standardized per channel

images.min():  tensor(-1.9892)
images.max():  tensor(2.0430)
shape of batch: batch_size x channel x row x column:  torch.Size([4, 3, 32, 32])
shape of training dataset:  (50000, 32, 32, 3)
size of images: row x column x channel:  (32, 32, 3)
```
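
The scaling step above can be double-checked with plain arithmetic, without downloading the dataset. This sketch reuses the raw CIFAR-10 statistics printed earlier in this section; the variable names are chosen for illustration.

```python
# Raw per-channel statistics printed above (pixel range [0, 255]).
raw_mean = [125.306918046875, 122.950394140625, 113.86538318359375]
raw_std = [62.99321927813685, 62.088707640014405, 66.70489964063101]
max_value = 255

# Dividing every pixel by 255 divides both the mean and the std by 255.
scaled_mean = [m / max_value for m in raw_mean]
scaled_std = [s / max_value for s in raw_std]

print([round(m, 4) for m in scaled_mean])  # [0.4914, 0.4822, 0.4465]
print([round(s, 4) for s in scaled_std])   # [0.247, 0.2435, 0.2616]

# A red-channel pixel of 0 (0.0 after ToTensor) maps to the lowest red value.
red_min = (0.0 - scaled_mean[0]) / scaled_std[0]
print(round(red_min, 4))  # -1.9892
```

The last value matches the `images.min()` shown in the output above, which is what a batch containing a zero-valued red pixel would produce.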


complete source code: [1](index.py), [2](datasets_normalization_preprocessing), [3](custome_dataset.py)