# 1. Standardization vs. Normalization

### 1.1 Normalization (Min-Max Scaling or Feature Scaling)
Normalization rescales the feature values to a fixed range, usually $[0, 1]$. The formula is:

$
X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$

- **Purpose**: Useful when you want all features to contribute equally to the model, especially when they are on different scales.
- **Assumptions**: Does **not** assume any particular distribution of the data.
- **Sensitive to outliers**: Yes. Because it relies on the minimum and maximum values, a single extreme value can compress all other values into a narrow part of the range.
- **Use cases**: Algorithms that rely on distances or assume bounded input features, such as:
  - K-Nearest Neighbors (KNN)
  - Neural Networks (e.g., when using sigmoid/tanh activations)
  - Principal Component Analysis (PCA), when interpretability is not affected by bounded scale
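
As a minimal sketch of the min-max formula above, the following pure-Python example rescales a small list and then shows the outlier sensitivity. The helper name `min_max_scale` and the sample values are hypothetical and chosen only for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# One extreme value squeezes the remaining points toward 0.
with_outlier = min_max_scale(ages + [1020])
print([round(v, 3) for v in with_outlier])  # [0.0, 0.01, 0.02, 0.03, 0.04, 1.0]
```

With the outlier included, the five original values occupy only the bottom 4% of the range, which is why min-max scaling is usually paired with outlier handling.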

---

### 1.2 Standardization (Z-score Normalization)
Standardization transforms the data to have zero mean and unit variance. The formula is:

$
X_{\text{standard}} = \frac{X - \mu}{\sigma}
$

- **Purpose**: Useful when features have different means and variances and you want to center them around 0.
- **Assumptions**: Works well if the data is approximately normally distributed, but this is **not a strict requirement**.
- **Sensitive to outliers**: Less than min-max normalization, but outliers still affect mean and standard deviation.
- **Use cases**: Algorithms that assume data is centered or use covariance:
  - Linear Regression
  - Logistic Regression
  - Support Vector Machines (SVM)
  - Principal Component Analysis (PCA) (for preserving variance direction)
  - K-Means Clustering
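
The z-score formula above can be sketched in a few lines of standard-library Python. The helper name `standardize` and the sample heights are hypothetical, and the sketch assumes the feature is not constant (otherwise $\sigma = 0$ and the division fails).

```python
import statistics


def standardize(values):
    """Return z-scores using the population mean and standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std; assumes sigma > 0
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
print([round(v, 3) for v in z])

# After standardization the feature has (numerically) zero mean and unit std.
print(round(statistics.fmean(z), 6), round(statistics.pstdev(z), 6))
```

Unlike min-max scaling, the result is not confined to a fixed interval; values simply express how many standard deviations each point lies from the mean.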

---



| Aspect               | Normalization ($[0,1]$)                   | Standardization ($\mu=0$, $\sigma=1$) |
|----------------------|-------------------------------------------|--------------------------------------------|
| Range                | Bounded ($[0,1]$)                         | Unbounded                                 |
| Sensitive to outliers | High                                    | Medium                                     |
| Assumes normality    | No                                       | No (but benefits from it)                 |
| Preserves outliers   | No                                       | Yes (to an extent)                        |
| Use cases            | KNN, Neural Nets                         | SVM, Linear Models, PCA, K-Means          |

---


# 2. Preprocessing Transforms

These transforms convert your raw images into tensors and prepare them for model input. They generally include:

- **Resize:**  
  Scales the image to a target size. This is useful for ensuring that all images in a batch have consistent dimensions.  
  ```python
  transforms.Resize((256, 256))
  ```  
  Alternatively, you might use:
  ```python
  transforms.Resize(256)  # Single int: the shorter side becomes 256 and the aspect ratio is kept.
  ```

- **Center Crop / Random Crop:**  
  When working with images that have been resized, cropping helps to focus on the central region (for validation/testing) or to introduce variability (for training).  
  - **CenterCrop:** Used during evaluation to ensure a consistent crop.  
    ```python
    transforms.CenterCrop(224)
    ```
  - **RandomCrop:** Provides a random crop and is part of data augmentation during training.
    ```python
    transforms.RandomCrop(224)
    ```

- **ToTensor:**  
  Converts a PIL image or NumPy array (H × W × C, `uint8`) into a PyTorch tensor (C × H × W) and scales pixel values from $[0, 255]$ to $[0, 1]$.  
  ```python
  transforms.ToTensor()
  ```

- **Normalization:**  
  Adjusts pixel values by subtracting the dataset’s per-channel mean and dividing by its per-channel standard deviation. This step is key for training stability, especially when working with pretrained models. For example, models pretrained on ImageNet are usually paired with these statistics:
  ```python
  transforms.Normalize(mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])
  ```
  Note that `mean` and `std` should match the statistics used when the pretrained model was trained; when training from scratch, compute them from your own training set.

---



In **PyTorch**, the term **"normalization" often refers to standardization**, i.e., **Z-score normalization**.

### Specifically:
When using `torchvision.transforms.Normalize(mean, std)` in PyTorch, the transformation applied is:

$
X' = \frac{X - \mu}{\sigma}
$

This is exactly **Z-score standardization**, where:
- `mean` = $\mu$ (per channel)
- `std` = $\sigma$ (per channel)

So even though it is **called "Normalize"**, it applies the **standardization** formula to the image pixel values. Two common choices are:
```python
transforms.Normalize(mean=[0.5], std=[0.5])
```
which scales image values from $[0, 1]$ to $[-1, 1]$, or:
```python
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
which is standardization using ImageNet statistics.


When you use `transforms.ToTensor()`, it **does not involve any normalization using mean or standard deviation**. It simply rescales the pixel values from `[0, 255]` to `[0.0, 1.0]` using:

$
X_{\text{norm}} = \frac{X}{255}
$

In other words, `ToTensor()` on its own does not apply:

$
X_{\text{standardized}} = \frac{X - \mu}{\sigma}
$

To apply mean/std normalization (standardization), you would explicitly add `transforms.Normalize(mean, std)` to your transform pipeline, like this:

```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],  # CIFAR-10 mean
                         std=[0.2023, 0.1994, 0.2010])   # CIFAR-10 std
])
```



As a concrete example, consider loading CIFAR-10 with `transforms.ToTensor()` as the only transform, which converts the `uint8` pixel values in [0, 255] to floating-point values in [0.0, 1.0]:

```python
CIFAR10_train_dataset = datasets.CIFAR10(root='../data', train=True,
                                         download=True, transform=transforms.ToTensor())
```

Each image in the dataset will be a `torch.Tensor` of shape `(3, 32, 32)` with values **in the range [0.0, 1.0]**.

If you want to keep the original pixel values in the `[0, 255]` range (e.g., for visualization or custom processing), you should avoid `transforms.ToTensor()` and instead use a custom transform or read the raw image.



### Calculate $\mu$ and $\sigma$:
If you need per-channel $\mu$ and $\sigma$ for your own dataset, the following function estimates them:


```python
from torch.utils.data import DataLoader
from tqdm import tqdm


def calculate_mean_std(dataset, batch_size=64, num_workers=2):
    """Estimate per-channel mean and std by averaging per-image statistics."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    mean = 0.
    std = 0.
    total_images = 0

    for images, _ in tqdm(loader, desc='Computing mean and std'):
        # images has shape (B, C, H, W), e.g. torch.Size([64, 3, 224, 224])
        images = images.view(images.size(0), images.size(1), -1)  # (B, C, H*W)

        # Dimension 0 is the batch, 1 the channel, 2 the flattened pixels.
        # images.mean(2) averages over the pixels, giving one value per image
        # and channel, i.e. a (B, C) tensor such as:
        # [[0.0691, 0.0691, 0.0691],
        #  [0.1690, 0.1690, 0.1690],
        #  ...
        #  [0.1088, 0.1088, 0.1088]]

        # .sum(0) then collapses the batch dimension (the rows), leaving one
        # sum per channel, e.g. tensor([7.3089, 7.3089, 7.3089])
        mean += images.mean(2).sum(0)
        std += images.std(2).sum(0)

        total_images += images.size(0)

    # mean and std hold sums of per-image statistics, so divide by the image count.
    # Note: std is the average of per-image stds, not the std over all pixels.
    mean /= total_images
    std /= total_images
    return mean, std
```
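
The function above averages the standard deviation of each image, which is generally smaller than the standard deviation taken over all pixels of the dataset, because variation between images is ignored. If the dataset-wide value is what you need, one option is to accumulate the sum and the sum of squares per channel. The sketch below is one possible way to do that; `global_mean_std` is a hypothetical name, and random tensors stand in for a real dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def global_mean_std(dataset, batch_size=64):
    """Per-channel mean and population std over every pixel in the dataset."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=0)
    num_channels = dataset[0][0].shape[0]
    pixel_sum = torch.zeros(num_channels, dtype=torch.float64)
    pixel_sq_sum = torch.zeros(num_channels, dtype=torch.float64)
    num_pixels = 0

    for images, _ in loader:
        images = images.double()  # (B, C, H, W); float64 limits rounding error
        pixel_sum += images.sum(dim=(0, 2, 3))
        pixel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        num_pixels += images.size(0) * images.size(2) * images.size(3)

    mean = pixel_sum / num_pixels
    std = (pixel_sq_sum / num_pixels - mean ** 2).sqrt()  # sqrt(E[x^2] - E[x]^2)
    return mean, std


torch.manual_seed(0)
fake_images = torch.rand(100, 3, 8, 8)            # stand-in for a real image dataset
fake_labels = torch.zeros(100, dtype=torch.long)
mean, std = global_mean_std(TensorDataset(fake_images, fake_labels), batch_size=32)

# Reference: flatten everything except the channel axis and use torch directly.
flat = fake_images.double().permute(1, 0, 2, 3).reshape(3, -1)
print(torch.allclose(mean, flat.mean(dim=1)))
print(torch.allclose(std, flat.std(dim=1, unbiased=False)))
```

Both approaches give the same mean when all images have the same size; only the standard deviation differs.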

On **Windows, you must call your `DataLoader` code from inside an `if __name__ == '__main__':` block**, or you will run into issues caused by how multiprocessing starts worker processes.

Unlike Linux, where worker processes are typically **forked**, Windows **spawns** them (as does macOS by default since Python 3.8), which means the entire script is re-imported in each subprocess. If a `DataLoader` with `num_workers > 0` is not guarded properly, this can cause **recursive subprocess creation** or errors like:

```
RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
```

---

### Here's how to fix it:
In the file `data_utils.py`:

```python
from torch.utils.data import DataLoader
from tqdm import tqdm

def calculate_mean_std(dataset, batch_size=64, num_workers=2):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    mean = 0.
    std = 0.
    total_images = 0

    for images, _ in tqdm(loader, desc='Computing mean and std'):
        images = images.view(images.size(0), images.size(1), -1)
        mean += images.mean(2).sum(0)
        std += images.std(2).sum(0)
        total_images += images.size(0)

    mean /= total_images
    std /= total_images
    return mean, std
```

This is fine because the module **only defines the function**; nothing is executed when it is imported. Then, in your `train.py`, call it from inside the guard:


```python
from torchvision import datasets, transforms
from data_utils import calculate_mean_std

if __name__ == '__main__':
    transform = transforms.ToTensor()
    dataset = datasets.CIFAR10(root='../data', train=True, download=True, transform=transform)

    mean, std = calculate_mean_std(dataset, num_workers=2)  # Safe here
    print(f"Mean: {mean}")
    print(f"Std: {std}")
```

---


- Always put `DataLoader` usage inside `if __name__ == '__main__':` when `num_workers > 0` on Windows.
- **Never create a `DataLoader` with `num_workers > 0` at the top level of a module if that module might be imported.**
- For quick tests or small datasets, setting `num_workers=0` avoids this hassle.

# 3. Data Augmentation Transforms

Data augmentation increases the diversity of your training data and can help prevent overfitting. Common augmentation strategies include:

- **Random Horizontal Flip:**  
  Flips the image horizontally with a given probability (by default 0.5).  
  ```python
  transforms.RandomHorizontalFlip(p=0.5)
  ```

- **Random Vertical Flip:**  
  Flips the image vertically. Use this judiciously – it can be useful for datasets where vertical orientation is not semantically important (e.g., some aerial or medical images).  
  ```python
  transforms.RandomVerticalFlip(p=0.5)
  ```

- **Random Rotation:**  
  Rotates the image within a specified degree range.  
  ```python
  transforms.RandomRotation(degrees=15)
  ```
  A small degree range is often sufficient unless you know your objects can appear at wide angles.

- **Random Resized Crop:**  
  This transform combines random cropping with resizing. It randomly crops a portion of the image and then scales it to a target size. This is very popular for training on natural images.  
  ```python
  transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(0.75, 1.33))
  ```
  The `scale` parameter sets the range of the crop's area as a fraction of the original image area, and `ratio` sets the range of the crop's aspect ratio (width / height) before it is resized.

- **Color Jitter:**  
  Adjusts brightness, contrast, saturation, and hue. This augmentation is helpful when lighting conditions vary.  
  ```python
  transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)
  ```

- **Random Affine / Perspective:**  
  - **RandomAffine:** Applies random translations, rotations, scaling, and shearing.  
    ```python
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1), shear=5)
    ```
  - **RandomPerspective:** Simulates perspective distortions.  
    ```python
    transforms.RandomPerspective(distortion_scale=0.5, p=0.5)
    ```

- **Additional Transforms:**  
  - **GaussianBlur:** For datasets where blurring might simulate realistic scenarios (especially in noisy environments).  
    ```python
    transforms.GaussianBlur(kernel_size=3)
    ```
  - **Random Erasing:** For occlusion augmentation, which randomly erases a rectangular region of the image. This can help the network become robust to missing parts of an image. It operates on tensors rather than PIL images, so place it after `ToTensor()`.  
    ```python
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3))
    ```

---

# 4. Building the Transform Pipeline

It is common practice to define separate pipelines for training and evaluation (validation/testing). Below are examples:

### Training Transform Pipeline
When building the training pipeline, the focus is on introducing variability:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                # Randomly crop and resize
    transforms.RandomHorizontalFlip(),                # Random flip horizontally
    transforms.RandomRotation(degrees=15),            # Random rotation
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),  # Vary colors
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5)                   # Randomly erase parts of the image
])
```

### Evaluation Transform Pipeline
For evaluation, you want consistency and reproducibility:

```python
from torchvision import transforms

val_transform = transforms.Compose([
    transforms.Resize(256),                           # Resize images to a consistent scale
    transforms.CenterCrop(224),                       # Crop the center of the image
    transforms.ToTensor(),                            # Convert to tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
```

---

# 5. Choosing and Tuning Transforms

- **Dataset Characteristics:**  
  Not every augmentation is beneficial for every dataset. For example, if your dataset has objects with strict orientation (like digits or text), heavy rotations might hurt performance. Analyze your dataset to decide which augmentations are appropriate.

- **Model Architecture:**  
  Some models are sensitive to the scale or other characteristics of the input. Make sure your normalization values and the spatial dimensions match the pretrained model’s expected input.

- **Experimentation:**  
  It is common to start with a subset of these transforms, tune their parameters (e.g., rotation degrees, crop sizes, probabilities), and keep only the changes that improve validation performance.

- **Data Augmentation Libraries:**  
  While `torchvision.transforms` covers many use cases, for complex scenarios you might explore libraries like [albumentations](https://albumentations.ai/), which offer a more extensive suite of augmentation methods and more flexibility.

---

A well-crafted transform pipeline can significantly improve your model’s ability to generalize while keeping training efficient. Experiment with these suggestions, and tailor them to your specific problem domain to achieve the best results.