# Standardization Vs Normalization

## Normalization (min-max Normalization or Feature Scaling)
Normalization rescales the values into a range of [0,1]. This might be useful in some cases where all parameters need to have the same positive scale.

$X_{norm}=\frac{X-X_{min}}{X_{max}-X_{min}}$


Normalization is a reasonable choice when you know that your data does not follow a Gaussian distribution. It is commonly used with algorithms that make no assumption about the distribution of the data, such as K-Nearest Neighbors and neural networks.
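As a minimal sketch of the min-max formula above, using only the Python standard library: the function name `min_max_normalize`, the sample values, and the choice to map a constant feature to 0.0 are illustrative assumptions, not part of the original notes.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range; mapping it to 0.0 is an
        # assumption made here to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


ages = [18, 30, 42, 66]          # hypothetical feature values
scaled = min_max_normalize(ages)
print(scaled)                    # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, so a single extreme value determines how much room is left for everything else.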

## Standardization (Z-Score Normalization)
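Standardization rescales the data to have zero mean and unit variance: $\mu=0$ and $\sigma^2=1$. It shifts and scales the data but does not change the shape of its distribution.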

$X_{standard}=\frac{X-\mu}{\sigma}$

Standardization, on the other hand, can be helpful when the data follows a Gaussian distribution, although this is not a requirement. Also, unlike normalization, standardization does **not** have a bounding range. Outliers are therefore not squeezed into a fixed interval, although they still influence the computed $\mu$ and $\sigma$.
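The z-score formula can be sketched in a few lines with the standard library. The helper `standardize` and the sample data (including the deliberately extreme value 400) are hypothetical; the population standard deviation is used here, which is an assumption about which $\sigma$ is meant.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / std, using the population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


incomes = [30, 32, 35, 38, 40, 400]    # hypothetical data; 400 is an outlier
z = standardize(incomes)

print(round(statistics.fmean(z), 6))   # approximately 0
print(round(statistics.pstdev(z), 6))  # approximately 1
print(max(z))                          # above 1: the result is not bounded
```

The outlier keeps a large z-score instead of being pinned to 1, but it also inflates $\sigma$, so the remaining values end up close together on the standardized scale.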


## Effects

In theory, linear regression without regularization is insensitive to standardization, since any linear transformation of the input data can be counteracted by adjusting the model parameters.
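This claim can be checked numerically for ordinary least squares. The sketch below assumes NumPy is available and uses synthetic data; `fit_predict` is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two features on very different scales.
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 200)


def fit_predict(features, targets):
    """Ordinary least squares with an intercept; returns fitted values and coefficients."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return design @ coef, coef


X_std = (X - X.mean(axis=0)) / X.std(axis=0)

pred_raw, coef_raw = fit_predict(X, y)
pred_std, coef_std = fit_predict(X_std, y)

print(np.allclose(pred_raw, pred_std))  # True: same fitted values
print(coef_raw)                         # coefficients on the raw scale
print(coef_std)                         # different coefficients, same model
```

The fitted values agree because the standardized model only re-expresses the same linear function with different coefficients. This equivalence does not carry over to penalized models such as ridge or lasso, where the penalty depends on the scale of each coefficient.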

Even though standardization plays little role in regression in theory, it is used in practice for the following reasons:

1) Standardization improves the numerical stability of your model.

2) Standardization may speed up the training process. If different features have drastically different ranges, the learning rate is determined by the feature with the largest range, so putting the features on similar scales lets gradient-based training converge faster.
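One way to make both points concrete is to look at the condition number of $X^TX$, which governs the curvature of the squared-error loss for a linear model. The sketch below assumes NumPy and uses synthetic features; the feature ranges are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: one in [0, 1], one in [0, 1000].
X = np.column_stack([rng.uniform(0, 1, 1000), rng.uniform(0, 1000, 1000)])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# A large condition number means the loss surface is a long, narrow valley:
# the step size must be small enough for the steepest direction, which makes
# progress along the flat direction slow.
cond_raw = np.linalg.cond(X.T @ X)
cond_std = np.linalg.cond(X_std.T @ X_std)

print(f"raw:          {cond_raw:.1f}")
print(f"standardized: {cond_std:.1f}")
```

For independent features, the standardized condition number should be close to 1, while the raw one grows with the square of the ratio between the feature scales.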


PyTorch allows us to normalize our dataset using the standardization process we've just seen by passing in the mean and standard deviation values for each color channel to the Normalize() transform.

torchvision.transforms.Normalize(
    [meanOfChannel1, meanOfChannel2, meanOfChannel3],
    [stdOfChannel1, stdOfChannel2, stdOfChannel3],
)
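To show what this transform computes, here is a NumPy stand-in that applies `(x - mean) / std` separately to each channel of a channel-first image. The function `normalize_channels` is a hypothetical illustration, not the torchvision implementation, and the random array only plays the role of an image.

```python
import numpy as np


def normalize_channels(image, mean, std):
    """Apply (x - mean[c]) / std[c] to each channel of a C x H x W array."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (image - mean) / std  # broadcasting covers the H and W axes


rng = np.random.default_rng(0)
image = rng.uniform(0, 1, size=(3, 32, 32)).astype(np.float32)  # fake image

# Statistics taken from this single image, for illustration only;
# in practice they come from the whole training set.
mean = image.mean(axis=(1, 2))
std = image.std(axis=(1, 2))

out = normalize_channels(image, mean, std)
print(out.mean(axis=(1, 2)))  # each entry close to 0
print(out.std(axis=(1, 2)))   # each entry close to 1
```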

Refs [1](https://towardsdatascience.com/understand-data-normalization-in-machine-learning-8ff3062101f0), [2](https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/), [3](https://en.wikipedia.org/wiki/Correlation_and_dependence), [4](https://deeplizard.com/learn/video/lu7TCu7HeYc)

## PyTorch Normalization
In PyTorch, normalization means transforming the data so that afterwards each channel has $\mu=0, \sigma^2=1$.
If you read the raw data directly from the dataset's data attribute, the values are in the range [0,255].

In [1]:
import torch
import torchvision

train_transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

CIFAR10_train_dataset=torchvision.datasets.CIFAR10(root='../data',download=True,transform=train_transform,train=True)

min_value=CIFAR10_train_dataset.data.min()
max_value=CIFAR10_train_dataset.data.max()

print("CIFAR10_train_dataset.data.min(): ",min_value)
print("CIFAR10_train_dataset.data.max(): ",max_value)

r_mean, g_mean, b_mean=CIFAR10_train_dataset.data.mean(axis=(0,1,2))
r_std, g_std, b_std=CIFAR10_train_dataset.data.std(axis=(0,1,2))

print("mean of r, g, b channel:",r_mean, g_mean, b_mean)
print("standard deviation  of r, g, b channel:",r_std, g_std, b_std)

Files already downloaded and verified
CIFAR10_train_dataset.data.min():  0
CIFAR10_train_dataset.data.max():  255
mean of r, g, b channel: 125.306918046875 122.950394140625 113.86538318359375
standard deviation  of r, g, b channel: 62.99321927813685 62.088707640014405 66.70489964063101


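If you load the data through the dataset with the ToTensor() transform (for example via a DataLoader), the values will be in the range [0,1], because ToTensor() converts the uint8 images to floats and divides them by 255.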

In [2]:
trainloader = torch.utils.data.DataLoader(CIFAR10_train_dataset, batch_size=4,
                                          shuffle=True, num_workers=2)



dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
print("images.min(): ",images.min())
print("images.max(): ", images.max())

images.min():  tensor(0.)
images.max():  tensor(1.)
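The [0,1] range comes from the `ToTensor()` transform, which converts an H x W x C uint8 image into a C x H x W float tensor scaled by 1/255. The sketch below mimics only that scaling and axis reordering with NumPy; `to_tensor_like` is a hypothetical helper and the random array stands in for a real image.

```python
import numpy as np


def to_tensor_like(image_hwc_uint8):
    """Mimic the scaling and axis order of ToTensor() (not the real transform)."""
    chw = np.transpose(image_hwc_uint8, (2, 0, 1))   # H x W x C -> C x H x W
    return chw.astype(np.float32) / 255.0            # [0, 255] -> [0.0, 1.0]


rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
fake_image[0, 0] = (0, 0, 0)          # make sure both extremes are present
fake_image[0, 1] = (255, 255, 255)

scaled = to_tensor_like(fake_image)
print(scaled.shape)                   # (3, 32, 32)
print(scaled.min(), scaled.max())     # 0.0 1.0
```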


Since you want to feed your network inputs with $\mu=0, \sigma^2=1$, compute the mean and std of your data in advance directly from the dataset, divide them by 255 (ToTensor() rescales the data to the range [0,1]; for this dataset 255 is also the max value), and pass them to Normalize() when loading the data through the DataLoader.
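Dividing the statistics by 255 works because the mean and the standard deviation both scale linearly with the data. A quick check with NumPy, where a random uint8 array stands in for the N x H x W x C dataset array:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the raw dataset array: N x H x W x C, dtype uint8.
data = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Option 1: statistics on the raw [0, 255] scale, then divide by 255.
mean_a = data.mean(axis=(0, 1, 2)) / 255.0
std_a = data.std(axis=(0, 1, 2)) / 255.0

# Option 2: rescale to [0, 1] first, then compute the statistics.
scaled = data / 255.0
mean_b = scaled.mean(axis=(0, 1, 2))
std_b = scaled.std(axis=(0, 1, 2))

print(np.allclose(mean_a, mean_b), np.allclose(std_a, std_b))  # True True
```

Either route gives the per-channel values that match data already scaled to [0,1].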

In [3]:
# Rescale the statistics from [0,255] to [0,1] to match the output of ToTensor()
r_mean, g_mean, b_mean=[r_mean/max_value,  g_mean/max_value, b_mean/max_value]
r_std, g_std, b_std=[r_std/max_value, g_std/max_value, b_std/max_value]

# Normalize expects the means and stds in the same channel order: (R, G, B)
train_transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
                                                torchvision.transforms.Normalize(
                                                    (r_mean, g_mean, b_mean),
                                                    (r_std, g_std, b_std)  ) ])

CIFAR10_train_dataset=torchvision.datasets.CIFAR10(root='../data',download=True,transform=train_transform,train=True)

trainloader = torch.utils.data.DataLoader(CIFAR10_train_dataset, batch_size=4,
                                          shuffle=True, num_workers=2)



dataiter = iter(trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
print('\nNow data are in the form of normal distribution\n')

print("images.min(): ",images.min())
print("images.max(): ", images.max())
print("shape of batch: batch_size x channel x row x column: ",images.shape)
print("shape of training dataset: ",CIFAR10_train_dataset.data.shape)
print("size of images: row x column x channel: ",CIFAR10_train_dataset.data[0].shape)


Files already downloaded and verified

Now data are in the form of normal distribution

images.min():  tensor(-1.9892)
images.max():  tensor(2.0430)
shape of batch: batch_size x channel x row x column:  torch.Size([4, 3, 32, 32])
shape of training dataset:  (50000, 32, 32, 3)
size of images: row x column x channel:  (32, 32, 3)


complete source code: [1](index.py), [2](datasets_normalization_preprocessing), [3](custome_dataset.py)

## Datasets Loader

## Display Tensor Images

## Data Loader

## Custom Dataset

## Image Folder

## Transformers