### PyTorch

PyTorch is a popular open-source machine learning library based on the Torch library. Its ecosystem includes three domain libraries, torchvision, torchaudio, and torchtext, for computer vision, audio, and text respectively.

It provides two high-level features:

* Tensor computation (like NumPy) with strong GPU acceleration
* Deep neural networks built on a tape-based autograd system

### Topics Covered

- Introducing Batch Dimension.
- Load Batch Of Images (Not Recommended Approach).
- Normalization
    - Resize.
    - Standardization.
    - Plotting.
- Creating One-Hot Encoding.
    - Convert Vector Into One-Hot Encoded Matrix.
    - Sample Example On Scatter_ with Zero and One Dimension.
    - Filter observation based on Condition.
- Norm
    - L2 Norm
    - L1 Norm

### Importing Libraries

In [None]:
import os
import numpy as np

import torch

from PIL import Image
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

### Importing Image

In [None]:
image = Image.open('SampleImages/dog.jpg')
np_image = np.rollaxis(np.asarray(image), 2, 0)
image

### Working with Images

In [None]:
print(f'Numpy Image Shape: {np_image.shape}')
torchImage = torch.from_numpy(np_image)
print(f'Convert Numpy to PyTorch: {torchImage.shape}')

* PyTorch expects images in C x H x W format (channels, height, width).
* A NumPy image is usually H x W x C, so we move the channel axis to the front before converting it to a PyTorch tensor.
* If a tensor is not in the required layout, we can use permute to reorder its dimensions; the arguments give the new order of the existing dimensions.
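
As a minimal sketch of the layout change, the snippet below uses a synthetic NumPy array in place of a real image (the array shape and the names hwc and chw are assumptions made for illustration) and moves the channel axis to the front with permute.

```python
import numpy as np
import torch

# Synthetic H x W x C "image": 4 rows, 6 columns, 3 channels (illustrative values only).
hwc = torch.from_numpy(np.zeros((4, 6, 3), dtype=np.uint8))

# permute(2, 0, 1) puts old dimension 2 (channels) first, then height, then width.
chw = hwc.permute(2, 0, 1)

print(hwc.shape)  # torch.Size([4, 6, 3])
print(chw.shape)  # torch.Size([3, 4, 6])
```

permute only changes how the dimensions are ordered; the pixel values themselves are untouched.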

When training a deep learning model, we feed the data in batches of a given batch size. We have seen that an image has three dimensions, so a batch of images needs one more dimension (the batch dimension).

**Note: Sometimes images also have an alpha channel indicating transparency.**

### Introducing Batch Dimension

**Since we have only one image, the batch size is 1. unsqueeze inserts a new dimension of size 1 at the position given in its argument. Standard practice is to keep the batch dimension in the first position.**

In [None]:
print(f'Batch Dimension at First Position: {torchImage.unsqueeze(0).shape}')
print(f'Batch Dimension at Second Position: {torchImage.unsqueeze(1).shape}')
print(f'Batch Dimension at Last Position: {torchImage.unsqueeze(-1).shape}')
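
For reference, here is roughly what those three calls produce on a synthetic 3 x 4 x 6 tensor (the tensor and its size are assumptions, not the dog image), along with squeeze, which removes the size-1 dimension again.

```python
import torch

img = torch.zeros(3, 4, 6)  # stand-in for a C x H x W image

first = img.unsqueeze(0)    # shape (1, 3, 4, 6)
second = img.unsqueeze(1)   # shape (3, 1, 4, 6)
last = img.unsqueeze(-1)    # shape (3, 4, 6, 1)

# squeeze(0) undoes unsqueeze(0) by dropping the size-1 batch dimension.
restored = first.squeeze(0)
print(first.shape, second.shape, last.shape, restored.shape)
```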

### Load Batch Of Images (Not Recommended Approach)

Here we declare a zero tensor with a batch dimension and a fixed image resolution, which acts as a block of memory to hold the batch of images. This approach is not recommended: PyTorch provides torch.utils.data.DataLoader (typically used together with torchvision datasets) to load images in bulk. We will create a DataLoader in the next notebook.
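
As a rough preview of that idea, and not the setup used in the next notebook, the sketch below wraps synthetic tensors in a TensorDataset and lets a DataLoader form the batches. The data, sizes, and variable names are placeholders chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Six synthetic 3 x 8 x 8 "images" with integer labels (placeholder data).
images = torch.zeros(6, 3, 8, 8)
labels = torch.arange(6)

loader = DataLoader(TensorDataset(images, labels), batch_size=3, shuffle=False)

# Each iteration yields one batch with the batch dimension already in front.
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)  # [(3, 3, 8, 8), (3, 3, 8, 8)]
```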

In [None]:
"""Creating a batch of three images."""
batch_size = 3
imageBatch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.int16)

Now we can insert three images into this batch tensor.

* We move a set of three images of different dimensions into imageBatch.
* Each image must be resized to match the imageBatch dimensions.

Several things happen in the cell below:

- We select only "jpg" files.
- We iterate through the image folder and build the path for each image.
- We resize each image to match imageBatch.

In [None]:
data = 'SampleImages'
# Keep only the ".jpg" files; take at most batch_size of them so they fit in imageBatch.
filenames = [name for name in os.listdir(data) if os.path.splitext(name)[-1] == '.jpg']
pilImages = [Image.open(os.path.join(data, name)) for name in filenames[:batch_size]]
# Resize every image to the 256x256 resolution of imageBatch.
resizedImages = [img.resize((256, 256)) for img in pilImages]

**Here we convert each PIL image to a NumPy array and then to a Torch tensor. Alternatively, the torchvision library can convert a PIL image to a Torch tensor directly.**

In [None]:
for i, img in enumerate(resizedImages):
    arr = np.asarray(img)                                    # H x W x C
    imageBatch[i] = torch.from_numpy(arr).permute(2, 0, 1)   # C x H x W

**We now have three images in the imageBatch tensor, each resized to 256x256.**

In [None]:
imageBatch.shape

### Normalization

Normalization is a common machine learning technique applied to features to bring their values onto a common scale. Images can be normalized using their mean and standard deviation.

**One way to normalize an 8-bit grayscale image is to divide by 255.0, which scales the pixel values to the range [0, 1]: image /= 255.0**
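
A minimal sketch of this scaling on a tiny synthetic grayscale tensor (the pixel values are made up for illustration):

```python
import torch

# Synthetic 8-bit grayscale image (1 x H x W) with values in 0..255.
gray = torch.tensor([[[0, 64], [128, 255]]], dtype=torch.uint8)

# Convert to float first so the division keeps fractional values.
scaled = gray.float() / 255.0
print(scaled.min().item(), scaled.max().item())  # 0.0 1.0
```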

In the cell below, we perform **standardization** with the following steps:

- Convert the integer tensor to a float tensor.
- Get the number of channels in the image batch.
- Iterate through each channel.
- For each channel, compute the mean and standard deviation across all three images, then subtract the mean and divide by the standard deviation.

In [None]:
# Second way to normalize; technically this is called standardization.
imageBatch = imageBatch.float()

n_channels = imageBatch.shape[1]
for c in range(n_channels):
    mean = torch.mean(imageBatch[:, c])
    std = torch.std(imageBatch[:, c])
    imageBatch[:, c] = (imageBatch[:, c] - mean) / std
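
The same per-channel standardization can also be written without the loop. The sketch below is one possible vectorized form, checked against the loop on a synthetic random batch (the batch contents and variable names are assumptions for illustration).

```python
import torch

torch.manual_seed(0)
batch = torch.rand(3, 3, 8, 8) * 255  # synthetic N x C x H x W batch

# Loop version, mirroring the per-channel loop in the notebook.
looped = batch.clone()
for c in range(looped.shape[1]):
    mean_c = torch.mean(looped[:, c])
    std_c = torch.std(looped[:, c])
    looped[:, c] = (looped[:, c] - mean_c) / std_c

# Vectorized version: reduce over batch, height, and width, keeping the channel axis.
mean = batch.mean(dim=(0, 2, 3), keepdim=True)
std = batch.std(dim=(0, 2, 3), keepdim=True)
vectorized = (batch - mean) / std

print(torch.allclose(looped, vectorized, atol=1e-4))
```

After either version, each channel has a mean close to 0 and a standard deviation close to 1.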

### Plotting

In [None]:
f, axarr = plt.subplots(1,2, figsize=(10,10))
axarr[0].imshow(np.asarray(pilImages[0]))
axarr[1].imshow(imageBatch[0].permute(1,2,0))
axarr[0].set(xlabel="Original")
axarr[1].set(xlabel="Edited-256x256 and Standardized")
plt.show()

### One Hot Encoding

To handle categorical variables such as class names or text features, we use one-hot encoding.
We create a vector of 25 entries and then write the value 1 at each index along the dimension given to scatter_.

**Converting Vector Into One-Hot Encoded Matrix**

Steps to convert a vector into a one-hot encoding:

- Create a vector of class indices.
- Create a zero matrix with one row per entry and one column per category.
- Convert the 1D vector into a 2D column matrix using unsqueeze.
- scatter_ writes values at the positions given by the index tensor.
- The first argument of scatter_ is the dimension along which to index.
    - If 1, each index value selects the column to write to within its row.
    - If 0, each index value selects the row to write to within its column.
- The value 1.0 is written at those indices.

In [None]:
vector = torch.randint(0, 5, (25,))
print(vector)

In [None]:
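# vector holds class indices 0..4; len(vector.unique()) would give too few columns
# (and an out-of-range index) whenever a class happens to be missing from the sample.
num_classes = 5
target_onehot = torch.zeros(vector.shape[0], num_classes)
target_onehot.scatter_(1, vector.unsqueeze(1), 1.0)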
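
PyTorch also ships a built-in helper, torch.nn.functional.one_hot. The sketch below compares it with the scatter_ approach on a small fixed label vector (the labels are made up so the result is reproducible).

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1, 4, 2])  # fixed example labels in the range 0..4
num_classes = 5

# scatter_ version: write 1.0 at column labels[i] of row i.
via_scatter = torch.zeros(labels.shape[0], num_classes)
via_scatter.scatter_(1, labels.unsqueeze(1), 1.0)

# Built-in version: returns an integer tensor, so cast to float to compare.
via_one_hot = F.one_hot(labels, num_classes=num_classes).float()

print(torch.equal(via_scatter, via_one_hot))  # True
```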

### Sample Example On Scatter_ with Zero and One Dimension

In [None]:
src = torch.arange(1, 11).reshape((2, 5))

In [None]:
index = torch.tensor([[0, 1, 2, 0]])
torch.zeros(3, 5, dtype=src.dtype).scatter_(0, index, src)

In [None]:
index = torch.tensor([[0, 1, 2, 0]])
torch.zeros(3, 5, dtype=src.dtype).scatter_(1, index, src)

In [None]:
index = torch.tensor([[0, 1, 3], [0, 1, 4]])
torch.zeros(3, 5, dtype=src.dtype).scatter_(1, index, src)    
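
To make the indexing rule concrete, the sketch below spells out what scatter_ does on 2D tensors with plain loops. scatter_2d is a hypothetical helper written for this explanation, not a PyTorch function; it is applied to the first and third examples above.

```python
import torch

def scatter_2d(dim, index, src, shape):
    """Reference loop for scatter_ on 2D tensors (illustrative helper, not a PyTorch API)."""
    out = torch.zeros(shape, dtype=src.dtype)
    for i in range(index.shape[0]):
        for j in range(index.shape[1]):
            k = int(index[i, j])
            if dim == 0:
                out[k, j] = src[i, j]   # index value picks the row
            else:
                out[i, k] = src[i, j]   # index value picks the column
    return out

src = torch.arange(1, 11).reshape((2, 5))

index0 = torch.tensor([[0, 1, 2, 0]])
print(scatter_2d(0, index0, src, (3, 5)))
# tensor([[1, 0, 0, 4, 0],
#         [0, 2, 0, 0, 0],
#         [0, 0, 3, 0, 0]])

index1 = torch.tensor([[0, 1, 3], [0, 1, 4]])
print(scatter_2d(1, index1, src, (3, 5)))
# tensor([[1, 2, 0, 3, 0],
#         [6, 7, 0, 0, 8],
#         [0, 0, 0, 0, 0]])
```

Note that in the second notebook example, index [[0, 1, 2, 0]] with dimension 1 sends two source values to the same position (row 0, column 0). The PyTorch documentation describes the result for non-unique indices as non-deterministic, so that cell is best read as a demonstration of the collision rather than a pattern to rely on.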

**Filtering records based on a condition.**

Suppose we have 25 observations with 10 features and a target variable, and we treat the previously created vector as the target. We can then filter observations based on the target value, as shown in the cell below.

In [None]:
"""
We build a boolean mask of the entries in vector that are less than or equal to two
and use that mask to select the matching observations.
"""
data = torch.randn(25, 10)
print(f'Total Observation: {data.shape}')
bad_index = vector<=2
bad_data = data[bad_index]
print(f'Filtered Observations: {bad_data.shape}')
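
Because the cell above uses random data, the number of filtered rows changes from run to run. The sketch below repeats the idea on a small fixed example (the target and features values are made up) so the effect of the mask is easy to follow.

```python
import torch

target = torch.tensor([0, 3, 2, 4, 1])        # fixed example target values
features = torch.arange(10.0).reshape(5, 2)   # 5 observations, 2 features

mask = target <= 2                            # True for rows 0, 2, and 4
selected = features[mask]                     # keeps only the rows where mask is True

# The same rows via explicit integer indices.
rows = torch.nonzero(mask).squeeze(1)
print(rows.tolist())                          # [0, 2, 4]
print(selected.shape, torch.equal(selected, features[rows]))
```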

### Calculating Norm of a Vector

**In machine learning, we often hear about the norm of a vector. It refers to the magnitude or length of a vector in the vector space.**

**L2 Norm**

In [None]:
u = torch.tensor([3.0, -4.0])
print(f'L2 Norm of a Vector: {torch.norm(u, p=2)}')

**L1 Norm**

In [None]:
u = torch.tensor([3.0, -4.0])
print(f'L1 Norm of a Vector: {torch.norm(u, p=1)}')
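
For the vector above, the L2 norm is the square root of the sum of squares, sqrt(3^2 + (-4)^2) = 5, and the L1 norm is the sum of absolute values, |3| + |-4| = 7. The sketch below computes both by hand and with torch.linalg.norm, which the PyTorch documentation recommends over the older torch.norm.

```python
import torch

u = torch.tensor([3.0, -4.0])

l2_manual = torch.sqrt(torch.sum(u ** 2))  # sqrt(9 + 16)
l1_manual = torch.sum(torch.abs(u))        # 3 + 4

# For a 1D input, ord selects the vector norm.
l2 = torch.linalg.norm(u, ord=2)
l1 = torch.linalg.norm(u, ord=1)

print(l2_manual.item(), l2.item())  # both approximately 5.0
print(l1_manual.item(), l1.item())  # both approximately 7.0
```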

### Thanks For Reading. For Feedback, reach out on Github. Please don't spam.