In [1]:
import numpy as np
import torch
torch.set_printoptions(edgeitems=2, threshold=50)

In [6]:
import imageio.v2 as imageio  # Use version 2 API
img_arr = imageio.imread('../data/p1ch4/image-dog/bobby.jpg')
img_arr.shape

(720, 1280, 3)

`torch.from_numpy()` creates a tensor that shares the same underlying memory as the NumPy array, so modifying `img` also modifies `img_arr` (and vice versa).\
`permute(2, 0, 1)` reorders the dimensions from (height, width, channels) to (channels, height, width), giving a shape of (3, 720, 1280). It returns a view of the same data rather than a copy.
This is the usual PyTorch layout for image data: convolutional layers and most other image modules expect the channel dimension first (C x H x W rather than H x W x C).
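
The memory sharing described above can be checked on a small stand-in array. This is a minimal sketch: the 2 x 4 "image" and the names `arr`, `t`, and `chw` are illustrative and not part of the original notebook.

```python
import numpy as np
import torch

# Hypothetical stand-in for an H x W x C image: 2 rows, 4 columns, 3 channels.
arr = np.zeros((2, 4, 3), dtype=np.uint8)

t = torch.from_numpy(arr)   # tensor backed by the same buffer as arr
chw = t.permute(2, 0, 1)    # view with shape (C, H, W); no data is copied

# Write through the permuted view: channel 0, row 1, column 2.
chw[0, 1, 2] = 255

# The same element in the NumPy array is row 1, column 2, channel 0.
shares_memory = bool(arr[1, 2, 0] == 255)
print(tuple(chw.shape), shares_memory)
```

Because `permute` only changes how the buffer is indexed, the result is not contiguous in memory; calling `.contiguous()` or `.clone()` is one way to get an independent, densely laid-out copy when that matters.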

In [7]:
img = torch.from_numpy(img_arr)
out = img.permute(2, 0, 1)

In [8]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.uint8)

This creates a list `filenames` containing the names of all files in `data_dir` that end with `.png`.\
`os.listdir(data_dir)`: Lists all files and directories in data_dir.\
`os.path.splitext(name)`: Splits a filename into its base name and extension (e.g., 'cat.png' becomes ('cat', '.png')).\
`[-1]`: Takes the extension part (e.g., `.png`).\
The list comprehension keeps only the names whose extension is exactly `.png`. The comparison is case-sensitive, so a file named `cat.PNG` would be skipped.


In [9]:
import os

data_dir = '../data/p1ch4/image-cats/'
filenames = [name for name in os.listdir(data_dir)
             if os.path.splitext(name)[-1] == '.png']
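
To see how the extension filter behaves without touching the file system, the sketch below applies the same comprehension to a hand-written list. The file names are made up for illustration, and the case-insensitive, sorted variant is an optional extension rather than something the notebook does.

```python
import os

# Hypothetical directory listing, used only to illustrate the filter.
names = ['cat1.png', 'cat2.PNG', 'notes.txt', 'archive.tar.gz', 'png']

# splitext splits at the last dot only; a name without a dot has no extension.
exts = [os.path.splitext(n)[-1] for n in names]

# Same filter as in the notebook: exact, case-sensitive match on '.png'.
png_only = [n for n in names if os.path.splitext(n)[-1] == '.png']

# Optional variant: ignore case and sort for a reproducible order.
png_any_case = sorted(n for n in names
                      if os.path.splitext(n)[-1].lower() == '.png')

print(exts)
print(png_only)
print(png_any_case)
```

Sorting can be useful in practice because `os.listdir` returns entries in arbitrary order, so the position of each image in the batch may otherwise differ between machines.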

`enumerate(filenames)` provides both an index `i` (starting at 0) and the corresponding filename.\
The loop visits every file in `filenames`, while `batch` only has room for `batch_size = 3` images. The code therefore assumes the directory holds at most 3 PNG files: a fourth file would make `batch[i]` raise an `IndexError`, and with fewer than 3 the unused slots stay all zeros.\
After `permute(2, 0, 1)` the first dimension (index 0) is the channel dimension, so `img_t[:3]` slices along it. The syntax `[:3]` means "take the elements from the start up to, but not including, index 3", which selects channel 0 (red), channel 1 (green), and channel 2 (blue). For an RGB image this changes nothing; for an image with a fourth (alpha) channel, it drops the alpha channel.\
`batch[i] = img_t` assigns the processed tensor to the i-th position in the batch tensor. `batch` has shape (3, 3, 256, 256), so `img_t` must have shape (3, 256, 256) to fit. This assumes all images are 256x256 pixels; otherwise, resizing would be needed (not shown here).
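
The effect of `img_t[:3]` is easiest to see on an image that has more than three channels. The sketch below assumes a 256x256 RGBA image, which the notebook does not state; random data stands in for the file contents.

```python
import numpy as np
import torch

# Hypothetical RGBA image (H x W x 4) filled with random pixel values.
rgba = np.random.randint(0, 256, size=(256, 256, 4), dtype=np.uint8)

img_t = torch.from_numpy(rgba).permute(2, 0, 1)  # shape (4, 256, 256)
rgb = img_t[:3]                                  # keep channels 0..2, drop alpha

# A batch slot has shape (3, 256, 256), so the sliced image fits exactly.
batch = torch.zeros(3, 3, 256, 256, dtype=torch.uint8)
batch[0] = rgb

print(tuple(img_t.shape), tuple(rgb.shape), tuple(batch[0].shape))
```

Without the slice, assigning the four-channel tensor to `batch[0]` would fail because the shapes do not match.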

What if I want the slicing to operate on the second dimension of the tensor?\
Say the tensor has shape (3, 256, 256) (3 channels, 256 height, 256 width). The first 100 rows are `tensor[:, :100, :]` → shape (3, 100, 256).

What if I want the slicing to operate on the third dimension of the tensor?\
The first 100 columns are `tensor[:, :, :100]` → shape (3, 256, 100).
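
The shapes produced by slicing each dimension can be confirmed directly. This is a small sketch with a placeholder tensor; the `...` form in the last line is an equivalent way to address the final dimension and is not used in the notebook itself.

```python
import torch

# Placeholder tensor with the shape discussed above: (channels, height, width).
tensor = torch.zeros(3, 256, 256)

first_dim = tensor[:2]            # first 2 channels
second_dim = tensor[:, :100, :]   # first 100 rows
third_dim = tensor[:, :, :100]    # first 100 columns
last_dim = tensor[..., :100]      # same as third_dim, written with Ellipsis

for name, t in [('first', first_dim), ('second', second_dim),
                ('third', third_dim), ('last', last_dim)]:
    print(name, tuple(t.shape))
```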

# Can I Use `tensor[:, :100]` Instead of `tensor[:, :100, :]` for a 3D Tensor of Shape `(3, 256, 256)`?

Yes, you can use `tensor[:, :100]` rather than `tensor[:, :100, :]` for a 3D tensor of shape `(3, 256, 256)` in PyTorch, and both will produce the same result.

## Why It Works

In PyTorch, tensor slicing follows the format `tensor[dim0_slice, dim1_slice, dim2_slice, ...]`, where each slice corresponds to one of the tensor’s dimensions. For a 3D tensor with shape `(3, 256, 256)`:
- The first dimension (size 3) typically represents channels or some similar grouping.
- The second dimension (size 256) represents the height.
- The third dimension (size 256) represents the width.

### Breaking Down the Slicing

- **`tensor[:, :100, :]`**:
  - `:` in the first position means "select all elements of the first dimension" (all 3 channels).
  - `:100` in the second position means "select the first 100 elements of the second dimension" (the first 100 rows out of 256).
  - `:` in the third position means "select all elements of the third dimension" (all 256 columns).
  - Resulting shape: `(3, 100, 256)`.

- **`tensor[:, :100]`**:
  - When you provide fewer slices than the number of dimensions in PyTorch, the remaining dimensions are implicitly sliced with `:` (select all).
  - So, `tensor[:, :100]` is interpreted as `tensor[:, :100, :]`.
  - This means:
    - `:` selects all 3 elements of the first dimension.
    - `:100` selects the first 100 elements of the second dimension.
    - (Implicit) `:` selects all 256 elements of the third dimension.
  - Resulting shape: `(3, 100, 256)`.

### Why They’re Equivalent

In both cases, the slicing selects:
- All channels (3),
- The first 100 rows (out of 256),
- All columns (256).

Thus, `tensor[:, :100]` and `tensor[:, :100, :]` produce tensors with the same shape, `(3, 100, 256)`, and the same data. This behavior is consistent with how PyTorch (and NumPy) handles slicing: if fewer indices are provided than the tensor’s number of dimensions, the unspecified trailing dimensions are fully selected by default.

## Conclusion

For a 3D tensor of shape `(3, 256, 256)`, `tensor[:, :100]` is a valid shorthand for `tensor[:, :100, :]` and achieves the exact same outcome. You can use either form interchangeably in this context.
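
A quick check of the equivalence, as a sketch on random data. It also shows a detail the explanation above does not mention: both forms return views of the original tensor, so writing through either one changes `tensor` as well.

```python
import torch

tensor = torch.rand(3, 256, 256)

short = tensor[:, :100]      # trailing dimension left implicit
full = tensor[:, :100, :]    # trailing dimension spelled out

same_shape = short.shape == full.shape
same_data = torch.equal(short, full)

# Basic slicing returns a view, so this write is visible in the original.
short[0, 0, 0] = -1.0
view_of_original = bool(tensor[0, 0, 0] == -1.0)

print(same_shape, same_data, view_of_original)
```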

In [10]:
for i, filename in enumerate(filenames):
    img_arr = imageio.imread(os.path.join(data_dir, filename))
    img_t = torch.from_numpy(img_arr)
    img_t = img_t.permute(2, 0, 1)
    img_t = img_t[:3]
    batch[i] = img_t

Convert the batch to floating point and divide the pixel values by 255 (the maximum value representable by an 8-bit unsigned integer), so that every value lies in the range [0, 1].

In [11]:
batch = batch.float()
batch /= 255.0
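
The scaling step can be illustrated on three sample pixel values. This is a minimal sketch with made-up values; the conversion to float comes first because a `uint8` tensor cannot store fractional results.

```python
import torch

# Hypothetical pixel values covering the ends and middle of the uint8 range.
pixels = torch.tensor([0, 128, 255], dtype=torch.uint8)

scaled = pixels.float() / 255.0   # float32 values in [0, 1]

print(scaled.dtype, scaled.min().item(), scaled.max().item())
```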

This extracts the number of channels from batch’s shape.\
`batch.shape` is (3, 3, 256, 256), so `batch.shape[1]` is 3 (the channel dimension).\
`n_channels` is set to 3, representing the RGB channels.

In [12]:
n_channels = batch.shape[1]

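`batch[:, c]` selects the c-th channel for all images in the batch. The loop standardizes each channel by subtracting its mean and dividing by its standard deviation, both computed over the whole batch, so each channel ends up with zero mean and unit standard deviation.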

In [14]:
for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std
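
The per-channel loop can also be written without a loop by reducing over the batch, height, and width dimensions at once. The sketch below uses random data in place of the cat images and compares the two approaches; the vectorized form is an alternative shown for illustration, not the notebook's own method.

```python
import torch

torch.manual_seed(0)
raw = torch.rand(3, 3, 256, 256)   # stand-in for the scaled image batch

# Loop version, mirroring the cell above.
looped = raw.clone()
for c in range(looped.shape[1]):
    mean = torch.mean(looped[:, c])
    std = torch.std(looped[:, c])
    looped[:, c] = (looped[:, c] - mean) / std

# Vectorized version: one mean and std per channel, broadcast over the rest.
mean = raw.mean(dim=(0, 2, 3), keepdim=True)   # shape (1, 3, 1, 1)
std = raw.std(dim=(0, 2, 3), keepdim=True)
vectorized = (raw - mean) / std

channel_means = vectorized.mean(dim=(0, 2, 3))
channel_stds = vectorized.std(dim=(0, 2, 3))
print(channel_means, channel_stds)
```

After either version, each channel's mean should be close to 0 and its standard deviation close to 1, up to floating-point error.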