### Data Preprocessing Pipeline for Image Segmentation
- We are developing a simple data preprocessing pipeline using OpenCV and PyTorch for image segmentation tasks.
- This pipeline includes loading an image and its corresponding mask, resizing, normalization, augmentation (random horizontal flip), and converting to PyTorch tensors.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#### 1. Import libraries

In [2]:
import cv2
import numpy as np
import torch
from torchvision import transforms

#### 2. Load image and mask

In [6]:
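def load_image_and_mask(image_path, mask_path):
    # Load image in BGR, convert to RGB
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Load mask in grayscale (assumed: 0 = background, >0 = subject)
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    if mask is None:
        raise FileNotFoundError(f"Could not read mask: {mask_path}")

    return image, mask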

# Example usage
image_path = "/content/drive/MyDrive/AI_Vision_Extract_Nov25/notebooks/image.jpg"
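# NOTE: this is the same file as image_path, so the "mask" loaded here is just the
# grayscale version of the image; replace it with the path to the actual mask file.
mask_path  = "/content/drive/MyDrive/AI_Vision_Extract_Nov25/notebooks/image.jpg"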

image, mask = load_image_and_mask(image_path, mask_path)
print(f"Image shape: {image.shape}, Mask shape: {mask.shape}")

Image shape: (241, 242, 3), Mask shape: (241, 242)


- `cv2.imread(image_path)` loads the image in BGR format (OpenCV default).

- `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)` converts it to RGB so it matches the usual PyTorch convention.

- `cv2.imread(..., cv2.IMREAD_GRAYSCALE)` loads the mask as a single-channel grayscale image (0 for background, non-zero for subject).
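
To make the channel-order point concrete, here is a minimal NumPy-only sketch. The tiny `bgr` array is made up for illustration. For a 3-channel image, reversing the last axis produces the same reordering that `cv2.COLOR_BGR2RGB` performs.

```python
import numpy as np

# Hypothetical 1x2 image in OpenCV's BGR order: one pure-blue and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # the red pixel, now in RGB order
```

Skipping this conversion does not raise an error; red and blue are silently swapped, which is easy to miss until the image is displayed.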

#### 3. Resize image and mask

In [7]:
def resize_image_and_mask(image, mask, target_size=(256, 256)):
    # Resize image using bilinear interpolation
    image_resized = cv2.resize(image, target_size, interpolation=cv2.INTER_LINEAR)

    # Resize mask using nearest neighbor (to avoid blurring class boundaries)
    mask_resized = cv2.resize(mask, target_size, interpolation=cv2.INTER_NEAREST)

    return image_resized, mask_resized

# Example usage
target_size = (256, 256)
image_resized, mask_resized = resize_image_and_mask(image, mask, target_size)
print(f"Resized image shape: {image_resized.shape}, mask shape: {mask_resized.shape}")

Resized image shape: (256, 256, 3), mask shape: (256, 256)


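- `cv2.INTER_LINEAR` (bilinear) gives smooth results and is a reasonable default for the image.

- `cv2.INTER_NEAREST` is used for masks to preserve sharp boundaries: every output pixel copies an existing label, so no in-between values appear at class borders.

- `cv2.resize` expects `target_size` as `(width, height)`, the reverse of the `(height, width)` order in NumPy's `.shape`. For a square size such as `(256, 256)` the two coincide.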
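
The reason for nearest-neighbor on masks can be seen on a single row of labels. The sketch below uses plain NumPy rather than `cv2.resize`, and the label values are made up for illustration: linear interpolation between class 0 and class 2 produces the value 1, which would be read as a different class.

```python
import numpy as np

# Hypothetical 1-D row of class labels: class 0 on the left, class 2 on the right
labels = np.array([0, 0, 2, 2], dtype=np.float32)

src_x = np.arange(len(labels))               # original sample positions 0..3
dst_x = np.linspace(0, len(labels) - 1, 7)   # 7 positions spanning the same range

# Linear interpolation blends neighbouring labels
linear = np.interp(dst_x, src_x, labels)

# Nearest neighbour copies the closest original label
nearest = labels[np.round(dst_x).astype(int)]

print(linear)
print(nearest)
```

The `nearest` row contains only labels that were present in the input, while `linear` contains a blended value at the class boundary.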

#### 4. Normalize image to [0, 1] and convert to float

In [8]:
def normalize_image(image):
    # Convert to float32 and scale to [0, 1]
    image_float = image.astype(np.float32) / 255.0
    return image_float

# Example usage
image_normalized = normalize_image(image_resized)
print(f"Image dtype: {image_normalized.dtype}, range: [{image_normalized.min():.3f}, {image_normalized.max():.3f}]")

Image dtype: float32, range: [0.000, 1.000]


- Converts the image from uint8 (0–255) to float32 (0.0–1.0).

- `transforms.Normalize` (step 7) requires a floating-point tensor, and the ImageNet mean/std used there assume pixel values in [0, 1].
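
A small sketch of why the explicit `astype(np.float32)` matters, using a made-up three-pixel array: dividing a `uint8` array directly also gives floats, but in float64, whereas casting first keeps float32, which is PyTorch's default floating-point dtype.

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype=np.uint8)

# Dividing uint8 directly promotes to NumPy's default float64
as_float64 = pixels / 255.0

# Casting first keeps the result in float32
as_float32 = pixels.astype(np.float32) / 255.0

print(as_float64.dtype, as_float32.dtype)
```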

#### 5. Apply random horizontal flip (augmentation)

In [9]:
def random_horizontal_flip(image, mask, p=0.5):
    if np.random.rand() < p:
        image = np.flip(image, axis=1)
        mask  = np.flip(mask,  axis=1)
    return image, mask

# Example usage
image_aug, mask_aug = random_horizontal_flip(image_normalized, mask_resized, p=0.5)

- Flips both image and mask horizontally with probability p.

- This is a simple data augmentation that helps the model generalize better.

- Always flip the mask in the same way as the image so the labels stay aligned.
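
The sketch below, on a made-up 2x2 example, checks two things: the labeled pixel and its label move together, and `np.flip` returns a view with a negative stride rather than a new array. `torch.from_numpy` does not support negative strides, which is why step 6 calls `.copy()` before building tensors.

```python
import numpy as np

image = np.arange(12, dtype=np.float32).reshape(2, 2, 3)  # tiny HWC "image"
mask = np.array([[0, 1], [0, 0]], dtype=np.uint8)          # subject in the top-right pixel

flipped_image = np.flip(image, axis=1)
flipped_mask = np.flip(mask, axis=1)

# The subject pixel and its label both moved from column 1 to column 0
subject_pixel_before = image[0, 1]
subject_pixel_after = flipped_image[0, 0]

# np.flip returns a view with a negative stride along the flipped axis
print(flipped_image.strides)

# Copying yields an ordinary contiguous array
contiguous = np.ascontiguousarray(flipped_image)
print(contiguous.flags["C_CONTIGUOUS"])
```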

#### 6. Convert to PyTorch tensors

In [11]:
def to_tensor(image, mask):
    # np.flip returns views with negative strides, which torch.from_numpy
    # does not support, so copy into contiguous arrays first
    image = image.copy()
    mask  = mask.copy()

    image_tensor = torch.from_numpy(image).permute(2, 0, 1)  # HWC → CHW
    mask_tensor  = torch.from_numpy(mask).long()              # class indices as int64

    return image_tensor, mask_tensor

# Example usage
image_tensor, mask_tensor = to_tensor(image_aug, mask_aug)
print(f"Image tensor shape: {image_tensor.shape}, mask tensor shape: {mask_tensor.shape}")

Image tensor shape: torch.Size([3, 256, 256]), mask tensor shape: torch.Size([256, 256])


- `permute(2, 0, 1)` changes the order from `(H, W, C)` to `(C, H, W)` as expected by PyTorch models.

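- `mask` is converted to `long` (int64) because segmentation masks hold class indices, and losses such as `nn.CrossEntropyLoss` expect integer targets. At this point the values are still raw grayscale levels (for example 0 and 255); step 8 maps them to 0 and 1.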
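
The axis reordering can be checked on a small array. This sketch uses `np.transpose`, which takes the same axis order as `permute(2, 0, 1)`, so it runs without PyTorch; the array contents are arbitrary.

```python
import numpy as np

hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # H=2, W=3, C=3
chw = np.transpose(hwc, (2, 0, 1))           # same axis order as permute(2, 0, 1)

print(hwc.shape, chw.shape)

# Channel c of pixel (y, x) holds the same value in both layouts
y, x, c = 1, 2, 0
print(hwc[y, x, c] == chw[c, y, x])
```

Only the indexing order changes; no pixel values are modified.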

#### 7. Normalize image using ImageNet stats

In [12]:
# Define normalization transform (ImageNet mean/std)
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

# Apply normalization
image_normalized_tensor = normalize(image_tensor)

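- Normalizes each channel as `(x - mean) / std` using ImageNet statistics, the convention for ImageNet-pretrained backbones such as ResNet and VGG.

- Matching the input distribution that pretrained weights were trained on generally makes fine-tuning faster and more stable.

- Only applied to the image; the mask is left as-is.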
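
As an illustration of the arithmetic, the sketch below reproduces the per-channel `(x - mean) / std` computation in plain NumPy on a made-up 4x4 image. A pixel equal to the channel means maps to zero, and a white pixel maps to `(1 - mean) / std`.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Hypothetical CHW image in [0, 1]: every pixel equals the per-channel mean
image_chw = np.broadcast_to(mean[:, None, None], (3, 4, 4)).copy()
image_chw[:, 0, 0] = 1.0  # one white pixel

# Per-channel standardization, broadcast over H and W
normalized = (image_chw - mean[:, None, None]) / std[:, None, None]

print(normalized[:, 1, 1])  # mean-valued pixel
print(normalized[:, 0, 0])  # white pixel
```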

#### 8. Convert mask to binary (subject vs background)

In [13]:
def make_binary_mask(mask_tensor):
    # Convert any non-zero value to 1 (subject), 0 remains background
    binary_mask = (mask_tensor > 0).long()
    return binary_mask

# Example usage
binary_mask = make_binary_mask(mask_tensor)
print(f"Unique values in mask: {torch.unique(binary_mask)}")  # Should be tensor([0, 1])

Unique values in mask: tensor([0, 1])


- Converts a multi-class mask into a binary mask:

  - `0` → background

  - `>0` → subject (set to 1)

- Useful if the original mask has multiple object classes but the task is just “subject vs background”.


#### Task: Implement the full pipeline and execute it.

In [14]:
import os

# Create dummy directories if they don't exist
image_dir = "data/images"
mask_dir  = "data/masks"
os.makedirs(image_dir, exist_ok=True)
os.makedirs(mask_dir, exist_ok=True)

# Create a dummy image (e.g., a simple white image)
dummy_image = np.full((256, 256, 3), 255, dtype=np.uint8)  # White image
cv2.imwrite(os.path.join(image_dir, "train_001.jpg"), dummy_image)

# Create a dummy mask (e.g., a simple circle in the center)
dummy_mask = np.zeros((256, 256), dtype=np.uint8)
cv2.circle(dummy_mask, (128, 128), 50, 255, -1) # White circle
cv2.imwrite(os.path.join(mask_dir, "train_001.png"), dummy_mask)

print(f"Created dummy image: {os.path.join(image_dir, 'train_001.jpg')}")
print(f"Created dummy mask: {os.path.join(mask_dir, 'train_001.png')}")


Created dummy image: data/images/train_001.jpg
Created dummy mask: data/masks/train_001.png
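
One possible way to chain steps 4 to 8 into a single function is sketched below. It is a NumPy-only approximation under stated assumptions, not the definitive pipeline: `preprocess`, `p_flip`, and `rng` are hypothetical names, loading and resizing (steps 2 and 3) and the final `torch.from_numpy` call are left out so the sketch runs without OpenCV or PyTorch, and the synthetic inputs only resemble the dummy files created above.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, mask, p_flip=0.5, rng=None):
    """Chain steps 4-8 on an RGB uint8 image (H, W, 3) and a uint8 mask (H, W)."""
    rng = np.random.default_rng() if rng is None else rng

    # Step 4: uint8 [0, 255] -> float32 [0, 1]
    image = image.astype(np.float32) / 255.0

    # Step 5: flip image and mask together so labels stay aligned
    if rng.random() < p_flip:
        image = np.flip(image, axis=1)
        mask = np.flip(mask, axis=1)

    # Step 6: HWC -> CHW, stored contiguously
    image = np.ascontiguousarray(image.transpose(2, 0, 1))

    # Step 7: per-channel ImageNet standardization
    image = (image - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

    # Step 8: any non-zero mask value becomes class 1
    mask = (np.ascontiguousarray(mask) > 0).astype(np.int64)

    return image, mask

# Synthetic stand-ins for the dummy files: white image, filled circle mask
dummy_image = np.full((256, 256, 3), 255, dtype=np.uint8)
yy, xx = np.mgrid[:256, :256]
dummy_mask = (((yy - 128) ** 2 + (xx - 128) ** 2 <= 50 ** 2) * 255).astype(np.uint8)

image_out, mask_out = preprocess(dummy_image, dummy_mask, rng=np.random.default_rng(0))
print(image_out.shape, image_out.dtype, mask_out.shape, np.unique(mask_out))
```

Passing a seeded `rng` makes the flip decision reproducible, which helps when debugging; in training, the default unseeded generator gives a fresh draw on every call.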
