# **Lab 1: ROI & Mask Refinement**

## Overview
In this lab, you will learn how to apply a region of interest (ROI) to a binary lane mask and refine the mask using morphological operations. These steps are essential to remove noise, focus on the area of interest, and produce a clean lane mask for subsequent geometry and control steps in the lane-keeping pipeline.

## Learning Objectives
- Understand the concept of a region of interest (ROI) in image processing and why it is used.
- Implement a function to apply an ROI polygon to a binary lane mask.
- Understand basic morphological operations such as closing and opening to refine the mask.
- Implement a function to refine a binary lane mask by removing small noisy regions and filling gaps.

## Platform Requirements
This lab assumes you are familiar with Python, NumPy, and OpenCV. You need to know how to read images and perform simple array operations.

### **Task 1. Environment Setup**

In this lab, we use OpenCV (`cv2`), NumPy, and Matplotlib. Run the following cell to import them; they are used throughout the entire lab.

- OpenCV (cv2): used to read images, process images, create ROI and perform morphological operations.

- NumPy (np): handles numerical arrays, data type conversions, and mask operations.

- Matplotlib (plt): displays images before and after applying ROI or refining the mask.

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt

### **Task 2. Loading a Sample Mask**
In practice, the binary lane mask comes from a neural network segmentation model. For this lab, you should prepare a sample binary mask image (background 0, lane pixels non-zero, e.g. 1 or 255) and place it in the `data/` folder. The code below loads an image file into a NumPy array.

##### Initialize input path

In [None]:
# TODO: provide the path to your sample mask image (grayscale PNG or JPG)
mask_path = ...  # e.g. 'data/lane_mask.png' OR r"data\lane_mask.png"

##### Read mask image + check existence

The lane mask produced by segmentation is a binary image, so a single channel is enough.
Therefore:
- There is no need to read the image in color (3 channels).
- Grayscale is sufficient; it reduces processing cost and matches the nature of the data.

Additionally, ROI and morphology operations in the pipeline assume the mask is a single-channel array.

In [None]:
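# Load the mask image as grayscale (single channel)
mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
if mask is None:
    # cv2.imread returns None (it does not raise) when the file is missing or unreadable
    raise FileNotFoundError(f'Could not read mask image at {mask_path!r}. Please place a binary mask image in the data/ folder.')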

##### Mask Normalization Note

The entire Mask Processing pipeline (including *ROI Filtering*, *Morphology Closing*, *Connected Components*) assumes the mask is in binary form:

$$mask\_01(x, y) \in \{0, 1\}$$

However, a mask saved in PNG/JPG format stores pixel values in the range $[0, 255]$, so the loaded array is not guaranteed to contain only 0 and 1. Therefore, we need to perform the normalization step:

* Pixel $> 0 \to 1$
* Pixel $= 0 \to 0$

> Important: This step guarantees that multiplying by the ROI mask behaves as defined below and that the refinement steps receive the 0/1 values they expect.

In [None]:
# Convert mask to binary 0/1
mask01 = (mask > 0).astype(np.uint8)

# Display the original mask
plt.figure(figsize=(4, 4))
plt.title('Original mask')
plt.imshow(mask01, cmap='gray')
plt.axis('off')
plt.show()

### **Task 3. Applying Region of Interest (ROI)**
##### **Intuitive Idea**

The camera mounted on the AutoCar-Kit has a low viewing angle. In the image:

- The upper part is usually the ceiling, walls, people, table legs, etc.
- The lower part is the actual road surface and lane markings.

For the lane-keeping problem, we only need the area near the car, so we retain only a polygon covering roughly the bottom 40% of the image, where the lane markings actually appear.
Instead of processing the entire image, we restrict processing to a Region of Interest (ROI):

> ROI = a polygon covering the area of the road surface we are interested in.

In the main pipeline, the ROI is defined by coordinate ratios relative to the image size, as formalized in the following subsections.

##### **Mathematical Representation of ROI**

Assume the output image from the segmentation model has dimensions $(H \times W)$.
We denote:

- $M(x, y)$: the value of the lane mask at pixel $(x,y)$, where $M(x,y) \in \{0,1\}$.
- $(x_i^{\text{ratio}},\, y_i^{\text{ratio}})$: the ratio coordinates of the ROI polygon vertices, where each value belongs to $[0,1]$.

##### **Converting Ratio Coordinates → Pixel Coordinates**

To convert the ROI points to actual pixel coordinates in the image, we multiply by the height and width:

$$
x_i^{\text{pix}}
= x_i^{\text{ratio}} \cdot W, 
\qquad
y_i^{\text{pix}}
= y_i^{\text{ratio}} \cdot H.
$$

Because the vertices are stored as ratios, the same ROI definition can be reused at any image resolution: only the resulting pixel coordinates change with the actual dimensions $(W, H)$.
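The conversion above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the helper name `ratio_to_pixels` and the trapezoid vertices are hypothetical and chosen only to make the arithmetic easy to check.

```python
import numpy as np


def ratio_to_pixels(roi_ratio: np.ndarray, width: int, height: int) -> np.ndarray:
    """Scale (x, y) ratio vertices in [0, 1] to integer pixel coordinates."""
    scale = np.array([width, height], dtype=np.float32)  # x scales with W, y with H
    return np.round(roi_ratio * scale).astype(np.int32)


# Hypothetical trapezoid, for illustration only (not the pipeline's ROI).
roi_ratio = np.array(
    [[0.0, 1.0], [0.25, 0.5], [0.75, 0.5], [1.0, 1.0]], dtype=np.float32
)

pts_small = ratio_to_pixels(roi_ratio, width=320, height=240)
pts_large = ratio_to_pixels(roi_ratio, width=640, height=480)

print(pts_small.tolist())  # [[0, 240], [80, 120], [240, 120], [320, 240]]
print(pts_large.tolist())  # [[0, 480], [160, 240], [480, 240], [640, 480]]
```

Note that a ratio of 1.0 maps to $W$ (or $H$), which is one past the last valid index. OpenCV's drawing functions clip shapes to the image bounds, so this is harmless when the points are only used to fill a polygon.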

##### **Constructing the ROI Mask**

We define the binary function $R(x,y)$, the ROI mask, as follows:


$$
R(x,y) =
\begin{cases}
1, & \text{if } (x,y) \text{ is inside the ROI polygon}, \\
0, & \text{if } (x,y) \text{ is outside the ROI}. 
\end{cases}
$$

This function is created by filling the ROI polygon on a blank image.

##### **Applying ROI to the Lane Mask**

The mask after applying the ROI is the point-wise multiplication between the original mask and the ROI mask:

$$
M_{\text{ROI}}(x,y)
= M(x,y) \cdot R(x,y).
$$

Meaning:

- If $(x,y)$ is outside the ROI
  $\Rightarrow R(x,y)=0$
  $\Rightarrow$ the pixel is eliminated.

- If $(x,y)$ is inside the ROI
  $\Rightarrow R(x,y)=1$
  $\Rightarrow$ the mask value is retained.
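The point-wise product can be checked on a tiny example. The arrays `M` and `R` below are made-up toy data; the sketch only shows that multiplying by $R$ and zeroing the pixels where $R = 0$ give the same result.

```python
import numpy as np

# Toy 4x4 lane mask M with values in {0, 1}
M = np.array(
    [[1, 1, 0, 1],
     [0, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 1, 0]],
    dtype=np.uint8,
)

# Toy ROI mask R: only the bottom two rows are inside the ROI
R = np.zeros_like(M)
R[2:, :] = 1

# Point-wise multiplication: M_ROI(x, y) = M(x, y) * R(x, y)
M_roi = M * R

# Equivalent formulation: copy M, then zero out everything outside the ROI
M_alt = M.copy()
M_alt[R == 0] = 0

print(M_roi)
```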

##### **Illustration**

Applying the ROI helps to:

- Eliminate the entire upper part of the image (where the road surface is not present).
- Reduce noise from objects, lighting, and shadows.
- Make the subsequent pipeline (refine mask $\to$ BEV $\to$ scanline $\to$ controller) more stable.

The ROI acts as a spatial filter, retaining only what the vehicle is truly interested in.

#### **Practice – Apply ROI**

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt

def apply_roi(mask01: np.ndarray, roi_poly: np.ndarray) -> np.ndarray:
    """
    Apply Region of Interest (ROI) mask to the binary lane mask.
    Students will fill in missing code segments.
    """
# -------------------------------------------------------------
# ... YOUR CODE HERE...

    # TODO: Scale ROI polygon to pixel coordinates
    # Hint: use mask01.shape and multiply roi_poly by [W, H],
    #       then cast to np.int32 (cv2.fillPoly requires integer points)
    H, W = ...          
    pts = ...           
    
    # Create ROI mask and fill polygon with ones
    roi_mask = np.zeros_like(mask01, dtype=np.uint8)
    cv2.fillPoly(roi_mask, [pts], 1)


    # TODO: Zero-out pixels outside ROI
    # Hint: mask_roi should be a copy of mask01 before modification
    mask_roi = ...
    mask_roi[roi_mask == 0] = 0
    return mask_roi


# TODO: Define the ROI polygon (normalized 0–1 coordinates)
# Hint: Use a 4-point polygon (e.g., same as pipeline)
roi_poly = ...

# TODO: Apply ROI using apply_roi(mask01, roi_poly)
mask_roi = ...


# ... END ...
# -------------------------------------------------------------

# Display the result
plt.figure(figsize=(8, 4))

plt.subplot(1, 2, 1)
plt.title('Original mask')
plt.imshow(mask01, cmap='gray')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.title('Mask after ROI')
plt.imshow(mask_roi, cmap='gray')
plt.axis('off')

plt.show()


### **Task 4. Mask Refinement**

After applying the ROI, the lane mask often still contains errors from the model's predictions:

- Broken lanes (gaps),
- Small white spots (noise),
- Multiple disconnected regions that are not part of the actual lane.

The goal of the mask refinement step is:

> To transform the initial raw mask into a single clean, continuous, and stable lane region for the geometry step.

To achieve this, the pipeline performs three main operations:
1. Normalize the mask to binary form {0,1}
2. Apply morphological closing to connect broken lane segments
3. Retain only the largest connected region(s) by area and discard the rest

##### **Normalize Mask to Binary Form**

In many cases, the model returns a mask whose foreground pixels take arbitrary positive values rather than exactly 1.
We convert it into binary form:

$$
m(x,y) =
\begin{cases}
1, & \text{when } M(x,y) > 0, \\
0, & \text{otherwise}.
\end{cases}
$$

This ensures the mask is "clean" data before performing morphological operations.

##### **Morphological Closing – Connecting Broken Lane Segments**

##### Mathematical Definition

The closing operation of a binary image $m$ with a kernel (structuring element) $K$ is:

$$
m \bullet K = (m \oplus K) \ominus K
$$

Where:

- $\oplus$ (dilation): expands the white region, connecting nearby points
- $\ominus$ (erosion): shrinks the white region, removing the extra thickness added by the dilation

Intuitive Meaning:

> Dilation fills small gaps $\to$ Erosion shrinks back $\to$ the result retains the large outlines but no longer has small gaps.

##### Vertical Kernel (5 × 25)

The pipeline uses a kernel:

- Width of 5 pixels $\to$ prevents adjacent lanes from sticking together
- Height of 25 pixels $\to$ long enough to connect gaps along the vertical direction (the lane's direction)

Kernel:

```python
ker_vertical = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 25))
```

#### **Practice – Refine Mask**

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt

def refine_mask01(mask01: np.ndarray) -> np.ndarray:
    """
    Refine the binary lane mask by applying morphological closing
    and removing small connected components.
    Students will fill in missing code segments.
    """
# -------------------------------------------------------------
# ...YOUR CODE HERE...

    # TODO: Convert (0/1) mask into uint8 0–255
    mask_uint8 = ...

    # -------------------------------------------------------------
    # TODO: Apply morphological closing with a vertical kernel (5×25)
    kernel = ...
    closed = ...

    # -------------------------------------------------------------
    # TODO: Connected components: keep only the largest 1–2 components
    # Note: label 0 is the background; exclude it when ranking areas
    num_labels, labels, stats, centroids = ...
    areas = ...
    sorted_idx = ...
    mask_filtered = ...
    # Hint: loop over sorted_idx[:2] and assign 255 to the largest components
    
    # -------------------------------------------------------------
    # TODO: Convert refined mask back to binary (0/1)
    refined = ...
    return refined


# -------------------------------------------------------------
# TODO: Apply refine_mask01() to the ROI mask
refined_mask = ...


# ... END ...
# -------------------------------------------------------------


# Visualization
plt.figure(figsize=(8, 4))

plt.subplot(1, 2, 1)
plt.title('Mask after ROI')
plt.imshow(mask_roi, cmap='gray')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.title('Refined mask')
plt.imshow(refined_mask, cmap='gray')
plt.axis('off')

plt.tight_layout()
plt.show()


### **Task 5. Saving the Image**

In [None]:
import os
import cv2
import time

# refined_mask: the final refined mask from Task 4 (binary 0/1)

# 1) Create the folder result_refine if it does not already exist
save_dir = "result_refine"
os.makedirs(save_dir, exist_ok=True)

# 2) Generate an output filename using a timestamp to avoid overwriting
filename = f"refined_{int(time.time())}.png"
save_path = os.path.join(save_dir, filename)

# 3) Convert the mask to 0–255 format before saving
mask_to_save = (refined_mask * 255).astype("uint8")

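# 4) Save the image (cv2.imwrite returns False on failure instead of raising)
if not cv2.imwrite(save_path, mask_to_save):
    raise IOError(f"Could not write refined mask to {save_path}")

print("Refined mask has been saved at:", save_path)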


### Summary
In this lab, you implemented two important pre-processing steps for lane segmentation masks: applying a region of interest and refining the mask with morphological operations. A clean mask is critical for accurate geometry estimation and reliable lane following.