# 1. Theoretical Background

---

## A. Object Detection (Bounding Box) vs Instance Segmentation (Pixel Mask)

In early Computer Vision applications, the most widely used method was **Object Detection**, where the model predicts:

1. The **location** of an object using a bounding box, and  
2. The **class label** of that object.

A bounding box is only an approximation: it is always rectangular and does not follow the true contour of the object.  
This works well for compact objects, but becomes limiting for shapes that are thin, irregular, or overlapping, such as steel beam profiles.

To overcome this, we use **Instance Segmentation**, which predicts **pixel-level masks** that match the real shape of each object.

| Method | What It Predicts | Strengths | Limitations |
|--------|------------------|-----------|-------------|
| Object Detection | Bounding box + label | Fast and efficient | Cannot capture exact object shape |
| Instance Segmentation | Pixel mask + label | Highly detailed and accurate | Higher compute and annotation cost than detection |

In short, segmentation allows the model not only to *locate* the object but also to *trace its outline*.
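To make the difference concrete, here is a minimal sketch that compares the area of an object's outline with the area of its bounding box. The polygon is a hypothetical thin beam lying diagonally across a 100 × 100 image; it is not taken from the dataset, and the helper names are mine.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(points):
    """Fraction of the bounding box that the object actually covers."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    box_area = (x_max - x_min) * (y_max - y_min)
    return polygon_area(points) / box_area


# Hypothetical outline of a thin beam lying diagonally across the image.
diagonal_beam = [(0, 0), (10, 0), (100, 90), (100, 100), (90, 100), (0, 10)]

print(bounding_box(diagonal_beam))              # (0, 0, 100, 100)
print(round(box_fill_ratio(diagonal_beam), 2))  # 0.19
```

In this toy case the box spans the whole image while the beam covers only about 19% of it, which is the kind of gap a pixel mask closes.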

---

## B. How Models Predict Segmentation Masks

Models such as **YOLOv8-Seg** create masks through a structured set of steps rather than drawing them directly:

1. **Feature Extraction**  
   The image is passed through a backbone network (e.g., CSPDarknet), where the model learns patterns such as edges, corners, textures, and shapes.

2. **Multi-scale Feature Fusion**  
   Features from different depths of the network are combined.  
   This helps the model understand both fine details and broader context.

3. **Prototype Mask Generation**  
   YOLOv8 produces a set of base masks called **prototypes**, which act like “universal components” the model can reuse.

4. **Mask Coefficient Prediction**  
   For each detected object, the model predicts coefficients that determine how to combine these prototypes.

5. **Constructing the Final Mask**  
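   The final mask is a weighted sum of the prototype masks, with those coefficients as the weights; the result is passed through a sigmoid, cropped to the object's bounding box, and thresholded.  
   This allows precise boundaries even when objects overlap or have complex shapes, which is common in steel beam imagery.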

In essence:  
**The model does not paint masks from scratch; it assembles them by blending learned prototype shapes.**
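The blending step can be sketched in a few lines of NumPy. This is a simplified illustration of the prototype-and-coefficient idea, not the Ultralytics implementation: the function name, the tiny 4 × 4 grid, and the two hand-made prototypes are assumptions chosen so the result is easy to check by eye.

```python
import numpy as np


def assemble_mask(prototypes, coefficients, box, threshold=0.5):
    """Blend prototype masks into one binary instance mask.

    prototypes:   array of shape (k, h, w), shared by all objects in the image
    coefficients: array of shape (k,), predicted for one detected object
    box:          (x1, y1, x2, y2) in prototype-grid coordinates
    """
    k, h, w = prototypes.shape
    # Linear combination of the k prototypes, one weight per prototype.
    logits = (coefficients @ prototypes.reshape(k, -1)).reshape(h, w)
    # Sigmoid turns the blended response into a per-pixel probability.
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Keep only pixels inside the object's bounding box.
    x1, y1, x2, y2 = box
    inside = np.zeros((h, w), dtype=bool)
    inside[y1:y2, x1:x2] = True
    return (probs > threshold) & inside


# Toy example: 2 prototypes on a 4 x 4 grid (real models use more, at higher resolution).
protos = np.zeros((2, 4, 4))
protos[0, :, :2] = 1.0  # prototype 0 responds to the left half
protos[1, :2, :] = 1.0  # prototype 1 responds to the top half
coeffs = np.array([4.0, -4.0])  # "left half, but not the top"

mask = assemble_mask(protos, coeffs, box=(0, 0, 4, 4))
print(mask.astype(int))
```

With these coefficients only the bottom-left quadrant survives: the positive weight selects the left half and the negative weight cancels its top part. Shrinking `box` removes everything outside the detection, which is how neighboring objects are kept apart.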

---

## C. Transfer Learning in Segmentation

Training a segmentation model from scratch would require massive datasets and very long training time.  
To make the process efficient, we use **transfer learning**.

The base model (e.g., YOLOv8-Seg) is pre-trained on a large dataset like **COCO**, so it already understands general visual patterns, including:

- edges and contours  
- common object shapes  
- texture differences  
- how to distinguish foreground and background  

When we fine-tune the model on a steel-beam dataset, we are essentially adapting this prior knowledge to recognize new, domain-specific shapes such as:

- I-beam  
- L-beam  
- O-beam  
- O-pipe  
- T-beam  
- square-bar  
- square-pipe  

### Benefits of Transfer Learning
- Dramatically faster training  
- Requires fewer labeled images  
- More stable learning from early epochs  
- Produces cleaner, more accurate masks  

After fine-tuning, the model becomes specialized and can reliably segment steel-beam profiles in new images.

---

## 2. Project Implementation

### 2.1 Dataset Setup and Preprocessing

In this project, I use the **steel beam instance segmentation** dataset from Roboflow, exported in the **YOLOv8 format**.  
After extraction, the folder structure becomes:

- `datasets/yolov8data/train/images`  
- `datasets/yolov8data/train/labels`  
- `datasets/yolov8data/valid/images`  
- `datasets/yolov8data/valid/labels`  
- `datasets/yolov8data/data.yaml`

YOLOv8 automatically handles the **image resizing** and **augmentation** based on the `imgsz` parameter and training configuration.

A few important preprocessing notes:

- All images are resized to a fixed resolution (for example `320 × 320`) during training.  
- Mask labels stored in YOLOv8 polygon `.txt` format are resized consistently along with the images.  
- Augmentations used include horizontal flips, light brightness/contrast adjustments, and random cropping provided by YOLOv8.
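Because polygon labels are stored as normalized coordinates (`class x1 y1 x2 y2 ...`, each value between 0 and 1), they stay valid at any image size. The sketch below parses one label line and scales it to pixels; the function name and the sample line are hypothetical, not taken from the dataset.

```python
def parse_seg_label(line, img_width, img_height):
    """Parse one line of a YOLO segmentation label file.

    Returns (class_id, [(x_px, y_px), ...]) with the polygon in pixel coordinates.
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected at least 3 (x, y) pairs after the class index")
    # Coordinates are stored normalized to [0, 1], so they scale with the image.
    polygon = [
        (x * img_width, y * img_height)
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    return class_id, polygon


# Hypothetical label line: class 0 with a four-point polygon.
line = "0 0.25 0.25 0.75 0.25 0.75 0.5 0.25 0.5"

print(parse_seg_label(line, 640, 480))
# (0, [(160.0, 120.0), (480.0, 120.0), (480.0, 240.0), (160.0, 240.0)])
print(parse_seg_label(line, 320, 320))
# (0, [(80.0, 80.0), (240.0, 80.0), (240.0, 160.0), (80.0, 160.0)])
```

The same label line yields a correctly placed polygon at both resolutions, which is why the labels need no manual editing when `imgsz` changes.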

The following section contains code to confirm that the dataset is correctly loaded and to display sample images along with their annotations.


In [None]:
from pathlib import Path
from ultralytics import YOLO
import matplotlib.pyplot as plt
import cv2

# Paths are relative to the notebook's location; adjust them to your layout.
DATA_YAML = "../datasets/yolov8data/data.yaml"

data_path = Path("../datasets/yolov8data")

# Confirm that the expected files and folders exist.
print("data.yaml exists?    ", (data_path / "data.yaml").exists())
print("train/images exists? ", (data_path / "train" / "images").exists())
print("valid/images exists? ", (data_path / "valid" / "images").exists())

Optionally, display one random training image as a sanity check:

In [None]:
import random

# Pick one random JPEG from the training images.
train_img_dir = data_path / "train" / "images"
sample_img = random.choice(list(train_img_dir.glob("*.jpg")))
print("Sample image:", sample_img.name)

# OpenCV loads images as BGR; convert to RGB for Matplotlib.
img = cv2.imread(str(sample_img))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(5, 5))
plt.imshow(img)
plt.axis("off")
plt.title("Sample Training Image")
plt.show()

### 2.2 Model Training

For this project, I used **YOLOv8-seg (n version / yolov8n-seg)** as the starting model.  
The reasons are:

- The available GPU only has 4 GB of VRAM, so the lightweight `n` variant is safer for both training and inference.
- Training time is shorter while still allowing the model to learn patterns from all 7 steel profile classes.

Main training configuration (modifiable):

- `imgsz` : 320  
- `epochs`: 50  
- `batch` : 4  
- `optimizer`: default YOLO  
- `device`: CUDA  
- `data`   : `data.yaml` from the dataset  

Below is the training code snippet used.

In [None]:
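from ultralytics import YOLO
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

DATA_YAML = "../datasets/yolov8data/data.yaml"

# Start from the pretrained nano segmentation checkpoint.
model = YOLO("yolov8n-seg.pt")

results_train = model.train(
    data=DATA_YAML,
    imgsz=320,       # training resolution
    epochs=50,       # upper bound; early stopping may end the run sooner
    batch=4,         # small batch to fit in 4 GB of VRAM
    lr0=1e-3,        # initial learning rate
    patience=10,     # stop after 10 epochs without validation improvement
    device=device,
    workers=0,       # load data in the main process
    name="steelbeam-nseg-320-b4-e50",
)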

### 2.3 Evaluation: mAP and IoU

This project does not report plain accuracy. Instead, it uses:

- **mAP@0.5** (mean Average Precision at an IoU threshold of 0.5)  
- **mAP@0.5:0.95** (mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05)  
- In addition, I inspected the IoU values and visually checked the mask quality on the validation data.
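As a reminder of what sits underneath these metrics, the sketch below computes the IoU of two small binary masks and checks which of the ten mAP@0.5:0.95 thresholds that IoU would satisfy. It is a simplified illustration with toy masks; a full evaluator also matches predictions to ground truth by class and confidence before computing average precision.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return intersection / union if union else 0.0


# The ten IoU thresholds behind mAP@0.5:0.95 (0.50, 0.55, ..., 0.95).
IOU_THRESHOLDS = [i / 100 for i in range(50, 100, 5)]

# Toy masks: the prediction is shifted one column to the left of the ground truth.
truth = [
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
pred = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
]

iou = mask_iou(pred, truth)
passed = [t for t in IOU_THRESHOLDS if iou >= t]
print(iou)     # 0.6
print(passed)  # [0.5, 0.55, 0.6]
```

A one-column shift already drops the IoU to 0.6, so this mask would count as correct at mAP@0.5 but fail seven of the ten stricter thresholds. This is why mAP@0.5:0.95 is the more demanding number for thin profiles.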

The evaluation was performed with the `model.val()` method provided by YOLOv8.

In [None]:
# Load the best checkpoint of the selected run; adjust the path to your run name.
best_model = YOLO("runs/segment/steelbeam-upgrade/weights/best.pt")

metrics = best_model.val(
    data=DATA_YAML,
    imgsz=320,
    device=device,
    split="val"
)

print("mAP50 (box) :", float(metrics.box.map50))
print("mAP50 (mask):", float(metrics.seg.map50))
print("mAP50-95 (mask):", float(metrics.seg.map))

### 2.4 Selecting the Best Model

Several experimental runs were compared using different epoch counts, image sizes, and model variants.  
Model selection was based on three criteria:

1. **Segmentation Performance**  
   The run with the highest and most consistent **mAP@0.5 (mask)** on the validation set was prioritized.

2. **Visual Mask Quality**  
   Beyond numerical scores, visual results were examined to check:
   - boundary sharpness  
   - mask consistency across classes  
   - ability to segment thin structures or stacked beams

3. **Trade-off Between Precision and Speed**  
   Since the model would later be deployed in a Streamlit app, inference speed and VRAM usage also influenced the final choice.  
   Smaller variants (like YOLOv8n-Seg) offered a good balance between segmentation quality and runtime efficiency.
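The numerical part of this comparison can be automated. The sketch below is one possible way to do it, assuming each run's `results.csv` has an `epoch` column and a mask-mAP column named `metrics/mAP50(M)`; check the header of your own file, since the exact name may differ between Ultralytics versions. The run names and scores are made up for illustration and are not results from this project.

```python
import csv
import io


def best_epoch(results_csv_text, column="metrics/mAP50(M)"):
    """Return (epoch, score) of the row with the highest value in `column`."""
    reader = csv.DictReader(io.StringIO(results_csv_text))
    best = None
    for row in reader:
        # Some versions pad header names and values with spaces, so strip them.
        clean = {key.strip(): value.strip() for key, value in row.items()}
        score = float(clean[column])
        if best is None or score > best[1]:
            best = (int(float(clean["epoch"])), score)
    return best


def pick_run(runs, column="metrics/mAP50(M)"):
    """Given {run_name: csv_text}, return the run with the highest peak score."""
    peaks = {name: best_epoch(text, column)[1] for name, text in runs.items()}
    return max(peaks, key=peaks.get), peaks


# Made-up numbers for illustration only; these are not results from this project.
runs = {
    "run-a": "epoch, metrics/mAP50(M)\n1, 0.41\n2, 0.58\n3, 0.55\n",
    "run-b": "epoch, metrics/mAP50(M)\n1, 0.39\n2, 0.52\n3, 0.61\n",
}

winner, peaks = pick_run(runs)
print(winner, peaks)  # run-b {'run-a': 0.58, 'run-b': 0.61}
```

In practice the CSV text would come from reading each run's `results.csv` under `runs/segment/`. The score only covers the first criterion; mask quality and speed still need a manual look.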

The **`best.pt`** checkpoint of the selected run was used for deployment in the next stage.

---

## 3. Deployment with Streamlit

To make the model easier to test and use, I deployed the best-performing model as a simple web application using **Streamlit**.

The main steps were:

1. I created an `app.py` file inside the `src/` folder.  
2. Inside `app.py`, I loaded the best checkpoint (`best.pt`) from  
   `runs/segment/steelbeam-upgrade/weights/`.  
3. The application provides several features:
   - Uploading an image (JPG/PNG format)
   - Displaying the original image on the left side
   - Running YOLOv8-Seg inference in the background
   - Showing the segmentation results (overlay mask + labels) on the right side
   - Adjustable settings such as confidence threshold, IoU threshold, and image size
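A minimal sketch of how such an `app.py` could be laid out is shown below. It is not the project's actual file: the widget labels, the default slider values, and the `snap_imgsz` and `run_app` helpers are my assumptions. The image size is snapped to a multiple of 32 because YOLO models expect sizes aligned to their largest stride.

```python
def snap_imgsz(value, stride=32):
    """Round an image size to the nearest multiple of the stride (at least one stride)."""
    return max(stride, int(round(value / stride)) * stride)


def run_app(weights="runs/segment/steelbeam-upgrade/weights/best.pt"):
    """Build the page; call this at the bottom of app.py."""
    # Imported here so the helper above can be used without these packages.
    import streamlit as st
    from PIL import Image
    from ultralytics import YOLO

    @st.cache_resource  # load the checkpoint once, not on every rerun
    def load_model(path):
        return YOLO(path)

    st.title("Steel Beam Segmentation")
    conf = st.sidebar.slider("Confidence threshold", 0.05, 0.95, 0.25, 0.05)
    iou = st.sidebar.slider("IoU threshold (NMS)", 0.10, 0.90, 0.70, 0.05)
    size = st.sidebar.number_input("Image size", min_value=160, max_value=640, value=320, step=32)
    imgsz = snap_imgsz(size)

    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return

    image = Image.open(uploaded).convert("RGB")
    left, right = st.columns(2)
    left.image(image, caption="Original")

    results = load_model(weights).predict(image, conf=conf, iou=iou, imgsz=imgsz)
    # plot() returns a BGR array; reverse the channel axis to get RGB for display.
    overlay = results[0].plot()[..., ::-1]
    right.image(overlay, caption="Segmentation result")
```

To use the sketch, add a final `run_app()` line to `src/app.py`. Caching the model with `st.cache_resource` matters here because Streamlit reruns the whole script on every widget change.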

To run the app (the commands below assume a Windows virtual environment in `.venv`; adjust the paths and activation command to your setup):

```bash
cd STEELBEAM
.\.venv\Scripts\activate      # activate the environment
streamlit run src/app.py