# Model 2 with pretrained YOLO

## Theory
 - Understanding YOLOv8: https://medium.com/@melissa.colin/yolov8-explained-understanding-object-detection-from-scratch-763479652312
 - Slightly more detailed: https://medium.com/@vindyalenawala/yolov8-architecture-a-detailed-overview-5e2c371cf82a
 - Architecture of YOLOv8: https://arxiv.org/html/2408.15857v1#:~:text=The%20architecture%20of%20YOLOv8%20is%20structured%20around,minimize%20computational%20overhead%20while%20retaining%20representational%20power.
 - CSPNet: https://arxiv.org/pdf/1911.11929v1
 - YOLOv8 architecture overview (yolov8.org): https://yolov8.org/yolov8-architecture/

## Flood Area Segmentation with YOLO

We found the code on Kaggle: https://www.kaggle.com/code/myoungjinson/flood-area-segmentation/notebook#YOLOv8



In [1]:
import os
#os.environ.pop('MPLBACKEND', None)
import cv2
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
from PIL import Image

Load the data (the same Kaggle dataset as for the U-Net model)

https://www.youtube.com/watch?v=diZj_nPVLkE

In [4]:
import kagglehub
from pathlib import Path

# collect data
path = kagglehub.dataset_download("faizalkarim/flood-area-segmentation")
base_dir = Path(path)

# directories containing the masks and images (their files are listed later with os.listdir)
input_dir_masks = os.path.join(base_dir, "Mask")
input_dir_images = os.path.join(base_dir, "Image")

# target folder for the YOLO label files converted from the masks
output_dir_labels = os.path.join(base_dir, "labels")
os.makedirs(output_dir_labels, exist_ok=True)


#### Data preparation

Create labels from the masks: each mask contour is written as a normalized polygon to a .txt file.


In [None]:
for j in os.listdir(input_dir_masks):
    image_path = os.path.join(input_dir_masks, j)
    # load the mask as a grayscale image
    mask = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # binarize: every pixel value above 1 becomes white (255)
    _, mask = cv2.threshold(mask, 1, 255, cv2.THRESH_BINARY)

    H, W = mask.shape
    # find the outer contours of the white (flood) areas
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # convert the contours to polygons with coordinates normalized to [0, 1]
    polygons = []
    for cnt in contours:
        if cv2.contourArea(cnt) > 200:  # skip very small regions
            polygon = []
            for point in cnt:
                x, y = point[0]
                polygon.append(x / W)
                polygon.append(y / H)
            polygons.append(polygon)

    # write one line per polygon: "0 x1 y1 x2 y2 ..." (class id 0 = flood)
    label_path = os.path.join(output_dir_labels, os.path.splitext(j)[0] + '.txt')
    with open(label_path, 'w') as f:
        for polygon in polygons:
            f.write('0 ' + ' '.join(str(p) for p in polygon) + '\n')
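
To make the label format concrete, here is a minimal sketch of how a single polygon turns into one label line. The helper name `polygon_to_yolo_line` and the 100x50 example mask are hypothetical and only serve as illustration; the fixed six-decimal formatting is also an assumption, the loop above writes the raw float values.

```python
def polygon_to_yolo_line(points, width, height, class_id=0):
    """Format pixel-space (x, y) points as one YOLO segmentation label line."""
    coords = []
    for x, y in points:
        coords.append(x / width)   # normalize x by the image width
        coords.append(y / height)  # normalize y by the image height
    return f"{class_id} " + " ".join(f"{c:.6f}" for c in coords)


# hypothetical 100x50 mask with one triangular flood region
line = polygon_to_yolo_line([(10, 5), (90, 5), (50, 45)], width=100, height=50)
print(line)  # 0 0.100000 0.100000 0.900000 0.100000 0.500000 0.900000
```

Each line starts with the class id, followed by alternating normalized x and y values, so the labels do not depend on the image resolution.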

Split into train, validation, and test sets; this step is taken over from the U-Net data preparation.
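
The split cell relies on two lists, `img_paths` and `mask_paths`, that are not defined in this notebook. The following is only a sketch of how they might be collected; it assumes that every image and its mask share the same file stem (for example `0.jpg` and `0.png`), and the helper name `collect_pairs` is hypothetical.

```python
from pathlib import Path


def collect_pairs(image_dir, mask_dir):
    """Return (img_paths, mask_paths), index-aligned and matched by file stem."""
    masks_by_stem = {p.stem: p for p in Path(mask_dir).iterdir() if p.is_file()}
    img_paths, mask_paths = [], []
    for img in sorted(Path(image_dir).iterdir()):
        # keep only images for which a mask with the same stem exists
        if img.is_file() and img.stem in masks_by_stem:
            img_paths.append(img)
            mask_paths.append(masks_by_stem[img.stem])
    return img_paths, mask_paths
```

With the dataset loaded above, a call could look like `img_paths, mask_paths = collect_pairs(base_dir / "Image", base_dir / "Mask")`. Matching by stem rather than by sorted position avoids silently pairing an image with the wrong mask if one file is missing.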

In [17]:
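from sklearn.model_selection import train_test_split
import shutil

# img_paths and mask_paths come from the U-Net data preparation: index-aligned
# lists of pathlib.Path objects (each image and its mask)
# 70 % train; the remaining 30 % is split evenly into validation and test below
train_imgs, temp_imgs, train_masks, temp_masks = train_test_split(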
    img_paths, mask_paths,
    train_size=0.7,
    random_state=42,
    shuffle=True
)

val_imgs, test_imgs, val_masks, test_masks = train_test_split(
    temp_imgs, temp_masks,
    train_size=0.5,
    random_state=42,
    shuffle=True
)

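# copy the files into dataset/images/<subset> and dataset/labels/<subset>
def copy_subset(img_list, mask_list, subset):
    # create the target folders first; shutil.copy does not create them
    os.makedirs(f"dataset/images/{subset}", exist_ok=True)
    os.makedirs(f"dataset/labels/{subset}", exist_ok=True)
    for img, mask in zip(img_list, mask_list):
        shutil.copy(img,  f"dataset/images/{subset}/{img.name}")
        shutil.copy(mask, f"dataset/labels/{subset}/{mask.stem}.png")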

copy_subset(train_imgs, train_masks, 'train')
copy_subset(val_imgs,   val_masks,   'val')
copy_subset(test_imgs,  test_masks,  'test')

# sanity check: number of files per split
print("Train images:", len(os.listdir("dataset/images/train")))
print("Train masks: ", len(os.listdir("dataset/labels/train")))
print("Val images:  ", len(os.listdir("dataset/images/val")))
print("Val masks:   ", len(os.listdir("dataset/labels/val")))
print("Test images: ", len(os.listdir("dataset/images/test")))
print("Test masks:  ", len(os.listdir("dataset/labels/test")))

Train images: 232
Train masks:  435
Val images:   74
Val masks:    101
Test images:  44
Test masks:   44
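
In the output above, the image and mask counts differ for the train and validation splits. A small consistency check can show which files are unmatched. This is only a sketch: the helper name `split_report` is hypothetical, and the guess that leftover files from earlier runs cause the mismatch is an assumption, not something the output proves.

```python
from collections import Counter
from pathlib import Path


def split_report(image_dir, label_dir):
    """Compare the image and label folders of one split by file stem."""
    images = [p for p in Path(image_dir).iterdir() if p.is_file()]
    labels = [p for p in Path(label_dir).iterdir() if p.is_file()]
    image_stems = {p.stem for p in images}
    label_stems = {p.stem for p in labels}
    return {
        "images_without_label": sorted(image_stems - label_stems),
        "labels_without_image": sorted(label_stems - image_stems),
        # e.g. both .png and .txt files present after repeated runs
        "label_suffixes": dict(Counter(p.suffix for p in labels)),
    }
```

For example, `split_report("dataset/images/train", "dataset/labels/train")` lists the stems that appear on only one side and counts the label files per extension.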


For YOLO, the masks have to be converted into text files.

In [None]:
import matplotlib.pyplot as plt
from PIL import Image

example_mask = Image.open("dataset/labels/train/0.png")
plt.imshow(example_mask, cmap='gray')
plt.title("Sample Mask")
plt.axis("off")
plt.show()

## YOLOv8

https://medium.com/@melissa.colin/yolov8-explained-understanding-object-detection-from-scratch-763479652312

YOLO stands for "You Only Look Once": the whole image is processed in a single pass, which gives fast inference times.

Here it is used for segmentation and object localization: YOLOv8 can predict pixel-accurate masks (comparable to U-Net, but with object detection in addition).

Training uses the ultralytics library. A pretrained model is loaded and fine-tuned: YOLO("yolo11n-seg.yaml").load("yolo11n.pt"). Note that these file names refer to the YOLO11 nano model, not to YOLOv8.

### Architecture

![YOLOv8 architecture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*frzqTxCGA8DHEHMeHug7JQ.png)

#### Backbone: feature extraction

The backbone extracts basic features such as edges, shapes, and textures from the input image. YOLOv8 uses a modified CSPDarknet53 architecture that combines the following techniques:

- Convolutional layers: for feature extraction
- Residual blocks: prevent information loss in deep networks
- CSPNet (Cross Stage Partial Networks): reduces the computational cost and improves gradient flow
- Darknet53 / CSPDarknet53: the basis of the feature backbone

#### Neck: multi-scale feature fusion

The neck combines features of different resolutions (for example, from different convolutional layers) and uses:

- FPN (Feature Pyramid Networks): processes image information at several scales
- PANet (Path Aggregation Network): improves the information flow between levels, especially for smaller objects

The feature maps (e.g. P3, P4, P5) are merged across levels so that objects of different sizes can be detected.
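
The relationship between the input size and the feature-map levels can be sketched with a little arithmetic. The strides 8, 16, and 32 for P3, P4, and P5 are the values commonly used in YOLO models and are treated as an assumption here; the helper name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(imgsz=640, strides=None):
    """Return the grid size per pyramid level for a square input image."""
    if strides is None:
        # assumed downsampling factors of the three detection levels
        strides = {"P3": 8, "P4": 16, "P5": 32}
    return {name: imgsz // stride for name, stride in strides.items()}


for name, size in feature_map_sizes(640).items():
    print(f"{name}: {size}x{size} grid cells")
# P3: 80x80 grid cells
# P4: 40x40 grid cells
# P5: 20x20 grid cells
```

The fine P3 grid is suited to small objects, while the coarse P5 grid covers large ones.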

#### Head: detection & classification

The head produces:

- Bounding boxes
- Confidence scores
- Class predictions
- Segmentation masks (in the -seg model variants used here)



In [None]:
import numpy as np

def mask_to_yolo_txt(mask_path, txt_path, class_id=0):
    mask = cv2.imread(str(mask_path), 0)
    if mask is None:
        print(f"Failed to read {mask_path}")
        return False

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        print(f"No contours found in {mask_path}")
        return False
    h, w = mask.shape

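    written = 0
    with open(txt_path, "w") as f:
        for contour in contours:
            if len(contour) < 3:  # a polygon needs at least three points
                continue
            # (N, 1, 2) pixel coordinates -> flat sequence of normalized x y values
            norm = (contour.reshape(-1, 2) / [w, h]).flatten()
            f.write(f"{class_id} {' '.join(map(str, norm))}\n")
            written += 1

    return written > 0  # False if only degenerate contours were found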


for subset in ['train', 'val']:
    mask_dir = Path(f"dataset/labels/{subset}")
    for mask_path in mask_dir.glob("*.png"):
        txt_path = mask_path.with_suffix(".txt")  # same folder, same name
        success = mask_to_yolo_txt(mask_path, txt_path)
        if success:
            os.remove(mask_path)  # only remove PNG if txt was created
        else:
            if txt_path.exists():
                os.remove(txt_path)  # remove empty or invalid txt


example_txts = list(Path("dataset/labels/val").glob("*.txt"))
if example_txts:
    print(f"\nSuccessfully created: {example_txts[0]}")
    with open(example_txts[0], "r") as f:
        print("Example content:\n", f.read())
else:
    print("No YOLO label files found.")


print("Train labels:", len(os.listdir("dataset/labels/train")))
print("Val labels:  ", len(os.listdir("dataset/labels/val")))
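
To check a generated label file, a line can be converted back into pixel coordinates. This is a minimal sketch; the helper name `yolo_line_to_polygon` and the example values are hypothetical.

```python
def yolo_line_to_polygon(line, width, height):
    """Parse one YOLO segmentation label line into (class_id, pixel points)."""
    parts = line.split()
    class_id = int(parts[0])
    values = [float(v) for v in parts[1:]]
    if len(values) < 6 or len(values) % 2 != 0:
        raise ValueError("expected at least three x y pairs")
    # values alternate x, y; scale them back to pixel coordinates
    points = [
        (round(x * width), round(y * height))
        for x, y in zip(values[0::2], values[1::2])
    ]
    return class_id, points


class_id, points = yolo_line_to_polygon("0 0.1 0.1 0.9 0.1 0.5 0.9", width=100, height=50)
print(class_id, points)  # 0 [(10, 5), (90, 5), (50, 45)]
```

The recovered points could then be drawn onto the image, for example with `cv2.polylines`, to verify visually that the labels match the original masks.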

Now create a data.yaml file.

In [None]:
import yaml

data_yaml = {
    # absolute root directory; a relative path would be resolved against the Ultralytics datasets directory
    'path': str(Path('dataset').resolve()),
    'train': 'images/train',            # relative to 'path'
    'val': 'images/val',                # relative to 'path'
    'nc': 1,                            # number of classes
    'names': ['flood']                  # list of class names
}

with open('data.yaml', 'w') as f:
    yaml.dump(data_yaml, f)

!cat data.yaml

In [None]:
from ultralytics import YOLO

# Load a model (three alternatives from the Ultralytics docs; only one is needed)
# model = YOLO("yolo11n-seg.yaml")  # build a new model from YAML
# model = YOLO("yolo11n-seg.pt")  # load a pretrained model (recommended for training)
model = YOLO("yolo11n-seg.yaml").load("yolo11n.pt")  # build from YAML and transfer weights

# Train the model on the flood dataset described in data.yaml
results = model.train(data="data.yaml", epochs=100, imgsz=640, batch=8)

In [None]:
model.info()

In [None]:
results

In [None]:
model.info(verbose=True)

In [None]:
print(model.model)

In [None]:
!pip install torchinfo

In [None]:
from torchinfo import summary
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.model.to(device)

summary(model.model, input_size=(1, 3, 640, 640), col_names=["input_size", "output_size", "num_params", "trainable"])

In [None]:
import cv2
import matplotlib.pyplot as plt

# 1. load image
val_image_path = "dataset/images/val/1002.jpg"
img = cv2.imread(val_image_path)
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# 2. load mask
path = kagglehub.dataset_download("faizalkarim/flood-area-segmentation")
base_dir = Path(path)
mask_filename = "1002.png"
mask_path = base_dir / "Mask" / mask_filename
mask = Image.open(mask_path).convert("L")
mask = np.array(mask)

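# 3. model prediction: merge all predicted instance masks into one mask
results = model.predict(source=val_image_path, save=False)
if results[0].masks is not None:
    pred_mask = results[0].masks.data.cpu().numpy().max(axis=0)
else:
    pred_mask = np.zeros(img.shape[:2], dtype=np.float32)  # nothing detected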

# 4. plot
plt.figure(figsize=(15, 5))

plt.subplot(1, 3, 1)
plt.title("Original Image")
plt.imshow(img_rgb)
plt.axis('off')

plt.subplot(1, 3, 2)
plt.title("Ground Truth Mask")
plt.imshow(mask, cmap='gray')
plt.axis('off')

plt.subplot(1, 3, 3)
plt.title("Predicted Mask")
plt.imshow(pred_mask, cmap='gray')
plt.axis('off')

plt.tight_layout()
plt.show()
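
In addition to the visual comparison, the overlap between the prediction and the ground truth can be expressed as a number. The following is a sketch of an intersection-over-union (IoU) computation for two masks of the same shape. The helper name `mask_iou` and the 0.5 threshold are assumptions; this metric is not part of the original notebook.

```python
import numpy as np


def mask_iou(pred, target, threshold=0.5):
    """Intersection over union of a predicted mask (values 0..1) and a ground-truth mask."""
    pred_bin = np.asarray(pred) > threshold
    target_bin = np.asarray(target) > 0  # any non-zero pixel counts as flood
    if pred_bin.shape != target_bin.shape:
        raise ValueError("masks must have the same shape; resize one of them first")
    union = np.logical_or(pred_bin, target_bin).sum()
    if union == 0:
        return 1.0  # both masks are empty
    return float(np.logical_and(pred_bin, target_bin).sum() / union)


# toy example: one of two predicted pixels overlaps the ground truth
print(mask_iou([[1, 1], [0, 0]], [[255, 0], [0, 0]]))  # 0.5
```

Because the predicted mask usually has the model's input resolution, it would first need to be resized to the ground-truth shape, for example with `cv2.resize(pred_mask, (mask.shape[1], mask.shape[0]), interpolation=cv2.INTER_NEAREST)`.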
