# Chapter 8: Image Segmentation

## Full Summary

### Introduction

This chapter covers **image segmentation**, a computer vision task that is more complex than image classification. Classification only decides whether an object is present in an image; segmentation recognizes multiple objects at once and determines their **location** in the image.

### Key Concepts

#### 1. Types of Segmentation

There are two main categories of image segmentation:

- **Semantic Segmentation**: The algorithm only identifies the different object categories. If an image contains several people, all of their pixels are labeled with the same class.

- **Instance Segmentation**: The algorithm identifies each object separately. If an image contains several people, the pixels of each person receive a unique label. Instance segmentation is considered harder than semantic segmentation.
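
The difference between the two label maps can be sketched with a tiny hand-made scene. This is only an illustration: the 4×6 array is made up, and the class index 15 for "person" assumes the usual VOC ordering.

```python
import numpy as np

PERSON = 15  # index of "person" under the usual VOC ordering (assumption)

# Instance map: each object gets its own id (1 and 2), background is 0.
instance_map = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic map: every person pixel collapses to the same class label.
semantic_map = np.where(instance_map > 0, PERSON, 0)

num_semantic_labels = len(np.unique(semantic_map))   # background + person
num_instance_labels = len(np.unique(instance_map))   # background + two people
print(num_semantic_labels, num_instance_labels)      # 2 3
```

The semantic map cannot tell the two people apart, which is exactly the information instance segmentation has to recover.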

#### 2. Dataset: PASCAL VOC 2012

This chapter uses the PASCAL VOC 2012 dataset, which contains:
- 21 classes to predict (20 object categories plus background); one extra label marks object boundaries and unknown objects, and the loss and metrics in the code below ignore it
- Standard RGB input images
- Target images in which every pixel takes a color from a predefined palette
- Objects such as: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, tv/monitor
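
As a quick reference, the class list can be written down as an index-to-name table. This sketch assumes the standard VOC label indexing (0 for background, 1 to 20 for the objects in the order listed above, 255 for the boundary/unknown label); the names `VOC_CLASSES` and `VOID_LABEL` are illustrative, not taken from the chapter.

```python
# Index -> class name, assuming the standard VOC ordering.
VOC_CLASSES = [
    "background",
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "dining table", "dog", "horse", "motorbike", "person",
    "potted plant", "sheep", "sofa", "train", "tv/monitor",
]

# Label used for object boundaries and unknown objects (assumed value).
VOID_LABEL = 255

name_to_index = {name: i for i, name in enumerate(VOC_CLASSES)}
print(len(VOC_CLASSES), name_to_index["person"])  # 21 15
```

With this table, `num_classes = 21` in the model code corresponds to `len(VOC_CLASSES)`.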

#### 3. Data Characteristics

Segmentation data differs from classification data:
- **Input**: A standard RGB image
- **Target**: An image in which every pixel takes a color from a predefined color palette
- Targets are stored as **palettized images** for memory efficiency: each pixel holds a palette index rather than a full RGB triple
- White pixels represent object boundaries or unknown objects
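
The idea behind a palettized image can be sketched with NumPy alone. The four-color palette below is hypothetical (it is not the real VOC palette); the point is only that the target stores one small integer per pixel and the colors live in a separate lookup table.

```python
import numpy as np

# Toy palette with 4 entries (hypothetical colors, not the real VOC palette).
palette = np.array([
    [0, 0, 0],        # 0: background
    [128, 0, 0],      # 1: first class
    [0, 128, 0],      # 2: second class
    [255, 255, 255],  # 3: boundary / unknown
], dtype=np.uint8)

# Palettized target: one palette index per pixel.
indices = np.array([
    [0, 1, 1],
    [0, 3, 2],
], dtype=np.uint8)

# Fancy indexing replaces each index with its RGB color.
rgb = palette[indices]

print(rgb.shape)                   # (2, 3, 3)
print(indices.nbytes, rgb.nbytes)  # 6 18
```

The index array is what the model is trained on; the RGB version is only needed for display.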

### Data Pipeline with TensorFlow

#### Components of the tf.data Pipeline

The data pipeline performs the following steps:

1. **Get the filenames** for a given subset (training, validation, testing)
2. **Read the images** from disk
3. **Preprocessing**: normalization, resizing, cropping
4. **Data augmentation** (training only):
   - Random horizontal flipping
   - Random hue adjustment (±10%)
   - Random brightness adjustment (±10%)
   - Random contrast adjustment (±20%)
5. **Batching** the data into small batches
6. **Optimization** through caching and prefetching
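
One detail of step 4 is easy to miss: geometric changes such as flipping must be applied to the image and its target together, while color changes touch the image only. A minimal NumPy sketch of that rule follows; the helper names `flip_pair` and `augment_pair` are made up for this illustration, and the actual pipeline uses `tf.image` operations instead.

```python
import numpy as np

def flip_pair(image, mask):
    """Flip an H x W x C image and its H x W mask left to right, together."""
    return image[:, ::-1, :], mask[:, ::-1]

def augment_pair(image, mask, rng):
    """Randomly flip both arrays; shift brightness on the image only."""
    if rng.random() < 0.5:
        image, mask = flip_pair(image, mask)
    delta = rng.uniform(-0.1, 0.1)            # brightness shift
    image = np.clip(image + delta, 0.0, 1.0)  # labels are left untouched
    return image, mask

rng = np.random.default_rng(0)
image = rng.random((2, 3, 3))                 # values already in [0, 1]
mask = np.array([[0, 1, 2],
                 [0, 0, 2]])

aug_image, aug_mask = augment_pair(image, mask, rng)
flipped_mask = flip_pair(image, mask)[1]
print(flipped_mask.tolist())  # [[2, 1, 0], [2, 0, 0]]
```

Changing the brightness of a mask would corrupt the class ids, which is why the color operations in the pipeline pass `y` through unchanged.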

#### Pipeline Optimization Techniques

- **Caching**: Keeps the data in memory after it has been loaded for the first time
- **Prefetching**: Uses background threads to load upcoming data while the model is training
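
Both ideas can be illustrated without TensorFlow. The sketch below is a conceptual stand-in written with the standard library, not how `tf.data` implements `cache()` or `prefetch()`; `fake_load` is a hypothetical placeholder for reading an image from disk.

```python
import queue
import threading

def cached(load_fn, keys):
    """Return a function that yields load_fn(key), loading each key only once."""
    store = {}
    def iterate():
        for k in keys:
            if k not in store:
                store[k] = load_fn(k)   # only the first pass touches the "disk"
            yield store[k]
    return iterate

def prefetch(iterable, buffer_size=2):
    """Produce items on a background thread while the consumer is busy."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()
    def worker():
        for item in iterable:
            q.put(item)
        q.put(sentinel)                 # signal the end of the data
    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

load_calls = []
def fake_load(name):                    # hypothetical stand-in for disk I/O
    load_calls.append(name)
    return name.upper()

epoch = cached(fake_load, ["a", "b", "c"])
first = list(prefetch(epoch()))         # first epoch: loads from "disk"
second = list(prefetch(epoch()))        # second epoch: served from the cache
print(first, second, len(load_calls))
```

The loader runs three times in total even though two epochs are consumed, and the bounded queue lets loading overlap with whatever the consumer does between items.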

### Model Architecture: DeepLab v3

#### Main Components

DeepLab v3 consists of several components:

1. **Backbone: ResNet-50 (Pretrained)**
   - Pretrained on ImageNet
   - Used as-is up to the conv4 block
   - The conv5 block is rebuilt with atrous convolution

2. **Atrous Convolution (Dilated Convolution)**
   - A convolution with "holes" between the kernel weights
   - Controlled by the **dilation rate** parameter
   - Enlarges the receptive field without adding parameters
   - Helps counter the shrinking output caused by stride/pooling

3. **Atrous Spatial Pyramid Pooling (ASPP)**
   - A pyramidal aggregation module that gathers multi-scale information
   - Components:
     - 1×1 convolution (256 filters)
     - 3×3 convolution with dilation rate=6 (256 filters)
     - 3×3 convolution with dilation rate=12 (256 filters)
     - 3×3 convolution with dilation rate=18 (256 filters)
     - Global average pooling → 1×1 conv → bilinear upsampling
   - All outputs are concatenated

4. **Bilinear Upsampling**
   - Enlarges the output to the desired size
   - Uses bilinear interpolation
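
The arithmetic behind these components can be checked in a few lines of plain Python. This is a sketch under stated assumptions: the span formula `k + (k - 1) * (r - 1)` is the standard one for dilated kernels, the output stride of 16 is inferred from the 16× upsampling used in the model code, and `dilated_conv1d` is a one-dimensional toy rather than the Keras layer.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Span covered by a dilated kernel: k + (k - 1) * (r - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)

def dilated_conv1d(signal, kernel, dilation_rate):
    """'Valid' 1-D dilated convolution (cross-correlation, as in deep learning)."""
    span = effective_kernel_size(len(kernel), dilation_rate)
    out = []
    for start in range(len(signal) - span + 1):
        # Sample the input with gaps of `dilation_rate` between kernel taps.
        taps = [signal[start + i * dilation_rate] for i in range(len(kernel))]
        out.append(sum(t * w for t, w in zip(taps, kernel)))
    return out

# Span of the 3x3 ASPP branches; the weight count stays at 3x3 for every rate.
aspp_spans = {r: effective_kernel_size(3, r) for r in (1, 6, 12, 18)}
print(aspp_spans)    # {1: 3, 6: 13, 12: 25, 18: 37}

# Feature-map size that the final 16x bilinear upsampling has to undo.
input_size, output_stride = 384, 16
feature_size = input_size // output_stride
print(feature_size)  # 24

signal = [1, 2, 3, 4, 5, 6, 7]
dense = dilated_conv1d(signal, [1, 0, -1], 1)   # neighbors 2 apart
holes = dilated_conv1d(signal, [1, 0, -1], 2)   # same 3 weights, neighbors 4 apart
print(dense, holes)  # [-2, -2, -2, -2, -2] [-4, -4, -4]
```

The same three weights see a wider window as the rate grows, which is how ASPP collects context at several scales without extra parameters.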

### Loss Functions and Metrics

#### Loss Functions

1. **Weighted Categorical Cross-Entropy Loss**
   - Addresses class imbalance by giving each class a different weight
   - The sparse version is used (no one-hot encoding of the targets required)
   - Computed from logits for numerically stable gradients

2. **Dice Loss**
   - Focuses on maximizing the overlap between prediction and target
   - Formula: \( 1 - \frac{2 \times \text{Intersection}}{\text{Union}} \)
   - Intersection: sum of the element-wise product of prediction and target
   - Union: sum of the element-wise addition of prediction and target (the overlap is counted twice, unlike a set union)

3. **Combined Loss**: CE + Dice Loss
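
A small NumPy version of the three losses makes the formulas concrete. This is a sketch, not the TensorFlow implementation used later: the class weights are hypothetical, there is no masking of boundary pixels, and the smoothing term of 1 follows the Dice code in this chapter.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_cross_entropy(labels, logits, class_weights):
    """Mean of -w[y] * log p[y] over pixels; labels are integer class ids."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * -np.log(picked)))

def dice_loss(labels, logits, num_classes, smooth=1.0):
    """1 - (2 * sum(t * p) + smooth) / (sum(t + p) + smooth)."""
    probs = softmax(logits)
    one_hot = np.eye(num_classes)[labels]
    intersection = np.sum(one_hot * probs)
    union = np.sum(one_hot + probs)
    return float(1.0 - (2.0 * intersection + smooth) / (union + smooth))

labels = np.array([0, 0, 0, 1])        # 4 pixels; class 1 is the rare class
good = np.array([[9., -9.], [9., -9.], [9., -9.], [-9., 9.]])
bad = -good                            # confidently wrong on every pixel
weights = np.array([0.5, 1.5])         # hypothetical class weights

combined_good = (weighted_cross_entropy(labels, good, weights)
                 + dice_loss(labels, good, 2))
combined_bad = (weighted_cross_entropy(labels, bad, weights)
                + dice_loss(labels, bad, 2))
print(combined_good < combined_bad)    # True
```

Cross-entropy is unbounded and punishes confident mistakes pixel by pixel, while the Dice term stays between 0 and 1 and reacts to the overall overlap; summing them combines both signals.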

#### Evaluation Metrics

1. **Pixel Accuracy**
   - Measures the percentage of pixels predicted correctly
   - Formula: \( \frac{\text{Correct Pixels}}{\text{Total Pixels}} \)

2. **Mean Accuracy**
   - The average of the per-class accuracies
   - Copes better with class imbalance

3. **Mean IoU (Intersection over Union)**
   - The most popular metric for segmentation
   - Formula (per class, then averaged over classes): \( \frac{\text{True Positives}}{\text{True Positives + False Positives + False Negatives}} \)
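
All three metrics can be derived from a single confusion matrix, as the following NumPy sketch shows. The ten-pixel example is made up, and the code assumes every class occurs at least once in the targets (otherwise the per-class divisions would need a guard).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp   # true class c, predicted as something else
    fp = cm.sum(axis=0) - tp   # predicted c, actually something else
    pixel_acc = tp.sum() / cm.sum()
    mean_acc = np.mean(tp / (tp + fn))
    mean_iou = np.mean(tp / (tp + fp + fn))
    return pixel_acc, mean_acc, mean_iou

# 10 pixels: 8 background (class 0), 2 object (class 1); one object pixel is missed.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

pixel_acc, mean_acc, mean_iou = segmentation_metrics(
    confusion_matrix(y_true, y_pred, num_classes=2)
)
print(round(pixel_acc, 3), round(mean_acc, 3), round(mean_iou, 3))  # 0.9 0.75 0.694
```

Pixel accuracy looks high because background dominates, while mean accuracy and mean IoU expose that half of the object pixels were missed.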

### Training and Evaluation

#### Hyperparameters

- **Input size**: 384 × 384
- **Batch size**: Chosen to fit the available memory (the code below uses 8)
- **Optimizer**: Adam or SGD (the code below uses Adam)
- **Learning rate**: Used together with learning rate scheduling
- **Data augmentation**: Enabled for training, disabled for validation/test

#### Results

The DeepLab v3 model reaches a mean IoU of about **62%** on the PASCAL VOC 2012 dataset.

### Key Takeaways

1. **Segmentation vs Classification**: Segmentation is a dense prediction task (every pixel is predicted), whereas classification is a sparse prediction task (one label per image).

2. **Transfer Learning**: Pretrained models (ResNet-50 trained on ImageNet) are very helpful as a backbone.

3. **Multi-scale Information**: The ASPP module is important for combining information from different scales (fine-grained and coarse).

4. **Data Pipeline**: The tf.data API is a powerful way to build efficient and scalable pipelines.

5. **Custom Components**: TensorFlow/Keras makes it easy to implement custom loss functions and metrics.

---

## Key Programs

### 1. Download and Extract the Data

```python
import os
import requests
import tarfile

tar_path = os.path.join('data', 'VOCtrainval_11-May-2012.tar')
extract_dir = os.path.join('data', 'VOCtrainval_11-May-2012')

# Retrieve the data
if not os.path.exists(tar_path):
    url = "http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar"
    os.makedirs('data', exist_ok=True)

    # Stream the download to disk instead of holding the whole archive in memory
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(tar_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)
else:
    print("The tar file already exists.")

# The archive unpacks to VOCdevkit/..., so extract it into a dedicated folder
# to match the paths used later (data/VOCtrainval_11-May-2012/VOCdevkit/...)
if not os.path.exists(extract_dir):
    with tarfile.open(tar_path, 'r') as tar:
        tar.extractall(extract_dir)
else:
    print("The extracted data already exists")
```

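### 2. Reconstructing an RGB Image from a Palettized Image

```python
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngImageFile

def rgb_image_from_palette(image, palette=None):
    """ Restore the RGB values from a palettized PNG image.

    `palette` is only needed when `image` is a NumPy array of palette indices,
    because an array carries no palette of its own.
    """
    if isinstance(image, PngImageFile):
        palette = image.getpalette()    # flat list [r0, g0, b0, r1, g1, b1, ...]
        h, w = image.height, image.width
        # Squash height and width dimensions (makes slicing easier)
        image = np.array(image).reshape(-1)
    elif isinstance(image, np.ndarray):
        if palette is None:
            raise ValueError("A palette is required when image is a NumPy array")
        h, w = image.shape[0], image.shape[1]
        image = image.reshape(-1)
    else:
        raise TypeError("Expected a PngImageFile or a NumPy array")

    palette = np.array(palette).reshape(-1, 3)

    # Background (index 0) stays black; every other index is looked up in the palette
    rgb_image = np.zeros(shape=(image.shape[0], 3), dtype=np.uint8)
    rgb_image[(image != 0), :] = palette[image[(image != 0)], :]
    rgb_image = rgb_image.reshape(h, w, 3)

    return rgb_image
```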

### 3. Generator for Retrieving Filenames

```python
import os
import random

import pandas as pd

def get_subset_filenames(orig_dir, seg_dir, subset_dir, subset, random_seed=42):
    """ Get the filenames for a given subset (train/valid/test)"""
    if subset.startswith('train'):
        # squeeze("columns") replaces the squeeze=True argument removed in pandas 2.0
        ser = pd.read_csv(
            os.path.join(subset_dir, "train.txt"),
            index_col=None, header=None
        ).squeeze("columns").tolist()
    elif subset.startswith('val') or subset.startswith('test'):
        # val.txt is shuffled with a fixed seed and split in half below
        random.seed(random_seed)
        ser = pd.read_csv(
            os.path.join(subset_dir, "val.txt"),
            index_col=None, header=None
        ).squeeze("columns").tolist()

        random.shuffle(ser)   
        if subset.startswith('val'):
            ser = ser[:len(ser)//2]     
        else:
            ser = ser[len(ser)//2:]       
    else:
        raise NotImplementedError("Subset={} is not recognized".format(subset))
    
    orig_filenames = [os.path.join(orig_dir,f+'.jpg') for f in ser]   
    seg_filenames = [os.path.join(seg_dir, f+'.png') for f in ser]  
    
    for o, s in zip(orig_filenames, seg_filenames):
        yield o, s
```

### 4. Random Crop or Resize

```python
import tensorflow as tf

def randomly_crop_or_resize(x, y, input_size, resize_to_before_crop, augmentation=False):
    """ Randomly crops or resizes the images """
    
    def rand_crop(x, y):
        """ Randomly crop images after enlarging them """
        x = tf.image.resize(x, resize_to_before_crop, method='bilinear')   
        y = tf.cast(                                     
                tf.image.resize(
                    tf.transpose(y,[1,2,0]),           
                    resize_to_before_crop, method='nearest'
                ),
                'float32'
            )          
        offset_h = tf.random.uniform(
            [], 0, x.shape[0]-input_size[0], dtype='int32'
        )   
        offset_w = tf.random.uniform(
            [], 0, x.shape[1]-input_size[1], dtype='int32'
        )   
        x = tf.image.crop_to_bounding_box(
            image=x,
            offset_height=offset_h, offset_width=offset_w,
            target_height=input_size[0], target_width=input_size[1]  
        )
        y = tf.image.crop_to_bounding_box(
            image=y,
            offset_height=offset_h, offset_width=offset_w,
            target_height=input_size[0], target_width=input_size[1]  
        )
        return x, y
    
    def resize(x, y):
        """ Resize images to a desired size """
        x = tf.image.resize(x, input_size, method='bilinear')   
        y = tf.cast(
                tf.image.resize(
                    tf.transpose(y,[1,2,0]),                                        
                    input_size, method='nearest'                
                ),
                'float32'
            )          
        return x, y
    
    rand = tf.random.uniform([], 0.0, 1.0)    
    
    if augmentation and \
        (input_size[0] < resize_to_before_crop[0] or \
         input_size[1] < resize_to_before_crop[1]):
        x, y = tf.cond(
                rand < 0.5,                
                lambda: rand_crop(x, y),
                lambda: resize(x, y)
                )
    else:
        x, y = resize(x, y)    
    
    return x, y
```

### 5. Data Augmentation

```python
def randomly_flip_horizontal(x, y):
    """ Randomly flip images horizontally. """
    rand = tf.random.uniform([], 0.0, 1.0)   
    
    def flip(x, y):
        return tf.image.flip_left_right(x), tf.image.flip_left_right(y)   
    
    x, y = tf.cond(rand < 0.5, lambda: flip(x, y), lambda: (x, y))     
    return x, y
```

### 6. The Complete tf.data Pipeline

```python
import numpy as np
import tensorflow as tf
from functools import partial
from PIL import Image

def load_image_func(image):
    """ Load the palettized target image as an array of class indices """
    img = np.array(Image.open(image))
    return img

def fix_shape(x, y, size):
    """ Set the shape of the input/target tensors """
    x.set_shape((size[0], size[1], 3))
    y.set_shape((size[0], size[1], 1))
    return x, y

def get_subset_tf_dataset(
    subset_filename_gen_func, batch_size, epochs,
    input_size=(256, 256), output_size=None, resize_to_before_crop=None,
    augmentation=False, shuffle=False
):
    
    if augmentation and not resize_to_before_crop:
        raise RuntimeError(
            "You must define resize_to_before_crop when augmentation is enabled."
        )
        
    # Create dataset from generator
    filename_ds = tf.data.Dataset.from_generator(
        subset_filename_gen_func, output_types=(tf.string, tf.string)
    )
    
    # Load images
    image_ds = filename_ds.map(lambda x,y: (
        tf.image.decode_jpeg(tf.io.read_file(x)),
        tf.numpy_function(load_image_func, [y], [tf.uint8])
    )).cache()
    
    # Normalize
    image_ds = image_ds.map(lambda x, y: (tf.cast(x, 'float32')/255.0, y))
    
    # Random crop or resize
    image_ds = image_ds.map(lambda x,y: randomly_crop_or_resize(
        x, y, input_size, resize_to_before_crop, augmentation
    ))
    
    # Fix shape
    image_ds = image_ds.map(lambda x,y: fix_shape(x, y, size=input_size))
    
    # Augmentation
    if augmentation:    
        image_ds = image_ds.map(lambda x, y: randomly_flip_horizontal(x,y))
        image_ds = image_ds.map(lambda x, y: (tf.image.random_hue(x, 0.1), y))
        image_ds = image_ds.map(lambda x, y: (tf.image.random_brightness(x, 0.1), y))
        image_ds = image_ds.map(lambda x, y: (tf.image.random_contrast(x, 0.8, 1.2), y))
    
    # Resize output if needed
    if output_size:
        image_ds = image_ds.map(
            lambda x, y: (x, tf.image.resize(y, output_size, method='nearest'))
        )
    
    # Shuffle
    if shuffle:
        image_ds = image_ds.shuffle(buffer_size=batch_size*5)
    
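    # Batch and repeat
    image_ds = image_ds.batch(batch_size).repeat(epochs)
    
    # Drop only the trailing channel axis of the target: [B, H, W, 1] -> [B, H, W]
    image_ds = image_ds.map(lambda x, y: (x, tf.squeeze(y, axis=-1)))
    
    # Prefetch last so the whole pipeline overlaps with training
    image_ds = image_ds.prefetch(tf.data.AUTOTUNE)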
    
    return image_ds
```

### 7. Creating the Train/Val/Test Pipeline Instances

```python
# Define directories
orig_dir = os.path.join(
    'data', 'VOCtrainval_11-May-2012', 'VOCdevkit', 'VOC2012', 'JPEGImages'
)
seg_dir = os.path.join(
    'data', 'VOCtrainval_11-May-2012', 'VOCdevkit', 'VOC2012', 'SegmentationClass'
)
subset_dir = os.path.join(
    'data', 'VOCtrainval_11-May-2012', 'VOCdevkit', 'VOC2012', 'ImageSets', 'Segmentation'
)

# Create partial functions
partial_subset_fn = partial(
    get_subset_filenames, orig_dir=orig_dir, seg_dir=seg_dir, subset_dir=subset_dir
)

train_subset_fn = partial(partial_subset_fn, subset='train')
val_subset_fn = partial(partial_subset_fn, subset='val')
test_subset_fn = partial(partial_subset_fn, subset='test')

# Parameters
input_size = (384, 384)
batch_size = 8
epochs = 50

# Create pipelines
tr_image_ds = get_subset_tf_dataset(
    train_subset_fn, batch_size, epochs,
    input_size=input_size, resize_to_before_crop=(444,444),
    augmentation=True, shuffle=True
)

val_image_ds = get_subset_tf_dataset(
    val_subset_fn, batch_size, epochs,
    input_size=input_size,
    shuffle=False
)

test_image_ds = get_subset_tf_dataset(
    test_subset_fn, batch_size, 1,
    input_size=input_size,
    shuffle=False
)
```

### 8. Level 3 Block (Atrous Convolution Block)

```python
from tensorflow.keras import layers

def block_level3(inp, filters, kernel_size, rate, block_id, convlayer_id, activation=True):
    """ A single convolution layer with atrous convolution and batch normalization """
    
    conv5_block_conv_out = layers.Conv2D(
        filters, kernel_size, dilation_rate=rate, padding='same',
        name='conv5_block{}_{}_conv'.format(block_id, convlayer_id)
    )(inp)
    
    conv5_block_bn_out = layers.BatchNormalization(
        name='conv5_block{}_{}_bn'.format(block_id, convlayer_id)
    )(conv5_block_conv_out)
    
    if activation:
        conv5_block_relu_out = layers.Activation(
            'relu', name='conv5_block{}_{}_relu'.format(block_id, convlayer_id)
        )(conv5_block_bn_out)
        return conv5_block_relu_out
    else:
        return conv5_block_bn_out
```

### 9. Level 2 Block

```python
def block_level2(inp, rate, block_id):
    """ A level 2 resnet block that consists of three level 3 blocks """
    block_1_out = block_level3(inp, 512, (1,1), rate, block_id, 1)
    block_2_out = block_level3(block_1_out, 512, (3,3), rate, block_id, 2)
    block_3_out = block_level3(
        block_2_out, 2048, (1,1), rate, block_id, 3, activation=False
    )
    return block_3_out
```

### 10. ResNet Block with Atrous Convolution

```python
from tensorflow.keras.layers import Add, Activation

def resnet_block(inp, rate):
    """ Redefining a resnet block with atrous convolution """
    
    # Block 0 - projection shortcut for the first residual connection
    block0_out = block_level3(
        inp, 2048, (1,1), 1, block_id=1, convlayer_id=0, activation=False
    )
    
    # Block 1 (uses the dilation rate passed in instead of a hard-coded 2)
    block1_out = block_level2(inp, rate, block_id=1)
    block1_add = Add(name='conv5_block{}_add'.format(1))([block0_out, block1_out])
    block1_relu = Activation('relu', name='conv5_block{}_relu'.format(1))(block1_add)
    
    # Block 2 (the identity shortcut carries the previous block's ReLU output, as in ResNet)
    block2_out = block_level2(block1_relu, rate, block_id=2)
    block2_add = Add(name='conv5_block{}_add'.format(2))([block1_relu, block2_out])
    block2_relu = Activation('relu', name='conv5_block{}_relu'.format(2))(block2_add)
    
    # Block 3
    block3_out = block_level2(block2_relu, rate, block_id=3)
    block3_add = Add(name='conv5_block{}_add'.format(3))([block2_relu, block3_out])
    block3_relu = Activation('relu', name='conv5_block{}_relu'.format(3))(block3_add)
    
    return block3_relu
```

### 11. ASPP Module (Atrous Spatial Pyramid Pooling)

```python
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Lambda, UpSampling2D, Concatenate

def atrous_spatial_pyramid_pooling(inp):
    """ Defining the ASPP (Atrous spatial pyramid pooling) module """
    
    # Part A: 1x1 and atrous convolutions
    outa_1_conv = block_level3(inp, 256, (1,1), 1, '_aspp_a', 1, activation=True)
    outa_2_conv = block_level3(inp, 256, (3,3), 6, '_aspp_a', 2, activation=True)
    outa_3_conv = block_level3(inp, 256, (3,3), 12, '_aspp_a', 3, activation=True)
    outa_4_conv = block_level3(inp, 256, (3,3), 18, '_aspp_a', 4, activation=True)
    
    # Part B: global pooling
    outb_1_avg = Lambda(
        lambda x: K.mean(x, axis=[1,2], keepdims=True)
    )(inp)
    
    outb_1_conv = block_level3(
        outb_1_avg, 256, (1,1), 1, '_aspp_b', 1, activation=True
    )
    
    outb_1_up = UpSampling2D((24,24), interpolation='bilinear')(outb_1_conv)
    
    # Concatenate all outputs
    out_aspp = Concatenate()([outa_1_conv, outa_2_conv, outa_3_conv, outa_4_conv, outb_1_up])
    
    return out_aspp
```

### 12. The Complete DeepLab v3 Model

```python
from tensorflow.keras import layers, models

def build_deeplabv3(input_size=(384, 384), num_classes=21):
    """ Build complete DeepLab v3 model """
    
    # Input layer
    inp = layers.Input(shape=input_size+(3,))
    
    # Load pretrained ResNet50
    resnet50 = tf.keras.applications.ResNet50(
        include_top=False, input_tensor=inp, pooling=None
    )
    
    # Get output up to conv4 block
    for layer in resnet50.layers:
        if layer.name == "conv5_block1_1_conv":
            break
        out = layer.output
    
    resnet50_upto_conv4 = models.Model(resnet50.input, out)
    
    # Add modified conv5 block with atrous convolution
    resnet_block4_out = resnet_block(resnet50_upto_conv4.output, 2)
    
    # Add ASPP module
    out_aspp = atrous_spatial_pyramid_pooling(resnet_block4_out)
    
    # Final convolution
    out = layers.Conv2D(num_classes, (1,1), padding='same')(out_aspp)
    
    # Bilinear upsampling to original size
    final_out = layers.UpSampling2D((16,16), interpolation='bilinear')(out)
    
    # Create model
    deeplabv3 = models.Model(resnet50_upto_conv4.input, final_out)
    
    return deeplabv3, resnet50

# Build model
target_size = (384, 384)
num_classes = 21
deeplabv3, resnet50 = build_deeplabv3(target_size, num_classes)
```

### 13. Weighted Cross-Entropy Loss

```python
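def get_label_weights(y_true, y_pred):
    """ Get per-pixel weights from inverse class frequency.

    Relies on the module-level `num_classes`. Pixels whose label falls outside
    [0, num_classes), such as the boundary label, get weight 0.
    """
    
    y_true = tf.cast(tf.reshape(y_true, [-1]), 'int32')
    
    # Count occurrences of each class (out-of-range labels are not counted)
    class_counts = tf.math.bincount(
        y_true, minlength=num_classes, maxlength=num_classes
    )
    
    # Calculate weights (inverse frequency); cast first because
    # TensorFlow does not divide an int32 tensor by a float32 tensor
    total_count = tf.cast(tf.reduce_sum(class_counts), 'float32')
    class_weights = total_count / (tf.cast(class_counts, 'float32') + 1.0)
    
    # Normalize weights
    class_weights = class_weights / tf.reduce_sum(class_weights) * num_classes
    
    # Get weight for each pixel; replace invalid labels before tf.gather
    # so it never indexes out of range, then zero their weights
    valid = y_true < num_classes
    safe_labels = tf.where(valid, y_true, tf.zeros_like(y_true))
    pixel_weights = tf.gather(class_weights, safe_labels) * tf.cast(valid, 'float32')
    
    return pixel_weights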

def ce_weighted_from_logits(num_classes):
    """ Weighted cross-entropy loss from logits """
    
    def loss_fn(y_true, y_pred):
        # Cast and set shape
        y_true = tf.cast(y_true, 'int32')
        y_true.set_shape([None, y_pred.shape[1], y_pred.shape[2]])
        
        # Create valid mask
        valid_mask = tf.reshape((y_true <= num_classes - 1), [-1])
        
        # Get weights
        weights = get_label_weights(y_true, y_pred)
        
        # Unwrap tensors
        y_true_unwrap = tf.reshape(y_true, [-1])
        y_pred_unwrap = tf.reshape(y_pred, [-1, num_classes])
        
        # Mask valid pixels
        y_true_unwrap = tf.boolean_mask(y_true_unwrap, valid_mask)
        y_pred_unwrap = tf.boolean_mask(y_pred_unwrap, valid_mask)
        weights = tf.boolean_mask(weights, valid_mask)
        
        # Compute loss
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=y_true_unwrap,
            logits=y_pred_unwrap
        )
        
        # Weight the loss
        weighted_loss = loss * weights
        
        return tf.reduce_mean(weighted_loss)
    
    return loss_fn
```

### 14. Dice Loss

```python
def dice_loss_from_logits(num_classes):
    """ Defining the dice loss: 1 - [(2*i + smooth)/(u + smooth)], where u sums y_true + y_pred """
    
    def loss_fn(y_true, y_pred):
        smooth = 1.0
        
        # Cast and set shape
        y_true = tf.cast(y_true, 'int32')
        y_true.set_shape([None, y_pred.shape[1], y_pred.shape[2]])
        
        # Get pixel weights
        y_weights = tf.reshape(get_label_weights(y_true, y_pred), [-1, 1])
        
        # Apply softmax to logits
        y_pred = tf.nn.softmax(y_pred)
        
        # Unwrap and one-hot encode
        y_true_unwrap = tf.reshape(y_true, [-1])
        y_true_unwrap = tf.cast(
            tf.one_hot(tf.cast(y_true_unwrap, 'int32'), num_classes),
            'float32'
        )
        y_pred_unwrap = tf.reshape(y_pred, [-1, num_classes])
        
        # Compute intersection and union
        intersection = tf.reduce_sum(y_true_unwrap * y_pred_unwrap * y_weights)
        union = tf.reduce_sum((y_true_unwrap + y_pred_unwrap) * y_weights)
        
        # Compute dice coefficient
        score = (2.0 * intersection + smooth) / (union + smooth)
        
        # Compute dice loss
        loss = 1.0 - score
        
        return loss
    
    return loss_fn
```

### 15. Combined Loss Function

```python
def ce_dice_loss_from_logits(num_classes):
    """ Combined cross-entropy and dice loss """
    
    ce_loss_fn = ce_weighted_from_logits(num_classes)
    dice_loss_fn = dice_loss_from_logits(num_classes)
    
    def loss_fn(y_true, y_pred):
        ce_loss = ce_loss_fn(y_true, y_pred)
        dice_loss = dice_loss_fn(y_true, y_pred)
        
        # Combine losses (equal weight)
        total_loss = ce_loss + dice_loss
        
        return total_loss
    
    return loss_fn
```

### 16. Pixel Accuracy Metric

```python
class PixelAccuracyMetric(tf.keras.metrics.Mean):
    """ Pixel accuracy metric """
    
    def __init__(self, num_classes, name='pixel_accuracy', **kwargs):
        super(PixelAccuracyMetric, self).__init__(name=name, **kwargs)
        self.num_classes = num_classes
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Set shape
        y_true.set_shape([None, y_pred.shape[1], y_pred.shape[2]])
        
        # Flatten
        y_true = tf.reshape(y_true, [-1])
        y_pred = tf.reshape(tf.argmax(y_pred, axis=-1), [-1])
        
        # Create valid mask
        valid_mask = tf.reshape((y_true <= self.num_classes - 1), [-1])
        
        # Mask valid pixels
        y_true = tf.boolean_mask(y_true, valid_mask)
        y_pred = tf.boolean_mask(y_pred, valid_mask)
        
        # Per-pixel correctness (cast y_true to match the int64 output of argmax)
        correct = tf.cast(tf.equal(tf.cast(y_true, 'int64'), y_pred), 'float32')
        
        # Update state with every pixel so the running mean is a true pixel accuracy
        super(PixelAccuracyMetric, self).update_state(correct)
```

### 17. Mean Accuracy Metric

```python
class MeanAccuracyMetric(tf.keras.metrics.Mean):
    """ Mean (class-weighted) accuracy metric """
    
    def __init__(self, num_classes, name='mean_accuracy', **kwargs):
        super(MeanAccuracyMetric, self).__init__(name=name, **kwargs)
        self.num_classes = num_classes
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        smooth = 1
        
        # Set shape and flatten
        y_true.set_shape([None, y_pred.shape[1], y_pred.shape[2]])
        y_true = tf.reshape(y_true, [-1])
        y_pred = tf.reshape(tf.argmax(y_pred, axis=-1), [-1])
        
        # Create valid mask
        valid_mask = tf.reshape((y_true <= self.num_classes - 1), [-1])
        y_true = tf.boolean_mask(y_true, valid_mask)
        y_pred = tf.boolean_mask(y_pred, valid_mask)
        
        # Compute confusion matrix
        conf_matrix = tf.cast(
            tf.math.confusion_matrix(y_true, y_pred, num_classes=self.num_classes),
            'float32'
        )
        
        # Get true positives (diagonal)
        true_pos = tf.linalg.diag_part(conf_matrix)
        
        # Compute mean accuracy
        mean_accuracy = tf.reduce_mean(
            (true_pos + smooth) / (tf.reduce_sum(conf_matrix, axis=1) + smooth)
        )
        
        # Update state
        super(MeanAccuracyMetric, self).update_state(mean_accuracy)
```

### 18. Mean IoU Metric

```python
class MeanIoUMetric(tf.keras.metrics.MeanIoU):
    """ Mean Intersection over Union metric """
    
    def __init__(self, num_classes, name='mean_iou', **kwargs):
        super(MeanIoUMetric, self).__init__(num_classes=num_classes, name=name, **kwargs)
        self.num_classes = num_classes
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Set shape and flatten
        y_true.set_shape([None, y_pred.shape[1], y_pred.shape[2]])
        y_true = tf.reshape(y_true, [-1])
        y_pred = tf.reshape(tf.argmax(y_pred, axis=-1), [-1])
        
        # Create valid mask
        valid_mask = tf.reshape((y_true <= self.num_classes - 1), [-1])
        
        # Get pixels corresponding to valid mask
        y_true = tf.boolean_mask(y_true, valid_mask)
        y_pred = tf.boolean_mask(y_pred, valid_mask)
        
        # Update parent state
        super(MeanIoUMetric, self).update_state(y_true, y_pred)
```

### 19. Compile Model

```python
# Define optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

# Compile model
deeplabv3.compile(
    loss=ce_dice_loss_from_logits(num_classes),
    optimizer=optimizer,
    metrics=[
        MeanIoUMetric(num_classes),
        MeanAccuracyMetric(num_classes),
        PixelAccuracyMetric(num_classes)
    ]
)

# Copy weights from original conv5 block
w_dict = {}
for l in ["conv5_block1_0_conv", "conv5_block1_0_bn",
          "conv5_block1_1_conv", "conv5_block1_1_bn",
          "conv5_block1_2_conv", "conv5_block1_2_bn",
          "conv5_block1_3_conv", "conv5_block1_3_bn"]:
    w_dict[l] = resnet50.get_layer(l).get_weights()

# Set weights to new model
for l in w_dict:
    deeplabv3.get_layer(l).set_weights(w_dict[l])

print("Model compiled successfully!")
```

### 20. Training the Model

```python
# Define callbacks
callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        'deeplabv3_best.h5',
        monitor='val_mean_iou',
        mode='max',
        save_best_only=True,
        verbose=1
    ),
    tf.keras.callbacks.EarlyStopping(
        monitor='val_mean_iou',
        mode='max',
        patience=10,
        verbose=1
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_mean_iou',
        mode='max',
        factor=0.5,
        patience=5,
        verbose=1
    )
]

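# Calculate steps (val.txt lists 1449 images; the first 724 after shuffling are used for validation)
train_steps = 1464 // batch_size                   # train.txt lists 1464 images
val_steps = (724 + batch_size - 1) // batch_size   # round up to include the last partial batch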

# Train model
history = deeplabv3.fit(
    tr_image_ds,
    steps_per_epoch=train_steps,
    validation_data=val_image_ds,
    validation_steps=val_steps,
    epochs=epochs,
    callbacks=callbacks,
    verbose=1
)
```

### 21. Evaluating the Model

```python
# Load best model
deeplabv3.load_weights('deeplabv3_best.h5')

# Evaluate on test set (the remaining 725 images of val.txt; round up to include the last partial batch)
test_steps = (725 + batch_size - 1) // batch_size
test_results = deeplabv3.evaluate(test_image_ds, steps=test_steps)

print("Test Results:")
print(f"Loss: {test_results[0]:.4f}")
print(f"Mean IoU: {test_results[1]:.4f}")
print(f"Mean Accuracy: {test_results[2]:.4f}")
print(f"Pixel Accuracy: {test_results[3]:.4f}")
```

### 22. Prediction and Visualization

```python
import matplotlib.pyplot as plt

def visualize_segmentation(model, image_path, target_path=None):
    """ Visualize segmentation result """
    
    # Load and preprocess image
    img = tf.image.decode_jpeg(tf.io.read_file(image_path))
    img = tf.image.resize(img, (384, 384))
    img_normalized = tf.cast(img, 'float32') / 255.0
    img_batch = tf.expand_dims(img_normalized, 0)
    
    # Predict
    pred = model.predict(img_batch)
    pred_mask = tf.argmax(pred[0], axis=-1).numpy()
    
    # Visualize
    fig, axes = plt.subplots(1, 3 if target_path else 2, figsize=(15, 5))
    
    axes[0].imshow(img.numpy().astype('uint8'))
    axes[0].set_title('Original Image')
    axes[0].axis('off')
    
    axes[1].imshow(pred_mask, cmap='tab20')
    axes[1].set_title('Predicted Segmentation')
    axes[1].axis('off')
    
    if target_path:
        target = np.array(Image.open(target_path))
        target = tf.image.resize(
            tf.expand_dims(target, -1),
            (384, 384),
            method='nearest'
        ).numpy().squeeze()
        
        axes[2].imshow(target, cmap='tab20')
        axes[2].set_title('Ground Truth')
        axes[2].axis('off')
    
    plt.tight_layout()
    plt.show()

# Example usage
test_image = os.path.join(orig_dir, '2007_000033.jpg')
test_target = os.path.join(seg_dir, '2007_000033.png')

visualize_segmentation(deeplabv3, test_image, test_target)
```

---

## Conclusion

Chapter 8 gives a comprehensive explanation of image segmentation, focusing on:

1. **Fundamental concepts** of image segmentation and how it differs from classification
2. **An efficient data pipeline** built with the tf.data API and several optimizations
3. **The DeepLab v3 architecture**, which uses atrous convolution and the ASPP module
4. **Loss functions and metrics** specialized for segmentation tasks
5. **A complete implementation**, from data loading through training and evaluation

The code above is meant to be run in order to build a complete image segmentation system on the PASCAL VOC 2012 dataset.
