# Lab 5 - Medical image segmentation with U-net 

## Lab Exercises

This notebook contains lab exercises on medical image segmentation with the U-net architecture.

1. Build a semantic segmentation model based on the U-net architecture. You can use an
existing implementation; the following links provide examples:

https://pyimagesearch.com/2022/02/21/u-net-image-segmentation-in-keras/
https://becominghuman.ai/u-net-architecture-explained-and-implementation-470a5095ad57
https://www.tensorflow.org/tutorials/images/segmentation
https://asperbrothers.com/blog/image-segmentation/

In [1]:
# U-Net model implementation (encoder-decoder with skip connections)
import tensorflow as tf
from tensorflow.keras import layers, Model

# Helper metrics for binary masks (predictions are thresholded at 0.5)
@tf.function
def dice_coef(y_true, y_pred, smooth=1e-6):
    # cast so the metric also works when masks arrive as integers
    y_true_f = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred_f = tf.reshape(y_pred, [-1])
    y_pred_f = tf.cast(y_pred_f > 0.5, tf.float32)
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)

@tf.function
def iou(y_true, y_pred, smooth=1e-6):
    y_true_f = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred_f = tf.reshape(y_pred, [-1])
    y_pred_f = tf.cast(y_pred_f > 0.5, tf.float32)
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    # union = |A| + |B| - |A ∩ B|
    union = tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) - intersection
    return (intersection + smooth) / (union + smooth)

# Basic U-Net blocks
def conv_block(x, filters, kernel_size=3, activation='relu'):
    x = layers.Conv2D(filters, kernel_size, padding='same', activation=activation)(x)
    x = layers.Conv2D(filters, kernel_size, padding='same', activation=activation)(x)
    return x

def encoder_block(x, filters):
    c = conv_block(x, filters)
    p = layers.MaxPooling2D((2, 2))(c)
    return c, p

def decoder_block(x, skip, filters):
    x = layers.Conv2DTranspose(filters, (2, 2), strides=2, padding='same')(x)
    x = layers.Concatenate()([x, skip])
    x = conv_block(x, filters)
    return x

# U-Net builder
def unet_model(input_shape=(128, 128, 1), num_classes=1, base_filters=64):
    inputs = layers.Input(input_shape)

    c1, p1 = encoder_block(inputs, base_filters)
    c2, p2 = encoder_block(p1, base_filters * 2)
    c3, p3 = encoder_block(p2, base_filters * 4)
    c4, p4 = encoder_block(p3, base_filters * 8)

    bottleneck = conv_block(p4, base_filters * 16)

    d4 = decoder_block(bottleneck, c4, base_filters * 8)
    d3 = decoder_block(d4, c3, base_filters * 4)
    d2 = decoder_block(d3, c2, base_filters * 2)
    d1 = decoder_block(d2, c1, base_filters)

    if num_classes == 1:
        outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d1)
        loss = 'binary_crossentropy'
        metric_list = [dice_coef, iou, 'accuracy']
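    else:
        outputs = layers.Conv2D(num_classes, (1, 1), activation='softmax')(d1)
        loss = 'sparse_categorical_crossentropy'
        # dice_coef and iou above threshold a single sigmoid channel, so they do not
        # apply to softmax outputs with integer labels; track accuracy only here
        metric_list = ['accuracy']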

    model = Model(inputs, outputs, name='U-Net')
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss=loss, metrics=metric_list)
    return model

# Example: build and show a summary for a small U-Net
if __name__ == "__main__":
    model = unet_model(input_shape=(128, 128, 1), num_classes=1, base_filters=32)
    model.summary()



2. Download the Brain Tumor dataset from: 
https://drive.google.com/file/d/1RyOkJ7yb45P0NCvVqrE3mtmtVgVqwRji/view?usp=sharing 

In [None]:
# Download Brain Tumor dataset from Google Drive and extract
# Installs gdown if not present, downloads the zip by file id, and extracts it.
import os
import zipfile

try:
    import gdown
except ImportError:
    # gdown is missing: install it into the current kernel's environment (IPython syntax)
    import sys
    !{sys.executable} -m pip install --quiet gdown
    import gdown

file_id = "1RyOkJ7yb45P0NCvVqrE3mtmtVgVqwRji"
url = f"https://drive.google.com/uc?id={file_id}"
output_zip = "brain_tumor_dataset.zip"
extract_dir = "brain_tumor_dataset"

if not os.path.exists(output_zip):
    print('Downloading dataset...')
    gdown.download(url, output_zip, quiet=False)
else:
    print(f"{output_zip} already exists")

if os.path.exists(output_zip):
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(output_zip, 'r') as z:
        print('Extracting...')
        z.extractall(extract_dir)
    print(f'Extracted to {extract_dir}/')
else:
    print('Download failed or file not found.')


3. Split the dataset into 70% training images (and masks) and 30% test images. Use as many
images as your computer allows.

In [None]:
# Split dataset into 70% train and 30% test (images and masks)
from pathlib import Path
import shutil
import random

# Adjust these as needed
dataset_dir = Path("brain_tumor_dataset")  # folder created by the extraction step
max_images = None  # set to an int to limit number of pairs used, or None to use all
train_frac = 0.7
seed = 42

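# find image files and mask files (common extensions)
image_exts = ("*.png", "*.jpg", "*.jpeg", "*.tif", "*.tiff", "*.bmp")
all_files = []
for ext in image_exts:
    all_files.extend(dataset_dir.rglob(ext))
# skip copies written by a previous run of this cell (they live under data_split/)
all_files = [p for p in all_files if p.is_file() and "data_split" not in p.parts]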

# simple heuristic to separate masks from images
def is_mask(p: Path):
    name = p.name.lower()
    if "mask" in name or "seg" in name or "gt" in name or "label" in name:
        return True
    # also if parent folder name includes mask-like words
    if any(k in p.parent.name.lower() for k in ("mask", "masks", "seg", "labels", "groundtruth", "gt")):
        return True
    return False

masks = [p for p in all_files if is_mask(p)]
images = [p for p in all_files if p not in masks]

# Try to pair by stem (or by removing common suffixes like _mask)
mask_map = {p.stem: p for p in masks}
paired = []
for img in images:
    key = img.stem
    if key in mask_map:
        paired.append((img, mask_map[key]))
        continue
    # try removing common suffixes from mask stems
    alt = key + "_mask"
    if alt in mask_map:
        paired.append((img, mask_map[alt]))
        continue
    # try the reverse (mask has suffix)
    found = None
    for m in masks:
        if m.stem.startswith(key) or key.startswith(m.stem):
            found = m
            break
    if found:
        paired.append((img, found))

# fallback: if no mask heuristics worked but equal number of images and masks, pair by sorted order
if not paired and len(images) == len(masks) and len(images) > 0:
    images_sorted = sorted(images)
    masks_sorted = sorted(masks)
    paired = list(zip(images_sorted, masks_sorted))

if not paired:
    # stop here: splitting and copying an empty list would silently produce empty folders
    raise RuntimeError(
        f"No image/mask pairs found automatically ({len(images)} images, {len(masks)} masks). "
        "Please verify the dataset structure: typical layouts are dataset/images/ and dataset/masks/, "
        "or consistent filename suffixes such as *_mask.png."
    )
print(f"Found {len(paired)} image/mask pairs.")

# optionally limit
if max_images is not None:
    paired = paired[:max_images]

# shuffle and split
random.seed(seed)
random.shuffle(paired)
n = len(paired)
n_train = int(n * train_frac)
train_pairs = paired[:n_train]
test_pairs = paired[n_train:]

# create output dirs and copy files
out_base = dataset_dir / "data_split"
train_img_dir = out_base / "train" / "images"
train_mask_dir = out_base / "train" / "masks"
test_img_dir = out_base / "test" / "images"
test_mask_dir = out_base / "test" / "masks"
for d in (train_img_dir, train_mask_dir, test_img_dir, test_mask_dir):
    d.mkdir(parents=True, exist_ok=True)

for src, dst in train_pairs:
    shutil.copy2(src, train_img_dir / src.name)
    shutil.copy2(dst, train_mask_dir / dst.name)
for src, dst in test_pairs:
    shutil.copy2(src, test_img_dir / src.name)
    shutil.copy2(dst, test_mask_dir / dst.name)

# save file lists
(out_base / "train_list.txt").write_text("\n".join(f"{p[0]}\t{p[1]}" for p in train_pairs))
(out_base / "test_list.txt").write_text("\n".join(f"{p[0]}\t{p[1]}" for p in test_pairs))

print(f"Saved {len(train_pairs)} train pairs and {len(test_pairs)} test pairs under {out_base}")

4. Train the U-net model using the training set (if necessary, use augmentation) and then
segment the images in the test set. Evaluate the quality of the segmentation by
computing the mean pixel accuracy, the mean Jaccard's index (intersection over union) and
the mean Dice coefficient¹.

$$\text{Pixel Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$$

$$\text{Jaccard's index} = \text{IoU (Intersection over Union)} = \frac{TP}{TP + FN + FP}$$

$$\text{Dice coefficient} = \frac{2\,TP}{2\,TP + FN + FP} = \frac{2 \cdot \text{Intersection}}{\text{Union} + \text{Intersection}}$$

In [None]:
# U-Net training and evaluation pipeline
# Assumes dataset directory with two subfolders: images/ and masks/ where filenames match.
# Adjust DATA_DIR to point to the extracted dataset.

import os
import glob
import random
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.model_selection import train_test_split
from PIL import Image

# ---------- User configuration ----------
DATA_DIR = r"./brain_tumor_dataset"  # <-- set this to the extracted dataset root
IMG_SIZE = (128, 128)
BATCH_SIZE = 8
EPOCHS = 20
SEED = 42
# ----------------------------------------

random.seed(SEED)
np.random.seed(SEED)

# Helper: gather image/mask pairs
def list_image_mask_pairs(data_dir):
    images_dir = os.path.join(data_dir, 'images')
    masks_dir = os.path.join(data_dir, 'masks')
    image_paths = sorted(glob.glob(os.path.join(images_dir, '*')))
    pairs = []
    for img_path in image_paths:
        fname = os.path.basename(img_path)
        mask_path = os.path.join(masks_dir, fname)
        if os.path.exists(mask_path):
            pairs.append((img_path, mask_path))
    return pairs

pairs = list_image_mask_pairs(DATA_DIR)
print(f'Found {len(pairs)} image/mask pairs')
if len(pairs) == 0:
    raise RuntimeError('No data found. Update DATA_DIR to point to the dataset with images/ and masks/.')

# Train/test split 70/30
train_pairs, test_pairs = train_test_split(pairs, test_size=0.30, random_state=SEED)
print(f'Train: {len(train_pairs)}, Test: {len(test_pairs)}')

# Loader
def load_image(path, target_size=IMG_SIZE, grayscale=False):
    """Load an image as a uint8 array: (H, W, 3) for RGB, (H, W) for grayscale masks."""
    img = Image.open(path)
    if grayscale:
        # masks: nearest-neighbour resampling keeps label values crisp
        img = img.convert('L').resize(target_size, Image.NEAREST)
    else:
        img = img.convert('RGB').resize(target_size, Image.BILINEAR)
    return np.array(img)

# Prepare tf.data dataset
def make_dataset(pairs, batch_size=BATCH_SIZE, augment=False, shuffle=True):
    image_paths = [p for p,_ in pairs]
    mask_paths = [m for _,m in pairs]

    def _load(img_path, mask_path):
        img = tf.numpy_function(lambda p: load_image(p.decode('utf-8'), IMG_SIZE, False), [img_path], tf.uint8)
        mask = tf.numpy_function(lambda p: load_image(p.decode('utf-8'), IMG_SIZE, True), [mask_path], tf.uint8)
        # numpy_function loses static shapes; restore them so Keras can build the graph
        img.set_shape((IMG_SIZE[1], IMG_SIZE[0], 3))
        mask.set_shape((IMG_SIZE[1], IMG_SIZE[0]))
        img = tf.cast(img, tf.float32) / 255.0
        mask = tf.cast(mask, tf.float32) / 255.0
        # Binarize the mask and add a channel axis: (H, W) -> (H, W, 1)
        mask = tf.where(mask > 0.5, 1.0, 0.0)
        mask = tf.expand_dims(mask, -1)
        return img, mask

    ds = tf.data.Dataset.from_tensor_slices((image_paths, mask_paths))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(image_paths), seed=SEED)
    ds = ds.map(_load, num_parallel_calls=tf.data.AUTOTUNE)

    if augment:
        def _augment(img, mask):
            if tf.random.uniform(()) > 0.5:
                img = tf.image.flip_left_right(img)
                mask = tf.image.flip_left_right(mask)
            if tf.random.uniform(()) > 0.5:
                img = tf.image.flip_up_down(img)
                mask = tf.image.flip_up_down(mask)
            return img, mask
        ds = ds.map(_augment, num_parallel_calls=tf.data.AUTOTUNE)

    ds = ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    return ds

train_ds = make_dataset(train_pairs, augment=True)
val_ds = make_dataset(test_pairs, augment=False, shuffle=False)

# Simple U-Net
def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def up_conv_block(x, skip, filters):
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, skip])
    x = conv_block(x, filters)
    return x

def build_unet(input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3), n_filters=32):
    inputs = layers.Input(input_shape)
    # Encoder
    c1 = conv_block(inputs, n_filters)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, n_filters*2)
    p2 = layers.MaxPooling2D(2)(c2)
    c3 = conv_block(p2, n_filters*4)
    p3 = layers.MaxPooling2D(2)(c3)
    c4 = conv_block(p3, n_filters*8)
    p4 = layers.MaxPooling2D(2)(c4)
    # Bridge
    b = conv_block(p4, n_filters*16)
    # Decoder
    u1 = up_conv_block(b, c4, n_filters*8)
    u2 = up_conv_block(u1, c3, n_filters*4)
    u3 = up_conv_block(u2, c2, n_filters*2)
    u4 = up_conv_block(u3, c1, n_filters)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(u4)
    model = models.Model(inputs, outputs)
    return model

model = build_unet()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[])
model.summary()

# Train
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=3),
    tf.keras.callbacks.EarlyStopping(monitor='loss', patience=6, restore_best_weights=True)
]

history = model.fit(train_ds, epochs=EPOCHS, validation_data=val_ds, callbacks=callbacks)

# Evaluation utilities
def compute_metrics(model, pairs):
    pix_accs = []
    ious = []
    dices = []
    for img_path, mask_path in pairs:
        img = load_image(img_path, IMG_SIZE, False)
        gt = load_image(mask_path, IMG_SIZE, True)  # grayscale mask, shape (H, W)
        img_in = np.expand_dims(img.astype(np.float32)/255.0, 0)
        pred = model.predict(img_in, verbose=0)[0,...,0]
        pred_bin = (pred > 0.5).astype(np.uint8)
        # the mask is already 2-D, so threshold it directly (no channel indexing)
        gt_bin = (gt > 127).astype(np.uint8)

        # Pixel accuracy
        acc = np.mean(pred_bin == gt_bin)
        pix_accs.append(acc)

        # IoU / Jaccard
        intersection = np.logical_and(pred_bin, gt_bin).sum()
        union = np.logical_or(pred_bin, gt_bin).sum()
        iou = intersection / union if union != 0 else 1.0
        ious.append(iou)

        # Dice
        dice = (2.0 * intersection) / (pred_bin.sum() + gt_bin.sum()) if (pred_bin.sum() + gt_bin.sum()) != 0 else 1.0
        dices.append(dice)

    return np.mean(pix_accs), np.mean(ious), np.mean(dices)

mpa, miou, mdice = compute_metrics(model, test_pairs)
print(f'Mean Pixel Accuracy: {mpa:.4f}')
print(f'Mean IoU (Jaccard): {miou:.4f}')
print(f'Mean Dice Coefficient: {mdice:.4f}')

# Optionally save the model
# model.save('unet_brain_tumor.h5')



 
5. Modify the U-net architecture, aiming for better results (for example, try one of the U-net variants).


In [None]:
# Residual U-Net variant (BatchNorm + Dropout) to try improving results
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block_bn(x, filters, kernel_size=3, batchnorm=True):
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    if batchnorm:
        x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(filters, kernel_size, padding='same')(x)
    if batchnorm:
        x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    return x

def residual_block(x, filters, kernel_size=3, batchnorm=True):
    # projection for the residual path
    res = layers.Conv2D(filters, 1, padding='same')(x)
    if batchnorm:
        res = layers.BatchNormalization()(res)
    x = conv_block_bn(x, filters, kernel_size, batchnorm)
    x = layers.add([x, res])
    x = layers.Activation('relu')(x)
    return x

def encoder_resblock(x, filters, pool=True, dropout=0.0):
    c = residual_block(x, filters)
    if dropout and dropout > 0:
        c = layers.Dropout(dropout)(c)
    p = layers.MaxPooling2D((2, 2))(c) if pool else c
    return c, p

def decoder_resblock(x, skip, filters, dropout=0.0):
    x = layers.Conv2DTranspose(filters, (2, 2), strides=2, padding='same')(x)
    x = layers.Concatenate()([x, skip])
    x = residual_block(x, filters)
    if dropout and dropout > 0:
        x = layers.Dropout(dropout)(x)
    return x

def build_resunet(input_shape=(128, 128, 3), n_filters=32, dropout=0.1):
    inputs = layers.Input(input_shape)
    c1, p1 = encoder_resblock(inputs, n_filters, pool=True, dropout=0.0)
    c2, p2 = encoder_resblock(p1, n_filters * 2, dropout=dropout)
    c3, p3 = encoder_resblock(p2, n_filters * 4, dropout=dropout)
    c4, p4 = encoder_resblock(p3, n_filters * 8, dropout=dropout)

    b = residual_block(p4, n_filters * 16)

    d1 = decoder_resblock(b, c4, n_filters * 8, dropout=dropout)
    d2 = decoder_resblock(d1, c3, n_filters * 4, dropout=dropout)
    d3 = decoder_resblock(d2, c2, n_filters * 2, dropout=dropout)
    d4 = decoder_resblock(d3, c1, n_filters, dropout=0.0)

    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(d4)
    model = models.Model(inputs, outputs, name='ResUNet')
    return model

# Build and show a summary for the Residual U-Net variant
# If IMG_SIZE is defined earlier in the notebook it will be used; otherwise default to (128,128)
try:
    _img_size = (IMG_SIZE[0], IMG_SIZE[1], 3)
except Exception:
    _img_size = (128, 128, 3)

resunet = build_resunet(input_shape=_img_size, n_filters=32, dropout=0.1)
resunet.summary()


 
6. Test U-Net on the dataset from Homework 3 (Pratheepan dataset).

¹ https://towardsdatascience.com/how-accurate-is-image-segmentation-dd448f896388


In [None]:
# Test U-Net on the Pratheepan dataset (Homework 3)
# Set PRATHEEPAN_DIR to the root of the Pratheepan dataset (contains images/ and masks/ or files with mask in name)
PRATHEEPAN_DIR = r"./pratheepan_dataset"  # <-- change this to your dataset path

import os, glob

def list_pairs_pratheepan(data_dir):
    # Try standard images/ and masks/ subfolders first
    img_exts = ("*.jpg","*.png","*.jpeg","*.bmp","*.tif")
    images = []
    masks = []
    for ext in img_exts:
        images += glob.glob(os.path.join(data_dir, 'images', ext))
        masks += glob.glob(os.path.join(data_dir, 'masks', ext))
    if images and masks:
        img_map = {os.path.basename(p): p for p in images}
        mask_map = {os.path.basename(p): p for p in masks}
        pairs = [(img_map[n], mask_map[n]) for n in img_map.keys() if n in mask_map]
        return pairs
    # Fallback: search all files and use simple heuristics to separate masks
    all_files = []
    for ext in img_exts:
        all_files += glob.glob(os.path.join(data_dir, '**', ext), recursive=True)
    masks = [p for p in all_files if any(k in p.lower() for k in ('mask','seg','label','gt'))]
    images = [p for p in all_files if p not in masks]
    mask_map = {os.path.splitext(os.path.basename(m))[0]: m for m in masks}
    pairs = []
    for im in images:
        key = os.path.splitext(os.path.basename(im))[0]
        if key in mask_map:
            pairs.append((im, mask_map[key]))
    return pairs

pairs = list_pairs_pratheepan(PRATHEEPAN_DIR)
print(f'Found {len(pairs)} image/mask pairs in: {PRATHEEPAN_DIR}')
if len(pairs) == 0:
    raise RuntimeError('No pairs found. Update PRATHEEPAN_DIR to point to the dataset root.')

# Ensure a model is available: try existing `model` variable or load saved model file
try:
    model
except NameError:
    if os.path.exists('unet_brain_tumor.h5'):
        from tensorflow.keras.models import load_model
        # compile=False: only inference is needed here, so custom metrics need not be restored
        model = load_model('unet_brain_tumor.h5', compile=False)
        print('Loaded model from unet_brain_tumor.h5')
    else:
        raise RuntimeError('No trained model found in variable `model` and unet_brain_tumor.h5 not present.')

# Ensure load_image and IMG_SIZE exist, otherwise define a small helper
try:
    load_image
except NameError:
    from PIL import Image
    import numpy as np
    def load_image(path, target_size=(128,128), grayscale=False):
        img = Image.open(path)
        if grayscale:
            # masks: nearest-neighbour resampling avoids blurred label edges
            img = img.convert('L').resize(target_size, Image.NEAREST)
        else:
            img = img.convert('RGB').resize(target_size, Image.BILINEAR)
        return np.array(img)

try:
    IMG_SIZE
except NameError:
    IMG_SIZE = (128, 128)

# Compute metrics over the dataset
import numpy as np
pix_accs = []
ious = []
dices = []
for img_p, mask_p in pairs:
    img = load_image(img_p, IMG_SIZE, False)
    gt = load_image(mask_p, IMG_SIZE, True)
    inp = np.expand_dims(img.astype('float32')/255.0, 0)
    pred = model.predict(inp)[0,...,0]
    pred_bin = (pred > 0.5).astype(np.uint8)
    gt_bin = (gt[...,0] > 127).astype(np.uint8) if gt.ndim == 3 else (gt > 127).astype(np.uint8)

    acc = np.mean(pred_bin == gt_bin)
    pix_accs.append(acc)
    intersection = np.logical_and(pred_bin, gt_bin).sum()
    union = np.logical_or(pred_bin, gt_bin).sum()
    ious.append(intersection / union if union != 0 else 1.0)
    dices.append((2.0 * intersection) / (pred_bin.sum() + gt_bin.sum()) if (pred_bin.sum() + gt_bin.sum()) != 0 else 1.0)

print(f'Mean Pixel Accuracy: {np.mean(pix_accs):.4f}')
print(f'Mean IoU: {np.mean(ious):.4f}')
print(f'Mean Dice: {np.mean(dices):.4f}')

# Visualize a few examples
import matplotlib.pyplot as plt
n_show = min(3, len(pairs))
for i in range(n_show):
    img_p, mask_p = pairs[i]
    img = load_image(img_p, IMG_SIZE, False)
    gt = load_image(mask_p, IMG_SIZE, True)
    pred = model.predict(np.expand_dims(img.astype('float32')/255.0, 0))[0,...,0]
    pred_bin = (pred > 0.5).astype(np.uint8) * 255

    fig, ax = plt.subplots(1,3, figsize=(9,3))
    ax[0].imshow(img.astype('uint8'))
    ax[0].set_title('Image'); ax[0].axis('off')
    ax[1].imshow(gt[...,0] if gt.ndim==3 else gt, cmap='gray')
    ax[1].set_title('Ground truth'); ax[1].axis('off')
    ax[2].imshow(pred_bin, cmap='gray')
    ax[2].set_title('Prediction'); ax[2].axis('off')
    plt.show()


 
 
## U-net

U-Net is a convolutional neural network for semantic segmentation that was developed for
biomedical image segmentation². It was designed to work with fewer training images
(data augmentation with elastic deformations reduces the number of annotated images required
for training) and to yield more precise segmentations. A key feature is that U-Net learns
segmentation end to end: an image goes in and a segmentation map comes out.
U-Net classifies every pixel, so the input and output share the same size.
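To make "classification on every pixel" concrete, the sketch below turns network outputs into label maps with NumPy. It is an illustration under stated assumptions, not the lab's pipeline: the random arrays stand in for real network outputs, and the 4×6 size, the three classes and the 0.5 threshold are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 4, 6  # hypothetical image height and width

# Binary segmentation: one sigmoid probability per pixel, thresholded at 0.5.
prob_map = rng.random((h, w))                 # stand-in for a sigmoid output
binary_mask = (prob_map > 0.5).astype(np.uint8)

# Multi-class segmentation: one softmax vector per pixel, argmax picks the class.
num_classes = 3
logits = rng.normal(size=(h, w, num_classes))  # stand-in for raw network scores
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))  # numerically stable softmax
softmax = exp / exp.sum(axis=-1, keepdims=True)
label_map = softmax.argmax(axis=-1)

# Both outputs keep the spatial size of the input image.
print(binary_mask.shape, label_map.shape)
```

In both cases the decision is made independently for each pixel, and the resulting mask has the same height and width as the input.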
U-net is a special type of encoder-decoder network (see Figure 1):

- The encoder (left part of the "U") encodes the image into an abstract representation of image
features by applying a sequence of convolutional blocks that gradually decrease the
representation's height and width while increasing the number of channels, which correspond
to image features.
- The decoder (right part of the "U") decodes the image representation into a binary mask by
applying a sequence of up-convolutions (NOT the same as deconvolution) that gradually
increase the representation's height and width to the size of the original image and decrease
the number of channels to the number of classes being segmented.
- Additionally, U-Net implements skip connections that connect corresponding levels of the
encoder and decoder. They keep the model from "losing" features extracted by earlier
blocks of the encoder, which improves segmentation performance.
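The shape bookkeeping of the encoder, decoder and skip connections can be traced without any deep-learning library. The following is a minimal sketch, assuming 'same' padding, 2×2 pooling and a doubling of channels at each level (the choices made in the Keras cells of this notebook); `unet_shapes` is a hypothetical helper name introduced only for illustration.

```python
def unet_shapes(height, width, base_filters=64, depth=4):
    """List (stage, height, width, channels) for a U-Net with 'same' padding."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")
    shapes, skips = [], []
    h, w, f = height, width, base_filters
    # Encoder: each level halves height/width and doubles the channel count.
    for level in range(1, depth + 1):
        shapes.append((f"enc{level}", h, w, f))
        skips.append((h, w, f))          # feature map kept for the skip connection
        h, w, f = h // 2, w // 2, f * 2
    shapes.append(("bottleneck", h, w, f))
    # Decoder: each level doubles height/width and halves the channel count.
    for level in range(depth, 0, -1):
        skip_h, skip_w, skip_f = skips.pop()
        h, w, f = h * 2, w * 2, f // 2
        # The upsampled map must match the skip map spatially to be concatenated;
        # the concatenation has f + skip_f channels, the conv block reduces it to f.
        assert (h, w) == (skip_h, skip_w)
        shapes.append((f"dec{level}", h, w, f))
    return shapes

for stage in unet_shapes(128, 128, base_filters=64):
    print(stage)
```

For a 128×128 input with 64 base filters, the trace runs from `('enc1', 128, 128, 64)` down to `('bottleneck', 8, 8, 1024)` and back up to `('dec1', 128, 128, 64)`. The divisibility check reflects a practical constraint: with four pooling steps, input sizes that are not multiples of 16 would make the upsampled maps and the skip maps disagree in size.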
 
 
² O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical
image segmentation, International Conference on Medical Image Computing and
Computer-Assisted Intervention (2015) 234–241.
 
 
 
Figure 1 – U-net architecture 