# Semantic segmentation of aerial imagery using U-Net

A U-Net trained on aerial images split into **256x256x3** patches reached a **test accuracy of 87%**, a Jaccard coefficient of 0.7308, and a loss of 0.8974. On the validation data it reached a **validation accuracy of 84%**, a validation Jaccard coefficient of 0.6869, and a validation loss of 0.9149, which suggests the model generalizes reasonably well. Each pixel is classified into one of six classes, identified in the masks by the following colors:

* Building: #3C1098
* Land (unpaved area): #8429F6
* Road: #6EC1E4
* Vegetation: #FEDD3A
* Water: #E2A929
* Unlabeled: #9B9B9B

# 01. Enable GPU as the physical device

GPU acceleration must be enabled in the Kaggle notebook environment. Open the right-side panel of the notebook and select "Session options". Within the Session options, choose "GPU T4 x2" as the ACCELERATOR.
For more information, see [enable GPU T4 x2 as the ACCELERATOR](https://www.kaggle.com/docs/notebooks#settings).

In [None]:
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

# List GPUs available
gpus = tf.config.list_physical_devices('GPU')
print("GPUs:", gpus)

# Check if GPUs are available
if gpus:
    print("GPU is available.")
else:
    print("GPU is not available.")

# Set memory growth for each GPU
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)


# 02. Install Dependencies

In [None]:
!pip install matplotlib
!pip install scikit-learn
!pip install -U segmentation-models
!pip install patchify
!pip install Pillow

# 03. Patchify images and masks

In [None]:
import os

# segmentation_models reads this variable at import time, so set it first
os.environ["SM_FRAMEWORK"] = "tf.keras"

import cv2
import numpy as np

from matplotlib import pyplot as plt
from patchify import patchify
from PIL import Image
import segmentation_models as sm
from tensorflow.keras.metrics import MeanIoU

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import random

In [None]:
dataset_root_folder = "/kaggle/input/semantic-segmentation-of-aerial-imagery/Semantic segmentation dataset"

In [None]:
dataset_name = 'Semantic segmentation dataset'

In [None]:
os.walk(dataset_root_folder)

In [None]:
for path, subdirs, files in os.walk(dataset_root_folder):
  dir_name = path.split(os.path.sep)[-1]
  print(dir_name)

In [None]:
image_folder = f"{dataset_root_folder}/Tile 1/images"
image_files = [f for f in os.listdir(image_folder) if f.endswith('.jpg')]
num_images_to_show = min(20, len(image_files))
for i in range(num_images_to_show):
    image_path = os.path.join(image_folder, image_files[i])
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    print(f"Image {image_files[i]} has shape: {image.shape}")

The images have different sizes, so they need to be unified before training.
Tile and mask processing:

1. Choose a patch size of 256 or 512.
2. Crop every tile and mask so that its dimensions are a multiple of the patch size.
3. Split each image into patches and convert them to NumPy arrays.

The processing is first demonstrated on a single image and later applied to the entire dataset.

In [None]:
image = cv2.imread(f"{dataset_root_folder}/Tile 1/images/image_part_001.jpg")

In [None]:
image

In [None]:
image.shape

In [None]:
plt.imshow(image)
plt.title("Tile 1 - image_part_001.jpg BGR")
plt.axis("on")
plt.show()
    


In [None]:
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)
plt.title("Tile 1 - image_part_001.jpg RGB")
plt.axis("on")
plt.show()


Choosing patch size: 256 or 512?

Compatibility with neural network architectures: 256 or 512 are popular sizes because they correspond to the resolutions used by many popular neural network architectures such as ResNet, VGG, U-Net, etc. Many of these models are designed for image sizes such as 224x224, 256x256, or 512x512 because they are easy to scale to at the architecture level.

Computational power: These sizes often provide a good balance between accuracy and computation time. They provide enough information about the image structure without putting too much strain on the GPU memory, which is especially important for tasks such as image segmentation.

Divisibility by layer sizes in models: 256 and 512 are powers of 2 (256 is 2^8, 512 is 2^9), so they can be halved repeatedly without a remainder. This matters because convolutional networks shrink the image by a factor of 2 at each pooling or downsampling step.
Networks such as U-Net gradually reduce the image (through convolution and pooling layers) and then rebuild it (through transposed convolutions). With a patch size of 256 or 512, the feature maps at every level have matching sizes on the way down and on the way up.
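
To make the divisibility argument concrete, the sketch below traces the side length of a feature map through four 2x2 pooling steps, which is the depth of the U-Net defined later in this notebook. The helper names `sizes_through_pooling` and `fits_unet` are illustrative and not part of any library.

```python
def sizes_through_pooling(size, n_pools=4):
    """Return the side length before and after each 2x2 pooling step."""
    sizes = [size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)  # pooling halves the size, rounding down
    return sizes


def fits_unet(size, n_pools=4):
    """True if every upsampled map matches its skip connection in size."""
    return size % (2 ** n_pools) == 0


print(sizes_through_pooling(256))  # [256, 128, 64, 32, 16]
print(sizes_through_pooling(250))  # [250, 125, 62, 31, 15]
print(fits_unet(256), fits_unet(250))  # True False
```

With an input of 250, the 125-pixel map is pooled to 62 and upsampled back to 124, so it no longer lines up with the 125-pixel skip connection.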

Computational efficiency (GPU memory): A larger patch size (e.g. 512x512) can offer more details in a single image, but a larger patch size means more GPU memory is required. A larger patch means that the network has to process more data in one pass, which can slow down training, especially if you have limited GPU memory.
A smaller patch size (e.g. 256x256) means less memory, which can speed up computation, but can reduce detail if the patch is too small for the required image analysis.
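
As a rough illustration of the memory argument, the sketch below computes the size of one input batch alone, assuming float32 values and the batch size of 16 used for training later in this notebook. The function name `input_batch_mib` is hypothetical. Intermediate feature maps grow by the same factor, so actual GPU usage is much higher than these numbers.

```python
def input_batch_mib(patch, batch_size=16, channels=3, bytes_per_value=4):
    """Size of one input batch in MiB (float32 assumed by default)."""
    return batch_size * patch * patch * channels * bytes_per_value / 2 ** 20


print(input_batch_mib(256))  # 12.0
print(input_batch_mib(512))  # 48.0
```

Doubling the patch side quadruples the number of pixels, and therefore the memory needed per batch.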

Training stability: 256x256 and 512x512 are common choices when large images are divided into patches. Very large patches yield fewer training samples from the same set of images and force smaller batch sizes, which can make the model harder to train or less stable.

On the other hand, a patch that is too small (e.g. 128x128 or 64x64) can contain too little information to correctly recognize patterns and features in the image, which can make accurate segmentation difficult.

Specific task: In image segmentation or image analysis (e.g. in satellite image processing, medical image processing, etc.), the patch size also depends on the scale and characteristics of the objects we want to detect. Often a 256x256 or 512x512 patch is adequate to capture the structures in the image, but not too large to include information that may be irrelevant or redundant.

256x256 - A popular patch size in neural networks, especially in medical segmentation (e.g. MRI, CT images) where details need to be captured at different levels.

512x512 - More commonly used for larger images where we want the patch to contain more context (e.g. satellite images).

256 or 512? If you care about training speed and have limited GPU memory, 256x256 is a good compromise.

If you care about accuracy and have enough processing power (e.g. large GPU memory), 512x512 may be more appropriate.

In [None]:
image_patch_size = 256

In [None]:
size_x = (image.shape[1] // image_patch_size) * image_patch_size
size_y = (image.shape[0] // image_patch_size) * image_patch_size

The model requires uniform image patches to ensure consistent processing in the neural network. The image dimensions are therefore adjusted to a multiple of the patch size, so that the image can be divided into equal patches without any leftovers at the edges. The size_x and size_y values are used to crop the image so that its dimensions are a multiple of image_patch_size.
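
The arithmetic can be checked on its own with a small sketch. The helper `crop_size` and the example dimensions (644 pixels high, 797 pixels wide) are assumptions chosen for illustration.

```python
def crop_size(height, width, patch=256):
    """Largest (height, width) not exceeding the input that is a multiple of patch."""
    return (height // patch) * patch, (width // patch) * patch


print(crop_size(644, 797))  # (512, 768)
```

In this example the bottom 132 rows and the rightmost 29 columns are discarded by the crop.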

In [None]:
image1 = Image.fromarray(image)

In [None]:
print(type(image))

In [None]:
print(type(image1))

In [None]:
image2 = image1.crop((0,0,size_x,size_y))

In [None]:
image2.size

In [None]:
plt.imshow(image2)
plt.show()

In [None]:
image3 = np.array(image2)

In [None]:
image3.shape

In [None]:
patch_size = 256

print("Shape of image3:", image3.shape)

h, w, _ = image3.shape
patches_vertical = h // patch_size
patches_horizontal = w // patch_size

print(f"Number of patches vertical: {patches_vertical}")
print(f"Number of patches horizontal: {patches_horizontal}")

In [None]:
image_patches = patchify(image3, (image_patch_size, image_patch_size, 3), step=image_patch_size)

In [None]:
image_patches.shape

image_patches.shape is (2, 3, 1, 256, 256, 3), so image_patches is a six-dimensional NumPy array. Each dimension describes a different aspect of the patch grid.

First dimension (2):

This is the number of rows in the patch grid. The image was split into 2 patches vertically.

Second dimension (3):

This is the number of columns in the patch grid. The image was split into 3 patches horizontally.

Third dimension (1):

This is the number of patches along the channel axis. Because the patch depth (3) equals the number of channels in the image, there is only one step along this axis, so the dimension has size 1.

Fourth dimension (256):

This is the height of each patch (image fragment). In this case, each patch will be 256 pixels high.

Fifth dimension (256):

This is the width of each patch (image fragment). Each patch is 256 pixels wide.

Sixth dimension (3):

This is the number of color channels. In this case, we have 3 channels, meaning the image is in RGB (red, green, blue) color.
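
The six-dimensional layout can be reproduced without patchify for the non-overlapping case. The sketch below is a NumPy-only illustration on a synthetic array; it is not how patchify is implemented, and the 512x768 input size is an assumption matching the 2x3 grid described above.

```python
import numpy as np

patch = 256

# Synthetic stand-in for a cropped image: 2 patches high, 3 patches wide.
img = np.arange(512 * 768 * 3, dtype=np.uint32).reshape(512, 768, 3)

rows = img.shape[0] // patch
cols = img.shape[1] // patch

# Split height and width into (grid index, offset inside patch) pairs,
# then move the grid indices to the front.
grid = img.reshape(rows, patch, cols, patch, 1, 3).transpose(0, 2, 4, 1, 3, 5)

print(grid.shape)  # (2, 3, 1, 256, 256, 3)

# The patch at grid row 1, column 2 is the matching slice of the image.
same = np.array_equal(grid[1, 2, 0], img[256:512, 512:768])
print(same)  # True
```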

In [None]:
# image_patches.shape -> (2, 3, 1, 256, 256, 3)

patches_height = image_patches.shape[0]
patches_width = image_patches.shape[1]

fig, axs = plt.subplots(patches_height, patches_width, figsize=(patches_width * 3, patches_height * 3))

for i in range(patches_height):
    for j in range(patches_width):
        
        patch = image_patches[i, j, 0, :, :, :]
        axs[i, j].imshow(patch)
        axs[i, j].set_title(f'Patch ({i}, {j})')
        axs[i, j].axis('off')

plt.tight_layout()
plt.show()

The code image_x = image_patches[0, 0, 0] selects the first patch from image_patches. The first two indices pick the grid row and column, and the third drops the singleton channel-step axis, so image_x is a 256x256x3 array (a three-channel color patch of 256x256 px).

In [None]:
image_x = image_patches[0, 0, 0]

In [None]:
image_x.shape

In [None]:
image_x

In [None]:
minmax_scaler = MinMaxScaler()

The next cell applies the scaler in three steps:

1. Flatten image_x into an array of shape (256*256, 3), with one column per channel.

2. Scale each channel to the range [0, 1] with MinMaxScaler.

3. Restore the original shape (256, 256, 3) with the scaled pixel values.
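
The three steps can be written out with NumPy alone to show the arithmetic involved. This sketch uses a random synthetic patch and assumes that no channel is constant, so the denominator is never zero; MinMaxScaler handles that edge case internally.

```python
import numpy as np

rng = np.random.default_rng(0)
patch_img = rng.integers(0, 256, size=(256, 256, 3)).astype(np.float64)

# Step 1: flatten to one row per pixel, one column per channel.
flat = patch_img.reshape(-1, patch_img.shape[-1])

# Step 2: per-channel min-max scaling to [0, 1].
col_min = flat.min(axis=0)
col_max = flat.max(axis=0)
scaled_flat = (flat - col_min) / (col_max - col_min)

# Step 3: restore the original image shape.
scaled = scaled_flat.reshape(patch_img.shape)

print(flat.shape)                  # (65536, 3)
print(scaled.shape)                # (256, 256, 3)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Because the minimum and maximum are taken per patch and per channel, every patch is stretched to the full [0, 1] range regardless of its original brightness.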

In [None]:
image_y = minmax_scaler.fit_transform(image_x.reshape(-1, image_x.shape[-1])).reshape(image_x.shape)

In [None]:
image_y.shape

In [None]:
image_y

In [None]:
# image_patch_size = 256  # If not already defined, add this line

# dataset_root_folder = "/kaggle/input/semantic-segmentation-of-aerial-imagery/Semantic segmentation dataset"

In [None]:
image_dataset = []
mask_dataset = []

for image_type in ['images', 'masks']:
    # Photos are stored as .jpg files, masks as .png files
    image_extension = 'jpg' if image_type == 'images' else 'png'

    for tile_id in range(1, 8):
        for image_id in range(1, 10):
            image_path = f"{dataset_root_folder}/Tile {tile_id}/{image_type}/image_part_{image_id:03d}.{image_extension}"

            if os.path.exists(image_path):
                image = cv2.imread(image_path, 1)  # 1 = cv2.IMREAD_COLOR, channels in BGR order

                if image is not None:
                    # Mask colors are compared with RGB values later, so masks are converted to RGB
                    if image_type == 'masks':
                        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

                    # Crop to the largest size that is a multiple of the patch size
                    size_x = (image.shape[1] // image_patch_size) * image_patch_size
                    size_y = (image.shape[0] // image_patch_size) * image_patch_size

                    image = Image.fromarray(image)
                    image = image.crop((0, 0, size_x, size_y))
                    image = np.array(image)

                    # Split into non-overlapping patches
                    patched_images = patchify(image, (image_patch_size, image_patch_size, 3), step=image_patch_size)

                    for i in range(patched_images.shape[0]):
                        for j in range(patched_images.shape[1]):
                            # Shape (1, patch, patch, 3); the leading axis is dropped below
                            individual_patched_image = patched_images[i, j, :, :]

                            if image_type == 'images':
                                # Scale each channel of the patch to [0, 1]
                                minmax_scaler = MinMaxScaler()
                                individual_patched_image = minmax_scaler.fit_transform(
                                    individual_patched_image.reshape(-1, individual_patched_image.shape[-1])
                                ).reshape(individual_patched_image.shape)

                                individual_patched_image = individual_patched_image[0]
                                image_dataset.append(individual_patched_image)

                            elif image_type == 'masks':
                                individual_patched_image = individual_patched_image[0]
                                mask_dataset.append(individual_patched_image)

In [None]:
len(image_dataset)

In [None]:
len(mask_dataset)

In [None]:
image_dataset = np.array(image_dataset)
mask_dataset = np.array(mask_dataset)

In [None]:
print(image_dataset.shape)
print(mask_dataset.shape)

In [None]:
image_dataset[0].shape

In [None]:
random_image_id = random.randint(0, len(image_dataset)-1)

plt.figure(figsize=(10,10))
plt.subplot(1,2,1)
plt.imshow(image_dataset[random_image_id])
plt.subplot(1,2,2)
plt.imshow(mask_dataset[random_image_id])
plt.show()

The mask colors are given in hexadecimal (HEX) format and need to be converted to RGB (Red, Green, Blue) values before they can be compared with mask pixels. The HEX format is commonly used to specify colors in graphics applications and web pages. A HEX color such as '#3C1098' is a string of six hexadecimal digits: the first two encode the red component, the next two the green component, and the last two the blue component. Each pair is converted from base 16 to a decimal number.

Example for color #3C1098:

* '3C' converts to 60 (red component).
* '10' converts to 16 (green component).
* '98' converts to 152 (blue component).

Therefore, for #3C1098, the resulting RGB value is [60, 16, 152].

Converting each class color from HEX to RGB gives:

* Building = '#3C1098' – converts to [60, 16, 152]
* Land = '#8429F6' – converts to [132, 41, 246]
* Road = '#6EC1E4' – converts to [110, 193, 228]
* Vegetation = '#FEDD3A' – converts to [254, 221, 58]
* Water = '#E2A929' – converts to [226, 169, 41]
* Unlabeled = '#9B9B9B' – converts to [155, 155, 155]

In [None]:

def hex_to_rgb(hex_color):
    """Convert a HEX color string such as '#3C1098' to an RGB NumPy array."""
    hex_color = hex_color.lstrip('#')
    return np.array(tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4)))

a = int('3C', 16)  # '3C' in base 16 is 60
print(a)

# Convert each class color from HEX to RGB
Building = hex_to_rgb('#3C1098')    # 60, 16, 152
Land = hex_to_rgb('#8429F6')        # 132, 41, 246
Road = hex_to_rgb('#6EC1E4')        # 110, 193, 228
Vegetation = hex_to_rgb('#FEDD3A')  # 254, 221, 58
Water = hex_to_rgb('#E2A929')       # 226, 169, 41
Unlabeled = hex_to_rgb('#9B9B9B')   # 155, 155, 155

label = individual_patched_image

In [None]:
image_number = random.randint(0, len(image_dataset) - 1)  # randint includes both endpoints
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(np.reshape(image_dataset[image_number], (patch_size, patch_size, 3)))
plt.subplot(122)
plt.imshow(np.reshape(mask_dataset[image_number], (patch_size, patch_size, 3)))
plt.show()

The rgb_to_2D_label(label) function converts an RGB mask into a 2D array of numeric class labels. Instead of three color channels, each pixel holds a single integer that identifies its class.

Input: an RGB label image, where each pixel is a vector (e.g. [60, 16, 152] for the color Building).

Create the array label_seg: an array of the same shape as the input label image, filled with zeros (dtype uint8). Because the array starts at 0, any pixel whose color matches none of the classes keeps label 0, the same value as Building.

Convert RGB values to numeric labels: for each pixel, the function checks whether its color matches one of the defined class colors (Building, Land, Road, etc.) and, if so, assigns the corresponding numeric label:

Building (RGB: [60, 16, 152]) -> label 0

Land (RGB: [132, 41, 246]) -> label 1

Road (RGB: [110, 193, 228]) -> label 2

Vegetation (RGB: [254, 221, 58]) -> label 3

Water (RGB: [226, 169, 41]) -> label 4

Unlabeled (RGB: [155, 155, 155]) -> label 5

Each of these colors is assigned a corresponding number (0, 1, 2, 3, 4, 5).
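
The mapping can be tried on a tiny mask before running it on the dataset. The sketch below uses a hypothetical 2x2 mask and a `palette` array holding the six class colors in label order; both names are introduced here for illustration only.

```python
import numpy as np

# Class colors in label order 0..5.
palette = np.array([
    [60, 16, 152],    # 0 Building
    [132, 41, 246],   # 1 Land
    [110, 193, 228],  # 2 Road
    [254, 221, 58],   # 3 Vegetation
    [226, 169, 41],   # 4 Water
    [155, 155, 155],  # 5 Unlabeled
], dtype=np.uint8)

# Hypothetical 2x2 RGB mask.
mask = np.array([
    [[60, 16, 152], [226, 169, 41]],
    [[110, 193, 228], [155, 155, 155]],
], dtype=np.uint8)

labels_2d = np.zeros(mask.shape[:2], dtype=np.uint8)
for class_id, color in enumerate(palette):
    # A pixel matches only if all three channels equal the class color.
    labels_2d[np.all(mask == color, axis=-1)] = class_id

print(labels_2d)
# [[0 4]
#  [2 5]]
```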

In [None]:
def rgb_to_2D_label(label):
    """
    Supply our label masks as input in RGB format.
    Replace pixels with specific RGB values ...
    """
    label_seg = np.zeros(label.shape, dtype=np.uint8)
    label_seg[np.all(label == Building, axis=-1)] = 0
    label_seg[np.all(label == Land, axis=-1)] = 1
    label_seg[np.all(label == Road, axis=-1)] = 2
    label_seg[np.all(label == Vegetation, axis=-1)] = 3
    label_seg[np.all(label == Water, axis=-1)] = 4
    label_seg[np.all(label == Unlabeled, axis=-1)] = 5

    # All three channels hold the same label, so keep only the first
    label_seg = label_seg[:, :, 0]

    return label_seg

In [None]:
labels = []
for i in range(mask_dataset.shape[0]):
    label = rgb_to_2D_label(mask_dataset[i])
    labels.append(label)

labels = np.array(labels)
labels = np.expand_dims(labels, axis=3)


print("Unique labels in label dataset are: ", np.unique(labels))

In [None]:
image_number = random.randint(0, len(image_dataset) - 1)  # randint includes both endpoints
plt.figure(figsize=(12, 6))
plt.subplot(121)
plt.imshow(image_dataset[image_number])
plt.subplot(122)
plt.imshow(labels[image_number][:,:,0])
plt.show()

In [None]:
n_classes = len(np.unique(labels))
from keras.utils import to_categorical
labels_cat = to_categorical(labels, num_classes=n_classes)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(image_dataset, labels_cat, test_size = 0.20, random_state = 42)


In [None]:
weights = [0.1666, 0.1666, 0.1666, 0.1666, 0.1666, 0.1666]
dice_loss = sm.losses.DiceLoss(class_weights=weights)
focal_loss = sm.losses.CategoricalFocalLoss()
total_loss = dice_loss + (1 * focal_loss)

In [None]:
IMG_HEIGHT = X_train.shape[1]
IMG_WIDTH  = X_train.shape[2]
IMG_CHANNELS = X_train.shape[3]

In [None]:
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate, Conv2DTranspose, BatchNormalization, Dropout, Lambda
from keras import backend as K

In [None]:
def jacard_coef(y_true, y_pred):
    y_true_f = tf.keras.backend.flatten(y_true)
    y_pred_f = tf.keras.backend.flatten(y_pred)
    intersection = tf.keras.backend.sum(y_true_f * y_pred_f)
    return (intersection + 1.0) / (tf.keras.backend.sum(y_true_f) + tf.keras.backend.sum(y_pred_f) - intersection + 1.0)

# 04. Build the U-Net model 

In [None]:
def multi_unet_model(n_classes=4, IMG_HEIGHT=256, IMG_WIDTH=256, IMG_CHANNELS=1):
    
    inputs = Input((IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS))
    
    s = inputs

    #Contraction path
    c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(s)
    c1 = Dropout(0.2)(c1)  # Original 0.1
    c1 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c1)
    p1 = MaxPooling2D((2, 2))(c1)

    c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p1)
    c2 = Dropout(0.2)(c2)  # Original 0.1
    c2 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c2)
    p2 = MaxPooling2D((2, 2))(c2)

    c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p2)
    c3 = Dropout(0.2)(c3)
    c3 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c3)
    p3 = MaxPooling2D((2, 2))(c3)

    c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p3)
    c4 = Dropout(0.2)(c4)
    c4 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c4)
    p4 = MaxPooling2D(pool_size=(2, 2))(c4)

    c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(p4)
    c5 = Dropout(0.3)(c5)
    c5 = Conv2D(256, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c5)

    #Expansive path
    u6 = Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u6)
    c6 = Dropout(0.2)(c6)
    c6 = Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c6)

    u7 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u7)
    c7 = Dropout(0.2)(c7)
    c7 = Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c7)

    u8 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u8)
    c8 = Dropout(0.2)(c8)  # Original 0.1
    c8 = Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c8)

    u9 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1], axis=3)
    c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(u9)
    c9 = Dropout(0.2)(c9)  # Original 0.1
    c9 = Conv2D(16, (3, 3), activation='relu', kernel_initializer='he_normal', padding='same')(c9)

    outputs = Conv2D(n_classes, (1, 1), activation='softmax')(c9)

    model = Model(inputs=[inputs], outputs=[outputs])

    return model

In [None]:
metrics=['accuracy', jacard_coef]

In [None]:
def get_model():
    return multi_unet_model(n_classes=6, IMG_HEIGHT=256, IMG_WIDTH=256, IMG_CHANNELS=3)

In [None]:
model = get_model()
model.compile(optimizer='adam', loss=total_loss, metrics=metrics)
model.summary()

# 05. Minor adjustments to avoid Kaggle T4-x2 GPU errors (Optional)

In [None]:
import tensorflow as tf
from tensorflow.keras import backend as K

# Configure GPU memory options through a TF1-style session
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.force_gpu_compatible = True
session = tf.compat.v1.Session(config=config)


In [None]:
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Set the global policy to mixed precision
# Note: the policy applies only to layers created after this call
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)

# Verify the policy
print(f'Mixed precision policy: {tf.keras.mixed_precision.global_policy()}')


In [None]:
import tensorflow as tf
tf.debugging.set_log_device_placement(True)

# 06. Train U-Net model with the preprocessed dataset

In [None]:

history1 = model.fit(X_train, y_train, 
                    batch_size = 16, 
                    verbose=1, 
                    epochs=50, 
                    validation_data=(X_test, y_test), 
                    shuffle=False)

# 07. Save trained model to Kaggle Output

In [None]:
model.save("/kaggle/working/satellite_standard_unet_100epochs.hdf5")

# 08. Plot training and validation accuracy and loss at each epoch

In [None]:
history = history1
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

acc = history.history['jacard_coef']
val_acc = history.history['val_jacard_coef']

plt.plot(epochs, acc, 'y', label='Training IoU')
plt.plot(epochs, val_acc, 'r', label='Validation IoU')
plt.title('Training and validation IoU')
plt.xlabel('Epochs')
plt.ylabel('IoU')
plt.legend()
plt.show()

# 09. Predict Using Saved Model

In [None]:
import tensorflow as tf

import os
os.environ["SM_FRAMEWORK"] = "tf.keras"

from tensorflow.keras.models import load_model
import segmentation_models as sm
from tensorflow.keras import backend as K

In [None]:
# Define the custom loss functions
weights = [0.1666, 0.1666, 0.1666, 0.1666, 0.1666, 0.1666]
dice_loss = sm.losses.DiceLoss(class_weights=weights)
focal_loss = sm.losses.CategoricalFocalLoss()
total_loss = dice_loss + (1 * focal_loss)  # Composite loss function

In [None]:
def jacard_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (intersection + 1.0) / (K.sum(y_true_f) + K.sum(y_pred_f) - intersection + 1.0)

In [None]:
# Path to model
model_path = "/kaggle/working/satellite_standard_unet_100epochs.hdf5"

In [None]:
custom_objects = {
    "dice_loss_plus_1focal_loss": total_loss,
    "jacard_coef": jacard_coef
}

# Load the model with custom objects
model = load_model(model_path, custom_objects=custom_objects)

IoU (Intersection over Union) is a popular metric for evaluating image segmentation, including semantic segmentation with the U-Net model. It measures the overlap between the predicted and ground-truth masks of each class.

Intersection = for a given class, the number of pixels assigned to that class in both y_pred and y_true.

Union = number of pixels that are classified as a given class in y_pred or y_true.

y_pred – model prediction (has shape: [batch, height, width, num_classes]).

np.argmax(..., axis=3) – converts predictions from one-hot or softmax to classes (0, 1, 2, etc.), i.e. selects the most probable class for each pixel.

y_pred_argmax and y_test_argmax – these are now 2D class maps (for each image).
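
The effect of the argmax step is easiest to see on a tiny example. The array below is a made-up softmax output for one image of 1x2 pixels and 3 classes, not real model output.

```python
import numpy as np

# Hypothetical prediction with shape (batch, height, width, num_classes).
demo_pred = np.array([[[[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]]]])

# Pick the most probable class for every pixel.
demo_argmax = np.argmax(demo_pred, axis=3)

print(demo_pred.shape)    # (1, 1, 2, 3)
print(demo_argmax.shape)  # (1, 1, 2)
print(demo_argmax)        # [[[1 0]]]
```

The class axis disappears, leaving one integer class label per pixel.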

In [None]:
#IOU
y_pred=model.predict(X_test)
y_pred_argmax=np.argmax(y_pred, axis=3)
y_test_argmax=np.argmax(y_test, axis=3)

In [None]:
#Using built in keras function for IoU
from keras.metrics import MeanIoU
n_classes = 6
IOU_keras = MeanIoU(num_classes=n_classes)
IOU_keras.update_state(y_test_argmax, y_pred_argmax)
print("Mean IoU =", IOU_keras.result().numpy())

IoU (Intersection over Union) is a metric used to evaluate segmentation performance by comparing the predicted area to the ground truth area:

$$\mathrm{IoU} = \frac{\text{Intersection}}{\text{Union}} = \frac{TP}{TP + FP + FN}$$
TP (True Positive): pixels correctly classified as a given class

FP (False Positive): pixels incorrectly predicted as that class

FN (False Negative): actual pixels of the class that were missed
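
The definition can be applied by hand to a small pair of label maps. The sketch below is a plain NumPy illustration of the formula, with a hypothetical helper `per_class_iou`; it is not the implementation behind Keras' MeanIoU.

```python
import numpy as np


def per_class_iou(y_true, y_pred, n_classes):
    """IoU = TP / (TP + FP + FN) for each class; NaN if the class never occurs."""
    ious = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


# Hypothetical 2x2 ground truth and prediction with two classes.
demo_true = np.array([[0, 0], [1, 1]])
demo_pred_labels = np.array([[0, 1], [1, 1]])

ious = per_class_iou(demo_true, demo_pred_labels, n_classes=2)
print(ious[0])  # 0.5
print(np.mean(ious))
```

Class 0 has one true positive and one false negative, giving 1/2. Class 1 has two true positives and one false positive, giving 2/3. The mean IoU is the average of the per-class values.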



| Mean IoU Range | Segmentation Quality |
|---|---|
| 0.90 – 1.00 | Excellent |
| 0.75 – 0.90 | Very good |
| 0.50 – 0.75 | Decent / Acceptable |
| 0.25 – 0.50 | Weak |
| 0.00 – 0.25 | Very poor |

In [None]:
import random
test_img_number = random.randint(0, len(X_test) - 1)  # randint includes both endpoints
test_img = X_test[test_img_number]
ground_truth=y_test_argmax[test_img_number]
test_img_input=np.expand_dims(test_img, 0)
prediction = (model.predict(test_img_input))
predicted_img=np.argmax(prediction, axis=3)[0,:,:]


plt.figure(figsize=(12, 8))
plt.subplot(231)
plt.title('Testing Image')
plt.imshow(test_img)
plt.subplot(232)
plt.title('Testing Label')
plt.imshow(ground_truth)
plt.subplot(233)
plt.title('Prediction on test image')
plt.imshow(predicted_img)
plt.show()

In [None]:
# Number of images to predict
num_images = 5

for i in range(num_images):
    test_img_number = random.randint(0, len(X_test) - 1)  # randint includes both endpoints
    test_img = X_test[test_img_number]
    ground_truth=y_test_argmax[test_img_number]
    test_img_input=np.expand_dims(test_img, 0)
    prediction = (model.predict(test_img_input))
    predicted_img=np.argmax(prediction, axis=3)[0,:,:]

    plt.figure(figsize=(12, 8))
    plt.subplot(231)
    plt.title('Testing Image')
    plt.imshow(test_img)
    plt.subplot(232)
    plt.title('Testing Label')
    plt.imshow(ground_truth)
    plt.subplot(233)
    plt.title('Prediction on test image')
    plt.imshow(predicted_img)
    plt.show()