# 5. Sentinel-2 application
## Neural Net Mapping of Hudson Bay Sea Ice

The Canadian Ice Service produces weekly regional sea ice charts for ship safety and environmental monitoring. In this project I use a convolutional neural network to automatically generate ice charts from satellite imagery. With increasing availability of satellite data, this network may be able to produce similar ice charts globally at higher detail.

-  Collected 3392 satellite images of Hudson Bay sea ice in the Canadian Arctic from 2016-1-1 to 2018-7-31
-  Generated sea ice concentration masks for each image using Canadian Regional Ice Chart shapefiles
-  Trained a Convolutional Neural Network (U-Net) to generate sea ice charts from satellite images based on eight different classes (seven levels of ice concentration and land)
    -  Model Accuracy: 83%
    -  Model Mean IoU (intersection over union) score: 0.44
- Found a strong class imbalance favoring thick solid ice due to complete freezing in the winter months. Future work should focus on collecting more data during the spring months when ice is thawing and there is a greater variety in ice concentration.
- Future work could also take advantage of additional satellite wavelength collection bands beyond the visible spectrum.

# 1. Data Collection

There are two main data sources for this project: Sentinel-2 satellite images and Canadian Regional Ice Charts. These were used to generate images and masks, respectively.

## 1.1 Sentinel-2 

The Sentinel-2 mission is made up of a pair of satellites that image the globe roughly every 5 days. They capture 12 optical bands including the visible spectrum. Bands 3, 4, and 8 were used for this project, representing green, red, and near-infrared wavelengths, respectively. Sentinelhub provides a Python API for acquiring Sentinel-2 images.
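
As a minimal sketch of how a false-color composite can be assembled from these three bands, assume each band arrives as a 2-D reflectance array scaled to [0, 1]. The helper name and the channel order (near-infrared shown as red, red as green, green as blue) are assumptions for illustration and are not taken from the Sentinelhub API.

```python
import numpy as np

def false_color_composite(b08, b04, b03):
    """Stack NIR, red, and green bands into an (H, W, 3) uint8 image.

    Hypothetical helper: inputs are assumed to be reflectances in [0, 1].
    """
    stacked = np.stack([b08, b04, b03], axis=-1)
    # Clip out-of-range reflectances before scaling to 8-bit
    return (np.clip(stacked, 0.0, 1.0) * 255).astype(np.uint8)

# Toy 2x2 bands
nir = np.array([[0.0, 0.5], [1.0, 1.2]])
red = np.zeros((2, 2))
green = np.ones((2, 2))
composite = false_color_composite(nir, red, green)
print(composite.shape, composite.dtype)
```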

## 1.2 Canadian Regional Ice Charts

Canadian Regional Ice Charts show geospatial sea ice concentrations for ship safety and environmental monitoring. They are produced weekly on Mondays by the Canadian Ice Service for five large regions:

- Hudson Bay
- Western Arctic
- Eastern Arctic
- Eastern Coast
- Great Lakes

This project investigated the Hudson Bay region. A sample ice chart for Hudson Bay on April 12, 2021 is shown below. Each region on the chart has a corresponding set of codes giving information on (among other things) the concentration of sea ice. The table below shows the codes corresponding to ice concentration. All charts are archived and available as shapefiles from the National Snow and Ice Data Center dating back to 2006.

| <img src="Data/Sentinel-2/Images/Ice_Chart_ex.gif" width="600" />  | <img src="Data/Sentinel-2/Images/Chart_Codes.PNG" height="400" /> |  
|:--:|:--:| 
| *Sample Ice Chart for Hudson Bay* | *SIGRID-3 Ice Chart Codes* |

## 1.3 Data Collection Workflow

Data was collected using the EO-Learn Python library, which provides a framework for slicing large geographical areas into smaller, more manageable tiles called EOPatches.

| <img src="Data/Sentinel-2/Images/Region-Grid.png" width="600" />   |  
|:--:|
| *Sliced Hudson Bay region. Image/mask pairs are generated on each tile.* |

After slicing the region, an EO-Learn workflow was developed to acquire satellite images through the Sentinelhub API. The workflow includes filtering steps to remove cloudy images and a custom step to add a time-dependent image mask (from ice chart shapefiles). The data collection workflow loops over each EOPatch and consists of:

- **add_data:** Collect all available satellite images for the EOPatch in false color (bands B03, B04, and B08)
- **remove_dates:** Discard images that were taken more than 36 hours away from an available ice chart
- **add_valid_mask:** Collect a mask for each image that says which pixels are valid data
- **add_coverage:** Collect a mask for each image that says which pixels are blocked by clouds
- **remove_cloudy_scenes:** Remove images where the sum of cloudy and non-valid pixels is greater than 5%
- **time_raster:** Custom task to locate the ice chart temporally closest to the image, locate the area of the chart associated with the image, and rasterize into an ice concentration mask for the image
- **save_im:** Save each image and mask 
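
The two filtering rules in the list above (the 36-hour window and the 5% limit on cloudy plus non-valid pixels) can be sketched in plain Python. This is an illustration under stated assumptions, not the EO-Learn tasks themselves: the function names, the toy dates, and the mask layout (boolean arrays with one entry per pixel) are all hypothetical.

```python
from datetime import datetime, timedelta
import numpy as np

def within_chart_window(image_time, chart_times, max_gap_hours=36):
    """True if the image was taken within max_gap_hours of any ice chart."""
    max_gap = timedelta(hours=max_gap_hours)
    return any(abs(image_time - t) <= max_gap for t in chart_times)

def exceeds_bad_pixel_limit(cloud_mask, valid_mask, max_fraction=0.05):
    """True if cloudy plus non-valid pixels exceed max_fraction of the image."""
    cloudy = np.count_nonzero(cloud_mask)
    invalid = np.count_nonzero(~np.asarray(valid_mask, dtype=bool))
    return (cloudy + invalid) / np.asarray(cloud_mask).size > max_fraction

# Weekly charts on two consecutive Mondays (toy dates)
charts = [datetime(2018, 4, 9), datetime(2018, 4, 16)]
keep_a = within_chart_window(datetime(2018, 4, 10, 10), charts)  # 34 h from a chart
keep_b = within_chart_window(datetime(2018, 4, 12, 12), charts)  # 84 h from both

clouds = np.zeros((10, 10), dtype=bool)
clouds[0, :6] = True                      # 6% of pixels are cloudy
valid = np.ones((10, 10), dtype=bool)
drop_scene = exceeds_bad_pixel_limit(clouds, valid)
```

Summing the two pixel counts follows the wording of the `remove_cloudy_scenes` step; a pixel that is both cloudy and non-valid is counted twice under this reading.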

| <img src="Data/Sentinel-2/Images/image-mask.png" width="600" />     |  
|:--:|
| *Image and mask pair generated through the EO-Learn workflow* |

# 2. Data Processing

## 2.1 Class Definitions

In order to simplify analysis, the 31 SIGRID-3 classes shown in section 1.2 were binned into 8 classes broadly defined as:

- 0: <10% ice
- 1: 10-30% ice
- 2: 30-50% ice
- 3: 50-70% ice
- 4: 70-90% ice
- 5: 90-100% ice
- 6: fast ice (thick ice that is 'fastened' to the coastline)
- 7: land
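
For a single concentration value, the six concentration bins above amount to a threshold lookup. The sketch below is only a rough illustration of these broad definitions: the function name is hypothetical, the choice to place an edge value in the higher class is an assumption, and the notebook's actual mapping works from SIGRID-3 codes (which describe ranges) rather than from a single fraction. Fast ice and land are categorical and are not covered.

```python
import numpy as np

# Bin edges between classes 0-5, as ice fractions
EDGES = [0.1, 0.3, 0.5, 0.7, 0.9]

def concentration_to_class(fraction):
    """Map ice fractions in [0, 1] to classes 0-5 (hypothetical helper)."""
    return np.digitize(fraction, EDGES)

classes = concentration_to_class(np.array([0.05, 0.1, 0.45, 0.95]))
print(classes.tolist())
```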

With these definitions, the pixel-wise distribution of classes across all 3,392 images in the dataset was calculated. There is a strong class imbalance, with open water, 90-100% ice, and land occupying most of the dataset.

|<img src="Data/Sentinel-2/Images/class_dist.png" width="400" /> |
|:--:|
| *Pixel-wise class distribution over all images* |

## 2.2 Data Input Pipeline

Before being fed into the model for training, the following operations were performed on the dataset:

- Image/mask pairs were split into training (80%) and validation (20%) sets
    - The split was stratified based on the most common class represented in each image
- Within the training data, images with high amounts of under-represented pixels were over-sampled (i.e., duplicated in the training set to increase their weight)
    - This helped address the class imbalance in the dataset that would skew a model towards the over-represented classes
- Random image augmentation:
    - Random flip left-right
    - Random flip up-down
    - Random image rotation by ±5 degrees (corners were mapped to black in the image and land in the mask)

The result is a stream of image/mask pairs like this:

|<img src="Data/Sentinel-2/Images/input_image_mask.png" width="600" /> |
|:--:|
| *Sample image/mask pair training data. Note the random image rotation.* |

# 3. Model Building

The goal of the model was to automatically generate a sea ice chart based on a satellite image. This is an image segmentation problem, wherein a model is expected to predict a class for each pixel in an image. 

## 3.1 U-Net

A popular convolutional neural network architecture for image segmentation is the 'U-Net'. It consists of a contraction path (composed of successive convolution, ReLU activation, and max pooling operations) followed by an expansion path. In the expansion path, a combination of up-sampling and concatenation with high resolution images from the contraction path allows the network to localize features of the image at higher and higher resolution until each pixel of the image has a predicted class. A diagram of the basic architecture of the network is shown below.

|<img src="Data/Sentinel-2/Images/U-Net.png" width="600" />  |
|:--:|
| *Base U-Net Architecture. Source: [https://arxiv.org/pdf/1505.04597.pdf](https://arxiv.org/pdf/1505.04597.pdf)* |

## 3.2 Model Definition

The model for this project is an adapted version of a U-Net from the Dstl Satellite Imagery Feature Detection Kaggle competition. That competition also aimed to classify pixels in satellite images, so this model architecture was expected to be a good fit here too. See [here](https://www.kaggle.com/drn01z3/end-to-end-baseline-with-u-net-keras) for the original model writeup. The base model architecture was supplemented with dropout layers to help with over-fitting. A diagram of the final model architecture is shown below.

<p float="left">
  <img src="Data/Sentinel-2/Images/model-map.png" width="400" /> 
</p>

## 3.3 Training and Predictions

The neural network above was trained for 100 epochs (where an epoch is a run through the entire training dataset). Plots of training/validation loss and mean IoU metric are shown below. IoU is also known as the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index).
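
To make the metric concrete, here is a small NumPy sketch of mean IoU on a toy pair of label maps. It is not the Keras metric used in training; the function name is illustrative, and skipping classes that appear in neither array is a convention chosen for this example.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Average the per-class intersection over union of two label arrays."""
    ious = []
    for c in range(num_classes):
        true_c = y_true == c
        pred_c = y_pred == c
        union = np.logical_or(true_c, pred_c).sum()
        if union == 0:
            continue  # class absent from both arrays: leave it out of the mean
        ious.append(np.logical_and(true_c, pred_c).sum() / union)
    return float(np.mean(ious))

y_true = np.array([[0, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])
score = mean_iou(y_true, y_pred, num_classes=2)
# Class 0: 1 shared pixel / 2 in the union; class 1: 2 shared / 3 in the union
print(round(score, 4))
```

Three of the four pixels are correct (75% accuracy), yet the mean IoU is lower because each class is penalized for both missed and spurious pixels.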

|<img src="Data/Sentinel-2/Images/train-val.png" width="600" />   |
|:--:|
| *Model performance over training epochs* |

The lowest validation data loss is achieved after roughly 50 epochs, after which the model begins to over-train. The model weights at this 'optimal' point were saved and used for the final model. A confusion matrix with these weights is below. The model is very good at predicting open water and land, and somewhat poorer at predicting the intermediate ice concentrations (10-90%). This could likely be improved by collecting more images with these intermediate ice concentrations, which would be best achieved by focusing data collection on the springtime/early summer months when the ice is thawing.

|<img src="Data/Sentinel-2/Images/confusion_matrix.png" width="600" />   |
|:--:|
| *Final model confusion matrix for validation data* |

A series of validation data images, true masks, and predicted masks are shown below. The class imbalance of the dataset is apparent here, with land and solid ice dominating many of the images. Nevertheless, the model is able to provide a good prediction of localized ice concentration in many cases. It is also interesting to note that the model provides finer detail than the published ice charts. With additional training data, perhaps an algorithm such as this could be used to develop finer-detailed ice charts than are currently available.


Here, we will collect and save satellite images over the Hudson Bay region. The workflow includes data cleaning, wherein images are filtered based on data availability, cloud cover, and date. The workflow also generates a sea-ice concentration mask for each image using downloaded Canadian Regional Sea Ice Charts.

In [None]:
!pip install tensorflow keras

In [None]:
import os
import glob
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory
from keras.preprocessing.image import array_to_img, img_to_array, load_img
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from PIL import Image 

# # set the necessary directories
# img_dir = 'Data/Images/*.jpg'

# img_filenames = os.listdir(img_dir)
# img_names = [s.split('.')[0] for s in img_filenames]

# img_ext = '.jpg'
# print(np.size(img_names))
img_dir_all = 'Data/Sentinel-2/Data/Images/*.jpg'
img_dir = 'Data/Sentinel-2/Data/Images/'
# Use glob to get the list of filenames
img_filenames = glob.glob(img_dir_all)

# Extract the base names without extensions
img_names = [os.path.basename(s).split('.')[0] for s in img_filenames]
img_ext = '.jpg'
# Print the number of images
print(np.size(img_names))

## Defining Masks
Masks are encoded in SIGRID-3 format. See here for more information: https://library.wmo.int/doc_num.php?explnum_id=9270
We will map the ice concentration codes according to the following dictionary to simplify the classes.

In [None]:
mask_lib = {55:0, #ice free
1:0, #<1/10 (open water)
2:0, #bergy water
10:1, #1/10
12:1, #1/10-2/10
13:1, #1/10-3/10
20:1, #2/10
23:1, #2/10-3/10
24:2, #2/10-4/10
30:2, #...
34:2,
35:2,
40:2,
45:2,
46:3,
50:3,
56:3,
57:3,
60:3,
67:3,
68:4, #...
70:4, #7/10
78:4, #7/10-8/10
79:4, #7/10-9/10
80:4, #8/10
89:4, #8/10-9/10
81:5, #8/10-10/10
90:5, #9/10
91:5, #9/10-10/10
92:6, #10/10 - fast ice
100:7, #land
99:7, #unknown - there is nothing in this class for this dataset
}
#define a colormap for the mask
n_colors=8
ice_colors = n_colors-1
jet = plt.get_cmap('jet', ice_colors)
newcolors = jet(np.linspace(0, 1, ice_colors))
black = np.array([[0, 0, 0, 1]])
white = np.array([[1, 1, 1, 1]])
newcolors = np.concatenate((newcolors, black), axis=0) #land will be black
cmap = ListedColormap(newcolors)
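
A code-to-class dictionary like the one above can also be applied in one vectorized step with a lookup table. This is a sketch under the assumption that masks are `uint8` arrays of SIGRID-3 codes; `build_lut`, the shortened `demo_lib`, and the choice to send unlisted codes to class 7 are illustrative and not part of the notebook.

```python
import numpy as np

def build_lut(lib, size=256, default=7):
    """Build an array where lut[code] is the simplified class for that code."""
    lut = np.full(size, default, dtype=np.uint8)  # assumption: unlisted codes fall back to class 7
    for code, cls in lib.items():
        lut[code] = cls
    return lut

# Shortened stand-in for the full dictionary
demo_lib = {55: 0, 1: 0, 10: 1, 91: 5, 92: 6, 100: 7}
lut = build_lut(demo_lib)

mask = np.array([[55, 10], [91, 100]], dtype=np.uint8)
simplified = lut[mask]  # fancy indexing maps every pixel in a single pass
print(simplified.tolist())
```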

Convert all masks from SIGRID-3 codes to the simplified classes, and store the pixel class counts for each mask in a dataframe.

In [None]:
#function to map mask values according to above library
def map_mask(mask, lib):
    new_mask = mask.copy()
    for key, val in lib.items():#map the elements of the array to their new values according to the library
        new_mask[mask==key]=val
    return new_mask

#function to calculate the value counts over all pixels in an image (fed in as a numpy array)
def bincount_2d(arr, max_int):
    #count every pixel at once; minlength pads classes that do not appear in the image
    counts = np.bincount(np.asarray(arr).ravel(), minlength=max_int)
    return counts[:max_int].tolist()
    
# convert all mask files from SIGRID 3 format to simplified
mask_dir = 'Data/Sentinel-2/Data/Masks/'
mask_ext = '-mask.png'
new_mask_ext = '-mask-mod.png'
dat = []#list that will hold information on the masks
for img_name in img_names:
    name = mask_dir + img_name
    # importing the image
    if os.path.exists(name + mask_ext):
        mask = Image.open(name + mask_ext)

        # converting mask
        mask = np.array(mask)#convert to numpy
        new_mask = map_mask(mask, mask_lib)#map values
    
        #update dataframe
        name = img_name.split('-')  
        d = [img_name, 
             name[0][1:],  #patch id
             name[1][0:4], #year
             name[1][4:6], #month
             name[1][6:8], #day
             name[1][8:10]]#hour
    
        counts = bincount_2d(new_mask, n_colors) #values counts of the class of ice over all pixels in the image
        d.extend(counts)
        dat.append(d)
    
        # exporting the image 
        new_mask = Image.fromarray(new_mask)#convert back to image
        new_mask.save('Data/Sentinel-2/Data/Newmask/' + img_name + new_mask_ext, 'PNG')

mask_dir = 'Data/Sentinel-2/Data/Newmask/'#update mask directory and extension
mask_ext= new_mask_ext

#create dataframe of mask information
mask_df = pd.DataFrame(dat, columns = ['name', 'patch_id', 'year', 'month', 'day', 'hour', 
                            'conc_0', 'conc_1', 'conc_2', 'conc_3', 'conc_4', 'conc_5', 'conc_6',  
                            'conc_land'])

#plot relative frequency of ice concentrations in images
counts = mask_df.iloc[:,6:].sum()
norm = counts.sum()
probs = counts/norm*100

plt.figure(figsize=(8,5))
probs.plot(kind='bar')
plt.ylabel('Fraction of Pixel Values (%)')
plt.grid()

We see that classes for 0%, 90%, and 100% ice concentration and the land class make up most of the pixels. To deal with this class imbalance in our model, we will over-sample images that contain more than 30% of the minority classes.

In [None]:
mask_df['conc_minor']=mask_df[['conc_1', #lists the total concentration of under-represented ice classes
                               'conc_2', 
                               'conc_3', 
                               'conc_4', 
                              ]].sum(axis=1)

n_pixels = mask_df.iloc[0, 6:14].sum()#total number of pixels in each image (columns 6-13 hold the eight class counts; conc_minor is excluded)
over_sample_names = mask_df[mask_df['conc_minor']/n_pixels>0.3] #we will over-sample these images of the under-represented classes
over_sample_names = over_sample_names['name'].values.tolist()

In [None]:
mask_df

## Tensorflow Input Pipeline

In [None]:
class_max = mask_df.iloc[:,6:-1].idxmax(axis=1) #category of the most common class in the image. We will stratify our train test split by this
class_max.value_counts()

In [None]:
!pip install tensorflow_addons scikit-learn

In [None]:
import sys
print(sys.executable)

In [None]:
import tensorflow as tf
print(tf.__version__)

In [None]:
img_names_copy = img_names.copy()
#img_names=img_names[0:1000] #I slice the img_names list to match the dataframe
print(np.size(img_names))

In [None]:
from sklearn.model_selection import train_test_split
# import tensorflow_addons as tfa  # REMOVED

# Simple replacement for tfa.image.rotate
def rotate_image(image, angle, fill_mode='constant', fill_value=0):
    """
    Simple replacement for tfa.image.rotate using native TensorFlow.
    Rotates a single (height, width, channels) image by `angle` radians about its center.
    """
    # Cosine and sine of the rotation angle
    cos_angle = tf.cos(angle)
    sin_angle = tf.sin(angle)
    
    # Get image dimensions
    height = tf.cast(tf.shape(image)[0], tf.float32)
    width = tf.cast(tf.shape(image)[1], tf.float32)
    
    # Center coordinates
    cx, cy = width / 2.0, height / 2.0
    
    # Rotation about the center as a flat projective transform [a0, a1, a2, b0, b1, b2, c0, c1]
    transform = [
        cos_angle, -sin_angle, cx * (1 - cos_angle) + cy * sin_angle,
        sin_angle, cos_angle, cy * (1 - cos_angle) - cx * sin_angle,
        0.0, 0.0
    ]
    
    # Integer inputs are class masks: use nearest-neighbour so no in-between class values are created.
    # (Images and masks are both rank 3 here, so the rank cannot be used to tell them apart.)
    interpolation = 'NEAREST' if tf.as_dtype(image.dtype).is_integer else 'BILINEAR'
    
    # Apply transformation
    rotated = tf.raw_ops.ImageProjectiveTransformV3(
        images=tf.expand_dims(image, 0),
        transforms=tf.reshape(transform, [1, 8]),
        output_shape=tf.shape(image)[:2],
        fill_mode=fill_mode.upper(),
        fill_value=tf.cast(fill_value, tf.float32),
        interpolation=interpolation
    )
    
    return tf.squeeze(rotated, 0)

# pick which images we will use for training and which for validation
names = mask_df['name'].values.tolist() #only images that have a mask, in the same order as class_max
train_names, validation_names, train_max, validation_max = train_test_split(names, class_max, 
                                                                            train_size=0.8, test_size=0.2, 
                                                                            random_state=0, stratify=class_max)
#add over-sampled images to the train dataset
train_over_sample_names = np.array([name for name in train_names if name in over_sample_names])
N_over_sample = int(len(train_names)/1.5) #number of additional samples to add
ids = np.arange(len(train_over_sample_names))
choices = np.random.choice(ids, N_over_sample)#an additional set of images to add on to the train names
add_train_names = train_over_sample_names[choices].tolist()
train_names.extend(add_train_names)

IMG_SIZE = (256, 256)

#function to read image and mask from file
def read_image(image_name):
    image = tf.io.read_file(img_dir + image_name + img_ext)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    image = tf.cast(image, tf.float32) / 255.0
    
    mask = tf.io.read_file(mask_dir + image_name + mask_ext)
    mask = tf.image.decode_image(mask, channels=1, expand_animations=False)
    mask = tf.image.resize(mask, IMG_SIZE, method='nearest') #nearest-neighbour keeps every mask pixel a valid class label
    mask = tf.cast(mask, tf.uint8)
    return image, mask

import random

#image augmentation function to randomly flip and rotate each image and corresponding mask
def augment_image(image, mask):
    n = tf.random.uniform([], 0,1)
    if n<0.5: 
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
        
    n = tf.random.uniform([], 0,1)
    if n<0.5: 
        image = tf.image.flip_up_down(image)
        mask = tf.image.flip_up_down(mask)
    
    #rotate image randomly in the range of +-5 degrees
    n = tf.random.uniform([], -1,1)
    image = rotate_image(image, np.pi/36*n, fill_mode='constant', fill_value=0) #add black to rotated corners
    mask = rotate_image(mask, np.pi/36*n, fill_mode='constant', fill_value=7) #make this black space correspond to land
    return image, mask

TRAIN_LENGTH = int(len(train_names))
VAL_LENGTH = int(len(validation_names))
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

ds_train = tf.data.Dataset.from_tensor_slices((train_names))#read filenames
ds_train = ds_train.map(read_image, num_parallel_calls=tf.data.AUTOTUNE) #convert filenames to stream of images/masks
ds_train = ds_train.cache() #cache the decoded images before augmentation so every epoch gets fresh random flips/rotations
ds_train = ds_train.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE) #randomly flip and rotate each image/mask pair
train_dataset = ds_train.shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

ds_val = tf.data.Dataset.from_tensor_slices((validation_names))#read filenames
ds_val = ds_val.map(read_image) #convert filenames to stream of images/masks
val_dataset = ds_val.batch(BATCH_SIZE)
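
The eight coefficients passed to `ImageProjectiveTransformV3` map each output pixel (x, y) to the input location ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k) with k = c0 x + c1 y + 1. As a sanity check, the NumPy sketch below rebuilds the same coefficients and confirms that they describe a rotation about the image center: the center maps to itself and a corner keeps its distance from the center. The helper names are illustrative.

```python
import numpy as np

def rotation_transform(angle, width, height):
    """Flat projective transform for a rotation by `angle` about the image center."""
    c, s = np.cos(angle), np.sin(angle)
    cx, cy = width / 2.0, height / 2.0
    return np.array([c, -s, cx * (1 - c) + cy * s,
                     s, c, cy * (1 - c) - cx * s,
                     0.0, 0.0])

def apply_transform(t, x, y):
    """Map an output point (x, y) to its input location."""
    a0, a1, a2, b0, b1, b2, c0, c1 = t
    k = c0 * x + c1 * y + 1.0
    return (a0 * x + a1 * y + a2) / k, (b0 * x + b1 * y + b2) / k

t = rotation_transform(np.pi / 36, 256, 256)   # 5 degrees, the augmentation limit
center = apply_transform(t, 128.0, 128.0)
corner = apply_transform(t, 0.0, 0.0)
corner_radius = np.hypot(corner[0] - 128.0, corner[1] - 128.0)
print(center, corner_radius)
```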

## Display Sample Image and Mask

In [None]:
def display(display_list):
    fig, axs = plt.subplots(nrows=1, ncols = len(display_list), figsize=(15, 6))

    title = ['Input Image', 'True Mask', 'Predicted Mask']

    for i in range(len(display_list)):
        axs[i].set_title(title[i])
        if i==0:
            axs[i].imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
        else:
            msk = axs[i].imshow(display_list[i], cmap = cmap, vmin=0, vmax=n_colors-1)
        axs[i].axis('off')
        
    #plot colorbar
    cbar = fig.colorbar(msk, ax=axs, location='right')
    tick_locs = (np.arange(n_colors) + 0.5)*(n_colors-1)/n_colors#new tick locations so they are in the middle of the colorbar
    cbar.set_ticks(tick_locs)
    cbar.set_ticklabels(np.arange(n_colors))
    plt.show()

for image, mask in ds_train.take(11):
    sample_image, sample_mask = image, mask
display([sample_image, sample_mask])

In [None]:
# add_train_names

In [None]:
for add_name in add_train_names: #use a separate loop variable so the add_train_names list is not overwritten
    name = mask_dir + add_name
    # importing the mask
    if os.path.exists(name + mask_ext):
        #masks in mask_dir were already mapped to the simplified classes above, so they are only counted here;
        #running map_mask a second time would scramble the class values
        new_mask = np.array(Image.open(name + mask_ext))#convert to numpy
        #update dataframe
        name = add_name.split('-')
        d = [add_name,
             name[0][1:], #patch id
             name[1][0:4], #year
             name[1][4:6], #month
             name[1][6:8], #day
             name[1][8:10]]#hour
        counts = bincount_2d(new_mask, n_colors) #values counts of the class of ice over all pixels in the image
        d.extend(counts)
        dat.append(d)

mask_dir = 'Data/Sentinel-2/Data/Newmask/'#update mask directory and extension
mask_ext= new_mask_ext
#create dataframe of mask information, now including the over-sampled images
mask_df = pd.DataFrame(dat, columns = ['name', 'patch_id', 'year', 'month', 'day', 'hour',
'conc_0', 'conc_1', 'conc_2', 'conc_3', 'conc_4', 'conc_5', 'conc_6',
'conc_land'])
#plot relative frequency of ice concentrations in images
counts = mask_df.iloc[:,6:].sum()
norm = counts.sum()
probs = counts/norm*100
plt.figure(figsize=(8,5))
probs.plot(kind='bar')
plt.ylabel('Fraction of Pixel Values (%)')
plt.grid()

## Display Sample Image and Mask

In [None]:
import PIL
print(PIL.__version__)
from keras.preprocessing.image import array_to_img

In [None]:
def display(display_list):
    fig, axs = plt.subplots(nrows=1, ncols = len(display_list), figsize=(15, 6))
    title = ['Input Image', 'True Mask', 'Predicted Mask']
    for i in range(len(display_list)):
        axs[i].set_title(title[i])
        if i==0:
            axs[i].imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
        else:
            msk = axs[i].imshow(display_list[i], cmap = cmap, vmin=0, vmax=n_colors-1)
        axs[i].axis('off')
    #plot colorbar
    cbar = fig.colorbar(msk, ax=axs, location='right')
    tick_locs = (np.arange(n_colors) + 0.5)*(n_colors-1)/n_colors#new tick locations so they are in the middle of the colorbar
    cbar.set_ticks(tick_locs)
    cbar.set_ticklabels(np.arange(n_colors))
    plt.show()

In [None]:
for image, mask in ds_train.take(20):
    sample_image, sample_mask = image, mask
    display([sample_image, sample_mask])

## Define Model
This model is an adapted version of a U-Net from the Dstl Satellite Imagery Feature Detection Kaggle competition. That competition also aimed to classify pixels in satellite images, so this model architecture might be a good fit here too. https://www.kaggle.com/drn01z3/end-to-end-baseline-with-u-net-keras

In [None]:
!pip install pydot

In [None]:
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate, Dropout
from tensorflow.keras import Model
def get_unet():
    inputs = Input(shape=[IMG_SIZE[0], IMG_SIZE[1], 3])
    conv1 = Conv2D(32, 3, 1, activation='relu', padding='same')(inputs)
    conv1 = Conv2D(32, 3, 1, activation='relu', padding='same')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    drop1 = Dropout(0.5)(pool1)

    conv2 = Conv2D(64, 3, 1, activation='relu', padding='same')(drop1)
    conv2 = Conv2D(64, 3, 1, activation='relu', padding='same')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    drop2 = Dropout(0.5)(pool2)

    conv3 = Conv2D(128, 3, 1, activation='relu', padding='same')(drop2)
    conv3 = Conv2D(128, 3, 1, activation='relu', padding='same')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    drop3 = Dropout(0.5)(pool3)

    conv4 = Conv2D(256, 3, 1, activation='relu', padding='same')(drop3)
    conv4 = Conv2D(256, 3, 1, activation='relu', padding='same')(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(conv4)
    drop4 = Dropout(0.5)(pool4)

    conv5 = Conv2D(512, 3, 1, activation='relu', padding='same')(drop4)
    conv5 = Conv2D(512, 3, 1, activation='relu', padding='same')(conv5)

    up6 = Conv2D(256, 3, activation = 'relu', padding = 'same')(UpSampling2D(size=(2, 2))(conv5))
    merge6 = concatenate([up6, conv4], axis=3)
    drop6 = Dropout(0.5)(merge6)
    conv6 = Conv2D(256, 3, 1, activation='relu', padding='same')(drop6)
    conv6 = Conv2D(256, 3, 1, activation='relu', padding='same')(conv6)
    
    up7 = Conv2D(128, 3, activation = 'relu', padding = 'same')(UpSampling2D(size=(2, 2))(conv6))
    merge7 = concatenate([up7, conv3], axis=3)
    drop7 = Dropout(0.5)(merge7)
    conv7 = Conv2D(128, 3, 1, activation='relu', padding='same')(drop7)
    conv7 = Conv2D(128, 3, 1, activation='relu', padding='same')(conv7)
    
    up8 = Conv2D(64, 3, activation = 'relu', padding = 'same')(UpSampling2D(size=(2, 2))(conv7))
    merge8 = concatenate([up8, conv2], axis=3)
    drop8 = Dropout(0.5)(merge8)
    conv8 = Conv2D(64, 3, 1, activation='relu', padding='same')(drop8)
    conv8 = Conv2D(64, 3, 1, activation='relu', padding='same')(conv8)
    
    up9 = Conv2D(32, 3, activation = 'relu', padding = 'same')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = concatenate([up9, conv1], axis=3)
    drop9 = Dropout(0.5)(merge9)
    conv9 = Conv2D(32, 3, 1, activation='relu', padding='same')(drop9)
    conv9 = Conv2D(32, 3, 1, activation='relu', padding='same')(conv9)

    conv10 = Conv2D(n_colors, 1, 1, activation='softmax')(conv9) #softmax converts the output to a list of probabilities that must sum to 1

    model = Model(inputs=inputs, outputs=conv10)
    return model

model = get_unet() 
tf.keras.utils.plot_model(model, show_shapes=True)
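
Because every convolution uses `padding='same'`, only the pooling layers change the spatial size, so the feature-map shapes along the contraction path follow from simple arithmetic. The helper below is a hypothetical aid for checking that each skip connection concatenates tensors of matching height and width; it does not build any layers.

```python
def unet_shapes(size=256, base_filters=32, depth=4):
    """Shapes (height, width, channels) after the conv block at each level."""
    shapes = []
    filters = base_filters
    for _ in range(depth):
        shapes.append((size, size, filters))
        size //= 2      # 2x2 max pooling halves height and width
        filters *= 2    # the next block doubles the number of filters
    shapes.append((size, size, filters))  # bottleneck (conv5 in the model above)
    return shapes

for level, shape in enumerate(unet_shapes(), start=1):
    print(f"conv{level}: {shape}")
```

The expansion path mirrors these shapes in reverse: each up-sampling step doubles the height and width so that `up6` lines up with `conv4`, `up7` with `conv3`, and so on.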

## Train Model

In [None]:
#function to generate a mask from the model predictions
def create_mask(pred_mask, ele=0):
    pred_mask = tf.argmax(pred_mask, axis=-1)#use the highest probability class as the prediction
    pred_mask = pred_mask[..., tf.newaxis]
    return pred_mask[ele]

#helper functions to plot image, mask, and predicted mask while training
def show_predictions(dataset=None, num=1, ele=0):
    if dataset:
        for image, mask in dataset.take(num):
            pred_mask = model.predict(image)
            display([image[ele], mask[ele], create_mask(pred_mask, ele)])
    else:
        display([sample_image, sample_mask, create_mask(model.predict(sample_image[tf.newaxis, ...]))])

#function to display loss during training
def plot_loss_acc(loss, val_loss, epoch):#, acc, val_acc, epoch):
    
    epochs = range(epoch+1)
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8,5))

    ax.plot(epochs, loss, 'r', label='Training loss')
    ax.plot(epochs, val_loss, 'bo', label='Validation loss')
    ax.set_title('Training and Validation Loss')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss Value')
    ax.legend()
    plt.show()
    
#callback to clear output and show predictions
from IPython.display import clear_output

class DisplayCallback(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.loss = []
        self.val_loss = []
    
    def on_epoch_end(self, epoch, logs=None):
        clear_output(wait=True)
        
        self.loss.append(logs['loss'])
        self.val_loss.append(logs['val_loss'])
        
        show_predictions()
        plot_loss_acc(self.loss, self.val_loss, epoch)
        
#callback to reduce learning rate when loss plateaus
lr_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=8, verbose=1,)

#Define IoU metric (by stack overflow user HuckleberryFinn)
class UpdatedMeanIoU(tf.keras.metrics.MeanIoU):
    def __init__(self,
               y_true=None,
               y_pred=None,
               num_classes=None,
               name=None,
               dtype=None):
        super(UpdatedMeanIoU, self).__init__(num_classes = num_classes,name=name, dtype=dtype)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.math.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)

# Create a callback that saves the model every 5 epochs
checkpoint_path = "Data/Sentinel-2/training/cp-{epoch:04d}.keras"
checkpoint_dir = os.path.dirname(checkpoint_path)

# An integer save_freq counts batches, so 5 epochs is 5*STEPS_PER_EPOCH batches
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    # save_weights_only=True,
    save_freq=5*STEPS_PER_EPOCH)
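
When `save_freq` is an integer, Keras counts batches rather than epochs, so a checkpoint every N epochs needs N times the number of batches per epoch. The arithmetic is sketched below with made-up numbers; the helper name and the dataset size are illustrative only.

```python
def save_freq_for_epochs(n_epochs, train_length, batch_size):
    """Number of batches after which a checkpoint should be written."""
    steps_per_epoch = train_length // batch_size
    return n_epochs * steps_per_epoch

# Hypothetical training set of 4500 images with a batch size of 64
freq = save_freq_for_epochs(n_epochs=5, train_length=4500, batch_size=64)
print(freq)
```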

## First try - default settings with 10 epochs

In [None]:
#train model first try
model=get_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy', UpdatedMeanIoU(num_classes=n_colors)])

In [None]:
EPOCHS = 10
VAL_SUBSPLITS = 5
VALIDATION_STEPS = VAL_LENGTH//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_dataset, 
                          epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=val_dataset,
                          callbacks=[DisplayCallback(), lr_callback, cp_callback])

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,5))
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
epochs = range(EPOCHS)
ax[0].plot(epochs, loss, 'r', label='Training')
ax[0].plot(epochs, val_loss, 'bo', label='Validation')
ax[0].set_title('Training and Validation Loss')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Loss Value')
ax[0].legend()
IoU_key = list(model_history.history.keys())[2]
acc = model_history.history[IoU_key]
val_acc = model_history.history['val_'+IoU_key]
ax[1].plot(epochs, acc, 'r', label='Training')
ax[1].plot(epochs, val_acc, 'bo', label='Validation')
ax[1].set_title('Training and Validation IoU')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('IoU Value')
ax[1].legend()
#plt.show()
plt.savefig('10E_sparse_categorical_accuracy.png', dpi=300, bbox_inches='tight')

In [None]:
#load the weights from the checkpoint saved at epoch 5
# print(os.listdir(checkpoint_dir))
model.load_weights(checkpoint_dir + '/cp-0005.keras')
scores = model.evaluate(val_dataset, verbose=0)
print('Final Model Validation Scores')
print('Loss: {:.3f}'.format(scores[0]))
print('Accuracy: {:.3f}'.format(scores[1]))
print('IoU: {:.3f}'.format(scores[2]))

In [None]:
!pip install seaborn

In [None]:
#plot a confusion matrix for the first try
from sklearn.metrics import confusion_matrix
import seaborn as sns

def get_cm(model, val_ds, num_classes=8):
    labels = np.arange(num_classes) #fix the label set so every batch gives a num_classes x num_classes matrix
    cm = np.zeros((num_classes, num_classes))
    for img_batch, mask_batch in val_ds: #iterate over the dataset passed in, not the global
        pred_batch = model.predict(img_batch, verbose=0)
        pred_batch = tf.argmax(pred_batch, axis=-1)#take the highest probability as the prediction for each pixel
        y_pred = np.array(pred_batch).flatten() #flattened array of predicted pixels for the batch
        y_true = np.array(mask_batch).flatten() #flattened array of mask pixels for the batch
        cm = cm + confusion_matrix(y_true, y_pred, labels=labels)
    return cm

cm = get_cm(model, val_dataset)
plt.figure(figsize=(12,8))
sns.heatmap(cm.astype(int), annot=True, fmt="d")
plt.title('Confusion matrix')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

In [None]:
# Normalize the confusion matrix by rows (actual values)
cm_row_normalized = cm.astype(float) / cm.sum(axis=1, keepdims=True)
# Plot row-normalized confusion matrix
plt.figure(figsize=(12, 8))
sns.heatmap(cm_row_normalized, annot=True, fmt=".2%", cmap="YlGnBu")
plt.title('Row-Normalized Confusion Matrix (Rows sum to 100%)')
plt.ylabel('Actual Label')
plt.xlabel('Predicted Label')
plt.show()

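Per-class IoU can be read directly from a confusion matrix: for class c it is TP / (TP + FP + FN), where TP is the diagonal entry, FP is the rest of column c and FN is the rest of row c. The sketch below uses a made-up 3x3 matrix rather than the notebook's `cm`. The function name is hypothetical, and returning NaN for classes that never occur is an assumption of this sketch.

```python
import numpy as np

def iou_from_confusion_matrix(cm):
    """Per-class IoU from a square confusion matrix (rows = actual, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp  # actually class c but predicted as another class
    denom = tp + fp + fn
    # Classes absent from both labels and predictions get NaN so they can be skipped
    with np.errstate(divide='ignore', invalid='ignore'):
        iou = np.where(denom > 0, tp / denom, np.nan)
    return iou

# Toy matrix: class 2 never appears in the labels or the predictions
toy_cm = np.array([[5, 1, 0],
                   [2, 3, 0],
                   [0, 0, 0]])
per_class_iou = iou_from_confusion_matrix(toy_cm)
mean_iou = np.nanmean(per_class_iou)
print(per_class_iou, mean_iou)
```

Applied to the 8x8 matrix from `get_cm`, the per-class values would show which ice concentration classes pull the mean IoU down.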
## Different metric

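The Jaccard coefficient used in this section is intersection over union, computed per class and then averaged. It can be evaluated on the predicted probabilities (soft) or on the hard per-pixel class decisions. The NumPy sketch below mirrors that arithmetic on a toy example of four pixels and two classes. All names and values are illustrative and are not taken from the notebook.

```python
import numpy as np

def jaccard_per_class(y_true_one_hot, y_pred, smooth=1e-12):
    # Sum over every axis except the last (class) axis
    axes = tuple(range(y_true_one_hot.ndim - 1))
    intersection = np.sum(y_true_one_hot * y_pred, axis=axes)
    total = np.sum(y_true_one_hot + y_pred, axis=axes)
    return (intersection + smooth) / (total - intersection + smooth)

# Toy example: 4 pixels, 2 classes
labels = np.array([0, 0, 1, 1])
y_true_one_hot = np.eye(2)[labels]
probs = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.3, 0.7],
                  [0.8, 0.2]])  # the last pixel is misclassified
hard = np.eye(2)[probs.argmax(axis=-1)]  # one-hot of the most likely class

soft_jaccard = jaccard_per_class(y_true_one_hot, probs).mean()
hard_jaccard = jaccard_per_class(y_true_one_hot, hard).mean()
print(soft_jaccard, hard_jaccard)
```

The soft value also penalises low-confidence correct pixels, so it generally differs from the hard value even when the predicted classes are identical.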
In [None]:
import tensorflow.keras.backend as K

# second training run: 10 epochs, 1000 images, custom Jaccard metrics
smooth = 1e-12

def jaccard_coef(y_true, y_pred):
    # **author** = Vladimir Iglovikov (modified for sparse categorical)
    # Convert sparse labels to one-hot for Jaccard calculation
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.int32), depth=n_colors)
    y_true_one_hot = tf.squeeze(y_true_one_hot, axis=-2)  # Remove extra dimension
    
    intersection = K.sum(y_true_one_hot * y_pred, axis=[0, 1, 2])
    sum_ = K.sum(y_true_one_hot + y_pred, axis=[0, 1, 2])
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return K.mean(jac)


def jaccard_coef_int(y_true, y_pred):
    # **author** = Vladimir Iglovikov (modified for sparse categorical)
    # The loss uses from_logits=False, so y_pred is assumed to hold probabilities already;
    # take the hard per-pixel class decision instead of applying softmax a second time
    y_pred_pos = tf.one_hot(tf.argmax(y_pred, axis=-1), depth=n_colors)
    y_true_one_hot = tf.one_hot(tf.cast(y_true, tf.int32), depth=n_colors)
    y_true_one_hot = tf.squeeze(y_true_one_hot, axis=-2)  # Remove extra dimension
    
    intersection = K.sum(y_true_one_hot * y_pred_pos, axis=[0, 1, 2])
    sum_ = K.sum(y_true_one_hot + y_pred_pos, axis=[0, 1, 2])
    jac = (intersection + smooth) / (sum_ - intersection + smooth)
    return K.mean(jac)

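# Alternative: built-in MeanIoU on hard class predictions. A new metric object is created on
# each call, so this returns the IoU of a single batch; it is not passed to compile() below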
def mean_iou_metric(y_true, y_pred):
    # Convert predictions to class predictions
    y_pred_classes = tf.argmax(y_pred, axis=-1)
    y_true_squeeze = tf.squeeze(y_true, axis=-1)
    
    # Use TensorFlow's built-in MeanIoU
    m = tf.keras.metrics.MeanIoU(num_classes=n_colors)
    m.update_state(y_true_squeeze, y_pred_classes)
    return m.result()

model = get_unet()

# Option 1: Use custom Jaccard metrics
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), 
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), 
    metrics=[jaccard_coef, jaccard_coef_int, 'sparse_categorical_accuracy']
)

EPOCHS = 10
VAL_SUBSPLITS = 5
VALIDATION_STEPS = VAL_LENGTH//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_dataset,
                          epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=val_dataset,
                          callbacks=[DisplayCallback(), lr_callback, cp_callback])

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,5))
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
epochs = range(EPOCHS)
ax[0].plot(epochs, loss, 'r', label='Training')
ax[0].plot(epochs, val_loss, 'bo', label='Validation')
ax[0].set_title('Training and Validation Loss')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Loss Value')
ax[0].legend()
IoU_key = list(model_history.history.keys())[2]
acc = model_history.history[IoU_key]
val_acc = model_history.history['val_'+IoU_key]
ax[1].plot(epochs, acc, 'r', label='Training')
ax[1].plot(epochs, val_acc, 'bo', label='Validation')
ax[1].set_title('Training and Validation IoU')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('IoU Value')
ax[1].legend()
#plt.show()
plt.savefig('10E_Jaccard_coef.png', dpi=300, bbox_inches='tight')

In [None]:
#load weights for checkpoint 5
# print(os.listdir(checkpoint_dir))
model.load_weights(checkpoint_dir + '/cp-0005.ckpt')
scores = model.evaluate(val_dataset, verbose=0)
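print('Final Model Validation Scores')
# evaluate() returns the loss followed by the metrics in compile() order:
# jaccard_coef, jaccard_coef_int, sparse_categorical_accuracy
print('Loss: {:.3f}'.format(scores[0]))
print('Jaccard coefficient: {:.3f}'.format(scores[1]))
print('Jaccard coefficient (int): {:.3f}'.format(scores[2]))
print('Accuracy: {:.3f}'.format(scores[3]))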

In [None]:
show_predictions(val_dataset, num=10, ele=3)

## Different loss function

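Mean squared error compares two arrays element by element. When the masks hold integer class ids and the network outputs one probability per class, the masks need to be one-hot encoded first for the error to be meaningful. The NumPy sketch below shows the difference on two toy pixels. The shapes and values are assumptions made for illustration, not data from the notebook.

```python
import numpy as np

n_classes = 3  # toy value; the notebook uses n_colors
labels = np.array([[0], [2]])        # sparse masks, shape (pixels, 1)
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.2, 0.6]])  # per-class probabilities, shape (pixels, classes)

# Broadcasting the raw class ids against the probabilities compares
# e.g. the id 2 with the probability 0.6, which is not a meaningful error
naive_mse = np.mean((labels - probs) ** 2)

# One-hot encoding first gives the squared error between two distributions
one_hot = np.eye(n_classes)[labels[:, 0]]
one_hot_mse = np.mean((one_hot - probs) ** 2)
print(naive_mse, one_hot_mse)
```

The one-hot version is bounded between 0 and 1, while the naive version grows with the numeric value of the class id.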
In [None]:
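# train model with mean squared error as loss function
# the masks hold integer class ids, so one-hot encode them before comparing with the per-class output
def one_hot_mse(y_true, y_pred):
    y_true_one_hot = tf.one_hot(tf.cast(tf.squeeze(y_true, axis=-1), tf.int32), depth=n_colors)
    return tf.reduce_mean(tf.square(y_true_one_hot - y_pred), axis=-1)

model = get_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=one_hot_mse,
              metrics=['sparse_categorical_accuracy', UpdatedMeanIoU(num_classes=n_colors)])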

In [None]:
EPOCHS = 10
VAL_SUBSPLITS = 5
VALIDATION_STEPS = VAL_LENGTH//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_dataset,
                          epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=val_dataset,
                          callbacks=[DisplayCallback(), lr_callback, cp_callback])

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15,5))
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
epochs = range(EPOCHS)
ax[0].plot(epochs, loss, 'r', label='Training')
ax[0].plot(epochs, val_loss, 'bo', label='Validation')
ax[0].set_title('Training and Validation Loss')
ax[0].set_xlabel('Epoch')
ax[0].set_ylabel('Loss Value')
ax[0].legend()
IoU_key = list(model_history.history.keys())[2]
acc = model_history.history[IoU_key]
val_acc = model_history.history['val_'+IoU_key]
ax[1].plot(epochs, acc, 'r', label='Training')
ax[1].plot(epochs, val_acc, 'bo', label='Validation')
ax[1].set_title('Training and Validation IoU')
ax[1].set_xlabel('Epoch')
ax[1].set_ylabel('IoU Value')
ax[1].legend()
#plt.show()
plt.savefig('10E_mean_squared_error.png', dpi=300, bbox_inches='tight')  # separate file so the Jaccard plot is not overwritten