# Test task - Airbus ship detection

### By Rostyslav Kostiuk

## Pre-coding part

Before starting this task I had to prepare a little, because I had never worked on image segmentation before. I started with the concept of convolutional neural networks and, to remind myself how they work, watched an MIT lecture on them. There I noticed an architecture commonly used for this kind of problem: U-Net. To understand how it works, I went through a whole playlist on the topic and read a couple of articles. After that I was almost ready, but I still had to deal with the nuances of this particular challenge, so I watched a few more videos in which people shared their experience of participating in it (no code).

In the end, given that I am still in the middle of my exam session, I finished going through the theory on Friday, which left 3 days for the coding and training itself.

## Preparation
Before running the key blocks, let's install and import the libraries used in the solution.

In [None]:
!pip uninstall -y tensorflow-io
!pip install -U -q segmentation_models

In [None]:
import numpy as np
import pandas as pd
from PIL import Image
from albumentations import HorizontalFlip, VerticalFlip, Compose, RandomBrightnessContrast,ShiftScaleRotate, GaussNoise
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"

from tensorflow import keras
import cv2
import segmentation_models as sm
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array, load_img
import random
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.mixed_precision import set_global_policy
from sklearn.metrics import jaccard_score

## Key idea
The key idea behind this task is to develop a two-step solution for the Airbus Ship Detection Challenge. The first step involves creating a classification model to determine whether or not an image contains a ship. The second step involves creating a segmentation model to identify the exact location of the ship in the images that were classified as containing a ship. The classification model helps to reduce the computational load for the segmentation model by filtering out images without ships.
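
The two-step idea can be sketched as a small gating function. This is only an illustration under stated assumptions: `classify_fn` and `segment_fn` are hypothetical callables standing in for the trained models, and the toy versions below exist purely so the sketch runs without weights.

```python
import numpy as np

def two_stage_predict(patch, classify_fn, segment_fn, threshold=0.5):
    """Return a binary mask for one patch, segmenting only if the classifier fires.

    classify_fn (hypothetical) returns a ship probability for the patch;
    segment_fn (hypothetical) returns a per-pixel probability map with the
    same height and width as the patch.
    """
    height, width = patch.shape[:2]
    if classify_fn(patch) < threshold:
        # Classifier says "no ship": skip the expensive segmentation model.
        return np.zeros((height, width), dtype=np.uint8)
    probs = segment_fn(patch)
    return (probs > threshold).astype(np.uint8)


# Toy stand-ins so the sketch runs without trained weights.
def toy_classifier(patch):
    return float(patch.max() > 200)      # "ship" if any very bright pixel

def toy_segmenter(patch):
    return patch.mean(axis=-1) / 255.0   # brightness as a fake probability

sea = np.zeros((256, 256, 3), dtype=np.uint8)
ship = sea.copy()
ship[100:110, 50:90] = 255               # a 10 x 40 bright rectangle

empty_mask = two_stage_predict(sea, toy_classifier, toy_segmenter)
ship_mask = two_stage_predict(ship, toy_classifier, toy_segmenter)
print(empty_mask.sum(), ship_mask.sum())
```

The segmentation callable is never invoked for the empty patch, which is where the saving in computation comes from.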

## Classification model - training data
The first step in training a classification model is preparing the data. In this task, we have a set of images, each of which may or may not contain a ship. The images are labeled accordingly, providing us with a supervised learning problem.

First, I prepared two helper functions: one crops an image into 256 x 256 patches, and the other converts the run-length encoding into a binary mask.

In [None]:
def crop_image(image, crop_size=256):
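    if isinstance(image, str):  # If the image is a file path
        img = Image.open(image)
    elif isinstance(image, np.ndarray):  # If the image is a numpy array
        img = Image.fromarray(image)
    else:  # Fail early instead of hitting an undefined variable below
        raise TypeError("image must be a file path or a numpy array")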

    width, height = img.size

    crops = []
    for i in range(0, height, crop_size):
        for j in range(0, width, crop_size):
            box = (j, i, j + crop_size, i + crop_size)
            crop = img.crop(box)
            crops.append(np.array(crop))

    return crops


def rle_to_mask(rle_list, shape=(768, 768)):
    # Convert the run-length encoding to a binary mask
    mask = np.zeros(shape[0]*shape[1], dtype=np.uint8)

    for rle in rle_list:
        if pd.isnull(rle):  # If the RLE is NaN, skip this loop iteration
            continue
        starts, lengths = map(np.asarray, (rle.split()[0:][::2], rle.split()[1:][::2]))
        starts = starts.astype(int) - 1
        lengths = lengths.astype(int)  # Convert lengths to int
        ends = starts + lengths
        for start, end in zip(starts, ends):
            mask[start:end] = 1

    return mask.reshape(shape).T  # Reshape the mask
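
To make the decoding step concrete, here is a minimal worked example on a 3 x 3 grid. It assumes the challenge's convention that pixels are 1-indexed and numbered top to bottom, then left to right; `decode_rle` is a hypothetical helper written for this illustration, not the function used above.

```python
import numpy as np

def decode_rle(rle, height, width):
    """Decode one 'start length start length ...' string into a (height, width) mask."""
    tokens = np.array(rle.split(), dtype=int)
    starts, lengths = tokens[0::2] - 1, tokens[1::2]   # starts are 1-indexed
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # Pixels are numbered down each column first, so fill as (width, height)
    # and transpose to get the usual (row, column) layout.
    return flat.reshape((width, height)).T

tiny = decode_rle("2 3 8 1", height=3, width=3)
print(tiny)
# [[0 1 0]
#  [1 0 1]
#  [1 0 0]]
```

The run "2 3" covers pixels 2, 3 and 4: the last two cells of the first column and the first cell of the second column. That wrap-around is why the transpose is needed.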

Here we just read the segmentation CSV, so change the path to point to your file.

In [None]:
# Load the CSV file into a DataFrame
df = pd.read_csv('path_to_file.csv')

# Group the DataFrame by 'ImageId' to get a list of masks for each image
grouped = df.groupby('ImageId')['EncodedPixels'].apply(list)

As a last step, we create the image patches and their corresponding labels. We crop each image and its respective mask into patches; if a patch's binary mask contains at least one ship pixel, we label that patch with 1.

In [None]:
# Initialize empty lists to store the image patches and labels
patch_images = []
labels = []

# Loop over the grouped DataFrame
for filename, rle_list in grouped[:int(len(grouped)*0.012)].items():
    image_path = os.path.join('path_to_image_folder', filename)
    # Crop the image into patches
    crops = crop_image(image_path)
    # Convert the run-length encoding to a binary mask
    mask = rle_to_mask(rle_list)
    # Crop the mask into patches
    mask_crops = crop_image(mask)
    # Loop over the image and mask patches
    for img, msk in zip(crops, mask_crops):
        # Label the patch as '1' if it contains a ship (mask is not all zeros), '0' otherwise
        label = 1 if np.any(msk) else 0
        # Append the image patch and its label to their respective lists
        patch_images.append(img)
        labels.append(label)

# Convert the lists into numpy arrays for future use
patch_images = np.array(patch_images)
labels = np.array(labels)

X_train, X_val, y_train, y_val = train_test_split(patch_images, labels, test_size=0.2, random_state=42)


## Classification model - training model
For the classification task, we chose to use a MobileNet model. MobileNet is a type of convolutional neural network designed for mobile and embedded vision applications. It's known for being lightweight and efficient, with lower computational requirements than many other models, making it a good choice for tasks where computational resources may be limited.

MobileNet achieves this efficiency through the use of depthwise separable convolutions, a type of convolution that reduces the number of parameters and computations in the network. This makes the network faster and smaller, while still maintaining a high level of performance.
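
A quick parameter count shows where the saving comes from. The sketch below ignores biases and batch normalization, and the layer sizes (3 x 3 kernel, 128 input channels, 256 output channels) are arbitrary values chosen for illustration.

```python
def standard_conv_params(kernel, c_in, c_out):
    # One kernel x kernel filter per (input channel, output channel) pair.
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

standard = standard_conv_params(3, 128, 256)     # 294912
separable = separable_conv_params(3, 128, 256)   # 33920
print(standard, separable, round(standard / separable, 1))   # ratio is about 8.7
```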

In our case, we use MobileNet as a feature extractor, and add a few additional layers on top to perform the binary classification task (ship or no ship). The model is trained using a binary cross-entropy loss function, which is suitable for binary classification problems.

In [None]:
set_global_policy('mixed_float16')

# Define a data generator for on-the-fly normalization
datagen = ImageDataGenerator(rescale=1./255)

# Use the data generator to load the data
train_generator = datagen.flow(X_train, y_train, batch_size=32)
val_generator = datagen.flow(X_val, y_val, batch_size=32)

# Load the pre-trained MobileNet model, excluding the top layer
base_model = MobileNet(weights='imagenet', include_top=False)

# Add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)

# Add a fully-connected layer
x = Dense(512, activation='relu')(x)

# Add a logistic layer for binary classification
# Change the policy for the output layer to 'float32' for stability
predictions = Dense(1, activation='sigmoid', dtype='float32')(x)

# Construct the final model
model = Model(inputs=base_model.input, outputs=predictions)

# Freeze the layers of the base model
for layer in base_model.layers:
    layer.trainable = False

# Compile the model
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Set up the early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Set up the model checkpoint callback to save the best model based on validation accuracy
model_checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy', save_best_only=True)

model.fit(train_generator, validation_data=val_generator, steps_per_epoch=len(X_train) // 32,
          epochs=20, callbacks=[early_stopping, model_checkpoint])

## Classification model - testing model
To test the model's performance, I decided to perform just a 'sanity check' to ensure the model's predictions make sense. This involves visually inspecting some of the images along with their predicted labels and the model's predicted probabilities. This can help us catch any obvious errors or issues with the model's predictions.



In [None]:
def test_model(X_test, y_test):
    # Randomly select an index from the test data
    idx = random.randint(0, len(X_test) - 1)

    # Get the corresponding image and label
    image = X_test[idx]
    label = y_test[idx]

    # Normalize the image
    image = image / 255.0

    # Convert the image to an array and expand dimensions for model prediction
    image_array = np.expand_dims(image, axis=0)

    # Make a prediction
    prediction = model.predict(image_array)

    # Interpret the model's prediction
    if prediction > 0.5:
        prediction_text = "Ship Detected"
    else:
        prediction_text = "No Ship Detected"

    # Display the image and the model's prediction
    plt.imshow(X_test[idx])
    plt.title(f"Actual: {'Ship Detected' if label == 1 else 'No Ship Detected'}\nPredicted: {prediction_text}")
    plt.show()


In [None]:
test_model(X_val, y_val)

## Segmentation model - training data
The training data for the segmentation model is prepared in a similar way to the classification model, with a few key differences. The main difference is that instead of binary labels indicating the presence or absence of a ship, we have pixel-level labels indicating the exact location of the ship in the image. These labels are typically in the form of a binary mask, with 1s where the ship is located and 0s elsewhere.

Another key aspect of preparing the data for the segmentation model is the use of data augmentation. Data augmentation involves creating modified versions of the images in the training set, which can help the model generalize better to new data. For this task, we use a variety of augmentation techniques, including horizontal and vertical flips, random brightness and contrast adjustments, shift and scale transformations, and the addition of Gaussian noise. These augmentations can help the model learn to recognize ships in a variety of different conditions and orientations.
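
For segmentation, every geometric transform has to be applied to the image and its mask together, otherwise the labels no longer line up with the pixels. The sketch below illustrates this with plain NumPy flips; `random_flip_pair` is a hypothetical helper for this illustration and is not part of albumentations.

```python
import numpy as np

def random_flip_pair(image, mask, rng, p=0.5):
    """Flip image and mask with the same random decisions so they stay aligned."""
    if rng.random() < p:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < p:
        image, mask = image[::-1, :], mask[::-1, :]   # vertical flip
    return image, mask

rng = np.random.default_rng(0)
image = np.zeros((4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
image[0, 1] = 255   # one bright "ship" pixel
mask[0, 1] = 1      # and its label

flipped_image, flipped_mask = random_flip_pair(image, mask, rng)
# The bright pixel and the mask pixel still coincide after augmentation.
print(np.array_equal(flipped_image[..., 0] > 0, flipped_mask.astype(bool)))
```

Photometric changes such as brightness, contrast or noise should touch only the image and leave the mask as it is.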

As with the classification model, the data is then split into a training set and a validation set. The training set is used to train the model, while the validation set is used to evaluate the model's performance and tune hyperparameters.

In [None]:
# Function to augment images and masks
def augment_image(image, mask):
    transform = Compose([
        HorizontalFlip(p=0.4),
        VerticalFlip(p=0.4),
        RandomBrightnessContrast(p=0.2),
        ShiftScaleRotate(p=0.1),
        GaussNoise(p=0.2)
    ])
    data = {"image": np.array(image), "mask": mask}
    augmented = transform(**data)

    return augmented["image"], augmented["mask"]

In [None]:
aug_images = []
aug_masks = []

# Loop over the grouped DataFrame
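# Use only the first 90% of the images so that the last 10% stays unseen for evaluation
for filename, rle_list in grouped[:int(len(grouped) * 0.90)].items():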
    image_path = os.path.join('path_to_folder_with_photo', filename)
    # Crop the image into patches
    crops = crop_image(image_path)
    # Convert the run-length encoding to a binary mask
    mask = rle_to_mask(rle_list)
    # Crop the mask into patches
    mask_crops = crop_image(mask)
    # Loop over the image and mask patches
    for img, msk in zip(crops, mask_crops):
        # Normalize the image patch
        img_norm = img / 255.0
        # Reshape the image patch for prediction
        img_pred = np.expand_dims(img_norm, axis=0)
        # Predict if the image patch contains a ship
        prediction = model.predict(img_pred, verbose=0)
        if prediction >= 0.5:  # If the model predicts a ship
            for _ in range(3):
                aug_img, aug_msk = augment_image(img, msk)
                # Append the augmented image and mask patches to their respective lists
                aug_images.append(aug_img)
                aug_masks.append(aug_msk)

# Convert the lists into numpy arrays for future use
aug_images = np.array(aug_images)
aug_masks = np.array(aug_masks)

And again, a small sanity check:

In [None]:
idx = np.random.randint(len(aug_images))

# Display the image patch
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(aug_images[idx])
plt.title('Image Patch')

# Display the corresponding mask
plt.subplot(1, 2, 2)
plt.imshow(aug_masks[idx], cmap='gray')
plt.title('Mask Patch')

plt.show()

## Segmentation model - training model (Resnet18)
For the segmentation task, we chose to use a U-Net architecture with a ResNet18 backbone. The U-Net architecture is particularly suited for image segmentation tasks because it combines the strengths of a contracting path (for context) and an expansive path (for localization).

The ResNet18 backbone is used to extract features from the images. ResNet, or Residual Network, is a type of convolutional neural network that uses skip connections or shortcuts to jump over some layers. This helps to solve the vanishing gradient problem, allowing the network to be deeper and therefore able to learn more complex features.

The choice of U-Net with a ResNet18 backbone was driven by a few factors. First, U-Net has proven to be very effective for image segmentation tasks, making it a natural choice for this task. Second, ResNet18 is relatively lightweight and efficient, making it feasible to train even with limited hardware resources. Finally, the combination of U-Net and ResNet18 is known to be effective and has been used successfully in many similar tasks.

The model is trained with a loss function suited to segmentation tasks; binary cross-entropy and Dice loss are common choices, and this first model uses Dice loss. During training, we monitor the model's performance on the validation set to avoid overfitting and to tune hyperparameters.
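
To see why the choice of loss matters when ships cover only a tiny fraction of the pixels, here is a minimal NumPy sketch of both losses. The formulas are the textbook definitions with a smoothing term of 1, which is an assumption of this sketch and not necessarily what `segmentation_models` uses internally.

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def soft_dice_loss(y_true, y_prob, smooth=1.0):
    """1 minus the soft Dice coefficient; `smooth` avoids division by zero."""
    intersection = np.sum(y_true * y_prob)
    dice = (2 * intersection + smooth) / (np.sum(y_true) + np.sum(y_prob) + smooth)
    return float(1 - dice)

y_true = np.zeros((256, 256))
y_true[100:110, 50:90] = 1                 # 400 ship pixels out of 65536
all_background = np.zeros_like(y_true)     # a model that never predicts a ship

bce_value = bce_loss(y_true, all_background)
dice_value = soft_dice_loss(y_true, all_background)
print(round(bce_value, 3), round(dice_value, 3))
```

A model that predicts "no ship" everywhere gets a small cross-entropy, because almost all pixels are background, but a Dice loss close to 1. This is one reason Dice-style losses are popular for heavily imbalanced masks.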

In [None]:
BACKBONE = 'resnet18'  # Use ResNet18 as the backbone
preprocess_input = sm.get_preprocessing(BACKBONE)

# Split your data into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(aug_images, aug_masks, test_size=0.2, random_state=42)

# Preprocess input
x_train = preprocess_input(x_train)
x_val = preprocess_input(x_val)
y_train = y_train.astype('float32')
y_val = y_val.astype('float32')


# Define model
model = sm.Unet(BACKBONE, encoder_weights='imagenet')
model.compile(
    optimizer='Adam',
    loss=sm.losses.DiceLoss(),
    metrics=[sm.metrics.IOUScore()],
)

checkpoint = ModelCheckpoint("best_model_segment.h5", monitor='val_loss', verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=1)

callbacks_list = [checkpoint, early_stopping]

# Fit model
model.fit(
   x=x_train,
   y=y_train,
   batch_size=16,
   epochs=100,
   validation_data=(x_val, y_val),
   callbacks=callbacks_list
)

## Model evaluations or its inference (Resnet18)
After training the segmentation model, we evaluate its performance using two common metrics for segmentation tasks: the Dice coefficient and the Intersection over Union (IoU) score.

But first, let's prepare the test data. The Kaggle dataset does not provide a suitably labeled test set, so I decided to use a part of the training data that I had not used before.


In [None]:
# Get a list of unique image IDs
image_ids = grouped.index.tolist()
# Calculate the 90% mark
split_index = int(len(image_ids) * 0.90)
# Get the test image IDs
test_image_ids = image_ids[split_index:]


# Path to the images
image_dir = "path_to_images"

# Initialize variables to store the true and predicted masks
true_masks = []
pred_masks = []




# Iterate over the test image IDs
for image_id in test_image_ids:
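    # Load the image (cv2 reads BGR; convert to RGB to match the PIL-loaded training patches)
    image = cv2.imread(os.path.join(image_dir, image_id))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)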

    # Preprocess the image
    image = cv2.resize(image, (256, 256))  # resize to match the model's expected input size
    image = preprocess_input(image)

    # Reshape the image
    image = np.expand_dims(image, axis=0)

    # Make the prediction
    prediction = model.predict(image, verbose=0)[0]

    # Add the predicted mask to the list
    pred_masks.append(prediction)

    # Get the true mask for this image
    rle_encoded_mask = grouped.loc[image_id]

    # Decode the RLE-encoded mask and downscale it without blending labels
    true_mask = rle_to_mask(rle_encoded_mask, (768, 768))
    true_mask = cv2.resize(true_mask, (256, 256), interpolation=cv2.INTER_NEAREST)

    # Add the true mask to the list
    true_masks.append(true_mask)

# convert to a binary mask
binary_pred_masks = np.where(np.concatenate(pred_masks) > 0.5, 1, 0)

**Dice Coefficient:** For image segmentation tasks, it measures the similarity between the predicted segmentation and the ground truth. The Dice coefficient ranges from 0 to 1, where 1 indicates perfect agreement between the two samples and 0 indicates no agreement. It is equivalent to the F1 score computed over pixels.

**Intersection over Union (IoU):** Also known as the Jaccard index, this is another statistical tool used to measure the similarity between two samples. It is defined as the size of the intersection divided by the size of the union of two label sets. For image segmentation tasks, it can be used to measure the overlap between the predicted segmentation and the ground truth.
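
The two metrics are tied together: for the same pair of binary masks, Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU. The sketch below checks this on random masks; it uses unsmoothed definitions, so a smoothed Dice score will differ very slightly.

```python
import numpy as np

def iou(y_true, y_pred):
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return inter / union

def dice(y_true, y_pred):
    inter = np.logical_and(y_true, y_pred).sum()
    return 2 * inter / (y_true.sum() + y_pred.sum())

rng = np.random.default_rng(0)
y_true = rng.random((64, 64)) > 0.7   # random "ground truth" mask
y_pred = rng.random((64, 64)) > 0.6   # random "prediction" mask

j, d = iou(y_true, y_pred), dice(y_true, y_pred)
print(j, d, 2 * j / (1 + j))          # the last two values coincide
```

This gives a handy consistency check: an IoU of 0.18 corresponds to a Dice of roughly 0.305, and a Dice value below the IoU would suggest that the two numbers were computed on different inputs or runs.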

In [None]:
iou_score = jaccard_score(np.concatenate(true_masks).ravel(), binary_pred_masks.ravel())

print(f"IOU Score: {iou_score}")

In [None]:
def dice_score(y_true, y_pred):
    y_true_f = y_true.flatten()
    y_pred_f = y_pred.flatten()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2. * intersection + 1.) / (np.sum(y_true_f) + np.sum(y_pred_f) + 1.)


In [None]:
dice = dice_score(np.concatenate(true_masks),  binary_pred_masks.ravel())

print(f"Dice Score: {dice}")


The evaluation of the ResNet18 model gave the following results (you can check them yourself):

- IOU Score - 12%
- Dice Score - 5%

These are pretty bad results, so let's perform another small check and see where the issue is.

In [None]:
# Select a random image ID
random_id = np.random.choice(test_image_ids)

# Load the image
image = cv2.imread(os.path.join(image_dir, random_id))

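# Preprocess the image (convert BGR to RGB to match the PIL-loaded training patches)
preprocessed_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
preprocessed_image = cv2.resize(preprocessed_image, (256, 256))  # resize to match the model's expected input size
preprocessed_image = preprocess_input(preprocessed_image)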

# Reshape the image
preprocessed_image = np.expand_dims(preprocessed_image, axis=0)

# Make the prediction
prediction = model.predict(preprocessed_image)[0]

# Get the true mask for this image
rle_encoded_mask = grouped.loc[random_id]

# Decode the RLE-encoded mask and downscale it without blending labels
true_mask = rle_to_mask(rle_encoded_mask, (768, 768))
true_mask = cv2.resize(true_mask, (256, 256), interpolation=cv2.INTER_NEAREST)

# Plot the image, the actual mask, and the predicted mask
fig, axs = plt.subplots(1, 3, figsize=(20, 20))

axs[0].imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
axs[0].set_title('Image')

axs[1].imshow(true_mask, cmap='gray')
axs[1].set_title('Actual Mask')

axs[2].imshow(prediction.squeeze(), cmap='gray')
axs[2].set_title('Predicted Mask')

for ax in axs:
    ax.axis('off')

plt.show()



As we can see, there are a lot of problems: in some places the model does not see a ship (rarely), in many cases it predicts a ship where there is none, and the predicted region looks like a blob rather than a rectangle.

## Segmentation model - training model (EfficientNetB4)
After evaluating the performance of the ResNet18 model, I decided to train a second model using the EfficientNetB4 architecture. EfficientNet is a family of models that were designed to achieve good performance while being efficient in terms of computational resources.

The key idea behind EfficientNet is to scale up the width, depth, and resolution of the model in a balanced way. Traditional approaches to improving the performance of convolutional neural networks often involve increasing the depth (adding more layers) or the width (adding more channels) of the model. However, these approaches can lead to increased computational requirements and may not always lead to improved performance. EfficientNet addresses this by using a compound scaling method that scales up all dimensions of the model in a balanced way.
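
Compound scaling can be written down in a few lines. The base coefficients below (1.2 for depth, 1.1 for width, 1.15 for resolution) are the ones reported in the EfficientNet paper, chosen so that each step of the compound coefficient `phi` roughly doubles the FLOPs. The released B1 to B7 variants may use rounded, hand-adjusted values, so treat this as an illustration of the rule and not as the exact B4 configuration.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases

def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for a given phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(4):
    depth, width, resolution, flops = compound_scale(phi)
    print(f"phi={phi}: depth x{depth:.2f}, width x{width:.2f}, "
          f"resolution x{resolution:.2f}, FLOPs x{flops:.2f}")
```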

The choice of EfficientNetB4 was driven by its balance of performance and efficiency, as well as its success in similar tasks.

In [None]:
BACKBONE = 'efficientnetb4'  # Use EfficientNetB4 as the backbone
preprocess_input = sm.get_preprocessing(BACKBONE)

# Split your data into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(aug_images, aug_masks, test_size=0.2, random_state=42)

# Preprocess input
x_train = preprocess_input(x_train)
x_val = preprocess_input(x_val)
y_train = y_train.astype('float32')
y_val = y_val.astype('float32')

# Define model
model = sm.Unet(BACKBONE, encoder_weights='imagenet')
model.compile(
    optimizer=Adam(learning_rate=1e-4),  # Decrease learning rate
    loss=sm.losses.binary_crossentropy + sm.losses.dice_loss,  # Use combination of BCE and Dice loss
    metrics=[sm.metrics.IOUScore()],
)

checkpoint = ModelCheckpoint("best_model_segment.h5", monitor='val_loss', verbose=1, save_best_only=True)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1)  # Increase patience
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=1e-6, verbose=1)  # Add ReduceLROnPlateau callback

callbacks_list = [checkpoint, early_stopping, reduce_lr]

# Fit model
model.fit(
   x=x_train,
   y=y_train,
   batch_size=16,
   epochs=100,
   validation_data=(x_val, y_val),
   callbacks=callbacks_list
)


## Model evaluations or its inference (EfficientNetB4)

To evaluate the performance of this model I used the same approach as for ResNet18.

In [None]:
# Initialize variables to store the true and predicted masks
true_masks = []
pred_masks = []

# Iterate over the test image IDs
for image_id in test_image_ids:
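    # Load the image (cv2 reads BGR; convert to RGB to match the PIL-loaded training patches)
    image = cv2.imread(os.path.join(image_dir, image_id))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)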

    # Preprocess the image
    image = cv2.resize(image, (256, 256))  # resize to match the model's expected input size
    image = preprocess_input(image)

    # Reshape the image
    image = np.expand_dims(image, axis=0)

    # Make the prediction
    prediction = model.predict(image, verbose=0)[0]

    # Add the predicted mask to the list
    pred_masks.append(prediction)

    # Get the true mask for this image
    rle_encoded_mask = grouped.loc[image_id]

    # Decode the RLE-encoded mask and downscale it without blending labels
    true_mask = rle_to_mask(rle_encoded_mask, (768, 768))
    true_mask = cv2.resize(true_mask, (256, 256), interpolation=cv2.INTER_NEAREST)

    # Add the true mask to the list
    true_masks.append(true_mask)

# convert to a binary mask
binary_pred_masks = np.where(np.concatenate(pred_masks) > 0.5, 1, 0)

In [None]:
iou_score = jaccard_score(np.concatenate(true_masks).ravel(), binary_pred_masks.ravel())

print(f"IOU Score: {iou_score}")

In [None]:
dice = dice_score(np.concatenate(true_masks),  binary_pred_masks.ravel())

print(f"Dice Score: {dice}")

The results are (again, you can check them yourself):
- IOU Score - 18%
- Dice Score - 30%

This is somewhat better, considering that the model was trained for just 6 epochs because I had no time to train it longer. But let's look again, via a sanity check, at where the main errors are.

In [None]:
# Select a random image ID
random_id = np.random.choice(test_image_ids)

# Load the image
image = cv2.imread(os.path.join(image_dir, random_id))

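# Preprocess the image (convert BGR to RGB to match the PIL-loaded training patches)
preprocessed_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
preprocessed_image = cv2.resize(preprocessed_image, (256, 256))  # resize to match the model's expected input size
preprocessed_image = preprocess_input(preprocessed_image)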

# Reshape the image
preprocessed_image = np.expand_dims(preprocessed_image, axis=0)

# Make the prediction
prediction = model.predict(preprocessed_image)[0]

# Get the true mask for this image
rle_encoded_mask = grouped.loc[random_id]

# Decode the RLE-encoded mask and downscale it without blending labels
true_mask = rle_to_mask(rle_encoded_mask, (768, 768))
true_mask = cv2.resize(true_mask, (256, 256), interpolation=cv2.INTER_NEAREST)

# Plot the image, the actual mask, and the predicted mask
fig, axs = plt.subplots(1, 3, figsize=(20, 20))

axs[0].imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
axs[0].set_title('Image')

axs[1].imshow(true_mask, cmap='gray')
axs[1].set_title('Actual Mask')

axs[2].imshow(prediction.squeeze(), cmap='gray')
axs[2].set_title('Predicted Mask')

for ax in axs:
    ax.axis('off')

plt.show()



I had also run these tests when the model had been trained for just 2 epochs, and what I can say is that further training reduces the error. Before, there were a lot of problems with clouds; now the area it falsely predicts because of clouds is much smaller, and in some cases it does not make this error at all. In my view, we can mitigate this problem by combining the classification and segmentation models.

Also, the predicted shapes now look more like rectangles.

There are still errors on small ships. I think this is because the model was trained on crops, while for testing we use rescaled images, which makes small ships harder for it to predict. So I should add such images to the training set or use zoom augmentations.
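
Another option, which was not tried here, is to keep the test-time scale identical to the training scale by predicting on 256 x 256 tiles and stitching the results instead of resizing the whole image. The sketch below is an assumption-laden illustration: `predict_patch` is a hypothetical callable wrapping the segmentation model, and the toy predictor exists only so the code runs without weights.

```python
import numpy as np

def predict_by_tiles(image, predict_patch, tile=256):
    """Run predict_patch on non-overlapping tiles and stitch the results.

    predict_patch (hypothetical) maps a (tile, tile, 3) array to a
    (tile, tile) probability map; image sides are assumed divisible by tile.
    """
    height, width = image.shape[:2]
    stitched = np.zeros((height, width), dtype=np.float32)
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            patch = image[top:top + tile, left:left + tile]
            stitched[top:top + tile, left:left + tile] = predict_patch(patch)
    return stitched

def toy_predictor(patch):
    return patch.mean(axis=-1) / 255.0   # brightness as a fake probability

image = np.zeros((768, 768, 3), dtype=np.uint8)
image[300:310, 500:540] = 255            # small "ship" crossing a tile border

full_mask = predict_by_tiles(image, toy_predictor) > 0.5
print(full_mask.shape, int(full_mask.sum()))
```

With this approach the ground-truth mask can stay at 768 x 768 as well, so small ships are not shrunk to a handful of pixels before scoring.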