<a href="https://colab.research.google.com/github/JoKoum/satellite-image-classification/blob/main/satellite_image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project in Deep Learning Class: Satellite Image Classification
---

### John Koumentis, Sotiris Panopoulos
---

The project evaluates how well an encoder-decoder architecture with an attention mechanism predicts the correct set of labels for a given satellite image chip.

The JPG version of the satellite image dataset [*Planet: Understanding the Amazon from Space*](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space) was used. These JPG images are chips extracted from the larger dataset and are provided as a visual reference for the scene content.

In [8]:
%%capture
!gdown https://drive.google.com/uc?id=19U9KgKaqbl2NntvQfcQrPXYMNR3ZzxuL
!tar -xvf "./train-jpg.tar" -C "./"

In [9]:
import numpy as np
import pandas as pd
import collections
from tqdm import tqdm
import random
import time
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
from PIL import Image
from IPython.display import Image as Img
import tensorflow as tf
from sklearn.metrics import fbeta_score

### Dataset
---
The CSV file that maps image names to their tags (labels) is read and modified to include each image's full file path in the Colab environment. Additionally, a column holding each image's labels as a list is created. From that column, we extract the unique labels in the dataset and plot one example image per label.
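A plain-Python sketch of the same transformations on three made-up rows may help; the image names and tags below are hypothetical, and the notebook itself applies these steps to the pandas DataFrame. Counting with `collections.Counter` gives the number of images carrying each label directly.

```python
import collections

# Hypothetical rows standing in for the Kaggle labels CSV: (image_name, tags)
rows = [
    ('train_0', 'haze primary'),
    ('train_1', 'agriculture clear primary water'),
    ('train_2', 'clear primary'),
]

# Full file path of each chip inside the extracted './train-jpg' folder
image_paths = ['./train-jpg/' + name + '.jpg' for name, _ in rows]

# One list of labels per image (the 'tags_split' column)
tags_split = [tags.split(' ') for _, tags in rows]

# Unique labels and the number of images carrying each one
label_counts = collections.Counter(label for tags in tags_split for label in tags)
unique_labels = set(label_counts)

print(image_paths[0])
print(sorted(unique_labels))
print(label_counts['primary'])
```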

In [10]:
df = pd.read_csv('https://drive.google.com/uc?id=1MsAf8Iktmf1dC1pYQpJ-QjC3lOAg5J_3&export=download')
df.head(10)

In [6]:
df['image_name'] = './train-jpg/' + df['image_name'] + '.jpg'

df['tags_split'] = df['tags'].apply(lambda x: x.split(' '))
# Flatten the per-image label lists; a comprehension avoids the quadratic cost of sum(lists, [])
labels = set(label for tags in df['tags_split'] for label in tags)
labels

In [None]:
# Count how many images carry each label (the previous loop started every count at 0)
cnt_label = collections.Counter(label for tags in df['tags_split'] for label in tags)

plt.figure(figsize=(18,8))
idxs = range(len(cnt_label.values()))
plt.bar(idxs, cnt_label.values())
plt.xticks(idxs, cnt_label.keys(), rotation=-50)
plt.title('Labels Countplot')
plt.show()

In [None]:
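# Pick one example image per label; match whole labels so that e.g. 'cloudy' does not match 'partly_cloudy'
images_title = [df[df['tags_split'].apply(lambda tags: label in tags)].iloc[i]['image_name'] for i, label in enumerate(labels)]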

_, ax = plt.subplots(5,4, figsize=(15,20))
ax = ax.ravel()

for i, (image_name, label) in enumerate(zip(images_title, labels)):
  img = mpimg.imread(image_name)
  ax[i].imshow(img)
  ax[i].set_title('{}'.format(label))

Details on the labels are available in the [Chip (Image) Data Format](https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data) section on Kaggle.

### Image captioning approach
---
The [Image captioning with visual attention](https://www.tensorflow.org/tutorials/text/image_captioning) tutorial was used as the basis for the experiments.

As a first step, we create a dictionary that maps each image path to its labels. To build the text-sequence representation of the labels used later, the \<start\> and \<end\> tokens are added at the beginning and the end of each label set.

In [None]:
image_path_to_label = collections.defaultdict(list)
for image_path, label in zip(df['image_name'],df['tags'].values):
  image_path_to_label[image_path].append(f"<start> {label} <end>")

In [None]:
image_paths = list(image_path_to_label.keys())
train_labels = []
img_name_vector = []

for image_path in image_paths:
  label_list = image_path_to_label[image_path]
  train_labels.extend(label_list)
  img_name_vector.extend([image_path]*len(label_list))

print(len(img_name_vector))

### Feature Extractor
---
A pre-trained Convolutional Neural Network was used as the feature extractor. The output of its last convolutional layer, of shape (7, 7, 2048) for a 224 × 224 input, is used as the input feature for the attention mechanism. ResNet50_V2 was chosen because residual networks are a widely used family of Convolutional Neural Networks that have also served as feature extractors in similar tasks, such as ["Recurrent neural networks for remote sensing image classification"](https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/iet-cvi.2017.0420) and ["Self-attention for raw optical Satellite Time Series Classification"](https://arxiv.org/pdf/1910.10536.pdf). Initial tests with the MobileNet architecture gave similar results.

In [None]:
base_model = tf.keras.applications.resnet_v2.ResNet50V2(input_shape=(224,224,3),
                                               include_top=False,
                                               weights='imagenet')

new_input = base_model.input
hidden_layer = base_model.layers[-1].output

image_features_extract_model = tf.keras.Model(new_input, hidden_layer)

tf.keras.utils.plot_model(base_model, to_file='feature_extractor.png', show_shapes=True)
Img(filename='feature_extractor.png')

### Image Preprocessing
---
We preprocess the images using the preprocess_input method to normalize the image so that it contains pixels with values in the range [-1, 1], which matches the format of the images used to train ResNet50_V2.
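To our understanding, `resnet_v2.preprocess_input` applies the Keras "tf" preprocessing mode, which scales each pixel as x / 127.5 - 1. The NumPy sketch below reproduces only that scaling; the function name `scale_to_unit_range` is hypothetical and is not part of Keras.

```python
import numpy as np

def scale_to_unit_range(img):
    """Map pixel values in [0, 255] to [-1, 1] via x / 127.5 - 1."""
    return img.astype(np.float32) / 127.5 - 1.0

# The darkest, middle and brightest pixel values
pixels = np.array([0, 127.5, 255], dtype=np.float32)
print(scale_to_unit_range(pixels))
```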


In [7]:
def load_image(image_path):
    """
    Input: Image path
    Output: Processed image tensor, Image path
    """
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = tf.keras.applications.resnet_v2.preprocess_input(img)
    return img, image_path

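Image features (the output of the last convolutional layer, 7 * 7 * 2048 floats per image) are extracted and cached in NumPy binary format, to be used during the network's training. `np.save` appends the `.npy` extension to each image path, which is why `map_func` later loads `img_name + '.npy'`.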
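A quick size estimate shows what this cache costs on disk. The sketch assumes float32 features (the Keras default output dtype) and uses an illustrative count of 40,000 images; the actual count is the value printed by `print(len(img_name_vector))` earlier. The helper `cache_size_gb` is hypothetical.

```python
import io

import numpy as np

# One cached feature tensor after flattening: 7 * 7 = 49 locations x 2048 channels
features = np.zeros((49, 2048), dtype=np.float32)

n_floats = features.size                 # 7 * 7 * 2048 floats
n_bytes = n_floats * features.itemsize   # 4 bytes per float32

# np.save writes a small header in front of the raw data
buffer = io.BytesIO()
np.save(buffer, features)
file_size = len(buffer.getvalue())

def cache_size_gb(num_images, bytes_per_image=n_bytes):
    """Approximate total cache size in GB (1 GB = 1e9 bytes), ignoring headers."""
    return num_images * bytes_per_image / 1e9

print(n_floats, n_bytes, file_size)
print(f'{cache_size_gb(40_000):.1f} GB for 40,000 images')
```

At roughly 0.4 MB per image, the cache for the full training set runs to tens of gigabytes, so it is worth checking the available disk space before running the extraction cell.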

In [None]:
encode_train = sorted(set(img_name_vector))

image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(load_image, num_parallel_calls=tf.data.AUTOTUNE).batch(16)

for img, path in tqdm(image_dataset):
  batch_features = image_features_extract_model(img)
  batch_features = tf.reshape(batch_features,
                              (batch_features.shape[0], -1, batch_features.shape[3]))


  for bf, p in zip(batch_features, path):
    path_of_feature = p.numpy().decode("utf-8")
    np.save(path_of_feature, bf.numpy())

### Tokenize labels
---
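Labels are tokenized with the Keras `Tokenizer`, which splits each label string on spaces. The default character filters are overridden so that `<`, `>` and `_` are kept, preserving tokens such as `<start>` and `partly_cloudy`. Word-to-index and index-to-word mappings are created and, finally, the sequences are padded so that each one has the same length as the longest.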
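A plain-Python sketch of what tokenization and post-padding produce, assuming (to our understanding of the Keras `Tokenizer`) that indices are assigned by descending word frequency starting at 1, with ties kept in first-seen order. The helpers `build_word_index` and `pad_post` and the three label strings are hypothetical, and the sketch skips the lowercasing and character filtering that the real `Tokenizer` performs.

```python
import collections

def build_word_index(texts):
    """Frequency-ordered word index (most frequent word -> 1); '<pad>' -> 0."""
    counts = collections.Counter()
    for text in texts:
        counts.update(text.split(' '))
    # Counter keeps first-seen order and sorted() is stable, so ties keep that order
    ordered = sorted(counts, key=lambda w: counts[w], reverse=True)
    word_index = {w: i + 1 for i, w in enumerate(ordered)}
    word_index['<pad>'] = 0
    return word_index

def pad_post(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_value] * (max_len - len(s)) for s in sequences]

texts = ['<start> haze primary <end>',
         '<start> clear primary water <end>',
         '<start> clear primary <end>']
word_index = build_word_index(texts)
sequences = [[word_index[w] for w in t.split(' ')] for t in texts]
padded = pad_post(sequences)
print(word_index)
print(padded)
```

In this toy example `primary` appears in every label set, so it ties with `<start>` and `<end>`; in the real dataset the indices of the special tokens depend on the label frequencies, which is why looking them up through `tokenizer.word_index` is safer than hard-coding them.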

In [None]:
def calc_max_length(tensor):
  """
  Calculate max length of any labelset in the dataset
  """
  return max(len(t) for t in tensor)

In [None]:
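# num_words is left at its default (None) so that no label token is dropped;
# '<', '>' and '_' are removed from the default filters to keep '<start>', '<end>' and labels such as 'partly_cloudy'
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters=r'!"#$%&()*+.,-/:;=?@[\]^`{|}~')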
tokenizer.fit_on_texts(train_labels)

tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'

train_sequences = tokenizer.texts_to_sequences(train_labels)

In [None]:
tokenizer.word_index

In [None]:
# Pad each vector to the max_length of the labels
label_vector = tf.keras.preprocessing.sequence.pad_sequences(train_sequences, padding='post')
print('Labels vector:\n', label_vector)

In [None]:
max_length = calc_max_length(train_sequences)
print('Max sequence length: ',max_length)

### Split to training and validation
---
The dataset is split at the image level into training and validation sets, containing 80% and 20% of the images, respectively.
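The split is done on the image keys, so each chip falls entirely into one of the two sets. The notebook's `random.shuffle` is unseeded; the sketch below adds a fixed seed for reproducibility. The helper `split_image_keys`, the seed value 42 and the ten file names are hypothetical choices for illustration.

```python
import random

def split_image_keys(keys, train_fraction=0.8, seed=42):
    """Shuffle image keys with a fixed seed and split them into train/validation lists."""
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

keys = [f'./train-jpg/train_{i}.jpg' for i in range(10)]
train_keys, val_keys = split_image_keys(keys)
print(len(train_keys), len(val_keys))
```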

In [None]:
img_to_label_vector = collections.defaultdict(list)

for image, label in zip(img_name_vector, label_vector):
  img_to_label_vector[image].append(label)

img_keys = list(img_to_label_vector.keys())
random.shuffle(img_keys)

slice_index = int(len(img_keys) *  0.8)
img_name_train_keys, img_name_val_keys = img_keys[:slice_index], img_keys[slice_index:]

img_name_train = []
label_train = []

for imgt in img_name_train_keys:
  label_len = len(img_to_label_vector[imgt])
  img_name_train.extend([imgt] * label_len)
  label_train.extend(img_to_label_vector[imgt])

img_name_val = []
label_val = []

for imgt in img_name_val_keys:
  label_len = len(img_to_label_vector[imgt])
  img_name_val.extend([imgt] * label_len)
  label_val.extend(img_to_label_vector[imgt])

In [None]:
# Print the respective training and validation lengths
len(img_name_train), len(label_train), len(img_name_val), len(label_val)

### Create Dataset for training
---
Two datasets are created with the TensorFlow `tf.data` API, one for training and one for validation. The model hyperparameters are also set here.

In [None]:
BATCH_SIZE = 64
BUFFER_SIZE = 1000
embedding_dim = 256
units = 512
vocab_size = len(tokenizer.word_index)  # label tokens + <start> + <end> + <pad> (index 0)
num_steps = len(img_name_train) // BATCH_SIZE
num_steps_val = len(img_name_val) // BATCH_SIZE
# These two variables represent the feature vector shape
features_shape = 2048
attention_features_shape = 49

In [None]:
def map_func(img_name, label):
  """
  Load previously stored numpy files
  """
  img_tensor = np.load(img_name.decode('utf-8')+'.npy')
  return img_tensor, label

In [None]:
dataset = tf.data.Dataset.from_tensor_slices((img_name_train, label_train))

# Use map to load the numpy files in parallel
dataset = dataset.map(lambda item1, item2: tf.numpy_function(
          map_func, [item1, item2], [tf.float32, tf.int32]),
          num_parallel_calls=tf.data.AUTOTUNE)

# Shuffle and batch
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

In [None]:
val_set = tf.data.Dataset.from_tensor_slices((img_name_val, label_val))

# Use map to load the numpy files in parallel
val_set = val_set.map(lambda item1, item2: tf.numpy_function(
          map_func, [item1, item2], [tf.float32, tf.int32]),
          num_parallel_calls=tf.data.AUTOTUNE)

# Batch only
val_set = val_set.batch(BATCH_SIZE)
val_set = val_set.prefetch(buffer_size=tf.data.AUTOTUNE)

### Model
---
The model architecture is inspired by the [Show, Attend and Tell](https://arxiv.org/pdf/1502.03044.pdf) paper, whose authors propose an attention-based model that automatically learns to describe the contents of images.

We extract the features from the last convolutional layer of ResNet50_V2, which gives a tensor of shape (7, 7, 2048).
It is then flattened to shape (49, 2048), one 2048-dimensional vector per spatial location, and passed through the CNN encoder. The RNN decoder then attends over these 49 locations to predict the next word.
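The flattening is row-major (as in `tf.reshape`), so location k of the flattened tensor corresponds to grid cell (k // 7, k % 7). The NumPy sketch below checks that mapping and then mimics the CNN encoder (one dense layer plus ReLU) with random stand-in weights; the feature values and weights are made up, and only the shapes follow the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one ResNet50V2 feature map: 7 x 7 spatial grid, 2048 channels
feature_map = rng.standard_normal((7, 7, 2048)).astype(np.float32)

# Flatten the spatial grid into 49 locations (row-major)
features = feature_map.reshape(-1, 2048)          # (49, 2048)

# Location k corresponds to grid cell (k // 7, k % 7)
k = 10
assert np.array_equal(features[k], feature_map[k // 7, k % 7])

# CNN_Encoder equivalent: dense layer + ReLU, with random stand-in weights
embedding_dim = 256
W = rng.standard_normal((2048, embedding_dim)).astype(np.float32) * 0.01
b = np.zeros(embedding_dim, dtype=np.float32)
encoded = np.maximum(features @ W + b, 0.0)       # (49, embedding_dim)

print(features.shape, encoded.shape)
```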

The attention mechanism is based on [Bahdanau's additive attention](https://arxiv.org/pdf/1409.0473.pdf). It frees the model from having to encode the whole input feature into a fixed-length vector and lets it focus only on the information relevant to generating the next target word.
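The additive score can be written as score_k = v^T tanh(W1 f_k + W2 h), where f_k is the encoder output at location k and h is the previous decoder state; a softmax over the 49 locations turns the scores into weights, and the context vector is the weighted sum of the f_k. The NumPy sketch below follows the same shapes as the `BahdanauAttention` layer, but omits the bias terms that Keras `Dense` layers add and uses random stand-in weights.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(features, hidden, W1, W2, v):
    """
    features: (batch, 49, embedding_dim)  encoder output
    hidden:   (batch, units)              previous decoder state (the query)
    W1: (embedding_dim, units), W2: (units, units), v: (units, 1)
    """
    hidden_with_time_axis = hidden[:, None, :]                         # (batch, 1, units)
    # v^T tanh(W1 f_k + W2 h), broadcast over the 49 locations
    scores = np.tanh(features @ W1 + hidden_with_time_axis @ W2) @ v   # (batch, 49, 1)
    weights = softmax(scores, axis=1)                                  # sum to 1 over locations
    context = (weights * features).sum(axis=1)                         # (batch, embedding_dim)
    return context, weights

rng = np.random.default_rng(0)
batch, locations, embedding_dim, units = 2, 49, 256, 512
features = rng.standard_normal((batch, locations, embedding_dim))
hidden = np.zeros((batch, units))
W1 = rng.standard_normal((embedding_dim, units)) * 0.05
W2 = rng.standard_normal((units, units)) * 0.05
v = rng.standard_normal((units, 1)) * 0.05

context, weights = additive_attention(features, hidden, W1, W2, v)
print(context.shape, weights.shape)
```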



In [None]:
class BahdanauAttention(tf.keras.Model):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, features, hidden):
    # features (CNN_Encoder output) shape == (batch_size, 49, embedding_dim)

    # hidden shape == (batch_size, units)
    # hidden_with_time_axis shape == (batch_size, 1, units)
    hidden_with_time_axis = tf.expand_dims(hidden, 1)

    # attention_hidden_layer shape == (batch_size, 49, units)
    attention_hidden_layer = (tf.nn.tanh(self.W1(features) +
                                         self.W2(hidden_with_time_axis)))

    # score shape == (batch_size, 49, 1)
    # This gives an unnormalized score for each of the 49 image locations.
    score = self.V(attention_hidden_layer)

    # attention_weights shape == (batch_size, 49, 1), normalized over the locations
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, embedding_dim)
    context_vector = attention_weights * features
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

The CNN encoder takes the previously extracted features and projects each of the 49 location vectors into an `embedding_dim`-dimensional space. Since the convolutional features of the pretrained network have been extracted in advance, the encoder consists of a single fully connected layer followed by a ReLU; performing the feature extraction during training instead could become a bottleneck.

In [None]:
class CNN_Encoder(tf.keras.Model):
    # The features have already been extracted and cached, so this
    # encoder only passes them through a fully connected layer
    def __init__(self, embedding_dim):
        super(CNN_Encoder, self).__init__()
        # shape after fc == (batch_size, 49, embedding_dim)
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x

The RNN decoder produces a label set by generating one word at every time step, conditioned on a context vector, the previous hidden state and the previously generated words.

The decoder receives the complete encoder output. Its general scheme follows the decoder of the TensorFlow NMT tutorial [[1]](https://www.tensorflow.org/text/tutorials/nmt_with_attention#the_decoder): an RNN keeps track of what has been generated so far, attention over the encoder output produces a context vector, and the decoder outputs logit predictions for the next token. In this implementation, as in the image captioning tutorial, the previous hidden state is used as the attention query, the context vector is concatenated with the embedding of the previous token before entering the RNN, and two dense layers map the RNN output to the logits.


A Gated Recurrent Unit (GRU) layer is used as the decoder's RNN.

In [None]:
class RNN_Decoder(tf.keras.Model):
  def __init__(self, embedding_dim, units, vocab_size):
    super(RNN_Decoder, self).__init__()
    self.units = units

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc1 = tf.keras.layers.Dense(self.units)
    self.fc2 = tf.keras.layers.Dense(vocab_size)

    self.attention = BahdanauAttention(self.units)

  def call(self, x, features, hidden):
    # defining attention as a separate model
    context_vector, attention_weights = self.attention(features, hidden)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # context_vector has embedding_dim features (the encoder output size), so
    # x shape after concatenation == (batch_size, 1, embedding_dim + embedding_dim)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # shape == (batch_size, 1, units)
    x = self.fc1(output)

    # x shape == (batch_size * 1, units)
    x = tf.reshape(x, (-1, x.shape[2]))

    # output shape == (batch_size, vocab_size)
    x = self.fc2(x)

    return x, state, attention_weights

  def reset_state(self, batch_size):
    return tf.zeros((batch_size, self.units))

The most important distinguishing feature of this approach from the basic encoder–decoder is that it does not attempt to encode a whole input feature into a single fixed-length vector. Instead, it encodes the input into a sequence of vectors and chooses a subset of these vectors adaptively while decoding the labels.

In [None]:
# Initialize encoder and decoder networks

encoder = CNN_Encoder(embedding_dim)
decoder = RNN_Decoder(embedding_dim, units, vocab_size)

Next, the Adam optimizer and the Sparse Categorical Crossentropy loss function are initialized.

The Adam optimizer was used since it [*is computationally efficient, has little memory requirement, invariant to diagonal rescaling of gradients, and is well suited for problems that are large in terms of data/parameters*](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam).

The Sparse Categorical Crossentropy loss was used because the targets are integer token indices rather than one-hot vectors. `from_logits=True` is set because the decoder outputs unnormalized logits, and `reduction='none'` keeps the per-example losses so that padding positions can be masked out in `loss_function`.
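A NumPy sketch of the masked loss may clarify what `loss_function` computes: the per-example cross-entropy is -log softmax(logits)[label], the loss is zeroed wherever the target is `<pad>` (index 0), and the mean is then taken over the whole batch, so padded positions still count in the denominator, as with `tf.reduce_mean` in the notebook. The function names are hypothetical.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Per-example -log softmax(logits)[label], computed stably."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def masked_loss(labels, logits, pad_id=0):
    """Zero the loss where the target is <pad>, then average over the whole batch."""
    per_example = sparse_ce_from_logits(labels, logits)
    mask = (labels != pad_id).astype(per_example.dtype)
    return (per_example * mask).mean()

# Two targets: a real token (3) and padding (0); uniform logits over 4 classes
labels = np.array([3, 0])
logits = np.zeros((2, 4))
print(masked_loss(labels, logits))
```

With uniform logits each example costs log(4), and masking the padded one halves the batch mean to log(4) / 2 = log(2).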

In [None]:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

# Accuracy metrics (currently unused: their update calls are commented out in the training loop)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]

train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()

In [None]:
def loss_function(real, pred):
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  loss_ = loss_object(real, pred)

  mask = tf.cast(mask, dtype=loss_.dtype)
  loss_ *= mask

  return tf.reduce_mean(loss_)

In [None]:
# Create checkpoint to store trained model parameters for future use
checkpoint_path = "./checkpoints"
ckpt = tf.train.Checkpoint(encoder=encoder,
                           decoder=decoder,
                           optimizer=optimizer)
ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)

In [None]:
start_epoch = 0
if ckpt_manager.latest_checkpoint:
  start_epoch = int(ckpt_manager.latest_checkpoint.split('-')[-1])
  # restoring the latest checkpoint in checkpoint_path
  ckpt.restore(ckpt_manager.latest_checkpoint)

In [None]:
# Lists to record the training and validation loss of each epoch
train_loss_plot = []
val_loss_plot = []

### Training pipeline
---
- The extracted features stored in the respective .npy files are passed through the CNN encoder.
- The encoder output, the hidden state (initialized to 0) and the decoder input (initially the start token) are passed to the RNN decoder.
- The decoder returns the predictions and the decoder hidden state.
- The decoder hidden state is then passed back into the model, and the predictions are used to calculate the loss.
- Teacher forcing is used to decide the next input to the decoder: the ground-truth target word, rather than the model's own prediction, is passed as the next input.
- Finally, the gradients are computed by backpropagation and applied to the trainable variables by the optimizer.
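Teacher forcing can be illustrated with the (decoder input, expected output) pairs that `train_step` produces for one padded target sequence: at step i the input is the ground-truth token `target[i-1]`. The helper `teacher_forcing_pairs` and the token indices below are hypothetical.

```python
def teacher_forcing_pairs(target):
    """
    (decoder input, expected output) at each step i = 1..len(target)-1, as in
    train_step: the input is always the ground-truth previous token.
    """
    return [(target[i - 1], target[i]) for i in range(1, len(target))]

# Hypothetical indices: <start>=1, haze=5, primary=2, <end>=3, <pad>=0
target = [1, 5, 2, 3, 0]
print(teacher_forcing_pairs(target))
```

The last pair expects `<pad>` (0), which `loss_function` masks out. At inference time (in `test` and `evaluate`), no ground truth is available, so the sampled prediction is fed back as the next input instead.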

In [None]:
@tf.function
def train_step(img_tensor, target):
  loss = 0

  # initializing the hidden state for each batch
  # because the labels are not related from image to image
  hidden = decoder.reset_state(batch_size=target.shape[0])

  dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * target.shape[0], 1)

  with tf.GradientTape() as tape:
      features = encoder(img_tensor)

      for i in range(1, target.shape[1]):
          # passing the features through the decoder
          predictions, hidden, _ = decoder(dec_input, features, hidden)

          loss += loss_function(target[:, i], predictions)

          #train_acc_metric.update_state(target[:, i], predictions)

          # using teacher forcing
          dec_input = tf.expand_dims(target[:, i], 1)

  total_loss = (loss / int(target.shape[1]))

  trainable_variables = encoder.trainable_variables + decoder.trainable_variables

  gradients = tape.gradient(loss, trainable_variables)

  optimizer.apply_gradients(zip(gradients, trainable_variables))


  return loss, total_loss

In [None]:
@tf.function
def validation_step(img_tensor, target):
  loss = 0
  hidden = decoder.reset_state(batch_size=target.shape[0])
  dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * target.shape[0], 1)

  features = encoder(img_tensor)

  for i in range(1, target.shape[1]):
    predictions, hidden, _ = decoder(dec_input, features, hidden)

    loss += loss_function(target[:, i], predictions)
    #val_acc_metric.update_state(target[:, i], predictions)

    dec_input = tf.expand_dims(target[:, i], 1)

  total_loss = (loss / int(target.shape[1]))

  return loss, total_loss

In [None]:
EPOCHS = 20

val_loss_min = np.inf  # np.Inf was removed in NumPy 2.0

for epoch in range(start_epoch, EPOCHS):
    start = time.time()
    total_loss = 0
    val_loss = 0

    for (batch, (img_tensor, target)) in enumerate(dataset):
        batch_loss, t_loss = train_step(img_tensor, target)
        total_loss += t_loss

        if batch % 100 == 0:
            average_batch_loss = batch_loss.numpy()/int(target.shape[1])
            print(f'Epoch {epoch+1} Batch {batch} Loss {average_batch_loss:.4f}')

    # storing the epoch end loss value to plot later
    train_loss_plot.append(total_loss / num_steps)

    # Display metrics at the end of each epoch.
    # train_acc = train_acc_metric.result()
    #print("Training accuracy over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    #train_acc_metric.reset_states()

    for (batch, (img_tensor,target)) in enumerate(val_set):

      batch_loss, v_loss = validation_step(img_tensor, target)
      val_loss += v_loss

    val_loss_plot.append(val_loss/num_steps_val)

    #val_acc = val_acc_metric.result()
    #val_acc_metric.reset_states()
    #print("Validation accuracy: %.4f" % (float(val_acc),))

    if val_loss < val_loss_min:
      ckpt_manager.save()
      val_loss_min = val_loss
      print('Validation loss decreased, saving model')

    print(f'Epoch {epoch+1} Loss {total_loss/num_steps:.6f}')
    print(f'Validation loss {val_loss/num_steps_val:.6f}')
    print(f'Time taken for 1 epoch {time.time()-start:.2f} sec\n')

In [None]:
plt.figure(figsize=(15,15))
plt.plot(train_loss_plot, label='Train Loss')
plt.plot(val_loss_plot, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Plot')
plt.legend()
plt.show()

Summary of the encoder-decoder architecture

In [None]:
encoder.summary()

In [None]:
decoder.summary()

### Evaluation procedure
---
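Before evaluation, the networks are restored to the checkpoint with the lowest validation loss. Because a checkpoint is saved only when the validation loss improves, this is the latest checkpoint.

The average per-image F-beta score, with beta = 1 (i.e. the F1 score), is then calculated over the validation dataset.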
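For one image, F-beta combines precision P and recall R over its label set as (1 + beta^2) P R / (beta^2 P + R); for a pair of binary indicator vectors this is, to our understanding, the quantity scikit-learn's `fbeta_score` returns with its default binary averaging. A plain-Python sketch on label sets (the helper `fbeta_from_sets` and the labels are illustrative):

```python
def fbeta_from_sets(true_labels, predicted_labels, beta=1.0):
    """F-beta of one image, treating its label set as the positive class."""
    true_labels, predicted_labels = set(true_labels), set(predicted_labels)
    tp = len(true_labels & predicted_labels)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted_labels)
    recall = tp / len(true_labels)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# One missed label: precision 1.0, recall 0.5
print(fbeta_from_sets({'clear', 'primary'}, {'clear'}, beta=1))
print(fbeta_from_sets({'clear', 'primary'}, {'clear'}, beta=2))
```

Larger beta weights recall more heavily, so the missed label costs more under F2 (5/9) than under F1 (2/3); the Kaggle competition itself was scored with the mean F2 score, so F1 values here are not directly comparable to the leaderboard.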

In [None]:
def test(image):

    hidden = decoder.reset_state(batch_size=1)

    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0],
                                                 -1,
                                                 img_tensor_val.shape[3]))

    features = encoder(img_tensor_val)

    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []
    flag = False

    for i in range(max_length-1):
        if not flag:
          predictions, hidden, _ = decoder(dec_input,features, hidden)


          predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()
          #result.append(tokenizer.index_word[predicted_id])
          result.append(predicted_id)

          if tokenizer.index_word[predicted_id] == '<end>':
              flag = True
        else:
          result.append(0)

        dec_input = tf.expand_dims([predicted_id], 0)

    return result

In [None]:
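def evaluation_score():
  """
  Average per-image F-beta score (beta=1) over the whole validation set.
  Real and predicted labels are compared as binary indicator vectors over the
  vocabulary, ignoring the <pad>, <start> and <end> tokens.
  """
  vocab_len = len(tokenizer.word_index)
  special_ids = {0, tokenizer.word_index['<start>'], tokenizer.word_index['<end>']}
  score = 0
  for idx in range(len(img_name_val)):
    image = img_name_val[idx]

    real = np.zeros(vocab_len)
    real[[i for i in label_val[idx] if i not in special_ids]] = 1

    predicted = np.zeros(vocab_len)
    predicted[[i for i in test(image) if i not in special_ids]] = 1

    score += fbeta_score(real, predicted, beta=1, zero_division=0)

  return score / len(img_name_val)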

In [None]:
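# Restore the networks to the checkpoint with the lowest validation loss
ckpt.restore(ckpt_manager.latest_checkpoint)

score = evaluation_score()
print('Average F-beta score at validation dataset: ', score)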

For each predicted label, the attention weights are plotted over the image chip, showing which regions of the image the model 'looked' at when emitting that label.
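Each attention vector has 49 entries, one per cell of the 7 × 7 feature grid, so it must be reshaped to (7, 7) in row-major order (the inverse of the feature flattening) before it is overlaid; `plot_attention` then stretches the grid over the chip via `imshow`'s `extent`. The sketch below makes that stretching explicit with nearest-neighbour indexing; the helper `attention_to_image` is hypothetical, and `image_size=256` assumes the 256 × 256 JPG chip size.

```python
import numpy as np

def attention_to_image(weights, image_size=256, grid=7):
    """
    Reshape grid*grid attention weights to (grid, grid), row-major, and upsample
    them to image_size x image_size with nearest-neighbour indexing.
    """
    att = np.asarray(weights).reshape(grid, grid)
    idx = (np.arange(image_size) * grid) // image_size   # source cell of each pixel
    return att[np.ix_(idx, idx)]

weights = np.arange(49, dtype=float)   # stand-in values: cell k holds k
up = attention_to_image(weights)
print(up.shape, up[0, 0], up[0, 255], up[255, 0], up[255, 255])
```

Reshaping to any other grid, such as (8, 8), would misplace the weights, because the cell-to-location mapping only holds for the 7 × 7 grid the features came from.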

In [None]:
def evaluate(image):
    attention_plot = np.zeros((max_length, attention_features_shape))

    hidden = decoder.reset_state(batch_size=1)

    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0],
                                                 -1,
                                                 img_tensor_val.shape[3]))

    features = encoder(img_tensor_val)

    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []

    for i in range(max_length):
        predictions, hidden, attention_weights = decoder(dec_input,
                                                         features,
                                                         hidden)

        attention_plot[i] = tf.reshape(attention_weights, (-1, )).numpy()

        predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()
        result.append(tokenizer.index_word[predicted_id])

        if tokenizer.index_word[predicted_id] == '<end>':
            return result, attention_plot

        dec_input = tf.expand_dims([predicted_id], 0)

    attention_plot = attention_plot[:len(result), :]
    return result, attention_plot

In [None]:
def plot_attention(image, result, attention_plot):
    image = Image.open(image)

    fig = plt.figure(figsize=(10, 10))

    len_result = len(result)
    for i in range(len_result):
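        # 49 attention weights -> 7 x 7 grid (the ResNet50V2 feature map size)
        temp_att = attention_plot[i].reshape(7, 7)
        # add_subplot needs integer grid dimensions
        grid_size = int(max(np.ceil(len_result/2), 2))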
        ax = fig.add_subplot(grid_size, grid_size, i+1)
        ax.set_title(result[i])
        img = ax.imshow(image)
        ax.imshow(temp_att, cmap='gray', alpha=0.6, extent=img.get_extent())

    plt.tight_layout()
    plt.show()

In [None]:
def test_image(img_id):
  # Labels on the validation set
  image = img_name_val[img_id]
  real_caption = ' '.join([tokenizer.index_word[i] for i in label_val[img_id] if i not in [0]])
  result, attention_plot = evaluate(image)

  actual_image = Image.open(image)
  plt.imshow(actual_image)

  print('Real Caption:', real_caption)
  print('Prediction Caption:', ' '.join(result))
  plot_attention(image, result, attention_plot)

In [None]:
# Restore networks at their latest saved state
ckpt.restore(ckpt_manager.latest_checkpoint)

In [None]:
!zip -r './checkpoints.zip' './checkpoints'

Evaluating the predictions on 8 randomly selected chip images from the validation dataset

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)

In [None]:
id = np.random.randint(0, len(img_name_val))
test_image(id)