### Training the model with Triplet Hard Loss

In [None]:
# Inputs for Anchor, positive and negative images
anchor_in = Input(name='anchor',shape=target_shape)
pos_in = Input(name='positive',shape=target_shape)
neg_in = Input(name='negative',shape=target_shape)

# Extract embeddings using VGG19
anchor_out = embedding(anchor_in)
pos_out = embedding(pos_in)
neg_out = embedding(neg_in)

# Define the model
model_triplet_loss = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out,pos_out,neg_out])

model_triplet_loss.summary()

In [None]:
class SiameseModel(Model):

  """Siamese network wrapper with custom training and evaluation steps.

    Computes a batch-hard triplet loss from the three embeddings produced by
    the Siamese network. The largest anchor-positive distance and the smallest
    anchor-negative distance in the batch are combined as:
       L = max(max_i ‖f(A_i) - f(P_i)‖² - min_i ‖f(A_i) - f(N_i)‖² + margin, 0)
    """
    

  def __init__(self, siamese_network, margin=0.5):
      super().__init__()
      self.siamese_network = siamese_network
      self.margin = margin
      self.loss_tracker = metrics.Mean(name='loss')
        
  def call(self, inputs):
      return self.siamese_network(inputs)
    
  def train_step(self, data):
    # GradientTape records the operations executed inside its context so the
    # loss can be differentiated with respect to the trainable weights.
    with tf.GradientTape() as tape:
      loss = self._compute_loss(data)

    # Gradients of the loss with respect to the network weights.
    gradients = tape.gradient(loss, self.siamese_network.trainable_weights)

    # Apply the gradients with the optimizer specified in `compile()`.
    self.optimizer.apply_gradients(
        zip(gradients, self.siamese_network.trainable_weights)
    )

    # Update and return the running mean of the training loss.
    self.loss_tracker.update_state(loss)
    return {"loss": self.loss_tracker.result()}
    
  def test_step(self, data):
    loss = self._compute_loss(data)
    # Let's update and return the loss metric.
    self.loss_tracker.update_state(loss)
    return {"loss": self.loss_tracker.result()}
    
  def _compute_loss(self, data):
    # Embeddings for the anchor, positive and negative images in the batch.
    anchor, positive, negative = self.siamese_network(data)

    # Squared Euclidean distance per triplet; shape: (batch_size,).
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), -1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), -1)

    # Hardest cases in the batch: the largest anchor-positive distance and
    # the smallest anchor-negative distance (reduced over the batch axis).
    hardest_positives = K.max(pos_dist, axis=0, keepdims=True)
    hardest_negatives = K.min(neg_dist, axis=0, keepdims=True)

    loss = hardest_positives - hardest_negatives + self.margin
    loss = tf.maximum(loss, 0.0)
    
    return K.mean(loss)
    
  @property
  def metrics(self):
    return [self.loss_tracker]
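
The batch-hard step inside `_compute_loss` can be illustrated outside TensorFlow. The following is a minimal NumPy sketch, not the notebook's training code: the function name `batch_hard_triplet_loss` and the toy embeddings are assumptions made purely for illustration.

```python
import numpy as np

def batch_hard_triplet_loss(anchor, positive, negative, margin=0.5):
    """Hypothetical NumPy mirror of the batch-hard loss used above."""
    # Squared Euclidean distance for every triplet in the batch.
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    # Hardest cases over the whole batch: farthest positive, closest negative.
    hardest_positive = pos_dist.max()
    hardest_negative = neg_dist.min()
    # Clip at zero, as tf.maximum(loss, 0.0) does in the class above.
    return max(float(hardest_positive - hardest_negative + margin), 0.0)

# Toy batch of three triplets with 2-dimensional embeddings (made-up values).
anchor = np.zeros((3, 2))
positive = np.array([[0.1, 0.0], [0.5, 0.0], [0.2, 0.0]])
negative = np.array([[1.0, 0.0], [0.4, 0.0], [2.0, 0.0]])
loss = batch_hard_triplet_loss(anchor, positive, negative)
print(loss)
```

Only one positive distance and one negative distance per batch contribute to the result, so each update is driven by the hardest pair in that batch.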

In [None]:
siamese_model = SiameseModel(model_triplet_loss)
siamese_model.compile(optimizer=Adam(learning_rate=0.0001))

In [None]:
history_hard_triplet = siamese_model.fit(
    train_data, validation_data=val_data, epochs=10,
    steps_per_epoch=len(train_data) // 8, validation_steps=len(val_data) // 8,
)

In [None]:
# Train loss vs validation loss
plt.plot(history_hard_triplet.history['loss'],label='Train Loss')
plt.plot(history_hard_triplet.history['val_loss'],label='Validation Loss')
plt.legend()
plt.show()

In [None]:
# Embedding layer

triplet_hard_embeddings = model_triplet_loss.layers[-1]
triplet_hard_embeddings

In [None]:
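# Save the full embedding model and, separately, its weights.
# Distinct file names keep the second call from overwriting the first.
triplet_hard_embeddings.save("triplet_hard_embeddings.h5")
triplet_hard_embeddings.save_weights("triplet_hard_embeddings_weights.h5")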

### What is the problem with triplet loss?

So what is the problem? The loss seems to work fine, doesn’t it?

- The issue is with this line of the loss function:
```
loss = tf.maximum(loss, 0.0)
```
- Every time the value inside the maximum drops below 0, it is clipped to 0, and the information about how far below 0 it was is lost. First, let’s look at what this function does.


- It basically does this:

<img src="https://drive.google.com/uc?id=1ej1BjkTxwFf_SUke0GxhlJyGOUbnlekC">

- It tries to bring the Anchor (the current image) as close as possible to the Positive (an image that is similar to the Anchor) and as far as possible from the Negative (an image that is different from the Anchor).

The actual formula for this loss is:

$L(A,P,N)=\max(0,\ d(A,P)-d(A,N)+\alpha)$


Let’s pretend that:
- Alpha is 0.2
- Negative Distance is 2.4
- Positive Distance is 1.2

 - The value inside the max is 1.2 - 2.4 + 0.2 = -1.

 - Taking max(-1, 0), we end up with 0 as the loss.

 - The Positive Distance could be anywhere from 0 to 2.2 and the loss would still be 0.

 - With a loss of 0 there is no gradient, so it is very hard for the algorithm to further reduce the distance between the Anchor and the Positive.
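
To make the arithmetic above concrete, here is a small pure-Python sketch. The helper name `triplet_loss` is an assumption for illustration; it works on precomputed distances rather than on embeddings.

```python
def triplet_loss(pos_dist, neg_dist, alpha=0.2):
    """Standard triplet loss on precomputed distances (illustrative helper)."""
    return max(pos_dist - neg_dist + alpha, 0.0)

neg_dist = 2.4
for pos_dist in (0.2, 1.2, 2.0, 2.5):
    raw = pos_dist - neg_dist + 0.2  # value before clipping
    clipped = triplet_loss(pos_dist, neg_dist)
    print(f"pos_dist={pos_dist}: raw={raw:+.2f}, loss={clipped:.2f}")
```

Every positive distance that leaves the raw value at or below 0 produces exactly the same loss, which is the lost information discussed here.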

As a more visual example, here are two scenarios, A and B. Both show what the loss function measures for us.

<img src="https://drive.google.com/uc?id=1dtzGxkuZE0lBBd_gj_JlWKyuNdUMj1GH">

> $A=1.2-2.4-0.4=-1.6$

> $B=0.2-2.4-0.4=-2.6$

- After the max function, both A and B return 0 as their loss, which is a clear loss of information.
- Just by looking at them, we can say that B is better than A.

> The goal is a loss function that captures the “lost” information below 0.

- If you bound the N-dimensional space in which the loss is calculated, you can control this more efficiently.
- So the first step is to modify the model.
- The size of the last layer (the embedding layer) needs to be controlled.
- By using a sigmoid activation function instead of a linear one, we can guarantee that each dimension lies between 0 and 1.
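
As a quick check of why the sigmoid helps, the NumPy sketch below uses random pre-activations (an assumption, not real network outputs) to show that sigmoid embeddings stay between 0 and 1, so the squared distance between two N-dimensional embeddings can never exceed N. The value N = 128 is taken from the `Dense(128)` layers used later in this notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N = 128  # embedding size, matching the Dense(128) output layer

# Random pre-activations stand in for the outputs of a linear layer.
a = sigmoid(rng.normal(scale=5.0, size=N))
b = sigmoid(rng.normal(scale=5.0, size=N))

# Each coordinate difference lies in [-1, 1], so the sum of squares is at most N.
sq_dist = float(np.sum((a - b) ** 2))
print(sq_dist, "<=", N)
```

With a linear output layer there is no such upper bound, which is why the distance to the negative could not be measured against a fixed maximum.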








So let's change the activation function of the last layer of the embedding network to sigmoid.

#### **Quiz-4**

How can the triplet loss be made to capture the information lost below 0?

(a) Increasing the number of neurons in the last layer of the embedding.

(b) Controlling the size of the last layer of the embedding by using a sigmoid activation function instead of a linear one.

(c) Using a tanh activation function in the last layer of the embedding.

**Answer:** (b)

In [None]:
input_dim = target_shape =  (180,180,3)

In [None]:
base_cnn = resnet.ResNet50(
    weights="imagenet", input_shape=input_dim, include_top=False
)

base_cnn.trainable=False

glob_pool = layers.GlobalAveragePooling2D()(base_cnn.output)
dense1 = layers.Dense(128)(glob_pool)
output = layers.Dense(128,activation='sigmoid')(dense1)

embedding_new = Model(base_cnn.input, output, name="Embedding")

# trainable = False
# for layer in base_cnn.layers:
#     if layer.name == "conv5_block1_out":
#         trainable = True
#     layer.trainable = trainable

In [None]:
# Inputs for Anchor, positive and negative images
anchor_in = Input(name='anchor',shape=target_shape)
pos_in = Input(name='positive',shape=target_shape)
neg_in = Input(name='negative',shape=target_shape)

# Extract embeddings using the ResNet50-based embedding network
anchor_out = embedding_new(anchor_in)
pos_out = embedding_new(pos_in)
neg_out = embedding_new(neg_in)
# Concatenate the embeddings
# merged_vector = concatenate([anchor_out, pos_out, neg_out],axis=1,name='triplet_layer')

# loss = layers.Lambda(triplet_loss)([anchor_out,pos_out,neg_out])
# distances = DistanceLayer()(
#         anchor_out,
#         pos_out,
#         neg_out
#     )


# Define the model
model_lossless_triplet_loss = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out,pos_out,neg_out])

# Print the architecture; the optimizer is set later, when the SiameseModel wrapper is compiled
model_lossless_triplet_loss.summary()

Now we have changed the activation function of the last layer of the embedding to **sigmoid**.

But the problem with the triplet loss remains:
- The loss becomes zero when training runs for more epochs, and triplets with zero loss no longer contribute any gradient.

### How can we solve this?

- We can create a loss function that breaks the linearity of the cost function.
- In other words, make the cost grow faster and faster as the error grows.

Here comes the **lossless triplet loss**

It is defined as 

$\sum_{i=1}^{n}\left[-\ln\left(-\frac{(f^a_{i}-f^p_{i})^2}{\beta}+1+\epsilon\right)-\ln\left(-\frac{N-(f^a_{i}-f^n_{i})^2}{\beta}+1+\epsilon\right)\right]$

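- Where N is the number of dimensions (the number of outputs of your network, i.e. the number of features in your embedding)
- β is a scaling factor.
- $\epsilon$ is a small constant that keeps the argument of the logarithm from reaching 0 (the code uses `1e-8`)
- $(f^a_{i}-f^p_{i})^2$ is the squared difference between anchor and positive in dimension $i$
- $(f^a_{i}-f^n_{i})^2$ is the squared difference between anchor and negative in dimension $i$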
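
The effect of the logarithm can be seen on toy values. The sketch below is a NumPy mirror of what the code cells in this notebook compute: they apply the logarithm to the mean squared distance with `N = 3` and `beta = 3`, rather than summing per-dimension terms as in the formula. The function name `lossless_triplet` and the toy embeddings are assumptions for illustration.

```python
import numpy as np

def lossless_triplet(anchor, positive, negative, N=3, beta=3, epsilon=1e-8):
    """Hypothetical NumPy version of the notebook's lossless triplet loss."""
    # Mean squared difference; lies in [0, 1] for sigmoid embeddings.
    pos_dist = np.mean((anchor - positive) ** 2, axis=-1)
    neg_dist = np.mean((anchor - negative) ** 2, axis=-1)
    # Positive term grows as the positive moves away from the anchor.
    pos_term = -np.log(-pos_dist / beta + 1 + epsilon)
    # Negative term grows as the negative moves closer to the anchor.
    neg_term = -np.log(-(N - neg_dist) / beta + 1 + epsilon)
    return float(pos_term + neg_term)

anchor = np.full(4, 0.2)
negative = np.full(4, 0.9)
loss_close = lossless_triplet(anchor, np.full(4, 0.25), negative)
loss_far = lossless_triplet(anchor, np.full(4, 0.6), negative)
print(loss_close, loss_far)
```

Unlike the clipped triplet loss, the two cases produce different values: the closer positive gives the lower loss, so the optimizer still receives a signal to pull it in further.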



In [None]:
# Custom function for the lossless triplet loss.
# Expects y_pred to hold the anchor, positive and negative embeddings
# concatenated along the last axis; y_true is unused.
def lossless_triplet_loss(y_true, y_pred, beta=3, epsilon=1e-8):
    # Split into three equal parts instead of hard-coding the embedding size.
    anchor, positive, negative = tf.split(y_pred, 3, axis=-1)

    pos_dist = K.mean(K.square(anchor - positive), axis=-1)
    neg_dist = K.mean(K.square(anchor - negative), axis=-1)

    N = 3
    pos_dist = -tf.math.log(-tf.divide(pos_dist, beta) + 1 + epsilon)
    neg_dist = -tf.math.log(-tf.divide(N - neg_dist, beta) + 1 + epsilon)
    loss = neg_dist + pos_dist

    return loss

In [None]:
class SiameseModel(Model):
    
    def __init__(self, siamese_network, margin=0.5):
        super().__init__()
        self.siamese_network = siamese_network
        self.margin = margin
        self.loss_tracker = metrics.Mean(name='loss')
        self.N = 3
        self.beta = 3
        self.epsilon = 1e-8
        
    def call(self, inputs):
        return self.siamese_network(inputs)
    
    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss = self._compute_loss(data)
        gradients = tape.gradient(loss,self.siamese_network.trainable_weights)
        self.optimizer.apply_gradients(
                    zip(gradients,self.siamese_network.trainable_weights)
        )
        
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}
    
    def test_step(self, data):
        loss = self._compute_loss(data)
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}
    
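    def _compute_loss(self, data):
        anchor, positive, negative = self.siamese_network(data)
        # Mean squared difference per triplet; with sigmoid outputs it lies in [0, 1].
        pos_dist = K.mean(K.square(anchor - positive), axis=-1)
        neg_dist = K.mean(K.square(anchor - negative), axis=-1)
        # Logarithmic penalties: the positive term grows with pos_dist,
        # the negative term grows as neg_dist shrinks.
        pos_dist = -tf.math.log(-tf.divide(pos_dist, self.beta) + 1 + self.epsilon)
        neg_dist = -tf.math.log(-tf.divide(self.N - neg_dist, self.beta) + 1 + self.epsilon)
        loss = neg_dist + pos_dist

        return loss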
    
    @property
    def metrics(self):
        return [self.loss_tracker]

In [None]:
siamese_model_lossless = SiameseModel(model_lossless_triplet_loss)
siamese_model_lossless.compile(optimizer=Adam(learning_rate=0.00001))

In [None]:
history_lossless = siamese_model_lossless.fit(
    train_data, validation_data=val_data, epochs=10,
    steps_per_epoch=len(train_data) // 8, validation_steps=len(val_data) // 8,
)

In [None]:
model_lossless_triplet_loss.layers

In [None]:
signature_embeddings_new = model_lossless_triplet_loss.layers[-1]

In [None]:
signature_embeddings_new.save("signature_embeddings_new.h5")
signature_embeddings_new.save_weights("signature_embeddings_new_weights.h5")

In [None]:
pred_distances_new = []

In [None]:
for i in range(len(inp_imgs)):
  inp_feat = preprocess_image(inp_imgs[i])
  val_feat = preprocess_image(val_imgs[i])
  # Embed both images and store the L2 distance between the embeddings.
  inp_emb = signature_embeddings_new.predict(np.expand_dims(inp_feat, axis=0))
  val_emb = signature_embeddings_new.predict(np.expand_dims(val_feat, axis=0))
  res = L2(inp_emb, val_emb)
  pred_distances_new.append(res)

In [None]:
labels = pairs[2].values

In [None]:
pred_distances_new_ = []

In [None]:
for i in range(len(pred_distances_new)):
  pred_distances_new_.append(pred_distances_new[i][0][0])

In [None]:
acc,thresh = compute_accuracy_thresh(pred_distances_new_,labels)

In [None]:
# Accuracy 
acc

In [None]:
# Threshold
thresh
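
The helper `compute_accuracy_thresh` is defined elsewhere in this notebook and its implementation is not shown here. As a rough idea of how a distance threshold and its accuracy can be chosen together, here is a pure-Python sketch. The function name `best_threshold_accuracy` is hypothetical, and the sketch assumes that label 1 marks a matching pair and that a pair is predicted as matching when its distance is at or below the threshold; the actual helper may use a different convention.

```python
def best_threshold_accuracy(distances, labels):
    """Hypothetical helper: try each observed distance as a threshold."""
    best_acc, best_thresh = 0.0, None
    for thresh in sorted(set(distances)):
        # Assumption: label 1 means "same", predicted when distance <= thresh.
        preds = [1 if d <= thresh else 0 for d in distances]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_thresh = acc, thresh
    return best_acc, best_thresh

# Toy distances and labels (made-up values, not results from this notebook).
acc_demo, thresh_demo = best_threshold_accuracy([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
print(acc_demo, thresh_demo)
```

Choosing the threshold on the same pairs that are used to report accuracy gives an optimistic number, so a separate held-out set is the safer choice when one is available.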

**Conclusion**

- As you can see above, when the model is trained with triplet loss and lossless triplet loss, the embeddings learn to clearly separate similar images from dissimilar ones.

- Lossless triplet loss is better than the standard triplet loss because:
 - In **standard triplet loss**, any negative value is clipped to zero, which is a clear loss of information.
 - In **lossless triplet loss**, a non-linearity is introduced that makes the cost grow ever faster as the error grows.

- So we can say that **Lossless Triplet Loss** is a strong choice of loss function for pairwise learning in a Siamese network.
