# Modified Triplet Loss: Ungraded Lecture Notebook
In this notebook you'll see how to calculate the full triplet loss, step by step, including the mean negative and the closest negative. You'll also calculate the matrix of similarity scores.

## Background
This is the original triplet loss function:

$\mathcal{L_\mathrm{Original}} = \max{(\mathrm{s}(A,N) -\mathrm{s}(A,P) +\alpha, 0)}$

It can be improved by including the mean negative and the closest negative, to create a new full loss function. The inputs are the Anchor $\mathrm{A}$, Positive $\mathrm{P}$ and Negative $\mathrm{N}$.

$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{Full}} = \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}$
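To make the formulas concrete, here is a minimal scalar sketch for a single anchor. The similarity values and the margin $\alpha = 0.25$ are made-up illustration numbers, not outputs from this notebook. The function names `original_triplet_loss` and `full_triplet_loss` are hypothetical, and the closest negative is used as the single negative of the original loss.

```python
# Hypothetical helper functions that evaluate the loss formulas for ONE anchor.
# All similarity values below are made-up numbers chosen for illustration.

def original_triplet_loss(s_an: float, s_ap: float, alpha: float = 0.25) -> float:
    # L_Original = max(s(A,N) - s(A,P) + alpha, 0)
    return max(s_an - s_ap + alpha, 0.0)

def full_triplet_loss(mean_neg: float, closest_neg: float, s_ap: float,
                      alpha: float = 0.25) -> float:
    l1 = max(mean_neg - s_ap + alpha, 0.0)     # L_1 uses the mean negative
    l2 = max(closest_neg - s_ap + alpha, 0.0)  # L_2 uses the closest negative
    return l1 + l2                             # L_Full = L_1 + L_2

s_ap = 0.5          # similarity of the anchor to its positive
mean_neg = -0.1     # average similarity of the anchor to its negatives
closest_neg = 0.4   # hardest negative that is still below s_ap

print(original_triplet_loss(closest_neg, s_ap))        # approximately 0.15
print(full_triplet_loss(mean_neg, closest_neg, s_ap))  # approximately 0.15 (L_1 = 0, L_2 = 0.15)
```

Here $\mathcal{L_\mathrm{1}}$ is zero because the average negative is already more than $\alpha$ below $\mathrm{s}(A,P)$, so only the hard negative contributes to the loss.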

Let me show you what that means exactly, and how to calculate each step.

## Imports

In [40]:
import numpy as np

## Similarity Scores
The first step is to calculate the matrix of similarity scores using cosine similarity, so that you can look up $\mathrm{s}(A,P)$ and $\mathrm{s}(A,N)$ as needed for the loss formulas.

### Two Vectors
First I'll show you how to calculate the similarity score, using cosine similarity, for 2 vectors.

$\mathrm{s}(v_1,v_2) = \mathrm{cosine \ similarity}(v_1,v_2) = \frac{v_1 \cdot v_2}{||v_1||~||v_2||}$
* Try changing the values in the second vector to see how it changes the cosine similarity.




In [41]:
def cosine_similarity(l1: np.ndarray, l2: np.ndarray) -> float:
    # Dot product of the two vectors divided by the product of their L2 norms
    numerator = np.dot(l1, l2)
    denominator = np.sqrt(np.dot(l1, l1)) * np.sqrt(np.dot(l2, l2))
    return numerator / denominator

In [42]:
v1 = np.array([1,2,3], dtype=float)
v2 = np.array([1,2,3.5])

print(f"cosine_similarity: {cosine_similarity(v1, v2)}")

cosine_similarity: 0.9974086507360697
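Two properties of cosine similarity matter for the loss: it ignores vector length, and it always lies in $[-1, 1]$ (1 for the same direction, -1 for opposite directions). The short sketch below checks both. It uses the same formula as `cosine_similarity` above, redefined under the hypothetical name `cos_sim` (with `np.linalg.norm` for the norms) so the snippet runs on its own.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Same formula as cosine_similarity: dot product over the product of the L2 norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([1.0, 2.0, 3.5])

print(cos_sim(u1, u2))       # about 0.9974, the same value as the cell above
print(cos_sim(u1, 10 * u2))  # same value: scaling a vector does not change its direction
print(cos_sim(u1, -u1))      # -1 (up to rounding): opposite directions
print(cos_sim(u1, u1))       # 1 (up to rounding): identical directions
```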


### Two Batches of Vectors
Now I'll show you how to calculate the similarity scores, using cosine similarity, for 2 batches of vectors. These are rows of individual vectors, just like in the example above, but stacked vertically into a matrix. The image below shows such a batch with a batch size (row count) of 4 and an embedding size (column count) of 5; the code that follows uses a batch size of 4 and an embedding size of 3.

The data is set up so that $v_{1\_1}$ and $v_{2\_1}$ represent duplicate inputs, but neither is a duplicate of any other row in the batch, and the same holds for every row index $i$: $v_{1\_i}$ and $v_{2\_i}$ are duplicates. This means $v_{1\_1}$ and $v_{2\_1}$ (green and green) are more similar than, say, $v_{1\_1}$ and $v_{2\_2}$ (green and magenta).

I'll show you two different methods for calculating the matrix of similarities from 2 batches of vectors.

<img src = 'images/v1v2_stacked.png' width="width" height="height" style="height:250px;"/>

Set up the vector batches

In [43]:
v1_1 = np.array([1, 2, 3])
v1_2 = np.array([9, 8, 7])
v1_3 = np.array([-1, -4, -2])
v1_4 = np.array([1, -7, 2])

v1 = np.stack([v1_1, v1_2, v1_3, v1_4])

In [44]:
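# Each v2 row is its v1 row plus Gaussian noise (mean 0, std 2), i.e. a noisy "duplicate".
# No random seed is set, so the noise, and every score computed from it, changes on each run.
v2_1 = v1_1 + np.random.normal(0, 2, 3)
v2_2 = v1_2 + np.random.normal(0, 2, 3)
v2_3 = v1_3 + np.random.normal(0, 2, 3)
v2_4 = v1_4 + np.random.normal(0, 2, 3)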

v2 = np.stack([v2_1, v2_2, v2_3, v2_4])
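Because the noise above is unseeded, the scores printed later will differ from run to run. Here is a minimal sketch of a reproducible variant, assuming NumPy's `Generator` API (`np.random.default_rng`). The seed value 0 and the names `rng`, `a_batch` and `p_batch` are arbitrary choices, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, so the noise is identical on every run

# Anchor batch: 4 vectors with embedding size 3 (the same values as v1 above)
a_batch = np.array([[1, 2, 3],
                    [9, 8, 7],
                    [-1, -4, -2],
                    [1, -7, 2]], dtype=float)

# Positive batch: each row is a noisy duplicate of the matching anchor row
p_batch = a_batch + rng.normal(loc=0.0, scale=2.0, size=a_batch.shape)

print(a_batch.shape, p_batch.shape)  # (4, 3) (4, 3)
```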

### Similarity Score Calculations

In [45]:
def similarity_scores_1(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    # Loop version: entry [row, col] is the cosine similarity of row `row` of m1 and row `col` of m2
    assert m1.shape == m2.shape
    sim = np.zeros([len(m1), len(m2)])
    
    for row in range(m1.shape[0]):
        # Iterate over the rows of m2 (shape[0]); shape[1] is the embedding size
        for col in range(m2.shape[0]):
            sim[row, col] = cosine_similarity(m1[row], m2[col])
    
    return sim

In [46]:
def similarity_scores_2(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    # Vectorized version: L2-normalize every row, then one matrix product gives all pairwise cosines
    assert m1.shape == m2.shape
    
    def norm(x: np.ndarray) -> np.ndarray:
        return x / np.sqrt(np.sum(x * x, axis=1, keepdims=True))

    return np.dot(norm(m1), norm(m2).T)

In [47]:
v1_test = np.ones(shape=(3,3))
v2_test = np.ones(shape=(3,3))
sim_scores_1 = similarity_scores_1(v1_test, v2_test)
sim_scores_2 = similarity_scores_2(v1_test, v2_test)

sim_scores_1, sim_scores_2

(array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]),
 array([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]))

In [48]:
v1_test[0][1]=2

sim_scores_1 = similarity_scores_1(v1_test, v2_test)
sim_scores_2 = similarity_scores_2(v1_test, v2_test)
sim_scores_1, sim_scores_2

(array([[0.94280904, 0.94280904, 0.94280904],
        [1.        , 1.        , 1.        ],
        [1.        , 1.        , 1.        ]]),
 array([[0.94280904, 0.94280904, 0.94280904],
        [1.        , 1.        , 1.        ],
        [1.        , 1.        , 1.        ]]))

In [49]:
sim_scores_1 = similarity_scores_1(v1, v2)
sim_scores_2 = similarity_scores_2(v1, v2)
sim_scores_1, sim_scores_2

(array([[ 0.75414925,  0.80763641, -0.58169271,  0.46182764],
        [ 0.37620945,  0.98723393, -0.43778957,  0.34460704],
        [-0.44085198, -0.85491104,  0.83337639,  0.01207406],
        [ 0.1514151 , -0.40580418,  0.86766685,  0.73836405]]),
 array([[ 0.75414925,  0.80763641, -0.58169271,  0.46182764],
        [ 0.37620945,  0.98723393, -0.43778957,  0.34460704],
        [-0.44085198, -0.85491104,  0.83337639,  0.01207406],
        [ 0.1514151 , -0.40580418,  0.86766685,  0.73836405]]))
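A quick way to catch loop bugs, such as iterating over the wrong axis, is to compare the loop version against the vectorized one on a non-square batch, where rows and columns have different counts. Below is a self-contained sketch; the helper names `loop_similarity` and `matrix_similarity` are hypothetical, and the seeded generator is an assumption added for reproducibility.

```python
import numpy as np

def loop_similarity(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    # Entry [i, j] = cosine similarity of row i of m1 and row j of m2
    sim = np.zeros((m1.shape[0], m2.shape[0]))
    for i in range(m1.shape[0]):
        for j in range(m2.shape[0]):
            sim[i, j] = m1[i] @ m2[j] / (np.linalg.norm(m1[i]) * np.linalg.norm(m2[j]))
    return sim

def matrix_similarity(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    # Normalize each row to unit length; one matrix product then gives every pairwise cosine
    n1 = m1 / np.linalg.norm(m1, axis=1, keepdims=True)
    n2 = m2 / np.linalg.norm(m2, axis=1, keepdims=True)
    return n1 @ n2.T

rng = np.random.default_rng(42)
anchors = rng.normal(size=(4, 3))                         # 4 rows, embedding size 3
positives = anchors + rng.normal(scale=0.1, size=(4, 3))  # small noise: near-duplicates

s1 = loop_similarity(anchors, positives)
s2 = matrix_similarity(anchors, positives)
print(s1.shape)             # (4, 4)
print(np.allclose(s1, s2))  # True
```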

## Hard Negative Mining

I'll now show you how to calculate the mean negative $mean\_neg$ and the closest negative $closest\_neg$ used in calculating $\mathcal{L_\mathrm{1}}$ and $\mathcal{L_\mathrm{2}}$.


$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

You'll do this using the matrix of similarity scores you already know how to make, like the example below for a batch size of 4. The diagonal of the matrix contains all the $\mathrm{s}(A,P)$ values, the similarities of duplicate question pairs (the positives), and every off-diagonal entry is an $\mathrm{s}(A,N)$ value (a negative). This layout is what the calculations that follow rely on.

<img src = 'images/ss_matrix.png' width="width" height="height" style="height:250px;"/>
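In NumPy terms, the positives are the diagonal (`np.diag`) and the negatives are the entries selected by an off-diagonal boolean mask. Here is a small sketch on a made-up 3-by-3 matrix; `sim_demo` and `off_diag` are illustrative names, not from the notebook.

```python
import numpy as np

# Made-up 3-by-3 similarity matrix: rows are anchors, columns are candidates
sim_demo = np.array([[0.9, 0.2, -0.1],
                     [0.3, 0.8, 0.4],
                     [-0.5, 0.1, 0.7]])

positives = np.diag(sim_demo)                      # s(A,P) for each row: [0.9, 0.8, 0.7]
off_diag = ~np.eye(sim_demo.shape[0], dtype=bool)  # True everywhere except the diagonal
negatives = sim_demo[off_diag].reshape(3, 2)       # s(A,N): the 2 off-diagonal values of each row

print(positives)
print(negatives)
```

Boolean indexing returns the selected values in row-major order, and every row has exactly $b - 1$ of them, which is why the reshape to `(b, b - 1)` keeps each row's negatives together.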


### Mean Negative
$mean\_neg$ is, for each row, the average of the off-diagonal $\mathrm{s}(A,N)$ values.
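As a worked check on the same 4-by-4 matrix used in the cells below: the first row's off-diagonal values are $-0.8$, $0.3$ and $-0.5$, so $mean\_neg = (-0.8 + 0.3 - 0.5)/3 = -1/3$. The sketch below computes this for every row with an off-diagonal mask instead of zeroing the diagonal. The mask approach and the names `sim_example` and `n` are an illustrative alternative, not the notebook's method.

```python
import numpy as np

sim_example = np.array([[0.9, -0.8, 0.3, -0.5],
                        [-0.4, 0.5, 0.1, -0.1],
                        [0.3, 0.1, -0.4, -0.8],
                        [-0.5, -0.2, -0.7, 0.5]])
n = sim_example.shape[0]

off_diag = ~np.eye(n, dtype=bool)                  # True on the s(A,N) entries
neg = sim_example[off_diag].reshape(n, n - 1)      # each row's n - 1 negatives
mean_neg = neg.mean(axis=1, keepdims=True)         # (n, 1) column of row means

print(mean_neg.ravel())  # approximately -0.3333, -0.1333, -0.1333, -0.4667
```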

### Closest Negative
$closest\_neg$ is, for each row, the largest off-diagonal value $\mathrm{s}(A,N)$ that is not larger than the diagonal value $\mathrm{s}(A,P)$.
* Try using a different matrix of similarity scores. 
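For the same matrix, the third row has diagonal $-0.4$ and off-diagonal values $0.3$, $0.1$ and $-0.8$; only $-0.8$ is not above the diagonal, so $closest\_neg = -0.8$. The sketch below applies this rule to every row using `np.where` with `-np.inf` as the excluded value, an alternative to a finite sentinel. The names are illustrative, and ties with the diagonal are kept, matching the `>` comparison in `get_closest_negative` further down.

```python
import numpy as np

sim_example = np.array([[0.9, -0.8, 0.3, -0.5],
                        [-0.4, 0.5, 0.1, -0.1],
                        [0.3, 0.1, -0.4, -0.8],
                        [-0.5, -0.2, -0.7, 0.5]])
n = sim_example.shape[0]

s_ap = np.diag(sim_example).reshape(n, 1)  # s(A,P) per row, as a column
off_diag = ~np.eye(n, dtype=bool)          # exclude the positives themselves
valid = off_diag & (sim_example <= s_ap)   # negatives that are not above s(A,P)

# Excluded entries become -inf, so they can never be the row maximum
closest_neg = np.where(valid, sim_example, -np.inf).max(axis=1, keepdims=True)

print(closest_neg.ravel())  # 0.3, 0.1, -0.8 and -0.2
```

If a row had no valid negative, its result would be `-inf`, and $\max(-\infty - \mathrm{s}(A,P) + \alpha, 0)$ is simply 0.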

In [50]:
sim_hardcoded = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)
sim_hardcoded

array([[ 0.9, -0.8,  0.3, -0.5],
       [-0.4,  0.5,  0.1, -0.1],
       [ 0.3,  0.1, -0.4, -0.8],
       [-0.5, -0.2, -0.7,  0.5]])

In [51]:
sim = sim_hardcoded
b = sim.shape
b

(4, 4)

In [52]:
sim_diag = np.diag(sim)
sim_diag

array([ 0.9,  0.5, -0.4,  0.5])

In [53]:
sim_ap = np.diag(sim_diag)
sim_ap

array([[ 0.9,  0. ,  0. ,  0. ],
       [ 0. ,  0.5,  0. ,  0. ],
       [ 0. ,  0. , -0.4,  0. ],
       [ 0. ,  0. ,  0. ,  0.5]])
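`np.diag` does two different jobs depending on its input: given a 2-D array it extracts the diagonal, and given a 1-D array it builds a diagonal matrix. Subtracting the rebuilt diagonal matrix from the original, as `get_mean_negative` does next, zeroes the diagonal while leaving every $\mathrm{s}(A,N)$ untouched. Here is a short check with an illustrative 2-by-2 matrix (`m_small` is a hypothetical name):

```python
import numpy as np

m_small = np.array([[0.9, -0.8],
                    [-0.4, 0.5]])

d = np.diag(m_small)         # 2-D in -> 1-D diagonal: [0.9, 0.5]
d_matrix = np.diag(d)        # 1-D in -> 2-D diagonal matrix
off_only = m_small - d_matrix  # diagonal becomes 0, off-diagonal values unchanged

print(d)
print(off_only)
```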

In [54]:
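def get_mean_negative(m: np.ndarray) -> np.ndarray:
    # Mean of the off-diagonal entries s(A,N) in each row, returned as a (b, 1) column
    b = m.shape[0]
    m_ap = np.diag(m)         # 1-D vector of the diagonal entries s(A,P)
    m_an = m - np.diag(m_ap)  # copy of m with the diagonal zeroed out
    return np.sum(m_an, axis=1, keepdims=True) / (b - 1)  # b - 1 negatives per row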

In [55]:
mean_negative = get_mean_negative(sim)
mean_negative

array([[-0.33333333],
       [-0.13333333],
       [-0.13333333],
       [-0.46666667]])

In [56]:
b = sim.shape[0]
np.identity(b)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [57]:
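def get_closest_negative(m: np.ndarray) -> np.ndarray:
    # Per row, the largest off-diagonal s(A,N) that does not exceed the diagonal s(A,P)
    b = m.shape[0]
    m_ap = np.diag(m)                   # 1-D vector of s(A,P)
    m_an = m - np.diag(m_ap)            # diagonal zeroed out
    mask_1 = (np.identity(b) == 1)      # True on the diagonal (the positives)
    mask_2 = m_an > m_ap.reshape(b, 1)  # True where a value exceeds its row's s(A,P)
    mask = (mask_1 | mask_2)
    m_masked = np.copy(m)
    # -2 is below any cosine similarity, so max() skips masked entries;
    # a row with no valid negative returns -2
    m_masked[mask] = -2
    return np.max(m_masked, axis=1, keepdims=True)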

In [58]:
closest_negative = get_closest_negative(sim)
closest_negative

array([[ 0.3],
       [ 0.1],
       [-0.8],
       [-0.2]])
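One edge case is a row in which every off-diagonal value is larger than the diagonal, so no valid closest negative exists. In `get_closest_negative` every entry of such a row is masked to $-2$, and because cosine similarity never goes below $-1$, the resulting $\mathcal{L_\mathrm{2}}$ term $\max(-2 - \mathrm{s}(A,P) + \alpha, 0)$ is always 0 for $\alpha < 1$. Here is a self-contained sketch that reproduces the masking on a made-up 2-by-2 matrix; the function repeats the logic of the cell above so the snippet runs alone.

```python
import numpy as np

def get_closest_negative(m: np.ndarray) -> np.ndarray:
    # Same logic as the notebook cell: mask the diagonal and any value above s(A,P) with -2
    b = m.shape[0]
    m_ap = np.diag(m)
    m_an = m - np.diag(m_ap)
    mask = (np.identity(b) == 1) | (m_an > m_ap.reshape(b, 1))
    m_masked = np.copy(m)
    m_masked[mask] = -2
    return np.max(m_masked, axis=1, keepdims=True)

# Made-up matrix: in row 0 the only negative (0.5) is ABOVE the positive (0.1)
edge = np.array([[0.1, 0.5],
                 [0.2, 0.9]])

closest = get_closest_negative(edge)
alpha = 0.25
l2 = np.maximum(closest - np.diag(edge).reshape(2, 1) + alpha, 0)

print(closest.ravel())  # row 0 has no valid negative, so it gets the sentinel -2
print(l2.ravel())       # both L_2 terms are 0
```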

## The Loss Functions

The last step is to calculate the loss functions.

$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{Full}} = \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}$

In [70]:
def siamese_loss_function(m: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    # Per-anchor full loss L_Full = L_1 + L_2, returned as a (b, 1) column
    b = m.shape[0]
    mean_neg = get_mean_negative(m)
    closest_neg = get_closest_negative(m)
    sim_ap = np.diag(m).reshape(b, 1)  # s(A,P) for each anchor, as a column

    l1 = np.maximum(mean_neg - sim_ap + alpha, 0)
    l2 = np.maximum(closest_neg - sim_ap + alpha, 0)
    return l1 + l2

In [71]:
loss = siamese_loss_function(sim)
loss

array([[0.        ],
       [0.        ],
       [0.51666667],
       [0.        ]])
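`siamese_loss_function` returns one loss per anchor, a $(b, 1)$ column. To train a model you normally reduce this to a single number. One common choice, and the assumption in this sketch, is the sum over the batch; averaging with `.mean()` is the other usual option. The snippet restates the three functions above as one vectorized function under the hypothetical name `full_triplet_loss`. It uses `-np.inf` rather than $-2$ to exclude entries, which yields the same losses, and applies it to the same hard-coded matrix.

```python
import numpy as np

def full_triplet_loss(sim: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    # Per-anchor L_Full = L_1 + L_2 for a (b, b) similarity matrix with positives on the diagonal
    b = sim.shape[0]
    s_ap = np.diag(sim).reshape(b, 1)                 # s(A,P)
    off_diag = ~np.eye(b, dtype=bool)                 # s(A,N) positions
    mean_neg = sim[off_diag].reshape(b, b - 1).mean(axis=1, keepdims=True)
    valid = off_diag & (sim <= s_ap)                  # negatives not above s(A,P)
    closest_neg = np.where(valid, sim, -np.inf).max(axis=1, keepdims=True)
    l1 = np.maximum(mean_neg - s_ap + alpha, 0)
    l2 = np.maximum(closest_neg - s_ap + alpha, 0)
    return l1 + l2

sim_hardcoded = np.array([[0.9, -0.8, 0.3, -0.5],
                          [-0.4, 0.5, 0.1, -0.1],
                          [0.3, 0.1, -0.4, -0.8],
                          [-0.5, -0.2, -0.7, 0.5]])

per_anchor = full_triplet_loss(sim_hardcoded)
total = per_anchor.sum()   # scalar batch loss (sum); per_anchor.mean() would average instead

print(per_anchor.ravel())  # only the third anchor is penalized
print(total)               # about 0.5167
```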