# Loss Functions
This document outlines several loss functions that we have discussed.

## Notation
We follow the notation of Schmidt et al. as closely as possible.

- $D(I_a, u_a, I_b, u_b)$ is the L2 distance between the descriptor of image $I_a$ at pixel $u_a$ and the descriptor of image $I_b$ at pixel $u_b$.
- $||u - u'||_2$ is the L2 distance in pixel space between location $u$ and location $u'$.



## Background Non-Match Loss

For a pixel $u_a$ in $I_a$, randomly sample a large set of background pixels in $I_b$, call it $U_b$. With margin $M$, the loss is

$$ \text{loss} = \sum_{u_b \in U_b} \max\left( M - D(I_a, u_a, I_b, u_b), 0  \right) $$
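
A minimal NumPy sketch of this hinge sum, to make the computation concrete. It assumes descriptor images are stored as `(H, W, C)` arrays and pixels as `(row, col)` pairs; the function name and argument layout are illustrative, not taken from any existing code.

```python
import numpy as np

def background_non_match_loss(desc_a, u_a, desc_b, background_pixels, margin):
    """Sum of max(M - D(I_a, u_a, I_b, u_b), 0) over the sampled pixels U_b.

    desc_a, desc_b: descriptor images, shape (H, W, C) (assumed layout).
    u_a: (row, col) pixel in image a (assumed convention).
    background_pixels: integer array of shape (N, 2) holding U_b.
    margin: the margin M.
    """
    d_a = desc_a[u_a[0], u_a[1]]                         # descriptor at u_a, shape (C,)
    rows, cols = background_pixels[:, 0], background_pixels[:, 1]
    # L2 distance from d_a to the descriptor at every sampled background pixel
    dists = np.linalg.norm(desc_b[rows, cols] - d_a, axis=1)
    return float(np.sum(np.maximum(margin - dists, 0.0)))
```

Only background pixels whose descriptor is closer than `margin` to the descriptor at $u_a$ contribute to the sum.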

## Loss function to target "best match"

One task we are particularly interested in is the following: given a pixel $u_a$ in image $I_a$, find the corresponding pixel $u_b^* = g(I_b, u_a)$ in image $I_b$. The best-match pixel $u_b'$ in image $I_b$ is the one whose descriptor is closest to that of $u_a$, found by computing

$$ u_b' = \arg \min_{u_b} D(I_a,u_a,I_b,u_b)$$
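
As a sketch, the best-match search is a brute-force argmin over the descriptor distance map. The `(H, W, C)` array layout, the `(row, col)` pixel convention, and the function name are assumptions made for illustration.

```python
import numpy as np

def best_match(desc_a, u_a, desc_b):
    """Return the pixel u_b' minimizing D(I_a, u_a, I_b, u_b), and that distance."""
    d_a = desc_a[u_a[0], u_a[1]]                         # descriptor at u_a, shape (C,)
    dist_map = np.linalg.norm(desc_b - d_a, axis=-1)     # D for every u_b, shape (H, W)
    row, col = np.unravel_index(np.argmin(dist_map), dist_map.shape)
    return (int(row), int(col)), float(dist_map[row, col])
```

If several pixels tie for the minimum, `np.argmin` returns the first one in row-major order.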

Consider the non-match loss given by

$$ l = \sum_{u_b \in I_b} \min(||u_b -  u_b^*||_2, M_p) \cdot \max\left( M_d - \frac{D(I_a, u_a, I_b, u_b)}{D(I_a, u_a, I_b, u_b^*)}, 0  \right) $$

Reasonable settings could be

$$ M_p = 100, M_d = 1.5$$

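The idea is to penalize points whose descriptor distance to $u_a$ is within a factor $M_d$ of that of the "true match" $u_b^*$, that is, points with $D(I_a, u_a, I_b, u_b) < M_d \cdot D(I_a, u_a, I_b, u_b^*)$. We also want to penalize more heavily points that are farther away from $u_b^*$ in pixel space, with the pixel-distance weight capped at $M_p$.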
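
A rough NumPy sketch of this loss for a single pixel $u_a$, summing over every pixel of $I_b$. The array layout `(H, W, C)`, the `(row, col)` convention, and the function name are assumptions. The small `eps` in the denominator is also an addition that is not in the formula; it only guards against division by zero when the true match has distance exactly 0.

```python
import numpy as np

def best_match_non_match_loss(desc_a, u_a, desc_b, u_b_star,
                              M_p=100.0, M_d=1.5, eps=1e-8):
    """Pixel-distance-weighted hinge on the descriptor-distance ratio."""
    d_a = desc_a[u_a[0], u_a[1]]
    dist_map = np.linalg.norm(desc_b - d_a, axis=-1)     # D(I_a, u_a, I_b, u_b), (H, W)
    d_star = dist_map[u_b_star[0], u_b_star[1]]          # D(I_a, u_a, I_b, u_b^*)

    # ||u_b - u_b^*||_2 for every pixel u_b, clipped at M_p
    rows, cols = np.indices(dist_map.shape)
    pixel_dist = np.sqrt((rows - u_b_star[0]) ** 2 + (cols - u_b_star[1]) ** 2)
    pixel_weight = np.minimum(pixel_dist, M_p)

    # eps is an assumption added for numerical safety, not part of the formula
    hinge = np.maximum(M_d - dist_map / (d_star + eps), 0.0)
    return float(np.sum(pixel_weight * hinge))
```

Note that the true match itself contributes nothing, since its pixel-distance weight is zero.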

We could also 

### Variations
There are multiple options for the second factor in the above sum. A few are listed below. It's unclear to me what the tradeoffs between these different formulations are. Tanner et al. use a formulation like (2), while some of the other papers with triplet loss use something more similar to (1).

1. $\max\left( M - \frac{D(I_a, u_a, I_b, u_b)}{D(I_a, u_a, I_b, u_b^*)}, 0  \right)$
2. $\max\left( M - D(I_a, u_a, I_b, u_b), 0  \right)$
3. $\max\left( M \cdot D(I_a, u_a, I_b, u_b^*) - D(I_a, u_a, I_b, u_b), 0  \right)$
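
To compare the three variations side by side, here is a small sketch that evaluates each hinge term on arrays of precomputed distances. Here `d` stands for $D(I_a, u_a, I_b, u_b)$ and `d_star` for $D(I_a, u_a, I_b, u_b^*)$; the function names are made up, and the `eps` guard in the ratio form is an assumption that is not part of the formulas.

```python
import numpy as np

def hinge_ratio(d, d_star, M, eps=1e-8):
    """Variation 1: max(M - d / d_star, 0). eps is an added guard against d_star == 0."""
    return np.maximum(M - d / (d_star + eps), 0.0)

def hinge_absolute(d, M):
    """Variation 2: max(M - d, 0), a fixed margin in descriptor space."""
    return np.maximum(M - d, 0.0)

def hinge_scaled_margin(d, d_star, M):
    """Variation 3: max(M * d_star - d, 0), a margin that scales with d_star."""
    return np.maximum(M * d_star - d, 0.0)
```

One observation that may help when weighing the options: for $D(I_a, u_a, I_b, u_b^*) > 0$, (3) equals (1) multiplied by $D(I_a, u_a, I_b, u_b^*)$, so those two differ only by a scale factor that depends on $u_a$, while (2) does not involve the true-match distance at all.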


### Implementation Details

If we evaluated the above sum over the entire image $I_b$, it would be prohibitively expensive, since for each choice of $u_a$ we would have to compute the descriptor distance to every pixel in the image. One option is to heavily down-sample the masked image. Let $\Gamma_b$ be the set of pixels in the down-sampled masked image. Then sum over only those pixels.
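
One possible way to do this, sketched below, is to draw a random subset of the masked pixel coordinates and evaluate the loss only there. Random subsampling is just one reading of "down-sample" (a strided grid would be another), and the function names, the boolean `mask` argument, and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def sample_masked_pixels(mask, num_samples, rng):
    """Return up to num_samples (row, col) coordinates where mask is True (Gamma_b)."""
    coords = np.argwhere(mask)                           # all masked pixels, shape (K, 2)
    if len(coords) <= num_samples:
        return coords
    idx = rng.choice(len(coords), size=num_samples, replace=False)
    return coords[idx]

def loss_over_samples(desc_a, u_a, desc_b, u_b_star, gamma_b,
                      M_p=100.0, M_d=1.5, eps=1e-8):
    """Same weighted hinge loss, but summed over the sampled pixels gamma_b only."""
    d_a = desc_a[u_a[0], u_a[1]]
    d = np.linalg.norm(desc_b[gamma_b[:, 0], gamma_b[:, 1]] - d_a, axis=1)
    d_star = np.linalg.norm(desc_b[u_b_star[0], u_b_star[1]] - d_a)
    pixel_dist = np.linalg.norm(gamma_b - np.asarray(u_b_star), axis=1)
    # eps is an assumption added to avoid division by zero
    hinge = np.maximum(M_d - d / (d_star + eps), 0.0)
    return float(np.sum(np.minimum(pixel_dist, M_p) * hinge))
```

For example, `gamma_b = sample_masked_pixels(mask, 500, np.random.default_rng(0))` would cap the cost per $u_a$ at 500 distance evaluations regardless of image size; the value 500 is arbitrary.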