# Loss Functions to be Explored

$$
\begin{aligned}
\mathcal{L}_\text{Weighted Cross-Entropy} &= - \sum_i \left[w_+ b_i \log p_i + w_- (1 - b_i) \log (1 - p_i)\right] = - \left( w_+ \mathbf{b} \cdot \log \mathbf{p} + w_- \bar{\mathbf{b}} \cdot \log \bar{\mathbf{p}} \right)\\
\mathcal{L}_\text{Dice} &= 1 - \frac{2\mathbf{p} \cdot \mathbf{b}}{\sum_i p_i + \sum_i b_i}\\
\mathcal{L}_\text{Focal-Tversky} &= \left( 1 - \frac{\mathbf{p}\cdot\mathbf{b}}{\mathbf{p}\cdot\mathbf{b} + \beta \bar{\mathbf{p}} \cdot \mathbf{b} + (1 - \beta) \mathbf{p} \cdot \bar{\mathbf{b}}} \right)^\gamma.
\end{aligned}
$$

where $\mathbf{p}$ is the vector of predicted probabilities (the model's softmax outputs along the channel dimension) and $\mathbf{b}$ is the binary mask with values in $\{0, 1\}$. A bar denotes the complement, i.e. $\bar{x}_i = 1 - x_i$ in index notation, and $\log$ is applied element-wise.
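The three definitions above can be sketched in plain Python on flattened lists of probabilities and mask values. This is a minimal, framework-free illustration rather than the training code: the constant `EPS`, the function names, and the toy inputs are assumptions introduced here, and the Focal-Tversky term uses the standard Tversky index with $\mathbf{p}\cdot\mathbf{b}$ in the numerator.

```python
import math

EPS = 1e-7  # assumed clamp/smoothing constant, not taken from the notes


def weighted_cross_entropy(p, b, w_pos=0.9, w_neg=0.1):
    """-sum_i [w_+ b_i log p_i + w_- (1 - b_i) log(1 - p_i)]."""
    total = 0.0
    for p_i, b_i in zip(p, b):
        p_i = min(max(p_i, EPS), 1.0 - EPS)  # keep both logs finite
        total += w_pos * b_i * math.log(p_i)
        total += w_neg * (1 - b_i) * math.log(1.0 - p_i)
    return -total


def dice_loss(p, b):
    """1 - 2 p.b / (sum_i p_i + sum_i b_i)."""
    overlap = sum(p_i * b_i for p_i, b_i in zip(p, b))
    return 1.0 - 2.0 * overlap / (sum(p) + sum(b) + EPS)


def focal_tversky_loss(p, b, beta=0.7, gamma=0.75):
    """(1 - TI)^gamma, TI = p.b / (p.b + beta (1-p).b + (1-beta) p.(1-b))."""
    tp = sum(p_i * b_i for p_i, b_i in zip(p, b))        # soft true positives
    fn = sum((1 - p_i) * b_i for p_i, b_i in zip(p, b))  # soft false negatives
    fp = sum(p_i * (1 - b_i) for p_i, b_i in zip(p, b))  # soft false positives
    tversky = tp / (tp + beta * fn + (1 - beta) * fp + EPS)
    # Clamp at zero so a fractional power never sees a negative base.
    return max(1.0 - tversky, 0.0) ** gamma


probs = [0.9, 0.8, 0.2, 0.1]  # toy predictions (hypothetical)
mask = [1, 1, 0, 0]           # toy ground-truth mask (hypothetical)
print(weighted_cross_entropy(probs, mask))
print(dice_loss(probs, mask))
print(focal_tversky_loss(probs, mask))
```

In a real training loop the same expressions would be written as tensor operations so that gradients can flow through them; the list-based form is only meant to make each term of the formulas explicit.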

There are several variations of the Dice loss. Two of them are listed below:

$$
\mathcal{L}_\text{Dice} =
\begin{cases}
    1 - \frac{2\mathbf{p} \cdot \mathbf{b}}{\sum_i p_i + \sum_i b_i},\\[1ex]
    1 - \frac{2\mathbf{p} \cdot \mathbf{b}}{\lVert \mathbf{p} \rVert_2^2 + \lVert \mathbf{b} \rVert_2^2}.
\end{cases}
$$
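To see how the sum-based and squared-norm denominators differ, the sketch below evaluates both on a toy mask. The function names and inputs are hypothetical. For hard (0/1) predictions the two forms coincide because $p_i^2 = p_i$, whereas for soft probabilities the squared-norm denominator is smaller, so that variant reports a lower loss for the same overlap.

```python
def dice_sum(p, b):
    """Dice loss with sum_i p_i + sum_i b_i in the denominator."""
    overlap = sum(x * y for x, y in zip(p, b))
    return 1.0 - 2.0 * overlap / (sum(p) + sum(b))


def dice_squared(p, b):
    """Dice loss with ||p||_2^2 + ||b||_2^2 in the denominator."""
    overlap = sum(x * y for x, y in zip(p, b))
    return 1.0 - 2.0 * overlap / (sum(x * x for x in p) + sum(y * y for y in b))


mask = [1, 1, 0, 0]           # toy ground-truth mask (hypothetical)
hard = [1, 0, 1, 0]           # hard predictions: both variants agree
soft = [0.9, 0.6, 0.4, 0.1]   # soft predictions: the variants differ

print(dice_sum(hard, mask), dice_squared(hard, mask))
print(dice_sum(soft, mask), dice_squared(soft, mask))
```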

# Model Selection

Loss function: $\mathcal{L}_\text{Weighted Cross-Entropy} + \alpha \mathcal{L}_\text{Focal-Tversky}$,
with loss hyperparameters $\alpha = 1,\; \beta = 0.7,\; \gamma = 0.75,\; (w_-, w_+) = (0.1, 0.9)$.

Model selection was run in 2D only, since FCN-ResNets can only handle 2D data.

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data

- FCN-ResNet50 reached a peak Val IoU of ~55% after ~20 epochs.
- FCN-ResNet101 reached a peak Val IoU of ~63% after ~60 epochs.
- U-Net reached a peak Val IoU of ~70% after 5 epochs.

Selected U-Net, which reached the highest peak Val IoU in the fewest epochs.
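For reference, the Val IoU figures compare a binarized prediction against the ground-truth mask. The notes do not record how the metric was computed, so the sketch below is only one plausible version: the 0.5 threshold, the handling of an empty union, and the function name are all assumptions.

```python
def iou(pred_probs, mask, threshold=0.5):
    """Intersection over union of a thresholded prediction and a 0/1 mask."""
    pred = [1 if p >= threshold else 0 for p in pred_probs]  # assumed threshold
    intersection = sum(p & m for p, m in zip(pred, mask))
    union = sum(p | m for p, m in zip(pred, mask))
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy example (hypothetical values): 2 overlapping pixels, 3 in the union.
print(iou([0.9, 0.8, 0.6, 0.1], [1, 1, 0, 0]))
```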

# Loss Selection

## $\mathcal{L}_\text{Weighted Cross-Entropy} + \alpha \mathcal{L}_\text{Focal-Tversky}$

$\alpha = 1,\; \beta = 0.7,\; \gamma = 0.75,\; (w_-, w_+) = (0.1, 0.9)$

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data
- Training was unstable, with exploding gradients. This was not an issue during model selection, and I do not believe I made any changes that should have affected training stability. Nevertheless, we tackle issues as they arise.
- Training went well before running into NaNs. The 2D U-Net had a peak Val IoU of ~50% and the 3D U-Net had a peak Val IoU of ~79.3%.
  Note that the segmentations looked near perfect on evaluation, with the 2D U-Net only doing badly on Carotid.

## $\mathcal{L}_\text{Weighted Cross-Entropy}$

$(w_-, w_+) = (0.1, 0.9)$

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data
- No exploding gradients
- 2D U-Net 60% Val IoU in 2 epochs
- 3D U-Net 76% Val IoU in 5 epochs

## $\mathcal{L}_\text{Dice}$

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data
- No exploding gradients
- 2D U-Net 66% Val IoU in 3 epochs
- 3D U-Net 82% Val IoU in 5 epochs

## $\mathcal{L}_\text{Focal-Tversky}$

$\beta = 0.7,\; \gamma = 0.75$

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data
- Experienced exploding gradients for both 2D and 3D cases.
- 2D U-Net 49% Val IoU in 1 epoch
- 3D U-Net 79% Val IoU in 1 epoch
- I suspect the exploding gradients are caused by the Focal-Tversky loss function.
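One mechanism that would be consistent with this suspicion, though it has not been verified against these runs: writing $x = 1 - \text{TI}$ for the term inside the focal power, the derivative $\frac{d}{dx} x^\gamma = \gamma x^{\gamma - 1}$ grows without bound as $x \to 0$ whenever $\gamma < 1$. With $\gamma = 0.75$, the closer the prediction gets to perfect overlap, the larger this factor becomes. The short sketch below (the function name is hypothetical) evaluates it for a few values of $x$.

```python
def focal_grad(x, gamma=0.75):
    """Derivative of x**gamma with respect to x: gamma * x**(gamma - 1)."""
    return gamma * x ** (gamma - 1.0)


# As x = 1 - TI shrinks towards zero, the derivative keeps growing.
for x in (1e-1, 1e-4, 1e-8):
    print(f"1 - TI = {x:.0e}  ->  d/dx = {focal_grad(x):.2f}")
```

This only describes the outer power. Whether it is what actually destabilized training here would need to be checked, for example by logging $1 - \text{TI}$ in the steps just before the NaNs appear.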

## $\mathcal{L}_\text{Weighted Cross-Entropy} + \alpha \mathcal{L}_\text{Dice}$

$\alpha = 1$

- Training set: 1000 pieces of synthetic data
- Validation set: 3 pieces of real data
- 
- 
- 





# Dataset Trials

## Test 1 -- Reducing training set size

- Training set: 100 pieces of synthetic data
- Validation set: 3 pieces of real data

## Test 2 -- Padding validation set with synthetic data + 20% validation set data split

- Training set: 80 pieces of synthetic data
- Validation set: 3 pieces of real data + 17 pieces of synthetic data
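The Test 2 split can be sketched as follows, assuming a pool of 97 synthetic samples is shuffled once and divided 80/17, with the 3 real samples always placed in the validation set. The sample IDs, the seed, and the function name are hypothetical, and the notes do not say how the synthetic validation samples were actually chosen.

```python
import random


def split_test2(synthetic_ids, real_ids, n_train=80, seed=0):
    """Shuffle the synthetic pool, keep n_train for training, and pad the
    real validation samples with the remaining synthetic ones."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(synthetic_ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = list(real_ids) + shuffled[n_train:]
    return train, val


synthetic = [f"synthetic_{i:03d}" for i in range(97)]  # hypothetical IDs
real = [f"real_{i}" for i in range(3)]                 # hypothetical IDs

train, val = split_test2(synthetic, real)
# 80 training samples and 20 validation samples, i.e. 20% of the 100 total.
print(len(train), len(val), len(val) / (len(train) + len(val)))
```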

## Test 3

