# Breast Cancer Segmentation TciaBD

We will design and develop an MSGRAP DL model based on H. Lee's research paper. We will adapt it to be trained on a breast cancer dataset (CT, MRI, PET), since H. Lee originally trained it on breast cancer ultrasound images. We will note the differences as we work through the implementation.

[1] H. Lee, J. Park and J. Y. Hwang, "Channel Attention Module With Multiscale Grid Average Pooling for Breast Cancer Segmentation in an Ultrasound Image," in IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 67, no. 7, pp. 1344-1353, July 2020, doi: [10.1109/TUFFC.2020.2972573](https://ieeexplore-ieee-org.libaccess.sjlibrary.org/document/8988165)

## Outline

- Prepare TCIA Breast Diagnosis Data
- Breast Segmentation Model Architecture
- Train Breast Segmentation ML/DL Models
- Evaluate Breast Segmentation ML/DL Models Quantitatively
- Evaluate Breast Segmentation ML/DL Models Qualitatively
- Deploy Breast Segmentation DL Model for Inference

## Breast Segmentation Model Architecture

DL MSGRAP Breast Cancer Segmentation Architecture:

- Encoder
    - Based on VGGNet except for batch normalization and channel attention modules
    - Use all conv layers with C' filters with size 3x3xC
    - Upsample the feature maps using a 4x4 transpose conv with a stride of 2
    - Unlike UNet, only 2 feature maps from encoder are connected to decoder
- Decoder
    - Mirrors the encoder architecture (symmetrical to the encoder)
    - Use all conv layers with C' filters with size 3x3xC
    - Final conv layer has 2 filters with size 3x3x64
    - Obtain the final binary segmentation via argmax over the 2 classes (equivalent to a 0.5 threshold on the cancer probability)
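
As a rough check on two of the details listed above, the sketch below computes the output size of a transposed convolution and shows that a two-class argmax matches a 0.5 threshold on the foreground probability. The padding of 1 is an assumption (the notes only give the 4x4 kernel and the stride of 2), and all function names are illustrative rather than taken from the paper.

```python
import math


def transpose_conv_out(size, kernel=4, stride=2, padding=1):
    """Output length along one axis of a transposed convolution.

    Assumes dilation 1 and no output padding; padding=1 is an assumption.
    """
    return (size - 1) * stride - 2 * padding + kernel


def foreground_prob(bg_logit, fg_logit):
    """Softmax probability of the foreground (cancer) class for one pixel."""
    m = max(bg_logit, fg_logit)  # subtract the max for numerical stability
    e_bg = math.exp(bg_logit - m)
    e_fg = math.exp(fg_logit - m)
    return e_fg / (e_bg + e_fg)


def argmax_class(bg_logit, fg_logit):
    """Two-class argmax: 1 (cancer) only if the foreground logit is larger."""
    return 1 if fg_logit > bg_logit else 0


# With kernel 4, stride 2 and padding 1, each side is exactly doubled.
print(transpose_conv_out(57))  # (57 - 1) * 2 - 2 + 4 = 114
```

In PyTorch, the same size rule applies to `torch.nn.ConvTranspose2d` when dilation is 1 and output padding is 0.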

The network receives a breast ultrasound image as input and predicts its semantic segmentation result.

- Note: C and C' are the numbers of feature maps in the previous and current layers, respectively, except for the final conv layer

- Note: batch normalization is strongly affected by batch size: the smaller the batch, the lower the performance, because small batches reduce generalization ability. H. Lee et al. use group normalization instead, since it is largely insensitive to batch size and the images in their experiments were spatially large.

- Note: group normalization divides the channels into N groups and normalizes the features within each group, regardless of the batch size. It therefore doesn't depend on the batch size and can overcome the generalization issues caused by small batches when the network is trained with large input images.
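
A minimal sketch of the idea, written in plain Python so the statistics are easy to follow. It normalizes one sample at a time, which is why the batch size never enters the computation. The function name and the `eps` default are assumptions, and the learnable per-channel scale and shift used in practice are left out.

```python
def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample given as C channels of flattened H*W values.

    Mean and variance are computed per group of channels within this single
    sample, so no other sample in the batch is involved.
    """
    channels = len(sample)
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = sample[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = (var + eps) ** 0.5
        out.extend([(v - mean) / scale for v in channel] for channel in group)
    return out


# Four channels of two pixels each, split into two groups of two channels.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

In a PyTorch implementation this would likely correspond to `torch.nn.GroupNorm(num_groups, num_channels)`; the number of groups used by H. Lee et al. is not recorded in these notes.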

- Note: After performing an additional experiment, H. Lee et al. demonstrated that their network architecture, with 2 feature maps connected between the encoder and decoder, performed better than UNet-like architectures.
    - It is better not to use low-level features in the network, since most ultrasound images are noisy (similar for PET, SPECT).
    
- Note: H. Lee et al. also conducted the ablation study for the architecture using two of the tenfold datasets.

- Note: Semantic segmentation F1 Score: UNet = 0.78, H. Lee's MSGRAP = 0.79

![msgrap_h_lee_breast_cancer_segmentation](./msgrap_h_lee_breast_cancer_segmentation.jpg)


## Train Breast Segmentation ML/DL Models

To evaluate the performance of their proposed networks (MSGRAP, etc.), H. Lee et al. trained several other models, such as:

- FCN
- UNet
- SegNet
- PSPNet-18

They then compared the performance of all the models.

Loss function:

$$L_{\theta_k D} = -\sum_{c=0}^{M-1} GT_c \, \log\big(f(I_\theta)_c\big)$$

- Note: `GT_c` is the binary indicator of the ground truth for class `c`, and `f(I_theta)_c` is the predicted probability for class `c`
- Note: breast cancer segmentation is a binary classification for each pixel, so use `M = 2`
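
The loss above can be sketched for a single pixel as follows. This is an illustrative plain-Python version, not the authors' code: the function names, the `eps` guard against `log(0)`, and the choice to average over pixels are all assumptions.

```python
import math


def pixel_cross_entropy(probs, gt_class, eps=1e-12):
    """Cross-entropy for one pixel.

    probs: predicted probabilities for the M classes (should sum to 1).
    gt_class: index of the ground-truth class, so GT_c is 1 for this
    class and 0 for every other class.
    """
    loss = 0.0
    for c, p in enumerate(probs):
        gt_c = 1.0 if c == gt_class else 0.0
        loss -= gt_c * math.log(max(p, eps))  # eps guards against log(0)
    return loss


def mean_cross_entropy(prob_map, gt_map):
    """Average the per-pixel loss over a flattened image (assumed reduction)."""
    losses = [pixel_cross_entropy(p, g) for p, g in zip(prob_map, gt_map)]
    return sum(losses) / len(losses)


# M = 2: class 0 is normal tissue, class 1 is cancer.
confident = pixel_cross_entropy([0.1, 0.9], gt_class=1)
uncertain = pixel_cross_entropy([0.5, 0.5], gt_class=1)
```

A confident correct prediction gives a smaller loss than an uncertain one, which is the behavior the sum over `GT_c * log(...)` encodes.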

- Note: since the breast ultrasound cancer datasets were limited, H. Lee et al. configured the training and testing processes as tenfold cross-validation.

- Note: In each validation step, the data were divided into 146 or 147 breast cancer ultrasound images for training
    - and 16 or 17 breast cancer ultrasound images for testing
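
The fold sizes above are consistent with a total of 163 images (146 + 17 = 147 + 16 = 163). That total is inferred from the numbers in these notes, not quoted from the paper. The sketch below splits that many indices into ten contiguous folds; the function name is illustrative, and a real pipeline would probably shuffle the indices first.

```python
def kfold_indices(n_samples, n_folds=10):
    """Return (train, test) index lists for each fold, without shuffling."""
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for k in range(n_folds):
        # The first `extra` folds get one additional test sample.
        size = base + (1 if k < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


folds = kfold_indices(163)  # 163 is an inferred total, see above
sizes = [(len(train), len(test)) for train, test in folds]
```

With 163 samples, three folds hold out 17 images and seven folds hold out 16, matching the "146 or 147" and "16 or 17" counts.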

- Note: H. Lee et al. augmented each patch with random horizontal and vertical flips and random 90 deg rotations
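
A minimal sketch of this augmentation on images stored as nested lists. The same transform has to be applied to the image and its mask so they stay aligned. The 0.5 flip probability and the uniform choice among 0 to 3 quarter turns are assumptions, as are the function names.

```python
import random


def hflip(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate counterclockwise by 90 degrees."""
    return [list(col) for col in zip(*img)][::-1]


def augment(image, mask, rng=random):
    """Apply the same random flips and quarter turns to image and mask."""
    if rng.random() < 0.5:  # assumed flip probability
        image, mask = hflip(image), hflip(mask)
    if rng.random() < 0.5:
        image, mask = vflip(image), vflip(mask)
    for _ in range(rng.randrange(4)):  # 0, 1, 2 or 3 quarter turns
        image, mask = rot90(image), rot90(mask)
    return image, mask


patch = [[1, 2, 3], [4, 5, 6]]
aug_image, aug_mask = augment(patch, patch, random.Random(0))
```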

- Note: for training and testing, all images were set to the average size of 454x537 pixels

- Note: for training, the weights of all conv layers were initialized with `Kaiming initialization`

Note: The Adam optimization method was used with the following parameters:

- Beta_1 = 0.9
- Beta_2 = 0.999
- epsilon = 10^-8

- Note: the initial `learning rate = 10^-3` was `reduced by half` every `30 epochs`

- Note: `mini-batch size = 8`

- Note: models were trained for `120 epochs`

- Note: PyTorch was used to implement and train networks
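
The hyperparameters listed above can be collected in one place, and the step schedule written out explicitly. This is a sketch under assumptions: the dictionary keys and function name are made up for illustration, and epochs are assumed to be counted from 0 so that the first halving happens at epoch 30.

```python
# Values are taken from the notes above; the key names are illustrative.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "eps": 1e-8,
    "base_lr": 1e-3,
    "lr_step_epochs": 30,
    "lr_factor": 0.5,
    "batch_size": 8,
    "epochs": 120,
}


def learning_rate(epoch, cfg=TRAIN_CONFIG):
    """Learning rate at a given 0-indexed epoch under step decay."""
    steps = epoch // cfg["lr_step_epochs"]
    return cfg["base_lr"] * cfg["lr_factor"] ** steps


schedule = [learning_rate(e) for e in range(TRAIN_CONFIG["epochs"])]
```

Over 120 epochs this yields four plateaus: 1e-3, 5e-4, 2.5e-4 and 1.25e-4. In PyTorch, a comparable setup would probably use `torch.optim.Adam` together with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)`.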

NOTE: it took four days to train the models using:

- Intel Xeon E5-2620 at 2.0 GHz
- NVIDIA TITAN RTX (24GB)

## Evaluate Breast Segmentation ML/DL Models Quantitatively

### Performance Metrics

For quantitative comparisons of DL MSGRAP with other methods, H. Lee et al. used global accuracy, F1 score, sensitivity, and specificity.

Accuracy = `(TP+TN)/(TP+FP+FN+TN)`

F1 = `(2*TP)/(2*TP+FP+FN)`

IoU = `TP/(TP+FP+FN)`

- Note: Accuracy is the most basic metric for several CV tasks
- Note: F1 score is well suited to imbalanced data. For example, in H. Lee et al.'s dataset, cancer made up only a small portion of the breast ultrasound images, so the dataset can be considered imbalanced.
- Note: this dataset was imbalanced: it consists of 5% cancer pixels and 95% normal pixels. So, H. Lee et al. also used FPR, precision, intersection over union (IoU), and area under the curve (AUC) of precision and recall (PR) for fair evaluation.
- Note: AUC metrics used a sweep of the threshold from `p=0` to `p=1`, as opposed to `p=0.5` (argmax) used for the remaining non-AUC metrics.
- Note: FPR is the number of false positives divided by the number of condition negatives, `FP/(FP+TN)`.
- Note: IoU metric is commonly used in semantic segmentation
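
The non-AUC metrics above can all be derived from the four confusion counts. The sketch below is illustrative (function names are assumptions) and returns `nan` when a denominator is zero. The example uses a made-up 100-pixel image with the 5% / 95% class split mentioned above to show why accuracy alone is misleading on imbalanced data.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over flattened binary predictions and labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    """Divide, returning nan when the metric is undefined."""
    return num / den if den else float("nan")


def segmentation_metrics(tp, fp, fn, tn):
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # recall
        "specificity": _ratio(tn, tn + fp),
        "fpr": _ratio(fp, fp + tn),
        "precision": _ratio(tp, tp + fp),
    }


# Hypothetical image: 5 cancer pixels, 95 normal pixels, and a model
# that predicts "normal" everywhere.
target = [1] * 5 + [0] * 95
pred = [0] * 100
all_normal = segmentation_metrics(*confusion_counts(pred, target))
```

Here `all_normal["accuracy"]` is 0.95 while `all_normal["f1"]` and `all_normal["iou"]` are both 0, which is why F1, IoU and related metrics are reported alongside accuracy.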

Show metrics in a table for models:

- FCN, UNet, SegNet, PSPNet-18, ENCNet-18
- Ours-GAP, Ours-GRAP, Ours-MSGRAP

H. Lee et al. didn't use semantic segmentation networks based on ResNet-34, 51 and DenseNet, which have more than 18 conv layers. Those networks have many more parameters than theirs, so leaving them out keeps the comparisons fair.

- Ours-MSGRAP had a better `F1 score = 0.7658` than the other models
    - FCN = 0.7123, UNet = 0.7132, SegNet = 0.7225, PSPNet-18 = 0.7520, ENCNet-18 = 0.7266
    
Ours-MSGRAP also showed higher performance than the other models in global accuracy, specificity, FPR, precision and IoU.