Pre-trained encoder: R50+ViT-B/16
Input shape: (224, 224, 3)
Encoder trainable: False
Batch size: 64
Epochs: 2,190
Input Image (224, 224, 3) → Vision Transformer + CUP (224, 224, 16) → Conv1x1 (224, 224, 5)
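The final Conv1x1 head is just a per-pixel linear map over channels. A minimal NumPy sketch of that projection, with hypothetical random weights (the real layer is a Keras `Conv2D(5, 1)` on the CUP output):

```python
import numpy as np

# A 1x1 convolution projects the (224, 224, 16) CUP feature map
# to (224, 224, 5) per-class logits, one linear map per pixel.
# Weight values here are random placeholders, not trained weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((224, 224, 16))  # CUP decoder output
kernel = rng.standard_normal((16, 5))           # 1x1 conv weights
bias = np.zeros(5)

logits = features @ kernel + bias               # shape: (224, 224, 5)
print(logits.shape)
```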
Total params: 100,861,024
Trainable params: 7,409,632
Non-trainable params: 93,451,392
- Remove malformed images from the dataset.
- Fill in missing polygons in the label data using CVAT. (Thanks to @kdh93)
- Export ROIs, binary masks, and the transformed original images.
Used `tf.data.Dataset` to boost training performance.
- Scaling (1.0 / 255)
- Random Flip Left-Right
- Random Flip Up-Down
- Random Crop
- Random Brightness (-0.2 to +0.2)
- Random Rotation (90°, 180°, 270°)
- Gaussian Noise (mean = 0, stddev = 0.05)
The results are hilariously bad because we didn't know we should use Focal Loss or weighted CCE for an imbalanced dataset. Although we no longer have access to the GPU server, we now know how to get better results.
- CCE * 0.5 + Dice * 0.5
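A minimal NumPy sketch of the combined loss above, assuming `y_true` is one-hot `(H, W, C)` and `y_pred` is a softmax output; the `eps` value is an assumption, not taken from the original code:

```python
import numpy as np

def cce_dice_loss(y_true, y_pred, eps=1e-7):
    # Categorical cross-entropy, averaged over pixels.
    cce = -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=-1))
    # Soft Dice, averaged over classes.
    inter = np.sum(y_true * y_pred, axis=(0, 1))
    union = np.sum(y_true, axis=(0, 1)) + np.sum(y_pred, axis=(0, 1))
    dice = np.mean((2.0 * inter + eps) / (union + eps))
    # Equal weighting, as listed above.
    return 0.5 * cce + 0.5 * (1.0 - dice)

y_true = np.zeros((4, 4, 5))
y_true[..., 0] = 1.0
print(cce_dice_loss(y_true, y_true))  # ~0 for a perfect prediction
```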
- Binary IoU
- Number of classes = 5 (default)
- Threshold = 0.5
- Masked with [0, 1, 1, 1, 1] to ignore non-cancer area
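A sketch of this masked Binary IoU metric in NumPy: per-class predictions are thresholded at 0.5, and class 0 (the non-cancer area) is dropped via the `[0, 1, 1, 1, 1]` mask before averaging. The `eps` smoothing term is an assumption:

```python
import numpy as np

CLASS_MASK = np.array([0, 1, 1, 1, 1], dtype=bool)  # ignore class 0

def masked_binary_iou(y_true, y_pred, threshold=0.5, eps=1e-7):
    pred = y_pred >= threshold                 # (H, W, C) booleans
    true = y_true.astype(bool)
    inter = np.sum(pred & true, axis=(0, 1))   # per-class intersection
    union = np.sum(pred | true, axis=(0, 1))   # per-class union
    iou = (inter + eps) / (union + eps)        # empty classes score 1.0
    return float(np.mean(iou[CLASS_MASK]))     # mean over cancer classes

y_true = np.zeros((8, 8, 5))
y_true[:4, :, 1] = 1.0
print(masked_binary_iou(y_true, y_true))  # 1.0 for a perfect prediction
```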
- SGD
- Momentum = 0.9
- Learning Rate Scheduler
- Cosine Annealing Warmup Restarts
- First cycle steps = 100
- Initial learning rate = 1e-3
- First decay steps = 300
- t_mul = 1.0
- m_mul = 1.0 (default)
- alpha = 0.0 (default)
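The listed parameters mix two common implementations (`first_cycle_steps` from the popular PyTorch CosineAnnealingWarmupRestarts scheduler, `first_decay_steps`/`t_mul`/`m_mul`/`alpha` from `tf.keras` `CosineDecayRestarts`). A pure-Python sketch following the `tf.keras` semantics with the values above, without the warmup phase:

```python
import math

def cosine_restarts_lr(step, initial_lr=1e-3, first_decay_steps=300,
                       t_mul=1.0, m_mul=1.0, alpha=0.0):
    completed = step / first_decay_steps
    if t_mul == 1.0:
        i_restart = int(completed)      # number of completed cycles
        frac = completed - i_restart    # position within current cycle
    else:
        i_restart = int(math.log(1 - completed * (1 - t_mul), t_mul))
        sum_r = (1 - t_mul ** i_restart) / (1 - t_mul)
        frac = (completed - sum_r) / t_mul ** i_restart
    m_fac = m_mul ** i_restart          # per-restart peak-LR decay
    cosine = 0.5 * (1 + math.cos(math.pi * frac))
    return initial_lr * ((1 - alpha) * m_fac * cosine + alpha)

print(cosine_restarts_lr(0))    # peak LR at the start of a cycle
print(cosine_restarts_lr(300))  # restarts back to the peak LR
```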
Cosine Annealing Warmup Restarts: