---

# 🎯 PHASE 1: QUICK WINS - Foundation Improvements

## 1.1 Advanced Data Augmentation

### ❓ Why Improve?

**Problems with the original model:**
- Uses only basic augmentation (flip, rotate)
- Does not exploit medical imaging domain knowledge
- Generalizes poorly when it encounters new variations

### 💡 Solution

Use **Albumentations** with augmentations designed specifically for X-ray images:

1. **CLAHE (Contrast Limited Adaptive Histogram Equalization)**
   - Improves contrast in X-ray images
   - Helps highlight subtle pathological regions

2. **ShiftScaleRotate**
   - Simulates different capture angles
   - Robust to positioning variations

3. **GaussNoise & GaussianBlur**
   - Simulate varying image quality
   - Robust to imaging equipment variations
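
To make the CLAHE idea concrete, here is a dependency-free sketch of its clip-and-redistribute step on a single tile of 8-bit pixels. It is a simplified illustration, not the OpenCV or Albumentations implementation: real CLAHE works on a grid of tiles and interpolates between their mappings. The helper name `clipped_equalize` and the reading of `clip_limit` as a multiple of the mean bin height are assumptions made for this example.

```python
def clipped_equalize(pixels, clip_limit=4.0, n_bins=256):
    """Contrast-limited histogram equalization of one tile (hypothetical helper).

    pixels: flat list of ints in [0, n_bins - 1].
    clip_limit: maximum bin height as a multiple of the mean bin height.
    """
    # Histogram of the tile
    hist = [0] * n_bins
    for p in pixels:
        hist[p] += 1

    # Clip each bin at the limit and collect the excess counts
    limit = max(1, int(clip_limit * len(pixels) / n_bins))
    excess = 0
    for i, count in enumerate(hist):
        if count > limit:
            excess += count - limit
            hist[i] = limit

    # Redistribute the excess uniformly over all bins
    share = excess / n_bins
    hist = [count + share for count in hist]

    # Cumulative distribution -> intensity lookup table
    total = sum(hist)
    lut, running = [], 0.0
    for count in hist:
        running += count
        lut.append(round((n_bins - 1) * running / total))

    return [lut[p] for p in pixels]


# A low-contrast tile: all values squeezed into 100..119
tile = [100 + (i % 20) for i in range(64 * 64)]
out = clipped_equalize(tile, clip_limit=4.0)
print(min(tile), max(tile), "->", min(out), max(out))
```

The clip limit caps how steep the mapping can become: the tile's narrow intensity band is stretched, but less aggressively than plain histogram equalization would stretch it, which is what keeps noise from being over-amplified.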

### 📈 Expected Impact
- **+1-2% AUC** improvement
- Better generalization to unseen data
- Reduced overfitting

In [1]:
import albumentations as A
import cv2
from albumentations.pytorch import ToTensorV2

def get_train_transforms(img_size=224):
    """
    Advanced augmentation pipeline for training data
    
    Design based on:
    1. Medical imaging best practices
    2. Empirical studies on chest X-ray augmentation
    3. ImageNet normalization for transfer learning
    """
    return A.Compose([
        # Resize & crop
        A.Resize(int(img_size * 1.15), int(img_size * 1.15)),
        A.RandomCrop(img_size, img_size),
        
        # Geometric transformations
        A.HorizontalFlip(p=0.5),  # X-rays can be flipped horizontally (mirrors left/right anatomy)
        A.ShiftScaleRotate(
            shift_limit=0.1,      # Shift 10% - simulates patient positioning
            scale_limit=0.15,     # Scale ±15% - simulates capture distance
            rotate_limit=15,      # Rotate ±15° - simulates capture angle
            border_mode=cv2.BORDER_CONSTANT,
            value=0,              # Black border fill; newer Albumentations releases may rename this argument
            p=0.5
        ),
        
        # Noise & blur - simulates varying equipment quality
        A.OneOf([
            A.GaussNoise(var_limit=(10, 50), p=1.0),
            A.GaussianBlur(blur_limit=(3, 5), p=1.0),
            A.MotionBlur(blur_limit=5, p=1.0),
        ], p=0.3),
        
        # Contrast & brightness - critical for X-ray
        A.RandomBrightnessContrast(
            brightness_limit=0.2,
            contrast_limit=0.2,
            p=0.5
        ),
        
        # CLAHE - Medical imaging specific
        # Improves local contrast, important for detecting pathology
        A.CLAHE(
            clip_limit=4.0,
            tile_grid_size=(8, 8),
            p=0.5
        ),
        
        # Optional: Grid distortion (simulates deformation)
        A.GridDistortion(
            num_steps=5,
            distort_limit=0.05,
            p=0.2
        ),
        
        # Normalization - ImageNet stats for transfer learning
        A.Normalize(
            mean=[0.485, 0.456, 0.406],  # ImageNet mean
            std=[0.229, 0.224, 0.225],   # ImageNet std
        ),
        ToTensorV2(),
    ])

def get_valid_transforms(img_size=224):
    """
    Validation transforms - NO augmentation
    Only resize and normalize
    """
    return A.Compose([
        A.Resize(img_size, img_size),
        A.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ),
        ToTensorV2(),
    ])

print("✅ Advanced augmentation pipeline created")
print("📋 Training augmentations:")
print("   - Resize & Random Crop")
print("   - Horizontal Flip")
print("   - ShiftScaleRotate")
print("   - Noise & Blur variations")
print("   - Brightness & Contrast")
print("   - CLAHE (Medical-specific)")
print("   - Grid Distortion")

✅ Advanced augmentation pipeline created
📋 Training augmentations:
   - Resize & Random Crop
   - Horizontal Flip
   - ShiftScaleRotate
   - Noise & Blur variations
   - Brightness & Contrast
   - CLAHE (Medical-specific)
   - Grid Distortion
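
As a sanity check on the numbers in `get_train_transforms`, the sketch below works out the resize-then-crop geometry and how often the stochastic transforms fire. It uses only the `p` values from the pipeline and assumes that the transforms fire independently and that `OneOf` chooses among its three children with equal weight (all three have `p=1.0`). The helper names are made up for this illustration.

```python
import math

def crop_geometry(img_size=224, scale=1.15):
    """Return (resized side length, slack in pixels available to RandomCrop)."""
    resized = int(img_size * scale)
    return resized, resized - img_size

# Per-transform application probabilities copied from the training pipeline
P_APPLY = {
    "HorizontalFlip": 0.5,
    "ShiftScaleRotate": 0.5,
    "OneOf(noise/blur)": 0.3,
    "RandomBrightnessContrast": 0.5,
    "CLAHE": 0.5,
    "GridDistortion": 0.2,
}

def prob_untouched(p_apply):
    """Probability that none of the stochastic transforms fires (independence assumed)."""
    return math.prod(1.0 - p for p in p_apply.values())

def expected_active(p_apply):
    """Expected number of stochastic transforms applied to one sample."""
    return sum(p_apply.values())

resized, slack = crop_geometry()
print(f"resize to {resized}x{resized}, crop offset range 0..{slack}px")
print(f"P(no stochastic transform) = {prob_untouched(P_APPLY):.3f}")
print(f"expected transforms per sample = {expected_active(P_APPLY):.1f}")
print(f"P(each noise/blur variant) = {P_APPLY['OneOf(noise/blur)'] / 3:.2f}")
```

Under these assumptions only about 3.5% of training samples pass through with nothing but resize, crop, and normalization, and a typical sample receives two or three stochastic transforms.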


### 🔍 Visualization: Comparing Augmentations

Let's look at how the advanced augmentations differ from the basic ones.

In [2]:
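import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

def visualize_augmentations(image_path, n_samples=6):
    """
    Visualize the effect of the training augmentation pipeline
    """
    # Load image; cv2.imread returns None when the file cannot be read
    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Get transforms
    train_transform = get_train_transforms(224)
    
    # Create a 3-column grid with enough rows for n_samples
    n_cols = 3
    n_rows = (n_samples + n_cols - 1) // n_cols
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 5 * n_rows), squeeze=False)
    fig.suptitle('Advanced Augmentation Examples', fontsize=16, fontweight='bold')
    
    axes = axes.ravel()
    
    # ImageNet statistics used by A.Normalize, shaped for CHW tensors
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    
    for idx in range(n_samples):
        # Apply augmentation (a new random variant on every call)
        augmented = train_transform(image=image)
        aug_image = augmented['image']
        
        # Denormalize for visualization: CHW tensor -> HWC array in [0, 1]
        aug_image = aug_image * std + mean
        aug_image = aug_image.permute(1, 2, 0).numpy()
        aug_image = np.clip(aug_image, 0, 1)
        
        axes[idx].imshow(aug_image)
        axes[idx].set_title(f'Augmented Sample {idx+1}')
        axes[idx].axis('off')
    
    # Hide any unused cells in the grid
    for ax in axes[n_samples:]:
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()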

print("📸 Augmentation visualization function ready")
print("   Use: visualize_augmentations('path/to/xray.png')")

📸 Augmentation visualization function ready
   Use: visualize_augmentations('path/to/xray.png')
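
The denormalization step inside `visualize_augmentations` inverts what `A.Normalize` does. Below is a minimal sketch of that arithmetic for a single RGB pixel, assuming the scaling by a maximum pixel value of 255 that applies to 8-bit images. `normalize_pixel` and `denormalize_pixel` are hypothetical helpers written for this example, not Albumentations functions.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD, max_value=255.0):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / max_value - m) / s for c, m, s in zip(rgb, mean, std))

def denormalize_pixel(norm, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Undo the standardization and clip to [0, 1], as done before imshow."""
    return tuple(min(1.0, max(0.0, v * s + m)) for v, m, s in zip(norm, mean, std))

pixel = (128, 64, 255)
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print("normalized:", [round(v, 3) for v in normalized])
print("restored (x255):", [round(v * 255) for v in restored])
```

Multiplying by `std` and adding `mean` returns values in the [0, 1] range, which is why the visualization clips to that interval rather than to 0..255.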
