##  Baseline Mixture of Experts (MoE) Inference — Initial Version

This notebook implements our **first version** of the **Mixture of Experts (MoE)** inference system for fire detection, before introducing any architectural or fusion improvements.

###  Gating Network:
- A **simple convolutional neural network (CNN)** is used as the gating module.
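- It takes the input image and produces one score per expert; after softmax normalization these scores become the weights for the 4 expert models, each trained on a different scenario (Outdoor, Indoor, FarField, Satellite). The `GatingCNN` defined below returns raw scores (logits), so the softmax is presumably applied inside `get_gate_weights`, which is not shown in this section.
- During inference, each expert's detection confidences are multiplied by its weight, so a low-weighted expert contributes fewer and lower-ranked boxes.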
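
To make the weighting concrete, here is a minimal, dependency-free sketch of how raw gating scores could be turned into expert weights. The logit values are made up for illustration, and the helper `softmax` is a stand-in for whatever `get_gate_weights` does internally, which is an assumption since that function is not shown.

```python
import math

def softmax(logits):
    """Turn raw gating scores into weights that are positive and sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the resulting weights.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores, one per expert, in the order listed above.
EXPERT_NAMES = ["Outdoor", "Indoor", "FarField", "Satellite"]
example_logits = [2.0, 0.5, 0.1, -1.0]

gate_weights = softmax(example_logits)
for name, weight in zip(EXPERT_NAMES, gate_weights):
    print(f"{name:<10} {weight:.3f}")
```

The expert with the largest logit receives the largest weight, and because the weights sum to 1, raising one expert's weight necessarily lowers the others.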

###  Expert Models:
- Four YOLOv8 models trained on different scene-specific datasets.
- Each expert independently makes predictions on the same input image.

###  Prediction Fusion:
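- Each expert's box confidences are scaled by its gating weight, boxes that fall below the confidence threshold are dropped, and the remaining boxes from all experts are pooled and passed through **traditional Non-Maximum Suppression (NMS)**.
- NMS keeps the highest-scoring box and discards every other box whose IoU with it exceeds the threshold (0.5 by default here). The NMS call used in this notebook is class-agnostic, so overlapping boxes of different classes can suppress each other.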
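
The fusion step can be sketched in plain Python as follows. This is an illustrative greedy NMS, not the `torchvision.ops.nms` implementation the notebook calls; the boxes, confidences, and the two gating weights are invented for the example, and confidence thresholding is left out for brevity.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections: (box, raw confidence, gating weight of its expert).
detections = [
    ((10, 10, 110, 110), 0.90, 0.6),    # expert A, fire region
    ((12, 12, 112, 112), 0.95, 0.4),    # expert B, same region
    ((200, 200, 260, 260), 0.60, 0.6),  # expert A, separate region
]
boxes = [d[0] for d in detections]
scores = [conf * weight for _, conf, weight in detections]  # weighted confidence

keep = greedy_nms(boxes, scores, iou_threshold=0.5)
print([boxes[i] for i in keep])
```

In this example expert B has the higher raw confidence on the shared region, but expert A's larger gating weight gives its box the higher weighted score, so A's box is the one that survives suppression.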

###  Purpose of This Notebook:
- Establish a **baseline MoE setup** without attention or advanced fusion techniques.
- Serve as a comparison point for future iterations that include **attention-enhanced gating** and **Weighted Box Fusion (WBF)**.




In [None]:
import torch
import torch.nn as nn

class GatingCNN(nn.Module):
    """Small CNN that scores how well each expert suits an input image."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),

            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
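        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 64 * 28 * 28 matches a 224x224 input: three 2x poolings
            # reduce each side 224 -> 112 -> 56 -> 28.
            nn.Linear(64 * 28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        # Raw logits, one per expert; no softmax is applied here.
        return x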
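
The hard-coded `64 * 28 * 28` in the classifier ties the gating network to one input resolution. The short sketch below works through that arithmetic; the helper `flattened_size` is hypothetical, and the 224x224 input size is inferred from the layer dimensions rather than stated in the notebook.

```python
def flattened_size(input_side, channels=64, num_pools=3, pool=2):
    """Return (flattened feature count, final side length) for a square input.

    The 3x3 convolutions with stride 1 and padding 1 keep the spatial size,
    so only the pooling layers shrink it.
    """
    side = input_side
    for _ in range(num_pools):
        side //= pool  # MaxPool2d(2) halves each side, rounding down
    return channels * side * side, side

for size in (224, 256):
    features, side = flattened_size(size)
    print(f"input {size}x{size} -> {side}x{side} map, {features} features")
```

Any input whose side does not reduce to 28 after three poolings would produce a feature count different from `64 * 28 * 28` and fail at the first `nn.Linear` layer, so `preprocess_image` needs to resize images accordingly.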


In [None]:
import torch
from torchvision.ops import nms
from ultralytics.engine.results import Boxes

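def run_moe(image_path, conf_threshold=0.3, iou_threshold=0.5):
    """Run every expert on one image and fuse their detections with NMS.

    Relies on `preprocess_image`, `get_gate_weights`, and `experts`
    being defined elsewhere in the notebook. Returns a one-element list
    holding a `Boxes` object, or an empty list if no detection survives.
    """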
    # Step 1: Preprocess the image
    img_tensor = preprocess_image(image_path)
    
    # Step 2: Get expert weights from gating model
    gate_weights = get_gate_weights(img_tensor)
    # print("Gating weights:", gate_weights)

    all_boxes_raw = []

    # Step 3: Run each expert and collect weighted predictions
    for i, expert in enumerate(experts):
        result = expert(image_path, verbose=False)[0]
        weight = gate_weights[i]
        if result.boxes is not None and result.boxes.conf is not None:
            # Scale confidence by gating weight
            conf = result.boxes.conf * weight
            keep_mask = conf > conf_threshold

            if keep_mask.any():
                xyxy = result.boxes.xyxy[keep_mask]
                conf = conf[keep_mask]
                cls = result.boxes.cls[keep_mask]

                # Format: [x1, y1, x2, y2, conf, cls]
                combined = torch.cat([xyxy, conf.unsqueeze(1), cls.unsqueeze(1)], dim=1)
                all_boxes_raw.append(combined)

    # Step 4: If no boxes from any expert, return empty
    if not all_boxes_raw:
        return []

    # Step 5: Merge all boxes and apply NMS
    all_boxes_combined = torch.cat(all_boxes_raw, dim=0)
    boxes = all_boxes_combined[:, :4]
    scores = all_boxes_combined[:, 4]
    classes = all_boxes_combined[:, 5]  # not used below: nms() ignores class labels

    # Apply class-agnostic Non-Maximum Suppression across all experts' boxes
    keep_indices = nms(boxes, scores, iou_threshold=iou_threshold)

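    final_kept = all_boxes_combined[keep_indices]
    # Every expert ran on the same image, so the last `result` from the loop
    # supplies the original image shape.
    kept_boxes = Boxes(final_kept, orig_shape=result.orig_img.shape[:2])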
    
    return [kept_boxes]
