<a href="https://colab.research.google.com/github/JohnTichenor/Locating-Bacterial-Flagellar-Motors/blob/main/Submission_Notebook_John.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# IMPORTANT: SOME KAGGLE DATA SOURCES ARE PRIVATE
# RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES.
#import kagglehub
#kagglehub.login()


In [2]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

#byu_locating_bacterial_flagellar_motors_2025_path = kagglehub.competition_download('byu-locating-bacterial-flagellar-motors-2025')
#andrewjdarley_ultralytics_for_offline_install_path = kagglehub.notebook_output_download('andrewjdarley/ultralytics-for-offline-install')
#andrewjdarley_train_yolo_path = kagglehub.notebook_output_download('andrewjdarley/train-yolo')

#print('Data source import complete.')


# BYU Locating Flagellar Motors

## Submission Generation Notebook

This is the fourth and final notebook in a series for the BYU Locating Bacterial Flagellar Motors 2025 Kaggle challenge. This notebook creates predictions on test data and generates the competition submission file.

### Notebook Series:
1. **[Parse Data](https://www.kaggle.com/code/andrewjdarley/parse-data)**: Extracting and preparing 2D slices containing motors to make a YOLO dataset
2. **[Visualize Data](https://www.kaggle.com/code/andrewjdarley/visualize-data)**: Exploratory data analysis and visualization of annotated motor locations
3. **[Train YOLO](https://www.kaggle.com/code/andrewjdarley/train-yolo)**: Fine tuning an YOLOv8 object detection model on the prepared dataset
4. **Submission Notebook (Current)**: Running inference and generating submission files

## Important: Offline Execution
This notebook is designed to run in an offline environment. The Ultralytics YOLOv8 package has been installed using the offline installation method from [this reference notebook](https://www.kaggle.com/code/itsuki9180/ultralytics-for-offline-install). This implementation was brilliant. I use my own copy as input that works effectively the same as the original.

## About this Notebook

This submission notebook implements an optimized inference pipeline that:

1. **Model Loading**: Loads the best trained YOLOv8 weights from the training notebook
2. **GPU Optimization**: Configures CUDA optimizations, half-precision inference, and memory management
3. **Parallel Processing**: Uses CUDA streams and batch processing for efficient GPU utilization
4. **3D Detection**: Processes each slice to locate motors
5. **Non-Maximum Suppression**: Applies 3D NMS to cluster and merge detections across slices
6. **Submission Generation**: Creates the final CSV file with predicted motor coordinates

The code includes advanced optimizations like dynamic batch sizing based on available GPU memory, preloading batches while processing the current batch, and GPU profiling to monitor performance. The CONCENTRATION parameter can be adjusted to trade off between processing speed and detection accuracy. The only reason you'd ever modify CONCENTRATION is just to verify submission capability since full submission takes a few hours.

In [3]:
# !tar xfvz /kaggle/input/ultralytics-for-offline-install/archive.tar.gz
# !pip install --no-index --find-links=./packages ultralytics
# !rm -rf ./packages

## Mount collab

In [4]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [5]:
import os
import numpy as np
import pandas as pd
from PIL import Image
import torch
import cv2
from tqdm.notebook import tqdm
from torchvision.models.detection import fasterrcnn_resnet50_fpn
import threading
import time
from contextlib import nullcontext
from concurrent.futures import ThreadPoolExecutor
from torchvision import transforms

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Define paths
# data_path = "/kaggle/input/byu-locating-bacterial-flagellar-motors-2025/"
# test_dir = os.path.join(data_path, "test")
# submission_path = "/kaggle/working/submission.csv"

test_dir = "/content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/test"
submission_path = "/content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/submission_2.csv"

# Model path - adjust if your best model is saved in a different location
model_path = "/content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/fasterrcnn_motor_detector_2.pth"
NUM_CLASSES = 2  # Example: 1 class + background; change as needed


# Detection parameters
CONFIDENCE_THRESHOLD = 0.45  # Lower threshold to catch more potential motors
MAX_DETECTIONS_PER_TOMO = 3  # Keep track of top N detections per tomogram
NMS_IOU_THRESHOLD = 0.2  # Non-maximum suppression threshold for 3D clustering
CONCENTRATION = 1 # ONLY PROCESS 1/20 slices for fast submission

# GPU profiling context manager
class GPUProfiler:
    def __init__(self, name):
        self.name = name
        self.start_time = None

    def __enter__(self):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        self.start_time = time.time()
        return self

    def __exit__(self, *args):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.time() - self.start_time
        print(f"[PROFILE] {self.name}: {elapsed:.3f}s")

# Check GPU availability and set up optimizations
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
BATCH_SIZE = 8  # Default batch size, will be adjusted dynamically if GPU available

if device.startswith('cuda'):
    # Set CUDA optimization flags
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False
    torch.backends.cuda.matmul.allow_tf32 = True  # Allow TF32 on Ampere GPUs
    torch.backends.cudnn.allow_tf32 = True

    # Print GPU info
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9  # Convert to GB
    print(f"Using GPU: {gpu_name} with {gpu_mem:.2f} GB memory")

    # Get available GPU memory and set batch size accordingly
    free_mem = gpu_mem - torch.cuda.memory_allocated(0) / 1e9
    BATCH_SIZE = max(8, min(32, int(free_mem * 4)))  # 4 images per GB as rough estimate
    print(f"Dynamic batch size set to {BATCH_SIZE} based on {free_mem:.2f}GB free memory")
else:
    print("GPU not available, using CPU")
    BATCH_SIZE = 4  # Reduce batch size for CPU


def load_and_preprocess_image(img_path):
    # Loads image as PIL, converts to tensor, normalizes as expected by torchvision models
    img = Image.open(img_path).convert("RGB")
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    return preprocess(img)


def normalize_slice(slice_data):
    """
    Normalize slice data using 2nd and 98th percentiles for better contrast
    """
    p2 = np.percentile(slice_data, 2)
    p98 = np.percentile(slice_data, 98)
    clipped_data = np.clip(slice_data, p2, p98)
    normalized = 255 * (clipped_data - p2) / (p98 - p2)
    return np.uint8(normalized)

def preload_image_batch(file_paths):
    """Preload a batch of images to CPU memory"""
    images = []
    for path in file_paths:
        img = cv2.imread(path)
        if img is None:
            # Try with PIL as fallback
            img = np.array(Image.open(path))
        images.append(img)
    return images

def process_tomogram(tomo_id, model, index=0, total=1):
    print(f"Processing tomogram {tomo_id} ({index}/{total})")
    tomo_dir = os.path.join(test_dir, tomo_id)
    slice_files = sorted([f for f in os.listdir(tomo_dir) if f.endswith('.jpg')])

    selected_indices = np.linspace(0, len(slice_files)-1, int(len(slice_files) * CONCENTRATION))
    selected_indices = np.round(selected_indices).astype(int)
    slice_files = [slice_files[i] for i in selected_indices]

    print(f"Processing {len(slice_files)} out of {len(os.listdir(tomo_dir))} slices based on CONCENTRATION={CONCENTRATION}")

    all_detections = []

    for idx, slice_file in enumerate(slice_files):
        img_path = os.path.join(tomo_dir, slice_file)
        slice_num = int(slice_file.split('_')[1].split('.')[0])

        # Preprocess for Faster R-CNN
        image_tensor = load_and_preprocess_image(img_path).to(device).unsqueeze(0)  # shape (1, C, H, W)

        with torch.no_grad():
            outputs = model(image_tensor)

        boxes = outputs[0]['boxes'].cpu().numpy()  # (N, 4)
        scores = outputs[0]['scores'].cpu().numpy()  # (N,)
        labels = outputs[0]['labels'].cpu().numpy()  # (N,)

        # Filter out detections below confidence threshold and (optionally) by class
        for box, score, label in zip(boxes, scores, labels):
            if score >= CONFIDENCE_THRESHOLD:
                x1, y1, x2, y2 = box
                x_center = (x1 + x2) / 2
                y_center = (y1 + y2) / 2

                all_detections.append({
                    'z': round(slice_num),
                    'y': round(y_center),
                    'x': round(x_center),
                    'confidence': float(score)
                })

    # Non-maximum suppression as before
    final_detections = perform_3d_nms(all_detections, NMS_IOU_THRESHOLD)
    final_detections.sort(key=lambda x: x['confidence'], reverse=True)

    if not final_detections:
        return {
            'tomo_id': tomo_id,
            'Motor axis 0': -1,
            'Motor axis 1': -1,
            'Motor axis 2': -1
        }

    best_detection = final_detections[0]
    return {
        'tomo_id': tomo_id,
        'Motor axis 0': round(best_detection['z']),
        'Motor axis 1': round(best_detection['y']),
        'Motor axis 2': round(best_detection['x'])
    }


def perform_3d_nms(detections, iou_threshold):
    """
    Perform 3D Non-Maximum Suppression on detections to merge nearby motors
    """
    if not detections:
        return []

    # Sort by confidence (highest first)
    detections = sorted(detections, key=lambda x: x['confidence'], reverse=True)

    # List to store final detections after NMS
    final_detections = []

    # Define 3D distance function
    def distance_3d(d1, d2):
        return np.sqrt((d1['z'] - d2['z'])**2 +
                       (d1['y'] - d2['y'])**2 +
                       (d1['x'] - d2['x'])**2)

    # Maximum distance threshold (based on box size and slice gap)
    box_size = 24  # Same as annotation box size
    distance_threshold = box_size * iou_threshold

    # Process each detection
    while detections:
        # Take the detection with highest confidence
        best_detection = detections.pop(0)
        final_detections.append(best_detection)

        # Filter out detections that are too close to the best detection
        detections = [d for d in detections if distance_3d(d, best_detection) > distance_threshold]

    return final_detections

def debug_image_loading(tomo_id):
    """
    Debug function to check image loading
    """
    tomo_dir = os.path.join(test_dir, tomo_id)
    slice_files = sorted([f for f in os.listdir(tomo_dir) if f.endswith('.jpg')])

    if not slice_files:
        print(f"No image files found in {tomo_dir}")
        return

    print(f"Found {len(slice_files)} image files in {tomo_dir}")
    sample_file = slice_files[len(slice_files)//2]  # Middle slice
    img_path = os.path.join(tomo_dir, sample_file)

    # Try different loading methods
    try:
        # Method 1: PIL
        img_pil = Image.open(img_path)
        img_array_pil = np.array(img_pil)
        print(f"PIL Image shape: {img_array_pil.shape}, dtype: {img_array_pil.dtype}")

        # Method 2: OpenCV
        img_cv2 = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        print(f"OpenCV Image shape: {img_cv2.shape}, dtype: {img_cv2.dtype}")

        # Method 3: Convert to RGB
        img_rgb = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
        print(f"OpenCV RGB Image shape: {img_rgb.shape}, dtype: {img_rgb.dtype}")

        print("Image loading successful!")
    except Exception as e:
        print(f"Error loading image {img_path}: {e}")


def generate_submission():
    """
    Main function to generate the submission file
    """
    # Get list of test tomograms
    test_tomos = sorted([d for d in os.listdir(test_dir) if os.path.isdir(os.path.join(test_dir, d))])
    total_tomos = len(test_tomos)

    print(f"Found {total_tomos} tomograms in test directory")

    # Debug image loading for the first tomogram
    if test_tomos:
        debug_image_loading(test_tomos[0])

    # Clear GPU cache before starting
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


    model = fasterrcnn_resnet50_fpn(pretrained=False, num_classes=NUM_CLASSES)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.to(device)
    model.eval()
    print(f"Loaded Faster R-CNN model from {model_path} onto {device}")

    # Process tomograms with parallelization
    results = []
    motors_found = 0

    # Using ThreadPoolExecutor with max_workers=1 since each worker uses the GPU already
    # and we're parallelizing within each tomogram processing
    with ThreadPoolExecutor(max_workers=1) as executor:
        future_to_tomo = {}

        # Submit all tomograms for processing
        for i, tomo_id in enumerate(test_tomos, 1):
            future = executor.submit(process_tomogram, tomo_id, model, i, total_tomos)
            future_to_tomo[future] = tomo_id

        # Process completed futures as they complete
        for future in future_to_tomo:
            tomo_id = future_to_tomo[future]
            try:
                # Clear CUDA cache between tomograms
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()

                result = future.result()
                results.append(result)

                # Update motors found count
                has_motor = not pd.isna(result['Motor axis 0'])
                if has_motor:
                    motors_found += 1
                    print(f"Motor found in {tomo_id} at position: "
                          f"z={result['Motor axis 0']}, y={result['Motor axis 1']}, x={result['Motor axis 2']}")
                else:
                    print(f"No motor detected in {tomo_id}")

                print(f"Current detection rate: {motors_found}/{len(results)} ({motors_found/len(results)*100:.1f}%)")

            except Exception as e:
                print(f"Error processing {tomo_id}: {e}")
                # Create a default entry for failed tomograms
                results.append({
                    'tomo_id': tomo_id,
                    'Motor axis 0': -1,
                    'Motor axis 1': -1,
                    'Motor axis 2': -1
                })

    # Create submission dataframe
    submission_df = pd.DataFrame(results)

    # Ensure proper column order
    submission_df = submission_df[['tomo_id', 'Motor axis 0', 'Motor axis 1', 'Motor axis 2']]

    # Save the submission file
    submission_df.to_csv(submission_path, index=False)

    print(f"\nSubmission complete!")
    print(f"Motors detected: {motors_found}/{total_tomos} ({motors_found/total_tomos*100:.1f}%)")
    print(f"Submission saved to: {submission_path}")

    # Display first few rows of submission
    print("\nSubmission preview:")
    print(submission_df.head())

    return submission_df

# Run the submission pipeline
if __name__ == "__main__":
    # Time entire process
    start_time = time.time()

    # Generate submission
    submission = generate_submission()

    # Print total execution time
    elapsed = time.time() - start_time
    print(f"\nTotal execution time: {elapsed:.2f} seconds ({elapsed/60:.2f} minutes)")

Using GPU: NVIDIA A100-SXM4-40GB with 42.47 GB memory
Dynamic batch size set to 32 based on 42.47GB free memory
Found 3 tomograms in test directory
Found 500 image files in /content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/test/tomo_003acc
PIL Image shape: (1912, 1847), dtype: uint8
OpenCV Image shape: (1912, 1847), dtype: uint8
OpenCV RGB Image shape: (1912, 1847, 3), dtype: uint8
Image loading successful!


Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 223MB/s]


Loaded Faster R-CNN model from /content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/fasterrcnn_motor_detector_2.pth onto cuda:0
Processing tomogram tomo_003acc (1/3)
Processing 500 out of 500 slices based on CONCENTRATION=1
Processing tomogram tomo_00e047 (2/3)
Motor found in tomo_003acc at position: z=-1, y=-1, x=-1
Current detection rate: 1/1 (100.0%)
Processing 300 out of 300 slices based on CONCENTRATION=1
Processing tomogram tomo_01a877 (3/3)Motor found in tomo_00e047 at position: z=170, y=548, x=607
Current detection rate: 2/2 (100.0%)

Processing 300 out of 300 slices based on CONCENTRATION=1
Motor found in tomo_01a877 at position: z=138, y=639, x=286
Current detection rate: 3/3 (100.0%)

Submission complete!
Motors detected: 3/3 (100.0%)
Submission saved to: /content/drive/MyDrive/Phys417FinalProject/BacterialFlagellarMotorsData/submission_2.csv

Submission preview:
       tomo_id  Motor axis 0  Motor axis 1  Motor axis 2
0  tomo_003acc            -1         