<div align="center">

  <a href="https://ultralytics.com/yolov5" target="_blank">
    <img width="1024", src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov5/v70/splash.png"></a>

[中文](https://docs.ultralytics.com/zh/) | [한국어](https://docs.ultralytics.com/ko/) | [日本語](https://docs.ultralytics.com/ja/) | [Русский](https://docs.ultralytics.com/ru/) | [Deutsch](https://docs.ultralytics.com/de/) | [Français](https://docs.ultralytics.com/fr/) | [Español](https://docs.ultralytics.com/es/) | [Português](https://docs.ultralytics.com/pt/) | [العربية](https://docs.ultralytics.com/ar/)

  <a href="https://bit.ly/yolov5-paperspace-notebook"><img src="https://assets.paperspace.io/img/gradient-badge.svg" alt="Run on Gradient"></a>
  <a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>
  <a href="https://www.kaggle.com/models/ultralytics/yolov5"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open In Kaggle"></a>

This <a href="https://github.com/ultralytics/yolov5">YOLOv5</a> 🚀 notebook by <a href="https://ultralytics.com">Ultralytics</a> presents simple train, validate and predict examples to help start your AI adventure.<br>We hope that the resources in this notebook will help you get the most out of YOLOv5. Please browse the YOLOv5 <a href="https://docs.ultralytics.com/yolov5">Docs</a> for details, raise an issue on <a href="https://github.com/ultralytics/yolov5">GitHub</a> for support, and join our <a href="https://ultralytics.com/discord">Discord</a> community for questions and discussions!

</div>

# Setup

Clone GitHub [repository](https://github.com/ultralytics/yolov5), install [dependencies](https://github.com/ultralytics/yolov5/blob/master/requirements.txt) and check PyTorch and GPU.

In [1]:
!git clone https://github.com/ultralytics/yolov5  # clone
%cd yolov5
%pip install -qr requirements.txt comet_ml  # install

import torch
import utils
display = utils.notebook_init()  # checks

YOLOv5 🚀 v7.0-409-ge9ab205e Python-3.11.11 torch-2.6.0+cu124 CPU


Setup complete ✅ (2 CPUs, 12.7 GB RAM, 41.3/107.7 GB disk)


In [2]:
from google.colab import drive
drive.mount('/content/drive')

# Verify files (replace with your path)
!ls "/content/drive/MyDrive/annotated_imgs"

Mounted at /content/drive
K1_msz.png  K3_msz.png	K4_msz.png  T9_msz.png	Z9_msz.png


In [89]:
# Add this before training:
!wget https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s-seg.pt

--2025-03-28 12:52:37--  https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s-seg.pt
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/11a51425-536d-402d-919d-d933efbde7fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20250328%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250328T125237Z&X-Amz-Expires=300&X-Amz-Signature=53ed6102f4df7f472fbd3da00cc96d2dba9dfeaa5836ffbeeefcd976fc2c0354&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5s-seg.pt&response-content-type=application%2Foctet-stream [following]
--2025-03-28 12:52:37--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/11a51425-536d-402d-919d-d933efbde7fa?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credent

In [92]:
# ========== MOUNT DRIVE & IMPORTS ==========
from google.colab import drive
#drive.mount('/content/drive')

import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from scipy import ndimage
import re

# ========== CONFIGURATION ==========
ANNOTATED_DIR = Path("/content/drive/MyDrive/annotated_imgs")
ORIGINAL_DIR = Path("/content/drive/MyDrive/dataset/originals")
OUTPUT_DIR = Path("/content/drive/MyDrive/dataset/precise_labels")
VISUAL_DIR = Path("/content/drive/MyDrive/dataset/visualizations")

# ========== ULTRA-OPTIMIZED PARAMETERS ==========
# Annotation Detection
BLUE_LOWER = np.array([75, 35, 35])  # Even wider blue range
BLUE_UPPER = np.array([145, 255, 255])
MIN_DOT_AREA = 5                     # Smaller minimum dot size
MAX_DOT_AREA = 500                   # Larger maximum dot size

# Region Growing (Aggressive settings)
GROWTH_STEPS = 20                    # More growth phases
GROWTH_FACTOR = 3.7                  # Faster tolerance relaxation
NEIGHBORHOOD_SIZE = 20               # Larger search area for seed relocation
BASE_TOL = 60                        # Initial tolerance for hue and sat
BASE_VAL_TOL = 60                    # Tighter initial tolerance for value
ALPHA = 2.6                          # Aggressiveness factor for faster relaxation
BETA = 0.55                          # Slower relaxation for V

# Post-processing
MORPH_KERNEL_SIZE = 8                # Larger kernel for better hole filling
MIN_DISTANCE_WATERSHED = 18          # Reduced minimum distance
MIN_BACTERIA_AREA = 10               # Smaller minimum bacteria size
MAX_BACTERIA_AREA = 5000             # Larger maximum bacteria size

# ========== CORE FUNCTIONS ==========
def detect_annotations(annot_img):
    """Enhanced blue dot detection with size filtering"""
    hsv = cv2.cvtColor(annot_img, cv2.COLOR_BGR2HSV)
    blue_mask = cv2.inRange(hsv, BLUE_LOWER, BLUE_UPPER)

    # Enhance dots
    kernel = np.ones((5,5), np.uint8)
    blue_mask = cv2.morphologyEx(blue_mask, cv2.MORPH_CLOSE, kernel)
    blue_mask = cv2.dilate(blue_mask, kernel, iterations=1)

    # Find and filter contours
    contours, _ = cv2.findContours(blue_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    annotations = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if MIN_DOT_AREA < area < MAX_DOT_AREA:
            M = cv2.moments(cnt)
            if M["m00"] > 0:
                cX = int(M["m10"] / M["m00"])
                cY = int(M["m01"] / M["m00"])
                annotations.append((cX, cY))

    # Debug output
    print(f"Detected {len(annotations)} annotation points")
    return annotations

# ========== ENHANCED SEED RELOCATION ==========
def relocate_seed(seed, hsv_img, bg_mask, img_shape):
    width, height = img_shape[1], img_shape[0]
    x, y = np.clip(seed[0], 0, width-1), np.clip(seed[1], 0, height-1)
    best_score = -np.inf
    best_pos = (x, y)

    avg_brightness = np.mean(hsv_img[: ,: ,2])
    if avg_brightness < 100:
        w_hue = 0.1
        w_sat = 0.2
        w_val = 0.7
    else:
        w_hue = 0.2
        w_sat = 0.3
        w_val = 0.5
    if avg_brightness < 80:
        w_val = 0.8
        w_sat = 0.1
        w_val = 0.1

    # Larger search neighborhood
    for dx in range(-NEIGHBORHOOD_SIZE, NEIGHBORHOOD_SIZE+1):
        for dy in range(-NEIGHBORHOOD_SIZE, NEIGHBORHOOD_SIZE+1):
            nx = np.clip(x + dx, 0, width-1)
            ny = np.clip(y + dy, 0, height-1)

            if bg_mask[ny, nx]: continue

            # Larger window for better context
            y1, y2 = max(0, ny-7), min(ny+8, height)
            x1, x2 = max(0, nx-7), min(nx+8, width)
            window = hsv_img[y1:y2, x1:x2]

            if window.size == 0: continue

            std_dev = np.std(window, axis=(0,1))
            score = std_dev[0] * 3 + std_dev[1]  # Weighted towards hue
            score = (std_dev[0] * w_hue) + (std_dev[1] * w_sat) + (std_dev[2] * w_val)

            if score > best_score:
                best_score = score
                best_pos = (nx, ny)

    return best_pos

# ========== SUPERCHARGED REGION GROWING ==========
def adaptive_region_growing(annot_img, orig_img):
    # Resize annotation if needed
    if annot_img.shape[:2] != orig_img.shape[:2]:
        annot_img = cv2.resize(annot_img, (orig_img.shape[1], orig_img.shape[0]),
                             interpolation=cv2.INTER_NEAREST)

    # Enhanced annotation detection
    annotations = detect_annotations(annot_img)
    if not annotations:
        print("⚠️ No annotations detected!")
        return []

    # Convert to HSV and create background mask
    hsv = cv2.cvtColor(orig_img, cv2.COLOR_BGR2HSV)
    _, bg_mask = cv2.threshold(hsv[:,:,2], 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
    bg_mask = (bg_mask == 0).astype(np.uint8)

    # calculating avg brightness
    avg_brightness = np.mean(hsv[: ,: ,2])
    if avg_brightness < 100:
      w_hue = 0.1
      w_sat = 0.2
      w_val = 0.7
    else:
      w_hue = 0.2
      w_sat = 0.3
      w_val = 0.5

    # Supercharged region growing
    final_mask = np.zeros_like(hsv[:,:,0], dtype=np.uint8)
    for seed in annotations:
        best_seed = relocate_seed(seed, hsv, bg_mask, orig_img.shape)
        h_ref, s_ref, v_ref = hsv[best_seed[1], best_seed[0]]
        region = np.zeros_like(final_mask)
        queue = [best_seed]
        visited = set()

        for tolerance in np.linspace(1.0, GROWTH_FACTOR, GROWTH_STEPS):
            tol = BASE_TOL * (1 + ALPHA * (tolerance/GROWTH_STEPS))
            val_tol = BASE_VAL_TOL * (1 + BETA * (tolerance/GROWTH_STEPS))

            while queue:
                x, y = queue.pop(0)
                x = np.clip(x, 0, orig_img.shape[1]-1)
                y = np.clip(y, 0, orig_img.shape[0]-1)

                if (x, y) in visited: continue
                if bg_mask[y, x]: continue

                h, s, v = hsv[y, x]
                hue_diff = min(abs(h - h_ref), 180 - abs(h - h_ref))
                sat_diff = abs(s - s_ref)
                val_diff = abs(v - v_ref)

                # Dynamic adaptive thresholding

                if (w_hue * hue_diff + w_sat * sat_diff) < tol and w_val * val_diff < val_tol:
                #if (0.6 * hue_diff + 0.4 * sat_diff) < h_tol:
                    region[y, x] = 255
                    visited.add((x, y))

                    # 8-way expansion with probabilistic sampling
                    neighbors = [(x+dx, y+dy) for dx in [-1,0,1] for dy in [-1,0,1] if dx != 0 or dy != 0]
                    np.random.shuffle(neighbors)
                    for nx, ny in neighbors:
                        nx = np.clip(nx, 0, orig_img.shape[1]-1)
                        ny = np.clip(ny, 0, orig_img.shape[0]-1)
                        if (nx, ny) not in visited:
                            queue.append((nx, ny))

        # Add morphological assistance during growth
        region = cv2.dilate(region, np.ones((3,3), np.uint8))
        final_mask = cv2.bitwise_or(final_mask, region)

    # Aggressive post-processing
    final_mask = cv2.morphologyEx(final_mask, cv2.MORPH_CLOSE,
                                cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (MORPH_KERNEL_SIZE,MORPH_KERNEL_SIZE)))
    final_mask = cv2.dilate(final_mask, np.ones((5,5), np.uint8), iterations=1)

    # More permissive watershed
    distance = ndimage.distance_transform_edt(final_mask)
    #distance = cv2.GaussianBlur(distance, (5, 5), 0)
    coords = peak_local_max(distance, min_distance=MIN_DISTANCE_WATERSHED,
                           threshold_rel=0.3, labels=final_mask)

    markers = np.zeros_like(final_mask, dtype=np.int32)
    for i, (y, x) in enumerate(coords):
        if 0 <= x < orig_img.shape[1] and 0 <= y < orig_img.shape[0]:
            markers[y, x] = i + 1

    labels = watershed(-distance, markers, mask=final_mask)

    # Extract contours with relaxed criteria
    bacteria_contours = []
    for label in np.unique(labels):
        if label == 0: continue

        mask = np.zeros_like(labels, dtype=np.uint8)
        mask[labels == label] = 255

        cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for cnt in cnts:
            area = cv2.contourArea(cnt)
            if MIN_BACTERIA_AREA < area < MAX_BACTERIA_AREA:
                bacteria_contours.append(cnt)

    return bacteria_contours


# ========== ENHANCED VISUALIZATION ==========
def process_image_pair(annot_path, orig_path):
    # Load images
    annot_img = cv2.imread(str(annot_path))
    orig_img = cv2.imread(str(orig_path))

    if annot_img is None or orig_img is None:
        print(f"⚠️ Error loading {annot_path.name} or {orig_path.name}")
        return

    # Get annotation count BEFORE processing
    blue_mask = cv2.inRange(cv2.cvtColor(annot_img, cv2.COLOR_BGR2HSV),
                           BLUE_LOWER, BLUE_UPPER)
    annot_contours, _ = cv2.findContours(blue_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    annotation_count = len([c for c in annot_contours if MIN_DOT_AREA < cv2.contourArea(c) < MAX_DOT_AREA])

    # Perform segmentation
    contours = adaptive_region_growing(annot_img, orig_img)

    # Create visualization
    display_img = orig_img.copy()
    overlay = display_img.copy()
    cv2.drawContours(overlay, contours, -1, (0,255,255), -1)  # Yellow fill
    cv2.addWeighted(overlay, 0.3, display_img, 0.7, 0, display_img)
    cv2.drawContours(display_img, contours, -1, (255,0,255), 2)  # Purple edges

    # Create comparison figure with improved titles
    fig, ax = plt.subplots(1, 3, figsize=(20, 6))
    fig.suptitle(f"Analysis for {orig_path.name}", fontsize=16, y=0.95)

    # Original Image
    ax[0].imshow(cv2.cvtColor(annot_img, cv2.COLOR_BGR2RGB))
    ax[0].set_title("Original Image\n", fontsize=12)
    ax[0].axis('off')

    # Annotation Points - Now shows actual annotation count
    ax[1].imshow(blue_mask, cmap='gray')
    ax[1].set_title(f"Annotation Points: {annotation_count}\n", fontsize=12)  # Changed to annotation_count
    ax[1].axis('off')

    # Segmentation Result
    ax[2].imshow(cv2.cvtColor(display_img, cv2.COLOR_BGR2RGB))
    ax[2].set_title(f"Segmented Bacteria: {len(contours)}\n", fontsize=12)
    ax[2].axis('off')

    plt.savefig(VISUAL_DIR / f"{orig_path.stem}_seg.png", bbox_inches='tight', dpi=150)
    plt.close()

    # Save labels
    save_yolo_labels(orig_img.shape, contours, orig_path.stem)

    # return segmented bacteria count
    return len(contours)

def save_yolo_labels(img_shape, contours, stem):
    height, width = img_shape[:2]
    with open(OUTPUT_DIR / f"{stem}.txt", "w") as f:
        for cnt in contours:
            rect = cv2.minAreaRect(cnt)
            box = cv2.boxPoints(rect)
            box_norm = box / np.array([width, height])
            line = f"0 " + " ".join([f"{p[0]:.6f} {p[1]:.6f}" for p in box_norm])
            f.write(line + "\n")

def find_matching_pairs(annotated_dir, original_dir):
    pattern = re.compile(r'^(.+)_msz(\.[a-zA-Z]+)$', re.IGNORECASE)
    pairs = []
    for annot_path in annotated_dir.glob('*'):
        match = pattern.match(annot_path.name)
        if match:
            base_name = match.group(1) + match.group(2)
            orig_path = original_dir / base_name
            if orig_path.exists():
                pairs.append((annot_path, orig_path))
            else:
                print(f"⚠️ Missing original for {annot_path.name}")
    return pairs

def main():
    # Setup directories
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    VISUAL_DIR.mkdir(parents=True, exist_ok=True)

    # Find all valid image pairs
    pairs = find_matching_pairs(ANNOTATED_DIR, ORIGINAL_DIR)

    if not pairs:
        print("❌ No valid image pairs found!")
        print("Ensure annotated files follow _msz. pattern")
        return

    processed = 0
    for annot_path, orig_path in pairs:
        try:
            segmented_bacteria_count = process_image_pair(annot_path, orig_path)
            processed += 1
            print(f"✅ Processed {orig_path.name} - Found {segmented_bacteria_count} bacterias")
        except Exception as e:
            print(f"❌ Error processing {orig_path.name}: {str(e)}")

    print(f"\n🎉 Finished! Processed {processed}/{len(pairs)} image pairs")
    print(f"Labels saved to: {OUTPUT_DIR}")
    print(f"Visualizations saved to: {VISUAL_DIR}")

if __name__ == "__main__":

    # Clear previous outputs (optional)
    !rm -rf "/content/drive/MyDrive/dataset/precise_labels/*"
    !rm -rf "/content/drive/MyDrive/dataset/visualizations/*"

    # Run the processing
    main()


Detected 63 annotation points


  val_diff = abs(v - v_ref)
  sat_diff = abs(s - s_ref)
  hue_diff = min(abs(h - h_ref), 180 - abs(h - h_ref))


✅ Processed K4.png - Found 49 bacterias
Detected 36 annotation points
✅ Processed T9.png - Found 15 bacterias
Detected 64 annotation points
✅ Processed K1.png - Found 57 bacterias
Detected 25 annotation points
✅ Processed Z9.png - Found 24 bacterias
Detected 51 annotation points
✅ Processed K3.png - Found 46 bacterias

🎉 Finished! Processed 5/5 image pairs
Labels saved to: /content/drive/MyDrive/dataset/precise_labels
Visualizations saved to: /content/drive/MyDrive/dataset/visualizations
