# TP3: Coca-Cola Logo Detection

## 1: Single Detection per Image

Find the soft-drink logo in each of the provided images.

### Strategy

For each image, the most suitable method is chosen according to its characteristics (TM = template matching):

| Image | Method | Rationale |
|-------|--------|-----------|
| coca_logo_1.png | Edge TM | Clean background; edges preserved after Canny |
| coca_logo_2.png | SIFT | Curved surface, perspective distortion |
| coca_multi.png | Inverted TM | Multiple logos, inverted contrast |
| coca_retro_1.png | SIFT | Retro logo, structurally different from the template |
| coca_retro_2.png | SIFT | Curved logo inside a circular emblem |
| COCA-COLA-LOGO.jpg | SIFT | Large logo, complex background |
| logo_1.png | Inverted TM | Glass reflections, lighting variations |

In [1]:
%matplotlib qt
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
from typing import List, Tuple, Dict, Optional

### Utility Functions

In [2]:
def load_image(path: str, max_size: int = 1200) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Load image and return RGB, grayscale, and BGR versions."""
    img = cv.imread(path)
    if img is None:
        raise FileNotFoundError(f"Image not found: {path}")
    h, w = img.shape[:2]
    if max(h, w) > max_size:
        scale = max_size / max(h, w)
        img = cv.resize(img, None, fx=scale, fy=scale)
    img_rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    return img_rgb, img_gray, img


def load_template(path: str, max_size: int = 400) -> np.ndarray:
    """Load template as grayscale."""
    template = cv.imread(path, cv.IMREAD_GRAYSCALE)
    if template is None:
        raise FileNotFoundError(f"Template not found: {path}")
    h, w = template.shape[:2]
    if max(h, w) > max_size:
        scale = max_size / max(h, w)
        template = cv.resize(template, None, fx=scale, fy=scale)
    return template


def preprocess_image(img_gray: np.ndarray, method: str = 'clahe') -> np.ndarray:
    """Apply preprocessing to grayscale image."""
    if method == 'clahe':
        clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(img_gray)
    elif method == 'smooth':
        return cv.GaussianBlur(img_gray, (3, 3), 0)
    elif method == 'equalize':
        return cv.equalizeHist(img_gray)
    elif method == 'clahe_smooth':
        clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return cv.GaussianBlur(clahe.apply(img_gray), (3, 3), 0)
    return img_gray


def create_template_variants(template: np.ndarray, include_inverted: bool = False) -> Dict[str, np.ndarray]:
    """Create template variants for matching."""
    clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    variants = {'normal': template, 'clahe': clahe.apply(template)}
    if include_inverted:
        template_inv = 255 - template
        variants['inverted'] = template_inv
        variants['inverted_clahe'] = clahe.apply(template_inv)
    return variants


def rotate_template(template: np.ndarray, angle: float) -> np.ndarray:
    """Rotate template by given angle (degrees)."""
    h, w = template.shape[:2]
    center = (w // 2, h // 2)
    M = cv.getRotationMatrix2D(center, angle, 1.0)
    cos, sin = np.abs(M[0, 0]), np.abs(M[0, 1])
    new_w, new_h = int(h * sin + w * cos), int(h * cos + w * sin)
    M[0, 2] += (new_w - w) / 2
    M[1, 2] += (new_h - h) / 2
    return cv.warpAffine(template, M, (new_w, new_h), borderValue=255)


def compute_scale_range(
    img_width: int, template_width: int,
    width_ratio_min: float = 0.10, width_ratio_max: float = 0.80,
    num_scales: int = 25, scale_bounds: Tuple[float, float] = (0.10, 2.0)
) -> np.ndarray:
    """Compute scale range for template matching based on expected logo size ratios."""
    scale_min = max(scale_bounds[0], (img_width * width_ratio_min) / template_width)
    scale_max = min(scale_bounds[1], (img_width * width_ratio_max) / template_width)
    return np.linspace(scale_min, scale_max, num_scales)


def compute_iou(box1: Tuple, box2: Tuple) -> float:
    """Compute IoU between two boxes (x, y, w, h)."""
    x1, y1, w1, h1 = box1[:4]
    x2, y2, w2, h2 = box2[:4]
    xi1, yi1 = max(x1, x2), max(y1, y2)
    xi2, yi2 = min(x1 + w1, x2 + w2), min(y1 + h1, y2 + h2)
    inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    union_area = w1 * h1 + w2 * h2 - inter_area
    return inter_area / union_area if union_area > 0 else 0


def nms_global(detections: List[Tuple], iou_threshold: float = 0.3) -> List[Tuple]:
    """Apply Non-Maximum Suppression globally."""
    if not detections:
        return []
    detections = sorted(detections, key=lambda d: d[4], reverse=True)
    keep = []
    while detections:
        best = detections.pop(0)
        keep.append(best)
        detections = [d for d in detections if compute_iou(best[:4], d[:4]) < iou_threshold]
    return keep


def draw_bboxes(img_rgb: np.ndarray, bboxes, color=(0, 255, 0), thickness=2, show_numbers: bool = None) -> np.ndarray:
    """Draw bounding boxes on image. Unified function for single or multiple boxes."""
    img_out = img_rgb.copy()
    if bboxes is None:
        return img_out
    if bboxes and not isinstance(bboxes[0], (list, tuple)):
        bboxes = [bboxes]
    bboxes = [b for b in bboxes if b is not None]
    if not bboxes:
        return img_out
    if show_numbers is None:
        show_numbers = len(bboxes) > 1
    for i, bbox in enumerate(bboxes):
        x, y, w, h = int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])
        cv.rectangle(img_out, (x, y), (x + w, y + h), color, thickness)
        if show_numbers:
            cv.putText(img_out, str(i+1), (x+2, y+h-5), cv.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 0), 1)
    return img_out

### Core Detection Functions

In [3]:
def template_match_multiscale(
    img_gray: np.ndarray, template_variants: Dict[str, np.ndarray], scales: np.ndarray,
    threshold: float = 0.30, preprocess: str = None,
    aspect_filter: Tuple[float, float] = None, min_width_ratio: float = 0.05
) -> Tuple[Optional[Tuple], float, str]:
    """Multi-scale template matching. Returns: (bbox, score, method_info)"""
    h, w = img_gray.shape
    img_processed = preprocess_image(img_gray, preprocess) if preprocess else cv.equalizeHist(cv.GaussianBlur(img_gray, (3, 3), 0))
    min_det_width, min_det_height = w * min_width_ratio, h * 0.02
    all_detections = []
    
    for scale in scales:
        for name, tmpl in template_variants.items():
            th, tw = tmpl.shape
            new_w, new_h = int(tw * scale), int(th * scale)
            if new_w > w or new_h > h or new_w < min_det_width or new_h < min_det_height:
                continue
            
            res = cv.matchTemplate(img_processed, cv.resize(tmpl, (new_w, new_h)), cv.TM_CCOEFF_NORMED)
            adaptive_thresh = max(np.mean(res) + 2.5 * np.std(res), threshold)
            
            for pt in zip(*np.where(res >= adaptive_thresh)[::-1]):
                aspect = new_w / new_h if new_h > 0 else 0
                if aspect_filter:
                    if not (aspect_filter[0] < aspect < aspect_filter[1]):
                        continue
                elif not (0.5 < aspect < 4.0):
                    continue
                all_detections.append((pt[0], pt[1], new_w, new_h, res[pt[1], pt[0]], name, scale))
    
    if not all_detections:
        return None, 0, "No detection"
    best = nms_global(all_detections, iou_threshold=0.3)[0]
    return (best[0], best[1], best[2], best[3]), best[4], f"TM-{best[5]}@{best[6]:.2f}"


def template_match_edges(
    img_gray: np.ndarray, template: np.ndarray, scales: np.ndarray,
    threshold: float = 0.25, canny_low: int = 50, canny_high: int = 150,
    y_range: Tuple[int, int] = None
) -> Tuple[Optional[Tuple], float, str]:
    """Edge-based template matching - robust to contrast inversion."""
    h, w = img_gray.shape
    min_det_width, min_det_height = w * 0.05, h * 0.02
    kernel = np.ones((2, 2), np.uint8)
    template_edges = cv.dilate(cv.Canny(template, canny_low, canny_high), kernel, iterations=1)
    img_edges = cv.dilate(cv.Canny(img_gray, canny_low, canny_high), kernel, iterations=1)
    all_detections = []
    
    for scale in scales:
        th, tw = template.shape
        new_w, new_h = int(tw * scale), int(th * scale)
        if new_w >= w or new_h >= h or new_w < min_det_width or new_h < min_det_height:
            continue
        
        res = cv.matchTemplate(img_edges, cv.resize(template_edges, (new_w, new_h)), cv.TM_CCOEFF_NORMED)
        adaptive_thresh = max(np.mean(res) + 2.5 * np.std(res), threshold)
        
        for pt in zip(*np.where(res >= adaptive_thresh)[::-1]):
            if y_range and not (y_range[0] <= pt[1] <= y_range[1]):
                continue
            all_detections.append((pt[0], pt[1], new_w, new_h, res[pt[1], pt[0]], 'edges', scale))
    
    if not all_detections:
        return None, 0, "No detection (edges)"
    best = nms_global(all_detections, iou_threshold=0.3)[0]
    return (best[0], best[1], best[2], best[3]), best[4], f"TM-edges@{best[6]:.2f}"


def detect_by_sift(
    img_gray: np.ndarray, template: np.ndarray, min_matches: int = 8,
    ratio: float = 0.75, min_bbox_ratio: float = 0.02, max_bbox_ratio: float = 0.95
) -> Tuple[Optional[Tuple], int, str]:
    """SIFT-based detection with homography. Returns: (bbox, num_inliers, method_info)"""
    img_h, img_w = img_gray.shape
    img_area = img_h * img_w
    min_bbox_w, min_bbox_h = img_w * 0.03, img_h * 0.02
    
    sift = cv.SIFT_create(nfeatures=2000)
    kp2, des2 = sift.detectAndCompute(img_gray, None)
    if des2 is None or len(des2) < min_matches:
        return None, 0, "SIFT: insufficient features in image"
    
    for tmpl_name, tmpl in [('normal', template), ('inverted', 255 - template)]:
        kp1, des1 = sift.detectAndCompute(tmpl, None)
        if des1 is None or len(des1) < 4:
            continue
        
        flann = cv.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
        matches = flann.knnMatch(des1, des2, k=2)
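        # Lowe's ratio test; knnMatch can return fewer than 2 neighbours for some queries,
        # so check the pair length before unpacking
        good = [p[0] for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]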
        
        if len(good) < min_matches:
            continue
        
        src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0)
        
        if M is None:
            continue
        
        num_inliers = mask.ravel().sum() if mask is not None else len(good)
        th, tw = tmpl.shape
        dst = cv.perspectiveTransform(np.float32([[0,0], [0,th-1], [tw-1,th-1], [tw-1,0]]).reshape(-1,1,2), M)
        x, y, bw, bh = cv.boundingRect(dst)
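        # Clip both corners to the image so a box that starts off-image also shrinks
        x2, y2 = min(x + bw, img_w), min(y + bh, img_h)
        x, y = max(0, x), max(0, y)
        bw, bh = x2 - x, y2 - y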
        bbox_area = bw * bh
        
        if (bbox_area < img_area * min_bbox_ratio or bbox_area > img_area * max_bbox_ratio or
            bw < min_bbox_w or bh < min_bbox_h or bw / bh > 10 or bh / bw > 5):
            continue
        
        return (x, y, bw, bh), num_inliers, f"SIFT-{tmpl_name}({num_inliers} inliers)"
    
    return None, 0, "SIFT: no valid homography found"
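
Both template-matching functions above keep a candidate only if its correlation exceeds `max(mean + 2.5·std, threshold)` of the response map. The sketch below isolates that rule on a synthetic response map; `adaptive_threshold` and `candidate_peaks` are hypothetical helper names, not part of the notebook.

```python
import numpy as np


def adaptive_threshold(res: np.ndarray, k: float = 2.5, floor: float = 0.25) -> float:
    """Threshold = max(mean + k*std, floor), the rule used by the matching functions."""
    return max(float(res.mean() + k * res.std()), floor)


def candidate_peaks(res: np.ndarray, k: float = 2.5, floor: float = 0.25):
    """Return the threshold and every (x, y, score) at or above it."""
    thr = adaptive_threshold(res, k, floor)
    ys, xs = np.where(res >= thr)
    return thr, [(int(x), int(y), float(res[y, x])) for x, y in zip(xs, ys)]


# Synthetic response map: low background correlation plus one strong peak.
rng = np.random.default_rng(0)
res = rng.normal(0.05, 0.02, size=(50, 80)).astype(np.float32)
res[20, 30] = 0.9

thr, peaks = candidate_peaks(res)
print(thr, peaks)
```

Here the background is so flat that the fixed floor decides; on a response map with a larger spread, `mean + k·std` exceeds the floor and raises the bar, which is what thins out clutter in busy images.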

### Load Template

In [4]:
# Load template and create variants
TEMPLATE_PATH = 'template/pattern.png'
template = load_template(TEMPLATE_PATH)
template_variants = create_template_variants(template, include_inverted=True)

print(f"Template shape: {template.shape}")
print(f"Variants: {list(template_variants.keys())}")

# Display template variants (2x2 grid for 4 variants)
fig, axes = plt.subplots(2, 2, figsize=(10, 6))
axes = axes.flatten()
for ax, (name, img) in zip(axes, template_variants.items()):
    ax.imshow(img, cmap='gray')
    ax.set_title(name)
    ax.axis('off')
plt.suptitle('Template Variants')
plt.tight_layout()
plt.show()

# Results storage
results_summary = []

Template shape: (175, 400)
Variants: ['normal', 'clahe', 'inverted', 'inverted_clahe']


---
### Image Detection

#### 1 coca_logo_1.png

Edge-based template matching was used because the background is clean and the logo outline survives Canny well, which gives a very stable match. A narrow scale range and a vertical-position filter also prevent false detections in areas with no relevant content.

**Characteristics:** Bottle with a front label, WHITE text on a RED background.  
**Problem:** In grayscale, the contrast is inverted with respect to the template.  
**Strategy:** Edge-based template matching (Canny edges are unaffected by contrast inversion, because inverting intensities only flips the sign of the gradient).  
**Scales:** 0.30 - 0.55
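
Why edges sidestep the inversion: Canny thresholds the gradient magnitude and suppresses non-maxima along the gradient axis. Inverting intensities (I to 255 - I) negates the gradient, so its magnitude and axis stay the same. The sketch checks this on a synthetic label, using `np.gradient` as a simplified stand-in for the Sobel step inside `cv.Canny` (an assumption for illustration, not Canny itself).

```python
import numpy as np


def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Central-difference gradient magnitude (simplified stand-in for Canny's Sobel step)."""
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows, then columns
    return np.hypot(gx, gy)


# Synthetic label: bright text stripe (255) on a darker background (80).
img = np.full((40, 100), 80, dtype=np.uint8)
img[15:25, 20:80] = 255
inverted = 255 - img  # same scene with contrast inverted, as the template sees it

mag = gradient_magnitude(img)
mag_inv = gradient_magnitude(inverted)
same_magnitude = np.allclose(mag, mag_inv)
print(same_magnitude)  # True
```

Only the gradient direction flips by 180 degrees, which Canny's non-maximum suppression ignores because it works along the gradient axis.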

In [5]:
# coca_logo_1.png - Bottle with white text on red background
# Problem: Inverted contrast in grayscale
# Solution: Edge-based matching (edges are invariant to contrast inversion)
img_name = "coca_logo_1.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# Edge-based matching with scales optimized for this image
scales = np.linspace(0.30, 0.55, 20)

# Use relative y_range based on image height (robust to resizing)
h, w = img_gray.shape
y_min, y_max = int(0.3 * h), int(0.7 * h)

bbox, score, method = template_match_edges(
    img_gray, 
    template,
    scales=scales,
    threshold=0.25,
    y_range=(y_min, y_max)  # Restrict to label area (avoid false positives)
)

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (500, 207)
Image: coca_logo_1.png
Method: TM-edges@0.42
Score: 0.411
BBox: (np.int64(31), np.int64(198), 167, 73)


#### 2 coca_logo_2.png

SIFT was applied here because the can's curvature and the lighting variations hurt direct template correlation but not keypoint matching. The homography estimated from the matches recovers the logo region accurately despite the perspective deformation.

**Characteristics:** Can with WHITE text on a RED background, curved surface, water droplets.  
**Problem:** Template matching fails because of the curvature distortion and the contrast inversion.  
**Strategy:** SIFT with the inverted template (the white text produces features that match the inverted template).  
**Note:** A homography is a planar model, but it approximates the distortion of the curved surface well enough to bound the logo.
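
`detect_by_sift` keeps a FLANN match only if its nearest neighbour is clearly closer than the second nearest (Lowe's ratio test, `ratio=0.75`). A minimal sketch of the same test, assuming brute-force L2 search instead of FLANN for clarity; `ratio_test` is a hypothetical name:

```python
import numpy as np


def ratio_test(des_query: np.ndarray, des_train: np.ndarray, ratio: float = 0.75):
    """Lowe's ratio test with exhaustive L2 search.

    Returns (query_idx, train_idx) pairs whose best match is clearly better than
    the second best. With fewer than two train descriptors nothing can pass.
    """
    if len(des_train) < 2:
        return []
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(des_query[:, None, :] - des_train[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    good = []
    for q in range(d.shape[0]):
        best, second = order[q, 0], order[q, 1]
        if d[q, best] < ratio * d[q, second]:
            good.append((q, int(best)))
    return good


# Two distinctive query descriptors and one ambiguous one (equidistant to all train rows).
train = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([[0.1, 0.0], [9.9, 0.1], [5.0, 5.0]])
matches = ratio_test(query, train)
print(matches)  # [(0, 0), (1, 1)]
```

The third query is equally close to every training descriptor, so it is rejected as ambiguous; matches like that would otherwise feed RANSAC with noise.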

In [6]:
# coca_logo_2.png - Can with curved surface
# Problem: Template matching fails due to curved surface and contrast inversion
# Solution: SIFT with inverted template
img_name = "coca_logo_2.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# SIFT works well for curved surfaces (handles perspective distortion)
bbox, score, method = detect_by_sift(img_gray, template, min_matches=8)

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (363, 233)
Image: coca_logo_2.png
Method: SIFT-inverted(29 inliers)
Score: 29.000
BBox: (0, 96, 233, 132)


#### 3 coca_multi.png

Inverted template matching with a narrow scale range was used because all logos appear at a similar size and stand out from the background with consistent contrast. Global non-maximum suppression, followed by keeping the highest-scoring candidate, yields a single detection and satisfies the one-match-per-image requirement.

**Characteristics:** Shelf with many bottles, WHITE text on RED labels.  
**Problem:** Many similar logos; contrast inversion in grayscale.  
**Strategy:** Inverted template matching with CLAHE (the inverted template compensates for the contrast inversion; CLAHE normalizes local contrast).  
**Scales:** Derived from the expected logo size (~10% of the image width).  
**Note:** For Assignment 1, only ONE logo (the best match) is detected.
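
The scale bands for `coca_multi.png` (and later `logo_1.png`) follow one formula: the center scale maps the 400 px template onto the expected logo width, and the band is a multiple of that center. A small sketch with a hypothetical `scale_band` helper; the expected ratios (10% and 35% of the image width) are the notebook's own assumptions, and the printed ranges reproduce the `Scale range` lines of the runs below.

```python
import numpy as np


def scale_band(img_width: int, template_width: int, logo_ratio: float,
               lo: float, hi: float, num: int = 20) -> np.ndarray:
    """Scales from lo*center to hi*center, where center maps the template to the expected logo width."""
    center = img_width * logo_ratio / template_width
    return np.linspace(center * lo, center * hi, num)


# coca_multi.png: 799 px wide, logo ~80 px per 800 px of width, band 0.6x..1.5x
multi = scale_band(799, 400, 80 / 800, 0.6, 1.5)
# logo_1.png: 687 px wide, logo ~35% of the width, band 0.7x..1.4x
logo1 = scale_band(687, 400, 0.35, 0.7, 1.4)

print(f"{multi[0]:.2f} - {multi[-1]:.2f}")  # 0.12 - 0.30
print(f"{logo1[0]:.2f} - {logo1[-1]:.2f}")  # 0.42 - 0.84
```

Tying the band to the image width keeps the search valid if the image is resized by `load_image`.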

In [7]:
# coca_multi.png - Shelf with many bottles
# Problem: White text on red background = contrast inversion
# Solution: Use inverted template with CLAHE
img_name = "coca_multi.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# Create inverted template variants (for white-on-red labels)
template_inv = 255 - template
inverted_variants = {
    'inverted': template_inv,
    'inverted_clahe': preprocess_image(template_inv, 'clahe'),
}

# Derive scale range from template and expected logo size
# Reference: typical bottle label logo is ~80px wide in a 800px wide shelf image
# This gives us a reference ratio that scales with any image size
h, w = img_gray.shape
th, tw = template.shape  # template: 175x400

reference_logo_width = 80  # expected logo width in pixels (for ~800px wide image)
reference_image_width = 800
expected_logo_ratio = reference_logo_width / reference_image_width  # ~0.10

# Scale to actual image width
expected_logo_width = w * expected_logo_ratio
center_scale = expected_logo_width / tw  # scale where template matches expected logo

# Scale range: 0.6x to 1.5x the center scale (-40% / +50%)
scale_min = center_scale * 0.6
scale_max = center_scale * 1.5
scales = np.linspace(scale_min, scale_max, 20)

print(f"Expected logo width: ~{expected_logo_width:.0f}px")
print(f"Scale range: {scale_min:.2f} - {scale_max:.2f}")

# Min width as ratio of image (derived from expected size)
min_width_ratio = expected_logo_ratio * 0.5  # allow logos down to 50% of expected

bbox, score, method = template_match_multiscale(
    img_gray, 
    inverted_variants,
    scales=scales,
    threshold=0.35,
    preprocess='clahe',
    aspect_filter=(1.8, 3.5),  # Coca-Cola logo aspect ratio
    min_width_ratio=min_width_ratio
)

# NOTE: For Assignment 2 (multiple detections), template_match_multiscale would need to
# return the whole nms_global(...) list instead of only its first element

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (598, 799)
Expected logo width: ~80px
Scale range: 0.12 - 0.30
Image: coca_multi.png
Method: TM-inverted_clahe@0.23
Score: 0.501
BBox: (np.int64(274), np.int64(146), 93, 40)


#### 4 coca_retro_1.png

SIFT was applied because the retro logo differs structurally from the modern template, so classic correlation fails for lack of pixel-to-pixel similarity. Keypoints still yield partial correspondences and allow a homography to be estimated even when the overall logo shape does not match the template.

**Characteristics:** Vintage black-and-white label; stylized logo that differs from the template.  
**Problem:** Structural differences between the retro logo and the modern template.  
**Strategy:** SIFT with `min_matches=6` (a lower threshold to account for the structural differences).  
**Fallback:** Template matching with a wide aspect-ratio range (1.2, 4.0) for retro shapes.
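
The `coca_retro_1.png` cell is a two-stage cascade: SIFT first, template matching only if SIFT returns no box. A generalized sketch, where `detect_with_fallback` and the two stub detectors are hypothetical and the stub values are made up:

```python
from typing import Callable, List, Optional, Tuple

# (bbox, score, method_info), the same triple the notebook's detectors return
Detection = Tuple[Optional[Tuple[int, int, int, int]], float, str]


def detect_with_fallback(detectors: List[Callable[[], Detection]]) -> Detection:
    """Run detectors in order and return the first result that has a bounding box."""
    tried = []
    for detect in detectors:
        bbox, score, method = detect()
        if bbox is not None:
            return bbox, score, method
        tried.append(method)
    return None, 0.0, "No detection (" + "; ".join(tried) + ")"


# Hypothetical stubs standing in for detect_by_sift / template_match_multiscale.
def sift_stub() -> Detection:
    return None, 0, "SIFT: no valid homography found"


def tm_stub() -> Detection:
    return (10, 20, 300, 120), 0.31, "TM-clahe@1.10"  # made-up values


result = detect_with_fallback([sift_stub, tm_stub])
print(result)
```

The cascade returns the first success rather than the highest score because the stages report incomparable scores (inlier count for SIFT, correlation for template matching).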

In [8]:
# coca_retro_1.png - Vintage label (structurally different)
img_name = "coca_retro_1.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# Try SIFT first (best for this case)
# Lower min_matches=6 because vintage logo has different shape, fewer strong correspondences
bbox, score, method = detect_by_sift(img_gray, template, min_matches=6)

# Fallback to template matching if SIFT fails
if bbox is None:
    print("SIFT failed, trying template matching...")
    scales = np.linspace(0.5, 2.0, 25)
    bbox, score, method = template_match_multiscale(
        img_gray, 
        template_variants,
        scales=scales,
        threshold=0.25,  # Lower threshold for difficult case
        preprocess='clahe',
        aspect_filter=(1.2, 4.0)  # Wider range for retro logo shape
    )

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (493, 715)
Image: coca_retro_1.png
Method: SIFT-normal(19 inliers)
Score: 19.000
BBox: (67, 74, 568, 220)


#### 5 coca_retro_2.png

SIFT was used because the logo appears rotated, curved, and inside a circular disc, conditions that degrade template matching. SIFT finds local features that are invariant to rotation and scale, and the homography recovers the emblem's geometric transformation reliably.

**Characteristics:** Vintage poster with a red circular emblem, WHITE text on a RED background.  
**Problem:** Curved text inside a circular emblem; contrast inversion.  
**Strategy:** SIFT with the inverted template (handles curvature and contrast).  
**Note:** SIFT works better than template matching for curved or distorted logos.
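
After RANSAC, the logo box is the bounding rectangle of the template's corners mapped through the homography. A minimal NumPy sketch of that step, assuming a 3x3 homography `H`; `homography_bbox` is a hypothetical helper, and it uses the template's outer rectangle (0 to width, 0 to height) as corners:

```python
import numpy as np


def homography_bbox(H: np.ndarray, tmpl_w: int, tmpl_h: int, img_w: int, img_h: int):
    """Map the template rectangle through H and return its clipped (x, y, w, h) bounding box."""
    corners = np.array([[0, 0], [0, tmpl_h], [tmpl_w, tmpl_h], [tmpl_w, 0]], dtype=np.float64)
    # Homogeneous coordinates: each row [x, y, 1] is multiplied by H, then divided by its last component.
    pts = np.hstack([corners, np.ones((4, 1))]) @ H.T
    pts = pts[:, :2] / pts[:, 2:3]
    x0, y0 = np.floor(pts.min(axis=0))
    x1, y1 = np.ceil(pts.max(axis=0))
    # Clip BOTH corners to the image, so a box that starts off-image also gets narrower.
    x0, y0 = max(0, int(x0)), max(0, int(y0))
    x1, y1 = min(img_w, int(x1)), min(img_h, int(y1))
    return x0, y0, max(0, x1 - x0), max(0, y1 - y0)


# Pure translation by (-50, +30): the left part of the logo falls outside a 300x200 image.
H = np.array([[1.0, 0.0, -50.0],
              [0.0, 1.0, 30.0],
              [0.0, 0.0, 1.0]])
print(homography_bbox(H, tmpl_w=200, tmpl_h=80, img_w=300, img_h=200))  # (0, 30, 150, 80)
```

Clamping only the origin would keep the full 200 px width and push the right edge to x = 200; clipping the far corner too keeps it at x = 150, where the logo actually ends.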

In [9]:
# coca_retro_2.png - Vintage poster with circular badge
# Problem: White text on red, curved logo on circular badge
# Solution: SIFT with inverted template
img_name = "coca_retro_2.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# SIFT handles curved/distorted logos well
bbox, score, method = detect_by_sift(img_gray, template, min_matches=8)

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (429, 715)
Image: coca_retro_2.png
Method: SIFT-inverted(29 inliers)
Score: 29.000
BBox: (61, 187, 161, 63)


#### 6 COCA-COLA-LOGO.jpg

SIFT was used because the logo covers a large region with complex gradients, shadows, and a textured background that strongly disrupt normalized correlation. Local descriptors detect the lettering regardless of color and lighting, and the homography yields a tight box.

**Characteristics:** Large image (1389x1389, downscaled to 1200x1200 by `load_image`), WHITE text on a RED circular emblem.  
**Problem:** Large logo, contrast inversion, complex background (bottle, ice, bubbles).  
**Strategy:** SIFT with the inverted template (handles scale and contrast).  
**Note:** SIFT finds ~47 good matches; the reported score counts only the RANSAC inliers (24 in the run below), which still gives a robust detection.
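
The SIFT score is the number of RANSAC inliers, not the number of ratio-test matches. The sketch shows only the inlier criterion: given a homography, count correspondences whose reprojection error is below 5 px (the `ransacReprojThreshold` value passed to `cv.findHomography` in `detect_by_sift`). It does not implement RANSAC's random hypothesis sampling, and `count_inliers` is a hypothetical name.

```python
import numpy as np


def count_inliers(H: np.ndarray, src: np.ndarray, dst: np.ndarray, thresh: float = 5.0) -> int:
    """Count correspondences whose reprojection error under H is below `thresh` pixels."""
    src_h = np.hstack([src, np.ones((len(src), 1))])  # to homogeneous coordinates
    proj = src_h @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    err = np.linalg.norm(proj - dst, axis=1)
    return int((err < thresh).sum())


# 8 correspondences: 6 consistent with a 2x scaling, 2 gross outliers.
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 5], [3, 7], [4, 4], [8, 1]], dtype=float)
dst = 2.0 * src
dst[6] += [40, 0]    # outlier: 40 px off
dst[7] += [0, -25]   # outlier: 25 px off
H = np.diag([2.0, 2.0, 1.0])

n = count_inliers(H, src, dst)
print(n, "inliers out of", len(src), "matches")  # 6 inliers out of 8 matches
```

Because this score is a count, the SIFT values in the summary table cannot be compared with the template-matching correlations.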

In [10]:
# COCA-COLA-LOGO.jpg - Large image with complex background
# Problem: White text on red, very large logo
# Solution: SIFT with inverted template
img_name = "COCA-COLA-LOGO.jpg"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# SIFT handles large scale differences well
bbox, score, method = detect_by_sift(img_gray, template, min_matches=8)

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (1200, 1200)
Image: COCA-COLA-LOGO.jpg
Method: SIFT-inverted(24 inliers)
Score: 24.000
BBox: (10, 322, 1152, 486)


#### 7 logo_1.png

Template matching with CLAHE preprocessing was applied because the image contains glare, lighting variations, and texture on the glass that hamper direct correlation. Adaptive equalization plus prior smoothing stabilizes the logo's contrast and improves the method's response in this noisy scene.

**Characteristics:** Glass bottles with WHITE text on RED labels, glare, shadows.  
**Problem:** Contrast inversion, lighting variations, reflections on the glass.  
**Strategy:** Template matching with preprocessing (Gaussian blur + CLAHE), falling back to SIFT.  
**Scales:** Derived from the expected logo ratio (~35% of the image width for close-up shots).
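
`logo_1.png` relies on CLAHE. Its core idea, on a single tile: clip the histogram at `clipLimit` times the average bin height, spread the clipped excess uniformly over all bins, and map pixels through the cumulative histogram. This is a simplified sketch assuming one tile; OpenCV's `createCLAHE` additionally splits the image into `tileGridSize` tiles and interpolates between their lookup tables. `clip_limited_equalize` is a hypothetical name.

```python
import numpy as np


def clip_limited_equalize(tile: np.ndarray, clip_limit: float = 2.0) -> np.ndarray:
    """Contrast-limited histogram equalization of ONE uint8 tile (no tiling, no interpolation)."""
    hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)
    # Clip every bin at clip_limit x the average bin height...
    limit = clip_limit * tile.size / 256.0
    excess = np.maximum(hist - limit, 0.0).sum()
    # ...and redistribute the clipped mass uniformly over all 256 bins.
    hist = np.minimum(hist, limit) + excess / 256.0
    # Map each gray level through the normalized cumulative histogram.
    cdf = np.cumsum(hist)
    lut = np.clip(np.round(cdf * 255.0 / tile.size), 0, 255).astype(np.uint8)
    return lut[tile]


def spread(a: np.ndarray) -> int:
    return int(a.max()) - int(a.min())


# Low-contrast tile: ten gray levels (100..109), ten pixels each.
tile = np.repeat(np.arange(100, 110, dtype=np.uint8), 10).reshape(10, 10)
moderate = clip_limited_equalize(tile, clip_limit=2.0)
unclipped = clip_limited_equalize(tile, clip_limit=40.0)  # limit exceeds every bin: plain equalization

print(spread(tile), spread(moderate), spread(unclipped))
```

The clip limit caps how far any narrow intensity range can be stretched: with `clip_limit=2.0` the ten input levels are spread much less than with an effectively unclipped limit.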

In [11]:
# logo_1.png - Glass bottles with glare and shadows
# Problem: White text on red, lighting variations, reflections
# Strategy: Try template matching first (with preprocessing), fall back to SIFT
img_name = "logo_1.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")

print(f"Image size: {img_gray.shape}")

# Preprocess: light Gaussian blur + CLAHE to handle reflections
img_blur = cv.GaussianBlur(img_gray, (3, 3), 0)
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img_preprocessed = clahe.apply(img_blur)

# Create inverted template variants (white text on red = inverted contrast)
template_inv = 255 - template
inverted_variants = {
    'inverted': template_inv,
    'inverted_clahe': preprocess_image(template_inv, 'clahe'),
}

# First try: Template matching with moderate scale band
# Logo is roughly 200-300px wide in 687px image, template is 400px
# Scale ~0.5-0.75 expected
h, w = img_gray.shape
th, tw = template.shape

# Derive scale from expected logo ratio
expected_logo_ratio = 0.35  # logo ~35% of image width for close-up bottle shots
expected_logo_width = w * expected_logo_ratio
center_scale = expected_logo_width / tw
scale_min = center_scale * 0.7
scale_max = center_scale * 1.4
scales = np.linspace(scale_min, scale_max, 20)

print(f"Scale range: {scale_min:.2f} - {scale_max:.2f}")

bbox, score, method = template_match_multiscale(
    img_preprocessed, 
    inverted_variants,
    scales=scales,
    threshold=0.35,
    preprocess=None,  # None still applies the default blur + equalizeHist on top of the CLAHE output
    aspect_filter=(1.5, 4.0),
    min_width_ratio=0.10
)

# Fall back to SIFT if template matching fails or score is low
if bbox is None or score < 0.40:
    print(f"Template matching {'failed' if bbox is None else f'score too low ({score:.3f})'}, trying SIFT...")
    bbox, score, method = detect_by_sift(img_gray, template, min_matches=8)

print(f"Image: {img_name}")
print(f"Method: {method}")
print(f"Score: {score:.3f}")
print(f"BBox: {bbox}")

results_summary.append({
    'image': img_name,
    'method': method,
    'score': score,
    'bbox': bbox
})

Image size: (450, 687)
Scale range: 0.42 - 0.84
Image: logo_1.png
Method: TM-inverted_clahe@0.73
Score: 0.426
BBox: (np.int64(198), np.int64(191), 292, 127)


---
### Results Summary

In [12]:
print("Detection summary:")
# Score: correlation for template matching, inlier count for SIFT (not comparable across methods)
print(f"{'Image':<25} {'Method':<35} {'Score':>8} {'Size':>12}")
print("-" * 80)

detected_count = 0
for r in results_summary:
    img = r['image']
    method = r['method'][:33]  # truncate long method names
    score = r['score']
    bbox = r['bbox']
    
    if bbox is not None:
        size = f"{bbox[2]}x{bbox[3]}"
        detected_count += 1
    else:
        size = "N/A"
    
    print(f"{img:<25} {method:<35} {score:>8.3f} {size:>12}")

print("-" * 80)
print(f"Total detected: {detected_count}/{len(results_summary)}")

Detection summary:
Image                     Method                                 Score         Size
--------------------------------------------------------------------------------
coca_logo_1.png           TM-edges@0.42                          0.411       167x73
coca_logo_2.png           SIFT-inverted(29 inliers)             29.000      233x132
coca_multi.png            TM-inverted_clahe@0.23                 0.501        93x40
coca_retro_1.png          SIFT-normal(19 inliers)               19.000      568x220
coca_retro_2.png          SIFT-inverted(29 inliers)             29.000       161x63
COCA-COLA-LOGO.jpg        SIFT-inverted(24 inliers)             24.000     1152x486
logo_1.png                TM-inverted_clahe@0.73                 0.426      292x127
--------------------------------------------------------------------------------
Total detected: 7/7


In [13]:
# Final grid visualization
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
axes = axes.flatten()

for idx, r in enumerate(results_summary):
    img_rgb, _, _ = load_image(f"images/{r['image']}")
    img_out = draw_bboxes(img_rgb, r['bbox'])
    
    axes[idx].imshow(img_out)
    axes[idx].set_title(f"{r['image']}\nScore: {r['score']:.3f}")
    axes[idx].axis('off')

# Hide unused axes
for idx in range(len(results_summary), len(axes)):
    axes[idx].axis('off')

plt.suptitle('Assignment 1: Single Detection per Image', fontsize=14, fontweight='bold')
plt.tight_layout()
import os
os.makedirs('results', exist_ok=True)  # savefig raises if the output folder is missing
plt.savefig('results/TP3-assignment1.png', dpi=120, bbox_inches='tight')
plt.show()

print("Saved: results/TP3-assignment1.png")

Saved: results/TP3-assignment1.png


---
## 2: Multiple Detections per Image

Design and validate an algorithm for multiple detections in `coca_multi.png`, using the same template as in item 1.

### Approach

`coca_multi` shows many bottles lined up on shelves, with logos at different scales depending on the shelf.

**Improvements implemented to reduce false positives:**

1. **Horizontal band filter (y_ranges):** Logos appear only in specific horizontal strips of the image, so detections are restricted to the zones that actually contain labels (ratios relative to the image height).

2. **Color consistency check:** After each detection, the region is checked to be predominantly red (R > 110, R > G+25, R > B+25).

3. **Relative shape filters:** 
   - Aspect ratio (2.0–3.8), dimensionless. Because the template is scaled uniformly, every candidate keeps the template's own ratio (400/175 ≈ 2.29), so this acts as a sanity check on the template rather than a per-detection filter.
   - Width: 9%–21% of the image width
   - Height: 5%–13% of the image height

4. **Peak isolation test:** Rejects isolated correlation peaks (noise) by checking that the local neighborhood also has a high response.

**Note:** All size parameters are relative to the image size.
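
Filters 2 and 4 can be sketched as two small predicates. The notebook's own implementation is cut off in this section, so these are assumptions: `is_red_region` tests the patch's mean color (the thresholds 110 and 25 come from the list above), and `peak_is_supported` compares the mean response in a 5x5 window (radius 2) with `peak_isolation_ratio=0.5` times the peak. Both names are hypothetical.

```python
import numpy as np


def is_red_region(patch_rgb: np.ndarray, min_red: int = 110, min_dominance: int = 25) -> bool:
    """Red check on the patch's MEAN color: R > min_red and R beats G and B by min_dominance."""
    r, g, b = patch_rgb.reshape(-1, 3).astype(np.float64).mean(axis=0)
    return bool(r > min_red and r > g + min_dominance and r > b + min_dominance)


def peak_is_supported(res: np.ndarray, x: int, y: int, radius: int = 2, ratio: float = 0.5) -> bool:
    """Keep a peak only if the mean response in its (2*radius+1)^2 window is >= ratio * peak."""
    y1, y2 = max(0, y - radius), min(res.shape[0], y + radius + 1)
    x1, x2 = max(0, x - radius), min(res.shape[1], x + radius + 1)
    return bool(res[y1:y2, x1:x2].mean() >= ratio * res[y, x])


# Color check: a red label passes, a white/gray patch does not.
red_patch = np.full((10, 30, 3), (200, 40, 40), dtype=np.uint8)
white_patch = np.full((10, 30, 3), (230, 230, 230), dtype=np.uint8)

# Peak check: a lone spike vs. a spike sitting on a broad high-response blob.
spike = np.zeros((20, 20), dtype=np.float32)
spike[10, 10] = 0.8
blob = np.zeros((20, 20), dtype=np.float32)
blob[8:13, 8:13] = 0.7
blob[10, 10] = 0.8

print(is_red_region(red_patch), is_red_region(white_patch))                # True False
print(peak_is_supported(spike, 10, 10), peak_is_supported(blob, 10, 10))  # False True
```

A real logo produces a correlation plateau a few pixels wide, while sensor noise or texture tends to produce one-pixel spikes; the window mean separates the two.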

In [14]:
def detect_multiple_logos(
    img_gray: np.ndarray,
    img_rgb: np.ndarray,
    template: np.ndarray,
    scales: np.ndarray,
    threshold_sigma: float = 2.5,
    min_threshold: float = 0.35,
    iou_threshold: float = 0.3,
    size_filter: Tuple[int, int] = None,
    height_filter: Tuple[int, int] = None,
    aspect_filter: Tuple[float, float] = (2.0, 3.8),  # must contain the template's own ratio (400/175 ≈ 2.29)
    y_ranges: List[Tuple[float, float]] = None,
    local_max_kernel: int = 7,
    peak_isolation_ratio: float = 0.5,
    color_check: bool = True,
    min_red_dominance: int = 30,
    min_red_value: int = 120
) -> List[Tuple]:
    """
    Multi-detection using template matching with local maxima and validation filters.
    
    Args:
        img_gray: Grayscale image
        img_rgb: RGB image (for color validation)
        template: Single template (use best variant, e.g., inverted + CLAHE)
        scales: Array of scales to test
        threshold_sigma: Number of std devs above mean for adaptive threshold
        min_threshold: Minimum absolute threshold
        iou_threshold: IoU threshold for NMS
        size_filter: Optional (min_width, max_width) to filter detections
        height_filter: Optional (min_height, max_height) to filter detections
        aspect_filter: (min_aspect, max_aspect) for logo shape validation
        y_ranges: List of (y_min_ratio, y_max_ratio) for valid detection zones
        local_max_kernel: Kernel size for local maxima detection
        peak_isolation_ratio: Minimum ratio of neighborhood mean to peak score
        color_check: Whether to validate red color dominance
        min_red_dominance: Minimum R - max(G, B) for color check
        min_red_value: Minimum R channel value for color check
    
    Returns: List of detections [(x, y, w, h, score), ...]
    """
    h, w = img_gray.shape
    th, tw = template.shape
    template_aspect = tw / th
    
    # Relative minimum sizes (3% width, 2% height)
    min_det_width = w * 0.03
    min_det_height = h * 0.02
    
    # Preprocess image
    img_processed = cv.GaussianBlur(img_gray, (3, 3), 0)
    img_processed = cv.equalizeHist(img_processed)
    
    all_detections = []
    
    for scale in scales:
        new_w, new_h = int(tw * scale), int(th * scale)
        
        # Skip invalid sizes (relative check)
        if new_w > w or new_h > h:
            continue
        if new_w < min_det_width or new_h < min_det_height:
            continue
        
        # Width filter (if provided)
        if size_filter:
            if new_w < size_filter[0] or new_w > size_filter[1]:
                continue
        
        # Height filter (if provided)
        if height_filter:
            if new_h < height_filter[0] or new_h > height_filter[1]:
                continue
        
        # Resize template
        scaled_tmpl = cv.resize(template, (new_w, new_h))
        
        # Template matching
        res = cv.matchTemplate(img_processed, scaled_tmpl, cv.TM_CCOEFF_NORMED)
        
        # Adaptive threshold
        mean_val, std_val = np.mean(res), np.std(res)
        adaptive_thresh = max(mean_val + threshold_sigma * std_val, min_threshold)
        
        # Find local maxima instead of simple thresholding
        kernel = np.ones((local_max_kernel, local_max_kernel), np.uint8)
        local_max = cv.dilate(res, kernel)
        mask = (res == local_max) & (res >= adaptive_thresh)
        
        ys, xs = np.where(mask)
        
        for x, y in zip(xs, ys):
            score = res[y, x]
            
            # Strict aspect ratio filter
            det_aspect = new_w / new_h
            if aspect_filter:
                if not (aspect_filter[0] <= det_aspect <= aspect_filter[1]):
                    continue
            elif abs(det_aspect - template_aspect) > 0.5:
                continue
            
            # Y-range filter (horizontal band where labels appear)
            if y_ranges:
                in_valid_zone = False
                for y_min_ratio, y_max_ratio in y_ranges:
                    y_min = int(h * y_min_ratio)
                    y_max = int(h * y_max_ratio)
                    if y_min <= y <= y_max:
                        in_valid_zone = True
                        break
                if not in_valid_zone:
                    continue
            
            # Peak isolation test
            # Reject sparse bright noise - keep peaks supported by local neighborhood
            y1 = max(0, y - 2)
            y2 = min(res.shape[0], y + 3)
            x1 = max(0, x - 2)
            x2 = min(res.shape[1], x + 3)
            window = res[y1:y2, x1:x2]
            if window.size > 0 and window.mean() < score * peak_isolation_ratio:
                continue
            
            # Color consistency check (red dominance)
            if color_check and img_rgb is not None:
                # Sample the bounding box region in RGB
                box_y1 = max(0, y)
                box_y2 = min(img_rgb.shape[0], y + new_h)
                box_x1 = max(0, x)
                box_x2 = min(img_rgb.shape[1], x + new_w)
                
                if box_y2 > box_y1 and box_x2 > box_x1:
                    patch = img_rgb[box_y1:box_y2, box_x1:box_x2]
                    mean_color = patch.mean(axis=(0, 1))  # R, G, B average
                    
                    # Keep only red-dominant patches: mean R > min_red_value and R exceeds both G and B by more than min_red_dominance
                    r, g, b = mean_color[0], mean_color[1], mean_color[2]
                    if not (r > min_red_value and 
                            r > g + min_red_dominance and 
                            r > b + min_red_dominance):
                        continue
            
            all_detections.append((x, y, new_w, new_h, score, scale))
    
    if not all_detections:
        return []
    
    # Apply NMS
    final_detections = nms_global(all_detections, iou_threshold=iou_threshold)
    
    # Return as (x, y, w, h, score)
    return [(d[0], d[1], d[2], d[3], d[4]) for d in final_detections]

In [15]:
# Assignment 2: Multi-detection in coca_multi.png
img_name = "coca_multi.png"
img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")
h, w = img_gray.shape
th, tw = template.shape

print(f"Image size: {w}x{h}")
print(f"Template size: {tw}x{th}")

# Create single best template: inverted + CLAHE
clahe = cv.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
template_inv_clahe = clahe.apply(255 - template)

# Relative size ratios for expected logos
width_ratio_min, width_ratio_max = 0.09, 0.21
height_ratio_min, height_ratio_max = 0.05, 0.13

w_min, w_max = int(w * width_ratio_min), int(w * width_ratio_max)
h_min, h_max = int(h * height_ratio_min), int(h * height_ratio_max)
scales = compute_scale_range(w, tw, width_ratio_min, width_ratio_max, num_scales=15)

print(f"Width: {width_ratio_min:.0%}-{width_ratio_max:.0%} → {w_min}px-{w_max}px")
print(f"Height: {height_ratio_min:.0%}-{height_ratio_max:.0%} → {h_min}px-{h_max}px")
print(f"Scale range: {scales[0]:.2f}-{scales[-1]:.2f}")

y_ranges = [(0.08, 0.35), (0.68, 0.95)]  # Top/bottom shelf
aspect_filter = (2.0, 3.8)

detections = detect_multiple_logos(
    img_gray, img_rgb, template_inv_clahe, scales=scales,
    threshold_sigma=2.5, min_threshold=0.34, iou_threshold=0.20,
    size_filter=(w_min, w_max), height_filter=(h_min, h_max),
    aspect_filter=aspect_filter, y_ranges=y_ranges,
    local_max_kernel=11, peak_isolation_ratio=0.5,
    color_check=True, min_red_dominance=25, min_red_value=110
)

print(f"\nDetected: {len(detections)} logos")
top_count = sum(1 for d in detections if d[1] < h * 0.5)
print(f"Top shelf: {top_count}, Bottom shelf: {len(detections) - top_count}\n")

for i, (x, y, bw, bh, score) in enumerate(detections):
    print(f"  {i+1:2d}. [{'Top' if y < h * 0.5 else 'Bottom':6s}] pos=({x:3d},{y:3d}), size={bw}x{bh}, score={score:.3f}")

img_out = draw_bboxes(img_rgb, detections, color=(0, 255, 0), thickness=2)
plt.figure(figsize=(14, 10))
plt.imshow(img_out)
plt.title(f"Assignment 2: {len(detections)} logos detected")
plt.axis('off')
plt.tight_layout()
plt.savefig('results/TP3-assignment2.png', dpi=120, bbox_inches='tight')
plt.show()
print("Saved: results/TP3-assignment2.png")

Image size: 799x598
Template size: 400x175
Width: 9%-21% → 71px-167px
Height: 5%-13% → 29px-77px
Scale range: 0.18-0.42

Detected: 18 logos
Top shelf: 9, Bottom shelf: 9

   1. [Bottom] pos=(562,428), size=92x40, score=0.444
   2. [Bottom] pos=(147,427), size=92x40, score=0.436
   3. [Bottom] pos=(305,430), size=92x40, score=0.430
   4. [Top   ] pos=( 31,148), size=85x37, score=0.426
   5. [Bottom] pos=(406,432), size=85x37, score=0.425
   6. [Bottom] pos=( 74,423), size=85x37, score=0.423
   7. [Top   ] pos=(275,146), size=92x40, score=0.422
   8. [Bottom] pos=(488,425), size=85x37, score=0.421
   9. [Bottom] pos=(235,431), size=85x37, score=0.420
  10. [Top   ] pos=(706,154), size=78x34, score=0.416
  11. [Top   ] pos=(107,151), size=92x40, score=0.416
  12. [Bottom] pos=(655,428), size=78x34, score=0.410
  13. [Top   ] pos=(542,156), size=78x34, score=0.392
  14. [Top   ] pos=(200,152), size=85x37, score=0.390
  15. [Bottom] pos=(  1,420), size=78x34, score=0.385
  16. [Top   ] pos=

---
## 3: Generalizing the algorithm

Generalize the algorithm from item 2 so that it works on all the images.

### Approach: Meta-Detector with Detector Fusion

Instead of a single detector, we implement a **meta-detector** that:

1. **Runs several specialized detectors:**
   - Template Matching (single): good for clean backgrounds
   - Template Matching on edges: robust to contrast inversion
   - SIFT (single): good for curved, rotated, or retro logos
   - Multi-detection: for images containing several logos

2. **Normalizes the scores** to the range [0, 1] so they can be compared fairly:
   - TM: `norm = clip((max_val - 0.2) / 0.5, 0, 1)`
   - SIFT: `norm = min(1.0, num_inliers / 25)`
   - Multi: `norm = 0.6 * norm_TM + 0.4 * min(1.0, num / 10)`, where `norm_TM` is the TM normalization applied to the mean score of the detections

3. **Selects the best result** with a meta-decision rule:
   - Single mode: if SIFT_norm > TM_norm + 0.15 → choose SIFT; otherwise choose the better of the two TM variants
   - Auto mode: if multi_tm finds ≥ 5 detections AND its norm > 0.5 → choose multi; otherwise apply the single-mode rule
   - Multi mode: always return the multi_tm result
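The normalization formulas and selection rules above can be sketched as standalone functions. This is a simplified re-expression of `UnifiedLogoDetector._normalize_*` and `detect()`, not the notebook's own code: the function names and the `scores` dictionary (detector name mapped to `(num_detections, norm_score)`) are hypothetical.

```python
# Sketch of the score normalization and meta-decision rules.
# ASSUMPTION: scores maps detector name -> (num_detections, normalized score);
# these helpers are illustrative and not part of UnifiedLogoDetector.
from typing import Dict, Tuple


def norm_tm(max_val: float) -> float:
    """TM correlation ~0.2 -> 0.0 and ~0.7 -> 1.0, clipped to [0, 1]."""
    return min(1.0, max(0.0, (max_val - 0.2) / 0.5))


def norm_sift(num_inliers: int) -> float:
    """25 or more RANSAC inliers saturate at 1.0."""
    return min(1.0, num_inliers / 25.0)


def norm_multi(mean_tm: float, num: int) -> float:
    """60% mean TM quality + 40% detection count (saturating at 10 logos)."""
    return 0.6 * norm_tm(mean_tm) + 0.4 * min(1.0, num / 10.0)


def select(scores: Dict[str, Tuple[int, float]], mode: str = "single") -> str:
    """Return the name of the winning detector."""
    if mode == "multi":
        return "multi_tm"
    if mode == "auto":
        num, norm = scores.get("multi_tm", (0, 0.0))
        if num >= 5 and norm > 0.5:
            return "multi_tm"
    # Best TM variant; a detector without detections counts as 0
    tm_name = max(("single_tm", "edge_tm"),
                  key=lambda k: scores[k][1] if scores[k][0] > 0 else 0.0)
    sift_num, sift_norm = scores["sift_single"]
    tm_num, tm_norm = scores[tm_name]
    if sift_num > 0 and sift_norm > tm_norm + 0.15:
        return "sift_single"
    if tm_num > 0:
        return tm_name
    # Fallback: SIFT if it found anything, else the (empty) single TM result
    return "sift_single" if sift_num > 0 else "single_tm"
```

With the scores printed for `coca_logo_1.png` (single_tm 0.415, edge_tm 0.425, SIFT 1.000), `select` returns `sift_single`; in auto mode, the multi-detection result for `coca_multi.png` (18 logos, norm 0.653) passes the ≥ 5 / > 0.5 gate and is selected.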

In [16]:
# Meta-Detector: Unified Logo Detection Framework
from dataclasses import dataclass

@dataclass
class DetectionResult:
    """Standardized detection result from any detector."""
    method: str
    bboxes: List[Tuple]
    raw_score: float
    norm_score: float
    num: int
    details: str = ""


class UnifiedLogoDetector:
    """Meta-detector that runs multiple specialized detectors and selects the best result."""
    
    def __init__(self, template: np.ndarray):
        self.template = template
        self.th, self.tw = template.shape
        all_variants = create_template_variants(template, include_inverted=True)
        self.template_clahe = all_variants['clahe']
        self.template_inv = all_variants['inverted']
        self.template_inv_clahe = all_variants['inverted_clahe']
        self.template_variants = {'normal': all_variants['clahe'], 'inverted': all_variants['inverted_clahe']}
        self.sift_min_matches = 6
        
    def _normalize_tm_score(self, max_val: float) -> float:
        return np.clip((max_val - 0.2) / 0.5, 0.0, 1.0)
    
    def _normalize_sift_score(self, num_inliers: int) -> float:
        return min(1.0, num_inliers / 25.0)
    
    def _normalize_multi_score(self, mean_tm_score: float, num_detections: int) -> float:
        return 0.6 * self._normalize_tm_score(mean_tm_score) + 0.4 * min(1.0, num_detections / 10.0)
    
    def run_single_tm(self, img_gray: np.ndarray) -> DetectionResult:
        h, w = img_gray.shape
        # Use wider scale range to cover both small logos (coca_multi ~10%) and large logos
        scales = compute_scale_range(w, self.tw, 0.08, 0.85, num_scales=35)
        bbox, score, details = template_match_multiscale(
            img_gray, self.template_variants, scales, threshold=0.30,
            preprocess='clahe', aspect_filter=(1.5, 4.5), min_width_ratio=0.05
        )
        if bbox is None:
            return DetectionResult("single_tm", [], 0.0, 0.0, 0, "No detection")
        return DetectionResult("single_tm", [bbox], score, self._normalize_tm_score(score), 1, details)
    
    def run_edge_tm(self, img_gray: np.ndarray) -> DetectionResult:
        h, w = img_gray.shape
        scales = compute_scale_range(w, self.tw, 0.08, 0.85, num_scales=35)
        bbox, score, details = template_match_edges(img_gray, self.template, scales, threshold=0.20)
        if bbox is None:
            return DetectionResult("edge_tm", [], 0.0, 0.0, 0, "No detection")
        return DetectionResult("edge_tm", [bbox], score, self._normalize_tm_score(score), 1, details)
    
    def run_sift_single(self, img_gray: np.ndarray) -> DetectionResult:
        bbox, num_inliers, details = detect_by_sift(img_gray, self.template, min_matches=self.sift_min_matches)
        if bbox is None:
            return DetectionResult("sift_single", [], 0.0, 0.0, 0, details)
        return DetectionResult("sift_single", [bbox], float(num_inliers), self._normalize_sift_score(num_inliers), 1, details)
    
    def run_multi_tm(self, img_gray: np.ndarray, img_rgb: np.ndarray) -> DetectionResult:
        """Multi-logo detection - tuned for coca_multi.png (same params as Assignment 2)."""
        h, w = img_gray.shape
        
        # Same parameters as Assignment 2
        width_ratio_min, width_ratio_max = 0.09, 0.21
        height_ratio_min, height_ratio_max = 0.05, 0.13
        w_min, w_max = int(w * width_ratio_min), int(w * width_ratio_max)
        h_min, h_max = int(h * height_ratio_min), int(h * height_ratio_max)
        scales = compute_scale_range(w, self.tw, width_ratio_min, width_ratio_max, num_scales=15)
        
        detections = detect_multiple_logos(
            img_gray, img_rgb, self.template_inv_clahe, scales,
            threshold_sigma=2.5, min_threshold=0.34, iou_threshold=0.20,
            size_filter=(w_min, w_max), height_filter=(h_min, h_max),
            aspect_filter=(2.0, 3.8),
            y_ranges=[(0.08, 0.35), (0.68, 0.95)],  # Top/bottom shelf
            local_max_kernel=11, peak_isolation_ratio=0.5,
            color_check=True, min_red_dominance=25, min_red_value=110
        )
        
        if not detections:
            return DetectionResult("multi_tm", [], 0.0, 0.0, 0, "No detections")
        
        mean_score = np.mean([d[4] for d in detections])
        bboxes = [(d[0], d[1], d[2], d[3]) for d in detections]
        return DetectionResult("multi_tm", bboxes, mean_score,
                              self._normalize_multi_score(mean_score, len(detections)),
                              len(detections), f"{len(detections)} logos, avg={mean_score:.3f}")
    
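    def detect(self, img_rgb: np.ndarray, img_gray: np.ndarray, mode: str = "single") -> Dict:
        """Run every detector and select the best result (mode: 'single', 'multi' or 'auto')."""
        if mode not in ("single", "multi", "auto"):
            raise ValueError(f"Unknown mode: {mode!r}")
        skipped = DetectionResult("multi_tm", [], 0.0, 0.0, 0, "Skipped")
        results = {
            'single_tm': self.run_single_tm(img_gray),
            'edge_tm': self.run_edge_tm(img_gray),
            'sift_single': self.run_sift_single(img_gray),
            'multi_tm': self.run_multi_tm(img_gray, img_rgb) if mode in ('multi', 'auto') else skipped,
        }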
        
        print("Detector Results:")
        for name, r in results.items():
            print(f"  {name:12s}: num={r.num:2d}, norm={r.norm_score:.3f}, raw={r.raw_score:.3f}, {r.details}")
        
        if mode == "multi":
            best, reason = results['multi_tm'], "Mode=multi"
        elif mode == "single":
            sift = results['sift_single']
            best_tm = max([results['single_tm'], results['edge_tm']], key=lambda r: r.norm_score if r.num > 0 else 0)
            
            # Standard threshold for SIFT vs TM selection
            if sift.num > 0 and sift.norm_score > best_tm.norm_score + 0.15:
                best, reason = sift, f"SIFT ({sift.raw_score:.0f} inliers)"
            elif best_tm.num > 0:
                best, reason = best_tm, f"Best TM (norm={best_tm.norm_score:.2f})"
            else:
                best, reason = sift if sift.num > 0 else results['single_tm'], "Fallback"
        else:  # auto
            multi = results['multi_tm']
            if multi.num >= 5 and multi.norm_score > 0.5:
                best, reason = multi, f"Auto: Multi ({multi.num} logos)"
            else:
                sift = results['sift_single']
                best_tm = max([results['single_tm'], results['edge_tm']], key=lambda r: r.norm_score if r.num > 0 else 0)
                if sift.num > 0 and sift.norm_score > best_tm.norm_score + 0.15:
                    best, reason = sift, f"Auto: SIFT ({sift.raw_score:.0f} inliers)"
                elif best_tm.num > 0:
                    best, reason = best_tm, "Auto: Best TM"
                else:
                    best, reason = sift if sift.num > 0 else results['single_tm'], "Auto: Fallback"
        
        print(f"\nSelected: {best.method} - {reason}")
        return {'best': best, 'all_results': results, 'reason': reason}

In [17]:
# Single Detection Mode (all images)

# Initialize the unified detector
detector = UnifiedLogoDetector(template)

# Test images
test_images = [
    "coca_logo_1.png",
    "coca_logo_2.png",
    "coca_multi.png",
    "coca_retro_1.png",
    "coca_retro_2.png",
    "COCA-COLA-LOGO.jpg",
    "logo_1.png"
]

# Run single detection on all images
print("SINGLE DETECTION MODE")

unified_results = []

for img_name in test_images:
    print(f"Processing: {img_name}")
    
    img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")
    
    # Run unified detector in single mode
    result = detector.detect(img_rgb, img_gray, mode="single")
    
    unified_results.append({
        'image': img_name,
        'result': result['best'],
        'reason': result['reason'],
        'all_results': result['all_results']
    })

print("SINGLE DETECTION - SUMMARY")
print(f"{'Image':<25} {'Method':<15} {'norm':>8} {'Details':<30}")

for r in unified_results:
    img = r['image']
    best = r['result']
    print(f"{img:<25} {best.method:<15} {best.norm_score:>8.3f} {best.details:<30}")

SINGLE DETECTION MODE
Processing: coca_logo_1.png
Detector Results:
  single_tm   : num= 1, norm=0.415, raw=0.408, TM-inverted@0.12
  edge_tm     : num= 1, norm=0.425, raw=0.413, TM-edges@0.43
  sift_single : num= 1, norm=1.000, raw=40.000, SIFT-inverted(40 inliers)
  multi_tm    : num= 0, norm=0.000, raw=0.000, Skipped

Selected: sift_single - SIFT (40 inliers)
Processing: coca_logo_2.png
Detector Results:
  single_tm   : num= 1, norm=0.480, raw=0.440, TM-inverted@0.17
  edge_tm     : num= 1, norm=0.251, raw=0.326, TM-edges@0.10
  sift_single : num= 1, norm=1.000, raw=31.000, SIFT-inverted(31 inliers)
  multi_tm    : num= 0, norm=0.000, raw=0.000, Skipped

Selected: sift_single - SIFT (31 inliers)
Processing: coca_multi.png
Detector Results:
  single_tm   : num= 1, norm=0.566, raw=0.483, TM-inverted@0.30
  edge_tm     : num= 1, norm=0.131, raw=0.265, TM-edges@0.25
  sift_single : num= 1, norm=0.240, raw=6.000, SIFT-inverted(6 inliers)
  multi_tm    : num= 0, norm=0.000, raw=0.000, Ski

In [18]:
# Visualize Single Detection Results
fig, axes = plt.subplots(3, 3, figsize=(16, 14))
axes = axes.flatten()

for idx, r in enumerate(unified_results):
    img_rgb, _, _ = load_image(f"images/{r['image']}")
    img_out = draw_bboxes(img_rgb, r['result'].bboxes)
    
    axes[idx].imshow(img_out)
    
    best = r['result']
    title = f"{r['image']}\n{best.method} | norm={best.norm_score:.2f}"
    axes[idx].set_title(title, fontsize=9)
    axes[idx].axis('off')

# Hide unused axes
for idx in range(len(unified_results), len(axes)):
    axes[idx].axis('off')

plt.suptitle('Unified Meta-Detector - Single Detection Mode', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('results/TP3-assignment3_single.png', dpi=120, bbox_inches='tight')
plt.show()

print("Saved: results/TP3-assignment3_single.png")

Saved: results/TP3-assignment3_single.png


In [19]:
# Multi Detection Mode (coca_multi.png only)

print("MULTI DETECTION MODE")

# Test images with multiple logos
multi_test_images = [
    "coca_multi.png",      # Shelf with many bottles
]

multi_results = []

for img_name in multi_test_images:
    print(f"Processing: {img_name}")
        
    img_rgb, img_gray, img_bgr = load_image(f"images/{img_name}")
    
    # Run unified detector in multi mode
    result = detector.detect(img_rgb, img_gray, mode="multi")
    multi_result = result['best']
    
    multi_results.append({
        'image': img_name,
        'result': multi_result,
        'img_rgb': img_rgb
    })
    
    print(f"\nMulti-detection found {multi_result.num} logos")
    print(f"Average TM score: {multi_result.raw_score:.3f}")
    print(f"Normalized score: {multi_result.norm_score:.3f}")

# Visualize results
fig, ax = plt.subplots(1, 1, figsize=(12, 8))  # a single Axes object

r = multi_results[0]
img_out = draw_bboxes(r['img_rgb'], r['result'].bboxes)
ax.imshow(img_out)
ax.set_title(f"{r['image']}\n{r['result'].num} logos detected | norm={r['result'].norm_score:.2f}")
ax.axis('off')

plt.suptitle('Assignment 3: Unified Meta-Detector - Multi Detection Mode', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('results/TP3-assignment3_multi.png', dpi=120, bbox_inches='tight')
plt.show()

print("\nSaved: results/TP3-assignment3_multi.png")

MULTI DETECTION MODE
Processing: coca_multi.png
Detector Results:
  single_tm   : num= 1, norm=0.566, raw=0.483, TM-inverted@0.30
  edge_tm     : num= 1, norm=0.131, raw=0.265, TM-edges@0.25
  sift_single : num= 0, norm=0.000, raw=0.000, SIFT: no valid homography found
  multi_tm    : num=18, norm=0.653, raw=0.411, 18 logos, avg=0.411

Selected: multi_tm - Mode=multi

Multi-detection found 18 logos
Average TM score: 0.411
Normalized score: 0.653

Saved: results/TP3-assignment3_multi.png


---
## Conclusions

### Approach

The initial approach was to develop a generic algorithm that would work uniformly on all images. However, it quickly became clear that generalizing a single detection method to every case was difficult, because the characteristics of the images vary widely:

- **Contrast inversion:** White text on a red background produces contrast that is inverted relative to the template once converted to grayscale
- **Curved surfaces:** Cans and bottles distort the geometry of the logo
- **Scale variations:** Logos range from ~10% to ~80% of the image width
- **Retro logos:** Structural differences with respect to the modern template
- **Multiple instances:** Several logos must be detected in the same image
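The contrast-inversion problem, and why both the inverted template and the edge-based variant address it, can be checked on a synthetic example. `zncc()` below is a direct NumPy implementation of the normalized correlation coefficient that `cv.TM_CCOEFF_NORMED` computes for one window; the synthetic logo, its polarity, the noise level and the helper names are assumptions for illustration, and `np.gradient` stands in for the Sobel derivatives used inside Canny.

```python
# Contrast inversion: inverted-template TM vs. edge magnitude (sketch).
# ASSUMPTION: synthetic 20x60 "logo" and noise level, for illustration only.
import numpy as np


def zncc(patch: np.ndarray, tmpl: np.ndarray) -> float:
    """Normalized correlation coefficient of two equally sized arrays
    (the per-window score of cv.TM_CCOEFF_NORMED)."""
    p = patch.astype(np.float64) - patch.mean()
    t = tmpl.astype(np.float64) - tmpl.mean()
    return float((p * t).sum() / np.sqrt((p * p).sum() * (t * t).sum()))


def grad_magnitude(img: np.ndarray) -> np.ndarray:
    """Central-difference gradient magnitude (stand-in for Sobel inside Canny)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)


# Dark text on a light background (assumed template polarity)
tmpl_demo = np.full((20, 60), 200, dtype=np.uint8)
tmpl_demo[5:15, 10:50] = 40

# Same logo in the scene with inverted contrast (white text on red) plus noise
rng = np.random.default_rng(0)
noise = rng.integers(-10, 11, size=tmpl_demo.shape)
scene_demo = np.clip((255 - tmpl_demo).astype(np.int64) + noise, 0, 255).astype(np.uint8)

print(zncc(scene_demo, tmpl_demo))        # close to -1: plain template misses it
print(zncc(scene_demo, 255 - tmpl_demo))  # close to +1: inverted template matches
# Gradient magnitude, which edge detectors threshold, ignores the inversion
print(np.allclose(grad_magnitude(tmpl_demo), grad_magnitude(255 - tmpl_demo)))  # True
```

Because the template mean is subtracted before correlating, inverting the template flips the sign of the score exactly, so a white-on-red logo scores strongly negative against the plain template and strongly positive against the inverted one.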

### Case-by-case analysis

Given this difficulty, we chose to analyze each image individually, identify its particular characteristics, and select the most suitable method:

| Characteristic | Best method | Rationale |
|----------------|-------------|-----------|
| Clean background, well-defined edges | Edge TM | Canny preserves contours and is invariant to contrast inversion |
| Curved/distorted surface | SIFT | Handles perspective transformations |
| Inverted contrast (white on red) | Inverted TM | Compensates for the inversion by matching with the inverted template |
| Structurally different logo | SIFT | Finds partial correspondences |
| Multiple similar logos | Multi-TM + NMS | Detects every correlation peak |
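The Multi-TM + NMS row relies on two steps that also appear in `detect_multiple_logos`: keeping only local maxima of the correlation map, then suppressing overlapping boxes. The sketch below reproduces the `res == dilate(res)` peak test in NumPy (assuming an odd kernel size and a float correlation map) and implements a standard greedy IoU-based NMS. `nms_global()` is not shown in this section, so its exact behavior may differ; the helper names are hypothetical and boxes are `(x, y, w, h, score)` tuples.

```python
# Peak picking + greedy NMS for multi-detection (sketch).
# ASSUMPTION: odd kernel size; float map; boxes are (x, y, w, h, score).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(res: np.ndarray, kernel: int, thresh: float):
    """(x, y) positions equal to the max of their kernel x kernel neighborhood
    and >= thresh; same test as res == cv.dilate(res, ones)."""
    pad = kernel // 2
    # Pad with -inf so border pixels are compared only with real neighbors
    padded = np.pad(res, pad, mode="constant", constant_values=-np.inf)
    neigh_max = sliding_window_view(padded, (kernel, kernel)).max(axis=(-2, -1))
    ys, xs = np.where((res == neigh_max) & (res >= thresh))
    return list(zip(xs.tolist(), ys.tolist()))


def iou(a, b) -> float:
    """Intersection over union of two (x, y, w, h, ...) boxes."""
    ax, ay, aw, ah = a[:4]
    bx, by, bw, bh = b[:4]
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, iou_threshold: float = 0.2):
    """Visit boxes by descending score; keep a box only if it overlaps no
    already-kept box by more than iou_threshold."""
    kept = []
    for d in sorted(dets, key=lambda det: det[4], reverse=True):
        if all(iou(d, k) <= iou_threshold for k in kept):
            kept.append(d)
    return kept
```

With `iou_threshold=0.2` (the value used for `coca_multi.png`), two 10x10 boxes shifted by one pixel (IoU ≈ 0.68) collapse into the higher-scoring one, while a distant box survives.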

### Unified algorithm (Meta-Detector)

Building on what was learned from the individual analysis, we built a **meta-detector** that:

1. Runs several specialized detectors on each image (TM, Edge-TM, SIFT, plus Multi-TM in multi/auto mode)
2. Normalizes their scores to a comparable [0, 1] range
3. Automatically selects the best result according to decision rules

This approach made it possible to generalize detection without hardcoding the method per image, although it required additional tuning:

- **Wide scale ranges** (8%-85% of the image width) to cover both small and large logos
- **A fixed margin** for the SIFT vs. TM selection (norm_SIFT > norm_TM + 0.15)
- **Dedicated parameters** for multi-detection, tuned on coca_multi.png (color filters, horizontal bands, aspect ratio)
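Two of the tuning knobs above can be isolated as small helpers. `compute_scale_range()` is defined earlier in the notebook and not shown here; the linear spacing below is an assumption, chosen because it reproduces the printed `coca_multi.png` range (0.18-0.42 for a 799 px image and a 400 px template) and the 78, 85 and 92 px template widths seen in the detections. `is_red_dominant()` restates the color check from `detect_multiple_logos` as a standalone function; both names are hypothetical.

```python
# Tuning helpers (sketch). Both function names are hypothetical.
import numpy as np


def scale_range_sketch(img_w: int, tmpl_w: int, r_min: float, r_max: float,
                       num_scales: int) -> np.ndarray:
    """Scales that make the template span r_min..r_max of the image width.
    ASSUMPTION: linear spacing; the notebook's compute_scale_range may differ."""
    return np.linspace(img_w * r_min / tmpl_w, img_w * r_max / tmpl_w, num_scales)


def is_red_dominant(patch_rgb: np.ndarray, min_red_value: int = 120,
                    min_red_dominance: int = 30) -> bool:
    """Color check from detect_multiple_logos: mean R must exceed min_red_value
    and exceed both mean G and mean B by more than min_red_dominance."""
    r, g, b = patch_rgb.reshape(-1, 3).astype(np.float64).mean(axis=0)
    return bool(r > min_red_value
                and r > g + min_red_dominance
                and r > b + min_red_dominance)


demo_scales = scale_range_sketch(799, 400, 0.09, 0.21, num_scales=15)
print(f"{demo_scales[0]:.2f}-{demo_scales[-1]:.2f}")  # 0.18-0.42
print([int(400 * s) for s in demo_scales[:4]])        # [71, 78, 85, 92]
```

Expressing the scale range as a fraction of the image width keeps the search independent of image resolution, which is what lets the same ratios serve images of different sizes.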

### Summary

1. **There is no universal method:** Each technique (TM, SIFT, Edge) has specific strengths and weaknesses
2. **Prior analysis is essential:** Understanding the characteristics of each image guides the choice of method
3. **Detector fusion is effective:** Combining several methods and comparing their normalized scores yields more robust results
4. **Tuning is unavoidable:** Even with a meta-detector, edge cases still require fine adjustments