# Phase 3: Fixed Distance Estimation for Campus Navigation Assistant

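This notebook implements the distance estimation module for the Mobile-Based Campus Navigation Assistant, tuned for iPhone 14 images. It uses the pretrained ResNet50 classifier (`resnet50_multiclass_building_detection_full.pth`) to recognize the landmark, then estimates the distance in two ways: object size comparison and triangulation. Doors serve as the reference object (assumed height: 2 meters). A YOLO detector locates them, with a manual bounding box as fallback. The stock COCO-pretrained YOLOv5 weights have no `door` class, so automatic detection requires a door-trained checkpoint; otherwise, use the manual box. Uncalibrated stereo rectification prepares the image pair for triangulation, and interactive widgets simplify input. Model loading accounts for PyTorch 2.6, which changed the default of `torch.load` to `weights_only=True` and therefore rejects fully pickled models unless `weights_only=False` is passed.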

## Objectives
- Load ResNet50 model, handling full model serialization in PyTorch 2.6+.
- Detect doors using YOLOv5 with manual input fallback.
- Estimate distance using:
  - Object size comparison (iPhone 14 specs).
  - Triangulation with stereo rectification.
- Provide interactive image upload and baseline input.
- Save results in JSON for Phase 4 integration.

## iPhone 14 Camera Specs
- Focal length: 4.25mm.
- Sensor size: 7.6mm x 5.7mm.
- Resolution: 4032x3024 pixels.
- Focal length in pixels: ~2255 pixels (4.25 mm × 4032 px / 7.6 mm); the code uses 2253, which differs by less than 0.1%.
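The pixel focal length follows from the pinhole relation f_px = f_mm × image_px / sensor_mm along one sensor axis. A short sketch using the specs above; it assumes the image is at the full 4032x3024 resolution, so a resized image (for example, one recompressed by a messaging app) needs the value scaled by the same factor:

```python
def focal_length_px(focal_mm, sensor_mm, image_px):
    """Convert a focal length in millimeters to pixels along one sensor axis."""
    return focal_mm * image_px / sensor_mm

# iPhone 14 specs as listed above
f_x = focal_length_px(4.25, 7.6, 4032)  # along the 7.6 mm sensor width
f_y = focal_length_px(4.25, 5.7, 3024)  # along the 5.7 mm sensor height
f_half = f_x * 0.5                      # for an image downscaled by 0.5

print(f"f_x = {f_x:.1f} px, f_y = {f_y:.1f} px, half resolution = {f_half:.1f} px")
# f_x = 2254.7 px, f_y = 2254.7 px, half resolution = 1127.4 px
```

Both axes give the same value because the 4:3 sensor and image aspect ratios match, i.e. the pixels are square.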

## Input Requirements
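- Two JPEG or PNG images of the same scene taken from two camera positions, at full resolution (4032x3024) so the pixel focal length applies. On upload they are saved to the `images/` folder as `image1.jpg` and `image2.jpg`. The iPhone Camera app saves HEIC by default; set Settings > Camera > Formats to Most Compatible or convert the files first.
- Baseline: the distance in meters between the two camera positions (e.g., 0.5 m), ideally a sideways shift with the phone pointing in the same direction.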
- `resnet50_multiclass_building_detection_full.pth` and `annotations.csv` in working directory.

## Outputs
- Estimated distances (size comparison and triangulation).
- Visualizations (annotated image, triangulation geometry).
- Saved files: `result_annotated.jpg`, `distance_results.json`.

## Step 1: Import Libraries

In [None]:
import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision.models import resnet50
from PIL import Image
import matplotlib.pyplot as plt
import pandas as pd
import os
import json
from ultralytics import YOLO
import ipywidgets as widgets
from IPython.display import display, clear_output
%matplotlib inline

## Step 2: Load Annotations and Define Classes

In [None]:
def load_annotations():
    if not os.path.exists('annotations.csv'):
        raise FileNotFoundError('annotations.csv not found')
    annotations = pd.read_csv('annotations.csv')
    class_names = sorted(annotations['label'].unique())
    class_to_idx = {cls: idx for idx, cls in enumerate(class_names)}
    idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}
    return class_names, class_to_idx, idx_to_class

class_names, class_to_idx, idx_to_class = load_annotations()
num_classes = len(class_names)
print(f"Number of classes: {num_classes}")
print(f"Classes: {class_names}")
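`idx_to_class` is only correct if the model was trained with the same encoding: unique labels sorted in Python's default, case-sensitive string order. A small sketch of the same mapping on hypothetical labels shows which index each class receives:

```python
def build_class_maps(labels):
    """Mirror load_annotations(): sort the unique labels and number them from 0."""
    class_names = sorted(set(labels))
    class_to_idx = {cls: idx for idx, cls in enumerate(class_names)}
    idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}
    return class_names, class_to_idx, idx_to_class

# Hypothetical labels; uppercase letters sort before lowercase ones
labels = ['library', 'Cafeteria', 'library', 'Admin Block', 'gym']
names, to_idx, to_class = build_class_maps(labels)
print(names)  # ['Admin Block', 'Cafeteria', 'gym', 'library']
```

If training numbered the classes differently (for example, in order of first appearance), predictions map to the wrong building names without raising any error.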

## Step 3: Load Pretrained ResNet50 Model

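PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, which refuses to unpickle a full `nn.Module`. The loader therefore first tries `weights_only=False` (safe only for checkpoints you created or trust, because unpickling can execute arbitrary code) and falls back to loading a `state_dict` into a fresh ResNet50 whose `fc` layer matches the number of classes.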

In [None]:
def load_resnet_model():
    model_file = 'resnet50_multiclass_building_detection_full.pth'
    if not os.path.exists(model_file):
        raise FileNotFoundError(f'{model_file} not found')
    
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    try:
        # Try loading as full model
        model = torch.load(model_file, map_location=device, weights_only=False)
        # Verify the model has the correct number of classes
        if model.fc.out_features != num_classes:
            raise ValueError(f'Model has {model.fc.out_features} output classes, expected {num_classes}')
    except Exception as e:
        print(f"Error loading full model: {str(e)}")
        print("Attempting to load as state_dict...")
        # Fallback to state_dict loading
        model = resnet50(weights=None)
        model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
        try:
            state_dict = torch.load(model_file, map_location=device, weights_only=True)
            model.load_state_dict(state_dict)
        except Exception as e2:
            raise RuntimeError(f'Failed to load model as state_dict: {str(e2)}')
    
    model.eval()
    return model.to(device), device

model, device = load_resnet_model()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

## Step 4: Load YOLOv5 for Door Detection

In [None]:
yolo_model = YOLO('yolov5s.pt')  # COCO-pretrained: COCO has no 'door' class, so load door-trained weights for automatic detection
yolo_conf_threshold = 0.4

## Step 5: Landmark Detection Function

In [None]:
from PIL import Image, ImageOps

def detect_landmark(image_path):
    if not os.path.exists(image_path):
        raise FileNotFoundError(f'Image {image_path} not found')
    with Image.open(image_path) as raw:
        # Apply the EXIF Orientation tag (set on iPhone portrait shots) before converting:
        # the RGB copy returned by convert() no longer exposes _getexif()
        image = ImageOps.exif_transpose(raw).convert('RGB')
    
    image_tensor = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        outputs = model(image_tensor)
        probs = torch.softmax(outputs, dim=1)
        confidence, predicted = torch.max(probs, 1)
        predicted_class = idx_to_class[predicted.item()]
    return predicted_class, confidence.item(), image
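iPhone photos are typically stored in sensor orientation with an EXIF `Orientation` tag (tag 0x0112) describing how to display them. Pillow's `ImageOps.exif_transpose` handles all eight orientation values, including the mirrored ones. The tag must be read before `.convert('RGB')`, because the converted copy is a plain `Image` without `_getexif()`. A minimal sketch that builds a synthetic JPEG with orientation 6 (rotate 90° clockwise to display) and checks that width and height swap; `make_jpeg_with_orientation` and `load_upright_rgb` are hypothetical helper names:

```python
from io import BytesIO
from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # standard EXIF Orientation tag id

def make_jpeg_with_orientation(width, height, orientation):
    """Create an in-memory JPEG whose EXIF Orientation tag is set (test helper)."""
    exif = Image.Exif()
    exif[ORIENTATION_TAG] = orientation
    buf = BytesIO()
    Image.new('RGB', (width, height), 'white').save(buf, format='JPEG', exif=exif.tobytes())
    buf.seek(0)
    return buf

def load_upright_rgb(fp):
    """Open an image, apply its EXIF orientation, then convert to RGB."""
    with Image.open(fp) as img:
        return ImageOps.exif_transpose(img).convert('RGB')

upright = load_upright_rgb(make_jpeg_with_orientation(40, 20, orientation=6))
print(upright.size)  # (20, 40): width and height swapped
```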

## Step 6: Door Detection with YOLOv5 and Fallback

In [None]:
def detect_door(image_path, manual_bbox=None):
    img = cv2.imread(image_path)  # BGR at full resolution; OpenCV applies EXIF orientation by default
    if img is None:
        raise ValueError(f'Failed to load image {image_path}')
    
    if manual_bbox:
        # Manual boxes are entered in full-resolution pixel coordinates: no rescaling needed
        x1, y1, x2, y2 = manual_bbox
        if y2 <= y1:
            raise ValueError('Manual bbox must satisfy y2 > y1')
        pixel_height = y2 - y1
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, 'Manual Door', (x1, y1-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        return pixel_height, img, 0.5  # fixed placeholder confidence for manual input
    
    # Detect on a half-resolution copy for speed
    scale_factor = 0.5
    img_resized = cv2.resize(img, None, fx=scale_factor, fy=scale_factor)
    result = yolo_model(img_resized, conf=yolo_conf_threshold, verbose=False)[0]
    
    # Ultralytics returns Results objects: keep the highest-confidence box named 'door'
    best_conf, best_box = 0.0, None
    for box in result.boxes:
        if result.names[int(box.cls)] != 'door':
            continue
        conf = float(box.conf)
        if conf > best_conf:
            best_conf, best_box = conf, box.xyxy[0].tolist()
    
    if best_box is None:
        print('No door detected. Use manual bounding box or try another image.')
        return None, img, 0.0
    
    # Map the box from the resized image back to full resolution
    x1, y1, x2, y2 = [int(v / scale_factor) for v in best_box]
    pixel_height = y2 - y1
    
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(img, f'Door (Conf: {best_conf:.2f})', (x1, y1-10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
    return pixel_height, img, best_conf
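Detection runs on a copy downscaled by `scale_factor`, while the distance formula expects full-resolution pixel heights. Boxes from the resized image must therefore be divided by `scale_factor`, but a manual box entered in full-resolution coordinates must not be. A sketch of that bookkeeping; the helper names are hypothetical:

```python
def to_full_resolution(bbox, scale_factor):
    """Map (x1, y1, x2, y2) from a resized image back to full-resolution pixels."""
    if scale_factor <= 0:
        raise ValueError('scale_factor must be positive')
    return tuple(int(round(v / scale_factor)) for v in bbox)

def pixel_height(bbox):
    """Vertical extent of an (x1, y1, x2, y2) box."""
    _, y1, _, y2 = bbox
    return y2 - y1

detected_half_res = (50.0, 40.0, 100.0, 190.0)  # box as found on a 0.5x image
full_res = to_full_resolution(detected_half_res, 0.5)
manual_full_res = (100, 100, 200, 300)          # already in full-resolution pixels

print(full_res, pixel_height(full_res))  # (100, 80, 200, 380) 300
print(pixel_height(manual_full_res))     # 200
```

Dividing the manual box by `scale_factor` as well would double its height and halve the size-based distance.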

## Step 7: Distance Estimation via Object Size Comparison

In [None]:
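def estimate_distance_size_comparison(pixel_height, real_height=2.0, focal_length=2253):
    # Pinhole model: distance = focal_length_px * real_height_m / pixel_height_px.
    # focal_length must match the resolution pixel_height was measured at (full resolution here).
    if pixel_height is None or pixel_height <= 0:
        return None, 0.0
    distance = (focal_length * real_height) / pixel_height
    if distance > 100:  # reject implausible results beyond 100 m
        return None, 0.0
    confidence = min(1.0, pixel_height / 500)  # heuristic: taller boxes carry smaller relative error
    return float(distance), confidence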
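By similar triangles, a door of real height H at distance D spans h = f·H/D pixels, so D = f·H/h. The estimate inherits errors from both inputs: it scales linearly with the assumed 2 m door height, and because dD/dh = -D/h, a relative error in the box height becomes the same relative error in distance. A worked sketch with illustrative pixel heights:

```python
def size_distance(focal_px, real_height_m, pixel_height_px):
    """Pinhole model: D = f * H / h, with f and h at the same resolution."""
    return focal_px * real_height_m / pixel_height_px

d = size_distance(2253, 2.0, 450)            # door box 450 px tall
d_short = size_distance(2253, 2.0, 440)      # same door, box drawn 10 px too short
d_tall_door = size_distance(2253, 2.1, 450)  # door actually 2.1 m tall

print(f"{d:.2f} m, {d_short:.2f} m, {d_tall_door:.2f} m")  # 10.01 m, 10.24 m, 10.51 m
```

A 10 px error at 450 px (about 2%) moves the estimate by about 23 cm at 10 m.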

## Step 8: Distance Estimation via Triangulation

In [None]:
def estimate_distance_triangulation(image_path1, image_path2, baseline=0.5, focal_length=2253):
    img1 = cv2.imread(image_path1, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(image_path2, cv2.IMREAD_GRAYSCALE)
    if img1 is None or img2 is None:
        raise ValueError('Failed to load one or both images')
    
    # Work at half resolution for speed; focal_length refers to full resolution
    scale_factor = 0.5
    img1 = cv2.resize(img1, None, fx=scale_factor, fy=scale_factor)
    img2 = cv2.resize(img2, None, fx=scale_factor, fy=scale_factor)
    
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        print('Insufficient keypoints detected')
        return None, 0.0
    
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(des1, des2), key=lambda x: x.distance)
    if len(matches) < 20:
        print('Insufficient matches for triangulation')
        return None, 0.0
    
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    
    # RANSAC fundamental matrix; the mask flags epipolar-consistent inliers
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    if F is None or mask is None:
        return None, 0.0
    inliers = mask.ravel() == 1
    pts1 = pts1[inliers]
    pts2 = pts2[inliers]
    if len(pts1) < 10:
        print('Insufficient inlier matches')
        return None, 0.0
    
    h, w = img1.shape
    ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))
    if not ok:
        print('Stereo rectification failed')
        return None, 0.0
    img1_rect = cv2.warpPerspective(img1, H1, (w, h))
    img2_rect = cv2.warpPerspective(img2, H2, (w, h))
    
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(img1_rect, img2_rect)
    # StereoBM returns fixed-point disparities multiplied by 16
    valid_disparities = disparity[disparity > 0] / 16.0
    if len(valid_disparities) == 0:
        return None, 0.0
    # Convert the half-resolution disparity to full-resolution pixels to match focal_length
    disparity_median = np.median(valid_disparities) / scale_factor
    
    # Z = f * B / d. Approximate: uncalibrated rectification does not preserve the
    # metric focal length, and the median covers the whole scene, not only the landmark
    distance = (baseline * focal_length) / disparity_median
    if distance > 100:  # reject implausible results beyond 100 m
        return None, 0.0
    confidence = min(1.0, len(valid_disparities) / 1000)  # heuristic: number of valid disparity pixels
    return float(distance), confidence
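For a rectified pair, depth follows Z = f·B/d, where the focal length f and the disparity d must be measured at the same resolution. The sketch below makes that explicit and also computes the nearest depth a given `numDisparities` setting can resolve, since StereoBM only searches disparities from `minDisparity` to `minDisparity + numDisparities - 1`. It assumes ideal rectification with an unchanged focal length, which uncalibrated homographies only approximate, so treat the numbers as rough bounds; the helper names are hypothetical:

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px_full, scale_factor=1.0):
    """Z = f * B / d, with f rescaled to the resolution the disparity was measured at."""
    if disparity_px <= 0:
        raise ValueError('disparity must be positive')
    focal_px = focal_px_full * scale_factor
    return focal_px * baseline_m / disparity_px

def nearest_resolvable_depth(baseline_m, focal_px_full, scale_factor, num_disparities, min_disparity=0):
    """Depth corresponding to the largest disparity StereoBM searches."""
    max_disparity = min_disparity + num_disparities - 1
    return depth_from_disparity(max_disparity, baseline_m, focal_px_full, scale_factor)

# 20 px disparity on the 0.5x images, 0.5 m baseline, f = 2253 px at full resolution
z = depth_from_disparity(20, 0.5, 2253, scale_factor=0.5)
z_full = depth_from_disparity(40, 0.5, 2253)  # same point measured at full resolution
z_min = nearest_resolvable_depth(0.5, 2253, 0.5, num_disparities=64)
print(f"{z:.2f} m, {z_full:.2f} m, nearest resolvable ~{z_min:.2f} m")
# 28.16 m, 28.16 m, nearest resolvable ~8.94 m
```

With a 0.5 m baseline at half resolution, points closer than roughly 9 m fall outside the search range of `numDisparities=64`; a shorter baseline or a larger `numDisparities` (which must be divisible by 16) brings nearer landmarks into range.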

## Step 9: Triangulation Geometry Visualization

In [None]:
def visualize_triangulation_geometry(baseline, distance, landmark):
    plt.figure(figsize=(8, 4))
    plt.plot([0], [0], 'ro', label='Camera 1')
    plt.plot([baseline], [0], 'bo', label='Camera 2')
    plt.plot([baseline/2], [distance], 'g^', label=f'{landmark}')
    plt.plot([0, baseline/2], [0, distance], 'r--')
    plt.plot([baseline, baseline/2], [0, distance], 'b--')
    plt.xlabel('X (meters)')
    plt.ylabel('Distance (meters)')
    plt.title('Triangulation Geometry')
    plt.legend()
    plt.grid(True)
    plt.axis('equal')
    plt.show()
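The plot places the landmark on the perpendicular bisector of the baseline, which is a simplification. In general, the landmark sits where the two viewing rays intersect. A minimal 2D sketch, assuming each camera reports the bearing angle of the landmark measured counterclockwise from the baseline (the +x axis in the plot); `intersect_bearings` is a hypothetical helper:

```python
import math

def intersect_bearings(baseline, theta1, theta2):
    """Intersect rays from (0, 0) at angle theta1 and from (baseline, 0) at angle theta2.

    Angles are in radians, measured counterclockwise from the +x axis.
    Returns the (x, z) position of the landmark in meters.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product of the ray directions
    if abs(denom) < 1e-12:
        raise ValueError('Rays are parallel; the landmark cannot be triangulated')
    # Solve t1*d1 = (baseline, 0) + t2*d2 for t1 by crossing both sides with d2
    t1 = baseline * d2[1] / denom
    return t1 * d1[0], t1 * d1[1]

# Landmark 10 m ahead of the midpoint of a 0.5 m baseline
theta1 = math.atan2(10.0, 0.25)   # bearing seen from camera 1
theta2 = math.atan2(10.0, -0.25)  # bearing seen from camera 2
x, z = intersect_bearings(0.5, theta1, theta2)
print(f"x = {x:.3f} m, z = {z:.3f} m")  # x = 0.250 m, z = 10.000 m
```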

## Step 10: Visualization and Result Saving

In [None]:
def visualize_and_save_results(image_path1, image_path2, landmark, landmark_conf, 
                              distance_size, size_conf, distance_triang, triang_conf, 
                              img_with_door, baseline):
    img_with_door_rgb = cv2.cvtColor(img_with_door, cv2.COLOR_BGR2RGB)
    # Format optional distances first: a conditional cannot go inside a format spec
    size_text = f'{distance_size:.2f}m' if distance_size is not None else 'N/A'
    triang_text = f'{distance_triang:.2f}m' if distance_triang is not None else 'N/A'
    plt.figure(figsize=(12, 6))
    plt.imshow(img_with_door_rgb)
    plt.title(f"Landmark: {landmark} (Conf: {landmark_conf:.2f})\n"
              f"Distance (Size): {size_text} (Conf: {size_conf:.2f})\n"
              f"Distance (Triang): {triang_text} (Conf: {triang_conf:.2f})", fontsize=12)
    plt.axis('off')
    plt.savefig('result_annotated.jpg', bbox_inches='tight')
    plt.show()
    
    if distance_triang is not None:
        visualize_triangulation_geometry(baseline, distance_triang, landmark)
    
    results = {
        'landmark': landmark,
        'landmark_confidence': landmark_conf,
        'distance_size': distance_size,
        'size_confidence': size_conf,
        'distance_triangulation': distance_triang,
        'triangulation_confidence': triang_conf,
        'image1_path': image_path1,
        'image2_path': image_path2,
        'baseline': baseline
    }
    with open('distance_results.json', 'w') as f:
        json.dump(results, f, indent=4)

## Step 11: Interactive Input Widgets

In [None]:
image1_upload = widgets.FileUpload(accept='.jpg,.png', description='Image 1')
image2_upload = widgets.FileUpload(accept='.jpg,.png', description='Image 2')
baseline_input = widgets.FloatText(value=0.5, description='Baseline (m):')
manual_bbox_check = widgets.Checkbox(value=False, description='Manual Door BBox')
bbox_x1 = widgets.IntText(value=100, description='x1:')
bbox_y1 = widgets.IntText(value=100, description='y1:')
bbox_x2 = widgets.IntText(value=200, description='x2:')
bbox_y2 = widgets.IntText(value=300, description='y2:')
run_button = widgets.Button(description='Run Estimation')
output = widgets.Output()

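def save_uploaded_image(upload_widget, filename):
    value = upload_widget.value
    if not value:
        return None
    # ipywidgets 8 returns a tuple of dicts; ipywidgets 7 returns a dict keyed by file name
    uploaded_file = value[0] if isinstance(value, (tuple, list)) else list(value.values())[0]
    os.makedirs(os.path.dirname(filename) or '.', exist_ok=True)
    with open(filename, 'wb') as f:
        f.write(bytes(uploaded_file['content']))  # content is a memoryview in ipywidgets 8
    return filename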

def on_run_button_clicked(b):
    with output:
        clear_output()
        try:
            image_path1 = save_uploaded_image(image1_upload, 'images/image1.jpg')
            image_path2 = save_uploaded_image(image2_upload, 'images/image2.jpg')
            if not image_path1 or not image_path2:
                print('Please upload both images')
                return
            
            baseline = baseline_input.value
            if baseline <= 0:
                print('Baseline must be positive')
                return
            
            manual_bbox = None
            if manual_bbox_check.value:
                manual_bbox = [bbox_x1.value, bbox_y1.value, bbox_x2.value, bbox_y2.value]
            
            landmark, landmark_conf, pil_image = detect_landmark(image_path1)
            pixel_height, img_with_door, door_conf = detect_door(image_path1, manual_bbox)
            distance_size, size_conf = estimate_distance_size_comparison(pixel_height)
            distance_triang, triang_conf = estimate_distance_triangulation(image_path1, image_path2, baseline)
            
            visualize_and_save_results(image_path1, image_path2, landmark, landmark_conf, 
                                     distance_size, size_conf, distance_triang, triang_conf, 
                                     img_with_door, baseline)
            
            size_text = f'{distance_size:.2f}' if distance_size is not None else 'N/A'
            triang_text = f'{distance_triang:.2f}' if distance_triang is not None else 'N/A'
            print(f"Detected Landmark: {landmark} (Confidence: {landmark_conf:.2f})")
            print(f"Distance (Size): {size_text} meters (Confidence: {size_conf:.2f})")
            print(f"Distance (Triang): {triang_text} meters (Confidence: {triang_conf:.2f})")
        except Exception as e:
            print(f"Error: {str(e)}")

run_button.on_click(on_run_button_clicked)

display(widgets.VBox([
    image1_upload, image2_upload, baseline_input,
    manual_bbox_check, bbox_x1, bbox_y1, bbox_x2, bbox_y2,
    run_button, output
]))

## Notes
- **Model Loading Fix**: Loads the full model directly with `weights_only=False`, with a fallback to `state_dict` loading.
- **torchvision API**: Uses `resnet50(weights=None)`, the torchvision 0.13+ replacement for the deprecated `pretrained=False` argument.
- **iPhone 14 Optimization**: Focal length (2253 pixels) and EXIF handling.
- **Door Detection**: YOLOv5 with manual bbox fallback.
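- **Triangulation**: ORB matches filtered by a RANSAC fundamental matrix, followed by uncalibrated stereo rectification and block matching (`StereoBM`). The result is approximate because the rectifying homographies do not preserve the metric focal length.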
- **Outputs**: JSON file (`distance_results.json`) for Phase 4.
- **Alternative**: If you prefer loading with `weights_only=True`, re-save the model once as a `state_dict` (`torch.save(model.state_dict(), path)`); the fallback branch in Step 3 loads that format.
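Converting once to a `state_dict` lets later runs load with `weights_only=True`, which does not unpickle arbitrary objects. A sketch, assuming the full checkpoint is trusted (it must be unpickled once with `weights_only=False`) and PyTorch 1.13 or later (the first release with the `weights_only` argument); `convert_to_state_dict` is a hypothetical helper:

```python
import torch

def convert_to_state_dict(full_model_path, state_dict_path):
    """Re-save a fully pickled nn.Module checkpoint as a plain state_dict."""
    # weights_only=False unpickles arbitrary objects: use only on trusted files
    model = torch.load(full_model_path, map_location='cpu', weights_only=False)
    torch.save(model.state_dict(), state_dict_path)
    return state_dict_path
```

For example, `convert_to_state_dict('resnet50_multiclass_building_detection_full.pth', 'resnet50_state_dict.pth')`, where the output file name is only illustrative. The resulting file loads into `resnet50(weights=None)` with a replaced `fc` layer, as in the Step 3 fallback.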