# Image2GPS: Predicting Location from Street-Level Images

**A Beginner-Friendly Tutorial on Visual Geolocation with Deep Learning**

---

## What You'll Learn

In this tutorial, we'll explore how deep learning models can predict *where* a photo was taken just by looking at its visual content. This task is called **visual geolocation** or **Image2GPS**.

By the end of this notebook, you'll understand:
1. **The Problem**: Why is predicting GPS from images challenging?
2. **The Model**: How CNNs learn location-relevant features
3. **Preprocessing**: Why image transforms matter
4. **Evaluation**: How we measure geolocation accuracy
5. **Insights**: What visual cues help the model localize?

**Prerequisites**: Basic Python knowledge. Deep learning experience helpful but not required.

**Authors**: Cecilia Chen, Ranty Wang, Xun Wang (CIS 5190, Fall 2025)

---
## Part 1: Understanding the Problem

### The Challenge

Imagine you're shown a street photo and asked: *"Where was this taken?"*

Humans solve this by recognizing landmarks, reading signs, noticing architectural styles, or identifying vegetation patterns. Our goal is to teach a neural network to do the same!

### Why It's Hard

- **Fine-grained differences**: Two locations 50 meters apart may look very similar
- **Appearance variation**: The same spot looks different at dawn vs. dusk, summer vs. winter
- **Ambiguous scenes**: Generic sidewalks or grass could be anywhere

### Our Approach

We frame this as a **regression problem**:
- **Input**: A street-level image (224x224 pixels)
- **Output**: GPS coordinates (latitude, longitude)

Let's visualize what our target region looks like:

In [None]:
# Install required packages (run once)
!pip install -q torch torchvision matplotlib numpy pandas folium geopy pillow

In [None]:
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

In [None]:
# Visualize our target region: Penn Campus (33rd & Walnut to 34th & Spruce)
import folium

# Define the bounding box
bounds = {
    'north': 39.9535,   # Walnut St side (north edge)
    'south': 39.9500,   # Spruce St side (south edge)
    'east': -75.1895,   # East boundary
    'west': -75.1945    # West boundary
}

# Create map centered on Penn
center_lat = (bounds['north'] + bounds['south']) / 2
center_lon = (bounds['east'] + bounds['west']) / 2

m = folium.Map(location=[center_lat, center_lon], zoom_start=17)

# Add bounding rectangle
folium.Rectangle(
    bounds=[[bounds['south'], bounds['west']], [bounds['north'], bounds['east']]],
    color='blue',
    fill=True,
    fill_opacity=0.2,
    popup='Target Region: ~430m x 390m'
).add_to(m)

# Add center marker
folium.Marker(
    [center_lat, center_lon],
    popup='Center of target region',
    icon=folium.Icon(color='red', icon='info-sign')
).add_to(m)

print("[MAP] Our target region on Penn's campus:")
print("   Size: approximately 430m (east-west) x 390m (north-south)")
print(f"   Center: ({center_lat:.4f} N, {abs(center_lon):.4f} W)")
m

---
## Part 2: The Model Architecture

### Why ResNet?

We use **ResNet-50**, a convolutional neural network (CNN) that excels at image recognition. Here's the intuition:

1. **Early layers** detect simple patterns: edges, corners, textures
2. **Middle layers** combine these into shapes: windows, rooflines, tree canopies
3. **Deep layers** recognize complex structures: building facades, walkway layouts
4. **Final layer** maps these features to GPS coordinates

### From Classification to Regression

ResNet was designed for *classification* (e.g., "Is this a cat or dog?"). We modify it for *regression* by:
- Replacing the 1000-class output with 2 values (latitude, longitude)
- Using MSE loss instead of cross-entropy
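
To make the loss concrete, here is a minimal plain-Python sketch (no PyTorch) of what "MSE on normalized coordinates" means for a single image. The constants are the normalization values used by the model class in this notebook; the helper names `normalize_gps` and `mse` and the two sample points are made up for illustration and are not part of our training code.

```python
# Illustrative sketch: z-score the GPS targets, then compute MSE for one sample.
# Constants copied from the IMG2GPS class defined in this notebook.
LAT_MEAN, LAT_STD = 39.9517, 0.00065
LON_MEAN, LON_STD = -75.1915, 0.00063

def normalize_gps(lat, lon):
    """Map degrees to roughly zero-mean, unit-variance regression targets."""
    return ((lat - LAT_MEAN) / LAT_STD, (lon - LON_MEAN) / LON_STD)

def mse(pred, target):
    """Mean squared error over the two output values (lat, lon)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

# Hypothetical example: a true location and a slightly-off prediction
target = normalize_gps(39.9520, -75.1910)
pred = normalize_gps(39.9518, -75.1912)
print(f"target (normalized): ({target[0]:.2f}, {target[1]:.2f})")
print(f"MSE in normalized space: {mse(pred, target):.3f}")
```

Without normalization, the targets would differ only in the fourth decimal place, which makes the loss tiny and the gradients hard to work with.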

Let's build our model:

In [None]:
class IMG2GPS(nn.Module):
    """
    A CNN that predicts GPS coordinates from images.

    Architecture:
        ResNet-50 backbone -> Custom regression head -> (lat, lon)
    """

    def __init__(self, pretrained=True):
        super().__init__()

        # Load ResNet-50 with ImageNet pretrained weights
        # These weights give us a "head start" - the model already knows
        # how to detect edges, textures, and shapes!
        weights = models.ResNet50_Weights.IMAGENET1K_V2 if pretrained else None
        self.backbone = models.resnet50(weights=weights)

        # Get the number of features from ResNet's final layer
        num_features = self.backbone.fc.in_features  # 2048 for ResNet-50

        # Replace classification head with regression head
        self.backbone.fc = nn.Identity()  # Remove original FC layer

        # Custom regression head with regularization
        self.regression_head = nn.Sequential(
            nn.Linear(num_features, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(0.5),  # Prevents overfitting
            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.25),
            nn.Linear(128, 2)  # Output: [latitude, longitude]
        )

        # GPS normalization parameters (computed from training data)
        # We normalize coordinates to have mean=0, std=1 for stable training
        self.lat_mean = 39.9517
        self.lat_std = 0.00065
        self.lon_mean = -75.1915
        self.lon_std = 0.00063

    def forward(self, x):
        """Forward pass - returns normalized coordinates."""
        features = self.backbone(x)
        return self.regression_head(features)

    def predict(self, x):
        """Make predictions in real GPS coordinates (degrees)."""
        self.eval()
        with torch.no_grad():
            normalized = self.forward(x)
            # Convert back to actual GPS coordinates
            lat = normalized[:, 0] * self.lat_std + self.lat_mean
            lon = normalized[:, 1] * self.lon_std + self.lon_mean
            return torch.stack([lat, lon], dim=1)

# Create model instance
model = IMG2GPS(pretrained=True)
print("[OK] Model created!")
print(f"   Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"   Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

### Visualizing the Architecture

Let's see what's inside our model:

In [None]:
# Visualize model structure
def count_parameters(module):
    return sum(p.numel() for p in module.parameters())

print("[INFO] Model Architecture Overview")
print("=" * 50)
print(f"\n>> ResNet-50 Backbone")
print(f"   - Initial conv + batch norm: {count_parameters(model.backbone.conv1) + count_parameters(model.backbone.bn1):,} params")
print(f"   - Layer 1 (256 channels):  {count_parameters(model.backbone.layer1):,} params")
print(f"   - Layer 2 (512 channels):  {count_parameters(model.backbone.layer2):,} params")
print(f"   - Layer 3 (1024 channels): {count_parameters(model.backbone.layer3):,} params")
print(f"   - Layer 4 (2048 channels): {count_parameters(model.backbone.layer4):,} params")
print(f"\n>> Regression Head")
print(f"   - Dense layers: {count_parameters(model.regression_head):,} params")
print(f"\n   Input: Image (3x224x224) -> Output: GPS (lat, lon)")
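
As a sanity check on the regression head's size, the count can be reproduced by hand. This is a rough sketch: `linear_params` and `batchnorm_params` are helper names invented here, and the batch-norm count covers only the learnable scale and shift (the running statistics are buffers, not parameters).

```python
# Sketch: count the trainable parameters of the regression head by hand.
def linear_params(n_in, n_out):
    return n_in * n_out + n_out      # weight matrix + bias vector

def batchnorm_params(n):
    return 2 * n                     # learnable scale (weight) and shift (bias)

head_total = (
    linear_params(2048, 256) + batchnorm_params(256)
    + linear_params(256, 128) + batchnorm_params(128)
    + linear_params(128, 2)
)
print(f"Regression head parameters: {head_total:,}")
```

Almost all of the head's parameters sit in the first linear layer, because it has to consume the 2048-dimensional ResNet-50 feature vector.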

---
## Part 3: Image Preprocessing

### Why Preprocessing Matters

Raw images vary in size, lighting, and orientation. We need to standardize them:

1. **Resize** to 224x224 (ResNet's expected input size)
2. **Normalize** pixel values (ImageNet statistics)
3. **Augment** training images for robustness
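
To see what the normalization step does to one pixel, here is a small plain-Python sketch. It mimics `ToTensor` (scale to [0, 1]) followed by `Normalize` with the ImageNet statistics; `normalize_pixel` is a name made up for this illustration, and the sky-blue color is just an example value.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

sky = (135, 206, 235)   # an example sky-blue pixel
print([round(v, 3) for v in normalize_pixel(sky)])
```

The pretrained backbone saw inputs standardized this way, so we reuse the same statistics instead of computing our own.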

### Our Key Finding: Grayscale Helps!

Through experiments, we discovered that **randomly converting images to grayscale** (10% of the time during training) reduces test RMSE by ~1.8% (31.58 m to 31.02 m in our ablation)!

**Why?** Color can be misleading:
- The same building looks different at sunrise vs. sunset
- Seasonal changes alter foliage colors
- Weather affects color perception

By occasionally removing color, we force the model to focus on **geometry and structure** - features that are more stable for localization.
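
Here is a rough per-pixel sketch of the idea, in plain Python. It assumes the ITU-R BT.601 luma weights (the ones Pillow documents for `convert('L')`); torchvision's exact constants may differ in the last digits. The helpers `to_gray` and `random_grayscale` and the facade color are hypothetical, written only for this illustration.

```python
import random

def to_gray(rgb):
    """Collapse RGB to one luma value, replicated over 3 channels."""
    r, g, b = rgb
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # ITU-R BT.601 weights (assumed)
    return (luma, luma, luma)

def random_grayscale(rgb, p=0.1, rng=random):
    """With probability p return the gray version, otherwise the pixel unchanged."""
    return to_gray(rgb) if rng.random() < p else rgb

rng = random.Random(0)
facade = (160, 140, 120)   # hypothetical tan facade color
pixels = [random_grayscale(facade, p=0.1, rng=rng) for _ in range(10_000)]
gray_share = sum(px != facade for px in pixels) / len(pixels)
print(f"gray value: {to_gray(facade)[0]:.1f}, share converted: {gray_share:.3f}")
```

The output keeps three channels, so the network's input shape never changes; only the color information disappears for the affected images.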

In [None]:
# Define our preprocessing pipelines

# Training: With augmentation
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # Random crop for variety
    transforms.RandomHorizontalFlip(p=0.5),              # Flip left-right
    transforms.RandomRotation(degrees=15),               # Small rotations
    transforms.ColorJitter(                              # Lighting variation
        brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1
    ),
    transforms.RandomGrayscale(p=0.1),                   # << Key augmentation!
    transforms.ToTensor(),
    transforms.Normalize(                                # ImageNet statistics
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# Inference: No augmentation (deterministic)
inference_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

print("[OK] Preprocessing pipelines defined!")

In [None]:
# Visualize augmentation effects
# We'll create a sample image to demonstrate

# Create a synthetic "street scene" for demonstration
np.random.seed(42)
sample_img = np.zeros((300, 400, 3), dtype=np.uint8)

# Sky (blue gradient)
for i in range(100):
    sample_img[i, :] = [135 + i//2, 206, 235]

# Building (brown/tan)
sample_img[100:250, 50:150] = [139, 119, 101]
sample_img[100:250, 250:350] = [160, 140, 120]

# Windows
for y in range(120, 240, 40):
    for x in [70, 110, 270, 310]:
        sample_img[y:y+25, x:x+20] = [200, 220, 255]

# Ground (gray sidewalk)
sample_img[250:, :] = [128, 128, 128]

# Trees (green)
sample_img[150:250, 175:225] = [34, 139, 34]

sample_pil = Image.fromarray(sample_img)

# Show original and augmented versions
fig, axes = plt.subplots(2, 4, figsize=(14, 7))
fig.suptitle('Augmentation Effects on a Street Scene', fontsize=14)

# Original
axes[0, 0].imshow(sample_img)
axes[0, 0].set_title('Original')
axes[0, 0].axis('off')

# Various augmentations
aug_names = ['Random Crop', 'Horizontal Flip', 'Color Jitter',
             'Grayscale', 'Rotation', 'Combined Aug', 'Final Input']

torch.manual_seed(42)
augmentations = [
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5),
    transforms.Grayscale(num_output_channels=3),
    transforms.RandomRotation(degrees=20),
    transforms.Compose([transforms.RandomResizedCrop(224),
                        transforms.ColorJitter(brightness=0.3)]),
    train_transform,
]

for idx, (name, aug) in enumerate(zip(aug_names, augmentations)):
    row, col = (idx + 1) // 4, (idx + 1) % 4

    try:
        augmented = aug(sample_pil)
        if isinstance(augmented, torch.Tensor):
            # Denormalize for visualization
            img_show = augmented.permute(1, 2, 0).numpy()
            img_show = img_show * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406])
            img_show = np.clip(img_show, 0, 1)
        else:
            img_show = np.array(augmented)

        axes[row, col].imshow(img_show)
        axes[row, col].set_title(name)
    except Exception:  # keep the grid intact if a single transform fails
        axes[row, col].text(0.5, 0.5, 'N/A', ha='center', va='center')

    axes[row, col].axis('off')

plt.tight_layout()
plt.show()

print("\n[INSIGHT] Key insight: Grayscale removes color but preserves structure!")
print("   This helps the model focus on building geometry and spatial layout.")

---
## Part 4: Evaluation Metrics

### The Haversine Distance

We can't just use Euclidean distance on GPS coordinates: latitude and longitude are angles on a (nearly) spherical Earth, not positions on a flat plane. The **Haversine formula** calculates the great-circle distance between two points:

$$d = 2R \cdot \arcsin\left(\sqrt{\sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)}\right)$$

Where:
- $R$ = Earth's radius (6,371 km)
- $\phi$ = latitude in radians
- $\lambda$ = longitude in radians

### Why Not Just MSE?

Mean Squared Error on raw coordinates has issues:
1. A 0.001 degree error in latitude is not the same physical distance as a 0.001 degree error in longitude (about 111 m vs. 85 m at Penn's latitude)
2. Doesn't give intuitive "meters" interpretation

Haversine gives us the **actual physical distance** in meters - much more meaningful!
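
The latitude/longitude mismatch is easy to check numerically. The sketch below assumes a perfectly spherical Earth with the same radius as the Haversine formula above; `meters_per_degree` is a helper name invented for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000

def meters_per_degree(lat_deg):
    """Approximate ground distance covered by one degree of latitude and of longitude."""
    per_deg_lat = math.pi * EARTH_RADIUS_M / 180                 # same everywhere on a sphere
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))  # shrinks toward the poles
    return per_deg_lat, per_deg_lon

lat_m, lon_m = meters_per_degree(39.9517)   # center of the Penn target region
print(f"0.001 deg latitude  ~ {0.001 * lat_m:.1f} m")
print(f"0.001 deg longitude ~ {0.001 * lon_m:.1f} m")
```

So an MSE on raw degrees would quietly weight north-south mistakes differently from east-west ones.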

In [None]:
import math

def haversine_distance(pred_lat, pred_lon, true_lat, true_lon):
    """
    Calculate the great-circle distance between two GPS coordinates.

    Args:
        pred_lat, pred_lon: Predicted coordinates (degrees)
        true_lat, true_lon: Ground truth coordinates (degrees)

    Returns:
        Distance in meters
    """
    R = 6_371_000  # Earth's radius in meters

    # Convert to radians
    lat1, lon1 = math.radians(pred_lat), math.radians(pred_lon)
    lat2, lon2 = math.radians(true_lat), math.radians(true_lon)

    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))

    return R * c


def calculate_rmse(predictions, ground_truth):
    """
    Calculate RMSE of Haversine distances.

    Args:
        predictions: Array of shape (N, 2) with [lat, lon]
        ground_truth: Array of shape (N, 2) with [lat, lon]

    Returns:
        (rmse, distances): RMSE in meters and the array of per-sample distances in meters
    """
    distances = []
    for pred, true in zip(predictions, ground_truth):
        d = haversine_distance(pred[0], pred[1], true[0], true[1])
        distances.append(d)

    distances = np.array(distances)
    rmse = np.sqrt(np.mean(distances**2))
    return rmse, distances


# Demonstrate with examples
print("[DEMO] Haversine Distance Examples\n")

# Penn campus center
center = (39.9517, -75.1915)

examples = [
    ((39.9517, -75.1915), "Same point"),
    ((39.9518, -75.1915), "~11m north"),
    ((39.9517, -75.1914), "~8m east"),
    ((39.9520, -75.1910), "~50m away"),
    ((39.9530, -75.1900), "~200m away"),
]

for (lat, lon), desc in examples:
    d = haversine_distance(lat, lon, center[0], center[1])
    print(f"  {desc:20s} -> {d:7.2f} meters")

print("\n[NOTE] 0.0001 degrees of latitude is about 11 meters; 0.0001 degrees of longitude is about 8.5 meters at this latitude.")
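
RMSE is only one way to summarize per-image distances, and it is sensitive to large misses. The sketch below compares it with the mean, the median, and the share of images within a cutoff. The distances are made up, `summarize_errors` is a hypothetical helper, and the 25 m threshold is an arbitrary example rather than a metric we report.

```python
import math
import statistics

def summarize_errors(distances_m, threshold_m=25.0):
    """Summarize per-image errors in meters; threshold_m is an example cutoff."""
    rmse = math.sqrt(sum(d * d for d in distances_m) / len(distances_m))
    return {
        "rmse": rmse,
        "mean": statistics.fmean(distances_m),
        "median": statistics.median(distances_m),
        "within_threshold": sum(d <= threshold_m for d in distances_m) / len(distances_m),
    }

# Made-up distances: mostly small errors plus one large miss
errors = [5.0, 8.0, 10.0, 12.0, 90.0]
summary = summarize_errors(errors)
for name, value in summary.items():
    print(f"{name:>17s}: {value:.2f}")
```

RMSE is never smaller than the mean error, so a single bad prediction can pull it well above what a "typical" image looks like.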

---
## Part 5: What Does the Model Learn?

Let's peek inside the CNN to understand what visual features it uses for localization.

### Feature Visualization

We'll extract intermediate feature maps to see what the model "sees" at different layers.
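
Before looking at the maps, it helps to know what shapes to expect. This sketch tracks the spatial size through the standard ResNet-50 layout (stride-2 stem convolution, stride-2 max pool, then four stages). The function name `resnet50_stage_shapes` is made up for this tutorial, and the integer division is exact only for input sizes divisible by 32, such as 224.

```python
def resnet50_stage_shapes(input_size=224):
    """Return (name, channels, height, width) for the four residual stages."""
    size = input_size // 2      # 7x7 stem convolution, stride 2
    size = size // 2            # 3x3 max pool, stride 2
    shapes = []
    for name, channels, stride in [
        ("layer1", 256, 1),
        ("layer2", 512, 2),
        ("layer3", 1024, 2),
        ("layer4", 2048, 2),
    ]:
        size = size // stride   # each later stage halves the resolution
        shapes.append((name, channels, size, size))
    return shapes

for name, c, h, w in resnet50_stage_shapes():
    print(f"{name}: {c} channels, {h}x{w}")
```

Deeper stages trade spatial resolution for more channels, which is why the later feature maps look coarse when plotted.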

In [None]:
# Feature map visualization
class FeatureExtractor(nn.Module):
    """Extract intermediate feature maps from ResNet."""

    def __init__(self, model):
        super().__init__()
        self.backbone = model.backbone
        self.features = {}

        # Register hooks to capture intermediate outputs
        self.backbone.layer1.register_forward_hook(self._get_hook('layer1'))
        self.backbone.layer2.register_forward_hook(self._get_hook('layer2'))
        self.backbone.layer3.register_forward_hook(self._get_hook('layer3'))
        self.backbone.layer4.register_forward_hook(self._get_hook('layer4'))

    def _get_hook(self, name):
        def hook(module, input, output):
            self.features[name] = output.detach()
        return hook

    def forward(self, x):
        _ = self.backbone(x)
        return self.features

# Create feature extractor
feature_extractor = FeatureExtractor(model)
feature_extractor.eval()

# Process our sample image
sample_tensor = inference_transform(sample_pil).unsqueeze(0)

with torch.no_grad():
    features = feature_extractor(sample_tensor)

# Visualize feature maps
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('What the CNN "Sees" at Different Layers', fontsize=14)

# Original image
axes[0, 0].imshow(sample_img)
axes[0, 0].set_title('Input Image')
axes[0, 0].axis('off')

# Feature maps from each layer
layer_info = [
    ('layer1', 'Layer 1: Edges & Textures'),
    ('layer2', 'Layer 2: Patterns & Shapes'),
    ('layer3', 'Layer 3: Object Parts'),
    ('layer4', 'Layer 4: High-Level Features'),
]

for idx, (layer_name, title) in enumerate(layer_info):
    feat = features[layer_name][0]  # Get first (only) batch item

    # Average across channels for visualization
    feat_avg = feat.mean(dim=0).cpu().numpy()

    row = (idx + 1) // 4
    col = (idx + 1) % 4

    axes[row, col].imshow(feat_avg, cmap='viridis')
    axes[row, col].set_title(f"{title}\n({feat.shape[0]} channels, {feat.shape[1]}x{feat.shape[2]})")
    axes[row, col].axis('off')

# Show individual channels from layer 1
axes[1, 1].imshow(features['layer1'][0, 0].cpu().numpy(), cmap='gray')
axes[1, 1].set_title('Layer 1, Channel 0\n(Edge detection)')
axes[1, 1].axis('off')

axes[1, 2].imshow(features['layer1'][0, 32].cpu().numpy(), cmap='gray')
axes[1, 2].set_title('Layer 1, Channel 32\n(Texture detection)')
axes[1, 2].axis('off')

axes[1, 3].imshow(features['layer2'][0, :16].mean(dim=0).cpu().numpy(), cmap='plasma')
axes[1, 3].set_title('Layer 2, Avg Channels 0-15\n(Building structures)')
axes[1, 3].axis('off')

plt.tight_layout()
plt.show()

print("\n[INTERPRETATION]")
print("   - Early layers detect edges (building outlines, windows)")
print("   - Middle layers recognize shapes (rooflines, sidewalk patterns)")
print("   - Deep layers capture high-level structure (overall scene layout)")
print("   - The model combines all these to predict location!")

---
## Part 6: Our Results

### Performance Summary

After training on 936 images from Penn's campus, our model achieves:

In [None]:
# Our experimental results
results = {
    'Baseline (ResNet-18, scratch)': 69.76,
    'Baseline (ResNet-18, pretrained)': 48.52,
    'Improved (ResNet-18, grayscale)': 33.01,
    'Improved (ResNet-50, grayscale)': 32.53,
    'Final (ResNet-50, 50 epochs)': 21.92,
}

# Visualize improvement
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
colors = ['#ff6b6b', '#ffa94d', '#69db7c', '#4dabf7', '#9775fa']
bars = ax1.barh(list(results.keys()), list(results.values()), color=colors)
ax1.set_xlabel('Test RMSE (meters) - Lower is better')
ax1.set_title('Model Performance Comparison')
ax1.axvline(x=50, color='green', linestyle='--', alpha=0.7, label='Excellent (<50m)')
ax1.axvline(x=88, color='red', linestyle='--', alpha=0.7, label='Baseline requirement')
ax1.legend()

# Add value labels
for bar, val in zip(bars, results.values()):
    ax1.text(val + 1, bar.get_y() + bar.get_height()/2, f'{val:.1f}m',
             va='center', fontsize=10)

# Training progress (milestone comparison)
epochs = [10, 20, 30, 40, 50]
no_gray = [35.33, 29.14, 24.12, 22.64, 22.61]
with_gray = [36.38, 26.27, 23.67, 23.30, 21.92]

ax2.plot(epochs, no_gray, 'o-', label='Without grayscale', linewidth=2, markersize=8)
ax2.plot(epochs, with_gray, 's-', label='With grayscale', linewidth=2, markersize=8)
ax2.fill_between(epochs, no_gray, with_gray, alpha=0.2,
                  where=[ng > wg for ng, wg in zip(no_gray, with_gray)], color='green')
ax2.set_xlabel('Training Epochs')
ax2.set_ylabel('Test RMSE (meters)')
ax2.set_title('Effect of Grayscale Augmentation Over Training')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n[KEY FINDINGS]")
print("   * Pretrained weights cut RMSE by 30% (69.76m -> 48.52m)")
print("   * Grayscale augmentation was better at 3 of 5 checkpoints, including the final one (21.92m vs 22.61m)")
print("   * Final model: 21.92m RMSE - about 69% lower than the from-scratch baseline (69.76m)")
print("   * ~22 meters is a root-mean-square error, so the average miss is no larger than that")
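
The percentages quoted for these results depend on which reference point is used, so it is worth computing them explicitly. A small sketch, using the RMSE values from the results above and the 88 m "baseline requirement" line from the plot; `percent_reduction` is a helper name introduced here.

```python
def percent_reduction(before, after):
    """Relative drop in RMSE, in percent (positive means improvement)."""
    return 100.0 * (before - after) / before

scratch, pretrained, final = 69.76, 48.52, 21.92   # RMSEs from the results above
requirement = 88.0                                  # 'Baseline requirement' line in the plot

print(f"pretrained vs scratch: {percent_reduction(scratch, pretrained):.1f}%")
print(f"final vs scratch:      {percent_reduction(scratch, final):.1f}%")
print(f"final vs requirement:  {percent_reduction(requirement, final):.1f}%")
```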

### Ablation Study: What Helps vs. What Hurts?

We systematically tested different augmentation strategies:

In [None]:
# Ablation results
ablation = {
    'Base only': (31.58, 0),
    'Base + Grayscale': (31.02, -1.8),
    'Base + Blur': (37.94, 20.1),
    'Base + Erasing': (33.35, 5.6),
    'Base + Gray + Blur': (33.09, 4.8),
    'Full augmentation': (34.19, 8.2),
}

fig, ax = plt.subplots(figsize=(10, 6))

configs = list(ablation.keys())
rmses = [v[0] for v in ablation.values()]
changes = [v[1] for v in ablation.values()]

colors = ['green' if c <= 0 else 'red' for c in changes]
bars = ax.bar(configs, rmses, color=colors, alpha=0.7, edgecolor='black')

# Add change annotations
for bar, change in zip(bars, changes):
    height = bar.get_height()
    if change != 0:
        sign = '+' if change > 0 else ''
        ax.text(bar.get_x() + bar.get_width()/2, height + 0.3,
                f'{sign}{change}%', ha='center', fontsize=9,
                color='red' if change > 0 else 'green', fontweight='bold')

ax.axhline(y=31.58, color='gray', linestyle='--', alpha=0.5, label='Baseline')
ax.legend()  # without this call the 'Baseline' label is never drawn
ax.set_ylabel('Test RMSE (meters)')
ax.set_title('Ablation Study: Effect of Different Augmentations')
ax.set_ylim(28, 40)
plt.xticks(rotation=20, ha='right')
plt.tight_layout()
plt.show()

print("\n[TAKEAWAYS]")
print("   [+] Grayscale: Helps! Forces model to use geometry over color")
print("   [-] Blur: Hurts! Removes fine details needed for localization")
print("   [-] Erasing: Hurts! Occludes important spatial cues")
print("   [-] Combining aggressive augmentations makes things worse")

---
## Part 7: Try It Yourself!

Upload your own image and see where the model thinks it was taken.

**Note**: Our model is trained only on images from Penn's campus (33rd & Walnut to 34th & Spruce), so its predictions will land in or near that region no matter where your photo was actually taken.
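
Because the model is a regressor, nothing forces its output to stay inside the training box. One simple safeguard is to check a coordinate against the bounding box from Part 1. The helper `in_target_region` below is a hypothetical addition for illustration, not something our pipeline calls.

```python
# Bounding box copied from the map cell in Part 1
BOUNDS = {"north": 39.9535, "south": 39.9500, "east": -75.1895, "west": -75.1945}

def in_target_region(lat, lon, bounds=BOUNDS):
    """True if (lat, lon) lies inside the training bounding box."""
    return (bounds["south"] <= lat <= bounds["north"]
            and bounds["west"] <= lon <= bounds["east"])

print(in_target_region(39.9517, -75.1915))   # campus center
print(in_target_region(40.7580, -73.9855))   # a point in Midtown Manhattan
```

A prediction that falls outside the box is a hint that the input photo looks unlike anything in the training set.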

In [None]:
# Interactive prediction function
def predict_location(image_path=None, show_map=True):
    """
    Predict the GPS location of an image.

    Args:
        image_path: Path to image file, or None to use sample
        show_map: Whether to display location on map
    """
    # Use sample image if none provided
    if image_path is None:
        print("Using sample image (no image provided)")
        img = sample_pil
    else:
        img = Image.open(image_path).convert('RGB')

    # Preprocess
    img_tensor = inference_transform(img).unsqueeze(0)

    # Predict (note: the backbone is ImageNet-pretrained but the regression head is untrained, so this is just a demo)
    model.eval()
    with torch.no_grad():
        coords = model.predict(img_tensor)
        pred_lat = coords[0, 0].item()
        pred_lon = coords[0, 1].item()

    print(f"\n[PREDICTION] Predicted Location:")
    print(f"   Latitude:  {pred_lat:.6f} N")
    print(f"   Longitude: {abs(pred_lon):.6f} W")

    # Show on map
    if show_map:
        m = folium.Map(location=[pred_lat, pred_lon], zoom_start=18)
        folium.Marker(
            [pred_lat, pred_lon],
            popup=f'Predicted: ({pred_lat:.5f}, {pred_lon:.5f})',
            icon=folium.Icon(color='red', icon='camera')
        ).add_to(m)

        # Add accuracy circle (typical error ~22m)
        folium.Circle(
            [pred_lat, pred_lon],
            radius=22,  # meters
            color='blue',
            fill=True,
            fill_opacity=0.2,
            popup='~22m typical error'
        ).add_to(m)

        return m

# Demo with our sample
print("[DEMO] Predicting location from sample image")
print("   (Note: the regression head is untrained - just demonstrating the pipeline)")
predict_location()

### Upload Your Own Image

In Google Colab, you can upload an image using the file browser on the left, then run:

In [None]:
# Uncomment and modify to use your own image:
# predict_location('/content/your_image.jpg')

# Or use Colab's file upload:
# from google.colab import files
# uploaded = files.upload()
# for filename in uploaded.keys():
#     predict_location(filename)

---
## Summary and Key Takeaways

### What We Learned

1. **Visual geolocation is hard** - small physical distances can look very similar in images

2. **Transfer learning helps** - pretrained ImageNet weights cut test RMSE by 30%

3. **Not all augmentation is good** - grayscale helps, but blur and erasing hurt localization

4. **CNNs learn meaningful features** - early layers detect edges, deep layers capture scene structure

5. **Haversine distance is essential** - use proper spherical geometry for GPS coordinates!

### Performance Achieved

| Metric | Value |
|--------|-------|
| Final RMSE | **21.92 meters** |
| Improvement over the 88 m baseline requirement | **75.1%** |
| Best augmentation | Grayscale (10% probability) |

### Resources

- [Our Hugging Face Dataset](https://huggingface.co/datasets/rantyw/image2gps)
- [ResNet Paper](https://arxiv.org/abs/1512.03385)
- [Haversine Formula](https://en.wikipedia.org/wiki/Haversine_formula)

---

*Tutorial created for CIS 5190 Applied Machine Learning, Fall 2025*

*Authors: Cecilia Chen, Ranty Wang, Xun Wang*