<a href="https://colab.research.google.com/github/A00785001/TC5035/blob/main/002_DL_Feature_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature Extraction Notebook - Camera & LiDAR

## Overview
This notebook extracts deep features from preprocessed camera images and LiDAR scans for sensor fusion and loop closure detection. It implements two parallel feature extraction branches:

1. **Visual Branch**: MobileNet V2 → 1280D features
2. **Geometric Branch**: 1D CNN → 256D features

Features are saved in HDF5 format for efficient storage and retrieval.

## Prerequisites
- Preprocessed images from camera notebook (`processed_images/`)
- Preprocessed LiDAR scans from LiDAR notebook (`processed_lidar/`)
- Both datasets with aligned timestamps in metadata

## Pipeline Architecture
```
Camera Images (224×224) → MobileNet V2 → 1280D → L2 Norm → Visual Features
LiDAR Scans (360,)      → 1D CNN      → 256D  → L2 Norm → Geometric Features
                                                ↓
                                      HDF5 Feature Database
```

## Installation & Setup

In [1]:
# Install required packages
!pip install --quiet --upgrade torch torchvision
!pip install --quiet h5py pandas numpy matplotlib tqdm pillow

print("✓ All packages installed successfully!")


In [2]:
# Import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms
import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm
import json
import os
from datetime import datetime

# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

print("Libraries loaded successfully!")

Using device: cpu
Libraries loaded successfully!


## Section 1: Camera Feature Extraction (Visual Branch)

### Purpose
Extract high-level visual features from camera images using MobileNet V2, a lightweight CNN pretrained on ImageNet. These features capture semantic and appearance information crucial for place recognition.

### Architecture: MobileNet V2
- **Input**: 224×224×3 RGB images
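- **Preprocessing**: Scale pixels from [0,255] to [0,1], then standardize each channel with the ImageNet mean and standard deviation (resulting values fall roughly in [-2.1, 2.6])
- **Backbone**: MobileNet V2 (pretrained on ImageNet)
- **Feature Layer**: Output of the `features` backbone after global average pooling, before the classification layer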
- **Raw Output**: 1280D feature vector
- **Post-processing**: L2 normalization
- **Final Output**: 1280D normalized feature vector

### What This Section Does
1. Load preprocessed images from `processed_images/`
2. Apply MobileNet V2 preprocessing (ImageNet mean/std normalization)
3. Extract 1280D features using pretrained MobileNet V2
4. L2 normalize features for similarity comparison
5. Save to HDF5 with timestamps and metadata

### Why MobileNet V2?
- Lightweight: ~3.5M parameters
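- Fast inference: typically tens of milliseconds per image on a desktop-class CPU (hardware dependent; the CPU run in this notebook took about 30 s for 100 images, image loading included)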
- Robust features proven for place recognition
- Pretrained on ImageNet (1000 classes)
- Efficient for embedded systems (Jetbot)

### Output Format
Features saved to HDF5: `features.h5/camera/features` [N, 1280]

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [4]:
session = '20251016_133216'

In [5]:
print(f"Using session: {session}")

# Base data folder on the mounted Drive; each recording session lives in its own subfolder
data_path = "/content/drive/MyDrive/DATA/Artificial_Intelligence/MNA-V/Subjects/TC5035-Proyecto_Integrador/TC5035.data/jetbot"
working_folder = data_path + '/session_' + session
bag_name = 'session_data.bag'


# Change to the specified subfolder
os.chdir(working_folder)
print(f"Changed directory to: {os.getcwd()}")

Using session: 20251016_133216
Changed directory to: /content/drive/MyDrive/DATA/Artificial_Intelligence/MNA-V/Subjects/TC5035-Proyecto_Integrador/TC5035.data/jetbot/session_20251016_133216


In [6]:
# Configuration for camera feature extraction
CAMERA_INPUT_DIR = "processed_images"
CAMERA_BATCH_SIZE = 32  # Adjust based on GPU memory

print(f"Camera input directory: {CAMERA_INPUT_DIR}")
print(f"Batch size: {CAMERA_BATCH_SIZE}")

Camera input directory: processed_images
Batch size: 32


In [7]:
# Load camera metadata
camera_metadata = pd.read_csv(os.path.join(CAMERA_INPUT_DIR, 'metadata.csv'))

print(f"Found {len(camera_metadata)} camera images")
print(f"\nFirst few entries:")
camera_metadata.head()

Found 100 camera images

First few entries:


Unnamed: 0,filename,timestamp,timestamp_sec,timestamp_nsec,frame_id,original_width,original_height,file_size_kb
0,img_00000.jpg,1760650000.0,1760649872,670898199,0,640,480,20.54
1,img_00001.jpg,1760650000.0,1760649872,671323537,1,640,480,20.52
2,img_00002.jpg,1760650000.0,1760649872,684802532,2,640,480,20.54
3,img_00003.jpg,1760650000.0,1760649872,685315847,3,640,480,20.54
4,img_00004.jpg,1760650000.0,1760649872,685797214,4,640,480,20.57
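
In the preview above, the `timestamp` column prints the same value for every row, while `timestamp_sec` and `timestamp_nsec` differ. Whether this is only display rounding or rounding in the stored CSV cannot be told from this output. If the stored values turn out to be rounded, a minimal sketch like the following could rebuild precise timestamps from the two integer columns. The arrays are hand-typed copies of the first three rows shown above, and `timestamp_ns` and `t_rel_sec` are hypothetical names, not part of the notebook.

```python
import numpy as np

# Hand-typed copies of timestamp_sec / timestamp_nsec for the first three rows
sec = np.array([1760649872, 1760649872, 1760649872], dtype=np.int64)
nsec = np.array([670898199, 671323537, 684802532], dtype=np.int64)

# Integer nanoseconds keep full precision (no float rounding)
timestamp_ns = sec * 1_000_000_000 + nsec

# Seconds relative to the first frame are small numbers, safe to store as float
t_rel_sec = (timestamp_ns - timestamp_ns[0]) / 1e9

print(timestamp_ns[1] - timestamp_ns[0])  # 425338 (nanoseconds between rows 0 and 1)
print(t_rel_sec)
```

Working in integer nanoseconds and converting to relative seconds at the end sidesteps the limited resolution of a float holding a ten-digit epoch time.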


In [8]:
# Load MobileNet V2 (pretrained on ImageNet)
print("Loading MobileNet V2...")

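# Load pretrained model. `weights=` replaces the deprecated `pretrained=True`
# (torchvision >= 0.13); IMAGENET1K_V1 is the checkpoint downloaded below.
mobilenet = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)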

# Remove the final classification layer to get features
# MobileNet V2 structure: features → classifier
# We want the output of 'features' (before classifier)
feature_extractor = nn.Sequential(
    mobilenet.features,
    nn.AdaptiveAvgPool2d((1, 1)),  # Global average pooling
    nn.Flatten()
)

# Move to GPU if available
feature_extractor = feature_extractor.to(device)
feature_extractor.eval()  # Set to evaluation mode

print(f"✓ MobileNet V2 loaded on {device}")
print(f"Feature dimension: 1280D")

Loading MobileNet V2...
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth


100%|██████████| 13.6M/13.6M [00:00<00:00, 90.6MB/s]


✓ MobileNet V2 loaded on cpu
Feature dimension: 1280D
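
For a 224×224 input, the MobileNet V2 `features` backbone produces a 1280-channel map at 7×7 spatial resolution, and `AdaptiveAvgPool2d((1, 1))` followed by `Flatten` reduces each channel to a single value. The reduction itself can be sketched in NumPy on random stand-in data (these are not real network activations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the backbone output: (batch, channels, height, width)
feature_map = rng.random((2, 1280, 7, 7)).astype(np.float32)

# Global average pooling: average over the two spatial axes
pooled = feature_map.mean(axis=(2, 3))

print(pooled.shape)  # (2, 1280)
```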


In [9]:
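# Define MobileNet V2 preprocessing
# Input: [0, 255] RGB → Output: per-channel standardized tensor (roughly [-2.1, 2.6])
# Images are assumed to be already resized to 224×224 by the camera preprocessing notebook
mobilenet_transform = transforms.Compose([
    transforms.ToTensor(),  # [0, 255] → [0, 1], HWC → CHW
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet channel means
                         std=[0.229, 0.224, 0.225])   # ImageNet channel stds
])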

print("MobileNet V2 preprocessing pipeline ready")

MobileNet V2 preprocessing pipeline ready
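
The value range produced by the `Normalize` step in the cell above can be worked out directly from the mean and standard deviation it uses. A short arithmetic check, evaluating the extreme inputs (a fully black and a fully white pixel):

```python
import numpy as np

# Same statistics as in the transform above
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

lowest = (0.0 - mean) / std    # black pixel, per channel
highest = (1.0 - mean) / std   # white pixel, per channel

print(lowest.round(3))   # approximately -2.118, -2.036, -1.804
print(highest.round(3))  # approximately  2.249,  2.429,  2.640
```

The output is therefore centered near zero but spans roughly [-2.1, 2.6], not [-1, 1].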


In [10]:
# Extract camera features
print("Extracting camera features...")

camera_features_list = []
camera_timestamps_list = []
camera_filenames_list = []

with torch.no_grad():  # Disable gradient computation
    for idx in tqdm(range(0, len(camera_metadata), CAMERA_BATCH_SIZE), desc="Camera batches"):
        batch_meta = camera_metadata.iloc[idx:idx+CAMERA_BATCH_SIZE]

        # Load batch of images
        batch_images = []
        batch_timestamps = []
        batch_filenames = []

        for _, row in batch_meta.iterrows():
            img_path = os.path.join(CAMERA_INPUT_DIR, row['filename'])

            try:
                # Load and preprocess image
                img = Image.open(img_path).convert('RGB')
                img_tensor = mobilenet_transform(img)
                batch_images.append(img_tensor)
                batch_timestamps.append(row['timestamp'])
                batch_filenames.append(row['filename'])
            except Exception as e:
                print(f"Error loading {img_path}: {e}")
                continue

        if len(batch_images) == 0:
            continue

        # Stack into batch tensor
        batch_tensor = torch.stack(batch_images).to(device)

        # Extract features
        features = feature_extractor(batch_tensor)

        # L2 normalize features
        features = F.normalize(features, p=2, dim=1)

        # Move to CPU and store
        camera_features_list.append(features.cpu().numpy())
        camera_timestamps_list.extend(batch_timestamps)
        camera_filenames_list.extend(batch_filenames)

# Concatenate all batches
camera_features = np.vstack(camera_features_list)
camera_timestamps = np.array(camera_timestamps_list)
camera_filenames = np.array(camera_filenames_list)

print(f"\n✓ Extracted camera features: {camera_features.shape}")
print(f"Feature dimension: {camera_features.shape[1]}D")
print(f"Number of images: {camera_features.shape[0]}")

Extracting camera features...


Camera batches: 100%|██████████| 4/4 [00:30<00:00,  7.57s/it]


✓ Extracted camera features: (100, 1280)
Feature dimension: 1280D
Number of images: 100
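
The progress bar above reports 4 batches for 100 images. The slicing pattern used in the loop (`range(0, N, batch_size)` with `iloc[idx:idx+batch_size]`) yields a shorter final batch when `N` is not a multiple of the batch size, as this small stand-alone check illustrates:

```python
n_images = 100
batch_size = 32

# Size of each slice produced by range(0, n_images, batch_size)
batch_sizes = [min(batch_size, n_images - start)
               for start in range(0, n_images, batch_size)]

print(batch_sizes)  # [32, 32, 32, 4]
```

In the notebook a batch can be smaller still, because images that fail to load are skipped.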





In [11]:
# Verify feature properties
print("Camera Feature Statistics:")
print(f"  Shape: {camera_features.shape}")
print(f"  Mean: {camera_features.mean():.4f}")
print(f"  Std: {camera_features.std():.4f}")
print(f"  Min: {camera_features.min():.4f}")
print(f"  Max: {camera_features.max():.4f}")

# Check L2 normalization (should be ~1.0)
norms = np.linalg.norm(camera_features, axis=1)
print(f"  L2 norms: mean={norms.mean():.4f}, std={norms.std():.6f}")
print(f"  (should be ~1.0 after L2 normalization)")

Camera Feature Statistics:
  Shape: (100, 1280)
  Mean: 0.0196
  Std: 0.0199
  Min: 0.0000
  Max: 0.1544
  L2 norms: mean=1.0000, std=0.000000
  (should be ~1.0 after L2 normalization)
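
Since every feature vector has unit norm, a plain dot product already equals the cosine similarity, and the squared Euclidean distance is tied to it by ‖a − b‖² = 2 − 2·cos(a, b). A sketch on random stand-in vectors (not the extracted features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for non-negative pooled features, then L2 normalized row by row
raw = rng.random((5, 1280))
feats = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# Pairwise cosine similarity is a single matrix product for unit vectors
similarity = feats @ feats.T

# Pairwise squared Euclidean distance, computed explicitly for comparison
sq_dist = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)

print(np.allclose(sq_dist, 2.0 - 2.0 * similarity))  # True
```

Ranking by cosine similarity and ranking by Euclidean distance therefore give the same nearest neighbors for these features.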


## Section 2: LiDAR Feature Extraction (Geometric Branch)

### Purpose
Extract geometric features from LiDAR scans using a custom 1D CNN. These features capture spatial structure and geometry information complementary to visual features.

### Architecture: 1D CNN
- **Input**: 360 normalized distance values [0, 1]
- **Architecture**:
  - Conv1D(1→64, kernel=5) + BatchNorm + ReLU
  - Conv1D(64→128, kernel=5) + BatchNorm + ReLU
  - Conv1D(128→256, kernel=3) + BatchNorm + ReLU
  - Conv1D(256→256, kernel=3) + BatchNorm + ReLU
  - Global Average Pooling
- **Parameters**: ~350K
- **Raw Output**: 256D feature vector
- **Post-processing**: L2 normalization
- **Final Output**: 256D normalized feature vector
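
The parameter figure quoted above can be checked by hand from the listed layer sizes. This sketch assumes each convolution has a bias term and each BatchNorm layer contributes a learnable scale and shift per channel (the PyTorch defaults); the helper names are illustrative only.

```python
def conv1d_params(c_in, c_out, kernel):
    # One weight per (input channel, output channel, kernel tap) plus one bias per output channel
    return c_in * c_out * kernel + c_out

def batchnorm_params(channels):
    # Learnable scale and shift; running statistics are buffers, not parameters
    return 2 * channels

# (in_channels, out_channels, kernel_size) for the four conv blocks
layers = [(1, 64, 5), (64, 128, 5), (128, 256, 3), (256, 256, 3)]

total = sum(conv1d_params(c_in, c_out, k) + batchnorm_params(c_out)
            for c_in, c_out, k in layers)

print(f"{total:,}")  # 338,304
```

That is about 338K parameters, in line with the rounded ~350K figure.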

### What This Section Does
1. Define 1D CNN architecture matching pipeline specs
2. Initialize with random weights (no pretraining available)
3. Load preprocessed LiDAR scans from `processed_lidar/`
4. Extract 256D geometric descriptors
5. L2 normalize features
6. Save to HDF5 with timestamps and metadata

### Important Note
⚠️ **This network uses random initialization** (no pretrained weights available). For production use:
- Train on loop closure detection task
- Or use triplet loss with place recognition labels
- Current features serve as geometric descriptors but may not be optimal
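
To make the triplet-loss option concrete, here is a minimal NumPy sketch of the loss for a single (anchor, positive, negative) triple of descriptors. The margin of 0.2 is a hypothetical value, and this only evaluates the loss; actual training would need a differentiable version, for which PyTorch provides `nn.TripletMarginLoss`.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_pos = np.linalg.norm(anchor - positive)  # same place: should be small
    d_neg = np.linalg.norm(anchor - negative)  # different place: should be large
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([1.0, 0.0])
same_place = np.array([1.0, 0.0])
other_place = np.array([0.0, 1.0])

print(triplet_loss(anchor, same_place, other_place))  # 0.0, already well separated
print(triplet_loss(anchor, other_place, same_place))  # about 1.614, roles swapped
```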

### Why 1D CNN?
- Captures local geometric patterns in 360° scans
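- Approximately shift-invariant along the angular dimension (convolutions followed by global average pooling), although the zero padding used here does not wrap around at the 0°/360° boundary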
- Lightweight: ~350K params (10× smaller than MobileNet V2)
- Fast: ~20-30ms inference on embedded systems
- Proven effective for LiDAR-based place recognition
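
How padding affects invariance to a rotation of the scan can be illustrated with a single convolution, ReLU, and global average in NumPy. This is a toy stand-in for the network, not the notebook's model: the scan and kernel are random, and the wrap-around padding is an alternative shown for comparison (in PyTorch, `nn.Conv1d` accepts `padding_mode='circular'`).

```python
import numpy as np

def conv_relu_gap(scan, kernel, pad_mode):
    """One conv layer + ReLU + global average, with a chosen padding mode."""
    pad = len(kernel) // 2
    padded = np.pad(scan, pad, mode=pad_mode)  # "constant" = zeros, "wrap" = circular
    response = np.correlate(padded, kernel, mode="valid")  # same length as scan
    return np.maximum(response, 0.0).mean()

rng = np.random.default_rng(0)
scan = rng.random(360)             # stand-in for a normalized 360-beam scan
kernel = rng.standard_normal(5)    # stand-in for one learned filter
rotated = np.roll(scan, 90)        # same scene, robot turned by 90 beams

for pad_mode in ("constant", "wrap"):
    diff = abs(conv_relu_gap(scan, kernel, pad_mode)
               - conv_relu_gap(rotated, kernel, pad_mode))
    print(pad_mode, diff)
```

With wrap-around padding the pooled value is unchanged by the rotation up to floating-point error; with zero padding a small difference generally remains, coming from the beams near the array ends.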

### Output Format
Features saved to HDF5: `features.h5/lidar/features` [N, 256]

In [None]:
# Configuration for LiDAR feature extraction
LIDAR_INPUT_DIR = "processed_lidar"
LIDAR_BATCH_SIZE = 64  # LiDAR is lighter than images

print(f"LiDAR input directory: {LIDAR_INPUT_DIR}")
print(f"Batch size: {LIDAR_BATCH_SIZE}")

In [None]:
# Load LiDAR metadata
lidar_metadata = pd.read_csv(os.path.join(LIDAR_INPUT_DIR, 'metadata.csv'))

print(f"Found {len(lidar_metadata)} LiDAR scans")
print(f"\nFirst few entries:")
lidar_metadata.head()

In [None]:
# Define 1D CNN for Geometric Feature Extraction
class GeometricCNN(nn.Module):
    """
    1D CNN for LiDAR geometric feature extraction.

    Architecture:
    - 4 Conv1D layers with increasing channels
    - BatchNorm + ReLU after each conv
    - Global Average Pooling
    - Output: 256D feature vector

    Parameters: ~350K (matching pipeline specification)
    """
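    def __init__(self, input_dim=360, output_dim=256):
        # input_dim and output_dim are informational only: the layer sizes below
        # are fixed, and global pooling accepts any scan length
        super().__init__()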

        # Layer 1: 1 → 64 channels
        self.conv1 = nn.Conv1d(1, 64, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(64)

        # Layer 2: 64 → 128 channels
        self.conv2 = nn.Conv1d(64, 128, kernel_size=5, padding=2)
        self.bn2 = nn.BatchNorm1d(128)

        # Layer 3: 128 → 256 channels
        self.conv3 = nn.Conv1d(128, 256, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(256)

        # Layer 4: 256 → 256 channels
        self.conv4 = nn.Conv1d(256, 256, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm1d(256)

        # Global Average Pooling
        self.gap = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):
        # x shape: (batch, 1, 360)

        # Conv block 1
        x = F.relu(self.bn1(self.conv1(x)))

        # Conv block 2
        x = F.relu(self.bn2(self.conv2(x)))

        # Conv block 3
        x = F.relu(self.bn3(self.conv3(x)))

        # Conv block 4
        x = F.relu(self.bn4(self.conv4(x)))

        # Global Average Pooling
        x = self.gap(x)  # (batch, 256, 1)
        x = x.squeeze(-1)  # (batch, 256)

        return x

print("GeometricCNN class defined")

In [None]:
# Initialize the 1D CNN
print("Initializing Geometric CNN...")

geometric_cnn = GeometricCNN(input_dim=360, output_dim=256)
geometric_cnn = geometric_cnn.to(device)
geometric_cnn.eval()  # Evaluation mode

# Count parameters
total_params = sum(p.numel() for p in geometric_cnn.parameters())
print(f"✓ Geometric CNN initialized on {device}")
print(f"Total parameters: {total_params:,}")
print(f"Feature dimension: 256D")
print(f"\n⚠️  Using random initialization (no pretrained weights)")
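
Because the network is put in evaluation mode without any training, its BatchNorm layers use their initial running statistics. Assuming the PyTorch defaults (running mean 0, running variance 1, scale 1, shift 0, `eps=1e-5`), each BatchNorm is then close to an identity mapping, as this arithmetic sketch shows:

```python
import numpy as np

# Assumed PyTorch BatchNorm1d defaults for a freshly constructed layer
eps = 1e-5
running_mean, running_var = 0.0, 1.0
gamma, beta = 1.0, 0.0

x = np.array([-1.0, 0.0, 0.5, 2.0])

# Evaluation-mode BatchNorm: normalize with the stored statistics, then scale and shift
y = (x - running_mean) / np.sqrt(running_var + eps) * gamma + beta

print(np.abs(y - x).max())  # on the order of 1e-5
```

The untrained network therefore behaves like a stack of random convolutions with ReLU, which is consistent with the warning about random initialization.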

In [None]:
# Test the network with a sample scan
print("Testing network with sample scan...")

# Load first scan
sample_scan = pd.read_csv(
    os.path.join(LIDAR_INPUT_DIR, lidar_metadata.iloc[0]['filename']),
    header=None
).values[0]

print(f"Sample scan shape: {sample_scan.shape}")
print(f"Value range: [{sample_scan.min():.4f}, {sample_scan.max():.4f}]")

# Convert to tensor and add batch/channel dimensions
sample_tensor = torch.tensor(sample_scan, dtype=torch.float32).unsqueeze(0).unsqueeze(0).to(device)
print(f"Tensor shape: {sample_tensor.shape}")

# Extract features
with torch.no_grad():
    sample_features = geometric_cnn(sample_tensor)
    sample_features_norm = F.normalize(sample_features, p=2, dim=1)

print(f"\nOutput features shape: {sample_features.shape}")
print(f"Feature norm (before L2): {torch.norm(sample_features, dim=1).item():.4f}")
print(f"Feature norm (after L2): {torch.norm(sample_features_norm, dim=1).item():.4f}")
print("✓ Network test successful!")

In [None]:
# Extract LiDAR features
print("Extracting LiDAR features...")

lidar_features_list = []
lidar_timestamps_list = []
lidar_filenames_list = []

with torch.no_grad():
    for idx in tqdm(range(0, len(lidar_metadata), LIDAR_BATCH_SIZE), desc="LiDAR batches"):
        batch_meta = lidar_metadata.iloc[idx:idx+LIDAR_BATCH_SIZE]

        # Load batch of scans
        batch_scans = []
        batch_timestamps = []
        batch_filenames = []

        for _, row in batch_meta.iterrows():
            scan_path = os.path.join(LIDAR_INPUT_DIR, row['filename'])

            try:
                # Load scan (single row CSV)
                scan = pd.read_csv(scan_path, header=None).values[0]
                batch_scans.append(torch.tensor(scan, dtype=torch.float32))
                batch_timestamps.append(row['timestamp'])
                batch_filenames.append(row['filename'])
            except Exception as e:
                print(f"Error loading {scan_path}: {e}")
                continue

        if len(batch_scans) == 0:
            continue

        # Stack into batch tensor (batch, 1, 360)
        batch_tensor = torch.stack(batch_scans).unsqueeze(1).to(device)

        # Extract features
        features = geometric_cnn(batch_tensor)

        # L2 normalize features
        features = F.normalize(features, p=2, dim=1)

        # Move to CPU and store
        lidar_features_list.append(features.cpu().numpy())
        lidar_timestamps_list.extend(batch_timestamps)
        lidar_filenames_list.extend(batch_filenames)

# Concatenate all batches
lidar_features = np.vstack(lidar_features_list)
lidar_timestamps = np.array(lidar_timestamps_list)
lidar_filenames = np.array(lidar_filenames_list)

print(f"\n✓ Extracted LiDAR features: {lidar_features.shape}")
print(f"Feature dimension: {lidar_features.shape[1]}D")
print(f"Number of scans: {lidar_features.shape[0]}")

In [None]:
# Verify feature properties
print("LiDAR Feature Statistics:")
print(f"  Shape: {lidar_features.shape}")
print(f"  Mean: {lidar_features.mean():.4f}")
print(f"  Std: {lidar_features.std():.4f}")
print(f"  Min: {lidar_features.min():.4f}")
print(f"  Max: {lidar_features.max():.4f}")

# Check L2 normalization
norms = np.linalg.norm(lidar_features, axis=1)
print(f"  L2 norms: mean={norms.mean():.4f}, std={norms.std():.6f}")
print(f"  (should be ~1.0 after L2 normalization)")

## Section 3: Save Features to HDF5

### Purpose
Store extracted features in HDF5 format for efficient access and sensor fusion. HDF5 provides fast random access, compression, and hierarchical organization.

### Output Structure
```
features.h5
├── camera/
│   ├── features [N_cam, 1280]     # Camera feature vectors
│   ├── timestamps [N_cam]         # ROS timestamps (float)
│   └── filenames [N_cam]          # Source image filenames
├── lidar/
│   ├── features [N_lid, 256]      # LiDAR feature vectors
│   ├── timestamps [N_lid]         # ROS timestamps (float)
│   └── filenames [N_lid]          # Source scan filenames
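└── attributes
    ├── on camera/ and lidar/ groups: feature_dim, num_samples, model,
    │   pretrained, normalization, input_size (lidar also: parameters)
    └── on the file: creation_date, camera_input_dir, lidar_input_dir,
        device, camera_batch_size, lidar_batch_size, temporal_alignment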
```

### What This Section Does
1. Create HDF5 file with hierarchical structure
2. Save camera features (1280D × N_cam)
3. Save LiDAR features (256D × N_lid)
4. Store timestamps for temporal alignment
5. Store filenames for traceability
6. Add comprehensive metadata as attributes
7. Generate summary JSON file

### Usage Example (Next Stage)
```python
import h5py

# Load features
with h5py.File('features.h5', 'r') as f:
    cam_features = f['camera/features'][:]  # (N, 1280)
    cam_timestamps = f['camera/timestamps'][:]
    
    lid_features = f['lidar/features'][:]   # (N, 256)
    lid_timestamps = f['lidar/timestamps'][:]
    
    # Access metadata
    camera_dim = f['camera'].attrs['feature_dim']
```

### Important Notes
- **No temporal alignment yet**: Camera and LiDAR features saved separately
- **Timestamps preserved**: Use for alignment in fusion stage
- **L2 normalized**: Features ready for cosine similarity
- **Traceability**: Filenames link back to original data
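
For the later alignment stage, one simple option is nearest-timestamp matching with a tolerance. The sketch below is an assumption about how that stage might look, not part of this notebook; the timestamps are toy values, and `nearest_indices` and `max_gap` are hypothetical names.

```python
import numpy as np

def nearest_indices(reference_times, query_times):
    """Index of the closest reference time for each query time.

    reference_times must be sorted ascending and contain at least two entries.
    """
    right = np.searchsorted(reference_times, query_times)
    right = np.clip(right, 1, len(reference_times) - 1)
    left = right - 1
    pick_left = (query_times - reference_times[left]) <= (reference_times[right] - query_times)
    return np.where(pick_left, left, right)

cam_t = np.array([0.00, 0.10, 0.20, 0.30])   # toy camera timestamps (seconds)
lidar_t = np.array([0.04, 0.16, 0.29])       # toy LiDAR timestamps (seconds)

idx = nearest_indices(cam_t, lidar_t)        # camera frame matched to each scan
gap = np.abs(cam_t[idx] - lidar_t)

max_gap = 0.05                               # hypothetical tolerance in seconds
keep = gap <= max_gap

print(idx)   # indices 0, 2, 3
print(keep)  # all True for this toy data
```

Pairs whose gap exceeds the tolerance would be dropped before fusion.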

In [None]:
# Configuration
OUTPUT_FILE = "features.h5"
COMPRESSION = "gzip"  # Use gzip compression

print(f"Output file: {OUTPUT_FILE}")
print(f"Compression: {COMPRESSION}")

In [None]:
# Create HDF5 file and save features
print("Creating HDF5 file...")

with h5py.File(OUTPUT_FILE, 'w') as f:
    # Create camera group
    camera_group = f.create_group('camera')
    camera_group.create_dataset('features', data=camera_features, compression=COMPRESSION)
    camera_group.create_dataset('timestamps', data=camera_timestamps, compression=COMPRESSION)
    camera_group.create_dataset('filenames', data=camera_filenames.astype('S'))

    # Add camera metadata
    camera_group.attrs['feature_dim'] = camera_features.shape[1]
    camera_group.attrs['num_samples'] = camera_features.shape[0]
    camera_group.attrs['model'] = 'MobileNet V2'
    camera_group.attrs['pretrained'] = 'ImageNet'
    camera_group.attrs['normalization'] = 'L2'
    camera_group.attrs['input_size'] = '224x224x3'

    # Create lidar group
    lidar_group = f.create_group('lidar')
    lidar_group.create_dataset('features', data=lidar_features, compression=COMPRESSION)
    lidar_group.create_dataset('timestamps', data=lidar_timestamps, compression=COMPRESSION)
    lidar_group.create_dataset('filenames', data=lidar_filenames.astype('S'))

    # Add lidar metadata
    lidar_group.attrs['feature_dim'] = lidar_features.shape[1]
    lidar_group.attrs['num_samples'] = lidar_features.shape[0]
    lidar_group.attrs['model'] = '1D CNN (4 Conv1D + GAP)'
    lidar_group.attrs['pretrained'] = 'None (random init)'
    lidar_group.attrs['normalization'] = 'L2'
    lidar_group.attrs['input_size'] = '360'
    lidar_group.attrs['parameters'] = f'{total_params:,}'

    # Add global metadata
    f.attrs['creation_date'] = datetime.now().isoformat()
    f.attrs['camera_input_dir'] = CAMERA_INPUT_DIR
    f.attrs['lidar_input_dir'] = LIDAR_INPUT_DIR
    f.attrs['device'] = str(device)
    f.attrs['camera_batch_size'] = CAMERA_BATCH_SIZE
    f.attrs['lidar_batch_size'] = LIDAR_BATCH_SIZE
    f.attrs['temporal_alignment'] = 'Not performed - features extracted independently'

print(f"✓ Features saved to {OUTPUT_FILE}")

In [None]:
# Verify HDF5 file
print("\nVerifying HDF5 file...")

with h5py.File(OUTPUT_FILE, 'r') as f:
    print(f"\nGroups: {list(f.keys())}")

    print(f"\nCamera:")
    print(f"  features: {f['camera/features'].shape}")
    print(f"  timestamps: {f['camera/timestamps'].shape}")
    print(f"  filenames: {f['camera/filenames'].shape}")
    print(f"  feature_dim: {f['camera'].attrs['feature_dim']}")

    print(f"\nLiDAR:")
    print(f"  features: {f['lidar/features'].shape}")
    print(f"  timestamps: {f['lidar/timestamps'].shape}")
    print(f"  filenames: {f['lidar/filenames'].shape}")
    print(f"  feature_dim: {f['lidar'].attrs['feature_dim']}")

    print(f"\nGlobal attributes:")
    for key, value in f.attrs.items():
        print(f"  {key}: {value}")

# Get file size
file_size_mb = os.path.getsize(OUTPUT_FILE) / (1024 * 1024)
print(f"\nFile size: {file_size_mb:.2f} MB")
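
As a sanity check on the reported file size, the uncompressed size of a feature matrix can be estimated from its shape, assuming 4-byte `float32` values (the default dtype of the PyTorch outputs). The helper name is illustrative; only the camera count (100 images) is known from the run above.

```python
def raw_size_mb(num_rows, feature_dim, bytes_per_value=4):
    """Uncompressed size of a dense float32 matrix in MB (1 MB = 1024 * 1024 bytes)."""
    return num_rows * feature_dim * bytes_per_value / (1024 * 1024)

camera_mb = raw_size_mb(100, 1280)   # 100 images × 1280 features

print(f"{camera_mb:.2f} MB")  # 0.49 MB
```

The actual file will differ somewhat because of gzip compression, the timestamp and filename datasets, and HDF5 overhead.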

In [None]:
# Create summary JSON
summary = {
    "feature_extraction_summary": {
        "creation_date": datetime.now().isoformat(),
        "output_file": OUTPUT_FILE,
        "file_size_mb": round(file_size_mb, 2)
    },
    "camera": {
        "model": "MobileNet V2",
        "pretrained": "ImageNet",
        "feature_dim": int(camera_features.shape[1]),
        "num_samples": int(camera_features.shape[0]),
        "input_size": "224x224x3",
        "preprocessing": "ImageNet mean/std normalization",
        "normalization": "L2",
        "batch_size": CAMERA_BATCH_SIZE
    },
    "lidar": {
        "model": "1D CNN (4 Conv1D + GAP)",
        "pretrained": "None (random initialization)",
        "feature_dim": int(lidar_features.shape[1]),
        "num_samples": int(lidar_features.shape[0]),
        "parameters": total_params,
        "input_size": "360",
        "preprocessing": "Normalized to [0,1]",
        "normalization": "L2",
        "batch_size": LIDAR_BATCH_SIZE
    },
    "notes": {
        "temporal_alignment": "Not performed - features extracted independently",
        "next_steps": [
            "Temporal alignment using timestamps",
            "Sensor fusion (concatenate or attention-based)",
            "Loop closure detection training",
            "Fine-tune networks on place recognition task"
        ],
        "warnings": [
            "LiDAR CNN uses random initialization - train before production use",
            "Features from different modalities have different timestamps"
        ]
    }
}

summary_file = "feature_extraction_summary.json"
with open(summary_file, 'w') as f:
    json.dump(summary, f, indent=2)

print(f"✓ Summary saved to {summary_file}")
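
The summary wraps several values in `int(...)` before calling `json.dump`. For `.shape` entries this is a no-op, since they are already Python integers, but the habit matters for NumPy scalars, which the standard `json` module rejects. A small demonstration:

```python
import json
import numpy as np

count = np.int64(100)  # e.g. the result of a NumPy reduction

try:
    json.dumps({"num_samples": count})
    serializable = True
except TypeError:
    # NumPy integer scalars are not JSON serializable as-is
    serializable = False

print(serializable)                              # False
print(json.dumps({"num_samples": int(count)}))   # {"num_samples": 100}
```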

In [None]:
# Display final summary
print("\n" + "="*70)
print("FEATURE EXTRACTION COMPLETE")
print("="*70)
print(f"\nOutput file: {OUTPUT_FILE}")
print(f"File size: {file_size_mb:.2f} MB")
print(f"\nCamera Features:")
print(f"  Model: MobileNet V2 (pretrained on ImageNet)")
print(f"  Dimension: {camera_features.shape[1]}D")
print(f"  Samples: {camera_features.shape[0]:,}")
print(f"  Normalization: L2")
print(f"\nLiDAR Features:")
print(f"  Model: 1D CNN ({total_params:,} parameters)")
print(f"  Dimension: {lidar_features.shape[1]}D")
print(f"  Samples: {lidar_features.shape[0]:,}")
print(f"  Normalization: L2")
print(f"  ⚠️  Random initialization (train before production use)")
print(f"\nNext Steps:")
print(f"  1. Temporal alignment using timestamps")
print(f"  2. Sensor fusion pipeline")
print(f"  3. Loop closure detection")
print(f"  4. Network fine-tuning")
print("\n" + "="*70)

## Visualization & Analysis

In [None]:
# Visualize feature distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Camera feature distribution
axes[0, 0].hist(camera_features.flatten(), bins=50, alpha=0.7, edgecolor='black')
axes[0, 0].set_xlabel('Feature Value')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Camera Feature Distribution (1280D × N)')
axes[0, 0].grid(alpha=0.3)

# LiDAR feature distribution
axes[0, 1].hist(lidar_features.flatten(), bins=50, alpha=0.7, edgecolor='black', color='orange')
axes[0, 1].set_xlabel('Feature Value')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('LiDAR Feature Distribution (256D × N)')
axes[0, 1].grid(alpha=0.3)

# Camera L2 norms
cam_norms = np.linalg.norm(camera_features, axis=1)
axes[1, 0].hist(cam_norms, bins=30, alpha=0.7, edgecolor='black')
axes[1, 0].axvline(1.0, color='r', linestyle='--', label='Expected (1.0)')
axes[1, 0].set_xlabel('L2 Norm')
axes[1, 0].set_ylabel('Count')
axes[1, 0].set_title('Camera Feature L2 Norms')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# LiDAR L2 norms
lid_norms = np.linalg.norm(lidar_features, axis=1)
axes[1, 1].hist(lid_norms, bins=30, alpha=0.7, edgecolor='black', color='orange')
axes[1, 1].axvline(1.0, color='r', linestyle='--', label='Expected (1.0)')
axes[1, 1].set_xlabel('L2 Norm')
axes[1, 1].set_ylabel('Count')
axes[1, 1].set_title('LiDAR Feature L2 Norms')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

plt.tight_layout()
plt.savefig('feature_distributions.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Visualization saved as 'feature_distributions.png'")

In [None]:
# Analyze temporal coverage
print("\nTemporal Coverage Analysis:")
print(f"\nCamera:")
print(f"  Time range: {camera_timestamps.min():.2f} - {camera_timestamps.max():.2f} sec")
print(f"  Duration: {camera_timestamps.max() - camera_timestamps.min():.2f} sec")
print(f"  Average interval: {np.mean(np.diff(camera_timestamps)):.4f} sec")

print(f"\nLiDAR:")
print(f"  Time range: {lidar_timestamps.min():.2f} - {lidar_timestamps.max():.2f} sec")
print(f"  Duration: {lidar_timestamps.max() - lidar_timestamps.min():.2f} sec")
print(f"  Average interval: {np.mean(np.diff(lidar_timestamps)):.4f} sec")

# Plot timeline
fig, ax = plt.subplots(1, 1, figsize=(14, 4))
ax.plot(camera_timestamps, np.ones_like(camera_timestamps), '|', markersize=10, label='Camera')
ax.plot(lidar_timestamps, np.ones_like(lidar_timestamps) * 1.1, '|', markersize=10, label='LiDAR', color='orange')
ax.set_xlabel('Time (seconds)', fontsize=12)
ax.set_yticks([])
ax.set_title('Temporal Distribution of Features', fontsize=14)
ax.legend()
ax.grid(alpha=0.3, axis='x')
plt.tight_layout()
plt.savefig('temporal_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

print("✓ Timeline saved as 'temporal_distribution.png'")

### Output Files
- `features.h5` - Main feature database (HDF5)
- `feature_extraction_summary.json` - Processing metadata
- `feature_distributions.png` - Feature statistics visualization
- `temporal_distribution.png` - Timeline visualization

### Next Steps
1. **Temporal Alignment**: Match camera and LiDAR features by timestamp
2. **Sensor Fusion**: Combine modalities (concatenation, attention, etc.)
3. **Loop Closure Detection**: Train classifier on fused features
4. **Network Fine-tuning**: Train 1D CNN and fine-tune MobileNet V2
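
As an illustration of the simplest fusion option, concatenation, the sketch below joins unit-norm camera and LiDAR vectors and rescales the result to unit norm. It assumes the rows have already been paired by timestamp and uses random stand-in data; it is one possible design, not the project's chosen fusion method.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Stand-ins for temporally aligned, L2-normalized features
cam = l2_normalize(rng.random((4, 1280)))
lidar = l2_normalize(rng.random((4, 256)))

fused = np.concatenate([cam, lidar], axis=1)   # (4, 1536)

# Each half has norm 1, so the concatenation has norm sqrt(2)
fused_unit = fused / np.sqrt(2.0)

print(fused.shape)                                             # (4, 1536)
print(np.allclose(np.linalg.norm(fused_unit, axis=1), 1.0))    # True
```

With this scaling, the cosine similarity of two fused vectors is the average of their camera and LiDAR cosine similarities, so both modalities contribute equally regardless of their dimensionality.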

### Important Reminders
⚠️ LiDAR CNN uses random initialization - train on loop closure task before production  
⚠️ Features extracted independently - temporal alignment needed for fusion  
⚠️ All features are L2 normalized - use cosine similarity for comparison  