# 🚀 SONAR SignHiera Feature Extraction

Extract visual features from How2Sign videos using the pretrained SONAR SignHiera model.

**Requirements**:

- GPU runtime (T4)
- Videos uploaded to Google Drive: `MyDrive/How2Sign_SONAR/videos/`
- Manifests uploaded: `MyDrive/How2Sign_SONAR/manifests/`

**Estimated time**: 8-11 hours for all splits
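
Before starting a multi-hour run, it can help to confirm that the inputs listed above are actually in place. The following is a minimal sketch rather than part of the original notebook: the helper name `missing_paths` is hypothetical, and the `train`/`val`/`test` subfolders and manifest file names are taken from the extraction commands used later in this notebook. It should be run after Drive has been mounted in Step 1.

```python
from pathlib import Path

# Paths the extraction steps rely on, relative to the project root.
# Subfolder and manifest names mirror the commands in Steps 4-6.
REQUIRED = [
    "videos/train", "videos/val", "videos/test",
    "manifests/train.tsv", "manifests/val.tsv", "manifests/test.tsv",
]

def missing_paths(root, required=REQUIRED):
    """Return the required paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]

if __name__ == "__main__":
    # Assumes Drive is already mounted at /content/drive (see Step 1).
    missing = missing_paths("/content/drive/MyDrive/How2Sign_SONAR")
    print("All inputs found" if not missing else f"Missing: {missing}")
```

An empty result means every expected folder and manifest was found; anything listed should be uploaded before extraction begins.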


## ✅ Step 1: Set Up Environment


In [None]:
# Install dependencies
!pip install -q torch torchvision opencv-python-headless pillow tqdm pandas

print("✅ Dependencies installed")

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import os
os.chdir('/content/drive/MyDrive/How2Sign_SONAR')

print("✅ Google Drive mounted")
!pwd

## ✅ Step 2: Download SONAR Model


In [None]:
# Create models directory
!mkdir -p models

# Download SignHiera pretrained model
!wget -q --show-progress https://dl.fbaipublicfiles.com/SONAR/asl/dm_70h_ub_signhiera.pth -O models/dm_70h_ub_signhiera.pth

# Verify download
!ls -lh models/dm_70h_ub_signhiera.pth

print("✅ SONAR model downloaded")

## ✅ Step 3: Upload Extraction Script

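**IMPORTANT**: Upload `extract_features_signhiera.py` from your local repo into the working directory (`MyDrive/How2Sign_SONAR/`), because the check in the next cell looks for it there:

1. Click the folder icon in the left sidebar.
2. Navigate to `drive/MyDrive/How2Sign_SONAR/` and upload `extract_features_signhiera.py`.
3. Alternatively, paste the script's code into a new cell and save it to that directory.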


In [None]:
# Verify extraction script exists
import os

if os.path.exists("extract_features_signhiera.py"):
    print("✅ Extraction script found")
else:
    print("❌ Please upload extract_features_signhiera.py")
    print("   Use Files tab (left) → Upload button")

## ✅ Step 4: Extract TRAIN Features

⏱️ Estimated time: 3-4 hours (2147 videos)
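
Note that the `print` following the `!python` command in the cell below runs whether or not the script succeeded, so a crash partway through can still end with a success message. One hedged alternative is to launch the script through `subprocess` and check the exit code. In this sketch, `build_command` and `run_split` are hypothetical helpers; the flags are copied from the extraction command used in this notebook, and it assumes the script exits with a non-zero status on failure.

```python
import subprocess
import sys

def build_command(split, batch_size=8, device="cuda"):
    """Assemble the extraction command for one split (hypothetical helper)."""
    return [
        sys.executable, "extract_features_signhiera.py",
        "--manifest", f"manifests/{split}.tsv",
        "--video_dir", f"videos/{split}",
        "--model_path", "models/dm_70h_ub_signhiera.pth",
        "--output_dir", f"features/{split}",
        "--batch_size", str(batch_size),
        "--device", device,
    ]

def run_split(split):
    """Run extraction for one split; return True only if the exit code is 0."""
    result = subprocess.run(build_command(split))
    return result.returncode == 0

# Usage in a notebook cell (commented out so the sketch has no side effects):
# if run_split("train"):
#     print("Train features extracted")
# else:
#     print("Extraction failed; check the output above")
```

Depending on the notebook kernel, output from a `subprocess` child may not stream into the cell the way `!python` output does, so this trades live progress display for a reliable success check.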


In [None]:
# Extract train features
!python extract_features_signhiera.py \
    --manifest manifests/train.tsv \
    --video_dir videos/train \
    --model_path models/dm_70h_ub_signhiera.pth \
    --output_dir features/train \
    --batch_size 8 \
    --device cuda

print("✅ Train features extracted")

## ✅ Step 5: Extract VAL Features

⏱️ Estimated time: 2-3 hours (1739 videos)
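
Colab sessions can disconnect during runs of this length, so it may be useful to see which clips still lack features before restarting a split. The sketch below is an illustration under stated assumptions, none of which the notebook confirms: the manifest is a tab-separated file with a header row, it has a column named `id`, and each feature file is saved as `<id>.npy`. The helper name `missing_features` is hypothetical; adjust `id_column` to match the real manifest.

```python
import csv
from pathlib import Path

def missing_features(manifest_path, feature_dir, id_column="id"):
    """List manifest IDs that have no matching .npy file yet.

    Assumptions (not confirmed by the notebook): the manifest is a
    tab-separated file with a header row containing `id_column`, and
    each feature file is named `<id>.npy`.
    """
    feature_dir = Path(feature_dir)
    # Collect the IDs already extracted; a missing directory means none.
    done = {p.stem for p in feature_dir.glob("*.npy")} if feature_dir.is_dir() else set()
    with open(manifest_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [row[id_column] for row in reader if row[id_column] not in done]
```

For example, `len(missing_features("manifests/val.tsv", "features/val"))` would give the number of validation clips still to process under those assumptions.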


In [None]:
# Extract val features
!python extract_features_signhiera.py \
    --manifest manifests/val.tsv \
    --video_dir videos/val \
    --model_path models/dm_70h_ub_signhiera.pth \
    --output_dir features/val \
    --batch_size 8 \
    --device cuda

print("✅ Val features extracted")

## ✅ Step 6: Extract TEST Features

⏱️ Estimated time: 3-4 hours (2343 videos)
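
The per-split estimates imply a rough per-video processing time, which can be handy for judging whether a run is on pace. This is a small arithmetic sketch using only the video counts and hour ranges quoted in this notebook; the helper name `seconds_per_video` is hypothetical, and actual throughput will vary with clip length and GPU load.

```python
# (videos, min_hours, max_hours) as quoted in Steps 4-6.
ESTIMATES = {
    "train": (2147, 3, 4),
    "val": (1739, 2, 3),
    "test": (2343, 3, 4),
}

def seconds_per_video(videos, min_hours, max_hours):
    """Return the (low, high) seconds-per-video range implied by an estimate."""
    return (min_hours * 3600 / videos, max_hours * 3600 / videos)

for split, (videos, lo, hi) in ESTIMATES.items():
    low, high = seconds_per_video(videos, lo, hi)
    print(f"{split}: {low:.1f}-{high:.1f} s per video")
```

All three ranges fall at roughly 4 to 7 seconds per video, and the hour ranges sum to the 8-11 hours quoted for all splits.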


In [None]:
# Extract test features
!python extract_features_signhiera.py \
    --manifest manifests/test.tsv \
    --video_dir videos/test \
    --model_path models/dm_70h_ub_signhiera.pth \
    --output_dir features/test \
    --batch_size 8 \
    --device cuda

print("✅ Test features extracted")

## ✅ Step 7: Verify Output


In [None]:
# Check extracted features
print("📊 TRAIN features:")
!ls -lh features/train/ | head -10

print("\n📊 VAL features:")
!ls -lh features/val/ | head -10

print("\n📊 TEST features:")
!ls -lh features/test/ | head -10

# Count files
import os
train_count = len([f for f in os.listdir('features/train') if f.endswith('.npy')])
val_count = len([f for f in os.listdir('features/val') if f.endswith('.npy')])
test_count = len([f for f in os.listdir('features/test') if f.endswith('.npy')])

print(f"\n✅ SUMMARY:")
print(f"   Train: {train_count} features")
print(f"   Val:   {val_count} features")
print(f"   Test:  {test_count} features")
print(f"\n📍 Features saved to: /content/drive/MyDrive/How2Sign_SONAR/features/")
print(f"\n🚀 Next: Download features folder to your local machine!")

## 🎉 Done!

Features are now saved to your Google Drive:

- `MyDrive/How2Sign_SONAR/features/train/`
- `MyDrive/How2Sign_SONAR/features/val/`
- `MyDrive/How2Sign_SONAR/features/test/`
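
Before downloading, it may be worth comparing the file counts against the video counts quoted in the extraction steps (2147, 1739, and 2343). The sketch below assumes the script writes exactly one `.npy` file per video, which the notebook does not state explicitly; `count_npy` and `shortfalls` are hypothetical helper names.

```python
import os

# Video counts quoted in the extraction steps of this notebook.
EXPECTED = {"train": 2147, "val": 1739, "test": 2343}

def count_npy(directory):
    """Count .npy files in `directory`; a missing directory counts as 0."""
    if not os.path.isdir(directory):
        return 0
    return sum(1 for name in os.listdir(directory) if name.endswith(".npy"))

def shortfalls(features_root, expected=EXPECTED):
    """Map each split to (found, expected) where the two counts differ."""
    report = {}
    for split, want in expected.items():
        found = count_npy(os.path.join(features_root, split))
        if found != want:
            report[split] = (found, want)
    return report
```

Calling `shortfalls("features")` from the project directory returns an empty dict when every split matches; any entry points to a split that may need to be rerun.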

**Next steps**:

1. Download the `features/` folder to your local machine.
2. Run fine-tuning locally: `python finetune_sonar_how2sign.py`
3. Evaluate the model: `python evaluate_how2sign.py`
