# Axion-Sat Complete Pipeline (Colab)

This notebook:
1. Downloads BigEarthNet v2.0 data directly to Colab
2. Converts to paired tiles (same as your local process)
3. Runs Stage 1 precompute
4. Saves outputs to Google Drive

**Total time:** ~20-25 hours (download ~2h + convert ~3h + precompute ~15h)

## 1. Setup Environment

In [None]:
# Mount Google Drive (for saving outputs)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Check GPU
!nvidia-smi
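
`nvidia-smi` prints a full table. Here is a minimal sketch for a machine-readable check that you can use to stop early on a CPU-only runtime. It uses nvidia-smi's `--query-gpu` and `--format` options. The helper names `parse_gpu_query` and `query_gpus` are illustrative and are not part of the Axion-Sat repository.

```python
import shutil
import subprocess


def parse_gpu_query(output: str) -> list[tuple[str, int]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`."""
    gpus = []
    for line in output.strip().splitlines():
        # Each line looks like "<gpu name>, <total memory in MiB>"
        name, mem_mib = (part.strip() for part in line.split(","))
        gpus.append((name, int(mem_mib)))
    return gpus


def query_gpus() -> list[tuple[str, int]]:
    """Return (name, total memory in MiB) per GPU, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_query(result.stdout)
```

In the notebook, call `query_gpus()`. If it returns an empty list, switch to a GPU runtime (Runtime > Change runtime type) before continuing.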

In [None]:
# Clone your repository
!git clone https://github.com/YOUR_USERNAME/Axion-Sat.git /content/Axion-Sat
%cd /content/Axion-Sat

In [None]:
# Install dependencies
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q terratorch albumentations tqdm numpy pillow scipy timm transformers diffusers rasterio
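
`pip install -q` hides most output, so a failed install is easy to miss until a multi-hour step crashes. Here is a minimal import check. The module list mirrors the install lines above and is an assumption: adjust it if the scripts import other packages.

```python
import importlib.util

# pip package names do not always match import names (e.g. pillow -> PIL),
# so this list uses the names you would write in an `import` statement.
REQUIRED_MODULES = [
    "torch", "torchvision", "terratorch", "albumentations", "tqdm", "numpy",
    "PIL", "scipy", "timm", "transformers", "diffusers", "rasterio",
]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found on the Python path."""
    return [m for m in modules if importlib.util.find_spec(m) is None]
```

Usage: `print(missing_modules(REQUIRED_MODULES) or "All dependencies found")`.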

## 2. Download BigEarthNet v2.0 Data

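Downloads the archives straight to the Colab VM's local disk rather than to Google Drive, which keeps extraction and tile conversion fast.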

In [None]:
# Create data directories
!mkdir -p /content/data/raw
!mkdir -p /content/data/tiles
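
Each `.tar.gz` archive and its extracted contents sit on disk together until the `rm` runs, so the VM needs room for both. Here is a minimal sketch of a free-space check using `shutil.disk_usage`. The required size is an assumption that you must fill in from the actual archive sizes; no figure is given here.

```python
import shutil


def free_gb(path: str) -> float:
    """Free space, in GB (10**9 bytes), on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9


def check_space(path: str, required_gb: float) -> bool:
    """Print free space and return True if at least `required_gb` GB is available."""
    free = free_gb(path)
    print(f"{path}: {free:.1f} GB free, {required_gb:.1f} GB required")
    return free >= required_gb
```

Usage: `check_space("/content", REQUIRED_GB)`, where `REQUIRED_GB` is a hypothetical value you set yourself.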

In [None]:
# Download Sentinel-1 data
print("Downloading Sentinel-1 data...")
!wget -O /content/data/raw/BigEarthNet-S1-v2.0.tar.gz \
  "https://bigearth.net/downloads/BigEarthNet-S1-v2.0.tar.gz"

print("\nExtracting Sentinel-1...")
!tar -xzf /content/data/raw/BigEarthNet-S1-v2.0.tar.gz -C /content/data/raw/
!rm /content/data/raw/BigEarthNet-S1-v2.0.tar.gz
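
If the session drops mid-download, `wget -O` leaves a truncated file under the final name. Here is a minimal Python alternative that writes to a temporary `.part` file and only renames it once the transfer finishes. It also skips archives that already exist. It does not resume partial transfers; wget's `-c` flag does that. The `download` helper is illustrative.

```python
import os
import shutil
import urllib.request


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> str:
    """Stream `url` to `dest`, skipping the download if `dest` already exists.

    Data goes to `dest + ".part"` and is renamed only after the transfer
    completes, so an interrupted download never looks like a finished archive.
    """
    if os.path.exists(dest):
        print(f"Already downloaded: {dest}")
        return dest
    tmp = dest + ".part"
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)  # stream in 1 MiB chunks
    os.replace(tmp, dest)  # atomic rename on the same filesystem
    return dest
```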

In [None]:
# Download Sentinel-2 data
print("Downloading Sentinel-2 data...")
!wget -O /content/data/raw/BigEarthNet-S2-v2.0.tar.gz \
  "https://bigearth.net/downloads/BigEarthNet-S2-v2.0.tar.gz"

print("\nExtracting Sentinel-2...")
!tar -xzf /content/data/raw/BigEarthNet-S2-v2.0.tar.gz -C /content/data/raw/
!rm /content/data/raw/BigEarthNet-S2-v2.0.tar.gz
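
The extract-then-delete pattern above can also be written with Python's `tarfile` module. This sketch rejects member paths that would escape the destination folder before writing anything. It does not guard against symlink tricks, and for very large archives the shell `tar` is likely faster. The helper name is illustrative.

```python
import os
import tarfile


def extract_and_remove(archive: str, dest_dir: str) -> list[str]:
    """Extract a .tar.gz into `dest_dir`, then delete the archive.

    Members whose paths would land outside `dest_dir` (e.g. "../x") raise
    ValueError before anything is extracted.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        tar.extractall(dest_root, members=members)
    os.remove(archive)  # only reached if extraction succeeded
    return [m.name for m in members]
```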

In [None]:
# Verify downloads
!ls -lh /content/data/raw/
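
`ls -lh` only shows the top-level folders. Counting files recursively is a quick way to confirm that extraction finished. This sketch assumes the patches are stored as GeoTIFF (`.tif`) files; change `suffix` if they are not.

```python
from pathlib import Path


def count_files(root: str, suffix: str = ".tif") -> int:
    """Count files under `root` (recursively) whose name ends with `suffix`."""
    return sum(1 for p in Path(root).rglob(f"*{suffix}") if p.is_file())
```

Usage: `for name in ["BigEarthNet-S1-v2.0", "BigEarthNet-S2-v2.0"]: print(name, count_files(f"/content/data/raw/{name}"))`.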

## 3. Convert to Paired Tiles

This is the same process as the local pipeline: it pairs the Sentinel-1 and Sentinel-2 patches and writes the paired `.npz` tiles to `benv2_catalog`.
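
Pairing means matching each Sentinel-2 patch with the Sentinel-1 patch covering the same area. `build_tiles.py` is not shown here, so this sketch only illustrates the matching step. It assumes a lookup table from S2 to S1 patch names is available (for example, built from the dataset's metadata). How the script actually obtains that mapping is an assumption.

```python
def pair_patches(
    s2_to_s1: dict[str, str],
    s1_paths: dict[str, str],
    s2_paths: dict[str, str],
) -> tuple[list[tuple[str, str]], list[str]]:
    """Match each S2 patch with its S1 counterpart.

    s2_to_s1: S2 patch name -> S1 patch name (hypothetical lookup table).
    s1_paths, s2_paths: patch name -> location on disk.
    Returns (pairs, unmatched): pairs holds (s1_path, s2_path) tuples,
    unmatched holds S2 names with no S1 patch available.
    """
    pairs, unmatched = [], []
    for s2_name, s2_path in sorted(s2_paths.items()):
        s1_name = s2_to_s1.get(s2_name)
        if s1_name is None or s1_name not in s1_paths:
            unmatched.append(s2_name)
        else:
            pairs.append((s1_paths[s1_name], s2_path))
    return pairs, unmatched
```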

In [None]:
# Run tile conversion (same as your local build_tiles.py)
!python scripts/build_tiles.py \
  --s1-dir /content/data/raw/BigEarthNet-S1-v2.0 \
  --s2-dir /content/data/raw/BigEarthNet-S2-v2.0 \
  --output-dir /content/data/tiles/benv2_catalog \
  --num-workers 4

In [None]:
# Check how many tiles were created
import os
tiles = [f for f in os.listdir('/content/data/tiles/benv2_catalog') if f.endswith('.npz')]
print(f"✓ Created {len(tiles)} paired tiles")
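
A tile count does not catch truncated files, for example from a worker killed mid-write. An `.npz` file is a zip archive of `.npy` arrays, so the standard `zipfile` module can check it without loading any arrays. `testzip()` reads every file in full, so on the whole catalog this takes a while; run it on a sample if needed.

```python
import os
import zipfile


def find_bad_npz(directory: str) -> list[str]:
    """Return .npz files in `directory` that are not readable zip archives."""
    bad = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".npz"):
            continue
        path = os.path.join(directory, name)
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first member with a bad CRC, or None
                if zf.testzip() is not None:
                    bad.append(name)
        except zipfile.BadZipFile:
            bad.append(name)
    return bad
```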

## 4. Run Stage 1 Precompute

Now run the precompute with settings chosen for a T4 GPU (16 GB). If it fails with a CUDA out-of-memory error, lower `--batch-size`.

In [None]:
# Create output directory in Google Drive
!mkdir -p /content/drive/MyDrive/Axion-Sat-Outputs/stage1_precompute

In [None]:
# Run Stage 1 precompute
!python scripts/00_precompute_stage1_fast.py \
  --data-dir /content/data/tiles/benv2_catalog \
  --output-dir /content/drive/MyDrive/Axion-Sat-Outputs/stage1_precompute \
  --batch-size 64 \
  --timesteps 3 \
  --device cuda
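
If batch size 64 does not fit in memory, a simple strategy is to halve it until a run succeeds. This sketch is generic: `try_batch` is a hypothetical callable, such as one that runs the precompute on a few tiles and returns whether it exited cleanly. A non-zero exit can have causes other than memory, so check the error message too.

```python
from typing import Callable


def find_working_batch_size(
    try_batch: Callable[[int], bool], start: int = 64, minimum: int = 1
) -> int:
    """Try start, start // 2, start // 4, ... and return the first that succeeds."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    batch_size = start
    while batch_size >= minimum:
        if try_batch(batch_size):
            return batch_size
        batch_size //= 2  # halve and retry
    raise RuntimeError(f"No batch size >= {minimum} succeeded")
```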

## 5. Verify Outputs

In [None]:
# Check outputs (re-import os in case the runtime restarted since Section 3)
import os

output_dir = '/content/drive/MyDrive/Axion-Sat-Outputs/stage1_precompute'
outputs = sorted(f for f in os.listdir(output_dir) if f.endswith('.npz'))

print(f"\n{'='*80}")
print("✓ Stage 1 Precompute Complete!")
print('=' * 80)
print(f"Generated: {len(outputs)} output files")
print(f"Location:  {output_dir}")
print("\nFirst 5 outputs:")
for f in outputs[:5]:
    print(f"  - {f}")
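
The output count alone does not show whether every tile was processed, which matters if the session disconnected during the long precompute run. If the precompute script names each output after its input tile (an assumption; check the script), comparing filenames lists the tiles that are still missing.

```python
from pathlib import Path


def missing_outputs(input_dir: str, output_dir: str, suffix: str = ".npz") -> list[str]:
    """Names of input tiles with no output file of the same name.

    Assumes (hypothetically) that each output keeps its input tile's filename.
    """
    inputs = {p.name for p in Path(input_dir).glob(f"*{suffix}")}
    outputs = {p.name for p in Path(output_dir).glob(f"*{suffix}")}
    return sorted(inputs - outputs)
```

Usage: `missing_outputs('/content/data/tiles/benv2_catalog', output_dir)`.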

## 6. Clean Up (Optional)

Free up local disk space by deleting the extracted raw data and the tiles; the Stage 1 outputs are already saved to Google Drive.

In [None]:
# Remove raw data to free space (optional)
# Uncomment if you want to clean up:
# !rm -rf /content/data/raw
# !rm -rf /content/data/tiles
# print("✓ Cleaned up temporary files")
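
Deleting the tiles before the outputs are confirmed means rebuilding them from scratch. Here is a sketch of a guarded cleanup that only deletes when the output folder holds at least a given number of `.npz` files. The helper and its threshold are illustrative choices.

```python
import os
import shutil


def cleanup_if_outputs_exist(paths: list[str], output_dir: str,
                             min_outputs: int = 1, suffix: str = ".npz") -> bool:
    """Delete `paths` only if `output_dir` holds at least `min_outputs` output files."""
    n = 0
    if os.path.isdir(output_dir):
        n = sum(1 for f in os.listdir(output_dir) if f.endswith(suffix))
    if n < min_outputs:
        print(f"Skipping cleanup: only {n} outputs found in {output_dir}")
        return False
    for path in paths:
        shutil.rmtree(path, ignore_errors=True)
    print(f"Removed {len(paths)} directories ({n} outputs kept in {output_dir})")
    return True
```

For example, `cleanup_if_outputs_exist(['/content/data/raw', '/content/data/tiles'], output_dir, min_outputs=len(tiles))` only deletes once there is one output per tile.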

## Done! 🎉

Your Stage 1 outputs are now in Google Drive:
`MyDrive/Axion-Sat-Outputs/stage1_precompute/`

### Next Steps:
1. Download outputs to your local D: drive
2. Run Stage 2 training (can also do on Colab!)
3. Evaluate your GAC model
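
Downloading many small files from Drive one at a time can be slow, so one option for step 1 is to bundle the outputs into a single archive first. Here is a minimal sketch using `shutil.make_archive`. Place the archive outside the output folder, and note that it needs roughly as much space as the outputs themselves. The helper name is illustrative.

```python
import shutil


def archive_outputs(output_dir: str, archive_base: str) -> str:
    """Bundle the contents of `output_dir` into `<archive_base>.zip`; return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=output_dir)
```

Usage: `archive_outputs(output_dir, '/content/drive/MyDrive/Axion-Sat-Outputs/stage1_precompute_bundle')`.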