# KonkaniVani ASR Training - Simple Colab Setup
## Resume from Checkpoint 15

**Prerequisites**: Upload the entire project folder to Colab, either directly or as a zip archive that you extract in step 2.

**Configuration:**
- Model: d_model=256, 12 encoder, 6 decoder layers
- Batch size: 2 (gradient accumulation 4x)
- Mixed precision: FP16
- GPU: Tesla T4 (14GB)

---

## 1. Check GPU

In [None]:
!nvidia-smi

## 2. Upload Project Files

Choose ONE option below:

In [None]:
# OPTION A: Upload ZIP file and extract
from google.colab import files
import zipfile

print("📤 Upload your konkani_project.zip file...")
uploaded = files.upload()

# Extract
for filename in uploaded.keys():
    print(f"\n📦 Extracting {filename}...")
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall('/content/')
    print("✅ Extracted!")

!ls -la /content/

In [None]:
# OPTION B: Mount Drive and copy
from google.colab import drive
drive.mount('/content/drive')

# Update this path to your Drive location
DRIVE_PATH = "/content/drive/MyDrive/konkani_project.zip"

# Quote the path so Drive folders with spaces in their names still work
!cp "{DRIVE_PATH}" /content/
!unzip -q /content/konkani_project.zip -d /content/
!ls -la /content/
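
Step 3 depends on the name of the folder the archive unpacks to, so it can help to look inside the zip first. The following is a minimal sketch using only the standard library; `top_level_entries` is a helper name invented here, not part of the project.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    names = set()
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Zip member names always use forward slashes.
            parts = PurePosixPath(name).parts
            if parts:
                names.add(parts[0])
    return sorted(names)
```

For example, `print(top_level_entries("/content/konkani_project.zip"))` would show whether the archive contains a single wrapper folder or the project files directly.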

## 3. Navigate to Project Directory

In [None]:
import os

# Find the project directory
# Update this if your folder has a different name
possible_dirs = [
    '/content/konkani',
    '/content/konkani_project',
    '/content',
]

project_dir = None
for dir_path in possible_dirs:
    if os.path.exists(f"{dir_path}/training_scripts/train_konkanivani_asr.py"):
        project_dir = dir_path
        break

if project_dir:
    print(f"✅ Found project at: {project_dir}")
    %cd {project_dir}
else:
    print("❌ Project not found. Please check the extracted folder name.")
    print("\nAvailable directories:")
    !ls -la /content/
    
!pwd
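
If the archive unpacks to a folder name that is not in the candidate list, a bounded search for the training script is one alternative. This is a sketch only: `find_project_root` and its `max_depth` parameter are hypothetical names, and the depth limit is there to avoid walking a mounted Drive.

```python
from pathlib import Path

MARKER = "training_scripts/train_konkanivani_asr.py"


def find_project_root(search_root, marker=MARKER, max_depth=3):
    """Return the shallowest directory under search_root containing marker, or None."""
    level = [Path(search_root)]
    # Breadth-first: check every directory at one depth before going deeper.
    for _ in range(max_depth + 1):
        next_level = []
        for directory in level:
            if (directory / marker).is_file():
                return directory
            try:
                next_level.extend(sorted(p for p in directory.iterdir() if p.is_dir()))
            except OSError:
                # Unreadable directory: skip it and keep searching.
                continue
        level = next_level
    return None
```

Calling `find_project_root("/content")` returns a `Path` that can be passed to `%cd`, or `None` if nothing matched within the depth limit.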

## 4. Install Dependencies

In [None]:
!pip install -q torch torchaudio librosa soundfile tensorboard tqdm pyyaml

## 5. Verify Project Structure

In [None]:
import os

required_files = [
    'training_scripts/train_konkanivani_asr.py',
    'models/konkanivani_asr.py',
    'data/audio_processing/dataset.py',
    'data/audio_processing/text_tokenizer.py',
    'data/vocab.json',
    'data/konkani-asr-v0/splits/manifests/train.json',
    'data/konkani-asr-v0/splits/manifests/val.json',
    'archives/checkpoint_epoch_15.pt'
]

print("Checking required files...\n")
all_good = True
for file in required_files:
    exists = os.path.exists(file)
    status = "✅" if exists else "❌"
    print(f"{status} {file}")
    if not exists:
        all_good = False

print("\n" + "="*60)
if all_good:
    print("✅ All required files found! Ready to train.")
else:
    print("❌ Some files are missing.")
    print("\nMake sure your zip/folder contains:")
    print("  - training_scripts/")
    print("  - models/")
    print("  - data/")
    print("  - archives/checkpoint_epoch_15.pt")
print("="*60)

## 6. Prepare Checkpoint

In [None]:
!mkdir -p checkpoints
!cp archives/checkpoint_epoch_15.pt checkpoints/
!ls -lh checkpoints/

## 7. Verify Checkpoint

In [None]:
import torch
import json

checkpoint = torch.load('checkpoints/checkpoint_epoch_15.pt', map_location='cpu')

print("📋 Checkpoint Configuration:")
print("="*60)
print(json.dumps(checkpoint.get('config', {}), indent=2))

print("\n📊 Model Architecture:")
print("="*60)
state = checkpoint['model_state_dict']
encoder_layers = sum(1 for k in state.keys() if 'encoder.layers.' in k and '.ff1.0.weight' in k)
decoder_layers = sum(1 for k in state.keys() if 'decoder.decoder.layers.' in k and '.linear1.weight' in k)
d_model = state['encoder.input_proj.weight'].shape[0]
vocab_size = state['ctc_head.weight'].shape[0]

print(f"Encoder layers: {encoder_layers}")
print(f"Decoder layers: {decoder_layers}")
print(f"d_model: {d_model}")
print(f"vocab_size: {vocab_size}")
print(f"Epoch: {checkpoint['epoch']}")
print(f"Val loss: {checkpoint.get('val_loss', 'N/A')}")

del checkpoint
torch.cuda.empty_cache()

print("\n✅ Checkpoint verified!")
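
The layer counts above rely on one specific parameter name per layer. A slightly more general approach, sketched here under the assumption that layers are indexed as `<prefix><n>.` in the key names, is to count the distinct indices that follow a prefix. `count_layers` is a hypothetical helper, and the sample keys in the usage note only mirror the names used in the cell above.

```python
import re


def count_layers(keys, prefix):
    """Count distinct integer indices that directly follow prefix in key names."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.")
    indices = set()
    for key in keys:
        match = pattern.search(key)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)
```

With the checkpoint loaded, `count_layers(state.keys(), "encoder.layers.")` and `count_layers(state.keys(), "decoder.decoder.layers.")` should agree with the counts printed above if the key layout matches that assumption.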

## 8. Setup Environment

In [None]:
import os
import torch
import gc

# Set environment variables
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
os.environ['CUDA_LAUNCH_BLOCKING'] = '0'

# Clear GPU memory
gc.collect()
torch.cuda.empty_cache()

if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Total memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"   Allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"   Cached: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")
    print("\n✅ Ready to train!")
else:
    print("⚠️ CUDA not available!")
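
The environment variables set in this cell matter because the training script is launched with `!python3`, which runs as a child process and normally inherits the notebook kernel's environment. The sketch below checks that inheritance with a plain child Python process; `child_sees` is a name invented for illustration.

```python
import os
import subprocess
import sys

# Same setting as the cell above, applied to the current process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"


def child_sees(name):
    """Ask a child Python process what value it sees for an environment variable."""
    code = f"import os; print(os.environ.get({name!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

If `child_sees("PYTORCH_CUDA_ALLOC_CONF")` returns the value set above, a training process started from the notebook should see it too.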

## 9. 🚀 START TRAINING

### Configuration:
- **Batch size**: 2 (reduced from 8 to fit in 14GB GPU)
- **Gradient accumulation**: 4 steps (effective batch = 8)
- **Mixed precision**: FP16 (roughly halves activation memory; total savings are typically smaller)
- **Model**: d_model=256, 12 encoder, 6 decoder layers
- **Resume from**: Epoch 15
- **Target**: Epochs 16-50
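
The batch-size arithmetic in this list can be sketched as follows. `accumulation_plan` and `num_samples` are hypothetical (the dataset size is not stated here), and the sketch assumes the optimizer also steps on a final partial group of micro-batches, which the training script may or may not do.

```python
import math


def accumulation_plan(num_samples, batch_size, accumulation_steps):
    """Summarize effective batch size and per-epoch step counts."""
    micro_batches = math.ceil(num_samples / batch_size)
    return {
        # Samples contributing to each optimizer update.
        "effective_batch": batch_size * accumulation_steps,
        # Forward/backward passes per epoch.
        "micro_batches": micro_batches,
        # Optimizer updates per epoch.
        "optimizer_steps": math.ceil(micro_batches / accumulation_steps),
    }
```

For an illustrative 1000 samples, `accumulation_plan(1000, 2, 4)` gives an effective batch of 8 with 500 forward/backward passes and 125 optimizer updates per epoch.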

**Expected time**: ~8-12 hours

In [None]:
!python3 training_scripts/train_konkanivani_asr.py \
    --train_manifest data/konkani-asr-v0/splits/manifests/train.json \
    --val_manifest data/konkani-asr-v0/splits/manifests/val.json \
    --vocab_file data/vocab.json \
    --batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_epochs 50 \
    --learning_rate 0.0005 \
    --device cuda \
    --d_model 256 \
    --encoder_layers 12 \
    --decoder_layers 6 \
    --mixed_precision \
    --checkpoint_dir checkpoints \
    --log_dir logs \
    --resume checkpoints/checkpoint_epoch_15.pt

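## 10. Monitor GPU (Run While Training)

Notebook cells execute one at a time, so this cell only runs once the training cell has finished or been interrupted. To watch the GPU during training, run `nvidia-smi` from a separate terminal session if your Colab plan provides one.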

In [None]:
!nvidia-smi

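## 11. View TensorBoard

To follow metrics live, run this cell before starting the training cell; TensorBoard keeps reading new event files from `logs` as they are written.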

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

## 12. Backup to Drive (Optional)

In [None]:
import os  # needed for the mount check below
import time
from google.colab import drive

# Mount drive if not already mounted
if not os.path.exists('/content/drive'):
    drive.mount('/content/drive')

BACKUP_PATH = "/content/drive/MyDrive/konkanivani_backup"

print(f"📤 Backing up to: {BACKUP_PATH}")
print(f"   Time: {time.strftime('%Y-%m-%d %H:%M:%S')}\n")

!mkdir -p {BACKUP_PATH}
!cp -r checkpoints {BACKUP_PATH}/
!cp -r logs {BACKUP_PATH}/

print("\n✅ Backup completed!")
!ls -lh {BACKUP_PATH}/checkpoints/
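
`cp -r` copies every checkpoint again on each run. For repeated backups during a long session, an incremental copy is one alternative. This is a sketch with an invented helper name, `backup_new_files`; it treats a file as unchanged when size and modification time match, which assumes the destination filesystem preserves modification times.

```python
import shutil
from pathlib import Path


def backup_new_files(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir, skipping files that look unchanged."""
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(src)
        target = dst / relative
        if target.exists():
            s, t = path.stat(), target.stat()
            # Same size and not newer than the backup: assume unchanged.
            if s.st_size == t.st_size and int(s.st_mtime) <= int(t.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied.append(relative.as_posix())
    return copied
```

A call such as `backup_new_files("checkpoints", BACKUP_PATH + "/checkpoints")` returns the list of files it actually copied.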

## 13. Download Best Model

In [None]:
from google.colab import files
from pathlib import Path

!ls -lh checkpoints/

if Path('checkpoints/best_model.pt').exists():
    print("\n📥 Downloading best_model.pt...")
    files.download('checkpoints/best_model.pt')
    print("✅ Downloaded!")
else:
    print("⚠️ best_model.pt not found yet")

## 14. If Out of Memory - Use This

In [None]:
# Clear memory
import torch
import gc
gc.collect()
torch.cuda.empty_cache()

# Run with batch_size=1 and 8 accumulation steps (effective batch stays at 8)
!python3 training_scripts/train_konkanivani_asr.py \
    --train_manifest data/konkani-asr-v0/splits/manifests/train.json \
    --val_manifest data/konkani-asr-v0/splits/manifests/val.json \
    --vocab_file data/vocab.json \
    --batch_size 1 \
    --gradient_accumulation_steps 8 \
    --num_epochs 50 \
    --learning_rate 0.0005 \
    --device cuda \
    --d_model 256 \
    --encoder_layers 12 \
    --decoder_layers 6 \
    --mixed_precision \
    --checkpoint_dir checkpoints \
    --log_dir logs \
    --resume checkpoints/checkpoint_epoch_15.pt
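
The fallback above halves the batch size and doubles the accumulation steps so that the effective batch stays the same. A small sketch of that rule, with the hypothetical helper name `fallback_settings`:

```python
def fallback_settings(batch_size, accumulation_steps=1):
    """List (batch_size, accumulation_steps) pairs with a constant product."""
    settings = [(batch_size, accumulation_steps)]
    # Halve the batch and double the accumulation until batch_size reaches 1
    # (or can no longer be halved evenly).
    while batch_size > 1 and batch_size % 2 == 0:
        batch_size //= 2
        accumulation_steps *= 2
        settings.append((batch_size, accumulation_steps))
    return settings
```

Starting from the original batch size of 8, `fallback_settings(8)` lists 8x1, 4x2, 2x4, and 1x8; the last two are the settings used in sections 9 and 14.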

---

## Quick Reference

### Check GPU
```python
!nvidia-smi
```

### Clear Memory
```python
import torch
torch.cuda.empty_cache()
```

### List Checkpoints
```python
!ls -lh checkpoints/
```

### Resume from Different Checkpoint
```python
# In the training command, change the --resume argument, for example:
#   --resume checkpoints/checkpoint_epoch_20.pt
```
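
To pick the most recent checkpoint automatically, one option is to parse the epoch number from the file name. This sketch assumes checkpoints follow the `checkpoint_epoch_<N>.pt` pattern seen in this notebook; `latest_checkpoint` is a hypothetical helper, not part of the training script.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return (epoch, path) for the highest-numbered checkpoint, or None."""
    pattern = re.compile(r"^checkpoint_epoch_(\d+)\.pt$")
    best = None
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pt"):
        match = pattern.match(path.name)
        if not match:
            continue
        epoch = int(match.group(1))  # compare numerically, so 15 > 9
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best
```

The returned path can then be passed to `--resume` in the training command.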

---