# KonkaniVani ASR Training - From Google Drive
## Resume from Checkpoint 15

**Drive Folder**: https://drive.google.com/drive/folders/1-chxczmcNooqLDtsFgQ8ZT8NvzFuFARr

**Configuration:**
- Model: d_model=256, 12 encoder, 6 decoder layers
- Batch size: 2 (gradient accumulation 4x = effective batch 8)
- Mixed precision: FP16
- GPU: Tesla T4 (16 GB, about 15 GB usable in Colab)

---

## 1. Check GPU

In [None]:
!nvidia-smi

## 2. Install Dependencies

In [None]:
!pip install -q torch torchaudio librosa soundfile tensorboard tqdm pyyaml

## 3. Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 4. Locate Your Project Files

**IMPORTANT**: Run the cell below to find your project folder on Drive. You will set its path in step 5.

In [None]:
import glob

print("📂 Checking Drive structure...\n")

# Candidate locations; the Shareddrives entry is a glob pattern
possible_paths = [
    "/content/drive/MyDrive/konkani",
    "/content/drive/Shareddrives/*/konkani",
    "/content/drive/MyDrive",
]

# Report which candidates exist (glob expands the * wildcard)
for pattern in possible_paths:
    matches = glob.glob(pattern)
    for match in matches:
        print(f"✅ Found: {match}")
    if not matches:
        print(f"❌ Not found: {pattern}")

# List what's in MyDrive
!ls -la /content/drive/MyDrive/ | head -20

print("\n" + "="*60)
print("👆 Look for your project folder name above")
print("="*60)

## 5. Set Project Path and Copy to Colab

**IMPORTANT**: First upload your project to Google Drive!

### Option A: Upload as ZIP
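1. On your local machine, from inside the project folder: `zip -r konkani_project.zip .`
2. Upload the ZIP to Google Drive
3. Set `USE_ZIP = True` and update `ZIP_PATH` below

### Option B: Upload folder directly
1. Upload the entire `konkani` folder to Drive
2. Set `USE_ZIP = False` and update `DRIVE_FOLDER_PATH` below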

In [None]:
import os
from pathlib import Path

# ========================================
# OPTION A: Extract from ZIP file
# ========================================
USE_ZIP = True  # True: extract the ZIP (Option A); False: copy the folder (Option B)
ZIP_PATH = "/content/drive/MyDrive/konkani_project.zip"  # Update this

# ========================================
# OPTION B: Copy from folder
# ========================================
DRIVE_FOLDER_PATH = "/content/drive/MyDrive/konkani"  # Update this

# ========================================

%cd /content

if USE_ZIP:
    print(f"📦 Extracting from: {ZIP_PATH}")
    if Path(ZIP_PATH).exists():
        # The ZIP was created from inside the project folder, so extract into a named directory.
        # -o overwrites existing files without prompting when the cell is re-run.
        !unzip -q -o "{ZIP_PATH}" -d /content/konkani
        # Show the extracted contents
        !ls -la /content/konkani
        print("\n✅ Extracted to /content/konkani. Check the listing above and update the next cell if needed.")
    else:
        print(f"❌ ZIP file not found at: {ZIP_PATH}")
        print("\n📝 Please:")
        print("   1. Create zip: zip -r konkani_project.zip .")
        print("   2. Upload to Google Drive")
        print("   3. Update ZIP_PATH above")
else:
    print(f"📋 Copying from: {DRIVE_FOLDER_PATH}")
    if Path(DRIVE_FOLDER_PATH).exists():
        !cp -r "{DRIVE_FOLDER_PATH}" /content/konkani
        print("✅ Copied to /content/konkani")
    else:
        print(f"❌ Folder not found at: {DRIVE_FOLDER_PATH}")
        print("\n📝 Please upload your project folder to Google Drive")
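
Whether the extracted files land directly in the target directory or inside a nested folder depends on how the archive was built. A minimal sketch for inspecting the layout before extraction, using only the standard library; `zip_top_level` is a hypothetical helper name, not part of the project:

```python
import zipfile
from pathlib import PurePosixPath

def zip_top_level(zip_path):
    """Return the sorted top-level names stored in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({
            PurePosixPath(name).parts[0]
            for name in archive.namelist()
            if name.strip("/")  # skip empty or root-only entries
        })

# Example call (the path is the notebook's ZIP_PATH):
# print(zip_top_level("/content/drive/MyDrive/konkani_project.zip"))
```

A single entry such as `konkani` suggests the archive wraps the project in one folder, while several entries such as `data` and `models` suggest it was created from inside the project folder.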

## 5b. Navigate to Project Directory

In [None]:
# Update this if your extracted folder has a different name
PROJECT_DIR = "/content/konkani"  # or "/content/konkani_project" etc.

%cd {PROJECT_DIR}
!pwd
!ls -la

## 6. Verify Required Files

In [None]:
import os

required_files = [
    'training_scripts/train_konkanivani_asr.py',
    'models/konkanivani_asr.py',
    'data/audio_processing/dataset.py',
    'data/audio_processing/text_tokenizer.py',
    'data/vocab.json',
    'data/konkani-asr-v0/splits/manifests/train.json',
    'data/konkani-asr-v0/splits/manifests/val.json',
    'archives/checkpoint_epoch_15.pt'
]

print("Checking required files...\n")
all_good = True
for file in required_files:
    exists = os.path.exists(file)
    status = "✅" if exists else "❌"
    print(f"{status} {file}")
    if not exists:
        all_good = False

print("\n" + "="*60)
if all_good:
    print("✅ All required files found! Ready to train.")
else:
    print("❌ Some files are missing. Please check your Drive folder.")
print("="*60)

## 7. Prepare Checkpoint

In [None]:
!mkdir -p checkpoints
!cp archives/checkpoint_epoch_15.pt checkpoints/
!ls -lh checkpoints/

## 8. Verify Checkpoint Configuration

In [None]:
import torch
import json

checkpoint = torch.load('checkpoints/checkpoint_epoch_15.pt', map_location='cpu')

print("📋 Checkpoint Configuration:")
print("="*60)
print(json.dumps(checkpoint.get('config', {}), indent=2))

print("\n📊 Model Architecture:")
print("="*60)
state = checkpoint['model_state_dict']
encoder_layers = sum(1 for k in state.keys() if 'encoder.layers.' in k and '.ff1.0.weight' in k)
decoder_layers = sum(1 for k in state.keys() if 'decoder.decoder.layers.' in k and '.linear1.weight' in k)
d_model = state['encoder.input_proj.weight'].shape[0]
vocab_size = state['ctc_head.weight'].shape[0]

print(f"Encoder layers: {encoder_layers}")
print(f"Decoder layers: {decoder_layers}")
print(f"d_model: {d_model}")
print(f"vocab_size: {vocab_size}")
print(f"Epoch: {checkpoint['epoch']}")
print(f"Val loss: {checkpoint.get('val_loss', 'N/A')}")

del checkpoint
torch.cuda.empty_cache()

print("\n✅ Checkpoint verified!")
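
The layer counts in this cell rely on each layer contributing exactly one key that matches the substring filter. An alternative sketch, assuming the key naming seen in this checkpoint (`encoder.layers.<i>.…`), collects the distinct layer indices instead. `count_layers` and the sample keys are illustrative only:

```python
import re

def count_layers(keys, prefix):
    """Count distinct integer layer indices that follow `prefix` in state-dict keys."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.")
    indices = {int(m.group(1)) for k in keys if (m := pattern.search(k))}
    return len(indices)

# Hypothetical keys that mimic the checkpoint's naming scheme
sample_keys = [
    "encoder.layers.0.ff1.0.weight",
    "encoder.layers.0.ff1.0.bias",
    "encoder.layers.1.ff1.0.weight",
    "decoder.decoder.layers.0.linear1.weight",
]
encoder_count = count_layers(sample_keys, "encoder.layers.")
decoder_count = count_layers(sample_keys, "decoder.decoder.layers.")
print(encoder_count, decoder_count)
```

With a real checkpoint, `keys` would be `state.keys()` from the cell above.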

## 9. Set Environment Variables

In [None]:
import os

os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
os.environ['CUDA_LAUNCH_BLOCKING'] = '0'

print("✅ Environment variables set for memory optimization")
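
Variables set through `os.environ` are inherited by child processes, which is why the training script launched later with `!python3` can see them. A small standard-library check of that behavior:

```python
import os
import subprocess
import sys

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Launch a child interpreter and read the variable back from its environment
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('PYTORCH_CUDA_ALLOC_CONF'))"],
    capture_output=True, text=True, check=True,
)
child_value = result.stdout.strip()
print(child_value)
```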

## 10. Clear GPU Memory

In [None]:
import torch
import gc

gc.collect()
torch.cuda.empty_cache()

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Total memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"Allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"Cached: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")
    print("\n✅ Ready to train!")
else:
    print("⚠️ CUDA not available!")

## 11. 🚀 START TRAINING

### Memory-Optimized Configuration:
- **Batch size**: 2 (reduced from 8)
- **Gradient accumulation**: 4 steps (effective batch = 8)
- **Mixed precision**: FP16
- **Model**: d_model=256, 12 encoder, 6 decoder layers
- **Resume from**: Epoch 15

**Expected time**: ~8-12 hours for epochs 16-50
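
Gradient accumulation sums the gradients of several small batches before a single optimizer step. The plain-Python sketch below uses a toy one-parameter least-squares model, not the training script's actual loop, to show that averaging the gradients of 4 micro-batches of size 2 reproduces the gradient of one batch of 8:

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Toy data: 8 (x, y) pairs, purely illustrative
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]
w = 0.5
batch_size, accumulation_steps = 2, 4

# Reference: gradient over all 8 samples at once
full_grad = grad_mse(w, data)

# Accumulate: scale each micro-batch gradient by 1/accumulation_steps, then sum
accumulated = 0.0
for step in range(accumulation_steps):
    micro = data[step * batch_size:(step + 1) * batch_size]
    accumulated += grad_mse(w, micro) / accumulation_steps

print(full_grad, accumulated)
```

The two values agree up to floating-point rounding, which is why batch size 2 with 4 accumulation steps behaves like an effective batch of 8 while holding only 2 samples in memory at a time.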

In [None]:
!python3 training_scripts/train_konkanivani_asr.py \
    --train_manifest data/konkani-asr-v0/splits/manifests/train.json \
    --val_manifest data/konkani-asr-v0/splits/manifests/val.json \
    --vocab_file data/vocab.json \
    --batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_epochs 50 \
    --learning_rate 0.0005 \
    --device cuda \
    --d_model 256 \
    --encoder_layers 12 \
    --decoder_layers 6 \
    --mixed_precision \
    --checkpoint_dir checkpoints \
    --log_dir logs \
    --resume checkpoints/checkpoint_epoch_15.pt

## 12. Monitor GPU (Run in Separate Cell While Training)

In [None]:
!nvidia-smi
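
Colab runs cells one at a time, so a monitoring cell queued during training only executes once training ends. One workaround, sketched here with the hypothetical helper `start_gpu_logger`, is to start `nvidia-smi` as its own looping process before launching training and read the log afterwards:

```python
import shutil
import subprocess

def start_gpu_logger(log_path="gpu_log.csv", interval_s=30):
    """Start nvidia-smi in loop mode, appending CSV samples to log_path.

    Returns the Popen handle, or None when nvidia-smi is not installed.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    # The child process keeps its own copy of the file handle after this block exits
    with open(log_path, "a") as log_file:
        return subprocess.Popen(
            ["nvidia-smi",
             "--query-gpu=timestamp,utilization.gpu,memory.used,memory.total",
             "--format=csv",
             "-l", str(interval_s)],
            stdout=log_file, stderr=subprocess.STDOUT,
        )

# Usage: run before the training cell, stop afterwards
# gpu_logger = start_gpu_logger()
# ... training ...
# if gpu_logger is not None:
#     gpu_logger.terminate()
```

Because the logger is a separate process, it keeps sampling while the training cell occupies the notebook kernel.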

## 13. View TensorBoard Logs

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

## 14. Backup to Google Drive (Run Every Few Epochs)

In [None]:
import shutil
from pathlib import Path
import time

# Set your backup location
BACKUP_PATH = "/content/drive/MyDrive/konkanivani_backup"

print(f"📤 Backing up to: {BACKUP_PATH}")
print(f"   Time: {time.strftime('%Y-%m-%d %H:%M:%S')}\n")

!mkdir -p {BACKUP_PATH}
!cp -r checkpoints {BACKUP_PATH}/
!cp -r logs {BACKUP_PATH}/

print("\n✅ Backup completed!")
!ls -lh {BACKUP_PATH}/checkpoints/
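
`cp -r` recopies every checkpoint on each backup, which gets slower as files accumulate. A sketch of an incremental alternative, where `backup_new_files` is a hypothetical helper that copies only files missing from the destination or newer than the existing copy:

```python
import shutil
from pathlib import Path

def backup_new_files(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir when absent or older there; return copied names."""
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        if target.exists() and target.stat().st_mtime >= path.stat().st_mtime:
            continue  # destination copy is already up to date
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied.append(str(path.relative_to(src)))
    return copied

# Usage with the notebook's BACKUP_PATH:
# backup_new_files("checkpoints", f"{BACKUP_PATH}/checkpoints")
```

Timestamp handling on the Drive mount may differ from a local disk, so treat the skip logic as an assumption to verify on the first few backups.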

## 15. If Out of Memory - Use This Instead

In [None]:
# Clear memory first
import torch
import gc
gc.collect()
torch.cuda.empty_cache()

# Run with batch_size=1 (slower but uses less memory)
!python3 training_scripts/train_konkanivani_asr.py \
    --train_manifest data/konkani-asr-v0/splits/manifests/train.json \
    --val_manifest data/konkani-asr-v0/splits/manifests/val.json \
    --vocab_file data/vocab.json \
    --batch_size 1 \
    --gradient_accumulation_steps 8 \
    --num_epochs 50 \
    --learning_rate 0.0005 \
    --device cuda \
    --d_model 256 \
    --encoder_layers 12 \
    --decoder_layers 6 \
    --mixed_precision \
    --checkpoint_dir checkpoints \
    --log_dir logs \
    --resume checkpoints/checkpoint_epoch_15.pt

## 16. Download Best Model

In [None]:
from pathlib import Path
from google.colab import files

# List all checkpoints
!ls -lh checkpoints/

# Download best model
if Path('checkpoints/best_model.pt').exists():
    print("\n📥 Downloading best_model.pt...")
    files.download('checkpoints/best_model.pt')
    print("✅ Downloaded!")
else:
    print("⚠️ best_model.pt not found yet")

---

## Quick Troubleshooting

### Out of Memory
```python
# Run cell 15 instead (batch_size=1)
```

### Files Not Found
```python
# Update ZIP_PATH or DRIVE_FOLDER_PATH in cell 5
# Make sure all files are in your Drive folder
```

### Session Disconnected
```python
# Resume from latest checkpoint
# Change --resume to latest checkpoint_epoch_XX.pt
```
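
Picking the latest checkpoint can be automated. A minimal sketch, assuming checkpoints follow the `checkpoint_epoch_XX.pt` naming used in this notebook (`latest_checkpoint` is a hypothetical helper):

```python
import re
from pathlib import Path

def latest_checkpoint(checkpoint_dir="checkpoints"):
    """Return the checkpoint_epoch_<N>.pt path with the highest N, or None."""
    pattern = re.compile(r"checkpoint_epoch_(\d+)\.pt$")
    best = None
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pt"):
        match = pattern.search(path.name)
        # Compare epochs numerically so that epoch 9 sorts before epoch 15
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), path)
    return best[1] if best else None

# Usage: print(latest_checkpoint("checkpoints"))
```

The returned path can then be passed to `--resume` in the training command.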

### Slow Training
```python
# Check GPU is being used
!nvidia-smi
# Should show ~80-95% GPU utilization
```

---