# 🎙️ KonkaniVani ASR Training - Shared Folder Setup

**Training from shared folder link**

---

## 📋 Before You Start:

1. **Add shared folder to your Drive**:
   - Open: https://drive.google.com/drive/folders/1KX7k_z2negFKq3qFjHJh-K1U-MEcNp7P
   - Click "Add shortcut to Drive" (⭐ icon)
   - Choose "My Drive"

2. **Enable GPU in Colab**:
   - Runtime → Change runtime type → GPU (T4) → Save

3. **Run cells in order** (1 → 8)

---

**Estimated time**: 10 min setup + 12 hours training

---

## 📦 Cell 1: Install Dependencies

**Time**: ~2 minutes

In [None]:
print("📦 Installing dependencies...")
!pip install -q torch torchaudio tensorboard jiwer pyyaml soundfile
print("✅ Dependencies installed!\n")

# Check GPU
print("🔍 GPU Check:")
!nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
print("\n✅ If you see 'Tesla T4, 15360 MiB' above, you're ready!")

## 💾 Cell 2: Mount Drive & Find Shared Folder

**Time**: ~1 minute

**Important**: Make sure you added the shared folder shortcut to "My Drive" before running this!

In [None]:
from google.colab import drive
import os
import subprocess

# Mount Drive
drive.mount('/content/drive')

print("\n" + "="*70)
print("🔍 ACCESSING SHARED FOLDER")
print("="*70 + "\n")

print("Searching for shared files...\n")

# Search for the files
result = subprocess.run(
    ["find", "/content/drive", "-name", "checkpoint_epoch_15.pt", "-o", "-name", "konkani_project.zip"],
    capture_output=True,
    text=True,
    timeout=30
)

found_files = [f for f in result.stdout.strip().split('\n') if f]

if found_files:
    print("✅ Found files:\n")
    for f in found_files:
        print(f"   {f}")
    
    # Get the folder path
    folder_path = os.path.dirname(found_files[0])
    print(f"\n✅ Folder location: {folder_path}\n")
    
    # Verify all 3 files
    files_to_check = ['checkpoint_epoch_15.pt', 'konkani_project.zip', 'vocab.json']
    print("📋 Verifying files:\n")
    
    all_found = True
    for filename in files_to_check:
        filepath = os.path.join(folder_path, filename)
        if os.path.exists(filepath):
            size_mb = os.path.getsize(filepath) / (1024*1024)
            print(f"✅ {filename} ({size_mb:.1f} MB)")
        else:
            print(f"❌ {filename} - NOT FOUND")
            all_found = False
    
    if all_found:
        # Save the folder path for next cells
        with open('/tmp/folder_path.txt', 'w') as f:
            f.write(folder_path)
        print("\n" + "="*70)
        print("✅ ALL FILES FOUND! Ready to proceed.")
        print("="*70)
    else:
        print("\n⚠️  Some files missing!")
else:
    print("❌ Could not find files!")
    print("\n🔧 Troubleshooting:")
    print("1. Did you add the shared folder shortcut to 'My Drive'?")
    print("2. Open this link and click 'Add shortcut to Drive':")
    print("   https://drive.google.com/drive/folders/1KX7k_z2negFKq3qFjHJh-K1U-MEcNp7P")
    print("3. Refresh this page and try again")
    print("\nSearching all of Drive (this may take a moment)...")
    !find /content/drive -name "*.pt" -o -name "konkani_project.zip" 2>/dev/null | head -10

## 📂 Cell 3: Extract Project

**Time**: ~3 minutes

This extracts the 14GB project zip file.
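
Before extracting an archive of this size, it can help to confirm the runtime has enough free disk space. The following is a minimal sketch and not part of the original notebook: the helper name `has_room_for` and the 2x headroom factor are assumptions chosen for illustration; only the 14 GB figure comes from the text above.

```python
import shutil


def has_room_for(path, required_gb, headroom=2.0):
    """Return True if `path` has at least required_gb * headroom GB free.

    `headroom` is a hypothetical safety factor: the extracted files may
    need more space than the zip itself.
    """
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gb >= required_gb * headroom
```

In Colab this could be called as `has_room_for("/content", 14)` before running the extraction; if it returns `False`, freeing space first is likely safer than letting `unzip` fail partway through.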

In [None]:
import os

# Check if Cell 2 found the files
if not os.path.exists('/tmp/folder_path.txt'):
    print("❌ ERROR: Cell 2 didn't find the files!\n")
    print("🔧 Please do this:")
    print("1. Open: https://drive.google.com/drive/folders/1KX7k_z2negFKq3qFjHJh-K1U-MEcNp7P")
    print("2. Click 'Add shortcut to Drive' (⭐ icon at top)")
    print("3. Choose 'My Drive'")
    print("4. Go back and re-run Cell 2")
    print("5. Then come back and run this cell\n")
    print("⚠️  STOP HERE - Don't run more cells until Cell 2 succeeds!")
    raise SystemExit("Please fix Cell 2 first")

# Get folder path from previous cell
with open('/tmp/folder_path.txt', 'r') as f:
    folder_path = f.read().strip()

print(f"📂 Using files from: {folder_path}\n")
print("📦 Extracting konkani_project.zip...")
print("   This takes 2-3 minutes...\n")

# Extract (the path is quoted in case the Drive folder name contains spaces)
zip_path = os.path.join(folder_path, 'konkani_project.zip')
!unzip -q "{zip_path}" -d /content/

print("✅ Extraction complete!\n")

# Find project location
print("🔍 Locating project files...\n")

possible_paths = [
    '/content/konkani',
    '/content/konkani_project',
    '/content'
]

project_path = None
for path in possible_paths:
    if os.path.exists(f"{path}/training_scripts/train_konkanivani_asr.py"):
        project_path = path
        break

if project_path:
    print(f"✅ Project found at: {project_path}")
    os.chdir(project_path)
    print(f"✅ Working directory: {os.getcwd()}\n")
    
    # Verify key files
    print("📋 Verifying extracted files:\n")
    key_files = [
        'training_scripts/train_konkanivani_asr.py',
        'models/konkanivani_asr.py',
        'data/audio_processing/dataset.py',
        'data/konkani-asr-v0/splits/manifests/train.json'
    ]
    
    for f in key_files:
        status = "✅" if os.path.exists(f) else "❌"
        print(f"{status} {f}")
    
    # Save project path for next cells
    with open('/tmp/project_path.txt', 'w') as f:
        f.write(project_path)
    
    print("\n✅ Ready for next step!")
else:
    print("❌ Could not find project!")
    print("\nSearching...")
    !find /content -name "train_konkanivani_asr.py" -type f 2>/dev/null
    print("\n💡 Tip: The training script should be in training_scripts/ folder")

## 📋 Cell 4: Copy Checkpoint & Vocab

**Time**: ~30 seconds

Copies the checkpoint and vocabulary files to the project directory.

In [None]:
import os
import shutil

# Get paths from previous cells
with open('/tmp/folder_path.txt', 'r') as f:
    folder_path = f.read().strip()

with open('/tmp/project_path.txt', 'r') as f:
    project_path = f.read().strip()

os.chdir(project_path)

print("📋 Setting up checkpoint and vocab...\n")

# Create checkpoints directory
os.makedirs('checkpoints', exist_ok=True)

# Copy checkpoint
checkpoint_src = os.path.join(folder_path, 'checkpoint_epoch_15.pt')
checkpoint_dst = 'checkpoints/checkpoint_epoch_15.pt'

if os.path.exists(checkpoint_src):
    print("📥 Copying checkpoint...")
    shutil.copy(checkpoint_src, checkpoint_dst)
    size_mb = os.path.getsize(checkpoint_dst) / (1024*1024)
    print(f"✅ Checkpoint ready ({size_mb:.1f} MB)")
else:
    print("❌ Checkpoint not found!")

# Copy vocab to the project root if needed (Cell 8 packages it from here).
# Note: Cell 5 reads data/vocab.json instead, so that file must already
# exist in the extracted project.
vocab_src = os.path.join(folder_path, 'vocab.json')
if not os.path.exists('vocab.json') and os.path.exists(vocab_src):
    shutil.copy(vocab_src, 'vocab.json')
    print("✅ Copied vocab.json")
elif os.path.exists('vocab.json'):
    print("✅ vocab.json already present")
else:
    print("⚠️  vocab.json not found")

print("\n" + "="*70)
print("✅ SETUP COMPLETE! Ready to train.")
print("="*70)

## 🚀 Cell 5: Start Training!

**Time**: ~12 hours

**⚠️ IMPORTANT**: Keep this browser tab open during training!

This will:
- Resume from Epoch 15
- Train until Epoch 50 (35 epochs remaining)
- Save checkpoints every 5 epochs
- Save best model when validation improves
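
The schedule above can be worked out with a little arithmetic. This is a rough sketch under one stated assumption: that periodic checkpoints land on epochs divisible by 5. The training script's actual rule is not shown here, and `checkpoint_epochs` is a hypothetical helper, not part of the project.

```python
def checkpoint_epochs(resume_epoch, final_epoch, every):
    """List the epochs after `resume_epoch` at which a periodic checkpoint
    would be written, assuming saves happen when epoch % every == 0."""
    return [e for e in range(resume_epoch + 1, final_epoch + 1) if e % every == 0]


remaining_epochs = 50 - 15                    # epochs 16 through 50
minutes_per_epoch = 12 * 60 / remaining_epochs  # from the ~12 hour estimate

print(remaining_epochs)              # 35
print(round(minutes_per_epoch, 1))   # 20.6
print(checkpoint_epochs(15, 50, 5))  # [20, 25, 30, 35, 40, 45, 50]
```

Under that assumption, a disconnect costs at most about five epochs of work, or roughly 100 minutes at the estimated pace, provided the checkpoints have been backed up.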

In [None]:
import os

print("="*70)
print("🚀 STARTING KONKANIVANI ASR TRAINING")
print("="*70)

# Verify checkpoint
checkpoint_path = "checkpoints/checkpoint_epoch_15.pt"
if os.path.exists(checkpoint_path):
    print("\n✅ Resuming from checkpoint_epoch_15.pt")
    print("   Training: Epoch 16 → 50 (35 epochs)")
    print("   Estimated time: ~12 hours")
    print("   Using Account B's Colab quota\n")
    resume_flag = f"--resume {checkpoint_path}"
else:
    print("\n⚠️  Starting from scratch")
    print("   Training: Epoch 1 → 50")
    print("   Estimated time: ~20 hours\n")
    resume_flag = ""

print("📊 Configuration:")
print("   • GPU: Tesla T4")
print("   • Batch size: 2 (gradient accumulation: 4x = effective batch 8)")
print("   • Mixed precision: FP16 (saves GPU memory)")
print("   • Model: d_model=256, 12 encoder, 6 decoder layers")
print("   • Checkpoints: Every 5 epochs")
print("   • Data: From shared folder\n")

print("="*70)
print("TRAINING STARTED - KEEP THIS TAB OPEN!")
print("="*70 + "\n")

# Start training
!python3 training_scripts/train_konkanivani_asr.py \
    --train_manifest data/konkani-asr-v0/splits/manifests/train.json \
    --val_manifest data/konkani-asr-v0/splits/manifests/val.json \
    --vocab_file data/vocab.json \
    --batch_size 2 \
    --gradient_accumulation_steps 4 \
    --num_epochs 50 \
    --learning_rate 0.0005 \
    --device cuda \
    --d_model 256 \
    --encoder_layers 12 \
    --decoder_layers 6 \
    --mixed_precision \
    --checkpoint_dir checkpoints \
    --log_dir logs \
    {resume_flag}

## 📊 Cell 6: Monitor Progress

**Run this anytime** to check training status.

Shows:
- Saved checkpoints
- GPU usage
- Recent training logs

In [None]:
from datetime import datetime

print(f"📊 Training Status - {datetime.now().strftime('%H:%M:%S')}\n")
print("="*70)

# Checkpoints (the fallback is grouped with ls so it fires when the folder is missing)
print("\n💾 Saved Checkpoints:\n")
!(ls -lth checkpoints/ 2>/dev/null || echo "No checkpoints yet") | head -8

# GPU
print("\n🔥 GPU Usage:\n")
!nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu --format=csv,noheader

# Logs
print("\n📝 Recent Training Log:\n")
!tail -25 logs/training.log 2>/dev/null || echo "Log not created yet"

print("\n" + "="*70)

## 💾 Cell 7: Backup to Drive

**Run this every 2-3 hours** to save your progress!

Backs up:
- All checkpoints
- Training logs

Saved to: `MyDrive/konkanivani_backups/`

In [None]:
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
backup_path = f"/content/drive/MyDrive/konkanivani_backups/backup_{timestamp}"

print("💾 Backing up to your Drive...\n")
print(f"Location: {backup_path}\n")

!mkdir -p {backup_path}
!cp -r checkpoints/* {backup_path}/ 2>/dev/null
!cp -r logs {backup_path}/ 2>/dev/null

print("✅ Backup complete!\n")
print("📋 Backed up files:\n")
!ls -lh {backup_path}/

print("\n💡 Tip: Run this cell every 2-3 hours to save progress!")

## 📥 Cell 8: Download Final Model

**Run this after training completes!**

This will:
1. Package the best model with all needed files
2. Save to your Drive
3. Download to your computer

You'll get a zip file with:
- `best_model.pt` - Your trained model
- `vocab.json` - Vocabulary
- `models/` - Model architecture code
- `inference_konkanivani.py` - Script to use the model
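
After downloading, the archive can be checked against the list above. The following is a sketch only: `missing_entries` is a hypothetical helper, and the entry names assume the `final_model/` prefix that results from zipping that folder, as Cell 8 does.

```python
import zipfile

# Entries expected inside the archive, based on the list above.
EXPECTED_FILES = (
    "final_model/best_model.pt",
    "final_model/vocab.json",
    "final_model/inference_konkanivani.py",
)
MODELS_PREFIX = "final_model/models/"


def missing_entries(zip_path):
    """Return the expected entries that are absent from the zip.

    The models/ folder counts as present if at least one file sits under it.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
    missing = [name for name in EXPECTED_FILES if name not in names]
    has_model_code = any(
        n.startswith(MODELS_PREFIX) and n != MODELS_PREFIX for n in names
    )
    if not has_model_code:
        missing.append(MODELS_PREFIX)
    return missing
```

An empty list means the package is complete. Because the notebook copies `inference_konkanivani.py` with errors suppressed, that file is the one most likely to be reported missing.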

In [None]:
from google.colab import files
from datetime import datetime
import os

print("📦 Packaging final model...\n")

# Create package
!mkdir -p final_model

# Copy best model (or fall back to the final-epoch checkpoint)
if os.path.exists('checkpoints/best_model.pt'):
    !cp checkpoints/best_model.pt final_model/
    print("✅ Using best_model.pt")
elif os.path.exists('checkpoints/checkpoint_epoch_50.pt'):
    !cp checkpoints/checkpoint_epoch_50.pt final_model/best_model.pt
    print("✅ Using checkpoint_epoch_50.pt as best_model.pt")
else:
    print("❌ No best_model.pt or checkpoint_epoch_50.pt found in checkpoints/")

# Copy supporting files
!cp vocab.json final_model/
!cp -r models final_model/
!cp inference_konkanivani.py final_model/ 2>/dev/null

print("✅ Copied supporting files\n")

# Create zip
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
zip_name = f"konkanivani_final_{timestamp}.zip"
!zip -r {zip_name} final_model/

print(f"\n✅ Created: {zip_name}")
!ls -lh {zip_name}

# Save to Drive
!mkdir -p /content/drive/MyDrive/konkanivani_final_models
!cp {zip_name} /content/drive/MyDrive/konkanivani_final_models/

print("\n✅ Saved to Drive!")
print(f"   Location: MyDrive/konkanivani_final_models/{zip_name}")

# Download
print("\n📥 Downloading to your computer...")
files.download(zip_name)

print("\n" + "="*70)
print("✅ TRAINING COMPLETE!")
print("="*70)
print("\nYour model is ready to use!")
print("Extract the zip and use inference_konkanivani.py to test it.")

---

## 🔧 Troubleshooting

### Cell 2: "Files not found"
**Solution**: Make sure you added the shared folder shortcut to "My Drive"
1. Open: https://drive.google.com/drive/folders/1KX7k_z2negFKq3qFjHJh-K1U-MEcNp7P
2. Click "Add shortcut to Drive" (⭐ icon at top)
3. Choose "My Drive"
4. Re-run Cell 2

### Cell 5: "Out of memory"
**Solution**: Reduce batch size
- Change `--batch_size 2` to `--batch_size 1`
- To keep the effective batch at 8, also raise `--gradient_accumulation_steps` from 4 to 8
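
Cell 5 pairs `--batch_size 2` with `--gradient_accumulation_steps 4`, for an effective batch of 8. When the per-step batch is lowered to save memory, the accumulation steps can be raised to hold the effective batch constant. A small sketch of that arithmetic, where `accumulation_steps_for` is a hypothetical helper name:

```python
def accumulation_steps_for(effective_batch, batch_size):
    """Return the gradient accumulation steps needed so that
    batch_size * steps == effective_batch."""
    if batch_size <= 0 or effective_batch % batch_size != 0:
        raise ValueError("batch_size must be a positive divisor of effective_batch")
    return effective_batch // batch_size


print(accumulation_steps_for(8, 2))  # 4, the Cell 5 setting
print(accumulation_steps_for(8, 1))  # 8, after halving the batch size
```

Smaller per-step batches lower peak GPU memory; the trade-off is more forward and backward passes per optimizer update, so each epoch may run slower.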

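### Training stopped / Runtime disconnected
**Solution**: Reconnect and resume
1. Reconnect to runtime
2. Re-run Cells 1-4 (quick setup)
3. Re-run Cell 5. As written it resumes from `checkpoint_epoch_15.pt`; to continue from a later epoch, first copy the newest checkpoint from your Cell 7 backup into `checkpoints/` and point `checkpoint_path` at it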
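
Cell 5 hard-codes `checkpoints/checkpoint_epoch_15.pt`, so picking up from a later epoch means pointing `checkpoint_path` at a newer file, for example one restored from a Cell 7 backup. The sketch below shows one way to find the newest checkpoint in a folder. `latest_checkpoint` is a hypothetical helper, and it assumes files follow the `checkpoint_epoch_<N>.pt` naming used elsewhere in this notebook.

```python
import os
import re

# Matches names like checkpoint_epoch_20.pt and captures the epoch number.
CHECKPOINT_NAME = re.compile(r"checkpoint_epoch_(\d+)\.pt")


def latest_checkpoint(directory):
    """Return (epoch, path) for the highest-numbered checkpoint in
    `directory`, or None if no matching file exists."""
    best = None
    for name in os.listdir(directory):
        match = CHECKPOINT_NAME.fullmatch(name)
        if match is None:
            continue  # skip best_model.pt and unrelated files
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, os.path.join(directory, name))
    return best
```

Comparing the epoch as an integer matters here: sorting the file names as strings would rank `checkpoint_epoch_5.pt` above `checkpoint_epoch_20.pt`.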

### "Account B also hit limit"
**Solution**: Wait or try alternatives
- Wait 24 hours for quota reset
- Try Kaggle (free GPU): https://kaggle.com
- Consider Colab Pro ($10/month)

---

## 💡 Tips

1. **Keep tab open**: Colab disconnects after 90 min of inactivity
2. **Backup regularly**: Run Cell 7 every 2-3 hours
3. **Monitor progress**: Run Cell 6 to check status
4. **GPU usage**: Should be 90-100% during training
5. **Checkpoints**: Automatically saved every 5 epochs
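
The GPU usage tip can be checked programmatically against the line that Cell 6 prints from `nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu --format=csv,noheader`. This is a sketch: `parse_gpu_status` is a hypothetical helper, and the sample line uses made-up values in the layout that command is expected to produce (utilization with `%`, memory with `MiB`, temperature as a bare number).

```python
def parse_gpu_status(line):
    """Parse one 'utilization, memory, temperature' CSV line into numbers."""
    util, mem, temp = [field.strip() for field in line.split(",")]
    return {
        "utilization_pct": int(util.rstrip("%").strip()),
        "memory_used_mib": int(mem.replace("MiB", "").strip()),
        "temperature_c": int(temp),
    }


sample = "95 %, 10240 MiB, 65"  # illustrative values, not a real measurement
status = parse_gpu_status(sample)
print(status["utilization_pct"] >= 90)  # True for this sample
```

Utilization that stays well below 90% during training may indicate a data-loading bottleneck rather than a problem with the GPU itself.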

---

## ⏰ Timeline

| Step | Time |
|------|------|
| Cell 1: Dependencies | 2 min |
| Cell 2: Mount Drive | 1 min |
| Cell 3: Extract | 3 min |
| Cell 4: Copy files | 30 sec |
| Cell 5: Training | ~12 hours |
| **Total** | **~12 hours** |

---

## ✅ Success Checklist

Before starting Cell 5:
- [ ] Cell 1: GPU shows "Tesla T4"
- [ ] Cell 2: All 3 files found (✅ marks)
- [ ] Cell 3: Project extracted successfully
- [ ] Cell 4: Checkpoint copied (293.9 MB)
- [ ] Ready to train!

---