# 🎵 LoFi Music Generator - GPU Training on Colab

**Train your LoFi music AI model with a FREE GPU!**

This notebook will:
- ✅ Use Google's FREE GPU (roughly 100x faster than CPU)
- ✅ Train on 178k MIDI files from the Lakh dataset
- ✅ Save the trained model for download
- ✅ Complete in 8-12 hours (not 43 days!)
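
As a quick sanity check on the figures in this list, the sketch below compares the quoted 43-day CPU estimate with the 8-12 hour GPU estimate. Both numbers come from this notebook; the function name `speedup` is only illustrative.

```python
# Compare the CPU and GPU time estimates quoted in this notebook.
CPU_DAYS = 43        # CPU estimate quoted above
GPU_HOURS = (8, 12)  # GPU estimate range quoted above


def speedup(cpu_days: float, gpu_hours: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    return cpu_days * 24 / gpu_hours


for hours in GPU_HOURS:
    print(f"{hours} h on GPU -> about {speedup(CPU_DAYS, hours):.0f}x faster")
```

The two results, about 129x and 86x, bracket the 100x figure.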

---

## ⚡ IMPORTANT: Enable GPU First!

1. Click **Runtime** → **Change runtime type**
2. Select **T4 GPU** or **GPU** from Hardware accelerator
3. Click **Save**

**Then run the cells below in order!**

## 📦 Step 1: Setup Environment

In [None]:
# Check GPU is available
import torch
print(f"🔍 GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("❌ NO GPU! Go to Runtime → Change runtime type → Select GPU")

In [None]:
# Clone your repository
!git clone https://github.com/andy-regulore/lofi.git
%cd lofi

# Checkout the correct branch
!git checkout claude/add-wav-upload-support-018ZUMDKbSCBaAvMDiN7XXU9

In [None]:
# Install dependencies
print("📦 Installing dependencies (this takes 3-5 minutes)...")
!pip install -q torch transformers datasets accelerate
!pip install -q miditok miditoolkit pretty_midi
!pip install -q librosa soundfile scipy numpy pandas
!pip install -q pyyaml scikit-learn tqdm tensorboard
print("✅ Dependencies installed!")

## 📂 Step 2: Get Training Data

**Choose ONE option:**
- **Option A:** Download Lakh MIDI Dataset (178k files, ~20GB)
- **Option B:** Upload your own MIDI files from Google Drive
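
Before choosing Option A, it may help to confirm that the runtime has enough free disk. The sketch below uses the standard-library `shutil.disk_usage`; the helper name `has_free_space` is hypothetical, and the 20 GB threshold is simply the size quoted above.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    print(f"Free disk space: {free_gb:.1f} GB (need about {required_gb:.0f} GB)")
    return free_gb >= required_gb


# 20 GB is the dataset size quoted for Option A; adjust it if your data differs.
if not has_free_space(".", required_gb=20):
    print("Not enough free space for Option A; consider Option B or a subset.")
```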

In [None]:
# OPTION A: Download Lakh MIDI Dataset (recommended)
print("📥 Downloading Lakh MIDI Dataset (~20GB, takes 10-20 minutes)...")
!mkdir -p data/training
!wget -q --show-progress http://hog.ee.columbia.edu/craffel/lmd/lmd_full.tar.gz
print("\n📦 Extracting dataset...")
!tar -xzf lmd_full.tar.gz -C data/training/
!rm lmd_full.tar.gz
print("✅ Dataset ready!")

# Count files (extension match is case-insensitive, so .MID is counted too)
import os
midi_count = sum(1 for root, dirs, files in os.walk('data/training')
                 for f in files if f.lower().endswith(('.mid', '.midi')))
print(f"\n🎵 Found {midi_count:,} MIDI files")

In [None]:
# OPTION B: Use Google Drive (skip if you used Option A)
# Uncomment if you have MIDI files in Google Drive

# from google.colab import drive
# drive.mount('/content/drive')

# # Copy from your Google Drive to Colab
# !mkdir -p data/training
# !cp -r /content/drive/MyDrive/your-midi-folder/* data/training/

# # Count files
# import os
# midi_count = sum(1 for root, dirs, files in os.walk('data/training')
#                  for f in files if f.lower().endswith(('.mid', '.midi')))
# print(f"🎵 Found {midi_count:,} MIDI files")

## 🚀 Step 3: Train the Model!

This will:
1. Tokenize all MIDI files (1-2 hours)
2. Train GPT-2 model (6-10 hours)
3. Save trained model

**Total time: 8-12 hours with GPU**

In [None]:
# Run training script
print("🚀 Starting training...\n")
print("This will take 8-12 hours. Keep this tab open: free Colab runtimes disconnect when idle.")
print("Colab does not notify you when the runtime disconnects (after 12 hours max).\n")

# First, tokenize all MIDI files
!python scripts/01_tokenize.py --config config.yaml --midi-dir data/training

# Build dataset
!python scripts/02_build_dataset.py --config config.yaml

# Train model
!python scripts/03_train.py --config config.yaml
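
In a notebook cell, each `!` line runs even if the previous one failed, so a tokenization error would not stop the training command from starting. A minimal sketch of a fail-fast alternative is shown below, using the standard-library `subprocess` module. The script paths and flags are copied from the cell above; the helper name `run_steps` is hypothetical.

```python
import subprocess
import sys

# The three commands from the training cell, in order.
STEPS = [
    ["scripts/01_tokenize.py", "--config", "config.yaml", "--midi-dir", "data/training"],
    ["scripts/02_build_dataset.py", "--config", "config.yaml"],
    ["scripts/03_train.py", "--config", "config.yaml"],
]


def run_steps(steps, python=sys.executable):
    """Run each script in order; raise CalledProcessError on the first failure."""
    for args in steps:
        print("Running:", " ".join(args))
        # check=True turns a non-zero exit code into an exception,
        # which stops the loop before the next step starts.
        subprocess.run([python, *args], check=True)


# Uncomment in the notebook to run the full pipeline:
# run_steps(STEPS)
```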

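## 📊 Step 4: Monitor Training (Optional)

Run this cell before starting the training cell. Colab executes cells one at a time, so a cell started during training waits until training finishes.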

In [None]:
# Load TensorBoard to monitor training
%load_ext tensorboard
%tensorboard --logdir models/lofi-gpt2/logs

## 💾 Step 5: Download Trained Model

In [None]:
# Zip the trained model
!zip -r trained_lofi_model.zip models/lofi-gpt2/

# Download to your computer
from google.colab import files
files.download('trained_lofi_model.zip')

print("✅ Download started!")
print("\nUnzip this file and place it in your local lofi/models/ directory")

## 🎵 Step 6: Test Generation (Optional)

In [None]:
# Generate a test track
!python scripts/04_generate.py \
    --config config.yaml \
    --model-path models/lofi-gpt2 \
    --output-dir output/test \
    --num-tracks 1 \
    --mood chill \
    --tempo 75

print("\n✅ Generated test track in output/test/")

# Download the first generated MIDI file, if there is one
from google.colab import files
import os
midi_files = sorted(f for f in os.listdir('output/test/midi')
                    if f.lower().endswith(('.mid', '.midi')))
if midi_files:
    files.download(f'output/test/midi/{midi_files[0]}')
else:
    print("❌ No MIDI files found in output/test/midi/")

## ✅ Next Steps

After training completes:

1. **Download the model** (Step 5 above)
2. **Unzip** on your local machine
3. **Place in** `lofi/models/lofi-gpt2/`
4. **Generate music** using your local web UI!

---

### 🎯 Tips:

- **Colab disconnects after 12 hours** - training stops when it does. Re-run the training cell with the `--resume` flag
- **Want to continue later?** Mount Google Drive and save checkpoints there; files on the Colab disk are lost when the runtime is recycled
- **Training too long?** Reduce epochs in config.yaml (line 44)
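
One way to follow the Google Drive tip is to copy the model directory to Drive from time to time. The sketch below uses only the standard library. It assumes the training script writes its checkpoints under `models/lofi-gpt2` (the path used in Step 5); the Drive folder name and the helper `backup_checkpoints` are hypothetical.

```python
import shutil
from pathlib import Path


def backup_checkpoints(src: str, dst: str) -> int:
    """Copy the directory tree at `src` into `dst`; return the number of files now in `dst`."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"No checkpoint directory at {src_path}")
    # dirs_exist_ok=True (Python 3.8+) lets repeated backups overwrite in place.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return sum(1 for p in dst_path.rglob("*") if p.is_file())


# Hypothetical Drive folder; mount Drive first with drive.mount('/content/drive').
# backup_checkpoints("models/lofi-gpt2", "/content/drive/MyDrive/lofi-checkpoints")
```

To restore after a disconnect, the same function can be called with the two paths swapped before re-running training.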

---

### ⚡ Alternative: Reduce Training Size

If you want faster results for testing, edit `config.yaml`:

```yaml
training:
  num_epochs: 10  # Instead of 50
  batch_size: 8   # Increase if you have spare GPU memory
```

Or limit the number of MIDI files by extracting only part of the dataset.
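
A minimal sketch of partial extraction with the standard-library `tarfile` module follows. It would take the place of the `tar -xzf` line in Option A and must run before the archive is deleted. The helper name `extract_midi_subset` and the limit of 5000 files are illustrative assumptions, not values from this project.

```python
import tarfile
from pathlib import Path


def extract_midi_subset(archive: str, dest: str, limit: int) -> int:
    """Extract at most `limit` MIDI files from a .tar.gz archive into `dest`."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    extracted = 0
    # "r|gz" reads the archive as a stream, so we can stop early
    # without scanning the rest of the file.
    with tarfile.open(archive, "r|gz") as tar:
        for member in tar:
            if extracted >= limit:
                break
            if member.isfile() and member.name.lower().endswith((".mid", ".midi")):
                tar.extract(member, path=dest_path)
                extracted += 1
    return extracted


# Example call with an arbitrary limit:
# extract_midi_subset("lmd_full.tar.gz", "data/training", limit=5000)
```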