# Automated Guitar Amp Modelling Training on Mac (Apple Silicon)

This notebook sets up and runs neural network training for guitar amplifier/distortion pedal modelling using PyTorch with MPS (Metal Performance Shaders) support on Apple Silicon Macs.

## Setup Instructions:
1. **Mac Requirements**: Apple Silicon Mac (M1/M2/M3) with macOS 12.3+ 
2. **Python Environment**: Python 3.8+ with PyTorch 2.0+ (MPS support)
3. **Dataset**: Place your training data in the `Data` folder, following the structure shown below
4. **Run All Cells**: Execute cells in order from top to bottom

## Dataset Structure:
Your Data folder should contain:
```
Data/
  ├── train/
  │   ├── dls2-input.wav
  │   └── dls2-target.wav
  ├── val/
  │   ├── dls2-input.wav
  │   └── dls2-target.wav
  └── test/
      ├── dls2-input.wav
      └── dls2-target.wav
```
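
Before training, it can help to confirm that all six files in the tree above are in place. The following is a minimal sketch using only the standard library; the helper name `missing_dataset_files` is hypothetical (it is not part of this repository), and the `dls2` prefix is taken from the tree above.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
ROLES = ("input", "target")


def missing_dataset_files(data_root="Data", prefix="dls2"):
    """Return the expected WAV paths that do not exist under data_root."""
    root = Path(data_root)
    expected = [
        root / split / f"{prefix}-{role}.wav"
        for split in SPLITS
        for role in ROLES
    ]
    return [path for path in expected if not path.is_file()]


# Report anything that is missing relative to the current directory.
missing = missing_dataset_files()
if missing:
    print("Missing files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All six dataset files are present.")
```

An empty list means the layout matches; otherwise the printed paths show exactly which files still need to be copied in.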

## Performance Notes:
- **MPS (Apple GPU)**: Faster than CPU, but slower than NVIDIA CUDA GPUs
- **Expected Training Time**: ~2-5x slower than T4 GPU on Google Colab
- **Memory**: Uses unified memory (shared with system RAM)
- **Mixed Precision**: Not available on MPS (CUDA only feature)
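
The notes above imply a preference order of CUDA, then MPS, then CPU. The sketch below illustrates that fallback with standard PyTorch calls. It is not the implementation of `device_utils.get_device()`, which is not shown in this notebook; the function name `pick_device_name` is hypothetical.

```python
def pick_device_name():
    """Return "cuda", "mps", or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to select

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing on older PyTorch builds, so look it up safely.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(f"Selected device: {pick_device_name()}")
```

On an Apple Silicon Mac with a recent PyTorch build this should report `mps`; seeing `cpu` there usually points at the macOS or PyTorch version.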


In [None]:
# Check device availability (MPS/CUDA/CPU)
import torch
import sys
import os

# Add current directory to path for device_utils
sys.path.insert(0, os.getcwd())

print("=" * 60)
print("Device Setup Check")
print("=" * 60)
print(f"PyTorch version: {torch.__version__}")

# Import device utilities
try:
    import device_utils
    print("✅ device_utils module loaded")
except ImportError:
    print("❌ device_utils not found! Make sure you're in the mac-mps-version directory")
    raise

# Detect and display device information
device = device_utils.get_device()
device_utils.print_device_info(device)

print("\n" + "=" * 60)


In [None]:
# Install/verify required dependencies
print("Checking and installing dependencies...")

# Check PyTorch version (needs 2.0+ for MPS)
torch_version = torch.__version__
print(f"PyTorch version: {torch_version}")

# Install audio processing libraries if needed
try:
    import numpy
    import scipy
    import matplotlib
    import yaml
    import tqdm
    import librosa
    import psutil
    print("✅ All required libraries already installed")
except ImportError as e:
    print(f"⚠️  Missing library: {e}")
    print("Installing missing dependencies...")
    %pip install numpy scipy matplotlib pyyaml tqdm librosa psutil tensorboard -q

print("\n✅ Dependencies ready!")


In [None]:
# Verify repository structure and imports
import sys
import os

print("=" * 60)
print("Repository Verification")
print("=" * 60)

# Ensure we're in the correct directory
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")

# Add current directory to Python path
sys.path.insert(0, current_dir)
print(f"✅ Added to Python path: {current_dir}")

# Verify required files exist
required_files = [
    'CoreAudioML',
    'dist_model_recnet_mps.py',
    'device_utils.py',
    'Configs/RNN3.json',
    'Data/train',
    'Data/val',
    'Data/test'
]

print("\n📋 Checking required files:")
for item in required_files:
    exists = os.path.exists(item)
    status = "✅" if exists else "❌"
    print(f"   {status} {item}")

# Test imports
print("\n🔍 Testing module imports...")
try:
    import device_utils
    print("   ✅ device_utils")

    import CoreAudioML.miscfuncs as miscfuncs
    print("   ✅ CoreAudioML.miscfuncs")

    import CoreAudioML.training as training
    print("   ✅ CoreAudioML.training")

    import CoreAudioML.dataset as dataset
    print("   ✅ CoreAudioML.dataset")

    import CoreAudioML.networks as networks
    print("   ✅ CoreAudioML.networks")

    print("\n✅ All modules imported successfully!")

    # Verify device availability
    device = device_utils.get_device()
    print(f"\n✅ Device ready: {device_utils.get_device_name(device)}")

except Exception as e:
    print(f"\n❌ Import error: {e}")
    print(f"   Current directory: {os.getcwd()}")
    if os.path.exists('CoreAudioML'):
        print(f"   CoreAudioML contents: {os.listdir('CoreAudioML')[:5]}")
    import traceback
    traceback.print_exc()
    raise

print("=" * 60)


In [None]:
# Create necessary directories
import os

dirs_to_create = [
    'Data/train',
    'Data/val', 
    'Data/test',
    'Results',
    'Configs'
]

for dir_path in dirs_to_create:
    os.makedirs(dir_path, exist_ok=True)
    # exist_ok=True means the directory may already have existed
    print(f"✅ Ready: {dir_path}")

print("\n✅ All directories ready!")
print("\n💡 Note: Place your training data WAV files in Data/train/, Data/val/, and Data/test/")


In [None]:
# Create RNN3.json config file optimized for Mac MPS
import json

config = {
    "model": "SimpleRNN",
    "input_size": 1,
    "output_size": 1,
    "num_blocks": 2,
    "hidden_size": 96,  # Good balance for Mac MPS
    "unit_type": "LSTM",
    "skip_con": True,
    "segment_length": 22050,
    "batch_size": 256,  # Smaller batch for Mac (MPS has unified memory)
    "epochs": 100,
    "learn_rate": 0.005,
    "validation_f": 5,
    "validation_p": 20,
    "loss_fcns": {
        "ESR": 0.75,
        "DC": 0.10,
        "HFHinge": 0.15
    },
    "pre_filt": "None",
    "cuda": 1,  # Will use MPS if available, fallback to CPU
    "weight_decay": 0.000001,
    "gradient_clip": 1.0,
    "hf_hinge_fmin": 10000,
    "hf_hinge_strength": 0.5
}

# Save config file
config_path = 'Configs/RNN3.json'
with open(config_path, 'w') as f:
    json.dump(config, f, indent=2)

print(f"✅ Config file created: {config_path}")
print(f"\n📋 Configuration:")
print(f"   Model: {config['model']}")
print(f"   Hidden Size: {config['hidden_size']}")
print(f"   Batch Size: {config['batch_size']} (optimized for Mac)")
print(f"   Epochs: {config['epochs']}")
print(f"   Learning Rate: {config['learn_rate']}")
print(f"\n💡 Tip: If you get memory errors, reduce batch_size to 128 or 64")
print(f"💡 Note: Mixed precision (AMP) is not available on MPS, only on CUDA")
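

Two numbers in the config above are easy to sanity-check by hand: the three loss weights and the segment length. The sketch below checks that the weights sum to 1 and converts `segment_length` to seconds. Both checks rest on assumptions: the training script is not known to require normalized weights, and the 44.1 kHz sample rate is a guess that should be replaced with the rate of your own WAV files. The helper name `summarize_config` is hypothetical.

```python
import math

# Values copied from the config cell; only the keys used below are included.
example_config = {
    "segment_length": 22050,
    "loss_fcns": {"ESR": 0.75, "DC": 0.10, "HFHinge": 0.15},
}


def summarize_config(config, sample_rate=44100):
    """Return (weights_sum_to_one, segment_seconds) for a config dict.

    sample_rate is an assumption; pass the rate of your own recordings.
    """
    total = sum(config["loss_fcns"].values())
    seconds = config["segment_length"] / sample_rate
    return math.isclose(total, 1.0), seconds


weights_ok, seconds = summarize_config(example_config)
print(f"Loss weights sum to 1: {weights_ok}")
print(f"Segment duration: {seconds:.3f} s")
```

At the assumed 44.1 kHz, a 22050-sample segment is half a second of audio.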


In [None]:
# Verify data files exist
import os

print("Checking data files...")
print("\nüìÅ Data structure:")

for split in ['train', 'val', 'test']:
    split_path = f'Data/{split}'
    if os.path.exists(split_path):
        files = os.listdir(split_path)
        wav_files = [f for f in files if f.endswith('.wav')]
        print(f"\n{split_path}/:")
        if wav_files:
            for f in wav_files:
                file_path = os.path.join(split_path, f)
                size_mb = os.path.getsize(file_path) / (1024 * 1024)
                print(f"   ✅ {f} ({size_mb:.2f} MB)")
        else:
            print(f"   ⚠️  No WAV files found")
    else:
        print(f"\n{split_path}/: ❌ Directory not found")

print("\n💡 Make sure you have dls2-input.wav and dls2-target.wav in each split")
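

Beyond checking that the files exist, it is worth confirming that each input/target pair lines up. The sketch below compares channel count, sample rate, and frame count using the standard-library `wave` module. It assumes that paired recordings should match on all three, which is typical for this kind of training but is not stated by the training script. The helper names are hypothetical, and `wave` reads PCM files only, so float-format WAVs will raise `wave.Error`.

```python
import os
import wave


def wav_header(path):
    """Return (channels, sample_rate, n_frames) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getnframes()


def pair_mismatches(input_path, target_path):
    """List human-readable differences between an input/target WAV pair."""
    names = ("channels", "sample rate", "frame count")
    in_info = wav_header(input_path)
    tgt_info = wav_header(target_path)
    return [
        f"{name}: input={a}, target={b}"
        for name, a, b in zip(names, in_info, tgt_info)
        if a != b
    ]


# Check every split that has both files present.
for split in ("train", "val", "test"):
    inp = os.path.join("Data", split, "dls2-input.wav")
    tgt = os.path.join("Data", split, "dls2-target.wav")
    if os.path.exists(inp) and os.path.exists(tgt):
        print(split, pair_mismatches(inp, tgt) or "OK")
```

An empty list means the pair agrees on all three properties; any entry names the property that differs and shows both values.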


In [None]:
# Start Training on Mac!
# This cell runs the training using MPS (Apple Silicon GPU) if available

import torch
import gc
import device_utils

print("=" * 60)
print("Starting Training on Mac")
print("=" * 60)

# Get device and clear cache
device = device_utils.get_device()
device_utils.clear_cache()

print(f"\n✅ Using device: {device_utils.get_device_name(device)}")
mem_info = device_utils.get_memory_info(device)
if mem_info['available'] and mem_info['total']:
    # The value printed is the 'total' field, so label it as such
    print(f"   Total memory: {mem_info['total']:.2f} GB")

print("\n🚀 Starting training...")
print("   This may take a while. Training on MPS is slower than on an NVIDIA GPU but faster than on the CPU.")
print("   Open Activity Monitor to watch GPU usage.")
print("\n" + "=" * 60 + "\n")

# Run training - will automatically use MPS if available
!python dist_model_recnet_mps.py --load_config RNN3 --epochs 100 --device dls2 --cuda 1

# Clean up after training
gc.collect()
device_utils.clear_cache()
print(f"\n✅ Training complete! Memory cleared.")


In [None]:
# Optional: Continue training or run with custom parameters
# Uncomment and modify as needed

# Example 1: Run more epochs
# !python dist_model_recnet_mps.py --load_config RNN3 --epochs 200 --device dls2 --cuda 1

# Example 2: Adjust batch size (if you get memory errors, reduce this)
# !python dist_model_recnet_mps.py --load_config RNN3 --epochs 100 --batch_size 128 --device dls2 --cuda 1

# Example 3: Train with larger model
# !python dist_model_recnet_mps.py --load_config RNN3 --epochs 100 --hidden_size 128 --device dls2 --cuda 1

# Example 4: Force CPU (if MPS causes issues)
# !python dist_model_recnet_mps.py --load_config RNN3 --epochs 100 --device dls2 --cuda 0

print("💡 Tips for Mac Training:")
print("   - MPS (Apple GPU) is faster than CPU but slower than NVIDIA GPUs")
print("   - If you get memory errors, reduce batch_size to 128 or 64")
print("   - Monitor Activity Monitor → Window → GPU History to see GPU usage")
print("   - Training checkpoints are saved every 10 epochs in Results/")
print("   - Mixed precision (AMP) is not available on MPS - this is normal")


In [None]:
# Check training results
import os

results_dir = 'Results'
if os.path.exists(results_dir):
    print("📊 Training Results:")
    print("=" * 60)

    for item in os.listdir(results_dir):
        item_path = os.path.join(results_dir, item)
        if os.path.isdir(item_path):
            print(f"\n📁 {item}/")
            files = os.listdir(item_path)
            important_files = [f for f in files if f.endswith(('.json', '.wav', '.txt'))]
            for f in important_files[:10]:  # Show first 10 important files
                print(f"   - {f}")
            # Count the remainder against the filtered list, not all files
            if len(important_files) > 10:
                print(f"   ... and {len(important_files) - 10} more files")
else:
    print("⚠️  Results directory not found. Training may not have completed yet.")


## Mac-Specific Tips:

1. **Performance**: MPS is faster than CPU, but expect it to be ~2-5x slower than NVIDIA GPUs such as the T4
2. **Memory**: MPS uses unified memory; watch memory usage in Activity Monitor
3. **Batch Size**: Start with 256, reduce to 128 or 64 if you get memory errors
4. **GPU Monitoring**: Use Activity Monitor → Window → GPU History
5. **No Mixed Precision**: AMP is CUDA-only, so MPS trains in FP32 (this is normal)

## Troubleshooting:

- **Import errors**: Make sure you're in the mac-mps-version directory
- **MPS not available**: Requires macOS 12.3+ and PyTorch 2.0+
- **Memory errors**: Reduce batch_size in config or use --cuda 0 to force CPU
- **Slow training**: This is normal - Mac MPS is slower than dedicated NVIDIA GPUs
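
The memory-error advice above amounts to a fallback sequence: try smaller batches first, then force the CPU. The sketch below builds the corresponding command lines using only the flags that already appear in this notebook (`--load_config`, `--epochs`, `--device`, `--cuda`, `--batch_size`). The helper name `build_train_command` is hypothetical, and the ordering of attempts is just one reasonable choice.

```python
def build_train_command(batch_size=None, cuda=1, epochs=100,
                        config="RNN3", device="dls2"):
    """Build the argument list for one training run."""
    cmd = [
        "python", "dist_model_recnet_mps.py",
        "--load_config", config,
        "--epochs", str(epochs),
        "--device", device,
        "--cuda", str(cuda),
    ]
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    return cmd


# Fallback order suggested above: smaller batches first, then CPU.
attempts = [build_train_command(batch_size=b) for b in (256, 128, 64)]
attempts.append(build_train_command(batch_size=64, cuda=0))

for cmd in attempts:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd)`. Whether the training script exits with a non-zero status on a memory error is not confirmed here, so check its output before moving to the next attempt automatically.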

## Performance Comparison:

- **T4 GPU (Colab)**: ~15-30 min for 100 epochs (hidden_size=96)
- **M3 Max MPS**: ~45-90 min for 100 epochs (estimated)
- **CPU**: ~2-4 hours for 100 epochs
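
Because each figure above is a range, the implied slowdown is also a range. The sketch below works out the best and worst case relative to the T4, using the minute values listed above (which are themselves estimates). The helper name `slowdown_range` is hypothetical.

```python
def slowdown_range(baseline_minutes, other_minutes):
    """Return (best_case, worst_case) slowdown of other relative to baseline.

    Each argument is a (fastest, slowest) pair of training times in minutes.
    """
    base_fast, base_slow = baseline_minutes
    other_fast, other_slow = other_minutes
    return other_fast / base_slow, other_slow / base_fast


t4 = (15, 30)      # T4 GPU (Colab)
m3 = (45, 90)      # M3 Max MPS (estimated)
cpu = (120, 240)   # CPU: 2-4 hours

print(slowdown_range(t4, m3))   # (1.5, 6.0)
print(slowdown_range(t4, cpu))  # (4.0, 16.0)
```

The 1.5x to 6x span for MPS brackets the "~2-5x slower" rule of thumb quoted elsewhere in this notebook.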

## Next Steps:

1. Check the `Results/` folder for trained models
2. Test inference on new audio with `proc_audio.py`
3. Experiment with different configurations
4. Consider using Google Colab for faster training if speed is critical
