# Rythm AI 1.2 Europa - Training Setup on Google Colab

This notebook sets up and runs training for the Rythm AI model on Google Colab's free GPU tier.

## Steps:
1. Check GPU availability
2. Clone the repository and set up the environment
3. Install dependencies
4. Reduce the model size for initial testing
5. Start training
6. Monitor training

## 1. Check GPU Availability
First, let's verify that the runtime has GPU access and see which GPU was assigned. If no GPU shows up, select one under Runtime > Change runtime type.

In [None]:
!nvidia-smi

In [None]:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")

## 2. Clone Repository and Setup
Now let's clone the repository that contains your training code. If the code is not on GitHub yet, create a repository and push it there first.
(Replace `YOUR_USERNAME` and `REPO_NAME` with your actual GitHub details.)

In [None]:
!git clone https://github.com/YOUR_USERNAME/REPO_NAME.git
%cd REPO_NAME

## 3. Install Dependencies
Install all packages required for training.

In [None]:
!pip install torch transformers wandb sentencepiece regex tiktoken tqdm

## 4. Reduce Model Size for Initial Testing
Let's use a smaller model configuration, with far fewer parameters, for initial testing.

In [None]:
from backend.rythm_model_architecture import RythmConfig

# Create a smaller config for testing
test_config = RythmConfig(
    vocab_size=128000,
    hidden_size=768,  # Reduced from 5120
    intermediate_size=3072,  # Reduced from 14336
    num_hidden_layers=12,  # Reduced from 48
    num_attention_heads=12,  # Reduced from 40
    num_key_value_heads=12,  # Equal to num_attention_heads, so no key/value head sharing
    max_position_embeddings=2048  # Reduced from 32768 for testing
)

# Save the test config
import json
with open('test_config.json', 'w') as f:
    json.dump(vars(test_config), f, indent=2)
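
To get a feel for how small the test configuration really is, the sketch below estimates its parameter count. It assumes a LLaMA-style decoder (bias-free attention with query, key, value, and output projections, a gated MLP with three weight matrices, two norm weights per layer, and an output head that is not tied to the input embedding). The actual layout in `backend.rythm_model_architecture` is not shown here, so treat `estimate_params` as a hypothetical helper and the result as a rough figure only.

```python
def estimate_params(vocab_size, hidden_size, intermediate_size,
                    num_hidden_layers, num_attention_heads,
                    num_key_value_heads, tie_embeddings=False):
    """Rough parameter count for an ASSUMED LLaMA-style decoder (no biases)."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim

    # Attention: query and output projections are hidden x hidden,
    # key and value projections are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two norm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embedding = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else embedding
    final_norm = hidden_size
    return embedding + num_hidden_layers * per_layer + final_norm + lm_head


total = estimate_params(
    vocab_size=128000,
    hidden_size=768,
    intermediate_size=3072,
    num_hidden_layers=12,
    num_attention_heads=12,
    num_key_value_heads=12,
)
print(f"Estimated parameters: {total:,}")
```

Under these assumptions the test configuration comes to roughly 310M parameters, of which about 197M sit in the input embedding and output head because of the 128000-token vocabulary. If memory is tight, the vocabulary size matters as much as the layer count at this scale.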

## 5. Start Training
Now let's run training with the smaller model configuration.

In [None]:
from backend.train_rythm_model import TrainingConfig, RythmTrainer

# Create training configuration
training_config = TrainingConfig(
    model_name="rythm-europa-test",
    batch_size=8,  # Increased for GPU
    micro_batch_size=2,
    learning_rate=2e-4,
    num_epochs=3,
    max_seq_length=2048,  # Reduced for testing
    output_dir="./checkpoints",
    use_wandb=False,
    use_mixed_precision=True,
    gradient_checkpointing=True,
    model_config=test_config
)

# Create trainer and start training
trainer = RythmTrainer(training_config)
trainer.train()
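
The relationship between `batch_size` and `micro_batch_size` can be sketched as below. This assumes that `RythmTrainer` treats `batch_size` as the effective batch per optimizer step and `micro_batch_size` as the number of samples per forward/backward pass, which is a common convention but is not confirmed by the code shown here. Both helper functions are hypothetical.

```python
def accumulation_steps(batch_size, micro_batch_size):
    """Forward/backward passes per optimizer step (ASSUMED convention)."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


def tokens_per_step(batch_size, max_seq_length):
    """Upper bound on tokens seen per optimizer step (full-length sequences)."""
    return batch_size * max_seq_length


steps = accumulation_steps(batch_size=8, micro_batch_size=2)
tokens = tokens_per_step(batch_size=8, max_seq_length=2048)
print(f"Gradient accumulation steps: {steps}")
print(f"Tokens per optimizer step (at most): {tokens}")
```

If training runs out of GPU memory, lowering `micro_batch_size` (and keeping `batch_size` fixed) would, under this assumption, reduce peak memory without changing the effective batch size.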

## 6. Monitor Training
Training progress is printed in the output of the training cell above. You can monitor:
- Loss values
- Learning rate changes
- Training speed (samples/second)

The model checkpoints will be saved in the `checkpoints` directory.
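
To see which checkpoint is the most recent, a helper along these lines can scan the output directory. The naming scheme `checkpoint-<step>` is purely an assumption for illustration; check what `RythmTrainer` actually writes and adjust the pattern to match. `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir, pattern=r"checkpoint-(\d+)"):
    """Return the entry with the highest step number, or None if none match.

    The default pattern is an ASSUMED naming scheme (e.g. checkpoint-500.pt
    or a directory named checkpoint-500).
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step, best_path = -1, None
    for entry in root.iterdir():
        # Path.stem drops a single file extension, so both files and
        # directories can be matched against the same pattern.
        match = re.fullmatch(pattern, entry.stem)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), entry
    return best_path


print(latest_checkpoint("./checkpoints"))
```

Comparing step numbers as integers avoids the usual pitfall of sorting names as strings, where `checkpoint-100` would sort before `checkpoint-20`.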

## Important Notes:

1. This notebook uses a smaller model configuration for initial testing. Once everything works, you can gradually increase the model size.

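2. Colab sessions have time limits (up to about 12 hours on the free tier, and idle sessions can disconnect sooner). For longer training:
   - Save checkpoints frequently, ideally to persistent storage such as Google Drive, because the Colab filesystem is wiped when the session ends
   - Use `wandb` to track progress (set `use_wandb=True` in `TrainingConfig`)
   - Resume training from checkpoints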

3. To train the full 8B model, you'll need:
   - Multiple training sessions
   - Gradient checkpointing
   - Careful memory management
   - Possibly Colab Pro for better GPUs
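
To put the memory requirement for the full model in perspective, here is a back-of-the-envelope estimate. It assumes mixed-precision training with an Adam-style optimizer in the common layout of 16 bytes per parameter, and it ignores activations, which gradient checkpointing reduces but does not eliminate. Whether `RythmTrainer` uses exactly this layout is an assumption, and `training_state_gb` is a hypothetical helper.

```python
# ASSUMED per-parameter memory layout for mixed-precision Adam training.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_gb(num_params):
    """Memory for weights, gradients, and optimizer state in GB (10^9 bytes)."""
    total_bytes = num_params * sum(BYTES_PER_PARAM.values())
    return total_bytes / 1e9


full_model_gb = training_state_gb(8_000_000_000)
print(f"8B model training state: {full_model_gb:.0f} GB (activations excluded)")
```

Under these assumptions the 8B model needs about 128 GB for training state alone, far more than a single Colab GPU offers, so a better GPU by itself is unlikely to be enough without additional techniques.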