# Maya1 TTS Model Test Notebook

This notebook tests Maya1, a text-to-speech model from Maya Research built for expressive voice generation with natural-language voice descriptions and inline emotion tags.

## Maya1 Model Features
- **Model**: `maya-research/maya1` (3B parameters)
- **Natural Language Voice Control**: Describe voices like briefing a voice actor
- **Emotion Tags**: 20+ emotions (laugh, cry, whisper, angry, etc.)
- **Streaming Audio**: Real-time voice synthesis with SNAC neural codec
- **Multi-accent English**: Supports various English accents and speaking styles
- **Production Ready**: Single GPU deployment, vLLM compatible


In [1]:
!pip install accelerate


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [None]:
!pip install snac


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
# Import required libraries
import os
import sys
import logging
import numpy as np
import soundfile as sf
import IPython.display as ipd
from pathlib import Path
import time
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from snac import SNAC

# Setup logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

print("Maya1 TTS Model Test Notebook")
print("=============================")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")




Maya1 TTS Model Test Notebook
PyTorch version: 2.5.1+cu121
CUDA available: True
CUDA device: NVIDIA GeForce RTX 3090 Ti
CUDA memory: 23.6 GB


In [None]:

import time

In [None]:
# Add app to the python path
import sys
import os
sys.path.append(os.path.abspath('app'))


In [None]:
# Initialize Maya1 TTS service
from maya1_service import Maya1TTSService
print("Initializing Maya1 TTS Service...")
start_time = time.time()

try:
    # Maya1TTSService initializes the model automatically in constructor
    maya1_service = Maya1TTSService()
    
    init_time = time.time() - start_time
    print(f"\n✅ Maya1 Service ready in {init_time:.2f} seconds")
    print(f"Supported emotions: {len(maya1_service.get_supported_emotions())}")
    print(f"Emotion tags: {', '.join(maya1_service.get_supported_emotions()[:10])}...")
        
except Exception as e:
    print(f"❌ Error initializing Maya1 service: {e}")
    import traceback
    traceback.print_exc()


2025-11-18 14:57:42,131 - DEBUG - (logger = None, model_name = 'maya-research/maya1')
2025-11-18 14:57:42,131 - DEBUG - Loading config from maya1_config.yaml
2025-11-18 14:57:42,133 - DEBUG - (): Config = {'max_new_tokens': 2048, 'temperature': 0.7, 'top_k': 50, 'top_p': 0.9, 'device': 'auto', 'torch_dtype': 'bfloat16', 'snac_model': 'hubertsiuzdak/snac_24khz', 'sample_rate': 24000, 'max_audio_duration': 30.0, 'expected_wpm': 150, 'model_name': 'maya-research/maya1'}
2025-11-18 14:57:42,133 - DEBUG - (): Model name: maya-research/maya1
2025-11-18 14:57:42,133 - DEBUG - Initializing Maya1TTSService with model: maya-research/maya1
2025-11-18 14:57:42,133 - INFO - Starting Maya1 model initialization
2025-11-18 14:57:42,134 - INFO - Loading Maya1 tokenizer...


Initializing Maya1 TTS Service...


2025-11-18 14:57:49,416 - INFO - Loading Maya1 model...
2025-11-18 14:57:50,449 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.53it/s]
2025-11-18 14:57:51,906 - INFO - Loading SNAC audio codec...
2025-11-18 14:57:52,223 - INFO - ✅ Maya1 model initialized successfully in 10.09 seconds
2025-11-18 14:57:52,224 - INFO - Model: maya-research/maya1
2025-11-18 14:57:52,224 - INFO - Parameters: ~3B
2025-11-18 14:57:52,225 - INFO - Device: cuda:0



✅ Maya1 Service ready in 10.09 seconds
Supported emotions: 21
Emotion tags: laugh, laugh_harder, sigh, whisper, angry, giggle, chuckle, gasp, cry, snort...


## 1. Maya1 Model Implementation

The Maya1 TTS service is implemented in the `maya1_service` module under `app/` (imported above as `Maya1TTSService`) and follows the Hugging Face model documentation.


## 2. Initialize Maya1 Service

Initialize the Maya1 TTS service and load the model. This cell repeats the initialization cell above, so running both loads the model twice.


In [None]:
# Initialize Maya1 TTS service
print("Initializing Maya1 TTS Service...")
start_time = time.time()

try:
    # Maya1TTSService initializes the model automatically in constructor
    maya1_service = Maya1TTSService()
    
    init_time = time.time() - start_time
    print(f"\n✅ Maya1 Service ready in {init_time:.2f} seconds")
    print(f"Supported emotions: {len(maya1_service.get_supported_emotions())}")
    print(f"Emotion tags: {', '.join(maya1_service.get_supported_emotions()[:10])}...")
        
except Exception as e:
    print(f"❌ Error initializing Maya1 service: {e}")
    import traceback
    traceback.print_exc()


2025-11-18 14:57:52,233 - DEBUG - (logger = None, model_name = 'maya-research/maya1')
2025-11-18 14:57:52,234 - DEBUG - Loading config from maya1_config.yaml
2025-11-18 14:57:52,235 - DEBUG - (): Config = {'max_new_tokens': 2048, 'temperature': 0.7, 'top_k': 50, 'top_p': 0.9, 'device': 'auto', 'torch_dtype': 'bfloat16', 'snac_model': 'hubertsiuzdak/snac_24khz', 'sample_rate': 24000, 'max_audio_duration': 30.0, 'expected_wpm': 150, 'model_name': 'maya-research/maya1'}
2025-11-18 14:57:52,236 - DEBUG - (): Model name: maya-research/maya1
2025-11-18 14:57:52,236 - DEBUG - Initializing Maya1TTSService with model: maya-research/maya1
2025-11-18 14:57:52,237 - INFO - Starting Maya1 model initialization
2025-11-18 14:57:52,237 - INFO - Loading Maya1 tokenizer...


Initializing Maya1 TTS Service...


2025-11-18 14:57:59,428 - INFO - Loading Maya1 model...
2025-11-18 14:57:59,505 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.61it/s]
2025-11-18 14:58:00,932 - INFO - Loading SNAC audio codec...
2025-11-18 14:58:01,223 - INFO - ✅ Maya1 model initialized successfully in 8.99 seconds
2025-11-18 14:58:01,223 - INFO - Model: maya-research/maya1
2025-11-18 14:58:01,223 - INFO - Parameters: ~3B
2025-11-18 14:58:01,224 - INFO - Device: cuda:0



✅ Maya1 Service ready in 9.01 seconds
Supported emotions: 21
Emotion tags: laugh, laugh_harder, sigh, whisper, angry, giggle, chuckle, gasp, cry, snort...


## 3. Test Basic Speech Generation

Test basic text-to-speech generation with natural language voice descriptions.


In [None]:
# Test basic speech generation
import soundfile as sf
import IPython.display as ipd

test_text = "Hello, this is a test of the Maya1 text-to-speech system with natural voice design."
voice_description = "Female, in her 30s with an American accent, warm and friendly tone, clear diction"

print(f"Testing basic speech generation...")
print(f"Text: {test_text}")
print(f"Voice: {voice_description}")

try:
    start_time = time.time()
    audio_data, sample_rate = maya1_service.generate_speech(
        text=test_text,
        description=voice_description
    )
    generation_time = time.time() - start_time
    
    print(f"\n✅ Speech generated successfully in {generation_time:.2f} seconds")
    print(f"Audio length: {len(audio_data)} samples")
    print(f"Sample rate: {sample_rate} Hz")
    print(f"Duration: {len(audio_data) / sample_rate:.2f} seconds")
    
    # Save and play audio
    output_file = "maya1_basic_test.wav"
    sf.write(output_file, audio_data, sample_rate)
    print(f"Audio saved to: {output_file}")
    
    # Display audio player
    display(ipd.Audio(audio_data, rate=sample_rate))
    
except Exception as e:
    print(f"❌ Speech generation failed: {e}")
    import traceback
    traceback.print_exc()


2025-11-18 14:58:01,252 - DEBUG - Generating speech: 'Hello, this is a test of the Maya1 text-to-speech ...' with voice: 'Female, in her 30s with an American accent, warm a...'


Testing basic speech generation...
Text: Hello, this is a test of the Maya1 text-to-speech system with natural voice design.
Voice: Female, in her 30s with an American accent, warm and friendly tone, clear diction


2025-11-18 14:58:10,778 - DEBUG - Level 0 tensor shape: torch.Size([1, 67]), values: 67
2025-11-18 14:58:10,778 - DEBUG - Level 1 tensor shape: torch.Size([1, 134]), values: 134
2025-11-18 14:58:10,779 - DEBUG - Level 2 tensor shape: torch.Size([1, 268]), values: 268



✅ Speech generated successfully in 10.25 seconds
Audio length: 137216 samples
Sample rate: 24000 Hz
Duration: 5.72 seconds
Audio saved to: maya1_basic_test.wav
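
The debug lines above report three SNAC code levels with 67, 134, and 268 values, and the saved clip has 137216 samples at 24000 Hz. Those figures are consistent with each coarse (level 0) frame carrying 1 + 2 + 4 = 7 codes and decoding to 2048 samples. The sketch below encodes that arithmetic. Both constants are inferred from this single run, not taken from the SNAC or Maya1 documentation, and the helper names are illustrative.

```python
# Constants inferred from the basic test run above (assumptions, not documented values).
SAMPLE_RATE = 24_000          # Hz, from the logged service config
SAMPLES_PER_FRAME = 2_048     # inferred: 137216 samples / 67 coarse frames
TOKENS_PER_FRAME = 1 + 2 + 4  # inferred from level shapes 67 / 134 / 268


def frames_to_seconds(coarse_frames: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Estimate clip duration from the number of level-0 (coarsest) frames."""
    return coarse_frames * SAMPLES_PER_FRAME / sample_rate


def tokens_used(coarse_frames: int) -> int:
    """Estimate how many codec tokens the model generated for the clip."""
    return coarse_frames * TOKENS_PER_FRAME


print(f"{frames_to_seconds(67):.2f} s")  # 5.72 s, matching the reported duration
print(tokens_used(67))                   # 469
```

Under these assumptions one coarse frame covers roughly 85 ms of audio, which gives a quick way to sanity-check later runs from the debug log alone.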


## 4. Test Emotion Tags

Test Maya1's inline emotion tags for expressive speech generation.
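
Because emotion tags are plain text inside the prompt, a misspelled or unsupported tag is easy to miss. A minimal sketch of a pre-flight check follows. `KNOWN_TAGS` holds only the ten tags printed during initialization (the service reports 21 in total), and `find_tags` and `unknown_tags` are hypothetical helpers, not part of `Maya1TTSService`.

```python
import re

# Illustrative subset: the ten tags printed at initialization, not the full list of 21.
KNOWN_TAGS = {
    "laugh", "laugh_harder", "sigh", "whisper", "angry",
    "giggle", "chuckle", "gasp", "cry", "snort",
}

# Matches inline tags such as <laugh> or <laugh_harder>.
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def find_tags(text: str) -> list[str]:
    """Return every inline tag name in the order it appears in the text."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str, known: set[str] = KNOWN_TAGS) -> list[str]:
    """Return the tags in the text that are missing from the known set."""
    return [tag for tag in find_tags(text) if tag not in known]


sample = "I was so worried <nervous> but then <laugh> everything worked out perfectly!"
print(find_tags(sample))     # ['nervous', 'laugh']
print(unknown_tags(sample))  # ['nervous'] with this ten-tag subset
```

In the notebook, passing `set(maya1_service.get_supported_emotions())` as `known` would check against the full list rather than this subset.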


In [None]:
# Test emotion tags
voice_description = "Female, in her 30s with an American accent, expressive and dynamic"

emotion_tests = [
    ("Neutral", "This is a normal speech without any emotions."),
    ("Laughter", "This is so funny <laugh> I can't stop laughing!"),
    ("Whisper", "I have a secret to tell you <whisper> can you hear me?"),
    ("Excitement", "I'm so excited about this <giggle> it's going to be amazing!"),
    ("Sadness", "I'm really disappointed about this <cry> it didn't work out."),
    ("Surprise", "Oh my goodness <gasp> I can't believe this happened!"),
    ("Anger", "I'm really frustrated with this situation <angry> this is unacceptable!"),
    ("Relief", "Finally, it's over <sigh> I'm so relieved right now."),
    ("Multiple Emotions", "I was so worried <nervous> but then <laugh> everything worked out perfectly!"),
    ("Snort", "Hey <snort> pay up little pay pigs!")
]

print(f"Testing {len(emotion_tests)} emotion variations...\n")

for emotion_name, text in emotion_tests:
    print(f"Testing: {emotion_name}")
    print(f"Text: {text}")
    
    try:
        start_time = time.time()
        audio_data, sample_rate = maya1_service.generate_speech(
            text=text,
            description=voice_description
        )
        generation_time = time.time() - start_time
        
        print(f"  ✅ Generated in {generation_time:.2f}s")
        
        # Save audio
        output_file = f"maya1_emotion_{emotion_name.lower().replace(' ', '_')}.wav"
        sf.write(output_file, audio_data, sample_rate)
        
        # Display audio player
        display(ipd.Audio(audio_data, rate=sample_rate))
        print()
        
    except Exception as e:
        print(f"  ❌ Failed: {e}")
        print()


2025-11-18 14:59:20,649 - DEBUG - Generating speech: 'This is a normal speech without any emotions....' with voice: 'Female, in her 30s with an American accent, expres...'


Testing 10 emotion variations...

Testing: Neutral
Text: This is a normal speech without any emotions.


2025-11-18 14:59:25,571 - DEBUG - Level 0 tensor shape: torch.Size([1, 36]), values: 36
2025-11-18 14:59:25,572 - DEBUG - Level 1 tensor shape: torch.Size([1, 72]), values: 72
2025-11-18 14:59:25,573 - DEBUG - Level 2 tensor shape: torch.Size([1, 144]), values: 144


  ✅ Generated in 4.97s


2025-11-18 14:59:25,627 - DEBUG - Generating speech: 'This is so funny <laugh> I can't stop laughing!...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Laughter
Text: This is so funny <laugh> I can't stop laughing!


2025-11-18 14:59:33,700 - DEBUG - Level 0 tensor shape: torch.Size([1, 57]), values: 57
2025-11-18 14:59:33,701 - DEBUG - Level 1 tensor shape: torch.Size([1, 114]), values: 114
2025-11-18 14:59:33,702 - DEBUG - Level 2 tensor shape: torch.Size([1, 228]), values: 228


  ✅ Generated in 8.09s


2025-11-18 14:59:33,735 - DEBUG - Generating speech: 'I have a secret to tell you <whisper> can you hear...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Whisper
Text: I have a secret to tell you <whisper> can you hear me?


2025-11-18 15:00:13,935 - DEBUG - Level 0 tensor shape: torch.Size([1, 292]), values: 292
2025-11-18 15:00:13,936 - DEBUG - Level 1 tensor shape: torch.Size([1, 584]), values: 584
2025-11-18 15:00:13,937 - DEBUG - Level 2 tensor shape: torch.Size([1, 1168]), values: 1168


  ✅ Generated in 40.24s


2025-11-18 15:00:14,000 - DEBUG - Generating speech: 'I'm so excited about this <giggle> it's going to b...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Excitement
Text: I'm so excited about this <giggle> it's going to be amazing!


2025-11-18 15:00:54,053 - DEBUG - Level 0 tensor shape: torch.Size([1, 292]), values: 292
2025-11-18 15:00:54,054 - DEBUG - Level 1 tensor shape: torch.Size([1, 584]), values: 584
2025-11-18 15:00:54,055 - DEBUG - Level 2 tensor shape: torch.Size([1, 1168]), values: 1168


  ✅ Generated in 40.09s


2025-11-18 15:00:54,120 - DEBUG - Generating speech: 'I'm really disappointed about this <cry> it didn't...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Sadness
Text: I'm really disappointed about this <cry> it didn't work out.


2025-11-18 15:01:34,668 - DEBUG - Level 0 tensor shape: torch.Size([1, 292]), values: 292
2025-11-18 15:01:34,669 - DEBUG - Level 1 tensor shape: torch.Size([1, 584]), values: 584
2025-11-18 15:01:34,670 - DEBUG - Level 2 tensor shape: torch.Size([1, 1168]), values: 1168


  ✅ Generated in 40.58s


2025-11-18 15:01:34,724 - DEBUG - Generating speech: 'Oh my goodness <gasp> I can't believe this happene...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Surprise
Text: Oh my goodness <gasp> I can't believe this happened!


2025-11-18 15:01:44,040 - DEBUG - Level 0 tensor shape: torch.Size([1, 68]), values: 68
2025-11-18 15:01:44,041 - DEBUG - Level 1 tensor shape: torch.Size([1, 136]), values: 136
2025-11-18 15:01:44,042 - DEBUG - Level 2 tensor shape: torch.Size([1, 272]), values: 272


  ✅ Generated in 9.33s


2025-11-18 15:01:44,069 - DEBUG - Generating speech: 'I'm really frustrated with this situation <angry> ...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Anger
Text: I'm really frustrated with this situation <angry> this is unacceptable!


2025-11-18 15:02:24,165 - DEBUG - Level 0 tensor shape: torch.Size([1, 292]), values: 292
2025-11-18 15:02:24,166 - DEBUG - Level 1 tensor shape: torch.Size([1, 584]), values: 584
2025-11-18 15:02:24,167 - DEBUG - Level 2 tensor shape: torch.Size([1, 1168]), values: 1168


  ✅ Generated in 40.13s


2025-11-18 15:02:24,234 - DEBUG - Generating speech: 'Finally, it's over <sigh> I'm so relieved right no...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Relief
Text: Finally, it's over <sigh> I'm so relieved right now.


2025-11-18 15:02:29,264 - DEBUG - Level 0 tensor shape: torch.Size([1, 37]), values: 37
2025-11-18 15:02:29,265 - DEBUG - Level 1 tensor shape: torch.Size([1, 74]), values: 74
2025-11-18 15:02:29,266 - DEBUG - Level 2 tensor shape: torch.Size([1, 148]), values: 148


  ✅ Generated in 5.05s


2025-11-18 15:02:29,305 - DEBUG - Generating speech: 'I was so worried <nervous> but then <laugh> everyt...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Multiple Emotions
Text: I was so worried <nervous> but then <laugh> everything worked out perfectly!


2025-11-18 15:02:39,751 - DEBUG - Level 0 tensor shape: torch.Size([1, 75]), values: 75
2025-11-18 15:02:39,752 - DEBUG - Level 1 tensor shape: torch.Size([1, 150]), values: 150
2025-11-18 15:02:39,753 - DEBUG - Level 2 tensor shape: torch.Size([1, 300]), values: 300


  ✅ Generated in 10.46s


2025-11-18 15:02:39,788 - DEBUG - Generating speech: 'Hey <snort> pay up little pay pigs!...' with voice: 'Female, in her 30s with an American accent, expres...'



Testing: Snort
Text: Hey <snort> pay up little pay pigs!


2025-11-18 15:03:19,833 - DEBUG - Level 0 tensor shape: torch.Size([1, 292]), values: 292
2025-11-18 15:03:19,834 - DEBUG - Level 1 tensor shape: torch.Size([1, 584]), values: 584
2025-11-18 15:03:19,835 - DEBUG - Level 2 tensor shape: torch.Size([1, 1168]), values: 1168


  ✅ Generated in 40.08s
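
Five of the ten runs above (Whisper, Excitement, Sadness, Anger, Snort) took about 40 seconds and each produced exactly 292 coarse frames. One plausible reading is that generation ran into the `max_new_tokens` limit of 2048 from the logged config instead of stopping on its own: if each coarse frame uses 7 codec tokens, the budget is 2048 // 7 = 292 frames. The sketch below flags such runs. The 7-token and 2048-sample figures are assumptions inferred from the debug output, and the function names are illustrative.

```python
MAX_NEW_TOKENS = 2048     # from the logged service config
TOKENS_PER_FRAME = 7      # assumption: 1 + 2 + 4 codes per coarse frame
SAMPLES_PER_FRAME = 2048  # assumption: inferred from the basic test (137216 / 67)
SAMPLE_RATE = 24_000      # Hz, from the logged service config


def frame_budget(max_new_tokens: int = MAX_NEW_TOKENS) -> int:
    """Largest number of whole coarse frames that fits in the token limit."""
    return max_new_tokens // TOKENS_PER_FRAME


def hit_token_limit(coarse_frames: int) -> bool:
    """True when a run produced as many frames as the token limit allows."""
    return coarse_frames >= frame_budget()


# Level-0 frame counts copied from the debug output above.
level0_frames = {
    "Neutral": 36, "Laughter": 57, "Whisper": 292, "Excitement": 292,
    "Sadness": 292, "Surprise": 68, "Anger": 292, "Relief": 37,
    "Multiple Emotions": 75, "Snort": 292,
}

suspect = [name for name, frames in level0_frames.items() if hit_token_limit(frames)]
print(f"Frame budget: {frame_budget()}")  # Frame budget: 292
print(f"Runs at the budget: {suspect}")
print(f"Audio at the budget: {frame_budget() * SAMPLES_PER_FRAME / SAMPLE_RATE:.1f} s")  # 24.9 s
```

If this reading is right, each flagged clip is about 24.9 seconds long for a one-sentence prompt, so listening to the tail of those files is a quick way to confirm or rule it out.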





## 5. Test Character Voices

Test Maya1's ability to create distinct character voices as shown in the documentation examples.


In [None]:
# Test character voices from Maya1 documentation examples
character_tests = [
    {
        "name": "Energetic Event Host",
        "description": "Female, in her 30s with an American accent and is an event host, energetic, clear diction",
        "text": "Wow. This place looks even better than I imagined. How did they set all this up so perfectly? The lights, the music, everything feels magical. I can't stop smiling right now."
    },
    {
        "name": "Dark Villain",
        "description": "Dark villain character, Male voice in their 40s with a British accent, low pitch, gravelly timbre, slow pacing, angry tone at high intensity",
        "text": "Welcome back to another episode of our podcast! <laugh_harder> Today we are diving into an absolutely fascinating topic"
    },
    {
        "name": "Calm Narrator",
        "description": "Male, late 20s, neutral American, warm baritone, calm pacing",
        "text": "In a world where technology advances at breakneck speed, we must remember to pause and appreciate the simple moments that make life meaningful."
    }
]

print(f"Testing {len(character_tests)} character voices...\n")

for character in character_tests:
    print(f"Testing: {character['name']}")
    print(f"Description: {character['description']}")
    print(f"Text: {character['text'][:100]}{'...' if len(character['text']) > 100 else ''}")
    
    try:
        start_time = time.time()
        audio_data, sample_rate = maya1_service.generate_speech(
            text=character['text'],
            description=character['description']
        )
        generation_time = time.time() - start_time
        
        print(f"  ✅ Generated in {generation_time:.2f}s, Duration: {len(audio_data) / sample_rate:.2f}s")
        
        # Save audio
        output_file = f"maya1_character_{character['name'].lower().replace(' ', '_')}.wav"
        sf.write(output_file, audio_data, sample_rate)
        
        # Display audio player
        print(f"  Character: {character['name']}")
        display(ipd.Audio(audio_data, rate=sample_rate))
        print()
        
    except Exception as e:
        print(f"  ❌ Failed: {e}")
        print()


## 6. Performance Benchmarks

Benchmark Maya1 performance with different text lengths and complexity.
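
The benchmark cell below reports `real_time_factor` as audio duration divided by generation time, so values above 1.0 mean audio is produced faster than it plays back. Many TTS benchmarks use the inverse ratio (generation time divided by audio duration, where lower is better), so it helps to keep the two apart. A small sketch with hypothetical function names, using the figures from the basic test in section 3:

```python
def speed_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute; above 1.0 is faster than real time."""
    return audio_seconds / generation_seconds


def inverse_rtf(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of compute per second of audio; below 1.0 is faster than real time."""
    return generation_seconds / audio_seconds


# Basic test from section 3: 5.72 s of audio generated in 10.25 s.
audio_s, gen_s = 5.72, 10.25
print(f"speed factor: {speed_factor(audio_s, gen_s):.2f}x")  # 0.56x
print(f"inverse RTF:  {inverse_rtf(audio_s, gen_s):.2f}")    # 1.79
```

By either measure, that run was slower than real time on the RTX 3090 Ti used here.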


In [None]:
# Performance benchmarks
voice_description = "Female, in her 30s with an American accent, clear and natural"

benchmark_texts = [
    ("Short", "Hello world."),
    ("Medium", "This is a medium length sentence for testing Maya1 speech generation performance with natural voice design."),
    ("Long", "This is a much longer text that contains multiple sentences and should test Maya1's ability to handle extended speech generation. It includes various words, punctuation marks, and natural language patterns to simulate real-world usage scenarios. The model should maintain consistent voice quality throughout the entire passage."),
    ("Emotional", "I'm so excited <giggle> about this new technology! It's absolutely amazing <laugh> how natural it sounds. Sometimes I get a bit nervous <whisper> about AI, but this is incredible. I can't believe <gasp> how expressive it is!")
]

print(f"Maya1 Performance Benchmarks\n")
print(f"Voice: {voice_description}\n")

results = []

for category, text in benchmark_texts:
    print(f"Testing {category} text ({len(text)} characters):")
    print(f"Text: {text[:100]}{'...' if len(text) > 100 else ''}")
    
    try:
        start_time = time.time()
        audio_data, sample_rate = maya1_service.generate_speech(
            text=text,
            description=voice_description
        )
        generation_time = time.time() - start_time
        audio_duration = len(audio_data) / sample_rate
        
        # Calculate metrics
        chars_per_second = len(text) / generation_time
        real_time_factor = audio_duration / generation_time
        
        result = {
            'category': category,
            'text_length': len(text),
            'generation_time': generation_time,
            'audio_duration': audio_duration,
            'chars_per_second': chars_per_second,
            'real_time_factor': real_time_factor
        }
        results.append(result)
        
        print(f"  ✅ Generated in {generation_time:.2f}s")
        print(f"  Audio duration: {audio_duration:.2f}s")
        print(f"  Speed: {chars_per_second:.1f} chars/sec")
        print(f"  Real-time factor: {real_time_factor:.2f}x")
        
        # Save audio
        output_file = f"maya1_benchmark_{category.lower()}.wav"
        sf.write(output_file, audio_data, sample_rate)
        print()
        
    except Exception as e:
        print(f"  ❌ Failed: {e}")
        print()

# Performance summary
if results:
    print("\n📊 Maya1 Performance Summary:")
    print(f"{'Category':<15}{'Chars':>8}{'Gen Time':>10}{'Audio Dur':>11}{'Chars/sec':>11}{'RT Factor':>11}")
    print("-" * 66)
    for r in results:
        print(f"{r['category']:<15}{r['text_length']:>8}{r['generation_time']:>10.2f}{r['audio_duration']:>11.2f}{r['chars_per_second']:>11.1f}{r['real_time_factor']:>11.2f}")
    
    avg_chars_per_sec = sum(r['chars_per_second'] for r in results) / len(results)
    avg_rt_factor = sum(r['real_time_factor'] for r in results) / len(results)
    print(f"\nAverage: {avg_chars_per_sec:.1f} chars/sec, {avg_rt_factor:.2f}x real-time")
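    # real_time_factor is audio_duration / generation_time, so only values above 1.0 are faster than real time
    verdict = 'faster than real time' if avg_rt_factor > 1.0 else 'not faster than real time'
    print(f"\n🎯 On average, Maya1 generated audio {verdict} on this hardware.")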


## 7. Maya1 Model Information Summary

Display comprehensive information about the Maya1 model and its capabilities.


In [None]:
# Maya1 model information summary
print("🔍 Maya1 Model Information Summary")
print("=" * 50)

print(f"Model Name: {maya1_service.model_name}")
print(f"Parameters: ~3B")
print(f"Architecture: Llama-style transformer with SNAC codec")
print(f"Audio Quality: 24 kHz, mono")
print(f"Streaming Rate: ~0.98 kbps")
print(f"License: Apache 2.0 (Open Source)")

# Check system resources
print(f"\n💻 System Information:")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA Device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
    print(f"Memory Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"Memory Cached: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")

# Model capabilities
print(f"\n🎭 Voice Design Capabilities:")
print(f"  • Natural language descriptions (no complex parameters)")
print(f"  • Age, gender, accent, and character specification")
print(f"  • Pitch, pace, and tone control")
print(f"  • Character and role-based voices")

print(f"\n😊 Emotion Support ({len(maya1_service.get_supported_emotions())} emotions):")
emotions = maya1_service.get_supported_emotions()
for i in range(0, len(emotions), 6):
    emotion_line = ", ".join(emotions[i:i+6])
    print(f"  {emotion_line}")

print(f"\n🎯 Use Cases:")
use_cases = [
    "Game Character Voices",
    "Podcast & Audiobook Production", 
    "AI Voice Assistants",
    "Video Content Creation",
    "Customer Service AI",
    "Accessibility Tools",
    "Interactive Storytelling",
    "Educational Content"
]
for use_case in use_cases:
    print(f"  • {use_case}")

print(f"\n🔧 Technical Features:")
print(f"  • Single GPU deployment (16GB+ VRAM recommended)")
print(f"  • vLLM compatible for production scaling")
print(f"  • Real-time streaming with sub-100ms latency")
print(f"  • SNAC neural codec for efficient audio encoding")
print(f"  • Multi-scale hierarchical audio representation")
print(f"  • Automatic prefix caching for efficiency")

print(f"\n🌟 Key Differentiators:")
print(f"  ✅ Open-source model with 20+ emotion tags")
print(f"  ✅ Zero-shot voice design (no training samples needed)")
print(f"  ✅ Production-ready streaming architecture")
print(f"  ✅ Natural language voice control interface")
print(f"  ✅ Commercial-friendly Apache 2.0 license")

print(f"\n✅ Maya1 model test completed successfully!")
print(f"\n📚 For more information:")
print(f"  • Hugging Face: https://huggingface.co/maya-research/maya1")
print(f"  • Maya Research: https://mayaresearch.ai")
print(f"  • Documentation: Model card and examples available on HF")
